bootstrap: stream commit-graph fetch to bound planning memory #61

Merged
nodo merged 2 commits into main from nodo/memory-improvment on May 20, 2026

@nodo (Collaborator) commented May 20, 2026

Replace the per-branch memory.NewStorage() planning fetch with a streaming extractor that pulls only (commit -> parent hashes) tuples from the tree:0-filtered pack.

Previously, bootstrap planning materialized the entire commit set (~1.4M decoded commits for linux, ~5 GiB) just to read each commit's ParentHashes. The new path reads the pack incrementally, extracting parents from each commit as it arrives and discarding the bytes; a bounded LRU keeps recent objects in memory only long enough to resolve deltas. Planning then walks the resulting parents map directly.
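
As a rough illustration of why only a small part of each commit is needed, here is a minimal sketch of pulling parent hashes out of one raw (already inflated) commit object. The names `parseParents` and `recordCommit` are hypothetical and are not taken from this PR; the sketch relies only on the standard commit layout, where `parent` lines sit in the header before the first blank line.

```go
package main

import "bytes"

// parseParents scans the header of a raw commit object and returns the hex
// hashes found on its "parent" lines. It stops at the first blank line, which
// separates the header from the message, so message text that happens to
// start with "parent " is never misread.
// Hypothetical helper for illustration; not the PR's actual code.
func parseParents(raw []byte) []string {
	var parents []string
	for len(raw) > 0 {
		line := raw
		if i := bytes.IndexByte(raw, '\n'); i >= 0 {
			line, raw = raw[:i], raw[i+1:]
		} else {
			raw = nil
		}
		if len(line) == 0 {
			break // end of header
		}
		if rest, ok := bytes.CutPrefix(line, []byte("parent ")); ok {
			parents = append(parents, string(rest))
		}
	}
	return parents
}

// recordCommit keeps only the parent list for hash. After it returns, the
// caller is free to drop raw, so the commit bytes never accumulate.
func recordCommit(parents map[string][]string, hash string, raw []byte) {
	parents[hash] = parseParents(raw)
}
```

A root commit simply yields an empty parent list, and a merge commit yields its parents in header order, so the first entry is the first parent.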

Measured against replicate --all-refs https://github.com/torvalds/linux.git into a local git-http-backend target:

                            before     after     delta
  Peak Go heap (inuse)      5.42 GiB   1.47 GiB  -73%
  Peak OS RSS               5.69 GiB   1.63 GiB  -71%
  Wall time (4 GiB batches) 32m 04s    19m 09s   -40%
  Refs pushed               2862/2862  2862/2862 same

Note

Medium Risk
Moderate risk: introduces new low-memory pack parsing logic (including temp-file spill and delta-base caching) and refactors v2 fetch envelope handling, which could affect correctness/performance on unusual pack/delta layouts or constrained disks.

Overview
Adds a streaming ExtractCommitParents path that parses a tree:0 packfile and returns only (commit -> parent hashes), using a bounded LRU cache for delta resolution and spilling non-seekable inputs to a temp pack file to keep go-git in low-memory mode.
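
The two mechanisms named above can be sketched with the standard library alone. This is an assumption-laden illustration, not the PR's code: `lruCache`, `newLRU`, and `spillToTemp` are hypothetical names, and the cache here is bounded by entry count, whereas the real cache may well be bounded by bytes.

```go
package main

import (
	"container/list"
	"io"
	"os"
)

// lruCache holds at most limit decoded objects and evicts the least
// recently used one, so delta bases stay resident only briefly.
type lruCache struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // hash -> element holding *entry
}

type entry struct {
	key  string
	data []byte
}

func newLRU(limit int) *lruCache {
	return &lruCache{limit: limit, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached bytes and marks the entry as recently used.
func (c *lruCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).data, true
}

// Put inserts or refreshes an entry, evicting the oldest one if over limit.
func (c *lruCache) Put(key string, data []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).data = data
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, data: data})
	if c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// spillToTemp copies a non-seekable stream to a temp file and rewinds it,
// returning a seekable handle plus a cleanup func that closes and removes it.
func spillToTemp(r io.Reader) (*os.File, func(), error) {
	f, err := os.CreateTemp("", "pack-*.tmp")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() { f.Close(); os.Remove(f.Name()) }
	if _, err := io.Copy(f, r); err != nil {
		cleanup()
		return nil, nil, err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		cleanup()
		return nil, nil, err
	}
	return f, cleanup, nil
}
```

The trade-off is disk for heap: the spilled pack costs temp space roughly equal to the pack size, which is where the "constrained disks" caveat in the risk note comes from.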

Introduces RefService.FetchCommitParents and refactors v2 fetch parsing into consumeV2FetchPack, allowing callers to either store the pack (UpdateObjectStorage) or consume the demuxed pack stream.
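
For readers unfamiliar with what "demuxed pack stream" means here, the following is a minimal sketch of side-band demultiplexing over pkt-lines. It assumes the reader is already positioned just past the `packfile` section header of a v2 fetch response; `demuxSideband` and `demuxBytes` are hypothetical names and do not reflect how `consumeV2FetchPack` is actually written.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strconv"
)

// demuxSideband reads side-band pkt-lines from r and writes only pack data
// (band 1) to pack. Progress (band 2) is dropped, band 3 is returned as an
// error, and a flush-pkt ("0000") ends the stream.
func demuxSideband(r io.Reader, pack io.Writer) error {
	var hdr [4]byte
	for {
		if _, err := io.ReadFull(r, hdr[:]); err != nil {
			return err
		}
		// The 4 hex digits give the packet length, including the header itself.
		n, err := strconv.ParseUint(string(hdr[:]), 16, 16)
		if err != nil {
			return fmt.Errorf("bad pkt-line length %q: %w", hdr[:], err)
		}
		if n == 0 {
			return nil // flush-pkt
		}
		if n < 5 {
			// Special or empty packets carry no band byte; not expected here.
			return fmt.Errorf("unexpected pkt-line length %d", n)
		}
		payload := make([]byte, n-4)
		if _, err := io.ReadFull(r, payload); err != nil {
			return err
		}
		switch payload[0] {
		case 1: // pack data
			if _, err := pack.Write(payload[1:]); err != nil {
				return err
			}
		case 2: // progress text, ignored in this sketch
		case 3: // fatal error reported by the remote
			return errors.New("remote error: " + string(payload[1:]))
		default:
			return fmt.Errorf("unknown side-band channel %d", payload[0])
		}
	}
}

// demuxBytes is a convenience wrapper for in-memory input.
func demuxBytes(in []byte) ([]byte, error) {
	var out bytes.Buffer
	err := demuxSideband(bytes.NewReader(in), &out)
	return out.Bytes(), err
}
```

Because the pack bytes go to an `io.Writer`, the same loop can feed either a storage-backed consumer or a streaming parser, which is the kind of split the refactor describes.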

Updates batched bootstrap planning to fetch only commit-parent metadata (instead of materializing a full in-memory object store) and adds new planner walkers FirstParentChainFromParents/TopoChainFromParents plus tests to validate they match the storer-based traversal.
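
To make the idea of walking a parents map concrete, here is a small sketch of both traversal styles over a `map[string][]string`. The function names `firstParentChain` and `topoOrder` are hypothetical, and the ordering (oldest first, parents before children) and the rule of stopping at commits missing from the map are assumptions; the real `FirstParentChainFromParents` and `TopoChainFromParents` may differ.

```go
package main

// firstParentChain follows only the first parent from tip and returns the
// chain oldest-first. It stops at a root or at a commit absent from the map.
func firstParentChain(parents map[string][]string, tip string) []string {
	var chain []string
	seen := make(map[string]bool)
	cur := tip
	for {
		ps, ok := parents[cur]
		if !ok || seen[cur] {
			break
		}
		seen[cur] = true
		chain = append(chain, cur)
		if len(ps) == 0 {
			break
		}
		cur = ps[0]
	}
	for i, j := 0, len(chain)-1; i < j; i, j = i+1, j-1 {
		chain[i], chain[j] = chain[j], chain[i]
	}
	return chain
}

// topoOrder returns every commit reachable from tip with parents listed
// before children. It uses an explicit stack (iterative post-order DFS) so
// very deep histories cannot overflow the goroutine stack.
func topoOrder(parents map[string][]string, tip string) []string {
	type frame struct {
		hash string
		next int // index of the next parent to visit
	}
	if _, ok := parents[tip]; !ok {
		return nil
	}
	var order []string
	visited := map[string]bool{tip: true}
	stack := []frame{{hash: tip}}
	for len(stack) > 0 {
		top := &stack[len(stack)-1]
		ps := parents[top.hash]
		if top.next < len(ps) {
			p := ps[top.next]
			top.next++
			if _, known := parents[p]; known && !visited[p] {
				visited[p] = true
				stack = append(stack, frame{hash: p})
			}
			continue
		}
		// All parents emitted; emit this commit and pop it.
		order = append(order, top.hash)
		stack = stack[:len(stack)-1]
	}
	return order
}
```

For a history where `d` merges `b` and `c`, both children of root `a`, the first-parent chain from `d` is `a, b, d`, while the topological order also includes `c` ahead of `d`.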

Reviewed by Cursor Bugbot for commit ede27fd.

@nodo force-pushed the nodo/memory-improvment branch from ede27fd to f5cc40c on May 20, 2026 14:33
@nodo force-pushed the nodo/memory-improvment branch from 6a511ec to ef72e26 on May 20, 2026 15:12
@nodo merged commit 3579e0d into main on May 20, 2026
3 checks passed
@nodo deleted the nodo/memory-improvment branch on May 20, 2026 15:49
@cursor (bot) mentioned this pull request on Jun 3, 2026