Prevent pack file race condition during checkpoint sync#1276
Conversation
Entire-Checkpoint: 8436c49de6b9
Entire-Checkpoint: 2aaa0ca9a3ae
Entire-Checkpoint: 5dec9ad3f03b
Entire-Checkpoint: 3c18f86e29f8
The previous gate (filtered_fetches AND shallow) left a class of users exposed: anyone whose repo became shallow for a non-Entire reason (a hand-run git clone --depth=N, an older CLI version that pre-dated the filtered-fetches setting, an unrelated tool) still saw the reconcile path walk parent chains past .git/shallow via go-git and propose to cherry-pick hundreds of stale commits onto the remote. Drop the gate. Any shallow checkpoint repo unshallows on the next fetch, regardless of how the shallow state was introduced. Test now covers both filtered_fetches on and off. Entire-Checkpoint: 5aacb2ba139e
collectCommitChain and the cherry-pick path used go-git to walk commit.ParentHashes. go-git does not consult .git/shallow, so on a shallow checkpoint repo the walk strolled past the boundary into stale pack objects — producing a phantom chain that the reconcile path then proposed to cherry-pick onto the remote tip, duplicating commits that already lived on the remote under their original SHAs. Read the shallow set once at the top of each reconcile/rebase entry point and pass it down. collectCommitChain stops at shallow commits; treeChangesForCherryPick treats shallow-boundary commits as roots and diffs them against an empty tree, so the cherry-pick contributes the commit's full content rather than a delta against a stale parent. This is independent of the auto-unshallow fix: a future shallow file introduced by any other tool (manual git clone --depth, an unrelated plumbing helper) would otherwise still trigger the same bug. Entire-Checkpoint: 49c40909c8f5
…File - fetchWorkingDir: wrap filepath.Abs error, document why os.Getwd fallback is intentional (caller passed empty Dir to mean "inherit cwd"; tests rely on this), nolint forbidigo with reason. - loadShallowHashes: nolint gosec G304 — path is derived from git's own --git-common-dir output, not user input. No behavior change. Entire-Checkpoint: b5bbff32a63d
… is off The previous gating in getMetadataTree / getV2MetadataTree skipped the tree-only fetch entirely when filtered_fetches was disabled. With no fetch step, the local ref lookup fell through to whatever the worktree last saw, which could be stale (a collaborator pushed a newer checkpoint, or our last fetch was before the producer published). The "full fetch" fallback then never ran because the stale local lookup "succeeded." Replace the gate with a first-attempt fetch that is always issued: tree-only when filtered_fetches is enabled (cheaper), full fetch otherwise. For v2 the secondary fallback is set to nil when the first-attempt is already the full fetch, to avoid a duplicate network round-trip. Restores the pre-PR property that a fetch always precedes the local lookup. Regression covered by TestRunExplainExport_JSONFetchesRemoteV2WhenLocalV2RefSelectsDualMode. Entire-Checkpoint: 60d8c085123a
…tor test setup - Drop fetchWorkingDir/isShallowRepositoryInDir indirection in isShallowRepository: cmd.Dir = "" already inherits parent cwd, so we don't need to resolve it ourselves. Removes the os.Getwd nolint and the filepath.Abs wrap. - Trim multi-paragraph doc on loadShallowHashes and the inline shallow rationale in treeChangesForCherryPick (cherryPickOnto's doc already explains the WHY). - Resume.go: collapse paired firstFetch/firstFetchLabel assignments into one line each; trim over-explanatory "First-attempt fetch ensures..." block in getV2MetadataTree. - Factor TestFetch_UnshallowsShallowRepository's setup into setupShallowClone + runIsolatedGit helpers so the table body is now the actual assertion rather than 70 lines of duplicated scaffolding. Entire-Checkpoint: cd78c8c0f7bd
Two issues raised in review:
1. Generic Fetch auto-unshallowed any shallow repo, including
FetchAndCheckoutRemoteBranch's user-branch fetches. A deliberately
shallow user clone would silently convert to full history just by
running resume/checkout — far outside the metadata-repair scope
this PR intended.
2. FetchMetadataTreeOnly / FetchV2MainTreeOnly were no longer "tree-
only" after --depth=1 was dropped. They now downloaded the full
commit chain of the metadata ref (minus blobs when filtered fetches
are enabled). On long-lived checkpoint branches every resume/explain
paid the full-history cost.
Fix: replace the implicit auto-unshallow with two explicit flags on
FetchOptions:
- Shallow (--depth=1): set on tip-only probes. Cheap, creates
.git/shallow state. Safe now because the reconcile path's commit
walk respects .git/shallow (earlier commit in this PR).
- Unshallow (--unshallow when shallow): set on metadata-repair
paths that need full ancestry (FetchMetadataBranch,
FetchV2MainRef, fetchAndRebaseSessionsCommon, doctor v2
disconnection check). Migrates legacy shallow state on demand.
Generic FetchAndCheckoutRemoteBranch sets neither — user branches
stay as the user configured them.
resume.go's getMetadataTree / getV2MetadataTree can revert to
always-call-tree-only since tree-only is genuinely cheap again;
drops the filtered_fetches-conditional firstFetch switcheroo and
its accompanying logging. Test renamed to TestFetch_Unshallow with
two cases (Unshallow=true deepens, Unshallow=false leaves alone)
plus TestFetch_Shallow asserting --depth=1 actually shallows a
clone.
Entire-Checkpoint: 9fa8691d4b8a
git's --unshallow removes ALL shallow boundaries in .git/shallow, not just the one for the ref being fetched. On a shallow clone of a large repo it pulls the entire repository history (we observed 687 MB on entirehq/entire). Used from the push hook, this hangs every push on shallow repos until the full clone arrives. The downstream reconcile/rebase paths already respect .git/shallow (collectCommitChain stops at shallow boundaries; collectCommitsSince uses git rev-list which respects them natively). When the local view is shallow-truncated, the merge-base check may report "disconnected" where the actual chains share ancestry below the boundary — but the cherry-pick path it routes through still produces a correct result, because it picks up exactly the commits that aren't yet on the remote. Remove Unshallow from: - fetchAndRebaseSessionsCommon (push hot path) - FetchMetadataBranch (resume/explain fallback — only needs blobs) - FetchV2MainRef (same) Keep Unshallow on metadata_reconcile's doctor v2 disconnection check since it's only reached from `entire doctor`, an explicit user-invoked diagnostic where the cost is acceptable. Verified by re-running the full unit suite (5304 tests) and the fetch-flag regression coverage from TestFetch_Unshallow / TestFetch_Shallow. Entire-Checkpoint: 8184f5f2a970
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 6af9e93. Configure here.
| // FetchMetadataBranch will undo that when full ancestry is later needed. | ||
| func FetchMetadataTreeOnly(ctx context.Context) error { | ||
| return fetchMetadataFromOrigin(ctx, true /* shallow */, false /* noFilter */) | ||
| return fetchMetadataFromOrigin(ctx, fetchMetadataOpts{Shallow: true}) |
There was a problem hiding this comment.
FetchMetadataBranch no longer unshallows, breaking SafelyAdvanceLocalRef fallback
High Severity
The new comment on FetchMetadataTreeOnly claims "FetchMetadataBranch will undo that when full ancestry is later needed," but FetchMetadataBranch now explicitly does NOT --unshallow (Unshallow defaults to false). The old fullMetadataFetchArgs helper that added --unshallow was deleted without preserving this behavior. In the resume.go fallback flow, FetchMetadataTreeOnly creates shallow state via --depth=1, then FetchMetadataBranch is called as a fallback but no longer removes the shallow boundary. SafelyAdvanceLocalRef inside FetchMetadataBranch then fails with "no merge base… run 'entire doctor'" whenever the local metadata branch differs from remote, because git merge-base cannot traverse past the shallow boundary. Previously this path succeeded automatically.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 6af9e93. Configure here.


https://entire.io/gh/entireio/cli/trails/218
Summary
--no-auto-gc) to prevent a race condition wheregit gcrepacks/deletes pack files while go-git is reading from them, causingENOENTerrors during checkpoint sync--depth=1) — shallow repos break commands that need checkpoint ancestry/history, and the bandwidth savings are now handled by filtered fetches insteadfiltered_fetchesis enabled and the repo is shallow (from a previous--depth=1fetch), the next filtered fetch adds--unshallowto migrate away from shallow statefiltered_fetches—FetchMetadataTreeOnlyandFetchV2MainTreeOnlyonly make sense when filtered fetches are enabled (they skip blobs); without the setting, callers fall back to full fetches that include blob contentContext
After enabling
filtered_fetchesin the project settings, users hit this error duringgit push:[entire] Warning: couldn't sync entire/checkpoints/v1: failed to rebase local
commits onto remote: failed to flatten tip tree: failed to get subtree
36/db97269a69: open .git/objects/pack/pack-.pack: no such file or directory
Each filtered fetch creates a promisor pack file. Accumulating promisor packs reaches the
gc.autoPackLimitthreshold, triggeringgc --autowhich repacks and deletes old pack files.go-git snapshots the pack file list when the repository is opened, then fails when it tries to read from a pack that was deleted by gc between the snapshot and the read.
New version of #1024.
Note
Medium Risk
Changes core git fetch, ref promotion, and metadata reconciliation paths used during push, resume, and doctor; mistakes could leave stale checkpoint refs or over-fetch history, but behavior is covered by new integration tests.
Overview
Checkpoint sync now runs
git fetchwith--no-auto-gcso automatic GC cannot repack or delete packfiles while go-git still reads them (fixing intermittentENOENTon.git/objects/packduring rebase/sync).Fetch behavior is split explicitly:
FetchOptionsgains documentedShallowandUnshallowflags (with shallow detection viarev-parse --is-shallow-repository) instead of implicitly adding--unshallowon every full metadata fetch.FetchV2MainTreeOnlyno longer uses--depth=1, because shallow boundaries made remote tips look unrelated andSafelyAdvanceLocalRefcould leave stale local v2 refs; a regression test asserts the repo stays non-shallow. Generic push/rebase fetches also stop using--unshallow, with comments that unshallow is clone-global and can pull an entire monorepo; repair paths that need full ancestry passUnshallow: true(e.g. doctor v2 temp fetch).Reconciliation respects shallow history: commit walks and cherry-picks load
.git/shallowand treat boundary commits as roots so replay does not walk into stale parents. V2 disconnected-ref detection and repair (IsV2MainDisconnected,ReconcileDisconnectedV2Ref) mirror v1 vials-remote, temp fetch, and cherry-pick onto the remote tip.Reviewed by Cursor Bugbot for commit 6af9e93. Configure here.