Prevent pack file race condition during checkpoint sync#1024
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses checkpoint sync failures caused by git gc --auto repacking/deleting pack files while go-git is reading them, by preventing auto-GC during fetches and by improving fetch behavior (removing shallow fetches, migrating legacy shallow repos, and gating tree-only fetch fast paths on filtered_fetches).
Changes:
- Add
--no-auto-gcto git fetch operations to prevent packfile races during checkpoint sync. - Remove shallow checkpoint fetch behavior and add an auto-
--unshallowmigration for legacy shallow repos whenfiltered_fetchesis enabled. - Only use “tree-only” fetch paths when
filtered_fetchesis enabled; otherwise fall back to full fetches.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cmd/entire/cli/strategy/common.go | Removes outdated comment about --depth=1 compatibility with ancestry checks. |
| cmd/entire/cli/resume.go | Gates tree-only metadata fetch path on filtered_fetches; adjusts v2 tree-only fetch injection. |
| cmd/entire/cli/git_operations.go | Removes shallow fetch option plumbing; updates metadata/v2 fetch helpers accordingly. |
| cmd/entire/cli/explain.go | Uses tree-only vs full fetch depending on filtered_fetches setting. |
| cmd/entire/cli/checkpoint/v2_read.go | Adds --no-auto-gc to v2 full-ref fetches. |
| cmd/entire/cli/checkpoint/remote/git.go | Adds --no-auto-gc to fetches; adds shallow-repo detection and --unshallow migration logic. |
| cmd/entire/cli/checkpoint/remote/git_test.go | Updates tests for new --no-auto-gc argument; adds a test for unshallowing legacy shallow repos. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 8cb8c82. Configure here.
Entire-Checkpoint: 8436c49de6b9
Entire-Checkpoint: 2aaa0ca9a3ae
Entire-Checkpoint: 5dec9ad3f03b
Entire-Checkpoint: 3c18f86e29f8
The previous gate (filtered_fetches AND shallow) left a class of users exposed: anyone whose repo became shallow for a non-Entire reason (a hand-run git clone --depth=N, an older CLI version that pre-dated the filtered-fetches setting, an unrelated tool) still saw the reconcile path walk parent chains past .git/shallow via go-git and propose to cherry-pick hundreds of stale commits onto the remote. Drop the gate. Any shallow checkpoint repo unshallows on the next fetch, regardless of how the shallow state was introduced. Test now covers both filtered_fetches on and off. Entire-Checkpoint: 5aacb2ba139e
collectCommitChain and the cherry-pick path used go-git to walk commit.ParentHashes. go-git does not consult .git/shallow, so on a shallow checkpoint repo the walk strolled past the boundary into stale pack objects — producing a phantom chain that the reconcile path then proposed to cherry-pick onto the remote tip, duplicating commits that already lived on the remote under their original SHAs. Read the shallow set once at the top of each reconcile/rebase entry point and pass it down. collectCommitChain stops at shallow commits; treeChangesForCherryPick treats shallow-boundary commits as roots and diffs them against an empty tree, so the cherry-pick contributes the commit's full content rather than a delta against a stale parent. This is independent of the auto-unshallow fix: a future shallow file introduced by any other tool (manual git clone --depth, an unrelated plumbing helper) would otherwise still trigger the same bug. Entire-Checkpoint: 49c40909c8f5
…File - fetchWorkingDir: wrap filepath.Abs error, document why os.Getwd fallback is intentional (caller passed empty Dir to mean "inherit cwd"; tests rely on this), nolint forbidigo with reason. - loadShallowHashes: nolint gosec G304 — path is derived from git's own --git-common-dir output, not user input. No behavior change. Entire-Checkpoint: b5bbff32a63d
… is off The previous gating in getMetadataTree / getV2MetadataTree skipped the tree-only fetch entirely when filtered_fetches was disabled. With no fetch step, the local ref lookup fell through to whatever the worktree last saw, which could be stale (a collaborator pushed a newer checkpoint, or our last fetch was before the producer published). The "full fetch" fallback then never ran because the stale local lookup "succeeded." Replace the gate with a first-attempt fetch that is always issued: tree-only when filtered_fetches is enabled (cheaper), full fetch otherwise. For v2 the secondary fallback is set to nil when the first-attempt is already the full fetch, to avoid a duplicate network round-trip. Restores the pre-PR property that a fetch always precedes the local lookup. Regression covered by TestRunExplainExport_JSONFetchesRemoteV2WhenLocalV2RefSelectsDualMode. Entire-Checkpoint: 60d8c085123a
…tor test setup - Drop fetchWorkingDir/isShallowRepositoryInDir indirection in isShallowRepository: cmd.Dir = "" already inherits parent cwd, so we don't need to resolve it ourselves. Removes the os.Getwd nolint and the filepath.Abs wrap. - Trim multi-paragraph doc on loadShallowHashes and the inline shallow rationale in treeChangesForCherryPick (cherryPickOnto's doc already explains the WHY). - Resume.go: collapse paired firstFetch/firstFetchLabel assignments into one line each; trim over-explanatory "First-attempt fetch ensures..." block in getV2MetadataTree. - Factor TestFetch_UnshallowsShallowRepository's setup into setupShallowClone + runIsolatedGit helpers so the table body is now the actual assertion rather than 70 lines of duplicated scaffolding. Entire-Checkpoint: cd78c8c0f7bd
Two issues raised in review:
1. Generic Fetch auto-unshallowed any shallow repo, including
FetchAndCheckoutRemoteBranch's user-branch fetches. A deliberately
shallow user clone would silently convert to full history just by
running resume/checkout — far outside the metadata-repair scope
this PR intended.
2. FetchMetadataTreeOnly / FetchV2MainTreeOnly were no longer "tree-
only" after --depth=1 was dropped. They now downloaded the full
commit chain of the metadata ref (minus blobs when filtered fetches
are enabled). On long-lived checkpoint branches every resume/explain
paid the full-history cost.
Fix: replace the implicit auto-unshallow with two explicit flags on
FetchOptions:
- Shallow (--depth=1): set on tip-only probes. Cheap, creates
.git/shallow state. Safe now because the reconcile path's commit
walk respects .git/shallow (earlier commit in this PR).
- Unshallow (--unshallow when shallow): set on metadata-repair
paths that need full ancestry (FetchMetadataBranch,
FetchV2MainRef, fetchAndRebaseSessionsCommon, doctor v2
disconnection check). Migrates legacy shallow state on demand.
Generic FetchAndCheckoutRemoteBranch sets neither — user branches
stay as the user configured them.
resume.go's getMetadataTree / getV2MetadataTree can revert to
always-call-tree-only since tree-only is genuinely cheap again;
drops the filtered_fetches-conditional firstFetch switcheroo and
its accompanying logging. Test renamed to TestFetch_Unshallow with
two cases (Unshallow=true deepens, Unshallow=false leaves alone)
plus TestFetch_Shallow asserting --depth=1 actually shallows a
clone.
Entire-Checkpoint: 9fa8691d4b8a
git's --unshallow removes ALL shallow boundaries in .git/shallow, not just the one for the ref being fetched. On a shallow clone of a large repo it pulls the entire repository history (we observed 687 MB on entirehq/entire). Used from the push hook, this hangs every push on shallow repos until the full clone arrives. The downstream reconcile/rebase paths already respect .git/shallow (collectCommitChain stops at shallow boundaries; collectCommitsSince uses git rev-list which respects them natively). When the local view is shallow-truncated, the merge-base check may report "disconnected" where the actual chains share ancestry below the boundary — but the cherry-pick path it routes through still produces a correct result, because it picks up exactly the commits that aren't yet on the remote. Remove Unshallow from: - fetchAndRebaseSessionsCommon (push hot path) - FetchMetadataBranch (resume/explain fallback — only needs blobs) - FetchV2MainRef (same) Keep Unshallow on metadata_reconcile's doctor v2 disconnection check since it's only reached from `entire doctor`, an explicit user-invoked diagnostic where the cost is acceptable. Verified by re-running the full unit suite (5304 tests) and the fetch-flag regression coverage from TestFetch_Unshallow / TestFetch_Shallow. Entire-Checkpoint: 8184f5f2a970
10f42e6 to
a12d142
Compare

https://entire.io/gh/entireio/cli/trails/6148e08144d9
Summary
--no-auto-gc) to prevent a race condition wheregit gcrepacks/deletes pack files while go-git is reading from them, causingENOENTerrors during checkpoint sync--depth=1) — shallow repos break commands that need checkpoint ancestry/history, and the bandwidth savings are now handled by filtered fetches insteadfiltered_fetchesis enabled and the repo is shallow (from a previous--depth=1fetch), the next filtered fetch adds--unshallowto migrate away from shallow statefiltered_fetches—FetchMetadataTreeOnlyandFetchV2MainTreeOnlyonly make sense when filtered fetches are enabled (they skip blobs); without the setting, callers fall back to full fetches that include blob contentContext
After enabling
filtered_fetchesin the project settings, users hit this error duringgit push:[entire] Warning: couldn't sync entire/checkpoints/v1: failed to rebase local
commits onto remote: failed to flatten tip tree: failed to get subtree
36/db97269a69: open .git/objects/pack/pack-.pack: no such file or directory
Each filtered fetch creates a promisor pack file. Accumulating promisor packs reaches the
gc.autoPackLimitthreshold, triggeringgc --autowhich repacks and deletes old pack files.go-git snapshots the pack file list when the repository is opened, then fails when it tries to read from a pack that was deleted by gc between the snapshot and the read.
Note
Medium Risk
Touches checkpoint sync/fetch behavior and Git plumbing flags, so regressions could impact resume/explain flows or fetch performance across different repo states (shallow vs full). Changes are scoped and covered by an added unshallow migration test.
Overview
Prevents packfile races during checkpoint sync by adding
--no-auto-gcto all checkpoint-relatedgit fetchinvocations (including blob-by-hash and v2/full/*fetches), avoiding background repacks while go-git is reading.Removes legacy shallow checkpoint fetches and, when
filtered_fetchesis enabled, automatically migrates existing shallow repos by detecting shallow state and adding--unshallowon subsequent filtered fetches.Adjusts
resume/explainmetadata fetch fallbacks so the “tree-only” fast path is only used whenfiltered_fetchesis enabled; otherwise callers fall back to full, blob-inclusive fetches. Adds a test ensuring filtered fetches unshallow a legacy shallow clone.Reviewed by Cursor Bugbot for commit 8cb8c82. Configure here.