Make `entire explain` work with partial-clone #1069
Merged
gtrrz-victor merged 7 commits into main · Apr 30, 2026
Conversation
`entire explain <checkpoint-id>` failed with "no checkpoint or commit found" when the metadata branch was on a configured `checkpoint_remote` but blobs hadn't been fetched yet. Three coordinated fixes:

1. v1Store now has a blob fetcher configured (the piece resume already had via SetBlobFetcher). Without it, go-git's Tree.File() returns ErrFileNotFound on a missing blob, which ReadCommitted translates to ErrCheckpointNotFound — making it look like the checkpoint doesn't exist when really only the blob is absent.
2. FetchBlobs now uses `git fetch-pack` instead of porcelain `git fetch`. Porcelain enforces partial-clone integrity checks that reject blob-only responses with "did not send all necessary objects", so `git fetch <url> <blob-sha>` always failed against GitHub for repos with filtered_fetches enabled. Plumbing skips those checks and just downloads the requested objects.
3. FetchingTree.File now tries `git cat-file` BEFORE the network fetch. In partial-clone repos a blob is commonly on disk but invisible to go-git's storer (filtered out, or in a packfile not in the cached index). Without the short-circuit, every File() burned a network round-trip on already-local data — which compounded into per-File 25s SSH-handshake-and-rejection delays before this fix.

The matching empty-list fetch chain for explain now mirrors resume.getMetadataTree (checkpoint_remote → treeless origin → full origin), and a per-checkpoint PreFetch batches all subtree blobs in one fetch-pack call. Cold runs are ~1-2s; warm runs are sub-second.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 74b3190b7dfb
The v1 fix in 645fb12 left v2 vulnerable to the same bug: when a metadata blob is on the remote but absent locally, V2GitStore's read path used raw `*object.Tree.File()` calls which return ErrFileNotFound on missing blobs — and ReadCommitted treats that as "checkpoint not found". This blocks `entire explain` against partial-clone repos as soon as checkpoints_v2 is enabled.

The v2 fix mirrors v1:

- V2GitStore gains `blobFetcher` + `SetBlobFetcher` (parallel to GitStore).
- ReadCommitted, ReadSessionCompactTranscript, ReadSessionMetadataAndPrompts, and ReadSessionContent now wrap their cp/session subtrees in FetchingTree before calling File(). FetchingTree's cat-file fallback + on-demand fetcher recover blobs that go-git's storer can't see.
- explain configures FetchBlobsByHash on both v1 and v2 stores.
- prefetchCheckpointBlobs walks both v1 and v2 cp subtrees so a single fetch-pack round-trip primes both stores.

Regression tests in cmd/entire/cli/integration_test/explain_test.go:

- TestExplain_CheckpointSucceedsAfterTreelessFetch (v1)
- TestExplain_CheckpointV2SucceedsAfterTreelessFetch (v2)

Both reproduce the bug-triggering state by:

1. Pushing a checkpoint to a bare remote.
2. Cloning into a fresh TempDir with `--filter=blob:none --depth=1` over a `file://` URL (uploadpack.allowFilter set on the bare). This gives the clone tree entries but no blobs — the actual partial-clone state.
3. Asserting at least one metadata blob is genuinely missing (otherwise the test would silently pass without exercising the fix).
4. Running explain in the clone and verifying the prompt text is in the output.

Verified: stashing the v2 fix and running TestExplain_CheckpointV2SucceedsAfterTreelessFetch fails with `failed to read checkpoint: checkpoint not found` — the exact original bug — confirming the test catches regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7febd2b70c1f
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit ae4b4e5.
Entire-Checkpoint: 3ebcceee3024
`entire explain <id>` against a treeless checkpoint metadata branch can take 1-2 seconds for the fetch-pack round-trip. Previously the command was silent during that window — easy to mistake for a hang.

Adds two visual indicators on stderr:

1. "Fetching checkpoint metadata from remote" — shown when the empty-matches path triggers `getMetadataTree` / `getV2MetadataTree`.
2. "Fetching N checkpoint blob(s) from remote" — shown when the pre-resolve prefetch detects N missing blobs in the cp subtree(s).

The spinner uses the same Braille frames as the activity TUI's `spinner.Dot` so the visual matches across commands. When stderr is not a terminal (CI, redirected output, agent subprocess), the spinner is suppressed and the message is printed once with a trailing "..." — non-interactive callers still see what's happening without ANSI noise.

`prefetchCheckpointBlobs` was restructured to count missing blobs upfront (via the new exported `FetchingTree.CollectMissingBlobs`) so the spinner only shows when network work is genuinely needed; warm runs where everything is already local stay silent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c3c1ad356bad
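The TTY-gated rendering described above can be sketched with stdlib-only pieces. The frame set matches Bubbles' `spinner.Dot`; the function names (`frame`, `announce`, `isTerminal`) are illustrative, not the PR's API:

```go
package main

import "os"

// Same Braille frames as bubbles' spinner.Dot, so the visual matches the
// activity TUI across commands.
var spinnerFrames = []string{"⣾", "⣽", "⣻", "⢿", "⡿", "⣟", "⣯", "⣷"}

// frame returns the animated line for tick i when stderr is a terminal,
// and "" otherwise so CI logs and redirected output carry no ANSI noise.
func frame(tty bool, i int, msg string) string {
	if !tty {
		return ""
	}
	return "\r" + spinnerFrames[i%len(spinnerFrames)] + " " + msg
}

// announce is the non-TTY fallback: emit the message once with a trailing
// "..." so non-interactive callers still see what's happening.
func announce(tty bool, msg string) string {
	if tty {
		return ""
	}
	return msg + "..."
}

// isTerminal is a stdlib-only TTY check: character devices are terminals,
// pipes and regular files are not (golang.org/x/term offers the same check).
func isTerminal(f *os.File) bool {
	st, err := f.Stat()
	return err == nil && st.Mode()&os.ModeCharDevice != 0
}
```

A caller would pick the mode once via `isTerminal(os.Stderr)` and then either animate `frame` on a ticker or print `announce` a single time.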
…arm runs stay silent

Spinner from the previous commit only wrapped the explicit fetch paths, but the slow part of `entire explain` against a treeless metadata branch is actually `newExplainCheckpointLookup` → `ListCommitted`, which reads metadata.json via `git cat-file` for every checkpoint in the branch (~10ms × N). 7-10s before any output, no visible feedback.

Wrap the lookup creation in `runExplainAuto` with the spinner. Add a 250ms initial delay to startSpinner so warm/fast runs (sub-second) don't flicker — the spinner only appears once the operation has actually been running long enough that the user might wonder if it's hung.

Also drop the suppression-mode "msg..." print for non-TTY: redirected output and CI stay completely clean (matches the existing convention for progress dots).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7ccf5948e128
The "Loading checkpoints" spinner only wraps `newExplainCheckpointLookup`
(ListCommitted). After that returns, several more network/IO steps run
silently before output prints:
- ResolveCommittedReaderForCheckpoint reads metadata.json via cat-file
- readLatestSessionContentForExplain reads session metadata, prompt,
transcript blobs (more cat-file calls)
- getAssociatedCommits walks git log for matching trailers
On a deep repo or one with many missing-blob reads, this second phase
itself can take seconds. The user previously saw the lookup spinner,
then silence for "twice as long", then output.
Add a second spinner with message "Loading checkpoint <id>" covering the
post-lookup pipeline. Stop strictly before any write to w (stdout) so
the stderr spinner frames never interleave with stdout output. The
existing 250ms initial delay applies, so warm runs that complete the
data-load in under a quarter second still produce no spinner output.
For --generate, the spinner stops before generation prints its own
progress, then a "Reloading checkpoint" spinner resumes for the
post-generation read.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 2f90eccad241
pjbgf previously approved these changes · Apr 28, 2026
The data-load spinner started AFTER prefetchCheckpointBlobs, but the prefetch's missing-blob analysis is itself slow on a deep checkpoint subtree: it spawns one `git cat-file -e` subprocess per blob entry. For a checkpoint with many sessions × tasks × files, that's hundreds of subprocess calls — silent seconds between the lookup spinner ending and the data-load spinner starting.

Move the spinner start to wrap prefetchCheckpointBlobs too, and drop the inner "Fetching N blobs from remote" spinner that the prefetch function used to spawn. Two spinners on the same line would conflict; one continuous spinner is the right UX. The fetch-count detail moves into a debug log.

Extract the prefetch + summary read + content read sequence into a loadCheckpointForExplain helper so runExplainCheckpointWithLookup stays under maintidx limits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c48334ab4ab6
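As an aside on the per-blob `cat-file -e` cost: git also offers `git cat-file --batch-check`, which reads object names on stdin in a single subprocess and reports absent objects as `<oid> missing` (present ones as `<oid> <type> <size>`). A hedged sketch of the parsing side — a hypothetical alternative to hundreds of `-e` calls, not necessarily what `CollectMissingBlobs` does:

```go
package main

import (
	"bufio"
	"strings"
)

// parseBatchCheck extracts missing object IDs from `git cat-file
// --batch-check` output. One subprocess can thus replace one `cat-file -e`
// invocation per blob entry.
func parseBatchCheck(out string) (missing []string) {
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Present objects print "<oid> <type> <size>"; absent ones "<oid> missing".
		if len(fields) == 2 && fields[1] == "missing" {
			missing = append(missing, fields[0])
		}
	}
	return missing
}
```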
gtrrz-victor approved these changes · Apr 29, 2026

Problem
`entire explain <checkpoint-id>` failed with "no checkpoint or commit found" against a `checkpoint_remote`-backed setup with `filtered_fetches: true`. The checkpoint was on the remote and discoverable in the local tree, but the metadata blob was absent locally — and go-git's `Tree.File()` returns `ErrFileNotFound` for missing blobs, which the v1 / v2 read paths treated as "checkpoint doesn't exist." Even when it eventually returned correct data, it could take 10+ seconds on a cold cache with zero progress feedback — easy to mistake for a hang.

Fix — correctness
Three coordinated changes:
1. v1 already had `SetBlobFetcher` but explain wasn't calling it. Added the same to V2GitStore and wired both.
2. Wrap v2 reads in `FetchingTree`. v1's read path already used FetchingTree (with cat-file fallback for partial-clone-filtered blobs). Mirrored that for v2's `ReadCommitted` / `ReadSession*` methods.
3. Use `git fetch-pack` for blob SHAs. Porcelain `git fetch <url> <hash>` enforces partial-clone integrity checks that reject blob-only responses; plumbing skips those checks. Switched `FetchBlobs` accordingly.

Plus:
- `FetchingTree.File` now tries `git cat-file` BEFORE the network fetch — saves multi-second round-trips when a blob is on disk but invisible to go-git's storer (typical partial-clone state).

Fix — progress feedback
Wrapped the entire pre-output pipeline with two sequential spinners on stderr:
- ⣾ "Loading checkpoints" during `newExplainCheckpointLookup` (the slow `ListCommitted` walk that reads every checkpoint's metadata.json via `git cat-file`)
- ⣾ "Loading checkpoint <id>" during `prefetchCheckpointBlobs` (including its `cat-file -e` analysis loop), the actual fetch, summary + session reads, and the `getAssociatedCommits` git log walk

Same Braille frames as the activity TUI for visual consistency. 250ms initial delay so warm/fast runs stay silent — no flicker. Stops strictly before any write to stdout so spinner frames and output never interleave. Suppressed entirely when stderr isn't a terminal (CI, redirected output, agent subprocesses).
Tests
Two new integration tests reproduce the partial-clone state in a fresh TempDir clone (`file://` URL + `uploadpack.allowFilter=true` on the bare so `--filter=blob:none` is actually honored — the local-path transport ignores filters) and verify explain succeeds:

- `TestExplain_CheckpointSucceedsAfterTreelessFetch` (v1)
- `TestExplain_CheckpointV2SucceedsAfterTreelessFetch` (v2)

Both assert at least one metadata blob is genuinely missing locally before running explain — without that pre-condition the tests could silently pass against a fully-cloned repo. Verified for regression: stashing the v2 fix makes the v2 test fail with the exact original symptom (`failed to read checkpoint: checkpoint not found`).

Performance
- One `fetch-pack` call covers all blobs in the checkpoint subtree, no per-blob round-trips
- `cat-file` reads local blobs directly, no network
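The single batched plumbing call can be sketched as below. This is illustrative, not the repo's `FetchBlobsByHash`: the flags are plain `git fetch-pack <url> <sha>…`, and fetching raw SHAs additionally requires the server to allow SHA-in-want (which the PR notes GitHub does for filtered fetches):

```go
package main

import (
	"fmt"
	"os/exec"
)

// fetchBlobsByHash sketches the batched plumbing fetch: unlike porcelain
// `git fetch`, `git fetch-pack` skips partial-clone integrity checks, so a
// blob-only response is accepted instead of failing with "did not send all
// necessary objects". All hashes go in one round-trip.
func fetchBlobsByHash(repoDir, remoteURL string, hashes []string) error {
	if len(hashes) == 0 {
		return nil // warm run: everything already local, no subprocess at all
	}
	args := append([]string{"fetch-pack", remoteURL}, hashes...)
	cmd := exec.Command("git", args...)
	cmd.Dir = repoDir
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("git fetch-pack: %w: %s", err, out)
	}
	return nil
}
```

The empty-slice early return is what keeps warm runs both silent and subprocess-free once the missing-blob analysis finds nothing to fetch.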