Make `entire explain` work with partial-clone by Soph · Pull Request #1069 · entireio/cli

Soph · 2026-04-28T19:03:36Z

Problem

entire explain <checkpoint-id> failed with "no checkpoint or commit found" against a checkpoint_remote-backed setup with filtered_fetches: true. The checkpoint was on the remote and discoverable in the local tree, but the metadata blob was absent locally — and go-git's Tree.File() returns ErrFileNotFound for missing blobs, which the v1 / v2 read paths treated as "checkpoint doesn't exist." Even when it eventually returned correct data, it could take 10+ seconds on a cold cache with zero progress feedback — easy to mistake for a hang.

Fix — correctness

Three coordinated changes:

Configure blob fetcher on both stores. v1Store had SetBlobFetcher but explain wasn't calling it. Added the same to V2GitStore and wired both.
Wrap v2 reads in FetchingTree. v1's read path already used FetchingTree (with cat-file fallback for partial-clone-filtered blobs). Mirrored that for v2's ReadCommitted / ReadSession* methods.
Use git fetch-pack for blob SHAs. Porcelain git fetch <url> <hash> enforces partial-clone integrity checks that reject blob-only responses; plumbing skips those checks. Switched FetchBlobs accordingly.

Plus: FetchingTree.File now tries git cat-file BEFORE the network fetch — saves multi-second round-trips when a blob is on disk but invisible to go-git's storer (typical partial-clone state).

Fix — progress feedback

Wrapped the entire pre-output pipeline with two sequential spinners on stderr:

⣾ Loading checkpoints during newExplainCheckpointLookup (the slow ListCommitted walk that reads every checkpoint's metadata.json via git cat-file)
⣾ Loading checkpoint <id> during prefetchCheckpointBlobs (including its cat-file -e analysis loop), the actual fetch, summary + session reads, and getAssociatedCommits git log walk

Same Braille frames as the activity TUI for visual consistency. 250ms initial delay so warm/fast runs stay silent — no flicker. Stops strictly before any write to stdout so spinner frames and output never interleave. Suppressed entirely when stderr isn't a terminal (CI, redirected output, agent subprocesses).

Tests

Two new integration tests reproduce the partial-clone state in a fresh TempDir clone (file:// URL + uploadpack.allowFilter=true on the bare so --filter=blob:none is actually honored — the local-path transport ignores filters) and verify explain succeeds:

TestExplain_CheckpointSucceedsAfterTreelessFetch (v1)
TestExplain_CheckpointV2SucceedsAfterTreelessFetch (v2)

Both assert at least one metadata blob is genuinely missing locally before running explain — without that pre-condition the tests could silently pass against a fully-cloned repo. Verified for regression: stashing the v2 fix makes the v2 test fail with the exact original symptom (failed to read checkpoint: checkpoint not found).

Performance

Cold run (blobs missing locally): ~1-2s — one batched fetch-pack call covers all blobs in the checkpoint subtree, no per-blob round-trips
Warm run (blobs on disk): sub-second — cat-file reads local blobs directly, no network

`entire explain <checkpoint-id>` failed with "no checkpoint or commit found" when the metadata branch was on a configured `checkpoint_remote` but blobs hadn't been fetched yet. Three coordinated fixes: 1. v1Store now has a blob fetcher configured (the piece resume already had via SetBlobFetcher). Without it, go-git's Tree.File() returns ErrFileNotFound on a missing blob, which ReadCommitted translates to ErrCheckpointNotFound — making it look like the checkpoint doesn't exist when really only the blob is absent. 2. FetchBlobs now uses `git fetch-pack` instead of porcelain `git fetch`. Porcelain enforces partial-clone integrity checks that reject blob-only responses with "did not send all necessary objects", so `git fetch <url> <blob-sha>` always failed against GitHub for repos with filtered_fetches enabled. Plumbing skips those checks and just downloads the requested objects. 3. FetchingTree.File now tries `git cat-file` BEFORE the network fetch. In partial-clone repos a blob is commonly on disk but invisible to go-git's storer (filtered out, or in a packfile not in the cached index). Without the short-circuit, every File() burned a network round-trip on already-local data — which compounded into per-File 25s SSH-handshake-and-rejection delays before this fix. The matching empty-list fetch chain for explain now mirrors resume.getMetadataTree (checkpoint_remote → treeless origin → full origin), and a per-checkpoint PreFetch batches all subtree blobs in one fetch-pack call. Cold runs are ~1-2s; warm runs are sub-second. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 74b3190b7dfb

The v1 fix in 645fb12 left v2 vulnerable to the same bug: when a metadata blob is on the remote but absent locally, V2GitStore's read path used raw `*object.Tree.File()` calls which return ErrFileNotFound on missing blobs — and ReadCommitted treats that as "checkpoint not found". This blocks `entire explain` against partial-clone repos as soon as checkpoints_v2 is enabled. v2 fix mirrors v1: - V2GitStore gains `blobFetcher` + `SetBlobFetcher` (parallel to GitStore). - ReadCommitted, ReadSessionCompactTranscript, ReadSessionMetadataAndPrompts, and ReadSessionContent now wrap their cp/session subtrees in FetchingTree before calling File(). FetchingTree's cat-file fallback + on-demand fetcher recover blobs that go-git's storer can't see. - explain configures FetchBlobsByHash on both v1 and v2 stores. - prefetchCheckpointBlobs walks both v1 and v2 cp subtrees so a single fetch-pack round-trip primes both stores. Regression tests in cmd/entire/cli/integration_test/explain_test.go: - TestExplain_CheckpointSucceedsAfterTreelessFetch (v1) - TestExplain_CheckpointV2SucceedsAfterTreelessFetch (v2) Both reproduce the bug-triggering state by: 1. Pushing a checkpoint to a bare remote. 2. Cloning into a fresh TempDir with `--filter=blob:none --depth=1` over `file://` URL (uploadpack.allowFilter set on the bare). This gives the clone tree entries but no blobs — the actual partial-clone state. 3. Asserting at least one metadata blob is genuinely missing (otherwise the test would silently pass without exercising the fix). 4. Running explain in the clone and verifying the prompt text is in the output. Verified: stashing the v2 fix and running TestExplain_CheckpointV2SucceedsAfterTreelessFetch fails with `failed to read checkpoint: checkpoint not found` — the exact original bug — confirming the test catches regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 7febd2b70c1f

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit ae4b4e5. Configure here.}

Entire-Checkpoint: 3ebcceee3024

`entire explain <id>` against a treeless checkpoint metadata branch can take 1-2 seconds for the fetch-pack round-trip. Previously the command was silent during that window — easy to mistake for a hang. Adds two visual indicators on stderr: 1. "Fetching checkpoint metadata from remote" — shown when the empty- matches path triggers `getMetadataTree` / `getV2MetadataTree`. 2. "Fetching N checkpoint blob(s) from remote" — shown when the pre-resolve prefetch detects N missing blobs in the cp subtree(s). The spinner uses the same Braille frames as the activity TUI's `spinner.Dot` so the visual matches across commands. When stderr is not a terminal (CI, redirected output, agent subprocess), the spinner is suppressed and the message is printed once with a trailing "..." — non-interactive callers still see what's happening without ANSI noise. `prefetchCheckpointBlobs` was restructured to count missing blobs upfront (via the new exported `FetchingTree.CollectMissingBlobs`) so the spinner only shows when network work is genuinely needed; warm runs where everything is already local stay silent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c3c1ad356bad

…arm runs stay silent Spinner from the previous commit only wrapped the explicit fetch paths, but the slow part of `entire explain` against a treeless metadata branch is actually `newExplainCheckpointLookup` → `ListCommitted`, which reads metadata.json via `git cat-file` for every checkpoint in the branch (~10ms × N). 7-10s before any output, no visible feedback. Wrap the lookup creation in `runExplainAuto` with the spinner. Add a 250ms initial delay to startSpinner so warm/fast runs (sub-second) don't flicker — the spinner only appears once the operation has actually been running long enough that the user might wonder if it's hung. Also drop the suppression-mode "msg..." print for non-TTY: redirected output and CI stay completely clean (matches the existing convention for progress dots). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 7ccf5948e128

The "Loading checkpoints" spinner only wraps `newExplainCheckpointLookup` (ListCommitted). After that returns, several more network/IO steps run silently before output prints: - ResolveCommittedReaderForCheckpoint reads metadata.json via cat-file - readLatestSessionContentForExplain reads session metadata, prompt, transcript blobs (more cat-file calls) - getAssociatedCommits walks git log for matching trailers On a deep repo or one with many missing-blob reads, this second phase itself can take seconds. The user previously saw the lookup spinner, then silence for "twice as long", then output. Add a second spinner with message "Loading checkpoint <id>" covering the post-lookup pipeline. Stop strictly before any write to w (stdout) so the stderr spinner frames never interleave with stdout output. The existing 250ms initial delay applies, so warm runs that complete the data-load in under a quarter second still produce no spinner output. For --generate, the spinner stops before generation prints its own progress, then a "Reloading checkpoint" spinner resumes for the post-generation read. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 2f90eccad241

The data-load spinner started AFTER prefetchCheckpointBlobs, but the prefetch's missing-blob analysis is itself slow on a deep checkpoint subtree: it spawns one `git cat-file -e` subprocess per blob entry. For a checkpoint with many sessions × tasks × files, that's hundreds of subprocess calls — silent seconds between the lookup spinner ending and the data-load spinner starting. Move the spinner start to wrap prefetchCheckpointBlobs too, and drop the inner "Fetching N blobs from remote" spinner that the prefetch function used to spawn. Two spinners on the same line would conflict; one continuous spinner is the right UX. The fetch-count detail moves into a debug log. Extract the prefetch + summary read + content read sequence into a loadCheckpointForExplain helper so runExplainCheckpointWithLookup stays under maintidx limits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c48334ab4ab6

Soph and others added 2 commits April 28, 2026 20:00

Copilot AI review requested due to automatic review settings April 28, 2026 19:03

Soph requested a review from a team as a code owner April 28, 2026 19:03

cursor Bot reviewed Apr 28, 2026

View reviewed changes

Comment thread cmd/entire/cli/explain.go Outdated

Comment thread cmd/entire/cli/integration_test/explain_test.go

Soph and others added 4 commits April 28, 2026 21:14

pr comments

cff7248

Entire-Checkpoint: 3ebcceee3024

pjbgf previously approved these changes Apr 28, 2026

View reviewed changes

Soph dismissed pjbgf’s stale review via c7c9368 April 29, 2026 07:01

gtrrz-victor approved these changes Apr 29, 2026

View reviewed changes

gtrrz-victor merged commit 28aa01f into main Apr 30, 2026
9 checks passed

gtrrz-victor deleted the soph/explain-fetch branch April 30, 2026 15:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `entire explain` work with partial-clone#1069

Make `entire explain` work with partial-clone#1069
gtrrz-victor merged 7 commits intomainfrom
soph/explain-fetch

Soph commented Apr 28, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Conversation

Soph commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix — correctness

Fix — progress feedback

Tests

Performance

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Soph commented Apr 28, 2026 •

edited

Loading