Checkpoints v2: Fix migration rerun logic (speedups, avoid duplicated migration efforts) #1114
Merged
computermode merged 13 commits into main on May 5, 2026
Entire-Checkpoint: ff49fc0893a5
Detect checkpoints with /main written but /full artifacts missing (the state a Ctrl+C between the metadata write and the generation packer flush leaves behind) and queue the missing sessions for the same packer used by fresh migrations.

Result:
* New v1 checkpoints are migrated on rerun.
* Interrupted runs resume without --force.
* Fully-migrated checkpoints stay skipped.
* --force still prunes and re-migrates from scratch.

Also recognize the pre-rename filenames (full.jsonl, full.jsonl.NNN, content_hash.txt; see a3cd771) as valid /full-session artifacts. Without this, archived /full/<n> generations written by an older CLI looked "missing" on rerun and were repacked into duplicate generations (a real repo went 46 → 86 archived gens from one rerun). Both the new (raw_transcript / raw_transcript_hash.txt) and legacy names are now accepted.

CleanupV1TranscriptFiles is removed: --force prunes the entire checkpoint subtree (legacy files included), so targeted /full/current cleanup is redundant under the new semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7391783beb62
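The rerun outcomes described in this commit message can be read as a small decision function. The following Go sketch is illustrative only: `action`, `checkpointState`, and `decide` are hypothetical names that do not come from the PR, and the real migration code inspects git trees rather than an in-memory struct.

```go
package main

import "fmt"

// action is a hypothetical label for what a rerun does with one checkpoint.
type action string

const (
	actionFreshMigrate action = "fresh-migrate"       // checkpoint not in /main yet
	actionPackMissing  action = "pack-missing"        // /main written, some /full artifacts absent
	actionSkip         action = "skip"                // every session already has /full artifacts
	actionForceRedo    action = "prune-and-remigrate" // --force was passed
)

// checkpointState is a hypothetical in-memory summary of one checkpoint.
type checkpointState struct {
	InMain   bool            // metadata already written to /main
	Sessions []string        // session IDs recorded for the checkpoint
	HasFull  map[string]bool // session ID -> /full artifacts present
}

// decide returns the action for a checkpoint and, for pack-missing,
// the sessions that still need to go through the packer.
func decide(cp checkpointState, force bool) (action, []string) {
	if force {
		return actionForceRedo, nil
	}
	if !cp.InMain {
		return actionFreshMigrate, nil
	}
	var missing []string
	for _, id := range cp.Sessions {
		if !cp.HasFull[id] {
			missing = append(missing, id)
		}
	}
	if len(missing) > 0 {
		return actionPackMissing, missing
	}
	return actionSkip, nil
}

func main() {
	// Simulates a run interrupted after /main was written but before s2 was packed.
	interrupted := checkpointState{
		InMain:   true,
		Sessions: []string{"s1", "s2"},
		HasFull:  map[string]bool{"s1": true},
	}
	act, missing := decide(interrupted, false)
	fmt.Println(act, missing) // pack-missing [s2]
}
```

Checking `force` first mirrors the statement that `--force` always prunes and re-migrates, regardless of what is already present.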
Walk every /full/* ref (current plus all archived /full/<n>) at the start of each migrate invocation and rewrite session subtrees so any pre-rename entry names (full.jsonl, full.jsonl.NNN, content_hash.txt) are renamed to their current equivalents (raw_transcript, raw_transcript.NNN, raw_transcript_hash.txt). Blob hashes are preserved; only tree entry names change. A repo already on the current naming is a no-op (no commits created).

Without this, a repo migrated under a pre-a3cd77122 CLI ended up with archived generations carrying legacy names that the read paths could no longer find. The previous commit's recognize-both-names fix prevented duplicate generations on rerun, but left the legacy entries sitting on disk as dark matter. Now they get renamed in place on the next migrate run, so v2 ends up on a single naming convention regardless of which CLI version originally populated it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: d0f97513ade1
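The name mapping described in this commit message is a pure function of the entry name, which makes it easy to sketch. The Go snippet below is a minimal illustration, not the PR's code: `currentEntryName` and `isDigits` are hypothetical helpers, and it assumes that `NNN` is a run of ASCII digits. It only computes the new name; rewriting the git trees themselves is out of scope here.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigits reports whether s is non-empty and consists only of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// currentEntryName maps a pre-rename tree entry name to its current
// equivalent. The second result reports whether a rename is needed.
func currentEntryName(name string) (string, bool) {
	const legacyChunkPrefix = "full.jsonl."
	switch {
	case name == "full.jsonl":
		return "raw_transcript", true
	case name == "content_hash.txt":
		return "raw_transcript_hash.txt", true
	case strings.HasPrefix(name, legacyChunkPrefix):
		// Assumption: the chunk suffix (NNN) is purely numeric.
		suffix := strings.TrimPrefix(name, legacyChunkPrefix)
		if isDigits(suffix) {
			return "raw_transcript." + suffix, true
		}
	}
	return name, false // already current, or not a transcript artifact
}

func main() {
	names := []string{"full.jsonl", "full.jsonl.001", "content_hash.txt", "raw_transcript", "metadata.json"}
	for _, n := range names {
		renamed, changed := currentEntryName(n)
		fmt.Printf("%s -> %s (changed=%t)\n", n, renamed, changed)
	}
}
```

Because names already on the current convention come back with `changed=false`, a caller can skip creating a commit when no entry in a tree changes, matching the no-op behaviour described above.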
…e run" This reverts commit 3be8eed. Entire-Checkpoint: da7d3f277ea6
The pre-rename filenames (full.jsonl / content_hash.txt; see a3cd771) don't appear in any real repo we've inspected: every v2 archived /full/<n> was written under the current naming (raw_transcript / raw_transcript_hash.txt). Recognizing the legacy names was speculative defense for a path no real data follows.

Removes the extra switch arms in inspectFullSessionArtifacts plus the test case + helper that were exercising them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 610ce5c951cf
Three changes that together turn the loop and post-loop phases of entire migrate from "hours" to "seconds + the irreducible cost of reading the v1 raw transcripts that need packing" on repos with thousands of partially-migrated checkpoints.

* Build a /full/* presence index once per migration run. HasFullSessionArtifacts lists every git ref in the repo and re-walks every /full/* tree on each call. The migration loop calls it once per session, for every v1 checkpoint, so on a repo with N refs and K archived /full/<n> generations the cost is O(checkpoints x sessions x N x K). The new BuildFullSessionArtifactsIndex walks each /full/* tree once at the top of migrateCheckpointsV2 and records sessions with both raw_transcript and raw_transcript_hash.txt. The loop's presence checks become O(1) map lookups.

* Read only metadata.json when collecting missing-/full sessions. collectMissingFullSessionsForPacking previously called ReadSessionMetadataAndPrompts purely to extract Metadata.SessionID, but that function also reads prompt.txt and transcript.jsonl. The third lookup spammed FetchingTree.File "entry not found" debug logs at thousands of cps/sec on partial-state repos because compact transcripts are absent from /main for the same sessions whose /full is missing. New ReadSessionMetadata reads only metadata.json.

* Skip just-packed archives in RepairV2GenerationMetadata. The migration packer writes each fresh /full/<n> with generation.json computed in-memory from the just-packed transcripts (via AggregateTranscriptTimestamps), so re-deriving timestamps from the archived blobs in the repair pass is wasted work. The packer now records every ref it writes, runMigrateCheckpointsV2 passes that list to RepairV2GenerationMetadata via the new ExcludeRefs option, and the repair pass filters those candidates out before doing any per-archive computation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 3983fe320ba8
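The first change, building the presence index once and then answering each check with a map lookup, can be sketched as follows. This is a simplified model under stated assumptions: `fullRef`, `fullSessionIndex`, `buildIndex`, and the `"cp1/0"`-style session keys are hypothetical, and each ref is represented as a plain map of entry names instead of a git tree. Only the two artifact file names come from the commit message.

```go
package main

import "fmt"

// fullRef is a hypothetical stand-in for one /full/* ref: for each session
// key it lists the entry names found in that session's subtree.
type fullRef struct {
	Name     string
	Sessions map[string][]string
}

// fullSessionIndex is a set of session keys whose /full artifacts are complete.
type fullSessionIndex map[string]struct{}

// buildIndex walks every ref once and records the sessions that carry both
// raw_transcript and raw_transcript_hash.txt.
func buildIndex(refs []fullRef) fullSessionIndex {
	idx := make(fullSessionIndex)
	for _, ref := range refs {
		for session, entries := range ref.Sessions {
			var hasTranscript, hasHash bool
			for _, e := range entries {
				switch e {
				case "raw_transcript":
					hasTranscript = true
				case "raw_transcript_hash.txt":
					hasHash = true
				}
			}
			if hasTranscript && hasHash {
				idx[session] = struct{}{}
			}
		}
	}
	return idx
}

// Has is a constant-time presence check, standing in for the per-session
// check performed inside the migration loop.
func (idx fullSessionIndex) Has(session string) bool {
	_, ok := idx[session]
	return ok
}

func main() {
	refs := []fullRef{
		{Name: "full/current", Sessions: map[string][]string{
			"cp1/0": {"raw_transcript", "raw_transcript_hash.txt"},
			"cp2/0": {"raw_transcript"}, // hash missing: treated as not packed
		}},
		{Name: "full/1", Sessions: map[string][]string{
			"cp0/0": {"raw_transcript", "raw_transcript_hash.txt"},
		}},
	}
	idx := buildIndex(refs)
	fmt.Println(idx.Has("cp1/0"), idx.Has("cp2/0"), idx.Has("cp0/0")) // true false true
}
```

The walk over all refs happens once per run, so the per-session cost inside the loop no longer depends on the number of refs or archived generations.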
RepairV2GenerationMetadata does a `git ls-remote` plus a transcript-blob walk for every archived /full/<n> ref. On a fully-migrated repo those archives don't change between runs, so paying that cost on every `entire migrate --checkpoints v2` invocation is wrong: it's the dominant cause of "the progress bar finishes and then it hangs" on big repos.

Gate the call on `len(freshlyPackedRefs) > 0`. Repair runs when the migration actually wrote new state (fresh checkpoints or a resumed partial migration); a no-op rerun exits as soon as the loop completes.

The previously-existing path for fixing malformed archives still works whenever migration does any packing: the malformed archive isn't in ExcludeRefs and gets repaired. The path that no longer fires automatically is "no migration work, just verify everything that already existed", which only matters on a malformed-but-stable repo and can be triggered explicitly via --force.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: e03e093352f3
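The gate and the exclude-refs filter combine into one rule: repair nothing on a no-op rerun, and otherwise repair every archived ref except the ones this run just wrote. The Go sketch below models that rule only. `filterExcluded`, `repairTargets`, and the `full/<n>` ref strings are hypothetical; the actual repair pass and its ref naming are not shown in this PR text.

```go
package main

import "fmt"

// filterExcluded returns the candidates that are not in excludeRefs,
// preserving their order.
func filterExcluded(candidates, excludeRefs []string) []string {
	skip := make(map[string]struct{}, len(excludeRefs))
	for _, r := range excludeRefs {
		skip[r] = struct{}{}
	}
	var out []string
	for _, c := range candidates {
		if _, excluded := skip[c]; !excluded {
			out = append(out, c)
		}
	}
	return out
}

// repairTargets models the gate: nothing is repaired unless this run packed
// at least one ref, and freshly packed refs are never re-walked.
func repairTargets(archived, freshlyPacked []string) []string {
	if len(freshlyPacked) == 0 {
		return nil // no-op rerun: skip the repair pass entirely
	}
	return filterExcluded(archived, freshlyPacked)
}

func main() {
	archived := []string{"full/1", "full/2", "full/3"}
	fmt.Println(repairTargets(archived, nil))                // []
	fmt.Println(repairTargets(archived, []string{"full/3"})) // [full/1 full/2]
}
```

In the second call, an older malformed archive such as `full/1` would still be a repair target, which corresponds to the "still works whenever migration does any packing" case above.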
Pull request overview
Refines the hidden entire migrate --checkpoints v2 flow so reruns avoid redundant work in the checkpoints-v2 migration pipeline. The PR mainly speeds up reruns by indexing existing /full/* artifacts, repairing only missing packed state, and reducing post-migration generation-metadata work.
Changes:
- Adds a `/full/*` presence index plus lighter metadata reads to make rerun checks cheaper.
- Reworks rerun behavior to resume interrupted `/main` → `/full` packing and skip already-packed checkpoints.
- Avoids recomputing generation metadata for freshly packed refs and limits when the archived-generation repair pass runs.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cmd/entire/cli/strategy/generation_repair.go | Adds repair options so callers can exclude freshly written archived refs. |
| cmd/entire/cli/strategy/generation_repair_test.go | Tests the new exclude-refs repair behavior. |
| cmd/entire/cli/migrate.go | Reworks migration rerun logic, adds /full indexing, and changes when repair runs. |
| cmd/entire/cli/migrate_test.go | Updates migration tests around interrupted reruns, no-op reruns, and new checkpoint pickup. |
| cmd/entire/cli/checkpoint/v2_store_test.go | Replaces legacy cleanup tests with index-behavior tests. |
| cmd/entire/cli/checkpoint/v2_read.go | Adds a metadata-only v2 read path for hot loops. |
| cmd/entire/cli/checkpoint/v2_generation.go | Adds in-memory transcript timestamp aggregation for packed generations. |
| cmd/entire/cli/checkpoint/v2_committed.go | Implements the /full artifact index used by migration reruns. |
ReadSessionMetadata bypassed wrapWithFetcher, unlike the other v2 readers in the same file. On treeless or partial clones (or with a stale go-git object cache), that would surface as ErrCheckpointNotFound during migration reruns even when metadata.json was reachable through the configured blob fetcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b302c4eebaf6

After the rerun fast path was switched to a /full/* presence check, older v2 checkpoints with archived raw transcripts but blank transcript.jsonl on /main could only be repaired via --force. Restore the lighter recovery path: when every session already has /full artifacts but some compact transcripts are missing, rebuild them from v1 and write via UpdateCommitted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 2c4b1bc42be7
Drops the single-field RepairV2GenerationMetadataOptions struct in favor
of a plain excludeRefs slice, switches FullSessionArtifactsIndex to
map[string]struct{} (the ref-name value was unused), and tightens the
comment landscape so the gating-rationale story is told in one place
instead of three. Adds a regression test for the resume-via-packing
path that ensures root-level v1 task metadata stays attached to the
latest v2 session rather than the older session being repacked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 2da5d3818b9b
pfleidi reviewed three times on May 5, 2026
pfleidi previously approved these changes on May 5, 2026
pfleidi approved these changes on May 5, 2026
https://entire.io/gh/entireio/cli/trails/292
Summary
Fixes a few issues that arose when rerunning `entire migrate --checkpoints v2`:
- Duplicate `/full/<n>` archives created on supposed no-ops (one of our repos grew from gen 46 -> 86).

For repos with thousands of checkpoints, the issues above meant that migrations would take 20+ minutes to run even if there was nothing to be done.
Changes
- Reruns skip checkpoints that already have `/full` artifacts, pack only the missing sessions when some are absent, fresh-migrate when the checkpoint isn't in `/main` yet, and use `--force` to redo from scratch.
- `generation.json` is computed in-memory. A new `BuildFullSessionArtifactsIndex` turns per-session presence checks into O(1) map lookups, and `ReadSessionMetadata` reads only `metadata.json`.

Note

Medium Risk
Touches checkpoint migration and generation repair flows, changing when archives are created and when metadata repair runs; mistakes could skip needed packing or leave inconsistent `/full/*` state. Scope is contained to migration/repair tooling, but impacts large repos and historical data layout.

Overview
Improves `entire migrate --checkpoints v2` rerun behavior so already-migrated checkpoints are skipped when `/full/*` artifacts are present, while partially migrated checkpoints (written to `/main` but missing `/full/*`) resume by packing only missing sessions instead of re-migrating everything.

Speeds up the migration hot path by prebuilding an index with `BuildFullSessionArtifactsIndex` for O(1) per-session `/full/*` presence checks, adding a lightweight `ReadSessionMetadata` (metadata-only) read, and computing `generation.json` timestamps from already-loaded transcripts via `AggregateTranscriptTimestamps`.

Reduces post-migration overhead by running `RepairV2GenerationMetadata` only when new `/full/<n>` refs were written, and adds `ExcludeRefs` support so freshly packed generations are not re-walked during repair. Removes legacy v1 transcript cleanup and compact-transcript backfill/fast-path logic from the migration rerun flow, with tests updated to lock in the new idempotent/resume semantics.

Reviewed by Cursor Bugbot for commit 4b333b0.