Fix v2 checkpoint migration generation packing#1091

Merged
pfleidi merged 11 commits into main from
fix/entire-migrate-checkpoint-order
May 1, 2026

@pfleidi pfleidi commented May 1, 2026

https://entire.io/gh/entireio/cli/trails/275

What

  • Pack migrated v1 checkpoints into v2 archived full generations chronologically, so lower generation refs contain older migrated checkpoints.
  • Repair archived v2 generation metadata for already-migrated repositories without renumbering refs or recopying transcripts.
  • Keep force migration and rerun repair paths consistent by pruning stale archived generations and repacking only missing raw full artifacts.

How

  • Sort migratable checkpoints oldest-first, write v2/main, then build archived v2/full generation refs directly from sorted batches.
  • Recompute archived generation timestamp envelopes from raw transcripts and update local/remote refs with lease-aware pushes.
  • Add cleanup and migration coverage for chronological packing, generation repair, stale archived refs, partial repairs, and idempotent reruns.

Verification

  • mise run check
  • mise run lint

Note

Medium Risk
Touches checkpoint migration/cleanup logic and rewrites archived refs/entire/checkpoints/v2/full/* refs (including optional remote pushes with --force-with-lease), so bugs could corrupt or delete retained transcript generations if edge cases are missed.

Overview
Fixes v2 checkpoint migration so migrated v1 checkpoints are packed into archived v2 /full/<n> generations in oldest-first order and /full/current is left empty/ready for post-migration writes, rather than relying on incremental writes that could invert generation ordering.

Adds a new archived generation metadata repair flow that recomputes generation.json timestamp envelopes from raw transcript timestamps (falling back to checkpoint metadata/generation.json) and can update remote-only archived refs via lease-guarded pushes; migrate --checkpoints v2 now runs this repair and fails if any generations can’t be repaired.

Updates retention cleanup to determine v2 generation age from raw transcript timestamps (with safer remote candidate tracking), refactors v2 generation timestamp computation to be reusable (ComputeGenerationTimestampsFromTrees, exported NextGenerationNumber/MergeGenerationTime), and expands tests to cover chronological packing, rerun/force pruning of archived generations, task-metadata merge behavior, and metadata repair.

Reviewed by Cursor Bugbot for commit fb658bc.

pfleidi added 6 commits April 30, 2026 14:59
Entire-Checkpoint: c04bf934600e
Entire-Checkpoint: 6efb3e372613
Entire-Checkpoint: 11ea5830e18e
Reuse existing checkpoint-package helpers in the migration code:
export MergeGenerationTime and NextGenerationNumber and drop the
near-identical migrate.go duplicates. Replace repeated migration
author string literals with constants. Inline a single-call-site
ref-exists wrapper, fold three near-duplicate "queued missing raw
transcript" branches into one, and drop a narrating comment.

In the v2 generation repair path, replace the always-equal-to-RefOID
RemoteOID string with a HasRemote bool, and replace the per-candidate
pushTargetErr plumbing with a memoizing repairPushTarget that resolves
the remote URL lazily on first push.
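The lazy, memoized resolution described above might look roughly like this. The `pushTarget` type and its fields are hypothetical and only illustrate the pattern; the actual `repairPushTarget` may differ, and this sketch is not safe for concurrent use.

```go
package main

// pushTarget is a hypothetical memoizing resolver: resolve runs at most once,
// on the first call to URL, and its result (or error) is reused afterwards.
type pushTarget struct {
	resolve func() (string, error)
	done    bool
	url     string
	err     error
}

// URL returns the remote URL, resolving it on first use only.
func (p *pushTarget) URL() (string, error) {
	if !p.done {
		p.url, p.err = p.resolve()
		p.done = true
	}
	return p.url, p.err
}
```

Because the error is cached too, a failed resolution is reported once per push attempt without repeating the lookup for every candidate.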

Entire-Checkpoint: 2b12a9eb59c4
Copilot AI review requested due to automatic review settings May 1, 2026 00:07
Copilot AI left a comment

Pull request overview

This PR improves v1→v2 checkpoint migration behavior by ensuring migrated v1 checkpoints are packed into archived v2 /full/* generations in chronological order, and adds a repair pass to correct archived generation timestamp metadata based on raw transcript timestamps (including remote-only scenarios).

Changes:

  • Add archived v2 generation metadata repair (generation.json) derived from raw transcript timestamp envelopes, with lease-aware remote updates.
  • Update v1→v2 migration to sort migratable checkpoints oldest-first and pack migrated raw transcripts directly into archived v2 /full/* generations (leaving /full/current empty).
  • Extend unit tests to cover chronological packing, rerun/partial repair behavior, force-migration pruning, and cleanup retention based on raw transcript timestamps.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cmd/entire/cli/strategy/generation_repair.go | Implements archived v2 generation metadata repair from raw transcript timestamps, including optional remote push with --force-with-lease. |
| cmd/entire/cli/strategy/generation_repair_test.go | Adds coverage for local and remote-only generation repair behavior. |
| cmd/entire/cli/strategy/cleanup.go | Switches v2 generation retention calculations to prefer raw transcript timestamp envelopes; tracks remote presence on candidates. |
| cmd/entire/cli/migrate.go | Changes migration flow to pack raw transcripts into archived generations (chronological batching), adds generation repair invocation + reporting, and updates force-prune behavior for archived generations. |
| cmd/entire/cli/migrate_test.go | Adds tests for chronological generation packing, rerun packing of missing artifacts, force-prune recomputation, and metadata repair reporting. |
| cmd/entire/cli/clean_test.go | Updates retention test to validate eligibility based on raw transcript timestamps; adds helper to create archived generations with raw transcripts. |
| cmd/entire/cli/checkpoint/v2_generation.go | Adds ComputeGenerationRawTranscriptTimestamps, exports MergeGenerationTime and NextGenerationNumber, and uses raw transcript envelopes where appropriate. |
| cmd/entire/cli/checkpoint/v2_generation_test.go | Updates tests for exported NextGenerationNumber and adds coverage ensuring raw transcript timestamp computation ignores /main metadata. |
| CLAUDE.md | Documents the new cleanup and generation repair components. |

Comment thread cmd/entire/cli/migrate.go
Comment thread cmd/entire/cli/migrate.go
Comment thread cmd/entire/cli/migrate.go Outdated
Comment thread cmd/entire/cli/migrate.go Outdated
pfleidi added 3 commits April 30, 2026 17:19
Share a single private walker between ComputeGenerationCheckpointTimestamps
and ComputeGenerationRawTranscriptTimestamps; the only behavioral difference
is whether the v2 /main metadata tree is consulted before the raw transcript
fallback.

Stream packing during migration: a new generationPacker buffers up to
DefaultMaxCheckpointsPerGeneration migrated checkpoints and flushes a single
archived /full/<n> ref each time the buffer fills, so peak heap stays bounded
by one batch worth of transcripts. The next generation number is resolved
lazily on first flush so force-migration prune steps that remove existing
archived refs are visible before we pick the next slot.
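A rough sketch of the buffer-and-flush idea, assuming a hypothetical `packer` type with injected `nextGen` and `write` callbacks; the real generationPacker writes git refs and is certainly more involved. The point illustrated is that the first generation number is resolved only when the first batch is flushed.

```go
package main

// packer is a hypothetical buffer-and-flush packer. It holds at most max
// checkpoint IDs and writes one generation per full buffer.
type packer struct {
	max     int
	buf     []string
	nextGen func() int // resolves the first free generation number
	write   func(gen int, ids []string) error
	gen     int
	started bool
}

// Add buffers one checkpoint and flushes when the buffer is full.
func (p *packer) Add(id string) error {
	p.buf = append(p.buf, id)
	if len(p.buf) >= p.max {
		return p.Flush()
	}
	return nil
}

// Flush writes the buffered checkpoints as one generation. The generation
// number is resolved lazily on the first flush, so earlier prune steps that
// removed archived refs are already visible to nextGen.
func (p *packer) Flush() error {
	if len(p.buf) == 0 {
		return nil
	}
	if !p.started {
		p.gen = p.nextGen()
		p.started = true
	}
	if err := p.write(p.gen, p.buf); err != nil {
		return err
	}
	p.gen++
	p.buf = nil // drop the batch so memory stays bounded by one buffer
	return nil
}
```

Callers would invoke `Flush` once more after the last `Add` so a final, partially filled batch is still written.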

Have repairOneV2GenerationMetadata fall back to ComputeGenerationCheckpointTimestamps
when raw-transcript timestamps return found=false, matching the prune-side
addRecomputedGenerationJSON helper. On remote push failure (or push-target
resolution failure), restore the local ref to its pre-repair commit so local
does not silently diverge from origin.

Extract a shared pushWithLease helper used by both deleteRemoteRef and
pushRepairedV2Generation.
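For readers unfamiliar with lease-guarded pushes, here is a sketch of how one helper can serve both updates and deletions. `leasePushArgs` is a hypothetical function and not the signature of pushWithLease; only the `git push --force-with-lease=<refname>:<expect>` syntax and the empty-source refspec for deletion are standard git behavior.

```go
package main

// leasePushArgs builds arguments for a lease-guarded `git push`. The push is
// rejected by git unless the remote ref still points at expectedOID. An empty
// newOID yields the refspec ":<ref>", which deletes the remote ref.
func leasePushArgs(remote, ref, expectedOID, newOID string) []string {
	return []string{
		"push",
		"--force-with-lease=" + ref + ":" + expectedOID,
		remote,
		newOID + ":" + ref,
	}
}
```

Sharing the argument construction means the repair path and the delete path apply the same lease check against the commit that was observed before the rewrite.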

Replace the &migratedFullCheckpoint{} no-work sentinel returned from
migrateOneCheckpoint's idempotent backfill path with an errNoFullArtifactsToPack
sentinel error, matching the existing errAlreadyMigrated / errNoMigratableSessions
pattern.

Add unit tests for sortMigratableCheckpoints (chronological, tie-break,
zero-time placement) and for RepairV2GenerationMetadata's no-op paths
(no candidates, already-correct generation.json).

Entire-Checkpoint: 878a1a9997a1
Entire-Checkpoint: a3e7718a054d
@pfleidi pfleidi requested a review from Copilot May 1, 2026 01:16
pfleidi commented May 1, 2026

Bugbot run

Copilot AI left a comment

Pull request overview

This PR fixes v1→v2 checkpoint migration so migrated raw full transcripts are packed into archived refs/entire/checkpoints/v2/full/<generation> refs in chronological order, and adds a repair pass that can recompute generation.json timestamp envelopes from raw transcripts (including lease-guarded remote updates). It also updates v2 generation retention cleanup to use raw transcript timestamps as the primary source of truth, with expanded test coverage around packing, reruns, pruning, and repair.

Changes:

  • Pack migrated v1 checkpoints into archived v2 full generations oldest-first (and keep v2/full/current empty post-migration).
  • Add archived generation metadata repair that rewrites generation.json from raw transcript timestamps and optionally pushes repairs with --force-with-lease.
  • Update generation-retention eligibility to prefer raw transcript timestamps; export helpers (NextGenerationNumber, MergeGenerationTime) and add extensive tests.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cmd/entire/cli/strategy/generation_repair.go | Adds archived v2 generation.json repair + optional lease-aware remote push. |
| cmd/entire/cli/strategy/generation_repair_test.go | Tests local and remote-only repair flows and correctness of rewritten metadata. |
| cmd/entire/cli/strategy/cleanup.go | Switches retention timestamp derivation to raw transcripts; adds pushWithLease helper + remote awareness on candidates. |
| cmd/entire/cli/migrate.go | Implements incremental generation packing, raw-timestamp envelopes, force-prune updates for archives, and integrates repair pass. |
| cmd/entire/cli/migrate_test.go | Adds/updates tests for chronological packing, reruns, force pruning, repair invocation, and task metadata behavior. |
| cmd/entire/cli/clean_test.go | Adds helper to create archived generations with raw transcripts; updates retention test to use raw timestamps. |
| cmd/entire/cli/checkpoint/v2_generation.go | Exports NextGenerationNumber/MergeGenerationTime; introduces ComputeGenerationTimestampsFromTrees. |
| cmd/entire/cli/checkpoint/v2_generation_test.go | Updates tests for exported generation numbering and validates new timestamp computation behavior. |
| CLAUDE.md | Updates strategy-file index to include cleanup + generation repair files. |

Comment thread cmd/entire/cli/strategy/cleanup.go
Comment thread cmd/entire/cli/migrate.go
Comment thread cmd/entire/cli/migrate.go Outdated
@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit fb658bc.

pfleidi added 2 commits April 30, 2026 18:28
writeMigratedFullGeneration is now a caller of this method while it is
itself producing generation.json, so the previous "fall back to
generation.json" guidance no longer fits all callers. Describe the
contract neutrally so callers pick their own fallback.

Entire-Checkpoint: 79e86fdf1a18
Entire-Checkpoint: ac5a0a85d185
@pfleidi pfleidi marked this pull request as ready for review May 1, 2026 17:50
@pfleidi pfleidi requested a review from a team as a code owner May 1, 2026 17:50
@pfleidi pfleidi merged commit 7b871ec into main May 1, 2026
9 checks passed
@pfleidi pfleidi deleted the fix/entire-migrate-checkpoint-order branch May 1, 2026 21:32