Checkpoints V2: Support checkpoint_transcript_start for compact transcript.jsonl files #877

Draft
computermode wants to merge 39 commits into main from transcript-start-at-metadata

Conversation

computermode commented Apr 8, 2026

Ensures we have checkpoint_transcript_start in the metadata.json for the compact transcript.jsonl files.

Also updates the migrate command to respect existing start lines. Previously, if I migrated, then added v1 checkpoints and migrated again, the second migration would not calculate checkpoint_transcript_start.

I tested this by creating multiple commits and checkpoints with a Claude session and inspecting the v2 refs to ensure that the checkpoint_transcript_start lines were added to the metadata.json file as expected.

Testing with Codex next...


Note

Medium Risk
Adjusts how checkpoint_transcript_start is computed and stored for v2 /main and during migration, which can affect transcript scoping and downstream consumers if offsets are wrong. Changes are localized and covered by new tests, but touch persistence/state and migration behavior.

Overview
Ensures v2 /main metadata writes checkpoint_transcript_start in the compact transcript.jsonl line domain by adding CompactTranscriptStart to WriteCommittedOptions and using it in v2_committed.go.
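
As a rough illustration of the paragraph above, the sketch below shows a write path choosing the compact offset for the metadata field. Only `CompactTranscriptStart`, `WriteCommittedOptions`, and `checkpoint_transcript_start` come from this PR; `FullTranscriptStart`, `mainMetadata`, and `buildMainMetadata` are hypothetical names, and the real structs carry many more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WriteCommittedOptions is a trimmed, hypothetical stand-in for the real
// options struct; only CompactTranscriptStart is named in the PR.
type WriteCommittedOptions struct {
	// FullTranscriptStart is an assumed name for the full.jsonl offset.
	FullTranscriptStart int
	// CompactTranscriptStart is the offset in compact transcript.jsonl lines.
	CompactTranscriptStart int
}

// mainMetadata models only the metadata.json field discussed in the PR.
type mainMetadata struct {
	CheckpointTranscriptStart int `json:"checkpoint_transcript_start"`
}

// buildMainMetadata writes the compact offset, not the full.jsonl offset,
// into the v2 /main metadata.
func buildMainMetadata(opts WriteCommittedOptions) ([]byte, error) {
	return json.Marshal(mainMetadata{CheckpointTranscriptStart: opts.CompactTranscriptStart})
}

func main() {
	out, err := buildMainMetadata(WriteCommittedOptions{
		FullTranscriptStart:    120,
		CompactTranscriptStart: 14,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The point of the sketch is only that the two offsets live in different line domains, so the metadata writer has to be told explicitly which one to persist.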

Updates manual-commit condensation and session state to track compact_transcript_start, compute/fallback the compact offset when missing, and advance it after each condensation so subsequent checkpoints are correctly scoped.
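
The advance-after-condensation behaviour described above can be pictured with a minimal sketch. It assumes the state keeps a running count of compact lines already covered by earlier checkpoints; `sessionState` and `condense` are hypothetical names, and the fallback for a missing offset is not modelled here.

```go
package main

import "fmt"

// sessionState is a hypothetical subset of the persisted session state.
type sessionState struct {
	// CompactTranscriptStart counts transcript.jsonl lines already covered
	// by earlier checkpoints.
	CompactTranscriptStart int
}

// condense returns the offset to record for the new checkpoint, then
// advances the state by the compact lines that checkpoint contributed.
func condense(s *sessionState, compactTranscriptLines int) int {
	start := s.CompactTranscriptStart
	s.CompactTranscriptStart += compactTranscriptLines
	return start
}

func main() {
	state := &sessionState{}
	for _, added := range []int{5, 3, 4} {
		fmt.Println("checkpoint starts at compact line", condense(state, added))
	}
	fmt.Println("next checkpoint would start at", state.CompactTranscriptStart)
}
```

Each checkpoint records the offset as it stood before its own lines were added, which is what keeps consecutive checkpoints from overlapping.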

Improves migrate --checkpoints v2 to compute and persist compact transcript start offsets when generating transcript.jsonl, with new unit tests validating v2 metadata uses the compact offset (not full.jsonl offsets) across write, condense, and migration paths.
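
One way to picture the migration step is sketched below, under a loud assumption: each checkpoint's compact transcript is treated as cumulative for its session, so a checkpoint's start offset is the line count of the previous snapshot. `countLines` and `compactStarts` are hypothetical helpers, not functions from `migrate.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// countLines returns the number of non-empty lines in JSONL content.
func countLines(jsonl string) int {
	n := 0
	for _, line := range strings.Split(jsonl, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	return n
}

// compactStarts assumes snapshots[i] is the cumulative compact transcript
// at checkpoint i (oldest first) and returns each checkpoint's start line.
func compactStarts(snapshots []string) []int {
	starts := make([]int, len(snapshots))
	prev := 0
	for i, snap := range snapshots {
		starts[i] = prev
		prev = countLines(snap)
	}
	return starts
}

func main() {
	snapshots := []string{
		"{\"n\":1}\n{\"n\":2}\n",
		"{\"n\":1}\n{\"n\":2}\n{\"n\":3}\n{\"n\":4}\n{\"n\":5}\n",
	}
	fmt.Println(compactStarts(snapshots))
}
```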

Reviewed by Cursor Bugbot for commit c12965c.

peyton-alt and others added 30 commits March 30, 2026 18:45
Pre-session dirty files (CLI config files from `entire enable`, leftover
changes from previous sessions) were incorrectly counted as human
contributions, deflating agent percentage.

Root cause: PA1 (first prompt attribution) captures worktree state at
session start. This data was used to correct agent line counts (correct)
but also added to human contributions (wrong).

Fix:
- Split prompt attributions into baseline (PA1) and session (PA2+)
- PA1 data still subtracted from agent work (correct agent calc)
- PA1 contributions excluded from relevantAccumulatedUser
- PA1 removals excluded from totalUserRemoved
- Include PendingPromptAttribution during condensation for agents
  that skip SaveStep (e.g., Codex mid-turn commits)
- Add .entire/ filter to attribution calc (matches existing PA filter)
- Fix wrapcheck lint errors in updateCombinedAttributionForCheckpoint

Verified end-to-end: 100% agent with config files committed alongside.
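
The baseline/session split in the fix list can be sketched as follows. This is a simplified model, not the repository's calculation: `promptAttribution`, `attribute`, and `agentPercent` are hypothetical, and it assumes the first entry in the slice is PA1.

```go
package main

import "fmt"

// promptAttribution is a hypothetical record of user edits seen at one prompt.
type promptAttribution struct {
	UserAdded   int
	UserRemoved int
}

// attribute treats pas[0] as the PA1 baseline: it is still subtracted from
// the agent's line count but never added to the human totals.
func attribute(totalAdded int, pas []promptAttribution) (int, int, int) {
	agentAdded, humanAdded, humanRemoved := totalAdded, 0, 0
	for i, pa := range pas {
		agentAdded -= pa.UserAdded // every PA corrects the agent count
		if i == 0 {
			continue // PA1 is pre-session state, not a human contribution
		}
		humanAdded += pa.UserAdded
		humanRemoved += pa.UserRemoved
	}
	if agentAdded < 0 {
		agentAdded = 0
	}
	return agentAdded, humanAdded, humanRemoved
}

// agentPercent returns the agent share of added lines, or 0 with no lines.
func agentPercent(agentAdded, humanAdded int) float64 {
	total := agentAdded + humanAdded
	if total == 0 {
		return 0
	}
	return 100 * float64(agentAdded) / float64(total)
}

func main() {
	// 100 added lines in the diff, 20 of them pre-session dirty files.
	agent, human, removed := attribute(100, []promptAttribution{{UserAdded: 20, UserRemoved: 3}})
	fmt.Println(agent, human, removed, agentPercent(agent, human))
}
```

With only a baseline entry, the pre-session lines lower the agent count but leave the human totals at zero, so the agent share stays at 100%.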

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b0cb4216f6bc
…ibution

Checkpoint package changes required by the attribution baseline fix:
- PromptAttributionsJSON field on WriteCommittedOptions and CommittedMetadata
- UpdateCheckpointSummary method on GitStore for multi-session aggregation
- CombinedAttribution field on CheckpointSummary
- Preserve existing CombinedAttribution during summary rewrites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b8963737336c
…arentCommitHash

Fixes all 4 issues from Copilot and Cursor Bugbot review:

1. Precompute parentCommitHash on postCommitActionHandler struct
   using ParentHashes[0] (avoids extra object read, no silent error)
2. Remove duplicated 6-line parentCommitHash computation from
   HandleCondense and HandleCondenseIfFilesTouched
3. Thread parentTree through condenseOpts/attributionOpts and use it
   for non-agent file line counting — ensures diffLines uses parent→HEAD
   (consistent with parentCommitHash file scoping) instead of
   sessionBase→HEAD which over-counted intermediate commit changes
4. Add ParentTreeForNonAgentLines test proving the fix (TDD verified:
   HumanAdded=8 without fix → HumanAdded=3 with fix)
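
To see why the diff base matters in item 3, here is a toy sketch. It uses in-memory maps instead of git trees and a naive multiset comparison instead of a real diff; `tree` and `addedLines` are hypothetical, and the numbers are illustrative rather than taken from the ParentTreeForNonAgentLines test.

```go
package main

import "fmt"

// tree maps a file path to its lines; a stand-in for a real git tree.
type tree map[string][]string

// addedLines counts lines present in `to` but not in `from`, skipping agent
// files. It is a naive multiset comparison, not a real diff algorithm.
func addedLines(from, to tree, agentFiles map[string]bool) int {
	total := 0
	for path, lines := range to {
		if agentFiles[path] {
			continue
		}
		seen := make(map[string]int)
		for _, l := range from[path] {
			seen[l]++
		}
		for _, l := range lines {
			if seen[l] > 0 {
				seen[l]--
				continue
			}
			total++
		}
	}
	return total
}

func main() {
	sessionBase := tree{"notes.md": {"a"}}
	// An intermediate commit added four lines before the commit under review.
	parent := tree{"notes.md": {"a", "b", "c", "d", "e"}}
	head := tree{
		"notes.md": {"a", "b", "c", "d", "e", "f", "g"},
		"main.go":  {"package main"},
	}
	agentFiles := map[string]bool{"main.go": true}

	fmt.Println("sessionBase→HEAD:", addedLines(sessionBase, head, agentFiles))
	fmt.Println("parent→HEAD:", addedLines(parent, head, agentFiles))
}
```

Diffing from the session base picks up the intermediate commit's lines as well, which is the over-counting the commit message describes.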

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 12f5c4373467
Three fixes for multi-session attribution:

1. Cross-session file exclusion: Thread allAgentFiles (union of all
   sessions' FilesTouched) through the attribution pipeline. Files
   created by other agent sessions are no longer counted as human work.

2. Exclude .entire/ from commit session fallback: When the commit
   session has no FilesTouched and falls back to all committed files,
   filter out .entire/ metadata created by `entire enable`.

3. PA1 baseline uses base tree for new sessions: New sessions
   (StepCount == 0) always diff against the base commit tree, not
   the shared shadow branch which may contain other sessions' state.
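
The three fixes can be summarised in a small sketch. The helper names `unionFilesTouched` and `baselineTree` are hypothetical, the signature of `committedFilesExcludingMetadata` is assumed, and the use of the shadow branch for sessions with `StepCount > 0` is an inference from the text rather than something the commit states.

```go
package main

import (
	"fmt"
	"strings"
)

// unionFilesTouched merges every session's FilesTouched into one set.
func unionFilesTouched(sessions ...[]string) map[string]bool {
	all := make(map[string]bool)
	for _, files := range sessions {
		for _, f := range files {
			all[f] = true
		}
	}
	return all
}

// committedFilesExcludingMetadata drops .entire/ paths from the fallback
// file list (signature assumed for this sketch).
func committedFilesExcludingMetadata(committed []string) []string {
	var kept []string
	for _, f := range committed {
		if !strings.HasPrefix(f, ".entire/") {
			kept = append(kept, f)
		}
	}
	return kept
}

// baselineTree picks what PA1 diffs against: new sessions use the base
// commit tree; using the shadow tree otherwise is an assumption.
func baselineTree(stepCount int, baseTree, shadowTree string) string {
	if stepCount == 0 {
		return baseTree
	}
	return shadowTree
}

func main() {
	all := unionFilesTouched([]string{"a.go"}, []string{"b.go", "a.go"})
	fmt.Println(len(all), all["b.go"])
	fmt.Println(committedFilesExcludingMetadata([]string{".entire/settings.json", "a.go"}))
	fmt.Println(baselineTree(0, "base", "shadow"))
}
```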

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 209a37190167
Entire-Checkpoint: 3790cba265e6
Entire-Checkpoint: c9595c52ab4a
Entire-Checkpoint: 9f07aeebbf93
Entire-Checkpoint: f1c37c8efc47
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tering

- Test AllAgentFiles cross-session exclusion in CalculateAttributionWithAccumulated
- Test committedFilesExcludingMetadata filters .entire/ paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The combined_attribution field now diffs parent→HEAD once and classifies
files as agent vs human based on the union of sessions with real
checkpoints (SaveStep ran). Filters .entire/ and .claude/ config paths.
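
A minimal sketch of that classification, assuming the parent→HEAD diff has already been reduced to added-line counts per file. `isConfigPath` and `combinedAttribution` are hypothetical names and do not mirror the repository's function signatures.

```go
package main

import (
	"fmt"
	"strings"
)

// isConfigPath reports whether a path lives under .entire/ or .claude/.
func isConfigPath(p string) bool {
	return strings.HasPrefix(p, ".entire/") || strings.HasPrefix(p, ".claude/")
}

// combinedAttribution sums added lines from a single parent→HEAD diff,
// crediting a file to the agent when any session with real checkpoints
// touched it and to the human otherwise.
func combinedAttribution(addedByFile map[string]int, agentFiles map[string]bool) (int, int) {
	agent, human := 0, 0
	for path, added := range addedByFile {
		switch {
		case isConfigPath(path):
			continue // config paths are excluded from both totals
		case agentFiles[path]:
			agent += added
		default:
			human += added
		}
	}
	return agent, human
}

func main() {
	added := map[string]int{
		"main.go":               40,
		"README.md":             10,
		".entire/settings.json": 7,
		".claude/settings.json": 3,
	}
	agent, human := combinedAttribution(added, map[string]bool{"main.go": true})
	fmt.Println("agent:", agent, "human:", human)
}
```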

Also adds ReadSessionMetadata for lightweight per-session metadata reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mmit-inflation

Fix attribution inflation from intermediate commits
don't show multiple spaces for codex single line start message rendering
Entire-Checkpoint: 36db97269a69
Entire-Checkpoint: 93066e1dac3c
Entire-Checkpoint: 4fdb72622b7f
Entire-Checkpoint: 730e93f6b572
Initialize compact transcript offsets from existing checkpoint offsets during state normalization and add tests to preserve migration behavior.

Made-with: Cursor
Entire-Checkpoint: 4678bd55995f
Checkpoints V2: add migration option
Copilot AI review requested due to automatic review settings April 8, 2026 19:44

Copilot AI left a comment

Pull request overview

Adds support for correctly tracking checkpoint_transcript_start for v2 compact transcript.jsonl artifacts by introducing a compact-transcript-specific offset and propagating it through condensation and migration.

Changes:

  • Introduces CompactTranscriptStart/CompactTranscriptLines plumbing to track compact transcript offsets separately from full.jsonl offsets.
  • Updates v2 /main committed metadata writing to use the compact transcript start offset for checkpoint_transcript_start.
  • Enhances migrate to compute and persist compact transcript offsets when generating transcript.jsonl.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cmd/entire/cli/strategy/manual_commit_types.go | Extends condensation result to report compact transcript line counts. |
| cmd/entire/cli/strategy/manual_commit_test.go | Adds coverage ensuring v2 /main writes checkpoint_transcript_start correctly for compact transcripts. |
| cmd/entire/cli/strategy/manual_commit_hooks.go | Updates session state to advance/reset compact transcript offsets across condensations/carry-forward. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Plumbs compact transcript start into write options and computes compact transcript line deltas. |
| cmd/entire/cli/session/state.go | Adds persisted compact_transcript_start to session state with legacy backfill behavior. |
| cmd/entire/cli/session/state_test.go | Adds tests for CompactTranscriptStart normalization/backfill and JSON round-trip. |
| cmd/entire/cli/migrate.go | Computes compact transcript offsets during migration and stores them in v2 write options. |
| cmd/entire/cli/checkpoint/v2_store_test.go | Tests that v2 /main metadata uses CompactTranscriptStart for checkpoint_transcript_start. |
| cmd/entire/cli/checkpoint/v2_committed.go | Switches v2 /main metadata field to use compact transcript start offset. |
| cmd/entire/cli/checkpoint/checkpoint.go | Adds CompactTranscriptStart to WriteCommittedOptions for v2 metadata writing. |

Entire-Checkpoint: 92926498c799
@computermode

bugbot review

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Entire-Checkpoint: 3e7abfc2d4a5
Entire-Checkpoint: 1b4ddd35692a