feat(checkpoint): point metadata.json at full compact transcript.jsonl #1510
computermode wants to merge 1 commit into
The root metadata.json sessions[].transcript pointer now prefers the compact transcript.jsonl over the raw full.jsonl. The pointer is derived from the actual tree entries so it can never dangle: when the best-effort compact transcript is skipped (compaction failure, empty/oversized output, older checkpoints) it falls back to full.jsonl. UpdateCommitted (deferred finalization) re-derives the pointer too. CLI read paths resolve transcripts by filename and ignore the pointer; it is consumed by external readers.

The compact transcript now mirrors full.jsonl: it is stored in full (never trimmed) instead of pre-sliced per checkpoint. Each checkpoint's start within it is recorded as a new compacted-line offset, compact_transcript_start, in the session metadata. This is distinct from checkpoint_transcript_start (which indexes the raw full.jsonl) because compaction merges/drops lines, so the raw offset cannot index the compacted file. The offset is computed at write time by subtracting the checkpoint's own compacted line count from the full compact's line count.

Consumers (entire.io server) must slice transcript.jsonl numerically by compact_transcript_start and must not re-apply checkpoint_transcript_start slicing; that server change is a prerequisite before downstream reads switch to the new pointer.

Entire-Checkpoint: 61d5c3e264d3
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit da4f65e.
    + compactStart := s.writeCompactTranscript(ctx, agentType, startLine, compactBytes, sessionPath, entries)

    - return nil
    + return compactStart, true, nil
Stale compact after finalize failure
High Severity
During UpdateCommitted, replaceTranscript refreshes full.jsonl but leaves a prior transcript.jsonl tree entry when compact regeneration fails or is skipped. transcriptPointer still prefers that blob, and metadata can record a new compact_transcript_start against content that no longer matches the updated raw transcript—so server/UI consumers following the root pointer can read stale compact data.
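The failure mode described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the map-based tree, the `0/` session path, and the `transcriptPointer` signature are hypothetical stand-ins, not the PR's actual code. The final `delete` is only one possible mitigation, not something the PR states.

```go
package main

import "fmt"

// transcriptPointer is a hypothetical stand-in for the helper named in the PR.
// It assumes tree entries are keyed by path; the real signature may differ.
func transcriptPointer(entries map[string]string, sessionPath string) string {
	compact := sessionPath + "transcript.jsonl"
	if _, ok := entries[compact]; ok {
		return compact
	}
	return sessionPath + "full.jsonl"
}

func main() {
	// Tree state after the initial write: raw and compact transcripts agree.
	entries := map[string]string{
		"0/full.jsonl":       "raw-v1",
		"0/transcript.jsonl": "compact-of-raw-v1",
	}

	// Deferred finalize refreshes the raw transcript...
	entries["0/full.jsonl"] = "raw-v2"
	// ...but compaction fails or is skipped, so the old compact entry survives.
	// The pointer still prefers it, although it no longer matches raw-v2.
	fmt.Println(transcriptPointer(entries, "0/")) // prints 0/transcript.jsonl

	// Possible mitigation (assumption): drop the stale entry when regeneration
	// fails, so the pointer falls back to the refreshed raw transcript.
	delete(entries, "0/transcript.jsonl")
	fmt.Println(transcriptPointer(entries, "0/")) // prints 0/full.jsonl
}
```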
Pull request overview
This PR updates the committed checkpoint metadata contract so root metadata.json sessions[].transcript prefers the compact transcript.jsonl when available, while also changing compact transcript storage to be full (never trimmed) and introducing a new per-checkpoint compact-line offset (compact_transcript_start) for consumers to slice correctly.
Changes:
- Flip `sessions[].transcript` to point at `transcript.jsonl` when present (fallback to `full.jsonl`), derived from tree entries to avoid dangling pointers.
- Store `transcript.jsonl` as a full compact transcript for the whole turn and add `compact_transcript_start` to per-session `metadata.json`.
- Update and extend tests + docs to reflect the new pointer behavior and offset semantics.
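For readers on the consumer side, the slicing contract summarized above might look roughly like the following. This is a sketch under stated assumptions: `sliceCompact` is a hypothetical helper name, the transcript is held in memory as a string, and the out-of-range clamping is an illustrative choice rather than documented server behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceCompact (hypothetical) returns the lines of a compact transcript that
// belong to one checkpoint, starting at its compact_transcript_start offset.
// The offset counts lines of transcript.jsonl, never lines of full.jsonl.
func sliceCompact(compact string, compactStart int) []string {
	trimmed := strings.TrimRight(compact, "\n")
	if trimmed == "" {
		return nil
	}
	lines := strings.Split(trimmed, "\n")
	// Clamping is an assumption made for this sketch.
	if compactStart < 0 {
		compactStart = 0
	}
	if compactStart > len(lines) {
		compactStart = len(lines)
	}
	return lines[compactStart:]
}

func main() {
	// Three compacted lines for the whole turn; this checkpoint starts at line 2.
	compact := "{\"n\":1}\n{\"n\":2}\n{\"n\":3}\n"
	fmt.Println(sliceCompact(compact, 2)) // prints [{"n":3}]
}
```

Applying `checkpoint_transcript_start` here instead would index into the wrong file: the raw transcript typically has more lines than the compacted one, so a raw offset can overshoot or land on an unrelated line.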
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/architecture/sessions-and-checkpoints.md | Updates architecture docs for full compact transcript storage + new offset + pointer preference. |
| cmd/entire/cli/checkpoint/committed.go | Implements pointer derivation, full compact transcript write, offset computation, and UpdateCommitted re-derivation logic. |
| cmd/entire/cli/checkpoint/committed_compact_transcript_test.go | Updates/extends tests for pointer flip, full compact storage, and compact_transcript_start slicing behavior. |
| cmd/entire/cli/checkpoint/checkpoint.go | Adds CompactTranscriptStart to CommittedMetadata and updates checkpoint layout comments. |
| cmd/entire/cli/checkpoint/checkpoint_test.go | Clarifies helper contract and adds a targeted nolint for generalized session-index helper. |
| CLAUDE.md | Updates repo/strategy documentation to reflect the new transcript pointer and offset contract. |
      if agentType == agent.AgentTypeCodex {
          compactBytes = codex.SanitizePortableTranscript(compactBytes)
      }
    - s.writeCompactTranscript(ctx, agentType, startLine, compactBytes, sessionPath, entries)
    + compactStart := s.writeCompactTranscript(ctx, agentType, startLine, compactBytes, sessionPath, entries)

    - return nil
    + return compactStart, true, nil


What

Makes the root `metadata.json` `sessions[].transcript` pointer prefer the compact `transcript.jsonl` over the raw `full.jsonl`, and changes how the compact transcript is stored so it mirrors `full.jsonl`.

Two coupled changes:

- Pointer flip (no dangling). `sessions[].transcript` now points at `transcript.jsonl` when it was written, falling back to `full.jsonl` otherwise. The value is derived from the actual tree entries via a `transcriptPointer` helper, so it can never reference a missing file: the compact transcript is best-effort and is skipped on compaction failure, empty/oversized output, or for checkpoints written by older CLI versions.
- Full compact transcript + offset. `transcript.jsonl` is now stored in full (never trimmed), identical for every checkpoint/session in a turn, mirroring `full.jsonl`. Each checkpoint's start within it is recorded as a new compacted-line offset, `compact_transcript_start`, in the session `metadata.json`. This is distinct from `checkpoint_transcript_start` (which indexes the raw `full.jsonl`) because compaction merges/drops lines, so the raw offset cannot index the compacted file.

How

`writeCompactTranscript` compacts with `StartLine: 0` (full) and returns the offset via `compactStartOffset` = `countLines(full) − countLines(scoped)`, where `scoped` is the compaction of the checkpoint's own portion (agent-correct, since it reuses each agent's `StartLine` handling). The offset is threaded out through `writeTranscript` → `writeSessionToSubdirectory`. `UpdateCommitted` (deferred finalization) re-derives both the offset (via `replaceTranscript`, which now reports whether it regenerated vs. short-circuited) and the root pointer. Session metadata is updated through a shared `mutateSessionMetadata` helper (`replaceSkillEvents` was refactored onto it).

Reviewer / deployment notes

- Consumers (entire.io server) must slice `transcript.jsonl` numerically by `compact_transcript_start` and must not re-apply `checkpoint_transcript_start` slicing (that offset is in `full.jsonl` line units and is wrong against compact content). This must land before downstream reads switch to the new pointer.
- The full compact transcript is more likely to exceed `agent.MaxChunkSize` than the old sliced version. When it does, it is skipped (best-effort) and the pointer falls back to `full.jsonl`; the compact transcript is not chunked.
- Mid-turn checkpoints now compact twice (full + scoped) to derive the offset.

Verification

`mise run check` passes: fmt, lint (0 issues), unit tests, 59 integration tests, and the E2E canary.
Note
Medium Risk
Changes checkpoint metadata contract and transcript layout for server/UI consumers; offset miscalculation would over-include or drop content, though CLI paths are unchanged and pointer derivation avoids dangling refs.
Overview
Committed checkpoints now treat `transcript.jsonl` like `full.jsonl`: the compact file is stored in full (not pre-trimmed per checkpoint), and each session records `compact_transcript_start` in `metadata.json` so external readers slice the compact file by compact-line offset, not by `checkpoint_transcript_start` (raw `full.jsonl` lines).

Root `sessions[].transcript` now prefers `transcript.jsonl` when that blob exists in the tree, via `transcriptPointer` (falls back to `full.jsonl` on compaction skip/failure or older checkpoints). `writeCompactTranscript` compacts with `StartLine: 0` and derives the offset with `compactStartOffset` (full line count minus scoped compaction). `UpdateCommitted` updates that offset when transcripts regenerate, refreshes the root pointer after deferred finalize, and uses `mutateSessionMetadata` for session JSON edits.

CLI rewind/resume/explain still read `full.jsonl` by filename and ignore the pointer. Docs and tests reflect the new storage and pointer behavior.
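The offset derivation summarized above (full compact line count minus scoped compact line count) can be sketched as follows. The function names mirror those mentioned in the PR, but the signatures, the trailing-newline handling, and the clamp to zero are assumptions made for this illustration. The sketch also assumes the scoped compaction corresponds to the tail of the full compaction, which is what the subtraction implies.

```go
package main

import (
	"bytes"
	"fmt"
)

// countLines counts JSONL lines, treating a final line without a trailing
// newline as a line. The exact counting rule in the PR is not shown; this
// is an assumption.
func countLines(b []byte) int {
	if len(b) == 0 {
		return 0
	}
	n := bytes.Count(b, []byte{'\n'})
	if b[len(b)-1] != '\n' {
		n++
	}
	return n
}

// compactStartOffset derives where a checkpoint begins inside the full
// compact transcript: full line count minus the checkpoint's own scoped
// compact line count. Clamping to zero is an illustrative guard.
func compactStartOffset(full, scoped []byte) int {
	off := countLines(full) - countLines(scoped)
	if off < 0 {
		return 0
	}
	return off
}

func main() {
	full := []byte("a\nb\nc\nd\ne\n") // compaction of the whole turn: 5 lines
	scoped := []byte("d\ne\n")        // compaction of this checkpoint's portion: 2 lines
	fmt.Println(compactStartOffset(full, scoped)) // prints 3
}
```

Deriving the offset from two compactions, rather than mapping raw line numbers through the compactor, keeps the computation independent of how each agent merges or drops lines, at the cost of compacting mid-turn checkpoints twice.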