feat(checkpoint): point metadata.json at full compact transcript.jsonl #1510

Closed
computermode wants to merge 1 commit into main from feat/metadata-point-transcript-jsonl

computermode commented Jun 23, 2026

What

Makes the root metadata.json sessions[].transcript pointer prefer the compact transcript.jsonl over the raw full.jsonl, and changes how the compact transcript is stored so it mirrors full.jsonl.

Two coupled changes:

  1. Pointer flip (no dangling). sessions[].transcript now points at transcript.jsonl when it was written, falling back to full.jsonl otherwise. The value is derived from the actual tree entries via a transcriptPointer helper, so it can never reference a missing file — the compact transcript is best-effort and is skipped on compaction failure, empty/oversized output, or for checkpoints written by older CLI versions.

  2. Full compact transcript + offset. transcript.jsonl is now stored in full (never trimmed), identical for every checkpoint/session in a turn, mirroring full.jsonl. Each checkpoint's start within it is recorded as a new compacted-line offset, compact_transcript_start, in the session metadata.json. This is distinct from checkpoint_transcript_start (which indexes the raw full.jsonl) because compaction merges/drops lines, so the raw offset cannot index the compacted file.
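
The pointer rule in item 1 can be illustrated with a minimal sketch. It assumes the tree is available as a set of paths and that each session lives under a directory prefix such as `0/`; both are assumptions of this example, and the real transcriptPointer signature in committed.go may differ.

```go
package main

import "fmt"

// File names as given in the PR description.
const (
	compactTranscriptName = "transcript.jsonl"
	fullTranscriptName    = "full.jsonl"
)

// transcriptPointer returns the path the root metadata should reference for a
// session. It prefers the compact transcript only when that file is actually
// present in the tree entries, so the result never names a missing file.
// The set-of-paths representation of the tree is an assumption of this sketch.
func transcriptPointer(sessionPath string, entries map[string]struct{}) string {
	compact := sessionPath + compactTranscriptName
	if _, ok := entries[compact]; ok {
		return compact
	}
	return sessionPath + fullTranscriptName
}

func main() {
	// Hypothetical session directory "0/" where compaction was skipped.
	entries := map[string]struct{}{"0/full.jsonl": {}}
	fmt.Println(transcriptPointer("0/", entries)) // 0/full.jsonl

	// Same session once the compact transcript has been written.
	entries["0/transcript.jsonl"] = struct{}{}
	fmt.Println(transcriptPointer("0/", entries)) // 0/transcript.jsonl
}
```

Deriving the value from the entries, rather than from whether compaction was attempted, is what keeps the pointer from dangling.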

How

  • writeCompactTranscript compacts with StartLine: 0 (full) and returns the offset via compactStartOffset = countLines(full) − countLines(scoped), where scoped is the compaction of only the checkpoint's own portion. This stays correct per agent because it reuses each agent's StartLine handling. The offset is threaded out through writeTranscript → writeSessionToSubdirectory.
  • UpdateCommitted (deferred finalization) re-derives both the offset (via replaceTranscript, which now reports whether it regenerated vs. short-circuited) and the root pointer. Session metadata is updated through a shared mutateSessionMetadata helper (replaceSkillEvents was refactored onto it).
  • CLI read paths (rewind/resume/explain) resolve transcripts by filename and ignore the pointer.
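
The offset arithmetic from the first bullet, as a rough sketch. The helper names follow the PR text, but the bodies are guesses: in particular, counting a final unterminated line and clamping a negative result to zero are assumptions of this example, not confirmed behavior.

```go
package main

import (
	"bytes"
	"fmt"
)

// countLines counts JSONL records: one per newline, plus one more if the
// final line has no trailing newline.
func countLines(b []byte) int {
	if len(b) == 0 {
		return 0
	}
	n := bytes.Count(b, []byte{'\n'})
	if b[len(b)-1] != '\n' {
		n++
	}
	return n
}

// compactStartOffset returns how many compacted lines precede the
// checkpoint's own portion: lines in the full compaction minus lines in the
// compaction scoped to this checkpoint. Clamping at zero is an assumption.
func compactStartOffset(fullCompact, scopedCompact []byte) int {
	offset := countLines(fullCompact) - countLines(scopedCompact)
	if offset < 0 {
		return 0
	}
	return offset
}

func main() {
	full := []byte("{\"n\":1}\n{\"n\":2}\n{\"n\":3}\n{\"n\":4}\n{\"n\":5}\n")
	scoped := []byte("{\"n\":4}\n{\"n\":5}\n")
	fmt.Println(compactStartOffset(full, scoped)) // 3
}
```

The subtraction only yields a valid index if the scoped compaction is a suffix of the full one, which is the property the PR relies on when it reuses each agent's StartLine handling.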

Reviewer / deployment notes

  • Server prerequisite (entire.io): consumers must slice transcript.jsonl numerically by compact_transcript_start and must not re-apply checkpoint_transcript_start slicing, because that offset counts raw full.jsonl lines and is wrong against compact content. This must land before downstream reads switch to the new pointer.
  • Behavior: the full compact is more likely to exceed agent.MaxChunkSize than the old sliced version → skipped (best-effort), pointer falls back to full.jsonl; the compact transcript is not chunked. Mid-turn checkpoints now compact twice (full + scoped) to derive the offset.
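
For the consumer side, a sketch of the slicing the server prerequisite describes. The JSON keys come from this PR; the struct, the sliceFromLine helper, and the sample data are hypothetical, and the sketch assumes the offset is a zero-based count of lines to skip.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// sessionMetadata holds only the two offsets discussed here. The JSON keys
// are from the PR; the struct itself is illustrative.
type sessionMetadata struct {
	CheckpointTranscriptStart int `json:"checkpoint_transcript_start"` // indexes full.jsonl
	CompactTranscriptStart    int `json:"compact_transcript_start"`    // indexes transcript.jsonl
}

// sliceFromLine skips the first start lines of data and returns the rest.
// It returns nil when data has fewer than start newline-terminated lines.
func sliceFromLine(data []byte, start int) []byte {
	for i := 0; i < start; i++ {
		idx := bytes.IndexByte(data, '\n')
		if idx < 0 {
			return nil
		}
		data = data[idx+1:]
	}
	return data
}

func main() {
	metaJSON := []byte(`{"checkpoint_transcript_start": 7, "compact_transcript_start": 2}`)
	var meta sessionMetadata
	if err := json.Unmarshal(metaJSON, &meta); err != nil {
		panic(err)
	}

	compact := []byte("{\"n\":1}\n{\"n\":2}\n{\"n\":3}\n{\"n\":4}\n")

	// Correct: index the compact file with the compact offset.
	fmt.Print(string(sliceFromLine(compact, meta.CompactTranscriptStart)))

	// Wrong: the raw offset (7) overshoots the 4-line compact file.
	fmt.Println(len(sliceFromLine(compact, meta.CheckpointTranscriptStart)))
}
```

In this made-up data the raw offset returns nothing at all; in other cases it would silently return the wrong lines, which is why the two offsets must not be interchanged.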

Verification

mise run check passes — fmt, lint (0 issues), unit tests, 59 integration tests, and the E2E canary.

Note

Medium Risk
Changes checkpoint metadata contract and transcript layout for server/UI consumers; offset miscalculation would over-include or drop content, though CLI paths are unchanged and pointer derivation avoids dangling refs.

Overview
Committed checkpoints now treat transcript.jsonl like full.jsonl: the compact file is stored in full (not pre-trimmed per checkpoint), and each session records compact_transcript_start in metadata.json so external readers slice the compact file by compact-line offset—not by checkpoint_transcript_start (raw full.jsonl lines).

Root sessions[].transcript now prefers transcript.jsonl when that blob exists in the tree, via transcriptPointer (falls back to full.jsonl on compaction skip/failure or older checkpoints). writeCompactTranscript compacts with StartLine: 0 and derives the offset with compactStartOffset (the full compaction's line count minus the scoped compaction's line count). UpdateCommitted updates that offset when transcripts regenerate, refreshes the root pointer after deferred finalize, and uses mutateSessionMetadata for session JSON edits.

CLI rewind/resume/explain still read full.jsonl by filename and ignore the pointer. Docs and tests reflect the new storage and pointer behavior.

Reviewed by Cursor Bugbot for commit da4f65e.

The root metadata.json sessions[].transcript pointer now prefers the compact
transcript.jsonl over the raw full.jsonl. The pointer is derived from the
actual tree entries so it can never dangle: when the best-effort compact
transcript is skipped (compaction failure, empty/oversized output, older
checkpoints) it falls back to full.jsonl. UpdateCommitted (deferred
finalization) re-derives the pointer too. CLI read paths resolve transcripts
by filename and ignore the pointer; it is consumed by external readers.

The compact transcript now mirrors full.jsonl: it is stored in full (never
trimmed) instead of pre-sliced per checkpoint. Each checkpoint's start within
it is recorded as a new compacted-line offset, compact_transcript_start, in the
session metadata. This is distinct from checkpoint_transcript_start (which
indexes the raw full.jsonl) because compaction merges/drops lines, so the raw
offset cannot index the compacted file. The offset is computed at write time by
subtracting the checkpoint's own compacted line count from the full compact's.

Consumers (entire.io server) must slice transcript.jsonl numerically by
compact_transcript_start and must not re-apply checkpoint_transcript_start
slicing; that server change is a prerequisite before downstream reads switch to
the new pointer.

Entire-Checkpoint: 61d5c3e264d3
Copilot AI review requested due to automatic review settings June 23, 2026 23:01

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

+ compactStart := s.writeCompactTranscript(ctx, agentType, startLine, compactBytes, sessionPath, entries)

- return nil
+ return compactStart, true, nil

Stale compact after finalize failure

High Severity

During UpdateCommitted, replaceTranscript refreshes full.jsonl but leaves a prior transcript.jsonl tree entry when compact regeneration fails or is skipped. transcriptPointer still prefers that blob, and metadata can record a new compact_transcript_start against content that no longer matches the updated raw transcript—so server/UI consumers following the root pointer can read stale compact data.
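
One way to picture the finding, and one possible remedy, is the sketch below. It is not taken from the PR: refreshCompact, its parameters, and the map-based tree are hypothetical, and the author may resolve the issue differently. The idea is that when regeneration fails or is skipped, the prior compact blob is removed so a pointer derived from tree entries falls back to full.jsonl.

```go
package main

import "fmt"

// refreshCompact models the compact-transcript step of deferred finalization
// for one session directory. entries maps tree paths to blob contents; that
// representation and this function are assumptions of the sketch.
func refreshCompact(entries map[string][]byte, sessionPath string, newCompact []byte, ok bool) {
	path := sessionPath + "transcript.jsonl"
	if ok && len(newCompact) > 0 {
		// Regeneration succeeded: the compact blob matches the new raw transcript.
		entries[path] = newCompact
		return
	}
	// Regeneration failed or was skipped: drop any prior compact blob so a
	// pointer derived from the tree cannot select stale content.
	delete(entries, path)
}

func main() {
	entries := map[string][]byte{
		"0/full.jsonl":       []byte("refreshed raw transcript\n"),
		"0/transcript.jsonl": []byte("compact from before finalization\n"),
	}
	refreshCompact(entries, "0/", nil, false)

	_, stillThere := entries["0/transcript.jsonl"]
	fmt.Println(stillThere) // false
}
```

With the stale entry gone, the session metadata should also avoid recording a fresh compact_transcript_start for a file that no longer exists.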

Copilot AI left a comment

Pull request overview

This PR updates the committed checkpoint metadata contract so root metadata.json sessions[].transcript prefers the compact transcript.jsonl when available, while also changing compact transcript storage to be full (never trimmed) and introducing a new per-checkpoint compact-line offset (compact_transcript_start) for consumers to slice correctly.

Changes:

  • Flip sessions[].transcript to point at transcript.jsonl when present (fallback to full.jsonl), derived from tree entries to avoid dangling pointers.
  • Store transcript.jsonl as a full compact transcript for the whole turn and add compact_transcript_start to per-session metadata.json.
  • Update and extend tests + docs to reflect the new pointer behavior and offset semantics.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/architecture/sessions-and-checkpoints.md | Updates architecture docs for full compact transcript storage + new offset + pointer preference. |
| cmd/entire/cli/checkpoint/committed.go | Implements pointer derivation, full compact transcript write, offset computation, and UpdateCommitted re-derivation logic. |
| cmd/entire/cli/checkpoint/committed_compact_transcript_test.go | Updates/extends tests for pointer flip, full compact storage, and compact_transcript_start slicing behavior. |
| cmd/entire/cli/checkpoint/checkpoint.go | Adds CompactTranscriptStart to CommittedMetadata and updates checkpoint layout comments. |
| cmd/entire/cli/checkpoint/checkpoint_test.go | Clarifies helper contract and adds a targeted nolint for generalized session-index helper. |
| CLAUDE.md | Updates repo/strategy documentation to reflect the new transcript pointer and offset contract. |

Comment on lines 1905 to +1910
  if agentType == agent.AgentTypeCodex {
      compactBytes = codex.SanitizePortableTranscript(compactBytes)
  }
- s.writeCompactTranscript(ctx, agentType, startLine, compactBytes, sessionPath, entries)
+ compactStart := s.writeCompactTranscript(ctx, agentType, startLine, compactBytes, sessionPath, entries)

- return nil
+ return compactStart, true, nil