Store and push compact transcript.jsonl in v1 checkpoints #1419
Committed checkpoint writes now generate a compact transcript (transcript.jsonl) from the full transcript via transcript/compact and store it in the session directory next to full.jsonl, so it is pushed with the entire/checkpoints/v1 branch.

The root metadata.json sessions[].transcript pointer targets transcript.jsonl when it was generated and falls back to full.jsonl otherwise (unparseable or external-agent transcripts, checkpoints from older CLI versions).

Unlike the removed v2 (cumulative compact plus compact-unit offset), the stored compact is pre-sliced to the checkpoint's own portion: compact.Compact runs with StartLine = checkpoint_transcript_start. v1's checkpoint_transcript_start must keep full.jsonl units for CLI readers, so a pre-sliced file avoids a new offset field and is self-describing for consumers.

Generation is best-effort and never fails the checkpoint write; finalization (UpdateCommitted) regenerates the compact from the new content and keeps the previous one on generation failure so the metadata pointer never dangles.

CLI read paths (rewind/resume/explain) are unaffected: they read full.jsonl by filename, not through the metadata pointer.

Entire-Checkpoint: 63b41777384a
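The best-effort behavior described in this commit message can be sketched roughly as below. This is only an illustration under stated assumptions: the in-memory `tree` map, `writeSession`, `compactFunc`, and `errGenFailed` are hypothetical names invented for the sketch, and the generator function stands in for the real transcript/compact call, whose signature is not shown in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// File names taken from the PR text; everything else here is hypothetical.
const (
	fullName    = "full.jsonl"
	compactName = "transcript.jsonl"
)

// errGenFailed is a hypothetical error used to simulate a failed compaction.
var errGenFailed = errors.New("compact generation failed")

// compactFunc is a hypothetical stand-in for the transcript/compact call.
type compactFunc func(full []byte) ([]byte, error)

// writeSession always stores the full transcript. The compact transcript is
// best-effort: on failure the write still succeeds and any previously stored
// compact is left in place. It reports whether a fresh compact was stored.
func writeSession(tree map[string][]byte, full []byte, gen compactFunc) bool {
	tree[fullName] = full
	compact, err := gen(full)
	if err != nil {
		return false // keep the previous compact, never fail the write
	}
	tree[compactName] = compact
	return true
}

func main() {
	tree := map[string][]byte{}
	ok := func(full []byte) ([]byte, error) {
		return append([]byte("compact:"), full...), nil
	}
	bad := func([]byte) ([]byte, error) { return nil, errGenFailed }

	// Initial write: compact generated and stored.
	fmt.Println(writeSession(tree, []byte("v1"), ok), string(tree[compactName]))
	// Finalization with a failing generator: full.jsonl is updated,
	// the previous compact survives.
	fmt.Println(writeSession(tree, []byte("v2"), bad), string(tree[compactName]))
}
```

In this sketch a generation failure changes only the return value, so the checkpoint write itself never fails and the previous transcript.jsonl stays in place.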
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit afd9428.
Pull request overview
Adds a compact, checkpoint-scoped transcript artifact to v1 committed checkpoints and updates the checkpoint summary metadata to prefer that compact transcript when it is successfully generated, while preserving existing CLI read behavior that continues to read full.jsonl by filename.
Changes:
- Store transcript.jsonl (compact, pre-sliced to checkpoint_transcript_start) alongside full.jsonl for each committed v1 session, best-effort.
- Point metadata.json sessions[].transcript at transcript.jsonl when compact generation succeeds, otherwise fall back to full.jsonl.
- Add tests and documentation clarifying the dual-transcript layout and pointer semantics.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| docs/architecture/sessions-and-checkpoints.md | Documents the new dual-transcript checkpoint layout and metadata pointer behavior. |
| cmd/entire/cli/paths/paths.go | Introduces CompactTranscriptFileName constant for transcript.jsonl. |
| cmd/entire/cli/checkpoint/committed.go | Generates/stores compact transcripts on write/finalize and sets metadata pointer accordingly (best-effort). |
| cmd/entire/cli/checkpoint/committed_compact_transcript_test.go | Adds unit tests for compact transcript write, scoping, fallback, and regeneration on finalize. |
| cmd/entire/cli/checkpoint/checkpoint.go | Documents SessionFilePaths.Transcript pointer semantics and updates checkpoint tree shape comments. |
| CLAUDE.md | Updates architecture notes for manual-commit strategy to include compact transcript behavior. |
# Conflicts:
#   CLAUDE.md
#   docs/architecture/sessions-and-checkpoints.md
The compact transcript (transcript.jsonl) is still generated on every committed write and during finalization, and written into the checkpoint tree so it is pushed alongside full.jsonl. The metadata.json sessions[].transcript pointer now stays on full.jsonl; pointing it at the compact transcript is deferred to a later change.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Entire-Checkpoint: 029ccd754882
The initial-write path (writeTranscript) sanitizes Codex transcripts via codex.SanitizePortableTranscript before compaction, but the finalize path (replaceTranscript) passed raw bytes to writeCompactTranscript. Because sanitization drops compaction/encrypted lines, the checkpoint-scoped compact (sliced by StartLine) diverged from the initial write for Codex after UpdateCommitted.

Sanitize the compact's input on the finalize path so it matches the initial write. The sanitization runs exactly once per path: the initial path already passes sanitized bytes, so writeCompactTranscript no longer re-sanitizes.

Addresses Cursor Bugbot "Codex sanitize skipped on finalize".
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Entire-Checkpoint: c6722b602c45
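To see why skipping sanitization on one path makes the sliced compact diverge, here is a toy model. It is a sketch only: `sanitize` is a hypothetical stand-in for codex.SanitizePortableTranscript that drops lines marked as encrypted, and `sliceFrom` is a hypothetical stand-in for compaction with StartLine, assumed here to mean "number of leading lines to skip". The real functions may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize is a hypothetical stand-in for the Codex sanitizer: it drops
// every line whose JSON mentions the "encrypted" type.
func sanitize(lines []string) []string {
	out := make([]string, 0, len(lines))
	for _, l := range lines {
		if !strings.Contains(l, `"encrypted"`) {
			out = append(out, l)
		}
	}
	return out
}

// sliceFrom is a hypothetical stand-in for compaction with StartLine:
// it keeps only the lines at index startLine and later.
func sliceFrom(lines []string, startLine int) []string {
	if startLine < 0 {
		startLine = 0
	}
	if startLine > len(lines) {
		startLine = len(lines)
	}
	return lines[startLine:]
}

func main() {
	raw := []string{
		`{"type":"user"}`,
		`{"type":"encrypted"}`,
		`{"type":"assistant"}`,
		`{"type":"tool"}`,
	}
	const startLine = 2

	initial := sliceFrom(sanitize(raw), startLine)  // initial write: sanitized input
	buggy := sliceFrom(raw, startLine)              // finalize before the fix: raw input
	fixed := sliceFrom(sanitize(raw), startLine)    // finalize after the fix

	fmt.Println("initial:", initial)
	fmt.Println("buggy:  ", buggy)
	fmt.Println("fixed:  ", fixed)
}
```

Because the dropped line sits before the slice point, the same StartLine selects a different window in the raw and sanitized inputs. Sanitizing on both paths makes the two results identical.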
The store-layer tests re-validated the compact line format (v, agent slug, per-line type), which transcript/compact already covers. Drop those assertions and the parseCompactLines helper so the tests assert only the store's own behavior: that transcript.jsonl is written, scoped, and the metadata pointer stays on full.jsonl. Also drop the now-unused agent blank imports and encoding/json.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Entire-Checkpoint: 716059686529
writeTranscript returned a bool on main ("was a transcript written"). The
compact-transcript change switched it to return the pointer filename (or ""
when nothing was written) so the pointer could target transcript.jsonl. That
pointer move was deferred — the metadata pointer is always full.jsonl now — so
the string return carried no more information than the bool, just an
empty-string sentinel. Restore the bool signature; the caller uses
paths.TranscriptFileName directly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Entire-Checkpoint: f4a5bc6e74e2
Tested with a test repo via attach. Attaching a session added both transcript.jsonl and full.jsonl.

https://entire.io/gh/entireio/cli/trails/635
What
Committed checkpoints on entire/checkpoints/v1 now carry a compact transcript (transcript.jsonl) next to full.jsonl in each session directory, so it is pushed with the v1 branch. This is the first step: the compact transcript is generated and published, but nothing reads it yet.

The root metadata.json sessions[].transcript pointer deliberately still targets full.jsonl. Flipping the pointer to transcript.jsonl is deferred to a follow-up change, so this PR is safe for remote consumers: they keep following full.jsonl exactly as before.

How
- GitStore.writeTranscript (initial write) and replaceTranscript (finalization via UpdateCommitted) generate the compact form with the existing transcript/compact package and record it as a single blob in the checkpoint tree.
- compact.Compact runs with StartLine = checkpoint_transcript_start. This differs from the removed v2 (cumulative compact + compact-unit offset) deliberately: v1's checkpoint_transcript_start must keep full.jsonl units for CLI readers, and a pre-sliced file needs no offset to consume.
- […] transcript.jsonl.
- […] full.jsonl by exact filename, and the metadata pointer is unchanged, so nothing in the CLI consumes the compact transcript yet.

Reviewer notes
- […] transcript.jsonl. That has been pulled out: this PR only writes and pushes the file. Pointing metadata at it (and the corresponding server-side handling of compact-unit offsets) lands in a separate follow-up.
- […] main. Main has since removed the v1.1 read-mirror feature (v1_custom_ref_mirror.go deleted; the checkpoint store ignores checkpoints_version), so this PR no longer references it.
- mise run fmt, mise run lint (0 issues), mise run test:ci (unit + integration + e2e canary: vogon 59/59, roger-roger 4/4): all green.
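The design choice between a pre-sliced compact file and the removed v2 layout (cumulative compact plus a compact-unit offset) can be illustrated from the consumer's side. This is a minimal sketch under assumptions: `readLines`, `fromCumulative`, and `fromPreSliced` are hypothetical helpers, and the sample JSONL lines are invented for illustration; no real consumer code is shown in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// readLines returns the non-empty lines of a JSONL document.
func readLines(jsonl string) []string {
	var out []string
	for _, line := range strings.Split(jsonl, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			out = append(out, line)
		}
	}
	return out
}

// fromCumulative models the removed v2 layout: the file holds the whole
// session so far, and the consumer needs a separate offset (in compact
// units) to find the lines that belong to this checkpoint.
func fromCumulative(jsonl string, offset int) []string {
	lines := readLines(jsonl)
	if offset < 0 {
		offset = 0
	}
	if offset > len(lines) {
		offset = len(lines)
	}
	return lines[offset:]
}

// fromPreSliced models the layout in this PR: every line in the file
// already belongs to the checkpoint, so no offset is needed.
func fromPreSliced(jsonl string) []string {
	return readLines(jsonl)
}

func main() {
	cumulative := "{\"n\":1}\n{\"n\":2}\n{\"n\":3}\n"
	preSliced := "{\"n\":3}\n"

	fmt.Println(fromCumulative(cumulative, 2))
	fmt.Println(fromPreSliced(preSliced))
}
```

Both calls yield the same checkpoint-scoped lines. The pre-sliced form gets there without a second offset field expressed in compact units, which is what leaves checkpoint_transcript_start free to stay in full.jsonl units.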