
Store and push compact transcript.jsonl in v1 checkpoints#1419

Merged
computermode merged 6 commits into main from push-compact-for-v1
Jun 22, 2026


Conversation

@computermode computermode commented Jun 11, 2026

Contributor

https://entire.io/gh/entireio/cli/trails/635

What

Committed checkpoints on entire/checkpoints/v1 now carry a compact transcript (transcript.jsonl) next to full.jsonl in each session directory, so it is pushed with the v1 branch. This is the first step: the compact transcript is generated and published, but nothing reads it yet.

The root metadata.json sessions[].transcript pointer deliberately still targets full.jsonl. Flipping the pointer to transcript.jsonl is deferred to a follow-up change, so this PR is safe for remote consumers — they keep following full.jsonl exactly as before.
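To make the layout concrete, here is a minimal sketch of what one session directory holds after this change and where the metadata pointer lands. The constant values are inferred from the PR text, and the `sessionFiles` type, the `layout` helper, and the sample directory name are hypothetical stand-ins, not the real `SessionFilePaths` code.

```go
package main

import (
	"fmt"
	"path"
)

// Assumed values of paths.TranscriptFileName and
// paths.CompactTranscriptFileName, inferred from the PR description.
const (
	TranscriptFileName        = "full.jsonl"
	CompactTranscriptFileName = "transcript.jsonl"
)

// sessionFiles is a hypothetical stand-in for one session directory in the
// checkpoint tree plus its metadata.json pointer.
type sessionFiles struct {
	Full       string // full.jsonl, always written
	Compact    string // transcript.jsonl, "" when compaction failed or was skipped
	Transcript string // sessions[].transcript pointer in the root metadata.json
}

// layout returns the paths a session directory would contain. The pointer
// targets full.jsonl whether or not the compact file exists.
func layout(sessionDir string, compactOK bool) sessionFiles {
	full := path.Join(sessionDir, TranscriptFileName)
	f := sessionFiles{Full: full, Transcript: full}
	if compactOK {
		f.Compact = path.Join(sessionDir, CompactTranscriptFileName)
	}
	return f
}

func main() {
	// "a1/b2c3d4/0" is a made-up session directory.
	fmt.Printf("%+v\n", layout("a1/b2c3d4/0", true))
	fmt.Printf("%+v\n", layout("a1/b2c3d4/0", false))
}
```

Keeping the pointer independent of `compactOK` is what makes the change invisible to remote consumers: the new blob exists in the tree, but nothing addresses it yet.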

How

  • GitStore.writeTranscript (initial write) and replaceTranscript (finalization via UpdateCommitted) generate the compact form with the existing transcript/compact package and record it as a single blob in the checkpoint tree.
  • The stored compact is pre-sliced to the checkpoint's own portion: compact.Compact runs with StartLine = checkpoint_transcript_start. This deliberately differs from the removed v2 (cumulative compact plus a compact-unit offset): v1's checkpoint_transcript_start must stay in full.jsonl units for CLI readers, and a pre-sliced file needs no offset to consume.
  • Generation is best-effort: failures are logged and never fail the checkpoint write. During finalization a failed regeneration keeps the previous transcript.jsonl.
  • CLI read paths (rewind/resume/explain) read full.jsonl by exact filename, and the metadata pointer is unchanged, so nothing in the CLI consumes the compact transcript yet.

Reviewer notes

  • Scope intentionally narrowed: an earlier revision of this PR also flipped the metadata pointer to transcript.jsonl. That has been pulled out — this PR only writes and pushes the file. Pointing metadata at it (and the corresponding server-side handling of compact-unit offsets) lands in a separate follow-up.
  • Rebased/merged on latest main. Main has since removed the v1.1 read-mirror feature (v1_custom_ref_mirror.go deleted; the checkpoint store ignores checkpoints_version), so this PR no longer references it.
  • Verified locally: mise run fmt, mise run lint (0 issues), mise run test:ci (unit + integration + e2e canary: vogon 59/59, roger-roger 4/4) — all green.

Committed checkpoint writes now generate a compact transcript
(transcript.jsonl) from the full transcript via transcript/compact and
store it in the session directory next to full.jsonl, so it is pushed
with the entire/checkpoints/v1 branch. The root metadata.json
sessions[].transcript pointer targets transcript.jsonl when it was
generated and falls back to full.jsonl otherwise (unparseable or
external-agent transcripts, checkpoints from older CLI versions).

Unlike the removed v2 (cumulative compact plus compact-unit offset),
the stored compact is pre-sliced to the checkpoint's own portion:
compact.Compact runs with StartLine = checkpoint_transcript_start.
v1's checkpoint_transcript_start must keep full.jsonl units for CLI
readers, so a pre-sliced file avoids a new offset field and is
self-describing for consumers. Generation is best-effort and never
fails the checkpoint write; finalization (UpdateCommitted) regenerates
the compact from the new content and keeps the previous one on
generation failure so the metadata pointer never dangles. CLI read
paths (rewind/resume/explain) are unaffected: they read full.jsonl by
filename, not through the metadata pointer.

Entire-Checkpoint: 63b41777384a
Copilot AI review requested due to automatic review settings June 11, 2026 07:12

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit afd9428.

Comment thread cmd/entire/cli/checkpoint/committed.go
Comment thread cmd/entire/cli/checkpoint/committed.go Outdated

Copilot AI left a comment

Contributor


Pull request overview

Adds a compact, checkpoint-scoped transcript artifact to v1 committed checkpoints and updates the checkpoint summary metadata to prefer that compact transcript when it is successfully generated, while preserving existing CLI read behavior that continues to read full.jsonl by filename.

Changes:

  • Store transcript.jsonl (compact, pre-sliced to checkpoint_transcript_start) alongside full.jsonl for each committed v1 session, best-effort.
  • Point metadata.json sessions[].transcript at transcript.jsonl when compact generation succeeds, otherwise fall back to full.jsonl.
  • Add tests and documentation clarifying the dual-transcript layout and pointer semantics.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Summary per file:

  • docs/architecture/sessions-and-checkpoints.md: Documents the new dual-transcript checkpoint layout and metadata pointer behavior.
  • cmd/entire/cli/paths/paths.go: Introduces the CompactTranscriptFileName constant for transcript.jsonl.
  • cmd/entire/cli/checkpoint/committed.go: Generates/stores compact transcripts on write/finalize and sets the metadata pointer accordingly (best-effort).
  • cmd/entire/cli/checkpoint/committed_compact_transcript_test.go: Adds unit tests for compact transcript write, scoping, fallback, and regeneration on finalize.
  • cmd/entire/cli/checkpoint/checkpoint.go: Documents SessionFilePaths.Transcript pointer semantics and updates checkpoint tree shape comments.
  • CLAUDE.md: Updates architecture notes for the manual-commit strategy to include compact transcript behavior.

computermode and others added 2 commits June 23, 2026 01:59
# Conflicts:
#	CLAUDE.md
#	docs/architecture/sessions-and-checkpoints.md
The compact transcript (transcript.jsonl) is still generated on every
committed write and during finalization, and written into the checkpoint
tree so it is pushed alongside full.jsonl. The metadata.json
sessions[].transcript pointer now stays on full.jsonl; pointing it at the
compact transcript is deferred to a later change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Entire-Checkpoint: 029ccd754882
@computermode computermode changed the title Store compact transcript.jsonl in v1 checkpoints, point metadata at it Store and push compact transcript.jsonl in v1 checkpoints (metadata pointer unchanged) Jun 22, 2026
@computermode computermode changed the title Store and push compact transcript.jsonl in v1 checkpoints (metadata pointer unchanged) Store and push compact transcript.jsonl in v1 checkpoints Jun 22, 2026
The initial-write path (writeTranscript) sanitizes Codex transcripts via
codex.SanitizePortableTranscript before compaction, but the finalize path
(replaceTranscript) passed raw bytes to writeCompactTranscript. Because
sanitization drops compaction/encrypted lines, the checkpoint-scoped compact
(sliced by StartLine) diverged from the initial write for Codex after
UpdateCommitted.

Sanitize the compact's input on the finalize path so it matches the initial
write. The sanitization runs exactly once per path — the initial path already
passes sanitized bytes, so writeCompactTranscript no longer re-sanitizes.

Addresses Cursor Bugbot "Codex sanitize skipped on finalize".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Entire-Checkpoint: c6722b602c45
@computermode computermode marked this pull request as ready for review June 22, 2026 18:18
@computermode computermode requested a review from a team as a code owner June 22, 2026 18:18
computermode and others added 2 commits June 22, 2026 11:34
The store-layer tests re-validated the compact line format (v, agent slug,
per-line type), which transcript/compact already covers. Drop those
assertions and the parseCompactLines helper so the tests assert only the
store's own behavior: that transcript.jsonl is written, scoped, and the
metadata pointer stays on full.jsonl. Also drop the now-unused agent blank
imports and encoding/json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Entire-Checkpoint: 716059686529
writeTranscript returned a bool on main ("was a transcript written"). The
compact-transcript change switched it to return the pointer filename (or ""
when nothing was written) so the pointer could target transcript.jsonl. That
pointer move was deferred — the metadata pointer is always full.jsonl now — so
the string return carried no more information than the bool, just an
empty-string sentinel. Restore the bool signature; the caller uses
paths.TranscriptFileName directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Entire-Checkpoint: f4a5bc6e74e2
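A minimal sketch of the restored shape, assuming paths.TranscriptFileName is "full.jsonl". `writeTranscript` here is a hypothetical simplification that writes into a map instead of a git tree, and `recordSession`/`sessionMeta` are illustrative names; the point is only that the caller derives the pointer from the constant, so a bool return is sufficient.

```go
package main

import (
	"fmt"
	"path"
)

const TranscriptFileName = "full.jsonl" // assumed value of paths.TranscriptFileName

// writeTranscript reports whether a transcript was written; it no longer
// needs to return the pointer filename.
func writeTranscript(entries map[string][]byte, dir string, full []byte) bool {
	if len(full) == 0 {
		return false
	}
	entries[path.Join(dir, TranscriptFileName)] = full
	return true
}

// sessionMeta is a hypothetical stand-in for a metadata.json sessions[] entry.
type sessionMeta struct{ Transcript string }

func recordSession(entries map[string][]byte, dir string, full []byte) sessionMeta {
	var m sessionMeta
	if writeTranscript(entries, dir, full) {
		// The pointer always targets full.jsonl, so the caller uses the constant.
		m.Transcript = path.Join(dir, TranscriptFileName)
	}
	return m
}

func main() {
	entries := map[string][]byte{}
	fmt.Println(recordSession(entries, "a1/b2c3d4/0", []byte("{}\n")).Transcript)
}
```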
@computermode

Contributor Author

Tested in a test repo via attach: attaching a session added both transcript.jsonl and full.jsonl.

@pfleidi pfleidi left a comment

Contributor


LGTM

@computermode computermode merged commit f3ac74b into main Jun 22, 2026
9 checks passed
@computermode computermode deleted the push-compact-for-v1 branch June 22, 2026 20:46

3 participants