glebmish/claude-code-replay

claude-code-replay

Replay Claude Code session logs (*.jsonl) to reconstruct the lost project state — file by file, commit by commit, in the order events happened. The tool of last resort when a destructive command wiped the tree.

How it works

There are two replay layers, and the second is opt-in:

  1. Deterministic replay — walks every *.jsonl under --logs-dir (including subagent JSONLs under <session>/subagents/) in strict chronological order and applies file writes.
  2. Claude classifier (opt-in, --enable-llm-classifier) — without it, every Bash event is skipped. With it, each Bash event is sent to Claude (Sonnet 4.6), which decides execute or skip per event and gives a reason. See How the classifier works.
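
As a rough illustration of the deterministic layer, the sketch below merges events from several JSONL files into one chronological stream and marks which ones would be applied when the classifier is off. The ReplayEvent shape and both function names are hypothetical and are not taken from the tool's source; the real log format is described in docs/log-format.md.

```typescript
// Hypothetical event shape, for illustration only.
interface ReplayEvent {
  timestamp: string; // ISO 8601
  tool: string;      // e.g. "Write", "Edit", "Read", "Bash"
  source: string;    // which JSONL file the event came from
}

// Merge events from several JSONL files (main session plus subagents)
// into one chronological stream. Array.prototype.sort is stable, so
// events with equal timestamps keep their input order.
function mergeChronologically(files: ReplayEvent[][]): ReplayEvent[] {
  const all: ReplayEvent[] = [];
  for (const file of files) {
    for (const event of file) all.push(event);
  }
  return all.sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

// Without the classifier, only Write, Edit, and Read are handled;
// every other tool (Bash included) is skipped.
const DETERMINISTIC_TOOLS = new Set(["Write", "Edit", "Read"]);

function planWithoutClassifier(
  events: ReplayEvent[],
): { event: ReplayEvent; action: "apply" | "skip" }[] {
  return events.map((event) => {
    const action: "apply" | "skip" = DETERMINISTIC_TOOLS.has(event.tool) ? "apply" : "skip";
    return { event, action };
  });
}
```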

Install

Prerequisites:

  • Node 20 or newer.
  • Claude Code CLI installed and authenticated (claude login) — only needed for the classifier (--enable-llm-classifier). The classifier reuses that auth via the Claude Agent SDK, so no separate Anthropic API key is needed.

Run without installing:

npx claude-code-replay --target … --source-root … [flags]

Or install globally:

npm install -g claude-code-replay
claude-code-replay --target … --source-root … [flags]

Or, build from source — clone, npm install, then either npm run replay -- <flags> directly or npm link to expose it as claude-code-replay on your PATH.

Quick start

claude-code-replay \
  --target      /tmp/myrepo-recovered/ \
  --source-root /Users/you/projects/myrepo \
  --enable-llm-classifier

What you'll see on stdout (from a real 304-event replay):

INFO collecting events from /Users/you/.claude/projects/-Users-you-projects-myrepo
INFO collected 304 events
INFO building snapshot index from file-history-snapshot entries
INFO snapshot index covers 43 paths
INFO classifier: 4 batch(es) over 208 payload events (117 Bash, 91 context); sizes=[71,64,59,14]
INFO classifier model=claude-sonnet-4-6, mode=base, source-roots=1
INFO classifier batch 1/4 cache hit
INFO classifier batch 2/4 cache hit
INFO classifier batch 3/4 cache hit
INFO classifier batch 4/4 cache hit
INFO classifier returned 208 decisions
=== claude-code-replay summary ===
events total:        304
  replayed:          64 (of 64)
  skipped:           240
bash executed:       25 of 25
classifier batches:  4 (4 cached, 0 live)
halted:              no
elapsed:             3.70s
target files:        732   (8519381 bytes total)

The summary omits rows that would be zero on a typical run (overrides, cwd-filtered Bash, snapshot heals, lenient-read skips). Per-event CLASSIFY / APPLY / CHECK traces and detailed classifier diagnostics are gated behind --debug. Real errors (argv parse failures, classifier API errors) go to stderr; this run log goes to stdout so you can pipe it without losing diagnostics.

Exit codes: 0 success, 2 argv error, 10 halted on command failure.

Flags

Required

  • --target <path> — directory the replay writes into. Must be distinct from every logs dir (in either direction); otherwise a replayed rm -rf . could destroy the logs mid-run.
  • --source-root <path> — original absolute cwd from the session. Compared verbatim against event.cwd in the logs, so it must match character-for-character (no relative paths, no symlink-resolved paths). Repeatable for sessions that moved across roots.

Replay window

  • --logs-dir <path> — directory containing the session *.jsonl files (and any <session>/subagents/ JSONLs). Optional, repeatable. By default, one logs dir is inferred from each --source-root as ~/.claude/projects/<encoded-source-root> (every / in the absolute source-root is replaced with -). Inferred dirs that don't exist on disk are silently skipped; explicit --logs-dir values are added on top of the inferred set and must exist.
  • --cutoff <iso-ts> — drop events at or after this ISO 8601 timestamp at parse time. Use when the session's later events include the destructive operation you're recovering from.
  • --start <iso-ts> — start replay at the first event whose timestamp is at or after this. Composes with --cutoff to define a window. The target dir is trusted to already reflect the state events before --start would have produced.
  • --from-index <N> — start replay at event index N (events 0..N-1 are not classified or applied). Composes with --start; whichever lands later wins. The halt-and-resume primitive: on a halt at K, fix the cause and resume with --from-index K.
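
To make the composition of the three window flags concrete, here is a minimal sketch of how they could combine. It assumes the events are already sorted chronologically; the names resolveWindow, WindowOptions, and TimedEvent are hypothetical and do not come from the tool's source.

```typescript
interface TimedEvent {
  timestamp: string; // ISO 8601
}

interface WindowOptions {
  cutoff?: string;    // value of --cutoff
  start?: string;     // value of --start
  fromIndex?: number; // value of --from-index
}

// Returns the events that survive --cutoff plus the index of the first
// event that would be replayed. Assumes `events` is sorted by timestamp.
function resolveWindow<T extends TimedEvent>(
  events: T[],
  opts: WindowOptions,
): { events: T[]; firstIndex: number } {
  // --cutoff drops events at or after the timestamp.
  const cutoffMs = opts.cutoff === undefined ? Infinity : Date.parse(opts.cutoff);
  const kept = events.filter((e) => Date.parse(e.timestamp) < cutoffMs);

  // --start selects the first event at or after the timestamp.
  let startIndex = 0;
  if (opts.start !== undefined) {
    const startMs = Date.parse(opts.start);
    const found = kept.findIndex((e) => Date.parse(e.timestamp) >= startMs);
    startIndex = found === -1 ? kept.length : found;
  }

  // --from-index composes with --start: whichever lands later wins.
  const firstIndex = Math.max(startIndex, opts.fromIndex ?? 0);
  return { events: kept, firstIndex };
}
```

Because the cutoff only trims the tail of a sorted stream, the indices of the remaining events are unchanged in this sketch, so a halt index K reported by an earlier run would still point at the same event.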

Verification

  • --strict — disable both heal layers (snapshot heal and apply-reads heal). Any Read mismatch or missing target halts immediately. Useful when measuring how much of a replay needs healing (e.g. when evaluating a classifier — heal counts in default mode signal what the classifier left on the table).
  • --strict-reads — halt on the first failed Read checkpoint instead of the default (log + continue). Useful for debugging which event triggered a missing-file scenario; the default-on lenient behaviour is what keeps long replays from stopping every time the classifier correctly omits a producing Bash chain (see the cascade rule in docs/classifier-prompt.md).

Bash classifier (opt-in)

  • --enable-llm-classifier — opt in to LLM calls. Required to use the classifier at all. The base prompt always includes a git-focused supplement that calls out git add / git commit / git branch / git checkout / git merge / git rebase / git revert / git reset / git tag / git filter-repo (non-exhaustive; the same logic extends to any equivalent state-mutating command, and to heredoc/sed writes whose content a later commit captures). Restoring the original git history is the dominant real-world use case for replay, so it ships as the default rather than an opt-in flag.
  • --custom-intent "<intent>" — append a natural-language intent describing what the replay should accomplish. Use for behaviour beyond the built-in git focus, e.g. "keep all dependency installs (npm/pip) so node_modules ends up populated" or "skip any docker/podman commands; this replay runs without a daemon". Repeatable; each value is joined with a newline.
  • --override-classifier-cache — skip reading from the classifier cache and force a fresh LLM call, but still write the new response back to the cache (overwriting any existing entry).
  • --skip-uncached-tail — if the cached run's last_event_ts falls inside the current logs, drop every event with a later timestamp before the classifier sees them. The classifier then full-hits the cache and the runtime replays only what was already cached. Intended for "re-run yesterday's replay against today's slightly grown logs without paying for the new tail." If no cache exists, the flag warns and proceeds without truncation. Caveat: events past the cap go unclassified — if the appended tail contains a destructive command, you won't see it.

Per-event overrides

  • --override-skip <INDEX> — repeatable. Force event INDEX to skip, regardless of any rule-based or LLM classification. Works on any event type (Bash, Read, Edit, Write, checkpoint).
  • --override-execute <INDEX>[=CMD] — repeatable. Force event INDEX (Bash only) to execute. Bare form runs the event's original command; =CMD runs the substring CMD instead (must be a literal substring of the event's original command — same constraint as the LLM classifier's decision.command). Subject to the same cwd-inside-source-roots check as classifier-approved executes.
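
The substring constraint on --override-execute can be illustrated with a short sketch. Both function names are hypothetical and the error messages are invented for the example; only the rule itself (CMD must be a literal substring of the event's original command) comes from the description above.

```typescript
// Parse a value of the form "<INDEX>" or "<INDEX>=CMD".
// Only the first "=" separates the index from the command, so the
// command itself may contain "=" (e.g. "3=FOO=1 make").
function parseOverrideExecute(value: string): { index: number; command?: string } {
  const eq = value.indexOf("=");
  const indexPart = eq === -1 ? value : value.slice(0, eq);
  if (!/^\d+$/.test(indexPart)) {
    throw new Error(`invalid event index: ${indexPart}`);
  }
  const index = Number(indexPart);
  return eq === -1 ? { index } : { index, command: value.slice(eq + 1) };
}

// Bare form: run the original command. "=CMD" form: run CMD, but only
// if it is a literal substring of the original command.
function resolveOverrideCommand(original: string, override?: string): string {
  if (override === undefined) return original;
  if (original.indexOf(override) === -1) {
    throw new Error("override command is not a substring of the original command");
  }
  return override;
}
```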

Diagnostics

  • --dry-run — classify only, no execution. Walks the event stream, prints the summary, but does not apply Writes/Edits, verify Read checkpoints, or execute approved Bash. Combine with --debug to see the per-event CLASSIFY line for every event.
  • --debug — turn on the per-event CLASSIFY / APPLY / CHECK trace (one line per event) plus verbose classifier instrumentation. Off by default because the default run keeps to a handful of INFO setup lines and the final summary.

How the classifier works

All requests go through the Claude Agent SDK (claude-sonnet-4-6, multi-turn streaming, no tool use). It reuses Claude Code's existing auth — no separate API key needed. The default model id targets the 200k-context variant; switching to the 1M-context variant ([1m] suffix) requires Anthropic "Usage credits" opt-in and is currently a source-level toggle in src/llm-classifier/sdk.ts.

The Bash payload is split into batches of 50–100 events, cut at the first git commit past the threshold. Each batch becomes one user turn in a single conversation, so the system prompt and earlier batches are cache-served on subsequent turns.
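
One way to read the batching rule is sketched below. It treats 50 as the threshold and 100 as a hard cap, and it ends a batch on the git commit event itself; those details, the PayloadEvent shape, and the commit-detection regex are assumptions made for illustration, not the tool's actual implementation.

```typescript
// Hypothetical payload shape: Bash events plus surrounding context events.
interface PayloadEvent {
  kind: "bash" | "context";
  command?: string; // present for Bash events
}

const MIN_BATCH = 50;  // threshold after which a git commit ends the batch
const MAX_BATCH = 100; // assumed hard cap when no commit shows up

function isGitCommit(event: PayloadEvent): boolean {
  return event.kind === "bash" && /\bgit\s+commit\b/.test(event.command ?? "");
}

function splitIntoBatches(events: PayloadEvent[]): PayloadEvent[][] {
  const batches: PayloadEvent[][] = [];
  let current: PayloadEvent[] = [];
  for (const event of events) {
    current.push(event);
    const pastThreshold = current.length >= MIN_BATCH;
    // Cut at the first git commit past the threshold, or at the cap.
    if ((pastThreshold && isGitCommit(event)) || current.length >= MAX_BATCH) {
      batches.push(current);
      current = [];
    }
  }
  // The remainder forms a final batch, which may be smaller than the threshold.
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Cutting on a commit keeps the commands that lead up to a commit in the same user turn as the commit, which is presumably why the boundary is not a fixed size.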

Per-batch responses are cached at $XDG_CACHE_HOME/claude-code-replay/<encoded-target>/batch-NNNN.json plus a single meta.json. The encoding mirrors Claude Code's own project-dir scheme: every / in the absolute --target path becomes -, so /tmp/myrepo-recovered lives at $XDG_CACHE_HOME/claude-code-replay/-tmp-myrepo-recovered/. The cache directory is intentionally outside --target so a replayed git add . can't sweep it in.
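
The path encoding is simple enough to sketch directly. The function names here are hypothetical, and the sketch assumes the path is already absolute and has no trailing slash; how the tool normalizes paths before encoding is not covered here.

```typescript
// Encode an absolute path the way the text describes: every "/" becomes "-".
function encodeAbsolutePath(absPath: string): string {
  return absPath.split("/").join("-");
}

// Build the cache directory for a given --target under a cache home.
function cacheDirFor(xdgCacheHome: string, target: string): string {
  return `${xdgCacheHome}/claude-code-replay/${encodeAbsolutePath(target)}`;
}

console.log(encodeAbsolutePath("/tmp/myrepo-recovered"));
// prints "-tmp-myrepo-recovered"
console.log(encodeAbsolutePath("/Users/you/projects/myrepo"));
// prints "-Users-you-projects-myrepo"
```

The second call matches the inferred logs dir shown in the Quick start output, since --logs-dir inference uses the same scheme for each --source-root.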

Cache invalidation. All-or-nothing: one shared key covers every batch, so any input change invalidates the entire run at once. On a mismatch the stale entries are wiped, the classifier recomputes from scratch, and the run log states which input changed (e.g. INFO classifier cache miss: inputs changed (session logs); wiping stale entries and recomputing 4 batch(es)). The following changes invalidate:

  • Editing the system prompt (src/llm-classifier/prompts.ts).
  • Adding, removing, or changing any --custom-intent.
  • Changing the --source-root set.
  • A change in the set of Claude Code logs: a new session JSONL appears, an existing one grows, or --cutoff (applied at parse time) crops the set. Resuming the same logs with a different --from-index / --start does NOT invalidate.
  • --override-classifier-cache — forces a fresh call without consulting the cache; results are still written back.
  • Pointing at a different --target — the cache subdir is the encoded target path, so a different target is a different cache namespace (the old one is left orphaned, not deleted).
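
The all-or-nothing check can be pictured as a comparison between the inputs recorded with the cached run and the inputs of the current run. The sketch below compares fields directly so it can name what changed; the CacheInputs shape, the function names, and every label except "session logs" are hypothetical, and the real tool is described as using a single shared key rather than this field-by-field comparison.

```typescript
// Hypothetical record of the inputs that feed the shared cache key.
// --from-index and --start are deliberately absent: changing them
// does not invalidate the cache.
interface CacheInputs {
  systemPrompt: string;
  customIntents: string[];
  sourceRoots: string[];
  sessionLogs: string; // assumed fingerprint of the log set after --cutoff
}

// List which inputs differ between the cached run and the current run.
function changedInputs(cached: CacheInputs, current: CacheInputs): string[] {
  const changed: string[] = [];
  if (cached.systemPrompt !== current.systemPrompt) changed.push("system prompt");
  if (JSON.stringify(cached.customIntents) !== JSON.stringify(current.customIntents)) {
    changed.push("custom intent");
  }
  if (JSON.stringify(cached.sourceRoots) !== JSON.stringify(current.sourceRoots)) {
    changed.push("source roots");
  }
  if (cached.sessionLogs !== current.sessionLogs) changed.push("session logs");
  return changed;
}

// Any difference invalidates every batch at once.
function describeCacheDecision(cached: CacheInputs | undefined, current: CacheInputs): string {
  if (cached === undefined) return "miss: no cache";
  const changed = changedInputs(cached, current);
  return changed.length === 0 ? "hit" : `miss: inputs changed (${changed.join(", ")})`;
}
```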

The literal system prompt — including the file-dependency cascade rule — lives in src/llm-classifier/prompts.ts. docs/classifier-prompt.md explains its rules and the speculation/cache machinery with worked examples; the README does not duplicate the prompt itself.

Limitations

  • Only Write, Edit, and Read are deterministic. Anything else (Bash, Task, TodoWrite, WebFetch, MCP tools, …) is skipped in the default path. The classifier closes the gap for Bash only; the rest stays skipped.
  • The classifier is an assistant, not an oracle. With --enable-llm-classifier, every approved execute runs a real shell command in --target. Run --dry-run and review the CLASSIFY / APPLY stream before trusting it on a fresh tree.
  • Default-lenient Read checkpoints mean a misclassified Bash chain can silently produce missing-file Reads that get skipped rather than halted. Use --strict-reads (or the broader --strict) when debugging suspected cascade misses.

Architecture

A contributor-facing module map of src/ lives in docs/architecture.md. The system prompt the classifier ships with is in src/llm-classifier/prompts.ts, with behavioural explanation in docs/classifier-prompt.md. The empirically-derived Claude Code session log format the replayer reads is documented in docs/log-format.md.

Tests

npm test          # vitest run
npm run typecheck # tsc --noEmit

Changelog

See GitHub Releases for per-version release notes.

License

MIT — see LICENSE.
