cc-logger

Agent QA for Claude Code. Replay sessions like game film, trace every sub-agent and captured tool call, and compare repeated runs to catch workflow drift. cc-logger writes every prompt, sub-agent, allowlisted tool call, and Claude's narration between them into Postgres, so you can see how the work actually happened, not just what came out.

Two agent runs can produce identical-looking outputs while one took the happy path and the other recovered from three failed WebFetches, fell back to a different source, and got lucky. The outputs look the same. The processes are not. This is the tool that catches that.

cc-logger sessions                                              # what ran recently
cc-logger inspect <session-id>                                  # one session, full tree
cc-logger insights --days 7                                     # cross-session patterns
psql $DATABASE_URL -f queries/13_tool_sequence_conformance.sql  # drift across runs

Who this is for:

Solo Claude Code users who want to review their sessions like game film.
Teams and agencies running the same agent repeatedly across clients or projects, who need to know whether each run is following the canonical path or drifting.
Operators of production agent systems who want tool-level traces and the ability to compare behavior across runs.

Built on Claude Code's HTTP hooks — no patches or modifications to Claude Code itself.

What you get

Run a Claude Code session, then run cc-logger inspect <session-id> to see the full tree of what happened:

SESSION 94b8ee2b-b51f-4125-a116-82adaf4066af
  started 2026-05-13 11:55  ended 2026-05-13 20:08  (8h 13m)
  end_reason: exit
  prompt: 'Look into our top accounts and brainstorm a summer outreach strategy.
           Map district fiscal sustainability against state academic calendars
           and propose which personas are reachable in each week of June-August.'

  [root  completed]
    · 'I'll start by pulling the campaign performance and matching it against
       district fiscal data, then fan out to verify against state sources.'
    Bash       'glab api "groups/12345/projects?search=accounts"'              3.0s  ok
    Bash       'psql ... -c "SELECT campaign, replies FROM eb_campaigns ..."'  2.0s  ok
    Bash       'ls /Users/me/Downloads/k12-district-fiscal-sustainability'      0ms  ok
    · 'Got the baseline. Now I'll spawn three sub-agents in parallel: one for
       NY state data, one for Ohio, one for the cross-state academic calendar.'
    Agent      'Map district fiscal sustainability data'                       38s   ok
      [general-purpose ab72bd109071caf  completed]
        · 'Searching state Comptroller / Auditor databases for fiscal-stress designations.'
        WebSearch  'K-12 district fiscal stress New York Comptroller 2024 2025'  5.9s  ok
        WebFetch   'https://www.osc.ny.gov/state-agencies/audits/fiscal-stress'  8.2s  ok
        WebSearch  'Ohio Auditor school district fiscal distress 2024'          6.1s  ok
        WebFetch   'https://ohioauditor.gov/auditsearch/Reports/2024'           64s   FAIL
        · 'Ohio Auditor blocked the fetch. Falling back to Comptroller summary.'
        ... 47 more tool calls
        → 'Found 31 districts in NY designated fiscal stress, 12 in OH...'
    Agent      'Cross-reference academic calendars by state'                  5m 55s ok
      [general-purpose ab9dcc9453675d9  completed]
        ... 64 tool calls
    ... 7 more sub-agents

  10 invocations (9 sub-agents), 556 tool calls (24 failed, 0 pending), 142 text blocks
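
The nesting in the tree above comes from each sub-agent invocation being linked back to the call that spawned it. As a rough illustration of that idea only, here is a minimal Python sketch that indents invocations by parent linkage. The field names `id`, `parent_id`, and `label` are assumptions made for this example and are not taken from the cc-logger schema (see docs/SCHEMA.md for the real columns).

```python
from collections import defaultdict


def render_tree(invocations, root_id):
    """Return indented lines for a parent-linked list of invocations.

    `invocations` is a list of dicts with hypothetical keys
    'id', 'parent_id', and 'label'.
    """
    children = defaultdict(list)
    labels = {}
    for inv in invocations:
        labels[inv["id"]] = inv["label"]
        children[inv["parent_id"]].append(inv["id"])

    lines = []

    def walk(node_id, depth):
        # Two spaces of indentation per nesting level.
        lines.append("  " * depth + f"[{labels[node_id]}]")
        for child_id in children[node_id]:
            walk(child_id, depth + 1)

    walk(root_id, 0)
    return lines


# Made-up rows for illustration.
rows = [
    {"id": "root", "parent_id": None, "label": "root"},
    {"id": "a1", "parent_id": "root", "label": "general-purpose a1"},
    {"id": "a2", "parent_id": "root", "label": "general-purpose a2"},
]
print("\n".join(render_tree(rows, "root")))
```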

cc-logger insights adds the cross-session view — power-law distribution of where your time goes, top failure domains, sub-agent fan-out patterns, hourly activity.

Comparing runs: the conformance loop

The film-room mode is retrospective — watch one session, learn from it, move on. The other mode is conformance testing: when you run the same agent repeatedly, are the runs actually doing the same thing?

This matters because identical-looking output can hide one run that took the happy path and another that recovered from three WebFetch failures, fell back to a different source, and got lucky. Conformance testing catches that.

Three queries make up the loop:

# 1. Find drift across many runs of similar work
psql $DATABASE_URL -f queries/13_tool_sequence_conformance.sql

# 2. Diff two specific runs side by side
psql $DATABASE_URL \
  -v sid1="'<session-a>'" -v sid2="'<session-b>'" \
  -f queries/14_compare_two_runs.sql

# 3. Find the branching point + Claude's narration there
psql $DATABASE_URL \
  -v sid1="'<session-a>'" -v sid2="'<session-b>'" \
  -f queries/15_branching_points.sql

Real example from two runs of the same brief-generating agent (35 min / 19 tools vs. 6 min / 13 tools):

=== Branch summary ===
pos | run_a_tool | run_a_hint                    | run_b_tool | run_b_hint
  5 | Bash       | ls .../runs/latest-brief.md   | Write      | .../runs/latest-brief.md

=== Narration around the branch (±60s) ===
RUN-A | 07:15 | "Collection complete. Now I'll read the context and synthesize..."
RUN-A | 07:16 | "Now I have full context. Let me synthesize and send."
RUN-A | 07:17 | "Brief is 959 words; hard cap is 800. Trimming."
RUN-B | 07:52 | "919 words — need to trim under 800. Tightening."

Run A did an extra precautionary ls before writing, then spent two minutes on context-reading narration before the first word-count check. Run B was more direct. Same agent, same data, same output, different process. Four rows of narration were enough to diagnose the difference.
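
The branch position reported above is the first step where the two tool sequences stop matching. A minimal Python sketch of that comparison follows. It illustrates the idea only and is not the SQL in queries/15_branching_points.sql; the function name `first_branch` and the sample sequences are made up for this example.

```python
from itertools import zip_longest


def first_branch(run_a, run_b):
    """Return (position, tool_a, tool_b) at the first differing step.

    Positions are 1-based. If one run ends early, its side is None.
    Returns None when the two sequences are identical.
    """
    for pos, (a, b) in enumerate(zip_longest(run_a, run_b), start=1):
        if a != b:
            return pos, a, b
    return None


# Hypothetical tool sequences that diverge at step 5.
run_a = ["Bash", "Bash", "WebFetch", "Bash", "Bash", "Write"]
run_b = ["Bash", "Bash", "WebFetch", "Bash", "Write"]
print(first_branch(run_a, run_b))  # (5, 'Bash', 'Write')
```

Comparing by position is the simplest option. It reports only the first divergence, which is usually the one worth reading the narration for.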

This is the layer that turns a logger into agent QA. If you run the same agent across many clients, projects, or days, query 13 tells you which runs are doing it the canonical way and which are snowflakes. Queries 14 + 15 tell you what changed and why.
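
To make "canonical way" versus "snowflake" concrete, here is a minimal Python sketch that treats the most common tool sequence as the modal path and flags every other run. This is an assumption about how one might define conformance, not a description of what query 13 does internally; `classify_runs` and the session ids are hypothetical.

```python
from collections import Counter


def classify_runs(sequences):
    """Split runs into those on the modal tool sequence and the rest.

    `sequences` maps a session id to its ordered list of tool names.
    Returns (modal_sequence, conforming_ids, snowflake_ids).
    """
    if not sequences:
        raise ValueError("need at least one run to find a modal path")
    counts = Counter(tuple(seq) for seq in sequences.values())
    modal, _ = counts.most_common(1)[0]
    conforming = sorted(s for s, seq in sequences.items() if tuple(seq) == modal)
    snowflakes = sorted(s for s, seq in sequences.items() if tuple(seq) != modal)
    return modal, conforming, snowflakes


# Made-up runs of the same agent.
runs = {
    "s1": ["Bash", "Write"],
    "s2": ["Bash", "Write"],
    "s3": ["Bash", "Bash", "Write"],
}
modal, conforming, snowflakes = classify_runs(runs)
print(modal, conforming, snowflakes)
```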

Quickstart (Docker Compose)

git clone https://github.com/kkrlstrm/cc-logger.git
cd cc-logger
cp .env.example .env                # defaults work for local Docker setup
docker compose up -d                # boots Postgres + cc-logger

# Migrate schema
docker compose exec cc-logger python migrations/001_initial_schema.py --apply

# Wire Claude Code hooks (merges into ~/.claude/settings.json with a backup)
python scripts/install-hooks.py

# Run a Claude Code session, then check what's been captured:
docker compose exec cc-logger python -m cc_logger.cli sessions

Alternative: run natively (no Docker)

uv sync
# Put a Postgres connection string in .env (Neon, Supabase, RDS, local install — anything)
uv run python migrations/001_initial_schema.py --apply
./scripts/install.sh                # installs launchd (macOS) or systemd (Linux) auto-start
python scripts/install-hooks.py     # wires the Claude Code hooks

What it captures

Every Claude Code session (start, end, initial prompt, end reason)
Every sub-agent invocation (root + children, with linkage to the spawning Agent tool call)
Every tool call in the capture allowlist (Agent, Bash, Edit, Write, WebFetch, WebSearch, and mcp__.*)
Tool input + tool response payloads as JSONB; anything >50KB spills to a separate artifacts table
Claude's text narration between tool calls — read from the Claude Code transcript file at every Stop / SubagentStop, stored in the messages table. (Extended thinking blocks are encrypted by Anthropic — only text blocks are capturable.)
Optional regex redaction of common secret patterns before write (on by default)
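
The 50KB spill rule can be pictured as a size check on the serialized payload. The Python sketch below is an illustration under stated assumptions: it takes 50KB to mean 50 * 1024 bytes of UTF-8 encoded JSON, and the names `SPILL_THRESHOLD_BYTES` and `route_payload` are invented here. How cc-logger actually measures payload size is not specified in this README.

```python
import json

# Assumption: "50KB" interpreted as 50 * 1024 bytes of UTF-8 JSON.
SPILL_THRESHOLD_BYTES = 50 * 1024


def route_payload(payload):
    """Return (destination_table, size_in_bytes) for a tool payload."""
    encoded = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    if len(encoded) > SPILL_THRESHOLD_BYTES:
        return "artifacts", len(encoded)
    return "tool_calls", len(encoded)


print(route_payload({"stdout": "ok"})[0])          # tool_calls
print(route_payload({"stdout": "x" * 60_000})[0])  # artifacts
```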

Privacy

By default, common secret patterns (API keys, bearer tokens, passwords in connection strings) are redacted before any payload is written to Postgres. Known gaps are tracked as xfail tests in tests/test_redaction_known_gaps.py. Disable with REDACT_SECRETS=0 if you want raw capture. Full story in docs/PRIVACY.md.
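
For a sense of what regex redaction before write looks like, here is a minimal Python sketch covering the three categories named above. The patterns are illustrative assumptions written for this example, not the ones cc-logger ships, and like any regex approach they will miss secrets that do not match a known shape.

```python
import re

# Hypothetical patterns; each keeps the prefix and replaces the secret.
PATTERNS = [
    # "Bearer <token>" in headers or command lines
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # scheme://user:password@host connection strings
    (re.compile(r"(://[^\s:/@]+:)[^\s@]+(@)"), r"\1[REDACTED]\2"),
    # api_key=..., api-key: ..., apikey=...
    (re.compile(r"(?i)(api[_-]?key\s*[=:]\s*)[^\s&]+"), r"\1[REDACTED]"),
]


def redact(text):
    """Apply every pattern in order and return the scrubbed text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("postgres://app:s3cret@db.example.com/logs"))
# postgres://app:[REDACTED]@db.example.com/logs
```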

Tool capture allowlist

Captured: Agent, Bash, Edit, Write, WebFetch, WebSearch, mcp__.*
Skipped: Read, Glob, Grep, TodoWrite (to keep volume sane).
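
Because the allowlist mixes exact tool names with the `mcp__.*` pattern, a full-string regex match is one simple way to evaluate it. The Python sketch below assumes that interpretation; `should_capture` is a hypothetical helper, and whether cc-logger filters in the hook matcher or on the server side is not stated here.

```python
import re

# Entries copied from the allowlist above; each is treated as a regex.
CAPTURE_PATTERNS = ["Agent", "Bash", "Edit", "Write", "WebFetch", "WebSearch", r"mcp__.*"]
_ALLOW = re.compile("|".join(f"(?:{p})" for p in CAPTURE_PATTERNS))


def should_capture(tool_name):
    """True when the whole tool name matches an allowlist entry."""
    return _ALLOW.fullmatch(tool_name) is not None


print(should_capture("Bash"))                       # True
print(should_capture("Read"))                       # False
print(should_capture("mcp__github__create_issue"))  # True
```

Using `fullmatch` rather than `search` keeps a name such as `BashX` from slipping through on a prefix match.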

CLI

cc-logger sessions [--limit N]              # list recent sessions
cc-logger inspect <session-id>              # render the session tree
cc-logger insights [--days N]               # cross-session analytics

Canned queries

Fifteen ready-to-run SQL files in queries/.

The conformance loop (the differentiated value — see "Comparing runs" above):

13_tool_sequence_conformance.sql — groups sessions by their root-agent tool sequence to surface drift across repeat runs of the same agent. Modal paths vs. snowflakes.
14_compare_two_runs.sql — side-by-side diff of two specific sessions, with a match column flagging where they did the same thing vs. where they diverged.
15_branching_points.sql — for two sessions, finds the first position where their tool sequences differ, and pulls Claude's narration (±60s) around that branch.
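
The ±60s narration window in query 15 amounts to a timestamp filter around the branch. A minimal Python sketch of that filter, using made-up timestamps and a hypothetical `narration_near` helper rather than the query's actual SQL:

```python
from datetime import datetime, timedelta


def narration_near(messages, branch_at, window_s=60):
    """Keep (timestamp, text) pairs within window_s seconds of branch_at."""
    window = timedelta(seconds=window_s)
    return [(ts, text) for ts, text in messages if abs(ts - branch_at) <= window]


# Illustrative data only.
branch_at = datetime(2026, 5, 13, 7, 16, 0)
messages = [
    (datetime(2026, 5, 13, 7, 15, 0), "exactly 60s before: kept"),
    (datetime(2026, 5, 13, 7, 17, 30), "90s after: dropped"),
    (datetime(2026, 5, 13, 7, 16, 45), "45s after: kept"),
]
for ts, text in narration_near(messages, branch_at):
    print(ts.strftime("%H:%M:%S"), text)
```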

Single-session and aggregate analytics:

01_session_summary.sql — recent sessions with counts and duration
02_tool_usage.sql — tool mix and reliability (last 24h)
03_subagent_tree.sql — sub-agent tree for a session
04_failures_breakdown.sql — what's timing out vs. failing fast
05_hourly_activity.sql — when you actually work
06_repeat_fail_domains.sql — WebFetch hosts to avoid
07_slowest_subagents.sql — outlier deep dives
08_orphaned_calls.sql — Bash calls that never reported back
09_tool_calls_over_time.sql — tool volume by day
10_subagent_fanout_distribution.sql — how often you fan out, and how wide
11_longest_sessions_by_prompt.sql — which prompts produced the longest sessions
12_error_rates_by_tool.sql — fail % per tool name
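
As an example of the kind of aggregation these queries perform, the Python sketch below counts failed WebFetch calls per host, similar in spirit to 06_repeat_fail_domains.sql. It is an illustration, not a port of that query: the `(url, ok)` input shape and the `failing_hosts` name are assumptions, and the second ohioauditor.gov URL is made up.

```python
from collections import Counter
from urllib.parse import urlsplit


def failing_hosts(calls):
    """Count failures per host from (url, ok) pairs, most frequent first."""
    failures = Counter()
    for url, ok in calls:
        if not ok:
            host = urlsplit(url).hostname or "(unknown)"
            failures[host] += 1
    return failures.most_common()


calls = [
    ("https://ohioauditor.gov/auditsearch/Reports/2024", False),
    ("https://www.osc.ny.gov/state-agencies/audits/fiscal-stress", True),
    ("https://ohioauditor.gov/example-path", False),  # hypothetical URL
]
print(failing_hosts(calls))  # [('ohioauditor.gov', 2)]
```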

Schema

Five tables: sessions, agent_invocations, tool_calls, artifacts, messages. Full reference in docs/SCHEMA.md.

License

MIT. See LICENSE.

