feat(cli): observe + debrief + note — ambient collaboration instrumentation#11

Merged
NagyVikt merged 1 commit into main from agent/claude/observe-debrief on Apr 23, 2026

@NagyVikt
Collaborator

Summary

Three small CLI commands that make the end-of-day observation exercise ambient — data accumulates passively while you work, and a single command produces the structured post-mortem at the end.

| Command | Role |
| --- | --- |
| `cavemem note <text...>` | Timestamped scratch pad; variadic argv (no quoting required) |
| `cavemem observe` | Live dashboard (3s refresh) of tasks, claims, handoffs, and events |
| `cavemem debrief` | End-of-day, five-section post-mortem over the last 24h |

Why now

We already shipped auto-claim (#10) without real evidence about whether agents claim proactively on their own. The debrief's section 3, the proactive-claim ratio (claims per edit), is the falsification test: above 70%, agents are claiming on their own and auto-claim stays as a safety net; below 20%, auto-claim is carrying the load and should stay. Either way, we get a measurement instead of a guess.
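A minimal sketch of how the section 3 cut-offs could be applied, assuming the ratio is simply the claim count divided by the edit count. The function name `classifyClaimRatio` and the "inconclusive" band between 20% and 70% are illustrative assumptions; only the two thresholds come from this PR.

```typescript
// Hypothetical helper: only the 0.7 and 0.2 cut-offs come from the PR text;
// the function name and the "inconclusive" middle band are assumptions.
type Verdict = "agents-claim-proactively" | "auto-claim-carries-load" | "inconclusive";

interface ClaimRatioResult {
  ratio: number | null; // claims per edit; null when there were no edits
  verdict: Verdict;
}

function classifyClaimRatio(claims: number, edits: number): ClaimRatioResult {
  if (edits === 0) {
    // Nothing was edited, so there is no ratio and nothing to falsify.
    return { ratio: null, verdict: "inconclusive" };
  }
  const ratio = claims / edits;
  if (ratio > 0.7) return { ratio, verdict: "agents-claim-proactively" }; // auto-claim is only a safety net
  if (ratio < 0.2) return { ratio, verdict: "auto-claim-carries-load" }; // auto-claim does the work
  return { ratio, verdict: "inconclusive" };
}
```

For example, 1 claim against 10 edits (ratio 0.1) lands in the "auto-claim carries the load" bucket.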

What's new

CLI commands (`apps/cli/src/commands/`):

  • `note.ts`: inserts a note under the reserved `observer` session; accepts an optional `--task` option.
  • `observe.ts`: repaints every 3s with tasks, participants, recent claims, pending handoffs, the last 6 events, and the diagnostic footer.
  • `debrief.ts`: five sections covering tool-usage ratio, auto-join verification, proactive-claim ratio, handoff outcomes, and an interleaved timeline.
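For readers new to the dashboard, here is a rough sketch of a 3s repaint loop over a snapshot. The `Snapshot` shape, `renderFrame`, and `startObserve` are hypothetical names, not the actual `observe.ts` code; the real command reads its data from the Storage queries this PR adds.

```typescript
// Hypothetical snapshot shape; the real observe.ts builds this from Storage.
interface Snapshot {
  tasks: string[];
  participants: string[];
  recentClaims: string[];
  pendingHandoffs: string[];
  events: string[]; // oldest first, newest last
  editsWithoutClaims: string[];
}

const MAX_EVENTS = 6;

// Render one frame as plain text so it can be tested without a terminal.
function renderFrame(s: Snapshot, now: Date = new Date()): string {
  const lines: string[] = [`cavemem observe  ${now.toISOString()}`];
  const section = (title: string, items: string[]) => {
    lines.push("", `${title} (${items.length})`);
    for (const item of items) lines.push(`  ${item}`);
  };
  section("tasks", s.tasks);
  section("participants", s.participants);
  section("recent claims", s.recentClaims);
  section("pending handoffs", s.pendingHandoffs);
  section("last events", s.events.slice(-MAX_EVENTS)); // keep only the newest six
  // Diagnostic footer: "none" means every recent edit was preceded by a claim.
  const footer = s.editsWithoutClaims.length === 0 ? "none" : s.editsWithoutClaims.join(", ");
  lines.push("", `edits without proactive claims (last 5m): ${footer}`);
  return lines.join("\n");
}

// Paint immediately, then every intervalMs; returns a function that stops the loop.
function startObserve(readSnapshot: () => Snapshot, intervalMs: number = 3000): () => void {
  const paint = () => {
    console.clear(); // no-op when stdout is not a TTY
    console.log(renderFrame(readSnapshot()));
  };
  paint();
  const timer = setInterval(paint, intervalMs);
  return () => clearInterval(timer);
}
```

Keeping rendering separate from the timer is what makes a "thin renderer" easy to test without a live terminal.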

Hooks (`packages/hooks/src/handlers/post-tool-use.ts`):

  • Records `file_path` in `tool_use` metadata for Edit/Write/MultiEdit/NotebookEdit. The analytics queries depend on this field; recovering the path from content at query time would require reversing compression.
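Recording the path amounts to copying it out of the tool input at write time. A sketch, assuming the hook sees the tool name plus its raw input object; `buildToolUseMetadata` is hypothetical, and the `notebook_path` fallback for NotebookEdit is an assumption about that tool's input shape.

```typescript
// Tools whose target path is recorded (list taken from the PR).
const EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit", "NotebookEdit"]);

interface ToolUseMetadata {
  tool: string;
  file_path?: string;
}

// Hypothetical helper: builds tool_use metadata, adding file_path for edit tools only.
function buildToolUseMetadata(toolName: string, toolInput: Record<string, unknown>): ToolUseMetadata {
  const meta: ToolUseMetadata = { tool: toolName };
  if (EDIT_TOOLS.has(toolName)) {
    // Assumption: NotebookEdit names its target `notebook_path`; the others use `file_path`.
    const path = toolInput.file_path ?? toolInput.notebook_path;
    if (typeof path === "string") meta.file_path = path;
  }
  return meta;
}
```

Non-edit tools such as Bash get no `file_path`, which is what lets queries treat "has `file_path`" as "is an edit".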

Storage (`packages/storage/src/storage.ts`):

  • New queries: `pendingHandoffs`, `recentEditsWithoutClaims`, `toolUsageBySession`, `participantJoinFor`, `editVsClaimStats`, `handoffStatusDistribution`, `handoffAcceptLatencies`, `mixedTimeline`. All are read-only, single-query methods; most use `json_extract`.
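To illustrate the "single query, `json_extract`" style, here is roughly what an `editVsClaimStats`-like query could look like. The `observations` table, its `kind`, `ts`, and `metadata` columns, and the rule "an edit is a `tool_use` row whose metadata has a `file_path`" are assumptions; the in-memory function mirrors the same aggregation so it can be checked without a database.

```typescript
// Assumed schema: observations(kind TEXT, ts INTEGER, metadata TEXT holding JSON).
// json_extract returns NULL when $.file_path is absent, so only edit-tool rows count.
// COALESCE turns SUM's NULL-on-no-rows into 0.
const EDIT_VS_CLAIM_SQL = `
  SELECT
    COALESCE(SUM(CASE WHEN kind = 'claim' THEN 1 ELSE 0 END), 0) AS claims,
    COALESCE(SUM(CASE WHEN kind = 'tool_use'
                       AND json_extract(metadata, '$.file_path') IS NOT NULL
                      THEN 1 ELSE 0 END), 0) AS edits
  FROM observations
  WHERE ts >= ?
`;

interface ObservationRow {
  kind: string;
  ts: number;
  metadata: string | null; // JSON text, as stored
}

// Same aggregation in plain TypeScript, for testing without SQLite.
function editVsClaimStatsInMemory(rows: ObservationRow[], since: number): { claims: number; edits: number } {
  let claims = 0;
  let edits = 0;
  for (const row of rows) {
    if (row.ts < since) continue;
    if (row.kind === "claim") {
      claims++;
    } else if (row.kind === "tool_use" && row.metadata !== null) {
      const meta = JSON.parse(row.metadata) as { file_path?: unknown } | null;
      // Mirrors `json_extract(...) IS NOT NULL`: a missing key or JSON null both fail.
      if (meta?.file_path !== undefined && meta?.file_path !== null) edits++;
    }
  }
  return { claims, edits };
}
```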

Tests: existing `runner.test.ts` assertion on PostToolUse metadata updated to include `file_path`. CLI smoke test updated with three new command names.

Test plan

  • Full workspace `pnpm test` (storage 17/17, core 15/15, hooks 18/18, mcp 13/13, worker 12/12, cli 4/4)
  • `pnpm typecheck` (12/12 projects)
  • `pnpm build`: CLI bundle grew to `56.39 KB` (from `44.97 KB`)
  • Biome clean on touched files

Deliberately not included

No end-to-end test for `observe` (live dashboard) or `debrief` (needs 24h of real data to be meaningful). The underlying storage queries are exercised by the existing Storage tests; the commands themselves are thin renderers.

🤖 Generated with Claude Code


Three commands that make observation ambient. The theory of the
experiment is that watching two agents collaborate for a day is worth
more than any amount of theorising about which feature to build next,
but only if the data accumulates passively while you work and the
end-of-day write-up is guided instead of freeform.

cavemem note <text...>
  Records a timestamped scratch note under a reserved `observer`
  session. Variadic argv so quoting doesn't kill adoption: type
  `cavemem note codex stepped on claude` and it just works. Notes
  flow through the same observations pipeline as agent activity, so
  they interleave in task timelines and show up in search.
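Variadic argv means the note text is just the remaining words joined with spaces. A hand-rolled sketch, assuming `--task` takes a single value written as either `--task <value>` or `--task=<value>`; `parseNoteArgs` is hypothetical, and the real `note.ts` may rely on its CLI framework's variadic-argument support instead.

```typescript
interface NoteArgs {
  text: string;
  task?: string;
}

// Hypothetical parser: everything except the --task option becomes note text.
function parseNoteArgs(argv: string[]): NoteArgs {
  const words: string[] = [];
  let task: string | undefined;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--task") {
      // Assumed form: the value is the next argument.
      i++;
      if (i >= argv.length) throw new Error("--task requires a value");
      task = argv[i];
    } else if (arg.startsWith("--task=")) {
      task = arg.slice("--task=".length);
    } else {
      // Every other word is part of the note, so no quoting is needed.
      words.push(arg);
    }
  }
  const text = words.join(" ");
  if (text.trim() === "") throw new Error("note text is empty");
  return task === undefined ? { text } : { text, task };
}
```

With this shape, `cavemem note codex stepped on claude` arrives as four argv entries and is stored as one sentence.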

cavemem observe
  Live dashboard (3s refresh) of active tasks, participants, recent
  claims, pending handoffs, and the last ~6 events per task. The
  footer line — "edits without proactive claims (last 5m)" — is the
  live falsification test for the auto-claim hypothesis: empty means
  agents are claiming proactively, populated means the safety net is
  carrying the load.
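One way to read the footer check: among edits in the last five minutes, which ones had no earlier claim on the same file by the same session? A sketch under that assumed definition; the `FileEvent` shape and `editsWithoutProactiveClaims` are hypothetical, and the actual `recentEditsWithoutClaims` query may match claims differently.

```typescript
interface FileEvent {
  session: string;
  filePath: string;
  ts: number; // epoch milliseconds
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Hypothetical: edits inside the window with no claim at or before the edit,
// by the same session on the same file.
function editsWithoutProactiveClaims(
  edits: FileEvent[],
  claims: FileEvent[],
  now: number,
  windowMs: number = FIVE_MINUTES_MS,
): FileEvent[] {
  const key = (e: FileEvent) => `${e.session}\u0000${e.filePath}`;
  // Earliest claim per (session, file).
  const firstClaim = new Map<string, number>();
  for (const c of claims) {
    const prev = firstClaim.get(key(c));
    if (prev === undefined || c.ts < prev) firstClaim.set(key(c), c.ts);
  }
  return edits.filter((e) => {
    if (e.ts < now - windowMs) return false; // outside the live window
    const claimedAt = firstClaim.get(key(e));
    // Flag the edit if there was no claim, or the claim only came after the edit.
    return claimedAt === undefined || claimedAt > e.ts;
  });
}
```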

cavemem debrief
  End-of-day post-mortem with five guided sections:
    1. Tool-usage ratio per session (invisible / occasional / integrated)
    2. Auto-join landing (sessions that joined within 2s of start)
    3. Proactive-claim ratio (claim-kind obs vs tool_use edit count)
    4. Handoff outcome distribution + median accept latency
    5. Chronological timeline with observer notes interleaved
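Sections 2 and 4 reduce to simple arithmetic. A sketch, assuming timestamps in milliseconds and that an unaccepted handoff simply has no `acceptedAt`; the type and function names are hypothetical, and only the 2s auto-join window comes from the list above.

```typescript
// Median of a list of numbers; null for an empty list.
function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b); // numeric sort, not lexicographic
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface Handoff {
  createdAt: number;
  acceptedAt?: number; // absent while pending or after a decline
}

// Section 4: median time from handoff creation to acceptance, accepted handoffs only.
function medianAcceptLatencyMs(handoffs: Handoff[]): number | null {
  const latencies: number[] = [];
  for (const h of handoffs) {
    if (h.acceptedAt !== undefined) latencies.push(h.acceptedAt - h.createdAt);
  }
  return median(latencies);
}

interface SessionJoin {
  startedAt: number;
  joinedAt?: number; // absent if the session never joined a task
}

// Section 2: did auto-join land within the window after session start?
function autoJoinLanded(s: SessionJoin, windowMs: number = 2000): boolean {
  return s.joinedAt !== undefined && s.joinedAt - s.startedAt <= windowMs;
}
```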

Post-tool-use now records file_path in tool_use metadata when the
tool is Edit/Write/MultiEdit/NotebookEdit. The observe/debrief
queries depend on this surface — parsing content at query time would
require reversing compression; recording at write time is a couple
bytes per observation.

Storage adds: pendingHandoffs, recentEditsWithoutClaims,
toolUsageBySession, participantJoinFor, editVsClaimStats,
handoffStatusDistribution, handoffAcceptLatencies, mixedTimeline.
All read-only, all single SQLite queries; most use json_extract.

All gates green.
NagyVikt merged commit 7c0d745 into main on Apr 23, 2026
0 of 3 checks passed
NagyVikt deleted the agent/claude/observe-debrief branch on April 23, 2026 at 22:06