feat(cli): observe + debrief + note — ambient collaboration instrumentation by NagyVikt · Pull Request #11 · recodeee/colony

NagyVikt · 2026-04-23T22:06:30Z

Summary

Three small CLI commands that make the end-of-day observation exercise ambient — data accumulates passively while you work, and a single command produces the structured post-mortem at the end.

| Command | Role |
| --- | --- |
| `cavemem note <text...>` | Timestamped scratch pad; variadic argv (no quoting required) |
| `cavemem observe` | Live dashboard of tasks, claims, handoffs, and events, refreshed every 3s |
| `cavemem debrief` | End-of-day, five-section post-mortem over the last 24h |
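Because `note` takes variadic argv, the shell splits `cavemem note codex stepped on claude` into separate words, and the command only has to rejoin them. The sketch below shows that rejoin plus timestamping. `buildNote` and the `ObserverNote` shape are hypothetical illustrations, not the actual `note.ts` code; only the reserved `observer` session name and the optional task come from this PR.

```typescript
// Hypothetical shape of an observer note; the real note.ts may store more fields.
interface ObserverNote {
  session: string; // reserved session for human notes
  text: string;
  ts: number; // epoch milliseconds
  taskId?: string; // set only when a task is given
}

// Rejoin the words the shell split apart, so the user never has to quote.
function buildNote(
  words: string[],
  now: number = Date.now(),
  taskId?: string,
): ObserverNote {
  const text = words.join(" ").trim();
  if (text.length === 0) {
    throw new Error("note text must not be empty");
  }
  const note: ObserverNote = { session: "observer", text, ts: now };
  if (taskId !== undefined) note.taskId = taskId;
  return note;
}

// argv as delivered for: cavemem note codex stepped on claude
const note = buildNote(["codex", "stepped", "on", "claude"], 0);
console.log(note.text); // prints: codex stepped on claude
```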

Why now

We already shipped auto-claim (#10) without real evidence about whether agents were already claiming proactively on their own. Section 3 of the debrief, the proactive-claim ratio (claims vs. edits), is the falsification test. If the ratio is above 70%, agents are claiming on their own and auto-claim stays as a safety net; if it is below 20%, auto-claim is carrying the load and should stay. Either way, we get a measurement.
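The decision rule can be written down as a small function. This is a sketch: the 70% and 20% thresholds come from the paragraph above, while the function name, the verdict labels, and the handling of the 20 to 70% band and of zero edits are assumptions, since the PR does not say what happens in those cases.

```typescript
// Hypothetical verdict labels for the auto-claim experiment.
type Verdict = "safety-net" | "carrying-load" | "inconclusive" | "no-data";

// proactiveClaims: claim observations the agents wrote themselves
// edits: tool_use observations for Edit/Write/MultiEdit/NotebookEdit
function autoClaimVerdict(proactiveClaims: number, edits: number): Verdict {
  if (edits === 0) return "no-data"; // nothing to compare against
  const ratio = proactiveClaims / edits;
  if (ratio > 0.7) return "safety-net"; // agents mostly claim on their own
  if (ratio < 0.2) return "carrying-load"; // auto-claim does the work
  return "inconclusive"; // assumption: the PR names no action for this band
}

console.log(autoClaimVerdict(3, 20)); // 0.15, prints: carrying-load
```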

What's new

CLI commands (`apps/cli/src/commands/`):

`note.ts`: inserts the note under the reserved `observer` session; accepts an optional `--task` option.
`observe.ts`: repaints a frame every 3s with tasks, participants, recent claims, pending handoffs, the last 6 events, and a diagnostic footer listing edits made without proactive claims in the last 5 minutes.
`debrief.ts`: builds five sections (tool-usage ratio, auto-join verification, proactive-claim ratio, handoff outcomes, interleaved timeline).
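A live dashboard like `observe` can be split into a pure renderer and a timer loop, which keeps the rendering testable. The sketch below assumes a hypothetical `DashboardSnapshot` shape trimmed to a few of the fields listed above. `renderFrame` and `startDashboard` are illustrative names, not the real `observe.ts` API. The 3s interval, the 6-event cap, and the footer wording come from this PR.

```typescript
// Hypothetical snapshot shape; the real observe.ts reads these from Storage.
interface DashboardSnapshot {
  tasks: string[];
  pendingHandoffs: string[];
  events: string[]; // oldest first, newest last
  editsWithoutClaims: string[]; // file paths from the last 5 minutes
}

const MAX_EVENTS = 6;

// Render one frame as plain text. Pure, so it can be tested without a terminal.
function renderFrame(s: DashboardSnapshot): string {
  const lines: string[] = [];
  lines.push(`tasks: ${s.tasks.join(", ") || "(none)"}`);
  lines.push(`pending handoffs: ${s.pendingHandoffs.length}`);
  for (const e of s.events.slice(-MAX_EVENTS)) lines.push(`  ${e}`); // keep newest 6
  const footer =
    s.editsWithoutClaims.length === 0 ? "(none)" : s.editsWithoutClaims.join(", ");
  lines.push(`edits without proactive claims (last 5m): ${footer}`);
  return lines.join("\n");
}

// Paint once immediately, then every intervalMs. Returns a function that stops the loop.
function startDashboard(
  getSnapshot: () => DashboardSnapshot,
  write: (frame: string) => void,
  intervalMs = 3000,
): () => void {
  const paint = () => write(renderFrame(getSnapshot()));
  paint();
  const timer = setInterval(paint, intervalMs);
  return () => clearInterval(timer);
}
```

In a real terminal the `write` callback would likely clear the screen before printing each frame (for example with the ANSI sequence `\x1b[2J\x1b[H`), so successive frames overwrite each other instead of scrolling.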

Hooks (`packages/hooks/src/handlers/post-tool-use.ts`):

Records `file_path` in `tool_use` metadata for Edit/Write/MultiEdit/NotebookEdit. The analytics queries depend on this surface; parsing content at query time would require reversing compression.
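A minimal sketch of recording the path at write time. It assumes the raw tool input carries the path as `file_path`, or as `notebook_path` for NotebookEdit; both field names are assumptions about the agent's tool-input schema. `toolUseMetadata` and `ToolUseMetadata` are hypothetical names, not the actual handler in `post-tool-use.ts`.

```typescript
const EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit", "NotebookEdit"]);

// Hypothetical metadata shape; the real handler stores more fields.
interface ToolUseMetadata {
  tool: string;
  file_path?: string;
}

// Copy the target path out of the raw tool input when the observation is written,
// so later queries can read it with json_extract instead of decompressing content.
function toolUseMetadata(
  toolName: string,
  toolInput: Record<string, unknown>,
): ToolUseMetadata {
  const meta: ToolUseMetadata = { tool: toolName };
  if (!EDIT_TOOLS.has(toolName)) return meta; // only edit-like tools get a path
  // Assumption: the path lives in file_path, or notebook_path for NotebookEdit.
  const target = toolInput["file_path"] ?? toolInput["notebook_path"];
  if (typeof target === "string") meta.file_path = target;
  return meta;
}
```

Serialized with `JSON.stringify`, an Edit on `src/a.ts` yields `{"tool":"Edit","file_path":"src/a.ts"}`. SQLite can then read that field directly with `json_extract(metadata, '$.file_path')`.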

Storage (`packages/storage/src/storage.ts`):

New queries: `pendingHandoffs`, `recentEditsWithoutClaims`, `toolUsageBySession`, `participantJoinFor`, `editVsClaimStats`, `handoffStatusDistribution`, `handoffAcceptLatencies`, `mixedTimeline`. All are read-only and each runs a single SQLite query; most use `json_extract`.
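To make the intent of a query like `recentEditsWithoutClaims` concrete, here is an in-memory analogue in TypeScript. It is not the SQL the PR ships. The row shape, the `"claim"` and `"tool_use"` kind strings, and the rule "no claim on that file by the same session at or before the edit" are all assumptions. `JSON.parse(...).file_path` stands in for `json_extract(metadata, '$.file_path')`.

```typescript
// Hypothetical row shape; column names in the real schema may differ.
interface ObservationRow {
  session_id: string;
  kind: string; // e.g. "tool_use" or "claim"
  ts: number; // epoch milliseconds
  metadata: string; // JSON text, as stored in SQLite
}

// Stand-in for json_extract(metadata, '$.file_path').
function filePathOf(row: ObservationRow): string | undefined {
  const parsed: unknown = JSON.parse(row.metadata);
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const fp = (parsed as Record<string, unknown>)["file_path"];
  return typeof fp === "string" ? fp : undefined;
}

// Edits in the last windowMs whose file the same session had not claimed
// at or before the edit. Assumed semantics, for illustration only.
function recentEditsWithoutClaims(
  rows: ObservationRow[],
  now: number,
  windowMs = 5 * 60_000,
): ObservationRow[] {
  const claimed = (session: string, file: string, atOrBefore: number) =>
    rows.some(
      (r) =>
        r.kind === "claim" &&
        r.session_id === session &&
        r.ts <= atOrBefore &&
        filePathOf(r) === file,
    );
  return rows.filter((r) => {
    if (r.kind !== "tool_use" || r.ts < now - windowMs) return false;
    const file = filePathOf(r); // non-edit tool_use rows have no file_path
    return file !== undefined && !claimed(r.session_id, file, r.ts);
  });
}
```

In SQLite the same shape would typically be a `NOT EXISTS` subquery that compares `json_extract` on both sides. The in-memory version is only meant to show the matching rule.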

Tests: the existing `runner.test.ts` assertion on PostToolUse metadata is updated to include `file_path`. The CLI smoke test is updated to cover the three new command names.

Test plan

Full workspace `pnpm test` (storage 17/17, core 15/15, hooks 18/18, mcp 13/13, worker 12/12, cli 4/4)
`pnpm typecheck` (12/12 projects)
`pnpm build`: CLI bundle grew from `44.97 KB` to `56.39 KB`
Biome clean on touched files
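For scale, here is the bundle growth worked out from the two sizes reported for `pnpm build`:

```latex
\Delta = 56.39\,\text{KB} - 44.97\,\text{KB} = 11.42\,\text{KB},
\qquad
\frac{11.42}{44.97} \approx 0.254 \;(\approx 25.4\%\ \text{growth})
```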

Deliberately not included

No end-to-end test for `observe` (live dashboard) or `debrief` (needs 24h of real data to be meaningful). The underlying storage queries are exercised by the existing Storage tests; the commands themselves are thin renderers.

🤖 Generated with Claude Code

…laboration days

Three commands that make observation ambient. The theory of the experiment is that watching two agents collaborate for a day is worth more than any amount of theorising about which feature to build next, but only if the data accumulates passively while you work and the end-of-day write-up is guided instead of freeform.

`cavemem note <text...>`

Records a timestamped scratch note under a reserved `observer` session. Variadic argv so quoting doesn't kill adoption: type `cavemem note codex stepped on claude` and it just works. Notes flow through the same observations pipeline as agent activity, so they interleave in task timelines and show up in search.

`cavemem observe`

Live dashboard (3s refresh) of active tasks, participants, recent claims, pending handoffs, and the last ~6 events per task. The footer line, "edits without proactive claims (last 5m)", is the live falsification test for the auto-claim hypothesis: empty means agents are claiming proactively, populated means the safety net is carrying the load.

`cavemem debrief`

End-of-day post-mortem with five guided sections:

1. Tool-usage ratio per session (invisible / occasional / integrated)
2. Auto-join landing (sessions that joined within 2s of start)
3. Proactive-claim ratio (claim-kind obs vs tool_use edit count)
4. Handoff outcome distribution + median accept latency
5. Chronological timeline with observer notes interleaved

Post-tool-use now records file_path in tool_use metadata when the tool is Edit/Write/MultiEdit/NotebookEdit. The observe/debrief queries depend on this surface: parsing content at query time would require reversing compression; recording at write time is a couple bytes per observation.

Storage adds: pendingHandoffs, recentEditsWithoutClaims, toolUsageBySession, participantJoinFor, editVsClaimStats, handoffStatusDistribution, handoffAcceptLatencies, mixedTimeline. All read-only, all single SQLite queries; most use json_extract.

All gates green.
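Debrief sections 2 and 4 come down to small calculations. Below is a sketch of both, assuming timestamps in epoch milliseconds. `autoJoined`, `medianMs`, and the treatment of a join that precedes the session start are hypothetical. The PR does not say whether `participantJoinFor` and `handoffAcceptLatencies` do this arithmetic in SQL or in TypeScript.

```typescript
// Section 2: did the session join its task within 2s of starting?
const AUTO_JOIN_WINDOW_MS = 2000;

function autoJoined(sessionStartMs: number, joinedAtMs: number | undefined): boolean {
  if (joinedAtMs === undefined) return false; // never joined a task
  const delta = joinedAtMs - sessionStartMs;
  return delta >= 0 && delta <= AUTO_JOIN_WINDOW_MS; // assumption: inclusive window
}

// Section 4: median handoff accept latency in ms; undefined if none were accepted.
function medianMs(latencies: number[]): number | undefined {
  if (latencies.length === 0) return undefined;
  const sorted = latencies.slice().sort((a, b) => a - b); // copy, numeric sort
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1]! + sorted[mid]!) / 2; // even count: mean of middle pair
}
```

The numeric comparator matters: `Array.prototype.sort` without one compares values as strings, which would place `1000` before `200`.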

NagyVikt merged commit 7c0d745 into main Apr 23, 2026
0 of 3 checks passed

NagyVikt deleted the agent/claude/observe-debrief branch April 23, 2026 22:06

NagyVikt merged 1 commit into main from agent/claude/observe-debrief