feat(cli): observe + debrief + note — ambient collaboration instrumentation #11
Merged
…laboration days
Three commands that make observation ambient. The theory of the
experiment is that watching two agents collaborate for a day is worth
more than any amount of theorising about which feature to build next,
but only if the data accumulates passively while you work and the
end-of-day write-up is guided instead of freeform.
cavemem note <text...>
Records a timestamped scratch note under a reserved `observer`
session. Variadic argv so quoting doesn't kill adoption: type
`cavemem note codex stepped on claude` and it just works. Notes
flow through the same observations pipeline as agent activity, so
they interleave in task timelines and show up in search.
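A minimal sketch of the variadic handling described above, assuming the CLI receives the note as an array of words. The `ObserverNote` shape and the `buildNote` helper are hypothetical names for illustration, not the actual cavemem code; only the reserved `observer` session comes from the description.

```typescript
// Hypothetical record shape; the real observation schema may differ.
interface ObserverNote {
  sessionId: string;
  kind: "note";
  text: string;
  ts: number; // epoch milliseconds
}

// Reserved session name, taken from the description above.
const OBSERVER_SESSION = "observer";

// Joins the variadic words into one note so the user never has to quote.
function buildNote(words: string[], now: number = Date.now()): ObserverNote {
  const text = words.join(" ").trim();
  if (text.length === 0) {
    throw new Error("note text is empty");
  }
  return { sessionId: OBSERVER_SESSION, kind: "note", text, ts: now };
}

// `cavemem note codex stepped on claude` would arrive as four separate words.
const note = buildNote(["codex", "stepped", "on", "claude"], 1700000000000);
console.log(note.text);
```

Joining on a single space loses the original spacing between words, which seems acceptable for scratch notes.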
cavemem observe
Live dashboard (3s refresh) of active tasks, participants, recent
claims, pending handoffs, and the last ~6 events per task. The
footer line — "edits without proactive claims (last 5m)" — is the
live falsification test for the auto-claim hypothesis: empty means
agents are claiming proactively, populated means the safety net is
carrying the load.
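The footer check can be sketched in memory as follows. This assumes an edit counts as "proactively claimed" when the same session recorded a claim on the same file at or before the edit; that definition, the `Obs` shape, and the `editsWithoutClaims` name are assumptions made for the example, and the real query may define the condition differently.

```typescript
// Hypothetical, simplified observation shape for illustration only.
interface Obs {
  sessionId: string;
  kind: "claim" | "edit";
  filePath: string;
  ts: number; // epoch milliseconds
}

// Returns edits inside the window that have no earlier claim by the same
// session on the same file (the assumed meaning of "proactive claim").
function editsWithoutClaims(obs: Obs[], now: number, windowMs: number = 5 * 60 * 1000): Obs[] {
  const cutoff = now - windowMs;
  return obs.filter(
    (e) =>
      e.kind === "edit" &&
      e.ts >= cutoff &&
      e.ts <= now &&
      !obs.some(
        (c) =>
          c.kind === "claim" &&
          c.sessionId === e.sessionId &&
          c.filePath === e.filePath &&
          c.ts <= e.ts,
      ),
  );
}

const now = 1000000;
const sample: Obs[] = [
  { sessionId: "claude", kind: "claim", filePath: "a.ts", ts: now - 200000 },
  { sessionId: "claude", kind: "edit", filePath: "a.ts", ts: now - 100000 }, // claimed first
  { sessionId: "codex", kind: "edit", filePath: "b.ts", ts: now - 60000 }, // no claim
  { sessionId: "codex", kind: "edit", filePath: "c.ts", ts: now - 400000 }, // outside 5m window
];
const unclaimed = editsWithoutClaims(sample, now);
console.log(unclaimed.map((e) => `${e.sessionId}:${e.filePath}`));
```

An empty result corresponds to the "agents are claiming proactively" reading of the footer.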
cavemem debrief
End-of-day post-mortem with five guided sections:
1. Tool-usage ratio per session (invisible / occasional / integrated)
2. Auto-join landing (sessions that joined within 2s of start)
3. Proactive-claim ratio (claim-kind obs vs tool_use edit count)
4. Handoff outcome distribution + median accept latency
5. Chronological timeline with observer notes interleaved
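Sections 3 and 4 reduce to two small calculations, sketched here under the assumption that the counts and latencies have already been fetched. The function names are hypothetical, and returning `null` for empty input is a choice made for the example, not something the description specifies.

```typescript
// Section 3: claim-kind observations divided by tool_use edit count.
// Returns null when there were no edits, since the ratio is undefined.
function proactiveClaimRatio(claimCount: number, editCount: number): number | null {
  return editCount === 0 ? null : claimCount / editCount;
}

// Section 4: median of handoff accept latencies (milliseconds).
// Returns null when no handoff was accepted.
function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b); // numeric sort, copy to avoid mutation
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

console.log(proactiveClaimRatio(3, 12)); // 0.25
console.log(median([4000, 1000, 9000])); // 4000
```

The median is used rather than the mean so that one handoff left pending overnight does not dominate the figure.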
Post-tool-use now records file_path in tool_use metadata when the
tool is Edit/Write/MultiEdit/NotebookEdit. The observe/debrief
queries depend on this surface — parsing content at query time would
require reversing compression; recording at write time is a couple
bytes per observation.
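A rough sketch of the write-time recording, assuming the hook receives the tool name and a tool input object that carries a `file_path` string. The `toolUseMetadata` helper and the metadata layout are illustrative assumptions, not the handler's actual code.

```typescript
// Tools named in the description above as file-editing tools.
const FILE_EDIT_TOOLS = new Set(["Edit", "Write", "MultiEdit", "NotebookEdit"]);

// Builds the metadata stored alongside a tool_use observation.
// file_path is copied only for file-editing tools, and only when present.
function toolUseMetadata(
  toolName: string,
  toolInput: Record<string, unknown>,
): Record<string, unknown> {
  const metadata: Record<string, unknown> = { tool: toolName };
  if (FILE_EDIT_TOOLS.has(toolName)) {
    const filePath = toolInput["file_path"]; // assumed input field name
    if (typeof filePath === "string" && filePath.length > 0) {
      metadata["file_path"] = filePath;
    }
  }
  return metadata;
}

console.log(toolUseMetadata("Edit", { file_path: "src/a.ts" }));
console.log(toolUseMetadata("Read", { file_path: "src/a.ts" }));
```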
Storage adds: pendingHandoffs, recentEditsWithoutClaims,
toolUsageBySession, participantJoinFor, editVsClaimStats,
handoffStatusDistribution, handoffAcceptLatencies, mixedTimeline.
All read-only, all single SQLite queries; most use json_extract.
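To illustrate the `json_extract` pattern, here is a sketch of a single read-only query for recent file edits. The `observations` table and its `session_id`, `kind`, `metadata`, and `ts` columns are hypothetical; the real schema is not shown here. The helper only builds the SQL and parameters, so it runs without a database driver.

```typescript
interface Query {
  sql: string;
  params: number[];
}

// Builds a read-only query for tool_use observations that recorded a
// file_path within the last `windowMs` milliseconds (schema is assumed).
function recentEditsQuery(now: number, windowMs: number = 5 * 60 * 1000): Query {
  const sql = `
    SELECT session_id,
           json_extract(metadata, '$.file_path') AS file_path,
           ts
    FROM observations
    WHERE kind = 'tool_use'
      AND json_extract(metadata, '$.file_path') IS NOT NULL
      AND ts >= ?
    ORDER BY ts DESC`;
  return { sql, params: [now - windowMs] };
}

const q = recentEditsQuery(1000000);
console.log(q.params); // [ 700000 ]
```

Because `file_path` is written into the metadata at hook time, the filter never has to touch the compressed content.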
All gates green.
Summary
Three small CLI commands that make the end-of-day observation exercise ambient — data accumulates passively while you work, and a single command produces the structured post-mortem at the end.
Why now
We already shipped auto-claim (#10) without real evidence that agents weren't claiming proactively on their own. Section 3 of the debrief, the claim/edit ratio, is the falsification test. If the ratio is above 70%, agents are claiming proactively and auto-claim stays only as a safety net; if it is below 20%, auto-claim is carrying the load and should stay. Either way, we get a measurement.
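The decision rule can be written down as a small function. The 70% and 20% thresholds come from the paragraph above; the `inconclusive` label for the range in between is an assumption, since the description does not say how to read it, and `interpretClaimRatio` is a hypothetical name.

```typescript
type Verdict = "safety-net" | "carrying-load" | "inconclusive";

// Maps a claim/edit ratio (0..1) to the reading described above.
function interpretClaimRatio(ratio: number): Verdict {
  if (ratio > 0.7) return "safety-net"; // agents mostly claim on their own
  if (ratio < 0.2) return "carrying-load"; // auto-claim does most of the work
  return "inconclusive"; // assumed label for the unspecified middle range
}

console.log(interpretClaimRatio(0.85)); // "safety-net"
console.log(interpretClaimRatio(0.1)); // "carrying-load"
```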
What's new
CLI commands (`apps/cli/src/commands/`):
Hooks (`packages/hooks/src/handlers/post-tool-use.ts`):
Storage (`packages/storage/src/storage.ts`):
Tests: existing `runner.test.ts` assertion on PostToolUse metadata updated to include `file_path`. CLI smoke test updated with three new command names.
Test plan
Deliberately not included
No end-to-end test for `observe` (live dashboard) or `debrief` (needs 24h of real data to be meaningful). The underlying storage queries are exercised by the existing Storage tests; the commands themselves are thin renderers.
🤖 Generated with Claude Code