test(tasks): lock down handoff lifecycle + hook injection #9
Merged
Two new test files that target the layers most likely to break silently:
MCP-level state mutation and SessionStart context injection, plus four
small source-side fixes required to make those tests meaningful.
apps/mcp-server/test/task-threads.test.ts — 4 MCP integration tests:
1. Atomic claim transfer: claim -> handoff -> accept migrates the
claim to the receiver; the gap between handoff and accept holds
no owner so a third agent cannot grab the file in-flight.
2. Decline cancels + records a reason: declined handoffs MUST NOT
transfer claims (the ugliest possible failure mode), and the
decline observation is visible in task_timeline so the sender
sees the refusal next turn.
3. Expiry rejection: accepting past expires_at throws and flips
status from 'pending' to 'expired' so the sender's next turn
sees the outcome instead of a handoff that looks live forever.
4. task_updates_since symmetry: each session only sees other
sessions' posts — not its own. Tests both directions because a
single-sided filter would pass a one-direction assertion.
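
The four scenarios above can be sketched against a small in-memory model. This is an illustration only, not the project's storage layer or MCP tool surface: the TaskThread class, its method names, and the rule that a pending handoff blocks new claims are all assumptions made for the example. The source does not say whether a declined claim returns to the sender, so the sketch leaves the file unclaimed.

```typescript
// Hypothetical in-memory model; every name here is an illustrative assumption.
type HandoffStatus = "pending" | "accepted" | "declined" | "expired";

interface Handoff {
  id: number;
  file: string;
  from: string;
  to: string;
  expiresAt: number; // epoch milliseconds
  status: HandoffStatus;
}

interface Observation {
  id: number;
  session: string;
  kind: "handoff" | "accept" | "decline";
  text: string;
}

class TaskThread {
  claims = new Map<string, string>(); // file -> owning session
  handoffs: Handoff[] = [];
  timeline: Observation[] = [];

  private post(session: string, kind: Observation["kind"], text: string): void {
    this.timeline.push({ id: this.timeline.length + 1, session, kind, text });
  }

  claim(file: string, session: string): void {
    // Assumption: a pending handoff reserves the file even though nobody owns it.
    if (this.handoffs.some((h) => h.file === file && h.status === "pending")) {
      throw new Error(`${file} has a pending handoff`);
    }
    const owner = this.claims.get(file);
    if (owner !== undefined && owner !== session) {
      throw new Error(`${file} is already claimed by ${owner}`);
    }
    this.claims.set(file, session);
  }

  handoff(file: string, from: string, to: string, expiresAt: number): Handoff {
    if (this.claims.get(file) !== from) throw new Error(`${from} does not hold ${file}`);
    this.claims.delete(file); // no owner while the handoff is in flight
    const h: Handoff = { id: this.handoffs.length + 1, file, from, to, expiresAt, status: "pending" };
    this.handoffs.push(h);
    this.post(from, "handoff", `handoff of ${file} to ${to}`);
    return h;
  }

  private pendingFor(id: number, session: string): Handoff {
    const h = this.handoffs.find((x) => x.id === id);
    if (!h || h.to !== session) throw new Error("no such handoff for this session");
    if (h.status !== "pending") throw new Error(`handoff is ${h.status}`);
    return h;
  }

  accept(id: number, session: string, now: number): void {
    const h = this.pendingFor(id, session);
    if (now > h.expiresAt) {
      h.status = "expired"; // record the outcome before rejecting
      throw new Error("handoff expired");
    }
    h.status = "accepted";
    this.claims.set(h.file, session); // the claim moves to the receiver
    this.post(session, "accept", `accepted ${h.file}`);
  }

  decline(id: number, session: string, reason: string): void {
    const h = this.pendingFor(id, session);
    h.status = "declined"; // the claim is NOT transferred
    this.post(session, "decline", reason);
  }

  // Symmetric filter: a session never sees its own posts.
  updatesSince(session: string, sinceId: number): Observation[] {
    return this.timeline.filter((o) => o.id > sinceId && o.session !== session);
  }
}
```

Setting the status to 'expired' before throwing is what lets the sender observe the outcome on its next turn, and filtering on `o.session !== session` in a single place keeps the visibility rule identical in both directions.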
packages/hooks/test/task-injection.test.ts — end-to-end proof that a
fresh codex session landing on a branch where claude just left a
handoff sees the handoff in additionalContext with a copy-paste-ready
accept call (including session_id) and a decline hint.
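
As a rough illustration of what such injected context could look like, the sketch below renders a preface for pending handoffs. The real buildTaskPreface signature is not shown here, so renderHandoffPreface, the PendingHandoff shape, and the tool names task_accept_handoff and task_decline_handoff are hypothetical stand-ins.

```typescript
// Hypothetical shapes and tool names; the project's real ones may differ.
interface PendingHandoff {
  id: number;
  from: string;
  summary: string;
}

function renderHandoffPreface(sessionId: string, handoffs: PendingHandoff[]): string {
  if (handoffs.length === 0) return "";
  const lines: string[] = ["Pending handoffs for this session:"];
  for (const h of handoffs) {
    // session_id is embedded so the agent can copy the call verbatim.
    const acceptArgs = JSON.stringify({ handoff_id: h.id, session_id: sessionId });
    const declineArgs = JSON.stringify({
      handoff_id: h.id,
      session_id: sessionId,
      reason: "<why you cannot take this>",
    });
    lines.push(`- #${h.id} from ${h.from}: ${h.summary}`);
    lines.push(`  accept:  task_accept_handoff(${acceptArgs})`);
    lines.push(`  decline: task_decline_handoff(${declineArgs})`);
  }
  return lines.join("\n");
}
```

Building the arguments with JSON.stringify rather than by hand keeps the suggested call valid JSON even when a session id contains quotes or other special characters.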
Source-side changes this test set required:
- Decline is now its own coordination kind ('decline') instead of a
generic 'note', so task_timeline consumers can filter for it.
- SessionStart's buildTaskPreface is exported so the integration test
can drive it directly without the full runner/transport stack.
- SessionStart + UserPromptSubmit prefaces now include session_id in
the suggested accept/decline tool calls — agents were dropping
session_id from ad-hoc calls and hitting validation errors.
- UserPromptSubmit caps injection at 6 most-recent messages with a
"N older messages omitted" footer so one chatty agent cannot drown
out another's context.
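
The capping behaviour in the last item can be sketched as follows. This is a minimal illustration that assumes messages arrive ordered oldest to newest; renderRecentMessages, ThreadMessage, and MAX_INJECTED are hypothetical names, and only the limit of 6 and the footer wording come from the description above.

```typescript
// Hypothetical message shape; only the cap and footer text come from the PR.
interface ThreadMessage {
  session: string;
  text: string;
}

const MAX_INJECTED = 6;

function renderRecentMessages(messages: ThreadMessage[], limit: number = MAX_INJECTED): string {
  // slice(-0) would return the whole array, so a non-positive limit is handled explicitly.
  const recent = limit > 0 ? messages.slice(-limit) : [];
  const omitted = messages.length - recent.length;
  const lines = recent.map((m) => `[${m.session}] ${m.text}`);
  if (omitted > 0) {
    lines.push(`(${omitted} older messages omitted)`);
  }
  return lines.join("\n");
}
```

Keeping the newest messages and summarising the rest in one footer line bounds the injected context no matter how many messages a single agent posts.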
All gates green: typecheck (12/12), mcp-server 13/13, hooks 11/11,
core 15/15, worker 12/12, storage 15/15, cli 4/4. Biome clean on
touched files.
Summary
Puts a foundation under the Task Threads primitive shipped in #8 before more features get built on top. Two test files target the layers most likely to break silently (MCP-level state mutation and SessionStart context injection), plus four small source fixes the tests required.
What's new
- apps/mcp-server/test/task-threads.test.ts (4 MCP integration tests):
  1. Atomic claim transfer: claim → handoff → accept migrates the claim to the receiver. Verifies the claim slot is empty between handoff and accept so a third agent can't grab the file in-flight.
  2. Decline cancels and records a reason: declined handoffs must not transfer claims, and the decline observation is visible in task_timeline so the sender sees the refusal next turn.
  3. Expiry rejection: accepting past expires_at throws and flips status from pending to expired so the sender's next turn sees the outcome instead of a handoff that looks live forever.
  4. task_updates_since symmetry: each session only sees other sessions' posts. Tests both directions because a one-sided filter would pass a one-sided assertion.
- packages/hooks/test/task-injection.test.ts: end-to-end proof that a fresh codex session landing on a branch where claude just left a handoff sees the handoff in additionalContext with a copy-paste-ready accept call (including session_id) and a decline hint.
Source fixes the tests required
- decline is now its own CoordinationKind (not a generic note) so task_timeline consumers can filter for it.
- buildTaskPreface is exported from session-start.ts so the integration test can drive it directly without the full runner/transport stack.
- SessionStart + UserPromptSubmit prefaces now include session_id in suggested accept/decline tool calls; agents were dropping session_id from ad-hoc calls and hitting validation errors.
- UserPromptSubmit caps injection at 6 most-recent messages with an (N older messages omitted) footer so one chatty agent cannot drown out another's context.
Test plan
- pnpm build green
🤖 Generated with Claude Code