test(tasks): lock down handoff lifecycle + hook injection #9

Merged
NagyVikt merged 1 commit into main from agent/claude/task-threads-tests on Apr 23, 2026
Conversation

@NagyVikt
Collaborator

Summary

Puts a foundation under the Task Threads primitive shipped in #8 before more features get built on top. Two test files target the layers most likely to break silently (MCP-level state mutation and SessionStart context injection), plus four small source fixes the tests required.

What's new

apps/mcp-server/test/task-threads.test.ts — 4 MCP integration tests:

  1. Atomic claim transfer: claim → handoff → accept migrates the claim to the receiver. Verifies the claim slot is empty between handoff and accept, yet a third agent still can't grab the file in-flight.
  2. Decline cancels + records a reason: declined handoffs MUST NOT transfer claims (the ugliest possible failure mode). The decline observation is visible in task_timeline so the sender sees the refusal next turn.
  3. Expiry rejection: accepting past expires_at throws and flips status from pending to expired so the sender's next turn sees the outcome instead of a handoff that looks live forever.
  4. task_updates_since symmetry: each session only sees other sessions' posts, not its own. Tests both directions because a one-sided filter would pass a one-sided assertion.
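For readers who have not seen #8, the first three invariants can be sketched as a small in-memory state machine. This is an illustrative model only: `TaskThread`, `Handoff`, and every method name below are hypothetical and are not the repository's API. It also assumes that a pending handoff reserves the file (no owner, but not claimable) and that a declined or expired handoff leaves the file unclaimed, which the description above does not specify.

```typescript
// Hypothetical model of the handoff lifecycle; not the repository's real types.
type HandoffStatus = "pending" | "accepted" | "declined" | "expired";

interface Handoff {
  id: number;
  file: string;
  from: string;
  to: string;
  expiresAt: number; // epoch milliseconds
  status: HandoffStatus;
  reason?: string;
}

class TaskThread {
  private claims = new Map<string, string>(); // file -> owning session
  private handoffs: Handoff[] = [];

  claim(file: string, session: string): void {
    // Assumption: a pending handoff reserves the file even though it has no owner.
    const inFlight = this.handoffs.some((h) => h.file === file && h.status === "pending");
    if (this.claims.has(file) || inFlight) throw new Error(`${file} is not claimable`);
    this.claims.set(file, session);
  }

  owner(file: string): string | undefined {
    return this.claims.get(file);
  }

  handoff(file: string, from: string, to: string, expiresAt: number): Handoff {
    if (this.claims.get(file) !== from) throw new Error("sender does not hold the claim");
    this.claims.delete(file); // the claim slot is empty while the handoff is pending
    const h: Handoff = { id: this.handoffs.length + 1, file, from, to, expiresAt, status: "pending" };
    this.handoffs.push(h);
    return h;
  }

  accept(h: Handoff, session: string, now: number): void {
    if (h.status !== "pending" || session !== h.to) throw new Error("handoff not acceptable");
    if (now > h.expiresAt) {
      h.status = "expired"; // flip before throwing so the sender sees the outcome
      throw new Error("handoff expired");
    }
    h.status = "accepted";
    this.claims.set(h.file, h.to); // the claim migrates to the receiver
  }

  decline(h: Handoff, session: string, reason: string): void {
    if (h.status !== "pending" || session !== h.to) throw new Error("handoff not declinable");
    h.status = "declined"; // the claim is never transferred on decline
    h.reason = reason;
  }
}
```

The point of the sketch is the ordering inside `accept`: the status flips to `expired` before the error is thrown, which is what lets the sender observe the outcome on its next turn.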

packages/hooks/test/task-injection.test.ts — end-to-end proof that a fresh codex session landing on a branch where claude just left a handoff sees the handoff in additionalContext with a copy-paste-ready accept call (including session_id) and a decline hint.
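As a rough picture of what "copy-paste-ready" means here, the sketch below renders a preface block for each pending handoff with `session_id` already filled in. The function `renderHandoffPreface`, the `PendingHandoff` shape, and the tool names `task_accept_handoff` and `task_decline_handoff` are assumptions made for illustration; the real `buildTaskPreface` output and tool names may differ.

```typescript
// Hypothetical shape of a pending handoff as seen by the receiving session.
interface PendingHandoff {
  id: number;
  from: string;
  summary: string;
}

// Renders one text block per handoff. The tool names are illustrative assumptions.
function renderHandoffPreface(sessionId: string, handoffs: PendingHandoff[]): string {
  return handoffs
    .map((h) => {
      // session_id is embedded so the agent cannot drop it when copying the call.
      const acceptArgs = JSON.stringify({ handoff_id: h.id, session_id: sessionId });
      const declineArgs = JSON.stringify({ handoff_id: h.id, session_id: sessionId, reason: "..." });
      return [
        `Pending handoff #${h.id} from ${h.from}: ${h.summary}`,
        `  accept:  task_accept_handoff(${acceptArgs})`,
        `  decline: task_decline_handoff(${declineArgs})`,
      ].join("\n");
    })
    .join("\n\n");
}
```

An integration test in this style would assert on substrings of the rendered text (the handoff summary, the accept call, and the `session_id` value) rather than on an exact snapshot, so wording changes in the preface do not break it.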

Source fixes the tests required

  • decline is now its own CoordinationKind (not a generic note) so task_timeline consumers can filter for it.
  • buildTaskPreface is exported from session-start.ts so the integration test can drive it directly without the full runner/transport stack.
  • SessionStart + UserPromptSubmit prefaces now include session_id in suggested accept/decline tool calls — agents were dropping session_id from ad-hoc calls and hitting validation errors.
  • UserPromptSubmit caps injection at 6 most-recent messages with an (N older messages omitted) footer so one chatty agent cannot drown out another's context.
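The injection cap in the last bullet can be sketched as follows. This is a minimal illustration, not the hook's actual code: `capMessages`, `TaskMessage`, and `MAX_INJECTED` are hypothetical names, and the one-line-per-message format is an assumption. Only the limit of 6 and the footer wording come from the description above.

```typescript
// Hypothetical message shape; the real hook's types are not shown in this PR.
interface TaskMessage {
  ts: number; // epoch milliseconds
  author: string;
  text: string;
}

const MAX_INJECTED = 6; // cap stated in the PR description

// Keeps the most recent `limit` messages and appends a footer counting the rest.
function capMessages(messages: TaskMessage[], limit: number = MAX_INJECTED): string[] {
  const sorted = [...messages].sort((a, b) => a.ts - b.ts); // oldest first, input untouched
  const start = Math.max(sorted.length - limit, 0);
  const kept = sorted.slice(start);
  const lines = kept.map((m) => `${m.author}: ${m.text}`);
  if (start > 0) {
    lines.push(`(${start} older messages omitted)`);
  }
  return lines;
}
```

Computing `start` explicitly, instead of calling `slice(-limit)`, avoids the edge case where a limit of 0 would return every message.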

Test plan

  • MCP server tests: 13/13
  • Hooks tests: 11/11 (includes 1 new integration test)
  • Core tests: 15/15 (existing decline test still green under the kind change)
  • Typecheck 12/12 projects
  • pnpm build green
  • Biome clean on touched files

🤖 Generated with Claude Code

Two new test files that target the layers most likely to break silently:
MCP-level state mutation and SessionStart context injection. Plus four
small source-side fixes required to make those tests meaningful.

apps/mcp-server/test/task-threads.test.ts — 4 MCP integration tests:
  1. Atomic claim transfer: claim -> handoff -> accept migrates the
     claim to the receiver; the gap between handoff and accept holds
     no owner so a third agent cannot grab the file in-flight.
  2. Decline cancels + records a reason: declined handoffs MUST NOT
     transfer claims (the ugliest possible failure mode), and the
     decline observation is visible in task_timeline so the sender
     sees the refusal next turn.
  3. Expiry rejection: accepting past expires_at throws and flips
     status from 'pending' to 'expired' so the sender's next turn
     sees the outcome instead of a handoff that looks live forever.
  4. task_updates_since symmetry: each session only sees other
     sessions' posts — not its own. Tests both directions because a
     single-sided filter would pass a one-direction assertion.

packages/hooks/test/task-injection.test.ts — end-to-end proof that a
fresh codex session landing on a branch where claude just left a
handoff sees the handoff in additionalContext with a copy-paste-ready
accept call (including session_id) and a decline hint.

Source-side changes this test set required:
- Decline is now its own coordination kind ('decline') instead of a
  generic 'note', so task_timeline consumers can filter for it.
- SessionStart's buildTaskPreface is exported so the integration test
  can drive it directly without the full runner/transport stack.
- SessionStart + UserPromptSubmit prefaces now include session_id in
  the suggested accept/decline tool calls — agents were dropping
  session_id from ad-hoc calls and hitting validation errors.
- UserPromptSubmit caps injection at 6 most-recent messages with a
  "N older messages omitted" footer so one chatty agent cannot drown
  out another's context.

All gates green: typecheck (12/12), mcp-server 13/13, hooks 11/11,
core 15/15, worker 12/12, storage 15/15, cli 4/4. Biome clean on
touched files.
@NagyVikt NagyVikt merged commit 0a7e994 into main Apr 23, 2026
0 of 3 checks passed
@NagyVikt NagyVikt deleted the agent/claude/task-threads-tests branch April 23, 2026 21:17