feat(mcp): coordinator agent with task visibility, pilot/co-pilot control, and expanded options#100
feat(mcp): coordinator agent with task visibility, pilot/co-pilot control, and expanded options#100brooksc wants to merge 39 commits intojohannesjo:mainfrom
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2e523b0 to
b436f19
Compare
|
@cledoux95 as you initiated this I'd love to get your thoughts on my building further on your work. Here's what I'm trying to solve for.
What I'm hoping to get to:
1-3 is working here (although the take control/return control may need some work) I'm thinking I need some webhook or way for agentN to notify the coordinator when it's done (hook...?) Anyway... this may take some time to polish. @johannesjo What do you think about adding a "Beta" section in settings, when you're ready to integrate it - it can live in Beta for a little while so others can opt-in and shake it out if they want. I think it'll take some time to live with this to continue to refine it. I'm using it myself going forward. |
|
Thank you very much @brooksc (and @cledoux95 !) !! This is impressive work, and the integration fixes on top of #31 are exactly the kind of stuff that's painful to discover from the outside. I want to ship this, but I agree with you that it's not ready as a default-on feature, and I think your "Beta in Settings" instinct is the right framing. Let me make that concrete. Yes to beta-gating, with a real flag. Specifically:
Three things I'd want addressed before I merge, even into beta:
On your unsolved step #4 (coordinator noticing "agent done"): I think Claude Code's Graduation criteria I'd want to hit before flipping the flag on by default:
If you're up for the gating work I'm happy to keep this open and help where useful. And to be clear: I do want to ship this – the pilot/co-pilot pattern is the right shape for human-in-the-loop orchestration, and you've already shaved off a lot of the rakes. |
|
@johannesjo — thanks for the detailed review, this is exactly the kind of feedback that makes it worth sharing early. Lots to respond to. Where things stand: working e2eSince the original PR I've built considerably more on top. Here's what's confirmed working end-to-end: Coordinator creation & MCP tooling
Sub-task lifecycle
Coordinator ↔ sub-agent notification — and why we didn't use Stop hooks This is where we diverge from your suggestion, and I think for good reason. You suggested Instead, sub-agents call the Additional notification paths:
Pilot / co-pilot control handoff — how it works and two open items Each sub-task panel shows a banner indicating who's driving. When the coordinator has control the banner reads "Coordinator driving" in grey. Clicking "Take Control" blocks the coordinator's Two things to note as known gaps: first, the button labels ("Take Control" / "Return to Orchestrator") overstate what's happening — the coordinator agent keeps running, only its ability to write to that terminal is paused. "Pause coordinator" / "Resume coordinator" would be more accurate. Second, the sub-task's PromptInput and raw xterm terminal are not currently gated on control state, so a user could type simultaneously with a coordinator Unit test coverage What isn't done yetBeta gating — you're right that zero-footprint opt-in is the right approach. We haven't done the Settings dialog placement — with coordinator mode, Docker, verbose logging, and more to come, Settings is getting long. A tabbed settings dialog (General / Experimental) would be a natural home for the beta flag, but that's its own PR and should land independently of this one. Docker + coordinator — currently mutually exclusive. Sub-tasks spawned by a coordinator run as native host processes, defeating Docker isolation. Two approaches documented in Post-restart MCP path integration test — fair ask. The port/token rotation path is the right thing to test. Will add.
Minor items (in KNOWN-TODOS.md): orphaned sub-task badge UI, re-stage notification after user manually sends an edited prompt, configurable notification delay, control handoff input gating and button renaming. Happy to tackle the beta gating as the next chunk. Does the |
4057d03 to
0be08ca
Compare
|
Thanks @brooksc — the depth of the response (and the 379 tests!) tells me where this is heading, and I'm on board with the shape. A few replies and a few new things I noticed reading the diff today.
Beta gating shape: skipPermissions guardrail — the propagation checkbox is the right shape. Good refinement: a "propagate skip-permissions to sub-tasks" checkbox in the New Task dialog, defaulting off even when the coordinator itself was launched with skip-permissions. Don't auto-inherit silently. The 40-tasks-at-a-time workflow is real and worth optimizing for. A few things I caught on a code pass that haven't come up yet:
Item #1 is the only one that affects whether the test plan currently passes; the rest can ride along with the beta-gating PR. Excited about this — the pilot/co-pilot framing has aged well over the discussion. |
Adds HTTP REST endpoints for task management (create, list, get, prompt, wait, diff, output, signal-done, merge, close) plus an MCP log ring buffer. Also adds AGENTS.md (project context for AI agents) and rebuild-integration.sh. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… diff truncation metadata, gitIsolation schema Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- wait_for_signal_done redesign: removed taskId parameter; returns whichever sub-task completes next with a `remaining` count so the coordinator can loop naturally without guessing order - merge_task / review_and_merge_task: operate in the sub-task's worktree so the target branch doesn't need to be checked out in the main repo - Docker: sub-agents spawned via docker exec into the coordinator's container when coordinator runs with Docker mode enabled - Preamble injection: writes signal_done instructions to the right file per agent type (AGENTS.md / GEMINI.md / .agent.md / settings.local.json) in the worktree rather than the project root; file is cleaned up by close_task - Notification system: suppress pending notifications during signal waits to prevent double-notify; escalate to orphaned notification after 10 missed autofire attempts; clear stale notifications on coordinator deregistration - get_task_output: append truncation sentinel when scrollback exceeds 20 000 chars so the coordinator knows it saw partial output - wait_for_idle: return reason field (idle / exited / human_control) - Autofire: countdown UI; don't treat previous staged text as user edit - Persistence: persist mcpConfigPath across app restarts; rewrite config on startup with current port/token - Tests: 63 unit tests in coordinator.test.ts; git mergeWorktreePath test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Coordinators can drive sub-tasks via send_prompt; users can reclaim a task at any time with Take Control and hand it back with Release Control. This PR extends that model to the coordinator task itself. Control bar (TaskPanel) - Renamed "Pause/Resume coordinator" → "Take Control / Release Control" for clearer ownership language - Extended to coordinator tasks (coordinatorMode) in addition to sub-tasks (coordinatedBy): the same bar now appears on the coordinator's own panel - Auto-mode label: "Auto mode" for coordinator tasks, "Coordinator driving" for sub-tasks Input locking (PromptInput + TerminalView) - PromptInput textarea is disabled when controlledBy === 'coordinator'; onKeyDown/onInput guards provide belt-and-suspenders protection - TerminalView: term.options.disableStdin toggled reactively so xterm stops forwarding keystrokes to the PTY (the real input path users take) - Autofire interval skips ticks (no miss count) when controlledBy is human Coordinator task panel UX - Invisible pointer-events overlay gives coordinator task the same cursor:not-allowed treatment sub-tasks already had - Overlay click propagates to maybeShowControlHint, enabling the discoverability tooltip (shown ≤3 times, dismissible, "Don't show again") - PromptInput panel hidden (display:none, minSize getter → 0) in auto mode while keeping the component mounted so autofire keeps running Control state persistence - controlledBy persisted in PersistedTask; restored on app restart - Default: 'coordinator' for coordinator/coordinated tasks, undefined otherwise - setTaskControl skips the MCP_ControlChanged IPC for coordinator tasks (backend only knows sub-tasks, not the coordinator task itself) Bug fixes - MCP_TaskCreated handler now sets controlledBy:'coordinator' on new sub-tasks (was missing, leaving textarea unlocked) - Closing a coordinator task clears controlledBy on all children so their textareas re-enable when the control bar disappears - setTaskControl calls saveState() so Take Control survives autosave Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- design.md: comprehensive write-up of the coordinator mode feature — architecture diagram, Coordinator class internals, MCP tool surface, wait_for_signal_done redesign rationale, notification system, control handoff, autofire, discoverability hint, settings, security, Docker, New Task Dialog changes, and test coverage summary - TODOS.md (renamed from KNOWN-TODOS.md): easy coordinator test tasks, medium known issues (get_task_output truncation, merge_task live run, autofire timing edge case, MCP config staleness), hard backend hydration issues, and frontend test requirements (items 12–16) - Removed AGENTS.md from project root (preamble is now written per-agent into the worktree by the coordinator, not the project root) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Verifies that newly created sub-tasks have controlledBy set to 'coordinator' and coordinatedBy set to the coordinator task ID, with a regression guard ensuring controlledBy is always defined. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l tick Extracts the autofire tick decision logic into processAutoFireTick (autofire-tick.ts) so it can be tested as a pure function without mounting the SolidJS component. Tests verify: - controlledBy==='human' returns 'paused' without touching the miss counter - controlledBy==='coordinator' increments miss count and fires when prompt visible Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts disableStdin logic into computeDisableStdin() and adds tests for coordinator-controlled (true), human-controlled (false), and undefined controlledBy (false) scenarios. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The merge_task tool merges into baseBranch (the coordinator's feature branch), not necessarily main. Update both the MCP tool description and coordinator preamble to reflect the correct behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The coordinator preamble was being injected twice — once in NewTaskDialog.tsx (from src/lib/coordinator-preamble.ts) and again in src/store/tasks.ts (from src/store/coordinator-preamble.ts), causing coordinators to start with duplicated and contradictory instructions. Remove src/lib/coordinator-preamble.ts and its prepend in NewTaskDialog.tsx, consolidating all rules into the store version as single source of truth. The store version now includes the BAD/GOOD task-assignment guidance, baseBranch reminder, max-3-concurrent rule, file-overlap guidance, and the verify-before-assigning rule from the deleted lib version. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_merge_task Both handlers now append ' [NOT COMMITTED — will be auto-committed on merge]' to any file entry where committed === false, matching the safety context the file-object type already carries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In mergeTask(), before delegating to gitMergeTask(), attempt an auto-commit of any uncommitted changes in the task worktree. If the commit fails and git status still shows dirty files, throw an error to abort the merge. If nothing was staged (git commit fails due to "nothing to commit"), swallow silently. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tasks with needsReview set now return 'review' from getTaskAttentionState() and getTaskDotStatus(), making them visible in the sidebar instead of appearing idle/waiting. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously only store.taskOrder was scanned, so collapsed coordinated children disappeared from the strip. Now uses getCoordinatorChildren() which covers both active and collapsedTaskOrder. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On create failure after preamble injection, only newly-created files were cleaned up (unlinked). Existing files that had the preamble appended were left modified on disk. Now tracks original content per agent type and restores it on failure, or unlinks if the file was newly created. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a coordinator task is collapsed, child tasks remain locked with no way to interact, and staged notifications cannot fire because the coordinator's PromptInput is not mounted. Prevent this broken state by returning early from collapseTask() when the task has coordinatorMode. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eProjectWithTasks Reorder the close sequence so non-coordinator tasks are closed first, then coordinators last. Prevents a partial-close state where a coordinator fails but its children were already removed from the original (now stale) snapshot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add stripPreambleFromBranch() to remove injected <sub-task-mode> content from AGENTS.md, GEMINI.md, and .agent.md before staging. Call it before git add -A in the auto-commit step so preamble files are absent or restored to their original content when the commit runs. Newly created preamble-only files are deleted; modified files are restored to their pre-injection content. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…b-task preamble cleanup - Rewrite coordinator preamble with strict sliding-window pattern, native background Agent landing flow, and explicit rules for baseBranch, dirty worktree, merge-before-close, scope discipline, and test verification - Add MAX_CONCURRENT placeholder substituted at task creation time - Add max concurrent tasks numeric input to NewTaskDialog (default 3) - Update sub-task preamble to require tests/typecheck before signal_done - Add TODOS.md items R1-R4 (regressions) and johannesjo#12-johannesjo#19 (new findings) - Rule 6c now passes coordinator worktree path to landing Agent Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…erge_task->close_task flow
Only items 7 and 8 remain (known edge cases with no fix yet). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
README keyboard shortcuts were reformatted by a sub-agent — not our change to own. CONTRIBUTING.md was created by a sub-agent but this isn't our repo to add docs like that. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0be08ca to
ad6746d
Compare
Update — coordinator mode: major second pass, largely self-developedWe've done a substantial second pass on this feature. The headline: most of the work in this update was done by the coordinator itself — we used the tool to fix the tool. Video demo (recorded mid-session, before max-concurrent UI was added — shows the core flow live) Design document: The coordinator ran this PRTo validate the implementation, we gave the coordinator a single prompt:
From that one prompt, the coordinator:
The only human involvement was watching the coordinator work and confirming the final result — no manual coding. What changed since the initial revisionControl model renamed and extended
Max concurrent tasks UI Coordinator preamble rewritten from scratch
13 bugs fixed (all found during live coordinator sessions, most fixed by sub-agents):
Tests: 464 passing across 33 test files — this PR adds ~2,057 lines of new test code covering coordinator notification logic, control handoff, PTY idle detection, and git operations. What's still openTwo known edge cases, intentionally deferred:
Not yet tested: Docker integration — the coordinator Test plan
🤖 Generated with Claude Code |
|
I rechecked the current head (
|
1. Sub-task MCP tool filtering: server.ts now returns only signal_done
in ListTools when running as a sub-task (--task-id set, no
--coordinator-id). CallTool rejects any other tool with isError.
2. Restart path agentId: App.tsx passes task.agentIds[0] to
MCP_HydrateCoordinatedTask; register.ts uses it instead of
crypto.randomUUID(). hydrateTask() now sets up the output callback
and subscriber so wait_for_idle and idle detection work after restart.
3. wait_for_idle human takeover: setTaskControl('human') now resolves
pending idleResolvers immediately with { reason: 'human_control' }
instead of leaving them hanging. HTTP route and client type now
propagate reason through to the MCP caller.
4. createTask failure cleanup: expanded catch block calls cleanupTask()
to remove the worktree, kill any spawned agent, and clear all
in-memory state (tasks, tailBuffers, decoders, subscribers, MCP
config). Previously only the preamble file was restored.
5. signalDoneReceived/needsReview persistence: added both fields to
PersistedTask, included in both save blocks (active + collapsed),
and restored in both load blocks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolved conflicts keeping our coordinator additions: - git.ts / git.test.ts: kept ours (merge_task resolution fix + tests) - DiffViewerDialog.tsx: kept ours (adds direct mode support to CommitNavBar Show) - TaskChangedFilesSection.tsx: kept ours (hasCommitNav checks both worktree and direct) - TerminalView.tsx: kept ours (computeDisableStdin import) - NewTaskDialog.tsx: re-applied coordinator mode additions on top of main's base Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gap 1 (Test 2): signal_done appeared in coordinator tool list — coordinator
agents could see a tool they should never call. Extracted tool-list logic into
mcp-tool-list.ts (pure, testable) so selectTools() enforces the role boundary:
sub-tasks get only signal_done, coordinators get everything except signal_done.
Added 6 unit tests in mcp-tool-list.test.ts that would have caught this.
Gap 2 (Test 7): wait_for_signal_done returned { taskId, name, remaining } but
the test plan specifies { taskId, name, status, signalDoneAt, remaining }.
Added status and signalDoneAt (ISO timestamp) to WaitForSignalDoneResult and
both resolve call sites in coordinator.ts. Updated coordinator.test.ts to
match the new shape using toMatchObject.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ol-v2 Resolved conflicts keeping our coordinator fields: - autosave.ts: kept coordinatorNotificationDelayMs alongside shareDockerAgentAuth - persistence.ts: kept coordinator fields (delay, mode, hint count) in all three blocks - SettingsDialog.tsx: kept our full section layout; main's shareDockerAgentAuth checkbox was already present at line 623 in our version Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Items 9-11 track unit tests missing for: edit-suppresses-autofire, /tmp config cleanup on close, and coordinator checkbox re-enable. Each maps to a specific test plan section and names the target test file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
I rechecked current head
These are mostly boundary/restore/scoping issues, but they affect the exact scenarios the beta gate is meant to make safe. |
Overview
This PR builds on the initial coordinating agent implementation by @cledoux95 (PR #31) with substantial fixes and new capabilities.
The core motivation: I wanted to use a coordinator agent to drive parallel workstreams, but still be able to monitor each sub-task, answer questions when an agent gets stuck, and take over a task when I want to make a decision myself — then hand it back. Think of it like a pilot/co-pilot handoff: explicit, visible, and safe.
Credit
The foundation of this PR is @cledoux95's work in #31, which introduced the coordinating agent concept, the MCP server/client, and the
create_task/send_prompt/wait_for_idle/merge_tasktools. This PR would not exist without that work.What's new / changed
Sub-task visibility (addresses the core UX gap)
In #31, sub-tasks created by the coordinator were not visible in the sidebar. The coordinator agent was essentially working in the dark. Now each sub-task spawned via
create_taskappears as its own sidebar panel, giving you full terminal visibility into what every agent is doing — exactly like a manually created task.Pilot/co-pilot control handoff
Each coordinated sub-task shows a banner indicating who is driving:
You can click "Take Control" at any time to pause the coordinator for that task, interact with the agent yourself, then click "Return to Orchestrator" when done.
Expanded
create_taskoptionsThe coordinator agent can now specify:
skipPermissions: true— passes--dangerously-skip-permissionsto the sub-agent so it runs fully autonomously without tool-approval interruptionsgitIsolation: "worktree" | "direct" | "none"— controls git isolation mode (defaults to"worktree"as before)Fixes to #31
Several bugs were found and fixed while integrating and testing:
findFreePort(7777, 7800)tries ports sequentially until one is free--mcp-configarg — the coordinator agent was spawned without the MCP config flag, so it had no MCP tools and fell back to bash orchestrationTaskAITerminalnow passes--mcp-config <path>whentask.mcpConfigPathis setspawnAgentdirectly AND the renderer'sTerminalViewalso called it, killing the backend spawn and losing the output subscriptionspawnAgentcall; orchestrator now usesonPtyEvent('spawn', ...)to subscribe to output each time the renderer spawns the agent — survives restarts toofetch failedon all MCP tool callsApp.tsxnow awaitsStartMCPServerfor each persisted coordinator task before the agents are spawned, refreshing the config file with the new port/tokendeleteTaskwrong call signature — the orchestrator calleddeleteTaskwith positional args but the function now takes an options objectdeleteTask({ agentIds, branchName, deleteBranch, projectRoot })400for invalidname,prompt,projectId,skipPermissions,gitIsolationwaitForIdlehangs when human takes control —setTaskControl('human')left pending waiters stuck until timeoutsetTaskControlnow unblocks pendingwaitForIdlecallers on any control changemcp_control_changedmissing from preload allowlistALLOWED_CHANNELSinpreload.cjsfetch failedgave no actionable infoMCPClientnow wraps network errors: "Cannot reach Parallel Code at http://127.0.0.1:PORT. Is the app running?"Tests
Added 25 new tests across two files:
electron/mcp/prompt-detect.test.ts—stripAnsiandchunkContainsAgentPrompt(the core ofwaitForIdleidle detection)electron/mcp/orchestrator.test.ts— control handoff logic:sendPromptblocked/unblocked,waitForIdlebehavior under human control, PTY exit propagationThe tests caught the
waitForIdlehang bug described above before it shipped.Test plan
fetch failed)create_taskwithskipPermissions: truespawns sub-agent without permission promptsnpm run check && npm test— all pass🤖 Generated with Claude Code