Skip to content

feat(coordinator): Eliminate the need for take/release control#149

Open
brooksc wants to merge 6 commits into
johannesjo:mainfrom
brooksc:fix/coordinator-control-rework
Open

feat(coordinator): Eliminate the need for take/release control#149
brooksc wants to merge 6 commits into
johannesjo:mainfrom
brooksc:fix/coordinator-control-rework

Conversation

@brooksc
Copy link
Copy Markdown
Contributor

@brooksc brooksc commented May 26, 2026

Base / dependencies

This branch sits on top of two precursor PRs that are not yet merged:

Both need to land before this PR. The diff shown here includes both precursors' content; once they merge, this PR's effective diff shrinks to just the changes described below.

Problem

The take/release-control model exposed an internal scheduler concept to the user. Users had to know when a session was "coordinator controlled," when to click Take Control before typing, and when to click Release Control so queued coordinator prompts could be delivered. This was clumsy and caused real failures:

  • Sub-tasks sometimes spawned but never received their initial prompt (renderer not mounted, control flag wrong).
  • Coordinator/sub-task prompts could be held indefinitely because the session was marked human-controlled.
  • Startup trust dialogs and transient terminal states were conflated with "real user control."
  • Prompts could potentially be injected while a user had a draft or pending terminal input.
  • Coordinator tasks emitted misleading Task Ready / Task Needs Input notifications while child agents were still active.
  • A successful land_self silently dropped the coordinator's pending notification with no replacement, so the coordinator never heard about completion.

Solution

Live sessions are user-interactive by default. Users can type into any coordinator or sub-task session without first taking control. Coordinator and system prompts are queued and delivered automatically when the target session is safe: agent is idle/ready (not busy, no trust dialog, no interactive question), user has no draft in the prompt input, no pending terminal input, and no recent user activity lease. When the session isn't safe, queued prompts wait. The UI explains the wait in observable terms ("waiting for your draft," "waiting for the agent prompt") instead of "take control."

The persisted controlledBy task field is preserved as state-restoration data; it no longer gates user input but is still used internally to distinguish coordinator-driven from human-driven sessions.

Major changes

Backend (electron/mcp/, electron/remote/)

  • Backend-owned initial prompt delivery for coordinated tasks in coordinator.ts (+789 / −124): new scheduleInitialPromptDelivery / tryDeliverInitialPrompt with timers and a "waiting for prompt" state, so background sub-tasks no longer depend on a mounted renderer panel. Also adds sendPrompt queueing, flushNextQueuedPrompt, prompt-echo idle suppression, and a new landed PendingNotification state so successful land_self queues a "self-landed successfully" notification instead of silently dropping the pending entry.
  • Improved terminal prompt detection (prompt-detect.ts, +24 / −4): new patterns for Working (...), background terminal running, q-corrupted variants (qWorking), and Do you trust startup/trust dialogs.
  • Suppressed coordinator notifications while children are still active; staged batches re-stage as children land (coordinator.ts).

Renderer (src/components/, src/lib/, src/store/)

  • Per-tick autofire arbiter in autofire-tick.ts (+23 / −7): processAutoFireTick returns explicit outcomes — too-soon, paused, waiting-for-user-draft, waiting-for-terminal-input, waiting-for-user-activity, no-prompt, fire.
  • Delivery decision helpers in prompt-control.ts (+31 / −1): shouldHandoffCoordinatorQuestion, shouldAckInitialPromptDelivery, shouldRendererAutoSendInitialPrompt.
  • Startup / timeout readiness in prompt-autosend-readiness.ts (new): detects loading-model / booting-MCP startup screens and timeout-abort conditions.
  • Prompt draft detection in prompt-draft.ts (new): hasUserPromptDraftText distinguishes a real user draft from a staged-text echo.
  • Terminal pending-input detection in terminalInputPending.ts (new): tracks typed-but-unsent input by walking the input byte stream; clears on \r, \n, Ctrl-C, Ctrl-U.
  • Per-task user activity leases added to tasks.ts (+183 / −11) and types.ts (+43 / 0): short-lived hold so typing pauses automation without taking durable ownership.
  • UI removal in TaskPanel.tsx (+53 / −206): take/release control surface deleted; queued-delivery status copy in PromptInput.tsx (+160 / −76).

Notifications (src/store/desktopNotifications.ts, +126 / −13)

  • Generic Task Ready notifications are suppressed for coordinator tasks while children are still running.
  • Staged sub-task notifications cleared when tasks resume output or land.

Tooling

  • scripts/check-coordinator-run.mjs (215 lines) — log checker that flags initial-prompt gaps, startup-control handoffs, and other regression signals from a captured app log.
  • scripts/fake-agent.mjs (113 lines) — scriptable PTY mock used by the real-PTY suite to exercise Codex/Claude/Gemini/Copilot startup shapes.
  • scripts/check-coordinator-run.test.mjs (68 lines) — unit coverage for the log checker.

Dead-code cleanup

  • Removed src/lib/terminalDisableStdin.ts + tests. Both computeDisableStdin and shouldForwardTerminalInput had become constant-wrapping shims after the take/release removal (the function literally was void controlledBy; return false;). Three call sites in TerminalView.tsx simplified to !taskPtyDetached() / taskPtyDetached().

OpenSpec (openspec/changes/automatic-session-prompt-delivery/)

New capability spec session-prompt-delivery plus proposal.md, design.md, tasks.md (14/14 tasks done). Validates clean: openspec validate automatic-session-prompt-delivery --strictChange 'automatic-session-prompt-delivery' is valid.

Stats (vs main, includes #138 + #145)

Bucket Files Lines
Total 49 +3102 / −509
Heaviest single file electron/mcp/coordinator.ts +789 / −124
Heaviest test file electron/mcp/coordinator.test.ts +1024 / −145

Tests

Suite after this PR: 66 test files, 1146 passed / 24 skipped, 0 failed.

Notable additions:

  • coordinator.test.ts (+1024 / −145): backend-owned initial prompt delivery, prompt-echo suppression during the delivery window, staged notification clearing, signal_done, and a new land_self "stages a landed notification" test.
  • prompt-detect.test.ts (+54 / −9): Codex CLI prompt , control-character-damaged working status, Gemini CLI input, trust dialogs, startup screens.
  • prompt-control.test.ts (+116 / −8): handoff conditions for each guard, initial-prompt ack scenarios.
  • prompt-autosend-readiness.test.ts (new): startup-blocking detection for Codex loading/booting screens; timeout-abort gates.
  • prompt-draft.test.ts / terminalInputPending.test.ts (new): per-task tracking primitives.
  • tasks.test.ts (+251 / −3) and desktopNotifications.test.ts: user activity leases, draft persistence, coordinator-aware notification suppression / clearing.
  • coordinator-real-pty.integration.test.ts (new, opt-in): drives a real node-pty against scripts/fake-agent.mjs for codex / claude / gemini / copilot profiles.
  • coordinator-real-agents.integration.test.ts (new, opt-in): hits real installed CLIs when present; covers codex / claude / gemini.

Verification

  • npm test → 1146 passed / 24 skipped
  • npm run check (typecheck + lint + prettier) → clean
  • openspec validate automatic-session-prompt-delivery --strict → valid
  • git diff --check → clean
  • Manual e2e: 3-sub-task coordinator workflow. All sub-tasks deliver their initial prompts, self-land cleanly, no false-idle notifications during active sub-task work, and one appropriate desktop notification fires when the coordinator agent finishes.

Reviewer guide

Start with the OpenSpec change (openspec/changes/automatic-session-prompt-delivery/proposal.mddesign.mdspecs/session-prompt-delivery/spec.md) for the why and intended behavior. Suggested reading order through the diff:

  1. electron/mcp/coordinator.ts — backend-owned delivery, queued prompts, landed notifications.
  2. electron/mcp/prompt-detect.ts — terminal readiness signals.
  3. src/components/autofire-tick.ts — per-tick arbiter with wait reasons.
  4. src/components/prompt-control.ts + prompt-autosend-readiness.ts — delivery decision helpers.
  5. src/components/PromptInput.tsx, TerminalView.tsx, TaskPanel.tsx — UI wiring and take/release removal.
  6. src/store/desktopNotifications.ts — notification suppression.
  7. scripts/check-coordinator-run.mjs + integration tests — verification scaffolding.

🤖 Generated with Claude Code

@brooksc brooksc closed this May 26, 2026
@brooksc brooksc deleted the fix/coordinator-control-rework branch May 26, 2026 03:04
@brooksc brooksc restored the fix/coordinator-control-rework branch May 26, 2026 03:05
@brooksc brooksc reopened this May 26, 2026
@johannesjo
Copy link
Copy Markdown
Owner

Multi-Agent Review — PR #149

Seven independent reviewers analyzed the third commit only (5ac9b36, excluding the precursors in #138 + #145): six Claude sub-agents (Correctness, Security, Architecture, Alternatives, Performance, Simplicity) + Codex CLI. Findings cross-validated against the actual diff.

Critical (verified, multi-reviewer consensus)

  1. Initial prompt delivery bypasses the human-control / activity leaseelectron/mcp/coordinator.ts tryDeliverInitialPrompt only gates on assignedPromptDelivered and task.status. It does not check controlMap === 'human' (which flushNextQueuedPrompt does at line ~399) or the new userActivityHoldUntil. Once promptReady, the backend writes the initial prompt regardless of whether the user is mid-typing in the same session. This breaks the PR's core "safe delivery" guarantee for the very first prompt — exactly the moment when background sub-tasks are most likely to overlap with someone clicking into a panel. Fix: mirror the controlMap.get(task.id) \!== 'human' gate; on 'human', reschedule via INITIAL_PROMPT_READY_DELAY_MS instead of writing. (Flagged by: Codex, Security)

  2. terminalInputPending gets stuck when an agent question self-resolvessrc/components/PromptInput.tsx's questionJustActivated effect sets setTaskTerminalInputPending(taskId, true) to drive the UI wait-reason. The flag only clears via \r / \n / \x03 / \x15 bytes through term.onDatanextTerminalInputPending. If the agent self-dismisses (timeout, auto-trust, MCP-driven dismissal, false-positive question detection), the flag never clears. The auto-release effect at PromptInput.tsx and processAutoFireTick both gate on it, so the task is permanently parked at "Queued — waiting for terminal input." Fix: clear terminalInputPending when questionActive transitions true→false and no actual terminal bytes have been observed, or tie expiry to userActivityHoldUntil. (Flagged by: Correctness, Codex, Alternatives, Simplicity — 4/7)

  3. nextTerminalInputPending backspace is dead code with a latent stuck bugsrc/lib/terminalInputPending.ts:48-50: pending = pending || currentPending is provably pending = pending since pending is seeded from currentPending. Deeper: backspacing back to an empty line cannot clear the flag (no character counter), so 'hello\x7f\x7f\x7f\x7f\x7f' with currentPending=true still returns true. Delete the branch, or model line state explicitly with a counter / state machine. (Flagged by: Correctness, Alternatives, Simplicity — 3/7)

Warnings

  1. Initial-prompt write isn't serialized against concurrent sendPromptcoordinator.ts tryDeliverInitialPrompt clears task.initialPrompt = undefined then await writePromptToTask(...), which itself awaits a 50ms gap between text and \r. During that window, a concurrent sendPrompt sees neither an initialPrompt nor a queued prompt and writes directly → interleaved bytes. Similar window exists between pendingPrompts.shift() and the write in flushNextQueuedPrompt. Fix: hold a writeInFlight lock per task and queue rather than write-through during a delivery. (Codex)

  2. Manual send while a staged notification exists doesn't clear the local staged — The PR removes setStagedNotificationUserEdited calls in PromptInput.tsx's onInput handler. After a user sends their own draft while staged exists, the code dispatches MCP_CoordinatorRestageAfterUserSend to the backend but does not clear the local props.stagedNotification. With autoFireAt in the past and text() now empty, the next autofire tick after the 5s lease can re-fire the stale staged text if backend restage is delayed. Fix: call clearStagedNotification(taskId) locally in the else if (staged && staged.notificationIds.length > 0) branch, then rely on backend restage to re-stage. (Codex)

  3. Per-keystroke reactive churn — Every keystroke through TerminalView.tsx noteUserTerminalInput invokes markTaskUserActivity (sets userActivityHoldUntil = Date.now()+5000 — a new value each time) and setTaskTerminalInputPending(true), which calls markTaskUserActivity again. setTaskControl's if (prev === who) return; guard prevents IPC storms (load-bearing — please comment it), but autoFireCountdownText in PromptInput.tsx reads userActivityHoldUntil outside untrack and re-computes on every keystroke across every mounted staged-notification panel. Fix: (a) only bump userActivityHoldUntil when extending by more than ~250ms; (b) inside the setters, skip the double markTaskUserActivity. (Performance, Simplicity)

  4. controlledBy is dual-meaning statedesign.md says controlledBy is restoration metadata, but the renderer's markTaskUserActivity toggles it on every user keystroke and IPC-syncs it via MCP_ControlChanged, and the backend gates sendPrompt / flushNextQueuedPrompt / waitForIdle on the resulting controlMap. It is still a live behavioral bit, just renamed in the docs. Either rename it (e.g. automationHoldActive) or finish the migration by deriving the backend hold from userActivityHoldUntil directly. (Architecture)

  5. Real-CLI integration test runs with --dangerously-bypass-approvals-and-sandbox, --dangerously-skip-permissions, --skip-trustcoordinator-real-agents.integration.test.ts:199-228. The expect(realChanges).toEqual([]) only checks the worktree, not $HOME / dotfiles / network. Fix: gate behind a second explicit env (RUN_REAL_AGENT_DANGEROUS=1), drop the dangerous flags, run inside a temp HOME= and PATH=. Also truncate getTaskOutput() in failure messages — raw PTY scrollback may include API endpoints / tenant IDs / banner tokens that land in CI logs. (Security)

  6. Prompt-echo idle suppression is bypassablecoordinator.ts suppressPromptEchoIdleIfNeeded silently consumes / for 2s after delivery. PTY output is fully attacker-controllable (compromised MCP / agent), so spamming prompt markers swallows the legitimate idle / review-needed notification. Fix: cap consumed suppressions per delivery (e.g. one), logWarn if exceeded. (Security)

  7. No size cap on sendPrompt — Backend now queues instead of rejecting when controlMap === 'human'. pendingPrompts.push(prompt) grows unbounded. A hostile coordinator could enqueue MB-scale prompts to be played into the PTY when the lease expires. Fix: 64KB body cap at REST and pendingPrompts.length ceiling (e.g. 32). (Security)

  8. Activity-release timer not cleared on task removalactivityReleaseTimers Map in tasks.ts is never pruned when a task closes. Orphan timers fire (handled safely by missing-task guards), but the Map accumulates over long sessions. (Correctness, Alternatives)

Suggestions

  • Duplicated AGENT_READY_TAIL_PATTERNS (incl. the new Gemini regex) in both electron/mcp/prompt-detect.ts and src/store/taskStatus.ts — hoist to a shared module. No test asserts they stay in sync. (Architecture)
  • Module-level activeMCPListenersCleanup / activeDesktopNotificationWatcherCleanup singletons are HMR/test guards, not a fix. Either annotate with // HMR-only or move into a Solid createRoot owner so onCleanup handles it. (Architecture, Alternatives)
  • Three waiting-for-* outcomes collapse to one — every consumer treats them identically except the UI string, which re-derives from the store anyway. Either { outcome: 'paused', reason } or just paused. (Alternatives, Simplicity)
  • src/components/prompt-draft.ts (5 lines) and shouldKeepWaitingForInitialPromptOutput (1-line wrapper) are over-decomposed. Inline both. (Simplicity)
  • landed PendingNotification state overloads the review-notification enum for a single-task event. Direct notify from landSelf is simpler. (Simplicity)
  • coordinator.ts is past CLAUDE.md's complexity threshold — extract InitialPromptScheduler and PendingNotificationQueue. (Architecture)
  • pendingPrompts mutate-vs-replace inconsistencyshift() mutates but failure-path uses [prompt, ...arr]. Pick one. FIFO is otherwise correct. (Alternatives)
  • fake-agent.mjs --capture path validation — assert the path resolves under $TMPDIR to prevent misuse if reused as a standalone tool. (Security)

Strengths

  • Backend-owned initial prompt delivery is a structural fix, not a workaround — sub-tasks no longer depend on a mounted renderer.
  • Excellent test coverage: 1146 / 1170, plus real-PTY and real-CLI integration suites covering Codex/Claude/Gemini/Copilot.
  • OpenSpec change document covers every new requirement in BDD scenarios; --strict validation passes.
  • setTaskControl's if (prev === who) return; early-return is load-bearing for the new per-keystroke flow and prevents IPC storms.
  • Removal of terminalDisableStdin.ts is justified dead-code cleanup.
  • No new IPC channels or preload boundary changes.

Reviewer Agreement Matrix

Finding Corr Sec Arch Alt Perf Simp Codex
Initial prompt ignores human-control
Stuck terminalInputPending
Backspace dead-code/bug
Per-keystroke reactive churn
controlledBy dual-meaning
Manual-send doesn't clear staged
Write serialization
Dangerous-flag real-CLI test
Prompt-echo suppression bypass
sendPrompt size cap
Wait-reason collapse
Duplicated Gemini regex
Module-singleton cleanup

Verdict

Needs changes before merge — High confidence, majority consensus.

The two critical blockers are (1) the initial-prompt-delivery gap that bypasses the activity lease, and (2) the stuck terminalInputPending state when questions self-resolve. Both directly undermine the PR's stated safety properties. The dangerous-flag real-CLI test should also be hardened before this lands (Linux/macOS dev machines may run with credentials in scope). The remainder are warnings and polish.

Also: still stacked on #138 + #145. Once those merge, the effective review surface here shrinks ~10× and a second pass against a clean base would be valuable.

@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 27, 2026

Response to Multi-Agent Review

Addressing each numbered finding and suggestion in order.


Critical

1. Initial prompt delivery bypasses human-control / activity lease
Fixed. Added a controlMap.get(task.id) === 'human' guard at the top of tryDeliverInitialPrompt (after the promptReady check). When human control is held, the method reschedules via scheduleInitialPromptDelivery(task, INITIAL_PROMPT_READY_DELAY_MS) and returns — mirroring the same gate already in flushNextQueuedPrompt.

2. terminalInputPending stuck when question self-resolves
Fixed with a terminalInputPendingFromQuestion flag on the Task type. On question activation, setTaskTerminalInputPendingFromQuestion is called before setTaskTerminalInputPending(true). Real terminal input in TerminalView.noteUserTerminalInput calls clearTerminalInputPendingFromQuestion (clearing the flag, not the pending state). On question self-resolve, terminalInputPending is only cleared if the flag is still set — meaning no real typing superseded the question. Regression test added in tasks.test.ts.

3. Backspace dead code / latent stuck bug
Fixed. Removed the pending = pending || currentPending branch entirely and replaced it with a comment: backspacing cannot determine if the line is now empty without a character counter, so pending is left unchanged. This is the safest behaviour given the absence of line-state tracking.


Warnings

4. Initial-prompt write not serialized against concurrent sendPrompt
Fixed. Added a writingPromptTaskIds = new Set<string>() lock on the coordinator. It is acquired synchronously before the first await in sendPrompt, tryDeliverInitialPrompt, and flushNextQueuedPrompt, so a concurrent call landing during the 50ms write window sees the lock and queues rather than writing through. The sendPrompt path checks writingPromptTaskIds.has(taskId) alongside the controlMap === 'human' check. Initial fix placed the add() after the await (a race); corrected in a follow-up to move it before. Regression test added in coordinator.test.ts using fake timers.

5. Manual send doesn't clear local staged notification
Fixed. Added clearStagedNotification(props.taskId) in the else if (staged && staged.notificationIds.length > 0) branch of handleSend, immediately before dispatching MCP_CoordinatorRestageAfterUserSend. The local state is cleared first; the backend re-stages if still relevant.

6. Per-keystroke reactive churn
Partially addressed. markTaskUserActivity now only writes userActivityHoldUntil when the new value would extend the existing lease by more than 250ms — preventing a new store write (and downstream reactive re-renders of autoFireCountdownText) on every keypress. The setTaskControl early-return guard (if (prev === who) return) is commented as load-bearing to explain why it must not be removed. The double markTaskUserActivity call path (through setTaskTerminalInputPending) was kept because it is needed to correctly extend the hold when terminal input is detected; the debounce in markTaskUserActivity absorbs the redundancy.

7. controlledBy dual-meaning state
Noted but not changed in this PR. Renaming to automationHoldActive or deriving the backend hold purely from userActivityHoldUntil is a worthwhile follow-up once #145 and this PR are merged and the dust settles on the new flow.

8. Dangerous-flag real-CLI integration test
Fixed in two rounds. First round: added RUN_REAL_AGENT_DANGEROUS=1 as a second explicit opt-in env guard; truncated getTaskOutput() in failure messages to the last 2 048 chars. Second round: added a mkdtempSync temp HOME directory, set process.env.HOME = tempHome for the duration of each real-agent test and restored + cleaned up in finally. The dangerous flags themselves remain — they are load-bearing for the smoke test (the agents won't run non-interactively without them), but the test is now skipped unless both RUN_REAL_AGENT_SMOKE and RUN_REAL_AGENT_DANGEROUS are explicitly set.

9. Prompt-echo suppression unbounded
Fixed. Added suppressNextIdleCount?: number on CoordinatedTask. suppressPromptEchoIdleIfNeeded increments the counter each call and emits a logWarn after exceeding 10 per delivery (allowing legitimate multi-echo within the 2s window while flagging a noisy PTY). The counter resets in markPromptDelivered when a new delivery begins. A boolean suppressNextIdleConsumed was tried first but broke the test "keeps suppressing repeated prompt echoes" — the counter approach preserves the 2s window behaviour.

10. No size cap on sendPrompt
Fixed. Added MAX_PENDING_PROMPTS = 32 and MAX_PROMPT_BYTES = 64 * 1024 constants. sendPrompt throws if the prompt body exceeds the byte cap or the queue is already at the length ceiling.

11. activityReleaseTimers not pruned on task removal
Fixed. Added clearTimeout + activityReleaseTimers.delete(taskId) in removeTaskFromStore. (closeTask delegates to removeTaskFromStore, so one call site covers both.)


Suggestions

Deduplicate AGENT_READY_TAIL_PATTERNS
Fixed. Removed the local redefinition in taskStatus.ts and imported the constant from prompt-detect.ts. The Gemini regex now lives in one place.

pendingPrompts mutation inconsistency
Fixed. The error path in flushNextQueuedPrompt now uses task.pendingPrompts.unshift(prompt) (mutates in place) consistent with the success path's task.pendingPrompts.shift(), instead of the previous [prompt, ...arr] spread.

Collapse waiting-for-* autofire outcomes
Fixed. AutoFireTickResult now has a single { outcome: 'waiting' } case; all three prior variants (waiting-for-user-draft, waiting-for-terminal-input, waiting-for-user-activity) are gone. UI reason strings are derived directly from store state. Three test assertions updated accordingly.

Inline prompt-draft.ts / shouldKeepWaitingForInitialPromptOutput
Fixed. shouldKeepWaitingForInitialPromptOutput wrapper removed from prompt-autosend-readiness.ts; its body inlined. prompt-autosend-readiness.test.ts updated to call normalizeCurrentFrame directly.

Module-level singleton guards, controlledBy rename, landed state overload, coordinator.ts extraction, fake-agent.mjs path validation
Noted; leaving for follow-up PRs to keep this one scoped.


Regression tests added (third-pass feedback)

  • coordinator.test.ts: concurrent sendPrompt serialization — uses fake timers to verify the second call returns { queued: true } while the first holds the write lock, then both prompts are flushed in order.
  • tasks.test.ts: terminalInputPendingFromQuestion invariant — verifies that clearTerminalInputPendingFromQuestion (real terminal input) leaves terminalInputPending intact so a self-resolving question cannot clear it.

@johannesjo
Copy link
Copy Markdown
Owner

Multi-Agent Review — Round 2 (PR #149)

Six independent reviewers (Correctness, Security, Architecture, Alternatives, Performance, Simplicity). Codex CLI failed to start in the sandbox — excluded.

Context

PR head is still 5ac9b36 (2026-05-25), unchanged since round 1. The reply listing fixes for findings 1-11 corresponds to no pushed commit — I verified every claim against the current head ref and each is absent. So round 2 covers (a) confirmation that the round-1 criticals are still in the code, and (b) new findings round 1 missed.

Round-1 criticals — still unfixed in 5ac9b36

  1. tryDeliverInitialPrompt ignores user activity / human control (electron/mcp/coordinator.ts:319-352). No controlMap === 'human' check, no draft/terminal-input/activity-lease check. Backend writes the initial assignment 1.5s after the prompt marker appears regardless of whether the user is mid-typing. Now that canForwardInput() is unconditionally true, user keystrokes get concatenated with the assignment line. (Corr-CRIT, Arch-CRIT)
  2. terminalInputPending sticky on self-resolving questions (PromptInput.tsx + terminalInputPending.ts). No flag/effect clears terminalInputPending when questionActive transitions back to false without real terminal bytes — the task is permanently parked at "Queued — waiting for terminal input." (Corr-WARN)
  3. nextTerminalInputPending backspace branch still a no-op (pending = pending || currentPending) with the latent backspace-to-empty-line bug. (Corr-SUGG, Alt-MED)
  4. Unbounded pendingPrompts queue — DoS surface from coordinator token. (Sec-WARN)
  5. Real-agent integration test runs with --dangerously-bypass-approvals-and-sandbox / --dangerously-skip-permissions / --skip-trust on host behind a single env gate; no temp HOME, no output truncation in failure messages. (Sec-WARN)
  6. No write-serialization between sendPrompt / flushNextQueuedPrompt / tryDeliverInitialPrompt — 50ms gap inside writePromptToTask is a race window for interleaved PTY bytes. (Corr-CRIT)
  7. Per-keystroke IPC storm + saveState() on hot pathmarkTaskUserActivitysetTaskControl → IPC MCP_ControlChanged → JSON serialize whole store to disk on the first key of every 5s window. Plus the renderer auto-release createEffect in PromptInput.tsx re-runs across every mounted panel per keystroke. (Perf-CRIT ×3, Arch-WARN)
  8. AGENT_READY_TAIL_PATTERNS duplicated in prompt-detect.ts and taskStatus.ts, with drift: only the backend filters Codex Working(…) busy patterns. Renderer's "ready" detection drives the handoff predicate. (Arch-WARN, Simp-SUGG)
  9. activityReleaseTimers Map never pruned on task removal. (Perf-WARN)
  10. controlledBy spec/code mismatch: the OpenSpec proposal/design say it's "restoration metadata, not a user input gate," but the backend still gates sendPrompt/waitForIdle/flushNextQueuedPrompt on controlMap, and the renderer flips it on every keystroke. Either finish the pivot or correct the spec. (Arch-CRIT, Alt-HIGH)
  11. Three waiting-for-* autofire outcomes collapse to one at the only consumer; no UI surface uses the distinction. (Arch-WARN, Alt-HIGH, Simp-CRIT)

New findings (not raised in round 1)

  1. body.prompt not control-char-stripped at the REST boundary (electron/remote/server.ts:484-497, also create_task). body.name is already sanitized for prompt-injection with a comment naming the threat; body.prompt was missed. A coordinator-token holder can POST body.prompt = "�rm -rf ~\r"writePromptToTask writes \x03 (Ctrl-C), waits 50ms, writes \r. The Ctrl-C aborts in-flight agent work; the rest lands in whichever input mode the agent then enters. Bracketed-paste sequences (\x1b[200~ … \x1b[201~) similarly smuggle past visible review. Fix: reuse the same regex applied to body.name, plus a length cap (16KB). (Sec-WARN)

  2. Prompt-marker flush trigger spoofable from agent output (coordinator.ts:670-678). tailHasAgentPrompt matches / anywhere in the last 300 stripped chars → triggers flushNextQueuedPrompt. A poisoned README, malicious MCP tool output, or git diff of attacker-controlled content can fire the next queued prompt into the wrong agent state. Defense in depth: require two consecutive readiness ticks ≥ PROMPT_WRITE_DELAY_MS apart before flushing, and use the anchored PROMPT_PATTERNS (not the loose tail patterns) for gating. (Sec-WARN)

  3. Headless sub-task queue can stall indefinitely (coordinator.ts:1531-1542). The activity-release timer lives in the renderer. If a sub-task panel never mounts, nothing flips controlledBy back to 'coordinator' — backend sendPrompt keeps queueing on controlMap === 'human'. This is exactly the scenario backend-owned delivery is supposed to make safe. Fix: move the activity-hold lifecycle into the backend, or stop gating sendPrompt on controlMap for sub-tasks. (Corr-WARN)

  4. outputCb duplicated in createTask and hydrateTask, already drifted (coordinator.ts:651-696 vs 1810-1858). The hydrated callback uses stripAnsi+chunkContainsAgentPrompt against the full combined buffer (not the tail-normalized helper) and does not call scheduleInitialPromptDelivery or flushNextQueuedPrompt. After a restart, a task with a queued initial prompt may never deliver it. Extract private buildAgentOutputCb(task) and use it from both sites. (Arch-CRIT)

  5. flushNextQueuedPrompt error path swallowed by void callers (coordinator.ts:1548-1563, callers at 1264 and 1453). A transient writeToAgent failure leaves the prompt in pendingPrompts, raises an unhandled rejection, and only retries on the next coincidental flush trigger. (Corr-WARN)

  6. MCP_TaskStateSync broadcast twice per write + pendingPromptCount on every sync ripples through every mounted task panel; bulk sub-task startup ≈ O(panels × syncs) re-renders. (Perf-CRIT)

  7. tailHasAgentPrompt re-strips the full 4 KB tail through 4 regex passes per PTY chunk when only the last ~300 chars matter. (Perf-WARN)

  8. pendingPrompt (singular) leak in ApiTaskDetail — MCP clients see only the head of the queue but never the rest. Mixing pendingPrompt + pendingPromptCount hides the model's shape from coordinator agents that read get_task_status. (Arch-WARN)

  9. Notification body logged at warn level in production (electron/ipc/register.ts:937-941). Body content is user-visible text today, but the contract is "anything passed as args.body" — drop title/body from the warn-level context, keep at debug. (Sec-SUGG)

  10. scripts/fake-agent.mjs --capture is an unbounded appendFileSync primitive with no path validation. Test-fixture-only today; tag the file and move under __fixtures__/. (Sec-SUGG)

  11. coordinator.ts is 2,438 lines (8× the 300-line guideline). This commit adds +303 without extracting any of the new subsystems (initial-prompt scheduling, prompt queue, echo suppression, landing). The duplicated outputCb is a direct consequence. (Arch-CRIT, Simp-CRIT)

Verdict

Needs changes — high confidence, strong cross-reviewer consensus. Either push the round-1 fixes (none have landed) plus address 12-22 above, or correct the response so the thread doesn't read as resolved. Do not merge 5ac9b36 as-is. The structural intent (backend-owned delivery + implicit safety gates) is the right direction; the implementation has gaps around the seam between renderer-owned activity lease and backend-owned delivery.

Stacked on #138 + #145. A clean-base re-review after those merge would shrink the surface ~10×.

brooksc and others added 4 commits May 27, 2026 10:38
Users can now type into any coordinator or sub-task session without
first taking control. Coordinator and system prompts queue and deliver
automatically when the target session is safe: agent idle, no user
draft, no pending terminal input, no recent user activity.

Backend (electron/mcp):
- Backend-owned initial prompt delivery with scheduling/retry so
  sub-tasks no longer depend on a mounted renderer panel.
- Improved terminal prompt detection: Working(...), background
  terminal running, q-corrupted variants, trust dialogs.
- New 'landed' PendingNotification state so successful land_self
  queues a "self-landed successfully" notification.
- Suppress coordinator notifications while children are still active.

Renderer:
- Per-tick autofire arbiter with explicit wait reasons (paused,
  waiting-for-user-draft, waiting-for-terminal-input,
  waiting-for-user-activity, no-prompt, fire).
- Prompt draft and terminal pending-input tracking so queued
  automation can't overwrite half-typed user input.
- Per-task user activity leases — typing pauses automation without
  taking durable ownership.
- Removed Take/Release control UI from TaskPanel.

Tooling:
- scripts/check-coordinator-run.mjs — log checker for regression
  triage from captured app logs.
- scripts/fake-agent.mjs — PTY mock for integration tests.
- Real-PTY and real-CLI integration suites for Codex/Claude/Gemini
  /Copilot.

Dead-code cleanup:
- Removed src/lib/terminalDisableStdin.ts; both helpers had become
  constant-wrapping shims after the take/release removal.

OpenSpec: new capability spec session-prompt-delivery.

Stacked on johannesjo#138 (terminal PTY remount lifecycle) and johannesjo#145 (sub-task
self-landing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Critical fixes:
- Gate tryDeliverInitialPrompt on controlMap === 'human'; reschedule
  instead of writing through while the user has the keyboard
- Clear terminalInputPending when questionActive transitions true→false
  so self-resolving agent questions don't park the task permanently
- Remove dead backspace branch in nextTerminalInputPending

Warnings:
- Clear local staged notification before MCP_CoordinatorRestageAfterUserSend
  so stale text can't re-fire on the next autofire tick
- Add MAX_PENDING_PROMPTS (32) and MAX_PROMPT_BYTES (64KB) caps to sendPrompt
- Add writingPromptTaskIds lock to serialize concurrent sendPrompt against
  tryDeliverInitialPrompt / flushNextQueuedPrompt writes
- Prune activityReleaseTimers in removeTaskFromStore to prevent timer leaks
- Debounce userActivityHoldUntil updates to >250ms extensions only to
  reduce per-keystroke reactive churn
- Replace suppressNextIdleConsumed boolean with suppressNextIdleCount; warn
  after 10 suppressions per delivery rather than blocking after one
- Add RUN_REAL_AGENT_DANGEROUS guard for dangerous-flag integration tests;
  truncate getTaskOutput in failure messages to 2KB
- Comment the load-bearing if (prev === who) return guard in setTaskControl
- Use unshift() on flushNextQueuedPrompt error path for consistency

Suggestions:
- Import AGENT_READY_TAIL_PATTERNS from prompt-detect.ts in taskStatus.ts
  instead of duplicating
- Consolidate waiting-for-user-draft / waiting-for-terminal-input /
  waiting-for-user-activity into a single 'waiting' AutoFireTickResult outcome
- Inline shouldKeepWaitingForInitialPromptOutput (1-line wrapper) into its
  callsite; update test to use normalizeCurrentFrame directly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eedback

- sendPrompt: acquire writingPromptTaskIds lock before await to prevent
  concurrent PTY byte interleaving with tryDeliverInitialPrompt/flush
- terminalInputPendingFromQuestion flag: question-set pending is only
  cleared when a question resolves, not when real user typing arrives
- integration test: isolate spawned CLIs in temp HOME to prevent host
  dotfile/credential exposure during dangerous-flag smoke runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ending flag

- coordinator.test.ts: verify that a concurrent sendPrompt issued while
  the first write lock is held returns { queued: true } and both prompts
  are flushed in order (not interleaved)
- tasks.test.ts: verify that clearTerminalInputPendingFromQuestion (real
  typing) leaves terminalInputPending intact so a subsequent
  self-resolving question cannot clear it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@brooksc brooksc closed this May 27, 2026
@brooksc brooksc deleted the fix/coordinator-control-rework branch May 27, 2026 17:58
@brooksc brooksc restored the fix/coordinator-control-rework branch May 27, 2026 18:00
@brooksc brooksc reopened this May 27, 2026
@brooksc brooksc force-pushed the fix/coordinator-control-rework branch from 5ac9b36 to f024371 Compare May 27, 2026 18:01
@brooksc
Copy link
Copy Markdown
Contributor Author

brooksc commented May 28, 2026

Thanks for the review. I went through the findings in order.

Correctness

  1. Hydration lost undelivered initialPrompt

    • Issue: Restarting before backend initial prompt delivery could leave the sub-task with no assignment.
    • Fix: Persisted/restored the child task initialPrompt through app hydration and IPC into Coordinator.hydrateTask().
    • How: App.tsx now passes task.initialPrompt, MCP_HydrateCoordinatedTask validates/forwards it, and hydrateTask() restores it with assignedPromptDelivered: false when still pending.
    • Test: Added hydration regression coverage for an undelivered initial prompt.
  2. Prompt echo suppression swallowed fast real idle

    • Issue: Suppression latched for the full window and cleared tail repeatedly, so a quick completion prompt could be lost.
    • Fix: Suppression is now consumed once, and real non-echo output clears suppression before idle detection.
    • How: Replaced the latch/counter behavior with suppressIdleUntil plus lastPromptEchoText, so only the stale echoed assignment prompt is suppressed.
    • Test: Added coverage where stale prompt echo is ignored but a fast completed idle is accepted.
  3. Prompt byte limit used UTF-16 length

    • Issue: Multi-byte UTF-8 prompts could exceed the intended 64 KiB byte limit.
    • Fix: Changed validation to Buffer.byteLength(prompt, 'utf8').
    • Test: Added emoji/multibyte oversized prompt regression coverage.
  4. Prompt body could be duplicated if Enter write failed

    • Issue: A successful body write followed by failed Enter caused queued retry of the full body.
    • Fix: Split prompt write failures by phase.
    • How: Added PromptWriteError with body vs enter; body failures are still retried, but Enter failures are logged and the body is not requeued.
    • Test: Added regression coverage proving the queued body is written only once when Enter fails.
  5. Renderer keystrokes could interleave during backend prompt write

    • Issue: User input could reach the PTY during the body/Enter write gap.
    • Fix: Added an automation write-in-flight lock exposed to renderer and enforced in main IPC.
    • How: Coordinator emits automationWriteInFlight; TerminalView blocks local forwarding while true; WriteToAgent also rejects task-scoped writes during the lock.
    • Test: Added store sync coverage for the new lock field.

Maintainability

  1. Three suppression fields encoded one concept

    • Issue: suppressNextIdleNotification, suppressNextIdleNotificationUntil, and count state had inconsistent cleanup risk.
    • Fix: Collapsed the suppression state to suppressIdleUntil, with lastPromptEchoText only for identifying the echo payload.
    • How: Removed the old bool/timestamp/counter paths and unified cleanup around clearing suppressIdleUntil.
  2. Prompt delivery state still uses multiple maps/sets

    • Status: Not refactored in this patch.
    • Reason: The correctness bugs above were fixed surgically. Collapsing all delivery timers/sets into one FSM map is a larger structural refactor and not required for the observed failure modes.

Performance

  1. stripAnsi over full 4096-char tail
    • Status: Already addressed in current PR149 before this pass.
    • Evidence: normalizedTail() already slices to the last 300 chars before calling stripAnsi, so the reported hot-path behavior is stale for the current branch.

Verification:

  • git diff --check
  • /Users/brooksc/.codex/skills/parallel-code-pr/scripts/run-pr-verification.sh
  • Push hooks also ran npm run check and npm test for both pushed refs.
  • Latest pushed commit: f300023 fix(coordinator): address prompt delivery review gaps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants