Fix managed terminal launch race (bare-zsh orphans) (#408)#409
Conversation
Managed Claude/Codex terminals created by /crow-workspace intermittently came up as a bare zsh with no agent. Two race modes, both because setup.sh launched the agent out-of-band: 1. Blind-paste race: setup.sh did `new-terminal --managed` + `sleep 3` + `crow send`, pasting the launch command before the shell's line editor (zle) was live, so the keystrokes were dropped. Even `--command` was racy because registerTerminal pasted immediately without waiting for readiness. 2. No-window swallowed failure: under load `new-window` exceeded its 2s timeout, threw, and the new-terminal RPC only logged it — registering a window-less terminal while reporting success. Fix — route brand-new managed terminals through the existing race-free deferred-launch path (the one already used for restored terminals): - new-terminal RPC holds the launch command in AppState.pendingLaunchCommands and registers the window with command:nil; SessionService pastes it on the `.shellReady` sentinel (zle is live then), consuming both the pending command and autoLaunch membership so launchAgent can't also fire. The deferred paste replicates the `send` RPC's hook-config + OTEL env prep. - Bounded retry of registerTerminal with a longer per-call new-window timeout; on exhaustion mark the terminal .failed and return launch_failed instead of reporting success. - setup.sh launches via `new-terminal --command` (no sleep/send), keeps its own --rc/--name (the app can't inject for a `cd && claude` form), parses launch_failed, and polls the new `readiness` field on `list-terminals` to warn (non-fatally) if the agent never started. - Conservative reaper at launch: kill only cockpit windows that no terminal references AND that sit at a bare login shell (never a window running an agent, or one a terminal still references). Tests: pure resolveLaunch/registerWithRetry/shouldReapWindow helpers, a SessionService readiness-branch test, and a concurrent N-managed-terminal real-tmux regression asserting every window runs the launched process with zero bare shells. Serialized the TmuxBackend integration suite so its timing-sensitive tests don't contend. All package + app suites green. 🐦⬛ Generated with Claude Code, orchestrated by Crow Co-Authored-By: Claude <noreply@anthropic.com> Crow-Session: 1B4F94D3-4045-4DB0-9442-BA35B6050D6A
dgershman
left a comment
There was a problem hiding this comment.
Code & Security Review
The deferred-launch design is the right shape: routing brand-new managed terminals through the existing .shellReady sentinel that already serves restored terminals is a clean reuse of a proven path, and pendingLaunchCommands being in-memory-only correctly avoids re-pasting a stale command on relaunch. The added readiness field on list-terminals plus the launch_failed error code give setup.sh a real success/failure signal instead of the prior fire-and-forget. Test coverage is solid: pure unit tests over resolveLaunch/registerWithRetry/shouldReapWindow, an instance test over the .shellReady branch, and a concurrent N-managed-terminal real-tmux regression. Below are the issues that should land in the same round trip.
Code Quality
🟡 Yellow — hydrate-fresh path still races on the same launch command (Sources/Crow/App/SessionService.swift:309). Brand-new managed terminals now persist terminal.command = launch_cmd (the cd … && claude … form), where pre-PR they persisted nil because setup.sh deferred the launch out-of-band. The hydrate fallback at rebuildHydratedSessionWithTmux (when tmuxBinding is nil or adoptTerminal throws — tmux socket changed, window was killed, post-reboot) calls TmuxBackend.shared.registerTerminal(..., command: terminal.command, ...), and registerTerminal synchronously blasts that command via sendText(id, text: command + \"\\n\") before any sentinel fires (Packages/CrowTerminal/Sources/CrowTerminal/TmuxBackend.swift:234). That's exactly the dropped-keystrokes race this PR set out to remove — just shifted to the recovery path. Two reasonable shapes: (a) on the hydrate-fresh path, pass command: nil to registerTerminal and rely on the existing autoLaunchTerminals + launchAgent rebuild that hydrate already wires up for restored terminals, or (b) re-stage the same pendingLaunchCommands + deferred-paste machinery on this fallback. Either fix keeps the regression from sneaking back in via the recovery path.
🟡 Yellow — registerWithRetry blocks the main actor for up to 12 s (Sources/Crow/App/AppDelegate.swift:1349, Packages/CrowTerminal/Sources/CrowTerminal/TmuxController.swift:26). await MainActor.run { … } wraps the whole RPC body, and registerWithRetry(attempts: 3) { … newWindowTimeout: 4.0 } is invoked synchronously inside it. TmuxController.run blocks the calling thread until the subprocess returns or kill() fires at timeout, so a worst-case three timed-out attempts pins the main actor for ~12 s — long enough to beachball the UI and stall every other MainActor.run-dispatched RPC (e.g. concurrent setup.sh invocations from /crow-batch-workspace). The previous behavior was 2 s and broken; the new behavior is reliable but visibly stalls under the same load this PR is trying to harden. Hoist the registration off the main actor (Task.detached returning the TmuxBinding, then re-enter MainActor for the state mutations), or at minimum bound the total budget (e.g. attempts: 2 with backoff and a shorter per-call timeout) so the worst case stays inside the previous user-perceived window.
🟡 Yellow — prepareManagedLaunchText duplicates the send RPC handler (Sources/Crow/App/SessionService.swift:415 vs Sources/Crow/App/AppDelegate.swift:1478). The OTEL env-var list, the agent-token guard, and the hook-config write are copy-pasted; the PR even adds a // NOTE: keep the two in sync comment, which is a maintenance trap rather than a design. Extract a single helper (e.g. on the agent or a shared static on AppDelegate) that returns (textWithExports, didLaunch) and call it from both sites. Bonus: the send RPC sets terminalReadiness = .agentLaunched only when the launch token is present, whereas pasteDeferredLaunch sets it unconditionally — a shared helper makes that asymmetry visible and easy to align.
🟡 Yellow — DeferredLaunchTests mutates singleton TmuxBackend.shared.onReadinessChanged without .serialized (Tests/CrowTests/DeferredLaunchTests.swift:1). Each test calls service.wireTerminalReadiness(), which overwrites the shared closure, and then fires TmuxBackend.shared.onReadinessChanged?(terminalID, …). If Swift Testing runs the two @MainActor tests in parallel (the default), one test can stomp the other's closure before its sentinel fires — same flake mode you just hit on the TmuxBackend integration suite and serialized (Packages/CrowTerminal/Tests/CrowTerminalTests/TmuxBackendTests.swift:14). Add .serialized to the @Suite so this stays green under concurrent test execution.
Security Review
Strengths
pendingLaunchCommandsis documented and implemented as in-memory-only (Packages/CrowCore/Sources/CrowCore/AppState.swift:226), so a launch command can't be replayed across app restarts.close-terminalclearspendingLaunchCommandsalongsideterminalReadinessandautoLaunchTerminals(Sources/Crow/App/AppDelegate.swift:1428) — no orphan command can survive a terminal removal.- The reaper is conservative:
shouldReapWindowonly fires when the window is both unreferenced and sitting at a bare login shell, andreapUnboundCockpitWindowsdefensively unionsbindings.valuesso a window adopted/registered this run is structurally unreapable (Packages/CrowTerminal/Sources/CrowTerminal/TmuxBackend.swift:386). The 8 s delay inAppDelegateis belt-and-suspenders given hydrate runs synchronously on the main actor.
Concerns
- None new — no auth, secret-handling, or injection surface is touched.
terminal_readinessinsetup.shparsescrow-owned JSON it just produced, so the regex-on-JSON shape is fragile but not a security boundary.
Summary Table
| Color | Meaning | Verdict effect |
|---|---|---|
| Red | Must fix | Request changes |
| Yellow | Should fix | Request changes |
| Green | Consider | Approve allowed |
Recommendation: Request Changes — driven by [0 Red, 4 Yellow, several Green] findings. The deferred-launch shape is correct; the four Yellows are the kind that compound (a regression on a recovery path + a main-actor stall + duplicated launch prep + a flaky test) and worth catching in this round rather than as #408 follow-ups.
|
Live finding while setting up this PR's own workspace — a window-index-reuse bug the reaper exposes. (Observed from Sequence on the fix build (app started 22:44):
Root cause hypothesis: after the reaper kills windows, tmux window-index reuse lets a new terminal's stored window index collide with a surviving window (here the Manager's). The deferred-launch-on- Why it didn't corrupt the Manager: the launch is sentinel-gated, so the deferred paste never fired into window :4. Good — but if any code path did paste on a weaker signal, it would inject Repro: with orphans present, open a session so the reaper runs, then create a new managed terminal — it can land on a recycled index pointing at a live window. Also: the app stopped running shortly after (socket now refused). Couldn't tell from this log alone whether it crashed or was quit; flagging in case it's related. |
… serialization
Four Yellow findings from the review:
- Hydrate-fresh recovery race: brand-new managed terminals no longer persist
the launch command on the SessionTerminal row (persist `registerCommand`,
nil when deferring). The launch lives only in in-memory pendingLaunchCommands,
so the hydrate-fresh fallback can't blind-paste it into a not-yet-ready shell
on relaunch. Restored managed terminals relaunch via autoLaunch/launchAgent as
before — and this no longer relies on the fragile `command.contains("claude")`
nil-ing for non-claude agents.
- registerWithRetry main-actor stall: tightened to 2 attempts × 3s (was 3 × 4s),
capping worst-case main-actor stall at ~6s instead of ~12s; common case is one
fast attempt.
- Duplicated launch prep: extracted AppDelegate.prepareAgentLaunchText (hook
config + OTEL env + agent-token guard), returning (text, didLaunch). The `send`
RPC and pasteDeferredLaunch now share it, removing the copy-paste and the
"keep in sync" comment; the `.agentLaunched` gating is now consistent.
- DeferredLaunchTests: marked .serialized — the instance tests overwrite the
singleton onReadinessChanged closure, which can stomp under parallel execution.
All package + app suites green.
🐦⬛ Generated with Claude Code, orchestrated by Crow
Co-Authored-By: Claude <noreply@anthropic.com>
Crow-Session: 1B4F94D3-4045-4DB0-9442-BA35B6050D6A
|
Thanks for the thorough review — all four Yellows addressed in f1eee8d. 🟡 Hydrate-fresh recovery race ( 🟡 🟡 🟡 All package + app suites pass locally. Re-requesting review. |
dgershman
left a comment
There was a problem hiding this comment.
Code & Security Review
Critical Issues
None.
Security Review
Strengths:
- Path-traversal guard on
cwd(isPathWithinDevRoot) preserved on the new-terminal path. - No new trust boundary: the launch command is still authored by an authenticated local socket client; OTEL env vars carry no secrets.
- In-memory only
pendingLaunchCommands— never persisted, so a relaunch can't re-paste a stale command (explicitly noted in AppState.swift:226-230 and defensively cleared on adopt at SessionService.swift:258-260). - Persisted row deliberately stores
registerCommand(nil for deferred launches) rather than the raw launch command, preventing the hydrate-fresh fallback inrehydrateTerminalSurfacefrom blind-pasting it into a not-yet-ready shell (AppDelegate.swift:1323-1335).
Concerns: None.
Code Quality
Strengths:
- Correctly routes brand-new managed terminals through the same race-free
.shellReadypath the restored path already used — eliminates the duplicate launch logic. prepareAgentLaunchTextis a clean extraction: both thesendRPC and the deferred-launch paste now share hook-config + OTEL prep, so they can't drift.- State seeding (
terminalReadiness,pendingLaunchCommands,autoLaunchTerminals) happens beforeregisterTerminal, with a clear comment about why (AppDelegate.swift:1339-1351). Both producer (new-terminalRPC underMainActor.run) and consumer (onReadinessChangedclosure inSessionService, both@MainActor-isolated) are serialized on the main actor, so the sentinel fire is guaranteed to see populated state and theterminalsmap. pasteDeferredLaunchconsumes bothpendingLaunchCommandsandautoLaunchTerminalsatomically (SessionService.swift:386-387) —launchAgent's own membership check is now belt-and-suspenders against a double launch..timedOutdoes NOT consume the pending command (SessionService.swift:352-359 + DeferredLaunchTests.swift:114-126), preserving Retry/foreground re-arm recovery.- The orphan reaper is properly conservative: pure
shouldReapWindowpolicy is unit-tested, real list unionskeepWindowIndiceswith in-memorybindings.values, and the bare-shell whitelist excludes everything that could plausibly be an agent or session anchor. - The concurrent N-managed-terminal real-tmux test directly exercises the regression and asserts on zero bare shells — the right shape of test for this bug.
.serializedonTmuxBackendTestsis well-justified in the comment (real-tmux integration tests contend for CPU during shell startup).- The CLI poll surface (
launch_failedonnew-terminal+readinessonlist-terminals) gives setup.sh honest signal without breaking the existing JSON shape.
Consider (Green)
AppDelegate.swift:439-444— the 8s post-launch sleep before the reaper runs is empirical. If app launch + initial main-thread work is very slow,appState.terminals[*].tmuxBinding?.windowIndexmay not yet reflect a freshly re-registered window. Thebindingsunion inreapUnboundCockpitWindowscovers this in practice, so the risk is low; a follow-up could replace the timer with an explicit "rehydration settled" signal if you want to remove the magic constant.setup.sh terminal_readinessparses the flat list-terminals JSON withgrep -o '{[^{}]*}'. Fine for the current schema (no nested objects per terminal), but worth a note if future fields add nesting.registerWithRetryretries immediately with no backoff. Intentional per the inline comment (2×3s budget), so this is preserved by design — but if a future change widens the budget it'd be worth revisiting.
Tests Run
PR description states all package + app suites pass. Spot-checked test structure — pure helpers (resolveLaunch, registerWithRetry, shouldReapWindow) are well-isolated and the readiness-branch test seeds state exactly the way the RPC does.
Summary Table
| Color | Meaning | Verdict effect |
|---|---|---|
| Red | Must fix | Request changes |
| Yellow | Should fix | Request changes |
| Green | Consider | Approve allowed |
Recommendation: Approve — driven by [0 Red, 0 Yellow, 3 Green] findings. The fix correctly identifies the two race modes (blind-paste vs window-less terminal), routes brand-new terminals through the existing race-free .shellReady path, surfaces failures honestly via launch_failed + readiness, and adds a conservative reaper for already-leaked windows. The Green items are stylistic/forward-looking, not blocking.
Closes #408
Problem
Managed Claude/Codex terminals created by
/crow-workspace(and the batch variant) intermittently came up as a barezshshell with no agent running (~40% under load; 26/63 cockpit windows were orphaned on the reporting host). Two race modes, both becausesetup.shlaunched the agent out-of-band:setup.shdidcrow new-terminal --managed→sleep 3→crow send "<cd && claude …>". The paste landed before the shell's line editor (zle) was live, so the keystrokes were dropped → bare zsh. The naive--commandfix was also racy:TmuxBackend.registerTerminalspawned the wrapper as the window root and then immediatelysendText(command)with no readiness wait.TmuxController.newWindow's tmux subprocess exceeded the 2 s timeout, threwTmuxError.timedOut, and thenew-terminalRPC only logged it — registering a window-less terminal while returning success.Fix
The app already had a race-free launch (
SessionService.launchAgentfired on the.shellReadysentinel), but only for restored terminals. This routes brand-new managed terminals through the same path:new-terminalholds the command inAppState.pendingLaunchCommandsand registers the window withcommand: nil; the deferred paste happens inwireTerminalReadinesson.shellReady(whenzleis live), consuming both the pending command andautoLaunchTerminalsmembership solaunchAgentcan't double-fire. The deferred paste replicates thesendRPC's hook-config + OTEL env prep, so hook routing and telemetry are preserved.registerTerminalwith a longer per-callnew-windowbudget; on exhaustion the terminal is marked.failedand the RPC returnslaunch_failedinstead of reporting success.setup.shlaunches vianew-terminal --command(nosleep/send), keeps its own--rc/--name(the app can't inject for acd && claudeform), parseslaunch_failed, and polls a newreadinessfield oncrow list-terminalsto warn non-fatally if the agent never started.Tests
resolveLaunch,registerWithRetry,shouldReapWindow.SessionServicereadiness-branch test (deferred paste consumes pending + autoLaunch;.timedOutdoes not).TmuxBackendintegration suite so its timing-sensitive tests don't contend.All package + app suites pass (
make testequivalent run per package).🤖 Generated with Claude Code