[#580] Add waitForAgentChattrReady after every AC spawn#583
[#580] Add waitForAgentChattrReady after every AC spawn#583realproject7 merged 2 commits intomainfrom
Conversation
spawnChattr() in server/index.js now awaits port readiness (30s timeout) before setting state to "running". cmdStart() in bin/quadwork.js polls /api/health before printing success. On fast-start installs the wait resolves in 1-2s; on slow starts this prevents the health monitor from triggering false-down → auto-restart → ghost agent cascade. Fixes #580 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
realproject7
left a comment
There was a problem hiding this comment.
RE2 APPROVED ✓
Reviewed diff against #580 acceptance criteria:
- ✅
spawnChattr()now awaitswaitForAgentChattrReady(chattrPort, 30000)before setting state to "running" — sets "error" state on timeout - ✅
cmdStart()polls/api/healthevery 2s for up to 30s before printing success — prints warning on timeout - ✅ Fast-start environments unaffected (resolves as soon as AC binds)
- ✅ Build passes per Dev
Notes:
- Correctly reuses existing
waitForAgentChattrReadyfromagentchattr-registry.jsin server/ bin/quadwork.jscorrectly inlines its own HTTP-based health check (no access to server modules)- All three
spawnChattr()call sites (start, restart, update) properly awaited - Timeout behavior is graceful: warns/sets error state but doesn't crash
No issues found.
project7-interns
left a comment
There was a problem hiding this comment.
Verdict: REQUEST CHANGES
Summary
The readiness wait is added in the right places, but the server start/restart paths still report success when AgentChattr never becomes ready within 30s.
Findings
- [high] Start/restart can return success even after readiness timeout
- File:
server/index.js:919 - Suggestion: when
waitForAgentChattrReady(chattrPort, 30000)returns false, treat the spawn as failed for the caller as well. Right nowspawnChattr()setsstate: "error"but still returnschild, so thestart/restarthandlers continue to sendres.json({ ok: true, state: "running", pid })atserver/index.js:1007andserver/index.js:1080. Returnnullor throw on timeout so the HTTP response matches the actual error state.
- File:
Decision
Requesting changes because the main failure case this ticket is trying to harden still surfaces as a successful start/restart to API callers, which can leave the dashboard and automation believing AC is up when it already timed out.
When AC fails to bind within 30s, return null so HTTP handlers correctly report the error state to callers instead of responding with ok:true/state:running. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
project7-interns
left a comment
There was a problem hiding this comment.
Verdict: APPROVE
Summary
The readiness wait is now applied in the intended spawn paths, and the timeout case correctly propagates as a failed start/restart instead of a false success.
Findings
- None.
Decision
Approving because the server-side timeout path now returns null from spawnChattr(), which makes the start and restart handlers report the actual error state. That closes the regression from the prior revision and matches issue #580's acceptance criteria.
Summary
spawnChattr()inserver/index.jsnowawaitswaitForAgentChattrReady(port, 30000)before setting state to "running" — if AC doesn't bind within 30s, state is set to "error" insteadcmdStart()inbin/quadwork.jspolls/api/healthevery 2s (up to 30s) before printing success; warns on timeoutTest plan
npm run buildpassesFixes #580
🤖 Generated with Claude Code