
[#580] Add waitForAgentChattrReady after every AC spawn #583

Merged
realproject7 merged 2 commits into main from task/580-wait-for-ac-ready on Apr 27, 2026

Conversation

@realproject7
Owner

Summary

  • spawnChattr() in server/index.js now awaits waitForAgentChattrReady(port, 30000) before setting state to "running" — if AC doesn't bind within 30s, state is set to "error" instead
  • cmdStart() in bin/quadwork.js polls /api/health every 2s (up to 30s) before printing success; warns on timeout
  • On fast-start installs the wait resolves in 1-2s with no behavioral change; on slow starts it prevents the false-down detection that caused ghost agent cascades
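
As a rough illustration of the readiness wait described in this summary, here is a minimal sketch that polls a local TCP port until it accepts connections. The name `waitForPortReady`, the polling interval, and the host are hypothetical; the actual `waitForAgentChattrReady` may be implemented differently.

```javascript
const net = require("node:net");

// Hypothetical helper (not the project's waitForAgentChattrReady):
// resolves true once something accepts TCP connections on `port`,
// or false if `timeoutMs` elapses first.
function waitForPortReady(port, timeoutMs, intervalMs = 250, host = "127.0.0.1") {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve) => {
    const attempt = () => {
      const socket = net.connect({ port, host });
      let settled = false;
      const finish = (ok) => {
        if (settled) return; // ignore late events from the same socket
        settled = true;
        socket.destroy();
        if (ok) return resolve(true);
        // Give up if the next retry would land past the deadline.
        if (Date.now() + intervalMs > deadline) return resolve(false);
        setTimeout(attempt, intervalMs);
      };
      socket.setTimeout(1000, () => finish(false)); // stalled connect attempt
      socket.once("connect", () => finish(true));
      socket.once("error", () => finish(false)); // e.g. ECONNREFUSED
    };
    attempt();
  });
}

// Usage sketch: gate the "running" state on readiness (30s, as in this PR).
// const ready = await waitForPortReady(port, 30000);
// state = ready ? "running" : "error";
```

Because the promise resolves on the first successful connect, a fast start pays only for one or two probes, which is consistent with the 1-2s figure above.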

Test plan

  • Fresh install with slow AC startup: verify no auto-restart cascade
  • Existing fast-start install: verify startup completes in ~1-2s with no extra delay
  • npm run build passes

Fixes #580

🤖 Generated with Claude Code

spawnChattr() in server/index.js now awaits port readiness (30s timeout)
before setting state to "running". cmdStart() in bin/quadwork.js polls
/api/health before printing success. On fast-start installs the wait
resolves in 1-2s; on slow starts this prevents the health monitor from
triggering false-down → auto-restart → ghost agent cascade.

Fixes #580

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner Author

@realproject7 realproject7 left a comment

RE2 APPROVED

Reviewed diff against #580 acceptance criteria:

  1. spawnChattr() now awaits waitForAgentChattrReady(chattrPort, 30000) before setting state to "running" — sets "error" state on timeout
  2. cmdStart() polls /api/health every 2s for up to 30s before printing success — prints warning on timeout
  3. ✅ Fast-start environments unaffected (resolves as soon as AC binds)
  4. ✅ Build passes per Dev

Notes:

  • Correctly reuses existing waitForAgentChattrReady from agentchattr-registry.js in server/
  • bin/quadwork.js correctly inlines its own HTTP-based health check (no access to server modules)
  • All three spawnChattr() call sites (start, restart, update) properly awaited
  • Timeout behavior is graceful: warns/sets error state but doesn't crash
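
The CLI-side check noted above (poll /api/health every 2s for up to 30s, warn on timeout) could look roughly like the following. This is a sketch under assumptions: `probeHealth`, `waitForHealth`, and `reportStartup` are hypothetical names, the per-request timeout is invented, and the base URL is passed in because the port is not stated in this PR.

```javascript
const http = require("node:http");

// One probe: resolves true on a 2xx response, false on error or timeout.
function probeHealth(url, requestTimeoutMs = 1500) {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: requestTimeoutMs }, (res) => {
      res.resume(); // drain the body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    req.on("timeout", () => {
      req.destroy();
      resolve(false);
    });
    req.on("error", () => resolve(false)); // extra resolve calls are no-ops
  });
}

// Poll until healthy or until the deadline passes.
async function waitForHealth(url, { timeoutMs = 30000, intervalMs = 2000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await probeHealth(url)) return true;
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}

// Hypothetical CLI wrapper: warn on timeout rather than crash.
async function reportStartup(baseUrl) {
  const healthy = await waitForHealth(`${baseUrl}/api/health`);
  if (healthy) console.log("Server is up.");
  else console.warn("Server did not report healthy within 30s.");
  return healthy;
}
```

Keeping the probe in the CLI file itself mirrors the note that bin/quadwork.js has no access to server modules.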

No issues found.

Collaborator

@project7-interns project7-interns left a comment

Verdict: REQUEST CHANGES

Summary

The readiness wait is added in the right places, but the server start/restart paths still report success when AgentChattr never becomes ready within 30s.

Findings

  • [high] Start/restart can return success even after readiness timeout
    • File: server/index.js:919
    • Suggestion: when waitForAgentChattrReady(chattrPort, 30000) returns false, treat the spawn as failed for the caller as well. Right now spawnChattr() sets state: "error" but still returns child, so the start/restart handlers continue to send res.json({ ok: true, state: "running", pid }) at server/index.js:1007 and server/index.js:1080. Return null or throw on timeout so the HTTP response matches the actual error state.

Decision

Requesting changes because the main failure case this ticket is trying to harden still surfaces as a successful start/restart to API callers, which can leave the dashboard and automation believing AC is up when it already timed out.

When AC fails to bind within 30s, return null so HTTP handlers
correctly report the error state to callers instead of responding
with ok:true/state:running.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@project7-interns project7-interns left a comment

Verdict: APPROVE

Summary

The readiness wait is now applied in the intended spawn paths, and the timeout case correctly propagates as a failed start/restart instead of a false success.

Findings

  • None.

Decision

Approving because the server-side timeout path now returns null from spawnChattr(), which makes the start and restart handlers report the actual error state. That closes the regression from the prior revision and matches issue #580's acceptance criteria.
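
For readers following along, the propagation described above can be sketched as below. This is not the code in server/index.js: `spawnWithReadiness`, `handleStart`, the injected `deps` object, the Express-style `res.status()`, and the 500 status code are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for spawnChattr(): returns the child on success,
// or null when the readiness wait times out.
async function spawnWithReadiness({ spawn, waitReady, state, timeoutMs = 30000 }) {
  const child = spawn();
  const ready = await waitReady(timeoutMs);
  if (!ready) {
    state.value = "error";
    return null; // signal the failure to the caller, not just to shared state
  }
  state.value = "running";
  return child;
}

// Hypothetical start handler: the response mirrors the actual outcome.
async function handleStart(deps, res) {
  const child = await spawnWithReadiness(deps);
  if (!child) {
    // Status code is an assumption; the point is ok:false plus the error state.
    return res.status(500).json({ ok: false, state: deps.state.value });
  }
  return res.json({ ok: true, state: "running", pid: child.pid });
}
```

Returning null keeps the change small: the handler needs only one extra branch, whereas throwing would require a try/catch at each call site.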

@realproject7 realproject7 merged commit 546c108 into main Apr 27, 2026

Development

Successfully merging this pull request may close these issues.

[#579-1] Add waitForAgentChattrReady after every AC spawn

2 participants