Retry command center connection indefinitely #76

Merged
mlund01 merged 3 commits into main from claude/dazzling-yonath-e14b6a
Apr 23, 2026

mlund01 (Owner) commented Apr 22, 2026

Summary

  • connectWithRetry now loops forever at a 3s interval instead of giving up after 10 attempts (or 1 with auto_reconnect off). Squadron will keep trying to reach the command center until it succeeds or the process is shut down.
  • Added wsbridge.Client.Done() so the retry loop can observe shutdown cleanly and bail out on Ctrl-C / SIGTERM.
  • auto_reconnect no longer caps how long we keep trying once we've decided to try. As first written, it still gated whether we attempt to reconnect after a connection is lost; the final commit in this PR drops that check, so reconnect is effectively always-on.
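
For readers who want a picture of the retry behaviour described above, here is a minimal sketch. It is not the code from this PR: the function signature, the `dial` callback, and the `errShutdown` sentinel are assumptions made for illustration. The interval is a parameter here so the sketch is easy to exercise, whereas the PR describes a fixed 3s interval. The `done` channel plays the role of what `wsbridge.Client.Done()` is described as exposing.

```go
package main

import (
	"errors"
	"time"
)

// errShutdown is a hypothetical sentinel for "shutdown observed while retrying".
var errShutdown = errors.New("client shut down during connect retry")

// connectWithRetry calls dial until it succeeds or done is closed.
// There is no attempt cap: the only exits are success and shutdown.
func connectWithRetry(done <-chan struct{}, dial func() error, interval time.Duration) error {
	for {
		// Check for shutdown first so a closed client never dials again.
		select {
		case <-done:
			return errShutdown
		default:
		}

		if err := dial(); err == nil {
			return nil
		}

		// Wait out the interval, but wake immediately on shutdown.
		timer := time.NewTimer(interval)
		select {
		case <-done:
			timer.Stop()
			return errShutdown
		case <-timer.C:
		}
	}
}
```

A caller would pass something like `client.Done()` as `done` and `3*time.Second` as `interval`. Selecting on `done` while waiting is what lets Ctrl-C exit promptly instead of sleeping through the rest of the interval.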

Test plan

  • ./squadron engage with no command center running — squadron logs repeated retry attempts and connects when the command center comes up.
  • Ctrl-C during the retry loop exits cleanly instead of spinning.
  • Kill the command center mid-session with auto_reconnect = true — squadron reconnects when it comes back, no matter how long it was down.

mlund01 added 3 commits April 21, 2026 23:00
Previously squadron gave up after 10 connection attempts (or just 1 when
auto_reconnect was off). If the command center was slow to start or
briefly unreachable, squadron would bail out and sit there disconnected.

Now connectWithRetry loops forever at a 3s interval, only aborting when
the client is shut down (Ctrl-C / SIGTERM). Surfaces client.Done() so
the retry loop can observe shutdown cleanly.

SIGTERM during the now-indefinite connection retry loop would hard-kill
the process before cleanup ran, orphaning the command center subprocess
and MCP/plugin children — especially noticeable in background mode,
where `squadron disengage` relies on graceful shutdown.

Register signal handling before connectWithRetry. The handler closes the
client, which closes client.Done() and unblocks the retry loop so the
normal cleanup path runs. If shutdown fires during retry, skip the rest
of startup and fall through to cleanup immediately.
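
The ordering described in this commit message (register the signal handler first, let it close the client, and have the retry loop watch `Done()`) might look roughly like the sketch below. The `client` type, `newClient`, `closeOnSignal`, and `shutdownRequested` are hypothetical stand-ins, not squadron's actual `wsbridge.Client` API. Only `os/signal` and `sync` calls from the Go standard library are used.

```go
package main

import (
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// client is a hypothetical stand-in for the real websocket client.
type client struct {
	once sync.Once
	done chan struct{}
}

func newClient() *client { return &client{done: make(chan struct{})} }

// Close is idempotent; closing done is what unblocks any retry loop.
func (c *client) Close() { c.once.Do(func() { close(c.done) }) }

// Done is closed once the client has been shut down.
func (c *client) Done() <-chan struct{} { return c.done }

// closeOnSignal must be called before the connection retry loop starts.
// It closes the client on SIGINT/SIGTERM and returns a func that
// unregisters the handler.
func closeOnSignal(c *client) (stop func()) {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
	quit := make(chan struct{})
	go func() {
		select {
		case <-sigCh:
			c.Close()
		case <-quit:
		}
	}()
	var once sync.Once
	return func() {
		once.Do(func() {
			signal.Stop(sigCh)
			close(quit)
		})
	}
}

// shutdownRequested reports, without blocking, whether Done() is closed.
// Startup code can call it after the retry loop returns and skip
// straight to cleanup when it is true.
func shutdownRequested(c *client) bool {
	select {
	case <-c.Done():
		return true
	default:
		return false
	}
}
```

Once `signal.Notify` is in place, SIGTERM no longer kills the process by default, so the normal cleanup path gets a chance to run and stop child processes.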

After a natural websocket drop, Run() flipped c.connected to false
and returned an error. The reconnect goroutine then read !IsConnected()
as "shutting down" and exited, never reaching the AutoReconnect
branch. Reconnect was effectively dead code for the common case.

Rewrite the watchdog loop to gate exit on the shutdown channel (the
actual shutdown signal), not on IsConnected(). Drop the AutoReconnect
check: as long as squadron is running, it should always reconnect.
AutoReconnect in config is now effectively always-on.

Also stop cancelling c.ctx on register failure — that left the client
permanently unable to reconnect once registration hiccuped.
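
As an illustration of the watchdog change described in this commit message, the sketch below exits only when the shutdown channel is closed and treats a dropped connection as a reason to reconnect. The names `superviseConnection`, `run`, `backoff`, and `errDropped` are hypothetical. The returned session count exists only to make the behaviour observable and is not something the PR describes.

```go
package main

import (
	"errors"
	"time"
)

// errDropped is a hypothetical error for a websocket that closed on its own.
var errDropped = errors.New("websocket dropped")

// superviseConnection keeps calling run until shutdown is closed.
// run is assumed to block for the lifetime of one connection and to
// return when that connection ends. It returns how many sessions ran.
func superviseConnection(shutdown <-chan struct{}, run func() error, backoff time.Duration) int {
	sessions := 0
	for {
		// The shutdown channel is the only exit condition. Connection
		// state is deliberately not consulted, because a natural drop
		// also leaves the client "not connected".
		select {
		case <-shutdown:
			return sessions
		default:
		}

		_ = run() // a drop is expected; fall through and reconnect
		sessions++

		// Pause before reconnecting, but wake immediately on shutdown.
		timer := time.NewTimer(backoff)
		select {
		case <-shutdown:
			timer.Stop()
			return sessions
		case <-timer.C:
		}
	}
}
```

The point of the design is that "not connected" and "shutting down" are different states, so only the second one ends the loop.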
@mlund01 mlund01 merged commit c209707 into main Apr 23, 2026
1 check passed