fix: connection stability — stop node reconnect storms, fix bootstrap token handling#287

Merged
shanselman merged 2 commits into openclaw:master from ranjeshj:fix/connection-stability-and-pairing on May 6, 2026

@ranjeshj (Contributor) commented May 6, 2026

Summary

Fixes several connection management issues that could cause the gateway to be flooded with reconnect attempts, and improves the setup code / bootstrap token flow.

Changes

🔴 Critical: Node client reconnect storm during pairing

When a node device connects and requires pairing approval (NOT_PAIRED), the node client kept reconnecting on its backoff schedule (1s→2s→4s→...→60s, then every 60s indefinitely), generating a new pairing request on every attempt. This floods the gateway and can trigger rate limiting.

Root cause: WindowsNodeClient never overrode ShouldAutoReconnect(), so reconnect always ran. Additionally, OnDisconnected() cleared the _isPendingApproval flag before the reconnect check could use it.

Fix: Added a _pairingBlocked flag (marked volatile for cross-thread visibility) that persists across disconnect/error. Reconnect is blocked until approval is received or the client is reinitialized. Also added detection for terminal auth errors (token mismatch, origin not allowed, rate limit) — the node client previously had no detection for these.
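
To illustrate why the flag has to outlive the disconnect handler, here is a minimal, self-contained sketch. It is not the code from this PR: `ClientBaseSketch`, `NodeClientSketch`, `SimulateDisconnect`, and the `On...` event methods are hypothetical stand-ins, and treating a rate limit as a reconnect blocker is an assumption based on the description above.

```csharp
using System;

var node = new NodeClientSketch();
Console.WriteLine(node.SimulateDisconnect()); // True: an ordinary drop may reconnect

node.OnNotPaired();                           // gateway answered NOT_PAIRED
Console.WriteLine(node.SimulateDisconnect()); // False: blocked while awaiting approval
Console.WriteLine(node.IsPendingApproval);    // False: this flag was cleared by the disconnect

node.OnPairingApproved();
Console.WriteLine(node.SimulateDisconnect()); // True: reconnect is allowed again

// Hypothetical stand-in for the shared base class; only the reconnect hook is modeled.
abstract class ClientBaseSketch
{
    // Asked before a reconnect attempt is scheduled. The default allows it.
    protected virtual bool ShouldAutoReconnect() => true;

    protected virtual void OnDisconnected() { }

    // Simulates a socket drop: run the disconnect hook, then make the reconnect decision.
    public bool SimulateDisconnect()
    {
        OnDisconnected();
        return ShouldAutoReconnect();
    }
}

sealed class NodeClientSketch : ClientBaseSketch
{
    private bool _isPendingApproval;        // cleared on every disconnect
    private volatile bool _pairingBlocked;  // persists across disconnect/error
    private volatile bool _rateLimited;     // assumption: also blocks reconnect until hello-ok

    public bool IsPendingApproval => _isPendingApproval;

    public void OnNotPaired() { _isPendingApproval = true; _pairingBlocked = true; }
    public void OnPairingApproved() => _pairingBlocked = false;
    public void OnRateLimited() => _rateLimited = true;
    public void OnHelloOk() => _rateLimited = false;

    protected override void OnDisconnected() => _isPendingApproval = false;

    protected override bool ShouldAutoReconnect() => !_pairingBlocked && !_rateLimited;
}
```

If the reconnect decision read `_isPendingApproval` instead, the second check would return True, because the disconnect hook has already cleared that flag by the time the decision is made.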

🔴 Critical: Bootstrap token persisted as operator token

When applying a setup code (QR), the single-use bootstrap token was saved to Settings.Token. On app restart, this stale bootstrap token was used as a regular auth token, causing immediate "token mismatch" errors and reconnect storms against the gateway.

Fix: Bootstrap tokens are now saved only to Settings.BootstrapToken. InitializeGatewayClient falls back to BootstrapToken when Token is empty and bootstrap handoff mode is requested. The onboarding TestConnection also no longer persists bootstrap values into Settings.Token.
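
The token selection described above could look roughly like the following sketch. `SettingsSketch`, `SetupCode.Apply`, `TokenSelection.Resolve`, and the `bootstrapHandoff` parameter are hypothetical names for illustration; only `Settings.Token` and `Settings.BootstrapToken` come from the PR, and the real `InitializeGatewayClient` logic may differ.

```csharp
using System;

var settings = new SettingsSketch();
SetupCode.Apply(settings, bootstrapToken: "bootstrap-abc");
Console.WriteLine(settings.Token.Length);     // 0: the operator token slot stays empty

var (token, isBootstrap) = TokenSelection.Resolve(settings, bootstrapHandoff: true);
Console.WriteLine($"{token} {isBootstrap}");  // bootstrap-abc True

var (none, _) = TokenSelection.Resolve(settings, bootstrapHandoff: false);
Console.WriteLine(none.Length);               // 0: no usable token outside the handoff flow

// Hypothetical settings holder with the two token slots named in the PR.
sealed class SettingsSketch
{
    public string Token { get; set; } = "";
    public string BootstrapToken { get; set; } = "";
}

static class SetupCode
{
    // Applying a setup code writes only the bootstrap slot, never Token.
    public static void Apply(SettingsSketch settings, string bootstrapToken)
        => settings.BootstrapToken = bootstrapToken;
}

static class TokenSelection
{
    // Prefer the operator token. Fall back to the bootstrap token only when
    // Token is empty and the caller asked for the bootstrap handoff flow.
    public static (string Token, bool IsBootstrap) Resolve(SettingsSketch settings, bool bootstrapHandoff)
    {
        if (!string.IsNullOrEmpty(settings.Token))
            return (settings.Token, false);
        if (bootstrapHandoff && !string.IsNullOrEmpty(settings.BootstrapToken))
            return (settings.BootstrapToken, true);
        return ("", false);
    }
}
```

Keeping the two slots separate means a restart never presents a spent single-use token as the operator token.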

🟡 High: Operator client leak on reinitialize

InitializeGatewayClient() created a new client without disposing the previous one. The old client's WebSocket and reconnect loop continued running, creating duplicate connections.

Fix: Added _gatewayClient?.Dispose() before creating the new client.
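
A minimal sketch of the dispose-before-replace pattern, assuming the client implements `IDisposable`. `GatewayClientSketch` and `AppSketch` are hypothetical stand-ins, and the real `Dispose()` would close the WebSocket and stop the reconnect loop rather than set a flag.

```csharp
#nullable enable
using System;

var app = new AppSketch();
var first = app.InitializeGatewayClient();
var second = app.InitializeGatewayClient();   // reinitialize, e.g. after a settings change

Console.WriteLine(first.IsDisposed);   // True: the old client was shut down
Console.WriteLine(second.IsDisposed);  // False: only the new client stays live

// Hypothetical client; IsDisposed exists only so the sketch can be observed.
sealed class GatewayClientSketch : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

sealed class AppSketch
{
    private GatewayClientSketch? _gatewayClient;

    public GatewayClientSketch InitializeGatewayClient()
    {
        // Dispose the previous client (if any) before replacing the reference,
        // so it cannot keep running alongside the new one.
        _gatewayClient?.Dispose();
        var client = new GatewayClientSketch();
        _gatewayClient = client;
        return client;
    }
}
```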

🟡 High: No backoff jitter

Both the operator and node clients used deterministic backoff delays, so they'd hit the gateway at the same time on every retry.

Fix: Added 0-25% random jitter to reconnect delays (Random.Shared is thread-safe in .NET 6+).
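
As a rough sketch of the jitter calculation: the 1s base, 60s cap, and 0-25% range come from this description, while `Backoff.NextDelay` and the choice to add jitter after applying the cap (so a delay can reach just under 75s) are assumptions rather than the actual `WebSocketClientBase` code.

```csharp
using System;

for (int attempt = 0; attempt < 8; attempt++)
{
    TimeSpan delay = Backoff.NextDelay(attempt);
    Console.WriteLine($"attempt {attempt}: {delay.TotalMilliseconds:F0} ms");
}

static class Backoff
{
    private const double BaseMs = 1_000;            // first retry after about 1s
    private const double MaxMs = 60_000;            // deterministic part is capped at 60s
    private const double MaxJitterFraction = 0.25;  // up to 25% added on top

    public static TimeSpan NextDelay(int attempt)
    {
        // Clamp the exponent so very large attempt counts cannot overflow.
        int exponent = Math.Clamp(attempt, 0, 16);
        double baseDelay = Math.Min(MaxMs, BaseMs * Math.Pow(2, exponent));

        // Random.Shared (.NET 6+) is safe to call from multiple threads.
        // NextDouble() returns a value in [0, 1).
        double jitter = baseDelay * MaxJitterFraction * Random.Shared.NextDouble();
        return TimeSpan.FromMilliseconds(baseDelay + jitter);
    }
}
```

Because each client draws its own random offset, the operator and node clients stop retrying in lockstep. Tests that assert on delays then need a tolerance window instead of an exact value.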

Other fixes

  • Token PasswordBox → TextBox — users can now see the token they pasted in the setup wizard and hub connection page
  • Clear stale tray data on disconnect — sessions, channels, nodes, and models are cleared when status transitions to Disconnected/Error, preventing the tray menu from showing old data
  • Rate limit cleared on successful connect: _rateLimited is reset on hello-ok so transient rate limits don't require an app restart
  • Onboarding setup code field — removed auto-paste-on-focus (disruptive UX that overwrites user input); state only updates on valid decode to prevent focus loss during typing
  • Relaunch setup wizard — added button to Debug page for re-running the onboarding flow

Testing

  • All 1,633 tests pass (1,220 shared + 393 tray)
  • Adversarial review across 6 models (Claude Opus 4.7, Sonnet 4.5, Opus 4.6, Haiku 4.5, GPT-5.5, GPT-5.4) — all findings addressed
  • Manual testing on ARM64 with local + WSL gateways

Files changed (11 files, +104/-51)

| File | Changes |
| --- | --- |
| WindowsNodeClient.cs | ShouldAutoReconnect override, _pairingBlocked, rate limit detection |
| WebSocketClientBase.cs | Backoff jitter |
| App.xaml.cs | Client dispose, bootstrap fallback in init, clear stale data, setup wizard action |
| Onboarding/Pages/ConnectionPage.cs | Bootstrap token fix, auto-paste removal, focus fix |
| Pages/ConnectionPage.xaml | PasswordBox → TextBox |
| Pages/ConnectionPage.xaml.cs | Bootstrap token fix |
| Pages/DebugPage.xaml + .cs | Relaunch setup wizard button |
| Windows/HubWindow.xaml.cs | OpenSetupAction property |
| Windows/SetupWizardWindow.cs | PasswordBox → TextBox |
| WebSocketClientBaseTests.cs | Test tolerance for jitter |

ranjeshj and others added 2 commits May 6, 2026 07:35
… token handling

Critical fixes for connection management bugs introduced in PR openclaw#272:

1. Node reconnect storm during pairing (WindowsNodeClient)
   - Added ShouldAutoReconnect() override with _pairingBlocked flag
   - Flag survives OnDisconnected() (which clears _isPendingApproval)
   - Added rate-limit detection for terminal auth errors
   - Marked _pairingBlocked/_rateLimited as volatile for thread safety
   - Clear _rateLimited on successful hello-ok (transient, not permanent)

2. Backoff jitter (WebSocketClientBase)
   - Added 0-25% random jitter to prevent thundering herd when
     operator + node clients reconnect simultaneously

3. Client leak on reinitialize (App.xaml.cs)
   - Added _gatewayClient?.Dispose() before creating new client
   - Old clients were keeping reconnect loops alive as zombies

4. Bootstrap token not saved as Settings.Token
   - Setup code decoder no longer persists bootstrap to Settings.Token
   - Prevents reconnect storms on app restart with stale bootstrap token
   - TestConnection skips writing bootstrap value to Settings.Token
   - InitializeGatewayClient falls back to BootstrapToken for bootstrap flow

5. Token PasswordBox → TextBox
   - Users can see what they pasted (SetupWizardWindow + ConnectionPage)

6. Clear stale tray data on disconnect
   - Sessions/channels/nodes/models cleared when disconnected/error
   - Tray menu no longer shows old data alongside 'Disconnected'

7. Onboarding UX fixes
   - Removed disruptive auto-paste-on-focus from setup code field
   - Setup code state only updates on valid decode (prevents focus loss)
   - Added 'Relaunch First-Run Setup' button to Debug page

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@shanselman shanselman merged commit 2fcfe76 into openclaw:master May 6, 2026
8 checks passed
@ranjeshj ranjeshj deleted the fix/connection-stability-and-pairing branch May 12, 2026 00:40