fix: connection stability — stop node reconnect storms, fix bootstrap token handling#287
Merged
shanselman merged 2 commits intoMay 6, 2026
Conversation
… token handling Critical fixes for connection management bugs introduced in PR openclaw#272: 1. Node reconnect storm during pairing (WindowsNodeClient) - Added ShouldAutoReconnect() override with _pairingBlocked flag - Flag survives OnDisconnected() (which clears _isPendingApproval) - Added rate-limit detection for terminal auth errors - Marked _pairingBlocked/_rateLimited as volatile for thread safety - Clear _rateLimited on successful hello-ok (transient, not permanent) 2. Backoff jitter (WebSocketClientBase) - Added 0-25% random jitter to prevent thundering herd when operator + node clients reconnect simultaneously 3. Client leak on reinitialize (App.xaml.cs) - Added _gatewayClient?.Dispose() before creating new client - Old clients were keeping reconnect loops alive as zombies 4. Bootstrap token not saved as Settings.Token - Setup code decoder no longer persists bootstrap to Settings.Token - Prevents reconnect storms on app restart with stale bootstrap token - TestConnection skips writing bootstrap value to Settings.Token - InitializeGatewayClient falls back to BootstrapToken for bootstrap flow 5. Token PasswordBox → TextBox - Users can see what they pasted (SetupWizardWindow + ConnectionPage) 6. Clear stale tray data on disconnect - Sessions/channels/nodes/models cleared when disconnected/error - Tray menu no longer shows old data alongside 'Disconnected' 7. Onboarding UX fixes - Removed disruptive auto-paste-on-focus from setup code field - Setup code state only updates on valid decode (prevents focus loss) - Added 'Relaunch First-Run Setup' button to Debug page Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes several connection management issues that could cause the gateway to be flooded with reconnect attempts, and improves the setup code / bootstrap token flow.
Changes
🔴 Critical: Node client reconnect storm during pairing
When a node device connects and requires pairing approval (
NOT_PAIRED), the node client would reconnect aggressively (1s→2s→4s→...→60s forever), generating a new pairing request on every attempt. This floods the gateway and can trigger rate limiting.Root cause:
WindowsNodeClientnever overrodeShouldAutoReconnect(), so reconnect always ran. Additionally,OnDisconnected()cleared the_isPendingApprovalflag before the reconnect check could use it.Fix: Added a
_pairingBlockedflag (markedvolatilefor cross-thread visibility) that persists across disconnect/error. Reconnect is blocked until approval is received or the client is reinitialized. Also added detection for terminal auth errors (token mismatch,origin not allowed,rate limit) — the node client previously had no detection for these.🔴 Critical: Bootstrap token persisted as operator token
When applying a setup code (QR), the single-use bootstrap token was saved to
Settings.Token. On app restart, this stale bootstrap token was used as a regular auth token, causing immediate "token mismatch" errors and reconnect storms against the gateway.Fix: Bootstrap tokens are now saved only to
Settings.BootstrapToken.InitializeGatewayClientfalls back toBootstrapTokenwhenTokenis empty and bootstrap handoff mode is requested. The onboardingTestConnectionalso no longer persists bootstrap values intoSettings.Token.🟡 High: Operator client leak on reinitialize
InitializeGatewayClient()created a new client without disposing the previous one. The old client's WebSocket and reconnect loop continued running, creating duplicate connections.Fix: Added
_gatewayClient?.Dispose()before creating the new client.🟡 High: No backoff jitter
Both the operator and node clients used deterministic backoff delays, so they'd hit the gateway at the same time on every retry.
Fix: Added 0-25% random jitter to reconnect delays (
Random.Sharedis thread-safe in .NET 6+).Other fixes
_rateLimitedis reset on hello-ok so transient rate limits don't require an app restartTesting
Files changed (11 files, +104/-51)
WindowsNodeClient.csShouldAutoReconnectoverride,_pairingBlocked, rate limit detectionWebSocketClientBase.csApp.xaml.csOnboarding/Pages/ConnectionPage.csPages/ConnectionPage.xamlPages/ConnectionPage.xaml.csPages/DebugPage.xaml+.csWindows/HubWindow.xaml.csOpenSetupActionpropertyWindows/SetupWizardWindow.csWebSocketClientBaseTests.cs