fix(onboarding): skip wizard for paired operators and make "Keep my setup" actually dismiss#340
Conversation
…etup" dismiss Two bugs reported by Scott Hanselman against master: 1. Tray app launched the onboarding wizard on every start even when the user already had a working remote-gateway operator configuration. StartupSetupState.RequiresSetup only short-circuited for node mode (EnableNodeMode + node device token) or MCP-only mode, so an operator with a non-default gateway URL + stored device token still got the wizard popped at OnLaunched. Fix: add an operator-mode short-circuit that requires BOTH a stored operator device token AND a non-default GatewayUrl (guards against orphan tokens after uninstall and against half-finished setups that never picked a gateway target). 2. On the SetupWarning page warn-and-confirm UI, clicking "Keep my setup" only toggled in-page state. Because OnboardingWindow defaulted SetupPath = Advanced when existing config was detected, the global nav-bar Next button stayed enabled, so the user was one click from advancing into ConnectionPage anyway. Fix: add OnboardingState.Dismiss() that raises a new Dismissed event; OnboardingWindow handles it by setting a _dismissedWithoutCompletion guard, then Close()ing the window. OnClosed now skips TryCompleteOnboarding when that guard is set so OnboardingCompleted is NOT fired and existing settings / gateway connection are preserved. SetupWarningPage.CancelReplace calls Props.Dismiss(). Belt-and-suspenders: drop the auto-default of SetupPath = Advanced for existing-config users in OnboardingWindow. With SetupPath left null, the nav-bar Next button is disabled on SetupWarning so the user MUST pick "Replace my setup", "Keep my setup", or "Advanced setup" explicitly — no accidental Next-into-setup path remains. Tests: - StartupSetupStateTests: operator paired with remote gateway returns false; operator token + default URL still returns true (stale-token guard); non-default URL alone (no token) still returns true. - OnboardingStateTests: Dismiss fires Dismissed but NOT Finished; safe without subscribers. Validation: - ./build.ps1 succeeded - Shared.Tests: 1548 passed, 28 skipped - Tray.Tests: 1175 passed Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes from a Hanselman adversarial code review (Opus + Codex parallel): 1. Per-gateway tokens (Codex HIGH) — RequiresSetup only scanned the legacy root identity (device-key-ed25519.json at the dataPath root). Modern pairings via DeviceIdentityStore write tokens at <dataPath>/gateways/<gatewayId>/device-key-ed25519.json (see GatewayConnectionManager._activeIdentityPath = perGatewayIdentityDir). Operators paired post-GatewayRegistry would still see the wizard pop on every launch. Fix: HasAnyOperatorDeviceToken now scans the legacy root AND every gateways/* subdir. 2. SSH-tunnel false positive (Codex HIGH) — SSH topology routes via ws://127.0.0.1:LocalPort and the user typically leaves GatewayUrl at default. HasNonDefaultGatewayUrl alone returned false. Fix: HasAnyConfiguredGatewayTarget treats (UseSshTunnel + non-empty SshTunnelHost) as a configured target. 3. NodeMode + MCP precedence regression (Codex MEDIUM) — original code was 'if (NodeMode && nodeToken) false; return !MCP;' which let MCP-only mode bypass setup even when NodeMode was accidentally true without a node token. The first patch made NodeMode short-circuit first, breaking that precedence. Fix: check EnableMcpServer BEFORE EnableNodeMode so MCP wins, matching original semantics. 4. _dismissedWithoutCompletion stuck on Close exception (Opus MEDIUM) — the flag was set BEFORE Close(); if Close() threw, the flag stayed true and TryCompleteOnboarding was permanently suppressed for the window's lifetime, wedging the user. Fix: reset the flag in the catch block so the X-button / Finish path still works. 5. DefaultGatewayUrl duplication (Opus HIGH) — the constant existed in both StartupSetupState and OnboardingExistingConfigGuard with only a comment promising sync. Fix: promote OnboardingExistingConfigGuard.DefaultGatewayUrl to public const (single source of truth) and reference it from StartupSetupState. Added DefaultGatewayUrl_MatchesGuardConstant invariant test. 6. CancelReplace UI flash (Opus MEDIUM) — setConfirmingReplace(false) was called immediately before Props.Dismiss(), causing a brief re-render of the 'Set up locally' button before the window closed. Fix: drop the dead state change. Tests added (5): - RequiresSetup_ReturnsFalse_WhenSshTunnelConfiguredWithStoredToken - RequiresSetup_ReturnsTrue_WhenSshTunnelEnabledButNoHostConfigured - RequiresSetup_ReturnsFalse_WhenOperatorTokenStoredOnlyInPerGatewayDir - RequiresSetup_ReturnsFalse_WhenMcpEnabledEvenWithNodeModeAndNoNodeToken - DefaultGatewayUrl_MatchesGuardConstant Validation: - ./build.ps1 succeeded - Shared.Tests: 1548 passed, 28 skipped - Tray.Tests: 1180 passed (5 new); all 16 onboarding-fix tests green Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks for jumping on this. The overall direction looks right and CI is green, but I would hold this PR for one more fix before merge. I ran a deeper review against the final two-commit diff. The main startup/dismiss flow looks sound: MCP precedence is preserved, per-gateway token scanning was added for startup, SSH tunnel mode is accounted for, and the Blocking / should fix before mergeExisting-config guard still misses per-gateway operator tokens
HasOperatorDeviceToken: DeviceIdentity.HasStoredDeviceToken(_identityDataPath)That only sees the legacy root Why this matters:
Suggested fix:
Example test shape: [Fact]
public void HasExistingConfiguration_ReturnsTrue_WhenOperatorTokenStoredOnlyInPerGatewayDir()
{
using var temp = TempSettings.Create();
var perGatewayDir = Path.Combine(temp.Path, "gateways", "gw-abc");
Directory.CreateDirectory(perGatewayDir);
StoreDeviceToken(perGatewayDir);
var settings = new SettingsManager(temp.Path);
var guard = new OnboardingExistingConfigGuard(settings, temp.Path, setupStatePath: Path.Combine(temp.Path, "setup-state.json"));
var summary = guard.GetSummary();
Assert.True(summary.HasOperatorDeviceToken);
Assert.True(summary.HasAny);
}Lower-confidence / consider while touching this areaStartup token scan may treat orphan gateway identity dirs as usable
The safer long-term model is probably:
I would not necessarily block this PR on the full registry-aware refactor if you want to keep it small, but it is worth either addressing now or tracking as a follow-up.
|
…smiss Addresses Scott Hanselman's review on PR openclaw#340: Blocking fix: - OnboardingExistingConfigGuard.GetSummary().HasOperatorDeviceToken only checked DeviceIdentity.HasStoredDeviceToken on the legacy root path. Modern pairings store the operator token at <dataPath>/gateways/<id>/device-key-ed25519.json via DeviceIdentityStore, so a fresh-paired user opening Setup/Reconfigure could overwrite a working gateway without seeing the "Replace my setup / Keep my setup" warning. - Extracted the per-gateway scan (previously private to StartupSetupState) to OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken as the single source of truth. StartupSetupState.HasUsableOperatorConfiguration and GetSummary() both call it now, so the startup auto-launch decision and the in-wizard guard always agree on what counts as paired. Hardening (Scott's lower-confidence suggestion): - OnboardingState.Dismiss() is now idempotent. A double-click or repeated handler invocation no longer fires the lifecycle signal twice. Tests added: - OnboardingExistingConfigGuardTests.HasExistingConfiguration_ReturnsTrue_ WhenOperatorTokenStoredOnlyInPerGatewayDir — Scott's exact test shape. - OnboardingStateTests.Dismiss_IsIdempotent_FiresDismissedAtMostOnce. Follow-up tracked separately (per Scott's note): - Make the startup token scan registry-aware (prefer the active GatewayRegistry record's identity dir over arbitrary gateways/* dirs) to avoid orphan dirs from suppressing onboarding for a different active gateway. Validation: - ./build.ps1 succeeded - Shared.Tests: 1548 passed, 28 skipped - Tray.Tests: 1182 passed (+2 new) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thanks @shanselman — addressed in f65b742 (now pushed to PR). Blocking fix Extracted the per-gateway scan to OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken as the single source of truth (was private to StartupSetupState). Both StartupSetupState.HasUsableOperatorConfiguration and OnboardingExistingConfigGuard.GetSummary().HasOperatorDeviceToken call it now, so the startup decision and the in-wizard guard agree on what counts as paired. Used your exact test shape — Hardening (also done) Made Deferred follow-up The orphan-gateway-dir concern (registry-aware token resolution preferring the active record's identity dir over arbitrary Validation
|
|
I'm still clicking keep and it's still being ignored, and it's still sending me to the next page. The same dialog pops up when I run it again, every time. @indierawk2k2 |
|
Follow-up from manual repro after merge: the Keep my setup dismissal path itself works, but startup setup detection still missed one local-node shape. Sanitized repro state:
Broken-run log excerpt: Root cause: #340 added the per-gateway scan for operator tokens, but I opened follow-up PR #375 with the fix and regression tests. After rebuilding x64, relaunch no longer opens onboarding; the post-launch log shows normal node auth instead: |
…+ wire V2 dismiss Our V2 branch was cut at a03c454 (before PR openclaw#340 landed in master). PR openclaw#340 fixed two real bugs in the legacy onboarding flow: "Keep my setup" didn't actually dismiss the wizard, and ExistingConfigGuard only scanned the legacy root identity dir for operator tokens (per-gateway tokens under gateways/<id>/device-key-ed25519.json were missed). PR openclaw#375 (Scott's, still open) extends the per-gateway scan symmetrically to the node-token role, fixing a startup-relaunch-reopens-onboarding bug for local node-mode profiles whose node token lives only per-gateway. This commit: 1. Merges origin/master to absorb PR openclaw#340's three commits (Dismissed event/method on OnboardingState, OnboardingWindow.OnOnboardingDismissed handler, SetupWarningPage CancelReplace -> Props.Dismiss, removed default-to-Advanced in returning-user path, HasAnyOperatorDeviceToken per-gateway scan, StartupSetupState rewrite). Auto-resolved a single conflict in OnboardingWindow.cs via the ORT strategy. 2. Applies PR openclaw#375's symmetric per-gateway scan locally: - Refactored OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken to delegate to a new HasAnyDeviceTokenForRole(dataPath, role). - GetSummary now uses HasAnyDeviceTokenForRole for the node side too, not just operator (the asymmetric defect that PR openclaw#375 catches). - StartupSetupState.HasStoredNodeDeviceToken now delegates to the guard's per-gateway-aware helper so the startup auto-launch decision and the in-wizard guard agree. 3. Wires V2 equivalents of PR openclaw#340's Dismiss machinery so the V2 redesigned UI gets the same fix, not just the legacy UI: - OnboardingV2State adds Dismissed event + idempotent Dismiss() method (matches legacy OnboardingState.Dismiss semantics: log, fire once). - V2 WelcomePage's "Keep my setup" handler now calls Props.Dismiss() instead of just toggling setConfirmingReplace(false). Mirrors legacy SetupWarningPage CancelReplace post-PR-openclaw#340. - OnboardingV2Bridge surfaces a Dismissed event (subscribes to V2 state, forwards to host). - OnboardingWindow's V2 mount now subscribes to bridge.Dismissed and reuses the existing _dismissedWithoutCompletion flag + Close pattern from PR openclaw#340's OnOnboardingDismissed so OnClosed skips TryCompleteOnboarding the same way for both legacy and V2 dismiss paths. Validation: build.ps1 -> all 4 projects green Shared.Tests -> 1548 passed / 28 skipped / 0 failed Tray.Tests -> 1178 passed / 0 failed (was 1164 -- +14 from PR openclaw#340's OnboardingExistingConfigGuardTests + OnboardingStateTests + StartupSetupStateTests new coverage we just absorbed) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…+ wire V2 dismiss Our V2 branch was cut at a03c454 (before PR openclaw#340 landed in master). PR didn't actually dismiss the wizard, and ExistingConfigGuard only scanned the legacy root identity dir for operator tokens (per-gateway tokens under gateways/<id>/device-key-ed25519.json were missed). PR openclaw#375 (Scott's, still open) extends the per-gateway scan symmetrically to the node-token role, fixing a startup-relaunch-reopens-onboarding bug for local node-mode profiles whose node token lives only per-gateway. This commit: 1. Merges origin/master to absorb PR openclaw#340's three commits (Dismissed event/method on OnboardingState, OnboardingWindow.OnOnboardingDismissed handler, SetupWarningPage CancelReplace -> Props.Dismiss, removed default-to-Advanced in returning-user path, HasAnyOperatorDeviceToken per-gateway scan, StartupSetupState rewrite). Auto-resolved a single conflict in OnboardingWindow.cs via the ORT strategy. 2. Applies PR openclaw#375's symmetric per-gateway scan locally: - Refactored OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken to delegate to a new HasAnyDeviceTokenForRole(dataPath, role). - GetSummary now uses HasAnyDeviceTokenForRole for the node side too, not just operator (the asymmetric defect that PR openclaw#375 catches). - StartupSetupState.HasStoredNodeDeviceToken now delegates to the guard's per-gateway-aware helper so the startup auto-launch decision and the in-wizard guard agree. 3. Wires V2 equivalents of PR openclaw#340's Dismiss machinery so the V2 redesigned UI gets the same fix, not just the legacy UI: - OnboardingV2State adds Dismissed event + idempotent Dismiss() method (matches legacy OnboardingState.Dismiss semantics: log, fire once). - V2 WelcomePage's "Keep my setup" handler now calls Props.Dismiss() instead of just toggling setConfirmingReplace(false). Mirrors legacy SetupWarningPage CancelReplace post-PR-openclaw#340. - OnboardingV2Bridge surfaces a Dismissed event (subscribes to V2 state, forwards to host). - OnboardingWindow's V2 mount now subscribes to bridge.Dismissed and reuses the existing _dismissedWithoutCompletion flag + Close pattern from PR openclaw#340's OnOnboardingDismissed so OnClosed skips TryCompleteOnboarding the same way for both legacy and V2 dismiss paths. Validation: build.ps1 -> all 4 projects green Shared.Tests -> 1548 passed / 28 skipped / 0 failed Tray.Tests -> 1178 passed / 0 failed (was 1164 -- +14 from PR openclaw#340's OnboardingExistingConfigGuardTests + OnboardingStateTests + StartupSetupStateTests new coverage we just absorbed) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reported by @shanselman against latest master:
Root cause
StartupSetupState.RequiresSetuponly short-circuited for node mode (EnableNodeMode + node device token) or MCP-only mode. An operator with a non-default gateway URL + stored device token still got the wizard popped atOnLaunched.SetupWarningPage.CancelReplaceonly toggled in-page state. Worse,OnboardingWindowdefaultedSetupPath = SetupPath.Advancedwhenever existing config was detected, so the global nav-bar Next button stayed enabled and the user was one click from advancing intoConnectionPage.Fix (commit b3b5b2f)
StartupSetupState.RequiresSetup: returning users with both an operator device token and a configured gateway target skip the wizard.OnboardingState.Dismiss()/Dismissedevent.OnboardingWindowlistens, sets a_dismissedWithoutCompletionguard, andClose()s —OnClosedskipsTryCompleteOnboardingwhen guarded soOnboardingCompleteddoes NOT fire and existing settings/gateway connection are preserved.SetupWarningPage.CancelReplacecallsProps.Dismiss().SetupPath = SetupPath.AdvancedinOnboardingWindow. WithSetupPathleft null, the global Next button is disabled on SetupWarning, so the user MUST pick Replace my setup, Keep my setup, or Advanced setup explicitly.Adversarial review fixes (commit 8338c5d)
Ran a Hanselman-style dual-model review (Claude Opus + GPT Codex) before sending. The two models surfaced complementary issues; all six actionable findings are folded in:
gateways/<id>/device-key-ed25519.json)HasAnyOperatorDeviceTokenUseSshTunnel + SshTunnelHostcounted as a configured targetNodeMode=true + MCP=true + no node token)Close()exception left_dismissedWithoutCompletion=truepermanentlyDefaultGatewayUrlduplicated across two filesOnboardingExistingConfigGuard.DefaultGatewayUrl(public const) + invariant testCancelReplaceUI flash from deadsetConfirmingReplace(false)Tests added
StartupSetupStateTests: operator paired with remote returns false; operator+default URL still requires setup; non-default URL alone still requires setup; null/whitespace URL; SSH tunnel + token; SSH enabled but no host; per-gateway-only token; MCP+NodeMode+no node token bypass;DefaultGatewayUrl_MatchesGuardConstantinvariant.OnboardingStateTests:DismissfiresDismissed; does NOT fireFinished; safe with no handler.Validation
./build.ps1✅OpenClaw.Shared.Tests: 1548 passed, 28 skippedOpenClaw.Tray.Tests: 1180 passed (+5 new)cc @shanselman — please give it a look when you can.