Skip to content

fix(onboarding): skip wizard for paired operators and make "Keep my setup" actually dismiss#340

Merged
shanselman merged 3 commits into
openclaw:masterfrom
indierawk2k2:fix/onboarding-skip-existing-config
May 14, 2026
Merged

fix(onboarding): skip wizard for paired operators and make "Keep my setup" actually dismiss#340
shanselman merged 3 commits into
openclaw:masterfrom
indierawk2k2:fix/onboarding-skip-existing-config

Conversation

@indierawk2k2
Copy link
Copy Markdown
Contributor

Reported by @shanselman against latest master:

  1. Tray app pops the onboarding wizard at launch even though there is already a working configuration with a connection to a remote gateway.
  2. "Keep my setup" doesn't keep the setup — clicking it on the warn-and-confirm UI doesn't dismiss the wizard; the user can still advance into setup via the global Next button.

Root cause

  1. StartupSetupState.RequiresSetup only short-circuited for node mode (EnableNodeMode + node device token) or MCP-only mode. An operator with a non-default gateway URL + stored device token still got the wizard popped at OnLaunched.
  2. SetupWarningPage.CancelReplace only toggled in-page state. Worse, OnboardingWindow defaulted SetupPath = SetupPath.Advanced whenever existing config was detected, so the global nav-bar Next button stayed enabled and the user was one click from advancing into ConnectionPage.

Fix (commit b3b5b2f)

  • New operator-mode short-circuit in StartupSetupState.RequiresSetup: returning users with both an operator device token and a configured gateway target skip the wizard.
  • New OnboardingState.Dismiss() / Dismissed event. OnboardingWindow listens, sets a _dismissedWithoutCompletion guard, and Close()s — OnClosed skips TryCompleteOnboarding when guarded so OnboardingCompleted does NOT fire and existing settings/gateway connection are preserved.
  • SetupWarningPage.CancelReplace calls Props.Dismiss().
  • Belt-and-suspenders: dropped the auto-default of SetupPath = SetupPath.Advanced in OnboardingWindow. With SetupPath left null, the global Next button is disabled on SetupWarning, so the user MUST pick Replace my setup, Keep my setup, or Advanced setup explicitly.

Adversarial review fixes (commit 8338c5d)

Ran a Hanselman-style dual-model review (Claude Opus + GPT Codex) before sending. The two models surfaced complementary issues; all six actionable findings are folded in:

Finding Source Resolution
Per-gateway tokens missed (gateways/<id>/device-key-ed25519.json) Codex HIGH Scan added in HasAnyOperatorDeviceToken
SSH-tunnel users falsely re-prompted (loopback URL) Codex HIGH UseSshTunnel + SshTunnelHost counted as a configured target
MCP precedence regression (NodeMode=true + MCP=true + no node token) Codex MEDIUM MCP check moved before NodeMode
Close() exception left _dismissedWithoutCompletion=true permanently Opus MEDIUM Flag reset in catch block
DefaultGatewayUrl duplicated across two files Opus HIGH Promoted to OnboardingExistingConfigGuard.DefaultGatewayUrl (public const) + invariant test
CancelReplace UI flash from dead setConfirmingReplace(false) Opus MEDIUM Removed

Tests added

  • StartupSetupStateTests: operator paired with remote returns false; operator+default URL still requires setup; non-default URL alone still requires setup; null/whitespace URL; SSH tunnel + token; SSH enabled but no host; per-gateway-only token; MCP+NodeMode+no node token bypass; DefaultGatewayUrl_MatchesGuardConstant invariant.
  • OnboardingStateTests: Dismiss fires Dismissed; does NOT fire Finished; safe with no handler.

Validation

  • ./build.ps1
  • OpenClaw.Shared.Tests: 1548 passed, 28 skipped
  • OpenClaw.Tray.Tests: 1180 passed (+5 new)

cc @shanselman — please give it a look when you can.

Mike Harsh and others added 2 commits May 13, 2026 13:44
…etup" dismiss

Two bugs reported by Scott Hanselman against master:

1. Tray app launched the onboarding wizard on every start even when the
   user already had a working remote-gateway operator configuration.
   StartupSetupState.RequiresSetup only short-circuited for node mode
   (EnableNodeMode + node device token) or MCP-only mode, so an operator
   with a non-default gateway URL + stored device token still got the
   wizard popped at OnLaunched.

   Fix: add an operator-mode short-circuit that requires BOTH a stored
   operator device token AND a non-default GatewayUrl (guards against
   orphan tokens after uninstall and against half-finished setups that
   never picked a gateway target).

2. On the SetupWarning page warn-and-confirm UI, clicking "Keep my setup"
   only toggled in-page state. Because OnboardingWindow defaulted
   SetupPath = Advanced when existing config was detected, the global
   nav-bar Next button stayed enabled, so the user was one click from
   advancing into ConnectionPage anyway.

   Fix: add OnboardingState.Dismiss() that raises a new Dismissed event;
   OnboardingWindow handles it by setting a _dismissedWithoutCompletion
   guard, then Close()ing the window. OnClosed now skips
   TryCompleteOnboarding when that guard is set so OnboardingCompleted
   is NOT fired and existing settings / gateway connection are preserved.
   SetupWarningPage.CancelReplace calls Props.Dismiss().

   Belt-and-suspenders: drop the auto-default of SetupPath = Advanced for
   existing-config users in OnboardingWindow. With SetupPath left null,
   the nav-bar Next button is disabled on SetupWarning so the user MUST
   pick "Replace my setup", "Keep my setup", or "Advanced setup"
   explicitly — no accidental Next-into-setup path remains.

Tests:
- StartupSetupStateTests: operator paired with remote gateway returns false;
  operator token + default URL still returns true (stale-token guard);
  non-default URL alone (no token) still returns true.
- OnboardingStateTests: Dismiss fires Dismissed but NOT Finished; safe
  without subscribers.

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1175 passed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes from a Hanselman adversarial code review (Opus + Codex parallel):

1. Per-gateway tokens (Codex HIGH) — RequiresSetup only scanned the legacy
   root identity (device-key-ed25519.json at the dataPath root). Modern
   pairings via DeviceIdentityStore write tokens at
   <dataPath>/gateways/<gatewayId>/device-key-ed25519.json (see
   GatewayConnectionManager._activeIdentityPath = perGatewayIdentityDir).
   Operators paired post-GatewayRegistry would still see the wizard pop on
   every launch. Fix: HasAnyOperatorDeviceToken now scans the legacy root
   AND every gateways/* subdir.

2. SSH-tunnel false positive (Codex HIGH) — SSH topology routes via
   ws://127.0.0.1:LocalPort and the user typically leaves GatewayUrl at
   default. HasNonDefaultGatewayUrl alone returned false. Fix:
   HasAnyConfiguredGatewayTarget treats (UseSshTunnel + non-empty
   SshTunnelHost) as a configured target.

3. NodeMode + MCP precedence regression (Codex MEDIUM) — original code
   was 'if (NodeMode && nodeToken) false; return !MCP;' which let
   MCP-only mode bypass setup even when NodeMode was accidentally true
   without a node token. The first patch made NodeMode short-circuit
   first, breaking that precedence. Fix: check EnableMcpServer BEFORE
   EnableNodeMode so MCP wins, matching original semantics.

4. _dismissedWithoutCompletion stuck on Close exception (Opus MEDIUM) —
   the flag was set BEFORE Close(); if Close() threw, the flag stayed
   true and TryCompleteOnboarding was permanently suppressed for the
   window's lifetime, wedging the user. Fix: reset the flag in the
   catch block so the X-button / Finish path still works.

5. DefaultGatewayUrl duplication (Opus HIGH) — the constant existed in
   both StartupSetupState and OnboardingExistingConfigGuard with only a
   comment promising sync. Fix: promote
   OnboardingExistingConfigGuard.DefaultGatewayUrl to public const
   (single source of truth) and reference it from StartupSetupState.
   Added DefaultGatewayUrl_MatchesGuardConstant invariant test.

6. CancelReplace UI flash (Opus MEDIUM) — setConfirmingReplace(false)
   was called immediately before Props.Dismiss(), causing a brief
   re-render of the 'Set up locally' button before the window closed.
   Fix: drop the dead state change.

Tests added (5):
- RequiresSetup_ReturnsFalse_WhenSshTunnelConfiguredWithStoredToken
- RequiresSetup_ReturnsTrue_WhenSshTunnelEnabledButNoHostConfigured
- RequiresSetup_ReturnsFalse_WhenOperatorTokenStoredOnlyInPerGatewayDir
- RequiresSetup_ReturnsFalse_WhenMcpEnabledEvenWithNodeModeAndNoNodeToken
- DefaultGatewayUrl_MatchesGuardConstant

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1180 passed (5 new); all 16 onboarding-fix tests green

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@shanselman
Copy link
Copy Markdown
Contributor

Thanks for jumping on this. The overall direction looks right and CI is green, but I would hold this PR for one more fix before merge.

I ran a deeper review against the final two-commit diff. The main startup/dismiss flow looks sound: MCP precedence is preserved, per-gateway token scanning was added for startup, SSH tunnel mode is accounted for, and the Keep my setup dismiss path avoids firing the completion pipeline.

Blocking / should fix before merge

Existing-config guard still misses per-gateway operator tokens

StartupSetupState.HasAnyOperatorDeviceToken(...) now correctly scans both the legacy root identity and modern per-gateway identity directories under gateways/<gateway-id>/device-key-ed25519.json. However, OnboardingExistingConfigGuard.GetSummary() still only checks:

HasOperatorDeviceToken: DeviceIdentity.HasStoredDeviceToken(_identityDataPath)

That only sees the legacy root device-key-ed25519.json. Modern fresh pairings store the operator token in the per-gateway identity directory, so a user with only a per-gateway token can be treated as having no existing operator identity by the onboarding replacement guard.

Why this matters:

  • Startup may now correctly skip onboarding for these users.
  • But if they manually open Setup/Reconfigure, or reach local setup through another path, SetupWarningPage / LocalSetupProgressPage can fail to show/block on the existing-config warning.
  • That means a modern paired user could accidentally overwrite a working gateway setup without the intended “Replace my setup / Keep my setup” protection.

Suggested fix:

  • Reuse the same token detection semantics from StartupSetupState in OnboardingExistingConfigGuard, or extract a shared helper so both paths agree.
  • Add a test that creates only gateways/<id>/device-key-ed25519.json and asserts OnboardingExistingConfigGuard.HasExistingConfiguration() is true and GetSummary().HasOperatorDeviceToken is true.

Example test shape:

[Fact]
public void HasExistingConfiguration_ReturnsTrue_WhenOperatorTokenStoredOnlyInPerGatewayDir()
{
    using var temp = TempSettings.Create();
    var perGatewayDir = Path.Combine(temp.Path, "gateways", "gw-abc");
    Directory.CreateDirectory(perGatewayDir);
    StoreDeviceToken(perGatewayDir);

    var settings = new SettingsManager(temp.Path);
    var guard = new OnboardingExistingConfigGuard(settings, temp.Path, setupStatePath: Path.Combine(temp.Path, "setup-state.json"));

    var summary = guard.GetSummary();
    Assert.True(summary.HasOperatorDeviceToken);
    Assert.True(summary.HasAny);
}

Lower-confidence / consider while touching this area

Startup token scan may treat orphan gateway identity dirs as usable

StartupSetupState.HasAnyOperatorDeviceToken(...) currently accepts any token under gateways/*. That fixes modern pairings, but it can also make an orphan identity directory count as usable even if it does not correspond to the active gateway record in gateways.json or the configured target.

The safer long-term model is probably:

  1. Prefer the active GatewayRegistry record and check that record’s identity directory.
  2. Fall back to legacy root identity only for migration/back-compat.
  3. Avoid letting an arbitrary leftover gateways/<old-id> directory suppress onboarding for a different active/configured gateway.

I would not necessarily block this PR on the full registry-aware refactor if you want to keep it small, but it is worth either addressing now or tracking as a follow-up.

OnboardingState.Dismiss() could be idempotent

Dismiss() currently raises Dismissed every time it is called. The current window guards make the blast radius small, but since this is now a lifecycle signal, it would be more robust to make it idempotent:

private bool _dismissed;

public void Dismiss()
{
    if (_dismissed) return;
    _dismissed = true;
    Logger.Info("[OnboardingState] Dismiss invoked (user chose to keep existing setup)");
    Dismissed?.Invoke(this, EventArgs.Empty);
}

This is not a blocker, just a hardening suggestion.

Summary

Please fix the per-gateway identity detection in OnboardingExistingConfigGuard before merge. That is the one actionable gap both review passes agreed on. After that, this should be in good shape.

…smiss

Addresses Scott Hanselman's review on PR openclaw#340:

Blocking fix:
- OnboardingExistingConfigGuard.GetSummary().HasOperatorDeviceToken only
  checked DeviceIdentity.HasStoredDeviceToken on the legacy root path.
  Modern pairings store the operator token at
  <dataPath>/gateways/<id>/device-key-ed25519.json via DeviceIdentityStore,
  so a fresh-paired user opening Setup/Reconfigure could overwrite a
  working gateway without seeing the "Replace my setup / Keep my setup"
  warning.
- Extracted the per-gateway scan (previously private to StartupSetupState)
  to OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken as the single
  source of truth. StartupSetupState.HasUsableOperatorConfiguration and
  GetSummary() both call it now, so the startup auto-launch decision and
  the in-wizard guard always agree on what counts as paired.

Hardening (Scott's lower-confidence suggestion):
- OnboardingState.Dismiss() is now idempotent. A double-click or repeated
  handler invocation no longer fires the lifecycle signal twice.

Tests added:
- OnboardingExistingConfigGuardTests.HasExistingConfiguration_ReturnsTrue_
  WhenOperatorTokenStoredOnlyInPerGatewayDir — Scott's exact test shape.
- OnboardingStateTests.Dismiss_IsIdempotent_FiresDismissedAtMostOnce.

Follow-up tracked separately (per Scott's note):
- Make the startup token scan registry-aware (prefer the active
  GatewayRegistry record's identity dir over arbitrary gateways/* dirs)
  to avoid orphan dirs from suppressing onboarding for a different
  active gateway.

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1182 passed (+2 new)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@indierawk2k2
Copy link
Copy Markdown
Contributor Author

Thanks @shanselman — addressed in f65b742 (now pushed to PR).

Blocking fix

Extracted the per-gateway scan to OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken as the single source of truth (was private to StartupSetupState). Both StartupSetupState.HasUsableOperatorConfiguration and OnboardingExistingConfigGuard.GetSummary().HasOperatorDeviceToken call it now, so the startup decision and the in-wizard guard agree on what counts as paired. Used your exact test shape — HasExistingConfiguration_ReturnsTrue_WhenOperatorTokenStoredOnlyInPerGatewayDir — and asserted both summary.HasOperatorDeviceToken and HasExistingConfiguration().

Hardening (also done)

Made OnboardingState.Dismiss() idempotent. Added Dismiss_IsIdempotent_FiresDismissedAtMostOnce test.

Deferred follow-up

The orphan-gateway-dir concern (registry-aware token resolution preferring the active record's identity dir over arbitrary gateways/* dirs) is real but a bigger refactor. Per your suggestion to keep this PR small, I'm tracking it as a follow-up rather than folding it in here. Happy to file an issue if you'd like a tracker.

Validation

  • ./build.ps1
  • OpenClaw.Shared.Tests: 1548 passed, 28 skipped
  • OpenClaw.Tray.Tests: 1182 passed (+2 new vs previous push)

@shanselman shanselman merged commit b2f537b into openclaw:master May 14, 2026
9 checks passed
@shanselman
Copy link
Copy Markdown
Contributor

I'm still clicking keep and it's still being ignored, and it's still sending me to the next page. The same dialog pops up when I run it again, every time. @indierawk2k2

@shanselman
Copy link
Copy Markdown
Contributor

Follow-up from manual repro after merge: the Keep my setup dismissal path itself works, but startup setup detection still missed one local-node shape.

Sanitized repro state:

  • EnableNodeMode=true
  • root %APPDATA%\OpenClawTray\device-key-ed25519.json has no NodeDeviceToken
  • per-gateway %APPDATA%\OpenClawTray\gateways\<gateway-id>\device-key-ed25519.json has NodeDeviceToken

Broken-run log excerpt:

[OnboardingState] Dismiss invoked (user chose to keep existing setup)
[OnboardingWindow] Dismissed by user (keep-existing-setup) — closing without completing
...
[OnboardingWindow] OnWizardComplete skipping chat launch; route=SetupWarning, setupStillRequired=True

Root cause: #340 added the per-gateway scan for operator tokens, but StartupSetupState.HasStoredNodeDeviceToken still checked only the legacy root identity file. In node mode that made RequiresSetup(...) stay true even though the node token existed under the active gateway identity directory.

I opened follow-up PR #375 with the fix and regression tests. After rebuilding x64, relaunch no longer opens onboarding; the post-launch log shows normal node auth instead:

Connecting to last successful gateway during startup: ws://localhost:18789 (identity.DeviceToken)
Connecting to node: ws://localhost:18789
[HANDSHAKE]   isBootstrap=False, hasNodeDeviceToken=True
Node registered successfully! ID: c340a12b10f1dacb

indierawk2k2 pushed a commit to indierawk2k2/openclaw-windows-node that referenced this pull request May 14, 2026
…+ wire V2 dismiss

Our V2 branch was cut at a03c454 (before PR openclaw#340 landed in master). PR
openclaw#340 fixed two real bugs in the legacy onboarding flow: "Keep my setup"
didn't actually dismiss the wizard, and ExistingConfigGuard only scanned
the legacy root identity dir for operator tokens (per-gateway tokens
under gateways/<id>/device-key-ed25519.json were missed). PR openclaw#375
(Scott's, still open) extends the per-gateway scan symmetrically to the
node-token role, fixing a startup-relaunch-reopens-onboarding bug for
local node-mode profiles whose node token lives only per-gateway.

This commit:

1. Merges origin/master to absorb PR openclaw#340's three commits (Dismissed
   event/method on OnboardingState, OnboardingWindow.OnOnboardingDismissed
   handler, SetupWarningPage CancelReplace -> Props.Dismiss, removed
   default-to-Advanced in returning-user path, HasAnyOperatorDeviceToken
   per-gateway scan, StartupSetupState rewrite). Auto-resolved a single
   conflict in OnboardingWindow.cs via the ORT strategy.

2. Applies PR openclaw#375's symmetric per-gateway scan locally:
   - Refactored OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken
     to delegate to a new HasAnyDeviceTokenForRole(dataPath, role).
   - GetSummary now uses HasAnyDeviceTokenForRole for the node side too,
     not just operator (the asymmetric defect that PR openclaw#375 catches).
   - StartupSetupState.HasStoredNodeDeviceToken now delegates to the
     guard's per-gateway-aware helper so the startup auto-launch
     decision and the in-wizard guard agree.

3. Wires V2 equivalents of PR openclaw#340's Dismiss machinery so the V2
   redesigned UI gets the same fix, not just the legacy UI:
   - OnboardingV2State adds Dismissed event + idempotent Dismiss()
     method (matches legacy OnboardingState.Dismiss semantics: log,
     fire once).
   - V2 WelcomePage's "Keep my setup" handler now calls Props.Dismiss()
     instead of just toggling setConfirmingReplace(false). Mirrors
     legacy SetupWarningPage CancelReplace post-PR-openclaw#340.
   - OnboardingV2Bridge surfaces a Dismissed event (subscribes to V2
     state, forwards to host).
   - OnboardingWindow's V2 mount now subscribes to bridge.Dismissed and
     reuses the existing _dismissedWithoutCompletion flag + Close
     pattern from PR openclaw#340's OnOnboardingDismissed so OnClosed skips
     TryCompleteOnboarding the same way for both legacy and V2 dismiss
     paths.

Validation:
  build.ps1 -> all 4 projects green
  Shared.Tests -> 1548 passed / 28 skipped / 0 failed
  Tray.Tests -> 1178 passed / 0 failed (was 1164 -- +14 from PR openclaw#340's
  OnboardingExistingConfigGuardTests + OnboardingStateTests +
  StartupSetupStateTests new coverage we just absorbed)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
indierawk2k2 pushed a commit to indierawk2k2/openclaw-windows-node that referenced this pull request May 15, 2026
…+ wire V2 dismiss

Our V2 branch was cut at a03c454 (before PR openclaw#340 landed in master). PR
didn't actually dismiss the wizard, and ExistingConfigGuard only scanned
the legacy root identity dir for operator tokens (per-gateway tokens
under gateways/<id>/device-key-ed25519.json were missed). PR openclaw#375
(Scott's, still open) extends the per-gateway scan symmetrically to the
node-token role, fixing a startup-relaunch-reopens-onboarding bug for
local node-mode profiles whose node token lives only per-gateway.

This commit:

1. Merges origin/master to absorb PR openclaw#340's three commits (Dismissed
   event/method on OnboardingState, OnboardingWindow.OnOnboardingDismissed
   handler, SetupWarningPage CancelReplace -> Props.Dismiss, removed
   default-to-Advanced in returning-user path, HasAnyOperatorDeviceToken
   per-gateway scan, StartupSetupState rewrite). Auto-resolved a single
   conflict in OnboardingWindow.cs via the ORT strategy.

2. Applies PR openclaw#375's symmetric per-gateway scan locally:
   - Refactored OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken
     to delegate to a new HasAnyDeviceTokenForRole(dataPath, role).
   - GetSummary now uses HasAnyDeviceTokenForRole for the node side too,
     not just operator (the asymmetric defect that PR openclaw#375 catches).
   - StartupSetupState.HasStoredNodeDeviceToken now delegates to the
     guard's per-gateway-aware helper so the startup auto-launch
     decision and the in-wizard guard agree.

3. Wires V2 equivalents of PR openclaw#340's Dismiss machinery so the V2
   redesigned UI gets the same fix, not just the legacy UI:
   - OnboardingV2State adds Dismissed event + idempotent Dismiss()
     method (matches legacy OnboardingState.Dismiss semantics: log,
     fire once).
   - V2 WelcomePage's "Keep my setup" handler now calls Props.Dismiss()
     instead of just toggling setConfirmingReplace(false). Mirrors
     legacy SetupWarningPage CancelReplace post-PR-openclaw#340.
   - OnboardingV2Bridge surfaces a Dismissed event (subscribes to V2
     state, forwards to host).
   - OnboardingWindow's V2 mount now subscribes to bridge.Dismissed and
     reuses the existing _dismissedWithoutCompletion flag + Close
     pattern from PR openclaw#340's OnOnboardingDismissed so OnClosed skips
     TryCompleteOnboarding the same way for both legacy and V2 dismiss
     paths.

Validation:
  build.ps1 -> all 4 projects green
  Shared.Tests -> 1548 passed / 28 skipped / 0 failed
  Tray.Tests -> 1178 passed / 0 failed (was 1164 -- +14 from PR openclaw#340's
  OnboardingExistingConfigGuardTests + OnboardingStateTests +
  StartupSetupStateTests new coverage we just absorbed)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants