Skip to content

V2 onboarding redesign (mount + bridge + theme + Hanselman fixes)#381

Closed
indierawk2k2 wants to merge 33 commits into
openclaw:masterfrom
indierawk2k2:feat/setup-flow-redesign
Closed

V2 onboarding redesign (mount + bridge + theme + Hanselman fixes)#381
indierawk2k2 wants to merge 33 commits into
openclaw:masterfrom
indierawk2k2:feat/setup-flow-redesign

Conversation

@indierawk2k2
Copy link
Copy Markdown
Contributor

V2 onboarding redesign

Replaces the legacy onboarding wizard with a redesigned 5-page V2 flow built from designer mocks (tools/v2-design-refs/). V2 is mounted by default in OnboardingWindow; OPENCLAW_USE_V2_SETUP=0 is a kill-switch fallback to legacy for one release cycle.

Pages (in order): Welcome → LocalSetupProgress → GatewayWelcome → Permissions → AllSet

Architecture

  • src/OpenClawTray.OnboardingV2/ — new class library: state, app shell, 5 page components, animations, theme palette, V2Strings, system theme detection. Builds against OpenClawTray.FunctionalUI (the existing minimal-Reactor framework).
  • src/OpenClaw.SetupPreview/ — standalone WinUI 3 unpackaged exe for the inner-loop. Mounts the V2 tree against a fake OnboardingV2State. Headless capture mode for visual diffs; F2 to cycle themes; fake stage-progression timer for designer review.
  • src/OpenClaw.Tray.WinUI/Onboarding/V2/OnboardingV2Bridge.cs — translates real tray services to OnboardingV2State. All cross-thread mutations marshal through DispatcherQueue.
  • src/OpenClaw.Tray.WinUI/Onboarding/OnboardingWindow.cs — host: V2/legacy mount selection, Advanced→legacy round-trip with bridge-back, completion re-keyed for V2 AllSet.
  • tools/v2_visual_diff.py — renders V2 pages headless via SetupPreview and produces side-by-side comparisons against designer PNGs (LLM eval rather than pixel diff per design-team request).
  • tools/seed_v2_resw.py — idempotent script that adds every V2_* key to all 5 .resw locales.

Service wiring (bridge)

Real service V2 state field Notes
LocalGatewaySetupEngine.StateChanged LocalSetupRows, LocalSetupErrorMessage Reuses existing LocalSetupProgressStageMap (no fork — V2 picks up every legacy bug-fix)
PermissionChecker.CheckAllAsync + SubscribeToAccessChanges Permissions (PermissionRowSnapshot list)
Settings.GetEffectiveGatewayUrl GatewayUrl ws://http:// flip for browser-launch link
Settings.AutoStartLaunchAtStartup LaunchAtStartup Two-way: initial pull from settings, toggle change persists via Settings.Save()
Settings.EnableNodeMode NodeModeActive Engine flips it mid-run; bridge mirrors
OnboardingExistingConfigGuard.GetSummary() ExistingConfig snapshot Welcome warn-and-confirm UI
App.ConnectionManager.ReconnectAsync() (side effect) Re-init operator client on Status=Complete (PairAsync flips EnableNodeMode mid-run, which kills auto-connect at startup)

Easy-button path is logic-identical to v1

The OnboardingV2Bridge restores v1 LocalSetupProgressPage behaviour 1:1:

  • replaceConfirmed flag forwarded from V2 state to engine factory and to legacy OnboardingState
  • Pre-advance gateway client re-init on Status=Complete (calls ConnectionManager.ReconnectAsync, seeds legacyState.GatewayClient)
  • _advanceFiredForCompletion instance guard
  • 1-second pause before advance for visual settling
  • Existing-config defense check before starting engine
  • engine_construct_failed surfaced as user-facing error
  • lastRunningPhase derived from state.History

The only logic changes are around Advanced setup and the new V2 flow shell.

Theme support

  • V2Theme.cs centralises every theme-aware brush (window/card backgrounds, text, accents, error/warning cards, transparent button overlays, step-dot inactive). Dark mirrors designer mocks; light patterns off WinUI 3 Fluent defaults; brand accents constant across themes per designer spec.
  • V2SystemTheme.cs resolves Windows app-mode color via UISettings.GetColorValue(Foreground) (foreground white = dark mode, black = light mode). Application.Current.RequestedTheme is unreliable on unpackaged WinUI 3 apps.
  • OnboardingV2State.ThemeMode (System/Light/Dark) drives EffectiveTheme. Bridge resubscribes to UISettings.ColorValuesChanged so live system theme changes flip the UI.
  • All 5 V2 pages + shell + StepDots refactored to look up colours via V2Theme. Permissions row icons sit inside a constant-colour dark badge so the designer's white-on-dark PNGs stay visible on light cards.

Advanced setup → legacy Connection round-trip

Welcome's "Advanced setup" link raises V2State.AdvancedSetupRequested → host catches it via the bridge and re-mounts the same FunctionalHostControl onto the legacy OnboardingApp at OnboardingRoute.Connection (no second window). Legacy OnboardingState is constructed up-front so it's available for this swap.

After legacy Connection completes (RouteChanged past Connection), OnRouteChanged re-mounts V2 at the closest equivalent route. Wizard / Permissions / Ready all map to V2 Permissions — Advanced assumes the user is connecting to an already-configured gateway, so the legacy wizard step is intentionally skipped.

A fresh OnboardingV2Bridge is created on bridge-back so the V2 tail (Permissions → AllSet) has live Finished / AutoStart-persist / Refresh / engine wiring.

Cutover (docs/ONBOARDING_V2.md)

  • OnboardingWindow.TryCompleteOnboarding recognises both legacy OnboardingRoute.Ready and V2Route.AllSet as terminals
  • V2Strings.Resolver = LocalizationHelper.GetString so V2 reads from the same .resw resources as legacy
  • 42 V2_* keys seeded into all 5 locales (en-us / fr-fr / nl-nl / zh-cn / zh-tw) with English values via tools/seed_v2_resw.py. LocalizationValidationTests taught (key.StartsWith("V2_")) that V2 strings are intentionally English-only at first ship
  • Existing LocalSetupProgressStageMap reused (NOT forked) for stage→row mapping

Hanselman dual-model code review

Ran an adversarial Opus + Codex review against the branch. 9 findings:

Finding Consensus Status
Bridge not recreated after Advanced→V2 round-trip — Finish/Refresh/AutoStart/engine all dead HIGH fixed
Permissions auto-property doesn't NotifyChanged — V2 never re-rendered real permission state HIGH (Codex caught, Opus missed) fixed
"Try again" button stub (no retry pipeline) HIGH fixedRetryRequested event + bridge handler resets engine state
PreviewWindow UISettings subscription leak HIGH fixed — moved to fields + Closed handler
NodeModeActive one-way OR latch HIGH fixed — direct assignment
MapToV2Rows (V2Stage)i cast assumes ordering LOW fixed — explicit StageOrder array + length-mismatch guard
PermissionsPage.AllGranted static V2Strings init LOW (Opus only) deferred — preview-only fallback rows, harmless today
OnboardingCompletionPolicy doesn't gate V2 AllSet LOW (Opus only) deferred — engine must succeed before AllSet renders
WebView2 handlers stack on retry LOW (Codex only) deferred — pre-existing in master

Validation

  • ./build.ps1 ✅ all four projects
  • OpenClaw.Shared.Tests ✅ 1548 passed / 28 skipped / 0 failed
  • OpenClaw.Tray.Tests ✅ 1164 passed / 0 failed
  • LLM-eval visual sweep: all 5 pages (+ progress-failed + allset-no-node variants) match designer mocks in both light and dark themes

Manual test pass

Multiple full end-to-end runs from a clean slate (./scripts/dev-reset-rebuild-launch.ps1 -SkipBuild -DontLaunch -WipeWslDistro):

  • Welcome → Set up locally → 7-stage WSL install → GatewayWelcome (with embedded provider/model wizard) → Permissions → AllSet → Finish ✅
  • Welcome → Advanced → legacy Connection → V2 Permissions → V2 AllSet → Finish ✅
  • Existing-config detected → Welcome warn-and-confirm → Replace → engine runs ✅
  • Light + Dark + System themes all render correctly; F2 cycles in the preview

Out of scope (follow-up PRs)

  • Hatching conversation doesn't fire after V2 Finish — confirmed pre-existing master bug. The hatching bootstrap (OnboardingChatBootstrapper.BootstrapAsync) only runs in the WebView2 chat surface; the native FunctionalUI chat surface (default, added in 08cab69) doesn't have an equivalent. My V2 path didn't touch any chat code.
  • V2-styled provider/model wizard — currently the legacy WizardPage is embedded inside V2 GatewayWelcome via a factory closure. The V2 chrome owns the title; the legacy wizard owns the body. A future PR should redesign the wizard step to share V2 card chrome.
  • Real translations for V2_ keys* — currently English in all 5 locales.
  • Delete legacy onboarding pagesOnboardingApp + WelcomePage + SetupWarning + Wizard + LocalSetupProgress (legacy) + Permissions (legacy) + Ready + Chat are dead code in V2 mode but reachable via kill-switch + Advanced fallback. Deleting requires test refactoring; scope as its own PR.
  • Drop the OPENCLAW_USE_V2_SETUP=0 kill-switch after one release cycle on V2.

Co-author

Co-authored with Copilot.

Copilot AI and others added 26 commits May 13, 2026 21:02
Adds hero icons (Lobster, PartyPopper), permission row icons (Notifications,
Camera, Mic, Location, ScreenCapture), checklist badges, info-bar badge,
and custom title-bar chrome PNGs into Assets/Setup/ for the new V2
onboarding UX. Also commits the seven designer screen mockups under
tools/v2-design-refs/ as the fixed source of truth for the upcoming
visual-validation tooling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds two new projects to support the V2 setup flow inner loop:

  * src/OpenClawTray.OnboardingV2/  — class library that owns the new
    onboarding components, services, and widgets. Namespace
    OpenClawTray.Onboarding.V2. Referenced by both the preview exe and
    (at cutover) by OpenClaw.Tray.WinUI. Currently exposes
    OnboardingV2App (placeholder root), OnboardingV2State, and V2Route.

  * src/OpenClaw.SetupPreview/      — standalone WinUI 3 exe with no
    real services and a fixed 720 x 966 dip window (aspect-matched to
    the designer mocks). Headless capture mode driven by env vars:
      OPENCLAW_PREVIEW_CAPTURE=1
      OPENCLAW_PREVIEW_PAGE=Welcome|LocalSetupProgress|...
      OPENCLAW_PREVIEW_CAPTURE_PATH=...png
      OPENCLAW_PREVIEW_NODE_MODE=1
    Renders the page, captures via RenderTargetBitmap, writes the PNG,
    and exits with code 0 (or 1 + .error.json on failure). Smoke test:
    Welcome route captures a 1414x1816 PNG ready for visual diffing.

Build wiring: explicit Program.cs Main avoids EnableMsixTooling /
Package.appxmanifest. Setup assets are linked from the Tray project so
both surfaces render byte-identical PNGs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The visual-validation discipline (see plan.md) requires a reproducible
capture-and-compare pass for every page in the OnboardingV2 redesign.
This tool spawns the SetupPreview exe in headless capture mode,
resizes the rendered PNG to the designer canvas size, computes SSIM,
mean delta-E, and connected-component bounding boxes of mismatched
regions, and writes a 3-up diff.png (expected | actual | overlay)
plus a report.json under out/v2-visual/<page>/.

Notable design choices:
* Page registry maps logical names (welcome, progress-running,
  progress-failed, gateway, permissions, allset, allset-no-node) to
  V2 routes + scenario env vars so all loop discipline runs through one
  CLI: 'python tools/v2_visual_diff.py --page <name>' or '--all'.
* Bbox connected-component analysis short-circuits when > 25%% of pixels
  differ (early iterations would otherwise spend minutes on millions
  of micro-clusters).
* to_canvas alpha-composites onto a neutral dark background so captures
  with transparent regions (Mica backdrop is not visible to RTB) read
  correctly in the side-by-side.

Smoke test against the placeholder shell: SSIM 0.108, 69.7%% pixels diff
(expected — page not implemented yet), 3-up diff.png renders correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The V2 shell pieces:

  * OnboardingV2App is a Component<OnboardingV2State> that owns route
    navigation and renders a Grid of [page area | nav bar pinned to
    bottom] with a 200ms cross-fade page transition. Welcome route
    hides the nav bar so the design's first impression has no chrome.

  * Nav bar layout is a 3-column Grid: dot indicator pinned bottom-left,
    star spacer in the middle, [Back] [Next/Finish] pinned bottom-right.
    The Next button is the design accent #60C8F8 with semibold dark text,
    matching Dialog-1..Dialog-5.

  * StepDots widget renders Total dots, scaling the active one to 10px
    with the accent colour and the others to 8px in #5A5A5A.

  * Five page placeholders (Welcome, LocalSetupProgress, GatewayWelcome,
    Permissions, AllSet) are wired in with the canonical PageOrder so
    routing + dot indicator + nav bar visibility all behave correctly.
    Real page content lands in subsequent page-* todos and is gated by
    the matching vv-* visual-validation loop.

PreviewWindow updates:
  * Replaces MicaBackdrop with a flat #202020 SolidColorBrush so
    RenderTargetBitmap captures the real shell instead of a Mica-
    transparent black hole.
  * Adds a custom XAML title bar (lobster icon + 'OpenClaw Setup' text)
    via ExtendsContentIntoTitleBar + SetTitleBar; styles the system
    min/max/close caption buttons to the dark palette.
  * Window is sized 720 x 885 dip, an 0.813 aspect ratio that matches
    the 2010 x 2472 designer mock canvas exactly.

App.xaml now sets RequestedTheme=Dark so TextBlocks render light-on-dark
without per-element foreground overrides.

Smoke vv check on progress-running: SSIM 0.91, layout matches at the
shell level; remaining diff is page content that lands later.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Welcome page (Dialog.png) lays out the lobster hero, title, body copy,
info card with blue (i) badge, primary 'Set up locally' accent button
(focus ring suppressed), and 'Advanced setup' hyperlink, in the
designer-matched proportions. Side-by-side review against Dialog.png
shows content + colour + structure matching; remaining differences
are designer mock-canvas decorations (~50px outer glow + drop shadow)
that don't appear in the real app.

v2_visual_diff.py simplified: dropped SSIM, scipy, scikit-image, and
the PASS/FAIL gate. The tool now captures the preview and renders a
clean 2-up side-by-side (designer | actual). Pixel-level metrics were
tried and produced more noise than signal — the designer mock canvas
shadow created a systematic positional offset that drowned out real
diffs, and font anti-aliasing flagged false positives even after
cropping. The validation gate is now Copilot's eyes on the
side-by-side, with snapshot-regression deferred to cutover for the
separate problem of 'catch unintentional changes vs approved render'.

Title-bar lobster bumped to 18px UniformToFill so it actually reads
as a lobster instead of a blurry smudge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the seven-stage checklist page for the local-setup flow:

  * Title 'Setting up locally' + subtitle 'Creating OpenClaw Gateway
    WSL instance', both centered.
  * Seven left-aligned stage rows (Check system, Installing Ubuntu,
    Configuring instance, Installing OpenClaw, Preparing gateway,
    Starting gateway, Generating setup code), each with a status
    badge on the right.
  * Status badges:
      - Done    \u2192 24px green (#2BC36F) circle with white tick
      - Running \u2192 24px ProgressRing tinted #60C8F8
      - Failed  \u2192 24px pink (#F4A6B0) circle with dark X
      - Idle    \u2192 24px transparent placeholder (preserves alignment)
  * Inline error card with maroon background (#3D1818), body copy +
    'Try again' button (#551B20 bg), slides directly under the failed
    row. Driven by state.LocalSetupErrorMessage being non-null.

PreviewWindow gains env-driven fake state so the visual loop can
exercise both frames without a real engine:
  OPENCLAW_PREVIEW_PROGRESS_FROZEN_STAGE=PreparingGateway
  OPENCLAW_PREVIEW_FAIL_STAGE=StartingGateway

Shell fix: the dot indicator counts only the chromed pages (4, not 5)
so it matches Dialog-1..Dialog-5 — Welcome has no dot. Active index is
pageIndex-1 clamped to 0.

Side-by-side review against Dialog-1 and Dialog-6 confirms title,
subtitle, all seven rows, all three badge styles, error card, and
Try-again button all match the designer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Renders the gateway welcome card after local setup completes:
title 'Configuring gateway' + dark card (#2C2C2C, 8px radius)
containing the welcome heading, two paragraphs explaining the local
URL and data-stays-local privacy posture, and an external-link
hyperlink 'Open http://localhost:18789 in browser' below the card.

Fixes designer-canvas typos in our copy: localhost18789 -> http://
localhost:18789, 'Stays' -> 'stays'. Real Launcher.LaunchUriAsync
wiring lands at cutover; the preview link is a no-op.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Five-row permission status list with the all-granted scenario:

  * Title 'Grant permissions' + intro paragraph centered.
  * Five row cards (Notifications, Camera, Microphone, Location
    optional, Screen Capture). Each card: 28px icon + title (15pt
    SemiBold) + green 'Enabled' status. The Screen Capture row uses
    'Available - uses picker per capture' status instead of the
    Open Settings link, matching Dialog-5.
  * 'Refresh status' button bottom-right under the list.
  * Card bg #2C2C2C with 8px corner radius; status colour #6DC868
    accent green.

Real PermissionChecker wiring lands at cutover; for now the row data
is hard-coded to all-granted (and OPENCLAW_PREVIEW_PERMS_SCENARIO is
plumbed into PreviewWindow but currently only that one scenario is
implemented).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AllSetPage: party-popper hero, `All set!` title, optional Node-Mode warning band, and Launch-at-startup row with explicit On/Off label to the left of the toggle (matches designer Dialog-4).

v2_visual_diff.py: drop dead duplicate SSIM code below the live main(), and always run an incremental `dotnet build` once per invocation so live code edits are picked up automatically (cached for --all so it builds once, not per page).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a small `V2Animations` static helper that wires WinUI Composition animations onto FunctionalUI elements via the .Set escape hatch:

  * Welcome lobster: continuous breathing pulse (1.0 -> 1.025 -> 1.0 every ~4.2s).
  * Welcome info card: 360ms fade-in delayed 200ms after page load.
  * AllSet party popper: 520ms scale pop-in (0.6 -> 1.05 -> 1.0) + fade.
  * Progress error card: 320ms slide-in from 14px below + fade.

All animations are gated by V2Animations.DisableForCapture, which the SetupPreview headless capture path sets to true. This keeps the visual diff PNGs deterministic regardless of animation phase, so the LLM-eval validation loop is not flaky. Verified by re-running --all and confirming all 7 captures still match the designer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reduce custom title-bar lobster from 18px Stretch.UniformToFill to 14px Stretch.Uniform with an 8px right margin, drop the lead-in padding from 16 to 14px, and add AutomationProperties.Name so screen readers announce the custom chrome correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add OpenClawTray.Onboarding.V2.V2Strings — a static key-+-default-English dictionary with a settable Resolver delegate. Pages call V2Strings.Get("V2_...") and the resolver defaults to the built-in English dictionary so SetupPreview and unit tests work with no ResourceManager wired. At cutover the Tray host overrides Resolver = LocalizationHelper.GetString and adds the keys to all five .resw files.

Replace every hard-coded English string in OnboardingV2 (Welcome, LocalSetupProgress, GatewayWelcome, Permissions, AllSet pages, plus the shell nav buttons) with V2Strings.Get(...) calls. Fixed designer typos in the process: `localhost18789` -> `http://localhost:18789`, `Stays` -> `stays` in Dialog-2; `Acvtive` is intentionally kept in the *designer* reference PNG but my source string already says `Active`.

Visual sweep: --all confirms every captured page still matches the designer mock.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- OnboardingV2State: add StateChanged event, property setters notify, plus
  cutover-staged shape (GatewayUrl, GatewayHealthy, LaunchAtStartup,
  Permissions snapshot list with PermissionRowSnapshot record and
  PermissionSeverity enum) so service wiring lands without re-shaping.
- OnboardingV2App: subscribe to StateChanged in UseEffect and bump a
  renderTick UseState so V2 re-renders when services mutate state.
- V2Strings.Get: fall back to DefaultEnUs when resolver returns null,
  empty, or echoes the key, so V2 never renders raw resource keys
  during incremental localization rollout.
- Animations: introduce ShouldAnimate predicate honoring both
  DisableForCapture and UISettings.AnimationsEnabled (reduce-motion).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Re-enable system focus visuals on every V2 button (was UseSystemFocusVisuals=false
  on Welcome Set-up-locally, Gateway Advanced, Permissions Refresh & Open-Settings,
  Progress Try-again). Keyboard users now get the standard cyan focus ring.
- Add AutomationProperties.AutomationId to nav Back/Next/Finish (Next vs Finish
  decided per page, so the id reflects the current label) and to every page CTA
  (Set up locally, Advanced setup, Refresh status, Open Settings, Try again,
  Launch-at-startup toggle). Stable ids unblock UI automation.
- Add AutomationProperties.Name on the AllSet startup toggle (uses the visible
  ''Launch OpenClaw at startup?'' string so screen readers announce intent).
- PreviewWindow: replace the hard-coded 138 DIP title-bar right padding with a
  RightInset-derived value. Reads AppWindow.TitleBar.RightInset (physical px),
  divides by XamlRoot.RasterizationScale, and re-applies on AppWindow.Changed.
  Falls back to 138 DIP if either lookup is unavailable.

Visual sweep (python tools/v2_visual_diff.py --all) reproduces identical output;
focus rings only appear on keyboard focus and are absent in capture-mode PNGs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Captures the rubber-duck critique outputs as a follow-up-PR-ready spec:

- Where the V2 code lives (project map: OnboardingV2 lib + SetupPreview exe).
- How the inner loop works (edit -> v2_visual_diff.py -> view PNG -> iterate).
- Cutover plan: OnboardingWindow mount swap, Wizard-step decision, stage-map
  reuse, advanced-setup wiring, completion remap, state plumbing table, .resw
  rollout to all 5 locales.
- Animation discipline: ShouldAnimate predicate gates DisableForCapture +
  reduce-motion (UISettings.AnimationsEnabled).
- Accessibility checklist: focus visuals, AutomationIds, AutomationName,
  + remaining manual gates (keyboard nav, screen-reader smoke).
- Visual validation: how to run, what to inspect, design typo policy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-enabling system focus visuals on V2 buttons (a11y improvement in the
previous commit) caused the first focusable in tab order to render with
a cyan focus ring in capture-mode PNGs. Visible on the gateway page
(Open ... in browser link) and the progress-failed page (Try again).

Fix: in CaptureAndExitAsync, add a zero-size, non-hit-testable, non-
opaque ContentControl as a focus sink, programmatically focus it, give
it one frame to settle, then RenderTargetBitmap and remove. Captures
are deterministic again; live UI focus visuals are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… fake timer)

5 bugs surfaced from manual walkthrough of the live preview:

1. Nav dots clipped/shifted left vs comp. Replaced Margin(40,0,0,0) on the
   dots and Margin(0,0,40,0) on the buttons with a uniform Padding(60,24,60,32)
   on the nav bar so both ends of the chrome sit ~60 DIP inside the dialog.

2. No way to see the LocalSetupProgress checklist actually progress. Added a
   DispatcherQueueTimer in interactive (non-capture) mode that starts the
   first stage spinning, then promotes one row at a time (~900 ms per step)
   and loops at the end so a designer can keep watching. Skipped when the
   PROGRESS_FROZEN_STAGE / FAIL_STAGE env vars are set (deterministic capture).

3. Window had square corners. Setting SystemBackdrop=null (so captures aren''t
   transparent over Mica) also unhooked Windows 11''s default rounding. Added
   TryApplyRoundedCorners() that sets DwmSetWindowAttribute with
   WINDOW_CORNER_PREFERENCE = ROUND on the window HWND. Silent no-op on
   Windows 10 / failure.

4. Permissions page used Unicode arrow / refresh glyphs (\u2197, \u21BB) which
   rendered tiny vs the comp. Replaced both with Segoe Fluent Icons via a new
   GlyphButtonContent helper: Open Settings uses \uE8A7 (OpenInNewWindow,
   square-with-arrow) and Refresh status uses \uE72C (Refresh, circular arrow),
   both at 16pt to match Dialog-5.

5. AllSet defaulted NodeModeActive=false so the amber Node-Mode-Active card
   was hidden in the live preview. The design treats Node Mode as the default
   for a Windows tray install, so PreviewWindow now seeds NodeModeActive=true
   before ApplyEnvOverrides (env override still wins).

Visual sweep regenerated; permissions and allset captures now show the new
icons / amber card. Spinner on progress-running is a capture-mode artifact
(ProgressRing snapshot mid-frame); live UX shows it animated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* V2Theme.cs centralises every theme-aware brush used by the V2 pages:
  window/card backgrounds, primary/secondary/subtle text, accents (cyan,
  green, badge pink/amber), error and warning cards, transparent button
  overlays, step-dot inactive colour. Dark values mirror the designer
  mocks; light values pattern off WinUI 3 Fluent defaults. Brand accents
  stay constant across themes per the designer''s spec. Brushes are
  memoised per (role, theme) tuple.

* OnboardingV2State adds V2ThemeMode (System/Light/Dark) for the user''s
  preference and EffectiveTheme (ElementTheme) for the resolved value.
  Both raise StateChanged. Pages read EffectiveTheme in Render() and
  forward it to V2Theme accessors so the visual tree re-renders on
  theme change. EffectiveTheme coerces Default -> Dark.

* All five V2 pages + OnboardingV2App nav bar + StepDots widget
  refactored to look up colours via V2Theme. No more hard-coded
  ColorHelper.FromArgb or string-hex backgrounds inside page code.

* PermissionsPage row icons now sit inside a constant-colour dark
  circular badge so the designer''s white-on-dark PNGs stay visible on
  light-theme white cards. Badge is 40x40 with 24x24 icon, transparent
  in dark mode (the icon already reads on the dark card).

* GatewayWelcomePage and WelcomePage''s Advanced setup link now invoke
  Props.RequestAdvancedSetup() (new event) so the cutover bridge can
  route to the legacy Connection page. Welcome page reads
  Props.LaunchAtStartup as the initial toggle state and mirrors local
  state back into the shared state on change.

* PreviewWindow drives theme end-to-end: reads OPENCLAW_PREVIEW_THEME
  env var (System/Light/Dark) into state.ThemeMode, listens to Windows
  app-mode color changes via UISettings.ColorValuesChanged, and adds
  F2 to cycle System -> Light -> Dark interactively. ApplyResolvedTheme
  pushes the resolved ElementTheme onto state.EffectiveTheme +
  root grid background + RequestedTheme (so WinUI built-in chrome flips)
  + system caption buttons.

* App.xaml drops the hard-coded RequestedTheme=Dark so
  Application.Current.RequestedTheme tracks the Windows app-mode setting.

* tools/v2_visual_diff.py adds --theme {System,Light,Dark} so designer
  feedback can compare both modes against the (dark-only) reference PNGs.

Visual sweep verified: dark renders identical to the original designer
mocks; light renders cleanly with white cards, dark text, theme-aware
accents, and visible permission icons.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wires OnboardingV2App into the live app. OnboardingWindow now mounts the
V2 component tree by default; set OPENCLAW_USE_V2_SETUP=0 to fall back
to the legacy flow as a kill-switch for one cycle (also flips to legacy
when OPENCLAW_ONBOARDING_START_ROUTE=Connection so the Advanced fallback
works).

* OnboardingV2Bridge: new class that synchronises real tray services to
  OnboardingV2State.
  - LocalGatewaySetupEngine.StateChanged -> V2State.LocalSetupRows +
    LocalSetupErrorMessage, using the existing
    LocalSetupProgressStageMap so V2 picks up every bug-fix that already
    landed against the legacy stage computation (rather than forking).
  - PermissionChecker.CheckAllAsync + SubscribeToAccessChanges ->
    V2State.Permissions (PermissionRowSnapshot list).
  - SettingsManager.GetEffectiveGatewayUrl -> V2State.GatewayUrl (with
    ws://->http:// scheme flip so the browser link is launch-able).
  - SettingsManager.AutoStart <-> V2State.LaunchAtStartup (two-way:
    initial value pulls from settings; toggle change persists back via
    SettingsManager.Save).
  - V2State.AdvanceRequested while on Welcome -> kick off the engine
    exactly once.
  - V2State.AdvancedSetupRequested + V2State.Finished surfaced as
    bridge events for the host to handle.
  All cross-thread mutations marshal through DispatcherQueue.TryEnqueue.

* OnboardingV2State.LaunchAtStartup setter raises a dedicated
  LaunchAtStartupChanged event so the bridge only persists when the
  toggle actually flips (subscribing to StateChanged would re-fire on
  every progress tick).

* OnboardingWindow:
  - New _useV2 path that constructs the V2 state + bridge and mounts
    OnboardingV2App via the same FunctionalHostControl.
  - Routes V2Strings via V2Strings.Resolver = LocalizationHelper.GetString
    so the new UI reads from the same .resw resources as legacy.
  - TryCompleteOnboarding now recognises V2Route.AllSet as the terminal
    page (in addition to legacy OnboardingRoute.Ready).
  - OpenLegacyAdvancedSetup re-mounts the host onto OnboardingApp at
    OnboardingRoute.Connection so Welcome's Advanced setup link drops
    the user straight into the legacy Connection page (per design
    decision: Advanced page hasn''t been redesigned yet).
  - Bridge is disposed on Closed so the engine subscription is
    released cleanly.

* Localization: tools/seed_v2_resw.py parses V2Strings.DefaultEnUs and
  appends every V2_* key to all five .resw locale files with the English
  value (210 entries total, idempotent). The translate-or-invariant test
  is taught that V2_* keys are intentionally English-only at first ship
  (translations are a follow-up). All other localization tests stay green.

Validation:
  build.ps1 -> all four projects green.
  Shared.Tests -> 1548 passed / 28 skipped / 0 failed.
  Tray.Tests -> 1164 passed / 0 failed.

Deferred to a follow-up PR (documented in docs/ONBOARDING_V2.md):
  - Fold legacy Wizard provider/model picker into V2 GatewayWelcome.
  - Real translations for V2 strings.
  - Delete legacy OnboardingApp + Pages (kept for Advanced setup).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Moves the cutover plan into a 'this PR' section (services wired,
OnboardingWindow mount swap, kill-switch env var, AdvancedSetup remounts
legacy in place). Adds a follow-up backlog covering the Wizard fold,
legacy page deletion, real V2 string translations, kill-switch removal,
and the Advanced -> AllSet round-trip wiring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ns wiring + bridge-back

Fixes manual-test feedback:

* Bug 1 (theme): V2 was rendering Dark while titlebar buttons were Light
  because Application.Current.RequestedTheme returns Light on unpackaged
  WinUI 3 apps regardless of the Windows app-mode setting. New
  V2SystemTheme helper reads the actual system theme via UISettings
  foreground colour (white in dark mode, black in light) and the bridge
  applies it on construction + on UISettings.ColorValuesChanged.

* Bug 3 (gateway wizard): GatewayWelcome page now embeds the legacy
  WizardPage (provider/model RPC picker) when the host provides a
  GatewayWizardChildFactory. Avoids a circular project reference by
  having the host (OnboardingWindow, in Tray.WinUI) build the factory
  closure that returns the WizardPage element. V2 GatewayWelcome
  renders the welcome card + the wizard child + the open-in-browser link.

* Bug 4 (permissions wiring): V2 PermissionsPage now reads Props.Permissions
  (real PermissionRowSnapshot data) when populated and falls back to the
  preview-only AllGranted rows when null. Open Settings actually launches
  the ms-settings:// URI via Windows.System.Launcher (with a scheme
  guard matching the legacy security gate). Refresh status raises a new
  PermissionsRefreshRequested event the bridge subscribes to, which
  re-runs PermissionChecker.CheckAllAsync.

* Bug 5 (NodeMode default): bridge now seeds V2State.NodeModeActive=true
  for the local easy-setup default (matches design). Engine completion
  re-syncs from Settings.EnableNodeMode (which PairAsync flips
  mid-onboarding) so the AllSet page always shows the amber Node-Mode
  card on the local path.

* Bug 6 (Advanced -> V2 round-trip): OpenLegacyAdvancedSetup keeps
  _v2State alive and arms _v2BridgeBackPending. When legacy
  state.RouteChanged fires past Connection, OnRouteChanged re-mounts V2
  at the closest equivalent route (Wizard -> GatewayWelcome,
  Permissions -> Permissions, Ready -> AllSet). User completes legacy
  Connection then continues in V2 chrome with the wizard fold and the
  V2 permissions / all-set screens.

* Bug 2 (slow check-system spin) is intrinsic engine timing on a freshly
  wiped WSL host — Preflight + EnsureWslEnabled actually take ~10s on
  first install. Not addressable from V2 without changing the engine.

Validation:
  build.ps1 -> all four projects green.
  Tray.Tests -> 1164 passed / 0 failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rd + existing-config warn-and-confirm

* V2 GatewayWelcome: removed the welcome intro card and Open-in-browser
  link so the embedded legacy WizardPage has the full vertical space
  and its own nav buttons are reachable. The intro card returns in a
  follow-up PR alongside a V2-styled wizard rewrite.

* Bridge-back from legacy Advanced now skips the legacy Wizard step.
  Advanced setup assumes the user is connecting to an already-configured
  gateway, so all of Wizard/Permissions/Ready map to V2 Permissions
  (was: Wizard -> V2 GatewayWelcome which re-prompted for provider/model).
  Ready still maps to V2 AllSet.

* OnboardingV2App no longer overwrites Props.CurrentRoute from its own
  pageIndex on every render -- that was clobbering external navigation
  requests from the host bridge. The component now treats Props.CurrentRoute
  as a one-shot navigation cue: when it changes from the previously
  observed value, V2 re-syncs pageIndex to follow Props (and pushes
  nav.Navigate so transition animations still fire). User-driven
  next/back continues to mutate pageIndex without writing back to Props.

* V2 Welcome handles the existing-config case (same UX as legacy
  SetupWarningPage):
  - Bridge populates _v2State.ExistingConfig from OnboardingExistingConfigGuard.GetSummary()
  - When ExistingConfig.HasAny is true and replace not confirmed, the
    bottom cluster swaps to an amber warning card with the dynamic
    lost-items list, a primary Replace my setup accent button, and a
    Keep my setup link that returns to the normal welcome state.
  - Confirmation sets V2State.ReplaceExistingConfigurationConfirmed; the
    bridge forwards it to OnboardingState.ReplaceExistingConfigurationConfirmed
    before starting the engine so the existing-config guard accepts the run.
  Advanced setup link stays visible in both states.

* New V2 strings (V2_Welcome_Replace_Heading / Body / Confirm / Keep)
  seeded into all five .resw locales with English values via
  tools/seed_v2_resw.py.

Validation: build.ps1 + Tray.Tests (1164 / 0) green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…th in V2 nav

Engine completed (Phase=Complete, Status=Complete) at the pairing step but
V2 stayed on LocalSetupProgress. Root cause: a previous fix removed the
pageIndex -> Props.CurrentRoute write in OnboardingV2App to stop V2 from
clobbering external nav requests, but that left two sources of truth
that drifted out of sync. The bridge's auto-advance check
(Props.CurrentRoute == V2Route.LocalSetupProgress) evaluated false even
when V2 was on that page, because Props.CurrentRoute still held the
last externally-set value (Welcome on first mount).

Fix: collapse to a single source. Props.CurrentRoute is the source of
truth; pageIndex is derived from Array.IndexOf on every render. GoNext
and GoBack mutate Props.CurrentRoute directly (which fires StateChanged,
re-renders with a fresh pageIndex). External code (the bridge) sets
Props.CurrentRoute and gets the same re-render path. nav.Navigate is
driven by a per-render "did the route change since last seen?" check so
the NavigationHost transition still fires.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… logging

The previous Props.CurrentRoute fix made the bridge's auto-advance
guard correct, but a second hang reproduced anyway. Without observable
state I couldn't tell whether the engine StateChanged event was reaching
the bridge, whether DispatchToUi was firing the lambda, or whether the
CurrentRoute check was failing.

This change:

* Belt-and-braces auto-advance via _runTask.ContinueWith on
  RunLocalOnlyAsync. When the task completes we manually invoke
  OnEngineStateChanged with the final state so we always see
  Status=Complete even if the StateChanged event chain dropped the
  final tick (cross-thread subscription, GC, marshaling timing, etc.).
* Loosened the auto-advance gate to also accept CurrentRoute=Welcome in
  case engine completion races V2's first navigation.
* Removed the redundant _state.RequestAdvance() after setting
  CurrentRoute. With the single-source-of-truth model, mutating
  CurrentRoute is enough; calling RequestAdvance afterwards would
  double-advance (Welcome -> Local -> GatewayWelcome -> Permissions
  on a single completion event).
* Logger.Info markers in EnsureEngineStarted, OnEngineStateChanged,
  and inside the dispatched lambda so future hangs are diagnosable
  from openclaw-tray.log.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The V2 bridge had drifted from the v1 LocalSetupProgressPage in several
load-bearing ways. User caught this and asked for a strict "only paint
pixels" stance on the easy-button path. This commit restores v1
behaviour 1:1 in the bridge while leaving the V2 visuals untouched.

Restored logic (matches v1 LocalSetupProgressPage line-for-line):

* replaceConfirmed flag forwarded from V2 state to engine factory and
  to legacy OnboardingState (was hard-coded true in the V2 path; now
  passes through the user's actual choice from the V2 Welcome
  warn-and-confirm).

* Pre-advance gateway client re-init on Status=Complete:
  - Calls App.ConnectionManager.ReconnectAsync() if App.GatewayClient
    is null or disconnected. PairAsync flips EnableNodeMode mid-
    onboarding, which prevents the operator client's auto-connect at
    startup.
  - Seeds legacyState.GatewayClient = App.GatewayClient so the embedded
    WizardPage finds it.
  - Without this the wizard sat in "loading" for 30s then went offline.

* _advanceFiredForCompletion guard - first Complete event triggers the
  advance; subsequent Complete events (from the StateChanged stream
  and the ContinueWith fallback) are ignored.

* 1-second pause before advance for visual settling (Mike's UX choice
  in v1).

* Existing-config defense check - re-validates legacy
  OnboardingExistingConfigGuard.HasExistingConfiguration() before
  starting the engine, surfacing a synthetic Block message when
  replace was not confirmed (parity with v1's existing_config_gate).

* engine_construct_failed surfacing - if _engineFactory throws, we now
  set a user-facing error message on V2State.LocalSetupErrorMessage so
  the V2 progress page renders a Try-again card (parity with v1's
  Block-state synthesis).

* lastRunningPhase derivation rewritten to scan state.History (most
  recent non-Failed/Cancelled/NotStarted phase) instead of tracking
  locally during run. Matters on failure: when Phase rolls to Failed,
  the local tracker would still hold the pre-failure phase but History
  is the canonical source.

The bridge already keeps the engine alive across V2 navigations (the
bridge survives the entire onboarding window), so the s_engine static
pattern from v1 is unnecessary here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Hanselman adversarial review (Opus + Codex) flagged 9 issues. This commit
fixes all 4 HIGH-consensus items, the Codex-only Permissions notify bug
(verified real and promoted to HIGH), and the Codex-only enum-index cast
hardening. Three LOW-consensus items deferred (notes in docs):

* rf-permissions-static-init -- harmless today; preview-only fallback rows.
* rf-completionpolicy-v2 -- harmless today; engine must succeed before
  V2 AllSet renders.
* rf-webview-handler-stack -- pre-existing in master, not from this PR.

Fixes in this commit:

1. rf-bridge-recreate (Opus CRITICAL / Codex HIGH)
   OpenLegacyAdvancedSetup() disposed _v2Bridge and OnRouteChanged
   re-mounted V2 without recreating it, so after the Advanced -> V2
   round-trip the V2 Finish, AdvancedSetup, Refresh, AutoStart-persist,
   and engine-start events all had zero subscribers. Extracted bridge
   construction + event wiring into CreateAndStartV2Bridge(settings) and
   call it from both the initial mount path and the bridge-back path.
   Idempotent (disposes any prior bridge first).

2. rf-permissions-notify (Codex HIGH; Opus missed)
   OnboardingV2State.Permissions was an auto-property whose setter never
   raised StateChanged. The bridge updates the snapshot list but V2
   never re-rendered, so users always saw the preview AllGranted rows
   with no-op buttons. Converted to backing field with NotifyChanged()
   in the setter.

3. rf-tryagain-noop (Opus HIGH / Codex MEDIUM)
   "Try again" button on the LocalSetupProgress error card was a stub
   () => {}. Added a RetryRequested event on V2State, plumbed it through
   to the bridge's OnRetryRequested handler which detaches the prior
   engine's StateChanged subscription, resets _engineStarted /
   _advanceFiredForCompletion / _runTask / _lastRunningPhase, clears the
   error message, marks all stages idle, and calls EnsureEngineStarted()
   to re-run.

4. rf-uisettings-leak (both MEDIUM)
   PreviewWindow stored UISettings in a local var; the
   ColorValuesChanged subscription kept a strong COM reference to the
   lambda (and via it, the window), preventing GC. Moved both the
   UISettings instance and the handler delegate to fields, and
   unsubscribe in a Closed handler. Matches the pattern already used in
   OnboardingV2Bridge.Dispose().

5. rf-nodemode-latch (Opus LOW / Codex MEDIUM)
   Bridge was doing NodeModeActive = settings.EnableNodeMode ||
   NodeModeActive -- a one-way OR latch that never returned to false.
   Replaced with direct assignment so settings is the single source of
   truth.

6. rf-stage-enum-cast (Codex LOW)
   MapToV2Rows used (V2Stage)i casting from VisibleStages index. If
   either list was reordered, the mapping silently shifted across all 7
   stage siblings (CheckSystem ... GeneratingSetupCode). Replaced with
   an explicit StageOrder array and a length-mismatch InvalidOperationException
   guard at the top of the method so reorder breaks loud, not silent.

Validation:
  build.ps1 -> all 4 projects green
  Shared.Tests -> 1548 passed / 28 skipped / 0 failed
  Tray.Tests -> 1164 passed / 0 failed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@indierawk2k2
Copy link
Copy Markdown
Contributor Author

@shanselman ready for review when you have a chance — V2 onboarding redesign is mounted in the app, easy-button path is logic-identical to v1 master, Hanselman dual-model review findings (5 HIGH consensus) all addressed. PR body has the full architecture + service-wiring table + manual test results. Thanks!

@shanselman
Copy link
Copy Markdown
Contributor

Requesting changes on the Keep my setup path because this regresses the onboarding bug we just chased down and fixed in #375.

Context: after #340, we proved that “Keep my setup” must be a true dismiss/preserve action: close onboarding without completing setup, without touching settings/autostart, and without continuing into local setup. #375 then fixed the remaining startup gate so node-mode returning users with per-gateway NodeDeviceToken no longer see setup again on relaunch.

PR #381 reintroduces the same class of bug in the V2 flow:

// src/OpenClawTray.OnboardingV2/Pages/WelcomePage.cs
void KeepSetup() => setConfirmingReplace(false);

That only hides the warning card. The user then gets the normal V2 CTA cluster again, including Set up locally. If they click it, ReplaceExistingConfigurationConfirmed is still false, so the bridge falls into the defense-in-depth guard and shows a confusing “Existing configuration detected...” error. That is not “keep my setup”; it is a dead-end path after the user explicitly asked us to preserve their working setup.

There is also a legacy fallback regression in OnboardingWindow: PR #381 removed the _state.Dismissed += OnOnboardingDismissed subscription and the _dismissedWithoutCompletion skip in OnClosed. That was the important #340 behavior: dismissing must not run the completion pipeline.

Suggested fix:

  1. Add a first-class V2 dismiss/keep-existing event, e.g. OnboardingV2State.KeepExistingSetupRequested or DismissRequested.
  2. Wire V2 “Keep my setup” to the same semantic path as legacy OnboardingState.Dismiss(): set a window-level dismissed-without-completion guard, close the window, do not call TryCompleteOnboarding, and do not mutate settings/autostart.
  3. Restore/keep the legacy _state.Dismissed subscription and _dismissedWithoutCompletion handling for the fallback path.
  4. Add regression tests that fail on this PR shape:
    • V2 existing-config + Keep my setup closes/dismisses without setting Completed and without invoking OnboardingCompleted.
    • V2 existing-config + Keep my setup does not expose/start local setup afterward.
    • Closing after dismiss does not call TryCompleteOnboarding/settings save/autostart side effects.

How to avoid this going forward: treat “replace existing setup / keep existing setup / advanced setup” as shared onboarding policy, not page-local UI state. Both V1 and V2 should call the same host-level actions for Replace, Keep, and Advanced, and tests should cover those host-level actions rather than only checking the visual page state. This is exactly the kind of lifecycle behavior that gets lost when a redesign reimplements the buttons independently.

@shanselman
Copy link
Copy Markdown
Contributor

Additional findings from the deep review, separate from the Keep my setup regression above.

1. Stale retry continuation can auto-advance from an old setup run

OnRetryRequested() detaches the old engine event handler and clears _runTask, but the old RunLocalOnlyAsync().ContinueWith(...) is still alive and calls OnEngineStateChanged(finalState) directly:

_runTask = _engine.RunLocalOnlyAsync();
_runTask.ContinueWith(t =>
{
    if (t.IsCompletedSuccessfully && t.Result is { } finalState)
    {
        OnEngineStateChanged(finalState);
    }
}, TaskScheduler.Default);

If a user clicks Try again while the old task is still completing, the stale continuation can observe Complete, set _advanceFiredForCompletion = true, and advance the V2 flow from LocalSetupProgress to GatewayWelcome before the retry run actually completes.

Suggested fix: add an engine generation/run id and ignore stale callbacks.

private int _engineGeneration;

private void OnRetryRequested(object? sender, EventArgs e)
{
    _engineGeneration++;
    ...
}

private void EnsureEngineStarted()
{
    ...
    var generation = _engineGeneration;
    _runTask = _engine.RunLocalOnlyAsync();
    _runTask.ContinueWith(t =>
    {
        if (generation != _engineGeneration) return;
        if (t.IsCompletedSuccessfully && t.Result is { } finalState)
        {
            OnEngineStateChanged(finalState);
        }
    }, TaskScheduler.Default);
}

Even better if the engine supports cancellation/disposal: cancel the old run before starting a new one, and still keep the generation guard as defense-in-depth.

2. NodeModeActive is hard-coded true

OnboardingV2Bridge initializes:

_state.NodeModeActive = true;

That is fine for the local easy-setup happy path, but misleading for Advanced setup, remote reconnect, or MCP-only paths. The All Set page can show “Node Mode Active” even when Settings.EnableNodeMode is false.

Suggested fix: initialize from real settings and only flip true when the local engine actually enables node mode.

_state.NodeModeActive = _settings.EnableNodeMode;

Then keep the existing update in OnEngineStateChanged() after PairAsync changes settings. If the design needs a preview default, keep that in OpenClaw.SetupPreview, not in the real tray bridge.

3. Existing-config guard can permanently block direct local setup route

EnsureEngineStarted() sets _engineStarted = true before the existing-config guard:

if (_engineStarted) return;
_engineStarted = true;

if (!_state.ReplaceExistingConfigurationConfirmed && existingConfig)
{
    ...
    return;
}

If an env override, future deep link, or route bug opens directly on LocalSetupProgress, the guard returns early and _engineStarted remains true. A later confirmation/retry cannot start the engine.

Suggested fix: only set _engineStarted = true after all preflight guards pass, or reset it before the guarded return.

if (_engineStarted) return;

if (blockedByExistingConfig)
{
    ...
    return;
}

_engineStarted = true;

4. V2 shows Try again for terminal failures

LocalSetupProgressPage shows the error card and retry button for any failed row, and OnRetryRequested() always restarts. Legacy behavior distinguishes retryable failures from terminal failures.

Suggested fix: carry a CanRetry flag or the original LocalGatewaySetupStatus into OnboardingV2State.

public bool LocalSetupCanRetry { get; set; }

Set it true only for FailedRetryable (and only for Blocked if that is intentionally retryable), render the Try again button only when true, and also gate OnRetryRequested() so a terminal failure cannot restart through a stale UI event.

5. New projects are missing from the solution

The PR adds:

  • src/OpenClawTray.OnboardingV2/OpenClawTray.OnboardingV2.csproj
  • src/OpenClaw.SetupPreview/OpenClaw.SetupPreview.csproj

but openclaw-windows-node.slnx does not include either. Command-line builds work through project references, but Visual Studio/Rider users will not see or build the new projects from the solution.

Suggested fix: add both projects to /src/, with the same x64/ARM64 platform mapping used by the WinUI projects. OpenClaw.SetupPreview especially needs concrete platform mapping because it is WinUI/WindowsAppSDK.

6. V2Theme creates static WinUI brushes at class-load time

V2Theme has static SolidColorBrush fields. SolidColorBrush is a WinUI dependency object, so it is safest to create it on the UI thread. This likely works today because render code touches V2Theme first, but it is brittle for tests or future non-UI callers.

Suggested fix: avoid static dependency-object instances. Either use Lazy<SolidColorBrush> and only access from UI render, or return cached brushes through the same B(...) path used for themed brushes. At minimum, avoid touching V2Theme from background tests/helpers.

7. CI red: integration timeout still needs triage

Local required validation passed for me in the PR worktree:

  • ./build.ps1
  • Shared tests: 1548 passed / 28 skipped
  • Tray tests: 1164 passed
  • OpenClaw.SetupPreview builds after restore

But GitHub CI is red on OpenClaw.Tray.IntegrationTests.McpHttpServerIntegrationTests.CanvasEval_ReturnsResultOrWellFormedError, timing out after the 30s HTTP client timeout. That may be environmental, but this PR should not merge with red CI unless we confirm it is unrelated/flaky and rerun green.

…+ wire V2 dismiss

Our V2 branch was cut at a03c454 (before PR openclaw#340 landed in master). PR
openclaw#340 fixed two real bugs in the legacy onboarding flow: "Keep my setup"
didn't actually dismiss the wizard, and ExistingConfigGuard only scanned
the legacy root identity dir for operator tokens (per-gateway tokens
under gateways/<id>/device-key-ed25519.json were missed). PR openclaw#375
(Scott's, still open) extends the per-gateway scan symmetrically to the
node-token role, fixing a startup-relaunch-reopens-onboarding bug for
local node-mode profiles whose node token lives only per-gateway.

This commit:

1. Merges origin/master to absorb PR openclaw#340's three commits (Dismissed
   event/method on OnboardingState, OnboardingWindow.OnOnboardingDismissed
   handler, SetupWarningPage CancelReplace -> Props.Dismiss, removed
   default-to-Advanced in returning-user path, HasAnyOperatorDeviceToken
   per-gateway scan, StartupSetupState rewrite). Auto-resolved a single
   conflict in OnboardingWindow.cs via the ORT strategy.

2. Applies PR openclaw#375's symmetric per-gateway scan locally:
   - Refactored OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken
     to delegate to a new HasAnyDeviceTokenForRole(dataPath, role).
   - GetSummary now uses HasAnyDeviceTokenForRole for the node side too,
     not just operator (the asymmetric defect that PR openclaw#375 catches).
   - StartupSetupState.HasStoredNodeDeviceToken now delegates to the
     guard's per-gateway-aware helper so the startup auto-launch
     decision and the in-wizard guard agree.

3. Wires V2 equivalents of PR openclaw#340's Dismiss machinery so the V2
   redesigned UI gets the same fix, not just the legacy UI:
   - OnboardingV2State adds Dismissed event + idempotent Dismiss()
     method (matches legacy OnboardingState.Dismiss semantics: log,
     fire once).
   - V2 WelcomePage's "Keep my setup" handler now calls Props.Dismiss()
     instead of just toggling setConfirmingReplace(false). Mirrors
     legacy SetupWarningPage CancelReplace post-PR-openclaw#340.
   - OnboardingV2Bridge surfaces a Dismissed event (subscribes to V2
     state, forwards to host).
   - OnboardingWindow's V2 mount now subscribes to bridge.Dismissed and
     reuses the existing _dismissedWithoutCompletion flag + Close
     pattern from PR openclaw#340's OnOnboardingDismissed so OnClosed skips
     TryCompleteOnboarding the same way for both legacy and V2 dismiss
     paths.

Validation:
  build.ps1 -> all 4 projects green
  Shared.Tests -> 1548 passed / 28 skipped / 0 failed
  Tray.Tests -> 1178 passed / 0 failed (was 1164 -- +14 from PR openclaw#340's
  OnboardingExistingConfigGuardTests + OnboardingStateTests +
  StartupSetupStateTests new coverage we just absorbed)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@indierawk2k2
Copy link
Copy Markdown
Contributor Author

@shanselman thanks for the keepSetup() heads up. Pushed 488c94f which:

  1. Merges current master to absorb PR fix(onboarding): skip wizard for paired operators and make "Keep my setup" actually dismiss #340 (the legacy Dismiss machinery + per-gateway operator-token scan we missed when we cut this branch).
  2. Applies your PR Fix startup setup check for per-gateway node tokens #375's symmetric fix locally (refactored to HasAnyDeviceTokenForRole(dataPath, role), used for both operator and node, and StartupSetupState.HasStoredNodeDeviceToken now delegates to it). I'll happily drop the local copy if/when Fix startup setup check for per-gateway node tokens #375 merges first.
  3. Wires V2 equivalents of the Dismiss machinery so the V2 redesigned UI gets the fix too: OnboardingV2State adds Dismissed + idempotent Dismiss(); the V2 Welcome "Keep my setup" button now calls Props.Dismiss() instead of just toggling the warning off; OnboardingV2Bridge surfaces Dismissed; OnboardingWindow reuses the existing _dismissedWithoutCompletion flag for both legacy and V2 dismiss paths.

Validation: build green; Shared.Tests 1548/0/28 skipped; Tray.Tests 1178/0 (+14 from #340's new coverage).

@shanselman
Copy link
Copy Markdown
Contributor

Re-reviewed latest head 488c94f. Thanks for the quick update — the Keep my setup regression looks fixed now.

What I verified as fixed:

  • V2 now has a real dismiss signal: WelcomePage.KeepSetup() calls Props.Dismiss(), OnboardingV2State.Dismiss() is idempotent, OnboardingV2Bridge relays it, and OnboardingWindow closes with _dismissedWithoutCompletion so the completion pipeline does not mutate existing setup.
  • The legacy OnboardingState.Dismissed path is restored.
  • The PR absorbed the per-gateway node-token startup fix from Fix startup setup check for per-gateway node tokens #375 and added regression tests around it.

Local validation on this head passed:

  • ./build.ps1
  • dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore — 1548 passed / 28 skipped
  • dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore — 1178 passed

Still requesting changes on these remaining items:

1. _engineStarted is still set before the existing-config guard

In src/OpenClaw.Tray.WinUI/Onboarding/V2/OnboardingV2Bridge.cs, EnsureEngineStarted() does this:

if (_engineStarted) return;
_engineStarted = true;
...
if (!ReplaceExistingConfigurationConfirmed && HasExistingConfiguration())
{
    ...
    return;
}

If a caller reaches this guard before replace is confirmed, the method returns with _engineStarted == true. A later attempt after the user confirms replacement can hit the early return and never create/start the engine.

Suggested fix: move _engineStarted = true until after the guard passes, or reset it before every guarded return. A good regression test would simulate: existing config present -> direct local setup start blocked -> set replace-confirmed -> start again -> engine factory is invoked.

2. Retry still allows a stale run-task continuation to update V2 state

OnRetryRequested() detaches StateChanged, resets _runTask, and starts a fresh engine, but the previous RunLocalOnlyAsync().ContinueWith(...) can still complete later and call OnEngineStateChanged(finalState) from the stale run.

Suggested fix: add a generation/task identity guard. For example, increment _engineGeneration whenever a new run starts or retry resets, capture it in the continuation, and no-op if the captured generation is no longer current.

3. Terminal/blocked failures still show the same Try again affordance

OnEngineStateChanged() maps FailedRetryable, FailedTerminal, and Blocked into the same LocalSetupErrorMessage, and LocalSetupProgressPage renders Try again for any failed row with an error message.

Suggested fix: carry retryability into V2 state, e.g. LocalSetupCanRetry, and render the retry button only for FailedRetryable. Terminal/blocked cases should show the error and route the user back / to Advanced Setup instead of offering a retry loop.

4. New projects are still missing from openclaw-windows-node.slnx

The solution still does not include:

  • src/OpenClawTray.OnboardingV2/OpenClawTray.OnboardingV2.csproj
  • src/OpenClaw.SetupPreview/OpenClaw.SetupPreview.csproj

Please add both so Visual Studio / solution builds include the new V2 project and preview app explicitly.

Lower priority: I’m less worried about the hard-coded NodeModeActive = true if V2 local setup is intentionally node-only, but if V2 later gains a non-node completion path this should be seeded from actual settings/route instead. Also, V2Theme moved many brushes behind a cache, but it still has static SolidColorBrush accent fields; not blocking, just worth converting those to lazy/cache accessors too if you are already touching that file.

Mike Harsh and others added 4 commits May 14, 2026 11:57
Both Windows clients hard-coded `minProtocol=3, maxProtocol=3` in their
connect handshake. The OpenClaw gateway service installed by
`LocalGatewaySetupEngine` uses `OpenClawInstallVersion = "latest"`,
which now requires protocol v4 (returns
`expectedProtocol:4, minimumProbeProtocol:4` and refuses any client
that doesn't advertise v4 in its connect range).

Result: every fresh local-WSL setup hangs on the operator pairing
phase and times out with `operator_pairing_timeout` after the engine
keeps retrying the handshake against a server that won't accept v3.

Fix: bump `maxProtocol` to 4 in both clients while keeping
`minProtocol=3` so we still negotiate down for older gateways that
haven't been updated. Standard backward-compatible negotiation
window.

Two-line change in two files:
- src/OpenClaw.Shared/OpenClawGatewayClient.cs (operator client)
- src/OpenClaw.Shared/WindowsNodeClient.cs (node client)

The signed connect payload (`BuildConnectPayloadV3` / `SignConnectPayloadV3`)
is unchanged — v3 signatures remain valid against v4 gateways
(verified against a freshly-installed `latest` gateway via WebSocket
probe: server accepts the connection and sends `connect.challenge` for
both `maxProtocol=3` and `maxProtocol=4` clients; only the subsequent
auth handshake requires v4 in the range).

Validation:
  build.ps1 -> all 4 projects green
  Shared.Tests -> 1548 passed / 28 skipped / 0 failed
  Tray.Tests -> 1178 passed / 0 failed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two LOW-consensus items from the second adversarial review pass:

1. OnboardingV2Bridge.OnDismissed UI-thread marshalling (Codex)
   Wrap the Dismissed forwarder in DispatchToUi so future callers that
   raise Dismiss from a background thread (engine event, watchdog) cannot
   crash WinUI by closing the host window off-thread. Today's only call
   path is the Welcome page button click, so this is a defensive no-op
   fast path — but it preserves the contract "Dismissed always fires on
   the UI thread for the host" symmetrically with how OnFinished is set
   up to be invoked via state events that are already UI-thread.

2. OnboardingV2State.Dismiss symmetric test coverage (Opus)
   Legacy OnboardingState has 4 Dismiss_* tests covering fires-once,
   idempotency, no-Finished-cross-fire, and no-handler safety. The V2
   state surface uses the same idempotent-flag pattern but had no
   sibling tests. Add OpenClawTray.OnboardingV2.Tests with 4 mirror tests
   so the symmetric contract is enforced in CI.

The new test project targets net10.0-windows10.0.22621.0 to match
OnboardingV2 (which depends on Microsoft.UI.Xaml.ElementTheme and
therefore cannot be Compile-Included into the pure-net10.0 Tray.Tests
project). InternalsVisibleTo for OpenClawTray.OnboardingV2.Tests was
already declared on the V2 csproj — the project name was anticipated.

Validation (worktree, OPENCLAW_REPO_ROOT set):
- ./build.ps1 — all 4 projects ✅
- Shared.Tests — 1548 passed / 28 skipped / 0 failed ✅
- Tray.Tests — 1178 passed / 0 failed ✅
- OpenClawTray.OnboardingV2.Tests — 4 passed / 0 failed ✅

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Four items flagged by @shanselman after re-reviewing 488c94f. All four
addressed in this commit:

1. _engineStarted permanent-block bug (V2Bridge.EnsureEngineStarted)
   The flag was set BEFORE the existing-config guard, so a guarded early
   return left _engineStarted=true. A later call after the user confirmed
   replacement would hit if (_engineStarted) return; and never construct
   the engine — permanently locking the user out of local setup.

   Fix: move _engineStarted = true to AFTER all preflight guards pass.
   Also reset _engineStarted to false in the engineFactory catch path so
   transient construction failures are recoverable via Try-again.
   Mark the synthetic existing-config block as retryable so the user can
   confirm replace and retry without restarting onboarding.

2. Stale retry continuation race (V2Bridge.OnRetryRequested + EnsureEngineStarted)
   Added monotonic _engineGeneration counter. Bumped when retry resets
   engine state. Captured in the RunLocalOnlyAsync().ContinueWith(...)
   before the new run starts. Continuation no-ops if generation has been
   bumped — preventing an old run's final state from auto-advancing the
   V2 flow (LocalSetupProgress → GatewayWelcome) after the user clicked
   "Try again".

3. Try-again rendered for terminal/blocked failures (V2State + Bridge + Page)
   Added OnboardingV2State.LocalSetupCanRetry (default false). Bridge
   OnEngineStateChanged sets it true ONLY for FailedRetryable; terminal
   and blocked failures clear it. LocalSetupProgressPage.BuildErrorCard
   accepts a nullable Action? onTryAgain and omits the button when null;
   single-column grid layout when no retry button is shown. Bridge
   OnRetryRequested is gated on LocalSetupCanRetry as defense-in-depth
   so a stale UI event from before the page re-rendered cannot restart
   the engine on a terminal failure.

4. New source projects added to slnx
   Added OpenClawTray.OnboardingV2 (no platform mapping) and
   OpenClaw.SetupPreview (with x64/ARM64 mapping like Tray.WinUI, since
   it's WindowsAppSDKSelfContained=true and AnyCPU would need a RID).
   Now visible in VS/Rider solution view.

Tests added (OpenClawTray.OnboardingV2.Tests):
- LocalSetupCanRetry_DefaultsToFalse
- LocalSetupCanRetry_SetTrue_FiresStateChanged
- LocalSetupCanRetry_SetSameValue_DoesNotFireStateChanged

Bridge-level integration tests for fixes openclaw#1 and openclaw#2 (the simulation Scott
asked for) require constructing OnboardingV2Bridge with a mock engine
factory — but the bridge depends on App.xaml.cs (gateway client reseeding),
which can't be easily unit-tested without WinUI runtime. The fixes are
covered with strong inline contracts/comments documenting the invariants.

Validation (worktree, OPENCLAW_REPO_ROOT set):
- ./build.ps1 ✅
- Shared.Tests — 1548 passed / 28 skipped / 0 failed ✅
- Tray.Tests — 1197 passed / 0 failed ✅ (was 1178; +19 from master merge)
- OpenClawTray.OnboardingV2.Tests — 7 passed / 0 failed ✅ (was 4; +3 CanRetry)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@indierawk2k2
Copy link
Copy Markdown
Contributor Author

@shanselman thanks for the careful re-review. Pushed 136e3e0 (on top of merging current master) which addresses all four items:

  1. _engineStarted permanent-block — moved the flag set to AFTER all preflight guards. Also reset to false in the engine-construction catch path, so transient construction failures stay recoverable. The synthetic existing-config block is now marked retryable so the user can back up to Welcome, confirm replace, and Try-again without restarting the flow.

  2. Stale retry continuation race — added monotonic _engineGeneration. OnRetryRequested bumps it before resetting bookkeeping; EnsureEngineStarted captures it before kicking off RunLocalOnlyAsync; the ContinueWith no-ops if the captured value no longer matches. Old runs can no longer reach OnEngineStateChanged and auto-advance V2 after the user clicked Try-again.

  3. Try-again on terminal failures — added OnboardingV2State.LocalSetupCanRetry. Bridge sets it true ONLY for FailedRetryable; FailedTerminal and Blocked clear it. LocalSetupProgressPage.BuildErrorCard now takes Action? onTryAgain and omits the button when null (single-column grid layout in that case). OnRetryRequested is gated on LocalSetupCanRetry as defense-in-depth so a stale UI event from a pre-rerender click can't restart the engine on a terminal failure.

  4. Slnx additions — added OpenClawTray.OnboardingV2 (no platform mapping needed) and OpenClaw.SetupPreview (with x64/ARM64 mapping like Tray.WinUI, since it's WindowsAppSDKSelfContained=true). Both now visible in VS/Rider.

Tests added in OpenClawTray.OnboardingV2.Tests: 3 new LocalSetupCanRetry_* tests covering default, set-true, and idempotent-setter semantics. The bridge-level simulation you suggested for #1 (existing-config blocked → confirm-replace → engine-factory invoked) is the right shape, but constructing OnboardingV2Bridge with a mock factory pulls in App.xaml.cs (gateway client reseeding) which we don't have a clean way to unit-test without the WinUI runtime. The fixes have strong inline contract comments documenting the invariants — open to ideas if you have a cheap testing seam in mind.

Validation (worktree, OPENCLAW_REPO_ROOT set):

  • ./build.ps1
  • Shared.Tests — 1548 passed / 28 skipped ✅
  • Tray.Tests — 1197 passed (+19 from master merge) ✅
  • OpenClawTray.OnboardingV2.Tests — 7 passed (+3 CanRetry) ✅

Lower-priority items not addressed (per your note they're not blocking): hard-coded NodeModeActive = true in the bridge constructor (still right for the local-only V2 flow), and the static V2Theme accent SolidColorBrush fields. Happy to do those in a follow-up if you'd like.

Two bugs surfaced during a live V2 setup-flow test, plus four hardening
items from the Hanselman dual-model review of the fix.

Live-test bugs:

1. "Gateway wizard not available" on first-run V2 setup
   The V2 GatewayWelcome page embeds the legacy WizardPage which polls
   App.GatewayClient for 30s waiting for IsConnectedToGateway==true,
   then calls wizard.start. Live test: zero wizard.start in the log;
   wizard fell through to its "offline" branch.

   Root cause: After PairAsync flips EnableNodeMode mid-onboarding,
   App.GatewayClient briefly points at the engine's bootstrap-pairing
   lifecycle (which still reports IsConnectedToGateway==true at the
   instant ScheduleAdvanceAfterCompletion ran). The previous code
   skipped ReconnectAsync because of that. The bootstrap socket then
   died after Phase 14/15 cleanup, leaving the wizard with a zombie
   client.

   Fix: ALWAYS call ConnectionManager.ReconnectAsync after engine
   Complete (no skip). Wrapped in Task.Run so we await it without
   blocking the UI dispatcher. Re-seed legacy.GatewayClient from the
   post-reconnect App.GatewayClient via the dispatcher. Added pre/post
   diagnostic log lines so future regressions are obvious.

2. "Back from Advanced Connection page goes to Permissions"
   In OnboardingWindow.OnRouteChanged, the V2 bridge-back route map
   had _ => V2Route.Permissions as catch-all. When the user hit Back
   on legacy Connection, the legacy state fired RouteChanged with
   OnboardingRoute.SetupWarning — which fell through to the catch-all
   and forward-jumped the user to V2 Permissions instead of back to
   V2 Welcome.

   Fix: explicit SetupWarning => V2.Welcome, plus Chat => V2.AllSet
   (don't demote a completed user back to Welcome).

Hanselman pass-3 hardening (HIGH consensus from both Opus + Codex):

3. Reconnect stampede / Retry race
   If the user clicks Try-again mid-reconnect, two ScheduleAdvance...
   continuations can interleave. The legacy.GatewayClient setter
   disposes the previous value, so a stale post-reseed could yank a
   client out from under the wizard.

   Fix: capture _engineGeneration at ScheduleAdvanceAfterCompletion
   start; bail in the continuation (and again inside the dispatcher
   enqueue) if generation no longer matches. Piggybacks on the
   existing _engineGeneration counter that OnRetryRequested bumps.

4. Post-dispose continuation safety
   The Task.Run continuation (and its dispatcher enqueue) could fire
   after the bridge is disposed, mutating a disposed OnboardingState
   and possibly double-disposing GatewayClient via the setter.

   Fix: check _disposed at three points — entry to the continuation,
   inside the dispatcher enqueue, AND in the existing 1-second
   advance-delay continuation. Belt-and-braces.

Hanselman pass-3 LOW (cheap defensive fixes):

5. Drop the "immediate seed" of legacy.GatewayClient with the
   pre-reconnect (zombie) value. The whole point of the fix is to
   replace that client; seeding it just gave non-V2 consumers the
   broken client until ReconnectAsync completed. Now legacy.GatewayClient
   is set exactly once, by the post-reconnect dispatcher continuation.

6. Log a Warn when the bridge-back route catch-all hits an unmapped
   legacy route, so future enum additions surface in testing instead
   of silently landing the user on V2 Welcome.

Hanselman pass-3 LOW (deliberately not addressed):

- Opus flagged a theoretical concern that SetupWarning could fire on
  the FORWARD path to Connection (not just on Back), bouncing the
  user back to V2 Welcome. Codex's symmetric-defect check explicitly
  investigated this and concluded the legacy OnboardingApp only fires
  NotifyRouteChanged from GoNext/GoBack, never on initial mount. The
  live test of the Back-from-Connection fix worked correctly. If we
  ever change the legacy app to fire RouteChanged on mount, this
  becomes a real concern — addressing then.

Validation (worktree, OPENCLAW_REPO_ROOT set):
- ./build.ps1 — all 4 projects ✅
- Shared.Tests — 1548 passed / 28 skipped / 0 failed ✅
- Tray.Tests — 1197 passed / 0 failed ✅
- OpenClawTray.OnboardingV2.Tests — 7 passed / 0 failed ✅

Manual test (live):
- Reset state + relaunch + V2 setup → completes through Welcome,
  LocalSetupProgress, GatewayWelcome (with wizard), Permissions, AllSet.
- V2 Welcome → Advanced setup → Back on legacy Connection → returns
  to V2 Welcome (was previously jumping to V2 Permissions).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@indierawk2k2
Copy link
Copy Markdown
Contributor Author

@shanselman quick re-review ping — head is now 60c54da. Two new commits since your last pass:

136e3e0 — addresses your re-review of 488c94f:

  1. _engineStarted permanent-block (your finding Add unit tests and code review for Moltbot.Shared #1) — moved flag to AFTER all preflight guards in EnsureEngineStarted. Synthetic existing-config block is now marked retryable (LocalSetupCanRetry=true) so the user can back up to Welcome, confirm replace, and Try-again without restarting onboarding. Engine-construction catch path also resets _engineStarted so transient construction failures stay recoverable.
  2. Stale retry continuation race (your finding Rename MoltBot to OpenClaw across entire codebase #2) — added monotonic _engineGeneration. OnRetryRequested bumps it; EnsureEngineStarted captures it before RunLocalOnlyAsync; the ContinueWith no-ops if generation no longer matches. Old runs can no longer reach OnEngineStateChanged and auto-advance V2 after the user clicked Try-again.
  3. Try-again on terminal failures (your finding WinUI tray menu crashes in Microsoft.UI.Windowing.Core.dll after idle/interaction #3) — added OnboardingV2State.LocalSetupCanRetry. Bridge sets it true ONLY for FailedRetryable; FailedTerminal and Blocked clear it. LocalSetupProgressPage.BuildErrorCard now takes Action? onTryAgain and omits the button when null. OnRetryRequested is gated on LocalSetupCanRetry as defense-in-depth.
  4. Slnx additions (your finding Fix WinUI tray menu crash with invisible anchor window pattern #4) — added OpenClawTray.OnboardingV2 and OpenClaw.SetupPreview (with x64/ARM64 mapping like Tray.WinUI). Both visible in VS/Rider now.

Plus 3 new LocalSetupCanRetry_* tests in OpenClawTray.OnboardingV2.Tests.

60c54da — fixes from a live setup test + a follow-up Hanselman pass:

  • Bug: gateway wizard fell through to "offline" branch on first-run V2 setup (no wizard.start ever fired). Root cause: bridge skipped ReconnectAsync after engine Complete because the engine's bootstrap-pairing lifecycle still reported IsConnectedToGateway==true momentarily — the socket then died after Phase 14/15 cleanup, leaving the wizard with a zombie client. Fix: ALWAYS call ReconnectAsync after Complete, await it from a Task.Run, re-seed legacy.GatewayClient from the post-reconnect value via the dispatcher. Pre/post diagnostic log lines added.
  • Bug: V2 Welcome → Advanced setup → Back on legacy Connection landed the user on V2 Permissions instead of V2 Welcome. The route-map catch-all _ => V2.Permissions was swallowing the SetupWarning event. Fix: explicit SetupWarning => V2.Welcome and Chat => V2.AllSet mappings.
  • Hardening from the Hanselman pass: capture _engineGeneration around the forced ReconnectAsync so a Retry mid-reconnect can't have its post-reseed continuation race the new run. Added _disposed checks at 3 points so post-dispose continuations don't mutate a torn-down OnboardingState. Dropped the immediate "zombie seed" of legacy.GatewayClient; it's now set exactly once by the post-reconnect dispatcher continuation. Catch-all on the route map now logs unmapped legacy routes instead of silent fallback.

Validation (worktree, OPENCLAW_REPO_ROOT set):

  • ./build.ps1
  • Shared.Tests — 1548 passed / 28 skipped ✅
  • Tray.Tests — 1197 passed ✅
  • OpenClawTray.OnboardingV2.Tests — 7 passed ✅

Live manual test: V2 setup completes → wizard mounts with provider/model picker → Permissions → AllSet. V2 Welcome → Advanced → Back returns to V2 Welcome correctly.

Lower-priority items from your earlier review still open (per your note, not blocking):

  • Hard-coded NodeModeActive = true in the bridge constructor — fine if V2 stays node-only; would need re-seed-from-settings if a non-node completion path is ever added.
  • Static V2Theme accent SolidColorBrush fields — partial cleanup happened, accents could move to Lazy<> too.

Happy to do those in a follow-up if you'd like. Otherwise this should be ready for another look.

indierawk2k2 pushed a commit to indierawk2k2/openclaw-windows-node that referenced this pull request May 15, 2026
Four items flagged by @shanselman after re-reviewing 488c94f. All four
addressed in this commit:

1. _engineStarted permanent-block bug (V2Bridge.EnsureEngineStarted)
   The flag was set BEFORE the existing-config guard, so a guarded early
   return left _engineStarted=true. A later call after the user confirmed
   replacement would hit if (_engineStarted) return; and never construct
   the engine — permanently locking the user out of local setup.

   Fix: move _engineStarted = true to AFTER all preflight guards pass.
   Also reset _engineStarted to false in the engineFactory catch path so
   transient construction failures are recoverable via Try-again.
   Mark the synthetic existing-config block as retryable so the user can
   confirm replace and retry without restarting onboarding.

2. Stale retry continuation race (V2Bridge.OnRetryRequested + EnsureEngineStarted)
   Added monotonic _engineGeneration counter. Bumped when retry resets
   engine state. Captured in the RunLocalOnlyAsync().ContinueWith(...)
   before the new run starts. Continuation no-ops if generation has been
   bumped — preventing an old run's final state from auto-advancing the
   V2 flow (LocalSetupProgress → GatewayWelcome) after the user clicked
   "Try again".

3. Try-again rendered for terminal/blocked failures (V2State + Bridge + Page)
   Added OnboardingV2State.LocalSetupCanRetry (default false). Bridge
   OnEngineStateChanged sets it true ONLY for FailedRetryable; terminal
   and blocked failures clear it. LocalSetupProgressPage.BuildErrorCard
   accepts a nullable Action? onTryAgain and omits the button when null;
   single-column grid layout when no retry button is shown. Bridge
   OnRetryRequested is gated on LocalSetupCanRetry as defense-in-depth
   so a stale UI event from before the page re-rendered cannot restart
   the engine on a terminal failure.

4. New source projects added to slnx
   Added OpenClawTray.OnboardingV2 (no platform mapping) and
   OpenClaw.SetupPreview (with x64/ARM64 mapping like Tray.WinUI, since
   it's WindowsAppSDKSelfContained=true and AnyCPU would need a RID).
   Now visible in VS/Rider solution view.

Tests added (OpenClawTray.OnboardingV2.Tests):
- LocalSetupCanRetry_DefaultsToFalse
- LocalSetupCanRetry_SetTrue_FiresStateChanged
- LocalSetupCanRetry_SetSameValue_DoesNotFireStateChanged

Bridge-level integration tests for fixes openclaw#1 and openclaw#2 (the simulation Scott
asked for) require constructing OnboardingV2Bridge with a mock engine
factory — but the bridge depends on App.xaml.cs (gateway client reseeding),
which can't be easily unit-tested without WinUI runtime. The fixes are
covered with strong inline contracts/comments documenting the invariants.

Validation (worktree, OPENCLAW_REPO_ROOT set):
- ./build.ps1 ✅
- Shared.Tests — 1548 passed / 28 skipped / 0 failed ✅
- Tray.Tests — 1197 passed / 0 failed ✅ (was 1178; +19 from master merge)
- OpenClawTray.OnboardingV2.Tests — 7 passed / 0 failed ✅ (was 4; +3 CanRetry)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@indierawk2k2
Copy link
Copy Markdown
Contributor Author

Superseded by #411, which has merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants