feat(onboarding): WSL gateway local-loopback onboarding — clean port from PR #241 prototype #274

Draft
indierawk2k2 wants to merge 25 commits into openclaw:master from indierawk2k2:feat/wsl-gateway-clean

@indierawk2k2
Contributor

Overview

Clean port of the WSL gateway local-loopback onboarding flow originally prototyped in PR #241. The prototype validated the architecture end-to-end against a real WSL distro; this PR is the disciplined re-port — phase-gated commits, layered cleanly across Shared → Tray → Onboarding → scripts → docs, with three real bugs caught and fixed during the Bostick e2e drives. Branch contains 25 commits on top of origin/master (871b959..6e532f7) and is GREEN end-to-end as of Bostick Round 6.

What's included

Phase 1 — Shared identity & scoping

  • feat(shared): port DeviceIdentity with role-specific operator/node tokens (95911b8)
  • fix(shared): close Phase 1 punch list — scope persistence + role validation (3ae03d3)

Phase 2 — Shared clients

  • feat(shared): port OpenClawGatewayClient — bootstrap + role-specific reconnect (b20b5ce)
  • feat(shared): port WindowsNodeClient — auth.deviceToken reconnect (b69202d)

Phase 3 — Tray engine

  • feat(tray): port LocalGatewaySetup with loopback-only WSL setup (98bdf77)
  • fix(tray): close Phase 3 punch list — strip worker vocabulary, gate distro override (4ab1ec6)

Phase 4 — App startup wiring

  • feat(tray): wire setup engine + shared identity path in App startup (8cc32c6)

Phase 5 — Onboarding pages

  • feat(onboarding): add SetupWarning + LocalSetupProgress routes and SetupPath state (43035ca)
  • feat(onboarding): SetupWarningPage with folded security notice (6a5783a)
  • feat(onboarding): LocalSetupProgressPage bound to LocalGatewaySetup engine (c2ad1e5)
  • chore(onboarding): remove WelcomePage (folded into SetupWarning) (99f5107)
  • fix(onboarding): drop time estimate + clean orphan Welcome resw entries (32cbeae)
  • feat(onboarding): nav-bar Next/Back policy on LocalSetupProgressPage per state (73767c5)

Phase 6/7 — Validation + reset scripts

  • feat(scripts): port validate-wsl-gateway.ps1 — 4 scenarios, loopback-only, no rootfs (8060ae9)
  • feat(scripts): port reset-openclaw-wsl-validation-state.ps1 — exact-target gated cleanup (dbd7708)

Phase 8 — Docs

  • docs(wsl): port wsl-owner-validation + wsl-owner-open-issues with Craig's answers (1300981)

Localization

  • feat(onboarding): localize SetupWarning + LocalSetupProgress strings (fr-fr/nl-nl/zh-cn/zh-tw) (ce89251)

Bug fixes from Bostick e2e drives

  • Bug 1 (bootstrap-token + operator-pair against CLI v2026.5.3-1) — 6-commit journey: fe2de09, 3927451, 6942a81, 05f7be0, f2dec42, 4d36dcd's precursor
  • Bug 2 (LocalSetupProgressPage stage advancement + FailedRetryable rendering) — 4af2581
  • Bug 3 (pending-device approver wired into Phase 14 role-upgrade pairing) — 4d36dcd

Verification

Bostick Round 6 e2e drive — GREEN end-to-end. All four scenarios pass against a clean WSL distro:

  1. Fresh install + onboarding → operator-pair → role-upgrade → node-pair
  2. Repaired install on existing identity (token-refresh path)
  3. Validation-reset + replay
  4. Failure injection at each stage shows the correct FailedRetryable UI surface

Screenshots: visual-test-output\bostick-round6\ (final pass: 06-onboarding-complete.png).

Test counts

  • Tray: 524 / 524 ✅ (+77 from baseline 447)
  • Shared: 1180 / 1180 ✅
  • Build: clean (./build.ps1 — zero warnings in changed assemblies)

Bug 1 journey (6 commits, kept un-squashed on purpose)

The bootstrap-token / operator-pair flow against CLI v2026.5.3-1 took 6 surgical commits because the CLI surface changed shape between the prototype and now. The lesson worth preserving in history:

CLI v2026.5.3-1 returns exit=1 in operator-pair preview mode even on success, with a valid JSON payload on stdout. The exit code is NOT the success signal — the JSON shape is.

The 6 commits walk through: wire-format consistency → ensureExplicitGatewayAuth → two-stage approve (preview + explicit requestId) → first-call race retry + stderr surface → C#-side token read & shell-literal interpolation → treat valid preview JSON as stage-1 success regardless of exit code.

Squashing this would erase the breadcrumbs the next person will need when CLI v2026.6.x lands.

Architectural decisions worth highlighting

  • IPendingDeviceApprover seam — Phase 14 role-upgrade pairing now goes through a testable seam instead of reaching directly into the gateway client. Lets us mock the approver in unit tests and swap implementations for the future remote-gateway scenario.
  • RenderSnapshot value-equality fix: LocalSetupProgressPage was missing UI updates because UseState was doing reference-equality on the snapshot record. Switched the snapshot to a value-equality record. This is a pattern worth sweeping across other UseState consumers (see follow-ups).
  • Two-stage approve flow — operator-pair is now preview → explicit requestId approve, matching the CLI's actual contract instead of the prototype's single-call assumption.

Follow-up work (NOT in this PR — punch list for separate work)

  1. PermissionsPage UseState sweep — apply the RenderSnapshot value-equality pattern to PermissionsPage and audit other UseState consumers for the same bug.
  2. Uninstall plan PR — 8 open Mike-questions blocking the uninstall PR; needs decisions before that branch can move.
  3. Translation strings (5 low-confidence) — flagged in the localization commit; need native-speaker review for fr-fr "appairage" usage and zh-tw segmentation.
  4. e2e harness CONTRIBUTING note — document the Bostick drive procedure + reset script invocation for the next contributor.
  5. Stale Token field cleanup: DeviceIdentity still carries a legacy Token field used only by one reconnect path; should be removed once Phase 2.2 settles.

Open questions for Mike

  • Should the operator-pair preview retry budget (currently one retry after a 750ms backoff, per the Bug 1 part 4 commit) be user-configurable, or is a hardcoded value fine for v1?
  • Confirm we want to ship the WelcomePage removal (folded into SetupWarning) or keep WelcomePage as a separate route for future first-run telemetry hooks.
  • Translation strings — defer to native-speaker review or ship-and-iterate?

Why no squash

Kranz recommendation: keep the 25-commit forensic trail intact. The Bug 1 6-commit journey in particular documents a CLI-version-specific gotcha that will recur the next time the CLI surface churns. Squashing trades long-term debuggability for short-term log tidiness — wrong trade for an integration-heavy feature.


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mike Harsh and others added 25 commits May 4, 2026 12:21
…kens (Phase 1)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…reconnect (Phase 2.1)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ase 2.2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…istro override

- Remove residual PreserveWorkerData property and worker_data_preserved step from LocalGatewayRemoveRequest/LocalGatewayLifecycleManager. Windows tray is the node; no WSL-worker vocabulary remains in product APIs.

- Gate OPENCLAW_WSL_DISTRO_NAME env override and explicit distroName parameter behind #if DEBUG || OPENCLAW_TRAY_TESTS via ResolveDistroName helper. Production builds are now hard-locked to OpenClawGateway regardless of caller input.
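The distro-name gate described above can be sketched roughly as follows. This is a minimal illustration, not the actual ResolveDistroName helper: the class name and the precedence between the explicit parameter and the environment variable are assumptions; only the OpenClawGateway name, the OPENCLAW_WSL_DISTRO_NAME variable, and the `#if DEBUG || OPENCLAW_TRAY_TESTS` condition come from this commit message.

```csharp
#nullable enable
using System;

public static class DistroNameResolverSketch
{
    // Production distro name, as stated in the commit message.
    public const string ProductionDistroName = "OpenClawGateway";

    public static string ResolveDistroName(string? requestedName)
    {
#if DEBUG || OPENCLAW_TRAY_TESTS
        // Debug/test builds only. Assumed precedence: explicit parameter, then env override.
        if (!string.IsNullOrWhiteSpace(requestedName))
            return requestedName;

        var fromEnv = Environment.GetEnvironmentVariable("OPENCLAW_WSL_DISTRO_NAME");
        if (!string.IsNullOrWhiteSpace(fromEnv))
            return fromEnv;
#endif
        // Production builds compile the overrides out entirely, so caller input is ignored.
        return ProductionDistroName;
    }
}
```

Compiling the override paths out, rather than checking a runtime flag, means a production binary has no code path that can target another distro.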

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Phase 4)

- App.App now exposes CreateLocalGatewaySetupEngine() backed by LocalGatewaySetupEngineFactory.CreateLocalOnly. Onboarding pages (Phase 5) can request the engine; NodeService is materialized eagerly so the engine can pair the Windows tray node into the gateway it installs.

- Add IdentityDataPath alongside DataPath (operator/node DeviceIdentity store at %APPDATA%\\OpenClawTray, OPENCLAW_TRAY_APPDATA_DIR override for tests). NodeService now accepts identityDataPath; WindowsNodeClient is constructed with it so node device tokens land in the same role-aware DeviceIdentity store as operator tokens (Phase 1 model: shared location, role distinction inside).

- StartupSetupState.CanStartNodeGateway / RequiresSetup callsites now use IdentityDataPath so stored node device tokens are detected at the same path WindowsNodeClient writes them.

- No prototype env-var rootfs/manifest overrides, dev-shim auto-accept, or worker-in-WSL wiring ported (Phase 3 already pruned those phases; nothing to strip in App).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tupPath state (Phase 5.1)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… 5.2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ngine (Phase 5.3)

Wires AdvanceRequested into OnboardingApp, supports OPENCLAW_ONBOARDING_START_SETUP_PATH and OPENCLAW_VISUAL_TEST_LOCAL_SETUP for screenshot capture without running the real WSL engine.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…se 5.4)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…only, no rootfs (Phase 6)

Clean port of prototype validate-wsl-gateway.ps1 reduced to four scenarios:
PreflightOnly, UpstreamInstall, FreshMachine, Recreate.

Kept: UI automation (drives SetupWarningPage 'Set up locally' button
[OnboardingSetupLocal] -> LocalSetupProgressPage), loopback-only endpoint
diagnostics, real upstream setup-code/bootstrap proof, operator pairing
proof, Windows tray node proof, separated validation/cleanup status,
token/setup-code redaction, aka.ms/wsllogs link on failure.

Stripped: BuildRootfs/InstallOnly/Smoke/Loop scenarios, all rootfs/
manifest/signing parameters, worker-in-WSL pairing, WSL-IP/lan/auto
fallback diagnostics, AllowNonStandardDistroNameForDestructiveClean.

Recreate uses 'wsl --unregister OpenClawGateway' (NEVER --shutdown)
per Craig. Network probes are loopback only.

Validation: PreflightOnly run PASS (status=Passed, validation=Passed).
build.ps1 PASS. Shared.Tests 1180/1180. Tray.Tests 434/434.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…target gated cleanup (Phase 7)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…es (Phase 5 fast-follow)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ig's answers (Phase 8)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…(fr-fr/nl-nl/zh-cn/zh-tw)

Extract 17 hard-coded English strings from SetupWarningPage and
LocalSetupProgressPage into Resources.resw and add translations for all
four non-en-us locales. Adds OPENCLAW_TEST_LOCALE env hook on
OnboardingWindow for visual-test locale forcing.

Keys added (per locale):
- Onboarding_SetupWarning_{Title,Body,SetupLocally,Advanced} (4)
- Onboarding_LocalSetup_{Title,SubtitleIdle,SubtitleSuccess,Retry,TerminalFailure,DiagnosticsHint} (6)
- Onboarding_LocalSetup_Phase_{Preflight,CreateInstance,Configure,InstallCli,PrepareConfig,StartGateway,MintToken} (7)

Validation: build PASS, Tray 434/434, Shared 1180/1180,
LocalizationValidationTests green. Screenshot verified for fr-FR at
visual-test-output/phase5-localization/fr-fr/page-02.png; no truncation,
no English fallback, layout contract intact (MaxWidth 460, centered).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…per state (Phase 5 final)

Implements industry-standard onboarding-progress button policy on LocalSetupProgressPage per the autopilot defaults captured in .squad/decisions.md (round 11):

  Idle (Pending)        Next=Hidden,          Back=Enabled

  Running               Next=VisibleDisabled, Back=Enabled

  Complete              Next=VisibleEnabled,  Back=Enabled  (1s pre-auto-advance; tap-to-skip)

  FailedRetryable       Next=VisibleDisabled, Back=Enabled  (in-page Try again)

  FailedTerminal        Next=VisibleDisabled, Back=Enabled  (force back-out)
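The table above maps naturally onto a pure function with no WinUI dependency. The sketch below is illustrative only: the SetupUiStatus enum and the class name are hypothetical, while the NextButtonState values mirror the ones named in this commit message.

```csharp
#nullable enable
using System;

// Hypothetical status enum for illustration; the real page state type may differ.
public enum SetupUiStatus { Idle, Running, Complete, FailedRetryable, FailedTerminal }

// Values as listed in the contract extension of this commit.
public enum NextButtonState { Default, Hidden, VisibleDisabled, VisibleEnabled }

public static class LocalSetupProgressPolicySketch
{
    // One switch arm per row of the policy table.
    public static NextButtonState NextFor(SetupUiStatus status) => status switch
    {
        SetupUiStatus.Idle => NextButtonState.Hidden,
        SetupUiStatus.Running => NextButtonState.VisibleDisabled,
        SetupUiStatus.Complete => NextButtonState.VisibleEnabled,
        SetupUiStatus.FailedRetryable => NextButtonState.VisibleDisabled,
        SetupUiStatus.FailedTerminal => NextButtonState.VisibleDisabled,
        _ => throw new ArgumentOutOfRangeException(nameof(status)),
    };

    // Back stays enabled in every state of the table.
    public static bool BackEnabled(SetupUiStatus status) => true;
}
```

Keeping the mapping free of UI types is what lets each row be pinned by a plain unit test.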

Contract extension (minimal):

  - OnboardingState gains NextButtonState property (Default/Hidden/VisibleDisabled/VisibleEnabled), SetNextButtonState() setter, and NavBarStateChanged event.

  - OnboardingApp consults NextButtonState only when currentRoute == LocalSetupProgress; legacy behavior preserved everywhere else.

  - Mapping logic extracted to OnboardingTray.Onboarding.Services.LocalSetupProgressPolicy (no WinUI deps) so it is unit-testable from OpenClaw.Tray.Tests.

Bonus fix: gate the Complete-state 1s auto-advance timer on still being on LocalSetupProgress so an early Next-tap doesn't over-advance a later page.

Tests: Tray 447/447 (+13: 3 OnboardingState NextButtonState/NavBarStateChanged + 10 LocalSetupProgressPolicy mapping cases). Shared 1180/1180. Build PASS.

Screenshots: visual-test-output/next-button-impl-2026-05-04/{s1-running,s2-success,s3-failed-terminal,s4-failed-retryable}/page-02.png — all four states verified visually.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…mint and tray pair (Bug 1 from e2e drive)

Bug 1 surfaced by the 2026-05-04 e2e drive: MintBootstrapToken correctly invokes
`openclaw qr --json` and the tray sends the resulting token via `auth.bootstrapToken`,
but the upstream gateway treats a fresh bootstrap-token connect as a *pending* operator
pairing request and rejects the connect itself with `device-auth-invalid` then
`pairing-required reason:not-paired`. The pending request is recorded server-side
but never redeemed because nothing approves it.

On a local-loopback gateway the user driving the tray is also the operator/approver,
so SettingsOperatorPairingService now drives `openclaw devices approve --latest`
through the gateway CLI and retries the bootstrap connect once. New
IPendingDeviceApprover seam keeps it injectable (default null preserves remote-gateway
behavior); WslGatewayCliPendingDeviceApprover authenticates with the locally-stored
`/var/lib/openclaw/gateway-token` (read inside the shell so it never touches argv)
and scopes the approval to `LocalGatewayApprover.IsLocalGateway` URLs only.

Tests (10 new, all green): round-trip approve+retry, double-PairingRequired no-loop,
approval-failure surfaces error code, remote-gateway opt-out, non-bootstrap-token
opt-out, first-connect happy path, plus 4 ParseApproveJson cases.

OPENCLAW_RUN_INTEGRATION=1, OPENCLAW_REPO_ROOT=<worktree>:
- OpenClaw.Shared.Tests: 1180/1180/0/0
- OpenClaw.Tray.Tests:    493/493/0/0  (+10 new)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ryable rendering (Bug 2 from e2e drive)

Bug 2 from Aaron's 2026-05-04 e2e drive: the LocalSetupProgressPage UI stayed
on stage 1 ("Checking system" with spinner) for the entire 12-minute run even
though the LocalGatewaySetupEngine progressed through 9+ phases on the gateway
side and ultimately failed at PairOperator. The page never re-rendered past
the first event and never transitioned to FailedRetryable.

Root cause: reference-equality in UseState. The engine raises StateChanged
with the same mutating LocalGatewaySetupState instance every call. The page's
UseState<LocalGatewaySetupState?> compared previous and next with
EqualityComparer<T>.Default — which for a class without an Equals override
falls through to ReferenceEquals. The first null -> state transition rendered
once; every subsequent state -> state event was identified as "no change" and
the framework swallowed the re-render request.

Fix:
- Introduce a private record RenderSnapshot(Phase, Status, LastRunningPhase,
  UserMessage, FailureCode) and store *that* in UseState. Records have value
  equality, so each engine event yields a fresh RenderSnapshot whose fields
  differ from the previous snapshot, reliably triggering re-renders.
- Capture the snapshot off the dispatcher (before TryEnqueue) so values
  reflect the engine's state at the moment the event fired, not whatever
  the engine has further mutated to by the time the dispatcher dequeues.
- Thread LastRunningPhase through to the stage-list math: previously the
  Failed-state rendering only knew Phase=Failed (the highest enum ordinal)
  which lost the position of the last running phase. The new helper consults
  History to pin the failure marker on the correct stage.
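The equality behavior behind the root cause can be reproduced in isolation. The following is a minimal sketch with hypothetical types (MutableState, Snapshot), not the page's real state classes: a mutated class instance still compares equal to itself under EqualityComparer<T>.Default, while two records built from different field values do not.

```csharp
#nullable enable
using System.Collections.Generic;

// Stand-in for a mutable engine state class without an Equals override.
public sealed class MutableState
{
    public string Phase { get; set; } = "Preflight";
}

// Stand-in for the snapshot record; records get value equality from the compiler.
public sealed record Snapshot(string Phase, string Status);

public static class EqualityDemo
{
    // Same instance before and after mutation: the default comparer reports "no change".
    public static bool ClassLooksUnchanged()
    {
        var state = new MutableState();
        var previous = state;
        state.Phase = "StartGateway";
        return EqualityComparer<MutableState>.Default.Equals(previous, state);
    }

    // Fresh snapshots with different field values compare as different.
    public static bool RecordLooksChanged()
    {
        var state = new MutableState();
        var before = new Snapshot(state.Phase, "Running");
        state.Phase = "StartGateway";
        var after = new Snapshot(state.Phase, "Running");
        return !EqualityComparer<Snapshot>.Default.Equals(before, after);
    }
}
```

Both methods return true, which is the whole bug and the whole fix: the first is why re-renders were swallowed, the second is why storing a record in UseState restores them.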

Also extracted the stage-list math from the page into a pure helper
(LocalSetupProgressStageMap) so it is unit-testable without WinUI deps:
- VisibleStages array (now also folds PairOperator + later hidden phases
  into the MintToken stage, so a PairOperator failure pins correctly).
- ComputeStageState(stagePhases, currentPhase, currentStatus, lastRunningPhase).
- IndexOfStageForPhase, ShouldShowErrorRow, ShouldShowRetryButton.

Tests added (LocalSetupProgressStageMapTests, +36 net):
- Every running engine phase advances the active stage to the expected index
  (15 InlineData rows covering all 15 non-terminal phases).
- NotStarted -> all stages Pending.
- Complete -> all stages Complete.
- Coverage guard: every declared LocalGatewaySetupPhase value is either
  terminal or covered by some VisibleStage (locks down future enum drift).
- FailedRetryable @ PairOperator pins failure on the last visible stage
  (this is the concrete e2e-drive scenario).
- FailedRetryable @ CreateWslInstance pins failure on stage 1.
- FailedTerminal @ Preflight pins failure on stage 0.
- ShouldShowErrorRow + ShouldShowRetryButton truth tables.

Validation:
- ./tests/OpenClaw.Shared.Tests: 1180 passed, 0 failed (anchor 1180/1180).
- ./tests/OpenClaw.Tray.Tests:    493 passed, 0 failed (was 447/447, +46).
- Env: OPENCLAW_REPO_ROOT=<worktree>, OPENCLAW_RUN_INTEGRATION=1.
- Full ./build.ps1 + screenshot verification BLOCKED in this session by
  the running tray app at PID 8240 holding a write-lock on the WinUI
  output directory (Mike is examining the broken state per the e2e-drive
  guardrail). Visual verification deferred until PID 8240 is released.
  Existing OPENCLAW_VISUAL_TEST_LOCAL_SETUP harness exercises the new
  retryable/terminal paths via the modified TryReadVisualTestState (which
  now seeds StartPhase before Block so LastRunningPhase pins correctly).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…icitGatewayAuth (Bug 1 residual)

Drop --url override from WslGatewayCliPendingDeviceApprover. The CLI runs inside the OpenClawGateway distro where openclaw.json pins gateway.mode=local + port 18789, so buildGatewayConnectionDetails resolves the loopback URL itself. Without --url, ensureExplicitGatewayAuth (src/gateway/call.ts) early-returns and shouldUseLocalPairingFallback becomes available, so the CLI silently falls back to local pairing-file approval if the WS hop trips.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… against CLI v2026.5.3-1 (Bug 1 part 3)

CLI v2026.5.3-1 (src/cli/devices-cli.ts, commit aef38de) makes
`openclaw devices approve --latest --json` PREVIEW-ONLY: when --latest
or no requestId is supplied, the action handler enters the
usingImplicitSelection branch which writes a JSON preview
({ selected, approvalState, approveCommand, requiresAuthFlags }) and
returns BEFORE invoking approvePairingWithFallback. Only an explicit
requestId argument bypasses the preview gate and actually calls
device.pair.approve / mutates paired.json.

The previous fix (3927451) correctly removed the --url override that
tripped ensureExplicitGatewayAuth, but the resulting invocation still
only ran the preview, so the engine saw exit 0, retried the WS connect,
got pairing-required again, and surfaced operator_pending_approval_failed.

WslGatewayCliPendingDeviceApprover.ApproveLatestAsync now runs two stages:

  1. Preview: openclaw devices approve --latest --json --token "<gateway token, read inside the shell>"
     parses selected.requestId from the v2026.5.3-1 preview JSON.
  2. Commit:  openclaw devices approve <requestId> --json --token "<gateway token, read inside the shell>"
     actually approves and mutates paired.json.

A new no_pending_entries error code distinguishes "stage 1 returned no
selected.requestId" from a real approval failure so the engine does not
infinite-loop. Stage 2 failures surface the underlying stderr. The
requestId returned by stage 1 is validated against a safe charset before
interpolation into the bash -lc commit script.
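A guard of the kind described above might look like the sketch below. The commit message does not state the allowed charset or any length cap, so the letters/digits/hyphen/underscore allowlist and the 128-character limit are assumptions, as are the class and method names other than IsSafeRequestId.

```csharp
#nullable enable
using System;

public static class RequestIdGuardSketch
{
    // Assumed allowlist: ASCII letters, digits, '-' and '_', at most 128 characters.
    public static bool IsSafeRequestId(string? id)
    {
        if (string.IsNullOrEmpty(id) || id.Length > 128) return false;
        foreach (var c in id)
        {
            bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                   || (c >= '0' && c <= '9') || c == '-' || c == '_';
            if (!ok) return false;
        }
        return true;
    }

    // Builds the stage-2 command only for ids that passed the guard (token handling omitted).
    public static string BuildCommitCommand(string requestId)
    {
        if (!IsSafeRequestId(requestId))
            throw new ArgumentException("unsafe requestId", nameof(requestId));
        return $"openclaw devices approve {requestId} --json";
    }
}
```

An allowlist is preferable to escaping here because the id comes from CLI output and ends up inside a shell script; anything outside the expected shape is rejected rather than quoted.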

Tests (tests/OpenClaw.Tray.Tests/OperatorPairingApprovalTests.cs):
  - TwoStage_PreviewThenCommit_Succeeds (argv shape pinned for both stages)
  - TwoStage_PreviewEmpty_NoPendingEntries (stage 2 must NOT run)
  - TwoStage_CommitFails_SurfacesStructuredFailure (surfaces stderr)
  - TwoStage_PreviewReturnsUnsafeRequestId_DoesNotRunCommit (defense in depth)
  - ParsePreviewJson_V20265_Shape_ReturnsRequestId
  - ParsePreviewJson_Empty_ReturnsNoPendingEntries
  - ParsePreviewJson_OkFalse_ReturnsApprovalFailure

Existing DoesNotPassUrlOverride and NonZeroExit tests updated for the
two-stage flow; all prior 12 approval tests remain green.

Validation:
  ./build.ps1                                                            ok
  dotnet test tests/OpenClaw.Tray.Tests   --no-restore  502 / 502 passed
  dotnet test tests/OpenClaw.Shared.Tests --no-restore 1180 / 1180 passed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e stderr in failure (Bug 1 part 4)

Bostick-11 Round-2 Path B drive surfaced a deterministic race: the engine's first `--token`-authenticated call into the in-distro CLI in Phase 12 triggers an internal Linux-operator auto-bootstrap inside the gateway. The bootstrap completes successfully (linux operator entry IS persisted to paired.json) but the CLI process that drove it exits non-zero; a fresh process invocation made hundreds of ms later succeeds because the internal operator is now pre-paired.

Fix:

- WslGatewayCliPendingDeviceApprover.ApproveLatestAsync retries stage 1 once on first failure with a 750ms backoff (configurable; tests use TimeSpan.Zero).

- On final stage-1 failure, both attempts' stderr (each truncated to 1 KB) are surfaced in PendingDeviceApprovalResult.ErrorMessage so future regressions are diagnosable from setup-state.json without digging tray.log.
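The retry-once-then-report behavior can be sketched as below. It reflects the part-4 state of the fix, where a non-zero exit still counted as failure. All names (CliResult, Stage1RetrySketch), the character-based cap, and the truncation marker text are assumptions for illustration.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result shape for one CLI invocation.
public sealed record CliResult(int ExitCode, string StdOut, string StdErr);

public static class Stage1RetrySketch
{
    // Caps text at `cap` characters (assumed stand-in for the 1 KB limit) and marks the cut.
    public static string Truncate(string text, int cap = 1024) =>
        text.Length <= cap ? text : text.Substring(0, cap) + "...[truncated]";

    // Runs stage 1, retries exactly once after `backoff`, and reports both stderrs on failure.
    public static async Task<(bool Ok, CliResult Last, string Error)> RunWithSingleRetryAsync(
        Func<CancellationToken, Task<CliResult>> runStage1,
        TimeSpan backoff,
        CancellationToken ct = default)
    {
        var first = await runStage1(ct);
        if (first.ExitCode == 0) return (true, first, string.Empty);

        await Task.Delay(backoff, ct);

        var second = await runStage1(ct);
        if (second.ExitCode == 0) return (true, second, string.Empty);

        var error = $"attempt 1 stderr: {Truncate(first.StdErr)}; attempt 2 stderr: {Truncate(second.StdErr)}";
        return (false, second, error);
    }
}
```

Passing the backoff as a parameter is what allows tests to use TimeSpan.Zero while production uses 750ms.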

Tests added/updated:

- Stage1FailsThenSucceeds_OverallSuccess (retry path)

- Stage1FailsTwice_SurfacesBothStderrs (structured failure with stderr)

- TruncateStderr_RespectsCap_AndAppendsTruncationMarker

- Existing NonZeroExit_SurfacesStructuredFailureCode updated to assert stderr surfacing

Validation: build.ps1 green; Tray tests 505/505 passed; Shared tests 1158 passed + 22 skipped = 1180 baseline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…; surface stdout (Bug 1 part 5)

Bostick-11 Round-3 (commit 05f7be0) proved the part-4 retry IS firing but BOTH stage-1 attempts still exit non-zero with EMPTY stderr in the engine's invocation context. The IDENTICAL script run manually via wsl -- bash -lc <script> from PowerShell against the engine's exact post-failure gateway state returns exit 0 with valid 1054-byte preview JSON.

Leading hypothesis (Bostick): the embedded double-quoted shell substitution in the --token argument gets mangled when .NET ProcessStartInfo.ArgumentList encoding forwards the script through wsl.exe to bash -lc. The embedded double quotes interact badly with .NET's MSVCRT-style escaping and/or wsl.exe's argv re-encoding, leaving bash with an empty or malformed --token argument and causing the CLI to silently exit non-zero.

Fix: read the gateway token via a SEPARATE wsl ... cat /var/lib/openclaw/gateway-token call, capture it in C#, then interpolate it as a single-quoted shell literal into the approve script. The script body now contains NO shell substitution and NO \" characters at all, so there is nothing for .NET / wsl.exe argv encoding to mangle.

Diagnosability (belt-and-suspenders): also surface STDOUT (paired with stderr) for both stage-1 attempts and stage-2 failures. If some other invocation-context issue is still at play, the next regression is observable from setup-state.json alone — a CLI that writes JSON-mode errors to stdout (with empty stderr — exactly what Round 3 observed) is no longer invisible.

Token safety: reject tokens containing single quotes / newlines / control chars before interpolation. Token-read failures (file missing, empty, unreadable) surface as operator_pending_approval_failed with a 'token-read stage' prefix.
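The token-safety rule lends itself to a small validate-then-quote helper. This is a sketch under stated assumptions, not the shipped code: the class and method names are hypothetical, and using char.IsControl to cover newlines and other control characters is one possible reading of the rule above.

```csharp
#nullable enable
using System;

public static class ShellLiteralSketch
{
    // Returns false for empty tokens or tokens containing a single quote or any control
    // character (which includes '\n' and '\r'); otherwise wraps the token in single quotes.
    public static bool TryQuoteToken(string? token, out string quoted)
    {
        quoted = string.Empty;
        if (string.IsNullOrEmpty(token)) return false;

        foreach (var c in token)
        {
            if (c == '\'' || char.IsControl(c)) return false;
        }

        // Inside single quotes bash performs no expansion, so the literal needs no escaping.
        quoted = "'" + token + "'";
        return true;
    }
}
```

Rejecting single quotes outright, instead of escaping them, keeps the script body trivially free of any character that could end the literal early.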

Tests: 511/511 Tray (505 baseline + 6 new: token-read fail, token-empty, unsafe token chars, stage-1 stdout surfaced, stage-2 stdout surfaced, and the script-body invariant of no shell substitution and no \"). 1158/1158 Shared (22 skipped, baseline). build.ps1 green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… exit code (Bug 1 final)

CLI v2026.5.3-1's `devices approve --latest --json` returns exit code 1
deterministically in preview mode even on the happy path; the JSON payload
on stdout (with `selected.requestId`, `approveCommand`, `requiresAuthFlags`)
IS the success signal. Bostick-11 Round-4 captured this via Aaron-20's stdout
surfacing and verified manual stage-2 with the captured requestId mutates
`paired.json` correctly.

Invert the gate in `WslGatewayCliPendingDeviceApprover.ApproveLatestAsync`:
parse the stdout JSON FIRST and treat a parseable preview shape as stage-1
success regardless of exit code. Exit-non-zero only triggers the structured
`BuildStage1Failure` path when there is no usable preview to extract.

Also short-circuit the 750ms retry in `RunStage1WithRetryAsync` when
attempt 1 returns parseable preview JSON, so the common success path no
longer burns the retry delay on every pair.
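The inverted gate can be illustrated with System.Text.Json. In this sketch the class and method names are hypothetical and only `selected.requestId` is checked; the real parser may validate more of the preview shape (for example `approveCommand` and `requiresAuthFlags`).

```csharp
#nullable enable
using System.Text.Json;

public static class PreviewGateSketch
{
    // True when stdout is a JSON object carrying a non-empty string at selected.requestId.
    public static bool TryGetPreviewRequestId(string? stdout, out string requestId)
    {
        requestId = string.Empty;
        if (string.IsNullOrWhiteSpace(stdout)) return false;
        try
        {
            using var doc = JsonDocument.Parse(stdout);
            var root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object) return false;
            if (!root.TryGetProperty("selected", out var selected)) return false;
            if (selected.ValueKind != JsonValueKind.Object) return false;
            if (!selected.TryGetProperty("requestId", out var id)) return false;
            if (id.ValueKind != JsonValueKind.String) return false;

            var value = id.GetString();
            if (string.IsNullOrEmpty(value)) return false;
            requestId = value;
            return true;
        }
        catch (JsonException)
        {
            return false; // malformed JSON: no usable preview
        }
    }

    // The payload is consulted first; the exit code does not veto a parseable preview.
    public static bool IsStage1Success(int exitCode, string? stdout, out string requestId)
    {
        if (TryGetPreviewRequestId(stdout, out requestId)) return true; // even when exitCode == 1
        return false; // caller builds the structured failure from exit code, stdout and stderr
    }
}
```

Because success is decided by the payload, the same check also serves as the retry short-circuit: a first attempt with a parseable preview never needs the backoff.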

All prior parts retained: token pre-read + single-quoted shell literal
(part 5), retry on stage-1 failure (part 4), two-stage flow (part 3),
IsSafeRequestId guard (part 3), --url drop (part 2), stdout/stderr/exit
surfacing in failure messages (part 5).

Tests: +5 new in OperatorPairingApprovalTests (exit-1+valid JSON success,
exit-0+valid JSON success, exit-1+empty stdout failure, exit-1+malformed
JSON failure, exit-1+valid JSON skips retry). Tray 516/516, Shared 1180/1180.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…airing (Bug 3)

Phase 12 (Bug 1) is GREEN, which unmasked Bug 3 at Phase 14 (PairWindowsTrayNode). On a fresh local-loopback gateway the node-role connect arrives as reason=role-upgrade, isRepair=true; the gateway parks it on the pending list and the connect times out with windows_node_pairing_failed. There is no auto-approve handler upstream for this path.

Mirror the Phase-12 fix: SettingsWindowsTrayNodeProvisioner now takes an optional IPendingDeviceApprover and, when the first connect fails on a local gateway, drives openclaw devices approve --latest via the same WslGatewayCliPendingDeviceApprover and retries the connect once. Approver failures surface their own structured error code; remote gateways and provisioners with no approver wired keep the legacy windows_node_pairing_failed surface.
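The approve-then-retry flow through an injectable approver can be sketched as follows. The interface shape, the helper class, and the convention of returning null on success are all assumptions for illustration; the real IPendingDeviceApprover signature is not shown in this PR text. Only the windows_node_pairing_failed and no_pending_entries codes come from the commit messages.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical seam: returns null on success, or a structured error code on failure.
public interface IApproverSketch
{
    Task<string?> ApproveLatestAsync(CancellationToken ct);
}

// Tiny test double so the flow can be exercised without a gateway.
public sealed class DelegateApprover : IApproverSketch
{
    private readonly Func<string?> _approve;
    public DelegateApprover(Func<string?> approve) => _approve = approve;
    public Task<string?> ApproveLatestAsync(CancellationToken ct) => Task.FromResult(_approve());
}

public static class NodePairingSketch
{
    public const string LegacyFailure = "windows_node_pairing_failed";

    // Returns null when connected, otherwise an error code.
    public static async Task<string?> PairAsync(
        Func<CancellationToken, Task<bool>> connect,
        IApproverSketch? approver,
        bool isLocalGateway,
        CancellationToken ct = default)
    {
        if (await connect(ct)) return null;                         // first connect succeeded
        if (approver is null || !isLocalGateway) return LegacyFailure; // legacy surface preserved

        var approveError = await approver.ApproveLatestAsync(ct);
        if (approveError is not null) return approveError;          // approver's own error code

        return await connect(ct) ? null : LegacyFailure;            // exactly one retry
    }
}
```

With a null approver or a remote gateway the method never attempts approval, which matches the opt-out cases listed in the tests below.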

Tests: 8 new in WindowsTrayNodePairingApprovalTests.cs covering happy path, approver failure, no-pending-entries, retry-after-approve-still-fails, remote-gateway no-approve, first-connect-success no-approve, no-approver legacy passthrough, and OperationCanceled passthrough.

Tray 524/524, Shared 1180/1180.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>