Gateway compat CI (fork validation) #3
Open
ranjeshj wants to merge 381 commits into
… sidebar/voice width fixes

- Skills page: redesigned with Expander-based collapsible groups (Enabled/Disabled)
- Skills page: enable/disable toggle via gateway skills.update API
- Skills page: code-behind card building (no ListView flash)
- Skills page: badge styling matches cron page (colors, padding, centering)
- Workspace page: cached file list to avoid re-fetch on navigation
- Workspace page: improved loading indicator
- Sidebar: reduced max pane width 320 -> 260
- Voice settings: removed MaxWidth constraint
- Cron page: fixed result badge text vertical centering
- Gateway client: added SetSkillEnabledAsync with correct payload shape
- Gateway client: auto-refresh skills after update/install responses

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ncomplete-setup Fix incomplete onboarding Finish recovery
Nine structurally identical Copy* methods (CopySupportContext, CopyDebugBundle, CopyBrowserSetupGuidance, CopyPortDiagnostics, CopyCapabilityDiagnostics, CopyNodeInventory, CopyChannelSummary, CopyActivitySummary, CopyExtensibilitySummary) each repeated the same DataPackage + Clipboard.SetContent boilerplate alongside identical try/catch logging. This replaces all nine bodies with a single private CopyDiagnostic(string label, Func<GatewayCommandCenterState, string>) helper and reduces each method to an expression-bodied one-liner. No observable behavior change: log messages, clipboard content, error handling, and method signatures (exposed as Action delegates in DeepLinkActions) are all preserved exactly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
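The refactor described above follows a common pattern: one helper owns the clipboard write and the try/catch logging, and each public method only supplies a label and a function that builds the text. The project code is C#; the Python sketch below only illustrates the shape. `State`, `Clipboard`, the two example methods, and the boolean return value are hypothetical stand-ins, not the real types or signatures.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("diagnostics")


@dataclass
class State:
    """Hypothetical stand-in for GatewayCommandCenterState."""
    support_context: str
    port_diagnostics: str


class Clipboard:
    """Hypothetical stand-in for the platform clipboard."""
    def __init__(self) -> None:
        self.text = None

    def set_text(self, text: str) -> None:
        self.text = text


class DiagnosticActions:
    def __init__(self, state: State, clipboard: Clipboard) -> None:
        self._state = state
        self._clipboard = clipboard

    def _copy_diagnostic(self, label: str, build: Callable[[State], str]) -> bool:
        # Single place for the clipboard write and the error logging.
        try:
            self._clipboard.set_text(build(self._state))
            logger.info("Copied %s to clipboard", label)
            return True
        except Exception:
            logger.exception("Failed to copy %s", label)
            return False

    # Each public method shrinks to a one-liner that names its label and builder.
    def copy_support_context(self) -> bool:
        return self._copy_diagnostic("support context", lambda s: s.support_context)

    def copy_port_diagnostics(self) -> bool:
        return self._copy_diagnostic("port diagnostics", lambda s: s.port_diagnostics)
```

Returning a boolean is only a convenience for the sketch; the commit says the real methods keep their existing signatures.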
Keep skills refreshes scoped to the active agent filter and key workspace file-list cache replay by agent id so agent-specific pages do not show stale data. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement SetSkillEnabledAsync on the onboarding test fake after merging current master so PR validation covers the new gateway client interface. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UI fixes: skills redesign, workspace caching, sidebar/voice width
…y-diagnostic-helper refactor: extract CopyDiagnostic helper for diagnostic copy methods
Add a shared ClipboardHelper for text copy operations and route existing WinUI clipboard writes through it while preserving the chat timeline flush behavior and App.CopyTextToClipboard API. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…etup" dismiss

Two bugs reported by Scott Hanselman against master:

1. Tray app launched the onboarding wizard on every start even when the user already had a working remote-gateway operator configuration. StartupSetupState.RequiresSetup only short-circuited for node mode (EnableNodeMode + node device token) or MCP-only mode, so an operator with a non-default gateway URL + stored device token still got the wizard popped at OnLaunched.
   Fix: add an operator-mode short-circuit that requires BOTH a stored operator device token AND a non-default GatewayUrl (guards against orphan tokens after uninstall and against half-finished setups that never picked a gateway target).
2. On the SetupWarning page warn-and-confirm UI, clicking "Keep my setup" only toggled in-page state. Because OnboardingWindow defaulted SetupPath = Advanced when existing config was detected, the global nav-bar Next button stayed enabled, so the user was one click from advancing into ConnectionPage anyway.
   Fix: add OnboardingState.Dismiss() that raises a new Dismissed event; OnboardingWindow handles it by setting a _dismissedWithoutCompletion guard, then Close()ing the window. OnClosed now skips TryCompleteOnboarding when that guard is set so OnboardingCompleted is NOT fired and existing settings / gateway connection are preserved. SetupWarningPage.CancelReplace calls Props.Dismiss().

Belt-and-suspenders: drop the auto-default of SetupPath = Advanced for existing-config users in OnboardingWindow. With SetupPath left null, the nav-bar Next button is disabled on SetupWarning so the user MUST pick "Replace my setup", "Keep my setup", or "Advanced setup" explicitly; no accidental Next-into-setup path remains.

Tests:
- StartupSetupStateTests: operator paired with remote gateway returns false; operator token + default URL still returns true (stale-token guard); non-default URL alone (no token) still returns true.
- OnboardingStateTests: Dismiss fires Dismissed but NOT Finished; safe without subscribers.

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1175 passed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rd-helper Refactor WinUI clipboard text copies
Fixes from a Hanselman adversarial code review (Opus + Codex parallel):

1. Per-gateway tokens (Codex HIGH): RequiresSetup only scanned the legacy root identity (device-key-ed25519.json at the dataPath root). Modern pairings via DeviceIdentityStore write tokens at <dataPath>/gateways/<gatewayId>/device-key-ed25519.json (see GatewayConnectionManager._activeIdentityPath = perGatewayIdentityDir). Operators paired post-GatewayRegistry would still see the wizard pop on every launch.
   Fix: HasAnyOperatorDeviceToken now scans the legacy root AND every gateways/* subdir.
2. SSH-tunnel false positive (Codex HIGH): SSH topology routes via ws://127.0.0.1:LocalPort and the user typically leaves GatewayUrl at default. HasNonDefaultGatewayUrl alone returned false.
   Fix: HasAnyConfiguredGatewayTarget treats (UseSshTunnel + non-empty SshTunnelHost) as a configured target.
3. NodeMode + MCP precedence regression (Codex MEDIUM): original code was 'if (NodeMode && nodeToken) false; return !MCP;' which let MCP-only mode bypass setup even when NodeMode was accidentally true without a node token. The first patch made NodeMode short-circuit first, breaking that precedence.
   Fix: check EnableMcpServer BEFORE EnableNodeMode so MCP wins, matching original semantics.
4. _dismissedWithoutCompletion stuck on Close exception (Opus MEDIUM): the flag was set BEFORE Close(); if Close() threw, the flag stayed true and TryCompleteOnboarding was permanently suppressed for the window's lifetime, wedging the user.
   Fix: reset the flag in the catch block so the X-button / Finish path still works.
5. DefaultGatewayUrl duplication (Opus HIGH): the constant existed in both StartupSetupState and OnboardingExistingConfigGuard with only a comment promising sync.
   Fix: promote OnboardingExistingConfigGuard.DefaultGatewayUrl to public const (single source of truth) and reference it from StartupSetupState. Added DefaultGatewayUrl_MatchesGuardConstant invariant test.
6. CancelReplace UI flash (Opus MEDIUM): setConfirmingReplace(false) was called immediately before Props.Dismiss(), causing a brief re-render of the 'Set up locally' button before the window closed.
   Fix: drop the dead state change.

Tests added (5):
- RequiresSetup_ReturnsFalse_WhenSshTunnelConfiguredWithStoredToken
- RequiresSetup_ReturnsTrue_WhenSshTunnelEnabledButNoHostConfigured
- RequiresSetup_ReturnsFalse_WhenOperatorTokenStoredOnlyInPerGatewayDir
- RequiresSetup_ReturnsFalse_WhenMcpEnabledEvenWithNodeModeAndNoNodeToken
- DefaultGatewayUrl_MatchesGuardConstant

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1180 passed (5 new); all 16 onboarding-fix tests green

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
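The startup decision these fixes converge on can be sketched as a small pure function. This is an illustration in Python, not the C# in the PR. The `Settings` dataclass, the `has_node_token` parameter, and the `DEFAULT_GATEWAY_URL` value are hypothetical, and the token check here only tests that the identity file exists, which is a simplification of whatever `HasStoredDeviceToken` really inspects.

```python
from dataclasses import dataclass
from pathlib import Path

IDENTITY_FILE = "device-key-ed25519.json"
# Hypothetical placeholder; the real default lives in OnboardingExistingConfigGuard.
DEFAULT_GATEWAY_URL = "ws://default-gateway.invalid"


@dataclass
class Settings:
    gateway_url: str = DEFAULT_GATEWAY_URL
    enable_mcp_server: bool = False
    enable_node_mode: bool = False
    use_ssh_tunnel: bool = False
    ssh_tunnel_host: str = ""


def has_any_operator_device_token(data_path: Path) -> bool:
    # Legacy location: identity file directly under the data path.
    if (data_path / IDENTITY_FILE).is_file():
        return True
    # Modern location: one identity file per gateways/<gatewayId>/ directory.
    gateways = data_path / "gateways"
    if not gateways.is_dir():
        return False
    return any((d / IDENTITY_FILE).is_file() for d in gateways.iterdir() if d.is_dir())


def has_any_configured_gateway_target(s: Settings) -> bool:
    # An SSH tunnel with a host counts even when the URL is left at its default.
    if s.use_ssh_tunnel and s.ssh_tunnel_host.strip():
        return True
    return s.gateway_url != DEFAULT_GATEWAY_URL


def requires_setup(s: Settings, data_path: Path, has_node_token: bool = False) -> bool:
    if s.enable_mcp_server:  # checked first so MCP wins over node mode
        return False
    if s.enable_node_mode and has_node_token:
        return False
    if has_any_operator_device_token(data_path) and has_any_configured_gateway_target(s):
        return False
    return True
```

Requiring both a token and a configured target is what keeps an orphaned token from suppressing the wizard on its own.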
Replace the duplicate Conversations page with an enhanced Sessions page:

- Remove 'Conversations' nav item (was showing identical data to Sessions)
- Add SelectorBar with channel filter tabs (All + auto-populated per-channel)
- Show per-session context usage as a progress bar (TotalTokens/ContextTokens)
- Display input/output token counts per session (↓in / ↑out)
- 3-row card layout: name+status, provider·model·channel, progress+tokens
- Keep Reset/Compact/Delete action buttons from original SessionsPage
- Redirect legacy 'conversations' nav tag to SessionsPage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…with Fluent rows
UX overhaul of the OpenClaw Tray hub. Capabilities is folded into Permissions
so device-level capability picks and exec-policy/allowlist controls live in
one place. Settings gets a consistent Fluent row-card pattern with auto-save.
Both pages localize ~40 newly-introduced strings.
## Pages
- **PermissionsPage** absorbs the former Capabilities page:
- Node Mode master toggle + live Node Status card on top
- Per-capability rows (Browser, Camera, Canvas, Screen, Location, TTS, STT),
disabled and dimmed when Node Mode is off
- STT row description notes the Whisper model download trigger
- STT/TTS engine details render as subtle attached continuation panels
(no duplicate banner; provider combo + ElevenLabs config for TTS;
download status + retry hint for STT)
- Local MCP Server integration card
- Exec policy: default-action row + rules card with auto-save, count badge,
Fluent semantic action pills, trash-icon row actions, empty state
- Node allowlist (gateway-side, read-only)
- Windows-level privacy launcher row
- Whisper model auto-download when STT is toggled on, with failure surface
- **SettingsPage** rewrites the old expander layout into row cards:
General · Notifications · Privacy · Local Gateway (conditional). Auto-save
with a transient "Saved" toast bottom-right. No Save/Cancel buttons.
- **HubWindow** drops the standalone Capabilities nav item; `"capabilities"`
tag routes to PermissionsPage for back-compat. Permissions sidebar icon
switched from key to shield (Glyph EA18). Settings sidebar keeps its gear.
- Home and About/Info pages are untouched and identical to master.
## Localization
- 13 `CapabilitiesPage_*` x:Uid keys renamed to `PermissionsPage_*` (XAML +
5 locale resw + coverage tests + invariant list)
- 41 new `PermissionsPage_*` resw keys for code-built strings: capability
labels/descriptions, node status text, STT engine hints, MCP statuses,
rule-count formatters, allowlist messages, TTS provider status, MCP
token-read failure format
- Pinned in `LocalizationValidationTests.InvariantOrDeferredResourceKeys`
- New `LocalizationHelper.Format(key, args)` helper catches `FormatException`
from malformed translations so a translator placeholder typo can't crash
the UI thread
- New `NoLocale_HasEmptyOrWhitespaceValues` test prevents an empty resw value
from leaking the raw resource-key into UI via the GetString fallback
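The idea behind `LocalizationHelper.Format` is to treat a translated template as untrusted input: a placeholder typo should degrade the string, not throw on the UI thread. A minimal Python analogue follows. Falling back to the unformatted template is an assumption of this sketch; the commit only says the exception is caught.

```python
def safe_format(template: str, *args: object) -> str:
    """Format a translated template, tolerating placeholder mistakes.

    str.format raises IndexError for a missing positional argument,
    KeyError for an unexpected named field, and ValueError for
    unbalanced braces. All three count as "malformed translation".
    """
    try:
        return template.format(*args)
    except (IndexError, KeyError, ValueError):
        # Assumed fallback: show the raw template rather than crash.
        return template
```

For example, `safe_format("{0} rules", 3)` formats normally, while a template that references `{1}` with only one argument comes back unchanged.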
## Lifecycle + threading correctness
- `SettingsManager.Saved` subscribe/unsubscribe moved to page `Loaded` /
`Unloaded` on both pages; the per-navigation handler leak (and the latent
N² stale-page UI work it caused) is gone
- `EnsureWhisperModelDownloadedAsync` is `async void` with a try/catch
wrapping the entire body so no path can escape to
`SynchronizationContext.UnhandledException`; page-local
`_isDownloadingWhisperModel` + `_whisperDownloadError` give accurate hint
copy independent of `VoiceService` state
- Whisper-download early-return also defers to
`VoiceService.IsWhisperDownloadingModel` to avoid concurrent writes to the
model file
- `OnSettingsSaved` refreshes MCP/STT/TTS cards too, gated by `IsLoaded`;
`UpdateTtsCard` skips writes to TTS textboxes when `FocusState !=
Unfocused` so cross-surface saves can't clobber in-progress input
- `UpdateTtsCard` no longer unconditionally clears `TtsStatusText`, so the
auto-save toast ("Default provider: x", "ElevenLabs settings saved.") is
no longer wiped one frame later by the dispatched refresh
- `_execSavedHintTimer` / `_savedIndicatorTimer` reused per page instead of
allocated on every save
- `_execPolicyLoaded` one-shot latch replaced with scoped
`_loadingExecPolicy` try/finally flag — safe for future reload paths
## Exec policy
- Case-insensitive JSON read (accepts both `pattern` and `Pattern`) to
recover policy files written by the pre-fix anonymous-type leak; writes
always use lowercase going forward
- Auto-saves on every mutation (add rule, remove rule, default action
change). Inline "Saved" pill in the rules-card header, 1.5s
- `NewRuleAction` ComboBox now uses `Tag="allow"/"deny"` rather than reading
the localizable `Content`, so future translations can't break the
JSON-on-disk contract
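The read-tolerant, write-strict approach for the policy file can be illustrated as follows. This is a Python sketch under assumptions: the file is taken to be a JSON array of rule objects with `pattern` and `action` fields, which the commit message implies but does not spell out.

```python
import json
from typing import Dict, List


def load_rules(text: str) -> List[Dict[str, str]]:
    """Read rules, accepting any key casing (for example `Pattern` or `pattern`)."""
    return [{key.lower(): value for key, value in rule.items()} for rule in json.loads(text)]


def dump_rules(rules: List[Dict[str, str]]) -> str:
    """Write rules with lowercase keys only, so new files are always canonical."""
    return json.dumps([{key.lower(): value for key, value in rule.items()} for rule in rules])
```

A file written with PascalCase keys then round-trips to lowercase the first time it is saved. The stored action value comes from a fixed identifier (the `Tag` in the real UI), never from display text.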
## Tests / validation
- 1161 / 1161 tray tests pass (added `NoLocale_HasEmptyOrWhitespaceValues`)
- All locales preserve format-placeholder parity (existing test)
- Build clean on net10.0-windows10.0.22621.0 / win-arm64
- Two Hanselman-style dual-model adversarial reviews
(Claude Opus 4.7 + GPT-5.3-Codex) ran across the diff; all HIGH-consensus
and LOW-consensus-real findings have fixes in this commit
## Master-merge work
- Carried over master's clipboard refactor: `ClipboardHelper.CopyText`
replaces the `DataPackage` + `Clipboard.SetContent` pair in the MCP
token/URL copy methods on PermissionsPage
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the orphaned Conversations page files after routing conversations into Sessions, and update the chat root comment to point at SessionsPage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gs-info-merge Merge Capabilities into Permissions; redesign Settings & Permissions with Fluent rows
…ons-page feat: unify Sessions and Conversations into single Sessions page
Assert sanitized jsonlPath error responses now that internal exception details stay local to logs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Assert the battery failure payload keeps internal exception details out of the response. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…smiss

Addresses Scott Hanselman's review on PR openclaw#340:

Blocking fix:
- OnboardingExistingConfigGuard.GetSummary().HasOperatorDeviceToken only checked DeviceIdentity.HasStoredDeviceToken on the legacy root path. Modern pairings store the operator token at <dataPath>/gateways/<id>/device-key-ed25519.json via DeviceIdentityStore, so a fresh-paired user opening Setup/Reconfigure could overwrite a working gateway without seeing the "Replace my setup / Keep my setup" warning.
- Extracted the per-gateway scan (previously private to StartupSetupState) to OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken as the single source of truth. StartupSetupState.HasUsableOperatorConfiguration and GetSummary() both call it now, so the startup auto-launch decision and the in-wizard guard always agree on what counts as paired.

Hardening (Scott's lower-confidence suggestion):
- OnboardingState.Dismiss() is now idempotent. A double-click or repeated handler invocation no longer fires the lifecycle signal twice.

Tests added:
- OnboardingExistingConfigGuardTests.HasExistingConfiguration_ReturnsTrue_WhenOperatorTokenStoredOnlyInPerGatewayDir (Scott's exact test shape).
- OnboardingStateTests.Dismiss_IsIdempotent_FiresDismissedAtMostOnce.

Follow-up tracked separately (per Scott's note):
- Make the startup token scan registry-aware (prefer the active GatewayRegistry record's identity dir over arbitrary gateways/* dirs) to avoid orphan dirs from suppressing onboarding for a different active gateway.

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1182 passed (+2 new)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
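An idempotent dismiss is usually just a latch in front of the event. The Python sketch below shows the pattern. The class and handler list are simplified stand-ins for `OnboardingState` and its `Dismissed` event, not the real implementation.

```python
from typing import Callable, List


class OnboardingState:
    """Simplified stand-in: a dismiss signal that fires at most once."""

    def __init__(self) -> None:
        self._dismissed = False
        self.dismissed_handlers: List[Callable[[], None]] = []

    def dismiss(self) -> None:
        if self._dismissed:
            return  # a double-click or repeated invocation is a no-op
        self._dismissed = True
        # Copy the list so a handler may unsubscribe while we iterate.
        for handler in list(self.dismissed_handlers):
            handler()
```

Calling `dismiss()` with no subscribers is also safe, which matches the "safe without subscribers" test named in the earlier commit.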
…f-section-probe-and-missing-settings-2026-05-09-ae8d66c4b9104b7f [Repo Assist] fix(wsl): add mountFsTab=false + [time] section to wsl.conf; make IsAlreadyConfigured probe section-aware
…age-leaks-remaining-2026-05-11-83f5733e4978f96a [Repo Assist] fix(security): stop leaking ex.Message in node client, device capability, and approval prompts
…jsonlpath-exmessage-leak-2026-05-13-78f4414fcfd54f2f [Repo Assist] fix(security): remove residual ex.Message leak in canvas jsonlPath error path
…-existing-config fix(onboarding): skip wizard for paired operators and make "Keep my setup" actually dismiss
Ensure node capability registration has a NodeService available before node connect, surface binding failures in diagnostics, and cover the diagnostic failure path with a regression test.
…tate (openclaw#466)

* fix: ID-based parallel tool call tracking with truthful Interrupted state

  Tool calls in the chat window could get stuck in 'running' state because:
  - Single ActiveToolCallId slot couldn't track parallel tools
  - Turn end/error events didn't finalize in-progress tools
  - Legacy fallback could misroute outputs to wrong tools

  Changes:
  - Add ChatToolCallStatus.Interrupted for tools that never completed
  - Add ActiveToolCalls (ImmutableDictionary) for ID-based parallel tracking
  - Extract itemId from gateway events for correlation
  - ResolveToolEntry: strict ID lookup (no misrouting), legacy fallback only when no ID
  - ApplyTurnEnd marks remaining in-progress tools as Interrupted
  - Keep output mapping until turn end (handles command_output + item end ordering)
  - UI renders Interrupted as grey dash glyph (truthful, not fake Success)
  - 9 new regression tests for parallel tools, ID correlation, interrupted state

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix tool output preservation

  Preserve command output when an empty item-end event follows the real output for the same tool call. Also serialize tray tests that mutate OPENCLAW_TRAY_DATA_DIR so local validation is deterministic.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Scott Hanselman <scott@hanselman.com>
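The tracking rules in this commit (strict ID lookup, a legacy fallback only when an event carries no ID, output preserved across an empty item-end, and leftover calls marked Interrupted at turn end) can be sketched as a small state holder. This is a Python illustration with hypothetical method names. The real code uses an immutable dictionary on the chat state, and the legacy fallback shown here (most recently started running call) is an assumption.

```python
from enum import Enum
from typing import Dict, Optional


class ToolStatus(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    ERROR = "error"
    INTERRUPTED = "interrupted"


class ToolCallTracker:
    def __init__(self) -> None:
        self.status: Dict[str, ToolStatus] = {}
        self.output: Dict[str, str] = {}

    def _resolve(self, item_id: Optional[str]) -> Optional[str]:
        if item_id is not None:
            # Strict lookup: an unknown ID is dropped, never rerouted.
            return item_id if item_id in self.status else None
        # Legacy events without an ID (assumed): latest running call.
        running = [i for i, s in self.status.items() if s is ToolStatus.RUNNING]
        return running[-1] if running else None

    def on_start(self, item_id: str) -> None:
        self.status[item_id] = ToolStatus.RUNNING
        self.output[item_id] = ""

    def on_output(self, item_id: Optional[str], text: str) -> None:
        key = self._resolve(item_id)
        if key is not None and text:
            self.output[key] = text

    def on_item_end(self, item_id: Optional[str], ok: bool, text: str = "") -> None:
        key = self._resolve(item_id)
        if key is None:
            return
        self.status[key] = ToolStatus.SUCCESS if ok else ToolStatus.ERROR
        if text:  # an empty item-end must not wipe earlier output
            self.output[key] = text

    def on_turn_end(self) -> None:
        for key, current in list(self.status.items()):
            if current is ToolStatus.RUNNING:
                self.status[key] = ToolStatus.INTERRUPTED
```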
* feat: add exec approval V2 prompt adapter interface

  Defines the prompt adapter contract needed before the coordinator (PR7) can be wired up. The interface decouples the coordinator from any UI implementation; the null stub lets PR7 compile and be tested without a WinUI dependency.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address V2 prompt adapter review feedback

  - Add CancellationToken to PromptAsync contract
  - Introduce ExecApprovalPromptOutcome (Deny=0) to eliminate fail-open default from ExecApprovalDecision on the prompt-facing interface
  - Add required CorrelationId to ExecApprovalV2PromptRequest for audit/telemetry
  - Document SessionKey semantics: origin, scope, null meaning, display safety
  - Tighten DisplayCommand doc to call out control chars and BiDi overrides
  - Harden ProductionWiring test: skip bin/obj, add deletion comment
  - Add tests: cancelled token, fail-closed default, CorrelationId storage, full Allow/AllowOnce/AllowAlways/Deny outcome coverage

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: AlexAlves87 <alexalves87@github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
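The point of `Deny=0` is that an uninitialized or unrecognized value lands on the safe outcome. A Python sketch of the same fail-closed idea follows. Only `Deny = 0` comes from the commit; the other numeric values and the `outcome_from_raw` helper are hypothetical.

```python
from enum import IntEnum


class ExecApprovalPromptOutcome(IntEnum):
    DENY = 0          # zero value, so a default-initialized outcome denies
    ALLOW = 1         # remaining numeric values are assumptions of this sketch
    ALLOW_ONCE = 2
    ALLOW_ALWAYS = 3


def outcome_from_raw(raw: object) -> ExecApprovalPromptOutcome:
    """Map an arbitrary value to an outcome, denying anything unrecognized."""
    try:
        return ExecApprovalPromptOutcome(raw)
    except ValueError:
        return ExecApprovalPromptOutcome.DENY
```

With this shape, a missing, garbled, or out-of-range value can never be read as an approval.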
Avoid reserializing parsed gateway event JSON just to compute debug-log payload lengths. Instead, pass the original raw message length through the event dispatch path so chat and agent event logs keep the same privacy-preserving shape/length signal without extra serialization work.
Dispatch node capability invokes off the WebSocket receive loop with bounded concurrency so slow commands do not block health, pairing, ping, or subsequent invoke traffic. Serialize concurrent WebSocket sends and add regression coverage for slow invoke unblocking, busy rejection, and async JSON args lifetime.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
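The dispatch change above (run invokes off the receive loop, cap how many run at once, reject the overflow as busy, and serialize sends) can be sketched with asyncio. This is a Python illustration only. The class name, message shapes, and the `drain` helper are hypothetical, and the real implementation is C# over a WebSocket.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Set

Message = Dict[str, Any]


class InvokeDispatcher:
    def __init__(self, max_concurrent: int, send: Callable[[Message], Awaitable[None]]) -> None:
        self._max = max_concurrent
        self._in_flight = 0
        self._send = send
        self._send_lock = asyncio.Lock()  # one writer on the socket at a time
        self._tasks: Set[asyncio.Task] = set()

    async def _send_serialized(self, message: Message) -> None:
        async with self._send_lock:
            await self._send(message)

    async def on_invoke(self, request_id: int, handler: Callable[[], Awaitable[Any]]) -> None:
        """Called from the receive loop; returns without awaiting the handler."""
        if self._in_flight >= self._max:
            await self._send_serialized({"id": request_id, "ok": False, "error": "busy"})
            return
        self._in_flight += 1
        task = asyncio.create_task(self._run(request_id, handler))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _run(self, request_id: int, handler: Callable[[], Awaitable[Any]]) -> None:
        try:
            reply: Message = {"id": request_id, "ok": True, "result": await handler()}
        except Exception:
            reply = {"id": request_id, "ok": False, "error": "invoke failed"}
        finally:
            self._in_flight -= 1
        await self._send_serialized(reply)

    async def drain(self) -> None:
        """Wait for in-flight invokes (used by tests and shutdown)."""
        await asyncio.gather(*self._tasks)
```

Because `on_invoke` only schedules the work, the receive loop stays free to process health, pairing, and ping traffic while a slow command runs.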
Replaces hard-coded 'latest' in LocalGatewaySetupOptions with a compile-time constant from new gateway-lkg.json (initial pin: 2026.5.17). Adds OPENCLAW_GATEWAY_VERSION env var override for CI matrices and hands-on validation. GatewayLkgTests enforces JSON/constant sync so drift fails the build. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
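The version resolution described here reduces to "environment override if set, otherwise the pinned constant", plus a check that the JSON file and the constant agree. The Python sketch below illustrates both. The `version` key name inside gateway-lkg.json is an assumption; the pinned value and the env var name are from the commit.

```python
import json
import os
from typing import Mapping, Optional

GATEWAY_LKG_VERSION = "2026.5.17"  # initial pin named in the commit


def resolve_gateway_version(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OPENCLAW_GATEWAY_VERSION override if set, else the pinned LKG."""
    env = os.environ if env is None else env
    override = env.get("OPENCLAW_GATEWAY_VERSION", "").strip()
    return override or GATEWAY_LKG_VERSION


def lkg_in_sync(lkg_json_text: str) -> bool:
    """Drift check in the spirit of GatewayLkgTests (key name is assumed)."""
    return json.loads(lkg_json_text).get("version") == GATEWAY_LKG_VERSION
```

Treating a blank override as "not set" is a choice of this sketch so that an empty CI matrix variable falls back to the pin.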
W0: .github/workflows/gateway-compat-spike.yml (manual dispatch only): proves WSL + Ubuntu-24.04 + openclaw install + provider config validation on a windows-2025 runner before we build the real harness. Records cold-start timings and the authoritative provider config shape.

W2: tools/fake-llm-server/: minimal OpenAI-compatible HTTP mock used by the gateway-compat tests to avoid burning real provider credit. Scope is intentionally tiny (one non-streaming endpoint + assertion endpoints); expand as scenarios demand.

W3.1: Compile-time gating for the future tray.testhook.* MCP tool surface. New MSBuild property OpenClawEnableTestHooks=true defines the OPENCLAW_E2E_HOOKS constant; the placeholder TestHookCapability.cs is wrapped in #if OPENCLAW_E2E_HOOKS. Rubber-duck critique flagged that env-var gating in a shipped binary is unsafe (loopback MCP token + destructive hooks like pairing.reset); compile-time gating + a Release-build smoke test (ReleaseBuildExcludesTestHooksTests, verified to fail loudly when the hooks are accidentally shipped) keep the dangerous surface out of production tray binaries.

Validated: build green; shared 1808 passed; tray 1128 passed (incl. the new smoke test + verified red when -p:OpenClawEnableTestHooks=true).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds docs/GATEWAY_COMPAT_TESTING.md as the operator-facing companion to the implementation plan: pieces, LKG bump flow (manual + automated), local override, opting into compile-time test hooks for local dev, running the fake LLM standalone, adding a new scenario, extending the fake LLM. Adds a 'Gateway version (LKG) pinning' section to docs/RELEASING.md that names the source of truth, the auto-bump workflow, the no-auto-merge rule, and the runtime override env var. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The spike (.github/workflows/gateway-compat-spike.yml) was authored to
prove the WSL + openclaw + provider-config + fake-LLM pipeline on
windows-2025 before sinking effort into the real harness. After several
iterations (lessons captured below), the run is now green end-to-end in
~2m12s and the canonical provider config shape is verified.
Spike outcome
-------------
- windows-2025 ships WSL 2.7.3.0 preinstalled, no distros. Ubuntu-24.04
install ~36s; openclaw npm install ~66s; full spike job 2m12s cold.
CI budget verified for the real workflows.
- Provider config root is models.providers.<id>, NOT agents.providers.<id>.
Verified accepted keys (openclaw 2026.5.18 schema):
api / baseUrl / apiKey / authMode / models[].id
- Default selector: agents.defaults.model.primary = "<provider>/<model>".
- openclaw config patch --file accepts atomic JSON5 patches.
- openclaw config validate is the build gate.
- openclaw config schema prints the full 2.2 MB canonical schema.
The verified JSON5 patch is committed to tools/fake-llm-server/README.md
and will be used verbatim by the W3 harness.
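The config shape recorded above can be expressed as a small builder. This Python sketch only arranges the keys the spike verified (`models.providers.<id>` with `api`, `baseUrl`, `apiKey`, `authMode`, `models[].id`, and `agents.defaults.model.primary`). It does not reproduce the committed patch, and every value passed in is the caller's to supply.

```python
import json
from typing import Any, Dict


def build_provider_patch(provider_id: str, model_id: str, *, api: str,
                         base_url: str, api_key: str, auth_mode: str) -> Dict[str, Any]:
    """Arrange caller-supplied values into the verified patch shape."""
    return {
        "models": {
            "providers": {
                provider_id: {
                    "api": api,
                    "baseUrl": base_url,
                    "apiKey": api_key,
                    "authMode": auth_mode,
                    "models": [{"id": model_id}],
                }
            }
        },
        # Default selector uses the "<provider>/<model>" form.
        "agents": {"defaults": {"model": {"primary": f"{provider_id}/{model_id}"}}},
    }


def to_patch_text(patch: Dict[str, Any]) -> str:
    """Plain JSON is also valid JSON5, so it can be fed to `config patch --file`."""
    return json.dumps(patch, indent=2)
```

The provider root is `models.providers`, not `agents.providers`, which is the mistake the spike notes call out.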
Lessons baked into the workflow
-------------------------------
- Shell scripts live in tools/spike/*.sh with .gitattributes "*.sh
text eol=lf" so CRLF on Windows checkout never breaks "set -euo
pipefail" inside WSL.
- Workflow steps invoke .sh files via `wsl ... -- bash $wslPath`
through a ConvertTo-WslPath PowerShell helper. NOT via piping
PS here-strings to wsl stdin (which mangles encoding).
- Diagnostics step is `continue-on-error: true` so a fresh runner
without registered distros (the expected state) doesn't kill the
job before real work begins.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
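The path translation that a helper like ConvertTo-WslPath performs can be sketched as below. This Python version assumes the default WSL automount layout (`C:\` appears as `/mnt/c/`) and is not the PowerShell helper from the workflow; the example path in the usage note is hypothetical.

```python
from pathlib import PureWindowsPath


def to_wsl_path(windows_path: str) -> str:
    """Map an absolute drive-letter path to its default /mnt/<drive>/ location."""
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":").lower()
    if len(drive) != 1 or not drive.isalpha():
        # Relative paths and UNC shares have no /mnt/<letter> equivalent here.
        raise ValueError(f"not an absolute drive-letter path: {windows_path!r}")
    # parts[0] is the anchor (for example 'C:\\'); the rest are path segments.
    return "/mnt/" + drive + "/" + "/".join(path.parts[1:])
```

For instance, `to_wsl_path(r"D:\a\repo\tools\spike\step.sh")` yields `/mnt/d/a/repo/tools/spike/step.sh`, which can then be passed to `bash` inside the distro.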
Adds the TestHookCapability the gateway-compat harness will drive via the local MCP HTTP server. The class is compile-time gated behind OpenClawEnableTestHooks=true (production tray binaries do not contain it, enforced by ReleaseBuildExcludesTestHooksTests). NodeService registers it MCP-only (registerOnGateway: false) so a misbehaving gateway can never trigger destructive hooks like pairing.reset, and the capability second-gates on OPENCLAW_TRAY_E2E=1 at runtime.

Surface (8 commands declared; diagnostics.dump fully implemented):
- tray.testhook.diagnostics.dump (implemented)
- tray.testhook.gateway.config.patch (stub)
- tray.testhook.localSetup.start/status/cancel (stub)
- tray.testhook.connection.waitFor (stub)
- tray.testhook.pairing.reset (stub)
- tray.testhook.chat.send (stub)

Stubs return a stable "not yet implemented" error so the harness can probe the surface, and a test asserts that message stays stable so a future commit filling in a tool cannot regress to silent success.

13 unit tests in OpenClaw.Tray.Tests cover the surface snapshot, both gates, the diagnostics shape (snapshot via JSON parse), error wrapping, and the stub failure mode. Test project defines OPENCLAW_E2E_HOOKS so it can exercise the class; the Release-build smoke test re-verifies absence in the shipped tray binary.

Validated: 1140 tray tests pass (+12); 1808 shared tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New tests/OpenClaw.GatewayCompat.E2ETests/ xUnit project that drives the
real tray exe over MCP. GatewayCompatFixture provisions isolated AppData,
finds a free port, spawns the E2E-built tray with OPENCLAW_TRAY_E2E=1,
waits for mcp-token.txt + the HTTP listener, and hands tests an McpClient
ready to call tray.testhook.* tools.
Test taxonomy via xUnit Trait:
Tier=Smoke - HarnessSmokeTests: spawn tray, list tools, call
tray.testhook.diagnostics.dump. Runs anywhere; no WSL.
Tier=Gateway - OperatorPairingTests etc.: real gateway scenarios.
Gated by GatewayCompatFactAttribute which skips unless
OPENCLAW_RUN_GATEWAY_COMPAT=1, so they only run on the
Windows+WSL CI lane.
Reuses tests/OpenClaw.Tray.IntegrationTests/McpClient.cs via <Compile Link>
so the JSON-RPC wire shape stays single-source-of-truth.
Locates the E2E tray binary via OPENCLAW_E2E_TRAY_EXE env first, then
falls back to src/OpenClaw.Tray.WinUI/bin/{E2E,Debug}/.../OpenClaw.Tray.WinUI.exe.
The harness expects that build to have -p:OpenClawEnableTestHooks=true;
without it, tray.testhook.* tools are absent and the smoke test fails
loudly.
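The binary lookup order described above (environment override first, then the E2E build output, then Debug) might look like the following. This is a Python sketch, not the fixture's C# code. It searches recursively under each configuration folder because the intermediate path segments are elided in the commit message, and it does not verify that an override path exists.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

TRAY_EXE_NAME = "OpenClaw.Tray.WinUI.exe"


def find_tray_exe(repo_root: Path, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    env = os.environ if env is None else env
    override = env.get("OPENCLAW_E2E_TRAY_EXE")
    if override:
        return Path(override)
    bin_dir = repo_root / "src" / "OpenClaw.Tray.WinUI" / "bin"
    for configuration in ("E2E", "Debug"):  # prefer the hooks-enabled build
        base = bin_dir / configuration
        if base.is_dir():
            matches = sorted(base.rglob(TRAY_EXE_NAME))
            if matches:
                return matches[0]
    return None
```

Returning `None` leaves it to the caller to fail with a clear message, in line with the "fails loudly" behavior the commit describes.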
OperatorPairingTests added as a Tier=Gateway placeholder (Assert.Fail
with "Implementation pending - W3.2 follow-up tools required") so the
real CI workflow has a target to depend on while the testhook stubs are
filled in.
Validated end-to-end: built tray with -p:OpenClawEnableTestHooks=true,
ran smoke tier - 2 tests pass, fixture spawn + MCP handshake + diagnostics
dump round-trip all work in 2 seconds.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gateway-compat.yml
- On PR/push to relevant paths: runs the Smoke tier (no WSL) - merge gate.
- On schedule (nightly 07:00 UTC) or workflow_dispatch with
run_gateway_tier=true: also runs the Gateway tier with WSL +
Ubuntu-24.04 + openclaw + fake LLM. Matrix tests gateway_version
in [lkg, latest]; "latest" failures are alert-only (continue-on-error
via matrix include.failure_is_blocking=false).
- Reusable via workflow_call so gateway-lkg-bump.yml can invoke it.
- Reuses tools/spike/*.sh + ConvertTo-WslPath helper from the W0 spike.
gateway-lkg-bump.yml
- Scheduled every 6h. Polls registry.npmjs.org/openclaw for the
"latest" dist-tag, compares to gateway-lkg.json.
- Refuses pre-releases (alpha/beta/rc/...) unless force_version is set.
- On newer candidate: calls gateway-compat.yml as a reusable workflow
with the candidate version and run_gateway_tier=true.
- On green: opens (or updates) a PR titled
"chore(lkg): bump gateway LKG to X.Y.Z" updating gateway-lkg.json AND
src/OpenClaw.Shared/GatewayLkg.cs in lockstep (the existing
GatewayLkgTests enforces drift = build failure).
- PR body records previous + new version, npm publish time, tarball
shasum, and a link to the validation workflow run.
- NEVER auto-merges. CODEOWNER review required.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
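The candidate-selection step of the bump workflow (skip pre-releases unless forced, and only proceed when the registry version is newer than the pin) can be sketched as a pure function. This Python illustration assumes versions are three dot-separated integers and treats anything else as a pre-release, which is a simplification; the function and parameter names are hypothetical.

```python
import re
from typing import Optional, Tuple

_STABLE = re.compile(r"^\d+\.\d+\.\d+$")


def _parse(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def pick_candidate(latest: str, lkg: str, force_version: Optional[str] = None) -> Optional[str]:
    """Return the version to validate, or None when there is nothing to do."""
    if force_version:
        return force_version  # manual override bypasses the pre-release guard
    if not _STABLE.match(latest):
        return None  # alpha/beta/rc and similar tags are refused
    # Compare numerically so 2026.5.9 is not considered newer than 2026.5.17.
    return latest if _parse(latest) > _parse(lkg) else None
```

A non-`None` result would then be handed to the compat workflow, and a PR is opened only if that run is green.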
Every test hook must invoke the same method the matching UI click handler invokes. If a handler does the work inline, extract a shared service method first and have both the handler and the hook call that method. No parallel implementations - they defeat the purpose of gateway-compat (a test that passes against a stub tells us nothing about whether the real UI path works).

Rule encoded in:
- src/OpenClaw.Tray.WinUI/Services/TestHooks/TestHookCapability.cs file header (anyone editing the file has to read it)
- docs/GATEWAY_COMPAT_TESTING.md "Same-path-as-user rule" section with a mapping table (test hook -> shared method -> UI caller)
- plan.md
- Repository memory

Each new tool comment will name the UI caller and the shared method so future refactors can't drift.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
First real W4 hook. Writes a JSON5 patch into the WSL distro and runs
the exact same `openclaw config patch --file <path>` + `openclaw config
validate` CLI sequence the user can run by hand - via the same
IWslCommandRunner the tray uses for every other WSL operation. No
parallel implementation (same-path rule).
NodeService now constructs a WslExeCommandRunner and hands it to
TestHookCapability, mirroring how LocalGatewaySetup obtains the runner.
Args: { distroName, patchJson, openclawBinPath?, patchPath?, wslUser? }
Returns: { writeOk, writeStderr, patchOk, patchStdout, patchStderr,
validateOk, validateStdout, validateStderr, patchPath }
The hook returns Ok=true even when validate fails so the harness can
inspect WHY (typical pattern: a future gateway version moves a key and
the scenario test surfaces the exact schema error).
5 new TestHookCapabilityTests cover:
- requires IWslCommandRunner
- requires distroName / patchJson
- exact 3-call sequence (write, patch, validate) with arg snapshots
and base64 round-trip verification of the written body
- validate failure returns Ok=true with payload (doesn't throw)
- write failure short-circuits (no patch or validate call)
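The call sequence and result shape described above can be sketched with an injected runner. This is a Python illustration rather than the C# hook. The `runner` callable, the default patch path, and the base64 write command are hypothetical. Running validate even when the patch step fails is also an assumption, since the commit only specifies the write-failure short-circuit.

```python
import base64
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class RunResult:
    ok: bool
    stdout: str = ""
    stderr: str = ""


Runner = Callable[[str, List[str]], RunResult]  # (distro, argv) -> result


def apply_config_patch(runner: Runner, distro: str, patch_json: str,
                       patch_path: str = "/tmp/openclaw-patch.json5",
                       openclaw: str = "openclaw") -> Dict[str, Any]:
    # Step 1: write the patch body; base64 avoids shell quoting problems.
    body = base64.b64encode(patch_json.encode("utf-8")).decode("ascii")
    write = runner(distro, ["bash", "-c", f"printf %s '{body}' | base64 -d > {patch_path}"])
    result: Dict[str, Any] = {"writeOk": write.ok, "writeStderr": write.stderr,
                              "patchPath": patch_path}
    if not write.ok:
        return result  # short-circuit: no patch or validate call

    # Step 2: the same CLI command a user would run by hand.
    patch = runner(distro, [openclaw, "config", "patch", "--file", patch_path])
    result.update(patchOk=patch.ok, patchStdout=patch.stdout, patchStderr=patch.stderr)

    # Step 3: validate; a failure is reported in the payload, not raised.
    validate = runner(distro, [openclaw, "config", "validate"])
    result.update(validateOk=validate.ok, validateStdout=validate.stdout,
                  validateStderr=validate.stderr)
    return result
```

Reporting a validate failure as data lets a scenario test print the exact schema error when a newer gateway moves a key.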
New tests/OpenClaw.GatewayCompat.E2ETests/GatewayConfigPatchTests.cs
is a Tier=Gateway scenario that asserts the verified fake-LLM patch
shape still validates against the running gateway. Catches schema drift
in the openclaw config root and blocks the LKG-bump auto-PR when
upstream breaks compatibility.
Validated: 1145 tray tests pass (+5); harness builds.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per user direction: E2E scenarios will cover what unit tests do today, so trim unit tests to the irreducible set the harness cannot replace.

Deletes from TestHookCapabilityTests:
- Surface stability snapshot (covered by HarnessSmokeTests.ToolsList...)
- Diagnostics shape (covered by HarnessSmokeTests.DiagnosticsDump...)
- Diagnostics provider-error wrapping (low value, breaking the host in E2E is impractical)
- All "not yet implemented" placeholder assertions (they go away as each hook is implemented and gets a real scenario test)
- Gateway-config-patch arg-validation guards (distroName/patchJson)

Keeps:
- AllTools_AreGatedBy_OPENCLAW_TRAY_E2E (security invariant E2E can't prove)
- UnknownCommand (trivial)
- gateway.config.patch exact-command-sequence assertion (same-path rule)
- gateway.config.patch failure-mode tests (write fails, validate fails)
- requires-IWslCommandRunner

Deletes from LocalGatewaySetupTests:
- 4 OPENCLAW_GATEWAY_VERSION env-override tests
- LocalGatewaySetupOptions_DefaultsToLkgVersion

(These will be re-covered by an E2E scenario that sets OPENCLAW_GATEWAY_VERSION and asserts the actually-installed gateway version matches.)

Promotes Gateway tier (LKG cell only) to run on every PR. The matrix expands to ['lkg','latest'] only on schedule. Adds ~3min PR latency in exchange for catching gateway regressions before merge instead of the morning after.

Tests: 1129 tray (was 1145; -16 redundant); shared still 1808.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds:
- tray.testhook.connection.waitFor
- tray.testhook.pairing.reset
- tray.testhook.chat.send
- tray.testhook.localSetup.start / status / cancel

All four follow the same-path-as-user rule: each invokes the same production method the matching UI click handler invokes.

New plumbing:
- ITestHookHost interface (compile-time-gated) aggregates the App-level dependencies the hooks need. App.TestHookHost.cs (partial class, also compile-time-gated) wires it up.
- TestHookCapability accepts an optional ITestHookHost. NodeService passes (App.Current as App) when registering the capability.

Same-path mappings:
- connection.waitFor -> IGatewayConnectionManager.StateChanged (same event tray icon + ConnectionPage observe)
- pairing.reset -> GatewayRegistry.Remove + per-gateway identity wipe (same Remove method UI surfaces use)
- chat.send -> OpenClawChatDataProvider.SendMessageAsync (same method ChatWindow.OnSendClicked invokes)
- localSetup.start -> App.CreateLocalGatewaySetupEngine + RunLocalOnlyAsync (same chain LocalSetupProgressPage / OnboardingV2Bridge invoke)

LocalSetup hook is async-shaped: start kicks off RunLocalOnlyAsync on a background Task with its own CTS, status polls the latest engine state (captured via the same StateChanged event the V2 bridge subscribes to), cancel triggers the CTS. Concurrency-guarded: a second start while a run is in-flight returns an error rather than racing.

ITestHookHost is also linked into OpenClaw.Tray.Tests so the existing unit tests still compile.

Tray tests: 1129 passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the placeholder OperatorPairingTests Assert.Fail with real
end-to-end scenarios that drive the production code paths via the
tray.testhook.* tools. Per user direction: no stubs, all fully
implemented and tested.
New GatewayCompatScenarios.cs centralizes:
- DistroName ("Ubuntu-24.04") and FakeLlmPort
- The verified fake-LLM provider JSON5 patch (single source of truth
for the schema-validated body; tools/fake-llm-server/README.md and
this file move together)
- ApplyFakeLlmProviderAsync (called by every scenario)
- UnwrapToolPayload helper for MCP tools/call response shape
7 scenarios under Tier=Gateway (skipped unless OPENCLAW_RUN_GATEWAY_COMPAT=1):
1. GatewayConfigPatchTests — pre-existing; validates the fake-LLM provider
patch against the live gateway. Failure blocks LKG auto-bump.
2. OperatorPairingTests — drives local-setup -> waits for operator
Connected -> asserts a device ID was issued.
3. NodePairingTests — waits for node Connected+Paired -> asserts
gateway sees the node via app.nodes (existing production MCP tool).
4. ToolEventsTests — regression guard for the "tool-events cap missing"
bug (repo memory). Sends a chat and confirms send=true.
5. ChatRoundTripTests — sends a chat via chat.send and asserts the
fake LLM server received the user message verbatim (via the W2
/__assert/last-request endpoint).
6. NodeInvokeTests — asserts gateway sees the Windows node with at
least one capability via app.nodes; the failure mode this guards
is "node.invoke silently dropped" per docs/gateway-node-integration.md.
7. ReconnectTests — pair -> pairing.reset -> re-pair, asserts Ready in
both passes and that reset removed at least one record.
Validation (no-hooks build, normal dev):
- Shared 1808 passed
- Tray 1129 passed
- Harness Smoke 2 passed, Gateway 7 skipped (correctly gated)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dotnet restore at the workflow root doesn't generate the win-x64 RID-targeted assets for the WinUI sub-projects (FunctionalUI, OnboardingV2). The existing ci.yml works around this by omitting --no-restore on the 'Build Tray App (WinUI)' step, which triggers the RID-targeted restore. Mirror that here. Caught by the first PR-triggered run of gateway-compat.yml on the fork (run 26141658423). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
npm reports no such version 2026.5.17, so PR CI failed to install it in the Gateway tier. The W0 spike (run 26138294682) installed and verified 2026.5.18 (which is npm dist-tag 'latest'). Use that as the real LKG. GatewayLkgTests stays green because both gateway-lkg.json and GatewayLkg.cs are bumped together. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
First real PR-triggered run on the fork (run 26142143433) revealed the hook was passing args to RunInDistroAsync, which prepends '-d name --'. Combined with my '-u user --', that produces a double '--' that ends wsl arg parsing prematurely: bash sees '-' as positional arg 0 and fails with 'bash: - : invalid option'.
Switch to RunAsync directly with the production-pattern args:
wsl -d <distro> -u <user> -- bash -lc <script>
This matches LocalGatewaySetup.cs:993 exactly (which is the production install command users run via the local-setup flow).
Unit tests updated to snapshot the new arg layout. FakeWslRunner now implements RunAsync (was previously only RunInDistroAsync). Distro name extracted from '-d' arg position for test assertion convenience.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
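To make the double '--' concrete, here is a small sketch of the two argument layouts. It is written in Python purely for illustration; `run_in_distro_args` only models the prepend behavior this commit attributes to RunInDistroAsync, and the distro, user, and script values are placeholders.

```python
def run_in_distro_args(distro, args):
    """Assumed behavior of RunInDistroAsync: prepend '-d <distro> --' to the caller's args."""
    return ["-d", distro, "--", *args]


def direct_args(distro, user, script):
    """Production-pattern layout from the commit: wsl -d <distro> -u <user> -- bash -lc <script>."""
    return ["-d", distro, "-u", user, "--", "bash", "-lc", script]


# Placeholder values, chosen only for illustration.
DISTRO, USER, SCRIPT = "Ubuntu-24.04", "openclaw", "echo ok"

# Old call path: the hook added its own '-u <user> --' on top of the helper's '--'.
broken = run_in_distro_args(DISTRO, ["-u", USER, "--", "bash", "-lc", SCRIPT])

# New call path: a single '--' separates wsl options from the command.
fixed = direct_args(DISTRO, USER, SCRIPT)
```

In `broken` the first '--' already ends wsl option parsing, so '-u' and everything after it are treated as the command to run; `fixed` keeps '-u' on the wsl side of the single separator.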
PR-triggered run 26142580405 surfaced the exact schema requirement:
models.providers.fake.models.0.name: Invalid input
The W0 spike (which used 'openclaw config schema') only confirmed the provider root path; it didn't probe the inner array element shape. Real validate caught it. Updated GatewayCompatScenarios.FakeLlmProviderPatch and the docs in tools/fake-llm-server/README.md to use 'name' instead of 'id'.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous PR-triggered runs flip-flopped between 'models.0.id: Invalid' and 'models.0.name: Invalid' depending on which field was missing last. The real shape requires BOTH id and name plus reasoning, input, cost, contextWindow, maxTokens - taken verbatim from openclaw's own src/config/model-alias-defaults.test.ts fixture. Also fix authMode -> auth (schema.help.ts:938 confirms 'auth' is the canonical name). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Real schema confirmed at src/config/zod-schema.core.ts:319 of the gateway repo. Required: id (min 1) + name (min 1). All other fields optional. My JSON5 shape was correct but flip-flopping errors suggest the parser is picky. Switch to strict JSON with quoted keys to remove parser ambiguity as a variable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR run 26143696116 advanced past the schema issue but hit:
'ConfigMutationConflictError: config changed since last load'
The openclaw config patch command is read-modify-write and can race with the gateway's own config writes. Retry up to 5 times with 500ms*attempt backoff, but only for that specific error; other failures fail fast.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
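The retry policy above might look roughly like this. It is a Python sketch under stated assumptions, not the hook's actual code: `apply_patch` is a hypothetical callable returning `(ok, output)`, "up to 5 times" is read here as five attempts in total, and the conflict is detected by matching the error name in the command output.

```python
import time

# Error name taken from the CI log quoted in this commit.
CONFLICT_MARKER = "ConfigMutationConflictError"


def patch_with_retry(apply_patch, max_attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Retry only on the config-conflict error, with linear backoff (0.5s * attempt).

    Returns the attempt number that succeeded; raises RuntimeError otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        ok, output = apply_patch()
        if ok:
            return attempt
        if CONFLICT_MARKER not in output:
            # Any other failure (schema error, missing binary, ...) fails fast.
            raise RuntimeError(f"config patch failed: {output}")
        if attempt == max_attempts:
            raise RuntimeError(f"config patch still conflicting after {max_attempts} attempts")
        sleep(base_delay_s * attempt)
```

Passing `sleep` as a parameter keeps the backoff testable without real delays; a test can hand in a list's `append` and assert on the recorded waits.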
The workflow no longer pre-installs WSL + openclaw under Ubuntu-24.04.
The gateway-compat scenarios now drive the production install path
themselves via tray.testhook.localSetup.start (the same code path the
LocalSetupProgressPage 'Set up locally' button invokes). That is the
exact regression target we want to test against new gateway versions.
- Drop: Install Ubuntu-24.04 distro
- Drop: Provision openclaw user
- Drop: Install openclaw@<version>
- Drop: Start fake LLM server inside WSL
- Add: WSL host diagnostics (wsl --version/status/list)
- Keep: Register WSL path helper (useful for log paths)
- Change: Collect WSL gateway log now targets OpenClawGateway distro
(production default created by LocalGatewaySetup engine)
- Change: Cleanup WSL distro now unregisters OpenClawGateway
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduces a collection-scoped xUnit fixture that drives the full production tray.testhook.localSetup.start flow once, then shares the resulting installed-and-paired tray with every gateway-tier scenario in the [Collection("Gateway")] collection. Cost (~3-4 min cold) is paid once per CI run instead of per scenario.
Adds GatewayCompatScenarios helpers:
- DriveLocalSetupAndPrepareGatewayAsync: kicks off localSetup, polls localSetup.status to terminal, shells wsl.exe into OpenClawGateway to launch tools/spike/start-fake-llm.sh, then applies the verified fake-LLM provider patch.
- StartFakeLlmInDistroAsync: wsl.exe-based bootstrap, UTF-8 capture.
- WaitForConnectionAsync: client-side polling around <=20s server waits to respect McpClient's 30s HTTP timeout.
- FindRepoRoot + ToWslPath: path helpers.
DistroName flipped from Ubuntu-24.04 to the production default OpenClawGateway (LocalGatewaySetupOptions.DistroName).
A separate ReconnectFixture lets ReconnectTests own its own pairing state since it resets and re-pairs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
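The client-side polling idea behind WaitForConnectionAsync can be sketched as below. This is an illustrative Python version, not the C# helper: `wait_once` is a hypothetical callable standing in for one server-side waitFor call with a bounded timeout, and the 20-second slice is the figure quoted above.

```python
import time


def wait_for_connection(wait_once, total_timeout_s, slice_s=20, clock=time.monotonic):
    """Poll in short server-side waits so no single request outlives the HTTP timeout.

    wait_once(timeout_s) is assumed to block for at most timeout_s seconds and
    return True once the connection has reached the desired state.
    """
    deadline = clock() + total_timeout_s
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Each request waits at most slice_s, staying under the client's 30s limit.
        if wait_once(min(slice_s, remaining)):
            return True
```

The overall budget lives on the client, so a long wait (minutes) is expressed as many short requests rather than one request that the HTTP layer would cut off.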
Now that GatewayCollectionFixture drives the full production install and pairing flow once per CI run, the per-scenario setup boilerplate (ApplyFakeLlmProviderAsync + localSetup.start + connection.waitFor with 600s server timeouts) goes away. Each test body becomes just the specific assertion it was meant to express.
- OperatorPairingTests, NodePairingTests, ToolEventsTests, ChatRoundTripTests, NodeInvokeTests, GatewayConfigPatchTests: joined [Collection("Gateway")], use GatewayCollectionFixture, and confirm settled connection state via WaitForConnectionAsync (client-side polling, respects McpClient 30s timeout).
- GatewayConfigPatchTests now uses GatewayCompatScenarios.DistroName + FakeLlmProviderPatch (the verified strict-JSON patch shape), exercising idempotence against the already-installed gateway.
- ReconnectTests stays per-class on ReconnectFixture so the reset / re-pair dance doesn't trash the shared collection state.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iteration in response to the first push to PR #3 where 7 of 9 gateway scenarios failed:
- Collection fixture's first localSetup attempt failed at "Creating the OpenClaw Gateway WSL instance" within 18s on a cold runner. All five shared-collection scenarios then failed instantly because the fixture init faulted once and xUnit reuses the fault.
- ReconnectFixture's attempt got past WSL install but hung 20 min at "Pairing Windows tray node" before our timeout fired.
Changes:
- DriveLocalSetupAndPrepareGatewayAsync now retries once on status=FailedRetryable. Matches the production "Retry" button UX the user would click on a transient WSL hiccup. Terminal failures (FailedTerminal) still fail-fast.
- localSetup wall timeout bumped from 20 min to 25 min (gives the pairing step more headroom; will revisit if it still times out).
- GatewayCompatFixture preserves the tray's DataDir (including openclaw-tray.log) into a directory named by an environment variable before deleting it, when the workflow sets that variable. Workflow sets it to TestResults/Gateway-<version>/tray-data, which is uploaded as part of the existing gateway-tier results artifact.
- "Collect WSL gateway log" now also dumps openclaw service logs under ~/.openclaw, distro process list, and listening sockets, so the next failure tells us whether the gateway was even listening when pairing hung.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
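The retry-once rule can be sketched as follows. This is an illustrative Python version under assumptions: `run_once` is a hypothetical callable that drives one full localSetup attempt and returns its terminal status, and "Complete" is a placeholder for whatever the success status is actually called; only FailedRetryable and FailedTerminal come from the commit text.

```python
def drive_local_setup(run_once, max_retries=1):
    """Run setup, retrying only on FailedRetryable and at most max_retries times.

    Returns the number of attempts used on success; raises RuntimeError on
    FailedTerminal, on any unknown status, or when retries are exhausted.
    """
    attempts = 0
    while True:
        attempts += 1
        status = run_once()
        if status == "Complete":  # placeholder success status (assumption)
            return attempts
        if status == "FailedRetryable" and attempts <= max_retries:
            # Equivalent of the user clicking "Retry" after a transient failure.
            continue
        raise RuntimeError(f"local setup ended with {status} after {attempts} attempt(s)")
```

Capping the retries at one keeps a persistent failure from eating the job's time budget while still absorbing a single transient hiccup.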
CI run 26146683288 surfaced two issues. This commit fixes #1; #2 is a production-side issue documented in the next-session handoff.
#1: Race between the shared Gateway collection and ReconnectFixture
Both fixtures spawned in parallel and both invoked tray.testhook.localSetup.start on their own tray instances. localSetup eventually calls wsl --install OpenClawGateway, and on a fresh runner one side wins while the other sees a partial registration and bails with wsl_existing_distro_unavailable (WSL_E_DISTRO_NOT_FOUND on the probe). Confirmed via setup-state.json artifacts.
Fix: disable assembly-level parallel collection execution. The Gateway collection and ReconnectFixture now serialize. The ~3-4 min cold install is paid once for the collection and once for Reconnect; total wall time roughly equals the single longest scenario plus reconnect, which fits inside the existing 45 min job budget.
#2 (deferred): operator pair succeeds; node pair fails because
- node connects with role=node but existing approval is role=operator, so gateway returns NOT_PAIRED/role-upgrade.
- tray autopair fires node.pair.approve and gets "unknown requestId" (likely a race against the just-issued request).
- then the gateway sends "shutdown / 1012 service restart" and never comes back: tray gets "Unable to connect to the remote server" for the next 20 minutes until our deadline expires.
This is a production / gateway-side flow problem and is now visible precisely because Plan A drives the real path. Investigation belongs in a follow-up commit (likely either: ensure the user-systemd unit for openclaw-gateway sets Restart=on-failure, or add a tray.testhook hook that calls `openclaw devices approve` from inside WSL to side-step the autopair race).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Validating the W1-W5 gateway compatibility CI changes within the fork before any upstream PR. Do not merge - this PR exists to drive the workflows on real CI.
Branch contains 12 commits covering:
Expected: ci.yml passes; gateway-compat Smoke passes (~3min); gateway-compat Gateway tier vs LKG attempts a full WSL+openclaw run (~10-15 min). First real run may need timing tweaks.