Skip to content

Extend gateway configuration timeout#652

Merged
ranjeshj merged 1 commit into
masterfrom
ranjeshj/fix-preparing-gateway-timeout
Jun 3, 2026
Merged

Extend gateway configuration timeout#652
ranjeshj merged 1 commit into
masterfrom
ranjeshj/fix-preparing-gateway-timeout

Conversation

@ranjeshj
Copy link
Copy Markdown
Collaborator

@ranjeshj ranjeshj commented Jun 2, 2026

Summary

  • extend the WSL gateway configuration step timeout from 30s to 120s
  • return a timeout-specific setup failure message instead of the generic empty exit -1 error
  • add setup-engine tests for the extended timeout and timeout-specific failure path

Fixes #638.

Validation

  • OPENCLAW_REPO_ROOT=...; .\build.ps1
  • OPENCLAW_REPO_ROOT=...; dotnet test .\tests\OpenClaw.Shared.Tests\OpenClaw.Shared.Tests.csproj --no-restore
  • OPENCLAW_REPO_ROOT=...; dotnet test .\tests\OpenClaw.Tray.Tests\OpenClaw.Tray.Tests.csproj --no-restore
  • OPENCLAW_REPO_ROOT=...; dotnet test .\tests\OpenClaw.Shared.Tests\OpenClaw.Shared.Tests.csproj --no-build --no-restore — 2074 total, 0 failed, 2045 passed, 29 skipped
  • OPENCLAW_REPO_ROOT=...; dotnet test .\tests\OpenClaw.Tray.Tests\OpenClaw.Tray.Tests.csproj --no-build --no-restore — 934 total, 0 failed
  • OPENCLAW_REPO_ROOT=...; dotnet test .\tests\OpenClaw.SetupEngine.Tests\OpenClaw.SetupEngine.Tests.csproj --no-build --no-restore — 199 total, 0 failed

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented Jun 2, 2026

Codex review: needs maintainer review before merge. Reviewed June 3, 2026, 10:32 AM ET / 14:32 UTC.

Summary
The PR extends the WSL configure-gateway timeout from 30s to 120s, adds a timeout-specific setup failure message, and adds setup-engine unit coverage.

Reproducibility: yes. The linked setup logs show wsl.exe timing out around the current 30s limit, and current main routes ConfigureGatewayStep through CommandRunner, which returns TimedOut=true and exit -1 on process timeout; I did not run a live Windows/WSL reproduction.

Review metrics: 3 noteworthy metrics.

  • Diff scope: 2 files changed. The patch is confined to one production setup step and its setup-engine tests.
  • Timeout change: 30s to 120s. This is the main user-visible behavior and the merge tradeoff maintainers need to accept or tune.
  • Regression tests: 2 added. The new tests cover both the extended timeout value and the timeout-specific failure message.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Confirm or tune the 120s per-attempt setup wait before release.

Risk before merge

  • [P2] The timeout increase changes the worst-case configure-gateway wait from 30s to 120s per attempt; with the default three-attempt retry policy, a true hang can keep users in setup materially longer before failure.
  • [P1] This read-only Linux review did not run a live Windows/WSL slow-start setup smoke; the PR body reports the repository build and test validation instead.

Maintainer options:

  1. Accept the bounded longer setup wait (recommended)
    Merge once normal checks pass if maintainers are comfortable trading a longer true-hang wait for fewer false configure-gateway timeouts on slow WSL startup.
  2. Tune the timeout policy first
    If 120s per attempt is too long, adjust the configure-gateway timeout or retry policy while keeping the timeout-specific failure message and tests.
  3. Pause for live WSL proof
    Hold the PR until a Windows/WSL setup run shows the configure-gateway step completing after the old 30s limit if maintainers want runtime proof before release.

Next step before merge

  • [P2] The remaining action is maintainer acceptance of the longer setup timeout on a collaborator-authored PR, not an automated repair.

Security
Cleared: The diff changes only setup timeout/error handling and test fakes; it does not add dependency, secret, permission, workflow, or code-execution surface.

Review details

Best possible solution:

Land the narrow timeout and timeout-message fix with the focused setup-engine tests once maintainers accept or tune the 120s per-attempt setup wait.

Do we have a high-confidence way to reproduce the issue?

Yes. The linked setup logs show wsl.exe timing out around the current 30s limit, and current main routes ConfigureGatewayStep through CommandRunner, which returns TimedOut=true and exit -1 on process timeout; I did not run a live Windows/WSL reproduction.

Is this the best way to solve the issue?

Yes, mostly. Extending only the configure-gateway timeout and mapping TimedOut to a specific message is narrow and preserves the GATEWAY_CONFIGURED success sentinel, but the exact 120s value is a maintainer UX tradeoff.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 402c1a39a908.

Label changes

Label justifications:

  • P2: This fixes a bounded local WSL gateway onboarding failure reported by a user, without evidence of a broader runtime outage.
  • merge-risk: 🚨 availability: The PR intentionally lengthens the setup failure wait for genuine configure-gateway hangs, which unit tests cannot fully settle.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor real-behavior-proof gate is not applied because this PR is from a collaborator; the PR body reports build and test validation.
Evidence reviewed

What I checked:

  • Repository policy read: AGENTS.md was read fully; its setup-engine validation and onboarding architecture guidance are relevant to this review, although no validation was run because this is a read-only checkout review. (AGENTS.md:1, 402c1a39a908)
  • Current main still has the old timeout: ConfigureGatewayStep currently runs the gateway configuration WSL script with TimeSpan.FromSeconds(30) and reports the generic exit-code failure when the sentinel is missing. (src/OpenClaw.SetupEngine/SetupSteps.cs:1246, 402c1a39a908)
  • Timeouts map to exit -1: CommandRunner marks process timeout as TimedOut=true, kills the process, and returns exit code -1, matching the linked setup logs and making a timeout-specific message a narrow contract-compatible change. (src/OpenClaw.SetupEngine/CommandRunner.cs:102, 402c1a39a908)
  • Default retries explain repeated failures: RetryPolicy.Default uses three attempts with an initial two-second delay, so increasing the per-attempt timeout materially changes worst-case failure wait. (src/OpenClaw.SetupEngine/RetryExecutor.cs:5, 402c1a39a908)
  • PR diff is focused: The patch adds ConfigureGatewayStep.GatewayConfigurationTimeout at 120s, uses it for RunInWslAsync, handles TimedOut with a specific message, and adds two focused tests plus fake-runner support. (src/OpenClaw.SetupEngine/SetupSteps.cs:1191, 4c32c3f63018)
  • Supported setup path: The onboarding docs describe local setup as installing and connecting a fresh app-owned OpenClawGateway WSL instance, so the configure-gateway step is core supported setup behavior. (docs/ONBOARDING_WIZARD.md:3, 402c1a39a908)

Likely related people:

  • ranjeshj: GitHub commit history shows prior authored setup-engine work on deterministic WSL setup and WSL setup failure handling in the same source and test files; this is prior merged history, not just authorship of this PR. (role: feature-history owner and recent area contributor; confidence: high; commits: 98d0760cb5b6, ff3a7cc53736; files: src/OpenClaw.SetupEngine/SetupSteps.cs, tests/OpenClaw.SetupEngine.Tests/SetupStepsTests.cs)
  • shanselman: GitHub commit metadata lists shanselman as committer on recent WSL setup and failure-handling commits that shaped the current ConfigureGatewayStep path. (role: merger for recent setup-engine history; confidence: medium; commits: 98d0760cb5b6, ff3a7cc53736; files: src/OpenClaw.SetupEngine/SetupSteps.cs, tests/OpenClaw.SetupEngine.Tests/SetupStepsTests.cs)
  • RBrid: Current local blame for the configure-gateway block points to release-era commit 7d9152f authored by Régis Brid in the shallow checkout. (role: recent area contributor; confidence: medium; commits: 7d9152f427a3; files: src/OpenClaw.SetupEngine/SetupSteps.cs, tests/OpenClaw.SetupEngine.Tests/SetupStepsTests.cs)
  • kmahone: Recent GitHub commit history shows local dev setup work touching the same setup-engine source and test files after the WSL setup rewrites. (role: recent adjacent contributor; confidence: medium; commits: 9106725567af; files: src/OpenClaw.SetupEngine/SetupSteps.cs, tests/OpenClaw.SetupEngine.Tests/SetupStepsTests.cs)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. labels Jun 2, 2026
Give the WSL gateway configuration step more time for slow WSL startup and report timeout failures explicitly instead of surfacing an empty exit -1 error.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ranjeshj ranjeshj force-pushed the ranjeshj/fix-preparing-gateway-timeout branch from 4ec4fcf to 4c32c3f Compare June 3, 2026 14:29
@ranjeshj ranjeshj merged commit 3ddade7 into master Jun 3, 2026
17 checks passed
@ranjeshj ranjeshj deleted the ranjeshj/fix-preparing-gateway-timeout branch June 3, 2026 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. P2 Normal priority bug or improvement with limited blast radius. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error on "Preparing Gateway"

1 participant