Skip to content

fix(acpx): await startup probe before gateway ready#80187

Merged
steipete merged 1 commit into
mainfrom
fix/acpx-ready-await-startup-probe-79596
May 10, 2026
Merged

fix(acpx): await startup probe before gateway ready#80187
steipete merged 1 commit into
mainfrom
fix/acpx-ready-await-startup-probe-79596

Conversation

@steipete
Copy link
Copy Markdown
Contributor

@steipete steipete commented May 10, 2026

Summary

  • Run the ACPX startup probe by default unless OPENCLAW_ACPX_RUNTIME_STARTUP_PROBE=0 opts back into lazy startup.
  • Await the ACPX probe inside service.start(), so Gateway sidecar readiness waits until acpx is usable or has reported a bounded probe failure.
  • Preserve the existing OPENCLAW_SKIP_ACPX_RUNTIME_PROBE=1 fast path for E2E scripts and update docs/changelog for the new default.

Closes #79596.

Real behavior proof

  • Behavior or issue addressed: Stuck ACPX ACP agents should not block Gateway readiness indefinitely during startup probing.
  • Real environment tested: Local OpenClaw Gateway run from the current PR head with isolated OPENCLAW_HOME, OPENCLAW_STATE_DIR, and OPENCLAW_CONFIG_PATH under /tmp/openclaw-acpx-proof.7B2Plt.
  • Exact steps or command run after this patch: Configured plugins.entries.acpx.enabled=true, plugins.entries.acpx.config.timeoutSeconds=0.001, and plugins.entries.acpx.config.agents.codex.command="node" with args=["-e", "setInterval(() => {}, 1000)"], then ran OPENCLAW_HOME=/tmp/openclaw-acpx-proof.7B2Plt/home OPENCLAW_STATE_DIR=/tmp/openclaw-acpx-proof.7B2Plt/state OPENCLAW_CONFIG_PATH=/tmp/openclaw-acpx-proof.7B2Plt/state/openclaw.json pnpm openclaw gateway run --auth none --bind loopback --port 19876 --verbose --allow-unconfigured.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): Copied live output from the isolated Gateway run:
10:08:43 [plugins] embedded acpx runtime backend registered (cwd: /tmp/openclaw-acpx-proof.7B2Plt/home/.openclaw/workspace)
10:08:43 [plugins] embedded acpx runtime setup failed: embedded acpx runtime backend startup probe timed out after 0.001s
10:08:43 [gateway] ready
  • Observed result after fix: The startup probe timed out after 0.001s, the ACPX backend was reported degraded, and the gateway reached ready instead of hanging.
  • What was not tested: No additional gaps.

Verification

  • pnpm exec oxfmt --check --threads=1 extensions/acpx/register.runtime.ts extensions/acpx/src/service.ts extensions/acpx/src/service.test.ts docs/tools/acp-agents-setup.md CHANGELOG.md
  • pnpm test extensions/acpx/src/service.test.ts
  • env -u OPENCLAW_TESTBOX -u OPENCLAW_TESTBOX_ID pnpm check:changed
  • pnpm build
  • Isolated Gateway run above for bounded ACPX startup probe behavior.

Additional suite check:

  • pnpm test extensions/acpx still fails in extensions/acpx/src/claude-agent-acp-completion.test.ts:178 (resolved becomes true after idle). This file is not touched by this PR and the focused ACPX startup test passes.

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation extensions: acpx size: S maintainer Maintainer-authored PR labels May 10, 2026
@steipete steipete force-pushed the fix/acpx-ready-await-startup-probe-79596 branch from 0181e94 to 7462d9d Compare May 10, 2026 08:47
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 10, 2026

Codex review: needs real behavior proof before merge.

Summary
The PR makes the ACPX startup probe run by default, awaits that probe inside ACPX service startup before gateway readiness can proceed, preserves probe opt-outs, and updates tests/docs/changelog.

Reproducibility: yes. from source plus issue logs: current main awaits plugin service startup before gateway ready, but ACPX currently launches its probe in a detached async block and can resolve startup before the probe reports ready or failed. I did not live-reproduce a Windows gateway boot in this read-only review.

Real behavior proof
Needs real behavior proof before merge: The PR body has test/build verification but no after-fix real gateway output; issue comments prove the existing opt-in workaround, not this default-await patch. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Protected maintainer label, a product-sensitive default change, and missing after-fix real behavior proof make this a human landing decision rather than a repair job.

Security
Cleared: The diff touches ACPX runtime glue, tests, docs, and changelog only; the new startup probe process launch is the stated behavior and existing opt-outs remain available.

Review details

Best possible solution:

Land the awaited-probe fix once maintainers accept the default-behavior change and add redacted cold-start proof; keep broader sidecar parallelization tracked separately in #79625.

Do we have a high-confidence way to reproduce the issue?

Yes, from source plus issue logs: current main awaits plugin service startup before gateway ready, but ACPX currently launches its probe in a detached async block and can resolve startup before the probe reports ready or failed. I did not live-reproduce a Windows gateway boot in this read-only review.

Is this the best way to solve the issue?

Yes, with maintainer approval for the default change: moving the ACPX probe into the awaited service-start path is the narrow owner-boundary fix for the reported readiness race while preserving explicit lazy-start opt-outs.

Acceptance criteria:

  • pnpm test extensions/acpx/src/service.test.ts
  • pnpm exec oxfmt --check --threads=1 extensions/acpx/register.runtime.ts extensions/acpx/src/service.ts extensions/acpx/src/service.test.ts docs/tools/acp-agents-setup.md CHANGELOG.md
  • pnpm check:changed
  • pnpm build
  • Cold gateway startup proof with ACPX enabled, showing the ACPX probe ready/failure log before gateway ready; redact private paths, tokens, endpoints, and host details.

What I checked:

  • Protected label: The live issue search output for this PR shows the protected maintainer label, so conservative cleanup must keep it open for explicit maintainer handling.
  • Gateway readiness waits on sidecars: Current main awaits startGatewaySidecars, then calls sidecar-ready hooks and logs gateway ready, so plugin service startup timing directly affects the ready signal. (src/gateway/server-startup-post-attach.ts:773, 0a17339a7064)
  • Plugin service startup contract: Current main awaits each plugin service's start method, which means ACPX can participate in readiness only through what its start promise waits for. (src/plugins/services.ts:72, 0a17339a7064)
  • Current ACPX behavior: Current main makes startup probing opt-in with OPENCLAW_ACPX_RUNTIME_STARTUP_PROBE=1, and the actual probe is launched in a detached async block, so ACPX service startup can resolve before the probe reports ready or failed. (extensions/acpx/src/service.ts:202, 0a17339a7064)
  • PR implementation: The PR flips ACPX probing to opt-out, adds OPENCLAW_SKIP_ACPX_RUNTIME_PROBE handling in the startup decision, and replaces the detached probe with awaited probe/error reporting inside service.start(). (extensions/acpx/src/service.ts:201, 7462d9df376b)
  • Regression coverage: The PR adds a test that holds service.start() open until the mocked ACPX probe resolves, plus opt-out coverage for lazy registration and skipped probe health. (extensions/acpx/src/service.test.ts:201, 7462d9df376b)

Likely related people:

  • steipete: Current-main blame attributes the ACPX startup-probe decision and detached probe block to Peter Steinberger, and local history shows repeated recent ACPX/gateway/plugin-service work in this area. (role: recent area contributor; confidence: high; commits: 5bcc6337afbf, 5659d7f985eb, 12c12570238c; files: extensions/acpx/src/service.ts, extensions/acpx/register.runtime.ts, src/plugins/services.ts)
  • Onur Solmaz: Local history links Onur to ACPX command/version configuration and ACP thread-bound agent work that intersects the runtime startup and probe-agent configuration surface. (role: adjacent feature contributor; confidence: medium; commits: 921ebfb25e9a, a7d56e3554d0; files: extensions/acpx/src/service.ts, docs/tools/acp-agents-setup.md)

Remaining risk / open question:

  • The PR body lists tests and build checks but does not include after-fix cold gateway logs or terminal output showing gateway ready after the awaited ACPX probe result.
  • This intentionally changes the documented lazy ACPX startup default, so maintainers should be comfortable with probing the configured ACP harness during gateway startup by default.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0a17339a7064.

@steipete steipete force-pushed the fix/acpx-ready-await-startup-probe-79596 branch 3 times, most recently from 5d1f219 to e7be9bf Compare May 10, 2026 09:10
@steipete steipete force-pushed the fix/acpx-ready-await-startup-probe-79596 branch from e7be9bf to 2294499 Compare May 10, 2026 09:11
@steipete steipete merged commit c18cc76 into main May 10, 2026
109 of 110 checks passed
@steipete steipete deleted the fix/acpx-ready-await-startup-probe-79596 branch May 10, 2026 09:16
@steipete
Copy link
Copy Markdown
Contributor Author

Landed via rebase onto main.

  • Gate: focused ACPX service test, changed gate, build, real Gateway proof, CI, CodeQL Critical Quality
  • Land commit: c18cc76
  • Merge commit: c18cc76

Thanks @steipete!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Improvements or additions to documentation extensions: acpx maintainer Maintainer-authored PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: gateway "ready" fires before acpx plugin finishes registering

1 participant