Skip to content

fix(cli): use active node for startup bench scripts#84451

Merged
giodl73-repo merged 1 commit into
openclaw:mainfrom
giodl73-repo:fix-bench-node-execpath
May 20, 2026
Merged

fix(cli): use active node for startup bench scripts#84451
giodl73-repo merged 1 commit into
openclaw:mainfrom
giodl73-repo:fix-bench-node-execpath

Conversation

@giodl73-repo
Copy link
Copy Markdown
Contributor

Fixes #83921.

Summary

  • Use process.execPath instead of bare node when benchmark wrapper scripts spawn scripts/bench-cli-startup.ts.
  • Allow narrowed benchmark budget runs, such as --preset startup, to compare only the cases that were actually generated instead of failing on unrelated fixture cases.
  • Add focused coverage for both wrapper spawning and narrowed preset behavior.

Verification

  • git diff --check
  • Blacksmith Testbox via Crabbox tbx_01ks1wp6dw5wb18s2v7nfv7e1r:
    • pnpm build
    • CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 OPENCLAW_TESTBOX=1 OPENCLAW_TESTBOX_REMOTE_RUN=1 node scripts/run-vitest.mjs test/scripts/cli-startup-bench-spawner.test.ts
    • node_bin="$(command -v node)"; PATH=/nonexistent "$node_bin" scripts/test-update-cli-startup-bench.mjs --out /tmp/openclaw-cli-startup-bench-review.json --preset startup --runs 1 --warmup 0 --timeout-ms 10000
    • node_bin="$(command -v node)"; PATH=/nonexistent "$node_bin" scripts/test-cli-startup-bench-budget.mjs --preset startup --runs 1 --warmup 0 --timeout-ms 10000 --max-duration-regression-pct 100000 --max-first-output-regression-pct 100000 --max-rss-regression-pct 100000

Real behavior proof

Behavior addressed: CLI startup benchmark wrapper scripts no longer depend on a node executable being discoverable through PATH, and narrowed startup budget checks no longer fail on unrelated real-mode fixture cases.

Real environment tested: Blacksmith Testbox via Crabbox, provider blacksmith-testbox, id tbx_01ks1wp6dw5wb18s2v7nfv7e1r.

Exact steps or command run after this patch: Built the source tree, ran the focused Vitest file, then ran both benchmark wrapper commands with PATH=/nonexistent while invoking the wrappers through the current Node executable.

Evidence after fix: The focused Vitest file passed 2 tests, test-update-cli-startup-bench.mjs wrote /tmp/openclaw-cli-startup-bench-review.json, and test-cli-startup-bench-budget.mjs completed its startup budget comparison successfully.

Observed result after fix: Testbox completed with exit code 0.

What was not tested: Full benchmark presets and full CI were not run for this narrow script fix.

@openclaw-barnacle openclaw-barnacle Bot added scripts Repository scripts size: S maintainer Maintainer-authored PR labels May 20, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 20, 2026

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
This PR changes the CLI startup benchmark wrapper scripts to spawn the benchmark runner with process.execPath, relaxes non-all preset budget comparisons to generated cases, and adds focused coverage.

Reproducibility: yes. at source level: current main has spawnSync("node", args) in both wrapper scripts while sibling benchmark code already uses process.execPath. I did not run a writable repro in this checkout, but the PR body supplies after-fix Testbox proof with PATH=/nonexistent.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: The patch is small and well-proven for the main bug, with the only merge hesitation being maintainer selection against an overlapping open PR and the intentional narrowed-preset semantics change.

Rank-up moves:

  • none
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (live_output): The PR body includes real Testbox commands that built the tree, ran focused Vitest coverage, and exercised both wrapper scripts with PATH=/nonexistent after the patch.

Risk before merge

  • An earlier open PR, fix(scripts): spawn bench runners under process.execPath (#83921) #84304, addresses the same linked issue, so maintainers should choose one branch rather than land overlapping fixes.
  • The narrowed-preset budget change intentionally moves non-all runs to overlap-only fixture comparison; maintainers should be comfortable that this is the desired script contract before merge.

Maintainer options:

  1. Decide the mitigation before merge
    Land one focused fix for the linked benchmark wrapper bug, preferably with the process.execPath change and regression coverage, then close the overlapping PR that is not selected.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
No automated repair is needed; maintainers should choose between this PR and #84304 before merging either overlapping fix.

Security
Cleared: The diff only changes local benchmark scripts and tests, adds no dependencies or workflow execution surface, and narrows child-process resolution to the current Node executable.

Review details

Best possible solution:

Land one focused fix for the linked benchmark wrapper bug, preferably with the process.execPath change and regression coverage, then close the overlapping PR that is not selected.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main has spawnSync("node", args) in both wrapper scripts while sibling benchmark code already uses process.execPath. I did not run a writable repro in this checkout, but the PR body supplies after-fix Testbox proof with PATH=/nonexistent.

Is this the best way to solve the issue?

Yes for the executable mismatch: using process.execPath is the narrow established fix and aligns with the benchmark harness. The maintainer choice is whether to land this richer PR or the older overlapping PR, because this one also changes narrowed-preset budget semantics and adds tests.

Label changes:

  • add P2: This is a normal-priority tooling bug fix for benchmark scripts with limited runtime blast radius and a clear linked issue.
  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes real Testbox commands that built the tree, ran focused Vitest coverage, and exercised both wrapper scripts with PATH=/nonexistent after the patch.
  • add rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster, patch quality is 🐚 platinum hermit, and The patch is small and well-proven for the main bug, with the only merge hesitation being maintainer selection against an overlapping open PR and the intentional narrowed-preset semantics change.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes real Testbox commands that built the tree, ran focused Vitest coverage, and exercised both wrapper scripts with PATH=/nonexistent after the patch.

Label justifications:

  • P2: This is a normal-priority tooling bug fix for benchmark scripts with limited runtime blast radius and a clear linked issue.
  • rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster, patch quality is 🐚 platinum hermit, and The patch is small and well-proven for the main bug, with the only merge hesitation being maintainer selection against an overlapping open PR and the intentional narrowed-preset semantics change.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes real Testbox commands that built the tree, ran focused Vitest coverage, and exercised both wrapper scripts with PATH=/nonexistent after the patch.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes real Testbox commands that built the tree, ran focused Vitest coverage, and exercised both wrapper scripts with PATH=/nonexistent after the patch.

What I checked:

Likely related people:

  • vincentkoc: Introduced the CLI startup benchmark harness and later added response/parent startup budget work in the same script family. (role: feature introducer and recent area contributor; confidence: high; commits: bdd59e014936, 97283f0a2e55, 8717525fbc58; files: scripts/bench-cli-startup.ts, scripts/test-cli-startup-bench-budget.mjs)
  • gumadeiras: Standardized the CLI startup benchmark scripts and multi-sample baselines that include the affected wrapper files. (role: baseline and wrapper standardization contributor; confidence: high; commits: 68e49fa791aa, 47136536c898; files: scripts/test-cli-startup-bench-budget.mjs, scripts/test-update-cli-startup-bench.mjs, test/fixtures/cli-startup-bench.json)
  • steipete: Recently touched the affected wrapper scripts during the custom test planner runtime removal, so they are a plausible routing candidate for script-maintenance review. (role: adjacent refactor contributor; confidence: medium; commits: 9ef5d85e40b9; files: scripts/test-cli-startup-bench-budget.mjs, scripts/test-update-cli-startup-bench.mjs)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 989e53c20d39.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. labels May 20, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 20, 2026

ClawSweeper PR egg

✨ Hatched: 💎 rare Gilded Clawlet

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 💎 rare.
Trait: collects tiny proofs.
Image traits: location artifact grotto; accessory proof snapshot camera; palette cobalt, lime, and pearl; mood watchful; pose curling around a status light; shell frosted glass shell; lighting soft studio lighting; background subtle branch markers.
Share on X: post this hatch
Copy: My PR egg hatched a 💎 rare Gilded Clawlet in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@giodl73-repo giodl73-repo merged commit 67c12e0 into openclaw:main May 20, 2026
126 of 134 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintainer Maintainer-authored PR P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. scripts Repository scripts size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmark spawner scripts use bare "node" instead of process.execPath, risking version mismatch

1 participant