
fix(gateway): quiet default watch sync I/O traces #77816

Closed

rubencu wants to merge 0 commits into openclaw:main from rubencu:codex/quiet-watch-sync-trace


@rubencu (Contributor) commented May 5, 2026

Summary

  • Problem: pnpm gateway:watch enabled Node sync-I/O tracing by default, which could flood the watch terminal with repeated "Detected use of sync API" stack blocks during an otherwise healthy startup.
  • What changed: watch mode no longer injects OPENCLAW_TRACE_SYNC_IO=1; explicit OPENCLAW_TRACE_SYNC_IO=1 still passes through for diagnostics.
  • Scope boundary: no new public config surface. This only changes the default watch-mode behavior, docs, changelog, and the watch-node regression coverage.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: default gateway:watch no longer enables Node sync-I/O tracing, so normal watch logs stay quiet and interactive.
  • Real environment tested: macOS local worktree, isolated OpenClaw home from scripts/check-gateway-watch-regression.mjs.
  • Exact steps or command run after this patch: node scripts/check-gateway-watch-regression.mjs --window-ms 2000 --ready-timeout-ms 30000 --cpu-fail-ms 20000
  • Evidence after fix:
{
  "windowMs": 2000,
  "watchTriggeredBuild": false,
  "watchBuildReason": null,
  "cpuMs": 10,
  "totalCpuMs": 3980,
  "readyBeforeWindow": true,
  "distRuntimeFileGrowth": 0,
  "distRuntimeByteGrowth": 0,
  "watchExit": {
    "code": 143,
    "signal": null
  },
  "timing": {
    "userSeconds": 2.8,
    "sysSeconds": 1.18,
    "elapsedSeconds": 6.08
  }
}
  • Additional log check after the bounded run: an rg search over .local/gateway-watch-regression/watch/watch.stderr.log and watch.stdout.log found no "Detected use of sync API", --trace-sync-io, or OPENCLAW_TRACE_SYNC_IO output.
  • Observed result after fix: the bounded default watch reached Gateway readiness without rebuild churn, exited cleanly on SIGTERM (exit code 143, i.e. 128 + 15), and the captured stdout/stderr logs contained no sync-I/O trace output.
  • What was not tested: owner-side long-running watch session; owner performs final manual verification outside this agent run.
  • Before evidence: owner observed the repeated trace-output loop disappear when running with OPENCLAW_TRACE_SYNC_IO=0.
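The `watchExit` object in the JSON evidence reports `code: 143` with `signal: null`. By the common shell convention, an exit status above 128 means 128 plus the signal number, and SIGTERM is signal 15 on macOS and Linux, so 143 is consistent with the SIGTERM shutdown of the bounded run. A small helper for reading such a report might look like this; `describeExit` and the `WatchExit` shape are hypothetical, modeled only on the JSON fields shown above, and the signal table covers just a few common signals.

```typescript
// Shape modeled on the "watchExit" field in the harness JSON (assumption).
interface WatchExit {
  code: number | null;
  signal: string | null;
}

// Signal numbers as used on macOS and Linux; only a few common ones.
const SIGNAL_NAMES: Record<number, string | undefined> = {
  2: "SIGINT",
  9: "SIGKILL",
  15: "SIGTERM",
};

// Explain an exit: either a direct signal, or a shell-style 128 + N status.
function describeExit(exit: WatchExit): string {
  if (exit.signal !== null) {
    return `killed by ${exit.signal}`;
  }
  if (exit.code !== null && exit.code > 128) {
    const signalNumber = exit.code - 128;
    const name = SIGNAL_NAMES[signalNumber];
    if (name !== undefined) {
      return `exited with ${exit.code} (128 + ${signalNumber}, ${name})`;
    }
  }
  return `exited with code ${exit.code}`;
}
```

Applied to the reported `{ "code": 143, "signal": null }`, it returns `exited with 143 (128 + 15, SIGTERM)`.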

Root Cause (if applicable)

  • Root cause: scripts/watch-node.mjs set OPENCLAW_TRACE_SYNC_IO=1 for gateway:watch when the user had not provided an explicit override.
  • Missing detection / guardrail: watch-node coverage asserted the default trace flag, so the noisy diagnostic default was locked in instead of tested as opt-in behavior.
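The root cause and the fix can be sketched as a before/after of the env defaulting. This is a minimal TypeScript illustration under stated assumptions, not the actual contents of scripts/watch-node.mjs: the `Env` type and the functions `buildWatchEnvBefore` and `buildWatchEnvAfter` are hypothetical, and treating "no explicit override" as the variable being `undefined` is an assumption about how the real check works.

```typescript
// Hypothetical stand-in for the environment map passed to the watched child.
type Env = Record<string, string | undefined>;

const TRACE_VAR = "OPENCLAW_TRACE_SYNC_IO";

// Old behavior as described in the PR: inject tracing when the user set nothing.
// (Assumption: "not provided" means the variable is undefined.)
function buildWatchEnvBefore(env: Env): Env {
  const next: Env = { ...env };
  if (next[TRACE_VAR] === undefined) {
    next[TRACE_VAR] = "1"; // noisy default: every watch run traced sync I/O
  }
  return next;
}

// New behavior: copy the caller's env as-is, so tracing is opt-in.
// An explicit "0" or "1" passes through unchanged.
function buildWatchEnvAfter(env: Env): Env {
  return { ...env };
}
```

With an empty env, the old shape yields `OPENCLAW_TRACE_SYNC_IO: "1"` while the new shape yields no such key; an explicit `"0"` or `"1"` survives both.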

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/infra/watch-node.test.ts
  • Scenario the test locks in: default gateway watch does not inject sync-I/O tracing, while explicit OPENCLAW_TRACE_SYNC_IO=0 and OPENCLAW_TRACE_SYNC_IO=1 remain preserved.
  • Why this is the smallest reliable guardrail: it verifies the environment contract at the watcher boundary that caused the terminal noise.
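The test scenario above can be expressed as a table-driven check. This sketch uses Node's built-in `node:assert/strict` rather than whatever runner src/infra/watch-node.test.ts actually uses, and `buildWatchEnv` is a hypothetical stand-in implemented with the post-fix behavior; the real test exercises the watcher itself.

```typescript
import assert from "node:assert/strict";

type Env = Record<string, string | undefined>;

// Hypothetical stand-in for the watcher's env construction after the fix:
// no default is injected, so whatever the caller set is forwarded unchanged.
function buildWatchEnv(env: Env): Env {
  return { ...env };
}

// Each case: caller-provided value, value expected in the watched child's env.
const cases: Array<[string | undefined, string | undefined]> = [
  [undefined, undefined], // default: tracing must not be injected
  ["0", "0"], // explicit opt-out is preserved
  ["1", "1"], // explicit opt-in is preserved
];

for (const [input, expected] of cases) {
  const env: Env = input === undefined ? {} : { OPENCLAW_TRACE_SYNC_IO: input };
  assert.equal(buildWatchEnv(env).OPENCLAW_TRACE_SYNC_IO, expected);
}
```

Keeping the three cases in one table makes the contract explicit: only the absence of a default changes, never an explicit value.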

User-visible / Behavior Changes

pnpm gateway:watch no longer enables Node sync-I/O tracing by default. Use OPENCLAW_TRACE_SYNC_IO=1 when sync-I/O stack traces are needed for startup diagnostics.

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local Node/pnpm worktree
  • Model/provider: N/A
  • Integration/channel: Gateway watch
  • Relevant config: isolated OpenClaw home from the watch regression harness

Steps

  1. Run pnpm gateway:watch on current main.
  2. Observe repeated Node "Detected use of sync API" stack blocks in the watch terminal.
  3. Apply this patch and run the targeted checks plus bounded watch regression smoke.

Expected

  • Default watch does not print sync-I/O trace stacks.
  • Explicit OPENCLAW_TRACE_SYNC_IO=1 still enables the diagnostic stacks.

Actual

  • After this patch, default watch reached Gateway readiness without rebuild churn and the bounded default watch logs contained no sync-I/O trace output.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Verification run:

  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/help/debugging.md scripts/watch-node.mjs src/cli/gateway-cli/run.ts src/infra/watch-node.test.ts
  • pnpm test src/infra/watch-node.test.ts
  • pnpm tsgo:test
  • pnpm lint --threads=8
  • pnpm check:docs
  • git diff --check origin/main..HEAD
  • node scripts/check-gateway-watch-regression.mjs --window-ms 2000 --ready-timeout-ms 30000 --cpu-fail-ms 20000
  • rg check over .local/gateway-watch-regression/watch/watch.stderr.log and watch.stdout.log for sync-I/O trace markers
  • codex review --base origin/main (clean)
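The rg check over the captured logs amounts to searching for three literal markers. A minimal TypeScript equivalent that works on already-read log text might look like the following; `countTraceMarkers` and `isTraceFree` are hypothetical helpers, reading the files (for example with `fs.readFileSync`) is left out so the sketch stays self-contained, and unlike rg, which reports matching lines, it counts individual occurrences.

```typescript
// The markers the PR searched for in watch.stderr.log and watch.stdout.log.
const TRACE_MARKERS = [
  "Detected use of sync API",
  "--trace-sync-io",
  "OPENCLAW_TRACE_SYNC_IO",
] as const;

// Count non-overlapping occurrences of each marker in a chunk of log text.
function countTraceMarkers(logText: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const marker of TRACE_MARKERS) {
    let count = 0;
    let index = logText.indexOf(marker);
    while (index !== -1) {
      count += 1;
      index = logText.indexOf(marker, index + marker.length);
    }
    counts[marker] = count;
  }
  return counts;
}

// A log passes when no marker appears at all.
function isTraceFree(logText: string): boolean {
  return Object.values(countTraceMarkers(logText)).every((n) => n === 0);
}
```

An empty stderr log, as in the rebased after-fix run, passes trivially.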

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: default gateway watch env no longer injects OPENCLAW_TRACE_SYNC_IO, explicit OPENCLAW_TRACE_SYNC_IO=0 is preserved, explicit OPENCLAW_TRACE_SYNC_IO=1 is preserved, and bounded default watch reaches readiness without sync-I/O trace output.
  • Owner manual verification: owner performs final long-running watch verification outside this agent run.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No new public config/env surface
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: sync-I/O stack traces are no longer present by default in watch logs.
    • Mitigation: diagnostics remain available with explicit OPENCLAW_TRACE_SYNC_IO=1, and benchmark mode still documents its own trace handling.

@openclaw-barnacle (bot) added the labels docs (Improvements or additions to documentation), cli (CLI command changes), scripts (Repository scripts), size: S, and triage: mock-only-proof (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) on May 5, 2026
@clawsweeper (bot, Contributor) commented May 5, 2026

Codex review: needs maintainer review before merge.

Summary
The PR removes the gateway-watch default OPENCLAW_TRACE_SYNC_IO=1 injection, updates watch-node regression coverage, and adjusts the debugging docs and changelog.

Reproducibility: yes, from source. Current main sets OPENCLAW_TRACE_SYNC_IO=1 for gateway watch when it is unset, and the runner maps that to --trace-sync-io. I did not live-run the noisy terminal path in this read-only review, so the structured status is source-reproducible rather than live-reproduced.

Real behavior proof
Sufficient (terminal): The PR body includes after-fix terminal JSON from a bounded gateway-watch regression run plus log searches showing no sync-I/O trace markers.

Next step before merge
No repair lane is needed; the PR already contains the narrow code, docs, and test change and should proceed through normal checks and maintainer review.

Security
Cleared: No concrete security or supply-chain concern found; the patch changes only local env defaulting, docs, changelog, and tests.

Review details

Best possible solution:

Land this PR after normal CI and maintainer validation, keeping sync-I/O stack traces opt-in through OPENCLAW_TRACE_SYNC_IO=1 without adding a new config surface.

Do we have a high-confidence way to reproduce the issue?

Yes, from source: current main sets OPENCLAW_TRACE_SYNC_IO=1 for gateway watch when unset, and the runner maps that to --trace-sync-io. I did not live-run the noisy terminal path in this read-only review, so the structured status is source-reproducible rather than live reproduced.

Is this the best way to solve the issue?

Yes: removing the watch-node default is the narrowest fix, preserves explicit OPENCLAW_TRACE_SYNC_IO=0 and 1, and aligns docs plus regression coverage without a new public knob.

What I checked:

Likely related people:

  • steipete: Recent history shows this maintainer introduced the default gateway-watch sync-I/O tracing and then adjusted adjacent benchmark trace filtering behavior. (role: recent maintainer and likely follow-up owner; confidence: high; commits: 35e48a049b2e, f8e080386d8b, a167acee6792; files: scripts/watch-node.mjs, scripts/gateway-watch-tmux.mjs, scripts/run-node.mjs)
  • vincentkoc: Recent merged work touched adjacent script runtime and watch-node hardening surfaces, though the specific default trace behavior is more directly tied to the gateway-watch tracing commits. (role: adjacent script/runtime maintainer; confidence: medium; commits: abd5ec98ab01, c7bbb3f9af36, ac3cd1a0ca8c; files: scripts/watch-node.mjs, scripts/run-node.mjs, src/infra/watch-node.test.ts)

Remaining risk / open question:

  • This read-only sweep did not execute the PR branch locally; it relies on source inspection plus the contributor's copied terminal and log proof.
  • The PR body explicitly leaves owner-side long-running watch verification for a maintainer.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 70d92b5e59df.

@openclaw-barnacle (bot) removed the triage: mock-only-proof label (Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.) on May 5, 2026
@rubencu force-pushed the codex/quiet-watch-sync-trace branch 2 times, most recently from d9d01b6 to cb62b0e, on May 5, 2026 11:46
@openclaw-barnacle (bot) added the size: XS and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) labels and removed the cli (CLI command changes), size: S, and triage: needs-real-behavior-proof labels on May 5, 2026
@byungskers left a comment

Great DX improvement. Default watch mode was indeed noisy with sync-I/O traces. The test coverage is solid — verifying both the absence of the env var by default and the preservation of explicit overrides.

The CHANGELOG and docs updates are well-placed and clear. This is a clean, focused fix.

LGTM.

@rubencu closed this on May 10, 2026
@rubencu force-pushed the codex/quiet-watch-sync-trace branch from cb62b0e to d5fe89a on May 10, 2026 00:15
@rubencu (Contributor, Author) commented May 10, 2026

Closing rationale after rebasing this branch:

  • The original bug was real. On the old base, scripts/watch-node.mjs defaulted gateway watch to OPENCLAW_TRACE_SYNC_IO=1, and scripts/run-node.mjs converted that into Node's --trace-sync-io flag. A live bounded gateway-watch run on the old base reached readiness but produced 3,173 "Detected use of sync API" warning blocks and 525,112 stderr lines in a 2 s watch window.
  • The fix has already landed on main through "fix: keep gateway watch sync tracing opt-in" (#79110, e984a99c7e): gateway watch now leaves sync-I/O tracing opt-in, and the docs/tests were updated there.
  • After rebasing this branch onto current main (d5fe89abb5), the diff became empty and the contributor CHANGELOG.md entry is no longer part of this PR.
  • After-fix live proof on the rebased state used the same gateway-watch harness: readiness was reached, no rebuild churn occurred, stderr was empty, and there were no sync-I/O trace markers.

So this PR is closed because it is now a duplicate/empty merge vehicle, not because the issue was invalid. Current main already contains the correct behavior.
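The two-step mechanism described in this comment (watch-node.mjs setting the variable, run-node.mjs turning it into a Node flag) can be sketched as a small mapping. `nodeArgsFor` and the entry file name are hypothetical, and the sketch assumes only the exact value "1" enables tracing; the real run-node.mjs logic may accept other values. `--trace-sync-io` itself is a real Node.js CLI flag that prints a stack trace whenever synchronous I/O is detected after the first turn of the event loop.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical mapping from the env toggle to the Node CLI arguments.
// Assumption: only the exact value "1" turns tracing on.
function nodeArgsFor(env: Env, entry: string): string[] {
  const args: string[] = [];
  if (env.OPENCLAW_TRACE_SYNC_IO === "1") {
    // Node flag: report sync I/O with a stack trace after the first event-loop turn.
    args.push("--trace-sync-io");
  }
  args.push(entry);
  return args;
}
```

Because the flag is derived purely from the env, removing the watch-mode default is enough to keep it off unless a user explicitly opts in.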


Labels

docs (Improvements or additions to documentation), scripts (Repository scripts), size: XS

2 participants