
fix(native-hook-relay): prune stale bridge files on registration #87706

Merged
clawsweeper[bot] merged 6 commits into main from clawsweeper/automerge-openclaw-openclaw-87563
May 28, 2026

Conversation

@clawsweeper
Contributor

@clawsweeper clawsweeper Bot commented May 28, 2026

Makes #87563 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.

ClawSweeper 🐠 replacement reef notes:

  • Repair fallback: GitHub rejected the repair branch push because it updates workflow files and the ClawSweeper app token does not have the workflows permission.

Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against baf3826.

@clawsweeper clawsweeper Bot added agents Agent runtime and tooling size: S clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge proof: supplied External PR includes structured after-fix real behavior proof. proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 28, 2026
@clawsweeper clawsweeper Bot added P1 High-priority user-facing bug, regression, or broken workflow. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. merge-risk: 🚨 automation 🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. clawsweeper Tracked by ClawSweeper automation labels May 28, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: supplied External PR includes structured after-fix real behavior proof. label May 28, 2026
@clawsweeper
Contributor Author

clawsweeper Bot commented May 28, 2026

Codex review: passed. Reviewed May 28, 2026, 1:16 PM ET / 17:16 UTC.

Summary
The PR adds registration-time pruning of expired or ESRCH-dead native-hook relay bridge JSON files and regression tests for dead, expired, live, and unknown-liveness foreign records.
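The stale-record predicate described in this summary can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `BridgeRecord` shape, the `expiresAtMs` field name, and the helper names `probePid` and `isStale` are assumptions, since the actual diff is not quoted in this thread. The only real API used is Node's `process.kill(pid, 0)`, which checks for a process without signaling it.

```typescript
// Hypothetical record shape: the real bridge record fields are not shown in this thread.
interface BridgeRecord {
  pid: number;
  expiresAtMs: number;
}

type Liveness = "alive" | "dead" | "unknown";
type KillFn = (pid: number, signal: number) => unknown;

// Signal 0 asks the OS to run its existence and permission checks
// without actually delivering a signal to the target process.
function probePid(pid: number, kill: KillFn = (p, s) => process.kill(p, s)): Liveness {
  try {
    kill(pid, 0);
    return "alive";
  } catch (err) {
    const code = (err as { code?: string }).code;
    // Only ESRCH proves the process is gone. EPERM and anything else
    // leave liveness unknown, so the caller must keep the record.
    return code === "ESRCH" ? "dead" : "unknown";
  }
}

// A foreign record is prunable only when it is expired or provably dead.
function isStale(
  record: BridgeRecord,
  nowMs: number,
  probe: (pid: number) => Liveness = probePid,
): boolean {
  if (record.expiresAtMs <= nowMs) return true;
  return probe(record.pid) === "dead";
}
```

Treating every non-ESRCH failure as "unknown" is what keeps a live same-uid gateway's record safe when the probe itself cannot give a definite answer.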

PR surface: Source +59, Tests +148. Total +207 across 2 files.

Reproducibility: yes. The linked source PR includes a concrete live WSL2/systemd reproduction with stale bridge records causing native hook failures, and current source shows the native hook CLI fails closed when the relay cannot be reached.

Review metrics: 1 noteworthy metric.

  • New Bridge Cleanup Path: 1 registration-time scanner added. This is the merge-relevant behavior because it deletes files from a uid-scoped directory shared by same-user gateways.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
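As a small illustration of the "weaker of the two" rule, the sketch below orders the ranks from the legend in this comment and picks the lower one. The ordering array and the `overallRank` name are assumptions made for this example; the off-meta tidepool rank is left out because the legend says it does not apply.

```typescript
// Ranks ordered weakest to strongest, following the legend in this review comment.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is capped by whichever input is weaker.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}

// Matches this review: diamond lobster proof with platinum hermit patch quality.
console.log(overallRank("diamond lobster", "platinum hermit")); // prints "platinum hermit"
```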

Rank-up moves:

  • none.

Risk before merge

  • [P1] Registration now deletes expired or ESRCH-dead JSON records from an OS-user-shared bridge directory; if that predicate is wrong in a real same-uid multi-gateway setup, another profile's native-hook relay could be removed until it re-registers.

Maintainer options:

  1. Accept Guarded Stale-Record Cleanup (recommended)
    Land with the current predicate because it deletes only expired records or PIDs proven ESRCH-dead, while tests and live proof cover same-uid live-record preservation.
  2. Add Same-Relay Startup-Race Proof
    Before merge, add a focused regression where a stale record for the same relay id exists before registration and a hook invocation succeeds after the new bridge record is written.

Next step before merge

  • [P2] No repair lane is needed; the branch has no narrow mechanical defect and should proceed through the existing automerge/check gates plus maintainer risk acceptance.

Security
Cleared: No concrete security or supply-chain concern was found; the deletion path is confined to the existing uid-scoped private bridge directory and no dependency or workflow surface remains in the final diff.

Review details

Best possible solution:

Land the focused pruning fix once required checks pass, preserving the live and unknown-liveness guards and treating the uid-shared deletion boundary as an explicit maintainer risk decision.

Do we have a high-confidence way to reproduce the issue?

Yes. The linked source PR includes a concrete live WSL2/systemd reproduction with stale bridge records causing native hook failures, and current source shows the native hook CLI fails closed when the relay cannot be reached.

Is this the best way to solve the issue?

Yes. The branch keeps the fix bounded to bridge registration cleanup, preserves live and unknown-liveness foreign records, and avoids adding a new config surface; the remaining decision is the uid-shared deletion boundary.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 031117135027.

Label changes

Label changes:

  • remove merge-risk: 🚨 automation: Current PR review merge-risk labels are merge-risk: 🚨 compatibility, merge-risk: 🚨 availability.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: 🚀 automerge armed.

Label justifications:

  • P1: The PR targets a broken native-hook relay path that can block Codex-backed native tool calls for real users.
  • merge-risk: 🚨 compatibility: The new cleanup changes how existing same-uid gateway/profile bridge records are treated during registration.
  • merge-risk: 🚨 availability: A wrong stale-record decision could make another live native-hook relay unavailable until that gateway refreshes its registration.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane.
  • proof: sufficient: Contributor real behavior proof is sufficient. The linked source PR supplies after-fix live command output and logs from a real OpenClaw WSL2/systemd setup, including preservation of a parallel same-uid gateway record.
Evidence reviewed

PR surface:

Source +59, Tests +148. Total +207 across 2 files.

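PR surface stats:

| Area      | Files | Added | Removed | Net  |
|-----------|-------|-------|---------|------|
| Source    | 1     | 60    | 1       | +59  |
| Tests     | 1     | 149   | 1       | +148 |
| Docs      | 0     | 0     | 0       | 0    |
| Config    | 0     | 0     | 0       | 0    |
| Generated | 0     | 0     | 0       | 0    |
| Other     | 0     | 0     | 0       | 0    |
| Total     | 2     | 209   | 2       | +207 |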

What I checked:

  • Repository policy read: Root AGENTS.md and the scoped src/agents/AGENTS.md were read; their guidance made the uid-shared bridge directory and agent test guardrails relevant to this review. (AGENTS.md:1, 031117135027)
  • Pruning implementation: The head branch scans the bridge directory during registration, skips current-process and live/unknown foreign records, and removes records that are expired or whose PID returns ESRCH. (src/agents/harness/native-hook-relay.ts:720, 65c17cdf6edd)
  • Regression coverage: The PR adds tests for dead foreign PID pruning, expired foreign record pruning, preserving live foreign records, and preserving records when liveness cannot be proven. (src/agents/harness/native-hook-relay.test.ts:887, 65c17cdf6edd)
  • Current relay path: The CLI requires an explicit relay id and tries the direct bridge before gateway fallback, so the changed behavior is limited to native-hook relay registry files rather than a broader discovery surface. (src/cli/native-hook-relay-cli.ts:39, 65c17cdf6edd)
  • Codex hook registration path: Codex registers a deterministic relay id and emits hook commands from the registration handle, which keeps the fix on the existing native-hook relay path. (extensions/codex/src/app-server/native-hook-relay.ts:115, 65c17cdf6edd)
  • Source PR proof: The linked source PR body supplies real WSL2/systemd proof showing stale records pruned, a live same-uid Beacon gateway record preserved, and native commands succeeding afterward.
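
The registration-time scan described in the "Pruning implementation" item above might look something like the sketch below. It is an approximation under stated assumptions, not the code at src/agents/harness/native-hook-relay.ts: the function names, the `pid` and `expiresAtMs` field names, and the choice to leave malformed files untouched are all hypothetical. It uses only standard Node APIs (`fs.readdirSync`, `fs.readFileSync`, `fs.rmSync`, `process.kill`).

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Default liveness check: true only when the OS reports ESRCH for the PID.
function pidIsProvenDead(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return false;
  } catch (err) {
    return (err as { code?: string }).code === "ESRCH";
  }
}

// Hypothetical helper; `pid` and `expiresAtMs` are assumed field names.
// Returns the names of the files it removed.
function pruneStaleBridgeFiles(
  dir: string,
  isProvenDead: (pid: number) => boolean = pidIsProvenDead,
  nowMs: number = Date.now(),
  ownPid: number = process.pid,
): string[] {
  const removed: string[] = [];
  let names: string[];
  try {
    names = fs.readdirSync(dir);
  } catch {
    return removed; // Directory missing or unreadable: nothing to prune.
  }
  for (const name of names) {
    if (!name.endsWith(".json")) continue;
    const file = path.join(dir, name);
    let parsed: unknown;
    try {
      parsed = JSON.parse(fs.readFileSync(file, "utf8"));
    } catch {
      continue; // Assumption: unreadable or malformed files are left alone.
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const { pid, expiresAtMs } = parsed as { pid?: unknown; expiresAtMs?: unknown };
    if (typeof pid !== "number") continue;
    if (pid === ownPid) continue; // Never touch the current process's records.
    const expired = typeof expiresAtMs === "number" && expiresAtMs <= nowMs;
    if (expired || isProvenDead(pid)) {
      fs.rmSync(file, { force: true }); // force: tolerate a concurrent delete
      removed.push(name);
    }
  }
  return removed;
}
```

Injecting the liveness check, clock, and own PID as parameters is a design choice made here so the four regression cases (dead, expired, live, unknown liveness) can be exercised without spawning real processes.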

Likely related people:

  • Andy Ye: Blame in this checkout attributes the current native-hook relay and CLI relay paths to the recent grafted main commit, making this the best current-main routing signal available locally. (role: recent area contributor; confidence: medium; commits: 3fea2196923b; files: src/agents/harness/native-hook-relay.ts, src/cli/native-hook-relay-cli.ts)
  • Applied-AI-Solutions-hub: Authored the original source PR commits and supplied the live stale-record reproduction and after-fix proof that this writable replacement carries forward. (role: source repro and patch author; confidence: high; commits: a1ee12f31621, debd5178514e; files: src/agents/harness/native-hook-relay.ts, src/agents/harness/native-hook-relay.test.ts)
  • Takhoffman: Requested ClawSweeper automerge after the stop/re-arm sequence, which is relevant to the remaining risk-acceptance path for this PR. (role: review requester; confidence: medium)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper
Contributor Author

clawsweeper Bot commented May 28, 2026

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=65c17cdf6edd096380927f2898adc81ea4d38a79)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-05-28T17:17:29Z
Merge commit: 202ccf4cf729

What merged:

  • The PR adds registration-time pruning of expired or ESRCH-dead native-hook relay bridge JSON files and regression tests for dead, expired, live, and unknown-liveness foreign records.
  • PR surface: Source +59, Tests +148. Total +207 across 2 files.
  • Reproducibility: yes. The linked source PR includes a concrete live WSL2/systemd reproduction with stale bri ... hook failures, and current source shows the native hook CLI fails closed when the relay cannot be reached.

Automerge notes:

  • PR branch already contained follow-up commit before automerge: test(native-hook-relay): cover stale bridge pruning
  • PR branch already contained follow-up commit before automerge: ci: raise plugin sdk strict smoke heap
  • PR branch already contained follow-up commit before automerge: test(native-hook-relay): satisfy process kill mock types
  • PR branch already contained follow-up commit before automerge: fix(native-hook-relay): prune stale bridge files on registration

The automerge loop is complete.

Automerge progress:

  • 2026-05-28 16:27:15 UTC review queued baf3826c990a (queued)
  • 2026-05-28 16:32:41 UTC review passed baf3826c990a (structured ClawSweeper verdict: pass (sha=baf3826c990a52231d84ece2c0d0e8eb4a0e9...)
  • 2026-05-28 17:05:05 UTC review queued 0a0d5ac004d4 (queued)
  • 2026-05-28 17:16:55 UTC review passed 65c17cdf6edd (structured ClawSweeper verdict: pass (sha=65c17cdf6edd096380927f2898adc81ea4d38...)
  • 2026-05-28 17:17:32 UTC merged 65c17cdf6edd (merged by ClawSweeper automerge)

@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 28, 2026
@Takhoffman
Contributor

@clawsweeper stop

@clawsweeper
Contributor Author

clawsweeper Bot commented May 28, 2026

🦞✅
Got it. ClawSweeper will leave this item for human review.

I added clawsweeper:human-review, removed clawsweeper:automerge, and paused the automation trail until a maintainer asks again.

@clawsweeper clawsweeper Bot added clawsweeper:human-review Needs maintainer review before ClawSweeper can continue and removed clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge labels May 28, 2026
@clawsweeper clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-87563 branch from baf3826 to d56dd3b Compare May 28, 2026 16:44
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. merge-risk: 🚨 automation 🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation. labels May 28, 2026
@Takhoffman
Contributor

@clawsweeper re-review

@clawsweeper
Contributor Author

clawsweeper Bot commented May 28, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@Takhoffman
Contributor

@clawsweeper automerge

@clawsweeper
Contributor Author

clawsweeper Bot commented May 28, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@clawsweeper clawsweeper Bot added clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge and removed clawsweeper:human-review Needs maintainer review before ClawSweeper can continue labels May 28, 2026
Applied-AI-Solutions-hub and others added 6 commits May 28, 2026 17:08
Only the current gateway process should own records in the native hook
relay bridge directory. After a crash, SIGUSR1 supervisor restart, or any
non-graceful shutdown, stale bridge files (pointing to dead PIDs and
unbound ports) can remain in /tmp/openclaw-native-hook-relays-<uid>/.

When Codex clients enumerate the bridge directory and select a stale
registration, every native tool call fails with 'Native hook relay
unavailable' until the next clean restart. This is the recurring root
cause behind #73723 and #87536.

Fix: at the start of registerNativeHookRelayBridge, scan the bridge
directory and remove any .json record whose 'pid' field does not match
process.pid. This makes the relay system self-healing across crash/
restart cycles without requiring operators to manually clean up the
directory or rely on ExecStartPre hooks.

Refs: #87536, #73723

Signed-off-by: Adam Houk <adam@appliedai.solutions>
Co-authored-by: Stbckrlx <271293351+Applied-AI-Solutions-hub@users.noreply.github.com>

Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
@clawsweeper clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-87563 branch from 0a0d5ac to 65c17cd Compare May 28, 2026 17:08
@clawsweeper clawsweeper Bot added size: S proof: supplied External PR includes structured after-fix real behavior proof. labels May 28, 2026
@clawsweeper clawsweeper Bot added merge-risk: 🚨 automation 🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation. status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. labels May 28, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: supplied External PR includes structured after-fix real behavior proof. label May 28, 2026
@Takhoffman
Contributor

@clawsweeper automerge

@clawsweeper
Contributor Author

clawsweeper Bot commented May 28, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@clawsweeper clawsweeper Bot removed the status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. label May 28, 2026
@clawsweeper clawsweeper Bot merged commit 202ccf4 into main May 28, 2026
132 of 138 checks passed
@clawsweeper clawsweeper Bot deleted the clawsweeper/automerge-openclaw-openclaw-87563 branch May 28, 2026 17:17

Labels

  • agents (Agent runtime and tooling)
  • clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge)
  • clawsweeper (Tracked by ClawSweeper automation)
  • merge-risk: 🚨 automation (🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation.)
  • merge-risk: 🚨 availability (🚨 May cause crashes, hangs, restart loops, stalls, or process outages.)
  • merge-risk: 🚨 compatibility (🚨 May break existing users, config, migrations, defaults, or upgrade paths.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.)
  • size: M
  • status: 🚀 automerge armed (This PR is in ClawSweeper's automerge lane.)

2 participants