fix(native-hook-relay): prune stale bridge files on registration #87706
Codex review: passed. Reviewed May 28, 2026, 1:16 PM ET / 17:16 UTC.
Summary
PR surface: Source +59, Tests +148. Total +207 across 2 files.
Reproducibility: yes. The linked source PR includes a concrete live WSL2/systemd reproduction with stale bridge records causing native hook failures, and current source shows the native hook CLI fails closed when the relay cannot be reached.
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land the focused pruning fix once required checks pass, preserving the live and unknown-liveness guards and treating the uid-shared deletion boundary as an explicit maintainer risk decision.
Do we have a high-confidence way to reproduce the issue? Yes. The linked source PR includes a concrete live WSL2/systemd reproduction with stale bridge records causing native hook failures, and current source shows the native hook CLI fails closed when the relay cannot be reached.
Is this the best way to solve the issue? Yes. The branch keeps the fix bounded to bridge registration cleanup, preserves live and unknown-liveness foreign records, and avoids adding a new config surface; the remaining decision is the uid-shared deletion boundary.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 031117135027.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +59, Tests +148. Total +207 across 2 files.
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
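The review refers to "live and unknown-liveness guards" without showing them. As a rough illustration only, one common way to classify a recorded PID in Node.js is to send signal 0 with `process.kill`, which checks for existence without delivering a signal. The `Liveness` type and the `classifyPid` helper below are hypothetical names chosen for this sketch and are not taken from the PR; the mapping of error codes to categories is likewise an assumption about how such a guard might be written.

```typescript
// Hypothetical sketch, not the merged code: classify a PID taken from a bridge record.
type Liveness = "live" | "dead" | "unknown";

function classifyPid(pid: unknown): Liveness {
  // Anything that is not a positive integer cannot be probed safely.
  if (typeof pid !== "number" || !Number.isInteger(pid) || pid <= 0) {
    return "unknown";
  }
  try {
    // Signal 0 performs the existence/permission check without signalling the process.
    process.kill(pid, 0);
    return "live";
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "ESRCH") return "dead"; // no such process
    if (code === "EPERM") return "live"; // process exists but we may not signal it
    return "unknown"; // any other failure: assume nothing
  }
}

console.log(classifyPid(process.pid)); // the current process is always live
```

Under this assumption, a pruning pass would delete only records classified as `"dead"` and leave `"live"` and `"unknown"` records alone, which is the conservative behavior the review describes.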
|
|
🦞✅ Source: What merged:
Automerge notes:
The automerge loop is complete. Automerge progress:
|
|
@clawsweeper stop |
|
🦞✅ I added |
baf3826 to d56dd3b
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper automerge |
|
🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
|
Only the current gateway process should own records in the native hook relay bridge directory. After a crash, SIGUSR1 supervisor restart, or any non-graceful shutdown, stale bridge files (pointing to dead PIDs and unbound ports) can remain in /tmp/openclaw-native-hook-relays-<uid>/.

When Codex clients enumerate the bridge directory and select a stale registration, every native tool call fails with 'Native hook relay unavailable' until the next clean restart. This is the recurring root cause behind #73723 and #87536.

Fix: at the start of registerNativeHookRelayBridge, scan the bridge directory and remove any .json record whose 'pid' field does not match process.pid. This makes the relay system self-healing across crash/restart cycles without requiring operators to manually clean up the directory or rely on ExecStartPre hooks.

Refs: #87536, #73723

Signed-off-by: Adam Houk <adam@appliedai.solutions>
Co-authored-by: Stbckrlx <271293351+Applied-AI-Solutions-hub@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
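The fix described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the helper name `pruneForeignBridgeRecords`, its return value, and the choice to leave malformed files untouched are assumptions made for the example. Only the `.json` extension, the `pid` field, and the comparison against `process.pid` come from the commit message. The review on this PR notes that the final branch also preserves live and unknown-liveness foreign records, which this simplified version does not model.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: remove every *.json record in `dir` whose "pid"
// field differs from `ownPid`, and return the names of the removed files.
function pruneForeignBridgeRecords(dir: string, ownPid: number = process.pid): string[] {
  const removed: string[] = [];
  let names: string[];
  try {
    names = fs.readdirSync(dir);
  } catch {
    return removed; // directory missing or unreadable: nothing to prune
  }
  for (const name of names) {
    if (!name.endsWith(".json")) continue;
    const file = path.join(dir, name);
    try {
      const record = JSON.parse(fs.readFileSync(file, "utf8")) as { pid?: unknown };
      if (record.pid !== ownPid) {
        fs.rmSync(file, { force: true });
        removed.push(name);
      }
    } catch {
      // Unreadable or malformed record: left in place in this sketch (an assumption).
    }
  }
  return removed;
}

// Demo in a throwaway temp directory rather than the real bridge directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "bridge-demo-"));
fs.writeFileSync(path.join(demoDir, "own.json"), JSON.stringify({ pid: process.pid }));
fs.writeFileSync(path.join(demoDir, "stale.json"), JSON.stringify({ pid: -1 }));
const removedNames = pruneForeignBridgeRecords(demoDir);
console.log(removedNames);
```

Running the prune at registration time, before the new record is written, means a restarted gateway clears leftovers from its predecessor without any external cleanup step.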
0a0d5ac to 65c17cd
|
@clawsweeper automerge |
|
🦞👀 Command router queued. I will update this comment with the next step. |
Makes #87563 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.
ClawSweeper 🐠 replacement reef notes:
Co-author credit kept:
fish notes: model gpt-5.5, reasoning high; reviewed against baf3826.