fix(BUG-017): extend claude-mem-heal to patch hooks.json EPIPE race (cross-OS) #84
Merged
…cross-OS)

User encountered `UserPromptSubmit operation blocked by hook: printf: write error: Permission denied` on the hive project minutes after BUG-016 (PR #83) merged. BUG-016 closed the same EPIPE race for `.mcp.json` but explicitly deferred `hooks.json` -- the deferral was wrong: same root cause, same symptom class, just a different surface.

The upstream `plugin/hooks/hooks.json` ships 6 hook events (Setup, SessionStart x2, UserPromptSubmit, PostToolUse, PreToolUse, Stop; 7 commands in total), all using the same broken cascade-pipe pattern. When the consumer breaks early, the producer's unconsumed writes fail with EPIPE on Git Bash for Windows.

Changes:
1. scripts/claude-mem-heal.sh:
   - new `heal_hooks_json` function (~12 LOC)
   - walks both `<dir>/hooks/hooks.json` (cache layout, no `plugin/` subdir) and `<dir>/plugin/hooks/hooks.json` (marketplace layout)
   - minimal substitution via sed: `break; }; done` -> `}; done | head -n1`
   - idempotent (skips files where the broken pattern is absent)
   - one log line per patched file, with hook count
2. scripts/claude-mem-heal.ps1:
   - new `Repair-HooksJson` function (~20 LOC)
   - equivalent walk + substitution (`.Replace()` with a literal string)
   - PSScriptAnalyzer + AST clean, ASCII-only
3. tests/setup-linux.bats: 3 new parity asserts
   - both scripts define the new function
   - both contain the literal `break; }; done` -> `head -n1` substitution
   - both walk hooks.json AND plugin/hooks/hooks.json paths
   - both reference BUG-017 + claude-mem#2607

Empirical (2026-05-21, user's Windows daily-driver):
- First run: 14 hook commands patched across 2 files (7 cache + 7 marketplace-via-junction, since BUG-012's `thedotmack` junction aliases to `thedotmack-claude-mem/plugin`)
- Second run: silent (idempotent -- no broken pattern left to detect)
- Post-patch grep: `break; }; done` count -> 0; `head -n1` count -> 7 per file

Spec at specs/BUG-017-claude-mem-heal-hooks-json-race/. 52 LOC of production diff (at threshold) + spec.
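A minimal bash sketch of the walk, substitution and idempotency check described above. The function name `heal_hooks_json_sketch` and its log line are hypothetical; this is not the shipped `heal_hooks_json`. It writes through a temp file instead of `sed -i` so it behaves the same under GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the BUG-017 patch; not the shipped heal_hooks_json.

heal_hooks_json_sketch() {
  local dir="$1" f count
  # Cache layout first, then marketplace layout (both named in the commit).
  for f in "$dir/hooks/hooks.json" "$dir/plugin/hooks/hooks.json"; do
    [ -f "$f" ] || continue
    # Count occurrences of the broken tail; zero means the file is already healthy.
    count=$(grep -o 'break; }; done' "$f" | wc -l | tr -d ' ')
    [ "$count" -gt 0 ] || continue
    # Literal substitution: '}', ';' and '|' are ordinary characters in sed BRE.
    sed 's/break; }; done/}; done | head -n1/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    echo "[sketch] patched $count hook(s): $f"
  done
}
```

A second run against the same directory prints nothing, because no `break; }; done` tail is left to count.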
Lesson (post-merge): "When a bug class spans multiple surfaces of an upstream system, the heal must patch ALL surfaces in the same PR. BUG-016 deferred hooks.json; BUG-017 was needed minutes later because the same user hit the same race on a different surface. Pre-emptively walk all known affected surfaces rather than waiting for the second user report."

Pairs with:
- BUG-016 (PR #83) -- same pattern fix applied to .mcp.json
- BUG-015 (PR #81) -- detection layer that surfaces path-resolution failures
- Upstream issue thedotmack/claude-mem#2607 -- we recommend this Option A fix for upstream merge.
CI test 471 failed because the grep pattern for `hooks\hooks.json` had 4 backslashes (over-escaped in single-quoted bash). Reduced to 2 backslashes, the correct count for matching a single literal backslash in grep BRE. Bash single quotes pass the pattern to grep unchanged:
- 'hooks\\\\hooks\.json' -> 4 literal backslash characters -> grep matches `\\` (two backslashes; wrong)
- 'hooks\\hooks\.json' -> 2 literal backslash characters -> grep matches `\` (one backslash; correct)

No production code change.
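The escaping rule above can be checked directly in bash. The Windows-style path below is a hypothetical example input.

```shell
#!/usr/bin/env bash
# Single quotes hand the pattern to grep unchanged; grep BRE then reads each
# "\\" pair as one literal backslash.
path='C:\plugins\hooks\hooks.json'   # hypothetical Windows path

# 2 backslashes in the pattern -> matches the single backslash in the path.
echo "2 backslashes: $(printf '%s\n' "$path" | grep -c 'hooks\\hooks\.json')"    # 2 backslashes: 1
# 4 backslashes -> requires two consecutive backslashes, so no match.
echo "4 backslashes: $(printf '%s\n' "$path" | grep -c 'hooks\\\\hooks\.json')"  # 4 backslashes: 0
```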
mlorentedev added a commit that referenced this pull request on May 21, 2026
…minators (cross-OS)

User hit a SECOND blocker minutes after BUG-017 (PR #84) closed the EPIPE race -- and then a THIRD when the Stop hook also failed with the same "No stderr output" symptom, looping 9 times. The original narrow BUG-018 scope (only UserPromptSubmit / session-init) was insufficient: ALL 5 claude-mem hooks that terminate with `node ... hook claude-code <event>"` lack the `{"continue":true}` directive that Claude Code requires once BUG-017's race fix removed the prior EPIPE-induced false-block:
- SessionStart context -> `hook claude-code context`
- UserPromptSubmit -> `hook claude-code session-init`
- PostToolUse -> `hook claude-code observation`
- PreToolUse -> `hook claude-code file-context`
- Stop -> `hook claude-code summarize`

The 6th hook (Setup, `node "$_P/scripts/version-check.js"`) is left untouched: it fires only on plugin install/update, not on the user's hot path.

Changes:
1. scripts/claude-mem-heal.sh::heal_hooks_json: the sed substitution uses the regex capture `\([a-z][a-z-]*\)` to match any `hook claude-code <X>"` terminator and append the directive in a single pass.
2. scripts/claude-mem-heal.ps1::Repair-HooksJson: PowerShell `-replace` with the equivalent regex; reports the count of hooks transformed.
3. tests/setup-linux.bats: 1 new parity assert covering the regex-based substitution + continue directive + BUG-018 reference in both heal scripts.

Empirical (2026-05-21, user's Windows):
- After BUG-017 merged: UserPromptSubmit failed (No stderr output)
- After the narrow BUG-018 manual patch: ping/pong worked, but the Stop hook failed 9 times in a row (Claude Code's CLAUDE_CODE_STOP_HOOK_BLOCK_CAP forced an override)
- After the regex-based patch (this commit) was applied locally: all 5 hooks now end with the directive; subsequent prompts complete without looping.

This PR re-applies the fix whenever `/plugin update` reverts the files to upstream.

Anti-scope: Setup hook (version-check.js) terminator left as-is. A future BUG-018b can extend the fix if a user encounters that hook in practice.

Companion:
- BUG-017 (PR #84 merged) -- EPIPE race prerequisite
- upstream issue thedotmack/claude-mem#2607 -- root cause + 3 fix options for the cascade pipe; this PR's continue-directive append is an independent layer addressing claude-mem#2188 (empty-stdin / hook protocol mismatch).
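A bash sketch of the single-pass capture-and-append described above. The commit does not quote the exact text appended to each terminator, so the `; echo '{\"continue\":true}'` fragment below is an assumption. It is written JSON-escaped because the command lives inside a JSON string. The helper name `append_continue_sketch` is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the BUG-018 regex substitution; the appended
# fragment is an assumption, not the shipped heal script's exact text.

append_continue_sketch() {
  local f="$1" script count
  # Terminators that still end right after the event name need the directive.
  count=$(grep -o 'hook claude-code [a-z][a-z-]*"' "$f" | wc -l | tr -d ' ')
  [ "$count" -gt 0 ] || return 0
  # Quoted heredoc: no shell escaping, so sed sees \( \), \1 and \\ as written.
  script=$(cat <<'SED'
s/hook claude-code \([a-z][a-z-]*\)"/hook claude-code \1; echo '{\\"continue\\":true}'"/g
SED
)
  sed "$script" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  echo "[sketch] appended continue directive to $count hook(s): $f"
}
```

Because a patched terminator no longer ends in `<event>"`, a second run finds nothing to count and leaves the file alone. The Setup hook's `version-check.js"` tail is never preceded by `hook claude-code`, so it stays untouched, consistent with the anti-scope.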
mlorentedev added a commit that referenced this pull request on May 21, 2026
…ss-OS) (#85)

* fix(BUG-018): append continue directive to ALL 5 hook claude-code terminators (cross-OS)

* docs(BUG-018): scaffold spec to satisfy spec-gate (>50 LOC production)
PR #85 had 69 LOC of production diff (heal scripts) which exceeds the 50-LOC spec-gate threshold. CI failed on spec-gate. Add proposal.md, tasks.md, verification.md describing the regex-based BUG-018 fix that patches all 5 `hook claude-code <X>` terminators. No code change; spec-only commit.
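The spec-gate itself isn't shown in this PR. The following is a hypothetical bash sketch of such a gate. It assumes that "production LOC" means added + deleted lines under `scripts/`, read from `git diff --numstat` on stdin, and that any touched `specs/` file satisfies the gate.

```shell
#!/usr/bin/env bash
# Hypothetical spec-gate sketch. Assumptions: production LOC = added + deleted
# lines under scripts/; any touched specs/ file satisfies the gate.
# Input: `git diff --numstat` output (added<TAB>deleted<TAB>path) on stdin.

spec_gate_sketch() {
  local threshold="${1:-50}"
  awk -F'\t' -v max="$threshold" '
    $3 ~ /^scripts\// && $1 ~ /^[0-9]+$/ { loc += $1 + $2 }   # skip binary ("-") rows
    $3 ~ /^specs\//                       { has_spec = 1 }
    END {
      printf "production LOC: %d (threshold %d)\n", loc, max
      if (loc > max && !has_spec) { print "spec-gate: FAIL (no spec)"; exit 1 }
      print "spec-gate: OK"
    }'
}
```

For example, `git diff --numstat origin/main...HEAD | spec_gate_sketch 50` would fail (the base ref is an assumption) for a `scripts/` diff over 50 lines until spec files join the diff. That is the failure the docs commit fixes.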
This was referenced May 21, 2026
mlorentedev added a commit that referenced this pull request on May 21, 2026
Move from specs/ to specs/archive/ per SDD lifecycle close (the folder move IS the archive marker; the `status: archived` frontmatter update is deferred to a per-spec follow-up if needed).

This session shipped (today, 2026-05-21):
- AI-014-opencode-windows-bootstrap (PR #78)
- BUG-014-claude-mem-marketplace-register (PR #75)
- BUG-016-claude-mem-heal-v13-refresh (PR #83)
- BUG-017-claude-mem-heal-hooks-json-race (PR #84)
- BUG-018-userpromptsubmit-continue-directive (PR #85)
- REFACTOR-003-diff-check-ps1 (PR #82)

Catch-up archive (merged earlier, but the specs/ folder lingered):
- BUG-007-remove-github-plugin-broken (PR #65, 2026-05-19)
- BUG-011-mcp-loop-claude-json-guard (PR #69, 2026-05-20)
- BUG-012-claude-mem-marketplace-junction (PR #70, 2026-05-20)
- SDD-005-github-copilot-instructions-sync (PR #62, 2026-05-19)
- SDD-006-vault-integrity-check (PR #63, 2026-05-19)

Active specs remaining in specs/ (not yet merged):
- REFACTOR-002-paths-in-env-contract (queued, still draft)
- WIN-002-windows-smoke-sweep (partial closure via PR #73; full clean-VM sweep still open)

33 file moves total (3 files per spec × 11 specs). Zero content change.
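A hypothetical bash sketch of the archive step: one directory move per spec, with no content change. It uses plain `mv`; inside a git checkout, `git mv` would record the same renames. The helper name is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the SDD archive step: move each spec folder under
# specs/archive/ and report how many files moved. Contents are not touched.

archive_specs_sketch() {
  local root="$1"; shift
  local spec n moved=0
  mkdir -p "$root/specs/archive"
  for spec in "$@"; do
    [ -d "$root/specs/$spec" ] || { echo "skip (missing): $spec" >&2; continue; }
    n=$(find "$root/specs/$spec" -type f | wc -l | tr -d ' ')
    mv "$root/specs/$spec" "$root/specs/archive/$spec"
    moved=$((moved + n))
  done
  echo "$moved file move(s) total"
}
```

With the 11 specs listed above, each holding proposal.md, tasks.md and verification.md, the sketch would report 33 file moves.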
Summary
User hit `UserPromptSubmit operation blocked by hook: printf: write error: Permission denied` on the hive project minutes after BUG-016 (PR #83) merged. BUG-016 closed the same EPIPE race for `.mcp.json` but explicitly deferred `hooks.json` — the deferral was wrong: same root cause, same symptom class, just a different surface.
The upstream `plugin/hooks/hooks.json` ships 6 hook events (Setup, SessionStart x2, UserPromptSubmit, PostToolUse, PreToolUse, Stop; 7 commands in total), all using the same broken cascade-pipe pattern. When the consumer breaks early, the producer's unconsumed writes fail with EPIPE on Git Bash for Windows.
Empirical (this branch, user's Windows daily-driver)
```
PS> pwsh -NoProfile -File scripts/claude-mem-heal.ps1 -VerboseOutput
[claude-mem-heal] patched hooks.json (BUG-017, 7 hook(s) -> head -n1 race-free form): .../cache/13.3.0/hooks/hooks.json
[claude-mem-heal] patched hooks.json (BUG-017, 7 hook(s) -> head -n1 race-free form): .../marketplaces/thedotmack/plugin/hooks/hooks.json
[claude-mem-heal] hooks.json already healthy: .../thedotmack-claude-mem/plugin/hooks/hooks.json # via junction
```
14 hook commands patched. Post-patch: `grep -c 'break; }; done'` → 0; `grep -c 'head -n1'` → 7 per file. Re-run silent (idempotent).
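`grep -c` counts matching lines, not matches. The "7 per file" figure above therefore holds only while each hook command sits on its own line. The following hypothetical bash helper counts occurrences instead, and returns non-zero when any broken tail remains.

```shell
#!/usr/bin/env bash
# Hypothetical post-patch check mirroring the greps above. grep -o | wc -l
# counts occurrences rather than lines, in case several hooks share a line.

verify_hooks_json_sketch() {
  local f broken fixed rc=0
  for f in "$@"; do
    broken=$(grep -o 'break; }; done' "$f" | wc -l | tr -d ' ')
    fixed=$(grep -o 'head -n1' "$f" | wc -l | tr -d ' ')
    echo "$f: broken=$broken head-n1=$fixed"
    [ "$broken" -eq 0 ] || rc=1   # any remaining broken tail fails the check
  done
  return "$rc"
}
```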
Approach
Minimal literal substitution: `break; }; done` → `}; done | head -n1`. Everything else in each of the 7 hook commands, including the command tail, is preserved bit-for-bit. The loop no longer breaks early, so it drains the producer's output and the producer no longer hits EPIPE; `head -n1` keeps the first match.
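The following self-contained bash demonstration shows the race, and why the BUG-017 shape avoids it on the producer side. `seq` stands in for the real producer, which is an assumption: the actual hook commands use different producers. On Linux the broken shape shows up as a SIGPIPE exit status rather than the Git Bash `write error` message.

```shell
#!/usr/bin/env bash
# Illustrative only: `seq` stands in for the hook's candidate producer and
# "5" for the first match. Output (~169 KB) is well past a 64 KiB pipe buffer,
# so the producer is still writing when the consumer stops reading.

# Broken shape: the consumer breaks early and closes the pipe; the producer's
# next write fails (SIGPIPE kill, or "write error" if SIGPIPE is ignored).
broken_shape() {
  seq 1 30000 | while read -r n; do [ "$n" = 5 ] && { echo "$n"; break; }; done
  producer_status=${PIPESTATUS[0]}
}

# BUG-017 shape: no break, so the loop drains every line; head -n1 keeps
# the first match.
fixed_shape() {
  seq 1 30000 | while read -r n; do [ "$n" = 5 ] && { echo "$n"; }; done | head -n1
  producer_status=${PIPESTATUS[0]}
}

broken_shape 2>/dev/null
echo "broken: producer exit $producer_status"   # non-zero (141 with default SIGPIPE)
fixed_shape
echo "fixed:  producer exit $producer_status"   # 0
```

The fixed shape stays quiet here because the loop emits a single line. A loop that kept printing after `head` exits would itself be writing to a closed pipe.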
Changes
Spec at `specs/BUG-017-claude-mem-heal-hooks-json-race/`. 52 LOC production diff (at spec-gate threshold).
Test plan
Lesson (post-merge)
"When a bug class spans multiple surfaces of an upstream system, the heal must patch ALL surfaces in the same PR. BUG-016 deferred hooks.json; BUG-017 was needed minutes later because the same user hit the same race on a different surface. Pre-emptively walk all known affected surfaces rather than waiting for the second user report."
Companion PRs / issues