Skip to content

fix(BUG-017): extend claude-mem-heal to patch hooks.json EPIPE race (cross-OS)#84

Merged
mlorentedev merged 2 commits into
mainfrom
fix/BUG-017-claude-mem-heal-hooks-json-race
May 21, 2026
Merged

fix(BUG-017): extend claude-mem-heal to patch hooks.json EPIPE race (cross-OS)#84
mlorentedev merged 2 commits into
mainfrom
fix/BUG-017-claude-mem-heal-hooks-json-race

Conversation

@mlorentedev
Copy link
Copy Markdown
Owner

@mlorentedev mlorentedev commented May 21, 2026

Summary

User hit `UserPromptSubmit operation blocked by hook: printf: write error: Permission denied` on the hive project minutes after BUG-016 (PR #83) merged. BUG-016 closed the same EPIPE race for `.mcp.json` but explicitly deferred `hooks.json` — the deferral was wrong: same root cause, same symptom class, just a different surface.

The upstream `plugin/hooks/hooks.json` ships 6 hooks (Setup, SessionStart x2, UserPromptSubmit, PostToolUse, PreToolUse, Stop) all using the same broken cascade-pipe pattern. When the consumer breaks early, unconsumed producer writes EPIPE on Git Bash Windows.

Empirical (this branch, user's Windows daily-driver)

```
PS> pwsh -NoProfile -File scripts/claude-mem-heal.ps1 -VerboseOutput
[claude-mem-heal] patched hooks.json (BUG-017, 7 hook(s) -> head -n1 race-free form): .../cache/13.3.0/hooks/hooks.json
[claude-mem-heal] patched hooks.json (BUG-017, 7 hook(s) -> head -n1 race-free form): .../marketplaces/thedotmack/plugin/hooks/hooks.json
[claude-mem-heal] hooks.json already healthy: .../thedotmack-claude-mem/plugin/hooks/hooks.json # via junction
```

14 hook commands patched. Post-patch: `grep -c 'break; }; done'` → 0; `grep -c 'head -n1'` → 7 per file. Re-run silent (idempotent).

Approach

Minimal literal substitution: `break; }; done` → `}; done | head -n1`. Preserves each of the 6 hooks' command tail bit-for-bit. The loop no longer breaks early; producers no longer EPIPE; `head -n1` consumes the first match.

Changes

  1. `scripts/claude-mem-heal.sh::heal_hooks_json` (~12 LOC) — sed-based substitution.
  2. `scripts/claude-mem-heal.ps1::Repair-HooksJson` (~20 LOC) — `.Replace()` literal substitution.
  3. Both walk `/hooks/hooks.json` (cache layout) AND `/plugin/hooks/hooks.json` (marketplace layout).
  4. `tests/setup-linux.bats` — 3 new cross-OS parity asserts.

Spec at `specs/BUG-017-claude-mem-heal-hooks-json-race/`. 52 LOC production diff (at spec-gate threshold).

Test plan

  • `bash -n scripts/claude-mem-heal.sh` → OK
  • PowerShell AST + PSScriptAnalyzer clean on `.ps1`
  • ASCII-only check passes
  • Empirical heal + idempotence (above)
  • CI green
  • User retries hive project session → no UserPromptSubmit hook fail (the original repro)

Lesson (post-merge)

"When a bug class spans multiple surfaces of an upstream system, the heal must patch ALL surfaces in the same PR. BUG-016 deferred hooks.json; BUG-017 was needed minutes later because the same user hit the same race on a different surface. Pre-emptively walk all known affected surfaces rather than waiting for the second user report."

Companion PRs / issues

…cross-OS)

User encountered `UserPromptSubmit operation blocked by hook: printf: write
error: Permission denied` on the hive project minutes after BUG-016 (PR #83)
merged. BUG-016 closed the same EPIPE race for `.mcp.json` but explicitly
deferred `hooks.json` -- the deferral was wrong: same root cause, same
symptom class, just a different surface.

The upstream `plugin/hooks/hooks.json` ships 6 hooks (Setup, SessionStart x2,
UserPromptSubmit, PostToolUse, PreToolUse, Stop) all using the same broken
cascade-pipe pattern. When the consumer breaks early, unconsumed producer
writes EPIPE on Git Bash Windows.

Changes:

1. scripts/claude-mem-heal.sh:
   - new `heal_hooks_json` function (~12 LOC)
   - walks both `<dir>/hooks/hooks.json` (cache layout, no `plugin/` subdir)
     and `<dir>/plugin/hooks/hooks.json` (marketplace layout)
   - minimal substitution via sed: `break; }; done` -> `}; done | head -n1`
   - idempotent (skips when broken pattern absent)
   - one log line per patched file with hook count

2. scripts/claude-mem-heal.ps1:
   - new `Repair-HooksJson` function (~20 LOC)
   - equivalent walk + substitution (.Replace() with literal string)
   - PSScriptAnalyzer + AST clean, ASCII-only

3. tests/setup-linux.bats: 3 new parity asserts
   - both scripts define the new function
   - both contain the literal `break; }; done` -> `head -n1` substitution
   - both walk hooks.json AND plugin/hooks/hooks.json paths
   - both reference BUG-017 + claude-mem#2607

Empirical (2026-05-21 user's Windows daily-driver):
- First run: 14 hook commands patched across 2 files (7 cache + 7
  marketplace-via-junction, since BUG-012's `thedotmack` junction
  aliases to `thedotmack-claude-mem/plugin`)
- Second run: silent (idempotent -- no broken pattern left to detect)
- Post-patch grep: `break; }; done` count -> 0; `head -n1` count -> 7
  per file

Spec at specs/BUG-017-claude-mem-heal-hooks-json-race/. 52 LOC of
production diff (at threshold) + spec.

Lesson (post-merge):
"When a bug class spans multiple surfaces of an upstream system, the heal
must patch ALL surfaces in the same PR. BUG-016 deferred hooks.json;
BUG-017 was needed minutes later because the same user hit the same race
on a different surface. Pre-emptively walk all known affected surfaces
rather than waiting for the second user report."

Pairs with:
- BUG-016 (PR #83) -- same pattern fix applied to .mcp.json
- BUG-015 (PR #81) -- detection layer surfacing when path resolution fails
- Upstream issue thedotmack/claude-mem#2607 -- where this Option A fix is
  what we recommend for upstream merge.
CI test 471 failed because the grep pattern for `hooks\hooks.json` had
4 backslashes (over-escaped in single-quoted bash). Reduced to 2
backslashes -- the correct count for matching a single literal
backslash in grep BRE.

Bash single-quoted:
  'hooks\\hooks\.json'  -> 4-char literal `\\` -> grep matches `\` (wrong)
  'hooks\hooks\.json'    -> 2-char literal `\`   -> grep matches `\` (correct)

No production code change.
@mlorentedev mlorentedev merged commit 3008cb9 into main May 21, 2026
6 checks passed
@mlorentedev mlorentedev deleted the fix/BUG-017-claude-mem-heal-hooks-json-race branch May 21, 2026 19:05
mlorentedev added a commit that referenced this pull request May 21, 2026
…minators (cross-OS)

User hit a SECOND blocker minutes after BUG-017 (PR #84) closed the EPIPE
race -- and then a THIRD when the Stop hook also failed with the same
"No stderr output" symptom in a 9-loop iteration.

Original BUG-018 narrow scope (only UserPromptSubmit / session-init)
was insufficient: ALL 5 claude-mem hooks that terminate with
`node ... hook claude-code <event>"` lack the {"continue":true} directive
that Claude Code requires after BUG-017's race fix removed the prior
EPIPE-induced false-block:

  - SessionStart context  -> `hook claude-code context`
  - UserPromptSubmit       -> `hook claude-code session-init`
  - PostToolUse            -> `hook claude-code observation`
  - PreToolUse             -> `hook claude-code file-context`
  - Stop                   -> `hook claude-code summarize`

The 6th hook (Setup, `node "$_P/scripts/version-check.js"`) is left
untouched -- it fires only on plugin install/update, not the user hot
path.

Changes:

1. scripts/claude-mem-heal.sh::heal_hooks_json: sed substitution uses
   regex capture `\([a-z][a-z-]*\)` to match any `hook claude-code <X>"`
   terminator and append the directive in a single pass.

2. scripts/claude-mem-heal.ps1::Repair-HooksJson: PowerShell `-replace`
   with the equivalent regex; reports the count of hooks transformed.

3. tests/setup-linux.bats: 1 new parity assert covering the regex-based
   substitution + continue directive + BUG-018 reference in both heal
   scripts.

Empirical (2026-05-21 user's Windows):
- After BUG-017 merged: UserPromptSubmit failed (No stderr output)
- After narrow BUG-018 manual patch: ping/pong worked, but Stop hook
  failed 9 times in a row (Claude Code's CLAUDE_CODE_STOP_HOOK_BLOCK_CAP
  forced override)
- After regex-based patch (this commit) applied locally: all 5 hooks
  now end with the directive; subsequent prompts complete without loop.

This PR persists the fix across `/plugin update` upstream reverts.

Anti-scope: Setup hook (version-check.js) terminator left as-is. Future
BUG-018b can extend if user encounters that hook in practice.

Companion:
  - BUG-017 (PR #84 merged) -- EPIPE race prerequisite
  - upstream issue thedotmack/claude-mem#2607 -- root cause + 3 fix
    options for the cascade pipe; this PR's continue-directive append
    is independent layer addressing claude-mem#2188 (empty-stdin /
    hook protocol mismatch).
mlorentedev added a commit that referenced this pull request May 21, 2026
…ss-OS) (#85)

* fix(BUG-018): append continue directive to ALL 5 hook claude-code terminators (cross-OS)

User hit a SECOND blocker minutes after BUG-017 (PR #84) closed the EPIPE
race -- and then a THIRD when the Stop hook also failed with the same
"No stderr output" symptom in a 9-loop iteration.

Original BUG-018 narrow scope (only UserPromptSubmit / session-init)
was insufficient: ALL 5 claude-mem hooks that terminate with
`node ... hook claude-code <event>"` lack the {"continue":true} directive
that Claude Code requires after BUG-017's race fix removed the prior
EPIPE-induced false-block:

  - SessionStart context  -> `hook claude-code context`
  - UserPromptSubmit       -> `hook claude-code session-init`
  - PostToolUse            -> `hook claude-code observation`
  - PreToolUse             -> `hook claude-code file-context`
  - Stop                   -> `hook claude-code summarize`

The 6th hook (Setup, `node "$_P/scripts/version-check.js"`) is left
untouched -- it fires only on plugin install/update, not the user hot
path.

Changes:

1. scripts/claude-mem-heal.sh::heal_hooks_json: sed substitution uses
   regex capture `\([a-z][a-z-]*\)` to match any `hook claude-code <X>"`
   terminator and append the directive in a single pass.

2. scripts/claude-mem-heal.ps1::Repair-HooksJson: PowerShell `-replace`
   with the equivalent regex; reports the count of hooks transformed.

3. tests/setup-linux.bats: 1 new parity assert covering the regex-based
   substitution + continue directive + BUG-018 reference in both heal
   scripts.

Empirical (2026-05-21 user's Windows):
- After BUG-017 merged: UserPromptSubmit failed (No stderr output)
- After narrow BUG-018 manual patch: ping/pong worked, but Stop hook
  failed 9 times in a row (Claude Code's CLAUDE_CODE_STOP_HOOK_BLOCK_CAP
  forced override)
- After regex-based patch (this commit) applied locally: all 5 hooks
  now end with the directive; subsequent prompts complete without loop.

This PR persists the fix across `/plugin update` upstream reverts.

Anti-scope: Setup hook (version-check.js) terminator left as-is. Future
BUG-018b can extend if user encounters that hook in practice.

Companion:
  - BUG-017 (PR #84 merged) -- EPIPE race prerequisite
  - upstream issue thedotmack/claude-mem#2607 -- root cause + 3 fix
    options for the cascade pipe; this PR's continue-directive append
    is independent layer addressing claude-mem#2188 (empty-stdin /
    hook protocol mismatch).

* docs(BUG-018): scaffold spec to satisfy spec-gate (>50 LOC production)

PR #85 had 69 LOC of production diff (heal scripts) which exceeds the
50-LOC spec-gate threshold. CI failed on spec-gate. Add proposal.md,
tasks.md, verification.md describing the regex-based BUG-018 fix that
patches all 5 `hook claude-code <X>` terminators.

No code change; spec-only commit.
mlorentedev added a commit that referenced this pull request May 21, 2026
Move from specs/ to specs/archive/ per SDD lifecycle close (the
folder move IS the archive marker; status: archived frontmatter
update deferred to per-spec follow-up if needed).

This session shipped (today, 2026-05-21):
  - AI-014-opencode-windows-bootstrap (PR #78)
  - BUG-014-claude-mem-marketplace-register (PR #75)
  - BUG-016-claude-mem-heal-v13-refresh (PR #83)
  - BUG-017-claude-mem-heal-hooks-json-race (PR #84)
  - BUG-018-userpromptsubmit-continue-directive (PR #85)
  - REFACTOR-003-diff-check-ps1 (PR #82)

Catch-up archive (merged earlier weeks but specs/ folder lingered):
  - BUG-007-remove-github-plugin-broken (PR #65, 2026-05-19)
  - BUG-011-mcp-loop-claude-json-guard (PR #69, 2026-05-20)
  - BUG-012-claude-mem-marketplace-junction (PR #70, 2026-05-20)
  - SDD-005-github-copilot-instructions-sync (PR #62, 2026-05-19)
  - SDD-006-vault-integrity-check (PR #63, 2026-05-19)

Active specs remaining in specs/ (not yet merged):
  - REFACTOR-002-paths-in-env-contract (queued, still draft)
  - WIN-002-windows-smoke-sweep (partial closure via PR #73, full
    clean-VM sweep still open)

33 file moves total (3 files per spec × 11 specs). Zero content change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant