Skip to content

fix(BUG-016): refresh claude-mem-heal to detect + patch v13.x cascade pattern (cross-OS)#83

Merged
mlorentedev merged 1 commit into
mainfrom
fix/BUG-016-claude-mem-heal-v13-refresh
May 21, 2026
Merged

fix(BUG-016): refresh claude-mem-heal to detect + patch v13.x cascade pattern (cross-OS)#83
mlorentedev merged 1 commit into
mainfrom
fix/BUG-016-claude-mem-heal-v13-refresh

Conversation

@mlorentedev
Copy link
Copy Markdown
Owner

@mlorentedev mlorentedev commented May 21, 2026

Summary

User encountered `/mcp Failed to reconnect to plugin:claude-mem:mcp-search: -32000` in this session AFTER BUG-014 (PR #75) restored the install AND BUG-015 (PR #81) shipped the detection layer. Root cause: `claude-mem-heal` silently no-ops against v13.3.0 installs.

`Repair-McpJson` was authored in PR #57 against v12.7.4's broken pattern (`${_R%/}` literal). v13.0.0+ ships a different broken pattern — `sh -c` with cascading-printf pipe that triggers the EPIPE race documented in thedotmack/claude-mem#2607. The original detection regex (`${_R%/}`) doesn't match v13.x → heal exits silent → `.mcp.json` stays broken → MCP server fails to start → -32000.

Empirical validation (this branch, user's Windows daily-driver)

```
PS> pwsh -NoProfile -File scripts/claude-mem-heal.ps1 -VerboseOutput
[claude-mem-heal] patched .mcp.json (v13.x cascade -> head -n1 race-free form): C:\Users\Manu.claude\plugins\cache\thedotmack\claude-mem\13.3.0.mcp.json
[claude-mem-heal] zod present in C:\Users\Manu.claude\plugins\cache\thedotmack\claude-mem\13.3.0
[claude-mem-heal] legacy marketplace path already present: ...
[claude-mem-heal] patched .mcp.json (v13.x cascade -> head -n1 race-free form): ...\marketplaces\thedotmack\plugin.mcp.json
[claude-mem-heal] patched .mcp.json (v13.x cascade -> head -n1 race-free form): ...\marketplaces\thedotmack-claude-mem\plugin.mcp.json
$LASTEXITCODE = 0
```

3 `.mcp.json` files patched on first run. Second run silent (idempotent).

Changes

  1. `scripts/claude-mem-heal.sh::heal_mcp_json`:

    • Detection: OR-of v12.7.4 `${_R%/}` + v13.x signature (`"sh".*"-c"` OR `while IFS= read`).
    • Replacement: canonical race-free template using `done | head -n1` instead of `done` with inner `break` (Option A from issue #2607 applied locally).
    • Log message distinguishes which signature was patched.
  2. `scripts/claude-mem-heal.ps1::Repair-McpJson`: equivalent on Windows (PSScriptAnalyzer + AST clean, ASCII-only).

  3. `tests/setup-linux.bats`: 3 new cross-OS parity asserts (v13.x signature detection, `head -n1` template, BUG-016 + #2607 references).

Spec at `specs/BUG-016-claude-mem-heal-v13-refresh/` (proposal + tasks + verification).

Diff

```
scripts/claude-mem-heal.ps1 | 34 ++++++++++++++++++++++++++++------
scripts/claude-mem-heal.sh | 29 +++++++++++++++++++++++------
tests/setup-linux.bats | 28 ++++++++++++++++++++++++++++

  • specs/BUG-016-claude-mem-heal-v13-refresh/ (3 files)
    ```

Test plan

  • `bash -n scripts/claude-mem-heal.sh` → OK
  • PowerShell AST + PSScriptAnalyzer clean on `claude-mem-heal.ps1`
  • ASCII-only check passes
  • Empirical patch + idempotence (above)
  • CI green
  • Post-merge: `/mcp` reconnects without `-32000` on user's machine (heal already applied locally during validation, but CI confirms repo-side changes ship)

Out of scope

  • `hooks.json` patches. The same pipe-race pattern is in 6 hooks but has different command tails. Real fix is upstream (#2607 Option A); BUG-015 ships the detection. A future BUG-017 could mirror the `head -n1` patch for hooks.json if upstream stays unfixed.
  • Upstream version pinning. Heal must work against any v12.x or v13.x install.
  • Reverting to v10.6.3 simple form. That form relies exclusively on `CLAUDE_PLUGIN_ROOT` being set by Claude Code. Cascade form is more robust empirically.

Lesson candidate

"Heal scripts must be versioned against the upstream bug class they paper over; when upstream's bug pattern changes, the heal's detection regex MUST be refreshed in the same PR that discovers the new pattern. Else the heal silently no-ops while users continue hitting the bug."

Companion PRs / issues

… pattern (cross-OS)

User encountered `/mcp Failed to reconnect to plugin:claude-mem:mcp-search: -32000`
in this session AFTER BUG-014 (PR #75) restored the install AND BUG-015
(PR #81) shipped the detection layer. Root cause traced to claude-mem-heal
silently no-oping against v13.3.0 installs.

scripts/claude-mem-heal.{sh,ps1}::Repair-McpJson was authored in PR #57
against v12.7.4's broken pattern (literal `${_R%/}`). v13.0.0+ ships a
different broken pattern: `sh -c` with cascading-printf pipe that
triggers the same EPIPE race documented in
thedotmack/claude-mem#2607. The original detection regex (`${_R%/}`)
doesn't match v13.x at all -> heal exits silent, leaving the .mcp.json
broken -> MCP server fails to start -> -32000.

Changes:

1. scripts/claude-mem-heal.sh::heal_mcp_json:
   - Detection extended: triggers on EITHER v12.7.4 `${_R%/}` OR
     v13.x signature (`"sh".*"-c"` OR `while IFS= read`).
   - Replacement template: canonical race-free form using
     `done | head -n1` instead of `done` with inner `break`.
     This consumes the entire producer pipe (no leftover writes ->
     no EPIPE -- Option A from upstream issue #2607 applied locally).
   - Log message distinguishes which signature was patched.

2. scripts/claude-mem-heal.ps1::Repair-McpJson: equivalent on Windows.

3. tests/setup-linux.bats: 3 new asserts:
   - both heal scripts grep `while IFS=` (v13.x signature)
   - both heal scripts contain `head -n1` (race-free template)
   - both reference BUG-016 + claude-mem#2607

Empirical validation on user's daily-driver Windows during implementation:
3 affected .mcp.json files (cache 13.3.0 + marketplace junction +
marketplace canonical) patched on first heal run (exit 0). Second run
silent (idempotent -- no v12/v13 signature left to detect).

Spec at specs/BUG-016-claude-mem-heal-v13-refresh/.

Diff: +63 prod (heal scripts) + 28 test = 91 LOC. Production diff at
spec-gate threshold; spec folder ships per discipline.

Lesson candidate (post-merge): "heal scripts must be versioned against
the upstream bug class they paper over; when upstream's bug pattern
changes, the heal's detection regex MUST be refreshed in the same PR
that discovers the new pattern."

Companion to BUG-015 (PR #81 detection layer) and BUG-012 (PR #70
junction fix). Pairs with upstream issue thedotmack/claude-mem#2607
where Option A `head -n1` is recommended as the canonical upstream fix
for the same pipe race.
@mlorentedev mlorentedev merged commit 1ff247a into main May 21, 2026
7 checks passed
@mlorentedev mlorentedev deleted the fix/BUG-016-claude-mem-heal-v13-refresh branch May 21, 2026 18:50
mlorentedev added a commit that referenced this pull request May 21, 2026
…cross-OS) (#84)

* fix(BUG-017): extend claude-mem-heal to patch hooks.json EPIPE race (cross-OS)

User encountered `UserPromptSubmit operation blocked by hook: printf: write
error: Permission denied` on the hive project minutes after BUG-016 (PR #83)
merged. BUG-016 closed the same EPIPE race for `.mcp.json` but explicitly
deferred `hooks.json` -- the deferral was wrong: same root cause, same
symptom class, just a different surface.

The upstream `plugin/hooks/hooks.json` ships 6 hooks (Setup, SessionStart x2,
UserPromptSubmit, PostToolUse, PreToolUse, Stop) all using the same broken
cascade-pipe pattern. When the consumer breaks early, unconsumed producer
writes EPIPE on Git Bash Windows.

Changes:

1. scripts/claude-mem-heal.sh:
   - new `heal_hooks_json` function (~12 LOC)
   - walks both `<dir>/hooks/hooks.json` (cache layout, no `plugin/` subdir)
     and `<dir>/plugin/hooks/hooks.json` (marketplace layout)
   - minimal substitution via sed: `break; }; done` -> `}; done | head -n1`
   - idempotent (skips when broken pattern absent)
   - one log line per patched file with hook count

2. scripts/claude-mem-heal.ps1:
   - new `Repair-HooksJson` function (~20 LOC)
   - equivalent walk + substitution (.Replace() with literal string)
   - PSScriptAnalyzer + AST clean, ASCII-only

3. tests/setup-linux.bats: 3 new parity asserts
   - both scripts define the new function
   - both contain the literal `break; }; done` -> `head -n1` substitution
   - both walk hooks.json AND plugin/hooks/hooks.json paths
   - both reference BUG-017 + claude-mem#2607

Empirical (2026-05-21 user's Windows daily-driver):
- First run: 14 hook commands patched across 2 files (7 cache + 7
  marketplace-via-junction, since BUG-012's `thedotmack` junction
  aliases to `thedotmack-claude-mem/plugin`)
- Second run: silent (idempotent -- no broken pattern left to detect)
- Post-patch grep: `break; }; done` count -> 0; `head -n1` count -> 7
  per file

Spec at specs/BUG-017-claude-mem-heal-hooks-json-race/. 52 LOC of
production diff (at threshold) + spec.

Lesson (post-merge):
"When a bug class spans multiple surfaces of an upstream system, the heal
must patch ALL surfaces in the same PR. BUG-016 deferred hooks.json;
BUG-017 was needed minutes later because the same user hit the same race
on a different surface. Pre-emptively walk all known affected surfaces
rather than waiting for the second user report."

Pairs with:
- BUG-016 (PR #83) -- same pattern fix applied to .mcp.json
- BUG-015 (PR #81) -- detection layer surfacing when path resolution fails
- Upstream issue thedotmack/claude-mem#2607 -- where this Option A fix is
  what we recommend for upstream merge.

* fix(BUG-017): correct backslash count in bats grep pattern

CI test 471 failed because the grep pattern for `hooks\hooks.json` had
4 backslashes (over-escaped in single-quoted bash). Reduced to 2
backslashes -- the correct count for matching a single literal
backslash in grep BRE.

Bash single-quoted:
  'hooks\\hooks\.json'  -> 4-char literal `\\` -> grep matches `\` (wrong)
  'hooks\hooks\.json'    -> 2-char literal `\`   -> grep matches `\` (correct)

No production code change.
mlorentedev added a commit that referenced this pull request May 21, 2026
Move from specs/ to specs/archive/ per SDD lifecycle close (the
folder move IS the archive marker; status: archived frontmatter
update deferred to per-spec follow-up if needed).

This session shipped (today, 2026-05-21):
  - AI-014-opencode-windows-bootstrap (PR #78)
  - BUG-014-claude-mem-marketplace-register (PR #75)
  - BUG-016-claude-mem-heal-v13-refresh (PR #83)
  - BUG-017-claude-mem-heal-hooks-json-race (PR #84)
  - BUG-018-userpromptsubmit-continue-directive (PR #85)
  - REFACTOR-003-diff-check-ps1 (PR #82)

Catch-up archive (merged earlier weeks but specs/ folder lingered):
  - BUG-007-remove-github-plugin-broken (PR #65, 2026-05-19)
  - BUG-011-mcp-loop-claude-json-guard (PR #69, 2026-05-20)
  - BUG-012-claude-mem-marketplace-junction (PR #70, 2026-05-20)
  - SDD-005-github-copilot-instructions-sync (PR #62, 2026-05-19)
  - SDD-006-vault-integrity-check (PR #63, 2026-05-19)

Active specs remaining in specs/ (not yet merged):
  - REFACTOR-002-paths-in-env-contract (queued, still draft)
  - WIN-002-windows-smoke-sweep (partial closure via PR #73, full
    clean-VM sweep still open)

33 file moves total (3 files per spec × 11 specs). Zero content change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant