Skip to content

fix(cache): invalidate schema cache on contract change (#58)#59

Merged
rodaddy merged 3 commits into
mainfrom
fix/58-schema-cache-staleness
Jun 28, 2026
Merged

fix(cache): invalidate schema cache on contract change (#58)#59
rodaddy merged 3 commits into
mainfrom
fix/58-schema-cache-staleness

Conversation

@rodaddy

@rodaddy rodaddy commented Jun 28, 2026

Copy link
Copy Markdown
Owner

Problem

mcp2cli's per-credential/per-service schema cache only expired on a 24h TTL, so after an upstream contract bump (e.g. Open Brain memory-tools.v11 adding append_session_event.create_if_missing) mcp2cli schema and generate-skills served the old tool shape for up to a day — even though live tool calls already worked against the new contract. Fixes #58.

What this lands (cache-coherence core — general, capability-tiered)

  • Schema fingerprint in cache metadata. writeCache records a deterministic schemaFingerprint over the full tool surface (incl. inputSchema, so a new field like create_if_missing changes it). Accepts an authoritative contract hash when a server publishes one (e.g. OB's get_contract.schema_hash); otherwise derives it from the tool surface. New fingerprintSchemas() helper — order-independent, change-sensitive.

  • Invalidate ALL keys for a service. clearServiceCacheKeys() clears the bare entry and every per-credential key (credential:<base64url([service,user])>), so a bump can't leave a credential-scoped read serving the old schema (the observed failure: client cache stale while another key looked refreshed).

  • Daemon self-heal + invalidation. The drift-hook (already runs on connect via the open connection — no self-call) now clears all cache keys for a service on detected drift, then repopulates the base entry.

  • Client-side coherence via piggyback. The daemon stamps its current per-service schemaFingerprint onto /list-tools and /schema responses; the client compares it to its own cached fingerprint and drops its stale ~/.cache entries on mismatch — no extra round-trip. Cold/absent cache is left alone (refetches naturally).

  • cache warm --force — clear-then-refetch, the documented recovery after a bump.

Deferred to a follow-up (noted on #58)

The cache warm daemon self-call deadlock (Daemon failed to start within 10000ms when warming a daemon-routed service) touches PR #52's "route schema refresh through daemon" decision and warrants its own focused PR. The drift-hook repopulation in this PR already routes around it (reuses the open connection), so the invalidation core is complete and deadlock-safe without it.

Acceptance status

  • ✅ After a bump, a stale client cache is dropped on the next daemon call (fingerprint mismatch) — no manual rm.
  • ✅ Invalidation covers the base entry and all credential:* keys.
  • ✅ Daemon absorbs the bump on connect and invalidates.
  • generate-skills regenerates from current schemas (drift-hook auto-regen path, unchanged).
  • ✅ No regression for servers without a contract hash (surface-hash fingerprint; TTL preserved).
  • cache warm succeeds on the CT216 daemon — follow-up (deadlock fix + [codex] route schema refresh through daemon #52 reconciliation).

Verification

  • bun run typecheck clean.
  • Full suite 1069 pass / 0 fail, deterministic across repeated runs.
  • New unit coverage: fingerprintSchemas, clearServiceCacheKeys (base + credential keys, unrelated services untouched), reconcileClientCache (mismatch / match / cold / no-fingerprint / error response), and cache warm --force.

Refs #55 (nested-field schema discoverability — worked together).

🤖 Generated with Claude Code

mcp2cli's per-credential/per-service schema cache only expired on a 24h TTL, so
after an upstream contract bump (e.g. Open Brain memory-tools.v11 adding
append_session_event.create_if_missing) `mcp2cli schema` and generate-skills
served the OLD tool shape for up to a day, even though live tool calls already
worked against the new contract.

This lands the cache-coherence core (general, capability-tiered):

- Schema fingerprint in cache metadata. writeCache now records a deterministic
  schemaFingerprint over the full tool surface (incl. inputSchema, so a new
  field like create_if_missing changes it). Accepts an authoritative contract
  hash when a server publishes one; otherwise derives it from the tool surface.
  New fingerprintSchemas() helper (order-independent, change-sensitive).

- Invalidate ALL keys for a service. clearServiceCacheKeys() clears the bare
  entry AND every per-credential key (credential:<base64url([service,user])>),
  so a bump can't leave a credential-scoped read serving the old schema.

- Daemon self-heal + invalidation. The drift-hook (already runs on connect via
  the open connection -- no self-call) now clears all cache keys for a service
  when it detects drift, then repopulates the base entry.

- Client-side coherence via piggyback. The daemon stamps its current
  schemaFingerprint (per-service) onto /list-tools and /schema responses; the
  client compares it to its own cached fingerprint and drops its stale cache on
  mismatch -- closing the staleness on the client's ~/.cache layer with no extra
  round-trip. Cold/absent cache is left alone (refetches naturally).

- cache warm --force: clear-then-refetch, the documented recovery after a bump.

Deferred to a follow-up (noted on #58): the `cache warm` daemon self-call
deadlock fix touches the #52 "route schema refresh through daemon" decision and
warrants its own PR; the drift-hook repopulation here already routes around it.

Verified: bun run typecheck clean; full suite 1069 pass / 0 fail, deterministic
across repeated runs. New unit coverage for fingerprintSchemas,
clearServiceCacheKeys (base + credential keys), reconcileClientCache (mismatch /
match / cold / no-fingerprint / error), and cache warm --force.

Refs #55 (nested-field schema discoverability -- worked together).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rodaddy

rodaddy commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

Review swarm: BLOCKER found — do not merge yet

Ran a 3-reviewer adversarial swarm on this PR. All three independently converged on the same root cause. The client-side piggyback coherence is broken as written and would destroy the cache it's meant to keep coherent.

BLOCKER — fingerprint comparison is across two different keys (never converges)

  • Daemon stamps readCacheFingerprint(poolKey) where poolKey is credential:<base64url([service,user])> for per-credential services (the documented Open Brain deployment).
  • Client compares against readCacheFingerprint(service) — the bare service name.
  • These are different files over different content → fingerprints never match → reconcileClientCache clears the client cache on every call after a cache warm. Confirmed empirically.

BLOCKER (B2) — divergent hash surface even on the bare key

Daemon writers hash description ?? ""; the client warm path hashes "(no description)" (already defaulted by getToolSchemaCached). Any no-description tool → divergent fingerprint → same clear-on-every-call symptom, no credentials required.

MAJOR — clearServiceCacheKeys over-counts on failed unlink

unlink(...).catch(()=>{}) then cleared++ unconditionally. A failed unlink leaves a stale credential cache file while logging keysCleared: N as success — silently re-opens #58.

MAJOR — cross-identity client cache thrash

Multiple identities sharing one client cache dir reconcile against the same bare key → each identity switch clears the other's cache. (Severity depends on whether identities share a cache dir in the real deployment.)

Why tests didn't catch it

tests/process/client-cache-coherence.test.ts hand-injects fingerprints and never exercises a real two-key / two-writer round-trip. Needs a test that populates a credential: key + a bare key from the actual write paths and asserts the daemon-stamped vs client-read fingerprints match for identical tools (they currently don't).

Fix plan

  1. Align the comparison key: client must reconcile against the same poolKey the daemon used (or daemon stamps the bare-service fingerprint, since the drift-hook maintains the bare key) — like-for-like.
  2. Canonicalize the description default inside hashToolSchema so every writer hashes one surface.
  3. clearServiceCacheKeys: only cleared++ on a resolved unlink; log failures.
  4. Add a real generate→write→stamp→reconcile round-trip regression test.

Holding merge until these land.

rodaddy and others added 2 commits June 28, 2026 18:14
…58 review)

Review-swarm blockers on PR #59 (all three reviewers converged):

- B1 (key mismatch, false-positive invalidation): the daemon stamped the
  per-credential pool-key fingerprint while the client compared its bare-service
  fingerprint -- different files, never converge -> the client cleared its cache
  on every call after a warm. The daemon now stamps readCacheFingerprint(
  body.service) (bare) on /list-tools and /schema. The drift-hook already
  maintains the bare entry on every connect (checkDriftOnConnect runs with
  baseServiceName), so both sides reference the same, credential-independent key.

- B2 (description-default divergence): the daemon hashed `description ?? ""`
  while the client warm path hashed the already-defaulted "(no description)",
  producing different fingerprints for the same tool. hashToolSchema now
  normalizes "" and "(no description)" to one canonical value, so every writer
  derives the same fingerprint (proven: daemon-style and client-style hashes are
  now identical; real descriptions stay distinct).

- M1 (silent under-delete): clearServiceCacheKeys incremented its count even when
  unlink was swallowed, reporting success while a stale credential entry
  survived -- re-opening the staleness. It now counts only resolved unlinks and
  logs failures.

- Added a real daemon/client fingerprint-convergence round-trip test that
  exercises both write surfaces; it would have caught B1/B2 (the prior tests
  hand-injected fingerprints and never compared two independently-derived ones).

Verified: bun run typecheck clean; full suite 1070 pass / 0 fail, deterministic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… guard)

The re-review noted the B1 fix (daemon stamps the bare-service fingerprint, not
the credential pool key) had no regression guard -- a revert to the pool key
would ship silently. This adds a daemon /list-tools test that wires a per-user
credential so the pool key is genuinely a `credential:` key distinct from the
bare key, seeds the two with different fingerprints, and asserts the response
carries the BARE one. Mutation-verified: reverting the stamp to listPoolKey
fails this test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rodaddy

rodaddy commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

Re-review: blockers fixed, verified SAFE TO MERGE

Adversarial re-verification (same reviewer) on the fixes:

  • B1 (key mismatch): FIXED — daemon now stamps readCacheFingerprint(body.service) (bare) on /list-tools and /schema. Confirmed both sides reference the same bare key, and the drift-hook populates it on every connect so the stamp isn't perpetually undefined.
  • B2 (description divergence): FIXEDhashToolSchema normalizes ""/undefined/"(no description)" to one canonical value. Reviewer probed the real client warm path (schemaToCachedTool) vs the daemon path across annotations/inputSchema/truncation — no residual surface divergence.
  • M1 (over-count): FIXEDclearServiceCacheKeys counts only resolved unlinks and logs failures (cache_unlink_failed).
  • B1 regression guard ADDED — the reviewer flagged that no test pinned the bare-key stamp. Added tests/daemon/list-tools-fingerprint.test.ts, which wires a per-user credential so the pool key is a real credential: key distinct from the bare key, and asserts the response stamps the bare one. Mutation-verified: reverting the stamp to listPoolKey fails this test.

Residual (MINOR, tracked as #61): bare-key fingerprint is last-writer-wins if an upstream returns different tool sets per credential. Self-heals, no loop, no data crossing; uncommon in practice. Out of scope here.

Verified: typecheck clean; full suite 1071 pass / 0 fail, deterministic. CI green. Ready to merge.

@rodaddy rodaddy merged commit 2c79819 into main Jun 28, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Schema cache serves stale tool schema after upstream contract bump (no invalidation on version change)

1 participant