chore(spike): live DeepSeek cache spike for RFC #110 #113
Merged
Two-file spike under benchmarks/spike-mcp-reconnect/:

- runner.ts: 5 chat calls against live deepseek-chat with controlled tool-list drifts (identity, append, mid-stream edit)
- results.md: captured run + empirical findings

The headline result overturns the RFC body's "any drift = full cache miss" claim. DeepSeek's prefix cache works at chunk granularity (≈128 tokens), so the cost depends on WHERE the drift falls:

- append a tool at the end → trivial cost (94.8% hit, even better than the no-drift 85% baseline because the new chunk gets cached)
- edit a tool's description in the middle → loses chunks past the edit (84.1% hit observed)
- replacing or reordering the tool list → effectively full miss

This nudges the C2b design call away from blanket "strict default" toward graduated permissive: silent on appends, warn on mid-stream edits, refuse on reorders / removals. `--strict` remains as an explicit flag.

Refs #110.
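The graduated policy described above can be sketched as a small lookup. This is an illustration, not the project's implementation: `PolicyAction`, `decideDriftAction`, and the `strict` parameter are hypothetical names, and the strict-mode behaviour (refuse every non-identity drift) is an assumption, since the text only says that `--strict` remains available.

```typescript
// Drift kinds as named in the spike's taxonomy.
type DriftKind = "identity" | "append" | "edit" | "reorder" | "remove";

// Hypothetical action names for the policy layer.
type PolicyAction = "silent" | "warn" | "refuse";

// Graduated permissive defaults: silent on appends, warn on mid-stream
// edits, refuse on reorders and removals.
const PERMISSIVE: Record<DriftKind, PolicyAction> = {
  identity: "silent",
  append: "silent",
  edit: "warn",
  reorder: "refuse",
  remove: "refuse",
};

// Assumption: strict mode refuses anything other than an identical tool list.
function decideDriftAction(kind: DriftKind, strict: boolean = false): PolicyAction {
  if (strict && kind !== "identity") return "refuse";
  return PERMISSIVE[kind];
}
```

A table keeps the mapping easy to audit against the spike's findings if the thresholds change later.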
This was referenced May 2, 2026
esengine added a commit that referenced this pull request on May 2, 2026
Pure function `classifyToolListDrift(before, after) → DriftReport` encoding the cache-cost taxonomy validated by the live spike (#113):

- `identity`: same names, same order, same content → free
- `append`: every before-tool unchanged, new tools at the end → trivially cheap (94.8% hit observed)
- `edit`: same names + positions, content of ≥1 tool changed → bounded loss past the divergence point
- `reorder`: same set, different order, OR additions not at the end → catastrophic
- `remove`: any before-tool missing from after → catastrophic; dominates even when other tools were added

Report carries `added` / `removed` / `edited` arrays so the policy layer (next PR) can populate the warn line precisely.

Pure function, no I/O. 12 unit tests cover the matrix including edge cases (empty before/after, remove-with-also-added).

Refs #110.
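A minimal sketch of how such a classifier might look, written from the taxonomy above rather than from the actual source. The `ToolSpec` shape and the use of `JSON.stringify` for content comparison are assumptions. So is the handling of a list that both appends and edits: it is reported as `edit` here, a case the description does not cover.

```typescript
type DriftKind = "identity" | "append" | "edit" | "reorder" | "remove";

// Hypothetical tool shape; the real registry type may differ.
interface ToolSpec {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

interface DriftReport {
  kind: DriftKind;
  added: string[];
  removed: string[];
  edited: string[];
}

function classifyToolListDrift(
  before: readonly ToolSpec[],
  after: readonly ToolSpec[],
): DriftReport {
  const beforeByName = new Map(before.map((t) => [t.name, t] as const));
  const afterNames = new Set(after.map((t) => t.name));

  const removed = before.filter((t) => !afterNames.has(t.name)).map((t) => t.name);
  const added = after.filter((t) => !beforeByName.has(t.name)).map((t) => t.name);
  // Content comparison via JSON.stringify is key-order sensitive (assumption:
  // both lists serialize their fields in the same order).
  const edited = after
    .filter((t) => {
      const prev = beforeByName.get(t.name);
      return prev !== undefined && JSON.stringify(prev) !== JSON.stringify(t);
    })
    .map((t) => t.name);

  let kind: DriftKind;
  if (removed.length > 0) {
    kind = "remove"; // dominates even when other tools were added
  } else if (before.some((t, i) => after[i]?.name !== t.name)) {
    kind = "reorder"; // different order, or additions not at the end
  } else if (edited.length > 0) {
    kind = "edit"; // assumption: append + edit together is reported as edit
  } else if (added.length > 0) {
    kind = "append";
  } else {
    kind = "identity";
  }
  return { kind, added, removed, edited };
}
```

Checking `remove` first and `reorder` second mirrors the stated precedence: the catastrophic cases win over the cheap ones.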
esengine added a commit that referenced this pull request on May 2, 2026
C2b implementation per RFC #110. Identity-drift only: append / edit / reorder / remove drift cases surface a clear "restart Reasonix to apply" message instead of mutating the registry or prefix mid-session. The graduated permissive policy from the empirical spike (#113) needs API work on `ImmutablePrefix` (replaceTool / removeTool) before the other drift kinds can take effect mid-session; that's a follow-up PR.

Touch:

- `src/mcp/registry.ts`: new `McpClientHost = { client: McpClient }` indirection on BridgeOptions. Tool closures resolve the live client via `host.client` at call time, so reconnect can swap the underlying socket without re-bridging tools.
- `src/mcp/reconnect.ts` (new): `reconnectMcpServer({ host, spec, beforeTools })` re-handshakes a fresh transport, classifies drift, swaps `host.client` only on identity, closes the new client cleanly on refusal so the old one stays untouched.
- `src/cli/ui/slash/handlers/mcp.ts`: third subcommand `reconnect <name>`, fires async, reports via `ctx.postInfo` with the lifecycle `↻ reconnect…` / `✓ connected` / `✖ failed` formatter.
- `src/cli/ui/mcp-lifecycle.ts`: `reconnect` state added to the union.
- `src/cli/ui/slash/types.ts`: `McpServerSummary.client?` replaced by `host: McpClientHost`. `McpClient` import dropped (now via host).
- `src/cli/ui/mcp-browse.ts`: `/resource` and `/prompt` read through `server.host.client`. Disconnected-server warnings dropped, since the host always carries a client now.
- `src/cli/commands/chat.tsx`: builds the host at bridge time, stores it on the summary.

Tests:

- `tests/mcp-reconnect.test.ts`: 2 cases for spec_parse early returns.
- `tests/mcp-integration.test.ts`: live test that bridge + host indirection routes a swapped client correctly through registry.dispatch.
- `tests/slash.test.ts`: 3 cases for the slash dispatch (lifecycle line emission, unknown-name rejection with hint, no-arg usage).
- `tests/mcp-browse.test.ts`: server() helper updated to accept `client` and wrap in host shape.
Closes part of #110 (identity case only). Append/edit/reorder/remove mid-session handling deferred — needs ImmutablePrefix surgery.
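The host indirection can be illustrated with a small sketch. Everything here beyond the `McpClientHost = { client: McpClient }` shape is hypothetical: the `McpClient` methods, `bridgeTool`, `swapOnIdentity`, and `FakeClient` are stand-ins, and the calls are synchronous for brevity where the real client is presumably async.

```typescript
// Hypothetical minimal client surface, synchronous for brevity.
interface McpClient {
  callTool(name: string, args: unknown): string;
  close(): void;
}

// Shape taken from the commit description.
type McpClientHost = { client: McpClient };

// The closure reads host.client at call time, not at bridge time, so a
// later swap is picked up without re-bridging the tool.
function bridgeTool(host: McpClientHost, toolName: string): (args: unknown) => string {
  return (args) => host.client.callTool(toolName, args);
}

// Swap only when the drift is "identity"; otherwise close the fresh client
// and leave the old one untouched. What happens to the old client after a
// successful swap is not stated in the source, so it is left alone here.
function swapOnIdentity(host: McpClientHost, fresh: McpClient, driftKind: string): boolean {
  if (driftKind !== "identity") {
    fresh.close();
    return false;
  }
  host.client = fresh;
  return true;
}

// Test double used to demonstrate the routing.
class FakeClient implements McpClient {
  readonly label: string;
  closed = false;
  constructor(label: string) {
    this.label = label;
  }
  callTool(name: string, _args: unknown): string {
    return `${this.label}:${name}`;
  }
  close(): void {
    this.closed = true;
  }
}
```

Because the bridged closure holds the host rather than the client, the registry never needs to know a reconnect happened.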
This was referenced May 2, 2026
Refs #110.
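Empirical follow-up to the structural-test PR (#112). Earlier today I claimed in the RFC body that "any tool-list drift on reconnect = full cache miss". That's wrong. A live spike against deepseek-chat shows that DeepSeek caches in ~128-token chunks, so the cost of drift depends on where the drift lands:

- append a tool at the end: trivial cost (94.8% hit observed, against an 85% no-drift baseline)
- edit a tool's description mid-list: loses the chunks past the edit (84.1% hit observed)
- replace or reorder the tool list: effectively a full miss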
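The chunk-granularity behaviour can be modelled roughly as follows. This is a toy model under stated assumptions, not DeepSeek's documented mechanism: it treats the prompt as an array of tokens and assumes only whole 128-token chunks of the shared prefix are reusable. The function names are hypothetical.

```typescript
// Approximate chunk size observed in the spike; an assumption of this model.
const CHUNK_TOKENS = 128;

// Number of leading tokens the two prompts have in common.
function commonPrefixLength(a: readonly string[], b: readonly string[]): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

// Tokens of `next` that could be served from cache: the shared prefix,
// rounded down to a whole number of chunks.
function cachedTokens(
  prev: readonly string[],
  next: readonly string[],
  chunk: number = CHUNK_TOKENS,
): number {
  const shared = commonPrefixLength(prev, next);
  return Math.floor(shared / chunk) * chunk;
}

// Fraction of `next` covered by cached chunks.
function estimatedHitRate(
  prev: readonly string[],
  next: readonly string[],
  chunk: number = CHUNK_TOKENS,
): number {
  return next.length === 0 ? 0 : cachedTokens(prev, next, chunk) / next.length;
}
```

Under this model a change at token 500 of a 1000-token prompt still reuses the first three chunks (384 tokens), while a change at token 0 reuses nothing, which matches the qualitative ordering seen in the spike.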
What this changes
The RFC's strict-vs-permissive framing was based on a wrong premise. Updated thinking:

- Appends: silent.
- Mid-stream edits: warn.
- Reorders / removals: refuse.
- `--strict` flag for users who care about every byte (high-volume scripted runs).

Touch

- benchmarks/spike-mcp-reconnect/runner.ts: 5-turn live runner, ~¥0.01 per execution
- benchmarks/spike-mcp-reconnect/results.md: captured run + empirical findings + design implications

No source files touched. The RFC #110 thread will get an updated comment summarizing the finding.
Test plan