chore(spike): live DeepSeek cache spike for RFC #110 by esengine · Pull Request #113 · esengine/reasonix

esengine · 2026-05-02T08:52:35Z

Refs #110.

Empirical follow-up to the structural-test PR (#112). Earlier today I claimed in the RFC body that "any tool-list drift on reconnect = full cache miss". That's wrong. Live spike against deepseek-chat:

turn                                      prompt     hit    miss    hit%
1 · cold start (toolset A)                   758     640     118   84.4%
2 · same prefix (toolset A)                  753     640     113   85.0%
3 · drift: ADDED tool (toolset A+)           810     768      42   94.8%
4 · same prefix again (toolset A+)           807     768      39   95.2%
5 · drift: EDITED desc (toolset A')          761     640     121   84.1%

DeepSeek caches in ~128-token chunks. Cost of drift depends on where the drift lands:

- append a tool at the end → trivial (94.8% hit, better than no-drift baseline because the new chunk gets cached)
- edit a tool's description mid-stream → bounded loss (84.1%, only chunks past the edit miss)
- replace / reorder / remove → would be effectively full miss (not measured here, but follows from the chunk model)
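The chunk arithmetic can be checked against the table. What follows is a minimal sketch, assuming a fixed 128-token chunk and that only whole chunks count as hits. `CHUNK_TOKENS` and `cachedTokens` are illustrative names, not part of the runner or of the DeepSeek API.

```typescript
// Assumption: cache granularity is a fixed 128 tokens, as inferred from the spike.
const CHUNK_TOKENS = 128;

// Largest whole-chunk token count that fits inside a cacheable prefix.
function cachedTokens(prefixTokens: number, chunk: number = CHUNK_TOKENS): number {
  return Math.floor(prefixTokens / chunk) * chunk;
}

// Prompt sizes and reported hit counts copied from the five turns above.
const turns = [
  { prompt: 758, hit: 640 },
  { prompt: 753, hit: 640 },
  { prompt: 810, hit: 768 },
  { prompt: 807, hit: 768 },
  { prompt: 761, hit: 640 },
];

for (const t of turns) {
  console.log(`${t.prompt} prompt tokens: ${cachedTokens(t.prompt)} whole-chunk, ${t.hit} reported`);
}
```

Every reported hit count is a whole multiple of 128 (640 = 5 × 128, 768 = 6 × 128), which is consistent with the chunk model.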

What this changes

The RFC's strict-vs-permissive framing was based on a wrong premise. Updated thinking:

- Default: graduated permissive. Silent on appends, warn on mid-stream edits, refuse on whole-list reorders or removals.
- `--strict` flag for users who care about every byte (high-volume scripted runs).
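The graduated policy can be written as a small lookup. This is a sketch only: `actionForDrift` and `PolicyAction` are hypothetical names, and the behavior of `--strict` (refuse every non-identity drift) is an assumption, since the PR does not spell it out.

```typescript
type DriftKind = "identity" | "append" | "edit" | "reorder" | "remove";
type PolicyAction = "silent" | "warn" | "refuse";

// Default (graduated permissive) mapping as described in the PR body.
const PERMISSIVE: Record<DriftKind, PolicyAction> = {
  identity: "silent", // no drift at all
  append: "silent",
  edit: "warn",
  reorder: "refuse",
  remove: "refuse",
};

function actionForDrift(kind: DriftKind, strict: boolean = false): PolicyAction {
  // Assumption: under --strict, anything other than an identical tool list is refused.
  if (strict && kind !== "identity") return "refuse";
  return PERMISSIVE[kind];
}

console.log(actionForDrift("append"));       // "silent"
console.log(actionForDrift("edit"));         // "warn"
console.log(actionForDrift("append", true)); // "refuse"
```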

Touch

- benchmarks/spike-mcp-reconnect/runner.ts: 5-turn live runner, ~¥0.01 per execution
- benchmarks/spike-mcp-reconnect/results.md: captured run + empirical findings + design implications
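For readers who want to reproduce a table row, here is a sketch of how one could be derived from a chat completion's `usage` object. It assumes the `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` fields from DeepSeek's context-caching documentation. `toRow` and `TurnRow` are illustrative names, and the actual runner.ts may compute this differently.

```typescript
// Subset of the usage object relevant to caching (field names per DeepSeek's docs).
interface CacheUsage {
  prompt_cache_hit_tokens: number;
  prompt_cache_miss_tokens: number;
}

interface TurnRow {
  prompt: number;
  hit: number;
  miss: number;
  hitPct: string;
}

// Turn raw usage counters into the prompt / hit / miss / hit% columns.
function toRow(usage: CacheUsage): TurnRow {
  const hit = usage.prompt_cache_hit_tokens;
  const miss = usage.prompt_cache_miss_tokens;
  const prompt = hit + miss;
  const hitPct = prompt === 0 ? "n/a" : `${((hit / prompt) * 100).toFixed(1)}%`;
  return { prompt, hit, miss, hitPct };
}

// Turn 1 from the captured run: 640 hit + 118 miss.
console.log(toRow({ prompt_cache_hit_tokens: 640, prompt_cache_miss_tokens: 118 }));
// { prompt: 758, hit: 640, miss: 118, hitPct: "84.4%" }
```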

No source files touched. The RFC #110 thread will get an updated comment summarizing the finding.

Test plan

- Local run completed against live DeepSeek
- Lint + typecheck clean
- Re-run on a different day to confirm stability of the chunk model under different cache-state conditions (optional; current data is enough for the design call)

Two-file spike under benchmarks/spike-mcp-reconnect/:

- runner.ts: 5 chat calls against live deepseek-chat with controlled tool-list drifts (identity, append, mid-stream edit)
- results.md: captured run + empirical findings

The headline result overturns the RFC body's "any drift = full cache miss" claim. DeepSeek's prefix cache works at chunk granularity (≈128 tokens), so the cost depends on WHERE the drift falls:

- append a tool at the end → trivial cost (94.8% hit, even better than the no-drift 85% baseline because the new chunk gets cached)
- edit a tool's description in the middle → loses chunks past the edit (84.1% hit observed)
- replacing or reordering the tool list → effectively full miss

This nudges the C2b design call away from blanket "strict default" toward graduated permissive: silent on appends, warn on mid-stream edits, refuse on reorders / removals. `--strict` remains as an explicit flag.

Refs #110.

Pure function `classifyToolListDrift(before, after) → DriftReport` encoding the cache-cost taxonomy validated by the live spike (#113):

- `identity`: same names, same order, same content → free
- `append`: every before-tool unchanged, new tools at the end → trivially cheap (94.8% hit observed)
- `edit`: same names + positions, content of ≥1 tool changed → bounded loss past the divergence point
- `reorder`: same set, different order, OR additions not at the end → catastrophic
- `remove`: any before-tool missing from after → catastrophic; dominates even when other tools were added

Report carries `added` / `removed` / `edited` arrays so the policy layer (next PR) can populate the warn line precisely. Pure function, no I/O. 12 unit tests cover the matrix including edge cases (empty before/after, remove-with-also-added).

Refs #110.
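The taxonomy above could be implemented roughly as follows. This is a sketch written from the description, not the code merged in #114: the `ToolSpec` shape, the `JSON.stringify` content comparison, and the choice to report "edit" when tools are both edited and appended are all assumptions.

```typescript
type DriftKind = "identity" | "append" | "edit" | "reorder" | "remove";

// Hypothetical tool shape; the real registry type is not shown in this PR.
interface ToolSpec {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

interface DriftReport {
  kind: DriftKind;
  added: string[];
  removed: string[];
  edited: string[];
}

function classifyToolListDrift(before: ToolSpec[], after: ToolSpec[]): DriftReport {
  const beforeNames = before.map((t) => t.name);
  const afterNames = after.map((t) => t.name);
  const beforeSet = new Set(beforeNames);
  // Assumption: tool names are unique within a list.
  const afterByName = new Map(after.map((t) => [t.name, t] as const));

  const removed = beforeNames.filter((n) => !afterByName.has(n));
  const added = afterNames.filter((n) => !beforeSet.has(n));
  // Content comparison via JSON.stringify is key-order sensitive (simplification).
  const edited = before
    .filter((t) => afterByName.has(t.name) &&
      JSON.stringify(t) !== JSON.stringify(afterByName.get(t.name)))
    .map((t) => t.name);

  const report = (kind: DriftKind): DriftReport => ({ kind, added, removed, edited });

  // "remove" dominates, even when other tools were added.
  if (removed.length > 0) return report("remove");
  // Every before-tool must keep its position; otherwise order changed
  // or something was inserted before the end.
  const inPlace = beforeNames.every((n, i) => afterNames[i] === n);
  if (!inPlace) return report("reorder");
  // Assumption: edit plus trailing additions is reported as "edit" (the costlier kind).
  if (edited.length > 0) return report("edit");
  return report(added.length > 0 ? "append" : "identity");
}

const a: ToolSpec = { name: "a", description: "A" };
const b: ToolSpec = { name: "b", description: "B" };
const c: ToolSpec = { name: "c", description: "C" };
console.log(classifyToolListDrift([a, b], [a, b, c]).kind); // "append"
console.log(classifyToolListDrift([a, b], [b, a]).kind);    // "reorder"
```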

C2b implementation per RFC #110. Identity-drift only: append / edit / reorder / remove drift cases surface a clear "restart Reasonix to apply" message instead of mutating the registry or prefix mid-session. The graduated permissive policy from the empirical spike (#113) needs API work on `ImmutablePrefix` (replaceTool / removeTool) before the other drift kinds can take effect mid-session; that's a follow-up PR.

Touch:

- `src/mcp/registry.ts`: new `McpClientHost = { client: McpClient }` indirection on BridgeOptions. Tool closures resolve the live client via `host.client` at call time, so reconnect can swap the underlying socket without re-bridging tools.
- `src/mcp/reconnect.ts` (new): `reconnectMcpServer({ host, spec, beforeTools })` re-handshakes a fresh transport, classifies drift, swaps `host.client` only on identity, closes the new client cleanly on refusal so the old one stays untouched.
- `src/cli/ui/slash/handlers/mcp.ts`: third subcommand `reconnect <name>`, fires async, reports via `ctx.postInfo` with the lifecycle `↻ reconnect…` / `✓ connected` / `✖ failed` formatter.
- `src/cli/ui/mcp-lifecycle.ts`: `reconnect` state added to the union.
- `src/cli/ui/slash/types.ts`: `McpServerSummary.client?` replaced by `host: McpClientHost`. `McpClient` import dropped (now via host).
- `src/cli/ui/mcp-browse.ts`: `/resource` and `/prompt` read through `server.host.client`. Disconnected-server warnings dropped, since host always carries a client now.
- `src/cli/commands/chat.tsx`: builds the host at bridge time, stores it on the summary.

Tests:

- `tests/mcp-reconnect.test.ts`: 2 cases for spec_parse early returns.
- `tests/mcp-integration.test.ts`: live test that bridge + host indirection routes a swapped client correctly through registry.dispatch.
- `tests/slash.test.ts`: 3 cases for the slash dispatch (lifecycle line emission, unknown-name rejection with hint, no-arg usage).
- `tests/mcp-browse.test.ts`: server() helper updated to accept `client` and wrap in host shape.

Closes part of #110 (identity case only). Append/edit/reorder/remove mid-session handling deferred; needs ImmutablePrefix surgery.
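The host indirection is easiest to see in a toy version. The sketch below assumes a stripped-down `McpClient` with a single `callTool` method. `bridgeTool` and `fakeClient` are hypothetical helpers for illustration and do not reflect the actual code in `src/mcp/registry.ts`.

```typescript
// Stand-in for the real client type, which is not shown in this PR.
interface McpClient {
  callTool(name: string, args: unknown): Promise<string>;
}

// Mutable holder: closures keep the host, not the client.
interface McpClientHost {
  client: McpClient;
}

// The closure reads host.client at call time, so a later swap is picked up.
function bridgeTool(host: McpClientHost, toolName: string) {
  return (args: unknown): Promise<string> => host.client.callTool(toolName, args);
}

// Fake client that records which connection served each call.
const calls: string[] = [];
const fakeClient = (label: string): McpClient => ({
  callTool: async (name) => {
    calls.push(`${label}:${name}`);
    return "ok";
  },
});

const host: McpClientHost = { client: fakeClient("old") };
const search = bridgeTool(host, "search");

void search({ q: "x" });         // served by the old client
host.client = fakeClient("new"); // what a reconnect would do on identity drift
void search({ q: "x" });         // same closure, now served by the new client

console.log(calls); // ["old:search", "new:search"]
```

Had the closure captured the client directly instead of the host, the second call would still reach the old connection, and every tool would need re-bridging after a reconnect.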

esengine merged commit d811f2d into main May 2, 2026
1 check passed

esengine deleted the chore/mcp-reconnect-cache-spike branch May 2, 2026 08:53

This was referenced May 2, 2026

RFC: /mcp reconnect <name> — live teardown without breaking the cache prefix #110

Closed

feat(mcp): tool-list drift classifier — groundwork for #110 C2b #114

Merged

esengine mentioned this pull request May 2, 2026

feat(mcp): /mcp reconnect <name> for identity drift (closes part of #110) #115

Merged

4 tasks

This was referenced May 2, 2026

feat(mcp): append-drift mid-session reconnect (refs #110) #117

Merged

release: 0.22.0 — live MCP reconnect (identity + append) #119

Merged
