chore(spike): live DeepSeek cache spike for RFC #110 (#113)

Merged: esengine merged 1 commit into main from chore/mcp-reconnect-cache-spike on May 2, 2026


esengine commented May 2, 2026

Refs #110.

Empirical follow-up to the structural-test PR (#112). Earlier today I claimed in the RFC body that "any tool-list drift on reconnect = full cache miss". That's wrong. Live spike against deepseek-chat:

turn                                      prompt     hit    miss    hit%
1 · cold start (toolset A)                   758     640     118   84.4%
2 · same prefix (toolset A)                  753     640     113   85.0%
3 · drift: ADDED tool (toolset A+)           810     768      42   94.8%
4 · same prefix again (toolset A+)           807     768      39   95.2%
5 · drift: EDITED desc (toolset A')          761     640     121   84.1%

DeepSeek caches in ~128-token chunks. Cost of drift depends on where the drift lands:

  • append a tool at the end → trivial (94.8% hit, better than the no-drift baseline because the new chunk gets cached)
  • edit a tool's description mid-stream → bounded loss (84.1%, only chunks past the edit miss)
  • replace / reorder / remove → would effectively be a full miss (not measured here, but follows from the chunk model)
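
The hit counts in the table above are exact multiples of 128 (640 = 5 × 128, 768 = 6 × 128), which is what a chunked prefix cache would produce. A minimal sketch of that arithmetic follows. The names `CHUNK_TOKENS`, `isWholeChunks`, `maxCachedTokens`, and `hitRatePercent` are illustrative and not taken from runner.ts, and the 128-token chunk size is the spike's estimate rather than a documented constant.

```typescript
// Illustrative model only. CHUNK_TOKENS = 128 is the spike's estimate,
// not a documented DeepSeek constant.
const CHUNK_TOKENS = 128;

// True when a reported hit count is a whole number of chunks.
function isWholeChunks(hitTokens: number, chunk: number = CHUNK_TOKENS): boolean {
  return hitTokens % chunk === 0;
}

// Upper bound on cached tokens when the first `sharedPrefixTokens` tokens
// match an already-cached prompt: only complete chunks can hit.
function maxCachedTokens(sharedPrefixTokens: number, chunk: number = CHUNK_TOKENS): number {
  return Math.floor(sharedPrefixTokens / chunk) * chunk;
}

// Hit rate as a percentage, rounded to one decimal place.
function hitRatePercent(hitTokens: number, promptTokens: number): number {
  return Math.round((hitTokens / promptTokens) * 1000) / 10;
}

// Rows from the table above: [prompt, hit] token counts per turn.
const turns: Array<[number, number]> = [
  [758, 640], [753, 640], [810, 768], [807, 768], [761, 640],
];

// Every reported hit count is a whole number of 128-token chunks.
const allWholeChunks = turns.every(([, hit]) => isWholeChunks(hit));

// Recomputed hit rates, matching the hit% column.
const rates = turns.map(([prompt, hit]) => hitRatePercent(hit, prompt));
// rates → [84.4, 85, 94.8, 95.2, 84.1]
```

The spike does not report shared-prefix lengths, so `maxCachedTokens` is only useful for what-if estimates. For example, a hypothetical 700-token shared prefix would cap the hit at 640 tokens under this model.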

What this changes

The RFC's strict-vs-permissive framing was based on a wrong premise. Updated thinking:

  • Default: graduated permissive. Silent on appends, warn on mid-stream edits, refuse on whole-list reorders or removals.
  • --strict flag for users who care about every byte (high-volume scripted runs).
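
One way to picture the graduated policy is as a lookup from drift kind to action. The sketch below is hypothetical: the `DriftKind` names follow the taxonomy used in the follow-up commit, while `Action`, `DEFAULT_POLICY`, and `decideAction` are illustrative names. It also assumes that `--strict` means "refuse any drift", which the RFC may define differently.

```typescript
// Drift kinds as named in the follow-up classifier commit.
type DriftKind = "identity" | "append" | "edit" | "reorder" | "remove";

// Hypothetical action names for the policy layer.
type Action = "silent" | "warn" | "refuse";

// Graduated permissive default: silent on appends, warn on mid-stream
// edits, refuse on reorders and removals.
const DEFAULT_POLICY: Record<DriftKind, Action> = {
  identity: "silent",
  append: "silent",
  edit: "warn",
  reorder: "refuse",
  remove: "refuse",
};

// Assumption: strict mode refuses every kind of drift except identity.
function decideAction(kind: DriftKind, strict: boolean = false): Action {
  if (strict && kind !== "identity") {
    return "refuse";
  }
  return DEFAULT_POLICY[kind];
}
```

Keeping the default in a table rather than in branching code makes it cheap to revise if a later run changes the cost picture for one of the drift kinds.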

Touch

  • benchmarks/spike-mcp-reconnect/runner.ts — 5-turn live runner, ~¥0.01 per execution
  • benchmarks/spike-mcp-reconnect/results.md — captured run + empirical findings + design implications

No source files touched. The RFC #110 thread will get an updated comment summarizing the finding.

Test plan

  • Local run completed against live DeepSeek
  • Lint + typecheck clean
  • Re-run on a different day to confirm stability of the chunk model under different cache-state conditions (optional — current data is enough for the design call)

Two-file spike under benchmarks/spike-mcp-reconnect/:

- runner.ts — 5 chat calls against live deepseek-chat with
  controlled tool-list drifts (identity, append, mid-stream edit)
- results.md — captured run + empirical findings

The headline result overturns the RFC body's "any drift = full
cache miss" claim. DeepSeek's prefix cache works at chunk
granularity (≈128 tokens), so the cost depends on WHERE the drift
falls:

- append a tool at the end → trivial cost (94.8% hit, even better
  than the no-drift 85% baseline because the new chunk gets cached)
- edit a tool's description in the middle → loses chunks past the
  edit (84.1% hit observed)
- replacing or reordering the tool list → effectively full miss

This nudges the C2b design call away from blanket "strict default"
toward graduated permissive: silent on appends, warn on mid-stream
edits, refuse on reorders / removals. `--strict` remains as an
explicit flag.

Refs #110.
esengine merged commit d811f2d into main May 2, 2026
1 check passed
esengine deleted the chore/mcp-reconnect-cache-spike branch May 2, 2026 08:53
esengine added a commit that referenced this pull request May 2, 2026
Pure function `classifyToolListDrift(before, after) → DriftReport`
encoding the cache-cost taxonomy validated by the live spike (#113):

- `identity`  — same names, same order, same content → free
- `append`    — every before-tool unchanged, new tools at the end → trivially cheap (94.8% hit observed)
- `edit`      — same names + positions, content of ≥1 tool changed → bounded loss past the divergence point
- `reorder`   — same set, different order, OR additions not at the end → catastrophic
- `remove`    — any before-tool missing from after → catastrophic; dominates even when other tools were added

Report carries `added` / `removed` / `edited` arrays so the policy
layer (next PR) can populate the warn line precisely.

Pure function, no I/O. 12 unit tests cover the matrix including
edge cases (empty before/after, remove-with-also-added).
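
The taxonomy above could be implemented roughly as follows. This is a sketch under stated assumptions, not the committed code: the `ToolSpec` shape is hypothetical, tool names are assumed unique, content equality is approximated with `JSON.stringify` (which is sensitive to key order), and an edit combined with an append is reported as `edit`, a case the taxonomy does not spell out.

```typescript
type DriftKind = "identity" | "append" | "edit" | "reorder" | "remove";

// Hypothetical tool shape; the real registry type may differ.
interface ToolSpec {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

interface DriftReport {
  kind: DriftKind;
  added: string[];
  removed: string[];
  edited: string[];
}

function classifyToolListDrift(before: ToolSpec[], after: ToolSpec[]): DriftReport {
  const beforeNames = new Set(before.map((t) => t.name));
  const afterByName = new Map(after.map((t) => [t.name, t] as const));

  const added = after.filter((t) => !beforeNames.has(t.name)).map((t) => t.name);
  const removed = before.filter((t) => !afterByName.has(t.name)).map((t) => t.name);

  // Assumption: two tools have the same content when their JSON forms match.
  const edited = before
    .filter((t) => {
      const next = afterByName.get(t.name);
      return next !== undefined && JSON.stringify(next) !== JSON.stringify(t);
    })
    .map((t) => t.name);

  let kind: DriftKind;
  if (removed.length > 0) {
    kind = "remove"; // dominates even when other tools were added
  } else if (before.some((t, i) => after[i]?.name !== t.name)) {
    kind = "reorder"; // order changed, or an addition landed before the end
  } else if (edited.length > 0) {
    kind = "edit"; // same names and positions, content changed
  } else if (added.length > 0) {
    kind = "append"; // before-tools unchanged, new tools at the end
  } else {
    kind = "identity";
  }
  return { kind, added, removed, edited };
}
```

For example, going from tools `[a, b]` to `[a, c]` yields `kind: "remove"` with `removed: ["b"]` and `added: ["c"]`, which matches the remove-with-also-added edge case mentioned above.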

Refs #110.
esengine added a commit that referenced this pull request May 2, 2026
C2b implementation per RFC #110. Identity-drift only — append / edit /
reorder / remove drift cases surface a clear "restart Reasonix to apply"
message instead of mutating the registry or prefix mid-session. The
graduated permissive policy from the empirical spike (#113) needs API
work on `ImmutablePrefix` (replaceTool / removeTool) before the other
drift kinds can take effect mid-session; that's a follow-up PR.

Touch:

- `src/mcp/registry.ts`: new `McpClientHost = { client: McpClient }`
  indirection on BridgeOptions. Tool closures resolve the live client
  via `host.client` at call time, so reconnect can swap the underlying
  socket without re-bridging tools.
- `src/mcp/reconnect.ts` (new): `reconnectMcpServer({ host, spec,
  beforeTools })` re-handshakes a fresh transport, classifies drift,
  swaps `host.client` only on identity, closes the new client cleanly
  on refusal so the old one stays untouched.
- `src/cli/ui/slash/handlers/mcp.ts`: third subcommand `reconnect <name>`,
  fires async, reports via `ctx.postInfo` with the lifecycle
  `↻ reconnect…` / `✓ connected` / `✖ failed` formatter.
- `src/cli/ui/mcp-lifecycle.ts`: `reconnect` state added to the union.
- `src/cli/ui/slash/types.ts`: `McpServerSummary.client?` replaced by
  `host: McpClientHost`. `McpClient` import dropped (now via host).
- `src/cli/ui/mcp-browse.ts`: `/resource` and `/prompt` read through
  `server.host.client`. Disconnected-server warnings dropped — host
  always carries a client now.
- `src/cli/commands/chat.tsx`: builds the host at bridge time, stores
  it on the summary.
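
The host indirection described above can be sketched as below. Only the `McpClientHost = { client: McpClient }` shape comes from this commit message. The `McpClient` methods, `bridgeTool`, and `swapOnIdentity` are hypothetical stand-ins used to show why closures that read `host.client` at call time survive a reconnect.

```typescript
// Hypothetical minimal client surface; the real McpClient has more to it.
interface McpClient {
  callTool(name: string, args: unknown): Promise<unknown>;
  close(): Promise<void>;
}

// Shape taken from the commit message: a mutable holder for the live client.
type McpClientHost = { client: McpClient };

type DriftKind = "identity" | "append" | "edit" | "reorder" | "remove";

// The closure captures the host, not the client, so it always dispatches
// to whichever client is current at call time.
function bridgeTool(host: McpClientHost, name: string): (args: unknown) => Promise<unknown> {
  return (args) => host.client.callTool(name, args);
}

// Swap in the fresh client only when the tool list is unchanged. On any
// other drift kind, close the fresh client and leave the old one in place.
// What happens to the old client after a swap is left out of this sketch.
async function swapOnIdentity(
  host: McpClientHost,
  fresh: McpClient,
  kind: DriftKind,
): Promise<boolean> {
  if (kind === "identity") {
    host.client = fresh;
    return true;
  }
  await fresh.close();
  return false;
}
```

Because the bridged tool never holds a direct reference to a client, a successful `swapOnIdentity` takes effect on the next call without re-registering anything.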

Tests:

- `tests/mcp-reconnect.test.ts`: 2 cases for spec_parse early returns.
- `tests/mcp-integration.test.ts`: live test that bridge + host
  indirection routes a swapped client correctly through registry.dispatch.
- `tests/slash.test.ts`: 3 cases for the slash dispatch (lifecycle line
  emission, unknown-name rejection with hint, no-arg usage).
- `tests/mcp-browse.test.ts`: server() helper updated to accept `client`
  and wrap in host shape.

Closes part of #110 (identity case only). Append/edit/reorder/remove
mid-session handling deferred — needs ImmutablePrefix surgery.