RFC: /mcp reconnect <name> — live teardown without breaking the cache prefix #110

@esengine

Spun out of #105 (Stage C2b). The slow-toast half landed in #109; this captures what's left for follow-up design before any code is written.

Why

/mcp reconnect <name> should let a user recover from a transiently broken MCP server (handshake stuck, server hot-restarted) without restarting the whole Reasonix session. The design doc (docs/design/agent-tui-terminal.html §37) lists ↻ reconnect 2/5 as one of the six lifecycle states.

The constraint that makes this non-trivial

Reasonix's whole architecture rests on a byte-stable prompt prefix:

system + tool_specs + few-shots → identical bytes across turns → DeepSeek prefix-cache hits at 90%+

Reconnecting a server tears down its McpClient, re-handshakes, and re-bridges. If the reconnected server's tools/list comes back with anything different (a tool added, a tool dropped, a description changed, even a reordering), the prompt prefix shifts and the next turn cache-misses entirely. On a long session that 90%+ cache-hit rate becomes an unrecoverable loss.

So the design question isn't "how do we tear down and re-bridge"; that part is a few dozen lines. The question is "what do we do when the reconnected tool surface differs?"
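
A minimal sketch of how drift in the tool surface could be detected, assuming the prompt builder serializes tool specs with plain JSON.stringify in list order. The ToolSpec shape and the surfaceFingerprint helper are hypothetical names for illustration, not existing Reasonix code. The point is that a reorder alone changes the bytes, so it changes the fingerprint.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one entry from a server's tools/list response.
interface ToolSpec {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// Assumption: hashing the same serialization the prompt builder uses, so the
// fingerprint changes exactly when the prefix bytes would change.
function surfaceFingerprint(tools: ToolSpec[]): string {
  return createHash("sha256").update(JSON.stringify(tools)).digest("hex");
}

const before: ToolSpec[] = [
  { name: "read_file", description: "Read a file", inputSchema: { type: "object" } },
  { name: "list_dir", description: "List a directory", inputSchema: { type: "object" } },
];

// Same tools, different order: still a different prefix.
const reordered: ToolSpec[] = [...before].reverse();

const sameSurface = surfaceFingerprint(before) === surfaceFingerprint([...before]);
const reorderDetected = surfaceFingerprint(before) !== surfaceFingerprint(reordered);
```

If the real prompt builder normalizes or sorts tool specs before emitting them, the fingerprint would need to hash that normalized form instead.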

Approaches to weigh

1. Strict (refuse-on-drift): before re-bridging, snapshot the new tools/list and compare against the prior. If anything changed, refuse with a card explaining "tool surface changed, restart to apply". Preserves cache. Cost: reconnect is useless when the server has actually been updated upstream.

2. Permissive + announce: always accept the new surface; if it differs from the prior, emit a prominent warn card ("cache reset — next turn will be a full miss; subsequent turns reseed") and continue. Cost: silent expensive turns when users don't read warnings.

3. Identity-only: allow reconnect only when the new tools/list is byte-identical to the prior. Same effect as (1) for the user, but the trigger isn't user-visible. Probably worse than (1).

4. Two-mode flag: /mcp reconnect <name> is strict by default; /mcp reconnect <name> --force is permissive + warns. Best ergonomics, most code.
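
To make approach 4 concrete, here is a rough sketch of the decision step, under the assumption that the prior and new tools/list results are both in hand before re-bridging. ToolSpec, ReconnectDecision, describeDrift, and decideReconnect are hypothetical names, and the card wording is only illustrative.

```typescript
// Hypothetical shape of one entry from a server's tools/list response.
interface ToolSpec {
  name: string;
  description?: string;
  inputSchema: unknown;
}

type ReconnectDecision =
  | { kind: "accept"; warn: false }
  | { kind: "accept"; warn: true; reason: string }
  | { kind: "refuse"; reason: string };

// Returns a human-readable summary of what changed, or null if nothing did.
function describeDrift(prior: ToolSpec[], next: ToolSpec[]): string | null {
  const priorNames = new Set(prior.map((t) => t.name));
  const nextNames = new Set(next.map((t) => t.name));
  const added = next.filter((t) => !priorNames.has(t.name)).map((t) => t.name);
  const removed = prior.filter((t) => !nextNames.has(t.name)).map((t) => t.name);
  if (added.length > 0 || removed.length > 0) {
    return `added: [${added.join(", ")}] removed: [${removed.join(", ")}]`;
  }
  // Same names: compare serialized bytes to catch reorders and edited
  // descriptions or schemas, since any of these shifts the prompt prefix.
  if (JSON.stringify(prior) !== JSON.stringify(next)) {
    return "same tool names, but order, descriptions, or schemas changed";
  }
  return null;
}

// Strict by default; `force` models the proposed --force flag.
function decideReconnect(prior: ToolSpec[], next: ToolSpec[], force: boolean): ReconnectDecision {
  const drift = describeDrift(prior, next);
  if (drift === null) return { kind: "accept", warn: false };
  if (force) {
    return { kind: "accept", warn: true, reason: `cache reset (${drift}); next turn will be a full miss` };
  }
  return { kind: "refuse", reason: `tool surface changed (${drift}); restart to apply, or rerun with --force` };
}

const priorTools: ToolSpec[] = [{ name: "read_file", description: "Read a file", inputSchema: {} }];
const updatedTools: ToolSpec[] = [{ name: "read_file", description: "Read a file from disk", inputSchema: {} }];
```

With these inputs, decideReconnect(priorTools, updatedTools, false) refuses, while passing true accepts and carries the warning text for the card.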

Other open questions

  • Mid-turn vs between-turn: is reconnect allowed while a turn is in flight? Easiest answer: no, queue until next prompt.
  • In-flight callers: if the server being reconnected is mid-tool-call, the in-flight callTool promise needs to reject cleanly. Existing AbortSignal threading should cover this.
  • Lifecycle event ordering: the design's ↻ reconnect 2/5 backoff 4s implies retry semantics. Is the "2/5" surfaced from the underlying transport's reconnect attempts, or from a Reasonix-level retry wrapper around the manual /mcp reconnect? Probably the latter, capped at 5 with exponential backoff, but worth pinning.
  • r keybind in /mcp browser: Stage B left this as a stub. Once this RFC lands, the r key triggers the same code path.
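
If the "2/5" does come from a Reasonix-level wrapper, it might look roughly like the sketch below. Everything here is an assumption for discussion: the reconnectWithBackoff name, the 1s base delay, the doubling schedule (chosen so that a failed second attempt prints "backoff 4s"), and the reading of "n/5" as "attempt n just failed". The signal is only checked between attempts, and sleep is injectable so tests need not wait.

```typescript
interface RetryOptions {
  maxAttempts?: number; // assumed cap of 5, matching the "2/5" in the design doc
  baseDelayMs?: number; // assumed; the RFC does not specify a base delay
  signal?: AbortSignal; // checked between attempts only
  onState?: (line: string) => void; // receives lifecycle lines
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function reconnectWithBackoff(
  attempt: () => Promise<void>,
  opts: RetryOptions = {},
): Promise<boolean> {
  const {
    maxAttempts = 5,
    baseDelayMs = 1000,
    signal,
    onState = () => {},
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = opts;

  for (let n = 1; n <= maxAttempts; n++) {
    if (signal?.aborted) throw new Error("reconnect aborted");
    try {
      await attempt();
      onState("connected");
      return true;
    } catch {
      if (n === maxAttempts) break; // no backoff after the final attempt
      const delayMs = baseDelayMs * 2 ** n; // 2s, 4s, 8s, 16s with the defaults
      onState(`reconnect ${n}/${maxAttempts} backoff ${delayMs / 1000}s`);
      await sleep(delayMs);
    }
  }
  onState("failed");
  return false;
}

// Demo: the first two attempts fail, the third succeeds; sleep is stubbed out.
const lifecycleLines: string[] = [];
let attemptCount = 0;
const demoResult = reconnectWithBackoff(
  async () => {
    attemptCount += 1;
    if (attemptCount < 3) throw new Error("handshake stuck");
  },
  { onState: (line) => lifecycleLines.push(line), sleep: async () => {} },
);
```

Whether the counter should advance before or after an attempt, and whether the manual command and the r keybind share the same cap, are exactly the details the bullet above asks to pin down.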

Out of scope

  • Auto-reconnect on transient failures during dispatch. That's a separate failure-mode question (currently a failed lifecycle line is emitted and the session continues without the server). It can get its own tracking issue if anyone wants it.
  • Hot-reloading the --mcp flag list to add new servers mid-session (different feature; would also need cache-prefix work).

Suggested resolution shape

I'd commit to approach #4 (default strict, --force for permissive) once the open questions above have at least loose answers. Spike: prototype the strict path first against the bundled demo MCP server in tests, confirm cache prefix stability, then layer --force on top.

Closes part of #105 (specifically: removes the r keybind stub from McpBrowser once shipped).

Labels: enhancement, rfc
