Spun out of #105 (Stage C2b). The slow-toast half landed in #109; this captures what's left for follow-up design before any code is written.
Why
/mcp reconnect <name> should let a user recover from a transiently broken MCP server (handshake stuck, server hot-restarted) without restarting the whole Reasonix session. The design doc (docs/design/agent-tui-terminal.html §37) lists ↻ reconnect 2/5 as one of the six lifecycle states.
The constraint that makes this non-trivial
Reasonix's whole architecture rests on a byte-stable prompt prefix:
system + tool_specs + few-shots → identical bytes across turns → DeepSeek prefix-cache hits at 90%+
Reconnecting a server tears down its McpClient, re-handshakes, and re-bridges. If the reconnected server's tools/list comes back with anything different (a tool added, one dropped, a description changed, even a reorder), the prompt prefix shifts and the next turn cache-misses entirely. On a long session, the cached prefix built up so far is thrown away and the full prompt is billed again as a miss.
So the design question isn't "how do we tear down and re-bridge"; that part is a few dozen lines. The question is "what do we do when the reconnected tool surface differs?"
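To make "differs" concrete, here is a minimal sketch of a drift check. The `ToolSpec` shape and the function names `serializeToolSurface` and `diffToolSurface` are hypothetical, not taken from the Reasonix codebase; the sketch also assumes the tool specs reach the prompt prefix as JSON in list order, which is why a pure reorder counts as drift.

```typescript
// Hypothetical shape of one entry in a server's tools/list response.
interface ToolSpec {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Serialize the surface in list order. Order is preserved on purpose:
// a reorder changes the bytes and therefore the prompt prefix.
// Note: JSON.stringify follows key insertion order inside inputSchema,
// so a server that emits the same schema with shuffled keys also drifts.
function serializeToolSurface(tools: readonly ToolSpec[]): string {
  return JSON.stringify(tools.map((t) => [t.name, t.description, t.inputSchema]));
}

type Drift =
  | { kind: "identical" }
  | { kind: "drifted"; added: string[]; removed: string[]; changedOrReordered: boolean };

// Compare the prior surface with the one returned after the re-handshake.
function diffToolSurface(prior: readonly ToolSpec[], next: readonly ToolSpec[]): Drift {
  if (serializeToolSurface(prior) === serializeToolSurface(next)) {
    return { kind: "identical" };
  }
  const priorNames = new Set(prior.map((t) => t.name));
  const nextNames = new Set(next.map((t) => t.name));
  const added = next.filter((t) => !priorNames.has(t.name)).map((t) => t.name);
  const removed = prior.filter((t) => !nextNames.has(t.name)).map((t) => t.name);
  // Same set of names but different bytes: a description or schema changed,
  // or the list came back in a different order.
  const changedOrReordered = added.length === 0 && removed.length === 0;
  return { kind: "drifted", added, removed, changedOrReordered };
}
```

The `added` and `removed` lists are only there so a refusal or warning card can say what changed; the cache-relevant signal is the byte comparison alone.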
Approaches to weigh
1. Strict (refuse-on-drift): before re-bridging, snapshot the new tools/list and compare against the prior. If anything changed, refuse with a card explaining "tool surface changed, restart to apply". Preserves cache. Cost: reconnect is useless when the server has actually been updated upstream.
2. Permissive + announce: always accept the new surface; if it differs from the prior, emit a prominent warn card ("cache reset — next turn will be a full miss; subsequent turns reseed") and continue. Cost: silent expensive turns when users don't read warnings.
3. Identity-only: allow reconnect only when the new tools/list is byte-identical to the prior. Same effect as (1) for the user, but the trigger isn't user-visible. Probably worse than (1).
4. Two-mode flag: /mcp reconnect <name> is strict by default; /mcp reconnect <name> --force is permissive + warns. Best ergonomics, most code.
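Approach 4 reduces to a small decision function, sketched below. The type names, `decideReconnect`, and the idea of passing each surface in as a pre-serialized string are assumptions for illustration; the card messages are adapted from the wording in approaches 1 and 2.

```typescript
// "strict" is the default; "force" corresponds to the --force flag.
type ReconnectMode = "strict" | "force";

type ReconnectDecision =
  | { action: "rebridge"; warn: false }
  | { action: "rebridge"; warn: true; message: string }
  | { action: "refuse"; message: string };

// priorSurface / nextSurface: serialized tools/list before and after the
// re-handshake (hypothetical inputs; any byte-stable serialization works).
function decideReconnect(
  priorSurface: string,
  nextSurface: string,
  mode: ReconnectMode,
): ReconnectDecision {
  // No drift: the prompt prefix is untouched, so re-bridging is free.
  if (priorSurface === nextSurface) {
    return { action: "rebridge", warn: false };
  }
  // Drift with --force: accept the new surface and say what it costs.
  if (mode === "force") {
    return {
      action: "rebridge",
      warn: true,
      message: "cache reset: next turn will be a full miss; subsequent turns reseed",
    };
  }
  // Drift in strict mode: keep the old bridge and preserve the cache.
  return { action: "refuse", message: "tool surface changed, restart to apply" };
}
```

Keeping the decision pure (no I/O, no teardown) would let the strict path be tested without a live server, which fits the spike order of strict first and --force layered on afterwards.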
Other open questions
- Mid-turn vs between-turn: is reconnect allowed while a turn is in flight? Easiest answer: no, queue until next prompt.
- Other servers' callers: if the server being reconnected was mid-tool-call, the in-flight callTool promise needs to reject cleanly. Existing AbortSignal threading should cover this.
- Lifecycle event ordering: the design's ↻ reconnect 2/5 backoff 4s implies retry semantics. Is the "2/5" surfaced from the underlying transport's reconnect attempts, or from a Reasonix-level retry wrapper around the manual /mcp reconnect? Probably the latter, capped at 5 with exponential backoff, but worth pinning.
- r keybind in /mcp browser: Stage B left this as a stub. Once this RFC lands, the r key triggers the same code path.
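If the "2/5" does come from a Reasonix-level retry wrapper, it could look roughly like the sketch below. Everything here is an assumption to be pinned down: the 2 s base delay is picked only so that attempt 2 reads "backoff 4s" as in the design's example, the line is emitted after a failed attempt rather than before the next one, and `reconnectWithRetry`, `connect`, and `onLifecycle` are hypothetical names.

```typescript
const MAX_ATTEMPTS = 5; // the "/5" in the design's lifecycle line
const BASE_DELAY_MS = 2000; // assumption: makes attempt 2 report "backoff 4s"

// Exponential backoff: 2s, 4s, 8s, 16s, ... for attempts 1, 2, 3, 4, ...
function backoffMs(attempt: number): number {
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}

// Format the lifecycle line shown while waiting after a failed attempt.
function lifecycleLine(attempt: number): string {
  return `↻ reconnect ${attempt}/${MAX_ATTEMPTS} backoff ${backoffMs(attempt) / 1000}s`;
}

// Try connect() up to MAX_ATTEMPTS times. Resolves true on the first
// success and false once every attempt has failed. sleep is injectable
// so tests do not have to wait on real timers.
async function reconnectWithRetry(
  connect: () => Promise<void>,
  onLifecycle: (line: string) => void,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      await connect();
      return true;
    } catch {
      // No backoff after the final attempt; fall through to failure.
      if (attempt === MAX_ATTEMPTS) break;
      onLifecycle(lifecycleLine(attempt));
      await sleep(backoffMs(attempt));
    }
  }
  return false;
}
```

Under these assumptions the worst case waits 2 + 4 + 8 + 16 = 30 s across four backoffs before giving up, which may be one input for deciding whether a reconnect is allowed to hold up the next prompt.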
Out of scope
- Auto-reconnect on transient failures during dispatch. That's a separate failure-mode question (currently a failed lifecycle line is emitted and the session continues without it). Tracking issue if anyone wants it.
- Hot-reloading the --mcp flag list to add new servers mid-session (different feature; would also need cache-prefix work).
Suggested resolution shape
I'd commit to approach 4 (default strict, --force for permissive) once the open questions above have at least loose answers. Spike: prototype the strict path first against the bundled demo MCP server in tests, confirm cache prefix stability, then layer --force.
Closes part of #105 (specifically: removes the r keybind stub from McpBrowser once shipped).