MCP client does not auto-reconnect after disconnect #11489

@lihah111222333-cloud

Summary

MCP server connections in Codex currently do not auto-recover after a disconnect, while model SSE streams already have retry/backoff logic.

This causes MCP tools/resources to remain unavailable until users manually reload MCP servers or restart the CLI.

Observed behavior

  1. AsyncManagedClient caches its startup result in a Shared<Future>, so initialization is effectively one-shot.
  2. RmcpClient goes Connecting -> Ready but has no explicit Disconnected/Reconnecting lifecycle states.
  3. There is no heartbeat/health monitor to proactively detect a dropped MCP transport.
  4. The model API SSE flow reconnects (retry + backoff), but the MCP path has no equivalent resilience.
  5. Practical recovery is manual (RefreshMcpServers / config/mcpServer/reload) or a full process restart.
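
To make point 2 concrete, here is a minimal sketch of what an explicit lifecycle could look like. It is not the existing RmcpClient code: `ConnState`, `ConnEvent`, and `next_state` are hypothetical names, and only the Connecting and Ready states are taken from the observations above.

```rust
/// Hypothetical connection lifecycle. Only Connecting and Ready are
/// described in this issue as existing today; the rest are proposed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnState {
    Connecting,
    Ready,
    Disconnected,
    Reconnecting { attempt: u32 },
}

/// Hypothetical events that drive the lifecycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnEvent {
    InitSucceeded,
    InitFailed,
    TransportLost,
    RetryScheduled,
}

/// Pure transition function: given the current state and an event,
/// return the next state. Unlisted combinations leave the state unchanged.
fn next_state(state: ConnState, event: ConnEvent) -> ConnState {
    match (state, event) {
        // A successful initialize makes the connection usable.
        (ConnState::Connecting, ConnEvent::InitSucceeded)
        | (ConnState::Reconnecting { .. }, ConnEvent::InitSucceeded) => ConnState::Ready,
        // A failed first initialize leaves us disconnected, not stuck.
        (ConnState::Connecting, ConnEvent::InitFailed) => ConnState::Disconnected,
        // A dropped transport is recorded instead of being silently ignored.
        (ConnState::Ready, ConnEvent::TransportLost) => ConnState::Disconnected,
        // A scheduled retry starts the first reconnect attempt.
        (ConnState::Disconnected, ConnEvent::RetryScheduled) => {
            ConnState::Reconnecting { attempt: 1 }
        }
        // Each failed reconnect bumps the attempt counter (usable for backoff).
        (ConnState::Reconnecting { attempt }, ConnEvent::InitFailed) => {
            ConnState::Reconnecting { attempt: attempt.saturating_add(1) }
        }
        (unchanged, _) => unchanged,
    }
}
```

With this shape, `next_state(ConnState::Ready, ConnEvent::TransportLost)` yields `ConnState::Disconnected`, which gives a reconnect loop or health monitor something explicit to react to.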

Why this matters

  • Long-running Codex sessions with stdio or streamable-http MCP servers are fragile.
  • Any transient server restart/network hiccup can permanently break MCP calls in the current session.
  • Users experience random "tool call failed" errors without automatic self-healing.

Minimal repro

  • Configure any MCP server (stdio is enough).
  • Start a session and successfully call an MCP tool once.
  • Restart or kill the MCP server process.
  • Call the MCP tool again.
  • Expected: Codex reconnects and retries automatically.
  • Actual: MCP call fails until manual reload/restart.

Suggested direction

  • Add reconnect support to MCP connection management:
    • Detect connection-class errors on MCP operations.
    • Recreate the client transport and re-run initialize/list-tools.
    • Retry the failed request once after a successful reconnect.
    • Re-send relevant state (e.g. sandbox-state capability update) after reconnect.
  • Optionally add a periodic health check/heartbeat for proactive reconnection.
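
The detect / reconnect / retry-once flow above can be sketched as follows. This is an illustration under assumptions, not the Codex or rmcp API: `McpError`, `McpConnection`, `call_with_reconnect`, and `FlakyConnection` are all hypothetical names, and the sketch is synchronous for brevity even though the real client is async.

```rust
/// Hypothetical error type that separates transport failures from
/// ordinary tool errors, so that only the former trigger a reconnect.
#[derive(Debug, Clone, PartialEq, Eq)]
enum McpError {
    /// The transport is gone (closed pipe, dropped HTTP stream, ...).
    Connection(String),
    /// The server answered but the call failed; retrying would not help.
    Tool(String),
}

/// Hypothetical stand-in for an MCP client connection.
trait McpConnection {
    fn call_tool(&mut self, name: &str) -> Result<String, McpError>;
    /// Recreate the transport, re-run initialize/list-tools, and re-send
    /// any session state the server needs.
    fn reconnect(&mut self) -> Result<(), McpError>;
}

/// Call a tool; on a connection-class error, reconnect and retry exactly once.
/// Tool errors and a failed reconnect are returned to the caller unchanged.
fn call_with_reconnect<C: McpConnection>(conn: &mut C, name: &str) -> Result<String, McpError> {
    match conn.call_tool(name) {
        Err(McpError::Connection(_)) => {
            conn.reconnect()?;
            conn.call_tool(name)
        }
        other => other,
    }
}

/// In-memory fake used only to exercise the wrapper: every call fails
/// with a connection error until `reconnect` has been called.
struct FlakyConnection {
    connected: bool,
    reconnects: u32,
}

impl McpConnection for FlakyConnection {
    fn call_tool(&mut self, name: &str) -> Result<String, McpError> {
        if self.connected {
            Ok(format!("{}: ok", name))
        } else {
            Err(McpError::Connection("transport closed".to_string()))
        }
    }

    fn reconnect(&mut self) -> Result<(), McpError> {
        self.connected = true;
        self.reconnects += 1;
        Ok(())
    }
}
```

Capping the retry at one attempt keeps a persistently broken server from stalling a tool call. Longer outages would be better handled by a background reconnect loop with backoff, similar to what the SSE path already does.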

If helpful, I can open a follow-up PR with a concrete implementation and tests.

Labels: enhancement, mcp
