Open
Labels
enhancement (New feature or request), mcp (Issues related to the use of model context protocol (MCP) servers)
Description
Summary
MCP server connections in Codex currently do not auto-recover after a disconnect, while model SSE streams already have retry/backoff logic.
This causes MCP tools/resources to remain unavailable until users manually reload MCP servers or restart the CLI.
Observed behavior
- `AsyncManagedClient` uses a cached `Shared<Future>` startup result, so initialization is effectively one-shot.
- `RmcpClient` has `Connecting -> Ready` but no explicit `Disconnected`/`Reconnecting` lifecycle.
- There is no heartbeat/health monitor to proactively detect a dropped MCP transport.
- The model API SSE flow has reconnect (retry + backoff), but the MCP path has no equivalent resilience.
- Practical recovery is manual (`RefreshMcpServers` / `config/mcpServer/reload`) or a full process restart.
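To make the "one-shot" point concrete, here is a minimal synchronous sketch of why a cached startup result cannot recover. It is not Codex code: `Client` and `OneShotManager` are hypothetical names, and `std::sync::OnceLock` stands in for the cached `Shared<Future>` mentioned above.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for an MCP client handle; not a Codex type.
#[derive(Debug)]
struct Client {
    /// Which connection attempt produced this handle.
    generation: u32,
}

/// Hypothetical manager that caches its startup result forever,
/// loosely mirroring a cached `Shared<Future>` in synchronous form.
struct OneShotManager {
    cached: OnceLock<Client>,
}

impl OneShotManager {
    fn new() -> Self {
        Self { cached: OnceLock::new() }
    }

    /// Runs `connect` only on the first call. Every later call returns the
    /// cached client, even if its transport has since gone away.
    fn client(&self, connect: impl FnOnce() -> Client) -> &Client {
        self.cached.get_or_init(connect)
    }
}
```

In this sketch a second `client(...)` call never runs its `connect` closure, so there is no path back to a fresh connection short of rebuilding the manager, which is roughly what a manual reload or restart does.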
Why this matters
- Long-running Codex sessions with stdio or streamable-http MCP servers are fragile.
- Any transient server restart/network hiccup can permanently break MCP calls in the current session.
- Users experience random "tool call failed" errors without automatic self-healing.
Minimal repro
- Configure any MCP server (stdio is enough).
- Start a session and successfully call an MCP tool once.
- Restart/kill the MCP server process.
- Call the same MCP tool again.
- Expected: Codex reconnects and retries automatically.
- Actual: MCP call fails until manual reload/restart.
Suggested direction
- Add reconnect support to MCP connection management:
  - Detect connection-class errors on MCP operations.
  - Recreate client transport and re-run initialize/list-tools.
  - Retry failed request once after successful reconnect.
  - Re-send relevant state (e.g. sandbox-state capability update) after reconnect.
- Optionally add periodic health check/heartbeat for proactive reconnection.
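The reconnect-and-retry-once flow above could look roughly like the following. This is a synchronous, std-only sketch under stated assumptions, not a proposal for the actual implementation: `CallError`, `State`, `Transport`, `ReconnectingClient`, and `FlakyServer` are all hypothetical names, and the real code would be async and built on the existing client types. Heartbeats and backoff are left out.

```rust
/// Hypothetical error type that separates transport failures from ordinary
/// tool errors. Not the error type Codex or rmcp actually uses.
#[derive(Debug, PartialEq)]
enum CallError {
    /// The transport is gone (closed pipe, dropped HTTP stream, ...).
    Connection(String),
    /// The server answered, but the tool itself failed.
    Tool(String),
}

/// Hypothetical lifecycle with explicit disconnected/reconnecting states.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Ready,
    Disconnected,
    Reconnecting,
}

/// Minimal abstraction over "something that can talk to an MCP server".
trait Transport {
    /// Recreate the transport and re-run initialize/list-tools.
    fn connect(&mut self) -> Result<(), CallError>;
    /// Perform one tool call.
    fn call(&mut self, tool: &str) -> Result<String, CallError>;
}

struct ReconnectingClient<T: Transport> {
    transport: T,
    state: State,
}

impl<T: Transport> ReconnectingClient<T> {
    fn new(transport: T) -> Self {
        Self { transport, state: State::Ready }
    }

    /// Calls a tool; on a connection-class error, reconnects and retries
    /// exactly once. Tool-level errors are returned unchanged.
    fn call_tool(&mut self, tool: &str) -> Result<String, CallError> {
        match self.transport.call(tool) {
            Err(CallError::Connection(_)) => {
                self.state = State::Reconnecting;
                if let Err(e) = self.transport.connect() {
                    self.state = State::Disconnected;
                    return Err(e);
                }
                // A real client would re-send session state here
                // (e.g. the sandbox-state capability update).
                self.state = State::Ready;
                let retried = self.transport.call(tool);
                if matches!(retried, Err(CallError::Connection(_))) {
                    self.state = State::Disconnected;
                }
                retried
            }
            other => other,
        }
    }
}

/// Test double: every call fails with a connection error until `connect`
/// has been invoked, imitating a server that was killed and restarted.
struct FlakyServer {
    alive: bool,
    connects: u32,
}

impl Transport for FlakyServer {
    fn connect(&mut self) -> Result<(), CallError> {
        self.alive = true;
        self.connects += 1;
        Ok(())
    }

    fn call(&mut self, tool: &str) -> Result<String, CallError> {
        if self.alive {
            Ok(format!("{tool}: ok"))
        } else {
            Err(CallError::Connection("broken pipe".into()))
        }
    }
}
```

Wrapping `FlakyServer { alive: false, connects: 0 }` in `ReconnectingClient::new` and calling `call_tool("echo")` exercises the path from the minimal repro: the first attempt fails, the client reconnects, and the single retry succeeds. Limiting the retry to one attempt keeps a non-idempotent tool call from being replayed repeatedly against a server that keeps dropping.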
If helpful, I can open a follow-up PR with a concrete implementation and tests.