fix: make session MCP startup lazy #22901
Conversation
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 599f53bc67
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```rust
pending_mcp_server_refresh: Mutex::new(None),
mcp_connection_manager_initialized: std::sync::atomic::AtomicBool::new(false),
```
Start MCP initialization for idle thread starts
With the manager now left uninitialized here and no replacement startup path after SessionConfigured, a client that only calls thread/start or thread/resume and then listens for MCP server status never triggers ensure_mcp_connection_manager_initialized(). As a result, mcpServer/startupStatus/updated notifications are no longer emitted for configured servers until a later turn or tool listing happens. The existing thread_start_emits_mcp_server_status_updated_notifications workflow still waits for those notifications immediately after ThreadStartResponse, so it will time out for optional or broken MCP configs.
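One way to restore the notifications for idle thread starts is to trigger initialization from every session entry point and make the trigger idempotent. A minimal std-only sketch of that pattern, where the `Session` fields and method name are hypothetical stand-ins for the real codex-core types:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical stand-in for the session's MCP state.
struct Session {
    mcp_init_started: AtomicBool,
    init_runs: AtomicUsize,
}

impl Session {
    fn new() -> Self {
        Session {
            mcp_init_started: AtomicBool::new(false),
            init_runs: AtomicUsize::new(0),
        }
    }

    /// Idempotent trigger: only the first caller wins the
    /// compare_exchange and performs the startup work.
    fn ensure_mcp_connection_manager_initialized(&self) {
        if self
            .mcp_init_started
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            // In the real code this would spawn the startup task that
            // builds clients and emits mcpServer/startupStatus/updated.
            self.init_runs.fetch_add(1, Ordering::SeqCst);
        }
    }
}

fn main() {
    let sess = Session::new();
    // thread/start, thread/resume, and the first tool-building
    // consumer can all invoke this; init still runs exactly once.
    sess.ensure_mcp_connection_manager_initialized();
    sess.ensure_mcp_connection_manager_initialized();
    assert_eq!(sess.init_runs.load(Ordering::SeqCst), 1);
}
```

With this shape, calling the ensure method from thread/start and thread/resume keeps the early notifications flowing without reintroducing the startup wait.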
Force-pushed from 599f53b to 95d02e8.
@codex review

@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 68dc2fda2f
```rust
client_elicitation_capability: mcp_config.client_elicitation_capability,
elicitation_reviewer: None,
```
Pass the Guardian reviewer during lazy MCP init
When the session MCP manager is first created through the new lazy/startup path, elicitation_reviewer is hard-coded to None, whereas the old startup path passed Some(sess.mcp_elicitation_reviewer()). In sessions using Guardian approval routing, MCP servers that emit Guardian MCP-tool elicitations during normal tool calls will now bypass review_guardian_mcp_elicitation for the lifetime of the initial manager and fall back to raw ElicitationRequest handling until a refresh happens.
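One way to keep the eager and lazy construction paths from drifting like this is to make the reviewer a required argument rather than an `Option`, so neither path can silently omit it. A hedged sketch of that shape, with all names hypothetical:

```rust
/// Hypothetical elicitation outcome.
#[derive(Debug, PartialEq)]
enum Elicitation {
    Reviewed(String),
}

/// Making the reviewer a plain required field (not Option) means every
/// construction path, eager or lazy, must supply one explicitly.
struct ManagerArgs {
    elicitation_reviewer: fn(&str) -> Elicitation,
}

fn guardian_reviewer(request: &str) -> Elicitation {
    // Stand-in for review_guardian_mcp_elicitation.
    Elicitation::Reviewed(request.to_owned())
}

fn handle_elicitation(args: &ManagerArgs, request: &str) -> Elicitation {
    (args.elicitation_reviewer)(request)
}

fn main() {
    // Both the old eager path and the new lazy path build the same
    // args, so neither can fall back to raw ElicitationRequest handling.
    let args = ManagerArgs { elicitation_reviewer: guardian_reviewer };
    assert_eq!(
        handle_elicitation(&args, "tool-call"),
        Elicitation::Reviewed("tool-call".to_owned())
    );
}
```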
```rust
let (refreshed_manager, cancel_token) = McpConnectionManager::new(
    &mcp_servers,
    // ...
self.replace_mcp_connection_manager(ReplaceMcpConnectionManagerArgs {
```
Serialize refresh with lazy MCP initialization
This refresh path can run while the background startup task is still inside ensure_mcp_connection_manager_initialized(), but it does not take the initialization lock or otherwise invalidate the in-flight initial build. If an app-server client sends RefreshMcpServers before the startup prewarm completes and the next turn applies it, the refresh and the original lazy init can race; whichever replace_mcp_connection_manager finishes last wins, so the original config can overwrite the just-refreshed server set.
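A common way to serialize the two paths is a generation counter guarded by the same lock as the manager slot: a refresh bumps the generation up front, so an in-flight lazy build discovers it is stale when it tries to install. A std-only sketch of the pattern (types hypothetical, not the actual replace_mcp_connection_manager implementation):

```rust
use std::sync::Mutex;

/// Hypothetical manager slot: (generation, active server config).
struct ManagerSlot {
    inner: Mutex<(u64, &'static str)>,
}

impl ManagerSlot {
    fn new() -> Self {
        ManagerSlot { inner: Mutex::new((0, "uninitialized")) }
    }

    /// Lazy init snapshots the generation before its slow build.
    fn begin(&self) -> u64 {
        self.inner.lock().unwrap().0
    }

    /// A refresh bumps the generation first, invalidating any build
    /// (including the in-flight lazy init) that started earlier.
    fn invalidate_and_begin(&self) -> u64 {
        let mut guard = self.inner.lock().unwrap();
        guard.0 += 1;
        guard.0
    }

    /// Install `config` only if no newer build superseded this one.
    fn finish(&self, started_at: u64, config: &'static str) -> bool {
        let mut guard = self.inner.lock().unwrap();
        if guard.0 != started_at {
            return false; // stale build: drop it instead of installing
        }
        guard.1 = config;
        true
    }

    fn active(&self) -> &'static str {
        self.inner.lock().unwrap().1
    }
}

fn main() {
    let slot = ManagerSlot::new();
    let lazy_gen = slot.begin();                   // lazy init starts
    let refresh_gen = slot.invalidate_and_begin(); // refresh arrives mid-build
    assert!(slot.finish(refresh_gen, "refreshed-servers"));
    // Even though the lazy build finishes last, it can no longer
    // overwrite the just-refreshed server set.
    assert!(!slot.finish(lazy_gen, "original-servers"));
    assert_eq!(slot.active(), "refreshed-servers");
}
```

With this ordering, "whichever finishes last wins" becomes "whichever started last wins", which is the behavior clients expect from RefreshMcpServers.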
```rust
.unwrap_or_else(|err| {
    panic!("session MCP environment selections should remain valid: {err}")
})
```
Return an error for invalid MCP environments
With a stored/sticky MCP environment id that is no longer known, this lazy initialization path now panics inside the spawned startup task instead of returning the InvalidRequest error that the old Session::new path produced. That leaves app-server thread/start/thread/resume apparently successful, and the first later MCP consumer can panic again while trying to initialize the manager.
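The panic can be avoided by resolving the sticky environment id through a `Result` and surfacing the failure on thread/start, as the old Session::new path did. A minimal sketch, with the error type and function names hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the app-server's InvalidRequest error.
#[derive(Debug, PartialEq)]
struct InvalidRequest(String);

/// Resolve a sticky environment id without panicking, so the spawned
/// startup task can report the failure instead of crashing.
fn resolve_environment<'a>(
    known: &'a HashMap<String, String>,
    sticky_id: &str,
) -> Result<&'a String, InvalidRequest> {
    known.get(sticky_id).ok_or_else(|| {
        InvalidRequest(format!("unknown MCP environment id: {sticky_id}"))
    })
}

fn main() {
    let known = HashMap::from([("env-1".to_owned(), "prod".to_owned())]);
    assert_eq!(resolve_environment(&known, "env-1").unwrap().as_str(), "prod");
    // A stale sticky id becomes a client-visible error rather than a
    // panic inside a detached startup task.
    assert!(resolve_environment(&known, "env-gone").is_err());
}
```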
```rust
        anyhow::bail!("required MCP servers failed to initialize: {details}");
    }
}
sess.start_mcp_connection_manager_initialization();
```
Cancel lazy MCP initialization on shutdown
Starting MCP initialization as a detached task here means an immediate Op::Shutdown can drain the still-uninitialized manager and emit ShutdownComplete while this task continues holding the session alive. When it later reaches replace_mcp_connection_manager, it can create and swap in real MCP clients after shutdown, leaving stdio MCP server processes running and possibly emitting startup events for a closed thread.
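One hedged way to keep shutdown and the detached task coherent is a shutdown flag that the startup task checks before swapping in real clients. A std-only sketch of that pattern (names hypothetical):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical session state: shutdown sets a flag the detached
/// startup task checks before swapping in real MCP clients.
struct Session {
    shut_down: AtomicBool,
    manager_installed: AtomicBool,
}

fn startup_task(sess: Arc<Session>) {
    // ...slow MCP client construction would happen here...
    if sess.shut_down.load(Ordering::SeqCst) {
        // Shutdown won the race: drop the half-built clients instead
        // of installing them into a closed session.
        return;
    }
    sess.manager_installed.store(true, Ordering::SeqCst);
}

fn main() {
    let sess = Arc::new(Session {
        shut_down: AtomicBool::new(false),
        manager_installed: AtomicBool::new(false),
    });
    // Op::Shutdown arrives before the detached task reaches the swap.
    sess.shut_down.store(true, Ordering::SeqCst);
    let task = thread::spawn({
        let sess = Arc::clone(&sess);
        move || startup_task(sess)
    });
    task.join().unwrap();
    // No MCP clients are installed after ShutdownComplete.
    assert!(!sess.manager_installed.load(Ordering::SeqCst));
}
```

In the real code the check and the swap would need to happen under the same lock that the shutdown path takes; otherwise a small window remains between the load and the install.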
```rust
self.replace_mcp_connection_manager(ReplaceMcpConnectionManagerArgs {
    submit_id: INITIAL_SUBMIT_ID.to_owned(),
    approval_policy: &session_configuration.approval_policy,
    permission_profile: session_configuration.permission_profile(),
    runtime_environment: self.session_mcp_runtime_environment(&session_configuration),
    config: &config,
    mcp_servers,
    store_mode: config.mcp_oauth_credentials_store_mode,
    auth: auth.as_ref(),
    host_owned_codex_apps_enabled: host_owned_codex_apps_enabled(
        &mcp_config,
        auth.as_ref(),
    ),
    client_elicitation_capability: mcp_config.client_elicitation_capability,
    elicitation_reviewer: None,
})
```
Preserve required MCP server failures
After moving startup behind lazy initialization, the old required_startup_failures(...) check is no longer called anywhere, so a configured required = true server that fails to start only produces startup events and the turn proceeds with the missing tools. For setups that mark MCP servers required because the agent depends on them, the first MCP/tool-building consumer should still fail or otherwise stop the turn instead of silently continuing without the required server.
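The old gate can be reproduced at the first MCP consumer: collect failures from servers marked required and bail before the turn proceeds. A sketch of that check, with the struct and field names hypothetical:

```rust
/// Hypothetical startup outcome for one configured MCP server.
struct ServerStartup {
    name: &'static str,
    required: bool,
    error: Option<&'static str>,
}

/// Recreates the old required_startup_failures(..) gate: collect the
/// failures that must stop the turn instead of silently degrading.
fn required_startup_failures(results: &[ServerStartup]) -> Vec<String> {
    results
        .iter()
        .filter_map(|s| match (s.required, s.error) {
            (true, Some(err)) => Some(format!("{}: {err}", s.name)),
            _ => None,
        })
        .collect()
}

fn main() {
    let results = [
        ServerStartup { name: "search", required: true, error: Some("spawn failed") },
        ServerStartup { name: "scratch", required: false, error: Some("timeout") },
    ];
    let failures = required_startup_failures(&results);
    // Only the required server's failure aborts the first MCP
    // consumer; the optional one merely degrades.
    assert_eq!(failures, vec!["search: spawn failed".to_owned()]);
}
```

A non-empty result would then feed the existing `anyhow::bail!("required MCP servers failed to initialize: ...")` path rather than letting the turn continue without the required tools.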
```rust
{
    let mut guard = self.services.mcp_startup_cancellation_token.lock().await;
    guard.cancel();
    *guard = CancellationToken::new();
}
```
Honor interrupts that arrive before token replacement
If the user interrupts an idle session while the lazy MCP init task is still computing provenance/auth status, cancel_mcp_startup() cancels the placeholder token, but this block then discards that cancelled token and installs a fresh uncancelled one before spawning MCP clients. In that timing window, the interrupt no longer cancels startup, so MCP processes can still be launched after the user asked to stop idle startup work.
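A fix that honors the early interrupt is to inspect the currently installed token under the lock before replacing it: if it is already cancelled, startup must abort rather than install a fresh token. A std-only sketch using a plain boolean in place of CancellationToken:

```rust
use std::sync::Mutex;

/// Stand-in for CancellationToken: just a cancelled bit.
struct Token {
    cancelled: bool,
}

struct McpStartup {
    token: Mutex<Token>,
}

impl McpStartup {
    /// Interrupt path: cancels whatever token is currently installed.
    fn cancel(&self) {
        self.token.lock().unwrap().cancelled = true;
    }

    /// Init path: installs a fresh token only if the current one has
    /// not already been cancelled; a pending interrupt must abort the
    /// startup instead of being discarded with the old token.
    fn arm_for_startup(&self) -> bool {
        let mut guard = self.token.lock().unwrap();
        if guard.cancelled {
            return false; // the interrupt arrived first
        }
        *guard = Token { cancelled: false };
        true
    }
}

fn main() {
    let startup = McpStartup { token: Mutex::new(Token { cancelled: false }) };
    // The user interrupts while the init task is still computing
    // provenance/auth status, before the token swap.
    startup.cancel();
    // The swap no longer masks the interrupt, so no MCP processes
    // are launched.
    assert!(!startup.arm_for_startup());
}
```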
Summary
App-server thread startup used to build the session's MCP pool up front. `thread/start` and `thread/resume` waited for that work, including required MCP server startup, before returning.

This change keeps one MCP pool per session, but removes that startup wait. `thread/start` and `thread/resume` now return after session configuration, while MCP initialization begins in the background. The first real MCP consumer still awaits the shared session pool if startup has not finished yet.

This keeps the user-visible startup path fast without changing how MCP consumers observe readiness, and it preserves the early `mcpServer/startupStatus/updated` notifications that clients already rely on.

This also keeps the forked-thread boundary intact. We still do not share one process-wide MCP pool because MCP runtime state depends on session config, `cwd`, environment selection, and approval state.
Design decisions
- Keep MCP startup off the `thread/start` and `thread/resume` critical path while still emitting startup status updates promptly.
Testing
- `codex-core` tests covering deferred refresh and idempotent lazy initialization.
- `codex-app-server` tests covering `thread/start`, `thread/resume`, and MCP status notifications.