feat: rehydrate MCP sessions from disk on codex-reply #12594
zsxkib wants to merge 1 commit into openai:main
When the Codex MCP server restarts, in-memory sessions are lost and codex-reply with an old threadId returns "Session not found". This change adds a fallback in the codex-reply error path that automatically locates the persisted JSONL rollout transcript on disk and resumes the session. The rehydration leverages three existing functions that were already in the codebase but not wired together: find_thread_path_by_id_str to locate the rollout file, read_session_meta_line to recover the original cwd, and resume_thread_from_rollout to replay the conversation history into a new thread. The response includes the new threadId so clients seamlessly pick it up for subsequent calls. If no rollout file exists on disk, the original "Session not found" error is returned unchanged.
I have read the CLA Document and I hereby sign the CLA
Sakib Ahamed seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
We've updated our contribution guidelines to indicate that we're no longer accepting unsolicited code contributions. All code contributions are by invitation only. To read more about why we've taken this step, please refer to this announcement.
motivation
i use codex as an MCP server inside claude code a lot. the two work really well as a team — claude handles the high-level orchestration and codex does the heavy lifting with shell access, file edits, etc.
the problem is when i've been working with just codex directly and it already has all the context for what i'm doing. i want to tell claude "go chat with codex 019c7616-b3e9-7782-9ea8-b53e6bb09329" and have it pick up right where i left off. but if the MCP server has restarted since that session (which happens all the time), codex-reply just returns "Session not found", even though the full JSONL transcript is still sitting on disk at ~/.codex/sessions/.
this is frustrating because the data is RIGHT THERE. all the building blocks to fix it already exist in the codebase, they just aren't wired together in the error path.
what this does
adds a fallback in the codex-reply error path: if a thread isn't found in memory, it tries to load it from the persisted rollout file and resume the session with the full conversation history.
how it works
basically chains three functions that already existed but weren't connected:
- find_thread_path_by_id_str: locates the JSONL rollout file for the threadId
- read_session_meta_line: recovers the original cwd from session metadata
- resume_thread_from_rollout: replays the conversation history into a new thread
if no rollout file exists on disk, the original "Session not found" error is returned unchanged, so this is purely additive, with no behavior change for the normal case.
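a rough sketch of the fallback chain, to make the control flow concrete. this is not the code from the PR: the Disk, Server and ReplyError types are hypothetical, the three helpers are simplified synchronous stand-ins that only borrow the names of the real functions (their real signatures differ), and treating missing session metadata as "not found" is an assumption of the sketch.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the persisted sessions directory.
#[derive(Default)]
struct Disk {
    /// thread id -> path of its JSONL rollout file
    rollouts: HashMap<String, PathBuf>,
    /// rollout path -> cwd recorded in the session metadata
    meta_cwd: HashMap<PathBuf, PathBuf>,
}

#[derive(Debug, PartialEq)]
enum ReplyError {
    SessionNotFound,
}

// Simplified stand-in: look up the rollout file for a thread id.
fn find_thread_path_by_id_str(disk: &Disk, thread_id: &str) -> Option<PathBuf> {
    disk.rollouts.get(thread_id).cloned()
}

// Simplified stand-in: recover the original cwd from session metadata.
fn read_session_meta_line(disk: &Disk, rollout: &Path) -> Option<PathBuf> {
    disk.meta_cwd.get(rollout).cloned()
}

/// Hypothetical server state: live threads (id -> cwd) plus the disk store.
#[derive(Default)]
struct Server {
    live: HashMap<String, PathBuf>,
    disk: Disk,
    resumed: u32,
}

impl Server {
    // Simplified stand-in: register a new in-memory thread for the rollout.
    // A real implementation would replay the conversation history here.
    fn resume_thread_from_rollout(&mut self, _rollout: &Path, cwd: PathBuf) -> String {
        self.resumed += 1;
        let new_id = format!("resumed-{}", self.resumed);
        self.live.insert(new_id.clone(), cwd);
        new_id
    }

    // Chain the three steps; any missing piece yields None.
    fn try_rehydrate_from_disk(&mut self, thread_id: &str) -> Option<String> {
        let rollout = find_thread_path_by_id_str(&self.disk, thread_id)?;
        let cwd = read_session_meta_line(&self.disk, &rollout)?;
        Some(self.resume_thread_from_rollout(&rollout, cwd))
    }

    // Returns the thread id the client should use for subsequent calls.
    fn codex_reply(&mut self, thread_id: &str) -> Result<String, ReplyError> {
        if self.live.contains_key(thread_id) {
            return Ok(thread_id.to_string()); // normal case, unchanged
        }
        // Fallback: only reached when the thread is not in memory.
        self.try_rehydrate_from_disk(thread_id)
            .ok_or(ReplyError::SessionNotFound)
    }
}
```

in this sketch, calling codex_reply with an id that exists only on disk returns a fresh thread id, and calling it with an id that has no rollout file still returns SessionNotFound, which mirrors the "purely additive" behavior described above.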
the response already includes threadId in the output, so the client automatically picks up the new ID for subsequent calls.
files changed
- codex-rs/core/src/thread_manager.rs: pub fn auth_manager() getter (+4 lines)
- codex-rs/mcp-server/src/message_processor.rs: codex_home field, try_rehydrate_from_disk() method, modified error path (+75/-10 lines)
test plan
- cargo build -p codex-mcp-server: compiles
- cargo test -p codex-mcp-server: all 11 existing tests pass
- codex-reply with old threadId → session resumes from disk