Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let effective_timeout_ms = params.timeout_ms.unwrap_or(DEFAULT_TIMEOUT_MS); | ||
| let stopwatch = Stopwatch::new(Some(effective_timeout_ms)); | ||
| let cancel_rx = stopwatch.cancellation_receiver(); | ||
| let process_timeout_ms = if cancel_rx.is_some() { | ||
| Some(u64::MAX) |
There was a problem hiding this comment.
Avoid overflowing tokio timeout in stopwatch path
Shell requests now set process_timeout_ms to Some(u64::MAX) whenever a stopwatch is created (mcp.rs lines 94-98). That value flows into core/src/exec.rs where it becomes the Duration passed to tokio::time::timeout, and Tokio panics if the timeout exceeds its MAX_DURATION (≈584 years). u64::MAX ms (~584 billion years) will therefore panic the exec path before the cancellation receiver can fire, crashing the shell tool as soon as this path is exercised; use a bounded duration or skip the timeout instead of an overflowing value.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Yep, fixed on the latest version. Now:
// We want a large timeout that is effectively infinite, because the
// real timeout is enforced by the stopwatch and cancellation receiver.
// While we could use u64::MAX here, using a more reasonable value
// avoids potential issues with libraries that might not handle extreme
// values well.
let process_timeout_ms: u64 = 365 * 24 * 60 * 60 * 1000;4e9941d to
b309236
Compare
| // While we could use u64::MAX here, using a more reasonable value | ||
| // avoids potential issues with libraries that might not handle extreme | ||
| // values well. | ||
| let process_timeout_ms: u64 = 365 * 24 * 60 * 60 * 1000; |
There was a problem hiding this comment.
I would not call this value "reasonable" regarding the scope of codex but fair enough :D
| // We want a large timeout that is effectively infinite, because the | ||
| // real timeout is enforced by the stopwatch and cancellation receiver. | ||
| // While we could use u64::MAX here, using a more reasonable value | ||
| // avoids potential issues with libraries that might not handle extreme | ||
| // values well. | ||
| let process_timeout_ms: u64 = 365 * 24 * 60 * 60 * 1000; |
There was a problem hiding this comment.
why not use None for the timeout?
There was a problem hiding this comment.
Because then it will default to 10s:
codex/codex-rs/core/src/exec.rs
Line 33 in a8a6cbd
| timeout_ms: Option<u64>, | ||
| cancel_rx: Option<oneshot::Receiver<()>>, |
There was a problem hiding this comment.
It seems wrong to have both a timeout and a cancel token.
There was a problem hiding this comment.
I guess I could try to make them exclusive in a follow-up?
There was a problem hiding this comment.
Why not only have a cancel token? Leave it to the caller to set a timeout.
| if remaining.is_zero() { | ||
| break; | ||
| } |
There was a problem hiding this comment.
duplicative with elapsed >= limit above?
| } | ||
|
|
||
| /// Runs `fut`, pausing the stopwatch while the future is pending. The clock resumes | ||
| /// automatically when the future completes. Nested calls are reference-counted so the |
There was a problem hiding this comment.
| /// automatically when the future completes. Nested calls are reference-counted so the | |
| /// automatically when the future completes. Nested/overlapping calls are reference-counted so the |
aec9655 to
208863f
Compare
0401532 to
50ab543
Compare
24e97ab to
5f44599
Compare
…6972) This updates `ExecParams` so that instead of taking `timeout_ms: Option<u64>`, it now takes a more general cancellation mechanism, `ExecExpiration`, which is an enum that includes a `Cancellation(tokio_util::sync::CancellationToken)` variant. If the cancellation token is fired, then `process_exec_tool_call()` returns in the same way as if a timeout was exceeded. This is necessary so that in #6973, we can manage the timeout logic external to the `process_exec_tool_call()` because we want to "suspend" the timeout when an elicitation from a human user is pending. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972). * #7005 * #6973 * __->__ #6972
Previously, we were running into an issue where we would run the
shelltool call with a timeout of 10s, but it fired an elicitation asking for user approval, the time the user took to respond to the elicitation was counted agains the 10s timeout, so theshelltool call would fail with a timeout error unless the user is very fast!This PR addresses this issue by introducing a "stopwatch" abstraction that is used to manage the timeout. The idea is:
Stopwatch::new()is called with the real timeout of theshelltool call.process_exec_tool_call()is called with theCancellationvariant ofExecExpirationbecause it should not manage its own timeout in this caseStopwatchexpiration is wired up to thecancel_rxpassed toprocess_exec_tool_call()shelltool call is received, theStopwatchpausesI verified that I can test the MCP server using
@modelcontextprotocol/inspectorand specifygit statusas thecommandwith a timeout of 500ms and that the elicitation pops up and I have all the time in the world to respond whereas previous to this PR, that would not have been possible.Stack created with Sapling. Best reviewed with ReviewStack.