[codex] Emit sandbox outcome telemetry event#25955
Conversation
96cbc37 to
0cf4c3e
Compare
|
could we achieve sandbox denial by adding a field (e.g. trigger/reason) to |
@rhan-oai We could but we'd lose some data on the completion of commands that would help us with our analysis of sandbox outcomes / timing. I'd rather we capture it in both if possible. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0cf4c3ea64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| event_ctx, | ||
| out.map(|result| result.output), | ||
| /*applied_patch_delta*/ None, | ||
| ) | ||
| .await?; |
There was a problem hiding this comment.
Preserve sandbox outcome before returning shell errors
When the classic shell_command attempt is denied or times out and no retry succeeds, emitter.finish(...) converts the sandbox ToolError into a FunctionCallError and this ? returns before the ExecLikeOutput { sandbox_outcome, ... } is constructed. That means CoreToolRuntime::handle_any sees an error and log_tool_result_with_tags records sandbox_outcome=None, so the new telemetry misses the no-retry denied/timed_out cases it is intended to make queryable.
Useful? React with 👍 / 👎.
| retry_result.map(|output| OrchestratorRunResult { | ||
| output, | ||
| deferred_network_approval: retry_deferred_network_approval, | ||
| sandbox_outcome: Some("escalated"), |
There was a problem hiding this comment.
Preserve denied outcome after sandbox retry
When the first sandboxed attempt is denied and the user approves the retry, this records the successful retry as sandbox_outcome="escalated" instead of preserving the denied sandbox outcome. As a result, queries or alerts over sandbox_outcome=denied will miss exactly the approved-retry path that this telemetry change says it is meant to retain.
Useful? React with 👍 / 👎.
c380a87 to
eece338
Compare
| .find_map(|(field_key, value)| (*field_key == key).then_some(*value)) | ||
| } | ||
|
|
||
| fn duration_ms_i64(duration: Duration) -> i64 { |
There was a problem hiding this comment.
Done in 13fd9ac414 — removed the helper and inlined the duration conversion at the event callsite.
|
pre-approving |
Add a dedicated codex.sandbox_outcome telemetry event for sandbox edge outcomes. Emit denied, timed_out, signal, and escalated outcomes from the tool orchestrator instead of plumbing sandbox metadata through tool_result output paths. Include initial and escalated attempt durations on the event. Add OTEL coverage for the new event shape.
eece338 to
13fd9ac
Compare
Summary
Adds a dedicated
codex.sandbox_outcometelemetry event so we can query sandbox edge outcomes without threading sandbox metadata through tool-result output types.This is meant to make sandbox failures and approved escalation retries visible in OTEL while keeping the existing
codex.tool_resultevent shape focused on tool completion data.What changed
SessionTelemetry::sandbox_outcome(...), which emitscodex.sandbox_outcomeas both a log and trace event.deniedwhen the sandbox blocks execution and no retry is run.timed_outandsignalwhen those sandbox errors surface from tool execution.escalatedwhen the initial sandboxed attempt fails and the approved unsandboxed retry succeeds.Validation
RUST_MIN_STACK=8388608 just test -p codex-core sandbox_outcome_event_records_outcome handle_sandbox_error_user_approves_retry_records_tool_decisionjust test -p codex-otel otel_export_routing_policy_routes_tool_result_log_and_trace_events runtime_metrics_summary_collects_tool_api_and_streaming_metricsjust fix -p codex-corejust fix -p codex-otel