[codex] Emit sandbox outcome telemetry event #25955

Merged
rreichel3-oai merged 1 commit into main from rreichel3/sandbox-outcome-tool-result
Jun 5, 2026

@rreichel3-oai rreichel3-oai commented Jun 2, 2026

Contributor

Summary

Adds a dedicated codex.sandbox_outcome telemetry event so we can query sandbox edge outcomes without threading sandbox metadata through tool-result output types.

This is meant to make sandbox failures and approved escalation retries visible in OTEL while keeping the existing codex.tool_result event shape focused on tool completion data.

What changed

  • Adds SessionTelemetry::sandbox_outcome(...), which emits codex.sandbox_outcome as both a log and trace event.
  • Records the tool name, call id, sandbox outcome, initial attempt duration, and escalated attempt duration when a retry runs.
  • Emits denied when the sandbox blocks execution and no retry is run.
  • Emits timed_out and signal when those sandbox errors surface from tool execution.
  • Emits escalated when the initial sandboxed attempt fails and the approved unsandboxed retry succeeds.
  • Adds OTEL coverage for the new event payload, including timing fields.
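As a rough illustration of the payload described in the list above, here is a minimal sketch. It is not the code from this PR: the `SandboxOutcome` enum, the `SandboxOutcomeEvent` struct, the `fields` method, and the attribute keys are all hypothetical names chosen for the example. Only the four outcome labels and the list of recorded values come from the description.

```rust
use std::time::Duration;

/// The four outcome labels named in the PR description.
/// The enum itself is a hypothetical wrapper for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxOutcome {
    Denied,
    TimedOut,
    Signal,
    Escalated,
}

impl SandboxOutcome {
    /// Label as it would appear on the event.
    pub fn as_str(self) -> &'static str {
        match self {
            SandboxOutcome::Denied => "denied",
            SandboxOutcome::TimedOut => "timed_out",
            SandboxOutcome::Signal => "signal",
            SandboxOutcome::Escalated => "escalated",
        }
    }
}

/// Hypothetical payload holding the values the description says are recorded.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SandboxOutcomeEvent {
    pub tool_name: String,
    pub call_id: String,
    pub outcome: SandboxOutcome,
    pub initial_attempt_duration: Duration,
    /// Present only when an escalated retry actually ran.
    pub escalated_attempt_duration: Option<Duration>,
}

impl SandboxOutcomeEvent {
    /// Flatten the event into key/value pairs.
    /// The key names are assumptions, not the real attribute names.
    pub fn fields(&self) -> Vec<(&'static str, String)> {
        let mut fields = vec![
            ("tool_name", self.tool_name.clone()),
            ("call_id", self.call_id.clone()),
            ("sandbox_outcome", self.outcome.as_str().to_string()),
            (
                "initial_attempt_duration_ms",
                self.initial_attempt_duration.as_millis().to_string(),
            ),
        ];
        if let Some(retry) = self.escalated_attempt_duration {
            fields.push(("escalated_attempt_duration_ms", retry.as_millis().to_string()));
        }
        fields
    }
}
```

Keeping the retry duration optional mirrors the wording "when a retry runs": events without a retry simply omit that field.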

Validation

  • RUST_MIN_STACK=8388608 just test -p codex-core sandbox_outcome_event_records_outcome handle_sandbox_error_user_approves_retry_records_tool_decision
  • just test -p codex-otel otel_export_routing_policy_routes_tool_result_log_and_trace_events runtime_metrics_summary_collects_tool_api_and_streaming_metrics
  • just fix -p codex-core
  • just fix -p codex-otel

@rreichel3-oai rreichel3-oai force-pushed the rreichel3/sandbox-outcome-tool-result branch 4 times, most recently from 96cbc37 to 0cf4c3e June 3, 2026 16:16
rhan-oai commented Jun 3, 2026

Collaborator

Could we capture sandbox denials by adding a field (e.g. trigger/reason) to codex.tool_decision, to reduce the amount of plumbing we're doing here?

rreichel3-oai commented Jun 3, 2026

Contributor Author

could we achieve sandbox denial by adding a field (e.g. trigger/reason) to codex.tool_decision to reduce the amount of plumbing we're doing here

@rhan-oai We could, but we'd lose some data on command completion that would help our analysis of sandbox outcomes and timing. I'd rather capture it in both if possible.

@rreichel3-oai rreichel3-oai marked this pull request as ready for review June 3, 2026 21:12
@rreichel3-oai rreichel3-oai requested a review from a team as a code owner June 3, 2026 21:12

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Contributor

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cf4c3ea64

Comment on lines 244 to 248
event_ctx,
out.map(|result| result.output),
/*applied_patch_delta*/ None,
)
.await?;

[P2] Preserve sandbox outcome before returning shell errors

When the classic shell_command attempt is denied or times out and no retry succeeds, emitter.finish(...) converts the sandbox ToolError into a FunctionCallError and this ? returns before the ExecLikeOutput { sandbox_outcome, ... } is constructed. That means CoreToolRuntime::handle_any sees an error and log_tool_result_with_tags records sandbox_outcome=None, so the new telemetry misses the no-retry denied/timed_out cases it is intended to make queryable.
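The early-return problem described above can be shown in isolation. The sketch below is a simplified stand-in, not the codex code: `ToolError`, `outcome_for`, `finish_recording`, and `run` are hypothetical names, and the recorder is just a `Vec`. It shows one generic way to capture an outcome before `?` propagates the error. Per the final commit message, the merged change instead emits a dedicated event from the tool orchestrator.

```rust
/// Hypothetical stand-in for the sandbox errors mentioned in the review.
#[derive(Debug, PartialEq)]
pub enum ToolError {
    Denied,
    TimedOut,
}

/// Map an error to the outcome label it should be recorded under.
pub fn outcome_for(err: &ToolError) -> &'static str {
    match err {
        ToolError::Denied => "denied",
        ToolError::TimedOut => "timed_out",
    }
}

/// Record the outcome while the error is still in hand, then pass the
/// result through unchanged so the caller can still use `?` on it.
pub fn finish_recording(
    result: Result<String, ToolError>,
    recorded: &mut Vec<&'static str>,
) -> Result<String, ToolError> {
    if let Err(err) = &result {
        recorded.push(outcome_for(err));
    }
    result
}

/// Caller that returns early on error. Because recording happened inside
/// `finish_recording`, the `?` below no longer drops the outcome.
pub fn run(
    result: Result<String, ToolError>,
    recorded: &mut Vec<&'static str>,
) -> Result<usize, ToolError> {
    let output = finish_recording(result, recorded)?;
    Ok(output.len())
}
```

The point of the sketch is ordering: anything that must be observed on the failure path has to happen before the `?`, not in code that only runs after a successful result.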

Comment thread codex-rs/core/src/tools/orchestrator.rs Outdated
retry_result.map(|output| OrchestratorRunResult {
output,
deferred_network_approval: retry_deferred_network_approval,
sandbox_outcome: Some("escalated"),

[P2] Preserve denied outcome after sandbox retry

When the first sandboxed attempt is denied and the user approves the retry, this records the successful retry as sandbox_outcome="escalated" instead of preserving the denied sandbox outcome. As a result, queries or alerts over sandbox_outcome=denied will miss exactly the approved-retry path that this telemetry change says it is meant to retain.
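To make the single-label behavior under discussion concrete, here is a small sketch of how one label might be chosen from the initial failure and the retry result. `InitialFailure` and `classify` are hypothetical names. The PR description does not say what is emitted when an approved retry runs and fails, so the fallback to the initial label in that case is purely an assumption of this sketch.

```rust
/// Hypothetical description of why the first sandboxed attempt failed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitialFailure {
    Denied,
    TimedOut,
    Signal,
}

/// Pick a single outcome label.
/// `retry_succeeded` is `None` when no retry ran, `Some(true)` when the
/// approved unsandboxed retry succeeded, and `Some(false)` when it failed.
pub fn classify(initial: InitialFailure, retry_succeeded: Option<bool>) -> &'static str {
    match (initial, retry_succeeded) {
        // A successful approved retry replaces the initial label.
        (_, Some(true)) => "escalated",
        // ASSUMPTION: with no retry, or a failed one, keep the initial label.
        (InitialFailure::Denied, _) => "denied",
        (InitialFailure::TimedOut, _) => "timed_out",
        (InitialFailure::Signal, _) => "signal",
    }
}
```

With one label per event, a denied attempt followed by a successful retry is reported only as `escalated`, which is why a query filtered on `denied` would not see it.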

@rreichel3-oai rreichel3-oai force-pushed the rreichel3/sandbox-outcome-tool-result branch 2 times, most recently from c380a87 to eece338 June 4, 2026 01:20
.find_map(|(field_key, value)| (*field_key == key).then_some(*value))
}

fn duration_ms_i64(duration: Duration) -> i64 {

Collaborator

can we inline this

Contributor Author

Done in 13fd9ac414 — removed the helper and inlined the duration conversion at the event callsite.
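The body of the removed helper is not shown in this thread, so the following is only a guess at what such a conversion could look like. `to_millis_i64` is a hypothetical name, and saturating at `i64::MAX` is an assumption about the desired overflow behavior. At the callsite, the same expression can be written inline.

```rust
use std::time::Duration;

/// Convert a `Duration` to whole milliseconds as `i64`.
/// `as_millis` returns `u128`, so the conversion is fallible.
/// ASSUMPTION: saturate at `i64::MAX` instead of panicking on overflow.
pub fn to_millis_i64(duration: Duration) -> i64 {
    i64::try_from(duration.as_millis()).unwrap_or(i64::MAX)
}
```

Inlined, this is just `i64::try_from(elapsed.as_millis()).unwrap_or(i64::MAX)` for some `elapsed: Duration`.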

rhan-oai commented Jun 4, 2026

Collaborator

pre-approving

@rreichel3-oai rreichel3-oai changed the title from "[codex] Emit sandbox outcome in tool result telemetry" to "[codex] Emit sandbox outcome telemetry event" Jun 5, 2026
Add a dedicated codex.sandbox_outcome telemetry event for sandbox edge outcomes.

Emit denied, timed_out, signal, and escalated outcomes from the tool orchestrator instead of plumbing sandbox metadata through tool_result output paths.

Include initial and escalated attempt durations on the event.

Add OTEL coverage for the new event shape.
@rreichel3-oai rreichel3-oai force-pushed the rreichel3/sandbox-outcome-tool-result branch from eece338 to 13fd9ac June 5, 2026 00:38
@rreichel3-oai rreichel3-oai merged commit ecae412 into main Jun 5, 2026
31 checks passed
@rreichel3-oai rreichel3-oai deleted the rreichel3/sandbox-outcome-tool-result branch June 5, 2026 00:58
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 5, 2026