add sandbox policy and sandbox name to codex.tool.call metrics #10711
iceweasel-oai merged 5 commits into main
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a39a204494
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```rust
fn sandbox_tag(policy: &SandboxPolicy, windows_sandbox_level: WindowsSandboxLevel) -> &'static str {
    if matches!(policy, SandboxPolicy::DangerFullAccess) {
        return "none";
    }
    if matches!(policy, SandboxPolicy::ExternalSandbox { .. }) {
        return "external";
    }
    if cfg!(target_os = "windows") {
        if matches!(windows_sandbox_level, WindowsSandboxLevel::Elevated) {
            return "windows_elevated";
        }
    }

    get_platform_sandbox(windows_sandbox_level != WindowsSandboxLevel::Disabled)
        .map(SandboxType::as_metric_tag)
        .unwrap_or("none")
}
```
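To make the tag resolution of `sandbox_tag` easy to exercise on any host, here is a self-contained sketch that mirrors its control flow. Everything in it is a hypothetical stand-in rather than the codex-rs definitions: the enum variants, the `Platform` parameter (replacing `cfg!(target_os = "windows")` so the Windows branch can be tested off Windows), the stub `platform_sandbox` (standing in for `get_platform_sandbox`, assuming its boolean only enables or disables the Windows sandbox), and the tag strings other than "none", "external", and "windows_elevated".

```rust
// Hypothetical stand-ins for the codex-rs types; variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxPolicy {
    DangerFullAccess,
    ExternalSandbox,
    WorkspaceWrite,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WindowsSandboxLevel {
    Disabled,
    RestrictedToken,
    Elevated,
}

// Replaces cfg!(target_os = ...) so every branch is testable on one host.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Platform {
    MacOs,
    Linux,
    Windows,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxType {
    MacosSeatbelt,
    LinuxSeccomp,
    WindowsRestrictedToken,
}

impl SandboxType {
    // Hypothetical metric tags; the real strings may differ.
    fn as_metric_tag(self) -> &'static str {
        match self {
            SandboxType::MacosSeatbelt => "seatbelt",
            SandboxType::LinuxSeccomp => "seccomp",
            SandboxType::WindowsRestrictedToken => "windows_restricted_token",
        }
    }
}

// Stub: the platform's sandbox, or None on Windows when its sandbox is disabled.
fn platform_sandbox(platform: Platform, windows_sandbox_enabled: bool) -> Option<SandboxType> {
    match platform {
        Platform::MacOs => Some(SandboxType::MacosSeatbelt),
        Platform::Linux => Some(SandboxType::LinuxSeccomp),
        Platform::Windows if windows_sandbox_enabled => Some(SandboxType::WindowsRestrictedToken),
        Platform::Windows => None,
    }
}

fn sandbox_tag(policy: SandboxPolicy, level: WindowsSandboxLevel, platform: Platform) -> &'static str {
    // Policy-level overrides come first, as in the reviewed function.
    match policy {
        SandboxPolicy::DangerFullAccess => return "none",
        SandboxPolicy::ExternalSandbox => return "external",
        SandboxPolicy::WorkspaceWrite => {}
    }
    // The elevated Windows sandbox gets its own tag.
    if platform == Platform::Windows && level == WindowsSandboxLevel::Elevated {
        return "windows_elevated";
    }
    // Otherwise fall back to the platform sandbox, or "none" if there is none.
    platform_sandbox(platform, level != WindowsSandboxLevel::Disabled)
        .map(SandboxType::as_metric_tag)
        .unwrap_or("none")
}
```

For example, a workspace-write policy on Linux resolves to "seccomp" regardless of the Windows level, while the same policy on Windows with the sandbox disabled resolves to "none".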
Tag sandbox metrics from actual attempt, not policy
The new sandbox_tag derives the sandbox metric purely from the turn's policy and platform, but tool execution can be retried without a sandbox after a sandbox denial (see ToolOrchestrator::run, which reruns with SandboxType::None). In those cases, the metrics for the successful retry are still tagged as seccomp/seatbelt/windows_*, so the codex.tool.call success rates won't reflect the sandbox actually used. To keep the metrics accurate (especially when comparing Windows sandbox success rates), the tag should be based on the sandbox used in the attempt that produced the result, not on the policy-level configuration.
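The suggested fix can be sketched as follows: the outcome of a tool call carries the sandbox that actually ran it, and the metric tag is read from that outcome instead of from the policy. All names here (`SandboxType` with only two variants, `AttemptError`, `ToolCallOutcome`, `run_with_retry`, `metric_tags`) are hypothetical and only loosely modeled on the retry the review describes; this is not the `ToolOrchestrator` API, and the unconditional retry is a simplification.

```rust
// Hypothetical types; not the codex-rs ToolOrchestrator API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SandboxType {
    None,
    LinuxSeccomp,
}

impl SandboxType {
    fn as_metric_tag(self) -> &'static str {
        match self {
            SandboxType::None => "none",
            SandboxType::LinuxSeccomp => "seccomp",
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum AttemptError {
    SandboxDenied,
    Failed(String),
}

/// Outcome of a tool call together with the sandbox that actually ran it.
#[derive(Debug)]
struct ToolCallOutcome {
    result: Result<String, AttemptError>,
    sandbox_used: SandboxType,
}

/// Runs `attempt` under `initial`; on a sandbox denial, retries once with no
/// sandbox and records the sandbox of whichever attempt produced the result.
fn run_with_retry<F>(initial: SandboxType, mut attempt: F) -> ToolCallOutcome
where
    F: FnMut(SandboxType) -> Result<String, AttemptError>,
{
    match attempt(initial) {
        Err(AttemptError::SandboxDenied) if initial != SandboxType::None => ToolCallOutcome {
            result: attempt(SandboxType::None),
            sandbox_used: SandboxType::None,
        },
        result => ToolCallOutcome { result, sandbox_used: initial },
    }
}

/// (sandbox tag, success) derived from the outcome, not from the turn's policy.
fn metric_tags(outcome: &ToolCallOutcome) -> (&'static str, bool) {
    (outcome.sandbox_used.as_metric_tag(), outcome.result.is_ok())
}
```

With this shape, a call denied under seccomp and then completed without a sandbox is counted as a "none" success rather than a "seccomp" success, which is the distortion the review points out.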
This will give visibility into how the success rate of the Windows sandbox implementations compares with that of other platforms.
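As a rough illustration of that comparison, here is a sketch that groups recorded tool-call events by sandbox tag and computes the fraction that succeeded. The `ToolCallEvent` shape and `success_rate_by_sandbox` are hypothetical; in practice this aggregation would likely happen in the metrics backend over the tagged codex.tool.call counters rather than in Rust.

```rust
use std::collections::BTreeMap;

/// One recorded codex.tool.call event (hypothetical shape).
struct ToolCallEvent {
    sandbox: &'static str,
    success: bool,
}

/// Returns, per sandbox tag, the fraction of tool calls that succeeded.
fn success_rate_by_sandbox(events: &[ToolCallEvent]) -> BTreeMap<&'static str, f64> {
    // (successes, total) per tag; a tag only appears once it has an event.
    let mut counts: BTreeMap<&'static str, (u32, u32)> = BTreeMap::new();
    for e in events {
        let entry = counts.entry(e.sandbox).or_insert((0, 0));
        entry.1 += 1;
        if e.success {
            entry.0 += 1;
        }
    }
    counts
        .into_iter()
        .map(|(tag, (ok, total))| (tag, f64::from(ok) / f64::from(total)))
        .collect()
}
```

Because every tag in the map has at least one event, the division never hits a zero total.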