Added model summary and risk assessment for commands that violate sandbox policy #5536
Conversation
@codex review
Codex Review: Didn't find any major issues. 👍
LGTM after @jif-oai's comments.
* Moved the prompt into its own file and switched it to use askama for templating
* Refactored the sandbox_retry_data trait for simplification
* Fixed otel telemetry so the assessment conversation doesn't appear as a new task
* Added an otel telemetry point for recording the latency of the assessment
* Removed defensive JSON parsing of the assessment response

Removed the new experimental config key from the public documentation for now. We're going to roll this out internally first to get feedback.
* Simplified config handling by leveraging the "features" mechanism
* Moved approvals-related schemas out of protocol.rs to simplify
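The approvals-related schema mentioned above could be sketched as a small set of Rust types. This is a hypothetical illustration, not the actual code from the PR; the names `RiskLevel` and `SandboxCommandAssessment` are assumptions based on the description.

```rust
// Hypothetical sketch of an approvals-related schema: a risk level the
// model assigns to a blocked command, plus a free-form category.
// These names are assumptions; the PR's actual types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel {
    Low,
    Medium,
    High,
}

impl std::str::FromStr for RiskLevel {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "low" => Ok(RiskLevel::Low),
            "medium" => Ok(RiskLevel::Medium),
            "high" => Ok(RiskLevel::High),
            other => Err(format!("unknown risk level: {other}")),
        }
    }
}

// The assessment produced for a command that violated the sandbox policy.
#[derive(Debug)]
pub struct SandboxCommandAssessment {
    pub summary: String,
    pub risk_level: RiskLevel,
    pub risk_category: String, // e.g. "data deletion" or "data exfiltration"
}
```

Keeping the risk level as a closed enum while leaving the category free-form matches the description: the level drives UI emphasis, while the category is explanatory text.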
@codex review
Codex Review: Didn't find any major issues. Keep them coming!
This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved.
The feature works by taking a failed command and passing it back to the model, asking it to summarize the command, assign a risk level (low, medium, or high), and assign a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so that the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior.
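The timeout-with-fallback behavior described above can be sketched with a side thread and a channel. This is a minimal illustration using only the standard library; the function name and `Option<String>` result type are assumptions, not the PR's actual API (which presumably uses async).

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical sketch: run the assessment request off the main path and
// fall back to `None` (i.e. the existing approval prompt with no
// assessment) if it fails or exceeds the time budget.
fn assess_with_timeout<F>(request: F, budget: Duration) -> Option<String>
where
    F: FnOnce() -> Option<String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A model-call error surfaces as `None`, which also takes the
        // fallback path; send errors (receiver dropped) are ignored.
        let _ = tx.send(request());
    });
    match rx.recv_timeout(budget) {
        Ok(result) => result, // assessment arrived within the budget
        Err(_) => None,       // timed out: fall back to current behavior
    }
}
```

The key property is that a slow or failing assessment can delay the approval prompt by at most the budget (5 seconds in the PR), never block it.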
For now, this is an experimental feature gated by the config key `experimental_sandbox_command_assessment`.

Here is a screenshot of the approval prompt showing the risk assessment and summary.
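Based on the key name in the description, enabling the gate would presumably look like the following config fragment; the exact file location and placement of the key are assumptions.

```toml
# Hypothetical entry in the Codex config file; the precise location of
# this key (top level vs. a features table) is an assumption.
experimental_sandbox_command_assessment = true
```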