Conversation

@etraut-openai commented Oct 22, 2025

This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved.

The feature works by passing the failed command back to the model and asking it to summarize the command, assign a risk level (low, medium, or high), and assign a risk category (e.g. "data deletion" or "data exfiltration"). The assessment runs in a new conversation thread so that context from the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, the feature falls back to the current behavior.
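A minimal sketch of that timeout-and-fallback flow (the type and function names below are illustrative, not the PR's actual API):

```rust
use std::time::Duration;

use serde::Deserialize;
use tokio::time::timeout;

/// Structured assessment returned by the model (field names are illustrative).
#[derive(Debug, Deserialize)]
struct CommandRiskAssessment {
    summary: String,
    risk_level: RiskLevel,
    risk_category: String, // e.g. "data deletion", "data exfiltration"
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
enum RiskLevel {
    Low,
    Medium,
    High,
}

/// Ask the model to assess a blocked command, falling back to `None`
/// (i.e. the existing plain approval prompt) on error or after 5 seconds.
async fn assess_command(command: &str) -> Option<CommandRiskAssessment> {
    match timeout(Duration::from_secs(5), assess_with_model(command)).await {
        Ok(Ok(assessment)) => Some(assessment),
        _ => None, // timed out or the model call failed
    }
}

/// Placeholder for the actual model call, which the PR runs in a fresh
/// conversation thread so existing context doesn't influence the answer.
async fn assess_with_model(
    _command: &str,
) -> Result<CommandRiskAssessment, Box<dyn std::error::Error>> {
    unimplemented!()
}
```

Capping the assessment at a hard 5-second timeout keeps the approval prompt from stalling when the model is slow or unavailable.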

For now, this is an experimental feature gated by the config key `experimental_sandbox_command_assessment`.
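Assuming the key is set in config.toml like other settings (the exact placement may differ), opting in would look something like:

```toml
# Opt in to the experimental command assessment.
# Key name is from this PR; its exact location in the config may differ.
experimental_sandbox_command_assessment = true
```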

Here is a screenshot of the approval prompt showing the risk assessment and summary.


@etraut-openai changed the title from "Added model summary and risk assessment for commands that violate san…" to "Added model summary and risk assessment for commands that violate sandbox policy" on Oct 22, 2025
@etraut-openai

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍


@etraut-openai marked this pull request as ready for review on October 22, 2025 at 22:31
@pakrym-oai left a comment


LGTM after @jif-oai's comments.

* Moved the prompt into its own file and switched it to use askama for templating (see the sketch after this list)
* Refactored the sandbox_retry_data trait for simplification
* Fixed OTel telemetry so the assessment conversation doesn't appear as a new task
* Added an OTel telemetry point to record the latency of the assessment
* Removed defensive JSON parsing of the assessment response
* Removed the new experimental config key from public documentation for now; we're going to roll this out internally first to get feedback
* Simplified config handling by leveraging the "features" mechanism
* Moved approvals-related schemas out of protocol.rs to simplify
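As a rough illustration of the askama-based prompt templating mentioned in the first item above (the template path, struct name, and fields are hypothetical, not the PR's actual code):

```rust
use askama::Template;

// Hypothetical prompt template; assumes a file at
// templates/sandbox_command_assessment.md containing placeholders such as
// {{ command }} and {{ sandbox_policy }}.
#[derive(Template)]
#[template(path = "sandbox_command_assessment.md")]
struct CommandAssessmentPrompt<'a> {
    command: &'a str,
    sandbox_policy: &'a str,
}

fn build_assessment_prompt(command: &str, sandbox_policy: &str) -> askama::Result<String> {
    CommandAssessmentPrompt { command, sandbox_policy }.render()
}
```

Keeping the prompt text in its own template file makes it easier to tweak the wording without touching the surrounding Rust code.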
@etraut-openai

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!


@etraut-openai merged commit f8af4f5 into main on Oct 24, 2025
20 checks passed
@etraut-openai deleted the etraut/command-assessment branch on October 24, 2025 at 22:23
@github-actions bot locked and limited conversation to collaborators on Oct 24, 2025