-
Notifications
You must be signed in to change notification settings - Fork 6.4k
[app-server] feat: add v2 command execution approval flow #6758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@codex review this |
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
63f1dd4 to
f6b7bc6
Compare
bolinfest
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still reviewing, but I wanted to publish some of my initial comments early.
| /// Use to correlate this with [codex_core::protocol::ExecCommandBeginEvent] | ||
| /// and [codex_core::protocol::ExecCommandEndEvent]. | ||
| pub call_id: String, | ||
| pub command: Vec<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, we have various "shell tools" that have different APIs:
- the "unified exec" approach has two tools:
exec_commandthe "command" is namedcmdand is a single stringwrite_stdinstreams input bytes to an existing session (we currently have no expectation of applying approvals to these)
- the
shell/container.exec/local_shelltool that takescommandas astring[] - the
shell_commandtool I added in feat: shell_command tool #6510 takescommandas a singlestringlikeexec_command, but there is no complementarywrite_stdintool in this case
We also have some work in flight where the tool call is command: string like unified exec, but the approval is still tied to one or more string[] instances where each maps to an execve() invocation.
I'm enumerating these to be sure that ExecCommandApprovalParams makes sense for all these cases. /cc @nornagon-openai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are your thoughts of the v2.rs shape here? https://github.com/openai/codex/pull/6758/files#diff-08e3876d082b8c0ed5b525feeb0d204b12b3731d4a1a0ed4f72455e819e4eea6R624
I'm thinking command as a string in the API is the most correct form (joining an underlying string[] using shlex if necessary), and for the interactive stuff aka write_stdin, we'd have to add a field to ThreadItem::CommandExecution that is something like a vector of strings/bytes/etc.
We also have some work in flight where the tool call is command: string like unified exec, but the approval is still tied to one or more string[] instances where each maps to an execve() invocation.
Interesting... I think in that case it is representable with one command exec approval request per execve() call (which means there may be multiple approvals per ThreadItem::CommandExecution item):
pub struct CommandExecutionRequestApprovalParams {
pub thread_id: String,
pub turn_id: String,
pub item_id: String,
/// Optional explanatory reason (e.g. request for network access).
pub reason: Option<String>,
/// Optional model-provided risk assessment describing the blocked command.
pub risk: Option<SandboxCommandAssessment>,
/// A best-effort parsing of the command to identify the type of command and its arguments.
pub parsed_cmd: Vec<ParsedCommand>,
/// NEW: execve invocation
pub execve_invocation: Vec<String>
}
Luckily it seems doable to expand the API to support these new exec use cases. @bolinfest Thoughts?
| pub cwd: PathBuf, | ||
| pub reason: Option<String>, | ||
| pub risk: Option<SandboxCommandAssessment>, | ||
| pub parsed_cmd: Vec<ParsedCommand>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is appropriate here. Are we doing something like this today?
At a minimum, the command should not map to more than one ParsedCommand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are actually, this is due to the fact that a shell command can be a sequence of piped commands - we have an existing unit test demonstrating this:
fn handles_complex_bash_command_head() {
let inner =
"rg --version && node -v && pnpm -v && rg --files | wc -l && rg --files | head -n 40";
assert_parsed(
&vec_str(&["bash", "-lc", inner]),
vec![
// Expect commands in left-to-right execution order
ParsedCommand::Search {
cmd: "rg --version".to_string(),
query: None,
path: None,
},
ParsedCommand::Unknown {
cmd: "node -v".to_string(),
},
ParsedCommand::Unknown {
cmd: "pnpm -v".to_string(),
},
ParsedCommand::Search {
cmd: "rg --files".to_string(),
query: None,
path: None,
},
ParsedCommand::Unknown {
cmd: "head -n 40".to_string(),
},
],
);
}
| pub diff: String, | ||
| } | ||
|
|
||
| #[derive(Serialize, Deserialize, Debug, Clone, PartialEq, JsonSchema, TS)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see now this was all moved over from codex-rs/app-server-protocol/src/protocol/common.rs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah sorry forgot to mention in PR description. everything in v1.rs was just moved, and I didn't touch how the legacy API works
| ); | ||
|
|
||
| v2_enum_from_core!( | ||
| pub enum ReviewDecision from codex_protocol::protocol::ReviewDecision { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely convinced this is the right shape...for ExecCommandApprovalParams for example, what if you want to approve for session, but with a strict prefix of command rather than the exact command itself?
Maybe we should make ReviewDecision a simpler enum of Approved | Denied | Abort (or to be more in line with MCP elicitations: accept | decline | cancel https://modelcontextprotocol.io/specification/draft/client/elicitation#response-actions) and then have other concepts like ApprovedForSession be separate fields on ExecCommandApprovalResponse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like it, will make an update here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to:
{
"id": 0,
"result": {
"acceptSettings": {
"forSession": false
},
"decision": "accept"
}
}
which makes it easy to extend acceptSettings to take in a command prefix or anything else we might want to add in the future.
(see PR description for the full flow)
5ffaed1 to
54db61b
Compare
| #[serde(tag = "type", rename_all = "camelCase")] | ||
| #[ts(tag = "type")] | ||
| #[ts(export_to = "v2/")] | ||
| pub enum ParsedCommand { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe CommandAction is a better name? It's not important that it was "parsed" from some other command: it's important that the command maps to a type of action that is easier for a user to reason about.
| #[ts(export_to = "v2/")] | ||
| pub enum ParsedCommand { | ||
| Read { | ||
| cmd: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move from cmd to command in this API? cmd is too similar to cwd for my liking.
| command: String, | ||
| aggregated_output: String, | ||
| exit_code: Option<i32>, | ||
| /// The command's working directory if not the default cwd for the agent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This says "if not" but this is not Option<PathBuf>? We should keep cwd as required, right?
| /// Optional model-provided risk assessment describing the blocked command. | ||
| pub risk: Option<SandboxCommandAssessment>, | ||
| /// A best-effort parsing of the command to identify the type of command and its arguments. | ||
| pub parsed_cmd: Vec<ParsedCommand>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe command_actions or subcommands or actions instead?
| #[ts(export_to = "v2/")] | ||
| pub struct CommandExecutionRequestApprovalResponse { | ||
| pub decision: ApprovalDecision, | ||
| pub accept_settings: Option<CommandExecutionRequestAcceptSettings>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we just flatten this and declare for_session on CommandExecutionRequestApprovalResponse?
Also, add #[default]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it as a nested field to make it more clear that these settings are only applied if decision == 'accept'
do you think for_session and other params in the future should apply to decline or cancel too?
| use std::path::PathBuf; | ||
|
|
||
| fn shlex_join(tokens: &[String]) -> String { | ||
| pub fn shlex_join(tokens: &[String]) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make a shlex_join.rs or park this in some other command utility (follow-up)
This PR adds the API V2 version of the command‑execution approval flow for the shell tool.
This PR wires the new RPC (
item/commandExecution/requestApproval, V2 only) and related events (item/started,item/completed, anditem/commandExecution/delta, which are emitted in both V1 and V2) through the app-serverprotocol. The new approval RPC is only sent when the user initiates a turn with the new
turn/startAPI so we don't break backwards compatibility with VSCE.The approach I took was to make as few changes to the Codex core as possible, leveraging existing
EventMsgcore events, and translating those in app-server. I did have to add additional fields toEventMsg::ExecCommandEndEventto capture the command's input so that app-server can statelessly transform these events to aThreadItem::CommandExecutionitem for theitem/completedevent.Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept.
Note: We'll need followup work to make sure these APIs work for the unified exec tool, but will wait til that's stable and landed before doing a pass on app-server.
Example payloads below: