@owenlin0
Contributor

@owenlin0 owenlin0 commented Nov 17, 2025

This PR adds the API V2 version of the command‑execution approval flow for the shell tool.

This PR wires the new RPC (item/commandExecution/requestApproval, V2 only) and related events (item/started, item/completed, and item/commandExecution/delta, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new turn/start API, so we don't break backwards compatibility with VSCE.
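A minimal sketch of that gating, assuming a hypothetical `TurnOrigin` flag recorded when the turn starts. The method strings are the ones listed above; `TurnOrigin`, `ExecPhase`, and `method_for` are illustrative names, not the app-server's actual types.

```rust
/// Hypothetical: which API the client used to start the current turn.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TurnOrigin {
    /// Legacy (V1) entry point, e.g. what VSCE uses today.
    Legacy,
    /// The new V2 `turn/start` API.
    V2TurnStart,
}

/// Hypothetical lifecycle phases of one shell command execution.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecPhase {
    Begin,
    OutputDelta,
    NeedsApproval,
    End,
}

/// Returns the app-server method sent for `phase`, or `None` when the new
/// V2 approval RPC must not be sent for a turn with this origin.
fn method_for(origin: TurnOrigin, phase: ExecPhase) -> Option<&'static str> {
    match phase {
        // Item notifications are emitted for both V1 and V2 turns.
        ExecPhase::Begin => Some("item/started"),
        ExecPhase::OutputDelta => Some("item/commandExecution/delta"),
        ExecPhase::End => Some("item/completed"),
        // The new approval RPC is gated on `turn/start`; legacy turns keep
        // the existing V1 approval flow untouched.
        ExecPhase::NeedsApproval => match origin {
            TurnOrigin::V2TurnStart => Some("item/commandExecution/requestApproval"),
            TurnOrigin::Legacy => None,
        },
    }
}

fn main() {
    let phases = [
        ExecPhase::Begin,
        ExecPhase::OutputDelta,
        ExecPhase::NeedsApproval,
        ExecPhase::End,
    ];
    for origin in [TurnOrigin::Legacy, TurnOrigin::V2TurnStart] {
        for phase in phases {
            println!("{:?} {:?} -> {:?}", origin, phase, method_for(origin, phase));
        }
    }
}
```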

The approach I took was to make as few changes to the Codex core as possible, leveraging existing EventMsg core events and translating them in app-server. I did have to add fields to EventMsg::ExecCommandEndEvent to capture the command's input so that app-server can statelessly transform these events into a ThreadItem::CommandExecution item for the item/completed event.
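To make the stateless translation concrete, here's a rough sketch with simplified, hypothetical shapes for the end event and the resulting item (the real `ExecCommandEndEvent` and `ThreadItem::CommandExecution` carry more fields, such as `parsedCmd` and `aggregatedOutput`, with different types). Because the end event now includes the command and `cwd`, the `item/completed` payload can be built from that single event. The `Failed` status is an assumption; the example payloads only show `inProgress` and `completed`.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down end event. The key change is that it carries
/// the command's input (argv and cwd), not just its outcome.
struct ExecCommandEndEvent {
    call_id: String,
    command: Vec<String>,
    cwd: PathBuf,
    exit_code: i32,
    duration_ms: u64,
}

#[derive(Debug, PartialEq)]
enum CommandExecutionStatus {
    Completed,
    /// Assumption: not shown in the example payloads.
    Failed,
}

/// Hypothetical, trimmed-down `commandExecution` thread item.
#[derive(Debug, PartialEq)]
struct CommandExecutionItem {
    id: String,
    command: String,
    cwd: PathBuf,
    status: CommandExecutionStatus,
    exit_code: Option<i32>,
    duration_ms: Option<u64>,
}

/// Pure translation: every field of the completed item comes from the end
/// event itself, so nothing has to be remembered from the begin event.
fn to_completed_item(ev: &ExecCommandEndEvent) -> CommandExecutionItem {
    CommandExecutionItem {
        // The tool call id doubles as the item id, as in the example payloads.
        id: ev.call_id.clone(),
        // Simplification: plain space join. Real code should shell-quote
        // each argument before joining.
        command: ev.command.join(" "),
        cwd: ev.cwd.clone(),
        status: if ev.exit_code == 0 {
            CommandExecutionStatus::Completed
        } else {
            CommandExecutionStatus::Failed
        },
        exit_code: Some(ev.exit_code),
        duration_ms: Some(ev.duration_ms),
    }
}

fn main() {
    let ev = ExecCommandEndEvent {
        call_id: "call_lNWWsbXl1e47qNaYjFRs0dyU".to_string(),
        command: vec!["touch".into(), "/tmp/should-trigger-approval".into()],
        cwd: PathBuf::from("/Users/owen/repos/codex/codex-rs"),
        exit_code: 0,
        duration_ms: 224,
    };
    println!("{:?}", to_completed_item(&ev));
}
```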

Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept.

Note: We'll need follow-up work to make sure these APIs work for the unified exec tool, but we'll wait until that's stable and has landed before doing a pass on app-server.

Example payloads below:

{
  "method": "item/started",
  "params": {
    "item": {
      "aggregatedOutput": null,
      "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
      "cwd": "/Users/owen/repos/codex/codex-rs",
      "durationMs": null,
      "exitCode": null,
      "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
      "parsedCmd": [
        {
          "cmd": "touch /tmp/should-trigger-approval",
          "type": "unknown"
        }
      ],
      "status": "inProgress",
      "type": "commandExecution"
    }
  }
}
{
  "id": 0,
  "method": "item/commandExecution/requestApproval",
  "params": {
    "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU",
    "parsedCmd": [
      {
        "cmd": "touch /tmp/should-trigger-approval",
        "type": "unknown"
      }
    ],
    "reason": "Need to create file in /tmp which is outside workspace sandbox",
    "risk": null,
    "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a",
    "turnId": "1"
  }
}
{
  "id": 0,
  "result": {
    "acceptSettings": {
      "forSession": false
    },
    "decision": "accept"
  }
}
{
  "method": "item/completed",
  "params": {
    "item": {
      "aggregatedOutput": null,
      "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
      "cwd": "/Users/owen/repos/codex/codex-rs",
      "durationMs": 224,
      "exitCode": 0,
      "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
      "parsedCmd": [
        {
          "cmd": "touch /tmp/should-trigger-approval",
          "type": "unknown"
        }
      ],
      "status": "completed",
      "type": "commandExecution"
    }
  }
}
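For client authors, the reply to `item/commandExecution/requestApproval` is an ordinary JSON-RPC response keyed by the request's `id`. Below is a stdlib-only sketch of building that reply; the real types derive serde's `Serialize` with camelCase renaming, so the hand-rolled `response_json` helper here is only a stand-in, and omitting `acceptSettings` when it is `None` is an assumption.

```rust
/// V2 decision values as they appear on the wire.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApprovalDecision {
    Accept,
    Decline,
    Cancel,
}

impl ApprovalDecision {
    fn as_wire(self) -> &'static str {
        match self {
            ApprovalDecision::Accept => "accept",
            ApprovalDecision::Decline => "decline",
            ApprovalDecision::Cancel => "cancel",
        }
    }
}

struct AcceptSettings {
    for_session: bool,
}

struct ApprovalResponse {
    decision: ApprovalDecision,
    accept_settings: Option<AcceptSettings>,
}

/// Builds the JSON-RPC response for request `id`, renaming snake_case
/// fields to camelCase. Key order is irrelevant in JSON; `acceptSettings`
/// comes first here only to mirror the example payload.
fn response_json(id: u64, resp: &ApprovalResponse) -> String {
    let mut fields = Vec::new();
    if let Some(settings) = &resp.accept_settings {
        fields.push(format!(
            "\"acceptSettings\":{{\"forSession\":{}}}",
            settings.for_session
        ));
    }
    fields.push(format!("\"decision\":\"{}\"", resp.decision.as_wire()));
    format!("{{\"id\":{},\"result\":{{{}}}}}", id, fields.join(","))
}

fn main() {
    let responses = [
        ApprovalResponse {
            decision: ApprovalDecision::Accept,
            accept_settings: Some(AcceptSettings { for_session: false }),
        },
        ApprovalResponse { decision: ApprovalDecision::Decline, accept_settings: None },
        ApprovalResponse { decision: ApprovalDecision::Cancel, accept_settings: None },
    ];
    for (id, resp) in responses.iter().enumerate() {
        println!("{}", response_json(id as u64, resp));
    }
}
```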

@owenlin0 owenlin0 marked this pull request as ready for review November 17, 2025 18:30
@owenlin0 owenlin0 changed the title [app-server] feat: add command execution approval flow [app-server] feat: add v2 command execution approval flow Nov 17, 2025
@owenlin0
Contributor Author

@codex review this

@chatgpt-codex-connector
Contributor

Codex Review: Didn't find any major issues. What shall we delve into next?

@owenlin0 owenlin0 force-pushed the owen/v2_command_execution branch from 63f1dd4 to f6b7bc6 Compare November 17, 2025 19:14
Collaborator

@bolinfest bolinfest left a comment

I'm still reviewing, but I wanted to publish some of my initial comments early.

/// Use to correlate this with [codex_core::protocol::ExecCommandBeginEvent]
/// and [codex_core::protocol::ExecCommandEndEvent].
pub call_id: String,
pub command: Vec<String>,
Collaborator

Today, we have various "shell tools" that have different APIs:

  • the "unified exec" approach has two tools:
    • exec_command: the "command" is named cmd and is a single string
    • write_stdin: streams input bytes to an existing session (we currently have no expectation of applying approvals to these)
  • the shell/container.exec/local_shell tool, which takes command as a string[]
  • the shell_command tool I added in #6510 (feat: shell_command tool), which takes command as a single string like exec_command, but has no complementary write_stdin tool

We also have some work in flight where the tool call is command: string like unified exec, but the approval is still tied to one or more string[] instances where each maps to an execve() invocation.

I'm enumerating these to be sure that ExecCommandApprovalParams makes sense for all these cases. /cc @nornagon-openai

Contributor Author

@owenlin0 owenlin0 Nov 17, 2025

what are your thoughts on the v2.rs shape here? https://github.com/openai/codex/pull/6758/files#diff-08e3876d082b8c0ed5b525feeb0d204b12b3731d4a1a0ed4f72455e819e4eea6R624

I'm thinking command as a string in the API is the most correct form (joining an underlying string[] using shlex if necessary), and for the interactive stuff aka write_stdin, we'd have to add a field to ThreadItem::CommandExecution that is something like a vector of strings/bytes/etc.
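A rough sketch of that normalization, assuming hypothetical `ToolCommand`/`to_api_command` names: argv-style tools get each argument shell-quoted and joined, while tools that already take a single string pass it through. The quoting is a simplified POSIX single-quote scheme, not necessarily what the codebase's `shlex_join` does.

```rust
/// Hypothetical: the two input shapes the shell tools use today.
enum ToolCommand {
    /// `shell` / `container.exec` / `local_shell`: command as `string[]`.
    Argv(Vec<String>),
    /// `exec_command` / `shell_command`: command as a single string.
    Script(String),
}

/// Simplified POSIX quoting: tokens made only of "safe" characters are kept
/// as-is; anything else is wrapped in single quotes, with each embedded
/// single quote written as '"'"'.
fn quote(arg: &str) -> String {
    let safe = !arg.is_empty()
        && arg
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || "-_./=:,+@%".contains(c));
    if safe {
        arg.to_string()
    } else {
        format!("'{}'", arg.replace('\'', "'\"'\"'"))
    }
}

/// Normalizes either shape into the single `command` string used by the API.
fn to_api_command(cmd: &ToolCommand) -> String {
    match cmd {
        ToolCommand::Argv(argv) => argv.iter().map(|a| quote(a)).collect::<Vec<_>>().join(" "),
        // Already a shell string: pass through unchanged.
        ToolCommand::Script(s) => s.clone(),
    }
}

fn main() {
    let argv = ToolCommand::Argv(vec![
        "/bin/zsh".to_string(),
        "-lc".to_string(),
        "touch /tmp/should-trigger-approval".to_string(),
    ]);
    let script = ToolCommand::Script("rg --files | head -n 40".to_string());
    println!("{}", to_api_command(&argv));
    println!("{}", to_api_command(&script));
}
```

With the argv from the example payloads, this yields the same `command` string shown in `item/started`.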

> We also have some work in flight where the tool call is command: string like unified exec, but the approval is still tied to one or more string[] instances where each maps to an execve() invocation.

Interesting... I think in that case it is representable with one command exec approval request per execve() call (which means there may be multiple approvals per ThreadItem::CommandExecution item):

pub struct CommandExecutionRequestApprovalParams {
    pub thread_id: String,
    pub turn_id: String,
    pub item_id: String,
    /// Optional explanatory reason (e.g. request for network access).
    pub reason: Option<String>,
    /// Optional model-provided risk assessment describing the blocked command.
    pub risk: Option<SandboxCommandAssessment>,
    /// A best-effort parsing of the command to identify the type of command and its arguments.
    pub parsed_cmd: Vec<ParsedCommand>,
    
    /// NEW: argv of the single execve() invocation this request is for.
    pub execve_invocation: Vec<String>,
}

Luckily it seems doable to expand the API to support these new exec use cases. @bolinfest Thoughts?
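A minimal sketch of that fan-out, assuming a trimmed-down copy of the proposed struct (`reason`, `risk`, and `parsed_cmd` omitted) and a hypothetical `approval_requests` helper: each `execve()` argv gets its own request, and all of them share the item's `item_id`.

```rust
/// Trimmed-down copy of the proposed params: correlation ids plus the new
/// per-execve argv field.
#[derive(Debug, PartialEq)]
struct CommandExecutionRequestApprovalParams {
    thread_id: String,
    turn_id: String,
    item_id: String,
    execve_invocation: Vec<String>,
}

/// Hypothetical: one approval request per execve() invocation that needs
/// approval, all correlated to the same thread item.
fn approval_requests(
    thread_id: &str,
    turn_id: &str,
    item_id: &str,
    invocations: &[Vec<String>],
) -> Vec<CommandExecutionRequestApprovalParams> {
    invocations
        .iter()
        .map(|argv| CommandExecutionRequestApprovalParams {
            thread_id: thread_id.to_string(),
            turn_id: turn_id.to_string(),
            item_id: item_id.to_string(),
            execve_invocation: argv.clone(),
        })
        .collect()
}

fn main() {
    // Illustrative argvs, e.g. from one `command: string` tool call.
    let argvs = vec![
        vec!["git".to_string(), "add".to_string(), ".".to_string()],
        vec!["git".to_string(), "commit".to_string(), "-m".to_string(), "wip".to_string()],
    ];
    for req in approval_requests("thread-1", "1", "call_1", &argvs) {
        println!("{:?}", req);
    }
}
```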

pub cwd: PathBuf,
pub reason: Option<String>,
pub risk: Option<SandboxCommandAssessment>,
pub parsed_cmd: Vec<ParsedCommand>,
Collaborator

I don't think this is appropriate here. Are we doing something like this today?

At a minimum, the command should not map to more than one ParsedCommand.

Contributor Author

@owenlin0 owenlin0 Nov 17, 2025

We are, actually. A shell command can be a sequence of commands chained with && or connected by pipes, and we have an existing unit test demonstrating this:

    #[test]
    fn handles_complex_bash_command_head() {
        let inner =
            "rg --version && node -v && pnpm -v && rg --files | wc -l && rg --files | head -n 40";
        assert_parsed(
            &vec_str(&["bash", "-lc", inner]),
            vec![
                // Expect commands in left-to-right execution order
                ParsedCommand::Search {
                    cmd: "rg --version".to_string(),
                    query: None,
                    path: None,
                },
                ParsedCommand::Unknown {
                    cmd: "node -v".to_string(),
                },
                ParsedCommand::Unknown {
                    cmd: "pnpm -v".to_string(),
                },
                ParsedCommand::Search {
                    cmd: "rg --files".to_string(),
                    query: None,
                    path: None,
                },
                ParsedCommand::Unknown {
                    cmd: "head -n 40".to_string(),
                },
            ],
        );
    }
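The splitting step behind that multi-entry result can be sketched as below. It is deliberately naive (it ignores quoting, subshells, and redirections) and is not how the codebase parses scripts; it also skips the classification into `Search`/`Unknown` and the filtering visible in the test, where `wc -l` and one of the two `rg --files` segments do not appear in the expected output.

```rust
/// Naively split a shell script into its simple commands on the control
/// operators `&&`, `||`, `;`, and `|`. Not a real shell parser.
fn split_simple_commands(script: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = String::new();
    let mut chars = script.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // `&&` ends the current command; a lone `&` is kept as text.
            '&' if chars.peek() == Some(&'&') => {
                chars.next();
                flush(&mut current, &mut out);
            }
            // `||` and `|` both end the current command.
            '|' => {
                if chars.peek() == Some(&'|') {
                    chars.next();
                }
                flush(&mut current, &mut out);
            }
            ';' => flush(&mut current, &mut out),
            _ => current.push(c),
        }
    }
    flush(&mut current, &mut out);
    out
}

/// Push the trimmed current segment (if non-empty) and reset it.
fn flush(current: &mut String, out: &mut Vec<String>) {
    let seg = current.trim();
    if !seg.is_empty() {
        out.push(seg.to_string());
    }
    current.clear();
}

fn main() {
    let script = "rg --version && node -v && pnpm -v && rg --files | wc -l && rg --files | head -n 40";
    for (i, cmd) in split_simple_commands(script).iter().enumerate() {
        println!("{i}: {cmd}");
    }
}
```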

pub diff: String,
}

#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, JsonSchema, TS)]
Collaborator

Oh, I see now this was all moved over from codex-rs/app-server-protocol/src/protocol/common.rs?

Contributor Author

Oh yeah, sorry, forgot to mention this in the PR description. Everything in v1.rs was just moved; I didn't touch how the legacy API works.

);

v2_enum_from_core!(
pub enum ReviewDecision from codex_protocol::protocol::ReviewDecision {
Collaborator

I'm not completely convinced this is the right shape. For ExecCommandApprovalParams, for example, what if you want to approve for the session, but with a strict prefix of the command rather than the exact command itself?

Maybe we should make ReviewDecision a simpler enum of Approved | Denied | Abort (or to be more in line with MCP elicitations: accept | decline | cancel https://modelcontextprotocol.io/specification/draft/client/elicitation#response-actions) and then have other concepts like ApprovedForSession be separate fields on ExecCommandApprovalResponse?

Contributor Author

I like it, will make an update here.

Contributor Author

updated to:

{
  "id": 0,
  "result": {
    "acceptSettings": {
      "forSession": false
    },
    "decision": "accept"
  }
}

which makes it easy to extend acceptSettings to take in a command prefix or anything else we might want to add in the future.

(see PR description for the full flow)
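With this shape, the app-server still has to fold `decision` and `acceptSettings` back into the core decision. A minimal sketch, assuming a local mirror of the core `ReviewDecision` using the variant names mentioned in this thread (`Approved`, `ApprovedForSession`, `Denied`, `Abort`; the real enum in `codex_protocol` may differ), and consulting `acceptSettings` only on `accept`, as owenlin0 describes later in the thread. `to_core` is a hypothetical name.

```rust
/// V2 decision, mirroring MCP elicitation actions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApprovalDecision {
    Accept,
    Decline,
    Cancel,
}

/// Extra knobs that only apply when the decision is `Accept`.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct AcceptSettings {
    for_session: bool,
}

/// Local mirror of the core decision enum discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReviewDecision {
    Approved,
    ApprovedForSession,
    Denied,
    Abort,
}

/// Hypothetical translation from the V2 response to the core decision.
/// `settings` is ignored unless the decision is `Accept`.
fn to_core(decision: ApprovalDecision, settings: Option<AcceptSettings>) -> ReviewDecision {
    match decision {
        ApprovalDecision::Accept => {
            if settings.unwrap_or_default().for_session {
                ReviewDecision::ApprovedForSession
            } else {
                ReviewDecision::Approved
            }
        }
        ApprovalDecision::Decline => ReviewDecision::Denied,
        ApprovalDecision::Cancel => ReviewDecision::Abort,
    }
}

fn main() {
    let cases = [
        (ApprovalDecision::Accept, Some(AcceptSettings { for_session: true })),
        (ApprovalDecision::Accept, None),
        (ApprovalDecision::Decline, Some(AcceptSettings { for_session: true })),
        (ApprovalDecision::Cancel, None),
    ];
    for (decision, settings) in cases {
        println!("{:?} + {:?} -> {:?}", decision, settings, to_core(decision, settings));
    }
}
```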

@owenlin0 owenlin0 force-pushed the owen/v2_command_execution branch from 5ffaed1 to 54db61b Compare November 17, 2025 22:17
#[serde(tag = "type", rename_all = "camelCase")]
#[ts(tag = "type")]
#[ts(export_to = "v2/")]
pub enum ParsedCommand {
Collaborator

Maybe CommandAction is a better name? It's not important that it was "parsed" from some other command: it's important that the command maps to a type of action that is easier for a user to reason about.

#[ts(export_to = "v2/")]
pub enum ParsedCommand {
Read {
cmd: String,
Collaborator

Can we move from cmd to command in this API? cmd is too similar to cwd for my liking.

command: String,
aggregated_output: String,
exit_code: Option<i32>,
/// The command's working directory if not the default cwd for the agent.
Collaborator

This says "if not" but this is not Option<PathBuf>? We should keep cwd as required, right?

/// Optional model-provided risk assessment describing the blocked command.
pub risk: Option<SandboxCommandAssessment>,
/// A best-effort parsing of the command to identify the type of command and its arguments.
pub parsed_cmd: Vec<ParsedCommand>,
Collaborator

Maybe command_actions or subcommands or actions instead?

#[ts(export_to = "v2/")]
pub struct CommandExecutionRequestApprovalResponse {
pub decision: ApprovalDecision,
pub accept_settings: Option<CommandExecutionRequestAcceptSettings>,
Collaborator

Why don't we just flatten this and declare for_session on CommandExecutionRequestApprovalResponse?

Also, add #[default]?

Contributor Author

@owenlin0 owenlin0 Nov 18, 2025

I put it as a nested field to make it clearer that these settings are only applied if decision == 'accept'.

Do you think for_session and other future params should apply to decline or cancel too?

use std::path::PathBuf;

-fn shlex_join(tokens: &[String]) -> String {
+pub fn shlex_join(tokens: &[String]) -> String {
Collaborator

we should make a shlex_join.rs or park this in some other command utility (follow-up)

@owenlin0 owenlin0 enabled auto-merge (squash) November 18, 2025 00:18
@owenlin0 owenlin0 merged commit cecbd5b into main Nov 18, 2025
25 checks passed
@owenlin0 owenlin0 deleted the owen/v2_command_execution branch November 18, 2025 00:23
@github-actions github-actions bot locked and limited conversation to collaborators Nov 18, 2025