[app-server] feat: add v2 command execution approval flow #6758

owenlin0 · 2025-11-17T04:15:05Z

This PR adds the API V2 version of the command‑execution approval flow for the shell tool.

This PR wires the new RPC (item/commandExecution/requestApproval, V2 only) and related events (item/started, item/completed, and item/commandExecution/delta, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new turn/start API, so we don't break backwards compatibility with VSCE.
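A minimal sketch of this gating rule, assuming the server knows whether the current turn was started through turn/start. TurnOrigin and command_execution_methods are hypothetical names; the method strings are the ones listed above.

```rust
/// Which API started the current turn (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnOrigin {
    /// Turn started through the new V2 `turn/start` API.
    V2TurnStart,
    /// Turn started through the legacy V1 API (e.g. what VSCE uses).
    V1Legacy,
}

/// Methods the server may send for one command execution in this turn.
pub fn command_execution_methods(origin: TurnOrigin) -> Vec<&'static str> {
    // Emitted for both V1 and V2 turns.
    let mut methods = vec![
        "item/started",
        "item/commandExecution/delta",
        "item/completed",
    ];
    // The approval RPC is V2-only, so V1 clients never receive it.
    if origin == TurnOrigin::V2TurnStart {
        methods.push("item/commandExecution/requestApproval");
    }
    methods
}
```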

The approach I took was to make as few changes to the Codex core as possible, leveraging existing EventMsg core events, and translating those in app-server. I did have to add additional fields to EventMsg::ExecCommandEndEvent to capture the command's input so that app-server can statelessly transform these events to a ThreadItem::CommandExecution item for the item/completed event.
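The stateless translation can be sketched as a pure function from the end event to the completed item. ExecEndLike and CommandExecutionLike are hypothetical, simplified stand-ins for EventMsg::ExecCommandEndEvent and ThreadItem::CommandExecution (the real types have more fields). The command is kept as argv here, although the V2 payload renders it as one shell-quoted string.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Simplified end event that carries the command's input (hypothetical).
pub struct ExecEndLike {
    pub call_id: String,
    pub command: Vec<String>,
    pub cwd: PathBuf,
    pub exit_code: i32,
    pub duration: Duration,
    pub aggregated_output: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ItemStatus {
    InProgress, // used for item/started (not shown)
    Completed,
}

/// Simplified CommandExecution item (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub struct CommandExecutionLike {
    pub id: String,
    pub command: Vec<String>,
    pub cwd: PathBuf,
    pub status: ItemStatus,
    pub exit_code: Option<i32>,
    pub duration_ms: Option<u64>,
    pub aggregated_output: Option<String>,
}

/// Builds the item for item/completed from the end event alone; no state
/// from the matching begin event is consulted.
pub fn completed_item(ev: ExecEndLike) -> CommandExecutionLike {
    CommandExecutionLike {
        // In the example payloads the item id is the tool call id.
        id: ev.call_id,
        command: ev.command,
        cwd: ev.cwd,
        status: ItemStatus::Completed,
        exit_code: Some(ev.exit_code),
        // as_millis() is u128; clamp before narrowing to u64.
        duration_ms: Some(ev.duration.as_millis().min(u64::MAX as u128) as u64),
        aggregated_output: ev.aggregated_output,
    }
}
```

Because nothing outside the event is read, app-server needs no per-call bookkeeping to emit item/completed.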

Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept.

Note: We'll need follow-up work to make sure these APIs work for the unified exec tool, but we'll wait until that's stable and landed before doing a pass on app-server.

Example payloads below:

{
  "method": "item/started",
  "params": {
    "item": {
      "aggregatedOutput": null,
      "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
      "cwd": "/Users/owen/repos/codex/codex-rs",
      "durationMs": null,
      "exitCode": null,
      "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
      "parsedCmd": [
        {
          "cmd": "touch /tmp/should-trigger-approval",
          "type": "unknown"
        }
      ],
      "status": "inProgress",
      "type": "commandExecution"
    }
  }
}

{
  "id": 0,
  "method": "item/commandExecution/requestApproval",
  "params": {
    "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU",
    "parsedCmd": [
      {
        "cmd": "touch /tmp/should-trigger-approval",
        "type": "unknown"
      }
    ],
    "reason": "Need to create file in /tmp which is outside workspace sandbox",
    "risk": null,
    "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a",
    "turnId": "1"
  }
}

{
  "id": 0,
  "result": {
    "acceptSettings": {
      "forSession": false
    },
    "decision": "accept"
  }
}
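A client's side of this exchange might build the response above as sketched below, using plain string formatting to stay dependency-free. Decision and approval_response are hypothetical; only the accept decision and the acceptSettings.forSession shape appear in the example, so the Decline variant and its "decline" spelling are assumptions. As in JSON-RPC generally, the response id echoes the request id (0 here).

```rust
/// Client decision for an approval request. `Decline` and its wire
/// spelling are assumptions; only "accept" appears in the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Accept,
    Decline,
}

impl Decision {
    fn wire_name(self) -> &'static str {
        match self {
            Decision::Accept => "accept",
            Decision::Decline => "decline", // assumed spelling
        }
    }
}

/// Builds the JSON-RPC response to item/commandExecution/requestApproval.
/// `request_id` must echo the request's `id` so the server can match them.
pub fn approval_response(request_id: u64, decision: Decision, for_session: bool) -> String {
    format!(
        r#"{{"id":{},"result":{{"acceptSettings":{{"forSession":{}}},"decision":"{}"}}}}"#,
        request_id,
        for_session,
        decision.wire_name()
    )
}
```

Hand-built JSON is only safe here because every interpolated value is a number, a boolean, or a fixed string; real client code would use a JSON library.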

{
  "method": "item/completed",
  "params": {
    "item": {
      "aggregatedOutput": null,
      "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
      "cwd": "/Users/owen/repos/codex/codex-rs",
      "durationMs": 224,
      "exitCode": 0,
      "id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
      "parsedCmd": [
        {
          "cmd": "touch /tmp/should-trigger-approval",
          "type": "unknown"
        }
      ],
      "status": "completed",
      "type": "commandExecution"
    }
  }
}
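These payloads follow a fixed lifecycle for one item id: item/started with status inProgress, the approval round-trip, then item/completed. A hypothetical client-side checker for that ordering could look like the sketch below; ItemTracker and Notification are illustrative names, and allowing more than one approval request per item is an assumption.

```rust
use std::collections::HashMap;

/// Simplified notifications for one command execution, keyed by item id.
#[derive(Debug)]
pub enum Notification<'a> {
    Started(&'a str),           // item/started, status "inProgress"
    ApprovalRequested(&'a str), // item/commandExecution/requestApproval
    Completed(&'a str),         // item/completed, status "completed"
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    InProgress,
    Done,
}

/// Checks: started, then zero or more approvals, then completed.
#[derive(Default)]
pub struct ItemTracker {
    phases: HashMap<String, Phase>,
}

impl ItemTracker {
    pub fn observe(&mut self, n: &Notification) -> Result<(), String> {
        match *n {
            Notification::Started(id) => {
                if self.phases.contains_key(id) {
                    return Err(format!("{}: started twice", id));
                }
                self.phases.insert(id.to_string(), Phase::InProgress);
            }
            Notification::ApprovalRequested(id) => {
                if self.phases.get(id) != Some(&Phase::InProgress) {
                    return Err(format!("{}: approval outside an in-progress item", id));
                }
            }
            Notification::Completed(id) => {
                if self.phases.get(id) != Some(&Phase::InProgress) {
                    return Err(format!("{}: completed without being in progress", id));
                }
                self.phases.insert(id.to_string(), Phase::Done);
            }
        }
        Ok(())
    }
}
```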

owenlin0 · 2025-11-17T18:48:51Z

@codex review this

chatgpt-codex-connector · 2025-11-17T18:57:34Z

Codex Review: Didn't find any major issues. What shall we delve into next?


bolinfest

I'm still reviewing, but I wanted to publish some of my initial comments early.

codex-rs/app-server-protocol/src/protocol/v1.rs

bolinfest · 2025-11-17T20:00:07Z

+    /// Use to correlate this with [codex_core::protocol::ExecCommandBeginEvent]
+    /// and [codex_core::protocol::ExecCommandEndEvent].
+    pub call_id: String,
+    pub command: Vec<String>,


Today, we have various "shell tools" that have different APIs:

- the "unified exec" approach has two tools:
  - exec_command: the "command" is named cmd and is a single string
  - write_stdin: streams input bytes to an existing session (we currently have no expectation of applying approvals to these)
- the shell/container.exec/local_shell tool, which takes command as a string[]
- the shell_command tool I added in "feat: shell_command tool" (#6510), which takes command as a single string like exec_command, but there is no complementary write_stdin tool in this case

We also have some work in flight where the tool call is command: string like unified exec, but the approval is still tied to one or more string[] instances where each maps to an execve() invocation.

I'm enumerating these to be sure that ExecCommandApprovalParams makes sense for all these cases. /cc @nornagon-openai

What are your thoughts on the v2.rs shape here? https://github.com/openai/codex/pull/6758/files#diff-08e3876d082b8c0ed5b525feeb0d204b12b3731d4a1a0ed4f72455e819e4eea6R624

I'm thinking command as a string in the API is the most correct form (joining an underlying string[] using shlex if necessary), and for the interactive case (write_stdin) we'd have to add a field to ThreadItem::CommandExecution, something like a vector of strings/bytes/etc.
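A minimal sketch of normalizing both tool shapes into one API-level command string. ToolCommand, quote_arg, and api_command are hypothetical names; quote_arg reimplements the quoting rules of Python's shlex.quote (safe characters pass through, anything else is single-quoted with each embedded ' written as '"'"') rather than calling any particular Rust crate. Assuming the argv behind the example payloads is ["/bin/zsh", "-lc", "touch /tmp/should-trigger-approval"], it reproduces the command string shown there.

```rust
/// The two command shapes: argv-style string[] (shell / container.exec /
/// local_shell) or a single string (exec_command / shell_command).
pub enum ToolCommand {
    Argv(Vec<String>),
    Script(String),
}

/// Quotes one argument for a POSIX shell, following shlex.quote's rules.
pub fn quote_arg(arg: &str) -> String {
    let is_safe = |c: char| c.is_ascii_alphanumeric() || "_@%+=:,./-".contains(c);
    if arg.is_empty() {
        return "''".to_string();
    }
    if arg.chars().all(is_safe) {
        return arg.to_string();
    }
    // Close the quote, emit a double-quoted ', then reopen the quote.
    format!("'{}'", arg.replace('\'', r#"'"'"'"#))
}

/// Produces the single `command` string exposed by the API.
pub fn api_command(cmd: &ToolCommand) -> String {
    match cmd {
        ToolCommand::Argv(argv) => argv
            .iter()
            .map(|a| quote_arg(a))
            .collect::<Vec<_>>()
            .join(" "),
        // Already a shell string: pass it through unchanged.
        ToolCommand::Script(s) => s.clone(),
    }
}
```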

We also have some work in flight where the tool call is command: string like unified exec, but the approval is still tied to one or more string[] instances where each maps to an execve() invocation.

Interesting... I think in that case it is representable with one command exec approval request per execve() call (which means there may be multiple approvals per ThreadItem::CommandExecution item):

pub struct CommandExecutionRequestApprovalParams {
    pub thread_id: String,
    pub turn_id: String,
    pub item_id: String,
    /// Optional explanatory reason (e.g. request for network access).
    pub reason: Option<String>,
    /// Optional model-provided risk assessment describing the blocked command.
    pub risk: Option<SandboxCommandAssessment>,
    /// A best-effort parsing of the command to identify the type of command and its arguments.
    pub parsed_cmd: Vec<ParsedCommand>,
    /// NEW: execve invocation
    pub execve_invocation: Vec<String>,
}
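The one-request-per-execve() idea can be sketched as a fan-out from one item to several requests that all share its item_id, so a client can group them under the item it already rendered from item/started. ApprovalRequestLike and approvals_for_item are hypothetical; the struct keeps only the id fields and the proposed execve_invocation field.

```rust
/// Trimmed-down version of the proposed params (hypothetical: omits
/// reason, risk, and parsed_cmd).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ApprovalRequestLike {
    pub thread_id: String,
    pub turn_id: String,
    pub item_id: String,
    pub execve_invocation: Vec<String>,
}

/// One approval request per execve() call, all tied to the same item.
pub fn approvals_for_item(
    thread_id: &str,
    turn_id: &str,
    item_id: &str,
    invocations: &[Vec<String>],
) -> Vec<ApprovalRequestLike> {
    invocations
        .iter()
        .map(|argv| ApprovalRequestLike {
            thread_id: thread_id.to_string(),
            turn_id: turn_id.to_string(),
            item_id: item_id.to_string(),
            execve_invocation: argv.clone(),
        })
        .collect()
}
```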

Luckily it seems doable to expand the API to support these new exec use cases. @bolinfest Thoughts?

bolinfest · 2025-11-17T20:01:29Z

codex-rs/app-server-protocol/src/protocol/v1.rs

+    pub cwd: PathBuf,
+    pub reason: Option<String>,
+    pub risk: Option<SandboxCommandAssessment>,
+    pub parsed_cmd: Vec<ParsedCommand>,


I don't think this is appropriate here. Are we doing something like this today?

At a minimum, the command should not map to more than one ParsedCommand.

We are, actually. A shell command can be a sequence of chained or piped commands, and we have an existing unit test demonstrating this:

fn handles_complex_bash_command_head() {
    let inner =
        "rg --version && node -v && pnpm -v && rg --files | wc -l && rg --files | head -n 40";
    assert_parsed(
        &vec_str(&["bash", "-lc", inner]),
        vec![
            // Expect commands in left-to-right execution order
            ParsedCommand::Search {
                cmd: "rg --version".to_string(),
                query: None,
                path: None,
            },
            ParsedCommand::Unknown {
                cmd: "node -v".to_string(),
            },
            ParsedCommand::Unknown {
                cmd: "pnpm -v".to_string(),
            },
            ParsedCommand::Search {
                cmd: "rg --files".to_string(),
                query: None,
                path: None,
            },
            ParsedCommand::Unknown {
                cmd: "head -n 40".to_string(),
            },
        ],
    );
}
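To see why one command string can yield several ParsedCommand entries, here is a deliberately naive splitter (split_simple_commands is a hypothetical helper, not Codex's parser) that breaks a script at &&, ||, | and ; outside single or double quotes. On the script from the test it finds seven simple commands, while the test expects five ParsedCommand entries (wc -l does not appear), so the real parser evidently does more than split.

```rust
/// Splits a shell script into simple commands joined by `&&`, `||`, `|`,
/// or `;`, ignoring operators inside quotes. Escapes, subshells, and
/// redirections are not handled.
pub fn split_simple_commands(script: &str) -> Vec<String> {
    let mut parts = Vec::new();
    let mut current = String::new();
    let mut quote: Option<char> = None;
    let mut chars = script.chars().peekable();

    while let Some(c) = chars.next() {
        match (quote, c) {
            // Inside quotes: copy everything until the matching quote.
            (Some(q), _) => {
                if c == q {
                    quote = None;
                }
                current.push(c);
            }
            (None, '\'') | (None, '"') => {
                quote = Some(c);
                current.push(c);
            }
            // `&&` ends the current command; a lone `&` is kept as text.
            (None, '&') if chars.peek() == Some(&'&') => {
                chars.next();
                push_trimmed(&mut parts, &mut current);
            }
            // `|` and `||` both end the current command.
            (None, '|') => {
                if chars.peek() == Some(&'|') {
                    chars.next();
                }
                push_trimmed(&mut parts, &mut current);
            }
            (None, ';') => push_trimmed(&mut parts, &mut current),
            _ => current.push(c),
        }
    }
    push_trimmed(&mut parts, &mut current);
    parts
}

/// Moves the trimmed buffer into `parts` (skipping empties) and clears it.
fn push_trimmed(parts: &mut Vec<String>, current: &mut String) {
    let trimmed = current.trim();
    if !trimmed.is_empty() {
        parts.push(trimmed.to_string());
    }
    current.clear();
}
```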

bolinfest · 2025-11-17T20:25:37Z

codex-rs/app-server-protocol/src/protocol/v1.rs

    pub diff: String,
 }

+#[derive(Serialize, Deserialize, Debug, Clone, PartialEq, JsonSchema, TS)]


Oh, I see now this was all moved over from codex-rs/app-server-protocol/src/protocol/common.rs?

Oh yeah, sorry, forgot to mention it in the PR description. Everything in v1.rs was just moved, and I didn't touch how the legacy API works.

bolinfest · 2025-11-17T20:30:20Z