[apps] Add tool_suggest tool. by mzeng-openai · Pull Request #14287 · openai/codex

mzeng-openai · 2026-03-11T01:57:45Z

Add tool_suggest tool.
Move chatgpt/src/connectors.rs and core/src/connectors.rs into a dedicated mod so that we have all the logic and global cache in one place.
Update TUI app link view to support rendering the installation view for mcp elicitation.

## Summary Foundation PR only (base for PR #3). This PR contains the SDK runtime foundation and generated artifacts: - pinned runtime binary in `sdk/python/bin/` (`codex` or `codex.exe` by platform) - single maintenance script: `sdk/python/scripts/update_sdk_artifacts.py` - generated protocol/types artifacts under: - `sdk/python/src/codex_app_server/generated/protocol_types.py` - `sdk/python/src/codex_app_server/generated/schema_types.py` - `sdk/python/src/codex_app_server/generated/v2_all/*` - generation-contract test wiring (`tests/test_contract_generation.py`) ## Release asset behavior `update_sdk_artifacts.py` now: - selects latest release by channel (`--channel stable|alpha`) - resolves the correct asset for current OS/arch - extracts platform binary (`codex` on macOS/Linux, `codex.exe` on Windows) - keeps runtime on single pinned binary source in `sdk/python/bin/` ## Scope boundary - ✅ PR #2 = binary + generation pipeline + generated types foundation - ❌ PR #2 does **not** include examples/integration logic polish (that is PR #3) ## Validation - Ran: `python scripts/update_sdk_artifacts.py --channel stable` - Regenerated and committed resulting generated artifacts - Local tests pass on branch

Addresses #13586 This doesn't affect our CI scripts. It was user-reported. Summary - add `wiremock::ResponseTemplate` and `body_string_contains` imports behind `#[cfg(not(debug_assertions))]` in `codex-rs/core/tests/suite/view_image.rs` so release builds only pull the helpers they actually use

Replace the Unix shell lookup path in `codex-rs/core/src/shell.rs` to use `libc::getpwuid_r()` instead of `libc::getpwuid()` when resolving the current user's shell. Why: - `getpwuid()` can return pointers into libc-managed shared storage - on the musl static Linux build, concurrent callers can race on that storage - this matches the crash pattern reported in tmux/Linux sessions with parallel shell activity Refs: - Fixes #13842

There are some bug investigations that currently require us to ask users for their user ID even though they've already uploaded logs and session details via `/feedback`. This frustrates users and increases the time for diagnosis. This PR includes the ChatGPT user ID in the metadata uploaded for `/feedback` (both the TUI and app-server).

Summary - document output types for the various tool handlers and registry so the API exposes richer descriptions - update unified execution helpers and client tests to align with the new output metadata - clean up unused helpers across tool dispatch paths Testing - Not run (not requested)

## Summary - run the split stdout/stderr PTY test through the normal shell helper on every platform - use a Windows-native command string instead of depending on Python to emit split streams - assert CRLF line endings on Windows explicitly ## Why this fixes the flake The earlier PTY split-output test used a Python one-liner on Windows while the rest of the file exercised shell-command behavior. That made the test depend on runner-local Python availability and masked the real Windows shell output shape. Using a native cmd-compatible command and asserting the actual CRLF output makes the split stdout/stderr coverage deterministic on Windows runners.

We already have a type to represent the MCP tool output, reuse it instead of the custom McpHandlerOutput

Fixes a Codex app bug where quitting the app mid-run could leave the reopened thread stuck in progress and non-interactable. On cold thread resume, app-server could return an idle thread with a replayed turn still marked in progress. This marks incomplete replayed turns as interrupted unless the thread is actually active.

This updates the `skill-creator` sample skill to explicitly cover forward-testing as part of the skill authoring workflow. The guidance now treats subagent-based validation as a first-class step for complex or fragile skills, with an emphasis on preserving evaluation integrity and avoiding leaked context. The sample initialization script is also updated so newly created skills point authors toward forward-testing after validation. Together, these changes make the sample more opinionated about how skills should be iterated on once the initial implementation is complete. - Add new guidance to `SKILL.md` on protecting validation integrity, when to use subagents for forward-testing, and how to structure realistic test prompts without leaking expected answers. - Expand the skill creation workflow so iteration explicitly includes forward-testing for complex skills, including approval guidance for expensive or risky validation runs.

## Summary - add the OpenAI Docs skill under codex-rs/skills/src/assets/samples/openai-docs - include the skill metadata, assets, and GPT-5.4 upgrade reference files - exclude the test harness and test fixtures ## Testing - not run (skill-only asset copy)

## Summary - add ARC monitor support for MCP tool calls by serializing MCP approval requests into the ARC action shape and sending the relevant conversation/policy context to the `/api/codex/safety/arc` endpoint - route ARC outcomes back into MCP approval flow so `ask-user` falls back to a user prompt and `steer-model` blocks the tool call, with guardian/ARC tests covering the new request shape - update the TUI approval copy from “Approve Once” to “Allow” / “Allow for this session” and refresh the related snapshots --------- Co-authored-by: Fouad Matin <fouad@openai.com> Co-authored-by: Fouad Matin <169186268+fouad-openai@users.noreply.github.com>

Add forceRemoteSync to plugin/list. When it is set to True, we will sync the local plugin status with the remote one (backend-api/plugins/list).

#14232) …e package ## Summary This changes the Python SDK packaging model so we no longer commit `codex` binaries into `sdk/python`. Instead, published SDK builds now depend on a separate `codex-cli-bin` runtime package that carries the platform-specific `codex` binary. The SDK and runtime can be staged together with an exact version pin, so the published Python SDK still resolves to a Codex version we know is compatible. The SDK now resolves `codex` in this order: - `AppServerConfig.codex_bin` if explicitly set - installed `codex-cli-bin` runtime package There is no `PATH` fallback anymore. Published installs either use the pinned runtime or fail loudly, and local development uses an explicit `codex_bin` override when working from the repo. ## What changed - removed checked-in binaries from `sdk/python/src/codex_app_server/bin` - changed `AppServerClient` to resolve `codex` from: - explicit `AppServerConfig.codex_bin` - installed `codex-cli-bin` - kept `AppServerConfig.codex_bin` override support for local/dev use - added a new `sdk/python-runtime` package template for the pinned runtime - updated `scripts/update_sdk_artifacts.py` to stage releasable SDK/runtime packages instead of downloading binaries into the repo - made `codex-cli-bin` build as a platform-specific wheel - made `codex-cli-bin` wheel-only by rejecting `sdist` builds - updated docs/tests to match the new packaging flow and explicit local-dev contract ## Why Checking in six platform binaries made the repo much heavier and tied normal source changes to release artifacts. This keeps the compatibility guarantees we want, but moves them into packaging: - the published SDK can depend on an exact `codex-cli-bin==...` - the runtime package carries the platform-specific binary - users still get a pinned runtime - the repo no longer needs to store those binaries It also makes the runtime contract stricter and more predictable: - published installs never silently fall back to an arbitrary `codex` on `PATH` - local development remains supported through explicit `codex_bin` - `codex-cli-bin` is distributed as platform wheels only, which avoids unsafe source-distribution installs for a package that embeds a prebuilt binary ## Validation - ran targeted Python SDK tests: - `python3 -m pytest sdk/python/tests/test_artifact_workflow_and_binaries.py sdk/python/tests/test_client_rpc_methods.py sdk/python/tests/test_contract_generation.py` - exercised the staging flow with a local dummy binary to verify SDK/runtime staging end to end - verified the staged runtime package builds a platform-specific wheel (`Root-Is-Purelib: false`) rather than a universal `py3-none-any` wheel - added test coverage for the explicit-only runtime resolution model - added test coverage that `codex-cli-bin` rejects `sdist` builds --------- Co-authored-by: sdcoffey <stevendcoffey@gmail.com>

- add `model` and `reasoning_effort` to the `spawn_agent` schema so the values pass through - validate requested models against `model.model` and only check that the selected model supports the requested reasoning effort --------- Co-authored-by: Codex <noreply@openai.com>

image-gen feature will have the model saving to /tmp by default + at all times

…Reject (#14191) ## Summary This change makes `AskForApproval::Reject` gate correctly anywhere it appears inside otherwise-stable app-server protocol types. Previously, experimental gating for `approval_policy: Reject` was handled with request-specific logic in `ClientRequest` detection. That covered a few request params types, but it did not generalize to other nested uses such as `ProfileV2`, `Config`, `ConfigReadResponse`, or `ConfigRequirements`. This PR replaces that ad hoc handling with a generic nested experimental propagation mechanism. ## Testing seeing this when run app-server-test-client without experimental api enabled: ``` initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.3.1; arm64) vscode/2.4.36 (codex-toy-app-server; 0.0.0)" } > { > "id": "50244f6a-270a-425d-ace0-e9e98205bde7", > "method": "thread/start", > "params": { > "approvalPolicy": { > "reject": { > "mcp_elicitations": false, > "request_permissions": true, > "rules": false, > "sandbox_approval": true > } > }, > "baseInstructions": null, > "config": null, > "cwd": null, > "developerInstructions": null, > "dynamicTools": null, > "ephemeral": null, > "experimentalRawEvents": false, > "mockExperimentalField": null, > "model": null, > "modelProvider": null, > "persistExtendedHistory": false, > "personality": null, > "sandbox": null, > "serviceName": null > } > } < { < "error": { < "code": -32600, < "message": "askForApproval.reject requires experimentalApi capability" < }, < "id": "50244f6a-270a-425d-ace0-e9e98205bde7" < } [verified] thread/start rejected approvalPolicy=Reject without experimentalApi ``` --------- Co-authored-by: celia-oai <celia@openai.com>

…de (#14236) Summary - drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers to keep MCP tooling focused on the final result shape - wire the new schema definitions through code mode, context, handlers, and spec modules so MCP tools serialize the exact output shape expected by the model - extend code mode tests to cover multiple MCP call scenarios and ensure the serialized data matches the new schema - refresh JS runner helpers and protocol models alongside the schema changes Testing - Not run (not requested)

Summary - document that `@openai/code_mode` exposes `set_max_output_tokens_per_exec_call` and that `code_mode` truncates the final Rust-side output when the budget is exceeded - enforce the configured budget in the Rust tool runner, reusing truncation helpers so text-only outputs follow the unified-exec wrapper and mixed outputs still fit within the limit - ensure the new behavior is covered by a code-mode integration test and string spec update Testing - Not run (not requested)

- raise the sdk workflow job timeout from 10 to 15 minutes to reduce false cancellations near the current limit --------- Co-authored-by: Codex <noreply@openai.com>

- clarify the `close_agent` tool description so it nudges models to close agents they no longer need - keep the change scoped to the tool spec text only Co-authored-by: Codex <noreply@openai.com>

Summary - document how code-mode can import `output_text`/`output_image` and ensure `add_content` stays compatible - add a synthetic `@openai/code_mode` module that appends content items and validates inputs - cover the new behavior with integration tests for structured text and image outputs Testing - Not run (not requested)

### Summary This PR adds first-class ephemeral support to thread/fork, bringing it in line with thread/start. The goal is to support one-off completions on full forked threads without persisting them as normal user-visible threads. ### Testing

…ontacts, Reminders (#14155) Add additional macOS Sandbox Permissions levers for the following: - Launch Services - Contacts - Reminders

Pass more params to /compact. This should give us parity with the /responses endpoint to improve caching. I'm torn about the MCP await. Blocking will give us parity but it seems like we explicitly don't block on MCPs. Happy either way

adds support for transferring state across code mode invocations.

## Summary - add `skill_approval` to `RejectConfig` and the app-server v2 `AskForApproval::Reject` payload so skill-script prompts can be configured independently from sandbox and rule-based prompts - update Unix shell escalation to reject prompts based on the actual decision source, keeping prefix rules tied to `rules`, unmatched command fallbacks tied to `sandbox_approval`, and skill scripts tied to `skill_approval` - regenerate the affected protocol/config schemas and expand unit/integration coverage for the new flag and skill approval behavior

## What changed - keep the explicit stdin-close behavior after writing so the child still receives EOF deterministically - on Windows, stop using `python -c` for the round-trip assertion and instead run a native `cmd.exe` pipeline that reads one line from stdin with `set /p` and echoes it back - send ` ` on Windows so the stdin payload matches the platform-native line ending the shell reader expects ## Why this fixes flakiness The failing branch-local flake was not in `spawn_pipe_process` itself. The child exited cleanly, but the Windows ARM runner sometimes produced an empty stdout string when the test used Python as the stdin consumer. That makes the test sensitive to Python startup and stdin-close timing rather than the pipe primitive we actually want to validate. Switching the Windows path to a native `cmd.exe` reader keeps the assertion focused on our pipe behavior: bytes written to stdin should come back on stdout before EOF closes the process. The explicit ` ` write removes line-ending ambiguity on Windows. ## Scope - test-only - no production logic change

## Summary - update the guardian prompting - clarify the guardian rejection message so an action may still proceed if the user explicitly approves it after being informed of the risk ## Testing - cargo run on selected examples

canvrno-oai

Tested and working, very cool stuff. Added two comments for small issues, nothing blocking. During testing I had a small hiccup (of my own making) since the connector I tested with had been installed and disabled previously. We should think about how to handle this when the connector is disabled, there are a few ways to look at this.

canvrno-oai · 2026-03-11T17:42:25Z

codex-rs/core/src/codex.rs

    };
+    let auth = sess.services.auth_manager.auth().await;
+    let discoverable_connectors =
+        if turn_context.features.enabled(Feature::Apps) && turn_context.tools_config.search_tool {


Should this be gated by the Feature::ToolSuggest feature?

canvrno-oai · 2026-03-11T17:59:56Z

codex-rs/core/src/tools/handlers/tool_suggest.rs

+                    connectors::list_tool_suggest_discoverable_connectors_with_auth(
+                        &turn.config,
+                        auth.as_ref(),
+                        &[],


If we pass an empty set here the downstream accessible-connectors filtering will not filter out connectors that are already accessible.

@mzeng-openai is this a problem? the comment here seems true.

…ggest_2

mzeng-openai · 2026-03-11T22:55:55Z

codex-rs/chatgpt/src/connectors.rs

@@ -1,25 +1,18 @@
-use std::collections::HashMap;


Move most of the logic to the connectors mod so that we can share the logic and persist global connectors cache in one place.

…ggest_2

codex-rs/core/src/tools/handlers/tool_suggest.rs

sayan-oai · 2026-03-12T01:21:46Z

codex-rs/core/templates/search_tool/tool_suggest_description.md

+   - `tool_type`: `connector` or `plugin`
+   - `action_type`: `install` or `enable`


right now only connector + install is supported. i know we will fast-follow with others, but should we include those in the tool description now? dont want to confuse the model.

updated to support plugins.

codex-rs/core/templates/search_tool/tool_suggest_description.md

codex-rs/core/src/tools/handlers/tool_suggest.rs

codex-rs/tui/src/bottom_pane/mcp_server_elicitation.rs

sayan-oai · 2026-03-12T02:26:50Z

codex-rs/tui/src/bottom_pane/app_link_view.rs

+        };
+        self.app_event_tx.send(AppEvent::SubmitThreadOp {
+            thread_id: target.thread_id,
+            op: Op::ResolveElicitation {


maybe nit: should app_link_view know about resolving elicitations, request_id, etc? its mostly rendering/input handling, should we just return an outcome and consolidate the elicitation parsing + resolution in a single spot in the bottom_pane mod?

…ggest_2

sayan-oai · 2026-03-12T02:41:33Z

codex-rs/core/src/tools/handlers/tool_suggest.rs

+        turn_id: Some(turn_id),
+        server_name: CODEX_APPS_MCP_SERVER_NAME.to_string(),
+        request: McpServerElicitationRequest::Form {
+            meta: Some(build_tool_suggestion_meta(


will this be cleaned up in the follow up you mentioned here?

it doesnt seem great to shove all the tool suggestion stuff into this meta field like i mentioned.

sayan-oai

approving so we can start testing. i think the elicitation parsing and resolution should be consolidated, right now it feels like the TUI needs to know too much about how things are passed and structured.

shaqayeq-oai and others added 30 commits March 10, 2026 01:00

Reuse McpToolOutput in McpHandler (#14229)

46e6661

We already have a type to represent the MCP tool output, reuse it instead of the custom McpHandlerOutput

Add OpenAI Docs skill (#13596)

01e2c3b

## Summary - add the OpenAI Docs skill under codex-rs/skills/src/assets/samples/openai-docs - include the skill metadata, assets, and GPT-5.4 upgrade reference files - exclude the test harness and test fixtures ## Testing - not run (skill-only asset copy)

feat: Allow sync with remote plugin status. (#14176)

2544bd0

Add forceRemoteSync to plugin/list. When it is set to True, we will sync the local plugin status with the remote one (backend-api/plugins/list).

Add granular metrics for cloud requirements load (#14108)

22d0aea

unifying all image saves to /tmp to bug-proof (#14149)

2893476

image-gen feature will have the model saving to /tmp by default + at all times

Load agent metadata from role files (#14177)

b732287

Increase sdk workflow timeout to 15 minutes (#14252)

44bfd2f

- raise the sdk workflow job timeout from 10 to 15 minutes to reduce false cancellations near the current limit --------- Co-authored-by: Codex <noreply@openai.com>

Clarify close_agent tool description (#14269)

bf936fa

- clarify the `close_agent` tool description so it nudges models to close agents they no longer need - keep the change scoped to the tool spec text only Co-authored-by: Codex <noreply@openai.com>

feat: Add additional macOS Sandbox Permissions for Launch Services, C…

de2a73c

…ontacts, Reminders (#14155) Add additional macOS Sandbox Permissions levers for the following: - Launch Services - Contacts - Reminders

Pass more params to compaction (#14247)

f8ef154

Pass more params to /compact. This should give us parity with the /responses endpoint to improve caching. I'm torn about the MCP await. Blocking will give us parity but it seems like we explicitly don't block on MCPs. Happy either way

Add store/load support for code mode (#14259)

18199d4

adds support for transferring state across code mode invocations.

prompt changes to guardian (#14263)

16daab6

## Summary - update the guardian prompting - clarify the guardian rejection message so an action may still proceed if the user explicitly approves it after being informed of the risk ## Testing - cargo run on selected examples

canvrno-oai approved these changes Mar 11, 2026

View reviewed changes

mzeng-openai requested review from pakrym-oai and sayan-oai March 11, 2026 18:23

bolinfest force-pushed the main branch from 402ca6c to f5bb338 Compare March 11, 2026 19:49

mzeng-openai added 2 commits March 11, 2026 15:48

Merge branch 'main' of github.com:openai/codex into dev/mzeng/tool_su…

7a04154

…ggest_2

update

ef99210

mzeng-openai commented Mar 11, 2026

View reviewed changes

mzeng-openai added 4 commits March 11, 2026 16:00

update

c0a1bfa

Merge branch 'main' of github.com:openai/codex into dev/mzeng/tool_su…

9a77aa5

…ggest_2

update

9441037

update

976f491