chore: sync fork with latest OpenAI Codex upstream by ARCJ137442 · Pull Request #33 · exomind-team/codex

ARCJ137442 · 2026-03-07T11:47:05Z

Summary

Sync the Exomind fork forward by merging upstream/main into main lineage on a dedicated branch.

What changed

Merged openai/codex upstream/main into the fork sync branch
Preserved existing Exomind queue / steer / repeating-message behavior during the merge
Fixed one compatibility break introduced by the upstream merge:
- an upstream-added UserMessage test initializer in codex-rs/tui/src/chatwidget/tests.rs was missing Exomind fork fields (repeat_mode, steer_mode, enqueue_seq)

Why this PR exists

The fork was on a moving upstream-main-derived baseline but had fallen behind current openai/codex main. This PR establishes a fresh synchronized checkpoint before Exomind-specific version-line work continues.

Tracking issue: #32

Local validation

Passed locally using G:\cargo-target-sync-upstream as the target dir:

cargo fmt --all --check
cargo test -p codex-tui
cargo test -p codex-cli
cargo build --bin codex
G:\cargo-target-sync-upstream\debug\codex.exe --version

Notes

The built binary still reports codex-cli 0.0.0
Version-line / manifest / release-workflow changes should be handled as follow-up work after this sync checkpoint

To save memory

`/feedback` uploads can include `codex-logs.log` from the in-memory feedback logger path. That logger was emitting level + message without a timestamp, which made some uploaded logs much harder to inspect. This change makes the feedback logger use an explicit timer so feedback-captured log lines include timestamps consistently. This is not Windows-specific code. The bug showed up in Windows reports because those uploads were hitting the feedback-buffer path more often, while Linux/macOS reports were typically coming from the SQLite feedback export, which already prefixes timestamps. Here's an example of a log that is missing the timestamps: ``` TRACE app-server request: getAuthStatus TRACE app-server request: model/list INFO models cache: evaluating cache eligibility INFO models cache: attempting load_fresh INFO models cache: loaded cache file INFO models cache: cache version mismatch INFO models cache: no usable cache entry DEBUG INFO models cache: cache miss, fetching remote models TRACE windows::current_platform is called TRACE Returning Info { os_type: Windows, version: Semantic(10, 0, 26200), edition: Some("Windows 11 Professional"), codename: None, bitness: X64, architecture: Some("x86_64") } ```

## Summary - add one-time session recovery in `RmcpClient` for streamable HTTP MCP `404` session expiry - rebuild the transport and retry the failed operation once after reinitializing the client state - extend the test server and integration coverage for `404`, `401`, single-retry, and non-session failure scenarios ## Testing - just fmt - cargo test -p codex-rmcp-client (the post-rebase run lost its final summary in the terminal; the suite had passed earlier before the rebase) - just fix -p codex-rmcp-client

avoid DB explosion. This is a temp solution

## Summary Today `SandboxPermissions::requires_additional_permissions()` does not actually mean "is `WithAdditionalPermissions`". It returns `true` for any non-default sandbox override, including `RequireEscalated`. That broad behavior is relied on in multiple `main` callsites. The naming is security-sensitive because `SandboxPermissions` is used on shell-like tool calls to tell the executor how a single command should relate to the turn sandbox: - `UseDefault`: run with the turn sandbox unchanged - `RequireEscalated`: request execution outside the sandbox - `WithAdditionalPermissions`: stay sandboxed but widen permissions for that command only ## Problem The old helper name reads as if it only applies to the `WithAdditionalPermissions` variant. In practice it means "this command requested any explicit sandbox override." That ambiguity made it easy to read production checks incorrectly and made the guardian change look like a standalone `main` fix when it is not. On `main` today: - `shell` and `unified_exec` intentionally reject any explicit `sandbox_permissions` request unless approval policy is `OnRequest` - `exec_policy` intentionally treats any explicit sandbox override as prompt-worthy in restricted sandboxes - tests intentionally serialize both `RequireEscalated` and `WithAdditionalPermissions` as explicit sandbox override requests So changing those callsites from the broad helper to a narrow `WithAdditionalPermissions` check would be a behavior change, not a pure cleanup. ## What This PR Does - documents `SandboxPermissions` as a per-command sandbox override, not a generic permissions bag - adds `requests_sandbox_override()` for the broad meaning: anything except `UseDefault` - adds `uses_additional_permissions()` for the narrow meaning: only `WithAdditionalPermissions` - keeps `requires_additional_permissions()` as a compatibility alias to the broad meaning for now - updates the current broad callsites to use the accurately named broad helper - adds unit coverage that locks in the semantics of all three helpers ## What This PR Does Not Do This PR does not change runtime behavior. That is intentional. --------- Co-authored-by: Codex <noreply@openai.com>

…ake (openai#13770) This fixes a flaky `turn_start_shell_zsh_fork_executes_command_v2` test. The interrupt path can race with the follow-up `/responses` request that reports the aborted tool call, so the test now allows that extra no-op response instead of assuming there will only ever be one request. The assertions still stay focused on the behavior the test actually cares about: starting the zsh-forked command correctly. Testing: - `just fmt` - `cargo test -p codex-app-server --test all suite::v2::turn_start_zsh_fork::turn_start_shell_zsh_fork_executes_command_v2 -- --exact --nocapture`

…enai#13630) ### Summary This adds turn-level latency metrics for the first model output and the first completed agent message. - `codex.turn.ttft.duration_ms` starts at turn start and records on the first output signal we see from the model. That includes normal assistant text, reasoning deltas, and non-text outputs like tool-call items. - `codex.turn.ttfm.duration_ms` also starts at turn start, but it records when the first agent message finishes streaming rather than when its first delta arrives. ### Implementation notes The timing is tracked in codex-core, not app-server, so the definition stays consistent across CLI, TUI, and app-server clients. I reused the existing turn lifecycle boundary that already drives `codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn state, and record each metric once per turn. I also wired the new metric names into the OTEL runtime metrics summary so they show up in the same in-memory/debug snapshot path as the existing timing metrics.

## Summary - move sqlite log reads and writes onto a dedicated `logs_1.sqlite` database to reduce lock contention with the main state DB - add a dedicated logs migrator and route `codex-state-logs` to the new database path - leave the old `logs` table in the existing state DB untouched for now ## Testing - just fmt - cargo test -p codex-state --------- Co-authored-by: Codex <noreply@openai.com>

This branch: * Avoid flushing DB when not necessary * Filter events for which we perfom an `upsert` into the DB * Add a dedicated update function of the `thread:updated_at` that is lighter This should significantly reduce the DB lock contention. If it is not sufficient, we can de-sync the flush of the DB for `updated_at`

#### What Add structured `@plugin` parsing and TUI support for plugin mentions. - Core: switch from plain-text `@display_name` parsing to structured `plugin://...` mentions via `UserInput::Mention` and `[$...](plugin://...)` links in text, same pattern as apps/skills. - TUI: add plugin mention popup, autocomplete, and chips when typing `$`. Load plugin capability summaries and feed them into the composer; plugin mentions appear alongside skills and apps. - Generalize mention parsing to a sigil parameter, still defaults to `$` <img width="797" height="119" alt="image" src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb" /> Builds on openai#13510. Currently clients have to build their own `id` via `plugin@marketplace` and filter plugins to show by `enabled`, but we will add `id` and `available` as fields returned from `plugin/list` soon. ####Tests Added tests, verified locally.

## Summary - reduce the SQLite-backed log retention window from 90 days to 10 days ## Testing - just fmt - cargo test -p codex-state Co-authored-by: Codex <noreply@openai.com>

…n file (openai#13780) At over 7,000 lines, `codex-rs/core/src/config/mod.rs` was getting a bit unwieldy. This PR does the same type of move as openai#12957 to put unit tests in their own file, though I decided `config_tests.rs` is a more intuitive name than `mod_tests.rs`. Ultimately, I'll codemod the rest of the codebase to follow suit, but I want to do it in stages to reduce merge conflicts for people.

openai#13783) This is analogous to openai#13780.

I believe this broke in openai#13772.

## Summary - reject the global `*` domain pattern in proxy allow/deny lists and managed constraints introduced for testing earlier - keep exact hosts plus scoped wildcards like `*.example.com` and `**.example.com` - update docs and regression tests for the new invalid-config behavior

Publish CLI releases to winget. Uses https://github.com/vedantmgoyal9/winget-releaser to greatly reduce boilerplate needed to create winget-pkgs manifets

## Summary This is a structural cleanup of `codex-otel` to make the ownership boundaries a lot clearer. For example, previously it was quite confusing that `OtelManager` which emits log + trace event telemetry lived under `codex-rs/otel/src/traces/`. Also, there were two places that defined methods on OtelManager via `impl OtelManager` (`lib.rs` and `otel_manager.rs`). What changed: - move the `OtelProvider` implementation into `src/provider.rs` - move `OtelManager` and session-scoped event emission into `src/events/otel_manager.rs` - collapse the shared log/trace event helpers into `src/events/shared.rs` - pull target classification into `src/targets.rs` - move `traceparent_context_from_env()` into `src/trace_context.rs` - keep `src/otel_provider.rs` as a compatibility shim for existing imports - update the `codex-otel` README to reflect the new layout ## Why `lib.rs` and `otel_provider.rs` were doing too many different jobs at once: provider setup, export routing, trace-context helpers, and session event emission all lived together. This refactor separates those concerns without trying to change the behavior of the crate. The goal is to make future OTEL work easier to reason about and easier to review. ## Notes - no intended behavior change - `OtelManager` remains the session-scoped event emitter in this PR - the `otel_provider` shim keeps downstream churn low while the internals move around ## Validation - `just fmt` - `cargo test -p codex-otel` - `just fix -p codex-otel`

## Problem Browser login failures historically leave support with an incomplete picture. HARs can show that the browser completed OAuth and reached the localhost callback, but they do not explain why the native client failed on the final `/oauth/token` exchange. Direct `codex login` also relied mostly on terminal stderr and the browser error page, so even when the login crate emitted better sign-in diagnostics through TUI or app-server flows, the one-shot CLI path still did not leave behind an easy artifact to collect. ## Mental model This implementation treats the browser page, the returned `io::Error`, and the normal structured log as separate surfaces with different safety requirements. The browser page and returned error preserve the detail that operators need to diagnose failures. The structured log stays narrower: it records reviewed lifecycle events, parsed safe fields, and redacted transport errors without becoming a sink for secrets or arbitrary backend bodies. Direct `codex login` now adds a fourth support surface: a small file-backed log at `codex-login.log` under the configured `log_dir`. That artifact carries the same login-target events as the other entrypoints without changing the existing stderr/browser UX. ## Non-goals This does not add auth logging to normal runtime requests, and it does not try to infer precise transport root causes from brittle string matching. The scope remains the browser-login callback flow in the `login` crate plus a direct-CLI wrapper that persists those events to disk. This also does not try to reuse the TUI logging stack wholesale. The TUI path initializes feedback, OpenTelemetry, and other session-oriented layers that are useful for an interactive app but unnecessary for a one-shot login command. ## Tradeoffs The implementation favors fidelity for caller-visible errors and restraint for persistent logs. Parsed JSON token-endpoint errors are logged safely by field. Non-JSON token-endpoint bodies remain available to the returned error so CLI and browser surfaces still show backend detail. Transport errors keep their real `reqwest` message, but attached URLs are surgically redacted. Custom issuer URLs are sanitized before logging. On the CLI side, the code intentionally duplicates a narrow slice of the TUI file-logging setup instead of sharing the full initializer. That keeps `codex login` easy to reason about and avoids coupling it to interactive-session layers that the command does not need. ## Architecture The core auth behavior lives in `codex-rs/login/src/server.rs`. The callback path now logs callback receipt, callback validation, token-exchange start, token-exchange success, token-endpoint non-2xx responses, and transport failures. App-server consumers still use this same login-server path via `run_login_server(...)`, so the same instrumentation benefits TUI, Electron, and VS Code extension flows. The direct CLI path in `codex-rs/cli/src/login.rs` now installs a small file-backed tracing layer for login commands only. That writes `codex-login.log` under `log_dir` with login-specific targets such as `codex_cli::login` and `codex_login::server`. ## Observability The main signals come from the `login` crate target and are intentionally scoped to sign-in. Structured logs include redacted issuer URLs, redacted transport errors, HTTP status, and parsed token-endpoint fields when available. The callback-layer log intentionally avoids `%err` on token-endpoint failures so arbitrary backend bodies do not get copied into the normal log file. Direct `codex login` now leaves a durable artifact for both failure and success cases. Example output from the new file-backed CLI path: Failing callback: ```text 2026-03-06T22:08:54.143612Z INFO codex_cli::login: starting browser login flow 2026-03-06T22:09:03.431699Z INFO codex_login::server: received login callback path=/auth/callback has_code=false has_state=true has_error=true state_valid=true 2026-03-06T22:09:03.431745Z WARN codex_login::server: oauth callback returned error error_code="access_denied" has_error_description=true ``` Succeeded callback and token exchange: ```text 2026-03-06T22:09:14.065559Z INFO codex_cli::login: starting browser login flow 2026-03-06T22:09:36.431678Z INFO codex_login::server: received login callback path=/auth/callback has_code=true has_state=true has_error=false state_valid=true 2026-03-06T22:09:36.436977Z INFO codex_login::server: starting oauth token exchange issuer=https://auth.openai.com/ redirect_uri=http://localhost:1455/auth/callback 2026-03-06T22:09:36.685438Z INFO codex_login::server: oauth token exchange succeeded status=200 OK ``` ## Tests - `cargo test -p codex-login` - `cargo clippy -p codex-login --tests -- -D warnings` - `cargo test -p codex-cli` - `just bazel-lock-update` - `just bazel-lock-check` - manual direct `codex login` smoke tests for both a failing callback and a successful browser login --------- Co-authored-by: Codex <noreply@openai.com>

…#13695) Enhance pty utils: * Support closing stdin * Separate stderr and stdout streams to allow consumers differentiate them * Provide compatibility helper to merge both streams back into combined one * Support specifying terminal size for pty, including on-demand resizes while process is already running * Support terminating the process while still consuming its outputs

## Summary Clarify the `js_repl` prompt guidance around persistent bindings and redeclaration recovery. This updates the generated `js_repl` instructions in `core/src/project_doc.rs` to prefer this order when a name is already bound: 1. Reuse the existing binding 2. Reassign a previously declared `let` 3. Pick a new descriptive name 4. Use `{ ... }` only for short-lived scratch scope 5. Reset the kernel only when a clean state is actually needed The prompt now also explicitly warns against wrapping an entire cell in block scope when the goal is to reuse names across later cells. ## Why The previous wording still left too much room for low-value workarounds like whole-cell block wrapping. In downstream browser rollouts, that pattern was adding tokens and preventing useful state reuse across `js_repl` cells. This change makes the preferred behavior more explicit without changing runtime semantics. ## Scope - Prompt/documentation change only - No runtime behavior changes - Updates the matching string-backed `project_doc` tests

## Summary Remove `docs/auth-login-logging-plan.md`. ## Why The document was a temporary planning artifact. The durable rationale for the auth-login diagnostics work now lives in the code comments, tests, PR context, and existing implementation notes, so keeping the standalone plan doc adds duplicate maintenance surface. ## Testing - not run (docs-only deletion) Co-authored-by: Codex <noreply@openai.com>

…guage in config.toml (openai#13434) ## Why `SandboxPolicy` currently mixes together three separate concerns: - parsing layered config from `config.toml` - representing filesystem sandbox state - carrying basic network policy alongside filesystem choices That makes the existing config awkward to extend and blocks the new TOML proposal where `[permissions]` becomes a table of named permission profiles selected by `default_permissions`. (The idea is that if `default_permissions` is not specified, we assume the user is opting into the "traditional" way to configure the sandbox.) This PR adds the config-side plumbing for those profiles while still projecting back to the legacy `SandboxPolicy` shape that the current macOS and Linux sandbox backends consume. It also tightens the filesystem profile model so scoped entries only exist for `:project_roots`, and so nested keys must stay within a project root instead of using `.` or `..` traversal. This drops support for the short-lived `[permissions.network]` in `config.toml` because now that would be interpreted as a profile named `network` within `[permissions]`. ## What Changed - added `PermissionsToml`, `PermissionProfileToml`, `FilesystemPermissionsToml`, and `FilesystemPermissionToml` so config can parse named profiles under `[permissions.<profile>.filesystem]` - added top-level `default_permissions` selection, validation for missing or unknown profiles, and compilation from a named profile into split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` values - taught config loading to choose between the legacy `sandbox_mode` path and the profile-based path without breaking legacy users - introduced `codex-protocol::permissions` for the split filesystem and network sandbox types, and stored those alongside the legacy projected `sandbox_policy` in runtime `Permissions` - modeled `FileSystemSpecialPath` so only `ProjectRoots` can carry a nested `subpath`, matching the intended config syntax instead of allowing invalid states for other special paths - restricted scoped filesystem maps to `:project_roots`, with validation that nested entries are non-empty descendant paths and cannot use `.` or `..` to escape the project root - kept existing runtime consumers working by projecting `FileSystemSandboxPolicy` back into `SandboxPolicy`, with an explicit error for profiles that request writes outside the workspace root - loaded proxy settings from top-level `[network]` - regenerated `core/config.schema.json` ## Verification - added config coverage for profile deserialization, `default_permissions` selection, top-level `[network]` loading, network enablement, rejection of writes outside the workspace root, rejection of nested entries for non-`:project_roots` special paths, and rejection of parent-directory traversal in `:project_roots` maps - added protocol coverage for the legacy bridge rejecting non-workspace writes ## Docs - update the Codex config docs on developers.openai.com/codex to document named `[permissions.<profile>]` entries, `default_permissions`, scoped `:project_roots` syntax, the descendant-path restriction for nested `:project_roots` entries, and top-level `[network]` proxy configuration --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13434). * openai#13453 * openai#13452 * openai#13451 * openai#13449 * openai#13448 * openai#13445 * openai#13440 * openai#13439 * __->__ openai#13434

- add experimental_realtime_ws_startup_context to override or disable realtime websocket startup context - preserve generated startup context when unset and cover the new override paths in tests

…uilding Codex (openai#13814) I mainly use the devcontainer to be able to run `cargo clippy --tests` locally for Linux. We still need to make it possible to run clippy from Bazel so I don't need to do this!

## Summary This is a purely mechanical refactor of `OtelManager` -> `SessionTelemetry` to better convey what the struct is doing. No behavior change. ## Why `OtelManager` ended up sounding much broader than what this type actually does. It doesn't manage OTEL globally; it's the session-scoped telemetry surface for emitting log/trace events and recording metrics with consistent session metadata (`app_version`, `model`, `slug`, `originator`, etc.). `SessionTelemetry` is a more accurate name, and updating the call sites makes that boundary a lot easier to follow. ## Validation - `just fmt` - `cargo test -p codex-otel` - `cargo test -p codex-core`

1. Add a synced curated plugin marketplace and include it in marketplace discovery. 2. Expose optional plugin.json interface metadata in plugin/list 3. Tighten plugin and marketplace path handling using validated absolute paths. 4. Let manifests override skill, MCP, and app config paths. 5. Restrict plugin enablement/config loading to the user config layer so plugin enablement is at global level

…i#13791) ## Summary - Treat skill scripts with no permission profile, or an explicitly empty one, as permissionless and run them with the turn's existing sandbox instead of forcing an exec approval prompt. - Keep the approval flow unchanged for skills that do declare additional permissions. - Update the skill approval tests to assert that permissionless skill scripts do not prompt on either the initial run or a rerun. ## Why Permissionless skills should inherit the current turn sandbox directly. Prompting for exec approval in that case adds friction without granting any additional capability.

Previously, we could only configure whether web search was on/off. This PR enables sending along a web search config, which includes all the stuff responsesapi supports: filters, location, etc.

…penai#13640) * Add an ability to stream stdin, stdout, and stderr * Streaming of stdout and stderr has a configurable cap for total amount of transmitted bytes (with an ability to disable it) * Add support for overriding environment variables * Add an ability to terminate running applications (using `command/exec/terminate`) * Add TTY/PTY support, with an ability to resize the terminal (using `command/exec/resize`)

…porter (openai#13819) This fixes a schema export bug where two different `WebSearchAction` types were getting merged under the same name in the app-server v2 JSON schema bundle. The problem was that v2 thread items use the app-server API's `WebSearchAction` with camelCase variants like `openPage`, while `ThreadResumeParams.history` and `RawResponseItemCompletedNotification.item` pull in the upstream `ResponseItem` graph, which uses the Responses API snake_case shape like `open_page`. During bundle generation we were flattening nested definitions into the v2 namespace by plain name, so the later definition could silently overwrite the earlier one. That meant clients generating code from the bundled schema could end up with the wrong `WebSearchAction` definition for v2 thread history. In practice this shows up on web search items reconstructed from rollout files with persisted extended history. This change does two things: - Gives the upstream Responses API schema a distinct JSON schema name: `ResponsesApiWebSearchAction` - Makes namespace-level schema definition collisions fail loudly instead of silently overwriting

…3804) ## Summary - resolve trust roots by inspecting `.git` entries on disk instead of spawning `git rev-parse --git-common-dir` - keep regular repo and linked-worktree trust inheritance behavior intact - add a synthetic regression test that proves worktree trust resolution works without a real git command ## Testing - `just fmt` - `cargo test -p codex-core resolve_root_git_project_for_trust` - `cargo clippy -p codex-core --all-targets -- -D warnings` - `cargo test -p codex-core` (fails in this environment on unrelated managed-config `DangerFullAccess` tests in `codex::tests`, `tools::js_repl::tests`, and `unified_exec::tests`)

## Summary - treat `requirements.toml` `allowed_domains` and `denied_domains` as managed network baselines for the proxy - in restricted modes by default, build the effective runtime policy from the managed baseline plus user-configured allowlist and denylist entries, so common hosts can be pre-approved without blocking later user expansion - add `experimental_network.managed_allowed_domains_only = true` to pin the effective allowlist to managed entries, ignore user allowlist additions, and hard-deny non-managed domains without prompting - apply `managed_allowed_domains_only` anywhere managed network enforcement is active, including full access, while continuing to respect denied domains from all sources - add regression coverage for merged-baseline behavior, managed-only behavior, and full-access managed-only enforcement ## Behavior Assuming `requirements.toml` defines both `experimental_network.allowed_domains` and `experimental_network.denied_domains`. ### Default mode - By default, the effective allowlist is `experimental_network.allowed_domains` plus user or persisted allowlist additions. - By default, the effective denylist is `experimental_network.denied_domains` plus user or persisted denylist additions. - Allowlist misses can go through the network approval flow. - Explicit denylist hits and local or private-network blocks are still hard-denied. - When `experimental_network.managed_allowed_domains_only = true`, only managed `allowed_domains` are respected, user allowlist additions are ignored, and non-managed domains are hard-denied without prompting. - Denied domains continue to be respected from all sources. ### Full access - With managed requirements present, the effective allowlist is pinned to `experimental_network.allowed_domains`. - With managed requirements present, the effective denylist is pinned to `experimental_network.denied_domains`. - There is no allowlist-miss approval path in full access. - Explicit denylist hits are hard-denied. - `experimental_network.managed_allowed_domains_only = true` now also applies in full access, so managed-only behavior remains in effect anywhere managed network enforcement is active.

## Why `openai#13434` introduces split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy`, but the runtime still made most execution-time sandbox decisions from the legacy `SandboxPolicy` projection. That projection loses information about combinations like unrestricted filesystem access with restricted network access. In practice, that means the runtime can choose the wrong platform sandbox behavior or set the wrong network-restriction environment for a command even when config has already separated those concerns. This PR carries the split policies through the runtime so sandbox selection, process spawning, and exec handling can consult the policy that actually matters. ## What changed - threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through `TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state, unified exec, and app-server exec overrides - updated sandbox selection in `core/src/sandboxing/mod.rs` and `core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus `NetworkSandboxPolicy`, rather than inferring behavior only from the legacy `SandboxPolicy` - updated process spawning in `core/src/spawn.rs` and the platform wrappers to use `NetworkSandboxPolicy` when deciding whether to set `CODEX_SANDBOX_NETWORK_DISABLED` - kept additional-permissions handling and legacy `ExternalSandbox` compatibility projections aligned with the split policies, including explicit user-shell execution and Windows restricted-token routing - updated callers across `core`, `app-server`, and `linux-sandbox` to pass the split policies explicitly ## Verification - added regression coverage in `core/tests/suite/user_shell_cmd.rs` to verify `RunUserShellCommand` does not inherit `CODEX_SANDBOX_NETWORK_DISABLED` from the active turn - added coverage in `core/src/exec.rs` for Windows restricted-token sandbox selection when the legacy projection is `ExternalSandbox` - updated Linux sandbox coverage in `linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy exec path - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439). * openai#13453 * openai#13452 * openai#13451 * openai#13449 * openai#13448 * openai#13445 * openai#13440 * __->__ openai#13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>

…ai#13440) ## Why `openai#13434` and `openai#13439` introduce split filesystem and network policies, but the only code that could answer basic filesystem questions like "is access effectively unrestricted?" or "which roots are readable and writable for this cwd?" still lived on the legacy `SandboxPolicy` path. That would force later backends to either keep projecting through `SandboxPolicy` or duplicate path-resolution logic. This PR moves those queries onto `FileSystemSandboxPolicy` itself so later runtime and platform changes can consume the split policy directly. ## What changed - added `FileSystemSandboxPolicy` helpers for full-read/full-write checks, platform-default reads, readable roots, writable roots, and explicit unreadable roots resolved against a cwd - added a shared helper for the default read-only carveouts under writable roots so the legacy and split-policy paths stay aligned - added protocol coverage for full-access detection and derived readable, writable, and unreadable roots ## Verification - added protocol coverage in `protocol/src/protocol.rs` and `protocol/src/permissions.rs` for full-root access and derived filesystem roots - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13440). * openai#13453 * openai#13452 * openai#13451 * openai#13449 * openai#13448 * openai#13445 * __->__ openai#13440 * openai#13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>

…openai#13816) ## Summary - distinguish reject-policy handling for prefix-rule approvals versus sandbox approvals in Unix shell escalation - keep prompting for skill-script execution when `rules=true` but `sandbox_approval=false`, instead of denying the command up front - add regression coverage for both skill-script reject-policy paths in `codex-rs/core/tests/suite/skill_approval.rs`

…i#13833) ## Summary - require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf - reject relative cwd values at request parsing instead of normalizing them later in the setup flow - add RPC-layer coverage for relative cwd rejection and update the checked-in protocol schemas/docs ## Why windowsSandbox/setupStart was carrying the client-provided cwd as a raw PathBuf for command_cwd while config derivation normalized the same value into an absolute policy_cwd. That left room for relative-path ambiguity in the setup path, especially for inputs like cwd: "repo". Making the RPC accept only absolute paths removes that split entirely: the handler now receives one already-validated absolute path and uses it for both config derivation and setup. This keeps the trust model unchanged. Trusted clients could already choose the session cwd; this change is only about making the setup RPC reject relative paths so command_cwd and policy_cwd cannot diverge. ## Testing - cargo test -p codex-app-server windows_sandbox_setup (run locally by user) - cargo test -p codex-app-server-protocol windows_sandbox (run locally by user)

Addresses feature request openai#13660 Adds new option to `/statusline` so the status line can display "fast on" or "fast off" Summary - introduce a `FastMode` status-line item so `/statusline` can render explicit `Fast on`/`Fast off` text for the service tier - wire the item into the picker metadata and resolve its string from `ChatWidget` without adding any unrelated `thread-name` logic or storage changes - ensure the refresh paths keep the cached footer in sync when the service tier (fast mode) changes Testing - Manually tested Here's what it looks like when enabled: <img width="366" height="75" alt="image" src="https://github.com/user-attachments/assets/7f992d2b-6dab-49ed-aa43-ad496f56f193" />

## Why `apply_patch` safety approval was still checking writable paths through the legacy `SandboxPolicy` projection. That can hide explicit `none` carveouts when a split filesystem policy projects back to compatibility `ExternalSandbox`, which leaves one more approval path that can auto-approve writes inside paths that are intentionally blocked. ## What changed - passed `turn.file_system_sandbox_policy` into `assess_patch_safety` - changed writable-path checks to derive effective access from `FileSystemSandboxPolicy` instead of the legacy `SandboxPolicy` - made those checks reject explicit unreadable roots before considering broad write access or writable roots - added regression coverage showing that an `ExternalSandbox` compatibility projection still asks for approval when the split filesystem policy blocks a subpath ## Verification - `cargo test -p codex-core safety::tests::` - `cargo test -p codex-core test_sandbox_config_parsing` - `cargo clippy -p codex-core --all-targets -- -D warnings` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13445). * openai#13453 * openai#13452 * openai#13451 * openai#13449 * openai#13448 * __->__ openai#13445 * openai#13440 * openai#13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>

ARCJ137442 · 2026-03-07T13:04:05Z

CI fix follow-up pushed in 63ffdd2.\n\nRoot cause:\n-
ust-ci / Fork smoke (clippy + tests) was failing on codex-app-server-protocol::schema_fixtures::schema_fixtures_match_generated.\n- The vendored app-server schema fixtures in the fork were stale after the upstream sync, specifically around the command/exec schema surface.\n\nFix:\n- Refreshed the 11 affected schema fixture files and pushed est: refresh app-server schema fixtures.\n\nLocal verification:\n- cargo test -p codex-app-server-protocol schema_fixtures_match_generated -- --nocapture\n- cargo test -p codex-tui\n\nI am watching CI and will follow up again if any further failure shows up.

ARCJ137442 · 2026-03-07T13:32:00Z

Final CI status:\n\n- Fork smoke (clippy + tests) passed.\n- CI results (required) passed.\n- PR merge state is now CLEAN.\n\nThe upstream sync checkpoint is ready from a CI perspective.

ARCJ137442 · 2026-03-07T13:35:08Z

Post-CI recheck against openai/codex upstream:\n\n- This PR remains CI-green and mergeable.\n- However, after refetching upstream, openai/codex main has advanced by 1 new commit since this sync branch was cut.\n- Current upstream tip: cf143bf (eat: simplify DB further (openai#13771)).\n\nSo PR #33 is a good green sync checkpoint, but it is not the exact latest upstream head anymore.

…penai-main

## Summary - add the guardian reviewer flow for `on-request` approvals in command, patch, sandbox-retry, and managed-network approval paths - keep guardian behind `features.guardian_approval` instead of exposing a public `approval_policy = guardian` mode - route ordinary `OnRequest` approvals to the guardian subagent when the feature is enabled, without changing the public approval-mode surface ## Public model - public approval modes stay unchanged - guardian is enabled via `features.guardian_approval` - when that feature is on, `approval_policy = on-request` keeps the same approval boundaries but sends those approval requests to the guardian reviewer instead of the user - `/experimental` only persists the feature flag; it does not rewrite `approval_policy` - CLI and app-server no longer expose a separate `guardian` approval mode in this PR ## Guardian reviewer - the reviewer runs as a normal subagent and reuses the existing subagent/thread machinery - it is locked to a read-only sandbox and `approval_policy = never` - it does not inherit user/project exec-policy rules - it prefers `gpt-5.4` when the current provider exposes it, otherwise falls back to the parent turn's active model - it fail-closes on timeout, startup failure, malformed output, or any other review error - it currently auto-approves only when `risk_score < 80` ## Review context and policy - guardian mirrors `OnRequest` approval semantics rather than introducing a separate approval policy - explicit `require_escalated` requests follow the same approval surface as `OnRequest`; the difference is only who reviews them - managed-network allowlist misses that enter the approval flow are also reviewed by guardian - the review prompt includes bounded recent transcript history plus recent tool call/result evidence - transcript entries and planned-action strings are truncated with explicit `<guardian_truncated ... />` markers so large payloads stay bounded - apply-patch reviews include the full patch content (without duplicating the structured `changes` payload) - the guardian request layout is snapshot-tested using the same model-visible Responses request formatter used elsewhere in core ## Guardian network behavior - the guardian subagent inherits the parent session's managed-network allowlist when one exists, so it can use the same approved network surface while reviewing - exact session-scoped network approvals are copied into the guardian session with protocol/port scope preserved - those copied approvals are now seeded before the guardian's first turn is submitted, so inherited approvals are available during any immediate review-time checks ## Out of scope / follow-ups - the sandbox-permission validation split was pulled into a separate PR and is not part of this diff - a future follow-up can enable `serde_json` preserve-order in `codex-core` and then simplify the guardian action rendering further --------- Co-authored-by: Codex <noreply@openai.com>

…penai-main

ARCJ137442 · 2026-03-07T14:31:10Z

Strict-latest sync follow-up pushed in �a0a46665.\n\nWhat changed:\n- Upstream moved again after the previous green checkpoint, so I incrementally merged the new upstream commits into this same PR branch.\n- At push time, the branch was aligned with upstream main tip e84ee33 (Add guardian approval MVP (openai#13692)).\n\nLocal validation was intentionally scoped to avoid a full compile/test sweep:\n- targeted codex-state/core/cli/app-server filters for the DB/thread/memories sync\n- cargo test -p codex-core guardian\n- cargo test -p codex-tui guardian_approval\n\nI am now watching the new CI run. Per merge criteria, I will recheck upstream one more time after CI is green before merging.

ARCJ137442 · 2026-03-07T15:18:12Z

Pushed follow-up fix for the remaining Fork smoke (clippy + tests) failures: 7145abb5c19c0c466a5b8c5cbcb1be1512613f02

What changed:

Fixed the guardian shell test to refresh derived sandbox policies after switching to DangerFullAccess; this removes the Linux CI-only LandlockSandboxExecutableNotProvided failure.
Split the two platform-sensitive snapshots by OS:
- core guardian review request layout gets a Linux-specific snapshot for the planned-action JSON field order
- tui experimental popup gets a Linux-specific snapshot for the extra Bubblewrap sandbox menu item

Minimal validation run locally with G:\cargo-target-sync-upstream:

cargo fmt --all --check
cargo test -p codex-core 'codex::tests::guardian_tests::guardian_allows_shell_additional_permissions_requests_past_policy_validation' -- --exact
cargo test -p codex-core 'guardian::tests::guardian_review_request_layout_matches_model_visible_request_snapshot' -- --exact
cargo test -p codex-tui 'chatwidget::tests::experimental_popup_includes_guardian_approval' -- --exact

I will not merge after green automatically. Once CI is green, I still need to fetch/recheck whether upstream/main moved again and confirm this branch is still strictly aligned before merge.

ARCJ137442 · 2026-03-07T15:43:39Z

Follow-up fix pushed: 99efbf1fb8362177fe2d3e51a3830da8306e7dd7

The prior rerun reduced fork-smoke to a single Linux snapshot mismatch in experimental_popup_includes_guardian_approval@linux. The only remaining difference was the selected-row glyph (? rendered in CI vs > in the manually added snapshot).

This commit is snapshot-only and updates that glyph to match CI output exactly.

ARCJ137442 · 2026-03-07T16:09:20Z

Current CI blocker after 99efbf1f is no longer tied to the guardian/snapshot fixes.

Failed run: 22802014604

The only remaining failures are two realtime conversation integration tests timing out in CI:

suite::realtime_conversation::conversation_startup_context_falls_back_to_workspace_map
suite::realtime_conversation::conversation_start_injects_startup_context_from_thread_history

I ran both exact tests locally on this branch and both passed:

cargo test -p codex-core --test all 'suite::realtime_conversation::conversation_startup_context_falls_back_to_workspace_map' -- --exact
cargo test -p codex-core --test all 'suite::realtime_conversation::conversation_start_injects_startup_context_from_thread_history' -- --exact

That points to an upstream/main CI timeout/flaky failure rather than a deterministic regression from this PR. I am rerunning the failed workflow instead of making speculative code changes.

ARCJ137442 · 2026-03-07T16:37:25Z

Final gate passed before merge.

PR head: 99efbf1fb8362177fe2d3e51a3830da8306e7dd7
latest upstream/main: e84ee33cc02e693a3cf66204c72cb37e8dda3ed6
merge-base(HEAD, upstream/main) matches latest upstream tip exactly
required CI is green

So the branch is still aligned with the latest upstream/main after CI completion and is now safe to merge.

jif-oai and others added 30 commits March 6, 2026 15:10

feat: prune old memories in DB (openai#13734)

8ad768e

To save memory

fix: windows normalization (openai#13742)

5d43035

feat: drop sqlite db feature flag (openai#13750)

fa16c26

feat: drop discrepency metrics (openai#13753)

f891f51

feat: limit number of rows per log (openai#13763)

c8f4b5b

avoid DB explosion. This is a temp solution

app-server: Emit thread/name/updated event globally (openai#13674)

51fcdc7

Reduce SQLite log retention to 10 days (openai#13781)

ad98504

## Summary - reduce the SQLite-backed log retention window from 90 days to 10 days ## Testing - just fmt - cargo test -p codex-state Co-authored-by: Codex <noreply@openai.com>

fix: move unit tests in codex-rs/core/src/codex.rs into their own file (

488875f

openai#13783) This is analogous to openai#13780.

fix bazel build (openai#13787)

7a5aff4

I believe this broke in openai#13772.

Codex/winget auto update (openai#12943)

8ede180

Publish CLI releases to winget. Uses https://github.com/vedantmgoyal9/winget-releaser to greatly reduce boilerplate needed to create winget-pkgs manifets

Add realtime startup context override (openai#13796)

a11c59f

- add experimental_realtime_ws_startup_context to override or disable realtime websocket startup context - preserve generated startup context when unset and cover the new override paths in tests

fix: include libcap-dev dependency when creating a devcontainer for b…

3794363

…uilding Codex (openai#13814) I mainly use the devcontainer to be able to run `cargo clippy --tests` locally for Linux. We still need to make it possible to run clippy from Bazel so I don't need to do this!

celia-oai and others added 13 commits March 6, 2026 16:40

Allow full web search tool config (openai#13675)

61098c7

Previously, we could only configure whether web search was on/off. This PR enables sending along a web search config, which includes all the stuff responsesapi supports: filters, location, etc.

Merge upstream/main into sync branch

373626d

ARCJ137442 mentioned this pull request Mar 7, 2026

Sync fork to latest OpenAI Codex upstream baseline #32

Open

8 tasks

jif-oai and others added 2 commits March 7, 2026 03:48

feat: simplify DB further (openai#13771)

cf143bf

test: refresh app-server schema fixtures

63ffdd2

ARCJ137442 and others added 3 commits March 7, 2026 21:36

Merge remote-tracking branch 'upstream/main' into chore/sync-latest-o…

fe048d4

…penai-main

Merge remote-tracking branch 'upstream/main' into chore/sync-latest-o…

aa0a466

…penai-main

test: fix linux fork-smoke guardian snapshots

7145abb

test: fix linux popup snapshot selection marker

99efbf1

ARCJ137442 merged commit a02eeaa into main Mar 7, 2026
18 of 20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: sync fork with latest OpenAI Codex upstream#33

chore: sync fork with latest OpenAI Codex upstream#33
ARCJ137442 merged 50 commits intomainfrom
chore/sync-latest-openai-main

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

17 participants

Conversation

ARCJ137442 commented Mar 7, 2026

Summary

What changed

Why this PR exists

Local validation

Notes

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

ARCJ137442 commented Mar 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

17 participants