chore(stable): refresh to rust-v0.123.0 #22
Merged
richardgetz merged 120 commits into stable on Apr 23, 2026
…ai#18662) Add a `codex.turn.memory` metric that records whether a turn used memories. It is a separate metric rather than a label on the other turn metrics, to limit cardinality.
Due to the app-server rebase of the TUI, the review prompt leaked into the TUI transcript. This was not a security issue, but it was bad UX. This PR fixes it.
In the log client, use the log level filter as a minimum severity instead of an exact match.

---------

Co-authored-by: Codex <noreply@openai.com>
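The difference between exact-match and minimum-severity filtering can be sketched as follows. This is only an illustration: `Level`, `LogEntry`, and both filter functions are hypothetical names, not the log client's actual types.

```rust
/// Severity levels, declared from least to most severe so the derived
/// ordering can be used for threshold comparisons. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Hypothetical log record used only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LogEntry {
    pub level: Level,
    pub message: String,
}

/// Exact-match filtering: keeps only entries at precisely `level`.
pub fn filter_exact(entries: &[LogEntry], level: Level) -> Vec<&LogEntry> {
    entries.iter().filter(|e| e.level == level).collect()
}

/// Minimum-severity filtering: keeps entries at `min` or anything more severe.
pub fn filter_min_severity(entries: &[LogEntry], min: Level) -> Vec<&LogEntry> {
    entries.iter().filter(|e| e.level >= min).collect()
}
```

With a `Warn` filter, the exact-match version drops `Error` entries, while the minimum-severity version keeps both `Warn` and `Error`.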
## Summary

Introduces a single background/control-plane agent task for ChatGPT backend requests that do not have a thread-scoped task, with `AuthManager` owning the default ChatGPT backend authorization decision.

Callers now ask `AuthManager` for the default ChatGPT backend authorization header. `AuthManager` decides whether that is bearer or background AgentAssertion based on config/internal state, while low-level bootstrap paths can explicitly request bearer-only auth.

This PR is stacked on PR4 and focuses on the shared background task auth plumbing plus the first tranche of backend/control-plane consumers. The remaining callsite wiring is split into PR4.2 to keep review size down.

## Stack

- PR1: openai#17385 - add `features.use_agent_identity`
- PR2: openai#17386 - register agent identities when enabled
- PR3: openai#17387 - register agent tasks when enabled
- PR3.1: openai#17978 - persist and prewarm registered tasks per thread
- PR4: openai#17980 - use task-scoped `AgentAssertion` for downstream calls
- PR4.1: this PR - introduce AuthManager-owned background/control-plane `AgentAssertion` auth
- PR4.2: openai#18260 - use background task auth for additional backend/control-plane calls

## What Changed

- add background task registration and assertion minting inside `codex-login`
- persist `agent_identity.background_task_id` separately from per-session task state
- make `BackgroundAgentTaskManager` private to `codex-login`; call sites do not instantiate or pass it around
- teach `AuthManager` the ChatGPT backend base URL and feature-derived background auth mode from resolved config
- expose bearer-only helpers for bootstrap/registration/refresh-style paths that must not use AgentAssertion
- wire `AuthManager` default ChatGPT authorization through app listing, connector directory listing, remote plugins, MCP status/listing, analytics, and core-skills remote calls
- preserve bearer fallback when the feature is disabled, the backend host is unsupported, or background task registration is not available

## Validation

- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills`
- `cargo test -p codex-login agent_identity`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `just fix -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
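The bearer-fallback rule described in the "What Changed" list of the PR4.1 commit above can be sketched as a small decision function. This is a rough sketch, not the `codex-login` implementation: `BackendAuthMode`, `BackendAuthInputs`, and `choose_backend_auth_mode` are hypothetical names, and the real `AuthManager` derives these inputs from resolved config and internal state.

```rust
/// Hypothetical stand-in for the authorization mode used on a ChatGPT
/// backend request; not the real `codex-login` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendAuthMode {
    Bearer,
    BackgroundAgentAssertion,
}

/// Hypothetical inputs to the decision, named after the conditions the
/// commit message lists.
#[derive(Debug, Clone, Copy)]
pub struct BackendAuthInputs {
    /// Whether `features.use_agent_identity` is enabled.
    pub feature_enabled: bool,
    /// Whether the configured backend host supports AgentAssertion auth.
    pub backend_host_supported: bool,
    /// Whether a background task registration is available.
    pub background_task_available: bool,
    /// Set by bootstrap/registration/refresh-style callers that must
    /// never use AgentAssertion.
    pub bearer_only: bool,
}

/// Pick AgentAssertion only when every precondition holds; otherwise
/// fall back to bearer auth.
pub fn choose_backend_auth_mode(inputs: BackendAuthInputs) -> BackendAuthMode {
    if inputs.bearer_only {
        return BackendAuthMode::Bearer;
    }
    if inputs.feature_enabled
        && inputs.backend_host_supported
        && inputs.background_task_available
    {
        BackendAuthMode::BackgroundAgentAssertion
    } else {
        BackendAuthMode::Bearer
    }
}
```

Keeping the decision in one function mirrors the stated design goal that call sites ask for "the default authorization" and never choose the mode themselves.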
## Why

Fresh app-server thread startup can create a shell snapshot through a temp file and then promote it to the final snapshot path. The previous implementation briefly wrapped the temp path in `ShellSnapshot`, so after a successful rename its `Drop` attempted to delete the old temp path and could log a false `ENOENT` warning.

Fixes openai#17549.

## What changed

- Validate the temp snapshot path directly before promotion.
- Rename the temp path directly to the final snapshot path.
- Keep explicit cleanup of the temp path on validation or finalization failures.
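The three steps listed for the shell snapshot fix can be sketched with plain `std::fs` calls. This is a minimal sketch under stated assumptions: `promote_snapshot` and its `validate` callback are hypothetical, and the real code works with `ShellSnapshot` rather than bare paths.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Promote a temp snapshot file to its final path without wrapping the
/// temp path in a drop guard, so nothing tries to delete the temp path
/// after a successful rename. `validate` is a hypothetical check.
pub fn promote_snapshot(
    temp: &Path,
    final_path: &Path,
    validate: impl Fn(&Path) -> io::Result<()>,
) -> io::Result<()> {
    // Validate the temp file directly; clean it up explicitly on failure.
    if let Err(err) = validate(temp) {
        let _ = fs::remove_file(temp);
        return Err(err);
    }
    // Rename straight to the final path; clean up only if the rename fails.
    if let Err(err) = fs::rename(temp, final_path) {
        let _ = fs::remove_file(temp);
        return Err(err);
    }
    Ok(())
}
```

Because cleanup happens only on the two failure branches, the success path never touches the old temp path again, which is what avoids the spurious `ENOENT` warning.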
…#18260)

## Summary

Splits the larger PR4.1 background task auth rollout by moving additional backend/control-plane call sites into this downstream PR.

This PR keeps callers on the same design as PR4.1: most code asks `AuthManager` for the default ChatGPT backend authorization header, and `AuthManager` decides bearer vs background AgentAssertion internally. Task-pinned inference auth remains separate because it needs the thread's registered task id.

## Stack

- PR1: openai#17385 - add `features.use_agent_identity`
- PR2: openai#17386 - register agent identities when enabled
- PR3: openai#17387 - register agent tasks when enabled
- PR3.1: openai#17978 - persist and prewarm registered tasks per thread
- PR4: openai#17980 - use task-scoped `AgentAssertion` for downstream calls
- PR4.1: openai#18094 - introduce AuthManager-owned background/control-plane `AgentAssertion` auth
- PR4.2: this PR - use background task auth for additional backend/control-plane calls

## What Changed

- pass full authorization header values through backend-client and cloud-tasks-client call paths where needed
- move ChatGPT client, cloud requirements, cloud tasks, thread-manager, and models-manager background auth usage into this downstream slice
- make app-server remote control enrollment/websocket auth ask `AuthManager` for the local backend authorization header instead of threading a background auth mode through transport options
- keep the same feature-gated bearer fallback behavior from PR4.1

## Validation

- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills`
- `cargo test -p codex-login agent_identity`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `just fix -p codex-core -p codex-login -p codex-analytics -p codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p codex-models-manager -p codex-chatgpt -p codex-model-provider -p codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
Before this, some tests were leaking an `auth.json` file into `codex-rs/core`. This fixes the leak.
Fixes openai#18539.

## Summary

The recent `/mcp` performance work kept the default command fast by avoiding resource and resource-template inventory probes, but it also removed useful diagnostics for users trying to confirm MCP server state.

This keeps bare `/mcp` on the fast tools/auth path and adds `/mcp verbose` for the slower diagnostic view. Verbose mode requests full MCP server status from the app-server and restores status, resources, and resource templates in the TUI output.

## Testing

In addition to running automation, I manually tested the feature to confirm that it works.
## Problem

The TUI resume/fork picker was backfilling thread names from local rollout indexes. This was left over from before the TUI was moved to the app server. It should be using app-server APIs because the TUI might be connected to a remote connection.

This bug wasn't (yet) reported by a user. I found it by asking Codex to review places in the TUI code where it was still directly accessing the CODEX_HOME directory rather than going through app-server APIs.

## Solution

The resume picker and session lookups should use app-server thread APIs only. Remove legacy rollout name/list backfills, and avoid local name reads in fork history.

## Testing

I manually tested `codex resume` and `codex resume --all` to look for functional or performance regressions in the resume picker.
## Summary

Side conversations can hide important state changes from the parent conversation while the user is focused on the side thread. In particular, the parent may finish, fail, need user input, or require an approval while the side conversation remains visible. Users need a lightweight signal for those states, but parent approval overlays should not interrupt the side conversation itself.

This change adds parent-conversation status to the side conversation context label and defers parent interactive overlays while side mode is active. When the user exits side mode, pending parent approvals and input requests are restored in the main thread.

The pending approval footer avoids duplicating the same parent approval status, and replayed notice cells are filtered when restoring a pending interactive request so tips or warnings do not crowd out the approval prompt. The change is contained to the TUI side-conversation and thread replay paths.

Example 1: Approval pending

<img width="752" height="35" alt="Screenshot 2026-04-19 at 12 56 07 PM" src="https://github.com/user-attachments/assets/1cc0f1a3-9cab-4d60-aed2-96523ccafc20" />

Example 2: Turn complete

<img width="754" height="35" alt="Screenshot 2026-04-19 at 12 56 27 PM" src="https://github.com/user-attachments/assets/653521a5-e298-4366-ae1c-72b56eb88eeb" />
- Migrates unloaded `thread/name/set` and `thread/memoryModeSet` app-server writes behind the generic `ThreadStore::update_thread_metadata` API rather than adding one-off store methods for setting thread name or memory mode.
- Implements the local ThreadStore metadata patch path for thread name and memory mode, including rollout append, legacy name index updates, SessionMeta validation/update, SQLite reconciliation, and re-reading the stored thread.
- Adds focused local thread-store unit coverage plus app-server integration coverage for the migrated unloaded write paths.
## Why

`PermissionProfile` needs stable, canonical file-system semantics before it can become the primary runtime permissions abstraction. Without a canonical form, callers have to keep re-deriving legacy sandbox maps and profile comparisons remain lossy or order-dependent.

## What changed

This adds canonicalization helpers for `FileSystemPermissions` and `PermissionProfile`, expands special paths into explicit sandbox entries, and updates permission request/conversion paths to consume those canonical entries.

It also tightens the legacy bridge so root-wide write profiles with narrower carveouts are not silently projected as full-disk legacy access.

## Verification

- `cargo test -p codex-protocol root_write_with_read_only_child_is_not_full_disk_write -- --nocapture`
- `cargo test -p codex-sandboxing permission -- --nocapture`
- `cargo test -p codex-tui permissions -- --nocapture`
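To make "canonical form" and the tightened legacy bridge concrete, here is a deliberately simplified sketch. Everything in it is an assumption for illustration: `Access`, `FsEntry`, `canonicalize`, and `projects_to_full_disk_write` are hypothetical and far simpler than the real `FileSystemPermissions` and `PermissionProfile` types.

```rust
/// Hypothetical access level for a sandbox entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Access {
    Read,
    Write,
}

/// Hypothetical explicit sandbox entry: a path plus its access level.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct FsEntry {
    pub path: String,
    pub access: Access,
}

/// Canonicalize by sorting and removing duplicates, so two profiles that
/// list the same entries in a different order compare equal.
pub fn canonicalize(mut entries: Vec<FsEntry>) -> Vec<FsEntry> {
    entries.sort();
    entries.dedup();
    entries
}

/// Legacy-bridge check: a root-wide write entry only counts as full-disk
/// write access when no narrower read-only carveout exists.
pub fn projects_to_full_disk_write(entries: &[FsEntry]) -> bool {
    let root_write = entries
        .iter()
        .any(|e| e.path == "/" && e.access == Access::Write);
    let read_only_carveout = entries
        .iter()
        .any(|e| e.path != "/" && e.access == Access::Read);
    root_write && !read_only_carveout
}
```

The second function mirrors the scenario named by the `root_write_with_read_only_child_is_not_full_disk_write` test listed under Verification, though the real check is certainly more involved.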
This is the second cleanup in the await-holding lint stack. The higher-level goal, following openai#18178 and openai#18398, is to enable Clippy coverage for guards held across `.await` points without carrying broad suppressions.

The stack is working toward enabling Clippy's [`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock) lint and the configurable [`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type) lint for Tokio guard types.

Several existing fields used `tokio::sync::Mutex<()>` only as one-at-a-time async gates. Those guards intentionally lived across `.await` while an operation was serialized. A mutex over `()` suggests protected data and trips the await-holding lint shape; a single-permit `tokio::sync::Semaphore` expresses the intended serialization directly.

## What changed

- Replace `Mutex<()>` serialization gates with `Semaphore::new(1)` for agent identity ensure, exec policy updates, guardian review session reuse, plugin remote sync, managed network proxy refresh, auth token refresh, and RMCP session recovery.
- Update call sites from `lock().await` / `try_lock()` to `acquire().await` / `try_acquire()`.
- Map closed-semaphore errors into the existing local error types, even though these semaphores are owned for the lifetime of their managers.
- Update session test builders for the new `managed_network_proxy_refresh_lock` type.

## Verification

- The split stack was verified at the final lint-enabling head with `just clippy`.

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18403).
* openai#18698
* openai#18423
* openai#18418
* __->__ openai#18403
- Replace the active models-manager catalog with the deleted core catalog contents.
- Replace stale hardcoded test model slugs with current bundled model slugs.
- Keep this as a stacked change on top of the cleanup PR.
Wires patch_updated events through app_server. These events are parsed and streamed while apply_patch is being written by the model. Also adds 500ms of buffering to the patch_updated events in the diff_consumer. The eventual goal is to use this to display better progress indicators in the codex app.
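One plausible reading of the 500ms buffering is a coalescing buffer that forwards at most one update per interval and keeps only the latest pending one. The sketch below is an assumption about the shape of such a buffer, not the `diff_consumer` code; `PatchUpdateBuffer` and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical coalescing buffer: forwards at most one update per
/// `interval` and remembers only the most recent unsent update.
pub struct PatchUpdateBuffer<T> {
    interval: Duration,
    last_emit: Option<Instant>,
    pending: Option<T>,
}

impl<T> PatchUpdateBuffer<T> {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_emit: None, pending: None }
    }

    /// Record an update seen at `now`. Returns the update to forward if
    /// the interval has elapsed since the last forwarded one.
    pub fn push(&mut self, update: T, now: Instant) -> Option<T> {
        self.pending = Some(update);
        let due = match self.last_emit {
            None => true,
            Some(last) => now.duration_since(last) >= self.interval,
        };
        if due {
            self.last_emit = Some(now);
            self.pending.take()
        } else {
            None
        }
    }

    /// Forward whatever is still pending, for example once the patch is
    /// fully written and no further updates will arrive.
    pub fn flush(&mut self) -> Option<T> {
        self.pending.take()
    }
}
```

A caller would construct it with `Duration::from_millis(500)` and call `flush` at the end of the stream so the final state is never dropped.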
## Problem

The TUI still imported path utilities and config-loader symbols through app-server-client's legacy_core facade even though those APIs already exist in utility/config crates. This is part of our ongoing effort to whittle away at these old dependencies.

## Solution

Rewire imports to avoid the TUI directly importing from the core crate and instead import from common lower-level crates. This PR doesn't include any functional changes; it's just a simple rewiring.
## Summary

- Add the missing `background_task_id: None` field to the `AgentIdentityAuthRecord` fixture introduced in `auth_tests.rs`.

## Why

- Current `main` fails Bazel/rust-ci compile paths after the background-task auth field landed and a later auth test fixture constructed `AgentIdentityAuthRecord` without that new field.
- I intentionally removed the earlier broader CI-stability edits from this PR. The code-mode timeout, external-agent migration snapshot, and MCP resource timeout failures appear to be general/flaky or unrelated to the agent identity merge stack rather than cleanly caused by it.

## Validation

- `cargo test -p codex-login dummy_chatgpt_auth_does_not_create_cwd_auth_json_when_identity_is_set -- --nocapture`
- `just fmt`
Automated update of models.json.

---------

Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
## Summary

- Pin vulnerable npm dependencies through the existing root `resolutions` mechanism so the lockfile moves only to patched versions.
- Refresh `pnpm-lock.yaml` for `@modelcontextprotocol/sdk`, `handlebars`, `path-to-regexp`, `picomatch`, `minimatch`, `flatted`, `rollup`, and `glob`.
- Bump `quinn-proto` from `0.11.13` to `0.11.14` and refresh `MODULE.bazel.lock`.

## Testing

- `corepack pnpm --store-dir .pnpm-store install --frozen-lockfile --ignore-scripts`
- `corepack pnpm audit --audit-level high` (passes; remaining advisories are low/moderate)
- `corepack pnpm -r --filter ./sdk/typescript run build`
- `corepack pnpm exec eslint 'src/**/*.ts' 'tests/**/*.ts'`
- `cargo check --locked`
- `cargo build -p codex-cli`
- `bazel --output_user_root=/tmp/bazel-codex-dependabot --ignore_all_rc_files mod deps --lockfile_mode=error`
- `just fmt`

Note: `corepack pnpm -r --filter ./sdk/typescript run test` was also attempted after building `codex`; it is blocked on this workstation by host-managed Codex MDM/auth state (`approval_policy` restrictions and ChatGPT/API-key mismatch), not by this dependency change.
…17692)

## Why

Guardian review analytics needs a Rust event shape that matches the backend schema while avoiding unnecessary PII exposure from reviewed tool calls. This PR narrows the analytics payload to the fields we intend to emit and keeps shared Guardian assessment enums in protocol instead of duplicating equivalent analytics-only enums.

## What changed

- Uses protocol Guardian enums directly for `risk_level`, `user_authorization`, `outcome`, and command source values.
- Removes high-risk reviewed-action fields from the analytics payload, including raw commands, display strings, working directories, file paths, network targets/hosts, justification text, retry reason, and rationale text.
- Makes `target_item_id` and `tool_call_count` nullable so the Codex event can represent cases where the app-server protocol or producer does not have those values.
- Keeps lower-risk structured reviewed-action metadata such as sandbox permissions, permission profile, `tty`, `execve` source/program, network protocol/port, and MCP connector/tool labels.
- Adds an analytics reducer/client test covering `codex_guardian_review` serialization with an optional `target_item_id` and absent removed fields.

## Verification

- `cargo test -p codex-analytics guardian_review_event_ingests_custom_fact_with_optional_target_item`
- `cargo fmt --check`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17692).
* openai#17696
* openai#17695
* openai#17693
* __->__ openai#17692
## Summary

Disables apps, plugins, and MCPs for the guardian subagent thread.

## Testing

- [x] Added unit tests
## Summary

This PR aims to improve integration between the realtime model and the codex agent by sharing more context with each other. In particular, we now share full realtime conversation transcript deltas in addition to the delegation message.

realtime_conversation.rs now turns a handoff into:

```
<realtime_delegation>
<input>...</input>
<transcript_delta>...</transcript_delta>
</realtime_delegation>
```

## Implementation notes

The transcript is accumulated in the realtime websocket layer as parsed realtime events arrive. When a background-agent handoff is requested, the current transcript snapshot is copied onto the handoff event and then serialized by `realtime_conversation.rs` into the hidden realtime delegation envelope that Codex receives as user-turn context.

For Realtime V2, the session now explicitly enables input audio transcription, and the parser handles the relevant input/output transcript completion events so the snapshot includes both user speech and realtime model responses. The delegation `<input>` remains the actual handoff request, while `<transcript_delta>` carries the surrounding conversation history for context.

Reviewers should note that the transcript payload is intended for Codex context sharing, not UI rendering. The realtime delegation envelope should stay hidden from the user-facing transcript surface, while still being included in the background-agent turn so Codex can answer with the same conversational context the realtime model had.
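A minimal sketch of serializing a handoff into the envelope shape shown in the realtime delegation commit above. The function names are hypothetical, and both the line layout and the escaping of `&`, `<`, and `>` are assumptions added here so the payload cannot break the envelope; the commit message does not say how `realtime_conversation.rs` handles either.

```rust
/// Escape characters that would otherwise be read as markup inside the
/// envelope. (Assumption: the source does not describe any escaping.)
fn escape_text(raw: &str) -> String {
    raw.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Build the hidden delegation envelope from the handoff request and the
/// transcript snapshot copied onto the handoff event.
pub fn realtime_delegation_envelope(input: &str, transcript_delta: &str) -> String {
    format!(
        "<realtime_delegation>\n<input>{}</input>\n<transcript_delta>{}</transcript_delta>\n</realtime_delegation>",
        escape_text(input),
        escape_text(transcript_delta)
    )
}
```

Keeping `<input>` and `<transcript_delta>` as separate elements matches the stated intent: the first is the actual handoff request, the second is surrounding context only.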
## Why

`skills/list` refreshes are best-effort metadata updates. If one fails during startup or thread switching, the TUI should keep running and show enough detail to diagnose the app-server failure instead of leaving the user with only a log entry.

This addresses the recoverability and observability issue reported in openai#16914.

## What Changed

- Preserve the full startup `skills/list` error chain before sending it back through the app event queue.
- Surface failed skills refreshes as recoverable TUI error messages while still logging the warning.

This is related to the recent bug fix from [PR openai#18370](openai#18370).
Fixes stale test fixtures left after the active bundled model catalog updates in openai#18586 and openai#18388. Those changes made `gpt-5.4` the current default and removed several older hardcoded slugs, which left Windows Bazel shards failing TUI and config tests.

What changed:

- Refresh TUI model migration, availability NUX, plan-mode, status, and snapshot fixtures to use active bundled model slugs.
- Update the config edit test expectation for the TOML-quoted `"gpt-5.2"` migration key.
- Move the model catalog tests into `codex-rs/tui/src/app/tests/model_catalog.rs` so touching them does not trip the blob-size policy for `app.rs`.

Verification:

- CI Bazel/lint checks are expected to cover the affected test shards.
Add an experimental config option to use the remote thread store rather than the local thread store implementation in the app server.
## Why

Fixes openai#18718.

After rewinding a thread, `/copy` could still copy the latest assistant response from before the rewind. The transcript cells were rolled back, but the copy source was a single `last_agent_markdown` cache that was not synchronized with backtracking, so the visible conversation and copied content could diverge.

## What changed

`ChatWidget` now keeps a bounded copy history for the most recent 32 assistant responses, keyed by the visible user-turn count. When local rollback trims transcript cells, the copy cache is trimmed to the same surviving user-turn count so `/copy` uses the latest visible assistant response.

If the user rewinds past the retained copy window, `/copy` now reports:

```text
Cannot copy that response after rewinding. Only the most recent 32 responses are available to /copy.
```

The change also adds coverage for copying the latest surviving response after rollback and for the over-limit rewind message.

## Verification

- Manually resumed a synthetic 35-turn session, rewound within the retained window, and verified `/copy` copied the surviving response.
- Manually rewound past the retained window and verified `/copy` showed the 32-response limit message.
- `cargo test -p codex-tui slash_copy`
- `just fix -p codex-tui`
- `cargo insta pending-snapshots`

Note: `cargo test -p codex-tui` currently fails on unrelated model catalog and snapshot drift around the default model changing to `gpt-5.4`; the focused `/copy` tests pass after fixing the new test setup.
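The bounded, turn-keyed copy history described for `/copy` can be sketched with a `VecDeque`. This is an illustration under assumptions, not the `ChatWidget` code: `CopyHistory`, `MAX_COPY_HISTORY`, and the method names are hypothetical.

```rust
use std::collections::VecDeque;

/// Number of assistant responses retained for `/copy` (32 per the commit).
pub const MAX_COPY_HISTORY: usize = 32;

/// Hypothetical bounded history of assistant responses, oldest first.
/// Each entry stores the visible user-turn count it belongs to.
#[derive(Default)]
pub struct CopyHistory {
    entries: VecDeque<(usize, String)>,
}

impl CopyHistory {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record a completed response, evicting the oldest one when full.
    pub fn record(&mut self, user_turn_count: usize, markdown: String) {
        if self.entries.len() == MAX_COPY_HISTORY {
            self.entries.pop_front();
        }
        self.entries.push_back((user_turn_count, markdown));
    }

    /// After a rollback, drop responses from turns that no longer exist.
    pub fn truncate_to_user_turns(&mut self, surviving_turns: usize) {
        while let Some((turn, _)) = self.entries.back() {
            if *turn > surviving_turns {
                self.entries.pop_back();
            } else {
                break;
            }
        }
    }

    /// Latest visible response, or `None` if the rewind went past the
    /// retained window (the caller would then show the limit message).
    pub fn latest(&self) -> Option<&str> {
        self.entries.back().map(|(_, markdown)| markdown.as_str())
    }
}
```

Keying entries by turn count is what lets the trim step follow the same surviving-turn number the transcript rollback uses, so the two can no longer drift apart.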
This updates the spawn-agent tool contract so subagents are presented as inheriting the parent model by default. The visible model list is now framed as optional overrides, the model parameter tells callers to leave it unset, and the delegation guidance no longer nudges models toward picking a smaller/mini override. Fixes reports that 5.4 would occasionally pick 5.2 or lower for sub-agents.
## Problem

The TUI resolved fork parent titles from local CODEX_HOME metadata, which could show missing or stale titles when app-server metadata is authoritative.

This is a lingering bug left over from the migration of the TUI to the app-server interface. I found it when I asked Codex to review all places where the TUI code was still directly accessing the local CODEX_HOME.

## Solution

Route fork parent title metadata through the app-server session state and render only that supplied title, with focused snapshot coverage for stale local metadata.

## Testing

I manually tested by renaming a thread then forking it and confirming that the "forked from" message indicated the parent thread's name.
Cascade the thread archive endpoint to all the sub-agents in the agent tree.

Fix: openai#17867

---------

Co-authored-by: Codex <noreply@openai.com>
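Cascading an archive through an agent tree amounts to collecting the root thread and every descendant. The sketch below shows one way to do that with an iterative depth-first walk; `AgentTree` and its methods are hypothetical, and it assumes the parent/child links form a tree with no cycles.

```rust
use std::collections::HashMap;

/// Hypothetical agent tree: maps a parent thread id to its sub-agent
/// thread ids. Assumed to be acyclic.
#[derive(Default)]
pub struct AgentTree {
    children: HashMap<String, Vec<String>>,
}

impl AgentTree {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register `child` as a sub-agent of `parent`.
    pub fn add_child(&mut self, parent: &str, child: &str) {
        self.children
            .entry(parent.to_string())
            .or_default()
            .push(child.to_string());
    }

    /// Return `root` followed by all of its descendants in depth-first
    /// pre-order; these are the threads an archive request would cover.
    pub fn archive_targets(&self, root: &str) -> Vec<String> {
        let mut targets = Vec::new();
        let mut stack = vec![root.to_string()];
        while let Some(id) = stack.pop() {
            if let Some(kids) = self.children.get(&id) {
                // Push in reverse so children are visited in insertion order.
                stack.extend(kids.iter().rev().cloned());
            }
            targets.push(id);
        }
        targets
    }
}
```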
Migrate the conversation summary app-server methods to ThreadStore.

Because this app-server API allows explicitly fetching the thread by rollout path, intercept that case in the app-server code and (a) route directly to the underlying local thread store methods if we're using a local thread store, or (b) throw an unsupported error if we're using a remote thread store. This keeps the thread store API clean and all filesystem operations inside the local thread store, while pushing the "fundamental incompatibility" check as early as possible.
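The early interception described above can be sketched as a single match on the store kind. All names here (`ThreadStoreKind`, `SummaryLookup`, `SummaryLookupError`, `route_rollout_path_lookup`) are hypothetical stand-ins, not the app-server's actual types.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical marker for which thread store backs the app server.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadStoreKind {
    Local,
    Remote,
}

/// Hypothetical result of routing a by-rollout-path summary request.
#[derive(Debug, PartialEq, Eq)]
pub enum SummaryLookup {
    /// Handled by the local store, which owns all filesystem access.
    LocalRolloutPath(PathBuf),
}

/// Hypothetical error for requests a remote store cannot serve.
#[derive(Debug, PartialEq, Eq)]
pub enum SummaryLookupError {
    RolloutPathUnsupportedForRemoteStore,
}

/// Decide up front whether a rollout-path lookup can be served, before
/// any thread store method is called.
pub fn route_rollout_path_lookup(
    kind: ThreadStoreKind,
    rollout_path: &Path,
) -> Result<SummaryLookup, SummaryLookupError> {
    match kind {
        ThreadStoreKind::Local => {
            Ok(SummaryLookup::LocalRolloutPath(rollout_path.to_path_buf()))
        }
        ThreadStoreKind::Remote => {
            Err(SummaryLookupError::RolloutPathUnsupportedForRemoteStore)
        }
    }
}
```

Doing the check at the routing layer means the generic store interface never needs a path-based method that only one implementation could honor.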
(cherry picked from commit ec75d2d)
(cherry picked from commit 4597e7d)
(cherry picked from commit 4d3ed4c)
(cherry picked from commit 50ffe89)
(cherry picked from commit 33c9538)
(cherry picked from commit 3410913)
(cherry picked from commit 6c83144)
(cherry picked from commit 5966b1e)
(cherry picked from commit 8e1c2d2)
BREAKING CHANGE: thread control config and API rename router mode to orchestrator without router compatibility.
## Summary

- `rust-v0.123.0`

## Verification

- `CARGO_HOME=/tmp/codex-cargo-home just write-config-schema`
- `CARGO_HOME=/tmp/codex-cargo-home just write-app-server-schema`
- `CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-tools -p codex-app-server-protocol -p codex-mcp`
- `CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-core --lib -- --skip tools::js_repl`
- `CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-core resolve_router_turn_settings`
- `CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-core multi_agent_v2_spawn_can_select_child_collaboration_mode`
- `CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-tui bottom_pane::tests::esc_interrupts_running_task_when_status_indicator_hidden`
- `CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-tui status_indicator_widget::tests`
- `CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-tui`
- `CODEX_HOME=/tmp/codex-test-home CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-app-server` passed unit tests; integration binary still hits local nested `sandbox-exec: Operation not permitted` for command-exec/interrupt coverage under this Codex sandbox
- `CARGO_HOME=/tmp/codex-cargo-home just fix -p codex-core -p codex-mcp -p codex-tools -p codex-app-server-protocol -p codex-app-server -p codex-tui`
- `CARGO_HOME=/tmp/codex-cargo-home just fix -p codex-tui`
- `git diff --check`

## Local notes

- `just bazel-lock-update` was attempted, but local `bazel` is not installed.
- `codex-core` with JS REPL tests hits the same local sandbox limitation (`sandbox-exec: sandbox_apply: Operation not permitted`); the non-JS core suite passed.