
chore(stable): refresh to rust-v0.123.0#22

Merged
richardgetz merged 120 commits into stable from stable-refresh/rust-v0.123.0
Apr 23, 2026

Conversation

@richardgetz
Owner

@richardgetz richardgetz commented Apr 23, 2026

Summary

  • refresh stable from upstream rust-v0.123.0
  • replay the maintained fork changes on top of the release tag
  • resolve replay drift in thread controls, collaboration mode tests, lazy MCP/tool search handling, generated schemas, lockfile versions, and TUI version snapshots
  • fix Esc interruption while task output has hidden the status indicator

Verification

  • CARGO_HOME=/tmp/codex-cargo-home just write-config-schema
  • CARGO_HOME=/tmp/codex-cargo-home just write-app-server-schema
  • CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-tools -p codex-app-server-protocol -p codex-mcp
  • CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-core --lib -- --skip tools::js_repl
  • CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-core resolve_router_turn_settings
  • CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-core multi_agent_v2_spawn_can_select_child_collaboration_mode
  • CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-tui bottom_pane::tests::esc_interrupts_running_task_when_status_indicator_hidden
  • CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-tui status_indicator_widget::tests
  • CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-tui
  • CODEX_HOME=/tmp/codex-test-home CARGO_HOME=/tmp/codex-cargo-home cargo test -p codex-app-server (unit tests passed; the integration binary still hits the local nested "sandbox-exec: Operation not permitted" error for command-exec/interrupt coverage under this Codex sandbox)
  • CARGO_HOME=/tmp/codex-cargo-home just fix -p codex-core -p codex-mcp -p codex-tools -p codex-app-server-protocol -p codex-app-server -p codex-tui
  • CARGO_HOME=/tmp/codex-cargo-home just fix -p codex-tui
  • git diff --check

Local notes

  • just bazel-lock-update was attempted, but local bazel is not installed.
  • Full codex-core with JS REPL tests hits the same local sandbox limitation (sandbox-exec: sandbox_apply: Operation not permitted); the non-JS core suite passed.

jif-oai and others added 30 commits April 20, 2026 14:31
…ai#18662)

Add a metric `codex.turn.memory` to record whether a turn used memories.
It is emitted as a separate metric rather than as a label on the other
turn metrics, to limit cardinality.

Due to the app-server rebase of the TUI, the review prompt leaked into
the TUI transcript. This was not a security issue, but it was bad UX.
This PR fixes it.

In the log client, use the log level filter as a minimum severity
instead of an exact match.
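
A minimal sketch of the difference between the two filter semantics, assuming an ordered severity enum. `Level`, `matches_exact`, and `matches_minimum` are hypothetical names for illustration, not the log client's actual types:

```rust
/// Hypothetical severity levels, declared from least to most severe so
/// that the derived `Ord` follows severity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Exact-match semantics: only entries at the filter level pass.
fn matches_exact(entry: Level, filter: Level) -> bool {
    entry == filter
}

/// Minimum-severity semantics: the filter is a floor, so entries at the
/// filter level or above pass.
fn matches_minimum(entry: Level, filter: Level) -> bool {
    entry >= filter
}
```

With a `Warn` filter, an `Error` entry is dropped by `matches_exact` but kept by `matches_minimum`.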

---------

Co-authored-by: Codex <noreply@openai.com>
## Summary

Introduces a single background/control-plane agent task for ChatGPT
backend requests that do not have a thread-scoped task, with
`AuthManager` owning the default ChatGPT backend authorization decision.

Callers now ask `AuthManager` for the default ChatGPT backend
authorization header. `AuthManager` decides whether that is bearer or
background AgentAssertion based on config/internal state, while
low-level bootstrap paths can explicitly request bearer-only auth.

This PR is stacked on PR4 and focuses on the shared background task auth
plumbing plus the first tranche of backend/control-plane consumers. The
remaining callsite wiring is split into PR4.2 to keep review size down.

## Stack

- PR1: openai#17385 - add
`features.use_agent_identity`
- PR2: openai#17386 - register agent
identities when enabled
- PR3: openai#17387 - register agent tasks
when enabled
- PR3.1: openai#17978 - persist and
prewarm registered tasks per thread
- PR4: openai#17980 - use task-scoped
`AgentAssertion` for downstream calls
- PR4.1: this PR - introduce AuthManager-owned background/control-plane
`AgentAssertion` auth
- PR4.2: openai#18260 - use background
task auth for additional backend/control-plane calls

## What Changed

- add background task registration and assertion minting inside
`codex-login`
- persist `agent_identity.background_task_id` separately from
per-session task state
- make `BackgroundAgentTaskManager` private to `codex-login`; call sites
do not instantiate or pass it around
- teach `AuthManager` the ChatGPT backend base URL and feature-derived
background auth mode from resolved config
- expose bearer-only helpers for bootstrap/registration/refresh-style
paths that must not use AgentAssertion
- wire `AuthManager` default ChatGPT authorization through app listing,
connector directory listing, remote plugins, MCP status/listing,
analytics, and core-skills remote calls
- preserve bearer fallback when the feature is disabled, the backend
host is unsupported, or background task registration is not available

## Validation

- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `cargo test -p codex-login agent_identity`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `just fix -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
## Why
Fresh app-server thread startup can create a shell snapshot through a
temp file and then promote it to the final snapshot path. The previous
implementation briefly wrapped the temp path in `ShellSnapshot`, so
after a successful rename its `Drop` attempted to delete the old temp
path and could log a false `ENOENT` warning.

Fixes openai#17549.

## What changed
- Validate the temp snapshot path directly before promotion.
- Rename the temp path directly to the final snapshot path.
- Keep explicit cleanup of the temp path on validation or finalization
failures.
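
A rough sketch of the promotion flow described above, using only `std`. `Snapshot` is a hypothetical stand-in for `ShellSnapshot`, and `validate` is a placeholder because the real validation is not described here:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `ShellSnapshot`: removes its file on drop.
struct Snapshot {
    path: PathBuf,
}

impl Drop for Snapshot {
    fn drop(&mut self) {
        // Only the final path is ever wrapped, so this never targets a
        // temp path that has already been renamed away.
        let _ = fs::remove_file(&self.path);
    }
}

/// Placeholder check (assumption): reject empty snapshot files.
fn validate(path: &Path) -> io::Result<()> {
    if fs::metadata(path)?.len() == 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "empty snapshot"));
    }
    Ok(())
}

/// Validates and renames the raw temp path; no drop guard owns it.
fn promote(temp: &Path, final_path: &Path) -> io::Result<Snapshot> {
    let result = validate(temp).and_then(|()| fs::rename(temp, final_path));
    if let Err(err) = result {
        // Explicit cleanup on validation or finalization failure.
        let _ = fs::remove_file(temp);
        return Err(err);
    }
    Ok(Snapshot { path: final_path.to_path_buf() })
}
```

Because the temp path is never wrapped in the guard type, a successful rename leaves nothing behind for `Drop` to fail on.
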
…#18260)

## Summary

Splits the larger PR4.1 background task auth rollout by moving
additional backend/control-plane call sites into this downstream PR.

This PR keeps callers on the same design as PR4.1: most code asks
`AuthManager` for the default ChatGPT backend authorization header, and
`AuthManager` decides bearer vs background AgentAssertion internally.
Task-pinned inference auth remains separate because it needs the
thread's registered task id.

## Stack

- PR1: openai#17385 - add
`features.use_agent_identity`
- PR2: openai#17386 - register agent
identities when enabled
- PR3: openai#17387 - register agent tasks
when enabled
- PR3.1: openai#17978 - persist and
prewarm registered tasks per thread
- PR4: openai#17980 - use task-scoped
`AgentAssertion` for downstream calls
- PR4.1: openai#18094 - introduce
AuthManager-owned background/control-plane `AgentAssertion` auth
- PR4.2: this PR - use background task auth for additional
backend/control-plane calls

## What Changed

- pass full authorization header values through backend-client and
cloud-tasks-client call paths where needed
- move ChatGPT client, cloud requirements, cloud tasks, thread-manager,
and models-manager background auth usage into this downstream slice
- make app-server remote control enrollment/websocket auth ask
`AuthManager` for the local backend authorization header instead of
threading a background auth mode through transport options
- keep the same feature-gated bearer fallback behavior from PR4.1

## Validation

- `just fmt`
- `cargo check -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `cargo test -p codex-login agent_identity`
- `cargo test -p codex-model-provider bearer_auth_provider`
- `cargo test -p codex-core agent_assertion`
- `cargo test -p codex-app-server remote_control`
- `cargo test -p codex-cloud-requirements fetch_cloud_requirements`
- `cargo test -p codex-models-manager manager::tests`
- `cargo test -p codex-chatgpt`
- `cargo test -p codex-cloud-tasks`
- `just fix -p codex-core -p codex-login -p codex-analytics -p
codex-app-server -p codex-cloud-requirements -p codex-cloud-tasks -p
codex-models-manager -p codex-chatgpt -p codex-model-provider -p
codex-mcp -p codex-core-skills`
- `just fix -p codex-app-server`
- `git diff --check`
Before this change, some tests leaked an `auth.json` file into
`codex-rs/core`. This fixes that.
Fixes openai#18539.

## Summary
The recent `/mcp` performance work kept the default command fast by
avoiding resource and resource-template inventory probes, but it also
removed useful diagnostics for users trying to confirm MCP server state.

This keeps bare `/mcp` on the fast tools/auth path and adds `/mcp
verbose` for the slower diagnostic view. Verbose mode requests full MCP
server status from the app-server and restores status, resources, and
resource templates in the TUI output.

## Testing
In addition to running automation, I manually tested the feature to
confirm that it works.
## Problem

The TUI resume/fork picker was backfilling thread names from local
rollout indexes. This was left over from before the TUI was moved to the
app server. It should be using app-server APIs because the TUI might be
connected to a remote connection.

This bug wasn't (yet) reported by a user. I found it by asking Codex to
review places in the TUI code where it was still directly accessing the
CODEX_HOME directory rather than going through app-server APIs.

## Solution

The resume picker and session lookups should use app-server thread APIs
only. Remove legacy rollout name/list backfills, and avoid local name
reads in fork history.

## Testing

I manually tested `codex resume` and `codex resume --all` to look for
functional or performance regressions in the resume picker.
## Summary

Side conversations can hide important state changes from the parent
conversation while the user is focused on the side thread. In
particular, the parent may finish, fail, need user input, or require an
approval while the side conversation remains visible. Users need a
lightweight signal for those states, but parent approval overlays should
not interrupt the side conversation itself.

This change adds parent-conversation status to the side conversation
context label and defers parent interactive overlays while side mode is
active. When the user exits side mode, pending parent approvals and
input requests are restored in the main thread. The pending approval
footer avoids duplicating the same parent approval status, and replayed
notice cells are filtered when restoring a pending interactive request
so tips or warnings do not crowd out the approval prompt.

The change is contained to the TUI side-conversation and thread replay
paths.

Example 1: Approval pending
<img width="752" height="35" alt="Screenshot 2026-04-19 at 12 56 07 PM"
src="https://github.com/user-attachments/assets/1cc0f1a3-9cab-4d60-aed2-96523ccafc20"
/>

Example 2: Turn complete
<img width="754" height="35" alt="Screenshot 2026-04-19 at 12 56 27 PM"
src="https://github.com/user-attachments/assets/653521a5-e298-4366-ae1c-72b56eb88eeb"
/>
- Migrates unloaded `thread/name/set` and `thread/memoryModeSet`
app-server writes behind the generic
`ThreadStore::update_thread_metadata` API rather than adding one-off
store methods for setting thread name or memory mode.
- Implements the local ThreadStore metadata patch path for thread name
and memory mode, including rollout append, legacy name index updates,
SessionMeta validation/update, SQLite reconciliation, and re-reading the
stored thread.
- Adds focused local thread-store unit coverage plus app-server
integration coverage for the migrated unloaded write paths.
## Why

`PermissionProfile` needs stable, canonical file-system semantics before
it can become the primary runtime permissions abstraction. Without a
canonical form, callers have to keep re-deriving legacy sandbox maps and
profile comparisons remain lossy or order-dependent.

## What changed

This adds canonicalization helpers for `FileSystemPermissions` and
`PermissionProfile`, expands special paths into explicit sandbox
entries, and updates permission request/conversion paths to consume
those canonical entries. It also tightens the legacy bridge so root-wide
write profiles with narrower carveouts are not silently projected as
full-disk legacy access.
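
To illustrate why a canonical form helps, here is a small sketch under simplifying assumptions. `Access`, `canonicalize`, and `is_full_disk_write` are hypothetical names, entries are plain path strings, and special-path expansion is omitted; none of this is the actual `PermissionProfile` API.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Access {
    Deny,
    Read,
    Write,
}

/// Canonical form: one entry per path, sorted by path, so two profiles
/// compare equal regardless of the order their entries were listed in.
fn canonicalize(entries: &[(&str, Access)]) -> Vec<(String, Access)> {
    let mut by_path = BTreeMap::new();
    for (path, access) in entries {
        // Assumption for this sketch: a later entry for the same path wins.
        by_path.insert(path.to_string(), *access);
    }
    by_path.into_iter().collect()
}

/// Legacy bridge check: root-wide write counts as full-disk write only
/// when no narrower entry carves out weaker access.
fn is_full_disk_write(canonical: &[(String, Access)]) -> bool {
    let root_write = canonical
        .iter()
        .any(|(path, access)| path.as_str() == "/" && *access == Access::Write);
    let has_carveout = canonical
        .iter()
        .any(|(path, access)| path.as_str() != "/" && *access != Access::Write);
    root_write && !has_carveout
}
```

A profile with write access at `/` and a read-only child such as `/tmp` canonicalizes identically in either listing order and is not reported as full-disk write.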

## Verification

- `cargo test -p codex-protocol
root_write_with_read_only_child_is_not_full_disk_write -- --nocapture`
- `cargo test -p codex-sandboxing permission -- --nocapture`
- `cargo test -p codex-tui permissions -- --nocapture`
This is the second cleanup in the await-holding lint stack. The
higher-level goal, following openai#18178
and openai#18398, is to enable Clippy
coverage for guards held across `.await` points without carrying broad
suppressions.

The stack is working toward enabling Clippy's
[`await_holding_lock`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_lock)
lint and the configurable
[`await_holding_invalid_type`](https://rust-lang.github.io/rust-clippy/master/index.html#await_holding_invalid_type)
lint for Tokio guard types.

Several existing fields used `tokio::sync::Mutex<()>` only as
one-at-a-time async gates. Those guards intentionally lived across
`.await` while an operation was serialized. A mutex over `()` suggests
protected data and trips the await-holding lint shape; a single-permit
`tokio::sync::Semaphore` expresses the intended serialization directly.

## What changed

- Replace `Mutex<()>` serialization gates with `Semaphore::new(1)` for
agent identity ensure, exec policy updates, guardian review session
reuse, plugin remote sync, managed network proxy refresh, auth token
refresh, and RMCP session recovery.
- Update call sites from `lock().await` / `try_lock()` to
`acquire().await` / `try_acquire()`.
- Map closed-semaphore errors into the existing local error types, even
though these semaphores are owned for the lifetime of their managers.
- Update session test builders for the new
`managed_network_proxy_refresh_lock` type.

## Verification

- The split stack was verified at the final lint-enabling head with
`just clippy`.





---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/18403).
* openai#18698
* openai#18423
* openai#18418
* __->__ openai#18403
- Replace the active models-manager catalog with the deleted core
catalog contents.
- Replace stale hardcoded test model slugs with current bundled model
slugs.
- Keep this as a stacked change on top of the cleanup PR.
Wires patch_updated events through app_server. These events are parsed
and streamed while apply_patch is being written by the model. Also adds
500ms of buffering to the patch_updated events in the diff_consumer.

The eventual goal is to use this to display better progress indicators in
the codex app.
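
One plausible reading of the 500ms buffering is a coalescing window: emit at most one update per window and keep only the newest pending one. The sketch below assumes that reading. `PatchUpdateBuffer` and its methods are hypothetical, and the actual diff_consumer behavior may differ.

```rust
use std::time::{Duration, Instant};

const BUFFER_WINDOW: Duration = Duration::from_millis(500);

/// Hypothetical coalescer: keeps only the newest pending update and
/// releases it once the window since the last emission has elapsed.
struct PatchUpdateBuffer {
    last_emit: Option<Instant>,
    pending: Option<String>,
}

impl PatchUpdateBuffer {
    fn new() -> Self {
        Self { last_emit: None, pending: None }
    }

    /// Records an update, replacing any older pending one, and returns
    /// the update to send if the window allows sending now.
    fn push(&mut self, update: String, now: Instant) -> Option<String> {
        self.pending = Some(update);
        self.flush_if_due(now)
    }

    /// Releases the pending update if at least one window has passed
    /// since the last emission (or nothing has been emitted yet).
    fn flush_if_due(&mut self, now: Instant) -> Option<String> {
        let due = match self.last_emit {
            None => true,
            Some(last) => now.duration_since(last) >= BUFFER_WINDOW,
        };
        if due && self.pending.is_some() {
            self.last_emit = Some(now);
            self.pending.take()
        } else {
            None
        }
    }
}
```

Passing `now` in explicitly keeps the sketch deterministic; a caller would drive `flush_if_due` from a timer so the last update in a burst is still delivered.
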
## Problem
The TUI still imported path utilities and config-loader symbols through
app-server-client's legacy_core facade even though those APIs already
exist in utility/config crates. This is part of our ongoing effort to
whittle away at these old dependencies.

## Solution
Rewire imports to avoid the TUI directly importing from the core crate
and instead import from common lower-level crates. This PR doesn't
include any functional changes; it's just a simple rewiring.
## Summary
- Add the missing `background_task_id: None` field to the
`AgentIdentityAuthRecord` fixture introduced in `auth_tests.rs`.

## Why
- Current `main` fails Bazel/rust-ci compile paths after the
background-task auth field landed and a later auth test fixture
constructed `AgentIdentityAuthRecord` without that new field.
- I intentionally removed the earlier broader CI-stability edits from
this PR. The code-mode timeout, external-agent migration snapshot, and
MCP resource timeout failures appear to be general/flaky or unrelated to
the agent identity merge stack rather than cleanly caused by it.

## Validation
- `cargo test -p codex-login
dummy_chatgpt_auth_does_not_create_cwd_auth_json_when_identity_is_set --
--nocapture`
- `just fmt`
Automated update of models.json.

---------

Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
## Summary
- Pin vulnerable npm dependencies through the existing root
`resolutions` mechanism so the lockfile moves only to patched versions.
- Refresh `pnpm-lock.yaml` for `@modelcontextprotocol/sdk`,
`handlebars`, `path-to-regexp`, `picomatch`, `minimatch`, `flatted`,
`rollup`, and `glob`.
- Bump `quinn-proto` from `0.11.13` to `0.11.14` and refresh
`MODULE.bazel.lock`.

## Testing
- `corepack pnpm --store-dir .pnpm-store install --frozen-lockfile
--ignore-scripts`
- `corepack pnpm audit --audit-level high` (passes; remaining advisories
are low/moderate)
- `corepack pnpm -r --filter ./sdk/typescript run build`
- `corepack pnpm exec eslint 'src/**/*.ts' 'tests/**/*.ts'`
- `cargo check --locked`
- `cargo build -p codex-cli`
- `bazel --output_user_root=/tmp/bazel-codex-dependabot
--ignore_all_rc_files mod deps --lockfile_mode=error`
- `just fmt`

Note: `corepack pnpm -r --filter ./sdk/typescript run test` was also
attempted after building `codex`; it is blocked on this workstation by
host-managed Codex MDM/auth state (`approval_policy` restrictions and
ChatGPT/API-key mismatch), not by this dependency change.
…17692)

## Why

Guardian review analytics needs a Rust event shape that matches the
backend schema while avoiding unnecessary PII exposure from reviewed
tool calls. This PR narrows the analytics payload to the fields we
intend to emit and keeps shared Guardian assessment enums in protocol
instead of duplicating equivalent analytics-only enums.

## What changed

- Uses protocol Guardian enums directly for `risk_level`,
`user_authorization`, `outcome`, and command source values.
- Removes high-risk reviewed-action fields from the analytics payload,
including raw commands, display strings, working directories, file
paths, network targets/hosts, justification text, retry reason, and
rationale text.
- Makes `target_item_id` and `tool_call_count` nullable so the Codex
event can represent cases where the app-server protocol or producer does
not have those values.
- Keeps lower-risk structured reviewed-action metadata such as sandbox
permissions, permission profile, `tty`, `execve` source/program, network
protocol/port, and MCP connector/tool labels.
- Adds an analytics reducer/client test covering `codex_guardian_review`
serialization with an optional `target_item_id` and absent removed
fields.

## Verification

- `cargo test -p codex-analytics
guardian_review_event_ingests_custom_fact_with_optional_target_item`
- `cargo fmt --check`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17692).
* openai#17696
* openai#17695
* openai#17693
* __->__ openai#17692
## Summary
Disables apps, plugins, and MCPs for the guardian subagent thread.

## Testing
- [x] Added unit tests
## Summary

This PR aims to improve integration between the realtime model and the
codex agent by sharing more context with each other. In particular, we
now share full realtime conversation transcript deltas in addition to
the delegation message.

realtime_conversation.rs now turns a handoff into:
```
<realtime_delegation>
  <input>...</input>
  <transcript_delta>...</transcript_delta>
</realtime_delegation>
```

## Implementation notes

The transcript is accumulated in the realtime websocket layer as parsed
realtime events arrive. When a background-agent handoff is requested,
the current transcript snapshot is copied onto the handoff event and
then serialized by `realtime_conversation.rs` into the hidden realtime
delegation envelope that Codex receives as user-turn context.

For Realtime V2, the session now explicitly enables input audio
transcription, and the parser handles the relevant input/output
transcript completion events so the snapshot includes both user speech
and realtime model responses. The delegation `<input>` remains the
actual handoff request, while `<transcript_delta>` carries the
surrounding conversation history for context.
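
As a rough illustration of the envelope shape, the following sketch builds the string from a handoff request and a transcript snapshot. The function names are hypothetical, and whether the real serializer in `realtime_conversation.rs` escapes its content is an assumption made here.

```rust
/// Escapes characters that would otherwise break the XML-like envelope.
/// Escaping is an assumption of this sketch, not confirmed behavior.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Wraps the handoff request and the surrounding transcript in the
/// hidden delegation envelope.
fn realtime_delegation(input: &str, transcript_delta: &str) -> String {
    format!(
        "<realtime_delegation>\n  <input>{}</input>\n  <transcript_delta>{}</transcript_delta>\n</realtime_delegation>",
        escape(input),
        escape(transcript_delta)
    )
}
```

Escaping `&` first avoids double-escaping the entities produced for `<` and `>`.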

Reviewers should note that the transcript payload is intended for Codex
context sharing, not UI rendering. The realtime delegation envelope
should stay hidden from the user-facing transcript surface, while still
being included in the background-agent turn so Codex can answer with the
same conversational context the realtime model had.
## Why

`skills/list` refreshes are best-effort metadata updates. If one fails
during startup or thread switching, the TUI should keep running and show
enough detail to diagnose the app-server failure instead of leaving the
user with only a log entry.

This addresses the recoverability and observability issue reported in
openai#16914.

## What Changed

- Preserve the full startup `skills/list` error chain before sending it
back through the app event queue.
- Surface failed skills refreshes as recoverable TUI error messages
while still logging the warning.

This is related to the recent bug fix from [PR
openai#18370](openai#18370).
Fixes stale test fixtures left after the active bundled model catalog
updates in openai#18586 and openai#18388. Those changes made `gpt-5.4` the current
default and removed several older hardcoded slugs, which left Windows
Bazel shards failing TUI and config tests.

What changed:
- Refresh TUI model migration, availability NUX, plan-mode, status, and
snapshot fixtures to use active bundled model slugs.
- Update the config edit test expectation for the TOML-quoted
`"gpt-5.2"` migration key.
- Move the model catalog tests into
`codex-rs/tui/src/app/tests/model_catalog.rs` so touching them does not
trip the blob-size policy for `app.rs`.

Verification:
- CI Bazel/lint checks are expected to cover the affected test shards.
Add an experimental config option to use the remote thread store instead
of the local thread store implementation in the app server.
## Why

Fixes openai#18718.

After rewinding a thread, `/copy` could still copy the latest assistant
response from before the rewind. The transcript cells were rolled back,
but the copy source was a single `last_agent_markdown` cache that was
not synchronized with backtracking, so the visible conversation and
copied content could diverge.

## What changed

`ChatWidget` now keeps a bounded copy history for the most recent 32
assistant responses, keyed by the visible user-turn count. When local
rollback trims transcript cells, the copy cache is trimmed to the same
surviving user-turn count so `/copy` uses the latest visible assistant
response.

If the user rewinds past the retained copy window, `/copy` now reports:

```text
Cannot copy that response after rewinding. Only the most recent 32 responses are available to /copy.
```

The change also adds coverage for copying the latest surviving response
after rollback and for the over-limit rewind message.
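
The bounded history and its rollback trimming could look roughly like this. It is a sketch, not the `ChatWidget` implementation: `CopyHistory`, `CopyResult`, and the `evicted` flag are hypothetical.

```rust
use std::collections::VecDeque;

const MAX_COPY_HISTORY: usize = 32;

#[derive(Debug, PartialEq, Eq)]
enum CopyResult<'a> {
    Response(&'a str),
    /// The user rewound past the retained window.
    BeyondWindow,
    /// No assistant response has been recorded yet.
    Nothing,
}

struct CopyHistory {
    /// (visible user-turn count, assistant markdown), oldest first.
    entries: VecDeque<(usize, String)>,
    /// Set once an old response has been dropped to respect the bound.
    evicted: bool,
}

impl CopyHistory {
    fn new() -> Self {
        Self { entries: VecDeque::new(), evicted: false }
    }

    fn record(&mut self, user_turn_count: usize, markdown: String) {
        self.entries.push_back((user_turn_count, markdown));
        while self.entries.len() > MAX_COPY_HISTORY {
            self.entries.pop_front();
            self.evicted = true;
        }
    }

    /// Mirrors a transcript rollback: drop responses that belong to user
    /// turns which no longer exist.
    fn truncate_to(&mut self, surviving_user_turns: usize) {
        while matches!(self.entries.back(), Some((turn, _)) if *turn > surviving_user_turns) {
            self.entries.pop_back();
        }
    }

    fn latest(&self) -> CopyResult<'_> {
        match self.entries.back() {
            Some((_, markdown)) => CopyResult::Response(markdown.as_str()),
            None if self.evicted => CopyResult::BeyondWindow,
            None => CopyResult::Nothing,
        }
    }
}
```

After recording 35 turns and truncating to 34, `latest` returns the turn-34 response; truncating to 2 leaves nothing retained, which maps to the over-limit message.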

## Verification

- Manually resumed a synthetic 35-turn session, rewound within the
retained window, and verified `/copy` copied the surviving response.
- Manually rewound past the retained window and verified `/copy` showed
the 32-response limit message.
- `cargo test -p codex-tui slash_copy`
- `just fix -p codex-tui`
- `cargo insta pending-snapshots`

Note: `cargo test -p codex-tui` currently fails on unrelated model
catalog and snapshot drift around the default model changing to
`gpt-5.4`; the focused `/copy` tests pass after fixing the new test
setup.
This updates the spawn-agent tool contract so subagents are presented as
inheriting the parent model by default. The visible model list is now
framed as optional overrides, the model parameter tells callers to leave
it unset, and the delegation guidance no longer nudges models toward
picking a smaller/mini override.

Fixes reports that 5.4 would occasionally pick 5.2 or lower as
sub-agents.
## Problem
The TUI resolved fork parent titles from local CODEX_HOME metadata,
which could show missing or stale titles when app-server metadata is
authoritative.

This is a lingering bug left over from the migration of the TUI to the
app-server interface. I found it when I asked Codex to review all places
where the TUI code was still directly accessing the local CODEX_HOME.

## Solution
Route fork parent title metadata through the app-server session state
and render only that supplied title, with focused snapshot coverage for
stale local metadata.

## Testing
I manually tested by renaming a thread then forking it and confirming
that the "forked from" message indicated the parent thread's name.
Cascade the thread archive endpoint to all the sub-agents in the agent
tree
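
A minimal sketch of the cascade, assuming the agent tree is available as a parent-to-children index. `threads_to_archive` and the `children` map are hypothetical; the actual endpoint's data structures are not shown here.

```rust
use std::collections::HashMap;

/// Returns the root thread and every descendant, parent before child.
/// Assumes `children` describes a tree (no cycles).
fn threads_to_archive(root: &str, children: &HashMap<String, Vec<String>>) -> Vec<String> {
    let mut order = Vec::new();
    let mut stack = vec![root.to_string()];
    while let Some(thread_id) = stack.pop() {
        if let Some(kids) = children.get(&thread_id) {
            // Push in reverse so siblings are visited in listed order.
            stack.extend(kids.iter().rev().cloned());
        }
        order.push(thread_id);
    }
    order
}
```

The caller would then archive each returned thread id in turn; an explicit stack avoids recursion depth concerns for deep agent trees.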

Fix: openai#17867

---------

Co-authored-by: Codex <noreply@openai.com>
Migrate the conversation summary App Server methods to ThreadStore

Because this app-server API allows explicitly fetching the thread by
rollout path, intercept that case in the app-server code and (a) route
directly to the underlying local thread store methods if we're using a
local thread store, or (b) throw an unsupported error if we're using a
remote thread store. This keeps the thread store API clean and all
filesystem operations inside the local thread store, while pushing the
"fundamental incompatibility" check as early as possible.
BREAKING CHANGE: thread control config and API rename router mode to orchestrator without router compatibility.
@richardgetz richardgetz merged commit 3e755c0 into stable Apr 23, 2026
15 of 17 checks passed