codex_models_manager::manager: failed to refresh available models: timeout waiting for child process to exit (default config, every 15-45s)

### What version of Codex CLI is running?

codex-cli 0.130.0

### What subscription do you have?

ChatGPT (auth_mode="Chatgpt", `gpt-5.4` access via subscription)

### Which model were you using?

gpt-5.4 (the same error fires regardless of model; gpt-5.4 used here because it is the default)

### What platform is your computer?

Darwin 24.6.0 x86_64 i386 (macOS)

### What terminal emulator and version are you using (if applicable)?

Reproduces from `codex exec` invoked directly (no terminal multiplexer), and equally reproduces when codex-cli is invoked as a subprocess from another tool. Not terminal-specific.

### Codex doctor report

`codex doctor` requires an interactive stdin in 0.130.0; not available in this non-interactive sandbox. Verbose-run snippet attached in "Additional information" instead.

### What issue are you seeing?

`codex_models_manager` periodically emits this stderr line throughout every run, roughly every 15-45 seconds:

```
ERROR codex_models_manager::manager: failed to refresh available models: timeout waiting for child process to exit
```

The run itself completes normally — agents finish tool calls and produce expected output between occurrences — but the stderr noise is constant, the implied background-child latency is non-zero on every occurrence, and on long-running invocations the error count is significant: **12 occurrences in a single 10-minute production heartbeat** (sample run `a4864072-70a6-48ed-a98b-c8a6098e8499` of an MCP-driven agent host). In that reference run, **100% of stderr emitted by codex was this error** (12 of 12 stderr lines).

This is distinct from #22205 ("Azure custom provider: model refresh logs `missing field models`"), which involves the same `codex_models_manager::manager` module but a different underlying failure mode (response-decode error vs. child-process-exit timeout). Posting separately so the root causes don't get conflated, but linking #22205 as related.

### What steps can reproduce the bug?

Minimal reproducer (fresh `CODEX_HOME`, default OpenAI provider, default `gpt-5.4`, no curated plugins required):

```bash
SCRATCH=$(mktemp -d)
cp ~/.codex/auth.json "$SCRATCH/"
printf 'model = "gpt-5.4"\n' > "$SCRATCH/config.toml"

CODEX_HOME="$SCRATCH" codex exec --skip-git-repo-check "say DONE" 2>&1 \
    | grep -c codex_models_manager
# Observed: 2 occurrences (in ~46-55s wall-clock), even on a single-turn run.
```

Notes on the reproducer:

- The error only fires when the models cache is cold. The cache lives at `$CODEX_HOME/models_cache.json` with a 300s TTL — to reproduce reliably, either use a fresh `$CODEX_HOME` (as above) or delete `models_cache.json` between runs.
- Disabling curated plugins (`gmail`, `google-calendar`, `github`, `google-drive`, `canva` — all `@openai-curated`) does NOT suppress the error. The trigger is internal to the model-refresh path, not the plugin loader.
- Reproduces with the default OpenAI provider; no custom `[model_providers]` block needed.
- Production heartbeat density (10-min sessions): 12 occurrences per run, mean gap 55.4s, min gap 13.0s, max gap 140.6s. The variable gap suggests backoff-on-failure logic.

### What is the expected behavior?

Either of:

1. The model-refresh subprocess should not time out under default conditions. If something is genuinely deadlocking in the child, surface it through the normal error path; do not silently retry the same hung subprocess on every 300s TTL expiry.
2. If the refresh attempt is expected to fail in some configurations, downgrade the log level from `ERROR` to `WARN` (or `DEBUG`) and suppress repeated occurrences once the failure mode is established for a given session — the current pattern emits noise indistinguishable from a real failure to anyone tailing logs.

Either is acceptable. Right now downstream tooling cannot distinguish "agent dead" from "agent fine but Codex is logging spam" without parsing the specific error string.

### Additional information

**Operational impact for MCP host integrations:**

This affects any environment that hosts Codex as a subprocess and uses stderr as a liveness signal. We mitigated downstream (in Paperclip's heartbeat watchdog) by explicitly excluding stderr from the activity signal — but that requires every host integration to know about this Codex-specific noise pattern and ignore it. A cleaner fix in Codex would let host integrations rely on stderr again.

**Verbose-log snippet showing the model-cache TTL path** (RUST_LOG=info, default config):

```
INFO codex_models_manager::manager: models cache: evaluating cache eligibility client_version="0.130.0"
INFO codex_models_manager::cache:   models cache: attempting load_fresh cache_path=...
INFO codex_models_manager::cache:   models cache: cache hit cache_path=... cache_ttl_secs=300
INFO codex_models_manager::manager: models cache: cache entry applied models_count=6 etag=Some("...")
INFO codex_models_manager::manager: models cache: using cached models for OnlineIfUncached
```

When the cache misses (TTL elapsed, fresh CODEX_HOME, or models_cache.json absent), the `load_fresh` path spawns a child process that hits the `timeout waiting for child process to exit` failure — this is the error in the title.

**Related issue:** #22205 (Azure custom provider, same module, different root cause).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

codex_models_manager::manager: failed to refresh available models: timeout waiting for child process to exit (default config, every 15-45s) #23119

What version of Codex CLI is running?

What subscription do you have?

Which model were you using?

What platform is your computer?

What terminal emulator and version are you using (if applicable)?

Codex doctor report

What issue are you seeing?

What steps can reproduce the bug?

What is the expected behavior?

Additional information

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

codex_models_manager::manager: failed to refresh available models: timeout waiting for child process to exit (default config, every 15-45s) #23119

Description

What version of Codex CLI is running?

What subscription do you have?

Which model were you using?

What platform is your computer?

What terminal emulator and version are you using (if applicable)?

Codex doctor report

What issue are you seeing?

What steps can reproduce the bug?

What is the expected behavior?

Additional information

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions