Summary
When codex is pointed at a distributed inference cluster (exo, leader-follower llama.cpp, vLLM with ray, etc.), the active inference endpoint can rotate as nodes join or leave. The current model_providers.<name>.base_url field is a single static URL, which forces operators to either:
- Edit ~/.codex/config.toml every time the leader changes; or
- Front the cluster with a sticky proxy (extra hop, extra failure mode).
Neither is great. A thin sidecar that reports the current leader's URL on demand is a better fit, but codex has no way to consume one.
Proposed solution
Two new provider-scoped fields plus a DiscoveryResponse wire format:
- discovery_url (Option<String>): the URL that codex GETs at session start to retrieve the current effective base_url.
- discovery_request_timeout_ms (Option<u64>): per-attempt timeout (default 5_000 ms, capped at 60_000 ms so a misconfigured endpoint can never wedge session start).
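A minimal sketch of how the optional setting, the default, and the cap might combine into one per-attempt timeout. It assumes an out-of-range value is clamped to the cap rather than rejected as a config error (the proposal says "capped" without saying which). The constant names match those listed under the reference implementation; the function name is hypothetical.

```rust
use std::time::Duration;

// Constant names as listed in the reference implementation; values from the proposal.
pub const DEFAULT_DISCOVERY_REQUEST_TIMEOUT_MS: u64 = 5_000;
pub const MAX_DISCOVERY_REQUEST_TIMEOUT_MS: u64 = 60_000;

/// Hypothetical helper: turns `discovery_request_timeout_ms` into the timeout
/// applied to each discovery GET. Assumption: values above the cap are clamped.
pub fn effective_discovery_timeout(configured_ms: Option<u64>) -> Duration {
    let ms = configured_ms
        .unwrap_or(DEFAULT_DISCOVERY_REQUEST_TIMEOUT_MS) // unset -> 5 s default
        .min(MAX_DISCOVERY_REQUEST_TIMEOUT_MS); // never wait longer than 60 s
    Duration::from_millis(ms)
}
```

With the example config below (discovery_request_timeout_ms = 2500), each attempt would wait at most 2.5 s before falling back.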
The endpoint MUST respond with application/json matching:
{ "base_url": "http://leader.example.com:8080/v1" }
Extra fields are ignored so future TTL / version / alternate-URL metadata can land without breaking v1 clients.
Example config.toml:
```toml
[model_providers.my-cluster]
name = "my exo cluster"
base_url = "http://fallback.example.com/v1" # used if discovery fails
wire_api = "responses"
discovery_url = "http://cluster-manager.local/codex/discover"
discovery_request_timeout_ms = 2500
```
Behavior:
- At session start, codex GETs discovery_url.
- On success, the response's base_url replaces the static one for the lifetime of the session.
- On failure (timeout, non-2xx, invalid JSON, relative URL, body > 64 KiB cap), codex emits a startup warning and falls back to the static base_url.
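The failure modes above map naturally onto an error enum whose Display output doubles as the operator-facing startup warning. This sketch is illustrative only: the variant names, the context each one carries, and the resolve_with_fallback helper are hypothetical, not taken from the reference implementation.

```rust
use std::fmt;

/// Hypothetical error type: one variant per failure mode in the behavior list.
#[derive(Debug)]
pub enum DiscoveryError {
    Timeout { url: String, timeout_ms: u64 },
    NonSuccessStatus { url: String, status: u16 },
    InvalidJson { url: String, detail: String },
    RelativeBaseUrl { url: String, base_url: String },
    BodyTooLarge { url: String, limit_bytes: usize },
}

impl fmt::Display for DiscoveryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Every message names the discovery URL so the operator knows what to check.
        match self {
            Self::Timeout { url, timeout_ms } => write!(f, "GET {url} timed out after {timeout_ms} ms"),
            Self::NonSuccessStatus { url, status } => write!(f, "GET {url} returned HTTP {status}"),
            Self::InvalidJson { url, detail } => write!(f, "GET {url} returned invalid JSON: {detail}"),
            Self::RelativeBaseUrl { url, base_url } => {
                write!(f, "GET {url} returned non-absolute base_url {base_url:?}")
            }
            Self::BodyTooLarge { url, limit_bytes } => {
                write!(f, "GET {url} returned a body larger than {limit_bytes} bytes")
            }
        }
    }
}

/// Session-start rule: a discovered URL wins; any error falls back to the
/// static base_url and yields a warning string for the caller to surface.
pub fn resolve_with_fallback(
    static_base_url: &str,
    discovered: Result<String, DiscoveryError>,
) -> (String, Option<String>) {
    match discovered {
        Ok(base_url) => (base_url, None),
        Err(err) => (
            static_base_url.to_string(),
            Some(format!("discovery failed ({err}); falling back to {static_base_url}")),
        ),
    }
}
```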
Periodic refresh / TTL is intentionally out of scope for v1; once-per-session resolution covers the common case. Codex sessions are short relative to typical leader-rotation intervals, and future work can add a background refresh task without changing the wire format.
Safety
- 64 KiB cap on discovery response body — bounds memory exposure if a misconfigured endpoint streams unbounded.
- Discovered base_url is parsed via reqwest::Url::parse and rejected if not absolute, so an attacker controlling the discovery endpoint can't slip in a relative URL.
- 60-second hard cap on per-attempt timeout.
DiscoveryError is exhaustive and each variant carries enough context to surface a useful operator-facing warning.
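Both safety checks can be expressed with the standard library alone. This sketch assumes the body arrives as a std::io::Read stream; reading at most one byte past the cap bounds memory no matter how much the endpoint sends. require_absolute_url is a deliberately simplified, stricter stand-in for the reqwest::Url::parse check (it accepts only http:// and https:// URLs with a non-empty host), and all names here are hypothetical.

```rust
use std::io::Read;

/// 64 KiB cap from the proposal.
pub const MAX_DISCOVERY_BODY_BYTES: usize = 64 * 1024;

#[derive(Debug, PartialEq)]
pub enum SafetyError {
    BodyTooLarge { limit_bytes: usize },
    Io(String),
    NotAbsolute(String),
}

/// Reads at most `limit + 1` bytes; seeing that extra byte means the body is over the cap.
pub fn read_capped_body<R: Read>(body: R, limit: usize) -> Result<Vec<u8>, SafetyError> {
    let mut buf = Vec::new();
    body.take(limit as u64 + 1)
        .read_to_end(&mut buf)
        .map_err(|e| SafetyError::Io(e.to_string()))?;
    if buf.len() > limit {
        return Err(SafetyError::BodyTooLarge { limit_bytes: limit });
    }
    Ok(buf)
}

/// Simplified absolute-URL check (assumption: http/https only, host required).
pub fn require_absolute_url(candidate: &str) -> Result<&str, SafetyError> {
    let rest = candidate
        .strip_prefix("https://")
        .or_else(|| candidate.strip_prefix("http://"));
    match rest {
        // Something must follow the scheme, and it must not start with '/' (empty host).
        Some(r) if !r.is_empty() && !r.starts_with('/') => Ok(candidate),
        _ => Err(SafetyError::NotAbsolute(candidate.to_string())),
    }
}
```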
What I considered and rejected
- Reading base_url from an env var that the cluster updates: requires every codex caller to also have a sidecar polling the cluster, which is just relocating the problem.
- Periodic background refresh in v1: meaningfully larger surface (refresh task lifecycle, race against in-flight requests, TTL semantics) for a marginal UX win. v1 covers the 90% case.
- Single shared discovery endpoint for all providers: doesn't scale to clusters under different cluster managers.
Reference implementation
A working implementation with tests lives on the team-wcv fork:
Touches:
- codex-rs/model-provider-info/src/lib.rs: new discovery_url and discovery_request_timeout_ms fields on ModelProviderInfo; new DiscoveryResponse struct; DEFAULT_DISCOVERY_REQUEST_TIMEOUT_MS / MAX_DISCOVERY_REQUEST_TIMEOUT_MS constants.
- codex-rs/model-provider/src/discovery.rs: new module with two helpers:
  - resolve_provider_discovery(client, info) -> Result<ModelProviderInfo, DiscoveryError>: fail-fast variant.
  - resolve_provider_discovery_or_warn(client, info) -> ModelProviderInfo: best-effort variant that warns and falls back on failure.
- codex-rs/model-provider/src/discovery_tests.rs: 9 tests using wiremock covering success, extra-fields-ignored, non-2xx, non-JSON, relative-URL rejection, invalid-discovery-URL, best-effort fallback, and no-op-when-unset.
- codex-rs/core/config.schema.json: regenerated via just write-config-schema.
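The layering between the two helpers can be sketched synchronously, assuming a trimmed-down ProviderInfo struct in place of ModelProviderInfo and a fetch closure in place of the HTTP client. The real helpers take a client and presumably run async, and errors are simplified to String here; resolve_discovery, resolve_discovery_or_warn, and ProviderInfo are hypothetical names. Only the shape matters: the best-effort variant delegates to the fail-fast one and returns its input unchanged on error.

```rust
/// Hypothetical stand-in for ModelProviderInfo: only the fields discovery touches.
#[derive(Clone, Debug, PartialEq)]
pub struct ProviderInfo {
    pub base_url: Option<String>,
    pub discovery_url: Option<String>,
    pub discovery_request_timeout_ms: Option<u64>,
}

/// Fail-fast shape: no-op when discovery_url is unset; otherwise return a copy
/// whose base_url is the discovered one, or propagate the error.
pub fn resolve_discovery<F>(fetch: F, info: &ProviderInfo) -> Result<ProviderInfo, String>
where
    F: FnOnce(&str) -> Result<String, String>,
{
    let Some(url) = info.discovery_url.as_deref() else {
        return Ok(info.clone());
    };
    let base_url = fetch(url)?;
    Ok(ProviderInfo { base_url: Some(base_url), ..info.clone() })
}

/// Best-effort shape: log the failure and keep the static base_url.
pub fn resolve_discovery_or_warn<F>(fetch: F, info: &ProviderInfo) -> ProviderInfo
where
    F: FnOnce(&str) -> Result<String, String>,
{
    match resolve_discovery(fetch, info) {
        Ok(resolved) => resolved,
        Err(err) => {
            eprintln!("warning: provider discovery failed: {err}; using static base_url");
            info.clone()
        }
    }
}
```

Keeping the fail-fast variant available would let a call site that should refuse to start on a broken discovery endpoint opt into that behavior, while most sites use the best-effort one.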
Scope of the reference implementation
The fields and helpers are wired into ModelProviderInfo, but the helpers are not yet invoked from the 30+ create_model_provider call sites across codex-core, codex-tui, codex-app-server, etc. Where to plumb session-start discovery is a design choice that materially affects async ergonomics, and I wanted to collect maintainer guidance on shape before touching every session-construction path. The helpers are self-contained and ready to drop into whichever site(s) you'd prefer:
- A central session-init helper that wraps create_model_provider?
- An async builder on Config that pre-resolves discovery before the synchronous create_model_provider is called?
- Per-site, with operators opting in by feature flag?
Happy to do that integration as a follow-up PR once the wire format and helper shape are approved (or to rework if you'd prefer different ergonomics).
Open to alternatives
If discovery_url as a name conflicts with existing terminology, or if you'd prefer the wire format to carry more metadata up front (e.g., { "base_url": "...", "ttl_seconds": 30, "expires_at": "..." }), happy to rework. The patch shape is small enough that iteration is cheap.