Add model_providers.<name>.discovery_url for cluster-aware base-URL refresh #22063

@team-wcv

Summary

When codex is pointed at a distributed inference cluster (exo, leader-follower llama.cpp, vLLM with ray, etc.), the active inference endpoint can rotate as nodes join or leave. The current model_providers.<name>.base_url field is a single static URL, which forces operators to either:

  • Edit ~/.codex/config.toml every time the leader changes; or
  • Front the cluster with a sticky proxy (extra hop, extra failure mode).

Neither is great. A thin sidecar that reports the current leader's URL on demand is a better fit, but codex has no way to consume one.

Proposed solution

Two new provider-scoped fields plus a DiscoveryResponse wire format:

  • discovery_url: Option<String> — URL codex GETs at session start to retrieve the current effective base_url.
  • discovery_request_timeout_ms: Option<u64> — per-attempt timeout (default 5_000 ms, capped at 60_000 ms so a misconfigured endpoint can never wedge session start).
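The default and cap for discovery_request_timeout_ms can be sketched as a small pure function. This is a minimal illustration, not the reference implementation: the constant names come from the "Touches" list in this issue, but effective_discovery_timeout is a hypothetical helper name, and it assumes an over-limit value is silently clamped rather than rejected at config load.

```rust
use std::time::Duration;

/// Default per-attempt timeout stated in the proposal (5 seconds).
pub const DEFAULT_DISCOVERY_REQUEST_TIMEOUT_MS: u64 = 5_000;
/// Hard cap stated in the proposal (60 seconds).
pub const MAX_DISCOVERY_REQUEST_TIMEOUT_MS: u64 = 60_000;

/// Hypothetical helper: turn the optional config value into the timeout
/// actually applied to a single discovery attempt.
/// Assumption: values above the cap are clamped, not treated as errors.
pub fn effective_discovery_timeout(configured_ms: Option<u64>) -> Duration {
    let ms = configured_ms
        .unwrap_or(DEFAULT_DISCOVERY_REQUEST_TIMEOUT_MS)
        .min(MAX_DISCOVERY_REQUEST_TIMEOUT_MS);
    Duration::from_millis(ms)
}
```

With this shape, an unset field yields 5 s, the 2500 from the example config yields 2.5 s, and a mistyped 120000 yields 60 s.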

The endpoint MUST respond with application/json matching:

{ "base_url": "http://leader.example.com:8080/v1" }

Extra fields are ignored so future TTL / version / alternate-URL metadata can land without breaking v1 clients.

Example config.toml:

[model_providers.my-cluster]
name = "my exo cluster"
base_url = "http://fallback.example.com/v1"   # used if discovery fails
wire_api = "responses"
discovery_url = "http://cluster-manager.local/codex/discover"
discovery_request_timeout_ms = 2500

Behavior:

  1. At session start, codex GETs discovery_url.
  2. On success, the response's base_url replaces the static one for the lifetime of the session.
  3. On failure (timeout, non-2xx, invalid JSON, relative URL, body > 64 KiB cap), codex emits a startup warning and falls back to the static base_url.
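The three steps above can be sketched without any HTTP machinery by passing the fetch step in as a closure. Everything here is illustrative: ProviderInfo is a two-field stand-in for the real ModelProviderInfo, resolve_or_warn and the fetch hook are hypothetical names, and warnings are collected into a Vec rather than emitted through whatever startup-warning channel codex actually uses.

```rust
/// Simplified stand-in for the provider config (the real struct has more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderInfo {
    pub base_url: String,
    pub discovery_url: Option<String>,
}

/// Resolve the effective base_url once, at session start.
/// `fetch` is a hypothetical hook standing in for "GET discovery_url,
/// validate the response, return its base_url".
pub fn resolve_or_warn<F>(
    info: ProviderInfo,
    fetch: F,
    warnings: &mut Vec<String>,
) -> ProviderInfo
where
    F: FnOnce(&str) -> Result<String, String>,
{
    // No discovery_url configured: keep the static base_url untouched.
    let Some(url) = info.discovery_url.clone() else {
        return info;
    };
    match fetch(&url) {
        // Success: the discovered URL replaces the static one for this session.
        Ok(base_url) => ProviderInfo { base_url, ..info },
        // Any failure: record a warning and fall back to the static base_url.
        Err(reason) => {
            warnings.push(format!(
                "discovery via {url} failed ({reason}); using static base_url {}",
                info.base_url
            ));
            info
        }
    }
}
```

Keeping the fetch step behind a closure is only a convenience for the sketch; it makes the success, failure, and unset paths easy to exercise without a mock server.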

Periodic refresh / TTL is intentionally out of scope for v1; once-per-session resolution covers the common case. Codex sessions are short relative to typical leader-rotation intervals, and future work can add a background refresh task without changing the wire format.

Safety

  • 64 KiB cap on discovery response body — bounds memory exposure if a misconfigured endpoint streams unbounded.
  • Discovered base_url is parsed via reqwest::Url::parse and rejected if not absolute, so an attacker controlling the discovery endpoint can't slip in a relative URL.
  • 60-second hard cap on per-attempt timeout.
  • DiscoveryError is exhaustive and each variant carries enough context to surface a useful operator-facing warning.
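One way to enforce the 64 KiB body cap is to read through a length-limited adapter, so an unbounded stream is cut off instead of buffered. The sketch below uses the synchronous std::io::Read trait for brevity; the reference implementation presumably reads an async HTTP body, so treat this as an illustration of the bound only. MAX_DISCOVERY_BODY_BYTES and read_capped are hypothetical names.

```rust
use std::io::{self, Read};

/// 64 KiB cap from the proposal (constant name is hypothetical).
pub const MAX_DISCOVERY_BODY_BYTES: u64 = 64 * 1024;

/// Read a response body, failing if it exceeds `cap` bytes.
pub fn read_capped<R: Read>(reader: R, cap: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one byte past the cap: that is enough to detect an over-limit
    // body while never buffering more than cap + 1 bytes.
    reader.take(cap.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("discovery response body exceeds {cap} bytes"),
        ));
    }
    Ok(buf)
}
```

A body of exactly 64 KiB is accepted under this reading of the cap; whether the boundary is inclusive in the actual patch is not stated in this issue.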

What I considered and rejected

  • Reading base_url from an env var that the cluster updates: requires every codex caller to also have a sidecar polling the cluster, which is just relocating the problem.
  • Periodic background refresh in v1: meaningfully larger surface (refresh task lifecycle, race against in-flight requests, TTL semantics) for a marginal UX win. v1 covers the 90% case.
  • Single shared discovery endpoint for all providers: doesn't scale to clusters under different cluster managers.

Reference implementation

A working implementation with tests lives on the team-wcv fork. It touches:

  • codex-rs/model-provider-info/src/lib.rs — new discovery_url, discovery_request_timeout_ms fields on ModelProviderInfo; new DiscoveryResponse struct; DEFAULT_DISCOVERY_REQUEST_TIMEOUT_MS / MAX_DISCOVERY_REQUEST_TIMEOUT_MS constants.
  • codex-rs/model-provider/src/discovery.rs — new module with two helpers:
    • resolve_provider_discovery(client, info) -> Result<ModelProviderInfo, DiscoveryError> — fail-fast variant.
    • resolve_provider_discovery_or_warn(client, info) -> ModelProviderInfo — best-effort variant that warns and falls back on failure.
  • codex-rs/model-provider/src/discovery_tests.rs — 9 tests using wiremock covering success, extra-fields-ignored, non-2xx, non-JSON, relative-URL rejection, invalid-discovery-URL, best-effort fallback, and no-op-when-unset.
  • codex-rs/core/config.schema.json — regenerated via just write-config-schema.

Scope of the reference implementation

The fields and helper are wired into ModelProviderInfo, but the helper is not yet invoked from the 30+ create_model_provider call sites across codex-core, codex-tui, codex-app-server, etc. Where to plumb session-start discovery is a design choice that materially affects async ergonomics, and I wanted to collect maintainer guidance on shape before touching every session-construction path. The helper is self-contained and ready to drop into whichever site(s) you'd prefer:

  • A central session-init helper that wraps create_model_provider?
  • An async builder on Config that pre-resolves discovery before the synchronous create_model_provider is called?
  • Per-site, with operators opting in by feature flag?

Happy to do that integration as a follow-up PR once the wire format and helper shape are approved (or to rework if you'd prefer different ergonomics).

Open to alternatives

If discovery_url as a name conflicts with existing terminology, or if you'd prefer the wire format to carry more metadata up front (e.g., { "base_url": "...", "ttl_seconds": 30, "expires_at": "..." }), happy to rework. The patch shape is small enough that iteration is cheap.

Labels: custom-model, enhancement
