core: preconnect Responses websocket for first turn #10698

Merged
joshka-oai merged 5 commits into main from joshka/preconnect-websocket-first-turn
Feb 6, 2026

joshka-oai commented Feb 5, 2026

Problem

The first user turn can pay websocket handshake latency even when a session has already started. We want to reduce that initial delay while preserving turn semantics and avoiding any prompt send during startup.

Reviewer feedback also called out duplicated connect/setup paths and unnecessary preconnect state complexity.

Mental model

ModelClient owns session-scoped transport state. During session startup, it can opportunistically warm one websocket handshake slot. A turn-scoped ModelClientSession adopts that slot once if available, restores captured sticky turn-state, and otherwise opens a websocket through the same shared connect path.

If startup preconnect is still in flight, first turn setup awaits that task and treats it as the first connection attempt for the turn.

Preconnect is handshake-only. The first response.create is still sent only when a turn starts.
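
The adoption rule described above can be sketched as a small state machine. This is a minimal illustration, not the code in core/src/client.rs: the `Connection` type, its fields, and the `acquire` function are hypothetical, and a `std::thread::JoinHandle` stands in for whatever async task handle the real client awaits.

```rust
use std::thread::JoinHandle;

/// Hypothetical stand-in for a warmed websocket; the real type is not shown in this PR text.
#[derive(Debug, PartialEq)]
pub struct Connection {
    pub id: u32,
    /// Sticky turn-state captured during the handshake, if any.
    pub turn_state: Option<String>,
}

/// Session-scoped preconnect lifecycle: nothing warmed, a handshake in flight,
/// or one socket ready to be adopted.
pub enum PreconnectState {
    Idle,
    InFlight(JoinHandle<Option<Connection>>),
    Ready(Connection),
}

impl PreconnectState {
    /// Take whatever the preconnect produced and leave the slot `Idle`,
    /// so a warmed socket is adopted at most once.
    pub fn take(&mut self) -> Option<Connection> {
        match std::mem::replace(self, PreconnectState::Idle) {
            PreconnectState::Idle => None,
            PreconnectState::Ready(conn) => Some(conn),
            // Wait for the in-flight handshake; a failed or panicked
            // preconnect is non-fatal and simply yields `None`.
            PreconnectState::InFlight(task) => task.join().ok().flatten(),
        }
    }
}

/// Turn-scoped acquisition: adopt the warmed socket if there is one,
/// otherwise fall through to the shared connect path.
pub fn acquire(state: &mut PreconnectState, connect: impl FnOnce() -> Connection) -> Connection {
    state.take().unwrap_or_else(connect)
}
```

Because `take` always resets the slot to `Idle`, only the first turn after startup can benefit, which matches the single-slot tradeoff noted below.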

Non-goals

This change does not make preconnect required for correctness and does not change prompt/turn payload semantics. It also does not expand fallback behavior beyond clearing preconnect state when fallback activates.

Tradeoffs

The implementation prioritizes simpler ownership and shared connection code over header-match gating for reuse. The single-slot cache keeps lifecycle straightforward but only benefits the immediate next turn.

Awaiting in-flight preconnect has the same app-level connect-timeout semantics as existing websocket connect behavior (no new timeout class introduced by this PR).

Architecture

core/src/client.rs:

  • Added session-level preconnect lifecycle state (Idle / InFlight / Ready) carrying one warmed websocket plus optional captured turn-state.
  • Added pre_establish_connection() startup warmup and preconnect() handshake-only setup.
  • Deduped auth/provider resolution into current_client_setup() and websocket handshake wiring into connect_websocket() / build_websocket_headers().
  • Updated turn websocket path to adopt preconnect first, await in-flight preconnect when present, then create a new websocket only when needed.
  • Ensured fallback activation clears warmed preconnect state.
  • Added documentation for lifecycle, ownership, sticky-routing invariants, and timeout semantics.

core/src/codex.rs:

  • Session startup invokes model_client.pre_establish_connection(...).
  • Turn metadata resolution uses the shared timeout helper.

core/src/turn_metadata.rs:

  • Centralized shared timeout helper used by both turn-time metadata resolution and startup preconnect metadata building.

core/tests/common/responses.rs + websocket test suites:

  • Added deterministic handshake waiting helper (wait_for_handshakes) with bounded polling.
  • Added startup preconnect and in-flight preconnect reuse coverage.
  • Fallback expectations now assert exactly two websocket attempts in covered scenarios (startup preconnect + turn attempt before fallback sticks).
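
A bounded-polling wait of the kind the test helper describes might look like the sketch below. Only the name `wait_for_handshakes` comes from this PR; the signature, the `AtomicUsize` counter, and the 5 ms poll interval are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::{Duration, Instant};

/// Poll `handshakes` until it reaches `expected` or `timeout` elapses.
/// Returns `true` if the expected count was observed in time.
pub fn wait_for_handshakes(handshakes: &AtomicUsize, expected: usize, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if handshakes.load(Ordering::SeqCst) >= expected {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Sleep briefly between checks so the wait does not spin a CPU core.
        std::thread::sleep(Duration::from_millis(5));
    }
}
```

Returning a `bool` rather than panicking leaves the choice of failure message to the calling test.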

Observability

Preconnect remains best-effort and non-fatal. Existing websocket/fallback telemetry remains in place, and debug logs now make preconnect-await behavior and preconnect failures easier to reason about.

Tests

Validated with:

  1. just fmt
  2. cargo test -p codex-core websocket_preconnect -- --nocapture
  3. cargo test -p codex-core websocket_fallback -- --nocapture
  4. cargo test -p codex-core websocket_first_turn_waits_for_inflight_preconnect -- --nocapture

joshka-oai force-pushed the joshka/preconnect-websocket-first-turn branch from 19bbd70 to 09cb773 on February 5, 2026 01:57

pakrym-oai left a comment

The connection code should be deduped as much as possible. I also think we can simplify state/logic in many places.

joshka-oai force-pushed the joshka/preconnect-websocket-first-turn branch from 8a9d18b to f543e05 on February 5, 2026 03:20

@joshka-oai

Addressed the remaining duplication concern in core/src/client.rs.

I extracted websocket client setup into shared helpers:

  • current_auth()
  • provider_and_auth_from_auth(...)

and now reuse those in:

  • preconnect(...)
  • stream_responses_api(...)
  • stream_responses_websocket(...)
  • unary calls that used the same setup (compact_conversation_history, summarize_memory_traces)

So preconnect and the main connect path no longer carry separate copies of auth/provider setup logic.

Validation run:

  • just fmt
  • cargo test -p codex-core websocket -- --nocapture

joshka-oai force-pushed the joshka/preconnect-websocket-first-turn branch from f543e05 to 9a3e163 on February 5, 2026 03:21

@joshka-oai

“The connection code should be deduped as much as possible. I also think we can simplify state/logic in many places.”

Will fix this tomorrow. The comment above was Codex-generated and shouldn't be read as a "this is complete" message. Apologies.

joshka-oai force-pushed the joshka/preconnect-websocket-first-turn branch from 9a3e163 to b96b2b7 on February 5, 2026 20:12

joshka-oai left a comment

Consolidated response to all inline comments (single review), plus two follow-up commits:

  • e1e2976d: reorder newly added preconnect helpers for top-down readability (non-functional)
  • 2473939a: documentation-only pass clarifying preconnect/turn-state contracts (non-functional)

  1. “Too much preconnect state…”
    Handled.
    • Removed PreconnectedWebSocket header-matching wrapper state.
    • Preconnect cache is now websocket + minimal sticky turn-state continuity only.
    • Refs: codex-rs/core/src/client.rs (preconnected_websocket, preconnected_turn_state).
  2. “Only store Option?”
    Mostly handled.
    • The socket slot is Option<ApiWebSocketConnection>.
    • Kept a separate optional string for sticky-routing continuity when adopting preconnect.
    • Refs: codex-rs/core/src/client.rs (preconnected_websocket, preconnected_turn_state).
  3. “Duplicated code with main connect path.”
    Handled.
    • Shared setup helper: current_client_setup().
    • Shared websocket handshake path: connect_websocket() + build_websocket_headers() used by preconnect and normal reconnect.
    • Refs: codex-rs/core/src/client.rs.
  4. “No retries needed in preconnect.”
    Handled.
    • Preconnect is a single best-effort attempt.
    • Ref: ModelClient::preconnect in codex-rs/core/src/client.rs.
  5. “Still take preconnected connection even if headers don’t match.”
    Handled.
    • Header-match gating was removed; the preconnected socket is adopted directly.
    • Ref: websocket_connection / try_use_preconnected_websocket in codex-rs/core/src/client.rs.
  6. “Why this in addition to try_use_preconnected_websocket?”
    Handled.
    • Adoption is centralized in try_use_preconnected_websocket.
    • Cleanup is centralized with clear_preconnected_websocket.
    • Refs: codex-rs/core/src/client.rs.
  7. “Should this return a value instead of mutating self?”
    Handled.
    • try_use_preconnected_websocket now returns Option<ApiWebSocketConnection>.
    • Ref: codex-rs/core/src/client.rs.
  8. “Reuse try_use_preconnected_websocket and discard value?”
    Handled (equivalent centralized cleanup).
    • Fallback/reconnect paths use the shared preconnect clear helper.
    • Ref: try_switch_fallback_transport + clear_preconnected_websocket in codex-rs/core/src/client.rs.
  9. “Ignore everything about web_search_eligible.”
    Partially handled.
    • Removed from preconnect reuse gating / startup preconnect decision logic.
    • Still present in the active request header construction path (x-oai-web-search-eligible) for normal turn requests.
  10. “Single client method call from codex startup?”
    Handled.
    • Startup now calls one client method: pre_establish_connection(...).
    • Refs: codex-rs/core/src/codex.rs, codex-rs/core/src/client.rs.
  11. “Should timeout code be repeated here?”
    Handled.
    • Timeout logic moved into ModelClient::pre_establish_connection.
    • Ref: codex-rs/core/src/client.rs.
  12. “Agent websocket wait too CPU intensive?”
    Handled.
    • Added deterministic handshake waiting helper with notification signaling.
    • Refs: codex-rs/core/tests/common/responses.rs, codex-rs/core/tests/suite/agent_websocket.rs.
  13. “Why two?” (client websocket test)
    Handled.
    • Test now validates preconnect reuse semantics directly with handshake/connection assertions.
    • Ref: codex-rs/core/tests/suite/client_websockets.rs.
  14. “Why two? shouldn’t connection be reused?” (fallback)
    Handled.
    • Fallback tests accept 1..=2 websocket attempts to account for startup preconnect race timing.
    • Refs: codex-rs/core/tests/suite/websocket_fallback.rs.
  15. Review summary (“dedupe connection code / simplify logic”)
    Handled.
    • Setup/connect deduped and preconnect state model simplified.
    • Additional docs pass now calls out invariants/lifecycle explicitly in core/src/client.rs.

Validation run:

  • just fmt
  • cargo test -p codex-core websocket -- --nocapture
  • cargo test -p codex-core websocket_preconnect -- --nocapture

If you’d like, I can do one more follow-up to fully remove web_search_eligible from the remaining active request header path as well.

joshka-oai force-pushed the joshka/preconnect-websocket-first-turn branch from 2473939 to fa06e74 on February 5, 2026 20:53

@pakrym-oai

A bunch of nits and opportunities to simplify.

My main question is whether, in the case of a race, we should await the first connection instead of trying to open another connection.

joshka-oai force-pushed the joshka/preconnect-websocket-first-turn branch from fa06e74 to ce4bfbf on February 5, 2026 23:30

@joshka-oai

Rebased onto current trunk and force-pushed the bookmark. This update is just a rebase sync.

Add documentation-only clarifications for preconnect lifecycle, turn-state
propagation, and shared websocket connection setup so reviewers can reason
about invariants without tracing the whole file.
joshka-oai force-pushed the joshka/preconnect-websocket-first-turn branch from ce4bfbf to 2f7ec5a on February 5, 2026 23:37

@joshka-oai

Rebased again onto latest trunk and force-pushed the bookmark. This update is rebase-only (no intended behavior change).

Address review follow-ups by reducing helper layering and sharing the
turn-metadata timeout path used by startup preconnect and turn execution.

- Merge preconnected socket + turn-state into one slot so they are
  consumed atomically.
- Remove one-use auth/provider helper layering and duplicate session
  wrapper methods.
- Reuse turn_metadata::resolve_turn_metadata_header_with_timeout in both
  TurnContext and pre_establish_connection.
- Keep clear_preconnected_websocket as the shared clear path for reconnect
  and fallback cleanup.
@joshka-oai

Pushed follow-up commit b3691277 addressing the remaining simplification nits discussed in review:

  • merged preconnect socket + turn-state into a single PreconnectedWebSocket slot so adoption is atomic
  • removed one-use auth/provider helper layering and duplicate session wrapper methods
  • unified turn-metadata timeout handling via turn_metadata::resolve_turn_metadata_header_with_timeout (used by both TurnContext and startup pre_establish_connection)
  • kept clear_preconnected_websocket() as the shared clear path used by reconnect and fallback (called from two places)
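
To illustrate why a single slot makes adoption atomic, here is a minimal sketch. The names `PreconnectedWebSocket`, `try_use_preconnected_websocket`, and `clear_preconnected_websocket` appear in this PR; the `SessionTransport` struct, the `store` method, the field layout, and the use of a `std::sync::Mutex` are assumptions for the example.

```rust
use std::sync::Mutex;

/// Hypothetical bundle of a warmed socket and the turn-state captured with it.
#[derive(Debug, PartialEq)]
pub struct PreconnectedWebSocket {
    pub socket_id: u32,
    pub turn_state: Option<String>,
}

/// Hypothetical owner of the single preconnect slot.
#[derive(Default)]
pub struct SessionTransport {
    slot: Mutex<Option<PreconnectedWebSocket>>,
}

impl SessionTransport {
    /// Store a freshly warmed socket, replacing anything already in the slot.
    pub fn store(&self, warmed: PreconnectedWebSocket) {
        *self.slot.lock().unwrap() = Some(warmed);
    }

    /// Adopt socket and turn-state together under one lock; a second call
    /// sees an empty slot, so the pair can never be split across turns.
    pub fn try_use_preconnected_websocket(&self) -> Option<PreconnectedWebSocket> {
        self.slot.lock().unwrap().take()
    }

    /// Shared clear path, e.g. for reconnect and fallback cleanup.
    pub fn clear_preconnected_websocket(&self) {
        *self.slot.lock().unwrap() = None;
    }
}
```

With two separate fields, a caller could observe the socket without its turn-state (or the reverse) between two reads; one `Option` removes that window.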

Validation run locally:

  • just fmt
  • cargo test -p codex-core websocket -- --nocapture

I replied inline where appropriate. One design question remains open and I still need to investigate it: in the startup-preconnect race, should turn execution await the in-flight preconnect instead of potentially opening a second websocket?

Unify startup preconnect task tracking and warmed socket adoption behind
one preconnect state enum.

Treat startup preconnect as the first websocket connection attempt for
a turn by awaiting an in-flight preconnect before opening a second
handshake.

Keep preconnect best-effort, clear preconnect state on fallback, and
update websocket fallback tests to assert deterministic connection
counts.

Document that awaiting an in-flight preconnect does not introduce a new
unbounded timeout class because websocket handshakes already have no
app-level timeout wrapper.

Clarify shared turn-metadata timeout behavior between turn execution and
startup preconnect, and add module-level docs for metadata helpers.
@joshka-oai

Follow-up delta in this push (relative to the previously pushed PR state):

  1. First-turn connection behavior
    • Switched preconnect tracking to a unified PreconnectState (Idle / InFlight / Ready).
    • First-turn websocket setup now awaits an in-flight startup preconnect and treats it as the first connection attempt before opening a new handshake.
    • Preconnect cleanup is unified via clear_preconnect().
  2. Test determinism and expectations
    • Added deterministic in-flight preconnect coverage using handshake accept delay (websocket_first_turn_waits_for_inflight_preconnect).
    • Fallback websocket-attempt expectations were tightened to exact counts in these scenarios (startup preconnect + first turn attempt => 2).
  3. Docs-only clarification pass
    • Documented the preconnect lifecycle and ownership model.
    • Documented that awaiting in-flight preconnect does not introduce a new timeout class vs existing websocket handshake behavior.
    • Added module/function docs for shared turn-metadata timeout behavior used by both turn execution and startup preconnect.

Validation run:

  • just fmt
  • cargo test -p codex-core websocket_preconnect -- --nocapture
  • cargo test -p codex-core websocket_fallback -- --nocapture
  • cargo test -p codex-core websocket_first_turn_waits_for_inflight_preconnect -- --nocapture

@joshka-oai

Added docs for the retry-budget tradeoff and linked fallback tests to it.

Why we currently exclude startup preconnect from stream_max_retries:

  • stream_max_retries is currently modeling retryable turn stream failures.
  • Startup preconnect is handshake-only warmup and can happen before any turn payload is sent.
  • Counting preconnect against retry budget can consume user-visible retries before first turn work starts, causing earlier fallback than intended.

Consequence of current semantics:

  • First-turn failure cases may show 2 websocket handshakes (startup preconnect + turn-time connect) before HTTP fallback becomes sticky.

If we choose differently later:

  • We can treat preconnect as an explicit first retry-budgeted attempt, but that requires plumbing connection-attempt accounting from websocket acquisition into the turn retry loop and then updating fallback expectations/tests.
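
The handshake arithmetic behind this tradeoff can be shown with a toy model. This is not the turn retry loop from the codebase: it assumes every websocket handshake fails, that a turn makes one initial attempt plus up to `stream_max_retries` retries before falling back, and the function and struct names are hypothetical.

```rust
/// Outcome of a toy first turn in which every websocket handshake fails.
#[derive(Debug, PartialEq)]
pub struct FirstTurnOutcome {
    /// Total websocket handshakes the server would observe before fallback.
    pub handshakes: u32,
    /// Retries charged against the turn's budget.
    pub retries_used: u32,
}

/// Count handshakes until HTTP fallback, assuming all handshakes fail.
/// `count_preconnect_in_budget = false` models the current semantics;
/// `true` models the alternative where preconnect is the turn's first attempt.
pub fn simulate_failing_first_turn(
    stream_max_retries: u32,
    count_preconnect_in_budget: bool,
) -> FirstTurnOutcome {
    // Startup preconnect: one best-effort handshake before any turn payload.
    let mut handshakes = 1;
    let mut retries_used = 0;
    // Under the alternative, the failed preconnect already was the first attempt.
    let mut first_attempt_done = count_preconnect_in_budget;
    loop {
        if first_attempt_done {
            if retries_used == stream_max_retries {
                break; // budget exhausted: fall back to HTTP
            }
            retries_used += 1;
        }
        first_attempt_done = true;
        handshakes += 1; // turn-time connect, which fails in this model
    }
    FirstTurnOutcome { handshakes, retries_used }
}
```

With a retry budget of zero, the current semantics yield two handshakes (startup preconnect plus the turn-time connect), while the alternative would fall back after one.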

joshka-oai force-pushed the joshka/preconnect-websocket-first-turn branch from a0bf5e4 to e98e3c9 on February 6, 2026 18:44
joshka-oai enabled auto-merge (squash) on February 6, 2026 18:47
joshka-oai merged commit e416e57 into main on Feb 6, 2026
36 of 38 checks passed
joshka-oai deleted the joshka/preconnect-websocket-first-turn branch on February 6, 2026 19:08
github-actions bot locked and limited conversation to collaborators on Feb 6, 2026