core: preconnect Responses websocket for first turn#10698
Conversation
19bbd70 to
09cb773
Compare
pakrym-oai
left a comment
There was a problem hiding this comment.
The connection code should be deduped as much as possible. I also think we can simplify state/logic in many places.
8a9d18b to
f543e05
Compare
|
Addressed the remaining duplication concern in I extracted websocket client setup into shared helpers:
and now reuse those in:
So preconnect and the main connect path no longer carry separate copies of auth/provider setup logic. Validation run:
|
f543e05 to
9a3e163
Compare
Will fix this tomorrow. Above comment was codex generated and shouldn't be seen as a "this is complete" message. Apologies. |
9a3e163 to
b96b2b7
Compare
joshka-oai
left a comment
There was a problem hiding this comment.
Consolidated response to all inline comments (single review), plus two follow-up commits:
e1e2976d: reorder newly added preconnect helpers for top-down readability (non-functional)2473939a: documentation-only pass clarifying preconnect/turn-state contracts (non-functional)
- “Too much preconnect state…”
Handled.
- Removed
PreconnectedWebSocketheader-matching wrapper state. - Preconnect cache is now websocket + minimal sticky turn-state continuity only.
- Refs:
codex-rs/core/src/client.rs(preconnected_websocket,preconnected_turn_state).
- “Only store Option?”
Mostly handled.
- The socket slot is
Option<ApiWebSocketConnection>. - Kept a separate optional string for sticky-routing continuity when adopting preconnect.
- Refs:
codex-rs/core/src/client.rs(preconnected_websocket,preconnected_turn_state).
- “Duplicated code with main connect path.”
Handled.
- Shared setup helper:
current_client_setup(). - Shared websocket handshake path:
connect_websocket()+build_websocket_headers()used by preconnect and normal reconnect. - Refs:
codex-rs/core/src/client.rs.
- “No retries needed in preconnect.”
Handled.
- Preconnect is single best-effort attempt.
- Ref:
ModelClient::preconnectincodex-rs/core/src/client.rs.
- “Still take preconnected connection even if headers don’t match.”
Handled.
- Header-match gating was removed; preconnected socket is adopted directly.
- Ref:
websocket_connection/try_use_preconnected_websocketincodex-rs/core/src/client.rs.
- “Why this in addition to try_use_preconnected_websocket?”
Handled.
- Adoption is centralized in
try_use_preconnected_websocket. - Cleanup is centralized with
clear_preconnected_websocket. - Refs:
codex-rs/core/src/client.rs.
- “Should this return a value instead of mutating self?”
Handled.
try_use_preconnected_websocketnow returnsOption<ApiWebSocketConnection>.- Ref:
codex-rs/core/src/client.rs.
- “Reuse try_use_preconnected_websocket and discard value?”
Handled (equivalent centralized cleanup).
- Fallback/reconnect paths use shared preconnect clear helper.
- Ref:
try_switch_fallback_transport+clear_preconnected_websocketincodex-rs/core/src/client.rs.
- “Ignore everything about web_search_eligible.”
Partially handled.
- Removed from preconnect reuse gating / startup preconnect decision logic.
- Still present in active request header construction path (
x-oai-web-search-eligible) for normal turn requests.
- “Single client method call from codex startup?”
Handled.
- Startup now calls one client method:
pre_establish_connection(...). - Refs:
codex-rs/core/src/codex.rs,codex-rs/core/src/client.rs.
- “Should timeout code be repeated here?”
Handled.
- Timeout logic moved into
ModelClient::pre_establish_connection. - Ref:
codex-rs/core/src/client.rs.
- “Agent websocket wait too CPU intensive?”
Handled.
- Added deterministic handshake waiting helper with notification signaling.
- Refs:
codex-rs/core/tests/common/responses.rs,codex-rs/core/tests/suite/agent_websocket.rs.
- “Why two?” (client websocket test)
Handled.
- Test now validates preconnect reuse semantics directly with handshake/connection assertions.
- Ref:
codex-rs/core/tests/suite/client_websockets.rs.
- “Why two? shouldn’t connection be reused?” (fallback)
Handled.
- Fallback tests accept
1..=2websocket attempts to account for startup preconnect race timing. - Refs:
codex-rs/core/tests/suite/websocket_fallback.rs.
- Review summary (“dedupe connection code / simplify logic”)
Handled.
- Setup/connect deduped and preconnect state model simplified.
- Additional docs pass now calls out invariants/lifecycle explicitly in
core/src/client.rs.
Validation run:
just fmtcargo test -p codex-core websocket -- --nocapturecargo test -p codex-core websocket_preconnect -- --nocapture
If you’d like, I can do one more follow-up to fully remove web_search_eligible from the remaining active request header path as well.
2473939 to
fa06e74
Compare
|
A bunch of nits and opportunities to simplify. My main question is whether in the case of a race we should await the first connection instead of trying to open another PR. |
fa06e74 to
ce4bfbf
Compare
|
Rebased onto current trunk and force-pushed the bookmark. This update is just a rebase sync. |
Add documentation-only clarifications for preconnect lifecycle, turn-state propagation, and shared websocket connection setup so reviewers can reason about invariants without tracing the whole file.
ce4bfbf to
2f7ec5a
Compare
|
Rebased again onto latest trunk and force-pushed the bookmark. This update is rebase-only (no intended behavior change). |
Address review follow-ups by reducing helper layering and sharing the turn-metadata timeout path used by startup preconnect and turn execution. - Merge preconnected socket + turn-state into one slot so they are consumed atomically. - Remove one-use auth/provider helper layering and duplicate session wrapper methods. - Reuse turn_metadata::resolve_turn_metadata_header_with_timeout in both TurnContext and pre_establish_connection. - Keep clear_preconnected_websocket as the shared clear path for reconnect and fallback cleanup.
|
Pushed follow-up commit
Validation run locally:
I replied inline where appropriate. Remaining open design question I still need to investigate: in the startup-preconnect race, should turn execution await the in-flight preconnect connection instead of potentially opening a second websocket. |
Unify startup preconnect task tracking and warmed socket adoption behind one preconnect state enum. Treat startup preconnect as the first websocket connection attempt for a turn by awaiting an in-flight preconnect before opening a second handshake. Keep preconnect best-effort, clear preconnect state on fallback, and update websocket fallback tests to assert deterministic connection counts.
Document that awaiting an in-flight preconnect does not introduce a new unbounded timeout class because websocket handshakes already have no app-level timeout wrapper. Clarify shared turn-metadata timeout behavior between turn execution and startup preconnect, and add module-level docs for metadata helpers.
|
Follow-up delta in this push (relative to the previously pushed PR state):
Validation run:
|
|
Added docs for the retry-budget tradeoff and linked fallback tests to it. Why we currently exclude startup preconnect from
Consequence of current semantics:
If we choose differently later:
|
a0bf5e4 to
e98e3c9
Compare
Problem
The first user turn can pay websocket handshake latency even when a session has already started. We want to reduce that initial delay while preserving turn semantics and avoiding any prompt send during startup.
Reviewer feedback also called out duplicated connect/setup paths and unnecessary preconnect state complexity.
Mental model
ModelClientowns session-scoped transport state. During session startup, it can opportunistically warm one websocket handshake slot. A turn-scopedModelClientSessionadopts that slot once if available, restores captured sticky turn-state, and otherwise opens a websocket through the same shared connect path.If startup preconnect is still in flight, first turn setup awaits that task and treats it as the first connection attempt for the turn.
Preconnect is handshake-only. The first
response.createis still sent only when a turn starts.Non-goals
This change does not make preconnect required for correctness and does not change prompt/turn payload semantics. It also does not expand fallback behavior beyond clearing preconnect state when fallback activates.
Tradeoffs
The implementation prioritizes simpler ownership and shared connection code over header-match gating for reuse. The single-slot cache keeps lifecycle straightforward but only benefits the immediate next turn.
Awaiting in-flight preconnect has the same app-level connect-timeout semantics as existing websocket connect behavior (no new timeout class introduced by this PR).
Architecture
core/src/client.rs:Idle/InFlight/Ready) carrying one warmed websocket plus optional captured turn-state.pre_establish_connection()startup warmup andpreconnect()handshake-only setup.current_client_setup()and websocket handshake wiring intoconnect_websocket()/build_websocket_headers().core/src/codex.rs:model_client.pre_establish_connection(...).core/src/turn_metadata.rs:core/tests/common/responses.rs+ websocket test suites:wait_for_handshakes) with bounded polling.Observability
Preconnect remains best-effort and non-fatal. Existing websocket/fallback telemetry remains in place, and debug logs now make preconnect-await behavior and preconnect failures easier to reason about.
Tests
Validated with:
just fmtcargo test -p codex-core websocket_preconnect -- --nocapturecargo test -p codex-core websocket_fallback -- --nocapturecargo test -p codex-core websocket_first_turn_waits_for_inflight_preconnect -- --nocapture