Summary
Many users behind proxies, especially in mainland China, see Codex print Reconnecting... 1/5 through 5/5 at the start of a turn before it finally begins responding. A practical workaround reported in #14297 is to define a custom provider with supports_websockets = false, which makes Codex use HTTP/SSE immediately.
I investigated the code path and prepared a small patch in my fork.
Root Cause
The default OpenAI provider supports Responses WebSocket transport. When the local/proxy environment cannot carry WebSocket traffic correctly, WebSocket connect fails with timeout/network errors. Today those failures are treated like retryable stream failures, so the turn loop consumes the full stream_max_retries budget before activating HTTP fallback.
This matches user logs from #14297: every attempt is transport="responses_websocket"; after Reconnecting... 5/5, Codex logs falling back to HTTP, then the HTTP Responses request completes quickly.
Proposed Change
Fall back to HTTP/SSE immediately when Responses WebSocket connection setup fails with:
TransportError::Timeout
TransportError::Network(_)
Keep the existing behavior for established stream failures and for explicit 426 Upgrade Required fallback.
The patch adds a small helper:
    fn should_fallback_to_http_after_websocket_connect_error(error: &ApiError) -> bool {
        matches!(
            error,
            ApiError::Transport(TransportError::Timeout | TransportError::Network(_))
        )
    }
and applies it in both WebSocket preconnect/prewarm and normal turn-time WebSocket connection setup.
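To make the predicate easy to try in isolation, here is a minimal, self-contained sketch. The `TransportError` and `ApiError` enums below are simplified stand-ins, not the real Codex definitions; the `Other` and `Stream` variants are hypothetical and exist only to show which errors do not trigger the immediate fallback.

```rust
// Stand-in types for illustration only. The real Codex error enums
// are assumed to have more variants and carry more data.
#[derive(Debug)]
enum TransportError {
    Timeout,
    Network(String),
    // Hypothetical variant standing in for any other transport error.
    Other(String),
}

#[derive(Debug)]
enum ApiError {
    Transport(TransportError),
    // Hypothetical variant standing in for a failure on an established stream.
    Stream(String),
}

// Same predicate as the helper in the patch: only connect-time
// timeouts and network errors qualify for immediate HTTP fallback.
fn should_fallback_to_http_after_websocket_connect_error(error: &ApiError) -> bool {
    matches!(
        error,
        ApiError::Transport(TransportError::Timeout | TransportError::Network(_))
    )
}

fn main() {
    let cases = [
        ApiError::Transport(TransportError::Timeout),
        ApiError::Transport(TransportError::Network("connection reset".to_string())),
        ApiError::Transport(TransportError::Other("unexpected".to_string())),
        ApiError::Stream("stream closed mid-response".to_string()),
    ];
    for error in &cases {
        let fallback = should_fallback_to_http_after_websocket_connect_error(error);
        println!("{:?} -> immediate HTTP fallback: {}", error, fallback);
    }
}
```

Keeping the match narrow means any error the helper does not recognize stays on the existing retry path, so the change cannot widen fallback behavior by accident.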
Why This Helps
Users whose proxies do not support WebSocket/TUN routing should no longer wait through all 5 reconnect attempts before Codex switches to the HTTP path that already works for them. Users with working WebSocket transport should continue using WebSocket as before.
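The difference in reconnect counts can be illustrated with a toy model of the turn loop. This is a sketch under stated assumptions, not the actual Codex loop: `run_turn`, `TurnReport`, `ConnectError`, and the `fallback_on_connect_error` flag are hypothetical names, and the real retry accounting may differ in detail.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transport {
    WebSocket,
    Http,
}

// Hypothetical connect-time error; only the timeout case is modeled.
#[derive(Debug)]
enum ConnectError {
    Timeout,
}

#[derive(Debug)]
struct TurnReport {
    websocket_attempts: u32,
    reconnects: u32,
    final_transport: Transport,
}

// Toy turn loop. With `fallback_on_connect_error = false` it models the
// behavior described under Root Cause (retry until the budget is spent,
// then use HTTP). With `true` it models the proposed change.
fn run_turn(
    stream_max_retries: u32,
    fallback_on_connect_error: bool,
    mut connect: impl FnMut() -> Result<(), ConnectError>,
) -> TurnReport {
    let mut websocket_attempts = 0;
    let mut reconnects = 0;
    loop {
        websocket_attempts += 1;
        match connect() {
            Ok(()) => {
                return TurnReport {
                    websocket_attempts,
                    reconnects,
                    final_transport: Transport::WebSocket,
                };
            }
            Err(ConnectError::Timeout) => {
                // Switch to HTTP either immediately (proposed) or once
                // the retry budget is exhausted (current).
                if fallback_on_connect_error || reconnects >= stream_max_retries {
                    return TurnReport {
                        websocket_attempts,
                        reconnects,
                        final_transport: Transport::Http,
                    };
                }
                reconnects += 1;
            }
        }
    }
}

fn main() {
    // A proxy that can never carry WebSocket traffic: every connect times out.
    let current = run_turn(5, false, || Err(ConnectError::Timeout));
    let proposed = run_turn(5, true, || Err(ConnectError::Timeout));
    println!("current:  {:?}", current);
    println!("proposed: {:?}", proposed);
}
```

In this model a working WebSocket connection returns on the first attempt under either policy, which mirrors the claim that users with working WebSocket transport are unaffected.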
Test Coverage
The fork adds a regression test that simulates a WebSocket handshake timeout and asserts Codex performs only one WebSocket attempt before using HTTP/SSE successfully.
cargo test -p codex-core websocket_fallback_switches_to_http_on_connect_timeout -- --exact
I could not run the test to completion locally: my Windows environment is missing the MSVC linker (link.exe). Formatting and git diff --check both passed locally.