Avoid instant remote disconnects under websocket queue pressure#18265
Avoid instant remote disconnects under websocket queue pressure#18265etraut-openai wants to merge 7 commits intomainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d896f5cffc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 64872b998a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 997a23330f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
maxj-oai
left a comment
There was a problem hiding this comment.
thank you @etraut-openai
Fixes #18203. ## Why Remote TUI clients connected through `codex app-server --listen ws://...` can receive short bursts of outbound turn and tool-output notifications. The WebSocket transport previously used the shared 128-message channel capacity for its outbound writer queue, so a healthy client that briefly lagged during normal output streaming could fill the queue and be disconnected immediately. This is a smaller mitigation than #18265: instead of adding a new overflow/backpressure pipeline, keep the existing non-blocking router behavior and give WebSocket clients enough bounded headroom for realistic bursts. ## What Changed - Added a WebSocket-only outbound writer capacity of `64 * 1024` messages. - Used that larger capacity only for the WebSocket data writer queue in `codex-rs/app-server/src/transport/websocket.rs`. - Left the shared `CHANNEL_CAPACITY` and the existing disconnect-on-full behavior unchanged for internal/control channels and genuinely stuck clients. ## Verification - `cargo test -p codex-app-server transport::tests::broadcast_does_not_block_on_slow_connection` - Manually retried the #18203 repro prompt against the remote TUI and confirmed it stayed connected.
Fixes #18203.
Why
Remote TUI clients connected through
codex app-server --listen ws://...can receive bursts of outbound turn/progress notifications. Before this change, a full per-connection writer queue was treated as an immediate slow-client signal, so a healthy WebSocket client that was only momentarily behind could be disconnected mid-turn.What Changed
Disconnectable WebSocket connections now get a bounded grace window when their outbound writer queue is full. If the queue drains within that window, the message is delivered and the connection stays open. If it remains full, the connection is cancelled so the outbound router cannot be blocked indefinitely.
The implementation also adds a per-connection bounded overflow channel for WebSocket sends. This keeps the shared outbound router from waiting on one slow connection, preserves message order for that connection while the overflow drains, and avoids creating an unbounded backlog of detached send waiters.
When a connection times out or its writer closes, the overflow worker now notifies the outbound router so the connection is removed from routing promptly instead of waiting for the later
ConnectionClosedevent. The overflow-channel-full path also uses the same grace window before disconnecting, so short bursts that saturate both bounded queues do not immediately drop the client.The stdio and in-process transports keep their existing reliable-delivery behavior; the grace/overflow path applies to disconnectable WebSocket connections.
Testing
Added transport tests covering:
I also manually confirmed that the repro methodology from #18203 disconnects before the fix and stays connected with this change.