
(openai responses): add websocket connection pool #4985

Merged
tinalenguyen merged 5 commits into main from tina/oai-responses-pool-websockets
Mar 4, 2026

Conversation


@tinalenguyen (Member) commented Mar 3, 2026

When there are two parallel streams, restart and send the full context on the next request. Each request will use its own websocket connection (edit: and each WS connection is independent of response IDs).
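The pooling behavior described above can be sketched roughly as follows. This is a hypothetical, simplified stand-in for the `utils.ConnectionPool` used in the PR (the `connect_cb`, `close_cb`, and `max_session_duration` names are taken from the snippet quoted later in this review; the implementation here is illustrative only):

```python
import asyncio
import time
from typing import Awaitable, Callable, Generic, TypeVar

T = TypeVar("T")


class ConnectionPool(Generic[T]):
    """Hands out connections via connect_cb and retires any idle
    connection older than max_session_duration (illustrative sketch)."""

    def __init__(
        self,
        *,
        connect_cb: Callable[[], Awaitable[T]],
        close_cb: Callable[[T], Awaitable[None]],
        max_session_duration: float,
    ) -> None:
        self._connect_cb = connect_cb
        self._close_cb = close_cb
        self._max_session_duration = max_session_duration
        self._idle: list[T] = []
        self._created_at: dict[int, float] = {}

    async def get(self) -> T:
        # Prefer an idle connection that hasn't exceeded its max lifetime.
        while self._idle:
            conn = self._idle.pop()
            age = time.monotonic() - self._created_at[id(conn)]
            if age < self._max_session_duration:
                return conn  # fresh enough: reuse it
            del self._created_at[id(conn)]
            await self._close_cb(conn)  # expired: close and try the next one
        # Nothing reusable: open a brand-new connection.
        conn = await self._connect_cb()
        self._created_at[id(conn)] = time.monotonic()
        return conn

    async def put(self, conn: T) -> None:
        # Return a connection to the pool for later reuse.
        self._idle.append(conn)
```

Because each in-flight request checks out its own connection, two parallel streams simply hold two different websockets; neither has to wait for (or share state with) the other.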

@chenghao-mou chenghao-mou requested a review from a team March 3, 2026 02:28

self._pool = utils.ConnectionPool[aiohttp.ClientWebSocketResponse](
    connect_cb=self._create_ws_conn,
    close_cb=self._close_ws,
    max_session_duration=3600,
)
Member:
But we can't reuse the same WS connection for different conversations, right? Even with store=True, the server has to rehydrate the chat history if the ws connection expects a different previous_response_id.

Member Author:

I could have worded my comment better, but each websocket connection is independent of previous_response_id

Member:

But the server-side in-memory cache for each connection seems very dependent:

On an active WebSocket connection, the service keeps one previous-response state in a connection-local in-memory cache (the most recent response). Continuing from that most recent response is fast because the service can reuse connection-local state. Because the previous-response state is retained only in memory and is not written to disk, you can use WebSocket mode in a way that is compatible with store=false and Zero Data Retention (ZDR).

If a previous_response_id is not in the in-memory cache, behavior depends on whether you store responses:

With store=true, the service may hydrate older response IDs from persisted state when available. Continuation can still work, but it usually loses the in-memory latency benefit.
With store=false (including ZDR), there is no persisted fallback. If the ID is uncached, the request returns previous_response_not_found.

Do we see previous_response_not_found when store=false if you reuse the same connection for two conversations?

Member Author:

Ah, we don't use previous_response_id when store=False; we just send the entire context in that case (relevant line)

Member:

Gotcha, that makes sense.

@chenghao-mou (Member) left a comment:

mostly lgtm. Tested it locally and it worked well.

    if isinstance(o, openai.BaseModel):
        return o.model_dump()
    raise TypeError(f"unexpected type {type(o)}")

async def send_request(self, msg: dict) -> AsyncGenerator[dict, None]:
Member:

nit: the name seems misleading in the sense that it sends the request but also receives the responses. Maybe process_request?

Member Author:

how about generate_response?


@chenghao-mou (Member) left a comment:

lgtm. One nit about APIStatusError construction.

    f"OpenAI Responses WebSocket closed: {close_reason}",
    status_code=close_code,
    retryable=False,
)
Member:

raw_msg has both .data and .extra that we can leverage:

APIStatusError(
    "AssemblyAI connection closed unexpectedly",
    status_code=ws.close_code or -1,
    body=f"{msg.data=} {msg.extra=}",
)

@tinalenguyen merged commit f3063a1 into main on Mar 4, 2026 (19 checks passed).
@tinalenguyen deleted the tina/oai-responses-pool-websockets branch March 4, 2026 20:09.