@karlseguin karlseguin commented Jun 5, 2025

This is a more complicated one. It depends on both #765 and #766.

TL;DR: the async HTTP client can block and, when it does, the entire browser freezes. The blocking comes from waiting on the state pool's condition variable. Solving this means that getting the *Request instance itself has to be asynchronous. So, in XHR, instead of:

```zig
self._request = page.http_client.async(.GET, url, &self, onData);
...
fn onData(ctx: *anyopaque, data: []const u8) !void { ... }
```

We need to do something like this (two steps):

```zig
page.http_client.async(.GET, url, &self, requestReady);
...

fn requestReady(ctx: *anyopaque, request: *Request) !void {
    const self: *XHR = @ptrCast(@alignCast(ctx));
    self.request = request;
    try self.request.async(self, onData);
}

fn onData(ctx: *anyopaque, data: []const u8) !void { ... }
```

requestAnimationFrame is often called in a tight loop (the callback passed to
requestAnimationFrame typically schedules another requestAnimationFrame).

Instead of executing it directly, we can treat it like a setTimeout with a
short delay (5ms?). This has the added benefit of making it cancelable, FWIW.

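That scheduling idea could be sketched like this (names such as `Window`, `Callback`, and `loop.timeout` are assumptions for illustration, not the actual lightpanda API):

```zig
const RAF_DELAY_MS = 5; // short setTimeout-like delay instead of a tight loop

fn requestAnimationFrame(self: *Window, cb: Callback) !u32 {
    const id = self.next_raf_id;
    self.next_raf_id += 1;
    // Going through the loop's timer makes the callback cancelable:
    // cancelAnimationFrame(id) can remove the pending entry before it fires.
    try self.loop.timeout(RAF_DELAY_MS * std.time.ns_per_ms, .{ .raf = .{ .id = id, .cb = cb } });
    return id;
}
```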
Async HTTP requests work by emitting a "Progress" object to a callback. This
object has a "done" flag which, when `true`, indicates that all data has been
emitted and no future "Progress" objects will be sent.

Callers like XHR buffer the response and wait for `done == true` before
processing the response.
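Sketched in Zig, that shape might look like this (the `Progress` fields mirror the description above; the XHR-side names are illustrative, not the actual definitions):

```zig
const Progress = struct {
    data: ?[]const u8, // response bytes for this chunk; null on the final message
    done: bool, // once true, no further Progress objects will be sent
};

fn onHttpProgress(ctx: *anyopaque, progress: Progress) !void {
    const self: *XHR = @ptrCast(@alignCast(ctx));
    if (progress.data) |chunk| {
        // Buffer everything until the response is complete.
        try self.buffer.appendSlice(chunk);
    }
    if (progress.done) {
        // Only now is the full response available to process.
        try self.processResponse(self.buffer.items);
    }
}
```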

The HTTP client relies on two important object pools: the connection pool and
the state pool (the state holds all the buffers for reading/writing).

In its current implementation, the async flow does not release these pooled
objects until the final callback has returned. At best, this is inefficient:
we're keeping the connection and state objects checked out for longer than they
have to be. At worst, it can lead to a deadlock: if the calling code issues a
new request when done == true, we'll eventually run out of state objects in the
pool.

This commit now releases the state objects before emitting the final "done"
Progress message. For this to work, this final message will always have null
data and an empty header object.

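The ordering change might be sketched like this (all names are illustrative; the point is that the pooled objects are released before the final callback runs):

```zig
fn finish(self: *Request) !void {
    // Release pooled objects first, so a new request issued from within the
    // final callback can check them out again (avoiding pool exhaustion).
    self.client.connection_pool.release(self.connection);
    self.client.state_pool.release(self.state);

    // The final message: null data and an empty header object, since the
    // state's buffers have already been returned to the pool.
    try self.emit(.{ .data = null, .done = true });
}
```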
The HTTP client has a state pool, which blocks when we've exceeded
max_concurrency. This can block processing forever. A simple way to reproduce
this is to open the demo cdp.js and execute the XHR request 5 times (loading
json/product.json).

To some degree, I think this is a result of a weird / non-intuitive execution
flow. If you execute a script with 100 XHR requests, it'll call our XHR _send
function 100 times, but none of the requests will execute until the loop runs
(after the script has finished executing). This can result in poor utilization
of our connection and state pools.

For an async request, getting the *Request object is itself now asynchronous.
If no state is available, we use the Loop's timeout (at 20ms) to keep checking
for an available state.
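A sketch of that retry, assuming a non-blocking `tryAcquire` on the pool and the Loop timeout mentioned above (both names are assumptions):

```zig
fn acquireState(self: *Client, req: *PendingRequest) !void {
    if (self.state_pool.tryAcquire()) |state| {
        req.state = state;
        // The *Request is only handed to the caller (requestReady) once a
        // state was actually available; nothing ever blocks on the pool.
        return req.ready();
    }
    // No free state: re-check in 20ms via the event loop instead of waiting
    // on the pool's condition variable (which would freeze the browser).
    try self.loop.timeout(20 * std.time.ns_per_ms, .{ .acquire_state = req });
}
```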

We currently keep the main request open during loadHTMLDoc and processHTMLDoc.
It _has_ to be open during loadHTMLDoc, since that streams the body. But it
does not have to be open during processHTMLDoc, which can be long and could
itself make use of that same connection if it were released. Reorganized the
navigate flow to limit the scope of the request.

Also, just like we track pending_write and pending_read, we now also track
pending_connect, and only shut down when none are pending.
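The shutdown condition might be sketched as (the counter names come from the description above; the rest is illustrative):

```zig
fn maybeShutdown(self: *Client) void {
    // Shut down only once no connect, read, or write is still in flight;
    // otherwise a pending connect could complete against a closed client.
    if (self.pending_connect == 0 and self.pending_read == 0 and self.pending_write == 0) {
        self.doShutdown();
    }
}
```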
@karlseguin karlseguin merged commit f789c84 into main Jun 6, 2025
11 checks passed
@karlseguin karlseguin deleted the unblock_async_http_request branch June 6, 2025 05:22
@github-actions github-actions bot locked and limited conversation to collaborators Jun 6, 2025