Integrate HTTP network stack, including the HTTP cache, with streams #26743
This could be a nice follow-up from #25873
Basically, once we integrate the body of a response in `script/dom/response.rs` with `ReadableStream`, as is done in that PR, we can actually move the entire network stack, including the cache, to a streaming model.
Currently, we call `into_body` on the Hyper response, over at `servo/components/net/http_loader.rs`, line 1620 in c4ea4b1.
Then we simply map on the hyper body, and as chunks come in from the network, they accumulate into a `Vec`, as can be seen at `servo/components/net/http_loader.rs`, line 1630 in c4ea4b1.
So this `Vec` is both the "body" of the current request and the body of the response that is in the cache, if any; see `servo/components/net/http_cache.rs`, line 59 in c4ea4b1.
Then, the "payload" is also signalled to the fetch worker via `servo/components/net/http_loader.rs`, line 1631 in c4ea4b1, and when this signal is received by the fetch worker, it is sent across IPC to script, over at `servo/components/net/fetch/methods.rs`, line 519 in c4ea4b1.
This payload is then received in script on the IPC router set up at `servo/components/script/fetch.rs`, line 184 in c4ea4b1, and then, via the `FetchResponseListener` mechanism, a task is queued on the event loop and finally the payload is fed into the `ReadableStream` of the script `Response`, over at `servo/components/script/fetch.rs`, line 265 in f3c70f1.
At the Fetch standard level, this corresponds to the parallel steps starting at Step 13 of https://fetch.spec.whatwg.org/#concept-http-network-fetch
And as can be seen at step 13.1.1.7, there is a concept of only suspending the fetch if the stream doesn't need more data, or, put differently, only continuing the fetch if the stream does need more data.
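To make the current flow concrete, here is a minimal sketch of the "push" behaviour described above. It is not Servo's code: `Payload`, `push_body` and `run_push_demo` are hypothetical names, and a plain `std::sync::mpsc` channel plus a thread stand in for the hyper body and the IPC route to script.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the messages sent from net to script.
#[derive(Debug)]
enum Payload {
    Chunk(Vec<u8>),
    Done,
}

/// Push model: every incoming chunk is appended to the shared body and
/// forwarded right away, without asking whether the consumer wants more.
fn push_body(
    network_chunks: Vec<Vec<u8>>,
    body: Arc<Mutex<Vec<u8>>>,
    to_script: Sender<Payload>,
) {
    for chunk in network_chunks {
        body.lock().unwrap().extend_from_slice(&chunk);
        let _ = to_script.send(Payload::Chunk(chunk));
    }
    let _ = to_script.send(Payload::Done);
}

/// Runs the push loop on a "networking" thread and drains it on the caller's
/// side. Returns the accumulated body and the bytes the consumer received.
fn run_push_demo() -> (Vec<u8>, Vec<u8>) {
    let body = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx): (Sender<Payload>, Receiver<Payload>) = channel();
    let net_body = body.clone();
    let chunks = vec![b"hel".to_vec(), b"lo ".to_vec(), b"world".to_vec()];
    let net = thread::spawn(move || push_body(chunks, net_body, tx));

    let mut received = Vec::new();
    for msg in rx {
        match msg {
            Payload::Chunk(chunk) => received.extend(chunk),
            Payload::Done => break,
        }
    }
    net.join().unwrap();
    let accumulated = body.lock().unwrap().clone();
    (accumulated, received)
}
```

Note that nothing in `push_body` can slow down or suspend: the full body ends up in the vector whether or not the consumer keeps up.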
So what could be improved?
While the IPC from `net` to `script`, and the stream in script, are in fact already "streaming" the data, the problem is that in `net` we're still accumulating response data into the vector inside `ResponseBody`, found at `servo/components/net_traits/response.rs`, line 39 in f3c70f1, which is used both in the HTTP cache and by the fetch worker that initiated the request.
Also, we're effectively "pushing" data from `net` to `script` as fast as it comes in, whereas the spec hints at a "pull" model, where it's the `ReadableStream` of the response in script that drives the workflow by pulling chunks from the network.
So we could try to replace this vector in `ResponseBody` with a streaming mechanism that would be driven by the stream in `script` "pulling" chunks from the equivalent streaming mechanism in `net`.
It would be sort of the mirror of what is done to transmit the body of a request, which consists of an IPC route in `net`, at https://github.com/servo/servo/pull/25873/files#diff-aa469beb5619907dbccd88364264b9b8R449, which matches an IPC route in `script`, found at https://github.com/servo/servo/pull/25873/files#diff-ae0ff1fd98b06dfb13baf427cdffc28aR338
In the case of transmitting the body of a request, it's `net` that "requests a new chunk" from `script` each time one has been transmitted over the network.
We could do something similar, where a route in `net` would only poll the hyper `HttpBody` when receiving a request from `script` for a chunk (indicating the response stream in `script` needs more data).
And this route in `net` would then spawn a little "networking worker" to pull the next chunk from the body, similar to what is done at `servo/components/net/http_loader.rs`, line 1619 in c4ea4b1.
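A rough sketch of that request-driven route, under the assumption that a pair of `std::sync::mpsc` channels and a thread can stand in for the IPC routes and the hyper body. `ChunkRequest`, `ChunkResponse`, `serve_body` and `pull_chunks` are hypothetical names, not existing Servo APIs.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical request sent from script to net when the stream needs data.
enum ChunkRequest {
    Next,
    Cancel,
}

/// Hypothetical reply sent from net to script.
enum ChunkResponse {
    Chunk(Vec<u8>),
    Done,
}

/// Pull model: the body is only polled when a request arrives.
/// Returns how many chunks were actually pulled from the body.
fn serve_body<I>(
    mut body: I,
    requests: Receiver<ChunkRequest>,
    replies: Sender<ChunkResponse>,
) -> usize
where
    I: Iterator<Item = Vec<u8>>,
{
    let mut polled = 0;
    for request in requests {
        match request {
            ChunkRequest::Cancel => break,
            ChunkRequest::Next => match body.next() {
                Some(chunk) => {
                    polled += 1;
                    if replies.send(ChunkResponse::Chunk(chunk)).is_err() {
                        break;
                    }
                }
                None => {
                    let _ = replies.send(ChunkResponse::Done);
                    break;
                }
            },
        }
    }
    polled
}

/// The "script" side: asks for at most `wanted` chunks, then cancels.
/// Returns the chunks received and the number of polls done on the net side.
fn pull_chunks(chunks: Vec<Vec<u8>>, wanted: usize) -> (Vec<Vec<u8>>, usize) {
    let (req_tx, req_rx) = channel();
    let (resp_tx, resp_rx) = channel();
    let net = thread::spawn(move || serve_body(chunks.into_iter(), req_rx, resp_tx));

    let mut received = Vec::new();
    for _ in 0..wanted {
        req_tx.send(ChunkRequest::Next).unwrap();
        match resp_rx.recv().unwrap() {
            ChunkResponse::Chunk(chunk) => received.push(chunk),
            ChunkResponse::Done => break,
        }
    }
    // The receiver may already be gone if the body finished; ignore that.
    let _ = req_tx.send(ChunkRequest::Cancel);
    let polled = net.join().unwrap();
    (received, polled)
}
```

With three chunks available and `wanted` set to 2, the body is polled exactly twice; the third chunk is never read from the source.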
This would also need to integrate with the HTTP cache. We probably still want the cache to end up with a `Vec<u8>` of cached response data; however, the cache could similarly provide this in some form of chunks, where chunks would be "pulled" by the stream in script.
Also note how the spec, at https://fetch.spec.whatwg.org/#concept-fetch-suspend, says that while the initial fetch can be suspended if the stream doesn't need more data, this can be, and should be, ignored if the response is being updated in the cache. So we should have the ability to suspend one part of the workflow without affecting another part.
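One possible shape for this, sketched under assumptions: the network side keeps appending to the cached bytes unconditionally, while each consumer holds its own cursor and pulls chunks at its own pace. `CachedBody`, `BodyReader`, `Pull` and `network_write` are hypothetical names for illustration only.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical cache entry: the network side appends to it unconditionally.
#[derive(Default)]
struct CachedBody {
    data: Vec<u8>,
    complete: bool,
}

/// Hypothetical per-consumer reader with its own cursor into the cached bytes.
struct BodyReader {
    cache: Arc<Mutex<CachedBody>>,
    offset: usize,
}

/// Result of one pull by the consumer.
enum Pull {
    Chunk(Vec<u8>),
    /// No new data yet, but the response is still being received.
    Pending,
    Done,
}

impl BodyReader {
    /// Returns up to `max_len` bytes (at least one) past this reader's cursor.
    fn pull(&mut self, max_len: usize) -> Pull {
        let cache = self.cache.lock().unwrap();
        if self.offset < cache.data.len() {
            let end = (self.offset + max_len.max(1)).min(cache.data.len());
            let chunk = cache.data[self.offset..end].to_vec();
            self.offset = end;
            Pull::Chunk(chunk)
        } else if cache.complete {
            Pull::Done
        } else {
            Pull::Pending
        }
    }
}

/// The network side: updates the cache whether or not anyone is pulling.
fn network_write(cache: &Arc<Mutex<CachedBody>>, chunk: &[u8], last: bool) {
    let mut cache = cache.lock().unwrap();
    cache.data.extend_from_slice(chunk);
    cache.complete = last;
}
```

Here a reader that never calls `pull` is effectively suspended, yet the cache entry still completes, which is the separation the spec note asks for. The trade-off is that the whole body still lives in memory, as it does today.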
Note that the "hook" in the `ReadableStream` integration with SpiderMonkey that signals that one can "pull" for more data (if the buffer is below the desired size) is found at https://github.com/servo/servo/pull/25873/files#diff-4a2b21dadec30a5cccff658e252da1a5R474
So essentially we're talking about making the underlying source of the stream of a response be "pull" in nature, versus "push", where we currently simply push data over IPC as fast as it becomes available.
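As an illustration of the "desired size" idea, the following is a simplified model, not the SpiderMonkey API: a buffer with a high-water mark that only pulls from its source while the queued bytes are below that mark. `StreamBuffer` and `fill` are hypothetical names.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the stream's internal queue.
struct StreamBuffer {
    queue: VecDeque<Vec<u8>>,
    queued_bytes: usize,
    high_water_mark: usize,
}

impl StreamBuffer {
    fn new(high_water_mark: usize) -> Self {
        StreamBuffer {
            queue: VecDeque::new(),
            queued_bytes: 0,
            high_water_mark,
        }
    }

    /// Positive while there is room left below the high-water mark.
    fn desired_size(&self) -> isize {
        self.high_water_mark as isize - self.queued_bytes as isize
    }

    fn enqueue(&mut self, chunk: Vec<u8>) {
        self.queued_bytes += chunk.len();
        self.queue.push_back(chunk);
    }

    /// The consumer reads one chunk, which frees up room in the buffer.
    fn read(&mut self) -> Option<Vec<u8>> {
        let chunk = self.queue.pop_front()?;
        self.queued_bytes -= chunk.len();
        Some(chunk)
    }
}

/// Pulls from `source` only while the buffer wants more data.
/// Returns the number of chunks pulled.
fn fill<I>(buffer: &mut StreamBuffer, source: &mut I) -> usize
where
    I: Iterator<Item = Vec<u8>>,
{
    let mut pulled = 0;
    while buffer.desired_size() > 0 {
        match source.next() {
            Some(chunk) => {
                buffer.enqueue(chunk);
                pulled += 1;
            }
            None => break,
        }
    }
    pulled
}
```

Once the buffer is at or above the mark, `fill` pulls nothing; a `read` by the consumer makes room and the next `fill` resumes pulling.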
It would be a big and complicated piece of work, but I think it's quite interesting and it would leverage our recent integrations with streams.