Integrate HTTP network stack, including the HTTP cache, with streams #26743

Open
gterzian opened this issue Jun 2, 2020 · 0 comments

This could be a nice follow-up to #25873.

Basically, once we integrate the body of a script/dom/response.rs Response with a ReadableStream, as is done in that PR, we can move the entire network stack, including the HTTP cache, to a streaming model.


Currently, we call `into_body` on the hyper response over at

`res.into_body()`

Then we simply map over the hyper body, and as chunks come in from the network, they accumulate into a Vec, as can be seen at

`body.extend_from_slice(&*bytes);`

So this Vec is both the "body" of the current request and the body of the response stored in the cache, if any; see

`body: Arc<Mutex<ResponseBody>>,`

Then, the "payload" is also signalled to the fetch worker via

`let _ = done_sender.send(Data::Payload(bytes.to_vec()));`

and when this signal is received by the fetch worker, it is then sent across IPC to script, over at

`target.process_response_chunk(vec);`

This payload is then received in script on the IPC router set up at

`ROUTER.add_route(`

and then, via the FetchResponseListener mechanism, a task is queued on the event-loop, and finally the payload is fed into the ReadableStream of the script Response, over at

`response.stream_chunk(chunk);`


At the Fetch standard level, this corresponds to the parallel steps starting at Step 13 of https://fetch.spec.whatwg.org/#concept-http-network-fetch

And as can be seen at step 13.1.1.7, there is a concept of suspending the fetch when the stream doesn't need more data, or, put differently, of only continuing the fetch when the stream does need more data.


So what could be improved?

While the IPC from net to script, and the stream in script, are in fact already "streaming" the data, the problem is that in net we're still accumulating response data into the vector inside ResponseBody, found at

`pub enum ResponseBody {`

which is used both in the HTTP cache and the fetch worker that initiated the request.

Also, we're effectively "pushing" data from net to script as fast as it comes in, whereas the spec hints at a "pull" model, where it's the ReadableStream of the response in script that drives the workflow by pulling chunks from the network.

So we could try to replace this vector in ResponseBody with a streaming mechanism, driven by the stream in script "pulling" chunks from an equivalent streaming mechanism in net.

It would be sort of the mirror of what is done to transmit the body of a request, which consists of an IPC route in net, at https://github.com/servo/servo/pull/25873/files#diff-aa469beb5619907dbccd88364264b9b8R449,
which matches an IPC route in script, found at https://github.com/servo/servo/pull/25873/files#diff-ae0ff1fd98b06dfb13baf427cdffc28aR338

In the case of transmitting the body of a request, it's net that "requests a new chunk" from script each time one has been transmitted over the network.

We could do something similar, where a route in net would only poll the hyper HttpBody when receiving a request from script for a chunk (indicating that the response stream in script needs more data).

And this route in net would then spawn a little "networking worker" to pull the next chunk from the body, similar to what is done at

`HANDLE.lock().unwrap().as_mut().unwrap().spawn(`

but it would read a single chunk, and the next chunk would only be read in response to the next request from script over IPC.

This would also need to integrate with the HTTP cache. We probably still want the cache to end up with a full `Vec<u8>` of cached response data; however, the cache could similarly provide this data as chunks, "pulled" by the stream in script.

Also note how the spec, at https://fetch.spec.whatwg.org/#concept-fetch-suspend, says that while the initial fetch can be suspended if the stream doesn't need more data, this can be, and should be, ignored if the response is being updated in the cache. So we should have the ability to suspend one part of the workflow, without affecting another part.

Note that the "hook" in the ReadableStream integration with SpiderMonkey that signals that one can "pull" more data (if the buffer is below the desired size) is found at https://github.com/servo/servo/pull/25873/files#diff-4a2b21dadec30a5cccff658e252da1a5R474

So essentially we're talking about making the underlying source of the stream of a response be "pull" in nature, versus "push", where we currently simply push data over IPC as fast as it becomes available.

It would be a big and complicated piece of work, but I think it's quite interesting and it would leverage our recent integrations with streams.
