Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement an HTTP cache #12972

Closed
jdm opened this issue Aug 22, 2016 · 23 comments
Closed

Implement an HTTP cache #12972

jdm opened this issue Aug 22, 2016 · 23 comments
Labels

Comments

@jdm
Copy link
Member

@jdm jdm commented Aug 22, 2016

@KiChjang
Copy link
Member

@KiChjang KiChjang commented Aug 22, 2016

Is this what http_network_or_cache_fetch would essentially use?

@jdm
Copy link
Member Author

@jdm jdm commented Aug 22, 2016

Precisely.

@ashrwin
Copy link
Contributor

@ashrwin ashrwin commented Aug 24, 2016

I'd like to have a crack at this :)

@KiChjang
Copy link
Member

@KiChjang KiChjang commented Aug 24, 2016

@ashrko619 Are you sure? I don't think this issue is good for a new contributor. You need to be familiar with the entire Fetch infrastructure and caching.

@jdm
Copy link
Member Author

@jdm jdm commented Aug 24, 2016

@ashrko619 I'm not going to stop you from working on it, but I highly recommend discussing your implementation plan here.

@ashrwin
Copy link
Contributor

@ashrwin ashrwin commented Aug 26, 2016

@KiChjang @jdm I went over the code and the specifications. It looks a bit more complicated than what I thought. I will work on the easy and less-easy issues for now :). I ll keep an eye on this issue and see if I can contribute something to it down the line :)

@KiChjang
Copy link
Member

@KiChjang KiChjang commented Dec 12, 2016

I haven't read the spec at all yet, but my initial thoughts were to copy what CookieStorage does and iterate the design over this prototype.

@jdm
Copy link
Member Author

@jdm jdm commented Dec 13, 2016

We can and should be smarter about our caching strategy. We should create a HashableRequest structure for the cache hashmap key which contains:

  • the request URL (stripped of any fragment)
  • the request headers

and implements Hash by hashing these together. The values of the hashmap should be the complete Response object that would otherwise be obtained by fetching the request.

@gterzian
Copy link
Member

@gterzian gterzian commented Jul 31, 2017

I've been sort of dreaming of taking on this one... 😄

What's the current status? I'm assuming this is a large project, yet a fairly standalone one, that wouldn't have too many opportunities to block on other issues, is that correct?

@KiChjang
Copy link
Member

@KiChjang KiChjang commented Jul 31, 2017

I was planning on taking this, but I have a couple of pull requests that I have to follow before I can work on it. Feel free to work on this!

@jdm
Copy link
Member Author

@jdm jdm commented Jul 31, 2017

@gterzian Working on something like this seems like it would be right up your alley in terms of complexity and standalone-ness :)

@gterzian
Copy link
Member

@gterzian gterzian commented Aug 2, 2017

awesome, thanks! I'll start looking more into this...

@gterzian
Copy link
Member

@gterzian gterzian commented Aug 4, 2017

Ok so couple of questions...

When I look at @jdm previous work, it seems that, when it was available(which was in all cases but CacheOperationResult::Uncacheable), the cache was positioned to be the 'consumer facing' source of response data, by which I mean that even in the case of a new cache entry, the data coming in via the network would first be fed to the cache, copied into the new entry, and it was the cache would send it on to the final consumer. See process_payload.
Conversely, in the case of an existing complete or partial entry, it was also the cache that would be responsible for sending the complete or partial data to the final consumer over a channel, see for example send_complete_entry.

Obviously the fetch and resource code has changed a lot since, and after starting to do some housekeeping on the types used(https://github.com/gterzian/servo/tree/continue_http_cache_work), I realized that we perhaps first need to answer a few high-level design questions:

  1. Do we want the cache to deal with partial responses/entries? In the light of https://tools.ietf.org/html/rfc7234#section-3.1 it seems dealing with partial responses is an optional feature of a cache, and requires supporting "Range" and "Content-Range" headers.
  2. The current fetch code seems to imply a cache design where the initial cache check in "http_network_or_cache_fetch" would return an Option<Response> , and only invokes "http_network_fetch" when that option is none. In general it seems to tend towards putting the cache somewhat "behind" the fetch code, and not allowing it to directly update consumers.

After reading the spec and the current fetch code, I tend towards a design that would only handle complete requests for now and look something like:

A. a cache check inside "http_network_or_cache_fetch", simply passing the net_traits::request::Request to the cache(I think it contains the info required to compute a cache key), and returning either a complete net_traits::response::Response or none.
B. a preliminary cache update in "http_network_fetch"(again by passing the request to the cache) around here. At the point, if the response body matches ResponseBody::Done, the response can be cached readily, if it is ResponseBody::Receiving, the entry can be created but it isn't usable yet.
C. A channel to be used by the fetch worker thread to update the cache entry whose body is still receiving data. Once the body is set to ResponseBody::Done, the cache entry becomes available for a fetch as per A.
D. A cache entry itself would be similar to the previous design(net_traits::Metadata, a Vec for the body, and info on expiration). I guess it would need one more "available" boolean.

In summary, my main point is that I would skip the 'Pending resource' part for now, and start with a cache that only returns complete responses or None.

Please let me know what you think.

@KiChjang
Copy link
Member

@KiChjang KiChjang commented Aug 4, 2017

I think it makes sense for now; my original plan when reading through all the specs and jdm's branch was to forget about partial response for now, and implement a cache that only stores complete responses.

If I understand correctly, you are planning to key the cache using net_traits::Metadata, correct? I am not sure if that is sufficient - the URL initially passed to fetch may not necessarily be the same as the final_url field. You would also need to strip away any URL fragments, as pointed out by @jdm at an earlier comment.

@gterzian
Copy link
Member

@gterzian gterzian commented Aug 5, 2017

@KiChjang I was indeed thinking of using net_traits::Metadata, as the previous code was using the now gone resource_task::Metadata. Thanks for the pointer, I'll make sure to look further into the URL issue you point out. Good to hear you were also thinking of forgetting about partial responses for now...

By the way, after a good night's sleep I woke up realizing I was conflating two concepts above, while in fact I think there are two:

  1. The concept of a storing an incomplete response as per the spec. I believe that's really about responses that are "in response" to a request with a Range or Content-Range header, and I guess we won't be supporting that for now.
  2. The concept of a 'Pending resource' as per @jdm previous work. I think that's more about having the cache responding with a partial response, in the case that the response is in the process of being fetched by a "http_network" fetch, and then updating consumers as the data comes in(very similar to what the fetch worker does now with the DoneChannel).

To support 2, I guess we would need to have the fetch worker not only send data to the consumer via the "DoneChannel", but also update the cache at the same time(which is necessary anyway), and also have the cache update "pending consumers", by doing in fact the same thing as what the fetch worker does now, which I guess is sending the data on the DoneChannel and also replacing the memory of the partial Response that was sent out previously.

Having separated the two concepts, I still think that such a case of "the network is currently fetching the resource" could simply result in a cache miss, which would allow for a more simple initial design of the cache(which could be build upon later, to support pending resources). This could also be following the spirit of the spec, which seems to be more about "when not to cache". 😄

@KiChjang
Copy link
Member

@KiChjang KiChjang commented Aug 5, 2017

Ah, I remembered why I didn't pick this up when I had the chance - it's because I have an outstanding PR that updates our existing networking code to follow the more recent Fetch Standard (#17521). You may want to wait for that to land first before editing the various fetch methods.

@gterzian
Copy link
Member

@gterzian gterzian commented Aug 5, 2017

Ok I see, I'll just work on the cache internals in the meantime and keep an eye out on that PR, thanks for the heads-up!

@jdm
Copy link
Member Author

@jdm jdm commented Aug 7, 2017

I agree with all of your conclusions. Having any cache at all seems like a bigger improvement than ensuring that we can return in-progress cache entries instead of making separate requests.

@KiChjang
Copy link
Member

@KiChjang KiChjang commented Aug 21, 2017

#17521 has landed; @gterzian you may now work on it unhindered!

@gterzian
Copy link
Member

@gterzian gterzian commented Aug 22, 2017

awesome, thanks for letting me know and congrats on landing that one!

@gterzian
Copy link
Member

@gterzian gterzian commented Sep 9, 2017

Sorry for the delay in getting started, I was busy with other things last couple of weeks, still committed to this one though, I'll try to take a first stab asap...

@gterzian gterzian mentioned this issue Sep 29, 2017
0 of 11 tasks complete
@nox
Copy link
Member

@nox nox commented Oct 4, 2017

I think we are going to need this in some sort pretty soon for the media stack.

@jdm
Copy link
Member Author

@jdm jdm commented Oct 4, 2017

Good thing that #18676 is chugging right along!

bors-servo added a commit that referenced this issue Nov 12, 2017
[WIP] Continue http cache work

<!-- Please describe your changes on the following line: -->

Work in progress, and not quite worth a review yet. (Continuation of #4117)

TODO

- [ ] cache metadata (find some subset of`net_traits::Metadata` that can be shared across threads, it seems the problem is mainly stuff inside `hyper::header` in the `headers` field)

- [ ] determine which other fields of a `Response` need to be cached, so a full and valid one can be returned upon a cache hit.

- [ ] determine how to best share the cache across fetch threads (inside HttpState like I tried now?)

- [ ] Spend more time reading the spec and make sure the cache follows it where it matters.

- [ ] Make the current wpt tests pass.

- [ ] More...

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #12972  (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because _____

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18676)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 15, 2017
[WIP] Continue http cache work

<!-- Please describe your changes on the following line: -->

Work in progress, and not quite worth a review yet. (Continuation of #4117)

TODO

- [ ] cache metadata (find some subset of`net_traits::Metadata` that can be shared across threads, it seems the problem is mainly stuff inside `hyper::header` in the `headers` field)

- [ ] determine which other fields of a `Response` need to be cached, so a full and valid one can be returned upon a cache hit.

- [ ] determine how to best share the cache across fetch threads (inside HttpState like I tried now?)

- [ ] Spend more time reading the spec and make sure the cache follows it where it matters.

- [ ] Make the current wpt tests pass.

- [ ] More...

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #12972  (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because _____

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18676)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 15, 2017
[WIP] Continue http cache work

<!-- Please describe your changes on the following line: -->

Work in progress, and not quite worth a review yet. (Continuation of #4117)

TODO

- [ ] cache metadata (find some subset of`net_traits::Metadata` that can be shared across threads, it seems the problem is mainly stuff inside `hyper::header` in the `headers` field)

- [ ] determine which other fields of a `Response` need to be cached, so a full and valid one can be returned upon a cache hit.

- [ ] determine how to best share the cache across fetch threads (inside HttpState like I tried now?)

- [ ] Spend more time reading the spec and make sure the cache follows it where it matters.

- [ ] Make the current wpt tests pass.

- [ ] More...

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #12972  (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because _____

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18676)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 15, 2017
[WIP] Continue http cache work

<!-- Please describe your changes on the following line: -->

Work in progress, and not quite worth a review yet. (Continuation of #4117)

TODO

- [ ] cache metadata (find some subset of`net_traits::Metadata` that can be shared across threads, it seems the problem is mainly stuff inside `hyper::header` in the `headers` field)

- [ ] determine which other fields of a `Response` need to be cached, so a full and valid one can be returned upon a cache hit.

- [ ] determine how to best share the cache across fetch threads (inside HttpState like I tried now?)

- [ ] Spend more time reading the spec and make sure the cache follows it where it matters.

- [ ] Make the current wpt tests pass.

- [ ] More...

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #12972  (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because _____

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18676)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 15, 2017
[WIP] Continue http cache work

<!-- Please describe your changes on the following line: -->

Work in progress, and not quite worth a review yet. (Continuation of #4117)

TODO

- [ ] cache metadata (find some subset of`net_traits::Metadata` that can be shared across threads, it seems the problem is mainly stuff inside `hyper::header` in the `headers` field)

- [ ] determine which other fields of a `Response` need to be cached, so a full and valid one can be returned upon a cache hit.

- [ ] determine how to best share the cache across fetch threads (inside HttpState like I tried now?)

- [ ] Spend more time reading the spec and make sure the cache follows it where it matters.

- [ ] Make the current wpt tests pass.

- [ ] More...

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #12972  (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because _____

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18676)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 19, 2017
Continue http cache work

<!-- Please describe your changes on the following line: -->

Work in progress, and not quite worth a review yet. (Continuation of #4117)

TODO

- [ ] cache metadata (find some subset of`net_traits::Metadata` that can be shared across threads, it seems the problem is mainly stuff inside `hyper::header` in the `headers` field)

- [ ] determine which other fields of a `Response` need to be cached, so a full and valid one can be returned upon a cache hit.

- [ ] determine how to best share the cache across fetch threads (inside HttpState like I tried now?)

- [ ] Spend more time reading the spec and make sure the cache follows it where it matters.

- [ ] Make the current wpt tests pass.

- [ ] More...

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #12972  (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because _____

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18676)
<!-- Reviewable:end -->
bors-servo added a commit that referenced this issue Nov 21, 2017
Continue http cache work

<!-- Please describe your changes on the following line: -->

Work in progress, and not quite worth a review yet. (Continuation of #4117)

TODO

- [ ] cache metadata (find some subset of`net_traits::Metadata` that can be shared across threads, it seems the problem is mainly stuff inside `hyper::header` in the `headers` field)

- [ ] determine which other fields of a `Response` need to be cached, so a full and valid one can be returned upon a cache hit.

- [ ] determine how to best share the cache across fetch threads (inside HttpState like I tried now?)

- [ ] Spend more time reading the spec and make sure the cache follows it where it matters.

- [ ] Make the current wpt tests pass.

- [ ] More...

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #12972  (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because _____

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/18676)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants
You can’t perform that action at this time.