
Http cache: introduce cache entries, prevent cache misses when resource will be fetched #24203

Closed
wants to merge 1 commit

Conversation

@gterzian (Member) commented Sep 12, 2019

FIX #24166


  • ./mach build -d does not report any errors
  • ./mach test-tidy does not report any errors
  • These changes fix #___ (GitHub issue number if applicable)
  • There are tests for these changes OR
  • These changes do not require tests because ___

This change is Reviewable

@highfive commented Sep 12, 2019

Heads up! This PR modifies the following files:

  • @KiChjang: components/net/tests/http_cache.rs, components/net/fetch/methods.rs, components/net/http_loader.rs, components/net/http_cache.rs
@gterzian gterzian force-pushed the gterzian:cache_experiment branch 4 times, most recently from 7986748 to d6ea5dc Sep 12, 2019
@gterzian gterzian changed the title Http cache: prevent cache misses when resource will be fetched Http cache: introduce cache entries, prevent cache misses when resource will be fetched Sep 12, 2019
@gterzian gterzian force-pushed the gterzian:cache_experiment branch from d6ea5dc to c4f6a6b Sep 12, 2019
@gterzian (Member, Author) commented Sep 12, 2019

@bors-servo try=wpt

bors-servo added a commit that referenced this pull request Sep 12, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

<!-- Please describe your changes on the following line: -->
FIX #24166

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [ ] `./mach build -d` does not report any errors
- [ ] `./mach test-tidy` does not report any errors
- [ ] These changes fix #___ (GitHub issue number if applicable)

<!-- Either: -->
- [ ] There are tests for these changes OR
- [ ] These changes do not require tests because ___

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24203)
<!-- Reviewable:end -->
@bors-servo (Contributor) commented Sep 12, 2019

Trying commit c4f6a6b with merge 2742af8...

@bors-servo (Contributor) commented Sep 12, 2019

💔 Test failed - linux-rel-css

@asajeffrey (Member) left a comment

The big comment is that this looks like it's structured differently from the spec, which (IIRC) leaves all caching to network_or_cache_fetch.

Also, it would be nice if a GET could join on to a GET in progress, rather than waiting for all its payload to arrive. That might be a big change though.

@@ -224,6 +225,13 @@ pub fn main_fetch(
// Step 11.
// Not applicable: see fetch_async.

// Get the cache entry corresponding to the url, after potential hsts switch.
let cache_entry = if let Ok(mut http_cache) = context.state.http_cache.write() {
http_cache.get_entry(&request)

@asajeffrey (Member), Sep 12, 2019:

Hmm, is this spec-compliant? Shouldn't anything that touches the cache be in http_network_or_cache_fetch?

@gterzian (Author, Member), Sep 13, 2019:

Strictly speaking, I think it's ok, since at this point we're not actually running any specified cache-related logic; we're just getting the entry corresponding to the current request, to be used later to check the cache. So basically the "entry" is the interface to the "cache" from the spec perspective, and it will be used later in http_network_or_cache_fetch.

I think the spec in this case is not really providing the full algorithm, rather describing the outcome from a black-box perspective.

For example, see the diagram for the state-machine of the Chromium cache, little of which shows up in the spec: https://www.chromium.org/developers/design-documents/network-stack/http-cache

Same thing goes for the Gecko cache: https://developer.mozilla.org/en-US/docs/Mozilla/HTTP_cache#Implementation (more specifically, see their CacheEntry description).

If anything, ours matches the spec most closely, and yet ours seems to be the most naive implementation of them all.

@asajeffrey (Member), Sep 16, 2019:

I'd still prefer a solution that kept all the caching logic inside http_network_or_cache_fetch, I find it confusing moving some bits of caching out. What's the motivation for doing that?

response: response,
needs_validation: has_expired,
};
if cached_resource.aborted.load(Ordering::Relaxed) {

@asajeffrey (Member), Sep 12, 2019:

Ordering::Relaxed? You are a brave soul!

@gterzian (Author, Member), Sep 13, 2019:

The other parts of this workflow also use Relaxed, and we could change that in this PR as well in another commit, see

response.aborted.store(true, Ordering::Relaxed);

response: response,
needs_validation: has_expired,
};
if cached_resource.aborted.load(Ordering::Relaxed) {
return None;

@asajeffrey (Member), Sep 12, 2019:

Shouldn't we construct a new CachedResponse at this point and return it, rather than returning None?


@gterzian (Author, Member), Sep 13, 2019:

Actually, I realize this change was left over from #22813, and I'm not sure if it's really necessary. Basically there is no need to make this return an Option<CacheResource> based on the check for aborted; or at least it's not solving any known problem.

@gterzian (Author, Member), Sep 13, 2019:

On second thought, I think this is a change that makes sense, since we shouldn't construct an aborted response, and I should have extracted it from the previous PR at the time.

So I switched to release/acquire for that flow.

@@ -563,6 +580,132 @@ impl HttpCache {
}
}

/// Get the cache entry corresponding to this request
pub fn get_entry(&mut self, request: &Request) -> Option<HttpCacheEntry> {

@asajeffrey (Member), Sep 12, 2019:

Slightly annoyingly that means we need a write lock just to read the cache :( We may as well make it a Mutex rather than a RWLock.

@gterzian (Author, Member), Sep 13, 2019:

Ok, I can change that to a mutex.

@gterzian (Author, Member), Sep 13, 2019:

How about we just keep it a RwLock for now, since everything else on HttpState uses one?

pub struct HttpState {

@asajeffrey (Member), Sep 16, 2019:

sure, bit of a waste, but not a huge problem.

let entry = self
.entries
.entry(entry_key.clone())
.or_insert(HttpCacheEntry::new(entry_key));

@asajeffrey (Member), Sep 12, 2019:

Interesting that invalidation can end up inserting an entry into the cache.

@gterzian (Author, Member), Sep 13, 2019:

Yes, I think we actually want to return if there is no corresponding entry :)

/// if a pending network fetch has already begun,
/// and a pending response has been stored in the cache,
/// or if there is no pending network fetch.
is_ready_to_construct: Arc<(Mutex<bool>, Condvar)>,

@asajeffrey (Member), Sep 12, 2019:

Can we use futures rather than a condvar here? Essentially we're building a Future<Option<CachedResource>> here.

@gterzian (Author, Member), Sep 13, 2019:

I think that would require moving all of fetch to futures/tasks, since a fetch currently runs on a thread, and in such a setup it's more reliable to do this with condvars and locks.

There is an issue at #22813

@gterzian (Member, Author) commented Sep 13, 2019

Ok, lots of TIMEOUTs; I must have missed a wake-up somewhere...

@gterzian (Member, Author) commented Sep 13, 2019

Also, it would be nice if a GET could join on to a GET in progress, rather than waiting for all its payload to arrive. That might be a big change though.

On that, with the current changes, there are in fact two separate "wait" workflows:

  1. The newly introduced wait on a condvar, based on whether an entry is "ready to construct" a response. "Ready" is false only for as long as a concurrent fetch has checked the cache (via a specific entry), found nothing, and moved on to a network fetch, but has not yet had the chance to store a "pending" resource in the cache. This workflow prevents "quick succession" concurrent requests from getting a cache miss just because the "first" fetch hasn't actually started the network fetch yet. The "wait" does not really wait on any network data; it only waits for the network future to be launched, after which a "pending" resource will immediately be stored in the cache.

  2. The existing "wait for a pending resource" workflow. That happens when a concurrent fetch hits the cache and gets a pending resource. At that point this fetch will wait on the concurrent network fetch to finish. See

    fn wait_for_cached_response(done_chan: &mut DoneChannel, response: &mut Option<Response>) {

If by "joining" you mean actually sending the intermediary steps on the FetchTaskTarget, I wonder if we should do that, it might be best from the perspective of script to see any entry from the cache as coming in one chunk, as opposed to "join" into the networking operation of the fetch that added the resource to the cache, since the fetch that hits the cache doesn't go through a "network fetch"(at least not directly, although it could be said that it indirectly goes through a network fetch via the concurrent cache read), but I'm not sure.

@gterzian (Member, Author) commented Sep 13, 2019

@bors-servo try=wpt

@bors-servo (Contributor) commented Sep 13, 2019

Trying commit 5087540 with merge faa97eb...

bors-servo added a commit that referenced this pull request Sep 13, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@bors-servo (Contributor) commented Sep 13, 2019

💔 Test failed - linux-rel-css

@gterzian (Member, Author) commented Sep 13, 2019

@bors-servo try=wpt

@bors-servo (Contributor) commented Sep 13, 2019

Trying commit 7ea31c0 with merge 0bde330...

bors-servo added a commit that referenced this pull request Sep 13, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@bors-servo (Contributor) commented Sep 13, 2019

💔 Test failed - linux-rel-css

@gterzian (Member, Author) commented Sep 16, 2019

@bors-servo try=wpt (100ms timeout)

@bors-servo (Contributor) commented Sep 16, 2019

Trying commit 474dcf3 with merge e7fca8d...

bors-servo added a commit that referenced this pull request Sep 16, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@bors-servo (Contributor) commented Sep 16, 2019

☀️ Test successful - linux-rel-css, linux-rel-wpt, status-taskcluster
State: approved= try=True

@gterzian gterzian force-pushed the gterzian:cache_experiment branch from 474dcf3 to 589c09a Sep 16, 2019
@gterzian (Member, Author) commented Sep 16, 2019

@bors-servo try=wpt (increase timeout from 100 to 500ms)

@bors-servo (Contributor) commented Sep 16, 2019

Trying commit 589c09a with merge bb81703...

bors-servo added a commit that referenced this pull request Sep 16, 2019
Http cache: introduce cache entries, prevent cache misses when resource will be fetched

@gterzian (Member, Author) commented Sep 16, 2019

[Screenshot: Screen Shot 2019-09-16 at 7.27.53 PM]

Ok, now it's really ready for another round of review @asajeffrey

@jdm jdm assigned asajeffrey and unassigned emilio Sep 16, 2019
@asajeffrey (Member) commented Sep 16, 2019

Oh /html/semantics/scripting-1/the-script-element/async_004.htm is an interesting one.

One way we could deal with that is to change how prefetching is done: rather than treating it just like a regular cache access, we treat the first request for the resource as a cache hit, even if the resource didn't have cache metadata. This does worry me a bit, as it's not clear whether that's spec-compliant, but it would get this test to pass, and would improve perf in cases where we prefetched a resource that turned out to have no cache metadata.

I'll read over the latest version.

@asajeffrey (Member) left a comment

OK, I'm still confused about why we're moving caching logic out of http_network_or_cache_fetch, and there's still the question about how to treat the first fetch of a cached prefetch, but otherwise LGTM.

@@ -224,6 +225,13 @@ pub fn main_fetch(
// Step 11.
// Not applicable: see fetch_async.

// Get the cache entry corresponding to the url, after potential hsts switch.
let cache_entry = if let Ok(mut http_cache) = context.state.http_cache.write() {
http_cache.get_entry(&request)

@asajeffrey (Member), Sep 16, 2019:

I'd still prefer a solution that kept all the caching logic inside http_network_or_cache_fetch, I find it confusing moving some bits of caching out. What's the motivation for doing that?

components/net/http_cache.rs (outdated thread, resolved)
components/net/http_cache.rs (outdated thread, resolved)
@@ -563,6 +580,132 @@ impl HttpCache {
}
}

/// Get the cache entry corresponding to this request
pub fn get_entry(&mut self, request: &Request) -> Option<HttpCacheEntry> {

@asajeffrey (Member), Sep 16, 2019:

sure, bit of a waste, but not a huge problem.

components/net/http_cache.rs (thread resolved)
@gterzian (Member, Author) commented Sep 17, 2019

I'd still prefer a solution that kept all the caching logic inside http_network_or_cache_fetch, I find it confusing moving some bits of caching out. What's the motivation for doing that?

I understand your concern, and I think the proposed changes actually fit in with your thinking, for the following reasons:

  1. All the underlying cache operations involving cached resources (construct/store/refresh) are still happening inside http_network_or_cache_fetch. The proposed changes add a get_entry operation; however, that doesn't involve manipulating the resources of the cache directly. get_entry only gives us the "partition" of the cache matching the current request, based on its URL, to be used later for specified cache operations inside http_network_or_cache_fetch.

  2. To make the situation clearer, I could rename the HttpCache to something like HttpCacheManager and the HttpCacheEntry to HttpCache, to show that the "entry" really is the "cache", from the point of view of the current request.

In terms of the motivations of the proposed approach, they are:

  1. In practice, it makes it easier to introduce some blocking like this PR does, since we only want to block other fetches using the same entry, not all fetches. This would be harder to do if all fetches were still operating on the same RwLock<HttpCache>.

  2. The CacheEntry acts as a partitioned interface to the underlying cache storage. The partitioning is based on the URL, and it could also be based on additional keys, for example the origin of the top-level document, to support something like whatwg/fetch#904

  3. In theory, it also improves parallelism, since construct/store/refresh operations by different fetches will not contend unless they are using the same entry, compared to the current situation where all fetches operate on the same RwLock<HttpCache>, so any construct/store/refresh potentially blocks any other fetch, regardless of the URL.

Please let me know what you think.

@gterzian gterzian force-pushed the gterzian:cache_experiment branch from a672080 to eeb3a4e Sep 18, 2019
@gterzian gterzian requested a review from asajeffrey Sep 25, 2019
@asajeffrey (Member) commented Sep 27, 2019

OK, we appear to be at a bit of a stalemate here about whether all caching logic should be in http_network_or_cache_fetch, as per the spec. What I was hoping for is that at step 5.19.1 of https://fetch.spec.whatwg.org/#http-network-or-cache-fetch, if there's already an outstanding request, we'd block waiting for its headers to arrive, to know whether we can return a cached response or not. We'd have to block without holding the lock on the cache. The rest of the implementation wouldn't change; there'd be another case when doing a cache lookup, corresponding to "a cacheable GET request has been issued, but the response headers haven't come back yet, so we don't know yet whether there's a cache entry".

@gterzian (Member, Author) commented Sep 29, 2019

Ok so I've opened #24318 as an alternative implementation, which does not introduce the concept of entries, so you can choose one, or suggest a third approach.

The different commit is 89fe0ba

@asajeffrey

@gterzian gterzian mentioned this pull request Oct 2, 2019
@gterzian (Member, Author) commented Oct 3, 2019

In favor of #24318

@gterzian gterzian closed this Oct 3, 2019
@gterzian gterzian deleted the gterzian:cache_experiment branch Jan 7, 2020