Fix HTTP cache revalidation bugs #24388

Open · wants to merge 4 commits into base: main · Changes from all commits
51 changes: 45 additions & 6 deletions components/net/http_cache.rs
@@ -223,13 +223,15 @@ fn get_response_expiry(response: &Response) -> Duration {
// If the response has a Last-Modified header field,
// caches are encouraged to use a heuristic expiration value
// that is no more than some fraction of the interval since that time.
response.headers.typed_get::<LastModified>() {
response.headers.typed_get::<LastModified>()
{
let current = time::now().to_timespec();
let last_modified: SystemTime = last_modified.into();
let last_modified = last_modified.duration_since(SystemTime::UNIX_EPOCH).unwrap();
let last_modified = Timespec::new(last_modified.as_secs() as i64, 0);
// A typical setting of this fraction might be 10%.
let raw_heuristic_calc = (current - last_modified) / 10;
trace!("calculated {:?} vs. {:?} ({:?})", current, last_modified, raw_heuristic_calc);
let result = if raw_heuristic_calc < max_heuristic {
raw_heuristic_calc
} else {
@@ -331,8 +333,12 @@ fn create_cached_response(
// TODO: take must-revalidate into account <https://tools.ietf.org/html/rfc7234#section-5.2.2.1>
// TODO: if this cache is to be considered shared, take proxy-revalidate into account
// <https://tools.ietf.org/html/rfc7234#section-5.2.2.7>
let has_expired =
(adjusted_expires < time_since_validated) || (adjusted_expires == time_since_validated);
let has_expired = adjusted_expires <= time_since_validated;
@gterzian (Member) commented on Oct 8, 2019:

Also here, we could return None if has_expired is true and the stored resource doesn't contain a Date and a Last-Modified header, since we need those to do a successful refresh. If we can't refresh, we should not construct a response that requires validation, and should instead just go to the network "normally".

We could even purge such an expired resource that can't be successfully refreshed.
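
A minimal sketch of that guard (illustrative only, not part of this PR; it assumes create_cached_response is changed to return an Option and that the http header constants are in scope):

    // Sketch: if the entry has expired but lacks the headers needed for a
    // refresh, return None so fetch falls back to a normal network request.
    if has_expired {
        let headers = cached_resource.data.metadata.headers.lock().unwrap();
        let can_refresh = headers.contains_key(header::DATE)
            && headers.contains_key(header::LAST_MODIFIED);
        if !can_refresh {
            return None;
        }
    }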

@gterzian (Member):

Mmh, one issue here is that fetch doesn't have a hook to deal with the "could not select a stored response for update" case, as Step 7.4 seems to assume that a refresh is always a successful operation.

I've filed whatwg/fetch#950

trace!(
"time_since_validated: {:?}, adjusted_expires: {:?}",
time_since_validated,
adjusted_expires
);
CachedResponse {
response: response,
needs_validation: has_expired,
@@ -721,6 +727,11 @@ impl HttpCache {
assert_eq!(response.status.map(|s| s.0), Some(StatusCode::NOT_MODIFIED));
let entry_key = CacheKey::new(&request);
if let Some(cached_resources) = self.entries.get_mut(&entry_key) {
trace!(
"there are {} cached responses for {:?}",
cached_resources.len(),
request.url()
);

@gterzian (Member):

So I would say the easy way to fix the problem is to choose the cached resource that is the most recent, and then select that one for update and construct a response from it.

This also matches the "weak validator" part of https://httpwg.org/http-core/draft-ietf-httpbis-cache-latest.html#rfc.section.4.3.4

We can do the strong validator checking and so on in a different PR; since the problem here was the cache "going back in time", choosing the most recent entry should fix it.
for cached_resource in cached_resources.iter_mut() {
@gterzian (Member) commented on Oct 8, 2019:

A simple, and perhaps effective, fix for the "refresh of older resource", for now, might be to sort those by their original expiry. Then we can leave a TODO regarding the full "selection" logic as described at https://tools.ietf.org/html/rfc7234#section-4.3.4
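
A sketch of that interim sort (illustrative only, not part of this PR; it assumes the expiry stored in data.expires is the right ordering key):

    // Sketch: refresh the freshest entry first so a 304 can't update an older
    // resource and make the cache appear to go back in time.
    cached_resources.sort_by(|a, b| b.data.expires.cmp(&a.data.expires));
    // TODO: implement the full response selection logic from RFC 7234 section 4.3.4.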

@gterzian (Member) commented on Oct 8, 2019:

On second thought, I think it might be best to start with the strictest case, and default to None for the less certain ones, and then over time move down the ladder of selection.

The weak validator and no validator cases mentioned at https://tools.ietf.org/html/rfc7234#section-4.3.4 are softer and require us to handle all other "strong validator" cases correctly, so we probably shouldn't even try to do those for now, since the risk of erroneous caching is not worth it.

As described at https://tools.ietf.org/html/rfc7232#section-2.2.2, if the stored resource has a Date header, and the Last-Modified header of the 304 response is at least 60 seconds earlier than the stored date, the Last-Modified header becomes a strong validator.

And per the "strong validator" case at https://tools.ietf.org/html/rfc7234#section-4.3.4:

    All of the stored responses with the same strong validator are selected.
    If none of the stored responses contain the same strong validator, then
    the cache MUST NOT use the new response to update any stored responses.

So if we don't handle this case first, we are not following this MUST NOT.

Then we would need to cover all other "strong validator" cases, and only then can we start to try to use weak or no validators.
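
A sketch of that 60-second rule (illustrative only; the helper name and SystemTime-based types are assumed, not taken from this PR):

    use std::time::SystemTime;

    // Per RFC 7232 section 2.2.2: Last-Modified counts as a strong validator
    // when the stored Date is at least 60 seconds later than Last-Modified.
    fn last_modified_is_strong(stored_date: SystemTime, last_modified: SystemTime) -> bool {
        stored_date
            .duration_since(last_modified)
            .map_or(false, |delta| delta.as_secs() >= 60)
    }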

// done_chan will have been set to Some(..) by http_network_fetch.
// If the body is not receiving data, set the done_chan back to None.
@@ -756,10 +767,12 @@ impl HttpCache {
constructed_response.referrer_policy = request.referrer_policy.clone();
constructed_response.raw_status = cached_resource.data.raw_status.clone();
constructed_response.url_list = cached_resource.data.url_list.clone();
{
let mut stored_headers = cached_resource.data.metadata.headers.lock().unwrap();
stored_headers.extend(response.headers);
constructed_response.headers = stored_headers.clone();
}
@gterzian (Member):

This change makes a lot of sense; it seemed like it was a mistake to calculate the expiry without first replacing the headers.
cached_resource.data.expires = get_response_expiry(&constructed_response);
let mut stored_headers = cached_resource.data.metadata.headers.lock().unwrap();
stored_headers.extend(response.headers);
constructed_response.headers = stored_headers.clone();
return Some(constructed_response);
}
}
@@ -832,6 +845,7 @@ impl HttpCache {
return;
}
let expiry = get_response_expiry(&response);
debug!("new cached response has expiry of {:?}", expiry);
let cacheable_metadata = CachedMetadata {
headers: Arc::new(Mutex::new(response.headers.clone())),
data: Measurable(MeasurableCachedMetadata {
@@ -857,7 +871,32 @@
last_validated: time::now(),
}),
};
debug!("storing new cached response for {:?}", request.url());
let entry = self.entries.entry(entry_key).or_insert(vec![]);

if response
.status
.as_ref()
.map_or(false, |s| s.0 == StatusCode::OK)
{
// Ensure that any existing complete response is overwritten by the new
// complete response.
let existing_complete_response = entry.iter().position(|response| {
response
.data
.status
.as_ref()
.map_or(false, |s| s.0 == StatusCode::OK)
});
if let Some(idx) = existing_complete_response {
debug!(
"Removing existing cached 200 OK response for {:?}",
request.url()
);
entry.remove(idx);
}
}

@gterzian (Member) commented on Jun 25, 2020:

I think we should not overwrite entries, since it will make it harder to have a spec-compliant cache (it needs to support multiple entries, matching on Etag, and so on).

@gterzian (Member) commented on Oct 8, 2019:

I'm less sure about this, since in theory different complete responses could be cached and used to satisfy different requests, for example in the case of responses with different Vary headers. We can try running this test with those changes:

    name: "HTTP cache doesn't invalidate existing Vary response",
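
If the Vary concern above were addressed, the lookup might skip variant responses, along these lines (a sketch only; the header access mirrors the surrounding code but is an assumption, not part of this PR):

    // Sketch: only treat a stored 200 OK as replaceable when it has no Vary
    // header, so variant responses for the same URL can keep coexisting.
    let existing_complete_response = entry.iter().position(|cached| {
        let headers = cached.data.metadata.headers.lock().unwrap();
        !headers.contains_key(header::VARY) &&
            cached
                .data
                .status
                .as_ref()
                .map_or(false, |s| s.0 == StatusCode::OK)
    });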

entry.push(entry_resource);
// TODO: Complete incomplete responses, including 206 response, when stored here.
// See A cache MAY complete a stored incomplete response by making a subsequent range request
13 changes: 13 additions & 0 deletions components/net/http_loader.rs
@@ -1040,6 +1040,7 @@ fn http_network_or_cache_fetch(
),
};
if needs_revalidation {
debug!("Need to revalidate {:?}", http_request.url());
revalidating_flag = true;
// Substep 5
if let Some(http_date) = response_headers.typed_get::<LastModified>() {
@@ -1054,6 +1055,7 @@
.insert(header::IF_NONE_MATCH, entity_tag.clone());
}
} else {
debug!("Using cached response for {:?}", http_request.url());
// Substep 6
response = cached_response;
}
@@ -1108,6 +1110,10 @@
// Substep 3
if let Some((200..=399, _)) = forward_response.raw_status {
if !http_request.method.is_safe() {
debug!(
"Unsafe method for response; invalidating cached response for {:?}",
http_request.url(),
);
if let Ok(mut http_cache) = context.state.http_cache.write() {
http_cache.invalidate(&http_request, &forward_response);
}
@@ -1120,6 +1126,7 @@
.as_ref()
.map_or(false, |s| s.0 == StatusCode::NOT_MODIFIED)
{
debug!("Refreshing cached response for {:?}", http_request.url());
if let Ok(mut http_cache) = context.state.http_cache.write() {
response = http_cache.refresh(&http_request, forward_response.clone(), done_chan);
wait_for_cached_response(done_chan, &mut response);
@@ -1129,10 +1136,16 @@
// Substep 5
if response.is_none() {
if http_request.cache_mode != CacheMode::NoStore {
debug!("Storing new cached response for {:?}", http_request.url());
// Subsubstep 2, doing it first to avoid a clone of forward_response.
if let Ok(mut http_cache) = context.state.http_cache.write() {
http_cache.store(&http_request, &forward_response);
}
} else {
debug!(
"Can't store new response for {:?} in cached",
http_request.url()
);
}
// Subsubstep 1
response = Some(forward_response);