Fix HTTP cache revalidation bugs #24388
base: main
Changes from all commits
ca32d2c
e093eee
da66443
2313f7f
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -223,13 +223,15 @@ fn get_response_expiry(response: &Response) -> Duration { | |||
// If the response has a Last-Modified header field, | ||||
// caches are encouraged to use a heuristic expiration value | ||||
// that is no more than some fraction of the interval since that time. | ||||
response.headers.typed_get::<LastModified>() { | ||||
response.headers.typed_get::<LastModified>() | ||||
{ | ||||
let current = time::now().to_timespec(); | ||||
let last_modified: SystemTime = last_modified.into(); | ||||
let last_modified = last_modified.duration_since(SystemTime::UNIX_EPOCH).unwrap(); | ||||
let last_modified = Timespec::new(last_modified.as_secs() as i64, 0); | ||||
// A typical setting of this fraction might be 10%. | ||||
let raw_heuristic_calc = (current - last_modified) / 10; | ||||
trace!("calculated {:?} vs. {:?} ({:?})", current, last_modified, raw_heuristic_calc); | ||||
let result = if raw_heuristic_calc < max_heuristic { | ||||
raw_heuristic_calc | ||||
} else { | ||||
|
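The heuristic in this hunk can be illustrated in isolation. The following is a minimal sketch using `std::time` rather than the `time` crate types in the diff; the function name `heuristic_freshness` and the 24-hour value of `MAX_HEURISTIC` are assumptions, since the actual value of `max_heuristic` and the body of the `else` branch are not visible here.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical cap on heuristic freshness; the real value of `max_heuristic`
/// is not shown in this hunk.
const MAX_HEURISTIC: Duration = Duration::from_secs(24 * 60 * 60);

/// Heuristic freshness lifetime: 10% of the time elapsed since
/// `last_modified`, capped at `MAX_HEURISTIC` (the cap is an assumption).
fn heuristic_freshness(now: SystemTime, last_modified: SystemTime) -> Duration {
    // A Last-Modified value in the future yields a zero lifetime
    // instead of a panic.
    let age = now.duration_since(last_modified).unwrap_or(Duration::ZERO);
    (age / 10).min(MAX_HEURISTIC)
}
```

Under these assumptions, a resource last modified 1000 seconds ago gets a heuristic lifetime of 100 seconds, while one last modified 100 days ago is capped at the maximum.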
@@ -331,8 +333,12 @@ fn create_cached_response( | |||
// TODO: take must-revalidate into account <https://tools.ietf.org/html/rfc7234#section-5.2.2.1> | ||||
// TODO: if this cache is to be considered shared, take proxy-revalidate into account | ||||
// <https://tools.ietf.org/html/rfc7234#section-5.2.2.7> | ||||
let has_expired = | ||||
(adjusted_expires < time_since_validated) || (adjusted_expires == time_since_validated); | ||||
let has_expired = adjusted_expires <= time_since_validated; | ||||
trace!( | ||||
"time_since_validated: {:?}, adjusted_expires: {:?}", | ||||
time_since_validated, | ||||
adjusted_expires | ||||
); | ||||
CachedResponse { | ||||
response: response, | ||||
needs_validation: has_expired, | ||||
|
@@ -721,6 +727,11 @@ impl HttpCache { | |||
assert_eq!(response.status.map(|s| s.0), Some(StatusCode::NOT_MODIFIED)); | ||||
let entry_key = CacheKey::new(&request); | ||||
if let Some(cached_resources) = self.entries.get_mut(&entry_key) { | ||||
trace!( | ||||
So I would say the easy way to fix the problem is to choose the cached resource that is the most recent, then select that one for update and construct a response from it. This also matches the "weak validator" part of https://httpwg.org/http-core/draft-ietf-httpbis-cache-latest.html#rfc.section.4.3.4 We can do the strong validator checking and so on in a different PR; since the problem here was the cache "going back in time", choosing the most recent entry should fix it. |
||||
"there are {} cached responses for {:?}", | ||||
cached_resources.len(), | ||||
request.url() | ||||
); | ||||
for cached_resource in cached_resources.iter_mut() { | ||||
A simple, and perhaps effective, fix for the "refresh of older resource", for now, might be to sort those by their original expiry. Then we can leave a TODO regarding the full "selection" logic as described at https://tools.ietf.org/html/rfc7234#section-4.3.4
On second thought, I think it might be best to start with the strictest case, and default to
The weak validator and no validator cases mentioned at https://tools.ietf.org/html/rfc7234#section-4.3.4 are softer and require us to handle all other "strong validator" cases correctly, so we probably shouldn't even try to do those for now, since the risk of erroneous caching is not worth it.
As described at https://tools.ietf.org/html/rfc7232#section-2.2.2 , if the stored resource has a
And per the "strong validator" case at https://tools.ietf.org/html/rfc7234#section-4.3.4:
So if we don't handle this case first, we are not following this MUST NOT. Then we would need to cover all other "strong validator" cases, and only then can we start to try to use weak or no validators. |
||||
// done_chan will have been set to Some(..) by http_network_fetch. | ||||
// If the body is not receiving data, set the done_chan back to None. | ||||
|
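The two selection strategies discussed in the review comments on this hunk can be sketched side by side. This is an illustration only: `StoredEntry`, `select_by_strong_validator`, and `select_most_recent` are hypothetical names, not Servo's types or API, and entity tags are compared as plain strings. The strict variant follows the RFC 7234 section 4.3.4 rule that a 304 carrying a strong validator may only update stored responses with the same strong validator.

```rust
/// Simplified stand-in for a stored cache entry (not Servo's `CachedResource`).
#[derive(Debug, Clone, PartialEq)]
struct StoredEntry {
    /// Entity tag as stored, e.g. "\"abc\"" or "W/\"abc\"".
    etag: Option<String>,
    /// Seconds since the epoch at which the entry was last validated.
    last_validated: u64,
}

/// An entity tag is strong unless it carries the `W/` prefix.
fn is_strong(etag: &str) -> bool {
    !etag.starts_with("W/")
}

/// Strict selection: only entries whose strong validator equals the one on
/// the 304 response are selected. An empty result means the 304 must not be
/// used to update any stored response.
fn select_by_strong_validator<'a>(
    entries: &'a mut [StoredEntry],
    response_etag: Option<&str>,
) -> Vec<&'a mut StoredEntry> {
    let wanted = match response_etag {
        Some(tag) if is_strong(tag) => tag,
        _ => return Vec::new(),
    };
    entries
        .iter_mut()
        .filter(|e| e.etag.as_deref() == Some(wanted))
        .collect()
}

/// Softer alternative from the first comment: pick the most recent entry.
fn select_most_recent(entries: &mut [StoredEntry]) -> Option<&mut StoredEntry> {
    entries.iter_mut().max_by_key(|e| e.last_validated)
}
```

In this sketch the strict variant deliberately returns nothing for weak or absent validators, which corresponds to the "start with the strictest case" suggestion; the caller would then need some way to fall back to a normal network fetch.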
@@ -756,10 +767,12 @@ impl HttpCache { | |||
constructed_response.referrer_policy = request.referrer_policy.clone(); | ||||
constructed_response.raw_status = cached_resource.data.raw_status.clone(); | ||||
constructed_response.url_list = cached_resource.data.url_list.clone(); | ||||
{ | ||||
let mut stored_headers = cached_resource.data.metadata.headers.lock().unwrap(); | ||||
stored_headers.extend(response.headers); | ||||
constructed_response.headers = stored_headers.clone(); | ||||
} | ||||
This change makes a lot of sense; it seemed like a mistake to calculate the expiry without first replacing the headers. |
||||
cached_resource.data.expires = get_response_expiry(&constructed_response); | ||||
let mut stored_headers = cached_resource.data.metadata.headers.lock().unwrap(); | ||||
stored_headers.extend(response.headers); | ||||
constructed_response.headers = stored_headers.clone(); | ||||
return Some(constructed_response); | ||||
} | ||||
} | ||||
|
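Why the ordering in this hunk matters can be shown with a small sketch. It assumes a plain `HashMap` in place of Servo's typed header map, and `expiry_secs` is a hypothetical stand-in for `get_response_expiry` that only reads `max-age`. The point is that the expiry is derived from the merged headers, so a 304 that carries a new `Cache-Control` actually takes effect.

```rust
use std::collections::HashMap;

/// Simplified header map; Servo uses a typed header map instead.
type Headers = HashMap<String, String>;

/// Hypothetical expiry calculation: reads `max-age` from `Cache-Control`
/// and returns it in seconds, or 0 when absent or unparsable.
fn expiry_secs(headers: &Headers) -> u64 {
    headers
        .get("cache-control")
        .and_then(|value| {
            value
                .split(',')
                .filter_map(|directive| directive.trim().strip_prefix("max-age="))
                .next()
        })
        .and_then(|secs| secs.parse::<u64>().ok())
        .unwrap_or(0)
}

/// Refresh a stored entry from a 304 response: merge the headers first,
/// then compute the expiry from the merged set.
fn refresh(stored: &mut Headers, not_modified: Headers) -> u64 {
    // `extend` overwrites stored values with those from the 304 response
    // and keeps stored headers the 304 did not mention.
    stored.extend(not_modified);
    expiry_secs(stored)
}
```

Computing the expiry before the merge would reuse the stale `max-age` of the stored response, which is the mistake the review comment refers to.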
@@ -832,6 +845,7 @@ impl HttpCache { | |||
return; | ||||
} | ||||
let expiry = get_response_expiry(&response); | ||||
debug!("new cached response has expiry of {:?}", expiry); | ||||
let cacheable_metadata = CachedMetadata { | ||||
headers: Arc::new(Mutex::new(response.headers.clone())), | ||||
data: Measurable(MeasurableCachedMetadata { | ||||
|
@@ -857,7 +871,32 @@ impl HttpCache { | |||
last_validated: time::now(), | ||||
}), | ||||
}; | ||||
debug!("storing new cached response for {:?}", request.url()); | ||||
let entry = self.entries.entry(entry_key).or_insert(vec![]); | ||||
|
||||
if response | ||||
.status | ||||
.as_ref() | ||||
.map_or(false, |s| s.0 == StatusCode::OK) | ||||
{ | ||||
// Ensure that any existing complete response is overwritten by the new | ||||
I think we should not overwrite entries, since it will make it harder to have a spec-compliant cache (it needs to support multiple entries, matching on ETag, and so on). |
||||
// complete response. | ||||
let existing_complete_response = entry.iter().position(|response| { | ||||
I'm less sure about this, since in theory different complete responses could be cached and used to satisfy different requests, for example in the case of responses with different
|
||||
response | ||||
.data | ||||
.status | ||||
.as_ref() | ||||
.map_or(false, |s| s.0 == StatusCode::OK) | ||||
}); | ||||
if let Some(idx) = existing_complete_response { | ||||
debug!( | ||||
"Removing existing cached 200 OK response for {:?}", | ||||
request.url() | ||||
); | ||||
entry.remove(idx); | ||||
} | ||||
} | ||||
|
||||
entry.push(entry_resource); | ||||
// TODO: Complete incomplete responses, including 206 response, when stored here. | ||||
// See A cache MAY complete a stored incomplete response by making a subsequent range request | ||||
|
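For reference, the replacement behaviour proposed in this hunk can be reduced to the following sketch. `Entry` and `store` are hypothetical simplifications (a bare status code instead of Servo's cached resource types); the sketch shows what the diff does, not a recommendation, given the reviewer's objection to overwriting entries.

```rust
/// Simplified stand-in for a stored entry: only the status code and a label.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    status: u16,
    label: &'static str,
}

/// Store `new_entry`. If it is a 200, first drop the first stored 200 so the
/// new complete response replaces it. Other entries (e.g. 206) are kept.
fn store(entries: &mut Vec<Entry>, new_entry: Entry) {
    if new_entry.status == 200 {
        if let Some(idx) = entries.iter().position(|e| e.status == 200) {
            entries.remove(idx);
        }
    }
    entries.push(new_entry);
}
```

Note that `position` finds only the first matching entry, so at most one stored 200 is removed per call.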
Also here, we could return `None` if `has_expired` is true and the stored resource doesn't contain a `Date` and `Last-Modified` header, since we need those to do a successful refresh. If we can't refresh, we should not construct a response that requires validation, and should instead just go to the network "normally". We could even purge such an expired resource that can't be successfully refreshed.
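The suggestion above could look roughly like this. It is a sketch with hypothetical names (`Stored`, `Lookup`, `lookup`), and it assumes that both `Date` and `Last-Modified` must be present for a refresh to be attempted, which is one reading of the comment.

```rust
/// Simplified view of a stored response (hypothetical type).
struct Stored {
    has_expired: bool,
    has_date: bool,
    has_last_modified: bool,
}

#[derive(Debug, PartialEq)]
enum Lookup {
    /// Serve from cache without contacting the network.
    Fresh,
    /// Construct a response that must be revalidated first.
    NeedsValidation,
    /// Do not construct a cached response; fetch from the network normally.
    Miss,
}

fn lookup(stored: &Stored) -> Lookup {
    if !stored.has_expired {
        Lookup::Fresh
    } else if stored.has_date && stored.has_last_modified {
        Lookup::NeedsValidation
    } else {
        // Expired and not refreshable: treat as a miss. The caller could
        // also purge the entry at this point.
        Lookup::Miss
    }
}
```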
Mmh, one issue here is that fetch doesn't have a hook to deal with the "could not select a stored response for update" case, as Step 7.4 seems to assume that a refresh is always a successful operation.
I've filed whatwg/fetch#950