New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core(uses-long-cache-ttl): ignore private, must-validate, no-cache #6835
Conversation
@@ -202,6 +202,12 @@ class CacheHeaders extends Audit { | |||
cacheControl | |||
); | |||
|
|||
// Ignore assets where policy implies they should not be cached long periods | |||
if (cacheControl && | |||
(cacheControl['must-validate'] || cacheControl['no-cache'] || cacheControl['private'])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we figure out how to consolidate some of this with the logic in computeCacheLifetimeInSeconds
in L115? right now these type of checks are in there to return a lifetime of 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe? these cache controls could still have ttls, so it'd make computeCacheLifetimeInSeconds
a bit of a liar. I get that if we have this method return 0 for all cases we want to ignore, things work. which is cool, I'm just arguing semantics I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't have to be moving this there, we can extract the L115 part too (and probably should it's not the most obvious right now). I'm just advocating for collecting the special cases we handle so it's clear what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. How's that?
cacheControl, | ||
// Include cacheControl in results, but cast as any so table types | ||
// are happy. cacheControl is not shown in the table so this is OK. | ||
cacheControl: /** @type {any} */ (cacheControl), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me! But I've never really been known as the typecheck police 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summoning @brendankenny
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cacheControl is not shown in the table so this is OK.
It's not ok, why use a table if you don't want it to be a table, but it's not your fault that you're inheriting this :)
Will you add a TODO(bckenny): fix DetailsItem
to the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @hoten!! this is much, much clearer what's going on now IMO 👍
*/ | ||
static shouldProcessRecord(headers, cacheControl, cacheLifetimeInSeconds, cacheHitProbability) { | ||
// The HTTP/1.0 Pragma header can disable caching if cache-control is not set, see https://tools.ietf.org/html/rfc7234#section-5.4 | ||
if ((headers.get('pragma') || '').includes('no-cache')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this have to be only if cache-control
is not set? or was that wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean to remove that. added back.
} else if ((headers.get('pragma') || '').includes('no-cache')) { | ||
// The HTTP/1.0 Pragma header can disable caching if cache-control is not set, see https://tools.ietf.org/html/rfc7234#section-5.4 | ||
return 0; | ||
if (cacheControl && cacheControl['max-age'] !== undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've lost Number.isFinite(maxAge)
, do we know where that path goes? did it never matter before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I suppose, if the cache control value for max-age
were 111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
it would get parsed as Infinity. I'd like to remove the clamping going on in this fn, so wb moving the finite check to the "shouldProcess" bit?
Oh, just saw that while +Infinity passes the cacheLifetimeInSeconds
check, it would fail at the cacheHitProbability
check. This is not true for -Infinity. So let's add a call to isFinite to the cacheLifetimeInSeconds
check.
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. | ||
*/ | ||
|
||
declare module 'parse-cache-control' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 👏 👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍
cacheControl, | ||
// Include cacheControl in results, but cast as any so table types | ||
// are happy. cacheControl is not shown in the table so this is OK. | ||
cacheControl: /** @type {any} */ (cacheControl), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me! But I've never really been known as the typecheck police 😉
return false; | ||
} | ||
|
||
if (cacheLifetimeInSeconds !== null && cacheLifetimeInSeconds <= 0) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get a few comments explaining these branchs' intention so we don't get confused later again?
maybe...
// Ignore assets whose cache lifetime is explicitly 0 or negative
...
// Ignore assets whose cache lifetime is already high enough
?
|
||
if (cacheLifetimeInSeconds !== null && cacheLifetimeInSeconds <= 0) return false; | ||
|
||
if (cacheHitProbability > IGNORE_THRESHOLD_IN_PERCENT) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should just move the cacheHitProbability
calculation to after this shouldProcessRecord
and switch the ignore threshold to the equivalent time.
That might make it easier to figure out what's going on....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch the ignore threshold to the equivalent time
I don't understand what you mean by this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cacheLifetimeInSeconds gets converted into a cacheHitProbability
instead of ignoring records with a cacheHitProbability
above 92.5% what if we ignored records with cacheLifetimeInSeconds
above 2 weeks (should be the equivalent of 92.5%)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with with what @patrickhulce is saying here (if I'm understanding what he's saying :).
I could be making an opposite suggestion to what @patrickhulce has suggested before, but to me there are two different sets of checks in here:
- things that we shouldn't be checking for caching suitability (for cache rule reasons) and
- things that shouldn't be displaying in the report (for insufficient data or they don't meet an interestingness threshold).
Combining them disrupts the narrative of the code (why calculate cache lifetime of something when we should be ignoring it? etc)
One way audit()
might read more easily is kind of like it was before but with this new method incorporated
shouldProcessRecord
would just be checking cache checking suitability. Then
for each record
gather headers in loop
const cacheControl = parseCacheControl(headers.get('cache-control'));
if (!this.shouldProcessRecord(headers, cacheControl) continue;
cacheLifetimeInSeconds = CacheHeaders.computeCacheLifetimeInSeconds(headers, cacheControl)
if (cacheLifetimeInSeconds === useless) continue;
const cacheHitProbability = CacheHeaders.getCacheHitProbability(cacheLifetimeInSeconds)
if (cacheHitProbability > useless) continue;
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be making an opposite suggestion to what @patrickhulce has suggested before, but to me there are two different sets of checks in here:
I didn't do a great job of clearly saying what I was lookin for before (sorry @hoten !!) so it certainly could have been interpreted as opposite to that :)
My primary concern is not sprinkling around the different cases where things get ignored for the same category of reason, before this we checked private
somewhere different from no-cache
which are both ignored because of the signal in the cache-control
directive.
I think it's relatively difficult to segment shouldProcessRecord
and computeCacheLifetime
as cleanly as you've done in that example @brendankenny because there are some interdependencies, but if everyone's happy with some duplicated logic and think it's easy to read, I'm onboard.
IMO the fact that we're dealing with some mixture of pragma
, expires
, and cache-control
all at the same time also complicates this, but I guess those two other cases are pretty simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understood. I ran a bit too much with the idea of only using one continue
;)
it('ignores nonpositive and nonfinite max-age headers', () => { | ||
const infinityMaxAgeStringValue = '1'.repeat(400); | ||
assert.equal(Number.parseInt(infinityMaxAgeStringValue), Infinity); | ||
const networkRecords = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, we wish to ignore all of these records right?
FYI max-age=blahblah
would not be ignored.
Shouldn't something silly like max-age=-anything
be surfaced? or is that out of scope for this audit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what did the spec say for cacheControl? for expires
it was anything invalid is treated as no cache lifetime, so if cache-control is the same we should probably surface them as having no caching?
Would really love a PR summary on things like this where parsing the conversation in the original issue takes way too long :P |
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. | ||
*/ | ||
|
||
declare module 'parse-cache-control' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍
* @param {Map<string, string>} headers | ||
* @param {{'no-cache'?: boolean,'no-store'?: boolean, 'max-age'?: number}} cacheControl Follows the potential settings of cache-control, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control | ||
* @param {ReturnType<import('parse-cache-control')>} cacheControl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ReturnType<typeof parseCacheControl>
reads a little cleaner to me (since it's clear it's from the same, already imported thing)
* Computes the user-specified cache lifetime, 0 if explicit no-cache policy is in effect, and null if not | ||
* user-specified. See https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html | ||
* | ||
* Return max-age if defined, and null if not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also falls back to expires
?
|
||
if (cacheLifetimeInSeconds !== null && cacheLifetimeInSeconds <= 0) return false; | ||
|
||
if (cacheHitProbability > IGNORE_THRESHOLD_IN_PERCENT) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with with what @patrickhulce is saying here (if I'm understanding what he's saying :).
I could be making an opposite suggestion to what @patrickhulce has suggested before, but to me there are two different sets of checks in here:
- things that we shouldn't be checking for caching suitability (for cache rule reasons) and
- things that shouldn't be displaying in the report (for insufficient data or they don't meet an interestingness threshold).
Combining them disrupts the narrative of the code (why calculate cache lifetime of something when we should be ignoring it? etc)
One way audit()
might read more easily is kind of like it was before but with this new method incorporated
shouldProcessRecord
would just be checking cache checking suitability. Then
for each record
gather headers in loop
const cacheControl = parseCacheControl(headers.get('cache-control'));
if (!this.shouldProcessRecord(headers, cacheControl) continue;
cacheLifetimeInSeconds = CacheHeaders.computeCacheLifetimeInSeconds(headers, cacheControl)
if (cacheLifetimeInSeconds === useless) continue;
const cacheHitProbability = CacheHeaders.getCacheHitProbability(cacheLifetimeInSeconds)
if (cacheHitProbability > useless) continue;
WDYT?
cacheControl, | ||
// Include cacheControl in results, but cast as any so table types | ||
// are happy. cacheControl is not shown in the table so this is OK. | ||
cacheControl: /** @type {any} */ (cacheControl), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cacheControl is not shown in the table so this is OK.
It's not ok, why use a table if you don't want it to be a table, but it's not your fault that you're inheriting this :)
Will you add a TODO(bckenny): fix DetailsItem
to the comment?
anything more? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of comment nits, but otherwise LGTM!
@patrickhulce for correctness check, though :P
@@ -169,6 +160,30 @@ class CacheHeaders extends Audit { | |||
); | |||
} | |||
|
|||
/** | |||
* @param {Map<string, string>} headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a jsdoc description to this? Don't want to be redundant with the comments below, just a general category for what would disqualify a record
* @param {ReturnType<typeof parseCacheControl>} cacheControl | ||
* @returns {boolean} | ||
*/ | ||
static shouldProcessRecord(headers, cacheControl) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we rename shouldSkipRecord
or something since it's only used negated?
|
||
// Ignore assets with an explicit no-cache policy | ||
if (cacheLifetimeInSeconds === 0) continue; | ||
// Ignore if cacheLifetimeInSeconds is defined and a nonpositive number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the is defined
throws me for a loop when reading this. Ignore if cacheLifetimeInSeconds is a nonpositive number
seems sufficient and is still a true description :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Fixes #6140.