Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core(uses-long-cache-ttl): ignore private, must-validate, no-cache #6835

Merged
merged 9 commits into from Jan 17, 2019

Conversation

connorjclark
Copy link
Collaborator

Fixes #6140.

@@ -202,6 +202,12 @@ class CacheHeaders extends Audit {
cacheControl
);

// Ignore assets where policy implies they should not be cached long periods
if (cacheControl &&
(cacheControl['must-validate'] || cacheControl['no-cache'] || cacheControl['private'])) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we figure out how to consolidate some of this with the logic in computeCacheLifetimeInSeconds in L115? right now these type of checks are in there to return a lifetime of 0

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe? these cache controls could still have ttls, so it'd make computeCacheLifetimeInSeconds a bit of a liar. I get that if we have this method return 0 for all cases we want to ignore, things work. which is cool, I'm just arguing semantics I suppose.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't have to be moving this there, we can extract the L115 part too (and probably should it's not the most obvious right now). I'm just advocating for collecting the special cases we handle so it's clear what's going on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. How's that?

cacheControl,
// Include cacheControl in results, but cast as any so table types
// are happy. cacheControl is not shown in the table so this is OK.
cacheControl: /** @type {any} */ (cacheControl),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me! But I've never really been known as the typecheck police 😉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summoning @brendankenny

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cacheControl is not shown in the table so this is OK.

It's not ok, why use a table if you don't want it to be a table, but it's not your fault that you're inheriting this :)

Will you add a TODO(bckenny): fix DetailsItem to the comment?

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @hoten!! this is much, much clearer what's going on now IMO 👍

*/
static shouldProcessRecord(headers, cacheControl, cacheLifetimeInSeconds, cacheHitProbability) {
// The HTTP/1.0 Pragma header can disable caching if cache-control is not set, see https://tools.ietf.org/html/rfc7234#section-5.4
if ((headers.get('pragma') || '').includes('no-cache')) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this have to be only if cache-control is not set? or was that wrong?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to remove that. added back.

} else if ((headers.get('pragma') || '').includes('no-cache')) {
// The HTTP/1.0 Pragma header can disable caching if cache-control is not set, see https://tools.ietf.org/html/rfc7234#section-5.4
return 0;
if (cacheControl && cacheControl['max-age'] !== undefined) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've lost Number.isFinite(maxAge), do we know where that path goes? did it never matter before?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I suppose, if the cache control value for max-age were 111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 it would get parsed as Infinity. I'd like to remove the clamping going on in this fn, so wb moving the finite check to the "shouldProcess" bit?

Oh, just saw that while +Infinity passes the cacheLifetimeInSeconds check, it would fail at the cacheHitProbability check. This is not true for -Infinity. So let's add a call to isFinite to the cacheLifetimeInSeconds check.

* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/

declare module 'parse-cache-control' {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 👏 👏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍

cacheControl,
// Include cacheControl in results, but cast as any so table types
// are happy. cacheControl is not shown in the table so this is OK.
cacheControl: /** @type {any} */ (cacheControl),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me! But I've never really been known as the typecheck police 😉

return false;
}

if (cacheLifetimeInSeconds !== null && cacheLifetimeInSeconds <= 0) return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get a few comments explaining these branchs' intention so we don't get confused later again?

maybe...

// Ignore assets whose cache lifetime is explicitly 0 or negative
...
// Ignore assets whose cache lifetime is already high enough

?


if (cacheLifetimeInSeconds !== null && cacheLifetimeInSeconds <= 0) return false;

if (cacheHitProbability > IGNORE_THRESHOLD_IN_PERCENT) return false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should just move the cacheHitProbability calculation to after this shouldProcessRecord and switch the ignore threshold to the equivalent time.

That might make it easier to figure out what's going on....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch the ignore threshold to the equivalent time

I don't understand what you mean by this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cacheLifetimeInSeconds gets converted into a cacheHitProbability

instead of ignoring records with a cacheHitProbability above 92.5% what if we ignored records with cacheLifetimeInSeconds above 2 weeks (should be the equivalent of 92.5%)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with with what @patrickhulce is saying here (if I'm understanding what he's saying :).

I could be making an opposite suggestion to what @patrickhulce has suggested before, but to me there are two different sets of checks in here:

  • things that we shouldn't be checking for caching suitability (for cache rule reasons) and
  • things that shouldn't be displaying in the report (for insufficient data or they don't meet an interestingness threshold).

Combining them disrupts the narrative of the code (why calculate cache lifetime of something when we should be ignoring it? etc)

One way audit() might read more easily is kind of like it was before but with this new method incorporated
shouldProcessRecord would just be checking cache checking suitability. Then

for each record
  gather headers in loop
  const cacheControl = parseCacheControl(headers.get('cache-control'));
  if (!this.shouldProcessRecord(headers, cacheControl) continue;
  cacheLifetimeInSeconds = CacheHeaders.computeCacheLifetimeInSeconds(headers, cacheControl)
  if (cacheLifetimeInSeconds === useless) continue;
  const cacheHitProbability = CacheHeaders.getCacheHitProbability(cacheLifetimeInSeconds)
  if (cacheHitProbability > useless) continue;

WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be making an opposite suggestion to what @patrickhulce has suggested before, but to me there are two different sets of checks in here:

I didn't do a great job of clearly saying what I was lookin for before (sorry @hoten !!) so it certainly could have been interpreted as opposite to that :)

My primary concern is not sprinkling around the different cases where things get ignored for the same category of reason, before this we checked private somewhere different from no-cache which are both ignored because of the signal in the cache-control directive.

I think it's relatively difficult to segment shouldProcessRecord and computeCacheLifetime as cleanly as you've done in that example @brendankenny because there are some interdependencies, but if everyone's happy with some duplicated logic and think it's easy to read, I'm onboard.

IMO the fact that we're dealing with some mixture of pragma, expires, and cache-control all at the same time also complicates this, but I guess those two other cases are pretty simple.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood. I ran a bit too much with the idea of only using one continue ;)

it('ignores nonpositive and nonfinite max-age headers', () => {
const infinityMaxAgeStringValue = '1'.repeat(400);
assert.equal(Number.parseInt(infinityMaxAgeStringValue), Infinity);
const networkRecords = [
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, we wish to ignore all of these records right?

FYI max-age=blahblah would not be ignored.

Shouldn't something silly like max-age=-anything be surfaced? or is that out of scope for this audit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what did the spec say for cacheControl? for expires it was anything invalid is treated as no cache lifetime, so if cache-control is the same we should probably surface them as having no caching?

@brendankenny
Copy link
Member

Would really love a PR summary on things like this where parsing the conversation in the original issue takes way too long :P

* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/

declare module 'parse-cache-control' {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍

* @param {Map<string, string>} headers
* @param {{'no-cache'?: boolean,'no-store'?: boolean, 'max-age'?: number}} cacheControl Follows the potential settings of cache-control, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
* @param {ReturnType<import('parse-cache-control')>} cacheControl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ReturnType<typeof parseCacheControl> reads a little cleaner to me (since it's clear it's from the same, already imported thing)

* Computes the user-specified cache lifetime, 0 if explicit no-cache policy is in effect, and null if not
* user-specified. See https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
*
* Return max-age if defined, and null if not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also falls back to expires?


if (cacheLifetimeInSeconds !== null && cacheLifetimeInSeconds <= 0) return false;

if (cacheHitProbability > IGNORE_THRESHOLD_IN_PERCENT) return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with with what @patrickhulce is saying here (if I'm understanding what he's saying :).

I could be making an opposite suggestion to what @patrickhulce has suggested before, but to me there are two different sets of checks in here:

  • things that we shouldn't be checking for caching suitability (for cache rule reasons) and
  • things that shouldn't be displaying in the report (for insufficient data or they don't meet an interestingness threshold).

Combining them disrupts the narrative of the code (why calculate cache lifetime of something when we should be ignoring it? etc)

One way audit() might read more easily is kind of like it was before but with this new method incorporated
shouldProcessRecord would just be checking cache checking suitability. Then

for each record
  gather headers in loop
  const cacheControl = parseCacheControl(headers.get('cache-control'));
  if (!this.shouldProcessRecord(headers, cacheControl) continue;
  cacheLifetimeInSeconds = CacheHeaders.computeCacheLifetimeInSeconds(headers, cacheControl)
  if (cacheLifetimeInSeconds === useless) continue;
  const cacheHitProbability = CacheHeaders.getCacheHitProbability(cacheLifetimeInSeconds)
  if (cacheHitProbability > useless) continue;

WDYT?

cacheControl,
// Include cacheControl in results, but cast as any so table types
// are happy. cacheControl is not shown in the table so this is OK.
cacheControl: /** @type {any} */ (cacheControl),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cacheControl is not shown in the table so this is OK.

It's not ok, why use a table if you don't want it to be a table, but it's not your fault that you're inheriting this :)

Will you add a TODO(bckenny): fix DetailsItem to the comment?

@connorjclark
Copy link
Collaborator Author

anything more?

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of comment nits, but otherwise LGTM!

@patrickhulce for correctness check, though :P

@@ -169,6 +160,30 @@ class CacheHeaders extends Audit {
);
}

/**
* @param {Map<string, string>} headers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a jsdoc description to this? Don't want to be redundant with the comments below, just a general category for what would disqualify a record

* @param {ReturnType<typeof parseCacheControl>} cacheControl
* @returns {boolean}
*/
static shouldProcessRecord(headers, cacheControl) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we rename shouldSkipRecord or something since it's only used negated?


// Ignore assets with an explicit no-cache policy
if (cacheLifetimeInSeconds === 0) continue;
// Ignore if cacheLifetimeInSeconds is defined and a nonpositive number.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the is defined throws me for a loop when reading this. Ignore if cacheLifetimeInSeconds is a nonpositive number seems sufficient and is still a true description :)

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@paulirish paulirish merged commit 37a68a0 into master Jan 17, 2019
@paulirish paulirish deleted the issue-6140-private-cache branch January 17, 2019 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants