Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quoted cache-control directives #128

Closed
mnot opened this issue Aug 2, 2018 · 52 comments · Fixed by #288
Closed

Quoted cache-control directives #128

mnot opened this issue Aug 2, 2018 · 52 comments · Fixed by #288

Comments

@mnot
Copy link
Member

@mnot mnot commented Aug 2, 2018

In my testing, many caches do not recognise Cache-Control: max-age="3600" as a valid CC directive.

Since this isn't interoperable, we should consider changing the spec so that the interoperable case is documented.

@mnot mnot added the caching label Aug 2, 2018
@mnot
Copy link
Member Author

@mnot mnot commented Aug 2, 2018

In bis, we decided to make this SHOULD NOT generate, MUST accept. I think we should consider moving to MUST NOT generate, MAY accept.

@reschke
Copy link
Contributor

@reschke reschke commented Aug 2, 2018

I believe we should continue to recommend writing sane parsers. For that, a "MAY" is not sufficient.

@mnot
Copy link
Member Author

@mnot mnot commented Aug 2, 2018

So, I think we need to have a discussion in the WG about whether architectural purity / reuse of parsers trumps interoperability, given that it's been ~five years since that recommendation and there still hasn't been any movement.

@reschke
Copy link
Contributor

@reschke reschke commented Aug 2, 2018

My theory is that the parsers are broken in many more aspects - I don't believe that writing to spec so that it is consistent with components that just do substring searches is the right thing to do (yes, this needs investigation).

For instance: https://bugzilla.mozilla.org/show_bug.cgi?id=716346

@mnot
Copy link
Member Author

@mnot mnot commented Aug 2, 2018

AFAICT they're not doing substring searches, they're correctly implementing other aspects of the spec; it's this specific thing.

@mnot mnot added the discuss label Oct 9, 2018
@royfielding
Copy link
Member

@royfielding royfielding commented Nov 5, 2018

-1

All I see is one example of a known bug in a broken parser. We do not make all deployed senders of valid HTTP syntax suddenly non-compliant just because one vendor is too lazy to fix an obviously broken parser.

@mcmanus
Copy link

@mcmanus mcmanus commented Nov 6, 2018

in bkk: mnot to gather more data with bias towards closing with no action if the data isnt helpful

@reschke
Copy link
Contributor

@reschke reschke commented Nov 6, 2018

Do you have evidence of parsers which fail for

Cache-Control: max-age="3600"

and work for

Cache-Control: max-age=3600

but also do the right thing for

Cache-Control: extension="max-age=100", max-age="3600"

?

EDIT - actually:

Cache-Control: extension="max-age=100", max-age=3600

@mnot mnot removed the discuss label Nov 12, 2018
@mnot mnot self-assigned this Nov 12, 2018
@mnot
Copy link
Member Author

@mnot mnot commented Dec 18, 2018

Yes; on this page, see

HTTP cache should reuse a response with quoted Cache-Control: max-age

and

HTTP cache must ignore the phrase "max-age" in a quoted string, even when max-age has a quoted value too

respectively. Currently, it looks like Chrome, Firefox, Edge, nginx, Squid, and Varnish all have that set of behaviours.

@reschke
Copy link
Contributor

@reschke reschke commented Dec 18, 2018

Ok. great work.

At this point I'd like to understand why a UA would do that. If the parser is capable enough to understand the syntax, why it then does not handle the values as specified? That is, is this intentional or just an oversight? Yes, that likely requires studying the code.

@mnot
Copy link
Member Author

@mnot mnot commented Dec 19, 2018

/cc @bslassey @ddragana @ericlaw1979 @yadij @bsdphk

Any insights as to why parsing Cache-Control behaves the way it does in your implementations, as per above?

Thanks,

@yadij
Copy link

@yadij yadij commented Dec 19, 2018

AFAIK for Squid it is just that logic still uses the RFC2616 definitions for parsing the C-C header.

RFC2616 documented exact lower case field-names for all the mandatory controls. It also uses the delta-seconds ABNF without mention of quoting for the fields with numeric parameters and explicitly mentions quotes in others. So Squid interprets the mixed-case field name and quoting as extensions and thus ignore.

With neither complaints nor normative requirements in RFC723x to support these odd syntax we had no reason to look at these until now. We can change easily enough.

@mnot
Copy link
Member Author

@mnot mnot commented Dec 20, 2018

Sorry, Varnish was a false positive there (it was caching things by default); I've updated the tests.

@annevk
Copy link
Contributor

@annevk annevk commented Jan 2, 2019

@MattMenke2 can probably help here. (I suspect implementations in browsers are mostly old parsers that haven't really been touched or checked if they actually conform to anything. And figuring out whether they can be changed at this point is a somewhat costly endeavor.)

@mnot mnot added the discuss label Feb 26, 2019
@mcmanus
Copy link

@mcmanus mcmanus commented Mar 25, 2019

ietf104: this isn't interoperable - should core note this?

@mcmanus
Copy link

@mcmanus mcmanus commented Mar 25, 2019

ietf104: need to continue digging to into test scenarios

@MattMenke2
Copy link

@MattMenke2 MattMenke2 commented Mar 25, 2019

Chrome has one global shared parser for handling quotes in header lines regardless of context, and breaking up lines with commas into multiple lies (With a blacklist of headers this doesn't work for).

That code breaks up Cache-Control: extension="max-age=100", max-age=3600

into:
Cache-Control: extension="max-age=100"
Cache-Control: max-age=3600

We then handle them as separate headers, and our (very simple) max-age parser has no trouble handling the second header.

The quote handling logic doesn't consider the context of the quotes (after an equal sign, after a semicolon, first character of a string, after a random alphanumeric character, etc). It treats them all as a start of a quoted string.

@royfielding
Copy link
Member

@royfielding royfielding commented Mar 26, 2019

I found the bug that causes Apache httpd's cache to fail on quoted max-age values. It correctly parses the field to extract either max-age=100 or max-age="100", but when it converts the value to a large integer it assumes the value is unquoted.

This is in modules/cache/cache_util.c: ap_cache_control()

                if (!ap_cstr_casecmpn(token, "max-age", 7)) {
                    if (token[7] == '='
                            && !apr_strtoff(&offt, token + 8, &endp, 10)
                            && endp > token + 8 && !*endp) {
                        cc->max_age = 1;
                        cc->max_age_value = offt;
                    }
                }

@reschke
Copy link
Contributor

@reschke reschke commented Mar 26, 2019

Awesome. Let's fix it :-)

@royfielding
Copy link
Member

@royfielding royfielding commented Mar 26, 2019

@mnot mnot removed the discuss label Apr 12, 2019
@mnot
Copy link
Member Author

@mnot mnot commented Jul 5, 2019

@MattMenke2 it'd be much safer to do that with an allow list, rather than a deny list.

Overall -- the reality here seems to be that for whatever reason, the majority of cache implementations (by a large margin) don't honour Cache-Control: max-age if the value is quoted. I strongly suspect that they treat things like stale-while-revalidate (where implemented) in the same way, despite this text:

Cache directives are identified by a token, to be compared case-insensitively, and have an optional argument, that can use both token and quoted-string syntax. For the directives defined below that define arguments, recipients ought to accept both forms, even if one is documented to be preferred. For any directive not defined by this specification, a recipient MUST accept both forms.

I understand that having these two forms being considered equivalent is architecturally more consistent and aesthetically more pleasing, but that is not the world we are living in, and I see no evidence that the world will change to match these expectations on any reasonable timeline.

If cache implementers were standing up and saying "yes, we should fix that bug" I would feel differently, but they're not; as far as I can see, they're both resource-poor and very reluctant to make arbitrary changes to products for little perceptible benefit.

Therefore, my proposal is:

  1. Change the text above to:

Cache directives are identified by a token, to be compared case-insensitively, and have an optional argument, that can use either the token or quoted-string syntax. Directives will define which form is expected; senders MUST send in that form, and recipients MUST accept that form. Recipients MAY accept both forms (converting from one to the other where required and possible).

  1. Update the existing directives defined in this specification to follow the approach outlined above.

@mnot mnot added the discuss label Jul 5, 2019
@reschke
Copy link
Contributor

@reschke reschke commented Jul 5, 2019

Chrome has one global shared parser for handling quotes in header lines regardless of context, and breaking up lines with commas into multiple lies (With a blacklist of headers this doesn't work for).

That code breaks up Cache-Control: extension="max-age=100", max-age=3600

into:
Cache-Control: extension="max-age=100"
Cache-Control: max-age=3600

We then handle them as separate headers, and our (very simple) max-age parser has no trouble handling the second header.

The quote handling logic doesn't consider the context of the quotes (after an equal sign, after a semicolon, first character of a string, after a random alphanumeric character, etc). It treats them all as a start of a quoted string.

But the question remains why Chrome doesn't process

Cache-Control: extension="max-age=100", max-age="3600"

the same way as:

Cache-Control: extension="max-age=100", max-age=3600

@MattMenke2
Copy link

@MattMenke2 MattMenke2 commented Jul 6, 2019

I do apologize, but it's been 4 months, and I've since lost all context. While still working on network-adjacent stuff in Chrome I'm no longer on Chrome's network stack team (And even if I were, I'd be reluctant to dig into this code and relevant specs once every four months to pick up a stalled discussion). Unfortunately, that also means I'm not a good person to commit to sweeping changes to Chrome's header parsing logic.

@reschke
Copy link
Contributor

@reschke reschke commented Sep 30, 2019

Strongly disagreed.

A correct parser already MUST handle (ignore) extension directives, and to do that, it MUST handle both token and quoted-string values. Saying that this does not apply to certain existing directives only complicates the parser, and makes it more likely to fail on extensions.

@mnot
Copy link
Member Author

@mnot mnot commented Sep 30, 2019

What real-world cache-control directive uses quotes today?

@reschke
Copy link
Contributor

@reschke reschke commented Sep 30, 2019

Why is that relevant? Do you want to remove quoted-string as option in general?

@reschke
Copy link
Contributor

@reschke reschke commented Sep 30, 2019

I don't know that we're going to be able to get more data here. Can we resolve this with MUST NOT generate, SHOULD accept?

We currently say "MUST accept" for extension directives. Do you want to change that?

For existing directives, we say "ought to". Changing that to "SHOULD" is a normative change (that I would actually support). But then the unevitable question would be: under which circumstances is it ok to ignore the requirement?

@mnot
Copy link
Member Author

@mnot mnot commented Oct 1, 2019

We currently say "MUST accept" for extension directives. Do you want to change that?

For existing directives, we say "ought to". Changing that to "SHOULD" is a normative change (that I would actually support). But then the unevitable question would be: under which circumstances is it ok to ignore the requirement?

The discussion and the data gathered make me think the most realistic thing to do is to deprecate all quoted forms of cache-control directives. If I wanted to introduce a new cache-control directive that used quoting today -- especially if it had the possibility of a comma or other syntacticly interesting component as payload -- I'd use a different header field.

I do not believe we're going to be able to get the majority of cache implementations to change their behaviour here; if they step up and say they will, I'll be happy to be wrong. Absent that, we should make the specification match reality, so it's relevant.

@reschke
Copy link
Contributor

@reschke reschke commented Oct 1, 2019

Well, we need to be clear what's being discussed. We currently have: "MUST accept extensions that might use quoted strings as params". Do you have evidence that existing current implementations get this wrong?

@mnot
Copy link
Member Author

@mnot mnot commented Oct 1, 2019

@reschke
Copy link
Contributor

@reschke reschke commented Oct 1, 2019

Yes, I was looking at them. But the results seem to be "behaviour check results" or "test dependency failed", so I have no idea how to evaluate them...

@mnot
Copy link
Member Author

@mnot mnot commented Oct 1, 2019

The filled-in circle is "yes"; the empty circle is "no". The first test indictates that almost no cache honours max-age with quotes; only ATS and Safari do.

@reschke
Copy link
Contributor

@reschke reschke commented Oct 1, 2019

So "Does HTTP cache ignore the phrase max-age in a quoted string (before the "real" max-age)?" - which is about extension parameters with quoted strings is almost (ahem) universally passing - isn't this exactly the case we're discussing right now?

@MattMenke2
Copy link

@MattMenke2 MattMenke2 commented Oct 1, 2019

The reason why we treat

Cache-Control: extension="max-age=100", max-age="3600"

and

Cache-Control: extension="max-age=100", max-age=3600

differently is because the Cache-Control logic has its own parser.

We do have a shared parser for stuff in the format: foo=bar; foo2="bar2";..., but a lot of stuff doesn't use it. And I'm not sure we use it for anything that doesn't need to support semi-colon-separated entries.

Sorry for dropping this discussion - I was getting this discussion confused with a more general one on comma handling, which is something I don't have time to work on and can't feel like I can commit to on behalf of Chrome. For things limited to parsing of a single header I certainly can work on, however.

Edit: Fixed paragraph breaks for readability.

@tfpauly
Copy link

@tfpauly tfpauly commented Nov 18, 2019

Discussed in Singapore: want to get more info from cache implementations by January to understand this better.

@davidben
Copy link

@davidben davidben commented Nov 18, 2019

@MattMenke2 already described this up in #128 (comment), but since folks in Singapore were asking what the clients were doing and there seems to be some confusion, I'll (re)clarify Chrome's behavior:

Chrome has generic logic splitting headers, including Cache-Control by commas, in a way which pays attention to quoting. So Cache-Control: extension="foo,max-age=10,bar", max-age=10 will be split correctly. This logic does not remove the quotes, it simply does a quote-aware split by comma and removes surrounding whitespace.

Various bits of Cache-Control logic parse the split values. The parser for timestamps like max-age and stale-while-revalidate simply looks for values with a case-insensitive ${DIRECTIVE}= prefix and tries to decode the remainder as an integer. Quotes have not been removed at this point, so quoted integers are rejected. This is consistent with the original definitions of those directives.
https://tools.ietf.org/html/rfc2616#section-14.9
https://tools.ietf.org/html/rfc5861#section-3

(This is based on source inspection just now. @MattMenke2 can correct me if I got this wrong.)

@MattMenke2
Copy link

@MattMenke2 MattMenke2 commented Nov 18, 2019

That's my reading as well, though I've never touched the cache code either.

Also, if we want double-quote handling it Cache-Control lines, it's not clear to me if we want semi-colon handling as well, which is part of every instance in which Chrome's general HTTP header quote handler is used, so is a non-optional part of Chrome's internal API for doing so.

Should "Cache-Control: max-age=10; min-dingos=2" be parsed as max-age=10, with an an supported secondary parameter? If so, we'd be deviating even more from spec. If not, we'd be introducing a novel use of quotes in HTTP headers - supporting them in them in header values where they're never needed, and where semi-colons are not supported.

@reschke
Copy link
Contributor

@reschke reschke commented Nov 18, 2019

@MattMenke2 - no, Cache-Control does not have semicolon-separated characters.

@reschke
Copy link
Contributor

@reschke reschke commented Nov 18, 2019

@davidben - thanks for the explanation (and apologies for not processing the earlier feedback properly).

Two comments/questions on the genetic split-by-comma code:

  1. I assume it will process an escaped double quote inside quoted-string properly?
  2. As @mnot said earlier: this really ought to be based on a white list; nobody stops people from defining fields where this code will do the wrong thing (and maintaining a black list would really not scale)

As a general comment: so yes, I do now understand that the design of the parser predates RFC 7234, and just does try to implement 2616. In particular, extracting the actual values uses specific logic per directive.

The intent of the change in RFC 7234 was to allow implementers to get rid of special cases: to be able to use the same code path for all parameters. Was it ever considered to implement this? If so, what led to the decision not to?

@MattMenke2
Copy link

@MattMenke2 MattMenke2 commented Nov 18, 2019

Not supporting semi-colons itself seems like a special case, no? So rather than a list of headers that do/do not support general parsing, we'd need a list of headers that don't adhere to the pattern, a list of ones that don't support semi-colons, and a list of those that do, no? The general behavior in the case of unsupported parameters is just to ignore them, rather than to ignore the entire header.

@reschke
Copy link
Contributor

@reschke reschke commented Nov 18, 2019

Well, at the end of the day there is common syntax in HTTP header fields. Sometimes it looks like that, but there a lot of subtle differences. Yes, that's bad (and that's why we are trying to improve things for new header fields).

And yes, Cache-Control (comma-separated list name/value pairs) is very different from things like Content-Type (comma-separated list of identifiers + optional parameters).

I can see that it would be nice to use a generic parser here, and just either drop parameters or fail if some are present. But even in that case, you'd probably accept more than what should be accepted (for instance: "Cache-Control: foo;,bar").

What should be possible is to build parsers from common components, or to have a single parser that is sufficiently parametrized.

@MattMenke2
Copy link

@MattMenke2 MattMenke2 commented Nov 18, 2019

Oh, and as for whether anyone on the Chrome team considered updating behavior to match RFC 7234 - neither David nor I work on the cache, but I don't think anyone from the network stack team, at least, was aware of this change from RFC 2616.

@davidben
Copy link

@davidben davidben commented Nov 18, 2019

I assume it will process an escaped double quote inside quoted-string properly?

Looking at the code, I believe so. I don't see a test for it at the HTTP layer function, but the lower-level thing it uses does have logic and tests for escapes.

@reschke
Copy link
Contributor

@reschke reschke commented Nov 18, 2019

...and in fact it doesn't show up in https://greenbytes.de/tech/webdav/rfc7234.html#changes.from.rfc.2616, which might be one reason why it has been ignored (yes, one reason :-).

This makes me think that advertising this change more would be better than reverting it.

@davidben
Copy link

@davidben davidben commented Dec 11, 2019

An observation: Assuming I'm reading the grammars correctly, Cache-Control aligns with a structured header sh-dict. If we were to make it an sh-dict, integer-valued keys like max-age would use a value of sh-integer. This would mean max-age=1234 is okay and max-age="1234" is not.

If so, that suggests reverting it, better aligning with both running code and structured headers.

@bsdphk
Copy link

@bsdphk bsdphk commented Jan 3, 2020

seconded.

@reschke
Copy link
Contributor

@reschke reschke commented Jan 3, 2020

I don't see how this is relevant for Cache-Control. We can't make normative protocol changes based on the design of a new header field syntax.

FWIW, I disagree with:

This would mean max-age=1234 is okay and max-age="1234" is not.

...whether that is okay or not would depend on the definition of the header field. It could accept both.

@mnot
Copy link
Member Author

@mnot mnot commented Feb 2, 2020

Discussed in Basel.

  1. For directives we define, upgrade SHOULD NOT generate quoted form to MUST NOT.
  2. Remove the sentence "For any directive not defined by this specification, a recipient must accept both forms."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

10 participants