-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
field-value production rules out CTL characters, but these are common in the wild #683
Comments
See #215. https://fetch.spec.whatwg.org/#concept-header-value is what browsers implement, which is slightly more lenient than what you have there. web-platform-tests has tests for that as well. |
Was this bug reported to Google Analytics? |
To be clear, at Mozilla we found the same thing on routers and such. It's not an isolated incident. |
As with the discussion we’re having on-thread about #31, is it worth us revisiting the language in this specification when we know that non-conformance is widespread? |
No, control characters as a set being disallowed in header fields is a relatively new requirement, whereas HTTP has explicitly forbidden CR-only line breaks since 1993 because that is a well-known security hole. Allowing some of the control characters to pass through is not going to decrease the security of valid implementations, whereas allowing CR will only increase the number of insecure browsers subjecting their own users to cache poisoning attacks. |
Hmmm - unless I'm missing something they weren't allowed in RFC 2616 either. What's new are the instructions for recipients how to deal with malformed values. |
Right, that makes sense, though more strict handling of field values in 7230 was part of the move away from lenient handling for robustness in 2616. |
As an HTTP implementer, I'd like to respectfully plead for you to reconsider here, and align the RFC text with the WHAT-WG text for better real-world interoperability. YOLOing together a HTTP/1.1 implementation that's interoperable but insecure is easy. Writing a secure but non-interoperable implementation is pointless, because no-one will use it. Writing a secure AND interoperable implementation is extremely difficult, because it requires that you find the narrowest possible rules that allow only the things that are required to interoperate, and reject everything else, and that's essentially impossible without browser-maker-level visibility into exactly what weird HTTP implementations are out there. The RFCs are an incredible resource for anyone who's trying to build a high-quality implementation, because they contain so much distilled wisdom on where this line is... except for bits like the field-value production, where if you follow the spec you'll get burned and end up having to YOLO something anyway. It's very frustrating to carefully follow the specs, and then be punished for it. And it makes it hard to justify following the spec versus using some YOLO implementation. |
We can't align the text. HTTP has always forbidden bare CR as a line break. Almost all deployed implementations rely on that while parsing CR into field values and (CR)LF as line breaks. If we changed that now, all deployed implementations with a cache would have a security hole, instead of just the two evergreen browsers that broke their own parsers and can easily change them back. What servers have been doing (and slowly deploying) is active removal of bare CR in fields when they do character-by-character parsing, just to fix the new browser brokenness. This is a workaround. |
With regard to other CTL characters, many are found in the wild, but it would be a stretch to say that they are interoperable when sent in HTTP. Some might be. I don't think we have time to validate such a change. |
The WHAT-WG spec text that @annevk linked above forbids NUL, CR, and LF inside field values, while allowing everything else. So I'm not suggesting allowing CR. Re: the other control characters: if you can't validate interoperability, then that's a tough position to be in for sure. But surely there's something that would be better than the current text? Currently, all general-usage http clients MUST actively violate the spec. Even if all you can say is "you should definitely disallow NUL/CR/LF and for everything else we're not sure, use your best judgement", then that seems like it would at least be more accurate and helpful than what we have now. |
It really would be interesting to see what the effect of converting-to-SP instead of preserving the CTL characters would be. That is, are those characters sent intentionally (I see that this probably is ghe case for G Analytics). Unfortunately, even if we ran an experiment right now, it wouldn't help in this LC. |
FWIW suggests that the HTTP/2 has a "MUST reject" rule. When we achieve consensus here, the H2bis probably should align. |
Yeah, specifically the CTL characters from G Analytics occur in |
I see. Another related question would be: is this something that's specific to cookies? |
Note also that the G analytics cookies refer to an old version of custom variables in a library that has been supplanted at least twice (ga.js --> analytics.js --> gtag.js). It's possible that they were removed because of lacking interop. https://developers.google.com/analytics/devguides/collection/analyticsjs/cookie-usage |
Discussed in Feb 21 Interim; align with Fetch for hard refusal of characters, strongly caution against CTL (minus NUL) for senders / field definitions. Don't require CTL (minus NUL) to be replaced with space. CR+NUL MUST be converted to space or rejected. |
draft-ietf-httpbis-semantics-latest currently says:
However, when I implemented an HTTP/1.1 parsing library that followed this rule, it turned out to broken in the real world. Specifically, users reported that sites using Google Analytics were setting cookies with values containing the ASCII character
\x01
, which is a control character (python-hyper/h11#57). And in my investigations at the time, browsers and popular clients like curl all supported this just fine.It would be great if the next round of RFCs could look into this in more detail and define the
field-content
production in a way that would be acceptable to e.g. the browsers.In my library we currently reject NUL and exotic whitespace (
\r
,\v
,\f
,\n
), on the grounds that those are high-risk for interop bugs and splitting exploits, but allow all other bytes to pass through, including control characters. I don't know if that's the best solution, but in our experience it's at least closer to being real-world compatible than what the RFC draft currently says.The text was updated successfully, but these errors were encountered: