field-value production rules out CTL characters, but these are common in the wild #683

njsmith · 2021-01-21T20:57:52Z

draft-ietf-httpbis-semantics-latest currently says:

Field values containing control (<x:ref>CTL</x:ref>) characters such as CR or LF are invalid; recipients &MUST; either reject a field value containing control characters, or convert them to SP before processing or forwarding the message.

However, when I implemented an HTTP/1.1 parsing library that followed this rule, it turned out to broken in the real world. Specifically, users reported that sites using Google Analytics were setting cookies with values containing the ASCII character \x01, which is a control character (python-hyper/h11#57). And in my investigations at the time, browsers and popular clients like curl all supported this just fine.

It would be great if the next round of RFCs could look into this in more detail and define the field-content production in a way that would be acceptable to e.g. the browsers.

In my library we currently reject NUL and exotic whitespace (\r, \v, \f, \n), on the grounds that those are high-risk for interop bugs and splitting exploits, but allow all other bytes to pass through, including control characters. I don't know if that's the best solution, but in our experience it's at least closer to being real-world compatible than what the RFC draft currently says.

The text was updated successfully, but these errors were encountered:

annevk · 2021-01-22T06:22:51Z

See #215. https://fetch.spec.whatwg.org/#concept-header-value is what browsers implement, which is slightly more lenient than what you have there. web-platform-tests has tests for that as well.

reschke · 2021-01-22T06:24:29Z

Was this bug reported to Google Analytics?

annevk · 2021-01-22T06:27:35Z

To be clear, at Mozilla we found the same thing on routers and such. It's not an isolated incident.

Lukasa · 2021-01-22T07:59:47Z

As with the discussion we’re having on-thread about #31, is it worth us revisiting the language in this specification when we know that non-conformance is widespread?

royfielding · 2021-01-22T20:28:44Z

As with the discussion we’re having on-thread about #31, is it worth us revisiting the language in this specification when we know that non-conformance is widespread?

No, control characters as a set being disallowed in header fields is a relatively new requirement, whereas HTTP has explicitly forbidden CR-only line breaks since 1993 because that is a well-known security hole. Allowing some of the control characters to pass through is not going to decrease the security of valid implementations, whereas allowing CR will only increase the number of insecure browsers subjecting their own users to cache poisoning attacks.

reschke · 2021-01-24T16:36:25Z

No, control characters as a set being disallowed in header fields is a relatively new requirement (...)

Hmmm - unless I'm missing something they weren't allowed in RFC 2616 either. What's new are the instructions for recipients how to deal with malformed values.

royfielding · 2021-01-27T19:30:33Z

Right, that makes sense, though more strict handling of field values in 7230 was part of the move away from lenient handling for robustness in 2616.

njsmith · 2021-02-03T01:55:26Z

As an HTTP implementer, I'd like to respectfully plead for you to reconsider here, and align the RFC text with the WHAT-WG text for better real-world interoperability.

YOLOing together a HTTP/1.1 implementation that's interoperable but insecure is easy. Writing a secure but non-interoperable implementation is pointless, because no-one will use it. Writing a secure AND interoperable implementation is extremely difficult, because it requires that you find the narrowest possible rules that allow only the things that are required to interoperate, and reject everything else, and that's essentially impossible without browser-maker-level visibility into exactly what weird HTTP implementations are out there. The RFCs are an incredible resource for anyone who's trying to build a high-quality implementation, because they contain so much distilled wisdom on where this line is... except for bits like the field-value production, where if you follow the spec you'll get burned and end up having to YOLO something anyway.

It's very frustrating to carefully follow the specs, and then be punished for it. And it makes it hard to justify following the spec versus using some YOLO implementation.

royfielding · 2021-02-04T03:12:06Z

We can't align the text. HTTP has always forbidden bare CR as a line break. Almost all deployed implementations rely on that while parsing CR into field values and (CR)LF as line breaks. If we changed that now, all deployed implementations with a cache would have a security hole, instead of just the two evergreen browsers that broke their own parsers and can easily change them back.

What servers have been doing (and slowly deploying) is active removal of bare CR in fields when they do character-by-character parsing, just to fix the new browser brokenness. This is a workaround.

royfielding · 2021-02-04T03:43:03Z

With regard to other CTL characters, many are found in the wild, but it would be a stretch to say that they are interoperable when sent in HTTP. Some might be. I don't think we have time to validate such a change.

njsmith · 2021-02-04T04:26:32Z

The WHAT-WG spec text that @annevk linked above forbids NUL, CR, and LF inside field values, while allowing everything else. So I'm not suggesting allowing CR.

Re: the other control characters: if you can't validate interoperability, then that's a tough position to be in for sure. But surely there's something that would be better than the current text? Currently, all general-usage http clients MUST actively violate the spec. Even if all you can say is "you should definitely disallow NUL/CR/LF and for everything else we're not sure, use your best judgement", then that seems like it would at least be more accurate and helpful than what we have now.

reschke · 2021-02-04T06:52:07Z

It really would be interesting to see what the effect of converting-to-SP instead of preserving the CTL characters would be. That is, are those characters sent intentionally (I see that this probably is ghe case for G Analytics). Unfortunately, even if we ran an experiment right now, it wouldn't help in this LC.

reschke · 2021-02-04T07:00:10Z

FWIW

python-hyper/h11#57 (comment)

suggests that the HTTP/2 has a "MUST reject" rule. When we achieve consensus here, the H2bis probably should align.

njsmith · 2021-02-04T07:45:27Z

It really would be interesting to see what the effect of converting-to-SP instead of preserving the CTL characters would be. That is, are those characters sent intentionally (I see that this probably is ghe case for G Analytics).

Yeah, specifically the CTL characters from G Analytics occur in Set-Cookie/Cookie headers, so it's an opaque value that the server expects to get back. I don't know for sure, but my impression is that someone chose \x01 as a "clever" separator character that they knew wouldn't occur inside the values being separated, so converting to SP would probably break it.

reschke · 2021-02-04T12:29:24Z

I see. Another related question would be: is this something that's specific to cookies?

royfielding · 2021-02-04T18:25:46Z

Note also that the G analytics cookies refer to an old version of custom variables in a library that has been supplanted at least twice (ga.js --> analytics.js --> gtag.js). It's possible that they were removed because of lacking interop.

https://developers.google.com/analytics/devguides/collection/analyticsjs/cookie-usage

mnot · 2021-02-11T22:43:13Z

Discussed in Feb 21 Interim; align with Fetch for hard refusal of characters, strongly caution against CTL (minus NUL) for senders / field definitions. Don't require CTL (minus NUL) to be replaced with space. CR+NUL MUST be converted to space or rejected.

Fixes #683.

mnot added the semantics label Feb 3, 2021

mnot added the discuss label Feb 5, 2021

reschke mentioned this issue Feb 5, 2021

characters allowed in field values httpwg/http2-spec#815

Closed

mnot removed the discuss label Feb 15, 2021

mnot self-assigned this Feb 24, 2021

mnot added a commit that referenced this issue Feb 24, 2021

Relax invalid characters in field values

f594c2f

Fixes #683.

mnot mentioned this issue Feb 24, 2021

Restore recipient requirements for invalid characters in field values #783

Merged

mnot added the has-proposal label Feb 24, 2021

royfielding closed this as completed in #783 Mar 4, 2021

gregw mentioned this issue Aug 24, 2021

Move field validation text around httpwg/http2-spec#936

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

field-value production rules out CTL characters, but these are common in the wild #683

field-value production rules out CTL characters, but these are common in the wild #683

njsmith commented Jan 21, 2021

annevk commented Jan 22, 2021

reschke commented Jan 22, 2021

annevk commented Jan 22, 2021

Lukasa commented Jan 22, 2021

royfielding commented Jan 22, 2021

reschke commented Jan 24, 2021 •

edited

Loading

royfielding commented Jan 27, 2021

njsmith commented Feb 3, 2021

royfielding commented Feb 4, 2021

royfielding commented Feb 4, 2021

njsmith commented Feb 4, 2021

reschke commented Feb 4, 2021 •

edited

Loading

reschke commented Feb 4, 2021

njsmith commented Feb 4, 2021

reschke commented Feb 4, 2021

royfielding commented Feb 4, 2021

mnot commented Feb 11, 2021

field-value production rules out CTL characters, but these are common in the wild #683

field-value production rules out CTL characters, but these are common in the wild #683

Comments

njsmith commented Jan 21, 2021

annevk commented Jan 22, 2021

reschke commented Jan 22, 2021

annevk commented Jan 22, 2021

Lukasa commented Jan 22, 2021

royfielding commented Jan 22, 2021

reschke commented Jan 24, 2021 • edited Loading

royfielding commented Jan 27, 2021

njsmith commented Feb 3, 2021

royfielding commented Feb 4, 2021

royfielding commented Feb 4, 2021

njsmith commented Feb 4, 2021

reschke commented Feb 4, 2021 • edited Loading

reschke commented Feb 4, 2021

njsmith commented Feb 4, 2021

reschke commented Feb 4, 2021

royfielding commented Feb 4, 2021

mnot commented Feb 11, 2021

reschke commented Jan 24, 2021 •

edited

Loading

reschke commented Feb 4, 2021 •

edited

Loading