Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Newline parsing in H1 #31

Closed
annevk opened this issue Jul 10, 2017 · 22 comments · Fixed by #373
Closed

Newline parsing in H1 #31

annevk opened this issue Jul 10, 2017 · 22 comments · Fixed by #373

Comments

@annevk
Copy link
Contributor

annevk commented Jul 10, 2017

Popular clients handle LF, CR, and CRLF as newlines. HTTP only defines CRLF.

Previously: https://lists.w3.org/Archives/Public/ietf-http-wg/2014JulSep/0123.html.

@annevk
Copy link
Contributor Author

annevk commented Apr 12, 2018

To clarify, this is about the CRLF in the HTTP-message production.

@mnot
Copy link
Member

mnot commented Apr 13, 2018

Covered here:
http://httpwg.org/specs/rfc7230.html#message.robustness

... but I agree it could be more visible (e.g., in 3).

@annevk
Copy link
Contributor Author

annevk commented Apr 13, 2018

And clients also accept lone CR. It seems worth testing either that they reject (network error) or adding that.

@mnot mnot added the editorial label Jun 29, 2018
@mnot
Copy link
Member

mnot commented Jun 29, 2018

@annevk see #68 - in situ, see https://httpwg.org/http-core/draft-ietf-httpbis-messaging-latest.html#message.parsing

I'm still not sure it's obvious enough to someone reading in isolation; thoughts?

@annevk
Copy link
Contributor Author

annevk commented Jul 4, 2018

@mnot that text does not treat CR without LF as a newline, unless I'm missing something. That text even seems to suggest that CRCRLF would be a single newline, whereas I'm pretty sure it ends up being treated as two newlines.

@wtarreau
Copy link

wtarreau commented Jul 4, 2018

@annevk I'm pretty sure CR alone is not valid, because I remember such a discussion a few years ago by the time haproxy used to accept this, after which I removed support for this. CRCRLF could at best be considered as a CR at the end of a line, though normally it's not a valid character there. It may possibly be replaced by LWS though.

@wtarreau
Copy link

wtarreau commented Jul 4, 2018

Just found it, its in RFC7230#3.5 : https://trac.tools.ietf.org/html/rfc7230#section-3.5

Although the line terminator for the start-line and header fields is
the sequence CRLF, a recipient MAY recognize a single LF as a line
terminator and ignore any preceding CR.

@annevk
Copy link
Contributor Author

annevk commented Jul 20, 2018

Right, I realize it's not valid, but I also know that clients nevertheless accept it, which is why I raised this.

@reschke
Copy link
Contributor

reschke commented Jul 20, 2018

All clients?

@mnot
Copy link
Member

mnot commented Jul 20, 2018

If they don't, they're not interoperable on the Web as we know it.

@royfielding
Copy link
Member

The Web as we know it does not contain any CR-only line-delimited protocol services other than those deliberately attempting response smuggling. The protocol specifically forbids bare CR as a line delimiter, since that is known to not be interoperable with the vast majority of protocol parsers that exclusively check for LF and ignore CR. This was a big deal in 1994, but even then it was obvious that CR-delimited messages wouldn't make it through proxies. Nobody has objected to it since OS X was released and the last of the native Mac servers disappeared.

If some browsers interpret CR as a line delimiter in HTTP protocol fields (other than payload content), they have security holes that should be fixed. No change to the RFC is possible here; we can't make changes that introduce new security holes to previously compliant parsers.

annevk added a commit to web-platform-tests/wpt that referenced this issue Oct 15, 2018
@annevk
Copy link
Contributor Author

annevk commented Oct 15, 2018

I wrote some tests for this: web-platform-tests/wpt#13524. Suggestions for more tests appreciated. Firefox seems to generally take the CR and include it in the value of whatever preceded it (reason phrase or header value, despite those not allowing this character). Chrome and Safari treat it as LF in those places, but they do not allow two CRs to end the header block (network error in Safari, empty body in Chrome).

It does seem that a stricter stance is possible here, given the intersection of the results.

cc @ddragana @MattMenke2

@MattMenke2
Copy link

If we can be stricter here and just fail on CRs in headers, that sounds good to me. That having been said, I think changing behavior to anything but just rejecting the response would be worse than the status quo (Treating CRs as part of header values, for instance).

@annevk
Copy link
Contributor Author

annevk commented Oct 16, 2018

Yeah, it seems like CR CR could be treated as a network error at least. CR "after" reason phrase or individual header is less clear. But I agree that making CR part of the value isn't good.

annevk added a commit to web-platform-tests/wpt that referenced this issue Oct 21, 2019
@mnot
Copy link
Member

mnot commented Apr 27, 2020

Regarding at the end of the line, we [currently say](https://httpwg.org/http-core/draft-ietf-httpbis-messaging-latest.html#message.parsing:

Although the line terminator for the start-line and header fields is the sequence CRLF, a recipient may recognize a single LF as a line terminator and ignore any preceding CR.

  • That seems to make it pretty clear it's not part of the value. If you think it could be more clear (or said somewhere else), please say so.

Inside the value, it could be obs-fold, talked about here, provided that it has the appropriate whitespace before and after. That section currently requires:

A user agent that receives an obs-fold in a response message that is not within a message/http container must replace each received obs-fold with one or more SP octets prior to interpreting the field value.

  • We should probably consider whether CRLF in the ABNF (and prose) needs to consider the appearance of just one of those characters.

Finally, CR could appear in a header value without surrounding whitespace as obs-fold requires. It isn't allowed by field-content here, and we say:

A recipient should treat other octets in field content (obs‑text) as opaque data.

  • Is it worth specifically addressing CR here -- e.g., a warning, a recommendation, or a requirement regarding handling it in content?

@mnot mnot self-assigned this Apr 27, 2020
@annevk
Copy link
Contributor Author

annevk commented Apr 27, 2020

OP's issue is specifically about how a H1 response/request is parsed into something more concrete and what we should consider an error there. #31 (comment) goes into this best I think (prior comments from me were not informed by tests). One suggestion there is that sometimes CR can maybe be turned into an error.

Now as for what's valid in a value, I think that's best discussed and described separately from H1 parsing requirements as never H versions do not have the same parser limitations.

@mnot
Copy link
Member

mnot commented Apr 27, 2020

So, if I read that correctly, we could consider explicitly stating that CR on its own is opaque data as far as h1 message parsing is concerned.

@wtarreau
Copy link

We must not consider it as opaque data or it risks to introduce a new class of issues which is that depending where it appears on the line it will be passed as-is or silently dropped. It may even happen that some proxies finding it before a comma would keep it and others unfolding the value into multiple headers would drop it.

I'm really in favor of clearly stating that it's strictly forbidden anywhere in the header block, except if it precedes an LF character in which case both constitute a CRLF end of line marker. By the way I think it's really not widespread: after probably more than a decade with this isolated character rejected by haproxy we haven't had any single report of an abusive blocking, while we've got such occasional reports of rejects of forbidden characters in header field names for example. So I think we should keep this strict.

Another point is that the text "ignore any CR before LF" can make one think that multiple CR could be ignored (as the CRCRLF Anne suggested), which was apparently not the intent of the original text, and which apparently led to this by trying to simplify the wording for "LF or CRLF".

@mnot
Copy link
Member

mnot commented Apr 28, 2020

Reject the message or convert to SP?

@mnot mnot removed the editorial label May 5, 2020
@mnot
Copy link
Member

mnot commented May 20, 2020

E.g., something like:

One or more CR characters not succeeded by LF MUST NOT be generated; recipients MUST either reject a message containing them, or convert them to SP before processing or forwarding the message.

@royfielding
Copy link
Member

We used to call this a bare CR (CR not immediately followed by LF).

+1 on that handling.

@mnot
Copy link
Member

mnot commented May 21, 2020

Next question is where to put this. I actually think it needs to be in two places:

  • Semantics 4.4 Field Values
  • Messaging 2.2 Message Parsing

Make sense?

@reschke reschke closed this as completed in f20cacb Jun 3, 2020
annevk added a commit to web-platform-tests/wpt that referenced this issue Feb 1, 2021
annevk added a commit to web-platform-tests/wpt that referenced this issue Feb 3, 2021
annevk added a commit to web-platform-tests/wpt that referenced this issue Feb 4, 2021
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Feb 9, 2021
Automatic update from web-platform-tests
HTTP: message parsing with lone CR

See httpwg/http-core#31 for context.
--

wpt-commits: d7227af30ff9723db9e615fe38d6dff8792c4f46
wpt-pr: 13524
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Feb 10, 2021
Automatic update from web-platform-tests
HTTP: message parsing with lone CR

See httpwg/http-core#31 for context.
--

wpt-commits: d7227af30ff9723db9e615fe38d6dff8792c4f46
wpt-pr: 13524

UltraBlame original commit: 91ed67c2bc9d36aa143cc8d3c7852d10ea83d907
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Feb 10, 2021
Automatic update from web-platform-tests
HTTP: message parsing with lone CR

See httpwg/http-core#31 for context.
--

wpt-commits: d7227af30ff9723db9e615fe38d6dff8792c4f46
wpt-pr: 13524

UltraBlame original commit: 91ed67c2bc9d36aa143cc8d3c7852d10ea83d907
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Feb 10, 2021
Automatic update from web-platform-tests
HTTP: message parsing with lone CR

See httpwg/http-core#31 for context.
--

wpt-commits: d7227af30ff9723db9e615fe38d6dff8792c4f46
wpt-pr: 13524

UltraBlame original commit: 91ed67c2bc9d36aa143cc8d3c7852d10ea83d907
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Feb 11, 2021
Automatic update from web-platform-tests
HTTP: message parsing with lone CR

See httpwg/http-core#31 for context.
--

wpt-commits: d7227af30ff9723db9e615fe38d6dff8792c4f46
wpt-pr: 13524
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

6 participants