Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exact separator used when combining header fields #148

Closed
annevk opened this issue Oct 12, 2018 · 12 comments · Fixed by #309
Closed

Exact separator used when combining header fields #148

annevk opened this issue Oct 12, 2018 · 12 comments · Fixed by #309

Comments

@annevk
Copy link
Contributor

annevk commented Oct 12, 2018

Fetch and XMLHttpRequest have for the longest time used comma-followed-by-space (0x2C 0x20) as separator when multiple header fields are combined into a single header field.

This is also what at least Firefox and Chrome do from code inspection:

I think this is because of https://tools.ietf.org/html/rfc2616#section-4.2 encouraging values to start with a single space and folks conceptually still seeing it as multiple values.

Now https://tools.ietf.org/html/rfc7230#section-3.2.2 has a much more explicit definition, which no longer allows for a space as far as I can tell, "breaking" with what implementations are doing.

Most of the time this difference is insignificant, but anything with quoted-string or equivalent will be able to tell what was used if those configuring the headers like to play games.

I'd like HTTP to either allow 0x2C 0x20 as an alternative separator (and have Fetch mandate it for browsers) or simply adopt 0x2C 0x20 as separator for everyone.

I have some tests around this as well, but they only affect APIs, which theoretically could have a different non-HTTP-conformant view, but that'd seem really bad: web-platform-tests/wpt#13471. See whatwg/fetch#813 and whatwg/fetch#752 for further context.

cc @ddragana @MattMenke2 @youennf

@reschke
Copy link
Contributor

reschke commented Oct 12, 2018

Whitespace is allowed (and insignificant).

Can you point at the actual difference between 2616 and 7230 which makes you think that something changed?

@reschke
Copy link
Contributor

reschke commented Oct 12, 2018

Most of the time this difference is insignificant, but anything with quoted-string or equivalent will be able to tell what was used if those configuring the headers like to play games.

You mean when a quoted-string is broken into several header field instances?

@annevk
Copy link
Contributor Author

annevk commented Oct 12, 2018

Can you point at the actual difference between 2616 and 7230 which makes you think that something changed?

In 2616 the same section states that values can be preceded by whitespace. Whereas in 7230 it only describes how to combine, without any mention of whitespace. Both sections are linked in OP.

You mean when a quoted-string is broken into several header field instances?

Yeah, also relevant for structured headers, which is how I started stumbling into this. If one client combines with a space, another without, and another with three spaces, you get subtly different strings and possibly room for security bugs.

@reschke
Copy link
Contributor

reschke commented Oct 12, 2018

This one?

The field value MAY be preceded by any amount of LWS, though a single SP is preferred.

That's not specific to recombination.

@annevk
Copy link
Contributor Author

annevk commented Oct 12, 2018

I know...

@mnot
Copy link
Member

mnot commented Oct 16, 2018

Anne, I think what you're suggesting is to change:

A recipient MAY combine multiple header fields with the same field name into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field value to the combined field value in order, separated by a comma.

to:

A recipient MAY combine multiple header fields with the same field name into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field value (after stripping surrounding OWS) to the combined field value in order, separated by COMMA SP.

This is meaningful in some admittedly pathological cases, such as a header like this:

Foo: "abc
Foo: def"

Because the current text has a combined value of abc,def -- whereas most implementations will do abc, def.

@mnot mnot added the semantics label Oct 16, 2018
@annevk
Copy link
Contributor Author

annevk commented Oct 16, 2018

So I think things are even more broken than I thought. A recipient that knows about https://tools.ietf.org/html/rfc7230#section-7 and knows the header, might drop empty or whitespace values. So

Foo: "abc
Foo: 
Foo: def"

becomes "abc, def" (or maybe they'd reject it, depending on how the header was defined), but a basic intermediary would turn it into "abc, , def". The final recipient in that case cannot figure out what the sender meant.

It seems to me that https://tools.ietf.org/html/rfc7230#section-7 should only ever apply after combining header values, including empty ones, not before.

@reschke
Copy link
Contributor

reschke commented Oct 16, 2018

FWIW, the message was invalid wrt that header field already, so "figure out what the sender meant" doesn't make a lot of sense to me.

@annevk
Copy link
Contributor Author

annevk commented Oct 16, 2018

The problem is that "invalid" depends on whether the recipient knows the syntax, which is just broken if the recipient is allowed to combine before forwarding (which makes it "valid").

@annevk
Copy link
Contributor Author

annevk commented Oct 16, 2018

The only sensible thing to do is to always combine before determining validity.

@mnot
Copy link
Member

mnot commented Sep 3, 2019

See PR. @annevk I haven't done anything to address your later comments; how do folks feel about adding a statement spelling out that consequence, either here, or in Considerations for New Fields?

@reschke
Copy link
Contributor

reschke commented Sep 3, 2019

I think explaining the issue would indeed be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment