Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are headers always defined with ABNF? #74

Closed
mnot opened this issue Jun 4, 2018 · 14 comments
Closed

Are headers always defined with ABNF? #74

mnot opened this issue Jun 4, 2018 · 14 comments

Comments

@mnot
Copy link
Member

mnot commented Jun 4, 2018

https://tools.ietf.org/html/rfc7230#section-3.2.2 says:

A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below).

This ties sender behaviour to how the header field is specified; do we require headers to be defined using ABNF?

If so, we should say so explicitly. If not, we should clarify this requirement.

Furthermore, Section 7 ties handling of specific syntax (e.g., empty lines) to the #-rule. If headers can be defined without ABNF, we should clarify whether that behaviour is required, or specified to #-specified headers.

See also httpwg/http-extensions#596

@mnot mnot added the semantics label Jun 4, 2018
@reschke
Copy link
Contributor

reschke commented Jun 4, 2018

my 2 cents: I can see header fields not being defined by ABNF (but why?). But even in that case, the core protocol semantics should remain wrt list handling (such as the fact that empty elements get ignored) - anything else would be an incompatible change, no?

@royfielding
Copy link
Member

I don't read that as requiring it be defined using ABNF. It is requiring that its field-value definition be equivalent to a comma-separated list that matches the grammar rule #(values) in ABNF. This is required because recipients will extract and combine fields without/before knowing their definition.

Perhaps we could just say that. shrug

@mnot
Copy link
Member Author

mnot commented Jun 29, 2018

AIUI Julian is saying that there's something more here; if your header can be combined, it has to be combined using exactly the rules defined in section 7 (e.g., regarding how whitespace is handled, multiple headers, etc.).

If I define a header field that allows comma separation, I think it's OK to do so without ABNF, and I think it can define its own handling for exactly how that combination should happen -- keeping in mind that upstream handlers might combine it in the specified fashion.

Proposal:

Replace the text above with:

A sender MUST NOT generate multiple header fields with the same field name in a message unless that field's definition allows this; e.g., that header field is defined as a comma-separated list [in ABNF, #(values)], or the header field is a well-known exception (as noted below).

It would also be great if we could move some of the logic for combining headers out of the #rule (section 11 currently) and into somewhere around 4.2. Field Values.

@mnot mnot added the discuss label Jul 4, 2018
@reschke
Copy link
Contributor

reschke commented Jul 18, 2018

See slightly related issue #7

@mnot
Copy link
Member Author

mnot commented Jul 18, 2018

Discussed in Montreal; Roy suggested that we specify senders can only generate headers with multiple instances where the field's syntax allows recombining with commas.

Also make explicit that recipients can blindly combine multiple instances.

@wtarreau
Copy link

Also make explicit that recipients can blindly combine multiple instances
... provided that the field is not a well-known exception (i.e. set-cookie).

And conversely should we make it clearer that unless a field is defined as possibly containing commas in its value, a recipient may rightfully split it around commas (and strip LWS) and consider each part as an individual value for that field ? This basically means that a recipient could iterate over "values" between commas in fields it doesn't know (thus with no harm) and properly deal with those it knows (date, cookie, expires, etc which support commas in the value).

@reschke
Copy link
Contributor

reschke commented Jul 19, 2018

You can't split on comma without knowing the syntax of the list elements.

@wtarreau
Copy link

@reschke definitely, but what I mean is that if we stipulate that multiple header fields may be concatenated by default, then conversely it could be assumed that an unknown header field containing commas might be the result of a concatenation. At the very least, suggesting that passing commas in a new header field should be thought about twice because some recipients along the chain might not know how this field is formed seems like a good idea to me.

@mnot
Copy link
Member Author

mnot commented Oct 16, 2018

So I think the currently proposal (post-Montreal) is:

A sender MUST NOT generate multiple header fields with the same field name in a message unless that field's definition allows recombining them with commas (e.g., that header field is defined as a comma-separated list [in ABNF, #(values)]), or the header field is a well-known exception (as noted below).

@mnot
Copy link
Member Author

mnot commented Oct 16, 2018

Slight revision:

A sender MUST NOT generate multiple header fields with the same field name in a message unless that field's definition allows recombining them with commas (e.g., that header field is defined as a comma-separated list [in ABNF, #(values)]), or the header field is a well-known exception (as noted below).

This includes cases where a sender is adding a header field to an existing message (e.g., an intermediary appending a field to a forwarded message).

@royfielding
Copy link
Member

I had in mind something a little more general that would include fields which are defined like Vary:

   Aside from the well-known exception noted below,
   a sender MUST NOT generate multiple header fields with the same field
   name in a message, or append a header field when a field of the same name
   already exists in the message, unless that field's definition allows multiple
   field values to be recombined as a comma-separated list [i.e., at least one
   alternative of the field's definition allows a comma-separated list, such as
   an ABNF rule of #(values)].

OTOH, I personally feel that the existing requirement is a bit misguided. It doesn't seem to help in practice.

@wtarreau
Copy link

Reading the example mentioning the intermediary adding the field makes me think that the main problem we're having is caused by this combination of facts :

  1. field syntax is ambiguous and most people don't know it (nothing new here)
  2. users configure their intermediaries to add the fields they think they need, or at least those that appear to fix an issue they're facing
  3. our rules indicate that a sender MUST NOT this or that
  4. the configured component has no other option but blindly executing the configured rule, most often without even checking the existing message's contents nor whether or not it violates a rule
  5. the offending message gets emitted with all requests or responses and recipients have to deal with the rule's violation otherwise they're designated as the incompatible one in the room

We can't use point 3 above on users to avoid point 2, it's for developers. It's not realistic either for developers to implement an exhaustive list of permitted/not permitted header fields in their product, nor to always look up every single added field and emit "500 server error" every time the rule matches.

So we're left with MUST NOT that affect hard-coded implementations (which is already important) but we don't help recipients recover when they face the situation, despite it being declared unlikely by the specification. For this reason I think that the effort should be mostly put on guidance to recover from such violations which in my opinion are extremely common and are the reason we're discussing this.

We already did this to recover from multiple "content-length" values, there definitely is some value in going further to generalize this to any invalid case.

@annevk
Copy link
Contributor

annevk commented Nov 19, 2018

I agree with @wtarreau I think. I've put quite a bit of effort into defining parsers on the combined value for Content-Length, Content-Type, et al: whatwg/fetch#814. As indicated in other issues I'd be happy to upstream what makes sense to HTTP.

@mnot
Copy link
Member Author

mnot commented Nov 29, 2018

Same. I've been thinking we need to have general advice that single-value header fields should always define such error handling, and we should do an audit of the ones defined by HTTP.

That said, I still think this text is helpful. I could see modifying it to make the target of the requirement clearer, or removing the 2119 language.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants