Clarify semantics of OWS in header field values #53

Closed
davidmatson opened this issue Apr 9, 2018 · 12 comments · Fixed by #299

Comments

@davidmatson

From Julian Reschke:
"We probably should clarify
https://www.greenbytes.de/tech/webdav/rfc7230.html#rule.OWS

with statements about what kind of rewriting is allowed. And, for

https://www.greenbytes.de/tech/webdav/rfc7231.html#header.allow

we need to clarify that "MUST NOT modify" doesn't apply to the *WS rewriting we allow in general."

@davidmatson
Author

Further details from the original erratum report:

Section: 3.2.4

Original Text

A field value might be preceded and/or followed by optional whitespace (OWS); a single SP preceding the field-value is preferred for consistent readability by humans. The field value does not include any leading or trailing whitespace: OWS occurring before the first non-whitespace octet of the field value or after the last non-whitespace octet of the field value ought to be excluded by parsers when extracting the field value from a header field.

Corrected Text

A field value might be preceded and/or followed by optional whitespace (OWS); a single SP preceding the field-value is preferred for consistent readability by humans. The field value does not include any leading or trailing whitespace: OWS occurring before the first non-whitespace octet of the field value or after the last non-whitespace octet of the field value ought to be excluded by parsers when extracting the field value from a header field.

All optional whitespace between tokens in field-content has the same semantics as SP. Any sequence of SP / HTAB that occurs between tokens in field-content MAY be replaced with a single SP before interpreting the field value or forwarding the message downstream.
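As a rough illustration of the two behaviors described in the corrected text, here is a minimal Python sketch. The helper names `extract_field_value` and `collapse_whitespace` are made up for this example and do not come from any RFC or library. The sketch treats every run of SP and/or HTAB the same way and makes no attempt to recognize quoted strings.

```python
import re

# A run of one or more SP / HTAB octets.
_WS_RUN = re.compile(r"[ \t]+")

def extract_field_value(raw: str) -> str:
    """Drop OWS before and after the field value (hypothetical helper)."""
    return raw.strip(" \t")

def collapse_whitespace(value: str) -> str:
    """Replace each run of SP and/or HTAB with a single SP (hypothetical helper)."""
    return _WS_RUN.sub(" ", value)

raw = " \tGET,\t  HEAD  "          # the part of a header line after the colon
value = extract_field_value(raw)   # leading/trailing OWS removed
print(repr(value))
print(repr(collapse_whitespace(value)))
```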

Notes

RFC 2616, section 2.2, contained the following text:

All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.

Similarly, RFC 2616 section 4.2 contained the following text:
Any LWS that occurs between field-content MAY be replaced with a single SP before interpreting the field value or forwarding the message downstream.

In section A.2. Changes from RFC 2616, the document does not list any intended change for how space and tab are handled, but the current text does appear to constitute a change. I suspect the change is accidental due to rewording the document when line folding was made deprecated.

Note that in RFC 2616, LWS is defined as follows:
LWS = [CRLF] 1*( SP | HT )

In particular, the leading CRLF was optional.

Thus, the wording in RFC 2616 covered two cases:

  1. LWS that includes line folding.
  2. LWS that does not include line folding.
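
The two cases can be sketched with a regular expression that transcribes the RFC 2616 LWS rule quoted above. This is only an illustration; the names `replace_lws` and `has_folding` are hypothetical.

```python
import re

# RFC 2616: LWS = [CRLF] 1*( SP | HT )
# Group 1 captures the optional leading CRLF (the line fold).
LWS = re.compile(r"(\r\n)?[ \t]+")

def replace_lws(value: str) -> str:
    """Apply the RFC 2616 'MAY replace with a single SP' rule to every LWS."""
    return LWS.sub(" ", value)

def has_folding(value: str) -> bool:
    """True if any LWS in the value begins with CRLF (case 1)."""
    return any(m.group(1) for m in LWS.finditer(value))

folded = "a\r\n  b\t\tc"    # case 1 (fold) followed by case 2 (plain SP / HT)
print(has_folding(folded), repr(replace_lws(folded)))
print(has_folding("a  b"), repr(replace_lws("a  b")))
```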

The current text does cover how to handle case #1 - former LWS that began with a CRLF; later in section 3.2.4 it requires rejecting or replacing with SP. (The old "MAY" language has effectively become a "MUST" for the leading CRLF case.)

However, the current text does not appear to address case #2 - former LWS that does not begin with a CRLF - in other words, SP and HTAB occurring between field-content. I suspect the intention is still that a recipient should treat such whitespace as insignificant, and may replace any sequence of SP and HTAB with a single SP before interpreting the field content, but I believe the text of the current RFC no longer provides this behavior.

@davidmatson
Author

And from later emails on the subject:

When generic code does know the ABNF of the header field, is it still allowed to remove spaces in field values before processing/passing downstream? If so, where should this behavior be documented as permissible? (Is this something that each field now needs to document separately?)

(FWIW, the spec doesn't technically use OWS inside field-content; the ABNF between field-vchar is equivalent to RWS, I believe.)

For example, if a proxy understands the ABNF of all the following headers and receives a response with Allow, Server, Vary, etc. header fields set where there's more than one space separating items in the list, can it still normalize down to one space before passing the response back to the client? The Allow case is perhaps the most interesting: RFC 7231 specifically prohibits modifying the Allow header field, but I suspect the intent there is more around the list of methods rather than the whitespace characters between them.


For context, the reason this came up is that in Microsoft Azure Storage, certain header fields are canonicalized to be used in a cryptographic hash, and the canonicalized form normalizes this whitespace to a single SP to ensure the signature doesn't break when a proxy makes the kind of transformation formerly allowed under RFC 2616. Had this transformation not been permitted by the RFC, I suspect we would have left the inside of the string as-is when canonicalizing.
https://docs.microsoft.com/en-us/rest/api/storageservices/authentication-for-the-azure-storage-services
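
To show why canonicalizing before hashing matters, here is a small Python sketch. It is not the Azure Storage signing algorithm: the key, the string-to-sign layout, and the function names `canonicalize` and `sign` are all assumptions made for illustration. The point is only that two values differing in inner whitespace produce the same signature once normalized.

```python
import hashlib
import hmac
import re
from typing import Iterable

def canonicalize(value: str) -> str:
    """Trim outer OWS and collapse inner SP / HTAB runs to one SP (illustrative)."""
    return re.sub(r"[ \t]+", " ", value.strip(" \t"))

def sign(key: bytes, header_values: Iterable[str]) -> str:
    """HMAC-SHA256 over the canonicalized values, one per line (hypothetical layout)."""
    string_to_sign = "\n".join(canonicalize(v) for v in header_values)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

key = b"hypothetical-shared-key"
sent = ["text/plain;  charset=utf-8", "bytes=0-99"]
forwarded = ["text/plain; charset=utf-8", "bytes=0-99"]   # a proxy collapsed the spaces
print(sign(key, sent) == sign(key, forwarded))
```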

@reschke
Contributor

reschke commented Jun 22, 2018

@peteroupc

The suggested correction is ambiguous: "Any sequence of SP / HTAB..." ought to be "Any sequence of SP and/or HTAB..."

@mnot mnot added the semantics label Oct 10, 2018
@mnot
Member

mnot commented Oct 10, 2018

"All optional whitespace between tokens in field-content" implies that the header is defined using ABNF and that the party replacing the whitespace knows it.

The original text in 2616 was more precise and actionable - "All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream."

Having said that, I very much wonder if this is widely understood or implemented. If we keep this intent, I think we need to highlight it in "Considerations for New Header Fields", otherwise some header author is going to think that whitespace inside of quotes (for example) is "protected."

@reschke
Contributor

reschke commented Oct 10, 2018

SP in quoted strings is protected. If we give the impression that it is not, we need to fix this.

I guess the misunderstanding is based on the use of "linear white space" when it's really only about places in the ABNF that use LWS (implied or explicit).
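
A sketch of what "protected" could look like in code, assuming the field actually follows the quoted-string and quoted-pair conventions (which, as the thread notes, is not guaranteed for every field). The function name `collapse_outside_quotes` is hypothetical.

```python
def collapse_outside_quotes(value: str) -> str:
    """Collapse SP / HTAB runs to one SP, leaving quoted-string contents untouched."""
    out = []
    in_quotes = False
    i = 0
    while i < len(value):
        ch = value[i]
        if in_quotes:
            out.append(ch)
            if ch == "\\" and i + 1 < len(value):
                # quoted-pair: copy the escaped character verbatim
                out.append(value[i + 1])
                i += 1
            elif ch == '"':
                in_quotes = False
        elif ch in " \t":
            # Skip the rest of the whitespace run and emit a single SP.
            while i + 1 < len(value) and value[i + 1] in " \t":
                i += 1
            out.append(" ")
        else:
            out.append(ch)
            if ch == '"':
                in_quotes = True
        i += 1
    return "".join(out)

print(collapse_outside_quotes('a  \t b="x   y"  ; c'))
```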

@mnot
Member

mnot commented Oct 10, 2018

Right. We should restrict this so it only applies to headers that are defined using ABNF and OWS/BWS specifically.

I do wonder at the practicality of it, however, since the recipient has to have the ABNF for the header.

@royfielding
Member

The description is just using ABNF rules as a shorthand for the content. This does not say anything about how a field is defined. All fields match ABNF regardless of how a specification defines them.

@mnot
Member

mnot commented Oct 11, 2018

Right now, we define:

field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]

So, all fields match field-content, thereby matching that ABNF. However, my understanding from #74 was that we do not require header values use ABNF to define their specific syntax (which I think is correct).
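
To make the "all fields match field-content" point concrete, here is a hedged Python sketch. The regex below is not a literal transcription of the ABNF; it encodes the apparent intent (a value is empty, or begins and ends with a field-vchar and may contain SP / HTAB in between), and the name `FIELD_VALUE` is an assumption. Note that a value containing a quoted string matches just like any other, so this generic grammar alone cannot say which whitespace is inside quotes.

```python
import re

# field-vchar = VCHAR / obs-text ; VCHAR = %x21-7E, obs-text = %x80-FF
_FIELD_VCHAR = r"[\x21-\x7e\x80-\xff]"

# Empty, or starts and ends with a field-vchar with SP / HTAB / field-vchar between.
FIELD_VALUE = re.compile(
    rf"(?:{_FIELD_VCHAR}(?:[ \t\x21-\x7e\x80-\xff]*{_FIELD_VCHAR})?)?"
)

for candidate in ["GET,   HEAD", 'W/"a  b"', " GET", "a\r\nb"]:
    print(repr(candidate), bool(FIELD_VALUE.fullmatch(candidate)))
```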

If you're saying that all of those 1*( SP / HTAB) sequences can be collapsed into single SP, that's something we could do. I don't think it's a great idea, as it will surprise some people, especially inside quoted strings.

If you're saying they can be collapsed except inside of quoted strings, it would require all instances of " in all HTTP headers -- no matter how defined -- to adopt the quoted string conventions, including escaping. I don't know how we can support doing that.

What I thought we were saying was that when header fields are using ABNF, and they specifically use the OWS or BWS rules, those sequences can be collapsed to a single space. I think that's entirely sane and should be made clear.

@royfielding
Member

We seem to be in the weeds. The original erratum was about line-folding being included in LWS. It doesn't recognize that we disallowed line-folding to be sent, so that doesn't appear in the ABNF any more, so more words are required to account for its processing to SP. This is already defined in the spec, but we need to be consistent in every section.

What @mnot is saying about the spec talking about transforming ABNF rules is not the case. The spec uses ABNF rules to match the content reflected by those rules. The bit about spaces between tokens and other grammar elements dates back to the original descriptions of HTTP parsing, in general, and is now found in "https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#rfc.section.4.2.3".

These sections (and Whitespace) are still due for a general rewrite. All we did so far is place the old text in approximately the right place -- we still need to do the work of defining just the field value syntax here and somehow explain only the 1.1-specific field manipulation in Messaging. This is a work in progress.

@mnot
Member

mnot commented Oct 12, 2018

That "bit about spaces between tokens and other grammar elements" begins with "Most HTTP header fields are defined..." -- so I don't see how it can be read to imply the ability to process whitespace in all header fields in any particular way.

Please make a proposal for explicit text / requirements that reflects what you think should happen.

@mnot
Member

mnot commented Feb 2, 2020

Discussed in Basel. Suggestion:

OWS has the same semantics as a single SP. Any field-content known to be defined as OWS MAY be replaced with a single SP before interpreting the field value or forwarding the message downstream.
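
One way to read this suggestion, sketched in Python for a recipient that knows the field is a comma-separated list of tokens (as with the methods in Allow), where whitespace can only appear as OWS around the commas. The names `collapse_list_ows` and `list_members` are hypothetical, and the sketch would not be safe for list members that may contain quoted strings or commas.

```python
import re

# SP / HTAB runs directly before or after a comma: the OWS of a token list.
_OWS_NEAR_COMMA = re.compile(r"[ \t]+(?=,)|(?<=,)[ \t]+")

def collapse_list_ows(value: str) -> str:
    """Replace each non-empty OWS around a list comma with a single SP."""
    return _OWS_NEAR_COMMA.sub(" ", value)

def list_members(value: str) -> list:
    """Split a token list into its members, ignoring surrounding OWS."""
    return [m.strip(" \t") for m in value.split(",")]

allow = "GET,\t HEAD ,   PUT"
normalized = collapse_list_ows(allow)
print(repr(normalized))
# The list of methods itself is unchanged by the whitespace rewrite.
print(list_members(allow) == list_members(normalized))
```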
