Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantics: 8.6 Content-length: too lax about multiple values #748

Closed
wtarreau opened this issue Feb 7, 2021 · 3 comments · Fixed by #760
Closed

Semantics: 8.6 Content-length: too lax about multiple values #748

wtarreau opened this issue Feb 7, 2021 · 3 comments · Fixed by #760

Comments

@wtarreau
Copy link

wtarreau commented Feb 7, 2021

Regarding the case of multi-valued header, we have this:

If a message is received that has a Content-Length header field value consisting of the same decimal value as a comma-separated list (Section 5.6.1) - for example, "Content-Length: 42, 42" - indicating that duplicate Content-Length header fields have been generated or combined by an upstream message processor, then the recipient MUST either reject the message as invalid or replace the duplicated field values with a single valid Content-Length field containing that decimal value.

I'm not seeing anything there which forbids the use of "Content-length: 5,10" anymore. I'd rather emphasize the need for extremely strict parsing of Content-length and keep the duplicate case as the only acceptable exception to this rule, approximately like this:

"The Content-Length header field plays a crucial role in message delimitation in HTTP/1.1, and a different error handling between agents was shown to have important consequences in terms of security. As such, it is extremely important that agents do not accept Content-Length header field values that do not strictly comply with the ABNF, that they take care of rejecting values that they are not able to accurately represent internally, and that multiple occurrences of the header field are always checked for, and rejected as invalid. As an exception, if the header field is received multiple times with the same value, or is received as a comma-separated list of identical values, the recipient MAY replace all of them with a single valid Content-Length field containing that decimal value instead of rejecting the message."

@mnot
Copy link
Member

mnot commented Feb 8, 2021

Messaging 6.3 says:

If a message is received without Transfer-Encoding and with an invalid Content-Length header field, then the message framing is invalid and the recipient must treat it as an unrecoverable error, unless the field value can be successfully parsed as a comma-separated list (Section 5.6.1 of [Semantics]), all values in the list are valid, and all values in the list are the same.

I think that covers it for HTTP/1.1 (if Transfer-Encoding and Content-Length are both present, that's encouraged to be handled like an error).

HTTP/2 says:

A request or response is also malformed if the value of a content-length header field does not equal the sum of the DATA frame payload lengths that form the body. A response that is defined to have no payload, as described in [RFC7230], Section 3.3.2, can have a non-zero content-length header field, even though no content is included in DATA frames.

HTTP/3 says:

A request or response that is defined as having content when it contains a Content-Length header field (Section 6.4.1 of [SEMANTICS]), is malformed if the value of a Content-Length header field does not equal the sum of the DATA frame lengths received. A response that is defined as never having content, even when a Content-Length is present, can have a non-zero Content-Length field even though no content is included in DATA frames.¶

... so we could say something in semantics about invalid content-length fields when it's not being used for delimitation, if we wanted to.

@wtarreau
Copy link
Author

wtarreau commented Feb 8, 2021

Hmmm interesting. I haven't read enough of both parts yet but it's not that bad because I'm in the same situation as someone seeking a solution. I think in the semantics part, there should be an explicit reference to the messaging, for example, "Certain versions of HTTP use Content-Length for message delimitation and may impose stricter rules". This could be sufficient to make the reader check the respective messaging parts.

What about the point about being careful about internal representation (i.e. no overflow nor loss of resolution) ? I know I placed it in the middle of the text because it flowed naturally, but this remains a valid item as well.

@royfielding
Copy link
Member

As a general rule, I think we should not add text to semantics that says some other document has a requirement unless that specifically enlightens the reader regarding the meaning of that protocol element. This is important because the Semantics spec is already "too long" for easy review.

If the requirement matters, it will be read by readers of that other document.

@mnot mnot closed this as completed in #760 Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

4 participants