Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content on GET: requirement strength #904

Closed
Tracked by #849
mnot opened this issue Jul 14, 2021 · 25 comments · Fixed by #906
Closed
Tracked by #849

Content on GET: requirement strength #904

mnot opened this issue Jul 14, 2021 · 25 comments · Fixed by #906

Comments

@mnot
Copy link
Member

mnot commented Jul 14, 2021

No description provided.

@mnot mnot changed the title 13. -----A client SHOULD NOT generate content in a GET request. ContentFP: The text could add something for when it is acceptable or expected to make sense (so why this is a SHOULD). GET body requirement strength Jul 14, 2021
@mnot mnot added the semantics label Jul 14, 2021
@mnot
Copy link
Member Author

mnot commented Jul 14, 2021

My previous comment:

I think that we should probably convert this one (and similar) to MUST NOT -- there isn't any situation where we recommend this behaviour, as it has interoperability and security implications.

@mnot
Copy link
Member Author

mnot commented Jul 14, 2021

@reschke you pushed back on MUST. My position is that our task is to define what's interoperable in HTTP, and bodies on GET clearly aren't. Using SHOULD here is a concession to a vocal minority who want to use the protocol in this way because they have pairwise agreements; I understand that if we use MUST here it'll be violated, but that puts the responsibility squarely and unambiguously on their shoulders.

@fpalombini's point is valid -- if we use SHOULD, that ought (ha!) to be contextualised. To do so here, we'd need to write something like:

A client SHOULD NOT generate content in a GET request, unless support for content is negotiated (in or out of band).

The problem is that the message can still be handled by software that isn't party to or aware of that negotiation -- especially, intermediaries. So this isn't really something we should be encouraging.

Hence, MUST.

@reschke
Copy link
Contributor

reschke commented Jul 14, 2021

I continue to disagree for the reasons you cited me above. Explaining the "SHOULD NOT" would work for me.

If you want a change here, we might want to get those involved who actually use it in practice.

@mnot
Copy link
Member Author

mnot commented Jul 14, 2021

@reschke please disambiguate 'above'.

@mnot
Copy link
Member Author

mnot commented Jul 14, 2021

The 'people who use it in practice' are not using HTTP -- they're building private protocols based upon it that don't interoperate with normal HTTP implementations. What they're doing breaks HTTP and HTTP implementations.

@reschke
Copy link
Contributor

reschke commented Jul 14, 2021

RFC 7231: "A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request."

I disagree that makes an implementation sending content "not HTTP". You are pushing for a change here, and I really believe you need better arguments to support that.

If the intent is to discourage things like that, I'm all in, but using "MUST NOT" is not likely to achieve that goal.

@mnot
Copy link
Member Author

mnot commented Jul 14, 2021

We got here because the 7231 language was ambiguous enough that people misinterpreted it, so we added a RFC2119 requirement. Our AD is rightly pointing out that if we use SHOULD, we need to contextualise it with when it's acceptable to violate the requirement.

What text do you propose to perform that function?

@reschke
Copy link
Contributor

reschke commented Jul 14, 2021

"A client SHOULD NOT generate content in a GET request, unless support for content is negotiated (in or out of band)."

sounds good to me

@mnot
Copy link
Member Author

mnot commented Jul 14, 2021

As I explained above, that won't work.

Out-of-Band negotiation isn't available to intermediaries.

In-band negotiation is the SEARCH method (or whatever it ends up being called); it would be strange for the WG to allow other methods of negotiation when it's working on how to accommodate this properly.

Also, while we place the requirement on the client here, realize that the impact here is really on the server -- it must not require a request to have a body in order to interoperate, because that precludes the client from using otherwise conformant HTTP implementations and infrastructure.

@reschke
Copy link
Contributor

reschke commented Jul 14, 2021

Also, while we place the requirement on the client here, realize that the impact here is really on the server -- it must not require a request to have a body in order to interoperate, because that precludes the client from using otherwise conformant HTTP implementations and infrastructure.

How so? Could you please clarify that? Are you saying that intermediaries are allowed to drop the payload? Where do we say that?

@reschke
Copy link
Contributor

reschke commented Jul 14, 2021

In-band negotiation is the SEARCH method (or whatever it ends up being called); it would be strange for the WG to allow other methods of negotiation when it's working on how to accommodate this properly.

This discussion is not about future code, but about existing, deployed code which currently works for people, and which IMHO is not strictly forbidden by the existing spec. In past discussions, we agreed on strengthening how we discourage this, but a "MUST NOT" is very different from a "SHOULD NOT". For instance, do you expect HTTP libraries that currently support request bodies on GET to stop doing that, although there's production code relying on it?

@mnot
Copy link
Member Author

mnot commented Jul 15, 2021

How so? Could you please clarify that? Are you saying that intermediaries are allowed to drop the payload? Where do we say that?

No, I'm saying that the pressure we see to allow bodies on GET is being placed upon clients by servers who create services that use it, doing so because they read the current ambiguous language and find enough wiggle room to think it's allowed and a good idea. This is because the power dynamic between a client and server is often unbalanced, especially for APIs.

This discussion is not about future code, but about existing, deployed code which currently works for people, and which IMHO is not strictly forbidden by the existing spec.

It does not work in an interoperable fashion on the scale of the Internet - which is what we're documenting. Those are private agreements. The fact that it's not strictly forbidden is how we got here.

do you expect HTTP libraries that currently support request bodies on GET to stop doing that, although there's production code relying on it?

Not any time soon, but those libraries aren't supporting HTTP in doing so; they're effectively supporting the creation of new, HTTP-derived protocols.

How about:

A client MUST NOT generate content in a GET request. Note that some implementations might support content on GET requests, but the semantics of doing so is undefined, support for it cannot be relied upon in all HTTP implementations, and its presence can cause interoperability and security issues.

@reschke
Copy link
Contributor

reschke commented Jul 15, 2021

It does not work in an interoperable fashion on the scale of the Internet - which is what we're documenting.

Not following. It does work for some people (maybe because they use HTTPS or control the request path by other means). Saying that if something does not work "on the scale of the Internet" then it's not "HTTP" or shouldn't inform our spec does not work for me. If this was the case, we really would need to start removing many more pieces from the protocol.

Also, telling libraries that they should stop supporting a use case that was allowed (with care) in RFC 2616 and 723x and which is used in practice doesn't look right to me.

We are past WG LC and IETF LC; if you really really want to change this normative requirement now (instead of explaining the SHOULD), then you should get support for that on the mailing list.

@mnot
Copy link
Member Author

mnot commented Jul 16, 2021

Not following.

What I mean is that if you create a resource that uses a body on GET, you can't just use any HTTP implementation - you have to find one that does bodies on GET. Here, 'implementation' includes the framework in use, the server software, reverse proxies, CDNs, description languages, etc.

Likewise on the client side; it's no longer a HTTP service, in that I can use any HTTP client; I have to find one that supports this extra constraint. Furthermore for proxies I might want to use, etc. Effectively, it's HTTP+GETBODY, not HTTP. It's not an extension, because it's not backwards-compatible.

You say it's "used in practice", but as far as I see it, it's a few large services using their popularity / market power to convince people to break the protocol.

If this was the case, we really would need to start removing many more pieces from the protocol.

That ignores the security and interoperability issues that this practice brings. Practically no one uses From, but that doesn't matter. This does.

We are past WG LC and IETF LC; if you really really want to change this normative requirement now (instead of explaining the SHOULD), then you should get support for that on the mailing list.

If you can suggest a qualification for the SHOULD that doesn't compound the problem, I'm all for it.

Absent that, I'll take it to the list.

@mnot mnot changed the title GET body requirement strength Content on GET: requirement strength Jul 16, 2021
@mnot
Copy link
Member Author

mnot commented Jul 16, 2021

Also, looking at the text in situ, I think the current second sentence doesn't need modification, leaving the proposal as:

A client MUST NOT generate content in a GET request. Content received in a GET request has no defined semantics, cannot alter the meaning or target of the request, and might lead some implementations to reject the request and close the connection because of its potential as a request smuggling attack (Section 11.2 of [Messaging]).

@mnot
Copy link
Member Author

mnot commented Jul 16, 2021

... and since we've tried to align this language in HEAD and DELETE, the change would also be made there.

@wtarreau
Copy link

I really disagree with changing this. There are enough mentions on the net about "what happens if I send a body with a GET" that it's pretty sure we're going to declare a few in-house deployments non-compliant. For example we could imagine that some local deployments use a request body for an auth scheme. We could even more imagine this with H2. And with H2-to-H1 gateways, the risk of seeing a request body with a GET is even higher.

The real problem with this spec is that from the very early days we only say how to emit but much less what to receive, and that many implementors on the receive path tend to take the server's "MUST NOT" for granted and ignore corner cases. Marking a MUST NOT here would certainly re-open some smuggling issues just because some implementors would be less careful about this and would never check the presence of a content-length.

So I definitely agree with Julian here, I'd rather keep the "SHOULD NOT unless negotiated by other means" at least to make sure everyone remains careful about this.

Last, I'd rather not make GET such a special method after all the work we've done in 723x to try to uniformize processing for all of them! And as seen in a recent issue, at least both Apache and HAProxy deal fine with bodies in GET requests.

@MikeBishop
Copy link
Contributor

Maybe we split the difference -- there are certainly plenty of instances of "MUST NOT... unless..." in IETF specs.

@wtarreau
Copy link

I'm fine with this approach.

@reschke
Copy link
Contributor

reschke commented Jul 16, 2021 via email

@MikeBishop
Copy link
Contributor

I think there's a difference, but it's subtle. "SHOULD NOT" implies that there are valid reasons you might choose to violate the spec, and describes what you should consider in making that choice. "MUST NOT unless" is an absolute prohibition outside of the described circumstance. I think Mark makes an excellent case for an absolute prohibition in the absence of prearranged support.

@wtarreau
Copy link

With that said, that will still declare all 723x-compliant gateways non-compliant, because they will emit this if they receive it. I'm really embarrassed by this last minute change considering the time spent on this during 723x before reaching that situation :-/

@martinthomson
Copy link
Contributor

I think Mark makes an excellent case for an absolute prohibition in the absence of prearranged support.

That is the very definition of a different protocol though. Why not just "MUST NOT" then?

@reschke
Copy link
Contributor

reschke commented Jul 17, 2021

That is the very definition of a different protocol though. Why not just "MUST NOT" then?

It is HTTP. Is has not been forbidden in HTTP. It works in certain scenarios, and these are deployed. We can't undo that, even by "yelling". If we want to discourage people from doing this in new protocols, a "SHOULD NOT" wtih explanation is just right.

@wtarreau
Copy link

Right now you could use curl as the client, haproxy as the gateway and apache as the server and all of them would happily process this without a sweat because it properly respects the messaging. Regarding the semantics, it's up to the application making use of that to define it.

Actually it is even already used and documented: the broadly used ElasticSearch explicitly documents how to perform JSON search requests using curl -XGET: https://www.bmc.com/blogs/elasticsearch-commands/

So please let's stick to "SHOULD NOT" with Julian's "unless negotiated via other means" or something like this, instead of suddenly adding dangerous exceptions to the protocol and breaking interoperability between already deployed and working components.

@mnot mnot closed this as completed in #906 Jul 23, 2021
haproxy-mirror pushed a commit to haproxy/haproxy that referenced this issue Sep 28, 2021
This kind of requests is now forbidden and rejected with a
413-Payload-Too-Large error.

It is unexpected to have a payload for GET/HEAD/DELETE requests. It is
explicitly allowed in HTTP/1.1 even if some servers may reject such
requests. However, HTTP/1.0 is not clear on this point and some old servers
don't expect any payload and never look for body length (via Content-Length
or Transfer-Encoding headers).

It means that some intermediaries may properly handle the payload for
HTTP/1.0 GET/HEAD/DELETE requests, while some others may totally ignore
it. That may lead to security issues because a request smuggling attack is
possible.

To prevent any issue, those requests are now rejected.

See also httpwg/http-core#904
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants