Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Rack::Request headers usage. #1880

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Fix Rack::Request headers usage. #1880

wants to merge 4 commits into from

Conversation

ioquatix
Copy link
Member

@ioquatix ioquatix commented Apr 27, 2022

As discussed previously there is a mismatch between Rack::Response#get/set_headers and Rack::Request#get/set_headers.

Specifically, Rack::Response headers are actual HTTP headers, while Rack::Request headers are env key pairs.

This PR is one option for changing Rack::Request#get/set_headers to be consistent with the semantic of HTTP headers. I think this is the best option. The proposed methods are backwards compatible but emit warnings when used with things which are likely to be env keys.

The biggest limitation is that not all HTTP header keys are allowed, specifically headers with periods (.) are considered env keys, even though periods are considered valid token characters and thus valid header keys. This allows rack.internal.keys to pass correctly as non-headers retaining compatibility (following the spec which specifies one period must be included in application specific keys).

I believe this PR is 100% backwards compatible and I've split it into two parts: (commit 1) is just the change itself and emits lots of warnings, and (commit 2) is reworking all Rack code to use env directly where it makes sense.

I believe this PR will allow Rails to remove their "patch" https://github.com/rails/rails/blob/f95c0b7e96eb36bc3efc0c5beffbb9e84ea664e4/actionpack/lib/action_dispatch/http/headers.rb#L121-L129 althought in theory it remains fully compatible.

@ioquatix
Copy link
Member Author

ioquatix commented Apr 27, 2022

There are some small details which I think we need to flesh out too.

Specifically, I noticed that the linter does not do any validation on incoming request HTTP_ header keys/values.

I suspect it might be a good idea to add this.

Specifically, I've added (and should probably revert, but left it in for this specific discussion):

- set_header key, "#{get_header key},#{v}"
+ @env[key] = [current, value]

Generally speaking there is only one response header whihc does not allow comma separated values: set-cookie. On the request side, all headers (AFAIK) support comma separated form. That being said, splitting headers can be tricky and I wonder if we want to have a more general model, like get_header_values -> Array or something. There are some methods where this would be useful but I'm not sure how safe a general solution is.

Regardless of what we do here, we should update Rack::Lint to validate that HTTP_ header keys conform to the expected format - String, Array, normalised form.. etc

@jeremyevans
Copy link
Contributor

I think this breaks backwards compatibility too much. I have submitted an alternative backwards compatible proposal that I think offers a nicer API: #1881

@ioquatix
Copy link
Member Author

ioquatix commented Apr 27, 2022

@jeremyevans what part of this PR is not compatible with existing code? Basically, all the specs pass with no changes (except one specific point mentioned above).

@jeremyevans
Copy link
Contributor

The plan that this PR sets forth (removing the warnings and breaking code in Rack 3.1) is not backwards compatible. I think if the goal of the PR is to break backwards compatibility later, the PR should be considered as not backwards compatible, even if it technically retains immediate compatibility via code that warns that it will be removed in a later version.

Considering you like the interface in #1881, and that approach is backwards compatible, I think we should use that approach. We can warn in the documentation against using the existing *_header methods because they don't operate the way you would expect based on the name. Personally, I don't think the benefits of deprecating and removing outweigh the cost of backwards compatibility loss, but if we have to, we could deprecate now and remove in Rack 4.

In terms of implementation in this PR, content-type and content-length headers are not handled correctly. Those are special cased in the Rack SPEC to not be prefixed via HTTP_ in env, but header_name and env_name don't handle that. For example, get_header('content-type') and get_header('content-length') should work without warning (since they were submitted as HTTP headers), but looking at the implementation, I'm guessing they currently warn. Likewise, each_header will not yield content-type and content-length headers, even if they were submitted to the webserver. I also don't see a reason for HTTP_HEADER_PATTERN to exclude A-Z, considering you are going to upcase anyway.

@ioquatix
Copy link
Member Author

I essentially see this as a bug fix PR where as your proposal is a new feature. Rails is already working around this behaviour. The implementation here should be safely composable with existing Rails releases and also allow them with minimal changes to remove their patch over get/set_headers.

The plan that this PR sets forth (removing the warnings and breaking code in Rack 3.1) is not backwards compatible.

With that interpretation, what about all the other deprecated functionality that is slated to be removed in 3.1? Quite a bit of of that is not backwards compatible.

we could deprecate now and remove in Rack 4.

I'm fine with that.

In terms of implementation in this PR, content-type and content-length headers are not handled correctly

Define "correctly"...? From https://www.rfc-editor.org/rfc/rfc3875.html#section-4.1.18

   The server is not required to create meta-variables for all the
   header fields that it receives.  In particular, it SHOULD remove any
   header fields carrying authentication information, such as
   'Authorization'; or that are available to the script in other
   variables, such as 'Content-Length' and 'Content-Type'.  The server
   MAY remove header fields that relate solely to client-side
   communication issues, such as 'Connection'.

Based on this interpretation from the CGI RFC, these are no longer headers: "remove any header fields... Content-Length... Content-Type...". Not sure what the ideal mechanism is here, but after the point they are extracted, they should no longer be present in the "headers" according to the RFC.

Considering you like the interface in #1881, and that approach is backwards compatible

It doesn't solve the problem being addressed here. I like the general idea though. I think we should discuss it for Rack 3.1 Perfectly happy to punt this PR to a later Rack release too. It seems like something we should try to fix though.

@jeremyevans
Copy link
Contributor

I essentially see this as a bug fix PR where as your proposal is a new feature. Rails is already working around this behaviour. The implementation here should be safely composable with existing Rails releases and also allow them with minimal changes to remove their patch over get/set_headers.

I do not see this as a bug fix API. The existing *_header methods are poorly named, but they were never intended to do the type of mangling you are doing. What you are doing is implementing a backwards incompatible new feature using existing methods.

The plan that this PR sets forth (removing the warnings and breaking code in Rack 3.1) is not backwards compatible.

With that interpretation, what about all the other deprecated functionality that is slated to be removed in 3.1? Quite a bit of of that is not backwards compatible.

We've agreed to make many backwards incompatible changes. We knew when doing so that removal was not backwards compatible. It's not like we accepted deprecation with a plan to remove and considered it a backwards compatible solution.

we could deprecate now and remove in Rack 4.

I'm fine with that.

To be clear, you are fine with merging #1881, closing this, and deprecating the existing *_header methods? If not, can you please state what approach you are fine with?

In terms of implementation in this PR, content-type and content-length headers are not handled correctly

Define "correctly"...? From https://www.rfc-editor.org/rfc/rfc3875.html#section-4.1.18

   The server is not required to create meta-variables for all the
   header fields that it receives.  In particular, it SHOULD remove any
   header fields carrying authentication information, such as
   'Authorization'; or that are available to the script in other
   variables, such as 'Content-Length' and 'Content-Type'.  The server
   MAY remove header fields that relate solely to client-side
   communication issues, such as 'Connection'.

Based on this interpretation from the CGI RFC, these are no longer headers: "remove any header fields... Content-Length... Content-Type...". Not sure what the ideal mechanism is here, but after the point they are extracted, they should no longer be present in the "headers" according to the RFC.

My reading of this is that HTTP_CONTENT_LENGTH and HTTP_CONTENT_TYPE meta variables should not be created, because the values are already set in CONTENT_LENGTH and CONTENT_TYPE. This matches what SPEC states. Also, SPEC only requires RFC3875 conformance for HTTP_ variables, it doesn't say anything about RFC3875 conformance for non-HTTP_ variables.

I think almost all rack users would assume that content-length and content-type, if submitted as request headers, with values present in env in the CONTENT_LENGTH and CONTENT_TYPE keys, would be included in headers. Do you honestly think that users expect that content-length and content-type would not show up as headers?

Considering you like the interface in #1881, and that approach is backwards compatible

It doesn't solve the problem being addressed here. I like the general idea though. I think we should discuss it for Rack 3.1 Perfectly happy to punt this PR to a later Rack release too. It seems like something we should try to fix though.

The problem being addressed here is that users don't have a good way to do header to env and env to header mangling. You propose modifying *_header methods to do such mangling, in a backwards incompatible manner. I propose a new headers method to do such mangling, in a backwards compatible manner. It's exactly the same problem, IMO.

@ioquatix
Copy link
Member Author

I do not see this as a bug fix API. The existing *_header methods are poorly named, but they were never intended to do the type of mangling you are doing.

Okay, either (1) change the name to match the function or (2) fix the function to match the name. To me, it's both bug. Since (1) introduce major compatibility issues, a careful implementation of (2) seems like the only way forward.

I think it's clear from the comment made 7 (!) years ago that there is confusion around what these methods are supposed to do, with the original documentation giving examples using raw HTTP header names which seems to imply the name is correct and the implementation is wrong.

Since generally, CGI variables, headers and metadata keys can be distinct (with some very minor/obscure overlap) we can definitely fix these methods to be both compatible and do the right thing.

I think almost all rack users would assume that content-length and content-type, if submitted as request headers, with values present in env in the CONTENT_LENGTH and CONTENT_TYPE keys, would be included in headers. Do you honestly think that users expect that content-length and content-type would not show up as headers?

content-length is not a header in HTTP/2 and in theory it might not even be a header in CGI / HTTP/1 (chunked encoding could be buffered for example). So my opinion is get_header('content-length') should not be used as it's not reliable (and can introduce security issues too, imagine someone provides content_length header for example).

I find content-type more peculiar. I can't imagine any security issues coming out of content-type spoofing that aren't already present from the original provider of the request. I therefore don't know why the CGI spec chose to expose CONTENT_TYPE instead of HTTP_CONTENT_TYPE. The CGI RFC outlines several headers like this (authorization being another one) and I wonder to what extent we should try to map them back? Probably we need a general reverse mapping of CGI variable to header for some subset of CGI variables that are documented to be extracted from headers.

The problem being addressed here is that users don't have a good way to do header to env and env to header mangling. You propose modifying *_header methods to do such mangling, in a backwards incompatible manner. I propose a new headers method to do such mangling, in a backwards compatible manner. It's exactly the same problem, IMO.

Your solution does not fix the original source of the problem, it just introduce a new interface without fixing the original issue. In order to deal with that, you'd then need to introduce blanket deprecations on Rack::Request#*_header which is worse than what I've proposed here which is specific, targeted deprecations based on specific questionable usage patterns.

I think we all agree that Rack::Request#*_header and Rack::Response#*_header methods should deal with headers. This PR tries to make a step towards that goal. If we want to retain compatibility indefinitely, we can just remove the warning and leave it as proposed here, which is completely compatible with all existing specs, external tests, etc.

@jeremyevans
Copy link
Contributor

I do not see this as a bug fix API. The existing *_header methods are poorly named, but they were never intended to do the type of mangling you are doing.

Okay, either (1) change the name to match the function or (2) fix the function to match the name. To me, it's both bug. Since (1) introduce major compatibility issues, a careful implementation of (2) seems like the only way forward.

I prefer (3), realize that backwards compatibility is important, don't break it, and instead introduce a better API for solving the same problem.

I think it's clear from the comment made 7 (!) years ago that there is confusion around what these methods are supposed to do, with the original documentation giving examples using raw HTTP header names which seems to imply the name is correct and the implementation is wrong.

It's clear from when these methods were first introduced in ef5546c that they were not intended to deal with HTTP specific headers, but with arbitrary data associated witht the request object. Look at the commit message and tests in the commit.

Since generally, CGI variables, headers and metadata keys can be distinct (with some very minor/obscure overlap) we can definitely fix these methods to be both compatible and do the right thing.

You can 100% not fix these methods to be backwards compatible and "do the right thing". Either the methods mangle, or they don't. If they mangle, they are not backwards compatible.

I think almost all rack users would assume that content-length and content-type, if submitted as request headers, with values present in env in the CONTENT_LENGTH and CONTENT_TYPE keys, would be included in headers. Do you honestly think that users expect that content-length and content-type would not show up as headers?

content-length is not a header in HTTP/2 and in theory it might not even be a header in CGI / HTTP/1 (chunked encoding could be buffered for example). So my opinion is get_header('content-length') should not be used as it's not reliable (and can introduce security issues too, imagine someone provides content_length header for example).

99% of users are using HTTP/1 and would expect that get_header('content-length') returns the length submitted in the content-length header.

The content-length vs content_length is not specific to this header, but to all headers. Most front-end webservers like Nginx would block/ignore content_length, and I wouldn't have a problem updating SPEC to say that webservers should ignore headers with underscores.

I find content-type more peculiar. I can't imagine any security issues coming out of content-type spoofing that aren't already present from the original provider of the request. I therefore don't know why the CGI spec chose to expose CONTENT_TYPE instead of HTTP_CONTENT_TYPE. The CGI RFC outlines several headers like this (authorization being another one) and I wonder to what extent we should try to map them back? Probably we need a general reverse mapping of CGI variable to header for some subset of CGI variables that are documented to be extracted from headers.

content-length and content-type are the only headers treated specially by not being prefixed with HTTP_. Here's some examples from popular web servers:

The problem being addressed here is that users don't have a good way to do header to env and env to header mangling. You propose modifying *_header methods to do such mangling, in a backwards incompatible manner. I propose a new headers method to do such mangling, in a backwards compatible manner. It's exactly the same problem, IMO.

Your solution does not fix the original source of the problem, it just introduce a new interface without fixing the original issue. In order to deal with that, you'd then need to introduce blanket deprecations on Rack::Request#*_header which is worse than what I've proposed here which is specific, targeted deprecations based on specific questionable usage patterns.

I disagree. My solution fixes the problem of allowing users to easily deal with HTTP headers. You want to modify *_header to do that, in a backwards incompatible manner, instead of adding a new backwards compatible method to do it.

You are fixated on the fact that *_header isn't a good name. The fact that it isn't a good name is not a bug, and this is not a bug fix.

We don't need to introduce blanket deprecations at all. We can keep the existing methods, and just indicate via documentation that it is a bad idea to use them in new code. Old code stays working, new code uses headers, nobody's application breaks just to satisfy a foolish desire for consistency.

I think we all agree that Rack::Request#*_header and Rack::Response#*_header methods should deal with headers.

No, we don't all agree about that. We probably all agree that absent backwards compatibility concerns, it would make more sense for Request#*_header to deal with HTTP headers. However, backwards compatibility is important. We should aim to be like Ruby itself, and only break backwards compatibility when the benefits outweigh the costs.

This PR tries to make a step towards that goal. If we want to retain compatibility indefinitely, we can just remove the warning and leave it as proposed here, which is completely compatible with all existing specs, external tests, etc.

This approach is backwards incompatible if you change the behavior that currently warns to always treating as an HTTP header. Until you do that, the semantics of the *_header methods are questionable. For example, if I submit a request with a path-info header, get_header('path-info') does not do what I want. So this approach is either backwards incompatible or has even worse semantics than just a bad method name.

@ioquatix
Copy link
Member Author

ioquatix commented Apr 28, 2022

Thanks for the great investigation. I don't think consistency is a foolish desire. Being inconsistent leads to real world bugs, hard to use code, and security issues.

By choosing (3) we don't fix any existing code paths and end up with confusing/inconsistent interfaces - so I think (3) needs to be combined with warnings & a deprecation strategy.

This approach is backwards incompatible if you change the behavior that currently warns to always treating as an HTTP header.

Yes.

Until you do that, the semantics of the *_header methods are questionable.

The semantics are already questionable and will remain questionable unless changed. This is an incremental step towards being consistent with other methods of the same name in Rack::Response and being less questionable. The middle step proposed here is backwards compatible. The end goal is not. I think we have to accept that the end goal is not compatible because the compatible use case is what we are calling questionable.

Not sure if there are any alternatives to fixing this except doing nothing at all to the implementation and adding documentation saying it's questionable, but that does not address my root concern.

Do you mind explaining what your expectations are for get_header('path-info') because I'm not sure how that wouldn't work, but in any case, let me add a spec for it.

@ioquatix
Copy link
Member Author

ioquatix commented Apr 28, 2022

Ok, never mind about get_header('path-info'), I've fixed that case and added a spec, thanks for pointing it out. By the way, I basically copied the implementation from Rails, so they might have the same bug...

@ioquatix ioquatix modified the milestones: v3.0.0, v3.1.0 Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants