-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that multiline input doesn't cause Rack::Request to break #1607
Conversation
As written, the `AUTHORITY` regex intends to match the entire input string, and the sole usage of it expects that it will always return a match. After some consideration, there *is* a way to feed it input that will not match, causing errors on the subsequent line. The issue at hand is that `.` in a regular expression matches everything *except newlines* by default, meaning that even the most generous regular expression with start- and end-of-string anchors (`/\A.*\z/`) cannot match a multiline string. The solution, then, is to use the `m` (multiline) flag on the regex, which allows `.` to match newlines as well.
I'm not sure if we want to accept hosts that are definitely incorrect. More relaxing parsing than the RFC requires may be fine, but accepting newlines may be going a bit too far. Are there clients are actually sending such hosts? Is the current behavior when such hosts are given an exception? If so, that sounds wrong, we should probably just skip that authority in such cases and use the next authority. |
I have no idea whether there are clients sending such hosts, and I would consider them broken if there were. (You might be able to fudge it through I took this approach is because the current implementation assumes no newlines (and throws exceptions otherwise), and it’s backwards-compatible with the previous implementation. I think that using fallback authorities sounds great in principle, but would break backwards compatibility in subtle ways. As someone who lost over half a day to debugging why a trivial Rails upgrade was causing asset serving to fail (tl;dr: Webpacker depends on Rack::Proxy (neither updated), which uses Rack::Request#host to determine which server to proxy to, and the Rails version bump also bumped Rack, returning different results in my environment), I find maintaining backwards compatibility fair important, even when it’s “incorrect”. |
How would you send a host header including newlines with HTTP/1? I think it’s invalid. |
@ioquatix There's no question that it's invalid, and you're very likely correct that it cannot be sent over HTTP/1. To be clear, I really only see this as an issue for the 2.x line of releases – it seems completely reasonable to do more/better validations in a new major version, including:
|
Fair enough I get your point. Previously the parser was very loose. But now it’s a bit more strict. The key point is do we have some definition of what a host is, as in |
#host should return a string suitable for making a URI…
That’s also a completely fair point, and I suspect that’s also a fairly
common expectation of the contract (if not its use in practice). Given
that, the previous behavior is obviously a bug, and anyone who depended on
that behavior would most likely have ended up with a non-functional URI.
I’ll update the PR accordingly as soon as my hands are free. 👍
…On Sun, Feb 16, 2020 at 3:32 PM Samuel Williams ***@***.***> wrote:
Fair enough I get your point. Previously the parser was very loose. But
now it’s a bit more strict. The key point is do we have some definition of
what a host is, as in Request#host. My internal definition is a strong
that fits the authority/host part of an absolute URI. I don’t think it was
documented or specced clearly but under this definition we shouldn’t accept
newline characters (or any non-printing characters actually). Because
#host should return a string suitable for making a URI, as in other
methods of Request.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1607?email_source=notifications&email_token=AAAASLQQF4MBLBV3MS4JH6TRDHEHLA5CNFSM4KWF6YT2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEL4V7CY#issuecomment-586768267>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAASLQK2XNWN6SGOOL4FG3RDHEHLANCNFSM4KWF6YTQ>
.
|
If the address cannot reasonably be used to reconstruct a URI, we return nil. This includes generous capture patterns for IPv6 and IPv4 addresses, with validation of the captured data happening as a secondary pass. Additionally added some testing with extreme examples of valid inputs, and a few invalid ones.
The latest commit narrows acceptable inputs somewhat, and performs some additional validations on the machine-facing classes.
|
The URI RFCs specifically allow future versions of ip addresses in square brackets. Do we handle this correctly? |
Great question!
IP-literal = "[" ( IPv6address / IPvFuture ) "]"
IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
If a URI containing an IP-literal that starts with "v"
(case-insensitive), indicating that the version flag is present, is
dereferenced by an application that does not know the meaning of that
version flag, then the application should return an appropriate error for
"address mechanism not supported".
This PR captures anything wrapped in square brackets as IPv6, but it
matches *anything* (non-blank). Addresses captured as IPv6 are validated
as IPv6, which will fail for IPvFuture and return nil. ~As we cannot
currently purport to understand or validate a future version of the spec,
this seems to be a credible interpretation of the RFC.~ **Actually, nevermind that – we don't need to _interpret_ or _valdiate_ an IPvFuture address to be able to parse it. I'll get another change in.**
In the future, when we do need to add compatibility with an iteration of
IPvFuture, we can prepend a clause specifically for that `v` prefix, and
validate it independently.
…On Sun, Feb 16, 2020 at 11:48 PM Samuel Williams ***@***.***> wrote:
The URI RFCs specifically allow future versions of ip addresses in square
brackets. Do we handle this correctly?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1607?email_source=notifications&email_token=AAAASLUPZYMUYYHPOXAOGV3RDI6NVA5CNFSM4KWF6YT2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEL5MCTQ#issuecomment-586858830>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAASLQYRXTJOYF6BG2JQA3RDI6NVANCNFSM4KWF6YTQ>
.
|
This in theory looks good to me but I need to do a closer review. Can you please rebase on master? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay in reviewing this.
I don't think this warrants requiring the entire resolv library if we are just using a couple regexps in it. Can we copy the regexp constants and define them in lib/rack/request.rb, so we don't need to require resolv? It may be possible to inline the IPv6/IPv4 regexps into the AUTHORITY regexp so we don't need the separate checks for it in split_authority. If so, that would be best.
I agree with Jeremy on this point, and also just for the sake of efficiency, a single Regexp should be enough. |
Tighten up IPv6 parsing rules using regexp extracted from resolv in stdlib, simplified to avoid creating additional groups. Tighten up hostname matching to graphical characters, except square brackets (so it doesn't overlap with IPv6 parsing). Avoid unnecessary IPv4 matching, since anything that matches as an IPv4 address would match as a hostname. Remove unnecessary named group creation. Don't allow trailing newlines in host names. Fixes rack#1607 Co-authored-by: Pieter van de Bruggen <pvande@gmail.com>
I did another review of this and I don't think all of this complexity is needed. Handling IPvFuture seems unnecessary. The odds are low that rack will have to deal with anything beyond IPv6. I don't think it makes sense to allow trailing newlines in hostnames. We don't actually use the IPv4 vs. hostname distinction, so that can be removed. We can tighten the IPv6 parsing by inlining the resolv values, so this doesn't need to depend on resolv at runtime. I submitted a pull request for my recommended approach: #1835 |
Tighten up IPv6 parsing rules using regexp extracted from resolv in stdlib, simplified to avoid creating additional groups. Tighten up hostname matching to graphical characters, except square brackets (so it doesn't overlap with IPv6 parsing). Avoid unnecessary IPv4 matching, since anything that matches as an IPv4 address would match as a hostname. Remove unnecessary named group creation. Don't allow trailing newlines in host names. Fixes #1607 Co-authored-by: Pieter van de Bruggen <pvande@gmail.com>
As written, the
AUTHORITY
regex intends to match the entire input string, and the sole usage of it expects that it will always return a match. After some consideration, there is a way to feed it input that will not match, causing errors on the subsequent line.The issue at hand is that
.
in a regular expression matches everything except newlines by default, meaning that even the most generous regular expression with start- and end-of-string anchors (/\A.*\z/
) cannot match a multiline string. The solution, then, is to use them
(multiline) flag on the regex, which allows.
to match newlines as well.This change also adds a test for the behavior of the
\Z
anchor (end-of-chomped-string) vs the\z
anchor (actual end-of-string).