Ensure that multiline input doesn't cause Rack::Request to break #1607

pvande · 2020-02-16T19:28:21Z

As written, the AUTHORITY regex intends to match the entire input string, and the sole usage of it expects that it will always return a match. After some consideration, there is a way to feed it input that will not match, causing errors on the subsequent line.

The issue at hand is that . in a regular expression matches everything except newlines by default, meaning that even the most generous regular expression with start- and end-of-string anchors (/\A.*\z/) cannot match a multiline string. The solution, then, is to use the m (multiline) flag on the regex, which allows . to match newlines as well.

This change also adds a test for the behavior of the \Z anchor (end-of-chomped-string) vs the \z anchor (actual end-of-string).

As written, the `AUTHORITY` regex intends to match the entire input string, and the sole usage of it expects that it will always return a match. After some consideration, there *is* a way to feed it input that will not match, causing errors on the subsequent line. The issue at hand is that `.` in a regular expression matches everything *except newlines* by default, meaning that even the most generous regular expression with start- and end-of-string anchors (`/\A.*\z/`) cannot match a multiline string. The solution, then, is to use the `m` (multiline) flag on the regex, which allows `.` to match newlines as well.

jeremyevans · 2020-02-16T19:37:23Z

I'm not sure if we want to accept hosts that are definitely incorrect. More relaxing parsing than the RFC requires may be fine, but accepting newlines may be going a bit too far. Are there clients are actually sending such hosts?

Is the current behavior when such hosts are given an exception? If so, that sounds wrong, we should probably just skip that authority in such cases and use the next authority.

pvande · 2020-02-16T22:42:42Z

I have no idea whether there are clients sending such hosts, and I would consider them broken if there were. (You might be able to fudge it through curl.) It can absolutely happen through the Ruby API, though.

I took this approach is because the current implementation assumes no newlines (and throws exceptions otherwise), and it’s backwards-compatible with the previous implementation.

I think that using fallback authorities sounds great in principle, but would break backwards compatibility in subtle ways. As someone who lost over half a day to debugging why a trivial Rails upgrade was causing asset serving to fail (tl;dr: Webpacker depends on Rack::Proxy (neither updated), which uses Rack::Request#host to determine which server to proxy to, and the Rails version bump also bumped Rack, returning different results in my environment), I find maintaining backwards compatibility fair important, even when it’s “incorrect”.

ioquatix · 2020-02-16T23:12:16Z

How would you send a host header including newlines with HTTP/1? I think it’s invalid.

pvande · 2020-02-16T23:25:11Z

@ioquatix There's no question that it's invalid, and you're very likely correct that it cannot be sent over HTTP/1. Rack::Request does get used sometimes with data that doesn't come directly from an HTTP request, however, and retaining newlines is important for maintaining compatibility for that case, despite the absurdity.

To be clear, I really only see this as an issue for the 2.x line of releases – it seems completely reasonable to do more/better validations in a new major version, including:

IPv6 address contains non-hex characters
IPv6 address contains more than one elision (::)
IPv6 address segment contains more that four digits
IPv6 address contains more than eight segments
IPv6 address contains invalid IPv4 address
IPv4 address does not contain four segments
IPv4 segment contains a number greater than 255
DNS name contains a newline
DNS name contains a colon

ioquatix · 2020-02-16T23:32:04Z

Fair enough I get your point. Previously the parser was very loose. But now it’s a bit more strict. The key point is do we have some definition of what a host is, as in Request#host. My internal definition is a string that fits the authority/host part of an absolute URI. I don’t think it was documented or specced clearly but under this definition we shouldn’t accept newline characters (or any non-printing characters actually). Because #host should return a string suitable for making a URI, as in other methods of Request.

pvande · 2020-02-16T23:42:37Z

#host should return a string suitable for making a URI…

That’s also a completely fair point, and I suspect that’s also a fairly common expectation of the contract (if not its use in practice). Given that, the previous behavior is obviously a bug, and anyone who depended on that behavior would most likely have ended up with a non-functional URI. I’ll update the PR accordingly as soon as my hands are free. 👍

…

On Sun, Feb 16, 2020 at 3:32 PM Samuel Williams ***@***.***> wrote: Fair enough I get your point. Previously the parser was very loose. But now it’s a bit more strict. The key point is do we have some definition of what a host is, as in Request#host. My internal definition is a strong that fits the authority/host part of an absolute URI. I don’t think it was documented or specced clearly but under this definition we shouldn’t accept newline characters (or any non-printing characters actually). Because #host should return a string suitable for making a URI, as in other methods of Request. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1607?email_source=notifications&email_token=AAAASLQQF4MBLBV3MS4JH6TRDHEHLA5CNFSM4KWF6YT2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEL4V7CY#issuecomment-586768267>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAASLQK2XNWN6SGOOL4FG3RDHEHLANCNFSM4KWF6YTQ> .

If the address cannot reasonably be used to reconstruct a URI, we return nil. This includes generous capture patterns for IPv6 and IPv4 addresses, with validation of the captured data happening as a secondary pass. Additionally added some testing with extreme examples of valid inputs, and a few invalid ones.

pvande · 2020-02-17T07:19:02Z

The latest commit narrows acceptable inputs somewhat, and performs some additional validations on the machine-facing classes.

Any address containing blank characters is rejected.
Any address wrapped in square brackets is captured as an IPv6 address.
Any address comprised entirely of decimal digits and dots is captured as an IPv4 address.
Any other address is captured as a hostname.
IPv6 and IPv4 captures are tested against the appropriate regular expression from resolv (IPv6, IPv4), and are rejected if they do not match.
- This seemed appropriate given the machine-facing nature of these addresses, the relaxed constraints implied by the regular expressions, and the goal of ultimately producing functional URIs.

ioquatix · 2020-02-17T07:48:41Z

The URI RFCs specifically allow future versions of ip addresses in square brackets. Do we handle this correctly?

pvande · 2020-02-17T16:29:36Z

Great question!

IP-literal = "[" ( IPv6address / IPvFuture ) "]" IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" ) If a URI containing an IP-literal that starts with "v"

(case-insensitive), indicating that the version flag is present, is dereferenced by an application that does not know the meaning of that version flag, then the application should return an appropriate error for "address mechanism not supported". This PR captures anything wrapped in square brackets as IPv6, but it matches *anything* (non-blank). Addresses captured as IPv6 are validated as IPv6, which will fail for IPvFuture and return nil. ~As we cannot currently purport to understand or validate a future version of the spec, this seems to be a credible interpretation of the RFC.~ **Actually, nevermind that – we don't need to _interpret_ or _valdiate_ an IPvFuture address to be able to parse it. I'll get another change in.** In the future, when we do need to add compatibility with an iteration of IPvFuture, we can prepend a clause specifically for that `v` prefix, and validate it independently.

…

On Sun, Feb 16, 2020 at 11:48 PM Samuel Williams ***@***.***> wrote: The URI RFCs specifically allow future versions of ip addresses in square brackets. Do we handle this correctly? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1607?email_source=notifications&email_token=AAAASLUPZYMUYYHPOXAOGV3RDI6NVA5CNFSM4KWF6YT2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEL5MCTQ#issuecomment-586858830>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAASLQYRXTJOYF6BG2JQA3RDI6NVANCNFSM4KWF6YTQ> .

ioquatix · 2020-05-24T15:07:30Z

This in theory looks good to me but I need to do a closer review. Can you please rebase on master?

jeremyevans

Sorry for the long delay in reviewing this.

I don't think this warrants requiring the entire resolv library if we are just using a couple regexps in it. Can we copy the regexp constants and define them in lib/rack/request.rb, so we don't need to require resolv? It may be possible to inline the IPv6/IPv4 regexps into the AUTHORITY regexp so we don't need the separate checks for it in split_authority. If so, that would be best.

ioquatix · 2022-01-20T08:51:45Z

I agree with Jeremy on this point, and also just for the sake of efficiency, a single Regexp should be enough.

Tighten up IPv6 parsing rules using regexp extracted from resolv in stdlib, simplified to avoid creating additional groups. Tighten up hostname matching to graphical characters, except square brackets (so it doesn't overlap with IPv6 parsing). Avoid unnecessary IPv4 matching, since anything that matches as an IPv4 address would match as a hostname. Remove unnecessary named group creation. Don't allow trailing newlines in host names. Fixes rack#1607 Co-authored-by: Pieter van de Bruggen <pvande@gmail.com>

jeremyevans · 2022-03-21T19:16:19Z

I did another review of this and I don't think all of this complexity is needed. Handling IPvFuture seems unnecessary. The odds are low that rack will have to deal with anything beyond IPv6. I don't think it makes sense to allow trailing newlines in hostnames. We don't actually use the IPv4 vs. hostname distinction, so that can be removed. We can tighten the IPv6 parsing by inlining the resolv values, so this doesn't need to depend on resolv at runtime. I submitted a pull request for my recommended approach: #1835

Tighten up IPv6 parsing rules using regexp extracted from resolv in stdlib, simplified to avoid creating additional groups. Tighten up hostname matching to graphical characters, except square brackets (so it doesn't overlap with IPv6 parsing). Avoid unnecessary IPv4 matching, since anything that matches as an IPv4 address would match as a hostname. Remove unnecessary named group creation. Don't allow trailing newlines in host names. Fixes #1607 Co-authored-by: Pieter van de Bruggen <pvande@gmail.com>

pvande requested review from ioquatix and jeremyevans February 16, 2020 19:28

Ensure that syntactically valid IPvFuture addresses are passed through

d686fb6

jeremyevans requested changes Jul 13, 2020

View reviewed changes

jeremyevans mentioned this pull request Mar 21, 2022

Tighten authority matching #1835

Merged

jeremyevans closed this in #1835 Mar 21, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure that multiline input doesn't cause Rack::Request to break #1607

Ensure that multiline input doesn't cause Rack::Request to break #1607

pvande commented Feb 16, 2020

jeremyevans commented Feb 16, 2020

pvande commented Feb 16, 2020 •

edited

Loading

ioquatix commented Feb 16, 2020

pvande commented Feb 16, 2020

ioquatix commented Feb 16, 2020 •

edited

Loading

pvande commented Feb 16, 2020 via email

pvande commented Feb 17, 2020

ioquatix commented Feb 17, 2020

pvande commented Feb 17, 2020 via email •

edited

Loading

ioquatix commented May 24, 2020

jeremyevans left a comment

ioquatix commented Jan 20, 2022

jeremyevans commented Mar 21, 2022

Ensure that multiline input doesn't cause Rack::Request to break #1607

Ensure that multiline input doesn't cause Rack::Request to break #1607

Conversation

pvande commented Feb 16, 2020

jeremyevans commented Feb 16, 2020

pvande commented Feb 16, 2020 • edited Loading

ioquatix commented Feb 16, 2020

pvande commented Feb 16, 2020

ioquatix commented Feb 16, 2020 • edited Loading

pvande commented Feb 16, 2020 via email

pvande commented Feb 17, 2020

ioquatix commented Feb 17, 2020

pvande commented Feb 17, 2020 via email • edited Loading

ioquatix commented May 24, 2020

jeremyevans left a comment

Choose a reason for hiding this comment

ioquatix commented Jan 20, 2022

jeremyevans commented Mar 21, 2022

pvande commented Feb 16, 2020 •

edited

Loading

ioquatix commented Feb 16, 2020 •

edited

Loading

pvande commented Feb 17, 2020 via email •

edited

Loading