Various suspected issues with the `absoluteURI` path in `DefaultRequest.getUri()` #1517

thomaslee · 2020-04-30T10:02:36Z

Below I differentiate between absoluteURI URIs (e.g. http://foo.com/path?bar=baz) and abs_path URIs (e.g. /path?bar=baz). These names were taken from the HTTP 1.1 spec.

The first time it's called for a given DefaultRequest instance, getUri() seems to either accept an abs_path URI unmodified or to convert an absoluteURI style URI into something resembling an abs_path. In the latter case it attempts to parse the absoluteURI using URI.create(rawUri) to separate the path, query and fragment components and then reconstructs the URI from those components.

There seems to be a few issues with aspects of this.

`URI.create(...)` rejects imperfectly encoded URLs

This one is hurting us the most because it's hard to work around, and it's also probably the most controversial point.

A client-side query string encoding bug is preventing a ratpack proxy from accepting requests with non-conformant query strings even though the backend service (and other proxies) can accept and process the exact same request when the ratpack proxy is bypassed.

The reason for this is if the absoluteURI is imperfectly encoded, URI.create(...) will throw and cause a HTTP 500 error. In our case, /some/path?foo={bar} would happily accepted by the backend service but causes a 500 in ratpack due to an IllegalArgumentException due to the unencoded {/} characters.

Such characters are strictly-speaking "not allowed" per the spec but are often happily accepted by HTTP servers. I think ratpack could loosen its URI parsing logic a little and allow these requests through to the backend to decide if they should be rejected or not.

Bad encoding for `uri` when the input `absoluteURI` is correct

Sort of the "reverse" of the above issue: if URI.create(...) succeeds, getUri() will then try to reconstruct an abs_path-style URI from the absoluteURI it was given. It uses the decoded values for the query and fragment in the reconstructed abs_path. The resulting abs_path URI is not properly encoded as a result. This isn't causing trouble for us, but it's not hard to imagine how it might cause confusion.

This is trivially seen with a test containing percent-encoded characters. IMO the following two inputs should give the same output for uri, but that's not the case:

# correct
/a/b?foo=%7babcdef%7d -> /a/b?foo=%7babcdef%7d

# incorrect
http://example.com/a/b?foo=%7babcdef%7d -> /a/b?foo={abcdef}

Fragment "support" should probably be removed

A well-behaved client will not typically pass the fragment along to the server, but the absoluteURI path in getUri() will try to preserve the #fragment part of a URI for some reason. Adding tests for fragments seems to show that preserving the fragment will break things like getQuery and getPath anyway, so I question whether this support is well-vetted or valuable.

Fragments are also seemingly not part of Request-URI in the HTTP spec afaict, but given I'm arguing for less strict parsing elsewhere I'm not going to use that as an argument here. 😉

WIP patch

I have a patch I believe would address most/all of it by loosening up the parsing of absoluteURIs a bit and skipping virtually all of the decoding, but wanted to raise this here first given the finer details are probably up for debate.

The text was updated successfully, but these errors were encountered:

It helps to see Request-URI per the HTTP spec to understand this change: Request-URI = "*" | absoluteURI | abs_path | authority Previously abs_paths would be passed through unmodified while absoluteURIs were parsed and either rejected because of malformed URIs (triggering HTTP 500s) or accepted but with subtle encoding issues. "*" was also not properly handled by getPath(). There were also other implementation-specific quirks like "uri:foo" yielding a path consisting of the "null" string literal. The "authority-only" case doesn't need to be handled because Ratpack does not support the CONNECT method. This change leaves abs_paths alone but tries to make all the other cases more consistent with one another. When in doubt we try to do something predictable. This does, however, change the semantics of a few edge cases. I have kept the commit implementing tests independent of the key change here so the previous semantics and the new semantics can be compared. This is the proposed fix/work-around for ratpack#1517.

thomaslee mentioned this issue May 14, 2020

A more consistent DefaultRequest.getUri() #1518

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Various suspected issues with the `absoluteURI` path in `DefaultRequest.getUri()` #1517

Various suspected issues with the `absoluteURI` path in `DefaultRequest.getUri()` #1517

thomaslee commented Apr 30, 2020

Various suspected issues with the absoluteURI path in DefaultRequest.getUri() #1517

Various suspected issues with the absoluteURI path in DefaultRequest.getUri() #1517

Comments

thomaslee commented Apr 30, 2020

URI.create(...) rejects imperfectly encoded URLs

Bad encoding for uri when the input absoluteURI is correct

Fragment "support" should probably be removed

WIP patch

Various suspected issues with the `absoluteURI` path in `DefaultRequest.getUri()` #1517

Various suspected issues with the `absoluteURI` path in `DefaultRequest.getUri()` #1517

`URI.create(...)` rejects imperfectly encoded URLs

Bad encoding for `uri` when the input `absoluteURI` is correct