Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various suspected issues with the absoluteURI path in DefaultRequest.getUri() #1517

Open
thomaslee opened this issue Apr 30, 2020 · 0 comments

Comments

@thomaslee
Copy link

Below I differentiate between absoluteURI URIs (e.g. http://foo.com/path?bar=baz) and abs_path URIs (e.g. /path?bar=baz). These names were taken from the HTTP 1.1 spec.

The first time it's called for a given DefaultRequest instance, getUri() seems to either accept an abs_path URI unmodified or to convert an absoluteURI style URI into something resembling an abs_path. In the latter case it attempts to parse the absoluteURI using URI.create(rawUri) to separate the path, query and fragment components and then reconstructs the URI from those components.

There seems to be a few issues with aspects of this.

URI.create(...) rejects imperfectly encoded URLs

This one is hurting us the most because it's hard to work around, and it's also probably the most controversial point.

A client-side query string encoding bug is preventing a ratpack proxy from accepting requests with non-conformant query strings even though the backend service (and other proxies) can accept and process the exact same request when the ratpack proxy is bypassed.

The reason for this is if the absoluteURI is imperfectly encoded, URI.create(...) will throw and cause a HTTP 500 error. In our case, /some/path?foo={bar} would happily accepted by the backend service but causes a 500 in ratpack due to an IllegalArgumentException due to the unencoded {/} characters.

Such characters are strictly-speaking "not allowed" per the spec but are often happily accepted by HTTP servers. I think ratpack could loosen its URI parsing logic a little and allow these requests through to the backend to decide if they should be rejected or not.

Bad encoding for uri when the input absoluteURI is correct

Sort of the "reverse" of the above issue: if URI.create(...) succeeds, getUri() will then try to reconstruct an abs_path-style URI from the absoluteURI it was given. It uses the decoded values for the query and fragment in the reconstructed abs_path. The resulting abs_path URI is not properly encoded as a result. This isn't causing trouble for us, but it's not hard to imagine how it might cause confusion.

This is trivially seen with a test containing percent-encoded characters. IMO the following two inputs should give the same output for uri, but that's not the case:

# correct
/a/b?foo=%7babcdef%7d -> /a/b?foo=%7babcdef%7d

# incorrect
http://example.com/a/b?foo=%7babcdef%7d -> /a/b?foo={abcdef}

Fragment "support" should probably be removed

A well-behaved client will not typically pass the fragment along to the server, but the absoluteURI path in getUri() will try to preserve the #fragment part of a URI for some reason. Adding tests for fragments seems to show that preserving the fragment will break things like getQuery and getPath anyway, so I question whether this support is well-vetted or valuable.

Fragments are also seemingly not part of Request-URI in the HTTP spec afaict, but given I'm arguing for less strict parsing elsewhere I'm not going to use that as an argument here. 😉

WIP patch

I have a patch I believe would address most/all of it by loosening up the parsing of absoluteURIs a bit and skipping virtually all of the decoding, but wanted to raise this here first given the finer details are probably up for debate.

thomaslee added a commit to thomaslee/ratpack that referenced this issue May 14, 2020
It helps to see Request-URI per the HTTP spec to understand this
change:

Request-URI    = "*" | absoluteURI | abs_path | authority

Previously abs_paths would be passed through unmodified while
absoluteURIs were parsed and either rejected because of malformed
URIs (triggering HTTP 500s) or accepted but with subtle encoding
issues.

"*" was also not properly handled by getPath(). There were also
other implementation-specific quirks like "uri:foo" yielding a
path consisting of the "null" string literal.

The "authority-only" case doesn't need to be handled because
Ratpack does not support the CONNECT method.

This change leaves abs_paths alone but tries to make all the
other cases more consistent with one another. When in doubt
we try to do something predictable. This does, however,
change the semantics of a few edge cases. I have kept the
commit implementing tests independent of the key change
here so the previous semantics and the new semantics can
be compared.

This is the proposed fix/work-around for ratpack#1517.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant