You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Below I differentiate between absoluteURI URIs (e.g. http://foo.com/path?bar=baz) and abs_path URIs (e.g. /path?bar=baz). These names were taken from the HTTP 1.1 spec.
The first time it's called for a given DefaultRequest instance, getUri() seems to either accept an abs_path URI unmodified or to convert an absoluteURI style URI into something resembling an abs_path. In the latter case it attempts to parse the absoluteURI using URI.create(rawUri) to separate the path, query and fragment components and then reconstructs the URI from those components.
There seems to be a few issues with aspects of this.
URI.create(...) rejects imperfectly encoded URLs
This one is hurting us the most because it's hard to work around, and it's also probably the most controversial point.
A client-side query string encoding bug is preventing a ratpack proxy from accepting requests with non-conformant query strings even though the backend service (and other proxies) can accept and process the exact same request when the ratpack proxy is bypassed.
The reason for this is if the absoluteURI is imperfectly encoded, URI.create(...) will throw and cause a HTTP 500 error. In our case, /some/path?foo={bar} would happily accepted by the backend service but causes a 500 in ratpack due to an IllegalArgumentException due to the unencoded {/} characters.
Such characters are strictly-speaking "not allowed" per the spec but are often happily accepted by HTTP servers. I think ratpack could loosen its URI parsing logic a little and allow these requests through to the backend to decide if they should be rejected or not.
Bad encoding for uri when the input absoluteURI is correct
Sort of the "reverse" of the above issue: if URI.create(...) succeeds, getUri() will then try to reconstruct an abs_path-style URI from the absoluteURI it was given. It uses the decoded values for the query and fragment in the reconstructed abs_path. The resulting abs_path URI is not properly encoded as a result. This isn't causing trouble for us, but it's not hard to imagine how it might cause confusion.
This is trivially seen with a test containing percent-encoded characters. IMO the following two inputs should give the same output for uri, but that's not the case:
A well-behaved client will not typically pass the fragment along to the server, but the absoluteURI path in getUri() will try to preserve the #fragment part of a URI for some reason. Adding tests for fragments seems to show that preserving the fragment will break things like getQuery and getPath anyway, so I question whether this support is well-vetted or valuable.
Fragments are also seemingly not part of Request-URI in the HTTP spec afaict, but given I'm arguing for less strict parsing elsewhere I'm not going to use that as an argument here. 😉
WIP patch
I have a patch I believe would address most/all of it by loosening up the parsing of absoluteURIs a bit and skipping virtually all of the decoding, but wanted to raise this here first given the finer details are probably up for debate.
The text was updated successfully, but these errors were encountered:
thomaslee
added a commit
to thomaslee/ratpack
that referenced
this issue
May 14, 2020
It helps to see Request-URI per the HTTP spec to understand this
change:
Request-URI = "*" | absoluteURI | abs_path | authority
Previously abs_paths would be passed through unmodified while
absoluteURIs were parsed and either rejected because of malformed
URIs (triggering HTTP 500s) or accepted but with subtle encoding
issues.
"*" was also not properly handled by getPath(). There were also
other implementation-specific quirks like "uri:foo" yielding a
path consisting of the "null" string literal.
The "authority-only" case doesn't need to be handled because
Ratpack does not support the CONNECT method.
This change leaves abs_paths alone but tries to make all the
other cases more consistent with one another. When in doubt
we try to do something predictable. This does, however,
change the semantics of a few edge cases. I have kept the
commit implementing tests independent of the key change
here so the previous semantics and the new semantics can
be compared.
This is the proposed fix/work-around for ratpack#1517.
Below I differentiate between
absoluteURI
URIs (e.g.http://foo.com/path?bar=baz
) andabs_path
URIs (e.g./path?bar=baz
). These names were taken from the HTTP 1.1 spec.The first time it's called for a given
DefaultRequest
instance,getUri()
seems to either accept anabs_path
URI unmodified or to convert anabsoluteURI
style URI into something resembling anabs_path
. In the latter case it attempts to parse theabsoluteURI
usingURI.create(rawUri)
to separate the path, query and fragment components and then reconstructs the URI from those components.There seems to be a few issues with aspects of this.
URI.create(...)
rejects imperfectly encoded URLsThis one is hurting us the most because it's hard to work around, and it's also probably the most controversial point.
A client-side query string encoding bug is preventing a ratpack proxy from accepting requests with non-conformant query strings even though the backend service (and other proxies) can accept and process the exact same request when the ratpack proxy is bypassed.
The reason for this is if the
absoluteURI
is imperfectly encoded,URI.create(...)
will throw and cause a HTTP 500 error. In our case,/some/path?foo={bar}
would happily accepted by the backend service but causes a 500 in ratpack due to anIllegalArgumentException
due to the unencoded{
/}
characters.Such characters are strictly-speaking "not allowed" per the spec but are often happily accepted by HTTP servers. I think ratpack could loosen its URI parsing logic a little and allow these requests through to the backend to decide if they should be rejected or not.
Bad encoding for
uri
when the inputabsoluteURI
is correctSort of the "reverse" of the above issue: if
URI.create(...)
succeeds,getUri()
will then try to reconstruct anabs_path
-style URI from theabsoluteURI
it was given. It uses the decoded values for the query and fragment in the reconstructedabs_path
. The resultingabs_path
URI is not properly encoded as a result. This isn't causing trouble for us, but it's not hard to imagine how it might cause confusion.This is trivially seen with a test containing percent-encoded characters. IMO the following two inputs should give the same output for
uri
, but that's not the case:Fragment "support" should probably be removed
A well-behaved client will not typically pass the fragment along to the server, but the
absoluteURI
path ingetUri()
will try to preserve the#fragment
part of a URI for some reason. Adding tests for fragments seems to show that preserving the fragment will break things likegetQuery
andgetPath
anyway, so I question whether this support is well-vetted or valuable.Fragments are also seemingly not part of
Request-URI
in the HTTP spec afaict, but given I'm arguing for less strict parsing elsewhere I'm not going to use that as an argument here. 😉WIP patch
I have a patch I believe would address most/all of it by loosening up the parsing of
absoluteURIs
a bit and skipping virtually all of the decoding, but wanted to raise this here first given the finer details are probably up for debate.The text was updated successfully, but these errors were encountered: