New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HLRC: Fix '+' Not Correctly Encoded in GET Req. #33164
Conversation
* Encode `+` correctly as `%2B` * Closes elastic#33077
Pinging @elastic/es-core-infra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a question.
@@ -1597,7 +1597,7 @@ private static String encodePart(String pathPart) { | |||
//paths that start with `-` or contain `:` | |||
URI uri = new URI(null, null, null, -1, "/" + pathPart, null, null); | |||
//manually encode any slash that each part may contain | |||
return uri.getRawPath().substring(1).replaceAll("/", "%2F"); | |||
return uri.getRawPath().substring(1).replaceAll("/", "%2F").replaceAll("\\+", "%2B"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this manually and piecemeal instead of using URLEncoder
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very peculiar method. See the comments around it for some history, corresponding test here (there's a test specifically for the '+' character) and original PR including review comments. I am not even sure that we need to automatically encode '+', what needs to be encoded should already be encoded and that is why we create the URI instance to encode each part. As far as I remember, path parts are subject to different encoding compared to query_string parameters, and that is what makes this part of the client tricky (also why we don't use URLEncoder
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"what needs to be encoded should already be encoded and that is why we create the URI instance to encode each part."
That cannot be accurate. If I encode the data before calling this, is going to be re-encoded.
So this code should be the responsible for encoding and it should happen only once.
Why are you using URI ? and Not URLEncoder?
} | ||
{ | ||
EndpointBuilder endpointBuilder = new EndpointBuilder().addPathPart("foo+bar"); | ||
assertEquals("/foo+bar", endpointBuilder.build()); | ||
assertEquals("/foo%2Bbar", endpointBuilder.build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember adding this test to make sure that such a change would not sneak in unnoticed. Let's try and figure out whether this is needed. Isn't '+' an accepted character within a path part? I didn't try to repro the original issue but my guts feeling is that we should try and and understand better what the original issue is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@javanna yea now that you mention it, I think the problem is with the code that handles the request, not the request itself.
+
doesn't need to be encoded in a path component.
=> this PR and the issue seem to be missing the actual problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A +
does not need to be encoded by the specifications indeed. However, the issue here is that we treat +
specially, converting it to a
. This is because some browsers encode spaces as +
. See:
elasticsearch/server/src/main/java/org/elasticsearch/rest/RestUtils.java
Lines 166 to 168 in f7a9186
case '+': | |
buf[pos++] = ' '; // "+" -> " " | |
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the actual problem is that if you have an ID that contains a +
and you try to use the HLRC to get a document by that then since we convert +
to space server side then you can not get that document by ID unless you also encode the +
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasontedor yes exactly that is the issue. So do we do the encoding/replacing here to account for the non-standard server behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, is this something that we want to fix/change on the server-side rather than adapting the client then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it is worth us making that change although it's a breaking one so we have to be careful and age it in slowly, and maybe provide a system property for BWC for a time period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Section 2.4 of RFC 3986 is germane here. Since we never treat +
as a sub-component delimiter (unlike, e.g., &
) we do not need to require that it be encoded when it is to be treated as data. Treating a plus as a space is only required by the specification in the query string component of application/x-www-form-urlencoded
requests. Since we do not handle this media type for any requests with bodies anyway, we never need to treat +
as
. Our handling here is just odd, I think it's safe for us to remove this legacy behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im +1 to @jasontedor logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should start with a deprecation in 6.x, and probably a system property introduced there to cut over to the new behavior immediately. Then we can work to remove this in master. We also need to let the reporter on the original issue know how to workaround for today. Finally, we should consider opening an issue with this as our plan for feedback from the community in case there is a compelling reason that we are collectively missing for making this change. An issue will give it more visibility than a PR discussion where the title does not really reveal what we are planning to do here now.
@jasontedor Looks like there already is an issue for this: #5341 |
I also noticed that the IndexRequest incorrectly handles the "+" symbol, is it going to be fixed as a part of this PR or there is a separate one for the IndexRequest? UPDATE: |
Can someone decide on a fix and merge the code? I'm experiencing this exact issue where if I sent a + it is converted to space in the key and breaking linkage between servers. I'm not sure what the correct solution is, but it's definitely broken as is. |
Jenkins test this |
return new Result( | ||
logoutRequest.getID(), SamlNameId.fromXml(getNameID(logoutRequest)), | ||
getSessionIndex(logoutRequest), | ||
relayState == null ? null : URLDecoder.decode(relayState, StandardCharsets.US_ASCII.name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of hacky and I wonder what the correct fix is here. The problem is that we use URLEncode.encode
to encode this String
but use our RestUtils
to decode it when we parse the params from the URL.
This worked out symmetrically when we were handling the +
sign as a space but won't be symmetric when we don't causing tests to fail that put a space in relayState
.
=> makes me wonder if it's even correct to use our RestUtils
parser here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read some brief SO/docs on RelayState and it seems like we should not be doing decoding/encoding on it, but instead storing it as an "opaque object"... I think that it might be a correct assumption to not use the decodeQueryString
here, but I dont know if we should take my word as gospel. Ill let @jasontedor comment further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I have read, URL encoding is correct here, using RestUtils
to decode looks wrong, and so the change here is valid. Let us validate this with @tvernum though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think is correct.
This string is already (mostly) decoded in parseQueryStringAndValidateSignature
above, so any %-encoded entities will be expanded at that point.
If we decode
here we will be double decoding.
Specifically the RelayState of 99%3a
would be encoded on the wire as RelayState=99%253a
.
Then parseQueryStringAndValidateSignature
will parse that back to 99%3a
If we then try to UrlDecoder.decode
that, it would end up at 99:
which is incorrect.
I'll need to look into what we ought to be doing with +
, but I don't think this change is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tvernum I think I figured this out actually and was able to revert this change.
The problem actually came from the assumption that we made here #33164 (comment) (never handling form data). It turns out that the form data encoding logic (+
->
) applies to parameters in the URL. I now made the changes to the logic to decode +
to a space in query params only and this test passes again without any changes to the code here :)
@jasontedor @hub-cap sorry for the long pause here, brought this one back to life now :) |
@jasontedor no worries, there is actually a silver lining to this delay :) (we'd have merged a pretty serious bug here without it I think) This wasn't visible in our tests when I last worked on this, but now we have a few new REST tests that actually run search in line queries with spaces in them and those failed with the changes here. Adjusting the logic to properly encode/decode spaces as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks @jasontedor ! |
* HLRC: Fix '+' Not Correctly Encoded in GET Req. * Encode `+` correctly as `%2B` in URL paths * Keep encoding `+` as space in URL parameters * Closes elastic#33077
* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`
* This was accidentally left at `7.3` but #33164 was merged too late and it should now be `7.4`
* HLRC: Fix '+' Not Correctly Encoded in GET Req. * Encode `+` correctly as `%2B` in URL paths * Keep encoding `+` as space in URL parameters * Closes elastic#33077
* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`
* HLRC: Fix '+' Not Correctly Encoded in GET Req. * Encode `+` correctly as `%2B` in URL paths * Keep encoding `+` as space in URL parameters * Closes elastic#33077
* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`
Is there a fix rolled out in older version of client 5.6.x? If not how can I deal with it? |
@falu2010-netflix I'm afraid this fix will not be back-ported to older versions. |
+
correctly as%2B