HLRC: Fix '+' Not Correctly Encoded in GET Req. #33164

Merged
merged 24 commits into from Jul 15, 2019

Conversation

original-brownbear
Member

* Encode `+` correctly as `%2B`
* Closes elastic#33077
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

Member

@jasontedor jasontedor left a comment

I left a question.

@@ -1597,7 +1597,7 @@ private static String encodePart(String pathPart) {
 //paths that start with `-` or contain `:`
 URI uri = new URI(null, null, null, -1, "/" + pathPart, null, null);
 //manually encode any slash that each part may contain
-return uri.getRawPath().substring(1).replaceAll("/", "%2F");
+return uri.getRawPath().substring(1).replaceAll("/", "%2F").replaceAll("\\+", "%2B");
Member

Why are we doing this manually and piecemeal instead of using URLEncoder?

Member

This is a very peculiar method. See the comments around it for some history, the corresponding test here (there is a test specifically for the '+' character), and the original PR including its review comments. I am not even sure that we need to automatically encode '+': what needs to be encoded should already be encoded, and that is why we create the URI instance to encode each part. As far as I remember, path parts are subject to different encoding rules than query_string parameters, which is what makes this part of the client tricky (and also why we don't use URLEncoder).

"what needs to be encoded should already be encoded and that is why we create the URI instance to encode each part."

That cannot be accurate. If I encode the data before calling this, it is going to be re-encoded.
So this code should be responsible for the encoding, and it should happen only once.

Why are you using URI and not URLEncoder?
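
To make the URI versus URLEncoder question concrete, here is a minimal sketch that runs both on a single path part. The class and method names (`PathPartEncodingSketch`, `encodeWithUri`, `encodeWithUriAndPlus`, `encodeWithUrlEncoder`) are invented for illustration; only the `URI` constructor call and the `replaceAll` chain are taken from the diff above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class PathPartEncodingSketch {

    // Path-part encoding as it appears before the change in the diff above.
    static String encodeWithUri(String pathPart) {
        try {
            URI uri = new URI(null, null, null, -1, "/" + pathPart, null, null);
            // Manually encode any slash the part may contain.
            return uri.getRawPath().substring(1).replaceAll("/", "%2F");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Path part [" + pathPart + "] could not be encoded", e);
        }
    }

    // Same as above, plus the extra '+' replacement proposed in the diff.
    static String encodeWithUriAndPlus(String pathPart) {
        return encodeWithUri(pathPart).replaceAll("\\+", "%2B");
    }

    // Form-style encoding (application/x-www-form-urlencoded), for comparison only.
    static String encodeWithUrlEncoder(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        for (String part : new String[] {"foo+bar", "foo bar", "foo/bar", "foo%2Bbar"}) {
            System.out.println(part
                + " | URI: " + encodeWithUri(part)
                + " | URI and plus: " + encodeWithUriAndPlus(part)
                + " | URLEncoder: " + encodeWithUrlEncoder(part));
        }
    }
}
```

The space case is where the two approaches diverge: `URLEncoder` follows the form-encoding convention and emits `+`, while the `URI` constructor emits `%20`. The last input illustrates the re-encoding concern: the multi-argument `URI` constructors always quote `%`, so a part that is already percent-encoded gets encoded a second time.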

 }
 {
 EndpointBuilder endpointBuilder = new EndpointBuilder().addPathPart("foo+bar");
-assertEquals("/foo+bar", endpointBuilder.build());
+assertEquals("/foo%2Bbar", endpointBuilder.build());
Member

I remember adding this test to make sure that such a change would not sneak in unnoticed. Let's try and figure out whether this is needed. Isn't '+' an accepted character within a path part? I didn't try to repro the original issue, but my gut feeling is that we should try to understand better what the original issue is.

Member Author

@javanna yea now that you mention it, I think the problem is with the code that handles the request, not the request itself.
+ doesn't need to be encoded in a path component.

=> this PR and the issue seem to be missing the actual problem?

Member

A + does not need to be encoded according to the specifications, indeed. However, the issue here is that we treat + specially, converting it to a space. This is because some browsers encode spaces as +. See:

case '+':
buf[pos++] = ' '; // "+" -> " "
break;

Member

I think the actual problem is this: if you have an ID that contains a + and you try to use the HLRC to get a document by that ID, you cannot get the document unless you also encode the +, because we convert + to a space on the server side.
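
The round trip described here can be sketched as follows. This is only an illustration under stated assumptions: `java.net.URLDecoder` stands in for the server-side decoding (it also turns `+` into a space), and the class and method names are hypothetical, not the actual server code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PlusRoundTripSketch {

    // Stand-in for a decoder that treats '+' as a space, like the snippet quoted above.
    static String decodePlusAsSpace(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String documentId = "foo+bar";

        // The client leaves '+' as-is in the path, which the specification allows.
        String seenByServer = decodePlusAsSpace("foo+bar");
        System.out.println(documentId.equals(seenByServer)); // false: the server sees "foo bar"

        // The client sends %2B instead, so the decoder restores the literal '+'.
        String seenWhenEncoded = decodePlusAsSpace("foo%2Bbar");
        System.out.println(documentId.equals(seenWhenEncoded)); // true
    }
}
```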

Member Author

@jasontedor yes exactly that is the issue. So do we do the encoding/replacing here to account for the non-standard server behaviour?

Member

interesting, is this something that we want to fix/change on the server-side rather than adapting the client then?

Member

Yes, I think it is worth us making that change although it's a breaking one so we have to be careful and age it in slowly, and maybe provide a system property for BWC for a time period.

Member

Section 2.4 of RFC 3986 is germane here. Since we never treat + as a sub-component delimiter (unlike, e.g., &) we do not need to require that it be encoded when it is to be treated as data. Treating a plus as a space is only required by the specification in the query string component of application/x-www-form-urlencoded requests. Since we do not handle this media type for any requests with bodies anyway, we never need to treat + as a space. Our handling here is just odd; I think it's safe for us to remove this legacy behavior.

Contributor

I'm +1 on @jasontedor's logic here.

Member

@jasontedor jasontedor left a comment

I think that we should start with a deprecation in 6.x, and probably a system property introduced there to cut over to the new behavior immediately. Then we can work to remove this in master. We also need to let the reporter on the original issue know how to work around this for today. Finally, we should consider opening an issue with this as our plan, to get feedback from the community in case there is a compelling reason against this change that we are collectively missing. An issue will give it more visibility than a PR discussion where the title does not really reveal what we are planning to do here now.
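
A system-property switch of the kind proposed here could look roughly like the sketch below. Everything in it is an assumption made for illustration: the property name `example.rest.plus_as_space_in_path` is invented, the default of keeping the legacy behaviour follows the 6.x plan described above rather than any merged code, and `URLDecoder` again stands in for the real decoding logic.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PlusAsSpaceToggleSketch {

    // Hypothetical property name, invented for this sketch.
    static final String PROPERTY = "example.rest.plus_as_space_in_path";

    // Assumed default: keep the legacy behaviour unless the property is set to "false".
    static boolean plusAsSpaceInPath() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
    }

    static String decodePath(String rawPath) {
        // With the new behaviour, shield '+' so that only %XX sequences are decoded.
        String input = plusAsSpaceInPath() ? rawPath : rawPath.replace("+", "%2B");
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + decodePath("foo+bar") + "]"); // legacy behaviour by default
        System.setProperty(PROPERTY, "false");
        System.out.println("[" + decodePath("foo+bar") + "]"); // opted in to the new behaviour
    }
}
```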

@original-brownbear
Member Author

@jasontedor Looks like there already is an issue for this: #5341

@casper1149

casper1149 commented Sep 5, 2018

I also noticed that the IndexRequest incorrectly handles the "+" symbol. Is it going to be fixed as part of this PR, or is there a separate one for the IndexRequest?

UPDATE:
ok, seems like this PR updates the RequestConverters and so both Index and Get requests will be fixed

@colings86 colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018
@Selikoff

Selikoff commented Nov 1, 2018

Can someone decide on a fix and merge the code? I'm experiencing this exact issue: if I send a +, it is converted to a space in the key, which breaks linkage between servers. I'm not sure what the correct solution is, but it's definitely broken as is.

@original-brownbear
Member Author

Jenkins test this

return new Result(
logoutRequest.getID(), SamlNameId.fromXml(getNameID(logoutRequest)),
getSessionIndex(logoutRequest),
relayState == null ? null : URLDecoder.decode(relayState, StandardCharsets.US_ASCII.name())
Member Author

This is kind of hacky and I wonder what the correct fix is here. The problem is that we use URLEncoder.encode to encode this String but use our RestUtils to decode it when we parse the params from the URL.
This worked out symmetrically when we were handling the + sign as a space, but it won't be symmetric when we don't, which causes tests that put a space in relayState to fail.
=> makes me wonder if it's even correct to use our RestUtils parser here?

Contributor

I read some brief SO/docs on RelayState and it seems like we should not be doing decoding/encoding on it, but instead storing it as an "opaque object"... I think it might be correct not to use decodeQueryString here, but I don't know if we should take my word as gospel. I'll let @jasontedor comment further.

Member

From what I have read, URL encoding is correct here, using RestUtils to decode looks wrong, and so the change here is valid. Let us validate this with @tvernum though.

Contributor

I don't think this is correct.
This string is already (mostly) decoded in parseQueryStringAndValidateSignature above, so any %-encoded entities will be expanded at that point.
If we decode here we will be double decoding.

Specifically, a RelayState of 99%3a would be encoded on the wire as RelayState=99%253a.
Then parseQueryStringAndValidateSignature will parse that back to 99%3a.
If we then try to URLDecoder.decode that, it would end up as 99:, which is incorrect.

I'll need to look into what we ought to be doing with +, but I don't think this change is correct.
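
The double-decoding example above can be reproduced with the standard `URLEncoder` and `URLDecoder` classes. This is a small sketch with made-up class and method names; the first `decode` call merely stands in for what parseQueryStringAndValidateSignature is described as doing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class DoubleDecodingSketch {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String relayState = "99%3a";          // the value the caller actually meant
        String onTheWire = encode(relayState); // 99%253a
        String decodedOnce = decode(onTheWire);    // 99%3a, matches the original
        String decodedTwice = decode(decodedOnce); // 99:, no longer matches

        System.out.println(onTheWire);
        System.out.println(decodedOnce.equals(relayState));  // true
        System.out.println(decodedTwice.equals(relayState)); // false
    }
}
```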

Member Author

Thanks @tvernum I think I figured this out actually and was able to revert this change.

The problem actually came from the assumption that we made here #33164 (comment) (never handling form data). It turns out that the form data encoding logic (+ -> space) applies to parameters in the URL. I have now changed the logic to decode + to a space in query params only, and this test passes again without any changes to the code here :)
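
The resulting rule (a literal + in the path, + as a space in query parameters) can be sketched like this. It is not the RestUtils implementation: the class and method names are hypothetical, the sample URI is made up, and `URLDecoder` is used as a convenient form decoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PathAndQueryDecodingSketch {

    // URLDecoder implements form decoding: %XX sequences are expanded and '+' becomes ' '.
    private static String formDecode(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Path: '+' is plain data, so shield it before form decoding.
    static String decodePath(String rawPath) {
        return formDecode(rawPath.replace("+", "%2B"));
    }

    // Query parameter value: form rules apply, so '+' decodes to a space.
    static String decodeQueryValue(String rawValue) {
        return formDecode(rawValue);
    }

    public static void main(String[] args) {
        String rawUri = "/index/_doc/foo+bar?q=hello+world%2B1";
        int queryStart = rawUri.indexOf('?');
        String rawPath = rawUri.substring(0, queryStart);
        String rawValue = rawUri.substring(rawUri.indexOf('=', queryStart) + 1);

        System.out.println(decodePath(rawPath));        // /index/_doc/foo+bar
        System.out.println(decodeQueryValue(rawValue)); // hello world+1
    }
}
```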

@original-brownbear
Member Author

@jasontedor @hub-cap sorry for the long pause here, brought this one back to life now :)
I added a system property to make the new behaviour optional but ran into a bit of a tricky spot with the SAML auth code. Maybe you guys can take a look and/or know more about what the correct approach is there? :)

@original-brownbear
Member Author

original-brownbear commented Jun 24, 2019

@jasontedor no worries, there is actually a silver lining to this delay :) (we'd have merged a pretty serious bug here without it I think)
See here: #33164 (comment)

This wasn't visible in our tests when I last worked on this, but now we have a few new REST tests that actually run search in line queries with spaces in them and those failed with the changes here. Adjusting the logic to properly encode/decode spaces as + and vice versa in query parameters fixed this. We can get back to it when you're back I guess :)

@jpountz jpountz added v7.4.0 and removed v7.3.0 labels Jul 5, 2019
Member

@jasontedor jasontedor left a comment

LGTM.

@original-brownbear
Member Author

Thanks @jasontedor !

@original-brownbear original-brownbear merged commit fe2a870 into elastic:master Jul 15, 2019
@original-brownbear original-brownbear deleted the 33077 branch July 15, 2019 06:09
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 15, 2019
* HLRC: Fix '+' Not Correctly Encoded in GET Req.

* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes elastic#33077
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 15, 2019
* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`
original-brownbear added a commit that referenced this pull request Jul 15, 2019
* This was accidentally left at `7.3` but #33164 was merged too late and it should now be `7.4`
original-brownbear added a commit that referenced this pull request Jul 15, 2019
* HLRC: Fix '+' Not Correctly Encoded in GET Req.

* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes #33077
michalperlak pushed a commit to michalperlak/elasticsearch that referenced this pull request Jul 16, 2019
* HLRC: Fix '+' Not Correctly Encoded in GET Req.

* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes elastic#33077
michalperlak pushed a commit to michalperlak/elasticsearch that referenced this pull request Jul 16, 2019
* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`
polyfractal pushed a commit to polyfractal/elasticsearch that referenced this pull request Jul 29, 2019
* HLRC: Fix '+' Not Correctly Encoded in GET Req.

* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes elastic#33077
polyfractal pushed a commit to polyfractal/elasticsearch that referenced this pull request Jul 29, 2019
* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`
@falu2010-netflix

Is there a fix rolled out for older versions of the client, such as 5.6.x? If not, how can I deal with it?

@original-brownbear
Member Author

@falu2010-netflix I'm afraid this fix will not be back-ported to older versions.
As a workaround, encoding + signs in doc ids as %2B is your best option, I think.

Successfully merging this pull request may close these issues.

Elastic Search high level client not encoding + symbol properly in Get Request