HLRC: Fix '+' Not Correctly Encoded in GET Req. #33164

original-brownbear · 2018-08-27T11:55:31Z

Encode + correctly as %2B
Closes Elastic Search high level client not encoding + symbol properly in Get Request #33077


elasticmachine · 2018-08-27T11:55:32Z

Pinging @elastic/es-core-infra

jasontedor

I left a question.

jasontedor · 2018-08-27T13:06:32Z

client/rest-high-level/src/main/java/org/elasticsearch/client/RequestConverters.java

@@ -1597,7 +1597,7 @@ private static String encodePart(String pathPart) {
                //paths that start with `-` or contain `:`
                URI uri = new URI(null, null, null, -1, "/" + pathPart, null, null);
                //manually encode any slash that each part may contain
-                return uri.getRawPath().substring(1).replaceAll("/", "%2F");
+                return uri.getRawPath().substring(1).replaceAll("/", "%2F").replaceAll("\\+", "%2B");


Why are we doing this manually and piecemeal instead of using URLEncoder?

This is a very peculiar method. See the comments around it for some history, the corresponding test (there is a test specifically for the '+' character), and the original PR, including its review comments. I am not even sure that we need to automatically encode '+': what needs to be encoded should already be encoded and that is why we create the URI instance to encode each part. As far as I remember, path parts are subject to different encoding rules than query_string parameters, and that is what makes this part of the client tricky (and also why we don't use URLEncoder).

"what needs to be encoded should already be encoded and that is why we create the URI instance to encode each part."

That cannot be accurate. If I encode the data before calling this, it is going to be re-encoded.
So this code should be the one responsible for encoding, and it should happen only once.

Why are you using URI and not URLEncoder?
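To make the difference discussed above concrete, here is a minimal sketch comparing the two approaches. `encodeViaUri` mirrors the pre-PR line from the diff; the class and method names are made up for illustration and are not part of the client.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: class and method names are hypothetical.
class PathPartEncodingDemo {

    // Same approach as the pre-PR encodePart: let java.net.URI quote the path,
    // then encode any remaining slash by hand.
    static String encodeViaUri(String pathPart) {
        try {
            URI uri = new URI(null, null, null, -1, "/" + pathPart, null, null);
            return uri.getRawPath().substring(1).replaceAll("/", "%2F");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Path part [" + pathPart + "] could not be encoded", e);
        }
    }

    // Form-style (application/x-www-form-urlencoded) encoding, as URLEncoder implements it.
    static String encodeViaUrlEncoder(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeViaUri("foo+bar"));        // foo+bar     ('+' is legal in a path)
        System.out.println(encodeViaUri("foo bar"));        // foo%20bar
        System.out.println(encodeViaUri("foo%2Bbar"));      // foo%252Bbar ('%' is always quoted)
        System.out.println(encodeViaUrlEncoder("foo bar")); // foo+bar     (space becomes '+')
        System.out.println(encodeViaUrlEncoder("foo+bar")); // foo%2Bbar
    }
}
```

The third line of output is the re-encoding concern raised above: a value that is already percent-encoded gets its `%` quoted again by the multi-argument `URI` constructor.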

javanna · 2018-08-27T13:25:00Z

client/rest-high-level/src/test/java/org/elasticsearch/client/RequestConvertersTests.java

        }
        {
            EndpointBuilder endpointBuilder = new EndpointBuilder().addPathPart("foo+bar");
-            assertEquals("/foo+bar", endpointBuilder.build());
+            assertEquals("/foo%2Bbar", endpointBuilder.build());


I remember adding this test to make sure that such a change would not sneak in unnoticed. Let's try and figure out whether this is needed. Isn't '+' an accepted character within a path part? I didn't try to repro the original issue, but my gut feeling is that we should try to understand better what the original issue is.

@javanna yea now that you mention it, I think the problem is with the code that handles the request, not the request itself.
+ doesn't need to be encoded in a path component.

=> this PR and the issue seem to be missing the actual problem?

A + does not need to be encoded according to the specification, indeed. However, the issue here is that we treat + specially, converting it to a space. This is because some browsers encode spaces as +. See:

elasticsearch/server/src/main/java/org/elasticsearch/rest/RestUtils.java

Lines 166 to 168 in f7a9186

    case '+':
        buf[pos++] = ' '; // "+" -> " "
        break;

I think the actual problem is this: if you have an ID that contains a + and you try to use the HLRC to get a document by that ID, you cannot get the document unless you also encode the +, because we convert + to a space on the server side.
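A rough sketch of that failure mode, using `java.net.URLDecoder` purely as a stand-in for a decoder that maps `+` to a space. It is not the `RestUtils` code, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: URLDecoder stands in for any decoder that turns '+' into ' '.
class PlusAsSpaceDemo {

    static String decodePlusAsSpace(String raw) {
        try {
            return URLDecoder.decode(raw, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String storedId = "foo+bar";

        // Client sends the id unencoded in the path: the decoder sees "foo bar".
        System.out.println(storedId.equals(decodePlusAsSpace("foo+bar")));   // false

        // Client sends the id with '+' encoded as %2B: the decoder sees "foo+bar".
        System.out.println(storedId.equals(decodePlusAsSpace("foo%2Bbar"))); // true
    }
}
```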

@jasontedor yes exactly that is the issue. So do we do the encoding/replacing here to account for the non-standard server behaviour?

interesting, is this something that we want to fix/change on the server-side rather than adapting the client then?

Yes, I think it is worth us making that change although it's a breaking one so we have to be careful and age it in slowly, and maybe provide a system property for BWC for a time period.

Section 2.4 of RFC 3986 is germane here. Since we never treat + as a sub-component delimiter (unlike, e.g., &) we do not need to require that it be encoded when it is to be treated as data. Treating a plus as a space is only required by the specification in the query string component of application/x-www-form-urlencoded requests. Since we do not handle this media type for any requests with bodies anyway, we never need to treat + as a space. Our handling here is just odd, and I think it's safe for us to remove this legacy behavior.

I'm +1 to @jasontedor's logic here.

jasontedor

I think that we should start with a deprecation in 6.x, and probably a system property introduced there to cut over to the new behavior immediately. Then we can work to remove this in master. We also need to let the reporter on the original issue know how to work around this for now. Finally, we should consider opening an issue with this as our plan, to get feedback from the community in case there is a compelling reason against making this change that we are collectively missing. An issue will give it more visibility than a PR discussion where the title does not really reveal what we are planning to do here now.

original-brownbear · 2018-09-03T15:58:43Z

@jasontedor Looks like there already is an issue for this: #5341

casper1149 · 2018-09-05T23:06:13Z

I also noticed that the IndexRequest incorrectly handles the "+" symbol. Is it going to be fixed as part of this PR, or is there a separate one for the IndexRequest?

UPDATE:
ok, seems like this PR updates the RequestConverters and so both Index and Get requests will be fixed

Selikoff · 2018-11-01T17:09:38Z

Can someone decide on a fix and merge the code? I'm experiencing this exact issue: if I send a +, it is converted to a space in the key, which breaks linkage between servers. I'm not sure what the correct solution is, but it's definitely broken as is.

original-brownbear · 2018-11-25T15:46:31Z

Jenkins test this

original-brownbear · 2018-11-26T08:28:57Z

...rity/src/main/java/org/elasticsearch/xpack/security/authc/saml/SamlLogoutRequestHandler.java

+            return new Result(
+                logoutRequest.getID(), SamlNameId.fromXml(getNameID(logoutRequest)),
+                getSessionIndex(logoutRequest),
+                relayState == null ? null : URLDecoder.decode(relayState, StandardCharsets.US_ASCII.name())


This is kind of hacky and I wonder what the correct fix is here. The problem is that we use URLEncoder.encode to encode this String but use our RestUtils to decode it when we parse the params from the URL.
This worked out symmetrically when we were handling the + sign as a space, but it won't be symmetric once we stop doing that, causing tests that put a space in relayState to fail.
=> makes me wonder if it's even correct to use our RestUtils parser here?

I read some brief SO/docs on RelayState and it seems like we should not be decoding/encoding it, but instead storing it as an "opaque object"... I think it might be correct not to use decodeQueryString here, but I don't know if we should take my word as gospel. I'll let @jasontedor comment further.

From what I have read, URL encoding is correct here, using RestUtils to decode looks wrong, and so the change here is valid. Let us validate this with @tvernum though.

I don't think this is correct.
This string is already (mostly) decoded in parseQueryStringAndValidateSignature above, so any %-encoded entities will be expanded at that point.
If we decode here we will be double decoding.

Specifically, a RelayState of 99%3a would be encoded on the wire as RelayState=99%253a.
Then parseQueryStringAndValidateSignature will parse that back to 99%3a.
If we then try to URLDecoder.decode that, it would end up as 99: which is incorrect.

I'll need to look into what we ought to be doing with +, but I don't think this change is correct.
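The double-decoding concern above can be reproduced with a small sketch. `decodeOnce` is a hypothetical helper wrapping `URLDecoder.decode`; the first call only plays the role of the earlier query-string parsing step and is not the actual `parseQueryStringAndValidateSignature` code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows why decoding an already-decoded value is lossy.
class DoubleDecodeDemo {

    static String decodeOnce(String value) {
        try {
            return URLDecoder.decode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String onTheWire = "99%253a";                       // RelayState "99%3a" as sent
        String afterParsing = decodeOnce(onTheWire);        // first decode (query-string parsing)
        String afterSecondDecode = decodeOnce(afterParsing); // the extra decode in question

        System.out.println(afterParsing);      // 99%3a  (the intended RelayState)
        System.out.println(afterSecondDecode); // 99:    (corrupted by the second decode)
    }
}
```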

Thanks @tvernum I think I figured this out actually and was able to revert this change.

The problem actually came from the assumption that we made here #33164 (comment) (never handling form data). It turns out that the form data encoding logic (+ -> space) applies to parameters in the URL. I have now changed the logic to decode + to a space in query params only, and this test passes again without any changes to the code here :)
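As a sketch of the resulting rule (keep `+` literal in path components, map it to a space in query parameters only), something along these lines would do. `decodeComponent` and its `plusAsSpace` flag are hypothetical names, not the actual `RestUtils` API, and the sketch assumes the raw input is ASCII apart from percent-escapes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: a simplified decoder, not the server implementation.
class ComponentDecoderSketch {

    // Percent-decodes s as UTF-8; '+' becomes ' ' only when plusAsSpace is true.
    static String decodeComponent(String s, boolean plusAsSpace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("Truncated escape in [" + s + "]");
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape in [" + s + "]");
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c == '+' && plusAsSpace) {
                out.write(' ');
            } else {
                out.write((byte) c); // assumption: unescaped characters are ASCII
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeComponent("foo+bar", false));  // path part:   foo+bar
        System.out.println(decodeComponent("foo+bar", true));   // query param: foo bar
        System.out.println(decodeComponent("foo%2Bbar", true)); // query param: foo+bar
    }
}
```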

original-brownbear · 2018-11-26T08:31:01Z

@jasontedor @hub-cap sorry for the long pause here, brought this one back to life now :)
I added a system property to make the new behaviour optional but ran into a bit of a tricky spot with the SAML auth code. Maybe you guys can take a look and/or know more about what the correct approach is there? :)

original-brownbear · 2019-06-24T09:36:21Z

@jasontedor no worries, there is actually a silver lining to this delay :) (we'd have merged a pretty serious bug here without it I think)
See here: #33164 (comment)

This wasn't visible in our tests when I last worked on this, but now we have a few new REST tests that actually run search in line queries with spaces in them and those failed with the changes here. Adjusting the logic to properly encode/decode spaces as + and vice versa in query parameters fixed this. We can get back to it when you're back I guess :)

jasontedor

LGTM.

original-brownbear · 2019-07-15T06:08:53Z

Thanks @jasontedor !

HLRC: Fix '+' Not Correctly Encoded in GET Req.
* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes elastic#33077

* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`

falu2010-netflix · 2020-01-14T19:00:20Z

Has a fix been rolled out for older versions of the client, such as 5.6.x? If not, how can I deal with it?

original-brownbear · 2020-01-14T19:15:36Z

@falu2010-netflix I'm afraid this fix will not be back-ported to older versions.
As a workaround, encoding + signs in doc ids as %2B is your best option, I think.

HLRC: Fix '+' Not Correctly Encoded in GET Req.

d607643

* Encode `+` correctly as `%2B` * Closes elastic#33077

original-brownbear added >bug v7.0.0 :Core/Java High Level REST Client v6.5.0 labels Aug 27, 2018

jasontedor requested changes Aug 27, 2018

jasontedor requested a review from hub-cap August 27, 2018 13:06

Merge remote-tracking branch 'elastic/master' into 33077

127aa1c

javanna reviewed Aug 27, 2018

original-brownbear added 4 commits August 27, 2018 16:26

Merge remote-tracking branch 'elastic/master' into 33077

f3e9208

Remove server side handling of + as ' '

c16be99

fix test

ce12e8d

Merge remote-tracking branch 'elastic/master' into 33077

e9862d9

jasontedor requested changes Aug 28, 2018

Merge remote-tracking branch 'elastic/master' into 33077

906d51e

jasontedor mentioned this pull request Sep 5, 2018

GetRequest can not get a document when "id" contains a special character #33445

Closed

colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018

original-brownbear added 3 commits November 25, 2018 13:15

Merge remote-tracking branch 'elastic/master' into 33077

9da77eb

CR: Add system property for decoding + as space or not

33ac4c8

fix typo

88661c1

original-brownbear added 2 commits November 25, 2018 19:40

hack around encoding asymetry in saml

1d59ba8

Merge remote-tracking branch 'elastic' into 33077

20ee2f8

original-brownbear commented Nov 26, 2018

original-brownbear added 7 commits June 23, 2019 10:18

CR: Add deprecation/breaking changes note

0f8be02

Merge remote-tracking branch 'elastic/master' into 33077

c792bda

Keep encoding + as space in query strings

357e801

Merge remote-tracking branch 'elastic/master' into 33077

fbe3323

correctly encode jobid in test

d0f38a1

Merge remote-tracking branch 'elastic/master' into 33077

d34ae83

CR:revert SAML change

94b2ba9

jpountz added v7.4.0 and removed v7.3.0 labels Jul 5, 2019

jasontedor approved these changes Jul 15, 2019

original-brownbear merged commit fe2a870 into elastic:master Jul 15, 2019

original-brownbear deleted the 33077 branch July 15, 2019 06:09

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 15, 2019

Fix Incorrect Version in Migration Docs

9dc0efb

* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`

This was referenced Jul 15, 2019

Fix Incorrect Version in Migration Docs #44325

Merged

HLRC: Fix '+' Not Correctly Encoded in GET Req. (#33164) #44324

Merged

original-brownbear added a commit that referenced this pull request Jul 15, 2019

Fix Incorrect Version in Migration Docs (#44325)

58aae32

* This was accidentally left at `7.3` but #33164 was merged too late and it should now be `7.4`

michalperlak pushed a commit to michalperlak/elasticsearch that referenced this pull request Jul 16, 2019

Fix Incorrect Version in Migration Docs (elastic#44325)

6e21f52

* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`

polyfractal pushed a commit to polyfractal/elasticsearch that referenced this pull request Jul 29, 2019

Fix Incorrect Version in Migration Docs (elastic#44325)

92aa5c8

* This was accidentally left at `7.3` but elastic#33164 was merged too late and it should now be `7.4`

original-brownbear mentioned this pull request Aug 8, 2019

Rest: Path components with plus signs are decoded wrong #5341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

jrodewig mentioned this pull request Sep 11, 2021

[DOCS] Fix docs for removed system properties #77601

Merged

original-brownbear commented Aug 27, 2018

elasticmachine commented Aug 27, 2018

jasontedor left a comment

jasontedor Aug 27, 2018

javanna Aug 27, 2018

jloyolask8 Jan 6, 2019

javanna Aug 27, 2018

original-brownbear Aug 27, 2018

jasontedor Aug 27, 2018

jasontedor Aug 27, 2018

original-brownbear Aug 27, 2018 •

edited

javanna Aug 27, 2018

jasontedor Aug 27, 2018

jasontedor Aug 27, 2018

hub-cap Aug 27, 2018

jasontedor left a comment

original-brownbear commented Sep 3, 2018

casper1149 commented Sep 5, 2018 •

edited

Selikoff commented Nov 1, 2018

original-brownbear commented Nov 25, 2018

original-brownbear Nov 26, 2018

hub-cap Nov 26, 2018

jasontedor Jun 22, 2019

tvernum Jun 24, 2019

original-brownbear Jun 24, 2019

original-brownbear commented Nov 26, 2018

original-brownbear commented Jun 24, 2019 •

edited

jasontedor left a comment

original-brownbear commented Jul 15, 2019

falu2010-netflix commented Jan 14, 2020

original-brownbear commented Jan 14, 2020
