New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spring is inconsistent in the encoding/decoding of URLs [SPR-16860] #21399

Closed
spring-issuemaster opened this Issue May 22, 2018 · 19 comments

Comments

Projects
None yet
2 participants
@spring-issuemaster
Copy link
Collaborator

spring-issuemaster commented May 22, 2018

J. Pablo Fernández opened SPR-16860 and commented

I have a Spring Boot server application and a client that uses RestTemplate to talk to it. I'm finding that RestTemplate doesn't encode plus signs ("+") while Spring decodes them as spaces.

For example, this snippet of code:

String foo = "fo+o";
String bar = "ba r";
restTemplate.exchange("http://example.com/?foo={foo}&bar={bar}", HttpMethod.GET, null, foo, bar)

will make a request to

http://example.com/?foo=fo+o&bar=ba%20r

which means, inside Spring, the variable foo will contain "fo o" and the variable bar will contain "ba r". Using UriTemplate directly, it's easier to reproduce the generation of that URL:

String foo = "fo+o";
String bar = "ba r";
UriTemplate uriTemplate = new UriTemplate("http://example.com/?foo={foo}&bar={bar}");
Map<String, String> vars = new HashMap<>();
vars.put("foo", foo);
vars.put("bar", bar);
URI uri = uriTemplate.expand(vars);
System.out.println(uri);

which prints

http://example.com/?foo=fo+o&bar=ba%20r

Not even UriComponentsBuilder encodes the plus sign:

String foo = "fo+o";
String bar = "ba r";
UriComponentsBuilder ucb = UriComponentsBuilder.fromUriString("http://example.com/?foo={foo}&bar={bar}");
System.out.println(ucb.build().expand(foo, bar).toUri());
System.out.println(ucb.build().expand(foo, bar).toString());
System.out.println(ucb.build().expand(foo, bar).toUriString());
System.out.println(ucb.build().expand(foo, bar).encode().toUri());
System.out.println(ucb.build().expand(foo, bar).encode().toString());
System.out.println(ucb.build().expand(foo, bar).encode().toUriString());
System.out.println(ucb.buildAndExpand(foo, bar).toUri());
System.out.println(ucb.buildAndExpand(foo, bar).toString());
System.out.println(ucb.buildAndExpand(foo, bar).toUriString());
System.out.println(ucb.buildAndExpand(foo, bar).encode().toUri());
System.out.println(ucb.buildAndExpand(foo, bar).encode().toString());
System.out.println(ucb.buildAndExpand(foo, bar).encode().toUriString());

That prints:

http://example.com/?foo=fo+o&bar=ba%20r
http://example.com/?foo=fo+o&bar=ba r
http://example.com/?foo=fo+o&bar=ba r
http://example.com/?foo=fo+o&bar=ba%20r
http://example.com/?foo=fo+o&bar=ba%20r
http://example.com/?foo=fo+o&bar=ba%20r
http://example.com/?foo=fo+o&bar=ba%20r
http://example.com/?foo=fo+o&bar=ba r
http://example.com/?foo=fo+o&bar=ba r
http://example.com/?foo=fo+o&bar=ba%20r
http://example.com/?foo=fo+o&bar=ba%20r
http://example.com/?foo=fo+o&bar=ba%20r

all of which are incorrect, not encoding the plus sign.

I have posted questions about this on StackOverflow that further point to other bug reports and there's a lot of confusion around this subject:

https://stackoverflow.com/questions/50270372/why-is-spring-de-coding-the-plus-character-on-application-json-get-requests

https://stackoverflow.com/questions/50432395/whats-the-proper-way-to-escape-url-variables-with-springs-resttemplate-when-ca


Affects: 5.0.5

Issue Links:

  • #22006 Not encoding '+' in URLs anymore breaks backwards compatibility with apps running on spring 4 ("is duplicated by")
  • #21577 Support stricter encoding of URI variables in UriComponents ("is superseded by")

0 votes, 6 watchers

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented May 22, 2018

Rossen Stoyanchev commented

while Spring decodes them as spaces.

Do you mean @RequestParam? That's not Spring behavior. It's what HttpServletRequest#getRequestParameter returns.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented May 22, 2018

J. Pablo Fernández commented

Yes, @RequestParam.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented May 22, 2018

Rossen Stoyanchev commented

Note that the treatment of "+" in Spring (e.g. RestTemplate, UriTemplate, UriComponents) is intentional. It was a change made in 5.0, see #19394, to comply with RFC 3986.

There are options for having the "+" encoded, but I won't go into the details here because the documentation has been recently expanded and revised majorly to discuss the various scenarios and options available in 5.0. Please review the URI Links chapter.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented May 22, 2018

J. Pablo Fernández commented

I'm still unsure where the boundaries of each thing are and I'm not an expert in Spring nor Java servlets and all that. What I know is that I created a Spring RestController that I'm trying to talk to Spring RestTemplate and it's impossible.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented May 22, 2018

Rossen Stoyanchev commented

Basically "+" has meaning in a URL (space) but it's otherwise legal. A client can choose to encode it (suppress its meaning) or not. It's a choice, and not an automatic decision.

 

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented May 22, 2018

J. Pablo Fernández commented

Is plus-encoding-a-space part of the URI RFC? I think it's RFC3986. It's been a while since I read it all and I cannot find any mention of that (but maybe I'm missing something).

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented May 22, 2018

Rossen Stoyanchev commented

From an RFC 3986 perspecitve, "+" is a legal character. By default the RestTemplate leaves it as is. However if you want to encode everything in a URI variable that could have meaning, you can switch to a different encoding mode. See the part in the above referenced chapter that talks about customizing encoding options.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 12, 2018

Christophe Levesque commented

Rossen Stoyanchev: sorry for reopening this issue but I don't think the behavior introduced with #19394 is correct. I understand and agree that a "+" is a valid character in a query param value, it will be interpreted as a space by pretty much any server (including Spring apps). For example:

While your change in f2e293a fix a parsing issue when a + appears in query param, it also introduces a change that breaks the building of URLs. For example:

UriComponentsBuilder.fromHttpUrl("http://localhost:8080").queryParam("param1", "foo+bar").queryParam("param2", "foo&bar").toUriString(); 

Before your change this would produce the string:

http://localhost:8080?email1=foo%2Bbar@gmail.com&email2=foo%26bar@gmail.com 

When sent to a server (including a Spring Boot server with QueryParam("parm1") annotation), the parameter param1 will be interpreted as foo+bar while param2 will be interpreted as foo&bar, which is what I'd expect.

Now after your change, the code above produces the string:

http://localhost:8080?param1=foo+bar&param2=foo%26bar 

The first parameter was incorrectly encoded as a consequence of your change. Now the server will get foo bar as param1 (incorrect) and foo&bar as param2 (correct).

The "+" sign in a query parameter should be encoded as %2B when calling toUriString(). We just upgraded from Spring Boot 1.5.x to 2.0 and this is causing major regressions.

I have a simplistic fix for this which I'm not 100% sure is correct:

	static String encodeUriComponent(String source, Charset charset, Type type) {
		if (!StringUtils.hasLength(source)) {
			return source;
		}
		Assert.notNull(charset, "Charset must not be null");
		Assert.notNull(type, "Type must not be null");

		byte[] bytes = source.getBytes(charset);
		ByteArrayOutputStream bos = new ByteArrayOutputStream(bytes.length);
		boolean changed = false;
		for (byte b : bytes) {
			if (b < 0) {
				b += 256;
			}
			if (type.isAllowed(b) && (type != Type.QUERY_PARAM || b != '+')) { // <=== this line changed
				bos.write(b);
			}
			else {
				bos.write('%');
				char hex1 = Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, 16));
				char hex2 = Character.toUpperCase(Character.forDigit(b & 0xF, 16));
				bos.write(hex1);
				bos.write(hex2);
				changed = true;
			}
		}
		return (changed ? new String(bos.toByteArray(), charset) : source);
	}

 
Please let me know what you think.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 12, 2018

Rossen Stoyanchev commented

Christophe Levesque, please take a look at the "URI Encoding" section in the reference.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 12, 2018

Christophe Levesque commented

I did have a look at the reference and assume you refer about this bit:

By default UriComponents encodes only characters that are illegal within a given URI component, but not all characters with reserved meaning.

I guess here the "+" sign is not illegal but has a reserved meaning (it's decoded as a space). So I understand that the behavior is somewhat explained by the documentation. What is really unfortunate, is that the UriComponentBuilder was a great utility to safely and concisely generate URLs without having to manually escape / encode parameters. All you had to do is write:

UriComponentsBuilder.fromHttpUrl(baseUrl).queryParam("foo", foo).toUriString() 

You could throw pretty much any query param value at it (with or without "+" sign or special character) and it would just work (the server receiving the request would get the param with the same value as what you passed to the queryParam method). This was the behavior for years.

Now, the new UriComponentsBuilder feels broken and violates the Principle of least astonishment. The code above works in almost all cases except if the param contains a "+" sign. In which case you're pretty much SOL as there is no simple workaround to get it encoded as %2B other than jumping through hoops:

  • instantiate a DefaultUriBuilderFactory,
  • switch the encoding more to VALUES_ONLY,
  • switch to full blown param templating (since "strict encoding is only applied to expanded URI variable values").

A simple one-liner becomes an odyssey. :P

Rossen Stoyanchev: I hope you'll consider addressing this!

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 12, 2018

Brian Clozel commented

Christophe Levesque, I understand your frustration - especially since this was working a way that matched your needs for quite a while.

Looking at #19394, that's actually the issue violating that principle of least astonishment: "+" was the only character that wasn't dealt with the way it's meant to be by the RFC. Why make that exception?

We don't have many options here:

  • revert that fix, post 5.0 and break RFC compliance and all developers counting on the fix in #19394
  • improve even more the reference documentation on that (suggestions are welcome)

It's quite unfortunate, but you (and many others) were relying on a bug, or at least an implementation out of sync with the RFC. URL encoding is a very complex matter and we're trying to make things developer-friendly while maintaining RFC compliance.

The DefaultUriBuilderFactory is in my opinion not a bad choice here, as it's helping you to clarify the hard parts about URL encoding while still being flexible about your opinions.

DefaultUriBuilderFactory factory = new DefaultUriBuilderFactory();
factory.setEncodingMode(DefaultUriBuilderFactory.EncodingMode.VALUES_ONLY);
URI uri = factory.uriString("https://spring.io/")
			.queryParam("query", "{query}")
			.build("spring+framework"); // http://spring.io/?query=spring%2Bframework
@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 12, 2018

Rossen Stoyanchev commented

I hear you about the odyssey .. To expand on what Brian said, the category of related encoding issue is broader. Besides "+" there are other characters that are common and legal but have reserved meaning, e.g. slash is legal in a path, so is semicolon, etc. The UriComponentsBuilder has always behaved the same way about those as it does for "+" now.

The reference explains the approach to encoding, which for good or for bad is modeled after, and consistent with java.net.URI. We can't change how it works now but it shouldn't be an odyssey either.

The reference explains switching to a different encoding mode, which for RestTemplate or WebClient is straight forward as there are built-in options to change the behavior.

The challenge with UriComponentsBuilder is that it's used statically and there is no place to configure options. You need something like DefaultUriBuilderFactory to wrap its use and provide a singleton instance that can be configured and of course that cannot be used statically so you lose some convenience.

So at this point the main gap that remains is with direct use of UriComponentsBuilder. The only options I see there are some overloaded methods on UriComponents related to expanding and encoding, that would rely on a different way of encoding. Or explicit methods on UriComponentsBuilder itself to customize its behavior, which would have to be invoked every time to deviate from the default behavior.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 12, 2018

Mike Placentra commented

Rossen pointed out to me in #21473 that UriUtils#encode(String, Charset) has the behavior we're looking for, that's a good workaround in cases where an URL builder isn't needed.

That being said, in the time leading up to submitting that ticket and until now, I've been unsuccessful in convincing myself that having the encoder encode "+" as "%2B" by default would "break" compliance with RFC 3986. (This is separate from decoding "+" as a space by default – I agree that's bad.) I've looked through the RFC several times (basically each time I stub my toe on this issue in Spring Web) and it certainly gives us these points:

  • Since "+" (and the other reserved characters) are valid, a decoder would be non-compliant if it threw an exception when it saw them (as #19394 addresses)
  • When "normalizing" an already-formed URI for comparison with another URI (e.g. to look it up in a cache), "+" (and other reserved characters) must be left alone (must be considered different from "%2B") because they may be implementation-specific delimiters.

It goes far enough to ensure that 1) once a URI component is formed it is composed of only characters that make it distinguishable from other components in the URI, 2) a (percent) encoding mechanism is available for escaping other characters, and 3) URI normalizers (for comparing URIs) treat (potentially) implementation-specific delimiters as first-class delimiters.

Otherwise, the RFC largely leaves the implementation-specific encoding of data and data structures in a URI component out-of-scope. As far as I can tell, there's no rule against overzealously encoding data "abcd" as "%61%62%63%64" while forming the URI, and likewise there's no rule against encoding data "+" as "%2B" while forming the URI.

In fact, the RFC says this in section 2.2:

If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

It could be argued, given that a vanilla Spring Web application relies on Tomcat to decode request URIs, and Tomcat has an implementation-specific purpose for "+" as a delimiter (it decodes it as a space, I suppose because it assumes the value is a phrase or prose and we're delimiting the words), and a URI generated by a Spring Web application is likely to be subsequently decoded by the same or another Spring Web application, +data "+" must be encoded as "%2B" according to RFC 3986+ because the data "would conflict with a reserved character's purpose as a delimiter".

Also, I don't know how much weight it holds, but here's a W3C recommendation that says to encode "+" as "%2B".

I couldn't imagine encoding "+" as "%2B" by default would cause grief for anyone because any URI parser will properly decode that back to a "+". It could be an issue if someone is designing their own implementation-specific delimiter system for the URI component (encoding complex data structures), but then I probably wouldn't use UriComponentsBuilder, or I would research its functionality first and discover that DefaultUriBuilderFactory is more configurable.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 12, 2018

Christophe Levesque commented

First off, thanks Rossen Stoyanchev, Brian Clozel and Mike Placentra for taking the time to reply! :)

I think the current implementation is mixing validation and encoding of query params. I completely agree that #19394 was describing a buggy behavior: when calling UriComponentsBuilder.build(true), HierarchicalUriComponents.verifyUriComponent("%EA%B0%80+%EB%82%98", Type.QUERY_PARAM) was crashing because the encoded param value contained a "+" even though it's a perfectly valid character in that context. So in this sense, the fix is correct: QUERY_PARAM.isAllowed('+') should indeed return true.

Now the confusion comes with the UriComponentsBuilder.build(boolean encoded) method param. The JavaDoc states:

encoded: whether all the components set in this builder are encoded (true) or not (false)

From my (limited) understanding, when it comes to the query parameters, passing encoded as true means that the entries in the queryParams multi value map are already encoded and should be used as is, while passing false, means that the query params need to be encoded. When passing false, the query params are encoded in method HierarchicalUriComponents.encodeUriComponent. This method uses the +same logic+ as the validation mentioned above to determine whether a character should be % encoded or not. This seems wrong and contradicts with java.net.URLEncoder:

When encoding a String, the following rules apply:

  • The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
  • The special characters ".", "-", "*", and "_" remain the same.
  • The space character " " is converted into a plus sign "+".
  • All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.

For example using UTF-8 as the encoding scheme the string "The string ü@foo-bar" would get converted to "The+string+%C3%BC%40foo-bar" because in UTF-8 the character ü is encoded as two bytes C3 (hex) and BC (hex), and the character @ is encoded as one byte 40 (hex).

IMHO, when HierarchicalUriComponents.encodeUriComponent is called (at least for Type.QUERY_PARAM), it should either defer to java.net.URLEncoder.encode or follow the same logic (as Mike Placentra mentioned "data "+" must be encoded as "%2B" according to RFC 3986").

I cannot think of a use case where you would not want the "+" sign to be URL-encoded when calling encodeUriComponent, especially when this method has the same name as JavaScript's standard method (which does encode it as %2B :P).

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 12, 2018

Rossen Stoyanchev commented

This isn't about breaking compliance with the RFC 3986 and it's not about encoding like URLEncoder (which encodes form data, not URIs). Simply, the approach to encoding is modeled after java.net.URI which does the following:

System.out.println(new URI("http", "example.org", "/foo bar", null));
System.out.println(new URI("http", "example.org", "/foo+bar", null));

// http://example.org/foo%20bar
// http://example.org/foo+bar

The space is encoded because it's not legal. The "+" is not because it's legal, even if it has reserved meaning in form data as a space. It's possible to want either the effect of "+" or not in which case it has to be encoded. I don't think one is valid and the other is not. This is more about providing the options to express which you want.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 13, 2018

Christophe Levesque commented

it's not about encoding like URLEncoder (which encodes form data, not URIs)

It's true but in practice it works and is used for both. If you want to pass a parameter safely in a query string and make sure the server receives what you sent, the easiest way is to encode it with URLEncoder.encode. If you do a Google seach for "encoding url parameters in java", you'll find the vast majority of example use that method.

the approach to encoding is modeled after java.net.URI

There are 2 construtors in the java.net.URI class that pass the query parameter as you describe (I think your example was using the path parameter but since we're talking about query param here I feel this is more relevant). I did a quick usage search on my ~300MB Spring Boot app (don't judge!) that has a huge number of miscellaneous dependencies (Spring, AWS client, CXF, etc.). Those 2 constructors had a total of just 11 callers (including several internal JDK methods and instances where query was passed as null). By comparison, the single string URI constructor and the java.net.URI.create static method had a total of 256 callers.

My point is that you are modelling UriComponentBuilder after a very obscure method whose functionality is confusing at best. It is not about following an RFC. It's your choice and I respect that you think this is the right approach. But I also very strongly disagree, especially because this behavior only changed recently after being "correct" for so many years. Also to repeat what I said in my previous comment: who would want or expect the current behavior where a method called encodeUriComponent does not actually... fully encode a URI component? :P

Ironically, the JDK isn't even consistent with itself. Trying to create a URI with the single string constructor or the static create method with a space in either the path or query string like:

URI.create("http://localhost:8080/foo bar?test=abc xyz")

fails with:

Exception in thread "main" java.lang.IllegalArgumentException: Illegal character in path at index 25: http://localhost:8080/foo bar?test=abc xyz
	at java.net.URI.create(URI.java:852)
	at com.tripactions.core.util.UriComponentsBuilder.main(UriComponentsBuilder.java:70)
Caused by: java.net.URISyntaxException: Illegal character in path at index 25: http://localhost:8080/foo bar?test=abc xyz
	at java.net.URI$Parser.fail(URI.java:2848)
	at java.net.URI$Parser.checkChars(URI.java:3021)
	at java.net.URI$Parser.parseHierarchical(URI.java:3105)
	at java.net.URI$Parser.parse(URI.java:3053)
	at java.net.URI.<init>(URI.java:588)
	at java.net.URI.create(URI.java:850)
	... 1 more

This would have been a much more reasonable behavior for the other URI constructors.

As a workaround I've extended the UriComponentsBuilder class and overriden the queryParam method by calling URIEncoder.encode on its argument, and changed the toUriString method to call build(true) rather than build(false). I hope this won't have any side-effects.

If there is no way you can be convinced, feel free to close this ticket. :(

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 13, 2018

Rossen Stoyanchev commented

I'm not trying to defend how it works, to be honest. Only explaining how it was designed, a long time ago.

The behavior you want is available when using the RestTemplate and the WebClient. What's missing is something when using UriComponentsBuilder directly. I will give it some further thought.

This is what we would need to expose, and which you can use as a workaround:

String value = "A+B=C";
value = UriUtils.encode(value, StandardCharsets.UTF_8);
URI uri = UriComponentsBuilder.newInstance().queryParam("test", value).build(true).toUri();
@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 13, 2018

Rossen Stoyanchev commented

Christophe Levesque, I've made some progress on providing a solution in UriComponents. Please follow #21577. I would appreciate feedback from as many as possible once the solution is available early next week.

@spring-issuemaster

This comment has been minimized.

Copy link
Collaborator

spring-issuemaster commented Jul 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment