Skip to content

Commit

Permalink
(fix) Severe encoding bug in Action.stringContent(String content). See
Browse files Browse the repository at this point in the history
  • Loading branch information
mkotsur committed Oct 2, 2017
1 parent e01d784 commit c6f269b
Show file tree
Hide file tree
Showing 2 changed files with 97 additions and 2 deletions.
19 changes: 17 additions & 2 deletions src/main/java/com/xebialabs/restito/semantics/Action.java
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,7 @@ public static Action resourceContent(URL resourceUrl) {
}

/**
* Combines {@link #resourceContent(java.net.URL)} and {@link #charset(java.nio.charset.Charset)}
* Combines {@link #resourceContent(java.net.URL)} and {@link #charset(Charset)}
*/
public static Action resourceContent(URL resourceUrl, Charset charset) {
final Action charsetAction = charset != null ? charset(charset) : Action.noop();
Expand All @@ -151,12 +151,27 @@ public static Action bytesContent(final byte[] content) {
}

/**
* Writes string content to response
* Writes string content into response. Important: when using this method, the string gets encoded
* into a sequence of bytes using the platform's default charset. Please use {@link #stringContent(String, Charset)}
* if you need to be specific about the encoding.
*/
public static Action stringContent(final String content) {
return bytesContent(content.getBytes());
}

/**
* Writes string content into response. The passed charset is used:
* - To encoded the string into bytes;
* - As character set of the response.
* The latter will result in a <pre>charset=XXX</pre> part of <pre>Content-Type</pre> header
* if and only if the header has been set in some way (e.g. by using {@link #contentType(String)} action.
*
* See `integration.CharsetAndEncodingTest` for more details.
*/
public static Action stringContent(final String content, final Charset charset) {
return composite(charset(charset), bytesContent(content.getBytes(charset)));
}

/**
* Sets key-value header on response
*/
Expand Down
80 changes: 80 additions & 0 deletions src/test/java/integration/CharsetAndEncodingTest.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
package integration;

import com.jayway.restassured.RestAssured;
import com.jayway.restassured.response.Response;
import com.xebialabs.restito.server.StubServer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.*;

import java.nio.charset.Charset;

import static com.jayway.restassured.RestAssured.expect;
import static com.xebialabs.restito.builder.stub.StubHttp.whenHttp;
import static com.xebialabs.restito.semantics.Action.*;
import static org.hamcrest.Matchers.*;
import static org.hamcrest.Matchers.nullValue;

public class CharsetAndEncodingTest {

private StubServer server;

private static Charset CP_1251 = Charset.forName("Cp1251");

This comment has been minimized.

Copy link
@garretwilson

garretwilson Oct 10, 2017

Do you really know for sure that Cp1251 is going to be available on all JVMs on all platforms? See https://docs.oracle.com/javase/8/docs/api/java/nio/charset/Charset.html#forName-java.lang.String-

There are the only ones guaranteed to be present: https://docs.oracle.com/javase/8/docs/api/java/nio/charset/StandardCharsets.html

And even if you stick with Cp1251, I would consider testing UTF-8 as well, as that's what 90% of people will use, I would think.

This comment has been minimized.

Copy link
@mkotsur

mkotsur Oct 23, 2017

Author Owner

I don't think it's gonna be available on all the platforms, but for the tests, I think, it's fine. They run successfully on Linux, OSX... will run on Windows too, I guess.

I choose to test it with non-UTF8 as the default one is most likely UTF8, and that might interfere with the logic.


@Before
public void start() {
server = new StubServer().run();
RestAssured.port = server.getPort();
}

@After
public void stop() {
server.stop();
}


@Test
public void shouldEncodeStringsUsingDefaultPlatformCharset() {
whenHttp(server)
.match("/default-charset").
then(stringContent("Хеллоу"));

expect().header("Content-Type", is(nullValue()))
.content(equalTo("Хеллоу"))

This comment has been minimized.

Copy link
@garretwilson

garretwilson Oct 10, 2017

You'll want to be very careful, here. The whole idea of why we don't want to use the default system charset is that it may change from system to system. Here you're taking advantage of the fact that hopefully both the server and the client will both use the system default, so it will all come out OK in the end. Maybe that's OK, but it would probably be a good idea to check the REST Assured documentation just to make sure they are using the system default, too. It's probably not a practice to promote in real-life code; I understand that you're just using it for the test of the API implementation, though.

This comment has been minimized.

Copy link
@garretwilson

garretwilson Oct 18, 2017

So I'm starting to look at REST-assured, and it looks like they may have the same problem. How do you know what they are using the system default encoding here? In fact it looks like they may have changed to a default of UTF-8 for some content types; see rest-assured/rest-assured#567 and rest-assured/rest-assured#926.

This comment has been minimized.

Copy link
@mkotsur

mkotsur Oct 23, 2017

Author Owner

Thanks for looking so deep into this. I agree, it makes more sense to read the bytes from the response and do the conversion explicitly. See ca32ee7

.when().get("/default-charset");
}

@Test
public void shouldEncodeStringsUsingExplicitlyPassedCharset() {
whenHttp(server)
.match("/custom-charset-1").
then(contentType("application/text"), stringContent("Хеллоу", CP_1251));

whenHttp(server)
.match("/custom-charset-2").
then(stringContent("Хеллоу", CP_1251), contentType("application/text"));

expect().header("Content-Type", is("application/text;charset=windows-1251"))
.content(equalTo("Хеллоу"))
.when().get("/custom-charset-1");

expect().header("Content-Type", is("application/text;charset=windows-1251"))
.content(equalTo("Хеллоу"))
.when().get("/custom-charset-2");
}

@Test
public void shouldNotIncludeExplicitlyPassedInHeadersCharsetWhenContentTypeHeaderIsMissing() {
whenHttp(server)
.match("/custom-charset-3").
then(stringContent("Хеллоу", CP_1251));

Response response = expect().header("Content-Type", is(nullValue()))
.when().get("/custom-charset-1");

assertArrayEquals(response.body().asByteArray(), "Хеллоу".getBytes(CP_1251));
}


}

0 comments on commit c6f269b

Please sign in to comment.