
JSONValue.parse does not correctly decode UTF-8 bytes from an inputstream #48

Closed
GoogleCodeExporter opened this issue Sep 10, 2015 · 2 comments

Comments

@GoogleCodeExporter

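What steps will reproduce the problem?

{{{
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import net.minidev.json.JSONObject;
import net.minidev.json.JSONValue;

// testJsonString contains non-ASCII (Sinhala) characters; it is encoded to UTF-8 bytes below
String testJsonString =
    "{\"balance\":1000.21,\"num\":100,\"nickname\":null,\"is_vip\":true,"
    + "\"Sinhalese\":\"සිංහල ජාතිය\",\"name\":\"foo\"}";

ByteArrayInputStream bis =
    new ByteArrayInputStream(testJsonString.getBytes(StandardCharsets.UTF_8));

// Parse directly from the InputStream
JSONObject obj = JSONValue.parse(bis, JSONObject.class);

obj.get("Sinhalese"); // result is incorrect: not the original Sinhala text
}}}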


What is the expected output? What do you see instead?

I would expect obj.get("Sinhalese") to return the original Sinhala string ("සිංහල ජාතිය"), but the returned value is not that string.
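The report does not show the wrong value. Purely as an illustration, assuming the InputStream path turned each byte into one char (which is what ISO-8859-1 decoding does) instead of decoding UTF-8, the sketch below shows what that does to Sinhala text. This is not json-smart's actual code; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Sample Sinhala text, written with Unicode escapes so the
    // source file's encoding does not matter when compiling.
    static final String SAMPLE =
        "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD \u0DA2\u0DCF\u0DAD\u0DD2\u0DBA";

    // Hypothetical faulty decoder (assumption, for illustration only):
    // maps every byte to one char in 0x00-0xFF, i.e. ISO-8859-1 semantics,
    // so each multi-byte UTF-8 sequence becomes several unrelated chars.
    static String decodeOneCharPerByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = SAMPLE.getBytes(StandardCharsets.UTF_8);

        String wrong = decodeOneCharPerByte(utf8);
        String right = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(SAMPLE)); // false: garbled text
        System.out.println(right.equals(SAMPLE)); // true
        // One char per byte, so the garbled string is longer than the original
        System.out.println(SAMPLE.length() + " chars vs " + wrong.length());
    }
}
```

Because every Sinhala character takes three bytes in UTF-8, the byte-per-char result is both wrong and longer than the input, which is the kind of symptom the report describes as "incorrect".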


What version of the product are you using? On what operating system?

Using json-smart 2.0 with OpenJDK 7 on FreeBSD.


Please provide any additional information below.

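Note that reading the stream into a byte array first (ByteStreams is Guava's com.google.common.io.ByteStreams) works correctly:

{{{
JSONValue.parse(ByteStreams.toByteArray(bis), JSONObject.class);
}}}

So the parser decodes byte arrays fine; only the InputStream overload is affected.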
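ByteStreams.toByteArray is a Guava utility, and InputStream.readAllBytes only arrived in Java 9, so it is not available on the OpenJDK 7 setup above. For anyone who would rather not add Guava just for this workaround, here is a JDK-only, Java 7-compatible sketch of the same buffering step; the class name StreamBytes is hypothetical. The resulting array would then go to the byte[] overload of JSONValue.parse shown above (left as a comment, since json-smart is not on this sketch's classpath).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StreamBytes {
    private StreamBytes() {}

    // Reads the stream to EOF and returns every byte; does not close the stream.
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "{\"name\":\"foo\"}".getBytes(StandardCharsets.UTF_8);
        byte[] copy = toByteArray(new ByteArrayInputStream(data));
        System.out.println(Arrays.equals(data, copy)); // true

        // With json-smart on the classpath (not compiled here):
        // JSONObject obj = JSONValue.parse(copy, JSONObject.class);
    }
}
```

Buffering the whole body in memory is fine for small payloads like the one in this report; for large inputs the Reader-based workaround avoids holding a second full copy.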

Original issue reported on code.google.com by patrick....@gmail.com on 13 Sep 2014 at 6:42

@GoogleCodeExporter

I tried my hand at a fix for this in version 1 and created a pull request for it.
I'd be happy to try applying a similar fix to version 2, but am less familiar with it.

Regardless of whether the pull request is accepted, a workaround is to wrap the InputStream in an InputStreamReader
(http://docs.oracle.com/javase/7/docs/api/java/io/InputStreamReader.html) with an explicit UTF-8 charset and use the parse methods that take a Reader instead, e.g.:

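{{{
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import net.minidev.json.JSONObject;
import net.minidev.json.JSONValue;

String testJsonString =
    "{\"balance\":1000.21,\"num\":100,\"nickname\":null,\"is_vip\":true,"
    + "\"Sinhalese\":\"සිංහල ජාතිය\",\"name\":\"foo\"}";

ByteArrayInputStream bis =
    new ByteArrayInputStream(testJsonString.getBytes(StandardCharsets.UTF_8));

// Decode the bytes as UTF-8 ourselves and hand json-smart a Reader
JSONObject obj = JSONValue.parse(
    new InputStreamReader(bis, StandardCharsets.UTF_8), JSONObject.class);
}}}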
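The Reader workaround relies on InputStreamReader reassembling multi-byte UTF-8 sequences even when the underlying stream delivers them in pieces. The JDK-only sketch below checks that with a hypothetical stream wrapper that returns a single byte per read call; json-smart is not involved, and OneByteAtATime and decodeUtf8 are illustrative names, not library API.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReaderWorkaroundDemo {
    // Hypothetical test helper: returns at most one byte per read call,
    // so every multi-byte UTF-8 sequence is split across separate reads.
    static final class OneByteAtATime extends FilterInputStream {
        OneByteAtATime(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // The workaround's decoding step: wrap the stream in a UTF-8
    // InputStreamReader and drain it into a String.
    static String decodeUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD \u0DA2\u0DCF\u0DAD\u0DD2\u0DBA";
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        String decoded = decodeUtf8(new OneByteAtATime(new ByteArrayInputStream(utf8)));
        System.out.println(decoded.equals(sample)); // true
    }
}
```

Passing the charset explicitly matters: the single-argument InputStreamReader constructor uses the platform default charset, which on older JDKs is not guaranteed to be UTF-8.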


I noticed the same issue while using V1 of the library to parse JSON with the SAX-style API. The same workaround should work there too, e.g.:

{{{
JSONValue.SAXParse(new InputStreamReader(bis, StandardCharsets.UTF_8), 
someHandler);
}}}

Original comment by toadm...@googlemail.com on 4 Feb 2015 at 4:45

@GoogleCodeExporter

I had misunderstood the JSON spec.

I have just patched json-smart to treat InputStream input as UTF-8 data in both json-smart V1 and V2.

Original comment by uriel.chemouni on 20 Aug 2015 at 7:06

  • Changed state: Fixed
