json responses with 'application/json' and no charset are parsed in ISO-8859-1 charset #150
Comments
Good catch.
Technically, JSON is not always UTF-8, but it is always Unicode, and |
I've run into the same issue, where |
Is there an ETA on a release that includes this fix?
@Malax within the next couple of weeks -- however, do note that due to the way that the standalone client works, it's fairly easy to create your own custom body readables and writables as a workaround.
@wsargent Thanks for the reply! 👍 If it's in the next couple of weeks, we will just delay the migration to 2.6 until then.
@wsargent Can you comment on my workaround above for accessing the raw bytes when I don't want to use the play JSON parser?
@jsw You are broadly correct -- the best option is to use |
Would it be feasible to avoid the Array[Byte] -> ByteString -> Array[Byte] copying by using the "underlying" response?!
Unless I'm doing things wrong, I think the issue still exists in play-ws 1.0.1.
Output
You should be calling
In theory yes, in practice array copying usually doesn't yield much in the way of performance improvements because the JVM is pretty effective at array copying: https://psy-lob-saw.blogspot.com/2015/04/on-arraysfill-intrinsics-superword-and.html If you're interested, you can poke at the JVM overhead with JMH: https://psy-lob-saw.blogspot.com/p/jmh-related-posts.html There's a sbt plugin at https://github.com/ktoso/sbt-jmh that will help.
@wsargent My use cases are:
Given that, I'm not sure why I would want to do an extra parse via Also, can you explain why |
@wsargent Thanks for that valuable information. 👍
Json.parse detects the encoding of the data and decodes them appropriately.
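To illustrate what "detects the encoding" can mean for raw JSON bytes, here is a minimal sketch of the null-byte heuristic described in RFC 4627, section 3. It is not Play's or Jackson's actual implementation. The object name `JsonEncodingSniffer` and its methods are hypothetical, the UTF-8 fallback for inputs shorter than four bytes is an assumption, and the UTF-32 charsets are looked up by name because they are not part of `StandardCharsets`.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper: guesses the Unicode encoding of a JSON text from the
// pattern of zero bytes in its first four octets (RFC 4627, section 3).
// The heuristic relies on the first two characters of the text being ASCII,
// which holds for JSON objects and arrays.
object JsonEncodingSniffer {
  def detect(bytes: Array[Byte]): Charset =
    bytes.take(4).map(_ == 0).toList match {
      case List(true, true, true, false)  => Charset.forName("UTF-32BE")
      case List(true, false, true, false) => StandardCharsets.UTF_16BE
      case List(false, true, true, true)  => Charset.forName("UTF-32LE")
      case List(false, true, false, true) => StandardCharsets.UTF_16LE
      // Assumption: anything else, including inputs under four bytes, is UTF-8.
      case _                              => StandardCharsets.UTF_8
    }

  // Decode the bytes with whatever encoding the heuristic picked.
  def decode(bytes: Array[Byte]): String = new String(bytes, detect(bytes))
}
```

The point of the sketch is that the decision is made from the bytes themselves, so a missing charset in the Content-Type header does not matter when parsing starts from the raw body rather than from an already decoded string.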
The remaining problem being discussed here is basically that the
'application/json' responses encoded using utf-8 are wrongly parsed using ISO-8859-1 charset in play-ws (at least until v1.0.7) See: playframework/play-ws#150
'application/json' responses encoded using utf-8 are wrongly parsed using ISO-8859-1 charset in play-ws (at least until v1.0.7) when using WSResponse.body See: playframework/play-ws#150 WSResponse.json doesn't seem to suffer from the same problem.
The issue still exists in play-ws 1.1.1. Another possible solution would be to change the |
somewhat related... how can we make the server include the charset instead of trying to be too smart for its own good?
See possible fix on the Play server side: playframework/playframework#8239
JSON in HTTP is always encoded in UTF-8. Responses are parsed correctly when the server writes a content type header like `application/json; charset=utf-8`. However, many servers (like Play framework itself) use `application/json` without a charset. In Play 2.6, such responses are parsed in the ISO-8859-1 charset.

I think the problem is in the `JsonBodyReadables` implementation. `Json.parse(response.body)` should be replaced with `Json.parse(response.bodyAsBytes.utf8String)`. The charset for `response.body` is always ISO-8859-1 in async http client when no charset info is provided in the content type header.