writeJSON is not UTF-8 compliant #5

Open
adinapoli opened this Issue Aug 18, 2013 · 2 comments


@adinapoli

If we write an instance of a ToJSON data type using writeJSON, it doesn't correctly handle UTF-8 text that contains accented letters. This is an excerpt of an Italian text, as rendered with the current function:

di essere il più precisi possibile nell'inserimento
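One way to see why the encoding matters: suppose a client does not decode the body as UTF-8 and instead reads each response byte as one Latin-1 character. Every accented letter then turns into two characters. The sketch below models that assumption and uses only the bytestring package. `utf8Bytes` and `latin1View` are illustrative names, not Snap or aeson functions:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- | Encode a String as UTF-8 bytes, the way a JSON encoder writes text
-- into the response body.
utf8Bytes :: String -> B.ByteString
utf8Bytes = BL.toStrict . BB.toLazyByteString . BB.stringUtf8

-- | Reinterpret raw bytes as Latin-1 (ISO-8859-1): each byte becomes the
-- Char with the same code point. This models a client that does not
-- decode the body as UTF-8.
latin1View :: B.ByteString -> String
latin1View = BC.unpack
```

Under this model, `latin1View (utf8Bytes "più")` is `"piÃ¹"`. The reason is that ù (U+00F9) is the two bytes 0xC3 0xB9 in UTF-8, and Windows-1252 maps those two bytes the same way Latin-1 does. Pure ASCII words such as "precisi" pass through unchanged, which is why only the accented words break.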

The problem is twofold:

a) We need to encode with the encode function from Data.Aeson.Encode instead of the standard encode function.
b) We need to set charset=utf-8 in the Content-Type HTTP header.

This is the proposed patch:

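```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Aeson (ToJSON)
import qualified Data.Aeson.Encode as AE  -- AE.encode :: ToJSON a => a -> lazy ByteString (aeson 0.6.x)
import           Snap.Core (MonadSnap, modifyResponse, setHeader, writeLBS)

-------------------------------------------------------------------------------
-- | Set the MIME type to 'application/json; charset=utf-8' and write the given
-- object into the 'Response' body. Same as Snap.Extras' @writeJSON@, but
-- correctly handles UTF-8 text.
writeEncodedJSON :: (MonadSnap m, ToJSON a) => a -> m ()
writeEncodedJSON a = do
  modifyResponse $ setHeader "Content-Type" "application/json; charset=utf-8"
  writeLBS . AE.encode $ a
```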

where AE.encode is a qualified import of Data.Aeson.Encode.

With the proposed patch, everything works as expected:

di essere il più precisi possibile nell'inserimento

I also suggest we factor out the modifyResponse call, perhaps by creating a combinator that adds charset=utf-8 to the Content-Type, so that we can reuse what we already have: jsResponse, jsonResponse, etc.
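The combinator suggested above could be a small pure function over the MIME type string. This is only a sketch. `withUtf8Charset` is a hypothetical name and does not exist in Snap or snap-extras:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import           Data.Char (toLower)

-- | Hypothetical combinator: append "; charset=utf-8" to a MIME type unless
-- it already carries a charset parameter (matched case-insensitively).
withUtf8Charset :: B.ByteString -> B.ByteString
withUtf8Charset mime
  | "charset=" `B.isInfixOf` BC.map toLower mime = mime
  | otherwise                                    = mime <> "; charset=utf-8"
```

In a Snap handler, it could then be applied as `modifyResponse (setContentType (withUtf8Charset "application/json"))`, using `setContentType` from Snap.Core. That way jsResponse, jsonResponse and writeEncodedJSON would share one definition of the header instead of each hard-coding it.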

A.

@ozataman
Owner

Ah, weird. Couple of questions:

  1. Aren't we already using the encode from Data.Aeson? A look at http://hackage.haskell.org/packages/archive/aeson/0.6.2.0/doc/html/src/Data-Aeson-Generic.html#encode shows that we are using Data.Aeson.Encode.encode. Am I missing something here?

  2. As explained here (http://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean), I thought all JSON is automatically interpreted as UTF-8, so the additional charset parameter is unnecessary?

  3. What front-end/client/browser are you using to interpret the results? It almost sounds like you're using an invalid parser that is NOT assuming JSON is UTF-8 but is instead assuming it is Latin-1 or ASCII or something. As far as I know, that is invalid behavior. For example, try passing aeson a string that is not valid UTF-8 for parsing and it will crap out with an error. It forces you to ensure your input is UTF-8 encoded.
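Point 2 rests on the JSON specification of the time, RFC 4627, which says JSON text is Unicode with UTF-8 as the default. Section 3 of the RFC lets a reader tell UTF-8 from UTF-16/UTF-32 without any charset parameter by looking at the pattern of NUL bytes in the first four octets. This works because the first two characters of a JSON text are always ASCII. A client that follows this rule does not need the charset, but a client that treats the body as generic text may not apply it. Below is a minimal Haskell sketch of that detection. The type and function names are illustrative and are not taken from aeson:

```haskell
import qualified Data.ByteString as B

data JsonEncoding = UTF8 | UTF16BE | UTF16LE | UTF32BE | UTF32LE
  deriving (Eq, Show)

-- | Guess the encoding of a JSON text from the NUL bytes in its first four
-- octets (RFC 4627, section 3). The order of the cases matters: the UTF-32
-- patterns must be tried before the UTF-16 ones they overlap with.
detectJsonEncoding :: B.ByteString -> JsonEncoding
detectJsonEncoding bs = case B.unpack (B.take 4 bs) of
  [0, 0, 0, _] -> UTF32BE
  [0, _, 0, _] -> UTF16BE
  [_, 0, 0, 0] -> UTF32LE
  [_, 0, _, 0] -> UTF16LE
  _            -> UTF8  -- no NUL pattern, or fewer than 4 bytes: assume UTF-8
```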
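The kind of check described in point 3 can be mimicked without aeson. The sketch below is not aeson's parser. It uses `decodeUtf8'` from the text package, which returns `Left` instead of throwing on malformed input, to reject byte strings that are not valid UTF-8. One example is "più" written in Latin-1, where ù is the single byte 0xF9:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE

-- | True when the bytes are well-formed UTF-8. decodeUtf8' reports invalid
-- input as a Left value rather than an exception.
isValidUtf8 :: B.ByteString -> Bool
isValidUtf8 = either (const False) (const True) . TE.decodeUtf8'
```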

@adinapoli

Hi Oz, let me look into this again and I will get back to you. I can reply to 3) straight away:

  3. I'm using Google Chrome, so I don't think I'm doing anything an end user wouldn't do.

I'll get back to you later on points 1 and 2.
