Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Response with Text type does not contain unicode characters #263

Closed
przembot opened this Issue Nov 4, 2015 · 21 comments

Comments

Projects
None yet
8 participants
@przembot
Copy link

przembot commented Nov 4, 2015

As in title, when connecting to the /test endpoint using web browser, the unicode characters are not displayed as they should.

{-# LANGUAGE DataKinds, TypeOperators, OverloadedStrings #-}
module Main where

import Control.Monad.Trans.Either
import Servant
import Network.Wai
import Network.Wai.Handler.Warp
import Data.Text

type API = "test" :> Get '[JSON] Text

test :: EitherT ServantErr IO Text
test = return "ąółżźć sdfasdf"

proxyAPI :: Proxy API
proxyAPI = Proxy

app :: Application
app = serve proxyAPI test

main :: IO ()
main = run 8080 app

Tested with Firefox 42.0, Chromium 46.0:

"ąółżźć sdfasdf"
@jkarni

This comment has been minimized.

Copy link
Member

jkarni commented Nov 4, 2015

Can you paste the result of locale?

@przembot

This comment has been minimized.

Copy link
Author

przembot commented Nov 4, 2015

LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES=C
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
@christian-marie

This comment has been minimized.

Copy link
Contributor

christian-marie commented Nov 7, 2015

@przembot Could you please try the master/HEAD version of servant (You'll need to change EitherT to ExceptT, or coerce it) with this patch applied?

diff --git a/servant/src/Servant/API/ContentTypes.hs b/servant/src/Servant/API/ContentTypes.hs
index ab857ce20a3b..b38a4434c7af 100644
--- a/servant/src/Servant/API/ContentTypes.hs
+++ b/servant/src/Servant/API/ContentTypes.hs
@@ -122,7 +122,7 @@ class Accept ctype where

 -- | @application/json@
 instance Accept JSON where
-    contentType _ = "application" M.// "json"
+    contentType _ = "application" M.// "json" M./: ("charset", "utf-8")

 -- | @application/x-www-form-urlencoded@
 instance Accept FormUrlEncoded where
@przembot

This comment has been minimized.

Copy link
Author

przembot commented Nov 12, 2015

@christian-marie The header is fine with this patch, but it breaks many tests in servant-server.

@christian-marie

This comment has been minimized.

Copy link
Contributor

christian-marie commented Nov 12, 2015

Right. I just want to confirm that it is the browser trying to interpret Unicode as latin-1

@christian-marie

This comment has been minimized.

Copy link
Contributor

christian-marie commented Nov 12, 2015

@przembot with this patch does it render correctly for you? It wasn't quite clear from your response.

@przembot

This comment has been minimized.

Copy link
Author

przembot commented Nov 12, 2015

@christian-marie Yes, it renders correctly with it :)

@jkarni jkarni added the bug label Dec 10, 2015

@jkarni

This comment has been minimized.

Copy link
Member

jkarni commented Dec 10, 2015

I thought charset was meant only for text top-level types? And that application/json was by default utf-8? E.g. RFC 4627, section 3:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

@jkarni

This comment has been minimized.

Copy link
Member

jkarni commented Dec 10, 2015

Perhaps we should charset utf-8 JSON by default, even though that's kind of the wrong thing to do?

@maerten

This comment has been minimized.

Copy link

maerten commented Feb 8, 2016

I use this code to override the Content-Type by replacing the headers of servant through wai-middleware. This will probably break all kinds of things if you need other content-types than UTF8 application/json, but it works for me :-)

import           Network.Wai
import Network.Wai.Internal (Response(..))
import qualified Network.HTTP.Types as HTTPT   (Header)
import qualified Data.CaseInsensitive as CI

overrideHeader :: (ByteString, ByteString) -> Middleware
overrideHeader h = modifyResponse $ overrideHeader' ((first CI.mk) h)

overrideHeader' :: HTTPT.Header -> Response -> Response
overrideHeader' h = mapResponseHeaders (\hs -> h : (removeHeader h hs))

removeHeader :: HTTPT.Header -> [HTTPT.Header] -> [HTTPT.Header]
removeHeader h hs = filter (not . isHeader h) hs
        where isHeader (x,_) (y,_) = x == y

Example usage (key point is just to use it as a Wai middleware):

app :: PoolConfig -> Application
app env = logStdout $ simpleCors $ overrideHeader ("Content-Type", "application/json; charset=utf-8") $ serve shopAPI EmptyConfig (readerAPIServer env)

Again, it's by no means meant as a good fix, and might cause problems in your app!

@jonathanjouty

This comment has been minimized.

Copy link

jonathanjouty commented Oct 12, 2016

We ran into this recently.

Perhaps we should charset utf-8 JSON by default, even though that's kind of the wrong thing to do?

There's nothing inherently wrong with doing that apart from it being a redundant instruction given RFC4627 § 3. All we're saying is "yes, we are indeed following the default encoding".

(Edited: mistaken Ctrl+Enter seems to submit)

@jonathanjouty

This comment has been minimized.

Copy link

jonathanjouty commented Oct 12, 2016

Just another two cents:

The Chromium bug dates from Dec 2014 and not yet fixed.
For Firefox on the other hand it, it says it is fixed as of v44 (this is the fixed ticket, the one linked above is marked as a duplicate).

So... just tested using BrowserStack.
Same issue exists on IE11 running on Windows 7, Firefox 49 on Win 7, and Safari (Win7 + Mac).

@christian-marie

This comment has been minimized.

Copy link
Contributor

christian-marie commented Oct 13, 2016

@haskell-servant/maintainers Do we want to do something like:

 instance Accept JSON where
-    contentType _ = "application" M.// "json"
+    contentType _ = "application" M.// "json" M./: ("charset", "utf-8")

To prevent more innocents being confused, or is this going to set the world on fire?

@phadej

This comment has been minimized.

Copy link
Member

phadej commented Oct 13, 2016

How that patch affect the input, i.e. when the content body is sent with application/json mime type (i.e. no encoding). Does it still match? If it does, I don't see a problem. Yet have to check which test fails.

@jkarni

This comment has been minimized.

Copy link
Member

jkarni commented Oct 13, 2016

@christian-marie yes, that's what we want. But tests fail with that change - is anyone up for looking into why?

@christian-marie

This comment has been minimized.

Copy link
Contributor

christian-marie commented Oct 13, 2016

It'd just be because extra headers are appearing, I'd think. I never actually saw them. I'll take a look within the next week, unless someone beats me to it.

@alpmestan

This comment has been minimized.

Copy link
Contributor

alpmestan commented Oct 13, 2016

By the way, many content types could be generalized a bit. We could have a, say, data JSON (charset :: Symbol) combinator that lets us specify the charset, and have a default one with the charset set to UTF8. This way we let the user pick the charset, going with UTF8 by default but allowing people to easily change the charset if needed.

@soenkehahn

This comment has been minimized.

Copy link
Contributor

soenkehahn commented Oct 13, 2016

@alpmestan: I wouldn't recommend complicating things for a feature that nobody is asking for.

@alpmestan

This comment has been minimized.

Copy link
Contributor

alpmestan commented Oct 13, 2016

It seems like the most natural solution, because right now we have some "implicit" UTF8 charsets sprinkled over our content types and I've always found this to be unfortunate. We could still just export UTF8 versions by default and have the fancier ones sit in another module.

Or maybe something like having all (textual content types) be UTF8 by default and then have something like:

data Charset (charset :: Symbol) ct

instance (KnownSymbol charset, Accept ct) => Accept (Charset charset ct) where ...

This would provide us with a somewhat principled approach for setting charsets: UTF8 by default for all textual CTs, use a little utility type to change it.

@christian-marie

This comment has been minimized.

Copy link
Contributor

christian-marie commented Jan 8, 2017

Sorry, I dropped this on the floor. I'm gonna leave it there for now, really busy lately.

@jkarni

This comment has been minimized.

Copy link
Member

jkarni commented Sep 13, 2017

Looks like both the Chromium and Firefox bugs have been fixed, which I think means even browsers agree with us not going against the spec here (and leaving things as they are)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.