Set charset for JsonResponse #120

Closed
patrickkusebauch opened this issue Feb 19, 2016 · 17 comments

Comments

@patrickkusebauch (Contributor) commented Feb 19, 2016

When sending a JsonResponse, you cannot currently set the JSON charset. There is a workaround:
$this->sendResponse(new \Nette\Application\Responses\JsonResponse($data, 'application/json; charset=utf-8'));
but it is questionable whether it should work at all.

There is a discussion of this behavior on the Nette Framework Forum (in Czech).

I am willing to write a PR if somebody decides on the solution (either you should not be able to set the charset under any circumstances, or the ability to set the charset should be added transparently).

@dg (Member) commented Feb 19, 2016

Nette supports only UTF-8.

You can create your own response, send a different encoding, and encode the JSON without the JSON_UNESCAPED_UNICODE flag, but it should not be part of Nette.
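
For illustration, a minimal sketch of the custom response dg describes, assuming the Nette\Application\IResponse interface of Nette 2.x; the class name and the advertised charset are made up for the example:

// Hypothetical class, not part of Nette: a JSON response with a custom charset.
class CustomCharsetJsonResponse implements Nette\Application\IResponse
{
	private $payload;

	public function __construct($payload)
	{
		$this->payload = $payload;
	}

	public function send(Nette\Http\IRequest $httpRequest, Nette\Http\IResponse $httpResponse)
	{
		// advertise whatever charset you need
		$httpResponse->setContentType('application/json', 'windows-1250');
		// without JSON_UNESCAPED_UNICODE, non-ASCII characters are escaped
		// as \uXXXX, so the body is plain ASCII and valid under any charset
		echo json_encode($this->payload);
	}
}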

@dg (Member) commented Feb 19, 2016

Are you saying that charset=UTF-8 is missing? Then please send a PR and set 'utf-8' here: setContentType($this->contentType, 'utf-8')
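
(For context, a rough sketch of where that one-line change would land, assuming the JsonResponse::send() method of that era looks roughly like the following; the surrounding lines are illustrative, not a verbatim copy of the repository:)

public function send(Nette\Http\IRequest $httpRequest, Nette\Http\IResponse $httpResponse)
{
	// before: $httpResponse->setContentType($this->contentType);
	$httpResponse->setContentType($this->contentType, 'utf-8');  // adds charset=utf-8
	$httpResponse->setExpiration(FALSE);
	echo Nette\Utils\Json::encode($this->payload);
}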

@patrickkusebauch (Contributor, Author) commented Feb 19, 2016

Yes, that is exactly what I am saying. I will make a PR over the weekend.

@JanTvrdik (Contributor) commented Feb 19, 2016

This is more of a Chrome bug than a Nette bug, because:

  1. the encoding of JSON can be reliably determined from the first few bytes,
  2. only UTF-8 is used in the real world,
  3. UTF-8 is the default encoding.

Also, the official registration of the application/json MIME type pretty much says that the charset parameter is pointless for JSON.

@milanobrtlik commented Feb 19, 2016

Try this:
$this->payload = ["test"=>"ÝŽáýžřáýž"];
$this->sendPayload();

and you will see the Chrome bug.
In the real world, ALL possible encodings are used. Even the impossible ones.

@patrickkusebauch (Contributor, Author) commented Feb 19, 2016

@dg Furthermore, there is a possible bug in Nette\Http\Response, where you can set the charset by calling Nette\Http\Response::setContentType('application/json; charset=utf-8', null) instead of Nette\Http\Response::setContentType('application/json', 'charset=utf-8').
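
(A hedged illustration of the discrepancy, assuming setContentType($type, $charset = NULL) simply joins both parts into the Content-Type header; the exact header output is an assumption, not a quote from the Nette source:)

$httpResponse = new Nette\Http\Response;

// charset smuggled inside the $type argument - currently accepted:
$httpResponse->setContentType('application/json; charset=utf-8', NULL);
// sends roughly: Content-Type: application/json; charset=utf-8

// passing the charset as the dedicated second argument:
$httpResponse->setContentType('application/json', 'utf-8');
// sends roughly: Content-Type: application/json; charset=utf-8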

@dg closed this in 2f939c8 Feb 22, 2016
dg added a commit that referenced this issue Feb 22, 2016
User agents should ignore charset for application/json, this is just workaround for bug in Firefox and Chrome.
@jahudka commented Mar 6, 2016

Well, this is just wrong :-( There is no "charset" parameter for the application/json content type, and JSON must be Unicode by definition. Setting a charset on an application/json response may break things, like Nittro for example, because some people are trying to follow standards and specs, and those specifically say there is no charset for application/json...

@jahudka commented Mar 6, 2016

Besides, if you read through the Chromium bug report, this is an edge case at best:

  1. It only happens to people who have a specific encoding set in their browser instead of automatic encoding detection; yes, there might still be some people like that, but
  2. It only happens if you're looking directly at the JSON data out of context, i.e. by opening an API endpoint directly by typing its address in the browser address bar. If you load the JSON data, say, via AJAX from a page that already is in UTF-8, then in that page's context the JSON data is interpreted correctly, as demonstrated here (yes, it's Opera, but the behaviour when opening the JSON URLs directly is the same). And who is the only person that will ever see the raw JSON data out of context? Yes, the developer.

So basically what you're arguing is that Nette should output invalid, standards-incompatible responses for the sake of a couple of developers who aren't even able to switch their encoding to automatic detection. Neat.

@fprochazka (Contributor) commented Mar 6, 2016

@jahudka Can you please post a link to the mentioned standards? Anyway, this should probably be reverted as a bug.

@jahudka commented Mar 6, 2016

@fprochazka the IANA application/json Media Type Registration explicitly says:

Required parameters: n/a
Optional parameters: n/a
...
Note: No "charset" parameter is defined for this registration.
Adding one really has no effect on compliant recipients.

And the last bit about a charset parameter not having an effect on compliant recipients is because JSON must always be in Unicode and the specific Unicode charset can be determined by looking at the first two characters, according to IETF RFC4627:

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8

There is a little more to it since RFC7159 obsoleted RFC4627, but the end result is the same. See the comments below this SO answer.
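
(The detection rule quoted above is easy to sketch in plain PHP; the function name is illustrative and not part of any library:)

// Determine the Unicode encoding of a JSON octet stream per RFC 4627
// by looking at the pattern of NUL bytes in the first four octets.
function detectJsonEncoding($octets)
{
	if (strlen($octets) < 4) {
		return 'UTF-8';  // too short to apply the rule; UTF-8 is the default
	}
	$b = array_map('ord', str_split(substr($octets, 0, 4)));
	if ($b[0] === 0 && $b[1] === 0 && $b[2] === 0) {
		return 'UTF-32BE';   // 00 00 00 xx
	} elseif ($b[0] === 0 && $b[2] === 0) {
		return 'UTF-16BE';   // 00 xx 00 xx
	} elseif ($b[1] === 0 && $b[2] === 0 && $b[3] === 0) {
		return 'UTF-32LE';   // xx 00 00 00
	} elseif ($b[1] === 0 && $b[3] === 0) {
		return 'UTF-16LE';   // xx 00 xx 00
	}
	return 'UTF-8';          // xx xx xx xx
}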

@Majkl578 (Contributor) commented Mar 6, 2016

Agreed, this change is controversial, to say the least. If someone needs to set the charset explicitly, they are free to do so.

Anyway, I just checked some of the top sites and their API Content-Type:
GitHub API: application/json; charset=utf-8
Bitbucket API: application/json; charset=utf-8
Facebook Graph API: application/json; charset=UTF-8
Twitter API: application/json; charset=utf-8
Youtube Data API: application/json; charset=UTF-8

@jahudka commented Mar 6, 2016

@Majkl578 As my father was always prone to say in my childhood: "I don't care what your classmates' grades are, it's your grades I'm concerned about" :-) The fact that everybody else is doing it wrong doesn't mean we should be copying them.

@hrach (Contributor) commented Mar 6, 2016

@jahudka Please, just admit you have written something that's crystal clear per the specs but doesn't work in the real world.
Edit: I'm not arguing about reverting, I just find your arguments quite weird.

@jahudka commented Mar 6, 2016

@hrach What's so weird about trying to follow the specs? Admittedly, my code could be more fault-tolerant about its input (and I've already fixed that), but that's hardly relevant to the topic of Nette following the specs. I've always found that the best philosophy for developing software is to be as tolerant as you can about your input while being as strict as possible about your output. I'm not saying I follow that rule flawlessly, as this issue clearly proves; but that doesn't mean I don't (or shouldn't) try.

@hrach (Contributor) commented Mar 6, 2016

From my point of view, following the specs is not how the internet actually works. Real-world apps (definitely) don't follow the spec, and for a reason. We use features from CSS, HTML, JS, ... which are hardly defined in specs. And someday, the specs will have to reflect the current state. I am not claiming that following the specs is wrong; I'm saying it's not (and should not be) the (major) reason why something should be done.

The question is: does it break more things or fix more things?

I do not know the answer, so I don't have an opinion.

@jahudka commented Mar 6, 2016

Yes, it's true that in a lot of cases we don't, and can't, follow the specs, because there either aren't any, or they aren't adopted widely enough, or they're just stupid. I'm not saying that the specs are the Holy Writ and that they are the single source of ultimate truth. What I am saying is that there is a reason why programmers have come up with the idea of having specs in the first place, and that reason ultimately does make sense; and similarly that a lot of the specs that we do have were issued for a reason and as long as that reason makes sense, we should try to follow the specs, for the same reasons that we have them. Yes, it's a valid point that we should be looking at whether our code fixes more things than it breaks, but I think that in cases like this where there is not much evidence either way we should stick to the specs, because that should make our code less likely to break something in the future.
