add "charset=utf-8" to content-type "application/json" #383

Closed
thesmart opened this Issue Nov 30, 2012 · 14 comments

The Request.prototype.json function sets Content-Type headers to application/json. utf-8 really should be the default imo, and this led me to several hours of debugging hell before I realized that the API I was working with rejects all but UTF-8 JSON.

I made these changes:

main.js, Line #803:

this.setHeader('content-type', 'application/json; charset=utf-8')

main.js, Line #807:

this.setHeader('content-type', 'application/json; charset=utf-8')

Is there anything wrong with this? Any thoughts?
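For anyone who needs the charset without patching main.js, a caller-side workaround might look like the sketch below. It serializes the body itself instead of relying on the json option, and it assumes the HTTP client passes caller-supplied headers and a string body through unchanged. The helper name jsonOptionsWithCharset and the URL are hypothetical, not part of the library.

```javascript
// Hypothetical helper: builds an options object with an explicit charset.
// Assumption: the options are handed to an HTTP client that sends the
// given headers and string body as-is (i.e. the json option is not used).
function jsonOptionsWithCharset(url, payload) {
  // Serialize once so the content-length matches the bytes actually sent.
  const body = JSON.stringify(payload);
  return {
    url: url,
    method: 'POST',
    headers: {
      'content-type': 'application/json; charset=utf-8',
      // Byte length, not string length: non-ASCII characters take more
      // than one byte in UTF-8.
      'content-length': Buffer.byteLength(body, 'utf8')
    },
    body: body
  };
}

// Example usage with a placeholder URL.
const options = jsonOptionsWithCharset('http://api.test/messages', { text: 'héllo' });
console.log(options.headers['content-type']);
```

The response would then need a manual JSON.parse, since the client is no longer asked to treat the exchange as JSON.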

efbeka commented Jan 13, 2013

👍 +1 on this.

Contributor

kevinoid commented May 22, 2013

RFC 4627 (http://tools.ietf.org/html/rfc4627) does not define a charset parameter because it requires that JSON is always encoded as Unicode (per Section 3). I would urge against this change unless there is a compelling reason to violate the standard.

Not sure I understand in what way this violates any standard.

JSON could be any Unicode variety (e.g. UTF-16, UTF-32) or endianness (e.g. little or big). This bug is just requesting that the library explicitly declare UTF-8 rather than being ambiguous and assuming that the server will follow the specified default.


Contributor

kevinoid commented May 25, 2013

Section 3 of RFC 4627 defines how the charset is detected using the first 4 bytes of the file, which is why it does not define a charset parameter. Adding a charset parameter is contrary to the standard because the standard defines the set of parameters for the media type and their interpretation and it does not include a charset parameter (Section 6).
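As a rough illustration of the Section 3 detection described above, a receiver could inspect the pattern of zero bytes in the first four octets. This is only a sketch: the function name detectJsonEncoding is hypothetical, and the fallback to UTF-8 for inputs shorter than four bytes is an assumption made here, not something the RFC spells out.

```javascript
// Sketch of RFC 4627 Section 3 detection. It relies on the RFC's premise
// that a JSON text starts with two ASCII characters, so the position of
// zero bytes in the first four octets reveals the Unicode encoding.
function detectJsonEncoding(buf) {
  // Assumption: with fewer than four bytes, fall back to UTF-8.
  if (buf.length < 4) return 'UTF-8';
  const z = [buf[0] === 0, buf[1] === 0, buf[2] === 0, buf[3] === 0];
  if (z[0] && z[1] && z[2] && !z[3]) return 'UTF-32BE';  // 00 00 00 xx
  if (z[0] && !z[1] && z[2] && !z[3]) return 'UTF-16BE'; // 00 xx 00 xx
  if (!z[0] && z[1] && z[2] && z[3]) return 'UTF-32LE';  // xx 00 00 00
  if (!z[0] && z[1] && !z[2] && z[3]) return 'UTF-16LE'; // xx 00 xx 00
  return 'UTF-8';                                        // xx xx xx xx
}

// Example usage: the same JSON text in two encodings.
console.log(detectJsonEncoding(Buffer.from('{"a":1}', 'utf8')));
console.log(detectJsonEncoding(Buffer.from('{"a":1}', 'utf16le')));
```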

And yet Java servers everywhere require the charset for acceptance...


Contributor

kevinoid commented May 26, 2013

Supporting those services by sending a non-conformant media type is certainly an option. What are some services/programs/libraries which demonstrate the problem? Perhaps that will help to make an informed decision.

Message Bus API, for one.

I think you are confusing the scope of the JSON spec. I'm suggesting setting a HTTP 1.1 header, which is 100% valid. This only concerns the transport layer, but you are implying that a specification for body-content supersedes it. Charset spec is here:
http://www.w3.org/International/O-HTTP-charset

All I'm proposing is setting the header, which is 100% standards compliant and only affects the transport. Considering that 99.9% of Node.js is written for UTF-8 applications, it makes sense to set it as default for servers that fail to apply UTF-8 as the de facto default. I'm not trying to start a crusade here.

Contributor

kevinoid commented May 26, 2013

That document discusses the charset parameter for media types "that are of type text, such as text/html, text/plain, etc.". JSON is of type application, for which the charset parameter is not defined. HTTP 1.1, as defined in RFC 2616 defines the Content-Type as being a media type, as defined in the IANA registry. application/json is so defined and does not include a charset parameter. No HTTP RFC that I am aware of adds a charset parameter to all media types.

Which of the message bus API implementations incorrectly decode JSON as non-Unicode types?

It's ok, Tim Berners-Lee couldn't affect this thread.

Vote however you want.

Smart


@evantahler referenced this issue in actionhero/actionhero Jan 29, 2014: add charset 'utf-8' to 'Content-Type' header #310 (Merged)

Seems like those Java servers are lazy as "it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets." RFC 4627 sec.3

Further, the application/json entry in the IANA registry has a note:

Note:  No "charset" parameter is defined for this registration.
Adding one really has no effect on compliant recipients.

I'm understanding this issue to have been opened to accommodate standards non-compliant servers? And it seems the proposed solution was a further non-compliance? Thus a standard vs. implementation discussion...

Contributor

kevinoid commented Aug 25, 2014

@PixnBits Yes, I think your understanding is correct.

I'm not strongly opposed to adding the out-of-spec charset parameter if it's necessary for interoperation, but I'd like to make sure that such a deviation from the spec is justified. In my testing, both Firefox and Chrome send application/json without a charset parameter when uploading files, but require it when receiving JSON to interpret it as one of the unicode charsets. So the current state of affairs is already a bit of a mess. Whether it's preferable to stick to the spec or deviate in a way that compliant implementations should ignore is unclear.

Owner

mikeal commented Aug 25, 2014

My concern with this is that it would break more out-of-spec implementations than it would fix. I'm willing to bet that there are more servers that just == "application/json" than there are servers that need the utf charset definition.
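To make the trade-off concrete, here is a sketch of the two kinds of server-side check being compared. Both function names are hypothetical and neither is taken from any particular server. The strict check is the exact comparison described above; the tolerant one strips parameters before comparing.

```javascript
// Hypothetical strict check: exact string equality, so any parameter
// (including charset) makes the request look like non-JSON.
function strictIsJson(contentType) {
  return contentType === 'application/json';
}

// Hypothetical tolerant check: compare only the media type, ignoring
// parameters and letter case.
function tolerantIsJson(contentType) {
  if (typeof contentType !== 'string') return false;
  const mediaType = contentType.split(';')[0].trim().toLowerCase();
  return mediaType === 'application/json';
}

// Example usage: the header value proposed in this issue.
const proposed = 'application/json; charset=utf-8';
console.log(strictIsJson(proposed), tolerantIsJson(proposed));
```

A server using the strict form would start rejecting requests if the charset were added by default, which is the breakage being weighed here.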

@mikeal closed this Aug 27, 2014

ilatypov commented Jul 29, 2017

If the script HTML tag's charset attribute overrides the auto-detection and an extra URL parameter can manipulate a part of the response, then attackers can read the entire UTF-8 response by sending a link to their malicious page containing the script tag fetching the targeted JSON resource <script charset=utf16-le src="https://mail.test/api/inbox?foo=bar=1337;for(i in window) if(window[i] === 1337) alert(i)"> to victim users whose browsers will interpret authenticated JSON data with UTF-16. http://blog.portswigger.net/2016/11/json-hijacking-for-modern-web.html
