add "charset=utf-8" to content-type "application/json" #383

Closed
thesmart opened this Issue · 12 comments

5 participants

@thesmart

The Request.prototype.json function sets the Content-Type header to application/json. UTF-8 really should be the default, imo. This led me to several hours of debugging hell before I realized that the API I was working with rejects all but UTF-8 JSON.

I made these changes:

main.js, Line #803:

this.setHeader('content-type', 'application/json; charset=utf-8')

main.js, Line #807:

this.setHeader('content-type', 'application/json; charset=utf-8')

Is there anything wrong with this? Any thoughts?
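
For readers who need the charset parameter for one specific service, a possible workaround that avoids patching the library is to set the header explicitly per request. The following is only a minimal sketch under stated assumptions: `buildJsonRequest` is a hypothetical helper, not part of this library, and its return shape (a `body` string plus a `headers` object) is chosen purely for illustration.

```javascript
// Hypothetical helper (not part of the library): serializes a payload and
// builds the headers for a JSON request, optionally with the charset parameter.
function buildJsonRequest(payload, withCharset) {
  const body = JSON.stringify(payload);
  const contentType = withCharset
    ? 'application/json; charset=utf-8'
    : 'application/json';
  return {
    body: body,
    headers: {
      'content-type': contentType,
      // Content-Length counts bytes, not characters, so measure the UTF-8 encoding.
      'content-length': Buffer.byteLength(body, 'utf8')
    }
  };
}

const req = buildJsonRequest({ name: 'caf\u00e9' }, true);
console.log(req.headers['content-type']);
```

The resulting `headers` object could then be passed to whatever HTTP client is in use. Opting in per request keeps the default unchanged for everyone else.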

@efbeka

:+1: +1 on this.

@kevinoid
Collaborator

RFC 4627 does not define a charset parameter because it requires that JSON is always encoded as Unicode (per Section 3). I would urge against this change unless there is a compelling reason to violate the standard.

@thesmart
@kevinoid
Collaborator

Section 3 of RFC 4627 defines how the charset is detected using the first 4 bytes of the file, which is why it does not define a charset parameter. Adding a charset parameter is contrary to the standard because the standard defines the set of parameters for the media type and their interpretation and it does not include a charset parameter (Section 6).
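
The detection described in Section 3 of RFC 4627 can be sketched roughly as follows. This is an illustration, not code from this library: `detectJsonEncoding` is a hypothetical name, and treating inputs shorter than four bytes as UTF-8 is an assumption of the sketch rather than something the RFC states.

```javascript
// Hypothetical sketch of RFC 4627 Section 3: JSON text starts with two ASCII
// characters, so the pattern of zero bytes in the first four octets reveals
// the Unicode encoding.
//   00 00 00 xx  UTF-32BE
//   00 xx 00 xx  UTF-16BE
//   xx 00 00 00  UTF-32LE
//   xx 00 xx 00  UTF-16LE
//   xx xx xx xx  UTF-8
function detectJsonEncoding(buf) {
  // Assumption: too few bytes to inspect, so fall back to UTF-8.
  if (buf.length < 4) return 'UTF-8';
  const z = [0, 1, 2, 3].map(function (i) { return buf[i] === 0; });
  if (z[0] && z[1] && z[2] && !z[3]) return 'UTF-32BE';
  if (z[0] && !z[1] && z[2] && !z[3]) return 'UTF-16BE';
  if (!z[0] && z[1] && z[2] && z[3]) return 'UTF-32LE';
  if (!z[0] && z[1] && !z[2] && z[3]) return 'UTF-16LE';
  return 'UTF-8';
}

console.log(detectJsonEncoding(Buffer.from('{"a":1}', 'utf8')));
console.log(detectJsonEncoding(Buffer.from('{"a":1}', 'utf16le')));
```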

@thesmart
@kevinoid
Collaborator

Supporting those services by sending a non-conformant media type is certainly an option. What are some services/programs/libraries which demonstrate the problem? Perhaps that will help to make an informed decision.

@thesmart

Message Bus API, for one.

I think you are confusing the scope of the JSON spec. I'm suggesting setting an HTTP 1.1 header, which is 100% valid. This only concerns the transport layer, but you are implying that a specification for body content supersedes it. The charset spec is here:
http://www.w3.org/International/O-HTTP-charset

All I'm proposing is setting the header, which is 100% standards compliant and only affects the transport. Considering that 99.9% of Node.js is written for UTF-8 applications, it makes sense to set it as the default for servers that fail to apply UTF-8 as the de facto default. I'm not trying to start a crusade here.

@kevinoid
Collaborator

That document discusses the charset parameter for media types "that are of type text, such as text/html, text/plain, etc.". JSON is of type application, for which the charset parameter is not defined. HTTP 1.1, as defined in RFC 2616, defines the Content-Type as a media type registered in the IANA registry. application/json is registered there, and its registration does not include a charset parameter. No HTTP RFC that I am aware of adds a charset parameter to all media types.

Which of the message bus API implementations incorrectly decode JSON as non-Unicode types?

@thesmart
@evantahler evantahler referenced this issue in evantahler/actionhero
Merged

add charset 'utf-8' to 'Content-Type' header #310

@PixnBits

Seems like those Java servers are lazy, as "it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets" (RFC 4627, Section 3).

Further, the application/json entry in the IANA registry has a note:

Note:  No "charset" parameter is defined for this registration.
Adding one really has no effect on compliant recipients.

I'm understanding this issue to have been opened to accommodate standards non-compliant servers? And it seems the proposed solution was a further non-compliance? Thus a standard vs. implementation discussion...

@kevinoid
Collaborator

@PixnBits Yes, I think your understanding is correct.

I'm not strongly opposed to adding the out-of-spec charset parameter if it's necessary for interoperation, but I'd like to make sure that such a deviation from the spec is justified. In my testing, both Firefox and Chrome send application/json without a charset parameter when uploading files, but require it when receiving JSON in order to interpret it as one of the Unicode charsets. So the current state of affairs is already a bit of a mess. Whether it's preferable to stick to the spec or to deviate in a way that compliant implementations should ignore is unclear.

@mikeal
Owner

My concern with this is that it would break more out-of-spec implementations than it would fix. I'm willing to bet that there are more servers that just check == "application/json" than there are servers that need the UTF-8 charset definition.
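
To illustrate the concern, here is a rough sketch of the two kinds of server-side check being contrasted. All function names are hypothetical and not taken from any real server: a strict string comparison rejects the header once a parameter is appended, while a check that strips parameters first accepts both forms.

```javascript
// Hypothetical helper: returns the bare media type, lower-cased,
// with any parameters (such as "; charset=utf-8") removed.
function mediaType(contentType) {
  return String(contentType || '').split(';')[0].trim().toLowerCase();
}

// The brittle check: exact string equality on the whole header value.
function strictIsJson(contentType) {
  return contentType === 'application/json';
}

// The tolerant check: compares only the media type, ignoring parameters.
function tolerantIsJson(contentType) {
  return mediaType(contentType) === 'application/json';
}

const header = 'application/json; charset=utf-8';
console.log(strictIsJson(header), tolerantIsJson(header));
```

A server using the strict form would stop recognizing requests if the charset parameter became the default, which is the breakage described above.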

@mikeal mikeal closed this