The Request.prototype.json function sets the Content-Type header to application/json. An explicit utf-8 charset really should be the default imo; its absence led me into several hours of debugging hell before I realized that the API I was working with rejects anything but UTF-8 JSON.
I made these changes:
main.js, Line #803:
this.setHeader('content-type', 'application/json; charset=utf-8')
main.js, Line #807:
Is there anything wrong with this? Any thoughts?
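For illustration, the same effect can be sketched without patching main.js at all. The names `JSON_UTF8` and `jsonHeaders` below are mine, not part of any library; this is just a minimal helper a caller could use to send the proposed header explicitly:

```javascript
// Illustrative sketch (not the library's internals): build the header
// value once and attach it per request instead of changing the default.
const JSON_UTF8 = 'application/json; charset=utf-8';

function jsonHeaders(bodyString) {
  return {
    'Content-Type': JSON_UTF8,
    // Buffer.byteLength, not .length: a UTF-8 body may contain more
    // bytes than characters, and Content-Length counts bytes.
    'Content-Length': Buffer.byteLength(bodyString, 'utf8'),
  };
}
```

Note the Content-Length detail: it is the same reason the charset matters in the first place, since a body containing non-ASCII characters is longer in bytes than in characters.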
+1 on this.
RFC 4627 does not define a charset parameter because it requires that JSON is always encoded as Unicode (per Section 3). I would urge against this change unless there is a compelling reason to violate the standard.
Section 3 of RFC 4627 defines how the charset is detected from the first four bytes of the stream, which is why no charset parameter is needed. Adding one is contrary to the standard: Section 6 defines the complete set of parameters for the media type and their interpretation, and charset is not among them.
Supporting those services by sending a non-conformant media type is certainly an option. What are some services/programs/libraries which demonstrate the problem? Perhaps that will help to make an informed decision.
Message Bus API, for one.
I think you are confusing the scope of the JSON spec. I'm suggesting setting an HTTP 1.1 header, which is 100% valid. This concerns only the transport layer, yet you are implying that a specification for body content supersedes it. The charset spec is here:
All I'm proposing is setting the header, which is 100% standards-compliant and affects only the transport. Considering that 99.9% of Node.js code is written for UTF-8, it makes sense to send the charset by default for servers that fail to treat UTF-8 as the de facto default. I'm not trying to start a crusade here.
That document discusses the charset parameter for media types "that are of type text, such as text/html, text/plain, etc.". JSON is of type application, for which the charset parameter is not defined. HTTP 1.1, as defined in RFC 2616, defines Content-Type as a media type, as registered with IANA. application/json is so registered and does not include a charset parameter. No HTTP RFC that I am aware of adds a charset parameter to all media types.
Which of the message bus API implementations incorrectly decode JSON as non-Unicode types?
Seems like those Java servers are lazy, as "it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets" (RFC 4627, Section 3).
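The detection the RFC describes is simple enough to sketch. Because the first two characters of a JSON text are guaranteed to be ASCII, the pattern of null bytes in the first four octets identifies the Unicode encoding. A minimal sketch (`detectJsonCharset` is an illustrative name, not a real API):

```javascript
// Per RFC 4627 §3: the null-byte pattern in the first four octets of a
// JSON text reveals its Unicode encoding (xx = non-null octet):
//   00 00 00 xx -> UTF-32BE    xx 00 00 00 -> UTF-32LE
//   00 xx 00 xx -> UTF-16BE    xx 00 xx 00 -> UTF-16LE
//   xx xx xx xx -> UTF-8
function detectJsonCharset(buf) {
  const [a, b, c, d] = buf; // first four octets
  if (a === 0 && b === 0 && c === 0) return 'utf-32be';
  if (b === 0 && c === 0 && d === 0) return 'utf-32le';
  if (a === 0 && c === 0) return 'utf-16be';
  if (b === 0 && d === 0) return 'utf-16le';
  return 'utf-8';
}
```

The UTF-32 checks must come before the UTF-16 checks, since their null patterns are supersets of the UTF-16 ones.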
Further, the application/json entry in the IANA registry has a note:
Note: No "charset" parameter is defined for this registration.
Adding one really has no effect on compliant recipients.
Am I understanding correctly that this issue was opened to accommodate standards-non-compliant servers, and that the proposed solution is a further non-compliance? Thus a standard vs. implementation discussion...
@PixnBits Yes, I think your understanding is correct.
I'm not strongly opposed to adding the out-of-spec charset parameter if it's necessary for interoperation, but I'd like to make sure that such a deviation from the spec is justified. In my testing, both Firefox and Chrome send application/json without a charset parameter when uploading files, but require it when receiving JSON in order to interpret it as one of the Unicode charsets. So the current state of affairs is already a bit of a mess. Whether it's preferable to stick to the spec or to deviate in a way that compliant implementations should ignore is unclear.
My concern with this is that it would break more out-of-spec implementations than it would fix. I'm willing to bet that there are more servers that just do a strict == "application/json" comparison than there are servers that need the UTF-8 charset parameter.
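That breakage mode is easy to demonstrate. A strict string comparison fails the moment any parameter is appended, whereas splitting off the parameters first accepts both forms. A sketch (`isJson` is an illustrative name, not any server's actual code):

```javascript
// Illustrative check: compare only the media type, not the whole
// header value, so an appended charset parameter doesn't break it.
function isJson(contentType) {
  const mediaType = contentType.split(';')[0].trim().toLowerCase();
  return mediaType === 'application/json';
}

// The naive strict comparison breaks once a charset parameter appears:
console.log('application/json; charset=utf-8' === 'application/json'); // false
console.log(isJson('application/json; charset=utf-8'));                // true
console.log(isJson('application/json'));                               // true
```

So the question is how many deployed servers do the naive comparison, versus how many require the parameter.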