add "charset=utf-8" to content-type "application/json" #383

Closed
thesmart opened this Issue Nov 30, 2012 · 15 comments

thesmart commented Nov 30, 2012

The Request.prototype.json function sets the Content-Type header to application/json. UTF-8 really should be the default imo, and this led me to several hours of debugging hell before I realized that the API I was working with rejects all but UTF-8 JSON.

I made these changes:

main.js, Line #803:

this.setHeader('content-type', 'application/json; charset=utf-8')

main.js, Line #807:

this.setHeader('content-type', 'application/json; charset=utf-8')

Is there anything wrong with this? Any thoughts?
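
For readers who need the charset parameter without patching main.js, here is a rough sketch of setting the header per request. It is not the library's own code. The helper name `buildJsonPost` and the example URL are made up for illustration, and the option names (`url`, `method`, `headers`, `body`) are assumed to match what request accepts.

```javascript
// Sketch only: build options for a JSON POST with an explicit charset parameter.
// `buildJsonPost` and the example URL are hypothetical, not part of request.
function buildJsonPost(url, payload) {
  // JSON.stringify yields a JS string; Buffer.from encodes it as UTF-8 bytes.
  const body = Buffer.from(JSON.stringify(payload), 'utf8');
  return {
    url: url,
    method: 'POST',
    headers: {
      'content-type': 'application/json; charset=utf-8',
      // Content-Length counts bytes, not characters.
      'content-length': body.length
    },
    body: body
  };
}

const options = buildJsonPost('https://api.example.test/messages', { text: 'héllo' });
console.log(options.headers['content-type']);
```

The resulting object could then be passed to `request(options, callback)`. Encoding the body to a Buffer up front keeps the declared charset and the bytes actually sent in agreement.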

efbeka commented Jan 13, 2013

👍 +1 on this.

Contributor

kevinoid commented May 22, 2013

RFC 4627 does not define a charset parameter because it requires that JSON is always encoded as Unicode (per Section 3). I would urge against this change unless there is a compelling reason to violate the standard.

Author

thesmart commented May 25, 2013

Not sure I understand in what way this violates any standard.

JSON could be any Unicode variety (e.g. UTF-16, UTF-32) or endianness (little or big). This bug just requests that the library explicitly declare UTF-8 rather than stay ambiguous and assume that the server will follow the specified default.

Contributor

kevinoid commented May 25, 2013

Section 3 of RFC 4627 defines how the charset is detected using the first 4 bytes of the file, which is why it does not define a charset parameter. Adding a charset parameter is contrary to the standard because the standard defines the set of parameters for the media type and their interpretation and it does not include a charset parameter (Section 6).
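
The detection rule from Section 3 can be sketched in a few lines. This is an illustration of the null-pattern table in the RFC, not code from request or from any server discussed here. `detectJsonEncoding` is a hypothetical name, and falling back to UTF-8 for inputs shorter than four octets is a simplification of this sketch.

```javascript
// Sketch of the RFC 4627 Section 3 null-pattern check on the first four octets.
// `detectJsonEncoding` is a hypothetical helper, not part of any library.
function detectJsonEncoding(buf) {
  // Assumption for this sketch: anything shorter than four octets is UTF-8.
  if (buf.length < 4) return 'UTF-8';
  const z = [buf[0] === 0, buf[1] === 0, buf[2] === 0, buf[3] === 0];
  if (z[0] && z[1] && z[2] && !z[3]) return 'UTF-32BE';  // 00 00 00 xx
  if (z[0] && !z[1] && z[2] && !z[3]) return 'UTF-16BE'; // 00 xx 00 xx
  if (!z[0] && z[1] && z[2] && z[3]) return 'UTF-32LE';  // xx 00 00 00
  if (!z[0] && z[1] && !z[2] && z[3]) return 'UTF-16LE'; // xx 00 xx 00
  return 'UTF-8';                                        // xx xx xx xx
}

console.log(detectJsonEncoding(Buffer.from('{"a":1}', 'utf8')));
console.log(detectJsonEncoding(Buffer.from('{"a":1}', 'utf16le')));
```

The check works because RFC 4627 requires a JSON text to start with two ASCII characters, so the position of the zero octets reveals the encoding.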

Author

thesmart commented May 26, 2013

And yet Java servers everywhere require the charset for acceptance...

Contributor

kevinoid commented May 26, 2013

Supporting those services by sending a non-conformant media type is certainly an option. What are some services/programs/libraries which demonstrate the problem? Perhaps that will help to make an informed decision.

Author

thesmart commented May 26, 2013

Message Bus API, for one.

I think you are confusing the scope of the JSON spec. I'm suggesting setting an HTTP 1.1 header, which is 100% valid. This only concerns the transport layer, but you are implying that a specification for body content supersedes it. Charset spec is here:
http://www.w3.org/International/O-HTTP-charset

All I'm proposing is setting the header, which is 100% standards compliant and only affects the transport. Considering that 99.9% of Node.js is written for UTF-8 applications, it makes sense to set it as the default for servers that fail to apply UTF-8 as the de facto default. I'm not trying to start a crusade here.

Contributor

kevinoid commented May 26, 2013

That document discusses the charset parameter for media types "that are of type text, such as text/html, text/plain, etc.". JSON is of type application, for which the charset parameter is not defined. HTTP 1.1, as defined in RFC 2616, defines the Content-Type as being a media type, as defined in the IANA registry. application/json is so defined and does not include a charset parameter. No HTTP RFC that I am aware of adds a charset parameter to all media types.

Which of the message bus API implementations incorrectly decode JSON as non-Unicode types?

Author

thesmart commented May 26, 2013

It's ok, Tim Berners-Lee couldn't affect this thread.

Vote however you want.

Smart

PixnBits commented Aug 25, 2014

Seems like those Java servers are lazy as "it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets." RFC 4627 sec.3

Further, the application/json entry in the IANA registry has a note:

Note:  No "charset" parameter is defined for this registration.
Adding one really has no effect on compliant recipients.

I'm understanding this issue to have been opened to accommodate standards non-compliant servers? And it seems the proposed solution was a further non-compliance? Thus a standard vs. implementation discussion...

Contributor

kevinoid commented Aug 25, 2014

@PixnBits Yes, I think your understanding is correct.

I'm not strongly opposed to adding the out-of-spec charset parameter if it's necessary for interoperation, but I'd like to make sure that such a deviation from the spec is justified. In my testing, both Firefox and Chrome send application/json without a charset parameter when uploading files, but require it when receiving JSON to interpret it as one of the unicode charsets. So the current state of affairs is already a bit of a mess. Whether it's preferable to stick to the spec or deviate in a way that compliant implementations should ignore is unclear.

Member

mikeal commented Aug 25, 2014

My concern with this is that it would break more out-of-spec implementations than it would fix. I'm willing to bet that there are more servers that just == "application/json" than there are servers that need the utf charset definition.
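
The two kinds of receiver behind this concern can be sketched side by side. Both functions are hypothetical illustrations with made-up names, not code taken from any real server.

```javascript
// Sketch: two ways a receiver might decide whether a request carries JSON.
// `isJsonStrict` and `isJsonTolerant` are hypothetical names.

// Exact string comparison: rejects any Content-Type that carries parameters.
function isJsonStrict(contentType) {
  return contentType === 'application/json';
}

// Parameter-tolerant check: drop everything after the first ';',
// then compare the bare media type case-insensitively.
function isJsonTolerant(contentType) {
  const mediaType = String(contentType).split(';')[0].trim().toLowerCase();
  return mediaType === 'application/json';
}

const withCharset = 'application/json; charset=utf-8';
console.log(isJsonStrict(withCharset), isJsonTolerant(withCharset));
```

A server written like `isJsonStrict` would start rejecting requests if the charset parameter became the default, which is the breakage risk described above.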

xgqfrms-GitHub commented Jun 7, 2017

ilatypov commented Jul 29, 2017

If the script HTML tag's charset attribute overrides the auto-detection and an extra URL parameter can manipulate a part of the response, then attackers can read the entire UTF-8 response by sending a link to their malicious page containing the script tag fetching the targeted JSON resource <script charset=utf16-le src="https://mail.test/api/inbox?foo=bar=1337;for(i in window) if(window[i] === 1337) alert(i)"> to victim users whose browsers will interpret authenticated JSON data with UTF-16. http://blog.portswigger.net/2016/11/json-hijacking-for-modern-web.html

sumit1317 commented Dec 5, 2017

Hi,
I am using the following Node.js code, generated by Postman, for one of my APIs, and it fails with the error shown below. The API call works fine from Postman but not from Node.js.
The request includes an XML file as an attachment.
Any help would be highly appreciated.

Nodejs Code
var fs = require("fs");
var request = require("request");
var options = {
  method: 'POST',
  url: 'https://uat.nlis.mla.com.au/soap/upload.aspx',
  headers: {
    'postman-token': 'cda8f580-50cc-0d76-08f3-61670fcf0b4b',
    'cache-control': 'no-cache',
    'content-type': 'multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW'
  },
  formData: {
    '': {
      value: 'fs.createReadStream("nlisMovement.xml")',
      options: { filename: 'nlisMovement.xml', contentType: null }
    }
  }
};
request(options, function (error, response, body) {
  if (error) throw new Error(error);
  console.log(body);
});

Error
"<SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'><SOAP-ENV:Body><SOAP-ENV:Fault><faultcode>SOAP-ENV:Client</faultcode><faultstring>Content-Type should be set to text/xml</faultstring></SOAP-ENV:Fault></SOAP-ENV:Body></SOAP-ENV:Envelope>"
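
Nobody in this thread confirmed a fix, so what follows is only a guess at what might be wrong, written as a sketch. One certain problem in the generated snippet is that `value` is a quoted string, so the literal text is uploaded instead of the file contents. The other changes are assumptions: that request builds the multipart header and boundary itself when `formData` is present, and that the server wants the part labelled `text/xml`, as its error message suggests. `buildUploadOptions` and the `file` field name are hypothetical. The original used an empty field name.

```javascript
const fs = require('fs');

// Sketch only. `buildUploadOptions` and the default field name `file` are
// hypothetical; the URL and filename are copied from the snippet above.
function buildUploadOptions(filePath, fieldName) {
  return {
    method: 'POST',
    url: 'https://uat.nlis.mla.com.au/soap/upload.aspx',
    // No manual content-type header here. Assumption: request generates the
    // multipart/form-data header with a matching boundary on its own.
    formData: {
      [fieldName || 'file']: {
        // A real stream, not the string 'fs.createReadStream(...)'.
        value: fs.createReadStream(filePath),
        // Assumption based on the error text: label the part as text/xml.
        options: { filename: 'nlisMovement.xml', contentType: 'text/xml' }
      }
    }
  };
}

// Usage (requires the request package and the file on disk):
//   const request = require('request');
//   request(buildUploadOptions('nlisMovement.xml'), function (error, response, body) {
//     if (error) throw error;
//     console.log(body);
//   });
```

If the server still answers with the same fault, it may expect the XML as the raw request body with a `text/xml` Content-Type rather than as a multipart upload. That is also unverified.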
