Why is the Unicode NULL code point, when properly escaped, invalid? That is, a string like "Testing\u0000Hello world".
isValidCodePoint returns sourceIllegal for the NULL character (ch == 0U). The headers say:
The code in isValidCodePoint() is derived from the ICU code in
utf.h for the macros U_IS_UNICODE_NONCHAR and U_IS_UNICODE_CHAR.
However, U_IS_UNICODE_CHAR(0) returns true.
In addition, RFC 4627 makes a passing reference to U+0000 as being allowed:
Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point.
I can of course use the LooseUnicode option to replace it, but that's not ideal since it is valid JSON as far as I can tell.
Well, look at that, a whole paragraph for this exact issue. :-)
I do disagree, since I think JSONKit should just do the right thing and let the user deal with the security implications. Most users will be using NSString anyway, which does handle null characters correctly. However, yours is a perfectly valid decision.
I would request a more specific error message for this particular case, at least; e.g. "\u0000 is not allowed for security reasons; use JKParseOptionLooseUnicode". Whether this is practical is up to you. It would have saved me an hour of research, but this is an obscure case.
BTW, the trigger was that I'm dealing with JSON from ID3 (MP3) tags from a large database of media files. Lots of null characters in there for inexplicable reasons.
Bump. I would like to see this fixed too. I agree that the security issue is mitigated by using the NSString class, and valid Unicode and JSON should be respected.
I'm busy writing my own UTF-8 library, and stumbled into the same issue. Right now I'm leaning towards not supporting U+0000 at all, for the same reasons as JSONKit. I'm curious to know if anyone has any real-world stories of a case where it was essential to support decoding U+0000? Is it possible that the ID3 tags mentioned above by @adamjernst were crafted with malicious intent, or that they were simply the result of buggy software that produced them?