Why is Unicode NULL codepoint invalid? #51

Open
adamjernst opened this issue Oct 12, 2011 · 5 comments

Comments

@adamjernst

Why is the Unicode NULL codepoint invalid, even when properly escaped? For example, a string like "Testing\u0000Hello world" is rejected.

isValidCodePoint returns sourceIllegal for the NULL character (ch == 0U). The headers say:

> The code in isValidCodePoint() is derived from the ICU code in
> utf.h for the macros U_IS_UNICODE_NONCHAR and U_IS_UNICODE_CHAR.

However, U_IS_UNICODE_CHAR(0) returns true.
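
For reference, here is a rough sketch of the check those two macros perform, written as plain C functions. This paraphrases my understanding of the ICU logic; it is not copied from ICU or from JSONKit, and the function names `is_unicode_nonchar`, `is_unicode_char`, and `check_examples` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Noncharacters: U+FDD0..U+FDEF, plus the last two code points of every
 * plane (U+xxFFFE and U+xxFFFF), up to U+10FFFF. */
bool is_unicode_nonchar(uint32_t c) {
    return c >= 0xFDD0u &&
           (c <= 0xFDEFu || (c & 0xFFFEu) == 0xFFFEu) &&
           c <= 0x10FFFFu;
}

/* A "Unicode character" in this sense: in range, not a surrogate,
 * and not a noncharacter. Nothing here excludes U+0000. */
bool is_unicode_char(uint32_t c) {
    return c < 0xD800u ||
           (c > 0xDFFFu && c <= 0x10FFFFu && !is_unicode_nonchar(c));
}

/* The cases relevant to this issue. */
void check_examples(void) {
    assert(is_unicode_char(0x0000u));    /* NULL passes the check */
    assert(!is_unicode_char(0xD800u));   /* surrogate */
    assert(!is_unicode_char(0xFFFEu));   /* noncharacter */
    assert(!is_unicode_char(0x110000u)); /* out of range */
}
```

Under this reading, rejecting `ch == 0U` is an extra rule added on top of the derived check rather than something the check itself implies.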

In addition, RFC 4627 makes a passing reference to U+0000 as being allowed:

> Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence

I can of course use the LooseUnicode option to replace it, but that's not ideal since it is valid JSON as far as I can tell.

@johnezang
Owner

Well, I have to admit, you certainly did your homework, more so than most people. :)

Just one question... did you read the README.md? :)

@ghost ghost assigned johnezang Oct 21, 2011
@adamjernst
Author

Well, look at that, a whole paragraph for this exact issue. :-)

I do disagree, since I think JSONKit should just do the right thing and let the user deal with the security implications. Most users will be using NSString anyway, which does handle null characters correctly. However, yours is a perfectly valid decision.
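
To make the concern concrete, here is a small C sketch of what I assume the security worry to be: once the decoded bytes reach an API that treats them as a NUL-terminated C string, everything after the embedded NULL silently disappears, whereas a length-carrying representation (as NSString is) keeps all of it. The names `DECODED`, `DECODED_LEN`, and `count_embedded_nuls` are made up for illustration and are not part of JSONKit.

```c
#include <stddef.h>
#include <string.h>

/* The decoded form of "Testing\u0000Hello world": 19 bytes of content,
 * one of which is an embedded NUL. */
static const char DECODED[] = "Testing\0Hello world";

/* Content length, excluding the terminator the compiler appends. */
static const size_t DECODED_LEN = sizeof(DECODED) - 1;

/* Count NUL bytes inside a buffer whose length is tracked separately. */
size_t count_embedded_nuls(const char *bytes, size_t length) {
    size_t count = 0;
    for (size_t i = 0; i < length; i++) {
        if (bytes[i] == '\0') {
            count++;
        }
    }
    return count;
}
```

With these definitions, `strlen(DECODED)` stops at the embedded NUL and sees only `Testing`, while `DECODED_LEN` covers the whole string and `count_embedded_nuls(DECODED, DECODED_LEN)` finds the one NUL in the middle. Code that validates the full buffer but later acts on the C-string view can end up checking one value and using another.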

I would at least request a more specific error message for this particular case, e.g. "\u0000 is not allowed for security reasons, use JKParseOptionLooseUnicode". Whether this is practical is up to you. It would have saved me an hour of research, but this is an obscure case.

BTW, the trigger was that I'm dealing with JSON from ID3 (MP3) tags from a large database of media files. Lots of null characters in there for inexplicable reasons.

Thanks for the library!

@derekjensen

Bump. I would like to see this fixed too. I agree that the security issue is mitigated by using the NSString class, and valid Unicode and JSON should be respected.

@filmaj

filmaj commented Jan 24, 2012

+1

@bmharper

I'm busy writing my own UTF-8 library, and stumbled into the same issue. Right now I'm leaning towards not supporting U+0000 at all, for the same reasons as JSONKit. I'm curious to know if anyone has any real-world stories of a case where it was essential to support decoding U+0000? Is it possible that the ID3 tags mentioned above by @adamjernst were crafted with malicious intent, or that they were simply the result of buggy software that produced them?
