Why is the Unicode NULL code point, when properly escaped, invalid? That is, a string like "Testing\u0000Hello world".
isValidCodePoint returns sourceIllegal for the NULL character (ch == 0U). The headers say:
The code in isValidCodePoint() is derived from the ICU code in
utf.h for the macros U_IS_UNICODE_NONCHAR and U_IS_UNICODE_CHAR.
However, U_IS_UNICODE_CHAR(0) returns true.
In addition, RFC 4627 makes a passing reference to U+0000 as being allowed:
Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point.
I can of course use the LooseUnicode option to replace it, but that's not ideal since it is valid JSON as far as I can tell.
Well, look at that, a whole paragraph for this exact issue. :-)
I do disagree, since I think JSONKit should just do the right thing and let the user deal with the security implications. Most users will be using NSString anyway, which does handle null characters correctly. However, yours is a perfectly valid decision.
I would request a more specific error message for this particular case, at least; e.g. "\u0000 is not allowed for security reasons; use JKParseOptionLooseUnicode". Whether this is practical is up to you. It would have saved me an hour of research, but this is an obscure case.
BTW, the trigger was that I'm dealing with JSON from ID3 (MP3) tags from a large database of media files. Lots of null characters in there for inexplicable reasons.
Bump. I would like to see this fixed too. I agree that the security issue is mitigated by using the NSString class, and valid Unicode and JSON should be respected.
I'm busy writing my own UTF-8 library, and stumbled into the same issue. Right now I'm leaning towards not supporting U+0000 at all, for the same reasons as JSONKit. I'm curious to know if anyone has any real-world stories of a case where it was essential to support decoding U+0000? Is it possible that the ID3 tags mentioned above by @adamjernst were crafted with malicious intent, or that they were simply the result of buggy software that produced them?