Does not correctly parse surrogate pairs #42

Closed
johnezang opened this Issue · 2 comments

2 participants

@johnezang

The following is not parsed correctly:

{ "MATHEMATICAL ITALIC CAPITAL ALPHA": "\uD835\uDEE2" }

Expected result:

{ "MATHEMATICAL ITALIC CAPITAL ALPHA": "𝛢" }

(Note: GitHub seems to have problems dealing with Unicode characters above U+FFFF. This is why it looks funky; I did my best with what I could.)

Using the following code:

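NSString *json = [NSString stringWithUTF8String:"{ \"MATHEMATICAL ITALIC CAPITAL ALPHA\": \"\\uD835\\uDEE2\" }"];
id obj = [json JSONValue];
// Assumption: writer is an SBJsonWriter; its declaration was not shown in the original report.
SBJsonWriter *writer = [[SBJsonWriter alloc] init];
NSLog(@"stringWithObject: %@", [writer stringWithObject:obj]);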

... produces the following:

stringWithObject: {"MATHEMATICAL ITALIC CAPITAL ALPHA":"훢"}

Also, the code in parseUnicodeEscape and decodeHexQuad may (to a zeroth-order approximation) have corner cases that read past the end of the array, in particular when dealing with surrogate pairs. The code that calls parseUnicodeEscape seems to have an explicit length variable, while the Unicode parsing code does not and instead relies on \0 termination. It's not clear to me whether that assumption is guaranteed to hold; it looks very suspicious to me.
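
For illustration, a length-based version of the escape parsing could look roughly like the sketch below, written in plain C (which also compiles as Objective-C). The names decode_hex_quad and parse_unicode_escape are hypothetical and are not the library's actual functions; the sketch assumes the caller passes a pointer to the first hex digit after the "\u" introducer together with the number of bytes remaining in the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode exactly four hex digits from buf[0..len).
 * Returns 0..0xFFFF on success, or -1 if fewer than four bytes remain
 * or a non-hex byte is found. Never reads past buf + len. */
static int32_t decode_hex_quad(const char *buf, size_t len)
{
    if (len < 4) return -1;
    int32_t value = 0;
    for (size_t i = 0; i < 4; i++) {
        char c = buf[i];
        int digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;
        value = (value << 4) | digit;
    }
    return value;
}

/* Hypothetical helper: buf points at the first hex digit after "\u".
 * On success stores the code point in *out and returns the number of
 * bytes consumed (4 for a BMP escape, 10 for a surrogate pair).
 * Returns 0 on malformed or truncated input. */
static size_t parse_unicode_escape(const char *buf, size_t len, uint32_t *out)
{
    int32_t hi = decode_hex_quad(buf, len);
    if (hi < 0) return 0;
    if (hi >= 0xDC00 && hi <= 0xDFFF) return 0;  /* lone low surrogate */
    if (hi < 0xD800 || hi > 0xDBFF) {            /* ordinary BMP code point */
        *out = (uint32_t)hi;
        return 4;
    }
    /* High surrogate: a second "\uXXXX" (6 more bytes) must follow. */
    if (len < 10 || buf[4] != '\\' || buf[5] != 'u') return 0;
    int32_t lo = decode_hex_quad(buf + 6, len - 6);
    if (lo < 0xDC00 || lo > 0xDFFF) return 0;    /* also rejects -1 */
    *out = 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 10;
}
```

With the escape from this report, the pair D835/DEE2 combines to U+1D6E2, and a buffer that ends one byte early is rejected without any read past the stated length.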

@stig
Owner

There is a hack in -appendBytes: that appends a \0 to make sure decodeHexQuad works. Let me stress again that it's a hack. One of these days I want to make the code completely length-based.

@stig
Owner

Thanks. Having looked into this, the decoding of the code point seems to work, but my conversion from the code point to the string does not. I'll try to fix this.
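
As a rough sketch of the conversion step, assuming the target is a UTF-8 byte sequence, a decoded code point can be encoded as shown below in plain C. The function name encode_utf8 is hypothetical and this is not the library's actual fix; it only shows the standard UTF-8 bit layout, including the four-byte form needed for code points above U+FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the UTF-8 encoding of cp into out, which
 * must have room for 4 bytes. Returns the number of bytes written, or
 * 0 if cp is a surrogate or lies beyond U+10FFFF. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* Supplementary planes need all four bytes. */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+1D6E2, the code point from this report, the sketch yields the four bytes F0 9D 9B A2.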

This issue was closed.