New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when parsing non-UTF-8 strings #219

Closed
nst opened this Issue Sep 4, 2016 · 9 comments

Comments

Projects
None yet
2 participants
@nst

nst commented Sep 4, 2016

SBJson4Parser will crash when parsing invalid UTF-8 strings such as ["\xFF"].

*** Assertion failure in -[SBJson4Parser parserFound:isValue:], SBJson4Parser.m:150
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Invalid parameter not satisfying: obj'
*** First throw call stack:
(
    0   CoreFoundation                      0x00007fff95f4b4f2 __exceptionPreprocess + 178
    1   libobjc.A.dylib                     0x00007fff9783bf7e objc_exception_throw + 48
    2   CoreFoundation                      0x00007fff95f501ca +[NSException raise:format:arguments:] + 106
    3   Foundation                          0x00007fff9ce86856 -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 198
    4   test_SBJSON                         0x00000001000067e5 -[SBJson4Parser parserFound:isValue:] + 309
    5   test_SBJSON                         0x00000001000073f3 -[SBJson4Parser parserFoundString:] + 67
    6   test_SBJSON                         0x0000000100004289 -[SBJson4StreamParser parse:] + 2377
    7   test_SBJSON                         0x0000000100007989 -[SBJson4Parser parse:] + 73
    8   test_SBJSON                         0x0000000100005d0d main + 221
    9   libdyld.dylib                       0x00007fff929ea5ad start + 1
)
libc++abi.dylib: terminating with uncaught exception of type NSException
@stig

This comment has been minimized.

Show comment
Hide comment
@stig

stig Sep 4, 2016

Owner

Patch including failing test case (with or without fix!) very welcome! :-)

It's been years since I did any Objective-C in anger now, so I'm a bit rusty.

Nicolas Seriot notifications@github.com writes:

SBJson4Parser will crash when parsing invalid UTF-8 strings such as ["\xFF"].


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

Owner

stig commented Sep 4, 2016

Patch including failing test case (with or without fix!) very welcome! :-)

It's been years since I did any Objective-C in anger now, so I'm a bit rusty.

Nicolas Seriot notifications@github.com writes:

SBJson4Parser will crash when parsing invalid UTF-8 strings such as ["\xFF"].


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@nst nst changed the title from Crash on malformed inputs to Crash when parsing non-UTF-8 strings Sep 6, 2016

@stig

This comment has been minimized.

Show comment
Hide comment
@stig

stig Oct 26, 2016

Owner

Hi @nst, thanks for the report. How did you get this crash? Adding your example to my test suite detects the broken JSON and reports the byte index in the input where it fails, c.f. #220

Owner

stig commented Oct 26, 2016

Hi @nst, thanks for the report. How did you get this crash? Adding your example to my test suite detects the broken JSON and reports the byte index in the input where it fails, c.f. #220

@stig

This comment has been minimized.

Show comment
Hide comment
@stig

stig Oct 26, 2016

Owner

BTW, I admit I was prompted into looking into this more closely by your Parsing JSON is a Minefield post. Thanks for the motivation :-)

Owner

stig commented Oct 26, 2016

BTW, I admit I was prompted into looking into this more closely by your Parsing JSON is a Minefield post. Thanks for the motivation :-)

@nst

This comment has been minimized.

Show comment
Hide comment
@nst

nst Oct 26, 2016

In fact, the file has to include the 0xFF raw byte, and not -x-F-F characters as my report may suggest. I just reproduced the crash by recreating the offending file like this:

$ echo -en "[\"\xFF\"]" > x.json

Thank you for reading the article and maintaining SBJSON :-)

nst commented Oct 26, 2016

In fact, the file has to include the 0xFF raw byte, and not -x-F-F characters as my report may suggest. I just reproduced the crash by recreating the offending file like this:

$ echo -en "[\"\xFF\"]" > x.json

Thank you for reading the article and maintaining SBJSON :-)

@stig

This comment has been minimized.

Show comment
Hide comment
@stig

stig Oct 26, 2016

Owner

"Maintaining". Heh. I've done little more than merging legit-seeming PRs in the last few years. Last time I opened Xcode was probably Xcode 5 or 6... I'll let you in on a secret: I've never even used this library myself, outside its tests. (Although I worked at a company where they used it. I wrote Scala there.) I wish people just used NSJSONSerialisation tbh. Although as you have pointed out, that has issues too.

Owner

stig commented Oct 26, 2016

"Maintaining". Heh. I've done little more than merging legit-seeming PRs in the last few years. Last time I opened Xcode was probably Xcode 5 or 6... I'll let you in on a secret: I've never even used this library myself, outside its tests. (Although I worked at a company where they used it. I wrote Scala there.) I wish people just used NSJSONSerialisation tbh. Although as you have pointed out, that has issues too.

@stig stig added Bug and removed Unable to Reproduce labels Oct 26, 2016

@stig

This comment has been minimized.

Show comment
Hide comment
@stig

stig Oct 26, 2016

Owner

Alright, that looks like a bug then. Thanks for the clarification! I did indeed misunderstand your initial report.

Owner

stig commented Oct 26, 2016

Alright, that looks like a bug then. Thanks for the clarification! I did indeed misunderstand your initial report.

@stig

This comment has been minimized.

Show comment
Hide comment
@stig

stig Oct 30, 2016

Owner

Just a quick note that I've found the reason for this crash, and am planning to fix it.

Owner

stig commented Oct 30, 2016

Just a quick note that I've found the reason for this crash, and am planning to fix it.

@nst

This comment has been minimized.

Show comment
Hide comment
@nst

nst Oct 30, 2016

Great, so now SBJSON is actively maintained :)

nst commented Oct 30, 2016

Great, so now SBJSON is actively maintained :)

stig added a commit that referenced this issue Oct 30, 2016

Fix crash when encountering illegal UTF-8 byte - fixes #219
All UTF-8 bytes must contain at least 1 cleared bit, so 0xFF is just
flat out illegal. That follows from the description here:
https://en.wikipedia.org/wiki/UTF-8#Description

This fix is a special case, however; we don't currently validate leading
byte / continuation byte sequences either. That will be the subject of
another commit.

stig added a commit that referenced this issue Oct 30, 2016

Fix crash when encountering illegal UTF-8 byte - fixes #219
I initially (and perhaps foolishly) thought I could rely on NSString's
initialiser to decode the UTF-8 string, so I focused on just expanding
the special escape sequences that JSON allows and marking the end of the
JSON string. That does not suffice, apparently.

This change throws an error for any of the flat-out illegal UTF-8 bytes
noted in https://en.wikipedia.org/wiki/UTF-8#Codepage_layout -- it does
however not validate leading byte / continuation byte sequences or
overlong encodings.
@stig

This comment has been minimized.

Show comment
Hide comment
@stig

stig Nov 2, 2016

Owner

PR to address this bug is here: #220

I think I may have just slain this beast of a bug!

Owner

stig commented Nov 2, 2016

PR to address this bug is here: #220

I think I may have just slain this beast of a bug!

stig added a commit that referenced this issue Nov 3, 2016

Reject invalid UTF-8 byte sequences & unicode code points
Gosh! That was a merry chase. There's a bit of duplication here, but in the spirit of "first make it work..." I'm happy with it. It addresses:

- flat-out illegal byte values
- missing continuation bytes
- unexpected continuation bytes
- overlong encodings
- invalid code points

c.f. https://en.wikipedia.org/wiki/UTF-8

I wish to thank Nicolas Seriot for reporting #219 which led to this learning opportunity. After 9 years of calling itself a "strict" JSON parser SBJson finally does UTF-8 validation 😂

@stig stig closed this Nov 3, 2016

stig added a commit to stig/JSONTestSuite that referenced this issue Nov 3, 2016

Update to SBJson 4.0.4 which fixes the crashes on invalid UTF-8
Thank you so much for your bug report[1]. I learnt so much about UTF-8
while figuring out how to fix these bugs!

[1]: stig/json-framework#219
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment