
Hang/Crash with large strings #477

Closed
terryopie opened this issue Feb 21, 2019 · 12 comments

@terryopie

I am using v0.12.1. I am seeing that very large strings cause the tokener code to hang. I'm not exactly sure what the size cutoff is that triggers the issue, because I don't control the code sending the messages. I have seen packets 300-500 bytes/characters long. The string that I'm working on now, though, is 1298 bytes/characters.

Is there a max length that is supported by json_c?

Here is a snippet of my code and where things hang up:

    json_tokener* tok;
    tok = json_tokener_new();
    jsonObj = json_tokener_parse_ex(tok, i_strToParse, i_length);

Initially I had this, which worked for the smaller strings:

    jsonObj = json_tokener_parse( i_strToParse );

I even tried calling json_tokener_new_ex() and passing a specific depth thinking that possibly it was allocating too much space. That made no difference.

I have to be doing something wrong, but I'm not sure what. The above code works for all of the other strings that I am trying to parse. Any help/guidance is appreciated.

@ploxiln
Contributor

ploxiln commented Feb 21, 2019

1300 bytes is not nearly large enough for this to be a problem with json-c's handling of "large" strings. Things might get noticeably slow after a few MB ... there might be some kind of bug after maybe 2 GB ... but at 1300 bytes your problem is something else.

@ploxiln
Contributor

ploxiln commented Feb 21, 2019

If you can provide a minimal compilable example that demonstrates the problem, that would help a lot.

@terryopie
Author

Sorry it took so long to reply, I didn't get back to this as soon as I should have.

After I asked the question, I also noticed that the signature string had newline characters embedded within it. The string came from a Java Base64-encoded string. We have since removed those characters to get something similar to what is in the snippet below. The failure is the same. The signature has been rewritten for security.

Code is below:

      json_object* jsonObj = NULL;
      const char * str = "{\"signature\":\"abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789xxxxxxxxx\",\"version\":\"test version\",\"messageType\":\"VERSIONCHECK_AND_SIGNATUREVALIDATION\"}";
      jsonObj = json_tokener_parse( str );
      printf( "%s\n", json_object_to_json_string( jsonObj ) );

During the parse, it consistently fails/hangs on the ending quotation mark of the very long signature string.

Attached are screenshots from my debugger.

@ploxiln
Contributor

ploxiln commented Feb 22, 2019

hmm ... it's a realloc() from 500 to 1000 bytes ... I wonder if there's something weird about realloc() on windows - json-c does have some optional wrapper for it:

/* Define to rpl_realloc if the replacement function should be used. */

void* rpl_realloc(void* p, size_t n)

@terryopie
Author

Secondary question, and what ended up being the root of the original crash/hang I was seeing, was what I mentioned in my last comment: the newline character. I see that there is special code within the tokener (shown below) that exits once it hits an escape character. Why is that? There are escape sequences that are valid in JSON strings, correct?

            case json_tokener_state_string:
                {
                    /* Advance until we change state */
                    const char *case_start = str;
                    while(1) {
                        if(c == tok->quote_char) {
                            printbuf_memappend_fast(tok->pb, case_start, str-case_start);
                            current = json_object_new_string_len(tok->pb->buf, tok->pb->bpos);
                            saved_state = json_tokener_state_finish;
                            state = json_tokener_state_eatws;
                            break;
                        } else if(c == '\\') {  // <<<-----HERE
                            printbuf_memappend_fast(tok->pb, case_start, str-case_start);
                            saved_state = json_tokener_state_string;
                            state = json_tokener_state_string_escape;
                            break;
                        }
                        if (!ADVANCE_CHAR(str, tok) || !PEEK_CHAR(c, tok)) {
                            printbuf_memappend_fast(tok->pb, case_start, str-case_start);
                            goto out;
                        }
                    }
                }
                break;

@terryopie
Author

terryopie commented Feb 22, 2019

hmm ... it's a realloc() from 500 to 1000 bytes ... I wonder if there's something weird about realloc() on windows - json-c does have some optional wrapper for it:

json-c/config.h.win32

Line 201 in 3df1f98
/* Define to rpl_realloc if the replacement function should be used. */

json-c/json_util.c

Line 211 in 3df1f98
void* rpl_realloc(void* p, size_t n)

Sorry, I guess I forgot to mention that this is running on a NXP K70 running MQX RTOS. I'll look at whether the optional wrapper has any better luck.

Maybe a secondary question to this... It would be grossly inefficient, but is there a way to set a larger default size for the buffer, e.g. from 500 to 1000?

@ploxiln
Contributor

ploxiln commented Feb 22, 2019

I see that there is special code within the tokener that exits out (shown below) where it exits once it hits an escape character. Why is that?

It switches from state json_tokener_state_string to state json_tokener_state_string_escape and later it switches back.

this is running on a NXP K70 running MQX RTOS

That realloc wrapper should not make any difference for this case; it's for handling NULL and zero in case libc realloc() is not fully POSIX-compliant. It's probably something in that RTOS's libc realloc() not liking these "big" re-allocations.

@terryopie
Author

That realloc wrapper should not make any difference for this case; it's for handling NULL and zero in case libc realloc() is not fully POSIX-compliant. It's probably something in that RTOS's libc realloc() not liking these "big" re-allocations.

If that is the case, is there a way for me to just set the allocations larger by default so there isn't a need to have it reallocate?

@ploxiln
Contributor

ploxiln commented Feb 22, 2019

Seems like the K70 has about 128 KiB SRAM, so it makes sense that heap management is very constrained. If you set allocations larger by default, they'll apply to a bunch of smaller things, and you may run out of memory, or into a memory fragmentation issue (lack of big-enough free chunks in the heap), sooner. You may need to switch to a stream-oriented json library that does no allocations, just a sequence of callbacks with offsets.

@terryopie
Author

Seems like the K70 has about 128 KiB SRAM, so it makes sense that heap management is very constrained. If you set allocations larger by default, they'll apply to a bunch of smaller things, and you may run out of memory, or into a memory fragmentation issue (lack of big-enough free chunks in the heap), sooner. You may need to switch to a stream-oriented json library that does no allocations, just a sequence of callbacks with offsets.

We have an external SRAM mapped, with a size of 1 MB I think, possibly 2 MB. I'll have to do some reading to see why MQX is having an issue with realloc like that.

I did bump up the size of the initial printbuf allocation from 500 to 1000. That solved the hang issue and I was able to proceed.

Are there downsides to doing that, besides "wasting" space that won't be used for the majority of the transfers?

@hawicz
Member

hawicz commented Feb 25, 2019

The big downside here is that you're just masking the problem, without truly fixing anything.
You say that it "hangs" -- do you mean that the realloc call never returns? That smells like either a bug in the malloc library (unlikely) or something prior to the realloc call having stomped on the malloc library's data structures.
Are those 4 lines of code you posted earlier actually a test case that reproduces the problem? If yes, you should be seeing a realloc from 32 bytes (the default printbuf allocation) to 630 (length of your sample signature plus 8 bytes), not 500 to 1000.

@hawicz
Member

hawicz commented Mar 11, 2019

Closing due to lack of response.

@hawicz hawicz closed this as completed Mar 11, 2019