Non-linear parsing performance #64

tsachiherman · 2016-12-07T16:58:58Z

When using the following code:
WJReader r = WJROpenMemDocument(buffer, NULL, 0);
handle = WJEOpenDocument(r,NULL,NULL,NULL);

I've noticed that performance degrades as a function of n, the number of items.

Unfortunately it's degrading in a non-linear way:
json parsing of 100000 items took : 0.204 per-element time : 0.000002
json parsing of 200000 items took : 0.697 per-element time : 0.000003
json parsing of 300000 items took : 1.465 per-element time : 0.000005
json parsing of 400000 items took : 2.639 per-element time : 0.000007
json parsing of 500000 items took : 4.190 per-element time : 0.000008
json parsing of 600000 items took : 6.559 per-element time : 0.000011
json parsing of 700000 items took : 9.934 per-element time : 0.000014
json parsing of 800000 items took : 14.148 per-element time : 0.000018
json parsing of 900000 items took : 18.535 per-element time : 0.000021

Any suggestions would be appreciated.

minego · 2016-12-07T17:12:58Z

Can you give some examples of what the structure of the documents you're testing with looks like? You don't need to attach a full document, but something with 5 or 10 items so we get the idea would be helpful.

There are a few possible culprits. Normally when adding an object to a WJElement the "last" pointer is used so we don't have to walk through the children, but perhaps there is a case left over that doesn't do that.

tsachiherman · 2016-12-07T18:07:35Z

Of course. For the performance tests above, I used the following function to generate the JSON string that was parsed:

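#include <stdlib.h>
#include <string.h>

char *
generate_json_string(int count) {
    // Each item plus its separator needs far fewer than 50 bytes; the extra
    // 3 bytes cover the braces and the terminator even when count is 0.
    char *out_buffer = malloc((size_t)count * 50 + 3);
    if (out_buffer == NULL) {
        return NULL;
    }
    strcpy(out_buffer, "{");
    char item[] = "{\"a\" : 1, \"b\": \"abc\"}";
    size_t item_length = strlen(item);
    // walk always points at the current terminator, so strcat never rescans.
    char *walk = out_buffer + 1;
    for (int j = 0; j < count; j++) {
        if (j > 0) {
            strcat(walk, ",");
            walk++;
        }
        strcat(walk, item);
        walk += item_length;
    }
    strcat(walk, "}");
    return out_buffer;
}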

tsachiherman · 2016-12-07T20:11:00Z

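I believe I've found the culprit: it's the WJRMemCallback implementation, which makes parsing O(n^2) instead of O(n).
In particular, it's the

len = strlen(json);

call, which could and should have been precomputed and cached, but wasn't. Because the callback runs every time the reader needs more data, the whole string is rescanned on each call.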
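To make the O(n^2) claim concrete, here is a small cost model rather than a measurement. It assumes the reader asks for the document in fixed-size chunks and that each request runs strlen over the whole string; the function names and the chunk size are hypothetical and do not come from WJElement.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes examined when every chunk request re-runs strlen() on the document.
 * doc_len and chunk are illustrative parameters, not library values. */
static unsigned long long scanned_with_strlen(size_t doc_len, size_t chunk)
{
    unsigned long long total = 0;
    size_t seen;

    assert(chunk > 0);
    for (seen = 0; seen < doc_len; seen += chunk) {
        total += doc_len;   /* strlen() walks the entire string each time */
    }
    return total;
}

/* Bytes examined when the length is known up front: one pass over the data. */
static unsigned long long scanned_with_cached_length(size_t doc_len)
{
    return doc_len;
}
```

Under this model, doubling the document size quadruples the work done by the strlen variant, while the cached variant only doubles, which is the same shape as the timings reported above.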

To work around the performance issue without changing the library, you can define your own callback and pass a custom data structure as the user data pointer. That structure would hold two fields: the JSON string and its length.
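A minimal sketch of that workaround might look like the following. The struct and function names are hypothetical, and the callback shape (destination buffer, requested length, bytes already consumed, user data) is an assumption about what the reader expects; check it against wjreader.h before relying on it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-data structure: the document plus its length,
 * computed once by the caller. */
typedef struct {
    const char *json;
    size_t      length;
} SizedDocument;

/* Read callback with the assumed shape (data, length, seen, userdata).
 * Copies up to `length` bytes starting at offset `seen` and returns the
 * number of bytes copied, or 0 once the document is exhausted. */
static size_t sized_read_callback(char *data, size_t length, size_t seen,
                                  void *userdata)
{
    const SizedDocument *doc = userdata;
    size_t remaining;

    if (!doc || !doc->json || seen >= doc->length) {
        return 0;                       /* nothing left to hand out */
    }
    remaining = doc->length - seen;     /* O(1): no strlen() here */
    if (remaining > length) {
        remaining = length;
    }
    memcpy(data, doc->json + seen, remaining);
    return remaining;
}
```

The caller would fill in a SizedDocument once, with a single strlen, and hand its address to the reader as the user data pointer in place of the raw string.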

minego · 2016-12-08T15:38:56Z

Wow, I wouldn't have expected that to be the problem. Unfortunately it would be a bit tricky to change that callback without affecting existing code since it doesn't really have a place to store the length.

But, I think I can improve it by using a memchr instead of a strlen there. That callback doesn't actually need the whole length. If the amount left is less than the requested amount then it needs to know the amount left. That's all.
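The memchr idea can be sketched as below. This is an illustration of the technique, not the change that was actually pushed; the function name is hypothetical and the callback shape is the same assumption as before. It also assumes `seen` never runs past the terminator, which holds as long as `seen` is the running total of what the callback has returned.

```c
#include <stddef.h>
#include <string.h>

/* Looks for the terminator only within the next `length` bytes instead of
 * measuring the whole string, so each call costs O(length). */
static size_t memchr_read_callback(char *data, size_t length, size_t seen,
                                   void *userdata)
{
    const char *json = userdata;
    const char *start;
    const char *end;
    size_t available;

    if (!json) {
        return 0;
    }
    start = json + seen;
    /* If the terminator is inside the window, only the bytes before it
     * are left; otherwise a full chunk is available. */
    end = memchr(start, '\0', length);
    available = end ? (size_t)(end - start) : length;
    memcpy(data, start, available);
    return available;
}
```

The existing user data stays a plain string, so callers do not need to change, which is the compatibility concern raised above.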

I'll play with it and see if I can make a simple change that will help in this case.

minego · 2016-12-08T15:57:09Z

Okay, I threw in a quick test so that I can try this with "make test". It runs "BigDoc" with 100000 and "RealBigDoc" with 10 times that.

With the strlen:
16/17 Test #16: WJElement:BigDoc ................. Passed 0.77 sec
17/17 Test #17: WJElement:RealBigDoc ............. Passed 15.23 sec

With the memchr:
16/17 Test #16: WJElement:BigDoc ................. Passed 0.73 sec
17/17 Test #17: WJElement:RealBigDoc ............. Passed 3.72 sec

Seems like a pretty good improvement to me.

minego · 2016-12-08T16:04:04Z

I've pushed this change. I'm pretty happy with the results.

Please run your tests and let me know how they behave. I ran again (on a faster machine) and I am consistently getting 0.09 for BigDoc and 0.86 for RealBigDoc.

tsachiherman · 2016-12-08T17:12:05Z

Thanks for the quick response. I wasn't able to see the change in the GitHub web interface.
Is it delayed, or does it need to be merged first?

minego · 2016-12-08T17:15:25Z

Sorry, I am an idiot and pushed to the wrong branch. It has been pushed to the master branch now.

minego closed this as completed Dec 8, 2016

tsachiherman mentioned this issue Mar 13, 2017

Large json item seems to confuse the parser #65

Closed
