Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-linear parsing performance #64

Closed
tsachiherman opened this issue Dec 7, 2016 · 8 comments
Closed

Non-linear parsing performance #64

tsachiherman opened this issue Dec 7, 2016 · 8 comments

Comments

@tsachiherman
Copy link

When using the following code:
WJReader r = WJROpenMemDocument(buffer, NULL, 0);
handle = WJEOpenDocument(r,NULL,NULL,NULL);

I've noted that the performance degrade as a function of n.

Unfortunately it's degrading in a non-linear way:
json parsing of 100000 items took : 0.204 per-element time : 0.000002
json parsing of 200000 items took : 0.697 per-element time : 0.000003
json parsing of 300000 items took : 1.465 per-element time : 0.000005
json parsing of 400000 items took : 2.639 per-element time : 0.000007
json parsing of 500000 items took : 4.190 per-element time : 0.000008
json parsing of 600000 items took : 6.559 per-element time : 0.000011
json parsing of 700000 items took : 9.934 per-element time : 0.000014
json parsing of 800000 items took : 14.148 per-element time : 0.000018
json parsing of 900000 items took : 18.535 per-element time : 0.000021

Any suggestions would be appreciated.

@minego
Copy link
Member

minego commented Dec 7, 2016

Can you give some examples of what the structure of the documents you're testing with looks like? You don't need to attach a full document, but something with 5 or 10 items so we get the idea would be helpful.

There are a few possible culprits. Normally when adding an object to a WJElement the "last" pointer is used so we don't have to walk through the children, but perhaps there is a case left over that doesn't do that.

@tsachiherman
Copy link
Author

Of course. When conducting the performance testing above, I was using the following function to generate the consumed json string:

char *
generate_json_string(int count) {
    // allocate buffer.
    char *out_buffer = malloc(count * 50);
    *out_buffer = '\0';
    strcpy(out_buffer, "{");
    char item[] = "{\"a\" : 1, \"b\": \"abc\"}";
    int item_length = strlen(item);
    char *walk = out_buffer + 1;
    for (int j = 0; j < count; j++) {
        if (j > 0) {
            strcat(walk, ",");
            walk++;
        }
        strcat(walk, item);
        walk+= item_length;
    }
    strcat(walk, "}");
    return out_buffer;
}

@tsachiherman
Copy link
Author

I believe that I've found the culprit; It's the WJRMemCallback implementation that makes it into O(n^2) instead of just O(n).
In particular, it's the

len = strlen(json);

that could, should - but wasn't per-calculated, and cached.

To resolve the performance issue externally, one can define his own callback and pass a custom data structure as the user data pointer. In there, the two fields would be the json string as well as the length of the json string.

@minego
Copy link
Member

minego commented Dec 8, 2016

Wow, I wouldn't have expected that to be the problem. Unfortunately it would be a bit tricky to change that callback without affecting existing code since it doesn't really have a place to store the length.

But, I think I can improve it by using a memchr instead of a strlen there. That callback doesn't actually need the whole length. If the amount left is less than the requested amount then it needs to know the amount left. That's all.

I'll play with it and see if I can make a simple change that will help in this case.

@minego
Copy link
Member

minego commented Dec 8, 2016

Okay, I threw in a quick test so that I can try this with "make test". It runs "BigDoc" with 100000 and "RealBigDoc" with 10 times that.

With the strlen:
16/17 Test #16: WJElement:BigDoc ................. Passed 0.77 sec
17/17 Test #17: WJElement:RealBigDoc ............. Passed 15.23 sec

With the memchr:
16/17 Test #16: WJElement:BigDoc ................. Passed 0.73 sec
17/17 Test #17: WJElement:RealBigDoc ............. Passed 3.72 sec

Seems like a pretty good improvement to me.

@minego
Copy link
Member

minego commented Dec 8, 2016

I've pushed this change. I'm pretty happy with the results.

Please run your tests and let me know how they behave. I ran again (on a faster machine) and I am consistently getting 0.09 for BigDoc and 0.86 for RealBigDoc.

@minego minego closed this as completed Dec 8, 2016
@tsachiherman
Copy link
Author

Thanks for the quick response; I wasn't able to see the change in the github web interface.
Is it delayed, or needed to be merged first ?

@minego
Copy link
Member

minego commented Dec 8, 2016 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants