Non-linear parsing performance #64
Can you give some examples of what the structure of the documents you're testing with looks like? You don't need to attach a full document, but something with 5 or 10 items so we get the idea would be helpful. There are a few possible culprits. Normally when adding an object to a WJElement the "last" pointer is used so we don't have to walk through the children, but perhaps there is a case left over that doesn't do that.
Of course. When conducting the performance testing above, I used the following function to generate the JSON string being consumed:
I believe I've found the culprit: it's the WJRMemCallback implementation that makes parsing O(n^2) instead of O(n). It calls strlen() over the whole document on every read, recomputing a length that could, and should, have been pre-calculated and cached. To resolve the performance issue externally, one can define a custom callback and pass a custom data structure as the user-data pointer. Its two fields would be the JSON string and the length of the JSON string.
Wow, I wouldn't have expected that to be the problem. Unfortunately it would be a bit tricky to change that callback without affecting existing code since it doesn't really have a place to store the length. But, I think I can improve it by using a memchr instead of a strlen there. That callback doesn't actually need the whole length. If the amount left is less than the requested amount then it needs to know the amount left. That's all. I'll play with it and see if I can make a simple change that will help in this case.
Okay, I threw in a quick test so that I can try this with "make test". It runs "BigDoc" with 100000 items and "RealBigDoc" with 10 times that. With the strlen: With the memchr: Seems like a pretty good improvement to me.
I've pushed this change. I'm pretty happy with the results. Please run your tests and let me know how they behave. I ran again (on a faster machine) and I am consistently getting 0.09 for BigDoc and 0.86 for RealBigDoc.
Thanks for the quick response; I wasn't able to see the change in the GitHub web interface.
Sorry, I am an idiot and pushed to the wrong branch. It has been pushed to the master branch now.
…On Thu, Dec 08, 2016 at 09:12:06AM -0800, Tsachi Herman wrote:
Thanks for the quick response; I wasn't able to see the change in the GitHub web interface.
Is it delayed, or does it need to be merged first?
When using the following code:
```c
WJReader r = WJROpenMemDocument(buffer, NULL, 0);
handle = WJEOpenDocument(r, NULL, NULL, NULL);
```
I've noted that the performance degrades as a function of n. Unfortunately, it degrades in a non-linear way:
```
json parsing of 100000 items took :  0.204   per-element time : 0.000002
json parsing of 200000 items took :  0.697   per-element time : 0.000003
json parsing of 300000 items took :  1.465   per-element time : 0.000005
json parsing of 400000 items took :  2.639   per-element time : 0.000007
json parsing of 500000 items took :  4.190   per-element time : 0.000008
json parsing of 600000 items took :  6.559   per-element time : 0.000011
json parsing of 700000 items took :  9.934   per-element time : 0.000014
json parsing of 800000 items took : 14.148   per-element time : 0.000018
json parsing of 900000 items took : 18.535   per-element time : 0.000021
```
Any suggestions would be appreciated.