Added fast.json D library beating RapidJSON #46
Conversation
Wow, how is this so fast?
When I started, I just scanned for object/array starts and ends with SSE and measured the time. It was somewhere between 0.1s and 0.2s. Then, when adding actual number parsing, string parsing and UTF-8 validation, I kept a close eye on my time budget. Some of the performance is gained by doing heuristics first. While parsing numbers you can calculate exactly whether an integer will overflow, or you can go faster by just comparing against a fixed value and still cover 99.99% of cases, handling the rest outside of the hot loop. Or you can add heuristics to the white-space checking function like RapidJSON did. SSE is fast when you have more than two bytes to skip, but white-space in JSON is often at most one character, so you'd better check that before doing SIMD processing. Things like force-inlining the white-space skipping function also made a 10% difference in overall speed at this level.
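A minimal D sketch of the two heuristics described above, assuming nothing about fast.json's actual internals: a scalar fast path for the common zero-or-one-character whitespace case before any SIMD work, and a cheap fixed-bound check that covers almost all integers while leaving the rare near-overflow case to a slow path. The function names and the `safeBound` constant are illustrative, not the library's real code.

```d
// Sketch only: white-space in JSON is usually zero or one character,
// so check a byte or two before doing any SIMD processing.
const(char)* skipWhitespace(const(char)* p)
{
    // Fast path: no whitespace at all, or a single space/tab/newline.
    if (*p != ' ' && *p != '\t' && *p != '\n' && *p != '\r') return p;
    p++;
    if (*p != ' ' && *p != '\t' && *p != '\n' && *p != '\r') return p;
    // Longer runs (e.g. pretty-printed indentation) would be worth an
    // SSE loop; a plain scalar loop stands in for it here.
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') p++;
    return p;
}

// Sketch of the overflow heuristic: compare against a fixed bound
// instead of an exact per-digit overflow test; values that get close
// to the limit are handed to a slow path outside the hot loop.
ulong parseUInt(ref const(char)* p, out bool needSlowPath)
{
    ulong value = 0;
    while (*p >= '0' && *p <= '9')
    {
        // Below this bound, appending one more digit cannot overflow.
        enum ulong safeBound = ulong.max / 10 - 1;
        if (value > safeBound)
        {
            needSlowPath = true;   // rare case, handled elsewhere
            break;
        }
        value = value * 10 + (*p - '0');
        p++;
    }
    return value;
}

void main()
{
    const(char)* s = "  1234".ptr;   // D string literals are NUL-terminated
    bool slow;
    s = skipWhitespace(s);
    assert(parseUInt(s, slow) == 1234 && !slow);
}
```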
RapidJSON supports 1 and 4 as far as I can tell. 4 is way too cumbersome to use routinely, and 1 adds unnecessary overhead in many cases. I decided to go with 2 and add a layer of convenience on top. For example, I used D's dispatch operator (opDispatch) to make a JSON key lookup look like a property (a short sketch of the idea follows below). On the downside, I did not validate the unused side structures. I think it is not necessary to validate data you are not using, so basically I only scan them far enough to find where they end. Granted, it is a bit of an optimization for a benchmark, but it is actually handy in real life as well. After all, you could still raise the validation level to maximum if you really cared, or call one of the validation functions. Thanks for adding fast.json. Being #1 feels good ;)
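For readers unfamiliar with the feature: D's `opDispatch` turns an unknown member access into a compile-time forwarded call, which is how a key lookup can read like a property. The sketch below only illustrates the language mechanism; the `Json` type and its string-map storage are made up for the example and are not fast.json's actual API.

```d
// Illustrative only: a toy Json type, not fast.json's implementation.
struct Json
{
    string[string] fields;

    // An access like json.name is lowered to json.opDispatch!"name"(),
    // so unknown "properties" become key lookups at compile time.
    string opDispatch(string key)()
    {
        return fields[key];
    }
}

void main()
{
    auto json = Json(["name": "fast.json", "language": "D"]);
    assert(json.name == "fast.json");   // same as json.fields["name"]
    assert(json.language == "D");
}
```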
Great, you should write an article about this and post it somewhere.
I have fixed the file-reading part and provided a SAX version for RapidJSON in #53. Since I do not have the GNU D compiler on this machine yet, I cannot do a comparison directly. This comparison may not be entirely fair in some ways; for example, the RapidJSON test does not turn on UTF-8 validation, and fast.json does not validate the parts that are not used. Should the tests be adjusted?
Good to see you @miloyip. I hoped it would take a while before you noticed that with SAX parsing you could do a lot better in this benchmark. Something is strange with the results, though. I did try your push parser in a quick hack, just to see what to expect when you come around to improving your benchmark entry, and it was more than twice as fast as the DOM parser ... on my i5. As for the validation, it seems that everyone has a slightly different idea of what needs to be validated, and as the authors of the parsers we know what kind of validation costs us precious milliseconds. My stance was that external input needs to be validated down to the character-encoding level. But since my parser is effectively just a one-way deserializer, errors in unused parts don't matter, as they can't propagate. We could certainly arrange for our two parsers to do the same level of validation, but what about the other entries? Some "read file as string" functions perform UTF-8 validation already, others don't. I'm actually in favor of keeping things as they are, since not validating unused parts gives fast.json such a huge advantage, while still catching any errors that could invalidate the program output.
Since RapidJSON claims to be the fastest, I thought I'd accept the challenge. For better comparability I chose the GCC-based backend.