
Performance in miloyip/nativejson-benchmark #202

Closed
twelsby opened this issue Jan 30, 2016 · 10 comments
@twelsby
Contributor

twelsby commented Jan 30, 2016

Here are the results of testing 1.1.0, 2.0.0, and my proposed changes to 2.0.0 under miloyip/nativejson-benchmark: Results.zip.

Here is my (possibly naive) interpretation:

Parse
Our lexer looks to be pretty good and parsing of strings is about as good as it could be. Some further improvements could be made in number parsing by incorporating custom floating point parsing into the proposed get_integer() in lieu of strtof()/strtod()/strtold() but this would greatly increase code complexity and the improvements would likely be marginal at best.
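To make the trade-off concrete, here is a minimal sketch of the approach described above: a fast path for plain integers, falling back to strtod() for anything with a fraction or exponent. The function name and signature are illustrative only, not the library's actual get_integer() proposal, and the overflow check is omitted for brevity.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Sketch: parse a JSON number token. Pure integers take the fast path;
// everything else falls back to the (locale-dependent) strtod().
bool parse_number(const std::string& tok, int64_t& i, double& d, bool& is_int)
{
    const char* p = tok.c_str();
    const bool neg = (*p == '-');
    if (neg) ++p;
    int64_t value = 0;
    const char* digits = p;
    while (*p >= '0' && *p <= '9') {
        value = value * 10 + (*p - '0');   // overflow check omitted for brevity
        ++p;
    }
    if (p != digits && *p == '\0') {       // pure integer: skip strtod entirely
        i = neg ? -value : value;
        is_int = true;
        return true;
    }
    char* end = nullptr;
    d = std::strtod(tok.c_str(), &end);    // fallback for fractions/exponents
    is_int = false;
    return end == tok.c_str() + tok.size();
}
```

A custom floating-point path would replace the strtod() call above, which is where the complexity (and the locale independence mentioned below) would come in.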

Looking at the other libraries' results (especially the fast-parsing ones), it looks like allocations are what is killing us. The other likely culprit is our use of map. A map is not ideal for parsing a large unordered file because the tree will be continually re-balanced. The benefit of a map is lookup, but that isn't really exercised by this benchmark, as the timed access is all sequential.

Note that I'm not saying we should be moving away from map, because our library is really designed around using the json object to store data, not merely to parse/dump json files. We probably just have to accept that our parse is always going to be slower.

Stringify
The proposed change to snprintf() for floating-point stringifying highlights just how bad stringstreams are for this. There may be further improvements from reducing our reliance on stringstreams elsewhere, but this would come at the expense of C++ style. I did try using snprintf() for integers, and even a very efficient custom function I've used before on microcontrollers, but didn't achieve any significant improvement, so perhaps the issue only manifests for floating-point numbers.
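For readers following along, here is a minimal sketch of the two approaches being compared (not the library's actual dump code): snprintf() into a stack buffer versus an ostringstream, which pays for stream construction and locale machinery on every call. Note that %g is itself locale-dependent, which ties into the locale discussion later in this thread.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// snprintf approach: 32 bytes is enough for %.17g output
// including sign, decimal point, and exponent.
std::string dump_double_snprintf(double x)
{
    char buf[32];
    int len = std::snprintf(buf, sizeof(buf), "%.17g", x);
    return std::string(buf, len);
}

// stringstream approach: pays for stream construction and
// locale machinery on every call.
std::string dump_double_stream(double x)
{
    std::ostringstream oss;
    oss.precision(17);
    oss << x;
    return oss.str();
}
```

Both produce the same text for simple values; the difference is purely in per-call overhead.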

The other thing here is that use of map is probably not as fast for sequential dumping as other container types. I think this is just a price we have to pay.

Time
This test simply iterates through the document while collecting statistics about the number of strings, numbers, objects, arrays, etc. Any difference between library versions is an anomaly, as the versions differ only in the bitfield overhead of the proposed changes. Because this test is sequential, we are at a disadvantage using maps.
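To illustrate what this pass does, here is a toy model of a recursive statistics walk over a value tree. The Value and Stats types are entirely illustrative, not the benchmark's actual code; the point is that the traversal touches every node sequentially, which favors contiguous storage over node-based maps.

```cpp
#include <cstddef>
#include <vector>

// Toy value tree standing in for a parsed JSON document.
struct Value {
    enum Type { Null, Number, String, Array } type = Null;
    std::vector<Value> children;   // used only when type == Array
};

struct Stats { std::size_t numbers = 0, strings = 0, arrays = 0, nulls = 0; };

// Recursively count each value type, visiting every node once.
void collect(const Value& v, Stats& s)
{
    switch (v.type) {
        case Value::Number: ++s.numbers; break;
        case Value::String: ++s.strings; break;
        case Value::Null:   ++s.nulls;   break;
        case Value::Array:
            ++s.arrays;
            for (const Value& c : v.children) collect(c, s);
            break;
    }
}
```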

The bottom line seems to be that a choice must be made - do we optimize for rapid parsing/dumping/sequential access at the expense of lookup, or do we optimize for lookup and accept that parsing/dumping/sequential access will always be a little slower (but not that much).

@nlohmann, if you are able to run similar tests under Linux/OSX that would help to make sure this isn't just a VS2015 issue.

@nlohmann
Owner

Thanks for the results! I'll check them with OSX.

Here are some thoughts:

  • I also think that the lexer is near optimal. This is thanks to re2c, of course.
  • The parser is quite naive as it is a manual recursive descent parser. I'd really like to have an LALR parser, but I have not yet found a tool that generates C++ code that can be put in a single header file. I may have another look at the json.org reference parser, though. In the end, this part may not be a performance issue, but at least I would like to give it a try... Nevertheless, I think the parser is now stable enough to check whether we can get rid of some of the allocations.
  • Custom floating-point parsing - as difficult as it may be - could be worth the effort, because the resulting code would be locale-independent. So far, the only other way would be to use the functions of xlocale.h.
  • So far, I think that a lot of people actually use the library to create/manipulate JSON objects rather than "just" writing an import. To this end, we spent so much effort in having a nice API, implicit conversions, STL-like access, etc. That said, I think it's OK to optimize for lookup.

@gregmarr
Contributor

It might be interesting to compare performance of using the default std::map with using std::unordered_map, and the boost flat equivalents.

@twelsby
Contributor Author

twelsby commented Jan 30, 2016

I think unordered map would be a good alternative for users doing mainly sequential access and just parse/dump type access. I definitely think there is value in doing that.

Boost, however... definitely not. I think we are better off avoiding external dependencies - especially a behemoth like Boost. Or do you just mean implementing something like Boost's flat containers without actually using Boost?

@nlohmann, with regard to floating point - I also think it is worthwhile, both for parsing and dumping - but it is going to bloat the code considerably. The two things that I consider most important about the library are that it remains single header and that it retains its ease of use. Performance to me seems a distant third - anyone who wants raw performance over ease of use is probably going to go for rapidJSON or one of the c libraries anyway.

@gregmarr
Contributor

I just mean testing it in the benchmark to see what the performance differences are, since the class has the ability to specify the class used for objects. It could give insight into the performance differences, and those could be documented for users that need the performance.

@nlohmann
Owner

@gregmarr Yes, checking std::unordered_map is a good idea. But I think the change has to be made inside the code. Unfortunately, just changing a template argument does not work (see #164).

@gregmarr
Contributor

Ah, I thought that issue had been resolved.

@nlohmann
Owner

Not really. The first fix would break the allocator stuff. (And there is another issue that this part is also flawed...)

@nlohmann
Owner

I adjusted the code to make a test with unordered_map. Interestingly, the performance is worse:

$ ./json_benchmarks_unordered --benchtime 10
parse jeopardy.json                           5  3920308867 ns/op
parse canada.json                           100   173185741 ns/op
parse citm_catalog.json                     200    91518287 ns/op
parse twitter.json                          500    44022349 ns/op
dump jeopardy.json                           10  2060110065 ns/op
dump jeopardy.json with indent               10  2287772953 ns/op
./json_benchmarks_unordered 213.228s

$ ./json_benchmarks --benchtime 10
parse jeopardy.json                           5  3594846905 ns/op
parse canada.json                           100   158635930 ns/op
parse citm_catalog.json                     200    83257414 ns/op
parse twitter.json                          500    38796771 ns/op
dump jeopardy.json                           10  1291502946 ns/op
dump jeopardy.json with indent               10  2183497379 ns/op
./json_benchmarks 187.691s

@gregmarr
Contributor

Could be lots of hash collisions. I would expect better performance on lookups, but not much change elsewhere. The flat containers from boost might only help in the memory usage and output, insertion may be worse.

@miloyip

miloyip commented Jan 31, 2016

I have done some profiling and optimizations on itoa and dtoa while working on RapidJSON:

It seems that some C++ standard libraries have been improving these routines in recent years, so this may simply get better in the future. We faced a locale issue in the past, where numbers were written in an incorrect JSON format. Using setlocale() to backup/set/restore the locale was killing performance, so I finally wrote and optimized a custom routine.
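For context, here is a sketch of the backup/set/restore pattern being described, and why it is slow: every call pays for two setlocale() round trips just to make %g emit a '.' regardless of the user's locale. The function name is illustrative, not RapidJSON's actual code; a custom locale-independent dtoa avoids all of this.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Naive locale-safe double formatting: backup the current numeric
// locale, switch to "C" so %g uses '.' as the decimal point,
// format, then restore. The two setlocale() calls per number are
// the overhead being discussed.
std::string dump_double_locale_safe(double x)
{
    // Copy the returned string: setlocale may return a pointer to
    // internal storage invalidated by the next setlocale call.
    std::string saved = std::setlocale(LC_NUMERIC, nullptr);  // backup
    std::setlocale(LC_NUMERIC, "C");                          // set
    char buf[32];
    int len = std::snprintf(buf, sizeof(buf), "%.17g", x);
    std::setlocale(LC_NUMERIC, saved.c_str());                // restore
    return std::string(buf, len);
}
```

Note also that setlocale() affects the whole process, so this pattern is not thread-safe either, which is another reason to prefer a custom routine.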

For the object storage/query issue, we had some discussion in Tencent/rapidjson#102. However, we did not come to a conclusion on it.
