locale-independent num-to-str #378
Conversation
…ocales (localeconv) rather than std::locale
…r to manifest in AppVeyor
I shall have a look at the PR this week.
Hey @TurpentineDistillery, thanks for the PR! I can confirm some of your numbers - here is the output of the benchmark (mean of 100 runs):
There is a 2x speedup for integers and a slight speedup for floats, but for some reason the jeopardy example is slower, which makes no real sense.

Edit: I reran the benchmarks and now have these times:
I think switching to Nonius is not a good idea...
Can you modify benchmarking to use the harmonic mean (as it should) instead of the arithmetic mean? That should make benchmarking robust to outlier runs.
I ran another 100 repetitions. Here are the harmonic means:
Based on the previous runs, this appears to be within the experimental error (maybe add CIs to the timing outputs?). By the way, jeopardy.json contains no numbers, only strings.
I know - that's why I am so puzzled that the numbers are so fragile...
Try running it again. On #337 I remember jeopardy-indented was consistently worse too, but it was very slight.
Another strange observation:

Somewhat related to benchmarking in general: https://stackoverflow.com/questions/9006596/is-the-unix-time-command-accurate-enough-for-benchmarks
@TurpentineDistillery You are right - this all makes no sense. In the develop branch, I am using benchpress with some additional tweaks to avoid cache-related outliers. Then I had a look at nonius, which has a nicer API and output options, but comes to these strange results. I shall try Google Benchmark...

Edit: https://github.com/DigitalInBlue/Celero also sounds promising.
The inline review comment below refers to this code:

```cpp
snprintf(m_buf.data(), m_buf.size(), fmt, x);
```

```cpp
#if 0
```
Is this `#if 0` branch still relevant?
Yes - this is a here-be-dragons note to a future clever contributor (or code peruser) who'll think "let's use c++-flavor locales here!" Feel free to get rid of it.
Huh? How is that possible?? The only thing that's supposed to be faster is the writing of integer types, which is done "by hand" - I'd expect the others to be unaffected, or minimally affected.
I think the fact that the locale is not reset with every dump call may make a difference. This code is not executed for every dump:

```cpp
// fix locale problems
ss.imbue(std::locale::classic());

// 6, 15 or 16 digits of precision allows round-trip IEEE 754
// string->float->string, string->double->string or string->long
// double->string; to be safe, we read this value from
// std::numeric_limits<number_float_t>::digits10
ss.precision(std::numeric_limits<double>::digits10);
```
I have not observed a similar speedup when running json_benchmarks, however.
I currently have no real way to produce reproducible benchmark results :(
Google Benchmark feels plausible, but I have no real way to verify the results.
I ran my own "ghetto" benchmark, which, as you might have expected by now, disagrees with all of the results obtained with the various approaches above. It is about as simple as "hello world": it measures the time to dump a JSON record in a loop until the output totals 100000000 bytes (I didn't bother factoring out the bootstrap time to initially deserialize the record). It shows that the PR is about the same as develop in all cases, maybe just marginally faster. I ran it a few times and got consistent numbers.

```
>>g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix
```
```
>>cat main.cpp
```

```cpp
#include "src/json.hpp"
#include <iostream>

int main()
{
    nlohmann::json j;
    std::cin >> j;
    while (true) std::cout << j;
    return 0;
}
```
```
>>git checkout develop
>>g++ -std=c++11 -O2 main.cpp -DNDEBUG -o a.out.develop
>>git checkout feature/locale_independent_num_to_str
>>g++ -std=c++11 -O2 main.cpp -DNDEBUG -o a.out.pr
>>for file in `find benchmarks/ -name *.json`; do echo $file; cat $file | time ./a.out.pr | head -c 100000000 >/dev/null; done
benchmarks//files/jeopardy/jeopardy.json
Command terminated abnormally.
       13.02 real        12.83 user         0.18 sys
benchmarks//files/nativejson-benchmark/canada.json
Command terminated abnormally.
       11.97 real        11.94 user         0.02 sys
benchmarks//files/nativejson-benchmark/citm_catalog.json
Command terminated abnormally.
        8.78 real         8.76 user         0.02 sys
benchmarks//files/nativejson-benchmark/twitter.json
Command terminated abnormally.
        8.10 real         8.08 user         0.01 sys
benchmarks//files/numbers/floats.json
Command terminated abnormally.
       13.63 real        13.59 user         0.03 sys
benchmarks//files/numbers/signed_ints.json
Command terminated abnormally.
        9.44 real         9.41 user         0.02 sys
benchmarks//files/numbers/unsigned_ints.json
Command terminated abnormally.
        9.41 real         9.38 user         0.02 sys
>>for file in `find benchmarks/ -name *.json`; do echo $file; cat $file | time ./a.out.develop | head -c 100000000 >/dev/null; done
benchmarks//files/jeopardy/jeopardy.json
Command terminated abnormally.
       13.05 real        12.85 user         0.18 sys
benchmarks//files/nativejson-benchmark/canada.json
Command terminated abnormally.
       14.25 real        14.22 user         0.02 sys
benchmarks//files/nativejson-benchmark/citm_catalog.json
Command terminated abnormally.
        9.48 real         9.46 user         0.02 sys
benchmarks//files/nativejson-benchmark/twitter.json
Command terminated abnormally.
        8.20 real         8.18 user         0.02 sys
benchmarks//files/numbers/floats.json
Command terminated abnormally.
       14.75 real        14.72 user         0.03 sys
benchmarks//files/numbers/signed_ints.json
Command terminated abnormally.
       10.59 real        10.56 user         0.03 sys
benchmarks//files/numbers/unsigned_ints.json
Command terminated abnormally.
       10.52 real        10.49 user         0.03 sys
```
The profiler did not identify these as hot-spots. The locale instantiation (prior to using classic) was super expensive, if you remember, but imbuing and setting precision once per dump was not expensive.
Sigh... I had another look at the benchmarks. Apart from changing the header file (develop/PR), the numbers stay the same.
Well, at least we can conclusively say that the PR is not worse performance-wise : )
This implements #362.

In order to determine whether a floating-point string representation needs ".0" appended, we need to be able to examine the string content before dumping it into the stream. One way of doing that is to reuse a thread-local, classic-locale-imbued stringstream, but that has additional overhead, and also clang on my Mac does not seem to know about `thread_local`. So I went with a different approach: use `snprintf` and then fix the result to be POSIX-locale compliant (erase digit-grouping characters and replace the decimal separator with '.').

Also, this gives a 2x speedup to the `dump numbers/signed_ints.json` benchmarks. Others are unchanged.