locale-independent num-to-str #378

TurpentineDistillery · 2016-12-06T00:39:51Z

This implements #362
In order to determine whether a floating-point string representation needs ".0" appended, we need to be able to examine the string content before dumping it into the stream. One way of doing that is to reuse a thread-local, classic-locale-imbued stringstream, but that has additional overhead, and clang on my Mac does not seem to know about thread_local. So I went with a different approach: use snprintf and then fix the result to be POSIX-locale compliant (erase digit-grouping characters and replace the decimal separator with '.').

Also, this gives a 2x speedup on the dump numbers/signed_ints.json benchmark. The others are unchanged.
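
The snprintf-then-fix approach described above could look roughly like the sketch below. This is not the PR's actual code: the function name `float_to_json_string`, the 64-byte buffer, and the `%.*g` format are assumptions made for illustration, and non-finite values are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical helper (not the PR's implementation): format a double with
// snprintf, then undo any locale-specific punctuation.
std::string float_to_json_string(double x)
{
    char buf[64];  // assumed to be large enough for "%.15g" output
    const int len = std::snprintf(buf, sizeof(buf), "%.*g",
                                  std::numeric_limits<double>::digits10, x);
    assert(len > 0 && static_cast<std::size_t>(len) < sizeof(buf));
    std::string s(buf, static_cast<std::size_t>(len));

    // snprintf follows the C locale, so query it with localeconv()
    // rather than std::locale.
    const std::lconv* loc = std::localeconv();
    const char thousands_sep =
        (loc->thousands_sep != nullptr) ? *loc->thousands_sep : '\0';
    const char decimal_point =
        (loc->decimal_point != nullptr) ? *loc->decimal_point : '\0';

    // Erase grouping characters first: in some locales that character is '.'.
    if (thousands_sep != '\0')
    {
        s.erase(std::remove(s.begin(), s.end(), thousands_sep), s.end());
    }
    // Then map the locale's decimal separator to '.'.
    if (decimal_point != '\0' && decimal_point != '.')
    {
        std::replace(s.begin(), s.end(), decimal_point, '.');
    }

    // Append ".0" if the result consists only of digits and a sign.
    const bool digits_only = std::all_of(s.begin(), s.end(), [](char c) {
        return (c >= '0' && c <= '9') || c == '-';
    });
    if (digits_only)
    {
        s += ".0";
    }
    return s;
}
```

In the default "C" locale neither fix-up step changes anything, so only the ".0" suffixing is visible; the fix-ups matter once the program has called setlocale with a locale that uses other separators.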

coveralls · 2016-12-06T01:50:47Z

Coverage decreased (-0.1%) to 99.864% when pulling 509447b on TurpentineDistillery:feature/locale_independent_num_to_str into bc28942 on nlohmann:develop.

coveralls · 2016-12-07T04:25:29Z

Coverage decreased (-0.3%) to 99.728% when pulling 738d462 on TurpentineDistillery:feature/locale_independent_num_to_str into bc28942 on nlohmann:develop.

coveralls · 2016-12-09T04:02:20Z

Coverage remained the same at 100.0% when pulling 0193035 on TurpentineDistillery:feature/locale_independent_num_to_str into bc28942 on nlohmann:develop.

coveralls · 2016-12-13T02:06:45Z

Coverage remained the same at 100.0% when pulling 65b9b0c on TurpentineDistillery:feature/locale_independent_num_to_str into bc28942 on nlohmann:develop.

nlohmann · 2017-01-02T17:15:28Z

I shall have a look at the PR this week.

nlohmann · 2017-01-04T21:36:13Z

Hey @TurpentineDistillery, thanks for the PR!

I can confirm some of your numbers - here is the output of the benchmark (mean of 100 runs):

Test                            develop      this PR
dump jeopardy.json              508.225 ms   561.056 ms
dump jeopardy.json with indent  563.668 ms   599.749 ms
dump numbers/floats.json        729.318 ms   693.462 ms
dump numbers/signed_ints.json   255.533 ms   123.513 ms

There is a 2x speedup for integers and a slight speedup for floats, but for some reason, the jeopardy example is slower, which makes no real sense.

Edit: I reran the benchmarks and now have these times:

Test                develop     this PR
dump jeopardy.json  574.94 ms   569.415 ms

I think switching to Nonius is not a good idea...

TurpentineDistillery · 2017-01-04T23:52:47Z

Can you modify benchmarking to use harmonic mean (as it should) instead of arithmetic mean? That should make benchmarking robust to outlier runs.

coveralls · 2017-01-05T03:40:14Z

Coverage remained the same at 100.0% when pulling 9490610 on TurpentineDistillery:feature/locale_independent_num_to_str into 9f6c86f on nlohmann:develop.

nlohmann · 2017-01-05T16:44:52Z

I ran another 100 repetitions. Here are the harmonic means:

Test                develop    this PR
dump jeopardy.json  569.3 ms   573.2 ms

TurpentineDistillery · 2017-01-05T17:07:59Z

Based on the previous runs, this appears to be within the experimental error (maybe add CIs to the timing outputs?). By the way, jeopardy.json contains no numbers, only strings.
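
As a rough illustration of what adding CIs to the timing output could mean, the sketch below computes a 95% confidence interval for the mean of a set of timings. It assumes a normal approximation (the 1.96 factor), which is crude for small sample counts; the names `Interval` and `confidence_interval_95` are made up for this example and do not come from any benchmarking library.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result type for this sketch.
struct Interval
{
    double lower;
    double upper;
};

// 95% confidence interval for the mean, using the normal approximation
// mean +/- 1.96 * s / sqrt(n), where s is the sample standard deviation.
Interval confidence_interval_95(const std::vector<double>& samples)
{
    assert(samples.size() >= 2);
    const double n = static_cast<double>(samples.size());

    double sum = 0.0;
    for (double s : samples)
    {
        sum += s;
    }
    const double mean = sum / n;

    double squared_deviations = 0.0;
    for (double s : samples)
    {
        squared_deviations += (s - mean) * (s - mean);
    }
    const double stddev = std::sqrt(squared_deviations / (n - 1.0));

    const double half_width = 1.96 * stddev / std::sqrt(n);
    return {mean - half_width, mean + half_width};
}
```

If the intervals of two benchmark variants overlap heavily, the difference between their means is plausibly within the experimental error.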

nlohmann · 2017-01-05T17:08:38Z

I know - that's why I am so puzzled that the numbers are so fragile...

whackashoe · 2017-01-05T18:44:12Z

Try running perf stat -Bddd ./whatever and compare off a few runs of each?

On #337 I remember jeopardy-indented was consistently worse too but it was very slight.

TurpentineDistillery · 2017-01-06T01:27:35Z

Another strange observation:
The harmonic mean is biased toward the minimum of a sample (the arithmetic mean of a sample of positive values with non-zero variance is always larger). However, in the data above we see the opposite: the arithmetic mean for develop was 508.225 ms, and the harmonic mean was 569.3 ms, which indicates that the latter run was consistently slower: the computer was doing something else during the latter benchmarking, causing context switches, frequency scaling, or IO-related slowdowns. Either that or a bug ; )
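
The relationship between the two means can be checked with a small sketch. The function names and the sample timings used in the usage note are invented for illustration; they are not taken from the benchmark runs above.

```cpp
#include <cassert>
#include <vector>

// Arithmetic mean: sum(x) / n.
double arithmetic_mean(const std::vector<double>& xs)
{
    assert(!xs.empty());
    double sum = 0.0;
    for (double x : xs)
    {
        sum += x;
    }
    return sum / static_cast<double>(xs.size());
}

// Harmonic mean: n / sum(1 / x); only defined here for positive values.
double harmonic_mean(const std::vector<double>& xs)
{
    assert(!xs.empty());
    double reciprocal_sum = 0.0;
    for (double x : xs)
    {
        assert(x > 0.0);
        reciprocal_sum += 1.0 / x;
    }
    return static_cast<double>(xs.size()) / reciprocal_sum;
}
```

For the made-up timings 100, 100, 100 and 1000 ms, the arithmetic mean is 325 ms while the harmonic mean is about 129 ms, so a single slow outlier moves the harmonic mean far less.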

Somewhat related to benchmarking in general: https://stackoverflow.com/questions/9006596/is-the-unix-time-command-accurate-enough-for-benchmarks

nlohmann · 2017-01-08T08:25:31Z

@TurpentineDistillery You are right - this all makes no sense. In the develop branch, I am using benchpress with some additional tweaks to avoid cache-related outliers. Then I had a look at nonius, which has a nicer API and output options, but produces these strange results.

I shall try Google Benchmark...

Edit: https://github.com/DigitalInBlue/Celero also sounds promising.

nlohmann · 2017-01-10T21:19:00Z

I added Google Benchmark to a feature branch and let the serialization benchmarks run (top is the code from the develop branch, bottom the code from this PR):

In this benchmark, the PR is consistently more than twice as fast as the code from develop! 👍

nlohmann · 2017-01-10T21:20:18Z

src/json.hpp.re2c

+
+            snprintf(m_buf.data(), m_buf.size(), fmt, x);
+
+#if 0


Is this #if 0 branch still relevant?

TurpentineDistillery · Jan 10, 2017

Yes - this is a here-be-dragons note to a future clever contributor (or code peruser) who'll think "let's use c++-flavor locales here!" Feel free to get rid of it.

TurpentineDistillery · 2017-01-10T21:25:58Z

> In this benchmark, the PR is consistently more than twice as fast as the code from develop!

Huh? How is that possible?? The only thing that's supposed to be faster is the writing of integer types that is done "by hand" - I'd expect others to be unaffected, or minimally affected.
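
For readers wondering what writing integers "by hand" might involve, here is a minimal sketch that fills a fixed buffer from the back, one decimal digit at a time. It is only an illustration of the general technique; the function name `int_to_string` and the buffer handling are assumptions and not the code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: convert a signed 64-bit integer to decimal text
// without going through streams or printf-style formatting.
std::string int_to_string(std::int64_t value)
{
    // 19 digits for the largest magnitude plus one sign character.
    char buf[20];
    std::size_t pos = sizeof(buf);

    const bool negative = value < 0;
    // Compute the magnitude in unsigned arithmetic so that the most
    // negative value does not overflow on negation.
    std::uint64_t magnitude = negative
        ? std::uint64_t{0} - static_cast<std::uint64_t>(value)
        : static_cast<std::uint64_t>(value);

    // Emit digits from least significant to most significant.
    do
    {
        assert(pos > 0);
        buf[--pos] = static_cast<char>('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);

    if (negative)
    {
        assert(pos > 0);
        buf[--pos] = '-';
    }
    return std::string(buf + pos, sizeof(buf) - pos);
}
```

Because no locale is consulted at any point, the output is locale-independent by construction.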

nlohmann · 2017-01-10T21:29:14Z

I think the fact that the locale is not reset with every dump call may make a difference.

With this PR, the following code is no longer executed for every dump:

        // fix locale problems
        ss.imbue(std::locale::classic());

        // 6, 15 or 16 digits of precision allows round-trip IEEE 754
        // string->float->string, string->double->string or string->long
        // double->string; to be safe, we read this value from
        // std::numeric_limits<number_float_t>::digits10
        ss.precision(std::numeric_limits<double>::digits10);
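
For comparison, the stream-based setup quoted above can be exercised in isolation with a sketch like the following. The wrapper function `float_to_string_classic` is a name invented for this example; only `imbue`, `std::locale::classic()` and `precision` come from the snippet.

```cpp
#include <cassert>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical wrapper: serialize a double through a stream that is pinned
// to the classic ("C") locale, independent of the global locale.
std::string float_to_string_classic(double x)
{
    std::ostringstream ss;
    ss.imbue(std::locale::classic());                     // fix locale problems
    ss.precision(std::numeric_limits<double>::digits10);  // 15 for IEEE 754 double
    ss << x;
    assert(!ss.fail());
    return ss.str();
}
```

Constructing and imbuing a stream on every call is the per-dump cost under discussion here; the snprintf route avoids the stream entirely.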

TurpentineDistillery · 2017-01-10T21:36:24Z

I have not observed similar speedup with running json_benchmarks, however.

nlohmann · 2017-01-10T21:41:50Z

I currently have no real way to produce reproducible benchmark results :(

nlohmann · 2017-01-10T21:42:24Z

Google Benchmark feels plausible, but I have no real way to verify the results.

TurpentineDistillery · 2017-01-11T02:40:29Z

I ran my own 'ghetto' benchmarking, which, as you might have expected by now, is in disagreement with all of the results obtained with the various approaches above. It is about as simple as "hello world": it measures the time to dump a JSON record in a loop until the output totals 100000000 bytes (I didn't bother factoring out the bootstrap time to initially deserialize the record). It shows that the PR is about the same as develop in all cases; maybe just marginally faster. I ran it a few times and got consistent numbers.

>>g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

>>cat main.cpp
#include "src/json.hpp"
#include <iostream>

int main()
{
    nlohmann::json j;
    std::cin >> j;
    while(true) std::cout << j;
    return 0;
}

>>git checkout develop
>>g++ -std=c++11 -O2 main.cpp -DNDEBUG -o a.out.develop

>>git checkout feature/locale_independent_num_to_str
>>g++ -std=c++11 -O2 main.cpp -DNDEBUG -o a.out.pr


>>for file in `find benchmarks/ -name *.json`; do echo $file; cat $file | time ./a.out.pr | head -c 100000000 >/dev/null;  done
benchmarks//files/jeopardy/jeopardy.json
Command terminated abnormally.
       13.02 real        12.83 user         0.18 sys
benchmarks//files/nativejson-benchmark/canada.json
Command terminated abnormally.
       11.97 real        11.94 user         0.02 sys
benchmarks//files/nativejson-benchmark/citm_catalog.json
Command terminated abnormally.
        8.78 real         8.76 user         0.02 sys
benchmarks//files/nativejson-benchmark/twitter.json
Command terminated abnormally.
        8.10 real         8.08 user         0.01 sys
benchmarks//files/numbers/floats.json
Command terminated abnormally.
       13.63 real        13.59 user         0.03 sys
benchmarks//files/numbers/signed_ints.json
Command terminated abnormally.
        9.44 real         9.41 user         0.02 sys
benchmarks//files/numbers/unsigned_ints.json
Command terminated abnormally.
        9.41 real         9.38 user         0.02 sys


>>for file in `find benchmarks/ -name *.json`; do echo $file; cat $file | time ./a.out.develop | head -c 100000000 >/dev/null;  done
benchmarks//files/jeopardy/jeopardy.json
Command terminated abnormally.
       13.05 real        12.85 user         0.18 sys
benchmarks//files/nativejson-benchmark/canada.json
Command terminated abnormally.
       14.25 real        14.22 user         0.02 sys
benchmarks//files/nativejson-benchmark/citm_catalog.json
Command terminated abnormally.
        9.48 real         9.46 user         0.02 sys
benchmarks//files/nativejson-benchmark/twitter.json
Command terminated abnormally.
        8.20 real         8.18 user         0.02 sys
benchmarks//files/numbers/floats.json
Command terminated abnormally.
       14.75 real        14.72 user         0.03 sys
benchmarks//files/numbers/signed_ints.json
Command terminated abnormally.
       10.59 real        10.56 user         0.03 sys
benchmarks//files/numbers/unsigned_ints.json
Command terminated abnormally.
       10.52 real        10.49 user         0.03 sys
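
The same "dump in a loop until N bytes" idea can also be timed in-process instead of with the shell's time and head. The sketch below assumes a caller-supplied callable that returns the serialized text; `LoopResult` and `run_until_bytes` are hypothetical names, and std::chrono::steady_clock is used because it is monotonic.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical result type for this sketch.
struct LoopResult
{
    std::size_t iterations;
    std::size_t bytes;
    double seconds;
};

// Call `produce` repeatedly until at least `target_bytes` of output have
// been generated, and report how long that took.
LoopResult run_until_bytes(const std::function<std::string()>& produce,
                           std::size_t target_bytes)
{
    using clock = std::chrono::steady_clock;
    LoopResult result{0, 0, 0.0};

    const auto start = clock::now();
    while (result.bytes < target_bytes)
    {
        const std::string out = produce();
        assert(!out.empty());  // an empty result would never reach the target
        result.bytes += out.size();
        ++result.iterations;
    }
    result.seconds =
        std::chrono::duration<double>(clock::now() - start).count();
    return result;
}
```

With the library from this thread, the callable could be something like a lambda returning `j.dump()` for an already parsed `nlohmann::json j`, which also keeps the initial deserialization out of the measured time.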

TurpentineDistillery · 2017-01-11T02:54:02Z

> I think the fact that the locale is not reset with every dump call may make a difference.

The profiler did not identify these as hot spots. The locale instantiation (prior to using classic) was super expensive, if you remember, but imbuing and setting the precision once per dump was not expensive.

nlohmann · 2017-01-11T18:52:40Z

Sigh... I had another look at the benchmarks. Apart from changing the header file (develop/PR), the numbers stay the same:

$ ./benchmark-develop --benchmark_filter="dump.*" --benchmark_repetitions=30 --benchmark_report_aggregates_only=true
Run on (8 X 2900 MHz CPU s)
2017-01-11 19:34:34
Benchmark                                          Time           CPU Iterations
--------------------------------------------------------------------------------
dump data/jeopardy/jeopardy.json_mean            271 ns        270 ns    2625706   14.1525MB/s normal
dump data/jeopardy/jeopardy.json_stddev           11 ns         10 ns          0   566.813kB/s normal
dump data/jeopardy/jeopardy.json_mean            267 ns        267 ns    2537979   14.3072MB/s pretty
dump data/jeopardy/jeopardy.json_stddev           10 ns         10 ns          0   566.589kB/s pretty
dump data/numbers/floats.json_mean               270 ns        270 ns    2643914   14.1578MB/s normal
dump data/numbers/floats.json_stddev              10 ns         10 ns          0   563.347kB/s normal
dump data/numbers/floats.json_mean               267 ns        266 ns    2761069   14.3434MB/s pretty
dump data/numbers/floats.json_stddev              12 ns         12 ns          0    667.08kB/s pretty
dump data/numbers/signed_ints.json_mean          264 ns        264 ns    2781078   14.4766MB/s normal
dump data/numbers/signed_ints.json_stddev         13 ns         12 ns          0   708.379kB/s normal
dump data/numbers/signed_ints.json_mean          268 ns        267 ns    2760285   14.3026MB/s pretty
dump data/numbers/signed_ints.json_stddev         12 ns         12 ns          0   665.562kB/s pretty

$ ./benchmark-pr --benchmark_filter="dump.*" --benchmark_repetitions=30 --benchmark_report_aggregates_only=true
Run on (8 X 2900 MHz CPU s)
2017-01-11 19:39:48
Benchmark                                          Time           CPU Iterations
--------------------------------------------------------------------------------
dump data/jeopardy/jeopardy.json_mean            116 ns        116 ns    5679375   33.0805MB/s normal
dump data/jeopardy/jeopardy.json_stddev            6 ns          6 ns          0   1.64533MB/s normal
dump data/jeopardy/jeopardy.json_mean            118 ns        118 ns    5818545   32.3685MB/s pretty
dump data/jeopardy/jeopardy.json_stddev            3 ns          3 ns          0   951.512kB/s pretty
dump data/numbers/floats.json_mean               120 ns        119 ns    5542447   31.9539MB/s normal
dump data/numbers/floats.json_stddev               2 ns          2 ns          0   496.115kB/s normal
dump data/numbers/floats.json_mean               119 ns        119 ns    6097083   32.1939MB/s pretty
dump data/numbers/floats.json_stddev               1 ns          1 ns          0   401.886kB/s pretty
dump data/numbers/signed_ints.json_mean          119 ns        118 ns    6104687   32.2889MB/s normal
dump data/numbers/signed_ints.json_stddev          2 ns          2 ns          0   610.253kB/s normal
dump data/numbers/signed_ints.json_mean          118 ns        118 ns    5893595   32.3966MB/s pretty
dump data/numbers/signed_ints.json_stddev          3 ns          3 ns          0   816.329kB/s pretty

TurpentineDistillery · 2017-01-11T21:25:02Z

Well, at least we can conclusively say that PR is not worse performance-wise : )

Alex Astashyn added 4 commits December 4, 2016 01:27

Added locale-independent numtostr

21cae35

Fixed suffixing .0 and modified the unit tests accordingly

2197856

Small bufix related to creation of fmt string for snprintf

509447b

Addressing msvc-specific compilation issues.

82b82fd

Bugfix: when working with C formatting functions we need to query C locales (localeconv) rather than std::locale

738d462

Alex Astashyn added 3 commits December 7, 2016 20:23

Added unit test for issue nlohmann#378

50f0484

Addressing compiler warnings

343c9f9

Tweaking unit test, as digits grouping is failing to be invoked in CI

0193035

Disabling snprintf pre-check, since can't get locale-specific behavior to manifest in AppVeyor

65b9b0c

nlohmann mentioned this pull request Dec 23, 2016

replace strtold with non locale dependent version #337

Closed

nlohmann self-assigned this Jan 2, 2017

nlohmann added this to the Release 2.1.0 milestone Jan 2, 2017

nlohmann added the "state: please discuss" label Jan 4, 2017

nlohmann mentioned this pull request Jan 4, 2017

Append ".0" to serialized floating_point values that are digits-only. #362

Closed

Merge upstream/develop into feature/locale_independent_num_to_str

9490610

nlohmann reviewed Jan 10, 2017

nlohmann modified the milestones: Release 2.1.0, Release 3.0.1 Jan 24, 2017

nlohmann modified the milestones: Release 2.1.1, Release 3.0.1 Feb 16, 2017

nlohmann added a commit that referenced this pull request Feb 16, 2017

🔀 merge #378 (for #362 and #454)

6408402

nlohmann merged commit 9490610 into nlohmann:develop Feb 19, 2017

This was referenced Feb 19, 2017

doubles are printed as integers #454

Closed

Roundtrip error while parsing "1000000000000000010E5" #465

Closed

kishorenc mentioned this pull request Nov 5, 2017

Floating point value loses decimal point during dump #818

Closed
