replace strtold with non locale dependent version #337

whackashoe · 2016-10-16T13:34:59Z

Hi!

I've taken and reworked a strtod implementation from TI in order to provide this. It's rewritten in an attempt to be solid C++11 rather than C.

Using this fixes #302. It does not support octal or hexadecimal which JSON doesn't recognize in its number type anyway. This seems to be a bit more efficient (possibly due to non support of octal/hexadecimal) though I wouldn't be surprised if these gains disappear once the last few bugs are fixed heh.

Not all tests are passing yet but it's pretty close, perhaps someone can figure out the ones I haven't.

BEFORE:

./json_benchmarks
parse jeopardy.json                           1  1681452236 ns/op
parse canada.json                            50    43518542 ns/op
parse citm_catalog.json                      50    24345478 ns/op
parse twitter.json                          100    12713148 ns/op
parse numbers/floats.json                     5   447398631 ns/op
parse numbers/signed_ints.json                5   280521961 ns/op
parse numbers/unsigned_ints.json              5   278368046 ns/op
dump jeopardy.json                            1  1390675617 ns/op
dump jeopardy.json with indent                1  1578243282 ns/op
./json_benchmarks 21.516s

AFTER:

./json_benchmarks
parse jeopardy.json                           1  1619098873 ns/op
parse canada.json                            50    36922996 ns/op
parse citm_catalog.json                      50    25375433 ns/op
parse twitter.json                          100    13150869 ns/op
parse numbers/floats.json                     5   376901576 ns/op
parse numbers/signed_ints.json                5   300091862 ns/op
parse numbers/unsigned_ints.json              5   297152348 ns/op
dump jeopardy.json                            1  1390073209 ns/op
dump jeopardy.json with indent                1  1616301685 ns/op
./json_benchmarks 20.828s

UNIT TESTS:

(these are run with CXX=clang++ make check)

src/unit-class_parser.cpp:37

src/unit-class_parser.cpp:258: FAILED:
  CHECK( json::parser("0.999").parse() == json(0.999) )
with expansion:
  0.999 == 0.999

regression tests
  issue #89 - nonstandard integer type
src/unit-regression.cpp:36

src/unit-regression.cpp:150: FAILED:
  CHECK( j.get<float>() == 4294967296.0f )
with expansion:
  4294967040.0f == 4294967296.0f

src/unit-regression.cpp:160: FAILED:
  CHECK( j.get<float>() == -2147483650.0f )
with expansion:
  -2147483520.0f == -2147483648.0f

regression tests
  issue #186 miloyip/nativejson-benchmark: floating-point parsing
src/unit-regression.cpp:36

src/unit-regression.cpp:316: FAILED:
  CHECK( j.get<double>() == 2.2250738585072009e-308 )
with expansion:
  0.0 == 0.0

src/unit-regression.cpp:319: FAILED:
  CHECK( j.get<double>() == 0.99999999999999989 )
with expansion:
  1.0 == 1.0

src/unit-regression.cpp:322: FAILED:
  CHECK( j.get<double>() == 1.00000000000000022 )
with expansion:
  1.0 == 1.0

src/unit-regression.cpp:325: FAILED:
  CHECK( j.get<double>() == 72057594037927928.0 )
with expansion:
  72057594037927920.0 == 72057594037927928.0

src/unit-regression.cpp:328: FAILED:
  CHECK( j.get<double>() == 9223372036854774784.0 )
with expansion:
  9223372036854775808.0
  ==
  9223372036854774784.0

src/unit-regression.cpp:331: FAILED:
  CHECK( j.get<double>() == 10141204801825834086073718800384.0 )
with expansion:
  10141204801825832960173811957760.0
  ==
  10141204801825834086073718800384.0

src/unit-regression.cpp:334: FAILED:
  CHECK( j.get<double>() == 5708990770823838890407843763683279797179383808.0 )
with expansion:
  5708990770823843327184944562486185035640602624.0
  ==
  5708990770823838890407843763683279797179383808.0

compliance tests from nativejson-benchmark
  roundtrip
src/unit-testsuites.cpp:103

src/unit-testsuites.cpp:309: FAILED:
  CHECK( j.dump() == json_string )
with expansion:
  "[0.0]" == "[2.2250738585072e-308]"
with message:
  filename := test/data/json_roundtrip/roundtrip29.json

===============================================================================
test cases:      35 |      32 passed |  3 failed
assertions: 8905168 | 8905157 passed | 11 failed

nlohmann · 2016-10-16T17:05:22Z

Thanks for the PR! I'll check it as soon as I find the time.

nlohmann · 2016-10-16T17:06:14Z

Just one question: Can the code be licensed under the MIT license?

whackashoe · 2016-10-16T17:49:10Z

Yeah I think it's fine, it's basically all rewritten so I don't think there is much of an issue that way... plus if they attempted to sue for using a similar variable name or if statement we'd probably become famous.

If you want to be extra safe we can change the comments to be unique as they are just copied verbatim, though I think that'd be horrid as their comments are quite nice.

nlohmann · 2016-10-16T21:36:12Z

src/json.hpp

-        standard floating point number parsing function based on the type
-        supplied via the first parameter.  Set this to @a
-        static_cast<number_float_t*>(nullptr).
+            while (nl_isspace(*fst))


Whitespace is already skipped by the lexer, so I think it is safe to skip this loop.

nlohmann · 2016-10-16T21:38:01Z

src/json.hpp

+                if (cp == '-' or cp == '+')
+                {
+                    ++fst;
+                    successful_parse = true;


The lexer discards any string that violates JSON's specification for numbers. Therefore, we do not need additional verification here.

whackashoe · 2016-10-16T22:37:16Z

Since it is relying on other conditions now with these 2 changes, I think we can strip out endptr and errno as well.

this changes a few things. the name of the function to better represent what it does (string to json number) - we aren't strtoul compliant so best not to lie about that. we remove template code and just use one function. this makes it a bit simpler and we can always add it back if it buys us anything. the penalty for casting down from long double seems to be inconsequential.

whackashoe · 2016-10-21T04:03:57Z

The unit-regression failures seem to be just an off by 1 error, if you look at expected and result values as hex:

0x4370000000000000 == 0x436fffffffffffff

0x43e0000000000000 == 0x43dfffffffffffff

nlohmann · 2016-10-23T16:05:19Z

src/json.hpp.re2c

+            if (*str == 'e' or *str == 'E')
+            {
+                cp = *++str;
+                bool negative_exp = cp == '-'; // read in exponent sign (+/-)


negative_exp could be const bool

nlohmann · 2016-10-23T16:05:58Z

src/json.hpp.re2c

+                while (nl_isdigit(cp = *str))
+                {
+                     result = result * 10 + (cp - '0');
+                     ++str;


Couldn't str be incremented in the while condition just like in the loop below?

No, we may not want to increment if nl_isdigit fails so it must be in body.

nlohmann · 2016-10-23T16:06:54Z

src/json.hpp.re2c

-        supplied via the first parameter.  Set this to @a
-        static_cast<number_float_t*>(nullptr).
+                int count = 0; // exponent calculation
+                if (! nl_isdigit(cp))


please use not rather than !

nlohmann · 2016-10-23T16:11:13Z

src/json.hpp.re2c

+                {1.e1L, 1.e2L, 1.e4L, 1.e8L, 1.e16L, 1.e32L, 1.e64L, 1.e128L, 1.e256L}
+            };
+
+            if (exp > std::numeric_limits<long double>::max_exponent10)


Starting here, there a now comments in the code. Please briefly describe each conditional and loop.

whackashoe · 2016-11-02T16:53:41Z

@nlohmann could you add help-wanted tag to this, I am too busy right now to do anything with this right now but it is very close.

nlohmann · 2016-11-11T18:41:28Z

Hey @whackashoe do you need help with this?

whackashoe · 2016-11-11T19:00:09Z

@nlohmann Yes. I haven't had time to figure out the last issues with this, though iirc its basically at a level where it works except for an edge case (maybe more than one).

nlohmann · 2016-11-12T18:28:20Z

I'll have a look. I may open a feature branch myself for the moment to fix the conflicts and fiddle around a bit.

- took current state of #337 - adjusted parser to reject incomplete numbers - simplified number parsing code as it could rely on correct numbers

nlohmann · 2016-11-12T21:13:36Z

I opened a feature branch. I changed the parser to reject incomplete numbers. This helped to simplify the number parser. There are still three tests failing - I'll check tomorrow.

jaredgrubb · 2016-11-15T04:45:55Z

src/json.hpp

+                {
+                     if (plus_or_minus)
+                     {
+                         *--str;


Why is there a '*' here?

Thanks for taking a look! @nlohmann want to incorp these into your feature branch?

This makes no sense. Please have a look at https://github.com/nlohmann/json/blob/feature/strtold/src/json.hpp.re2c#L8139 where I started an overworked version of this PR.

jaredgrubb · 2016-11-15T04:46:03Z

src/json.hpp


-        @param[in] type  the @ref number_float_t in use
+                     *--str;


nlohmann · 2016-11-15T06:30:32Z

Please have a look at https://github.com/nlohmann/json/blob/feature/strtold/src/json.hpp.re2c#L8139 where I started an overworked version of this PR, and also the discussion in #302.

TurpentineDistillery · 2016-11-19T18:02:20Z

I see the code parses to long double, and then the return value is cast to float_number_t.
This will yield different results than parsing into float_number_t directly (unless it is long double, of course). See issue #186

TurpentineDistillery · 2016-12-02T02:16:54Z

Apparently implementing strtod correctly is harder than it seems.
Here's mozilla implementation

nlohmann · 2016-12-12T22:21:06Z

@whackashoe Any idea how to proceed here? I agree with @TurpentineDistillery that correct floating-point handling is really hard...

nlohmann · 2016-12-23T09:22:01Z

What about this PR? Is it likely that the remaining issues can be fixed? There is also PR #378 dealing with the same issue.

whackashoe · 2016-12-28T13:32:40Z

It seems like it will need significant changes to work. You can close this if it's dirtying up the queue, I don't have time to work on it for a while.

nlohmann · 2016-12-28T13:34:50Z

Thanks for checking back. I close this for now.

Move definition of common macros (e.g., JSON_HAS_CPP_*) into macros.hpp header file and update unit test sources. Incidentally enables the regression tests for nlohmann#2546 and nlohmann#3070. A CHECK_THROWS_WITH_AS in nlohmann#3070 was disabled, which is tracked here: nlohmann#337

nlohmann and others added 2 commits October 15, 2016 16:47

Merge branch 'release/2.0.6'

60bba02

new strtod/ld/f function

22f3f88

nlohmann added the improvement label Oct 16, 2016

nlohmann reviewed Oct 16, 2016

View reviewed changes

whackashoe added 5 commits October 16, 2016 17:14

remove whitespace loop in strtox

8877ad7

remove parse check for strtox

201e1f3

remove endptr parameter from strtox

b7a978f

remove unused errno setting from strtox

7f5b228

remove unnecessary str pointer and rename arg to be clearer

973ee97

whackashoe added 3 commits October 20, 2016 23:18

nl_isdigit can be made constexpr

7cc60cc

remove unneeded branch

dec60cb

different neg style for consistency

babb65d

nlohmann reviewed Oct 23, 2016

View reviewed changes

whackashoe added 3 commits October 23, 2016 14:55

make negative_exp and plus_or_minus const

308082a

use not over !

66a4c06

add additional comments

bc53b79

nlohmann added the state: help needed the issue needs help to proceed label Nov 2, 2016

nlohmann added a commit that referenced this pull request Nov 12, 2016

work on PR #337 / issue #302

7c44851

- took current state of #337 - adjusted parser to reject incomplete numbers - simplified number parsing code as it could rely on correct numbers

nlohmann mentioned this pull request Nov 13, 2016

Parsing of floats locale dependent #302

Closed

nlohmann removed the kind: improvement label Nov 13, 2016

jaredgrubb reviewed Nov 15, 2016

View reviewed changes

nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Dec 14, 2016

nlohmann closed this Dec 28, 2016

whackashoe mentioned this pull request Jan 5, 2017

locale-independent num-to-str #378

Merged

falbrechtskirchinger mentioned this pull request Mar 6, 2022

Improve unit testing (Part 1) #3380

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

replace strtold with non locale dependent version #337

replace strtold with non locale dependent version #337

whackashoe commented Oct 16, 2016 •

edited

nlohmann commented Oct 16, 2016

nlohmann commented Oct 16, 2016

whackashoe commented Oct 16, 2016

nlohmann Oct 16, 2016

nlohmann Oct 16, 2016

whackashoe commented Oct 16, 2016

whackashoe commented Oct 21, 2016

nlohmann Oct 23, 2016

nlohmann Oct 23, 2016

whackashoe Oct 23, 2016

nlohmann Oct 23, 2016

nlohmann Oct 23, 2016

whackashoe commented Nov 2, 2016

nlohmann commented Nov 11, 2016

whackashoe commented Nov 11, 2016

nlohmann commented Nov 12, 2016

nlohmann commented Nov 12, 2016

jaredgrubb Nov 15, 2016

whackashoe Nov 15, 2016

nlohmann Nov 15, 2016

jaredgrubb Nov 15, 2016

nlohmann Nov 15, 2016

nlohmann commented Nov 15, 2016

TurpentineDistillery commented Nov 19, 2016

TurpentineDistillery commented Dec 2, 2016

nlohmann commented Dec 12, 2016

nlohmann commented Dec 23, 2016

whackashoe commented Dec 28, 2016

nlohmann commented Dec 28, 2016

replace strtold with non locale dependent version #337

replace strtold with non locale dependent version #337

Conversation

whackashoe commented Oct 16, 2016 • edited

BEFORE:

AFTER:

UNIT TESTS:

nlohmann commented Oct 16, 2016

nlohmann commented Oct 16, 2016

whackashoe commented Oct 16, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

whackashoe commented Oct 16, 2016

whackashoe commented Oct 21, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

whackashoe commented Nov 2, 2016

nlohmann commented Nov 11, 2016

whackashoe commented Nov 11, 2016

nlohmann commented Nov 12, 2016

nlohmann commented Nov 12, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nlohmann commented Nov 15, 2016

TurpentineDistillery commented Nov 19, 2016

TurpentineDistillery commented Dec 2, 2016

nlohmann commented Dec 12, 2016

nlohmann commented Dec 23, 2016

whackashoe commented Dec 28, 2016

nlohmann commented Dec 28, 2016

whackashoe commented Oct 16, 2016 •

edited