Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for high-precision numbers in UBJSON encoding #2297

Merged
merged 17 commits into from
Jul 25, 2020
Merged

Conversation

nlohmann
Copy link
Owner

This PR adds high-precision numbers for UBJSON.

  • Entries with a 'H' prefix use the JSON number parser and store the parsed number.
  • When serializing, integer values greater than int64max use the 'H' prefix, followed by a JSON text of the number.

Closes #2286.

@nlohmann nlohmann self-assigned this Jul 21, 2020
@nlohmann nlohmann marked this pull request as draft July 21, 2020 20:04
@nlohmann
Copy link
Owner Author

There is a SIGSEGV when executing the tests with MSVC 2019. I cannot reproduce this issue on my machine (macOS), and since there is no issue when running with Valgrind or the sanitizers, I have no idea what to do.

@nlohmann nlohmann added aspect: binary formats BSON, CBOR, MessagePack, UBJSON state: help needed the issue needs help to proceed labels Jul 21, 2020
@coveralls
Copy link

coveralls commented Jul 21, 2020

Coverage Status

Coverage remained the same at 100.0% when pulling 3366241 on issue2286 into a048b72 on develop.

@t-b
Copy link
Contributor

t-b commented Jul 22, 2020

It is not 100% reproducible. Worked 5 times, once crashed with

3: Test command: E:\projekte\json\build-msvc2019\test\Debug\test-ubjson.exe "--no-skip"
43: Test timeout computed to be: 1500
43: [doctest] doctest version is "2.3.7"
43: [doctest] run with "--help" for options
43: ===============================================================================
43: E:\projekte\json\test\src\unit-ubjson.cpp(2446):
43: TEST CASE:  UBJSON roundtrips
43:   input from self-generated UBJSON files
43:
43: E:\projekte\json\test\src\unit-ubjson.cpp(2446): FATAL ERROR: test case CRASHED: SIGSEGV - Stack overflow
43:
43: ===============================================================================
43: E:\projekte\json\test\src\unit-ubjson.cpp(2446):
43: TEST CASE:  UBJSON roundtrips
43:
43: ===============================================================================
43: [doctest] test cases:      4 |      3 passed |      1 failed |      0 skipped
43: [doctest] assertions: 691826 | 691826 passed |      0 failed |
43: [doctest] Status: FAILURE!
2/2 Test #43: test-ubjson ......................***Exception: SegFault139.84 sec

If I only run that sub case with ./test/Debug/test-ubjson.exe --subcase="UBJSON roundtrips" I can't break it either.

@nlohmann
Copy link
Owner Author

It is not 100% reproducible. Worked 5 times, once crashed with

Can you get a stack trace? I have no idea what even goes wrong...

@fmotty
Copy link

fmotty commented Jul 23, 2020

I was able to reproduce this issue under the debugger and obtain the stack trace. This is a quite a classical stack exhaustion case due to large functions and recursion. The 'get_ubjson_value' function stack demands grew, and some of the tests started failing in debug mode. The test consumed 1 MiB of stack space and crashed. In non-debug mode, with all the optimizations, the test passes.

Basically, in debug mode, the compiler (not only MSVC compiler, others do the same) does not efficiently merge the stack of various unrelated blocks. Consider this example:
void Foo() {
{ char a[1024]; }
{ char b[1024]; }
}
One would expect that the compiler allocates only ~1 KiB of stack space for Foo, but, in fact, it uses ~2 KiB, placing 'a' and 'b' in separate locations. Multiply it by the number of times the function reenters, and you can see how stack is quickly exhausted.
The stack trace is attached.
stacktrace.txt

In order to verify the theory that the newly added code is overfilling the stack size limit, I did a quick hack and wrapped the new code inside the added "case 'H'" switch branch into the lambda object, and the test passed.

The easiest mitigation would be to extract the new "case 'H'" code into a separate method. This way, the 'get_ubjson_value' method stack demands will shrink, and the test will pass. Alternatively, the test may be altered to provide the larger stack.

@nlohmann nlohmann marked this pull request as ready for review July 23, 2020 10:47
@nlohmann
Copy link
Owner Author

@fmotty Thanks so much for the detailed analysis!

@nlohmann nlohmann removed the state: help needed the issue needs help to proceed label Jul 23, 2020
@nlohmann nlohmann added this to the Release 3.8.1 milestone Jul 23, 2020
@t-b
Copy link
Contributor

t-b commented Jul 23, 2020

@fmotty Thanks for chiming in!

@nlohmann nlohmann merged commit 63e7c40 into develop Jul 25, 2020
@nlohmann nlohmann deleted the issue2286 branch July 25, 2020 19:50
@nlohmann
Copy link
Owner Author


🔖 Release item

This issue/PR will be part of the next release of the library. This template helps preparing the release notes.

Type

  • ✨ New Feature
  • 🐛 Bug Fix
  • ⚡️ Improvement
  • 🔨 Further Change
  • 🔥 Deprecated function

Description


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for high-precision numbers in UBJSON encoding
4 participants