
Prevent stack overflow caused by recursive destruction #1436

Merged
merged 6 commits into nlohmann:develop from nickaein:iterate-on-destruction
Nov 10, 2019

Conversation

@nickaein (Contributor) commented Jan 15, 2019

This pull request fixes #832, #1419, #1835.

For a deeply-nested JSON object, the recursive implementation of the json_value::destroy function causes a stack overflow.

This is problematic for users who wish to expose a public API, since an attacker can crash their systems by sending a deeply nested JSON request (e.g. a request containing 512K bytes of [).

In this pull request:

  • First, a regression test has been added for this particular issue. We should probably add more regression tests for more complex scenarios.

  • The json_value::destroy method has been modified to destroy the children of a JSON value iteratively, preventing recursive destruction. The computational complexity and memory requirements of this approach are O(n), equivalent to the previous recursive destruction. Also, std::move is used to minimize the overhead of flattening the original JSON object. (A standalone sketch of the technique follows this list.)
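
To make the approach concrete, here is a minimal standalone sketch of the technique, not the library's actual code: a hypothetical Node type stands in for basic_json, and its destructor moves all descendants onto a heap-allocated stack instead of letting destruction recurse (the member std::vector of an incomplete type relies on C++17).

#include <algorithm>
#include <iterator>
#include <vector>

struct Node
{
    std::vector<Node> children;

    Node() = default;
    Node(Node&&) noexcept = default;            // moving must not fall back to
    Node& operator=(Node&&) noexcept = default; // (recursive) copying

    ~Node()
    {
        // Move the children onto an explicit, heap-allocated stack.
        std::vector<Node> stack;
        std::move(children.begin(), children.end(), std::back_inserter(stack));
        children.clear();

        while (!stack.empty())
        {
            Node current = std::move(stack.back());
            stack.pop_back();

            // Steal the grandchildren before `current` is destroyed, so its
            // own destructor only ever sees an empty children vector and the
            // implicit recursion is bounded to one level.
            std::move(current.children.begin(), current.children.end(),
                      std::back_inserter(stack));
            current.children.clear();
        }
    }
};

int main()
{
    // A chain one million levels deep: recursive destruction would overflow
    // the call stack, while the loop above needs O(n) time and heap space.
    Node root;
    Node* tail = &root;
    for (int i = 0; i < 1000000; ++i)
    {
        tail->children.emplace_back();
        tail = &tail->children.back();
    }
}   // root is destroyed here, iteratively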

One question: the iterative destruction could also be implemented inside the destructor basic_json::~basic_json. I am not sure which place is more appropriate; I have currently implemented it inside json_value::destroy merely because I find it easier.


Pull request checklist

Read the Contribution Guidelines for detailed information.

  • Changes are described in the pull request, or an existing issue is referenced.
  • The test suite compiles and runs without error.
  • Code coverage is 100%. Test cases can be added by editing the test suite.
  • The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header file single_include/nlohmann/json.hpp. The whole process is described here.

Please don't

  • C++11 support varies between different compilers and versions. Please note the list of supported compilers. Some compilers, like GCC 4.7 (and earlier), Clang 3.3 (and earlier), or Microsoft Visual Studio 13.0 and earlier, are known not to work due to missing or incomplete C++11 support. Please refrain from proposing changes that work around these compilers' limitations with #ifdefs or other means.
  • Specifically, I am aware of compilation problems with Microsoft Visual Studio (there is even an issue label for these kinds of bugs). I understand that even in 2016, complete C++11 support isn't there yet. But please also understand that I do not want to drop features or uglify the code just to make Microsoft's sub-standard compiler happy. The past has shown that there are ways to express the functionality such that the code compiles with the most recent MSVC; unfortunately, this is not the main objective of the project.
  • Please refrain from proposing changes that would break JSON conformance. If you propose a conformant extension of JSON to be supported by the library, please motivate this extension.
  • Please do not open pull requests that address multiple issues.

@nlohmann (Owner)

How is this PR related to #1431?

@nickaein (Contributor, Author)

Wow! I didn't notice there was already an open PR on this. Such an active community!

Skimming through the code, the general idea is the same as in this PR: both use a heap-allocated stack to unfold the JSON object. I haven't tested it yet, though.

How should we proceed now?

@coveralls commented Jan 16, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling 7e2445a on nickaein:iterate-on-destruction into 67259d6 on nlohmann:develop.

@nlohmann (Owner)

@nickaein It would be great if you could review PR #1431 so we could have one PR with a "best of" of your ideas.

@nickaein (Contributor, Author)

@nlohmann Sorry for the long delay. While this PR and #1431 are in the same spirit, I believe this PR is more concise and easier to read. Also, it is accompanied by a basic unit test to prevent regressions.

Regarding the other PR, I don't quite understand the implementation and thus cannot verify its validity. Specifically, I am not sure why there is a break at https://github.com/nlohmann/json/pull/1431/files#diff-641eb804df1899b4e8277af5a52e6b7dR1020. Since the value object might have many children of type object/array, we should not stop processing them after the first hit.

@stale (bot) commented Feb 27, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Feb 27, 2019
@stale stale bot closed this Mar 6, 2019
@nlohmann nlohmann reopened this Nov 9, 2019
@stale stale bot removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Nov 9, 2019
@nlohmann (Owner) commented Nov 9, 2019

@nickaein I just resurrected this from the dead and forgotten, as I found #1835. I'm curious whether your solution fixes this.

@nickaein (Contributor, Author) commented Nov 9, 2019

@nlohmann Yes, this will probably fix #1835. I will check it with bad_json_parsers to make sure. Thanks for reminding me!

@nickaein (Contributor, Author) commented Nov 9, 2019

I can confirm it passes bad_json_parsers without crashing at any of the depths (1 to 5,000,000) provided in their test.
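
For reference, the regression case described in the PR boils down to something like this sketch, using the library's public parse API with exceptions disabled (the depth value is illustrative):

#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main()
{
    // 5,000,000 unmatched '[' characters: parsing stops with an error, but
    // the deeply nested array built up to that point must still be destroyed.
    // Before this fix, that destruction recursed once per nesting level and
    // overflowed the call stack.
    const std::string input(5000000, '[');
    const json j = json::parse(input, nullptr, false); // no exceptions on error
    // j holds a "discarded" value; the interesting work is the destruction
    // of the parser's partially built result.
}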

@nlohmann nlohmann self-assigned this Nov 9, 2019
@nlohmann nlohmann added this to the Release 3.7.2 milestone Nov 9, 2019
@nlohmann (Owner) left a review:

Looks good to me.

@nickaein (Contributor, Author) commented Nov 9, 2019

Travis CI stalled on one of the jobs (https://travis-ci.org/nlohmann/json/jobs/609598763?utm_medium=notification&utm_source=github_status). Restarting it might help.

@nickaein (Contributor, Author) commented Nov 9, 2019

Once again, a timeout issue with CI :-). Travis CI terminates the valgrind test-regression job since it doesn't output anything for 10 minutes.

This is caused by creating a very deep JSON object in the unit test, which requires a lot of memory allocations and hence imposes a heavy burden on valgrind when checking for memory leaks.

To mitigate this problem, I moved this test to a separate unit test (unit-large_json.cpp). If this doesn't help, the next step would be to run the tests with travis_wait, as described in the Travis docs, to periodically output a message.

Meanwhile, breaking test-regression down into multiple unit tests could help too, since it already takes a considerable amount of time to run under valgrind (~5 min on my workstation).

@nlohmann (Owner)

Thanks for taking care of this. I would have taken a more complex approach and used the parser's callback mechanism to produce some output to tame Travis. But your approach is much cleaner. Thanks a lot!

Sorry for not having merged this earlier in January. I have no idea why it slipped through. But I did remember it the moment I saw the stack trace in #1835.

@nlohmann (Owner)

🔖 Release item

This issue/PR will be part of the next release of the library. This template helps prepare the release notes.

Type

  • ✨ New Feature
  • 🐛 Bug Fix
  • ⚡️ Improvement
  • 🔨 Further Change
  • 🔥 Deprecated function

Description


@nickaein (Contributor, Author)

No problem at all! I had forgotten about it too, thanks for reminding me :-)

> I would have taken a more complex approach and would have used the parser's callback mechanism

That's interesting, I didn't know that was possible.
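
For reference, the mechanism nlohmann alluded to might look roughly like this hypothetical sketch: the parser callback fires on every parse event, so it could print a periodic heartbeat to keep Travis from killing a silent job (the event threshold below is arbitrary):

#include <cstddef>
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

int main()
{
    std::size_t events = 0;
    const json::parser_callback_t cb =
        [&events](int /*depth*/, json::parse_event_t /*event*/, json& /*parsed*/)
    {
        if (++events % 1000000 == 0)
        {
            std::cout << "still parsing..." << std::endl; // heartbeat for CI
        }
        return true; // keep every parsed value
    };

    // 100 levels of nested arrays as a small stand-in input
    const json j = json::parse(std::string(100, '[') + std::string(100, ']'), cb);
}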

BTW, there were more considerations on Travis build time, which I posted in #1836, but I doubt that qualifies as an issue.

@ilyapopov

There seems to be a problem with the implementation:

Calling vector::reserve in a loop can be a big pessimization. It effectively disables exponential resizing and makes everything O(n^2).

See a benchmark here: http://quick-bench.com/KT304ZLCCZpN8sd1tVzrGaPkl4A
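
The effect is easy to reproduce in isolation. A generic illustration (not the PR's code), assuming the common implementation behavior that reserve allocates exactly the requested capacity:

#include <vector>

// Reserving "just enough" before every insertion pins the vector at an
// exact-fit capacity, so each push_back reallocates and moves all existing
// elements: O(n^2) total work.
void quadratic_growth(std::vector<int>& v, int n)
{
    for (int i = 0; i < n; ++i)
    {
        v.reserve(v.size() + 1); // defeats exponential resizing
        v.push_back(i);
    }
}

// Plain push_back keeps the geometric growth policy: amortized O(1) per
// insertion, O(n) total.
void amortized_growth(std::vector<int>& v, int n)
{
    for (int i = 0; i < n; ++i)
    {
        v.push_back(i);
    }
}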

Another question: an array is cleared after it is inserted into the stack, but an object is not. I suppose this will cause only one level of recursion (because the values are now moved-from), so maybe it is fine.

@nickaein (Contributor, Author)

Thanks for the feedback.

> It effectively disables exponential resizing, and makes everything O(n^2).

Very good point! I will look into it. PRs are welcome too :-)

> Another question is: an array is cleared after it is inserted to the stack, but not an object. I suppose it will cause only one level of recursion (because values are now moved-from), so maybe it is fine.

That's correct. I can't remember why the object isn't explicitly cleared. While it might not make any difference, it would be nice to clear it, at least for the sake of consistency.

@nickaein (Contributor, Author) commented Nov 11, 2019

Update: refer to #1436 (comment) for an actual benchmark of object destruction.

While the use of vector::reserve is sub-optimal in this case, the current benchmarks don't show a performance regression. They don't aim to precisely measure creation/destruction, though. Also, the previous state (destruction through recursion) might have had some overhead that worked in our favor.

Nevertheless, we still might have to get rid of vector::reserve here. In addition, dedicated benchmarks measuring the cost of creation/destruction would be more telling and would let us avoid regressions in the future.


Benchmark results on Release 3.7.1 (recursive destruction): aacdc6b

2019-11-11 11:55:52
Running ./json_benchmarks
Run on (12 X 4700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ParseFile/jeopardy             653391190 ns  653388685 ns          1   81.0865MB/s
ParseFile/canada                35413237 ns   35412950 ns         20    60.621MB/s
ParseFile/citm_catalog          11998248 ns   11997873 ns         57    137.29MB/s
ParseFile/twitter                5727503 ns    5727253 ns        118   105.157MB/s
ParseFile/floats               284392213 ns  284392997 ns          2   76.0222MB/s
ParseFile/signed_ints          211342921 ns  211340574 ns          3    110.01MB/s
ParseFile/unsigned_ints        202955892 ns  202956661 ns          3   114.644MB/s
ParseFile/small_signed_ints    102841627 ns  102841195 ns          6   114.884MB/s
ParseString/jeopardy           517309175 ns  517291784 ns          1    102.42MB/s
ParseString/canada              32083381 ns   32083357 ns         22   66.9122MB/s
ParseString/citm_catalog         9325681 ns    9325523 ns         74   176.632MB/s
ParseString/twitter              4918281 ns    4918184 ns        132   122.455MB/s
ParseString/floats             255695011 ns  255683982 ns          3   84.5582MB/s
ParseString/signed_ints        201151854 ns  201149679 ns          4   115.584MB/s
ParseString/unsigned_ints      190463687 ns  190451861 ns          4   122.172MB/s
ParseString/small_signed_ints   96172104 ns   96172035 ns          7    122.85MB/s
Dump/jeopardy / -              206485526 ns  206478523 ns          4   242.525MB/s
Dump/jeopardy / 4              243926376 ns  243925607 ns          3   273.143MB/s
Dump/canada / -                  8516601 ns    8516379 ns         83   234.075MB/s
Dump/canada / 4                 10655437 ns   10655448 ns         66   725.983MB/s
Dump/citm_catalog / -            1276515 ns    1276520 ns        549   373.768MB/s
Dump/citm_catalog / 4            1551297 ns    1551254 ns        452   1061.84MB/s
Dump/twitter / -                 1359369 ns    1359355 ns        516   327.564MB/s
Dump/twitter / 4                 1501429 ns    1501402 ns        484   487.378MB/s
Dump/floats / -                 69716166 ns   69713492 ns          9   255.417MB/s
Dump/floats / 4                 75994594 ns   75987473 ns          8    297.08MB/s
Dump/signed_ints / -            28986198 ns   28986086 ns         24   670.494MB/s
Dump/signed_ints / 4            33516721 ns   33515637 ns         21   722.151MB/s
Dump/unsigned_ints / -          27785412 ns   27785499 ns         25   700.119MB/s
Dump/unsigned_ints / 4          32583436 ns   32583744 ns         21   743.362MB/s
Dump/small_signed_ints / -      16823679 ns   16822873 ns         40   418.858MB/s
Dump/small_signed_ints / 4      22616060 ns   22615501 ns         31   522.419MB/s

Benchmark results on Release 3.7.2 (non-recursive destruction): 56109ea

2019-11-11 11:58:56
Running ./json_benchmarks
Run on (12 X 4700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------
Benchmark                              Time           CPU Iterations
---------------------------------------------------------------------
ParseFile/jeopardy             619635360 ns  619633724 ns          1   85.5038MB/s
ParseFile/canada                33962428 ns   33961601 ns         21   63.2117MB/s
ParseFile/citm_catalog          11491394 ns   11490995 ns         61   143.346MB/s
ParseFile/twitter                5442961 ns    5442828 ns        119   110.652MB/s
ParseFile/floats               271742571 ns  271735064 ns          3   79.5634MB/s
ParseFile/signed_ints          208148545 ns  208146316 ns          3   111.699MB/s
ParseFile/unsigned_ints        203907483 ns  203906161 ns          3   114.111MB/s
ParseFile/small_signed_ints    102751059 ns  102750017 ns          7   114.986MB/s
ParseString/jeopardy           515605364 ns  515591583 ns          1   102.758MB/s
ParseString/canada              32018308 ns   32017772 ns         22   67.0493MB/s
ParseString/citm_catalog         8819107 ns    8818931 ns         79   186.779MB/s
ParseString/twitter              4868723 ns    4868714 ns        145     123.7MB/s
ParseString/floats             254895582 ns  254888122 ns          3   84.8222MB/s
ParseString/signed_ints        187500549 ns  187496572 ns          4   124.001MB/s
ParseString/unsigned_ints      182429842 ns  182427692 ns          4   127.546MB/s
ParseString/small_signed_ints   91003202 ns   91001957 ns          8    129.83MB/s
Dump/jeopardy / -              210882048 ns  210866880 ns          3   237.478MB/s
Dump/jeopardy / 4              250152286 ns  250141778 ns          3   266.356MB/s
Dump/canada / -                  8493631 ns    8493616 ns         81   234.702MB/s
Dump/canada / 4                 10607660 ns   10607275 ns         66    729.28MB/s
Dump/citm_catalog / -            1276988 ns    1276959 ns        551   373.639MB/s
Dump/citm_catalog / 4            1568202 ns    1568196 ns        452   1050.37MB/s
Dump/twitter / -                 1339110 ns    1339101 ns        519   332.519MB/s
Dump/twitter / 4                 1441126 ns    1441108 ns        479   507.769MB/s
Dump/floats / -                 69699528 ns   69698395 ns          9   255.472MB/s
Dump/floats / 4                 75024261 ns   75023187 ns          9   300.899MB/s
Dump/signed_ints / -            29055360 ns   29054091 ns         24   668.924MB/s
Dump/signed_ints / 4            33627926 ns   33626963 ns         21    719.76MB/s
Dump/unsigned_ints / -          27661213 ns   27660553 ns         25   703.281MB/s
Dump/unsigned_ints / 4          32133900 ns   32132047 ns         22   753.812MB/s
Dump/small_signed_ints / -      16330746 ns   16330614 ns         42   431.484MB/s
Dump/small_signed_ints / 4      22447317 ns   22447042 ns         31    526.34MB/s

@ilyapopov

Current parsing benchmarks specifically exclude destruction time from measurements:

state.PauseTiming();
delete f;
delete j;
state.ResumeTiming();

@nickaein (Contributor, Author) commented Nov 11, 2019

I modified the benchmarks to only measure the delete j time. The results show that there isn't much of a difference with or without reserve. Actually, having reserve gives slightly better performance, probably because there are not too many re-allocations during destruction, and reserving can occasionally be helpful.

The performance of recursive destruction is significantly higher, but I believe avoiding the stack overflow is worth this performance penalty in destruction.
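
The modified measurement is not shown in the thread; it presumably looked roughly like this sketch (Google Benchmark, with a hypothetical file path), parsing outside the timed region and timing only delete j:

#include <fstream>
#include <benchmark/benchmark.h>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

static void DestroyParsedFile(benchmark::State& state, const char* path)
{
    for (auto _ : state)
    {
        // Parsing happens outside the timed region ...
        state.PauseTiming();
        std::ifstream f(path);
        json* j = new json(json::parse(f));
        state.ResumeTiming();

        // ... so only the destruction is measured.
        delete j;
    }
}
BENCHMARK_CAPTURE(DestroyParsedFile, jeopardy, "data/jeopardy/jeopardy.json");

BENCHMARK_MAIN();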


non-recursive with vector::reserve:


2019-11-11 13:39:20
Running ./json_benchmarks
Run on (12 X 4700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------
Benchmark                       Time           CPU Iterations
--------------------------------------------------------------
ParseFile/jeopardy         216309 us     216310 us          6   244.931MB/s
ParseFile/canada             3581 us       3581 us        195   599.517MB/s
ParseFile/citm_catalog       1642 us       1642 us        424   1003.14MB/s
ParseFile/twitter             560 us        560 us       1245   1075.24MB/s

non-recursive without vector::reserve:

2019-11-11 13:45:43
Running ./json_benchmarks
Run on (12 X 4700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------
Benchmark                       Time           CPU Iterations
--------------------------------------------------------------
ParseFile/jeopardy         231977 us     231974 us          6   228.392MB/s
ParseFile/canada             3139 us       3139 us        223   683.868MB/s
ParseFile/citm_catalog       1696 us       1696 us        416   971.453MB/s
ParseFile/twitter             617 us        617 us       1114   976.121MB/s

recursive:

2019-11-11 13:36:07
Running ./json_benchmarks
Run on (12 X 4700 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------
Benchmark                       Time           CPU Iterations
--------------------------------------------------------------
ParseFile/jeopardy         161115 us     161113 us         10   328.844MB/s
ParseFile/canada             1424 us       1424 us        489   1.47193GB/s
ParseFile/citm_catalog        426 us        426 us       1629   3.77215GB/s
ParseFile/twitter             234 us        234 us       2992   2.51837GB/s

@ilyapopov

I made a benchmark that demonstrates the issue. To avoid it being buried in the comments of a closed PR, I created a separate issue: #1837.

dnsmichi pushed a commit to Icinga/icinga2 that referenced this pull request Dec 13, 2019
This includes the following fixes:

nlohmann/json#1436

> For a deeply-nested JSON object, the recursive implementation of json_value::destroy function causes stack overflow.

nlohmann/json#1708
nlohmann/json#1722

Stack size

nlohmann/json#1693 (comment)

Integer Overflow

nlohmann/json#1447

UTF8, json dump out of bounds

nlohmann/json#1445

Possibly influences #7532
Al2Klimov pushed a commit to Icinga/icinga2 that referenced this pull request Dec 16, 2019 (same commit message as above)
@nickaein nickaein mentioned this pull request Feb 7, 2020
N-o-X pushed a commit to Icinga/icinga2 that referenced this pull request May 8, 2020 (same commit message as above)
@nickaein nickaein deleted the iterate-on-destruction branch April 23, 2021 23:33

Successfully merging this pull request may close these issues: Stack-overflow (OSS-Fuzz 4234)
4 participants