JSON.parse optimizations, refactoring, DoS-attack prevention #1485

Merged
merged 5 commits into from Mar 16, 2023

Contributor

@tomatosalat0 tomatosalat0 commented Mar 8, 2023

Hi there,

first PR for this project, hopefully got everything right 😃.

I tried to not modify too much to not destroy previous knowledge of the parser code.

Summary

  • JSON.parse now takes about half the time to complete
  • JSON.parse now only allocates about 20% of the previously required memory
  • JSON.parse is now more closely aligned with the JSON specification
  • Added a recursion limit to JsonParser to prevent a StackOverflowException

Adjustments to the parser

The old parser did parse the following strings, but they are not valid according to the JSON specification:

  • "[1,]": return value was JsArray([1, null])
  • ".1": return value was JsNumber(0.1)

These will now throw a JavaScriptException. These cases are backed by tests.
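
For comparison, here is a minimal sketch showing that a strict parser rejects both inputs. It uses System.Text.Json rather than Jint's JsonParser, so it only illustrates what the JSON grammar allows, not the code in this PR. The helper name IsStrictJson is made up for the example.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: true when System.Text.Json accepts the text as a JSON document.
static bool IsStrictJson(string text)
{
    try
    {
        using var doc = JsonDocument.Parse(text);
        return true;
    }
    catch (JsonException)
    {
        return false;
    }
}

Console.WriteLine(IsStrictJson("[1,]")); // trailing comma after the last element
Console.WriteLine(IsStrictJson(".1"));   // number without an integer part
Console.WriteLine(IsStrictJson("[1]"));  // well-formed array
Console.WriteLine(IsStrictJson("0.1"));  // well-formed number
```

The first two calls report the input as invalid, the last two as valid, which matches the behavior the new parser is meant to have.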

Recursion limit

Previously the parser did not have a recursion limit. A specially crafted JSON string could lead to a StackOverflowException, which cannot be caught and terminates the process. I've added a maximum recursion depth that applies to both arrays and objects. The default value is 64 (same as the default value of System.Text.Json). The limit can be adjusted per parser instance or globally using JsonParser.DefaultMaxDepth.
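
To illustrate the idea, here is a rough sketch of a depth guard in a recursive parser. It is not the code from this PR: SkipArray is a hypothetical toy that understands only nested, empty arrays, and the exception types and messages are assumptions chosen for the example.

```csharp
using System;

// Hypothetical toy parser: it understands only nested, empty arrays such as "[[[]]]".
// Returns the index just past the closing ']' of the array that starts at `index`.
static int SkipArray(string json, int index, int depth, int maxDepth)
{
    // Count the new nesting level first, then compare it with the limit.
    if (++depth > maxDepth)
        throw new InvalidOperationException(
            $"Max. depth level of {maxDepth} reached at position {index}.");

    index++; // consume '['
    while (index < json.Length && json[index] == '[')
        index = SkipArray(json, index, depth, maxDepth);

    if (index >= json.Length || json[index] != ']')
        throw new FormatException($"Expected ']' at position {index}.");

    return index + 1; // consume ']'
}

string atLimit = new string('[', 64) + new string(']', 64); // exactly 64 levels
string overLimit = "[" + atLimit + "]";                     // 65 levels

Console.WriteLine(SkipArray(atLimit, 0, 0, 64) == atLimit.Length);
try
{
    SkipArray(overLimit, 0, 0, 64);
}
catch (InvalidOperationException ex)
{
    // A regular, catchable exception instead of a stack overflow.
    Console.WriteLine(ex.Message);
}
```

Because the guard runs before each recursive call goes deeper, the stack usage is bounded by the limit regardless of how deeply the input is nested.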

Performance Improvements

I've refactored several areas of the JsonParser to improve the performance. I validated the changes running BenchmarkDotNet using different JSON files. I've used the following sources to get sample JSON files for the benchmark:

For that I had to remove some old code which either wasn't used at all (like the _extra-field) or did not return any useful result with the current implementation (like ParseJsonObject() which is now private or MarkEnd...).

While the old code was generally about 20% slower than the Json.NET parser (Newtonsoft.Json), the new code takes about half the time Json.NET needs when called via JsonConvert.DeserializeObject(). It is still slower than more optimized parsers like System.Text.Json or SpanJson.

Test execution

The test was basically just

[Benchmark]
public void Parse() 
{
    var parser = new JsonParser(_engine);
    parser.Parse(_json); // _json: the JSON string, already loaded into memory
}

To make both parsers runnable within the same test run, I simply copied the changes temporarily into a new class JsonParser2 and kept the other class unchanged.

Results

(quite a lot, but the relevant columns are "Ratio" and "Alloc Ratio")

BenchmarkDotNet=v0.13.4, OS=Windows 10 (10.0.19042.2251/20H2/October2020Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.102
  [Host]     : .NET 6.0.13 (6.0.1322.58009), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.13 (6.0.1322.58009), X64 RyuJIT AVX2

| Method | FileName | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| OldParser | algol(...)ation [31] | 1,534,348.91 μs | 21,656.919 μs | 20,257.895 μs | 1.00 | 0.00 | 94000.0000 | 33000.0000 | 4000.0000 | 1553748.27 KB | 1.00 |
| NewParser | algol(...)ation [31] | 775,250.79 μs | 8,302.750 μs | 7,766.397 μs | 0.51 | 0.01 | 21000.0000 | 11000.0000 | 2000.0000 | 326915.87 KB | 0.21 |
| OldParser | australia-abc | 312.18 μs | 2.246 μs | 2.101 μs | 1.00 | 0.00 | 62.5000 | 19.0430 | - | 1022.43 KB | 1.00 |
| NewParser | australia-abc | 143.85 μs | 0.700 μs | 0.655 μs | 0.46 | 0.00 | 12.2070 | 3.6621 | - | 201.73 KB | 0.20 |
| OldParser | bestbuy_dataset | 528,457.14 μs | 10,458.706 μs | 10,271.847 μs | 1.00 | 0.00 | 41000.0000 | 13000.0000 | 3000.0000 | 650111.23 KB | 1.00 |
| NewParser | bestbuy_dataset | 304,451.35 μs | 5,150.079 μs | 4,565.411 μs | 0.58 | 0.02 | 8000.0000 | 4500.0000 | 1500.0000 | 117774.67 KB | 0.18 |
| OldParser | bitcoin | 212.40 μs | 1.383 μs | 1.294 μs | 1.00 | 0.00 | 38.3301 | 8.7891 | - | 630.01 KB | 1.00 |
| NewParser | bitcoin | 101.33 μs | 0.612 μs | 0.543 μs | 0.48 | 0.00 | 8.0566 | 2.1973 | - | 133.33 KB | 0.21 |
| OldParser | canada | 116,232.41 μs | 2,244.857 μs | 3,146.976 μs | 1.00 | 0.00 | 12400.0000 | 3800.0000 | 1400.0000 | 186975.62 KB | 1.00 |
| NewParser | canada | 58,226.87 μs | 844.937 μs | 790.354 μs | 0.50 | 0.02 | 1444.4444 | 777.7778 | 222.2222 | 20835 KB | 0.11 |
| OldParser | citm_catalog | 22,420.39 μs | 446.148 μs | 458.161 μs | 1.00 | 0.00 | 2000.0000 | 937.5000 | 125.0000 | 32245.68 KB | 1.00 |
| NewParser | citm_catalog | 8,347.13 μs | 59.575 μs | 55.726 μs | 0.37 | 0.01 | 468.7500 | 218.7500 | - | 7862.44 KB | 0.24 |
| OldParser | doj-blog | 1,074.00 μs | 7.967 μs | 7.452 μs | 1.00 | 0.00 | 263.6719 | 99.6094 | - | 4334.5 KB | 1.00 |
| NewParser | doj-blog | 390.77 μs | 1.634 μs | 1.365 μs | 0.36 | 0.00 | 33.6914 | 16.6016 | - | 555.34 KB | 0.13 |
| OldParser | eu-lobby-country | 134.93 μs | 1.002 μs | 0.937 μs | 1.00 | 0.00 | 28.0762 | 5.6152 | - | 459.2 KB | 1.00 |
| NewParser | eu-lobby-country | 60.97 μs | 0.512 μs | 0.454 μs | 0.45 | 0.01 | 5.2490 | 0.9766 | - | 85.88 KB | 0.19 |
| OldParser | eu-lobby-financial | 745.04 μs | 5.226 μs | 4.888 μs | 1.00 | 0.00 | 137.6953 | 50.7813 | - | 2258.68 KB | 1.00 |
| NewParser | eu-lobby-financial | 343.75 μs | 2.047 μs | 1.815 μs | 0.46 | 0.00 | 27.3438 | 9.2773 | - | 451.91 KB | 0.20 |
| OldParser | eu-lobby-repr | 1,801.72 μs | 13.256 μs | 11.070 μs | 1.00 | 0.00 | 337.8906 | 146.4844 | - | 5529.42 KB | 1.00 |
| NewParser | eu-lobby-repr | 818.08 μs | 3.248 μs | 2.879 μs | 0.45 | 0.00 | 59.5703 | 21.4844 | - | 982.02 KB | 0.18 |
| OldParser | github-events | 1,126.93 μs | 7.823 μs | 6.935 μs | 1.00 | 0.00 | 207.0313 | 85.9375 | - | 3383.16 KB | 1.00 |
| NewParser | github-events | 500.68 μs | 2.872 μs | 2.686 μs | 0.44 | 0.00 | 41.0156 | 18.5547 | - | 677.24 KB | 0.20 |
| OldParser | github-gists | 665.14 μs | 7.119 μs | 6.659 μs | 1.00 | 0.00 | 130.8594 | 50.7813 | - | 2147.94 KB | 1.00 |
| NewParser | github-gists | 289.54 μs | 1.009 μs | 0.944 μs | 0.44 | 0.00 | 26.3672 | 9.2773 | - | 432.67 KB | 0.20 |
| OldParser | inspe(...)yload [22] | 326,536.84 μs | 6,355.739 μs | 5,634.197 μs | 1.00 | 0.00 | 22000.0000 | 8000.0000 | 2000.0000 | 338250.76 KB | 1.00 |
| NewParser | inspe(...)yload [22] | 170,099.88 μs | 2,120.607 μs | 1,879.863 μs | 0.52 | 0.01 | 5000.0000 | 2750.0000 | 750.0000 | 76775.04 KB | 0.23 |
| OldParser | json-generator | 130.27 μs | 0.933 μs | 0.873 μs | 1.00 | 0.00 | 25.3906 | 4.8828 | - | 416.81 KB | 1.00 |
| NewParser | json-generator | 63.97 μs | 0.309 μs | 0.289 μs | 0.49 | 0.00 | 5.6152 | 1.0986 | - | 93.25 KB | 0.22 |
| OldParser | meteorites | 6,377.50 μs | 93.836 μs | 87.774 μs | 1.00 | 0.00 | 757.8125 | 289.0625 | - | 12844.52 KB | 1.00 |
| NewParser | meteorites | 2,539.29 μs | 13.003 μs | 12.163 μs | 0.40 | 0.01 | 218.7500 | 109.3750 | - | 3593.54 KB | 0.28 |
| OldParser | movies | 210,554.93 μs | 4,070.923 μs | 5,572.325 μs | 1.00 | 0.00 | 11333.3333 | 5333.3333 | 1666.6667 | 175284.86 KB | 1.00 |
| NewParser | movies | 78,367.06 μs | 552.833 μs | 517.120 μs | 0.38 | 0.01 | 3000.0000 | 1714.2857 | 571.4286 | 43000.17 KB | 0.25 |
| OldParser | reddit-scala | 1,147.43 μs | 11.760 μs | 10.425 μs | 1.00 | 0.00 | 224.6094 | 99.6094 | - | 3674.52 KB | 1.00 |
| NewParser | reddit-scala | 554.86 μs | 3.855 μs | 3.606 μs | 0.48 | 0.00 | 47.8516 | 23.4375 | - | 786.67 KB | 0.21 |
| OldParser | rick-morty | 231.36 μs | 1.671 μs | 1.482 μs | 1.00 | 0.00 | 47.8516 | 12.9395 | - | 783.19 KB | 1.00 |
| NewParser | rick-morty | 108.73 μs | 0.682 μs | 0.569 μs | 0.47 | 0.00 | 10.0098 | 2.4414 | - | 165.44 KB | 0.21 |
| OldParser | temp-anomaly | 47.08 μs | 0.348 μs | 0.326 μs | 1.00 | 0.00 | 8.2397 | 0.7324 | - | 135.41 KB | 1.00 |
| NewParser | temp-anomaly | 26.79 μs | 0.156 μs | 0.146 μs | 0.57 | 0.00 | 2.5330 | 0.1831 | - | 41.5 KB | 0.31 |
| OldParser | thai-cinemas | 148.31 μs | 1.063 μs | 0.888 μs | 1.00 | 0.00 | 30.2734 | 6.5918 | - | 496.79 KB | 1.00 |
| NewParser | thai-cinemas | 73.76 μs | 0.339 μs | 0.301 μs | 0.50 | 0.00 | 7.8125 | 1.9531 | - | 129.46 KB | 0.26 |
| OldParser | turkish | 15,746.64 μs | 271.154 μs | 240.371 μs | 1.00 | 0.00 | 1468.7500 | 687.5000 | 109.3750 | 22860.31 KB | 1.00 |
| NewParser | turkish | 6,270.93 μs | 71.383 μs | 66.771 μs | 0.40 | 0.01 | 406.2500 | 203.1250 | - | 6742 KB | 0.29 |
| OldParser | twitter | 8,107.72 μs | 110.434 μs | 103.300 μs | 1.00 | 0.00 | 968.7500 | 375.0000 | - | 16247.7 KB | 1.00 |
| NewParser | twitter | 3,179.93 μs | 20.212 μs | 18.907 μs | 0.39 | 0.00 | 226.5625 | 113.2813 | - | 3727.24 KB | 0.23 |
| OldParser | twitt(...)ponse [28] | 120.06 μs | 0.595 μs | 0.528 μs | 1.00 | 0.00 | 23.8037 | 4.0283 | - | 390.14 KB | 1.00 |
| NewParser | twitt(...)ponse [28] | 57.25 μs | 0.344 μs | 0.322 μs | 0.48 | 0.00 | 5.1270 | 1.2207 | - | 84.57 KB | 0.22 |
| OldParser | twitter_api_response | 140.73 μs | 1.026 μs | 0.960 μs | 1.00 | 0.00 | 27.0996 | 5.1270 | - | 443.7 KB | 1.00 |
| NewParser | twitter_api_response | 71.70 μs | 0.648 μs | 0.606 μs | 0.51 | 0.00 | 5.8594 | 1.4648 | - | 97.39 KB | 0.22 |
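
As a quick sanity check on how to read the two relevant columns, the values for the "algol..." file can be approximately reproduced by dividing the NewParser figures by the OldParser (baseline) figures. This is only a sketch of the arithmetic. BenchmarkDotNet derives its ratios from the measured distributions, so small deviations from a plain division of the means are possible.

```csharp
using System;

// Values copied from the two "algol..." rows of the results above.
double oldMeanUs = 1_534_348.91, newMeanUs = 775_250.79;
double oldAllocKb = 1_553_748.27, newAllocKb = 326_915.87;

// New divided by old: values below 1.00 mean the new parser needs less.
double ratio = Math.Round(newMeanUs / oldMeanUs, 2);
double allocRatio = Math.Round(newAllocKb / oldAllocKb, 2);

Console.WriteLine($"Ratio: {ratio}, Alloc Ratio: {allocRatio}");
```

Both results agree with the reported 0.51 and 0.21, i.e. roughly half the time and a fifth of the memory.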

Again, I hope I got everything right.

* JSON.parse now only allocates about 20% of the previously needed memory
* JSON.parse more strict according to the JSON spec
* Added JSON.parse recursion limit to prevent StackOverflow-Exception
return ConstructFast(contents, 0, contents.Length);
}

internal JsArray ConstructFast(JsValue[] contents, int offset, int length)
Collaborator

this change seems unnecessary, also probably causes more array bounds checks due to calculation, maybe revert this and check my suggestions at call sites

namespace Jint.Native.Json
{
public sealed class JsonParser
{
private readonly Engine _engine;
private readonly int _maxDepth;

/// <summary>
Collaborator

this comment seems to be out-of-place

}

private Extra _extra = null!;
public JsonParser(Engine engine, uint maxDepth)
Collaborator

uint seems odd for public API in this case, could take int and throw argument exception if <= 0?

private static bool IsDecimalDigit(char ch)
{
return (ch >= '0' && ch <= '9');
Collaborator

would be more readable without the unchecked and uint is more common type when doing these overflow-based checks

ch >= 'a' && ch <= 'f' ||
ch >= 'A' && ch <= 'F'
;
unchecked
Collaborator

no need for unchecked, uint is the usual idiom
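
For readers unfamiliar with the idiom, a small sketch of what such a uint-based range check can look like. This is an illustration and not the code that ended up in the PR. It assumes the default compiler setting, where non-constant expressions are evaluated in an unchecked context.

```csharp
using System;

// Subtracting '0' maps '0'..'9' to 0..9. Anything below '0' becomes negative
// and wraps to a very large value when converted to uint, so one comparison
// covers both the lower and the upper bound.
static bool IsDecimalDigit(char ch) => (uint)(ch - '0') <= 9u;

// Setting bit 0x20 folds 'A'..'F' onto 'a'..'f' before applying the same trick.
static bool IsHexDigit(char ch) =>
    IsDecimalDigit(ch) || (uint)((ch | 0x20) - 'a') <= 5u;

Console.WriteLine(IsDecimalDigit('7')); // inside the range
Console.WriteLine(IsDecimalDigit('/')); // one below '0'
Console.WriteLine(IsHexDigit('F'));     // upper-case hex digit
Console.WriteLine(IsHexDigit('g'));     // one past 'f'
```

If the project enables overflow checking globally, the conversion would throw for characters below the range, which is presumably why the original code wrapped it in unchecked.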

Expect("{");
if ((++state.CurrentDepth) > _maxDepth)
ThrowDepthLimitReached(_lookahead);
Expect(ref state, '{');

var obj = _engine.Realm.Intrinsics.Object.Construct(Arguments.Empty);
Collaborator

var obj = new JsObject(_engine);

return ParseJsonObject();
return Lex(ref state).Value;
case Tokens.Punctuator:
if (_lookahead.FirstCharacter == '[')
Collaborator

braces

@@ -826,14 +728,25 @@ private enum Tokens
EOF,
};

class Token
private class Token
Collaborator

should mark sealed

Jint/Options.cs Outdated
/// The maximum depth allowed when parsing JSON files using "JSON.parse",
/// defaults to 64.
/// </summary>
public uint MaxJsonParseDepth { get; set; } = 64u;
Collaborator

would it make sense to have separate public JsonOptions Json { get; set; } = new JsonOptions(); under Options so if we later want more behavioral configuration it would all be there?

var engine = new Engine(options => options.Json.MaxParseDepth = 42); // note shorter property name

}

private Token ScanNumericLiteral()
private string ScanPuncatatorValue(int start, char code)
Collaborator

ScanPunctuatorValue

@lahma
Collaborator

lahma commented Mar 12, 2023

Quite impressive PR backed with great improvement numbers, nice work! I've added some comments, please take a look.

@tomatosalat0
Contributor Author

Hi,

thanks for taking your time to review it. Will go through the comments and adjust the PR accordingly. Might take a day or two, depending on when I find some time to do it.

@tomatosalat0
Contributor Author

I applied all suggestions. Hopefully I didn't miss any braces 😄.

I also did a quick benchmark re-run with the suggested modifications and from a quick glance, it even got a tiny little bit faster now. Not much (and not always) - but at least a little bit 😀.

The movies.json file (source), which has a file size of 3 MB, took 800 μs less time and allocated 5 KB less memory. The turkish.json file (same source), which has a file size of 700 KB, took 70 μs longer to parse while allocating 100 bytes less.

Here are the results if you are interested: (NewParser means "After I applied your suggestions" and OldParser means "Before applying your suggestions").

BenchmarkDotNet=v0.13.4, OS=Windows 10 (10.0.19042.2251/20H2/October2020Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.102
  [Host]     : .NET 6.0.13 (6.0.1322.58009), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.13 (6.0.1322.58009), X64 RyuJIT AVX2

| Method | FileName | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| OldParser | algol(...)ation [31] | 779,612.07 μs | 11,332.450 μs | 10,045.922 μs | 1.00 | 0.00 | 21000.0000 | 11000.0000 | 2000.0000 | 326914.01 KB | 1.00 |
| NewParser | algol(...)ation [31] | 778,467.57 μs | 9,529.966 μs | 8,448.066 μs | 1.00 | 0.02 | 21000.0000 | 11000.0000 | 2000.0000 | 326915.76 KB | 1.00 |
| OldParser | australia-abc | 140.83 μs | 0.661 μs | 0.586 μs | 1.00 | 0.00 | 12.2070 | 3.6621 | - | 201.73 KB | 1.00 |
| NewParser | australia-abc | 140.25 μs | 0.489 μs | 0.433 μs | 1.00 | 0.01 | 12.2070 | 3.6621 | - | 201.63 KB | 1.00 |
| OldParser | bestbuy_dataset | 304,608.19 μs | 4,156.968 μs | 3,685.044 μs | 1.00 | 0.00 | 8000.0000 | 4500.0000 | 1500.0000 | 117773.69 KB | 1.00 |
| NewParser | bestbuy_dataset | 304,314.60 μs | 4,795.339 μs | 4,485.563 μs | 1.00 | 0.02 | 8000.0000 | 4500.0000 | 1500.0000 | 117772.89 KB | 1.00 |
| OldParser | bitcoin | 110.83 μs | 0.909 μs | 0.806 μs | 1.00 | 0.00 | 8.0566 | 2.1973 | - | 133.33 KB | 1.00 |
| NewParser | bitcoin | 108.22 μs | 0.619 μs | 0.517 μs | 0.98 | 0.01 | 8.0566 | 2.3193 | - | 133.33 KB | 1.00 |
| OldParser | canada | 59,923.24 μs | 891.611 μs | 834.014 μs | 1.00 | 0.00 | 1444.4444 | 777.7778 | 222.2222 | 20835.77 KB | 1.00 |
| NewParser | canada | 59,564.98 μs | 1,133.170 μs | 1,212.479 μs | 0.99 | 0.02 | 1444.4444 | 777.7778 | 222.2222 | 20787.97 KB | 1.00 |
| OldParser | citm_catalog | 8,604.72 μs | 43.155 μs | 40.367 μs | 1.00 | 0.00 | 468.7500 | 218.7500 | - | 7862.44 KB | 1.00 |
| NewParser | citm_catalog | 8,574.10 μs | 83.290 μs | 77.910 μs | 1.00 | 0.01 | 468.7500 | 218.7500 | - | 7862.34 KB | 1.00 |
| OldParser | doj-blog | 390.99 μs | 1.322 μs | 1.236 μs | 1.00 | 0.00 | 33.6914 | 16.6016 | - | 555.34 KB | 1.00 |
| NewParser | doj-blog | 391.80 μs | 0.758 μs | 0.633 μs | 1.00 | 0.00 | 33.6914 | 16.6016 | - | 555.23 KB | 1.00 |
| OldParser | eu-lobby-country | 60.45 μs | 0.287 μs | 0.268 μs | 1.00 | 0.00 | 5.2490 | 1.0376 | - | 85.88 KB | 1.00 |
| NewParser | eu-lobby-country | 60.64 μs | 0.248 μs | 0.232 μs | 1.00 | 0.01 | 5.2490 | 1.0376 | - | 85.8 KB | 1.00 |
| OldParser | eu-lobby-financial | 349.72 μs | 3.094 μs | 2.584 μs | 1.00 | 0.00 | 27.3438 | 9.2773 | - | 451.91 KB | 1.00 |
| NewParser | eu-lobby-financial | 343.46 μs | 1.499 μs | 1.402 μs | 0.98 | 0.01 | 27.3438 | 9.2773 | - | 451.82 KB | 1.00 |
| OldParser | eu-lobby-repr | 787.15 μs | 3.439 μs | 3.217 μs | 1.00 | 0.00 | 59.5703 | 21.4844 | - | 982.02 KB | 1.00 |
| NewParser | eu-lobby-repr | 787.67 μs | 3.074 μs | 2.875 μs | 1.00 | 0.01 | 59.5703 | 21.4844 | - | 981.94 KB | 1.00 |
| OldParser | github-events | 517.18 μs | 1.520 μs | 1.347 μs | 1.00 | 0.00 | 41.0156 | 18.5547 | - | 677.24 KB | 1.00 |
| NewParser | github-events | 516.05 μs | 2.314 μs | 2.164 μs | 1.00 | 0.00 | 41.0156 | 19.5313 | - | 677.06 KB | 1.00 |
| OldParser | github-gists | 295.21 μs | 4.600 μs | 4.303 μs | 1.00 | 0.00 | 26.3672 | 9.2773 | - | 432.67 KB | 1.00 |
| NewParser | github-gists | 289.58 μs | 1.741 μs | 1.628 μs | 0.98 | 0.01 | 26.3672 | 9.2773 | - | 432.49 KB | 1.00 |
| OldParser | inspe(...)yload [22] | 170,204.77 μs | 950.263 μs | 793.513 μs | 1.00 | 0.00 | 5000.0000 | 2750.0000 | 750.0000 | 76776.79 KB | 1.00 |
| NewParser | inspe(...)yload [22] | 168,279.87 μs | 2,107.163 μs | 1,971.042 μs | 0.99 | 0.01 | 5000.0000 | 2750.0000 | 750.0000 | 76767.61 KB | 1.00 |
| OldParser | json-generator | 66.94 μs | 0.262 μs | 0.245 μs | 1.00 | 0.00 | 5.6152 | 1.0986 | - | 93.25 KB | 1.00 |
| NewParser | json-generator | 67.92 μs | 0.167 μs | 0.156 μs | 1.01 | 0.00 | 5.6152 | 1.0986 | - | 93.25 KB | 1.00 |
| OldParser | meteorites | 2,685.29 μs | 13.881 μs | 12.305 μs | 1.00 | 0.00 | 218.7500 | 109.3750 | - | 3593.54 KB | 1.00 |
| NewParser | meteorites | 2,626.44 μs | 42.720 μs | 41.956 μs | 0.98 | 0.02 | 218.7500 | 109.3750 | - | 3593.41 KB | 1.00 |
| OldParser | movies | 79,749.56 μs | 1,151.686 μs | 1,077.288 μs | 1.00 | 0.00 | 3000.0000 | 1714.2857 | 571.4286 | 43000.16 KB | 1.00 |
| NewParser | movies | 79,180.02 μs | 870.187 μs | 771.398 μs | 0.99 | 0.02 | 3000.0000 | 1714.2857 | 571.4286 | 42995.93 KB | 1.00 |
| OldParser | reddit-scala | 572.57 μs | 6.352 μs | 5.941 μs | 1.00 | 0.00 | 47.8516 | 23.4375 | - | 786.67 KB | 1.00 |
| NewParser | reddit-scala | 538.76 μs | 2.580 μs | 2.413 μs | 0.94 | 0.01 | 47.8516 | 23.4375 | - | 786.51 KB | 1.00 |
| OldParser | rick-morty | 110.08 μs | 0.531 μs | 0.471 μs | 1.00 | 0.00 | 10.0098 | 2.4414 | - | 165.44 KB | 1.00 |
| NewParser | rick-morty | 108.59 μs | 0.384 μs | 0.340 μs | 0.99 | 0.00 | 10.0098 | 2.8076 | - | 165.25 KB | 1.00 |
| OldParser | temp-anomaly | 25.59 μs | 0.086 μs | 0.072 μs | 1.00 | 0.00 | 2.5330 | 0.1831 | - | 41.5 KB | 1.00 |
| NewParser | temp-anomaly | 26.19 μs | 0.155 μs | 0.130 μs | 1.02 | 0.00 | 2.5330 | 0.1831 | - | 41.5 KB | 1.00 |
| OldParser | thai-cinemas | 76.61 μs | 0.322 μs | 0.302 μs | 1.00 | 0.00 | 7.8125 | 1.9531 | - | 129.46 KB | 1.00 |
| NewParser | thai-cinemas | 75.64 μs | 0.324 μs | 0.303 μs | 0.99 | 0.00 | 7.8125 | 1.9531 | - | 129.36 KB | 1.00 |
| OldParser | turkish | 6,159.81 μs | 80.099 μs | 74.925 μs | 1.00 | 0.00 | 406.2500 | 203.1250 | - | 6742 KB | 1.00 |
| NewParser | turkish | 6,221.11 μs | 59.805 μs | 55.942 μs | 1.01 | 0.01 | 406.2500 | 203.1250 | - | 6741.9 KB | 1.00 |
| OldParser | twitter | 3,259.77 μs | 36.733 μs | 30.673 μs | 1.00 | 0.00 | 226.5625 | 113.2813 | - | 3727.24 KB | 1.00 |
| NewParser | twitter | 3,217.77 μs | 22.260 μs | 20.822 μs | 0.99 | 0.01 | 226.5625 | 113.2813 | - | 3727.14 KB | 1.00 |
| OldParser | twitt(...)ponse [28] | 60.30 μs | 0.313 μs | 0.292 μs | 1.00 | 0.00 | 5.1270 | 1.2207 | - | 84.57 KB | 1.00 |
| NewParser | twitt(...)ponse [28] | 58.78 μs | 0.245 μs | 0.229 μs | 0.97 | 0.01 | 5.1270 | 1.2207 | - | 84.57 KB | 1.00 |
| OldParser | twitter_api_response | 73.64 μs | 0.579 μs | 0.541 μs | 1.00 | 0.00 | 5.8594 | 1.4648 | - | 97.39 KB | 1.00 |
| NewParser | twitter_api_response | 75.64 μs | 0.329 μs | 0.307 μs | 1.03 | 0.01 | 5.8594 | 1.4648 | - | 97.39 KB | 1.00 |

Collaborator

@lahma lahma left a comment

I really like this outcome and improvement. Thank you for taking the time to refactor the original code and iterate on feedback!

@lahma lahma merged commit d372b34 into sebastienros:main Mar 16, 2023
2 checks passed
@tomatosalat0 tomatosalat0 deleted the json-parser-performance branch March 16, 2023 08:26