casperisfine
Fix: #655

For very small documents, the biggest performance gap with alternatives is
that the API requires us to allocate the State object. In a real-world app
this doesn't make much of a difference, but in a micro-benchmark
it doubles the allocations, causing twice as many GC runs and making us
look bad.
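
One way to observe the extra allocation described above is to compare object counts for the two call styles. The sketch below is not part of the PR: the `allocations` helper is a hypothetical name, and it relies on `GC.stat(:total_allocated_objects)`, so the exact numbers depend on the Ruby and json versions in use.

```ruby
require "json"

# Hypothetical helper (not part of the json gem): counts the objects
# allocated while the block runs, using the VM-wide allocation counter.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

doc = { "a" => 1, "b" => [true, nil, "c"] }
state = JSON::State.new

# Warm up both paths so one-time allocations are not counted.
JSON.generate(doc)
state.generate(doc)

fresh  = allocations { JSON.generate(doc) }  # may allocate a State per call
reused = allocations { state.generate(doc) } # reuses the State built above

puts "JSON.generate:  #{fresh} objects"
puts "state.generate: #{reused} objects"
```

On a json version without this change, the first count would be expected to exceed the second; with the change, the two should be close for documents made only of primitives.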

However, unless we have to call a to_json method, the State object isn't
visible, so with some refactoring, we can elide that allocation entirely.

Instead, we allocate the State's internal struct on the stack, and only if we need
to call a to_json method do we allocate the State and spill the struct to
the heap.

As a result, JSON.generate is now as fast as re-using a State instance,
as long as only primitives are generated.
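
To illustrate when the State object becomes visible to user code, here is a small sketch. The `Temperature` class is a hypothetical example, not something from the PR. Objects with a custom `to_json` like this one receive the state as an argument, while documents made only of primitives never expose it.

```ruby
require "json"

# Hypothetical class for illustration: its custom to_json receives the
# generator state as its first argument.
class Temperature
  attr_reader :seen_state

  def initialize(degrees)
    @degrees = degrees
  end

  def to_json(state = nil, *)
    @seen_state = state          # record what the generator passed in
    %({"celsius":#{@degrees}})
  end
end

# Only primitives: no to_json call, so the State is never observable.
puts JSON.generate({ "a" => 1, "b" => [true, nil] })

# A custom object: the generator has to hand a State to to_json.
temp = Temperature.new(21)
puts JSON.generate({ "reading" => temp })
puts temp.seen_state.class
```

The second call is the case where, per the description above, the State has to be allocated and the struct spilled to the heap.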

Before:

== Encoding small mixed (34 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   598.654k i/100ms
                json   400.542k i/100ms
                  oj   533.353k i/100ms
Calculating -------------------------------------
        json (reuse)      6.371M (± 8.6%) i/s  (156.96 ns/i) -     31.729M in   5.059195s
                json      4.120M (± 6.6%) i/s  (242.72 ns/i) -     20.828M in   5.090549s
                  oj      5.622M (± 6.4%) i/s  (177.86 ns/i) -     28.268M in   5.061473s

Comparison:
        json (reuse):  6371126.6 i/s
                  oj:  5622452.0 i/s - same-ish: difference falls within error
                json:  4119991.1 i/s - 1.55x  slower

== Encoding small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   248.125k i/100ms
                json   215.255k i/100ms
                  oj   217.531k i/100ms
Calculating -------------------------------------
        json (reuse)      2.628M (± 6.1%) i/s  (380.55 ns/i) -     13.151M in   5.030281s
                json      2.185M (± 6.7%) i/s  (457.74 ns/i) -     10.978M in   5.057655s
                  oj      2.217M (± 6.7%) i/s  (451.10 ns/i) -     11.094M in   5.044844s

Comparison:
        json (reuse):  2627799.4 i/s
                  oj:  2216824.8 i/s - 1.19x  slower
                json:  2184669.5 i/s - 1.20x  slower

== Encoding small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   641.334k i/100ms
                json   322.745k i/100ms
                  oj   642.450k i/100ms
Calculating -------------------------------------
        json (reuse)      7.133M (± 6.5%) i/s  (140.19 ns/i) -     35.915M in   5.068201s
                json      4.615M (± 7.0%) i/s  (216.70 ns/i) -     22.915M in   5.003718s
                  oj      6.912M (± 6.4%) i/s  (144.68 ns/i) -     34.692M in   5.047690s

Comparison:
        json (reuse):  7133123.3 i/s
                  oj:  6911977.1 i/s - same-ish: difference falls within error
                json:  4614696.6 i/s - 1.55x  slower

After:

== Encoding small mixed (34 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   572.751k i/100ms
                json   457.741k i/100ms
                  oj   512.247k i/100ms
Calculating -------------------------------------
        json (reuse)      6.324M (± 6.9%) i/s  (158.12 ns/i) -     31.501M in   5.023093s
                json      6.263M (± 6.9%) i/s  (159.66 ns/i) -     31.126M in   5.017086s
                  oj      5.569M (± 6.6%) i/s  (179.56 ns/i) -     27.661M in   5.003739s

Comparison:
        json (reuse):  6324183.5 i/s
                json:  6263204.9 i/s - same-ish: difference falls within error
                  oj:  5569049.2 i/s - same-ish: difference falls within error

== Encoding small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   258.505k i/100ms
                json   242.335k i/100ms
                  oj   220.678k i/100ms
Calculating -------------------------------------
        json (reuse)      2.589M (± 9.6%) i/s  (386.17 ns/i) -     12.925M in   5.071853s
                json      2.594M (± 6.6%) i/s  (385.46 ns/i) -     13.086M in   5.083035s
                  oj      2.250M (± 2.3%) i/s  (444.43 ns/i) -     11.255M in   5.004707s

Comparison:
        json (reuse):  2589499.6 i/s
                json:  2594321.0 i/s - same-ish: difference falls within error
                  oj:  2250064.0 i/s - 1.15x  slower

== Encoding small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   656.373k i/100ms
                json   644.135k i/100ms
                  oj   650.283k i/100ms
Calculating -------------------------------------
        json (reuse)      7.202M (± 7.1%) i/s  (138.84 ns/i) -     36.101M in   5.051438s
                json      7.278M (± 1.7%) i/s  (137.40 ns/i) -     36.716M in   5.046300s
                  oj      7.036M (± 1.7%) i/s  (142.12 ns/i) -     35.766M in   5.084729s

Comparison:
        json (reuse):  7202447.9 i/s
                json:  7277883.0 i/s - same-ish: difference falls within error
                  oj:  7036115.2 i/s - same-ish: difference falls within error

While less nice, this opens the door to eliding the State object
allocation when possible.
…called

@byroot byroot merged commit 942cd3f into ruby:master Oct 30, 2024
35 checks passed
byroot added a commit to byroot/json that referenced this pull request Nov 1, 2024
Similar to ruby#662, but here
we don't even need to spill to the heap, because the parser is never
exposed.

Before:

```
== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   188.233k i/100ms
                  oj   213.985k i/100ms
           oj strict   242.564k i/100ms
          Oj::Parser   448.682k i/100ms
           rapidjson   291.925k i/100ms
Calculating -------------------------------------
                json      1.983M (± 0.5%) i/s  (504.32 ns/i) -      9.976M in   5.031352s
                  oj      2.334M (± 0.2%) i/s  (428.48 ns/i) -     11.769M in   5.042839s
           oj strict      2.689M (± 0.2%) i/s  (371.85 ns/i) -     13.584M in   5.051044s
          Oj::Parser      4.662M (± 1.2%) i/s  (214.50 ns/i) -     23.331M in   5.005414s
           rapidjson      3.110M (± 0.7%) i/s  (321.57 ns/i) -     15.764M in   5.069531s

Comparison:
                json:  1982878.1 i/s
          Oj::Parser:  4661924.8 i/s - 2.35x  faster
           rapidjson:  3109722.2 i/s - 1.57x  faster
           oj strict:  2689277.0 i/s - 1.36x  faster
                  oj:  2333852.9 i/s - 1.18x  faster
```

After:

```
== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   223.083k i/100ms
                  oj   214.400k i/100ms
           oj strict   243.519k i/100ms
          Oj::Parser   445.445k i/100ms
           rapidjson   293.936k i/100ms
Calculating -------------------------------------
                json      2.279M (± 4.5%) i/s  (438.71 ns/i) -     11.377M in   5.002132s
                  oj      2.315M (± 0.3%) i/s  (431.96 ns/i) -     11.578M in   5.001141s
           oj strict      2.665M (± 0.9%) i/s  (375.19 ns/i) -     13.394M in   5.025562s
          Oj::Parser      4.703M (± 0.3%) i/s  (212.63 ns/i) -     23.609M in   5.019913s
           rapidjson      3.129M (± 0.4%) i/s  (319.55 ns/i) -     15.873M in   5.072213s

Comparison:
                json:  2279385.2 i/s
          Oj::Parser:  4703032.3 i/s - 2.06x  faster
           rapidjson:  3129356.1 i/s - 1.37x  faster
           oj strict:  2665318.3 i/s - 1.17x  faster
                  oj:  2315009.3 i/s - same-ish: difference falls within error
```
Successfully merging this pull request may close these issues.

Investigate micro-benchmark JSON.dump performance