Elide Generator::State allocation until a to_json method has to be called #662
Merged
Conversation
While less nice, this opens the door to eliding the State object allocation when possible.
Elide Generator::State allocation until a `to_json` method has to be called

Fix: ruby#655

For very small documents, the biggest performance gap with alternatives is that the API imposes that we allocate the `State` object. In a real-world app this doesn't make much of a difference, but when running in a micro-benchmark this doubles the allocations, causing twice the amount of GC runs, making us look bad.

However, unless we have to call a `to_json` method, the `State` object isn't visible, so with some refactoring we can elide that allocation entirely. Instead we allocate the State internal struct on the stack, and if we need to call a `to_json` method, we allocate the `State` and spill the struct onto the heap.

As a result, `JSON.generate` is now as fast as re-using a `State` instance, as long as only primitives are generated.

Before:

```
== Encoding small mixed (34 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   598.654k i/100ms
                json   400.542k i/100ms
                  oj   533.353k i/100ms
Calculating -------------------------------------
        json (reuse)      6.371M (± 8.6%) i/s  (156.96 ns/i) -     31.729M in   5.059195s
                json      4.120M (± 6.6%) i/s  (242.72 ns/i) -     20.828M in   5.090549s
                  oj      5.622M (± 6.4%) i/s  (177.86 ns/i) -     28.268M in   5.061473s

Comparison:
        json (reuse):  6371126.6 i/s
                  oj:  5622452.0 i/s - same-ish: difference falls within error
                json:  4119991.1 i/s - 1.55x slower


== Encoding small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   248.125k i/100ms
                json   215.255k i/100ms
                  oj   217.531k i/100ms
Calculating -------------------------------------
        json (reuse)      2.628M (± 6.1%) i/s  (380.55 ns/i) -     13.151M in   5.030281s
                json      2.185M (± 6.7%) i/s  (457.74 ns/i) -     10.978M in   5.057655s
                  oj      2.217M (± 6.7%) i/s  (451.10 ns/i) -     11.094M in   5.044844s

Comparison:
        json (reuse):  2627799.4 i/s
                  oj:  2216824.8 i/s - 1.19x slower
                json:  2184669.5 i/s - 1.20x slower


== Encoding small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   641.334k i/100ms
                json   322.745k i/100ms
                  oj   642.450k i/100ms
Calculating -------------------------------------
        json (reuse)      7.133M (± 6.5%) i/s  (140.19 ns/i) -     35.915M in   5.068201s
                json      4.615M (± 7.0%) i/s  (216.70 ns/i) -     22.915M in   5.003718s
                  oj      6.912M (± 6.4%) i/s  (144.68 ns/i) -     34.692M in   5.047690s

Comparison:
        json (reuse):  7133123.3 i/s
                  oj:  6911977.1 i/s - same-ish: difference falls within error
                json:  4614696.6 i/s - 1.55x slower
```

After:

```
== Encoding small mixed (34 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   572.751k i/100ms
                json   457.741k i/100ms
                  oj   512.247k i/100ms
Calculating -------------------------------------
        json (reuse)      6.324M (± 6.9%) i/s  (158.12 ns/i) -     31.501M in   5.023093s
                json      6.263M (± 6.9%) i/s  (159.66 ns/i) -     31.126M in   5.017086s
                  oj      5.569M (± 6.6%) i/s  (179.56 ns/i) -     27.661M in   5.003739s

Comparison:
        json (reuse):  6324183.5 i/s
                json:  6263204.9 i/s - same-ish: difference falls within error
                  oj:  5569049.2 i/s - same-ish: difference falls within error


== Encoding small nested array (121 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   258.505k i/100ms
                json   242.335k i/100ms
                  oj   220.678k i/100ms
Calculating -------------------------------------
        json (reuse)      2.589M (± 9.6%) i/s  (386.17 ns/i) -     12.925M in   5.071853s
                json      2.594M (± 6.6%) i/s  (385.46 ns/i) -     13.086M in   5.083035s
                  oj      2.250M (± 2.3%) i/s  (444.43 ns/i) -     11.255M in   5.004707s

Comparison:
        json (reuse):  2589499.6 i/s
                json:  2594321.0 i/s - same-ish: difference falls within error
                  oj:  2250064.0 i/s - 1.15x slower


== Encoding small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
        json (reuse)   656.373k i/100ms
                json   644.135k i/100ms
                  oj   650.283k i/100ms
Calculating -------------------------------------
        json (reuse)      7.202M (± 7.1%) i/s  (138.84 ns/i) -     36.101M in   5.051438s
                json      7.278M (± 1.7%) i/s  (137.40 ns/i) -     36.716M in   5.046300s
                  oj      7.036M (± 1.7%) i/s  (142.12 ns/i) -     35.766M in   5.084729s

Comparison:
        json (reuse):  7202447.9 i/s
                json:  7277883.0 i/s - same-ish: difference falls within error
                  oj:  7036115.2 i/s - same-ish: difference falls within error
```
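These figures are benchmark-ips reports comparing plain `JSON.generate`, re-use of a generator `State` instance, and Oj. The harness below is a minimal sketch of such a comparison, not the gem's actual benchmark script, and the `small_mixed` payload is only an illustrative stand-in for the 34-byte document used above.

```ruby
require "json"
require "oj"
require "benchmark/ips"

# Illustrative payload; the PR's actual benchmark documents are not shown here.
small_mixed = [1, "string", { "a" => 1, "b" => 2.5 }, [true, false, nil]]

# JSON.state returns the generator State class (the C extension's
# Generator::State when the ext backend is loaded, as it is by default).
reused_state = JSON.state.new

Benchmark.ips do |x|
  x.report("json (reuse)") { reused_state.generate(small_mixed) }
  x.report("json")         { JSON.generate(small_mixed) }
  x.report("oj")           { Oj.dump(small_mixed) }
  x.compare!
end
```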
4af8b03 to 5009e78 (Compare)
This was referenced Oct 30, 2024
byroot added a commit to byroot/json that referenced this pull request (Nov 1, 2024)
Similar to ruby#662, but here we don't even need to spill to the heap, because the parser is never exposed.

Before:

```
== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   188.233k i/100ms
                  oj   213.985k i/100ms
           oj strict   242.564k i/100ms
          Oj::Parser   448.682k i/100ms
           rapidjson   291.925k i/100ms
Calculating -------------------------------------
                json      1.983M (± 0.5%) i/s  (504.32 ns/i) -      9.976M in   5.031352s
                  oj      2.334M (± 0.2%) i/s  (428.48 ns/i) -     11.769M in   5.042839s
           oj strict      2.689M (± 0.2%) i/s  (371.85 ns/i) -     13.584M in   5.051044s
          Oj::Parser      4.662M (± 1.2%) i/s  (214.50 ns/i) -     23.331M in   5.005414s
           rapidjson      3.110M (± 0.7%) i/s  (321.57 ns/i) -     15.764M in   5.069531s

Comparison:
                json:  1982878.1 i/s
          Oj::Parser:  4661924.8 i/s - 2.35x faster
           rapidjson:  3109722.2 i/s - 1.57x faster
           oj strict:  2689277.0 i/s - 1.36x faster
                  oj:  2333852.9 i/s - 1.18x faster
```

After:

```
== Parsing small hash (65 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   223.083k i/100ms
                  oj   214.400k i/100ms
           oj strict   243.519k i/100ms
          Oj::Parser   445.445k i/100ms
           rapidjson   293.936k i/100ms
Calculating -------------------------------------
                json      2.279M (± 4.5%) i/s  (438.71 ns/i) -     11.377M in   5.002132s
                  oj      2.315M (± 0.3%) i/s  (431.96 ns/i) -     11.578M in   5.001141s
           oj strict      2.665M (± 0.9%) i/s  (375.19 ns/i) -     13.394M in   5.025562s
          Oj::Parser      4.703M (± 0.3%) i/s  (212.63 ns/i) -     23.609M in   5.019913s
           rapidjson      3.129M (± 0.4%) i/s  (319.55 ns/i) -     15.873M in   5.072213s

Comparison:
                json:  2279385.2 i/s
          Oj::Parser:  4703032.3 i/s - 2.06x faster
           rapidjson:  3129356.1 i/s - 1.37x faster
           oj strict:  2665318.3 i/s - 1.17x faster
                  oj:  2315009.3 i/s - same-ish: difference falls within error
```
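The parsing numbers are benchmark-ips reports as well; a rough sketch of such a comparison follows. The Oj and Oj::Parser calls follow that gem's documented API and are assumptions here, not part of this PR; the rapidjson entry is omitted, and the payload is only a stand-in for the 65-byte hash.

```ruby
require "json"
require "oj"
require "benchmark/ips"

payload = '{"id":1,"name":"test","admin":false,"scores":[1,2,3]}' # stand-in document

Benchmark.ips do |x|
  x.report("json")       { JSON.parse(payload) }
  x.report("oj")         { Oj.load(payload) }
  x.report("oj strict")  { Oj.load(payload, mode: :strict) }
  x.report("Oj::Parser") { Oj::Parser.usual.parse(payload) }
  x.compare!
end
```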
Fix: #655

For very small documents, the biggest performance gap with alternatives is that the API imposes that we allocate the `State` object. In a real-world app this doesn't make much of a difference, but when running in a micro-benchmark this doubles the allocations, causing twice the amount of GC runs, making us look bad.
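One way to observe the extra allocation this refers to is to count allocated objects around a single call; the helper below is a hypothetical sketch using `GC.stat`, and exact counts vary with Ruby version and payload.

```ruby
require "json"

# Hypothetical helper: number of objects allocated while the block runs.
def allocated_objects
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

doc = { "id" => 1, "name" => "test" }
JSON.generate(doc) # warm up before measuring
p allocated_objects { JSON.generate(doc) }
```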
However, unless we have to call a `to_json` method, the `State` object isn't visible, so with some refactoring we can elide that allocation entirely. Instead we allocate the State internal struct on the stack, and if we need to call a `to_json` method, we allocate the `State` and spill the struct onto the heap.

As a result, `JSON.generate` is now as fast as re-using a `State` instance, as long as only primitives are generated.
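Concretely, the fast path applies to documents made only of core types; as soon as an object defines its own `to_json`, the state has to become visible to Ruby. A small illustrative example (the `Point` class is hypothetical):

```ruby
require "json"

# Core types only: with this change, no generator State Ruby object has to be
# allocated for this call.
JSON.generate([1, "two", { "three" => 4.5 }, nil, true])

# A custom #to_json receives the generator state, so serializing this
# document forces the State to be materialized ("spilled") and passed along.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  def to_json(*args)
    { "x" => @x, "y" => @y }.to_json(*args)
  end
end

JSON.generate([Point.new(1, 2), Point.new(3, 4)])
```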