Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of ActiveSupport::JSON.encode #48614

Merged
merged 3 commits into from Jun 30, 2023

Conversation

jhawthorn
Copy link
Member

@jhawthorn jhawthorn commented Jun 29, 2023

This about doubles the performance of .to_json/ActiveSupport::JSON.encode (with most impact when data is already JSON-ready, and no options are passed, but should be faster in all cases). The change should be mostly transparent to users. I've split this into three commits for easier digesting.

Avoid extra pass on AS::JSON.dump with no options

JSONGemEncoder.encode previously would always perform two passes. First it would call .as_json(options), but then would perform a second pass "jsonify" to recursively call .as_json (this time without options) until the data converges into a "JSON-ready" representation.

When options are not given, the second pass should be equivalent to the first, so we can detect that, and only perform the "jsonify" step.

The only user-visible effect of this should be that we will pass no options to as_json instead of an empty Hash, but implementations of as_json should already be expected to accept that (EDIT: though there was one test case each in ActiveModel and ActiveRecord which this broke).

This could be considered a "light" version of what was done in #34633 from @tenderlove and @eileencodes from a few years back (I think that's still a great idea and may still be a future implementation, but a wider change than this PR).

Escape JSON output instead of string inputs

Rails performs additional escaping of strings compared to the JSON gem, and will escape U+2028, U+2029, <, >, and &. In JSON, the only places those characters are valid is inside of a JSON string, so it should be equivalent (and faster) to perform the escaping on the output.

This has the most performance impact, which is nice as it should improve all .to_json calls regardless of input. This escaping is still quite expensive and something I intend to continue looking into (either adjusting our rules for escaping or having a faster implementation of the escaping further upstream).

Consider Symbol "JSON-ready", improve jsonify

Previously jsonify would call .as_json for Integer, nil, true, and false, even though those types are considered "JSON-ready" (looks like this was introduced in #26933). Technically a user could have overridden .as_json for these types but I can't imagine a use case and I don't think we should support that.

I left the same behaviour of calling .as_json for generic "Numeric" as that can have user subclasses where one may have implemented as_json. This behaviour is also used for Float (which coerces NaN/Infinity/-Infinity into nil).

This also adds Symbol to the list of "JSON-ready" types, to avoid unnecessarily casting them to strings (possible as we no longer perform escaping on input). The output of jsonify should never be user visible before it is passed through JSON.generate, so I don't think this can be seen by users.

This also corrects our handling of Hash to call to_s on all keys, matching the behaviour of .as_json and JSON's requirement that keys are Strings (Symbols are also permitted as the JSON gem knows to convert them to a string).

Benchmark

Tested using the twitter.json from nativejson-benchmark, which is somewhat unicode-heavy but otherwise IMO pretty typical.

(JSON and RapidJSON included just for comparison/scale)

source_data = ::JSON.parse(File.read("twitter.json"))
source_data_sym = source_data.deep_symbolize_keys.deep_dup

Rails main

 source_data.to_json     26.103  (± 0.0%) i/s -    132.000  in   5.056999s
source_data_sym.to_json
                         24.311  (± 0.0%) i/s -    122.000  in   5.018373s
JSON.generate(source_data)
                        474.026  (± 0.8%) i/s -      2.392k in   5.046540s
RapidJSON.generate(source_data)
                        716.922  (± 4.6%) i/s -      3.618k in   5.057603s

this branch

 source_data.to_json     61.344  (± 0.0%) i/s -    310.000  in   5.053787s
source_data_sym.to_json
                         63.068  (± 1.6%) i/s -    318.000  in   5.042746s
JSON.generate(source_data)
                        470.431  (± 2.3%) i/s -      2.392k in   5.087858s
RapidJSON.generate(source_data)
                        704.303  (± 3.4%) i/s -      3.520k in   5.004222s

cc @matthewd @byroot

JSONGemEncoder.encode previously would always perform two passes. First
it would call `.as_json(options)`, but then would perform a second pass
"jsonify" to recursively call `.as_json` (this time without options)
until the data converges into a "JSON-ready" representation.

When options are not given, the second pass should be equivalent to the
first, so we can detect that, and only perform the "jsonify" step.

The only user-visible effect of this should be that we will pass no
options to `as_json` instead of an empty Hash, but implementations of
`as_json` should already be expected to accept that.
Rails performs additional escaping of strings compared to the JSON gem,
and will escape U+2028, U+2029, <, >, and &.

In JSON, the only places those characters are valid is inside of a
JSON string, so it should be equivalent (and faster) to perform the
escaping on the output.
Copy link
Member

@byroot byroot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

value.as_json
when Hash
result = {}
value.each do |k, v|
result[jsonify(k)] = jsonify(v)
result[keyify(k)] = jsonify(v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result[keyify(k)] = jsonify(v)
k = k.to_s unless Symbol === k || String === k
result[k] = jsonify(v)

Do you think it may be worth inlining keyify?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good idea. I'm now also wondering if we should could just leave the keys unchanged (regardless of their class) and expect that JSON will to_s them (unlike values, our behaviour matches JSON's, we just want .to_s). I'll test that out in a follow up PR though.

Previously jsonify would call `.as_json` for Integer, nil, true, and
false, even though those types are considered "JSON-ready". Technically
a user could have overridden `.as_json` for these types but I can't
imagine a use case and I don't think we should support that.

I left the same behaviour of calling `.as_json` for generic "Numeric" as
that can have user subclasses where one may have implemented as_json.
This behaviour is also used for Float (which coerces
NaN/Infinity/-Infinity into nil).

This also adds Symbol to the list of "JSON-ready" types, to avoid
unnecessarily casting them to strings (possible as we no longer perform
escaping on input). The output of jsonify should never be user visible
before it is passed through JSON.generate, so I don't think this should
be a user facing
change.

This also corrects our handling of Hash to call to_s on all keys,
matching the behaviour of `.as_json` and JSON's requirement that keys
are Strings (Symbols are also permitted as JSON knows to convert them to
a String).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants