-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perf: Improve performance of where when using an array of values #39009
Conversation
A coworker at GitHub found a few months back that if we used `santitize_sql` over `where` when we knew the values going into `where` it was a lot faster than `where`. This PR adds a new Arel node type called `HomogenousIn` that will be used when Rails knows the values are all homogenous and can therefore pick a faster codepath. This new codepath skips some of the required processing by `where` to make `wheres` with homogenous arrays faster without requiring the application author to know when to use which query type. Using our benchmark code: ```ruby ids = (1..1000).each.map do |n| Post.create!.id end Benchmark.ips do |x| x.report("where with ids") do Post.where(id: ids).to_a end x.report("where with sanitize") do Post.where(ActiveRecord::Base.sanitize_sql(["id IN (?)", ids])).to_a end x.compare! end ``` Before this PR comparing where with a list of IDs to santitize sql: ``` Warming up -------------------------------------- where with ids 11.000 i/100ms where with sanitize 17.000 i/100ms Calculating ------------------------------------- where with ids 115.733 (± 4.3%) i/s - 583.000 in 5.045828s where with sanitize 174.231 (± 4.0%) i/s - 884.000 in 5.081495s Comparison: where with sanitize: 174.2 i/s where with ids: 115.7 i/s - 1.51x slower ``` After this PR comparing where with a list of IDs to santitize sql: ``` Warming up -------------------------------------- where with ids 16.000 i/100ms where with sanitize 19.000 i/100ms Calculating ------------------------------------- where with ids 158.293 (± 6.3%) i/s - 800.000 in 5.072208s where with sanitize 169.141 (± 3.5%) i/s - 855.000 in 5.060878s Comparison: where with sanitize: 169.1 i/s where with ids: 158.3 i/s - same-ish: difference falls within error ``` Notes: 1) `santitize2` is a horrific name, obviously. We aren't sure what to name it though. Thoughts? 2) `HomogenousIn` may not be the best name for our new node. Co-authored-by: Aaron Patterson <aaron.patterson@gmail.com>
804261f
to
7e5b3fe
Compare
I'm not sure that |
def cast(value) | ||
value = \ | ||
value = if value <=> 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add a comment here about why we're doing this. This is a huge "hack" to check to see whether or not value
is numeric. Non-numbers will return nil
for <=>
, and it avoid us need to do an is_a?
check. We found using the spaceship operator was faster than checking types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why not use value.is_a?(Numeric)
. <=>
is faster alternative of value.is_a?(Numeric)
?
The issue with the current Here is an example of what I mean. The active model integer type's def serialize(value)
return if value.is_a?(::String) && non_numeric_string?(value)
ensure_in_range(super)
end
Another behavior that Active Record has is that we will omit values that are outside of range supported by the database. Here is an example: class Post < ActiveRecord::Base
end
params = {}
params[:id] = 9999999999999999999999999999999
$-d = true
Post.where(id: [params[:id], 1234]).to_a The SQL executed by the above code is:
Active Record strips out the very large number because the database can't handle it. An unfortunate consequence of "raising + serializing" combined as one unit, and stripping out-of-range values is that we have to raise and catch exceptions in order to support this. If you run the above example, you'll actually see that an
It would be nice if we could check that a value is valid without having to serialize it, only to get an exception. Unfortunately the Second, this "raise if not valid" behavior is not consistent among all types. For example, the I think |
values.map! do |v| | ||
predicate_builder.build_bind_attribute(attribute.name, v) | ||
if nils.empty? && ranges.empty? | ||
predicate_builder.connection.in_clause_length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate_builder.connection.in_clause_length |
I think we can remove this.
I was thinking about this a little more and I want to take back my statement. We know the list is homogenous (or must be homogenous) in the "traditional sense". All the values must have the same type. This line proves it. They all must be the same type so we can use the same type caster. We should probably push this logic down in to the |
@@ -8,14 +8,21 @@ def serialize(value) | |||
cast(value) | |||
end | |||
|
|||
def serialize2(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why serialize2
exists? All definitions of it in this PR seems to have the same code as serialize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used here https://github.com/rails/rails/pull/39009/files#diff-3bcf02e80bc52706c3df424d2a8c4db2R31 and it's a terrible name but we're not sure what to call it. The other serialize methods do both serialize and validation, this one just does serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if all the implementations of serialize2
on this PR is always the same as the implementation as serialize
I don't see the different and serialize2
is also doing serialization and validation, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not the same. Check the serialize method on Integer here. It's doing both serialization and validation. This one is only doing serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the role of serialize2
to be a fast, unchecked serialize
(i.e. "garbage in, garbage out")? If so, would unchecked_serialize
be an appropriate name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that was the context I was missing. In this diff all serialize2
methods have the same implementation of the serialize
in the same files, but I was missing that the subclasses were overriding the method.
If we want to serialize2
to be part of the public API I'd go with @jonathanhefner's suggestion, but with the long term goal of renaming it to serialize
and renaming the current serialize
to something as verify_and_serialize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rafaelfranca I like @jonathanhefner's suggestion too. @eileencodes wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me!
@@ -11,6 +11,10 @@ def initialize(precision: nil, limit: nil, scale: nil) | |||
@limit = limit | |||
end | |||
|
|||
def serializable?(_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be part of the public API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should. I would like it if active model types can tell us whether or not a value can be serialized for that particular type. We should add some docs, but I didn't want to go that far unless everyone was OK with adding this to the public API
This is a smaller alternative of performance improvement, without refactoring type casting mechanism rails#39009. This is relatively a smaller change (but about 40% faster than before), so I think this could be easier reviewed without discuss about refactoring type casting mechanism. This just makes `attribute.in(values)` less allocation from an array of casted nodes to one casted array node. ```ruby ids = (1..1000).each.map do |n| Post.create!.id end Benchmark.ips do |x| x.report("where with ids") do Post.where(id: ids).to_a end x.report("where with sanitize") do Post.where(ActiveRecord::Base.sanitize_sql(["id IN (?)", ids])).to_a end x.compare! end ``` Before: ``` Warming up -------------------------------------- where with ids 7.000 i/100ms where with sanitize 13.000 i/100ms Calculating ------------------------------------- where with ids 70.661 (± 5.7%) i/s - 357.000 in 5.072771s where with sanitize 130.993 (± 7.6%) i/s - 663.000 in 5.096085s Comparison: where with sanitize: 131.0 i/s where with ids: 70.7 i/s - 1.85x slower ``` After: ``` Warming up -------------------------------------- where with ids 10.000 i/100ms where with sanitize 13.000 i/100ms Calculating ------------------------------------- where with ids 98.174 (± 7.1%) i/s - 490.000 in 5.012851s where with sanitize 132.289 (± 8.3%) i/s - 663.000 in 5.052728s Comparison: where with sanitize: 132.3 i/s where with ids: 98.2 i/s - 1.35x slower ```
We fixed the method names, made some improvements, and manually merged this in 70ddb8a. |
A coworker at GitHub found a few months back that if we used
santitize_sql
overwhere
when we knew the values going intowhere
it was a lot faster than
where
.This PR adds a new Arel node type called
HomogenousIn
that will beused when Rails knows the values are all homogenous and can therefore
pick a faster codepath. This new codepath skips some of the required
processing by
where
to makewheres
with homogenous arrays fasterwithout requiring the application author to know when to use which query
type.
Using our benchmark code:
Before this PR comparing where with a list of IDs to sanitize sql:
After this PR comparing where with a list of IDs to sanitize sql:
Notes:
sanitize2
is a horrific name, obviously. We aren't sure what toname it though. Thoughts?
HomogenousIn
may not be the best name for our new node.Co-authored-by: Aaron Patterson aaron.patterson@gmail.com
cc/ @tenderlove