Speed up homogeneous AR lists / reduce allocations #33223
Conversation
This commit speeds up allocating homogeneous lists of AR objects. We can know if the result set contains an STI column before initializing every AR object, so this change pulls the "does this result set contain an STI column?" test up, then uses a specialized instantiation function. This way we only have to check for an STI column once rather than N times. This change also introduces a new initialization function that is meant for use when allocating AR objects that come from the database. Doing this allows us to eliminate one hash allocation per AR instance. Here is a benchmark: ```ruby require 'active_record' require 'benchmark/ips' ActiveRecord::Base.establish_connection adapter: "sqlite3", database: ":memory:" ActiveRecord::Migration.verbose = false ActiveRecord::Schema.define do create_table :users, force: true do |t| t.string :name t.timestamps null: false end end class User < ActiveRecord::Base; end 2000.times do User.create!(name: "Gorby") end Benchmark.ips do |x| x.report("find") do User.limit(2000).to_a end end ``` Results: Before: ``` [aaron@TC activerecord (master)]$ be ruby -I lib:~/git/allocation_tracer/lib speed.rb Warming up -------------------------------------- find 5.000 i/100ms Calculating ------------------------------------- find 56.192 (± 3.6%) i/s - 285.000 in 5.080940s ``` After: ``` [aaron@TC activerecord (homogeneous-allocation)]$ be ruby -I lib:~/git/allocation_tracer/lib speed.rb Warming up -------------------------------------- find 7.000 i/100ms Calculating ------------------------------------- find 72.204 (± 2.8%) i/s - 364.000 in 5.044592s ```
@@ -349,6 +349,26 @@ def init_with(coder) | |||
self | |||
end | |||
|
|||
## | |||
# Initializer used for instantiating objects that have been read from the | |||
# database. +attributes+ should be an attributes object, and unlike the |
utilum
Jun 26, 2018
Contributor
double space after # database.
redundant?
double space after # database.
redundant?
tenderlove
Jun 26, 2018
Author
Member
I'm not sure. We learned double space after punctuation in school because everything was fixed width font at that time. These days we don't need it because most stuff (except code) isn't fixed width. Since code is fixed width, I go with double space but I'm not sure what our style guide is in this case. @fxn?
I'm not sure. We learned double space after punctuation in school because everything was fixed width font at that time. These days we don't need it because most stuff (except code) isn't fixed width. Since code is fixed width, I go with double space but I'm not sure what our style guide is in this case. @fxn?
* master: Call initialize after allocate Remove `ActiveSupport::Concern` from `ActiveRecord::Aggregations` Add example for no_touching? in active_record/no_touching for api docs [ci skip] Generate a new key for each service test
Now that `allocate` is removed, we need to define attribute methods in all "init" methods.
Some questions about the documentation, but the implementation looks good. |
instantiate_instance_of(klass, attributes, column_types, &block) | ||
end | ||
|
||
# Given a class, an attributes hash, +instantiate+ returns a new instance |
rafaelfranca
Jun 26, 2018
Member
I think it is +instantiate_instance_of+
instead of instantiate
here. But I guess we can remove that part of the documentation if we rephrase it.
I think it is +instantiate_instance_of+
instead of instantiate
here. But I guess we can remove that part of the documentation if we rephrase it.
@@ -43,6 +43,11 @@ def initialize(columns, rows, column_types = {}) | |||
@column_types = column_types | |||
end | |||
|
|||
# Does this result set include the column named +name+ ? |
rafaelfranca
Jun 26, 2018
Member
To follow the same convention on this same class we should do:
# Returns true if this result set include the column named +name+
To follow the same convention on this same class we should do:
# Returns true if this result set include the column named +name+
tenderlove
Jun 26, 2018
Author
Member
@rafaelfranca how about Returns truthy if this result set include the column named +name+
. I don't want to guarantee the return value is true
or false
if we decide to change implementation (I remember there was some controversy because we changed a method to return a truthy value). @fxn thoughts?
@rafaelfranca how about Returns truthy if this result set include the column named +name+
. I don't want to guarantee the return value is true
or false
if we decide to change implementation (I remember there was some controversy because we changed a method to return a truthy value). @fxn thoughts?
rafaelfranca
Jun 26, 2018
Member
true
in our guideline means truthy. +true+
means the instance of TrueClass
.
http://guides.rubyonrails.org/api_documentation_guidelines.html#booleans
true
in our guideline means truthy. +true+
means the instance of TrueClass
.
http://guides.rubyonrails.org/api_documentation_guidelines.html#booleans
fxn
Jun 26, 2018
•
Member
I am not really happy with our convention (that I defined myself many years ago), which is to use fixed-width font for the singletons, and regular font to mean "true" in the Ruby sense of the word (i.e., "truthy"). My thoughts by then were that Ruby defines what is true and false, not what is truthy and falsey, so I stuck with that jargon for respect.
In retrospect, that choice was too subtle and time has shown it doesn't communicate well in practice. It has also the problem that in sample code like
user.blank? # => true
you cannot distinguish if that is regular or fixed-width, therefore people assume the singleton true
sometimes and that has made us change some return contracts, to be stricter.
I suggested to change this convention in Basecamp a while back, but it was agreed to leave it like it is.
I am not really happy with our convention (that I defined myself many years ago), which is to use fixed-width font for the singletons, and regular font to mean "true" in the Ruby sense of the word (i.e., "truthy"). My thoughts by then were that Ruby defines what is true and false, not what is truthy and falsey, so I stuck with that jargon for respect.
In retrospect, that choice was too subtle and time has shown it doesn't communicate well in practice. It has also the problem that in sample code like
user.blank? # => true
you cannot distinguish if that is regular or fixed-width, therefore people assume the singleton true
sometimes and that has made us change some return contracts, to be stricter.
I suggested to change this convention in Basecamp a while back, but it was agreed to leave it like it is.
|
||
# Given a class, an attributes hash, +instantiate+ returns a new instance | ||
# of the class. Accepts only keys as strings. | ||
def instantiate_instance_of(klass, attributes, column_types = {}, &block) |
rafaelfranca
Jun 26, 2018
Member
Do we want/expect people to call this method on their applications? If not this should be # :nodoc:
Do we want/expect people to call this method on their applications? If not this should be # :nodoc:
tenderlove
Jun 26, 2018
Author
Member
I don't think we want people to call this. I'll :nodoc:
it.
I don't think we want people to call this. I'll :nodoc:
it.
This commit speeds up allocating homogeneous lists of AR objects. We
can know if the result set contains an STI column before initializing
every AR object, so this change pulls the "does this result set contain
an STI column?" test up, then uses a specialized instantiation function.
This way we only have to check for an STI column once rather than N
times.
This change also introduces a new initialization function that is meant
for use when allocating AR objects that come from the database. Doing
this allows us to eliminate one hash allocation per AR instance.
Here is a benchmark:
Results:
Before:
After: