Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.5x faster `ActiveRecord#respond_to?` - No longer allocates strings #34197

Merged
merged 1 commit into from Oct 17, 2018

Conversation

Projects
None yet
4 participants
@schneems
Copy link
Member

commented Oct 11, 2018

This is an alternative to #34195

The active record respond_to? method needs to do two things if super does not say that the method exists. It has to see if the "name" being passed in represents a column in the table. If it does then it needs to pass it to has_attribute? to see if the key exists in the current object. The reason why this is slow is that has_attribute? needs a string and most (almost all) objects passed in are symbols.

The only time we need to allocate a string in this method is if the column does exist in the database, and since these are a limited number of strings (since column names are a finite set) then we can pre-generate all of them and use the same string.

We generate a list hash of column names and convert them to symbols, and store the value as the string name. This allows us to both check if the "name" exists as a column, but also provides us with a string object we can use for the has_attribute? call.

I then ran the test suite and found there was only one case where we're intentionally passing in a string and changed it to a symbol. (However there are tests where we are using a string key, but they don't ship with rails).

As re-written this method should never allocate unless the user passes in a string key, which is fairly uncommon with respond_to?.

This also eliminates the need to special case every common item that might come through the method via the case that was originally added in f80aa59 (by me) and then with an attempt to extend in #34195.

As a bonus this reduces 6,300 comparisons (in the CodeTriage app homepage) to 450 as we also no longer need to loop through the column array to check for an include?.

Results

Memory:

Before: Total allocated: 752009 bytes (6527 objects)
After: Total allocated: 743921 bytes (6325 objects)
Diff: (752009 - 743921) / 752009.0 # => 1.05%

Speed:

On CodeTriage i'm seeing around a ~1% performance gain with this patch versus master. Here are the raw numbers https://gist.github.com/schneems/dc139542331baa9cca693ec7ecdfc40a.

@rails-bot rails-bot bot added the activerecord label Oct 11, 2018

@schneems schneems force-pushed the schneems:schneems/symbol-hash-respond_to branch 2 times, most recently from a61a024 to 85e6498 Oct 11, 2018

@lsylvester

This comment has been minimized.

Copy link
Contributor

commented Oct 12, 2018

I think that the idea here is good, but I wonder if its possible to move this conversion into a more generic method like has_attribute? so that more instances of checking an attribute by symbol don't allocate strings.

It also looks like the respond_to? method in ActiveModel::AttributeMethods is calling to_s, so the symbol will still be being to_sed when super is called - so that could be another string allocation we could skip.

@schneems

This comment has been minimized.

Copy link
Member Author

commented Oct 12, 2018

Thanks for the review, you've got a sharp eye.

I checked and that code path does not get called by my benchmark with CodeTriage. When I limit to only the attribute_method.rb files on this branch this is all I get:

Total allocated: 1040 bytes (26 objects)
Total retained:  0 bytes (0 objects)

allocated memory by gem
-----------------------------------
       600  activemodel/lib
       440  activerecord/lib

allocated memory by file
-----------------------------------
       600  /Users/rschneeman/Documents/projects/rails/activemodel/lib/active_model/attribute_methods.rb
       440  /Users/rschneeman/Documents/projects/rails/activerecord/lib/active_record/attribute_methods.rb

allocated memory by location
-----------------------------------
       600  /Users/rschneeman/Documents/projects/rails/activemodel/lib/active_model/attribute_methods.rb:219
       320  /Users/rschneeman/Documents/projects/rails/activerecord/lib/active_record/attribute_methods.rb:230
        80  /Users/rschneeman/Documents/projects/rails/activerecord/lib/active_record/attribute_methods.rb:198
        40  /Users/rschneeman/Documents/projects/rails/activerecord/lib/active_record/attribute_methods.rb:196

1040 bytes is 0.13 % of a request (i.e. a little more than a tenth of one percent which is really tiny). Only 600 bytes are coming from active_model/attribute_methods.rb:219 which is not the to_s you're referring to.

Changing attribute models to be symbol keyed would be an extremely difficult and large change, and for my case here it would only save an additional 600 bytes. It might be a worthwile change for another app or another page, but it's not a bottleneck for my benchmark. It was worth investigating but I think this is the approach I would like to move forward with for now.

@@ -477,6 +484,7 @@ def load_schema!
def reload_schema_from_cache
@arel_table = nil
@column_names = nil
@symbol_column_to_string_name_hash = nil

This comment has been minimized.

Copy link
@schneems

schneems Oct 12, 2018

Author Member

@sgrif is there a place I should be testing that this works? I poked around a bit an didn't see a centralized location for testing this behavior. Also, thoughts on the general concept of the PR and the implementation are appreciated if you have time.

@@ -388,6 +388,13 @@ def column_names
@column_names ||= columns.map(&:name)
end

def symbol_column_to_string(name_symbol) # :nodoc:
@symbol_column_to_string_name_hash ||= column_names.each_with_object({}) do |name, hash|

This comment has been minimized.

Copy link
@eugeneius

eugeneius Oct 13, 2018

Member
@symbol_column_to_string_name_hash ||= column_names.index_by(&:to_sym)

This comment has been minimized.

Copy link
@schneems

schneems Oct 14, 2018

Author Member

It's cleaner. I benchmarked it and it's just as fast. I'll update to this method

column_names = ["id", "name", "user_name", "issues_count", "language", "description", "full_name", "notes", "created_at", "updated_at", "github_error_msg", "commit_sha", "stars_count", "subscribers_count"].freeze

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("each_with_object") {
    column_names.each_with_object({}) do |name, hash|
      hash[name.to_sym] = name
    end
  }
  x.report("index_by") {
    column_names.index_by(&:to_sym)
  }
  x.compare!
end
# Warming up --------------------------------------
#     each_with_object    23.947k i/100ms
#             index_by    23.890k i/100ms
# Calculating -------------------------------------
#     each_with_object    256.539k (± 3.4%) i/s -      1.293M in   5.046333s
#             index_by    257.835k (± 5.3%) i/s -      1.290M in   5.018684s

# Comparison:
#             index_by:   257835.3 i/s
#     each_with_object:   256539.1 i/s - same-ish: difference falls within error

@schneems schneems force-pushed the schneems:schneems/symbol-hash-respond_to branch from 85e6498 to af6d708 Oct 14, 2018

@lsylvester

This comment has been minimized.

Copy link
Contributor

commented Oct 14, 2018

@schneems I have worked out why you are not seeing the allocations in the ActiveModel::AttributeMethods. respond_to? only checks if the attribute exists for the main attribute method, but all of the methods derived from it ignore this.

developer = Developer.select("id").first
developer.respond_to?(:updated_at) #=> false
developer.updated_at #=> ActiveModel::MissingAttributeError: missing attribute: updated_at
developer.respond_to?(:updated_at_before_type_cast) #=> true
developer.updated_at_before_type_cast #=> nil
developer.respond_to?(:updated_at?) #=> true
developer.updated_at? #=> ActiveModel::MissingAttributeError: missing attribute: updated_at

This behaviour is inconsistent and I think we should try to make consistent behaviour between them.

I think that I would prefer to make respond_to? return true even if the method is missing - as the object should know how to handle the method - and it is more likely that the column as removed in error. Developers can always use has_attribute? to check this condition instead.

I would be interested what the use cases are where fields are omitted from the select clause - I would think that in would normally be as a performance optimisation - in which case you wouldn't expect the behaviour if the application to change.

In the particular case of cache_version I think it would make more sense to raise an error if the updated_at column exists but is missing - otherwise we could lead to caches that never expire.

@schneems

This comment has been minimized.

Copy link
Member Author

commented Oct 15, 2018

This behaviour is inconsistent and I think we should try to make consistent behaviour between them.

That example is a major WAT. Good catch.

I think we should open a new issue and work on that behavior independently of this perf refactor. My goal in this PR is to change as little as possible while making things faster. This sounds like a bug that we need to make a decision on no matter what. I don't have the context here to know if the existing behavior is intentional or not.

The behavior of try and respond_to? are linked. I don't think that you should ever get an exception from calling try, or at least it should be the extreme minority of the cases.

I would be interested what the use cases are where fields are omitted from the select clause - I would think that in would normally be as a performance optimisation - in which case you wouldn't expect the behaviour if the application to change.

Yes, that is my belief as well. I've done this as a perf optimization across CodeTriage. When I did this I basically limited to one or two fields and then loaded the page and then waited for errors to show which fields I was using but weren't being loaded.

In the particular case of cache_version I think it would make more sense to raise an error if the updated_at column exists but is missing - otherwise we could lead to caches that never expire.

I agree, that behavior is pretty confusing otherwise and would be hard to diagnose.

ActiveRecord#respond_to? No longer allocates strings
This is an alternative to #34195

The active record `respond_to?` method needs to do two things if `super` does not say that the method exists. It has to see if the "name" being passed in represents a column in the table. If it does then it needs to pass it to `has_attribute?` to see if the key exists in the current object. The reason why this is slow is that `has_attribute?` needs a string and most (almost all) objects passed in are symbols.

The only time we need to allocate a string in this method is if the column does exist in the database, and since these are a limited number of strings (since column names are a finite set) then we can pre-generate all of them and use the same string. 

We generate a list hash of column names and convert them to symbols, and store the value as the string name. This allows us to both check if the "name" exists as a column, but also provides us with a string object we can use for the `has_attribute?` call. 

I then ran the test suite and found there was only one case where we're intentionally passing in a string and changed it to a symbol. (However there are tests where we are using a string key, but they don't ship with rails).

As re-written this method should never allocate unless the user passes in a string key, which is fairly uncommon with `respond_to?`.

This also eliminates the need to special case every common item that might come through the method via the `case` that was originally added in f80aa59 (by me) and then with an attempt to extend in #34195.

As a bonus this reduces 6,300 comparisons (in the CodeTriage app homepage) to 450 as we also no longer need to loop through the column array to check for an `include?`.

@schneems schneems force-pushed the schneems:schneems/symbol-hash-respond_to branch from af6d708 to f45267b Oct 15, 2018

schneems added a commit to schneems/rails that referenced this pull request Oct 15, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @isylvester in rails#34197 (comment).

schneems added a commit to schneems/rails that referenced this pull request Oct 15, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @isylvester in rails#34197 (comment).

schneems added a commit to schneems/rails that referenced this pull request Oct 15, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @lsylvester in rails#34197 (comment).

schneems added a commit to schneems/rails that referenced this pull request Oct 15, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @lsylvester in rails#34197 (comment).

schneems added a commit to schneems/rails that referenced this pull request Oct 16, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @lsylvester in rails#34197 (comment).

schneems added a commit to schneems/rails that referenced this pull request Oct 16, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @lsylvester in rails#34197 (comment).

schneems added a commit to schneems/rails that referenced this pull request Oct 16, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @lsylvester in rails#34197 (comment).

schneems added a commit to schneems/rails that referenced this pull request Oct 16, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @lsylvester in rails#34197 (comment).

@schneems schneems merged commit ead8683 into rails:master Oct 17, 2018

2 checks passed

codeclimate All good!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

schneems added a commit to schneems/rails that referenced this pull request Oct 17, 2018

Do not silently fail to generate a cache_version
When an `updated_at` column exists on the model, but is not available on the instance (likely due to a select), we should raise an error rather than silently not generating a cache_version. Without this behavior it's likely that cache entries will not be able to be invalidated and this will happen without notice. 

This behavior was reported and described by @lsylvester in rails#34197 (comment).

benoittgt added a commit to benoittgt/ar_and_draper_allocation that referenced this pull request Nov 15, 2018

benoittgt added a commit to benoittgt/ar_and_draper_allocation that referenced this pull request Nov 15, 2018

@schneems

This comment has been minimized.

Copy link
Member Author

commented Jan 4, 2019

Here's a blog post about the how the code before this patch caused a hotspot in an app https://medium.com/appaloosa-store-engineering/ruby-memory-activerecord-and-draper-64f06abeeb34

benoittgt added a commit to benoittgt/ar_and_draper_allocation that referenced this pull request Jan 8, 2019

@schneems schneems changed the title ActiveRecord#respond_to? No longer allocates strings 1.5x faster `ActiveRecord#respond_to?` - No longer allocates strings May 13, 2019

@schneems

This comment has been minimized.

Copy link
Member Author

commented May 13, 2019

On a microbenchmark, this makes the method call 1.5x faster https://gist.github.com/schneems/de03da00e1dd9f51b64e943878364b1b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.