
Freeze columns before using them as hash keys #7631

Merged
merged 1 commit into from
@jeremyevans

This reduces the number of allocated strings from columns * (rows + 1) to just columns. This should fix #7629.
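A minimal plain-Ruby sketch of the idea, assuming rows arrive as arrays of values and are turned into one hash per row (the helper name `hash_rows_for` is hypothetical, not ActiveRecord's API). Because the column names are frozen up front, every row hash reuses the same key objects, so the number of distinct key strings equals the number of columns no matter how many rows there are.

```ruby
# Hypothetical helper: build one Hash per row, keyed by column name.
# Freezing the names first lets Hash#[]= store them as-is instead of
# storing a frozen copy of each name in every row hash.
def hash_rows_for(columns, rows)
  frozen_columns = columns.map { |c| c.freeze }
  rows.map do |row|
    h = {}
    frozen_columns.each_with_index { |col, i| h[col] = row[i] }
    h
  end
end

columns = [String.new("id"), String.new("name")]
rows    = [[1, "a"], [2, "b"], [3, "c"]]
hashes  = hash_rows_for(columns, rows)

# Count distinct key objects (by identity) across all row hashes.
distinct_keys = hashes.flat_map(&:keys).uniq(&:object_id).size
puts distinct_keys # prints 2: one per column, not one per column per row
```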

@jeremyevans: Freeze columns before using them as hash keys (3aef5ce)
This reduces the number of allocated strings from columns * (rows + 1) to just columns.
@rafaelfranca

Could you add a benchmark to the commit message?

cc @tenderlove @jonleighton

@jeremyevans

Why should I spend more time helping the competition? ;)

@jeremyevans reopened this
@rafaelfranca

lol. Right. I was just asking; we don't need to do it. I won't merge without approval from @tenderlove and @jonleighton.

Also, thank you for the pull request.

@dmathieu commented on the diff
activerecord/lib/active_record/result.rb
@@ -11,7 +11,7 @@ class Result
attr_reader :columns, :rows, :column_types
def initialize(columns, rows, column_types = {})
- @columns = columns
+ @columns = columns.map{|c| c.freeze}
@dmathieu Collaborator

master only supports Ruby 1.9, so you can use columns.map(&:freeze).

@lunks, I saw that before you opened the pull request. Like I said, it's up to rails core to change the style after they merge it, if they even decide to merge it.

@steveklabnik Collaborator

to change the style after they merge it,

Naw, style changes are always asked for before merging, not after. We actually don't do style-only commits.

@evanphx
evanphx added a note

Are the values in @columns actually used as Hash keys? If not always, then this change is actually worse than without it, because this code always allocates another copy of each string and then freezes it. Basically, it's paying a constant upfront cost instead of a variable cost. You just need to show why that trade-off is valid.

@steveklabnik I don't plan on doing additional work on this patch. If that means you won't merge the patch due to style issues, that's fine with me. I'm not an ActiveRecord user, for obvious reasons.

@evanphx most callers are going to call hash_rows, which is going to use the columns as hash keys (https://github.com/jeremyevans/rails/blob/3aef5ce9b35a4659379201eb6bb1dba355a83ba4/activerecord/lib/active_record/result.rb#L57). It's possible to move this map call into hash_rows if you are worried about affecting the other cases. Also, this should not allocate any additional strings, since String#freeze just modifies the object's flags in-place, it doesn't allocate another object. However, Array#map does allocate an additional array. The only situation where this change allocates more objects is in the case where 0 rows are returned or hash_rows is not called, in which it allocates 1 additional object (the array). If a single row with a single column is returned, this will allocate the same number of objects (an array object instead of a string object). In all other cases, it should allocate fewer objects (columns * rows - 1 fewer).
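A quick check of the two facts this argument relies on: String#freeze returns the receiver itself (no new string is created), while Array#map always returns a new Array, even when the block hands back the original elements.

```ruby
columns = [String.new("id"), String.new("name")]

# String#freeze sets the frozen flag and returns the same object.
s = columns.first
same_object = s.freeze.equal?(s)

# Array#map builds a new Array, but its elements are the original strings.
mapped = columns.map { |c| c.freeze }
new_array = !mapped.equal?(columns)
same_elements = mapped.zip(columns).all? { |a, b| a.equal?(b) }

puts same_object   # prints true
puts new_array     # prints true
puts same_elements # prints true
```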

@evanphx
evanphx added a note

@jeremyevans Oh yes yes. Just an extra Array. Ooops! I hadn't had dinner yet!

We can do @columns = columns.each(&:freeze) or @columns = columns.each {|c| c.freeze} to avoid allocating that extra array.
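A sketch of the `each` variant: Array#each returns its receiver, so the caller's array is reused and no extra Array is allocated. Like the `map` version, it freezes the caller's strings in place.

```ruby
columns = [String.new("id"), String.new("name")]

# Array#each returns the receiver itself, so no new Array is created.
result = columns.each(&:freeze)

puts result.equal?(columns)  # prints true
puts columns.all?(&:frozen?) # prints true: the caller's strings are now frozen
```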

@tenderlove Owner
@lunks
lunks added a note

Perhaps it's better to dup, then freeze them?

@wlipa
wlipa added a note

That should work almost as well, since the Result isn't per returned instance.
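A sketch of @lunks's dup-then-freeze alternative, assuming the goal is to leave the caller's strings untouched: each name is copied and the copy is frozen. This allocates one new String per column (plus the Array from map) once per Result, and the caller's strings stay mutable.

```ruby
columns = [String.new("id"), String.new("name")]

# Copy each column name and freeze the copy; the originals are not modified.
frozen_copies = columns.map { |c| c.dup.freeze }

puts frozen_copies.all?(&:frozen?)             # prints true
puts columns.none?(&:frozen?)                  # prints true
puts frozen_copies.first.equal?(columns.first) # prints false: a separate object
```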

@wlipa

I verified this fix worked for the case I was studying in #7629, bringing the number of extra strings per AR model instance down from 15 to 2. Thanks!

@rafaelfranca

@wlipa do you have a check for this case that can be shared?

@wlipa

Here's a two-line check that relies on an AR model with more than one instance in the db.

  arr = MyModel.all # or any other AR query that returns > 1 of the same object type
  arr.first.instance_variable_get("@attributes").keys.first.object_id == arr.last.instance_variable_get("@attributes").keys.first.object_id

That will be true if the keys aren't getting duplicated.

(Note: this assumes that columns are always stored in the same order in the attributes hash. I'm not sure that's a safe assumption; if not, one could select the key named "id" instead.)
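Following that note, a plain-Ruby version of the check that looks up the key by name rather than by position. `same_key_object?` is a hypothetical helper; with ActiveRecord, the two hashes would be the `@attributes` of two loaded records, as in the snippet above.

```ruby
# Hypothetical helper: true if both hashes store the *same* String object
# as the key named +name+ (identity, not just equal contents).
def same_key_object?(h1, h2, name)
  k1 = h1.keys.find { |k| k == name }
  k2 = h2.keys.find { |k| k == name }
  !k1.nil? && k1.equal?(k2)
end

shared = String.new("id").freeze
a = {}
a[shared] = 1
b = {}
b[shared] = 2
puts same_key_object?(a, b, "id") # prints true
```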

@fxn
Owner

While we wait for @tenderlove or @jonleighton to weigh in, I'd like to suggest adding a comment. Without one, the next person reading that file will wonder why this is done.

@tenderlove
Owner

Can we actually prove that this is a performance bottleneck before merging? "Many objects allocated" does not equal "slow system". Also, note that while we may be reducing memory here, we're also trading it for speed (extra method calls to freeze). Please, please, please someone make hard numbers and graphs before we embark on this micro-optimization hunt.

I am a hard :-1: on this pull request until someone can actually demonstrate a real performance increase.

@wlipa

For this optimization, it shouldn't be necessary to call freeze more than N times per process where N is the number of db fields in AR classes (ie, a small constant cost at startup). That would probably change the patch though.
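One way to read that suggestion is a per-process cache of frozen column names, so each distinct name is copied and frozen only once. This is a hypothetical sketch (the `ColumnNameCache` module is not part of ActiveRecord), and it ignores thread safety.

```ruby
# Hypothetical per-process cache: maps a column name to one frozen String,
# so repeated lookups of the same name return a single shared object.
module ColumnNameCache
  @names = {}

  def self.fetch(name)
    @names[name] ||= name.dup.freeze
  end
end

first  = ColumnNameCache.fetch(String.new("id"))
second = ColumnNameCache.fetch(String.new("id"))
puts first.equal?(second) # prints true
puts first.frozen?        # prints true
```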

@wlipa

@jeremyevans stated that Ruby internally dups and calls freeze on strings used as hash keys. Given that, this patch should substantially reduce the number of calls to freeze, simply because there are so many fewer strings in play.

That said I'll take a look at gathering some metrics. Never a bad idea!

@argent-smith referenced this pull request in rubynoname/ShowNotes
Closed

On optimization #33

@wlipa

I took a more detailed look at an AR model with 9 integer fields, 2 datetimes, and 2 strings. Querying the full table from my db with MyModel.all returned 9200 rows.

The growth in process resident set size after the query was 21m (from 84m to 105m) with the patch applied. Without the patch, the growth was 29m (from 87m to 116m). So the patch yielded a 27.6% reduction in process RSS growth for this query.

The average CPU time for the query was 1.16s patched and 1.22s unpatched, so about a 5% reduction there.

This was with Ruby 1.9.3p194, Ubuntu 12.04, and Rails 3.2.8.
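The percentages follow directly from the measured RSS growth (29m unpatched, 21m patched) and the average CPU times:

```latex
\text{RSS growth reduction} = \frac{29 - 21}{29} = \frac{8}{29} \approx 0.276
\qquad
\text{CPU time reduction} = \frac{1.22 - 1.16}{1.22} = \frac{0.06}{1.22} \approx 0.049
```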

@steveklabnik
Collaborator

@wlipa could you provide the benchmark script, please?

@wlipa

You mean "MyModel.all" with the database as described? It's really as simple as that.

@tenderlove
Owner
@thedarkone

Pre-freezing String keys before using them as keys in multiple Hash instances is a very cool optimisation that I myself frequently use. This should be merged.

@wlipa

Here are instructions for replicating from scratch in a new app.

  rails new querypig -d mysql
  rails generate model MyModel user_id:integer kind:integer product_id:integer comment_id:integer duration:integer category_id:integer ip:string leaderboard_id:integer status_id:integer description:string
  rake db:create db:migrate

Populate the database:

  10000.times do
    MyModel.create({
      :user_id => rand(1..100),
      :kind => rand(1..100),
      :product_id => rand(1..100),
      :comment_id => rand(1..100),
      :duration => rand(1..100),
      :category_id => rand(1..100),
      :ip => rand(1..100).to_s,
      :leaderboard_id => rand(1..100),
      :status_id => rand(1..100),
      :description => rand(1..100).to_s,
    })
  end

Measure the memory usage:

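  # ps reports RSS in kilobytes; 1.kilobyte (ActiveSupport) is 1024
  def resident_mb
    `ps -o rss= -p #{Process.pid}`.to_i / 1.kilobyte
  end

  # Run a full GC first so the RSS reading reflects live objects
  def stats
    GC.start
    rmb = resident_mb
    puts "pid #{Process.pid} rss: #{rmb}m gc: #{GC.stat}"
    rmb
  end

  # preflight: load the model and connection before taking the baseline
  MyModel.first

  initial_rss = cur_rss = stats

  8.times do
    # potentially piggish
    mm = MyModel.all
    cur_rss = stats
  end

  puts "rss delta: #{cur_rss - initial_rss}"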

I put the above in script/measure.rb and ran it with "rails runner script/measure.rb". On OS X I consistently got a delta of close to 18m with the patch, and a varying but always larger delta of 25-41m without it.
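RSS is noisy, as the 25-41m spread shows. As a complementary measurement, here is a sketch that counts the objects allocated while a block runs, using `GC.stat(:total_allocated_objects)`; that counter needs a newer Ruby (2.2 and later) than the 1.9.3 used above. The helper name `allocated_objects` and the synthetic columns and rows are hypothetical.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
# total_allocated_objects is a cumulative counter, so garbage collection
# during the block does not reduce the result.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

columns = Array.new(10) { |i| "col#{i}" }
rows    = Array.new(1_000) { Array.new(10, 0) }

count = allocated_objects do
  rows.each do |row|
    h = {}
    columns.each_with_index { |c, i| h[c] = row[i] }
  end
end
puts count
```

Running it once as written and once with `columns.each(&:freeze)` beforehand gives a rough comparison; the absolute numbers depend on the Ruby version, since newer interpreters may treat unfrozen String keys differently.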

@spastorino
Owner

I wonder why the adapters return strings instead of symbols.

@NZKoz
Owner

As @jeremyevans points out, generating columns.size strings is considerably less GC pressure than columns.size * result_set.size. These strings are the keys to the attributes hash, so I don't think we need to worry about users expecting to be able to do something crazy like

  @attributes.keys.map(&:upcase!)

This seems like a complete no brainer to me, the only change I'd suggest is just interning the strings to symbols but that's something that could surprise users.
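A short illustration of the symbol alternative and why it could surprise users: each name maps to a single Symbol object, so keys are shared for free, but code that looks rows up by the String name no longer finds anything.

```ruby
columns = [String.new("id"), String.new("name")]
symbol_columns = columns.map(&:to_sym)
row = Hash[symbol_columns.zip([1, "a"])]

# Every "id" converts to the same Symbol object.
puts String.new("id").to_sym.equal?(symbol_columns.first) # prints true

# Code that indexes with the String name now gets nil.
p row["id"] # prints nil
p row[:id]  # prints 1
```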

@jeremyevans

Note that you can't upcase! hash string keys in ruby anyway, even if the string used as a key wasn't frozen:

$ ruby -e '{"a"=>1}.keys.map{|s| s.upcase!}' 
-e:1:in `upcase!': can't modify frozen String (RuntimeError)
        from -e:1:in `block in <main>'
        from -e:1:in `map'
        from -e:1:in `<main>'

This is because ruby uses a frozen dup of the string as the hash key if the string isn't already frozen, so pretty much all string hash keys in ruby are already frozen.
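This behavior can be checked directly: an unfrozen String used as a key is stored as a frozen copy (the caller's string is left unfrozen), while an already-frozen String is stored as-is.

```ruby
unfrozen = String.new("a")
h1 = {}
h1[unfrozen] = 1
stored = h1.keys.first

puts stored.frozen?          # prints true: the stored key is frozen
puts stored.equal?(unfrozen) # prints false: it is a copy
puts unfrozen.frozen?        # prints false: the original is untouched

frozen = String.new("a").freeze
h2 = {}
h2[frozen] = 1
puts h2.keys.first.equal?(frozen) # prints true: frozen keys are reused
```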

@tenderlove tenderlove merged commit 2004ef2 into rails:master
@spastorino
Owner

I've moved the code to the hash_rows method and added a comment in 2068d30.
Thanks @jeremyevans

@wlipa

Doesn't that change mean @columns.map { |c| c.freeze } is called once per row rather than once per result?

@spastorino
Owner

Ouch yeah sorry. Fixed da400fb
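A sketch of the shape the fix aims for, using a hypothetical, simplified `MiniResult` class (not the real ActiveRecord::Result code): the column names are frozen once per result, outside the per-row loop, and the row hashes are memoized.

```ruby
# Hypothetical, simplified stand-in for a query result.
class MiniResult
  attr_reader :columns, :rows

  def initialize(columns, rows)
    @columns = columns
    @rows = rows
    @hash_rows = nil
  end

  def hash_rows
    @hash_rows ||= begin
      # Freeze the column names once per result (not once per row), so
      # every row hash can store the same String objects as its keys.
      frozen_columns = @columns.map { |c| c.freeze }
      @rows.map do |row|
        h = {}
        frozen_columns.each_with_index { |c, i| h[c] = row[i] }
        h
      end
    end
  end
end

result = MiniResult.new([String.new("id"), String.new("name")], [[1, "a"], [2, "b"]])
p result.hash_rows
```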

Commits on Sep 13, 2012
  1. @jeremyevans

    Freeze columns before using them as hash keys

    jeremyevans authored
    This reduces the number of allocated strings from columns * (rows + 1) to just columns.
Showing 1 changed file with 1 addition and 1 deletion:
activerecord/lib/active_record/result.rb (+1 −1)
@@ -11,7 +11,7 @@ class Result
attr_reader :columns, :rows, :column_types
def initialize(columns, rows, column_types = {})
- @columns = columns
+ @columns = columns.map{|c| c.freeze}
@rows = rows
@hash_rows = nil
@column_types = column_types