-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Freeze columns before using them as hash keys #7631
Conversation
This reduces the number of allocated strings from columns * (rows + 1) to just columns.
Could you add benchmark to the commit message? |
Why should I spend more time helping the competition? ;) |
lol. Right. I just ask, we don't need to do. I'll not merge without @tenderlove and @jonleighton approval. Also thank you for the pull request. |
@@ -11,7 +11,7 @@ class Result | |||
attr_reader :columns, :rows, :column_types | |||
|
|||
def initialize(columns, rows, column_types = {}) | |||
@columns = columns | |||
@columns = columns.map{|c| c.freeze} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
master is only 1.9 compatible. So you can use columns.map(&:freeze)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jeremyevans
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lunks, I saw that before you opened the pull request. Like I said, it's up to rails core to change the style after they merge it, if they even decide to merge it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to change the style after they merge it,
Naw, style changes are always asked for before merging, not after. We actually don't do style-only commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the values in @columns
actually used as Hash keys? If not always, then this change is actually worse than without it because this code always allocates another copy of each string and then freezes it. Basically, it's paying constant upfront cost instead of a variable cost. You'll need to just show why that trade off is valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeremyevans Oh yes yes. Just an extra Array. Ooops! I hadn't had dinner yet!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do @columns = columns.each(&:freeze)
or @columns = columns.each {|c| c.freeze}
to avoid allocating that extra array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh, I'd rather the thing that calls initialize do the freezing. Freezing strings does mutate them. People may not expect that we froze their strings.
Aaron Patterson
http://tenderlovemaking.com/
I'm on an iPhone so I apologize for top posting.
On Sep 14, 2012, at 8:14 PM, thedarkone notifications@github.com wrote:
In activerecord/lib/active_record/result.rb:
@@ -11,7 +11,7 @@ class Result
attr_reader :columns, :rows, :column_typesdef initialize(columns, rows, column_types = {})
�
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's better to dup, then freeze them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should work almost as well, since the Result isn't per returned instance.
I verified this fix worked for the case I was studying in #7629, bringing the number of extra strings per AR model instance down from 15 to 2. Thanks! |
@wlipa do you have to check this case that can be shared? |
Here's a two line check that relies on an AR model with more than one instance in the db.
That will be true if the keys aren't getting duplicated. (Note, this assumes that columns are always stored in the same order in the attributes hash. Not sure if that's a safe assumption. If not one could select the key named "id".) |
While @tenderlove or @jonleighton say something... I'd like to suggest adding a comment. Without a comment the next guy reading that file will wonder why that is done. |
Can we actually prove that this is a performance bottleneck before merging? "Many objects allocated" does not equal "slow system". Also, note that while we may be reducing memory here, we're also trading it for speed (extra method calls to I am a hard 👎 on this pull request until someone can actually demonstrate a real performance increase. |
For this optimization, it shouldn't be necessary to call freeze more than N times per process where N is the number of db fields in AR classes (ie, a small constant cost at startup). That would probably change the patch though. |
@jeremyevans stated that Ruby internally dups and calls freeze on strings used as hash keys. Given that, this patch should substantially reduce the number of calls to freeze, simply because there are so many fewer strings in play. That said I'll take a look at gathering some metrics. Never a bad idea! |
I took a more detailed look at an AR model with 9 integer fields, 2 datetimes, and 2 strings. Querying the full table out from my db with MyModel.all returned 9200 rows. The growth in process resident set size after the query was 21m (from 84m to 105m) with the patch applied. Without the patch, the growth was 29m (from 87m to 116m). So the patch yielded at 27.5% reduction in process rss growth for this query. The average CPU time to do the query was 1.16s patched and 1.22s unpatched, so 5% reduction there. This was with Ruby 1.9.3p194, Ubuntu 12.04, and Rails 3.2.8. |
@wlipa could you provide the benchmark script, please? |
You mean "MyModel.all" with the database as described? It's really as simple as that. |
What were the column names you used? How many times did you run the program? What was the std dev? Can you provide the code so that we can reproduce your results and add it to the benchmarks? Thanks! Aaron Patterson On Sep 14, 2012, at 1:58 PM, Bill Lipa notifications@github.com wrote:
|
Pre-freezing |
Here are instructions for replicating from scratch in a new app.
Populate the database:
Measure the memory usage:
I did that by sticking the above in script/measure.rb and using the command "rails runner script/measure.rb". On OSX I consistently got a delta of right near 18m with the patch, and a varying but always larger amount between 25-41m without it. |
I wonder why do adapters return strings instead of symbols. |
As @jeremyevans points out, generating columns.size strings is considerably less GC pressure than columns.size * result_set.size. These strings are the keys to the attributes hash, so I don't think we need to worry about users expecting to be able to do something crazy like @attributes.keys.map(&:upcase!) This seems like a complete no brainer to me, the only change I'd suggest is just interning the strings to symbols but that's something that could surprise users. |
Note that you can't upcase! hash string keys in ruby anyway, even if the string used as a key wasn't frozen:
This is because ruby uses a frozen dup of the string as the hash key if the string isn't already frozen, so pretty much all string hash keys in ruby are already frozen. |
Freeze columns before using them as hash keys
I've moved the code to hash_rows methods and added a comment 2068d30 |
Doesn't that change it by calling @columns.map { |c| c.freeze } once per row rather than once per result? |
Ouch yeah sorry. Fixed da400fb |
This reduces the number of allocated strings from columns * (rows + 1) to just columns. This should fix #7629.