Optimize AR::Relation#pluck #13051

Closed
wants to merge 1 commit into
from

Projects

None yet

8 participants

@bogdan
Contributor
bogdan commented Nov 26, 2013

Less intermediate arrays and hashes, less lines of code.

@rafaelfranca
Member

Where is the benchmark?

@lukaszx0
Member

Do you have any benchmarks?

@josevalim
Member

Yup, there is a chance this commit actually makes things slower. Basically it removes the zip (and the allocation that comes as consequence of the zip) but it does the column lookup for every record where previously it was done just once.

@bogdan
Contributor
bogdan commented Nov 27, 2013

https://gist.github.com/7675628

                    user     system      total        real
-----------------------------pluck 1 columns and 10 records
After patch:    0.120000   0.000000   0.120000 (  0.125344)
Before patch:   0.130000   0.010000   0.140000 (  0.136894)
Improvement: 8%

-----------------------------pluck 2 columns and 10 records
After patch:    0.150000   0.010000   0.160000 (  0.157452)
Before patch:   0.160000   0.000000   0.160000 (  0.171012)
Improvement: 8%

-----------------------------pluck 3 columns and 10 records
After patch:    0.170000   0.000000   0.170000 (  0.178856)
Before patch:   0.190000   0.010000   0.200000 (  0.193022)
Improvement: 7%

----------------------------pluck 1 columns and 100 records
After patch:    0.030000   0.000000   0.030000 (  0.028619)
Before patch:   0.030000   0.000000   0.030000 (  0.036457)
Improvement: 21%

----------------------------pluck 2 columns and 100 records
After patch:    0.040000   0.000000   0.040000 (  0.037713)
Before patch:   0.050000   0.000000   0.050000 (  0.047892)
Improvement: 21%

----------------------------pluck 3 columns and 100 records
After patch:    0.040000   0.010000   0.050000 (  0.042278)
Before patch:   0.050000   0.010000   0.060000 (  0.053374)
Improvement: 21%

---------------------------pluck 1 columns and 1000 records
After patch:    0.020000   0.000000   0.020000 (  0.017872)
Before patch:   0.020000   0.000000   0.020000 (  0.024972)
Improvement: 28%

---------------------------pluck 2 columns and 1000 records
After patch:    0.020000   0.000000   0.020000 (  0.024940)
Before patch:   0.040000   0.000000   0.040000 (  0.033943)
Improvement: 27%

---------------------------pluck 3 columns and 1000 records
After patch:    0.030000   0.000000   0.030000 (  0.027810)
Before patch:   0.030000   0.000000   0.030000 (  0.039015)
Improvement: 29%

--------------------------pluck 1 columns and 10000 records
After patch:    0.010000   0.000000   0.010000 (  0.016477)
Before patch:   0.030000   0.000000   0.030000 (  0.024243)
Improvement: 32%

--------------------------pluck 2 columns and 10000 records
After patch:    0.030000   0.000000   0.030000 (  0.023728)
Before patch:   0.030000   0.010000   0.040000 (  0.033059)
Improvement: 28%

--------------------------pluck 3 columns and 10000 records
After patch:    0.020000   0.000000   0.020000 (  0.026993)
Before patch:   0.030000   0.000000   0.030000 (  0.038090)
Improvement: 29%
@lukaszx0
Member

It would be nice to mention those results at least in one sentence (maybe with link to gist) in commit message

@jeremy
Member
jeremy commented Nov 27, 2013

Looks good @bogdan! +1 to include some concise benchmark results in the commit message

@bogdan bogdan Optimize AR::Relation#pluck
https://gist.github.com/bogdan/7675628
Around 7% boost on pluck 10 records
Around 20% boost on pluck 100 records
Around 25% boost on pluck 1000 records
Around 30% boost on pluck 10000 records
d9b34e9
@bogdan
Contributor
bogdan commented Nov 27, 2013
@lukaszx0
Member

Good job! 👍

@tenderlove
Member

it does the column lookup for every record where previously it was done just once.

Agree. We could probably remove the multiple fetch and just do this || statement, but this change makes the code do more work.

@tenderlove tenderlove closed this Nov 27, 2013
@bogdan
Contributor
bogdan commented Nov 28, 2013

@tenderlove sorry, but I don't understand why it was closed.

I have fixes concern about columns loading and provided a benchmark that shows 30% performance boost - even more than I would expect.

@tenderlove
Member

Can you do it in iterations per second? Your benchmark shows "30%", but the total difference in time is extremely small. This 30% could merely be noise.

Aaron Patterson
http://tenderlovemaking.com/
I'm on an iPhone so I apologize for top posting.

On Nov 28, 2013, at 3:26 AM, Bogdan Gusiev notifications@github.com wrote:

@tenderlove sorry, but I don't understand why it was closed.

I have fixes concern about columns loading and provided a benchmark that shows 30% performance boost - even more than I would expect.


Reply to this email directly or view it on GitHub.

@bogdan
Contributor
bogdan commented Nov 28, 2013

Maybe you distrust it because I didn't explain where the boost is coming from:

In current master the main typecasting loop is done using AR::Result#map method that as Enumerable proxies the work to #each: https://github.com/rails/rails/blob/master/activerecord/lib/active_record/result.rb#L55
That calls #hash_rows https://github.com/rails/rails/blob/master/activerecord/lib/active_record/result.rb#L94 That is yet another data transformation.

After this patch #pluck iterates over AR::Result#rows that doesn't need that #hash_rows transformation.
https://github.com/bogdan/rails/blob/d9b34e9b38f140e19aff80c3bcc2e10aae627f3e/activerecord/lib/active_record/relation/calculations.rb#L163

Anyway I'll provide iterations per second benchmark tomorrow if it still seems a noise for you.

@bogdan
Contributor
bogdan commented Dec 2, 2013

Iterational benchmark:

https://gist.github.com/7750221

After Patch

Groups:10000
Calculating -------------------------------------
pluck 1 columns and 10 records
                           686 i/100ms
pluck 1 columns and 100 records
                           328 i/100ms
pluck 1 columns and 1000 records
                            52 i/100ms
pluck 1 columns and 10000 records
                             4 i/100ms
-------------------------------------------------
pluck 1 columns and 10 records
                         6814.3 (±3.6%) i/s -      34300 in   5.040195s
pluck 1 columns and 100 records
                         2596.3 (±14.0%) i/s -      12792 in   5.098327s
pluck 1 columns and 1000 records
                          305.3 (±20.3%) i/s -       1456 in   5.037526s
pluck 1 columns and 10000 records
                           24.9 (±36.1%) i/s -        108 in   5.143262s

Before patch

Calculating -------------------------------------
pluck 1 columns and 10 records
                           654 i/100ms
pluck 1 columns and 100 records
                           258 i/100ms
pluck 1 columns and 1000 records
                            36 i/100ms
pluck 1 columns and 10000 records
                             3 i/100ms
-------------------------------------------------
pluck 1 columns and 10 records
                         6297.5 (±2.2%) i/s -      32046 in   5.091279s
pluck 1 columns and 100 records
                         1964.6 (±23.6%) i/s -       9030 in   5.090748s
pluck 1 columns and 1000 records
                          161.1 (±44.7%) i/s -        684 in   5.917276s
pluck 1 columns and 10000 records
                            7.9 (±38.1%) i/s -         36 in   5.439820s
@josevalim josevalim reopened this Dec 4, 2013
@josevalim
Member
@tenderlove tenderlove commented on the diff Dec 4, 2013
activerecord/lib/active_record/relation/calculations.rb
- columns.zip(values).map { |column, value| column.type_cast value }
+ columns = data.columns.map do |key|
+ klass.column_types[key] || data.column_types[key] || data.identity_type
+ end
+ coders = data.columns.map do |key|
+ klass.serialized_attributes[key]
+ end
+
+ result = data.rows.map do |attributes|
+ attributes.map.with_index do |value, index|
+ if coder = coders[index]
+ value = AttributeMethods::Serialization::Attribute.new(coder, value, :serialized)
@tenderlove
tenderlove Dec 4, 2013 Member

What's this for?

@bogdan
bogdan Dec 5, 2013 Contributor

By some wild reason pluck support automatic conversion of serialized attributes: https://github.com/rails/rails/blob/master/activerecord/test/cases/calculations_test.rb#L502

Previously it was held inside initialize_attributes but I can not call it directly. I tried to extract required method from https://github.com/rails/rails/blob/master/activerecord/lib/active_record/attribute_methods/serialization.rb#L110 but no shared method variant was looking better.

@szimek
Contributor
szimek commented Feb 15, 2014

Is there still something to be done with this issue or could it be merged?

@sgrif sgrif was assigned by rails-bot Oct 20, 2015
@sgrif
Member
sgrif commented Oct 30, 2015

Feel free to re-open if there's still an optimization to be made here, but most of the code this touches has changed significantly and this PR is no longer valid.

@sgrif sgrif closed this Oct 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment