New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hash lookup for deleting existing associations from `target` #29939

Merged
merged 1 commit into from Jun 11, 2018

Conversation

Projects
None yet
5 participants
@arthurchui
Contributor

arthurchui commented Jul 25, 2017

Summary

Array#delete searches for all occurrences in the target array. When performing dependent: :destroy in active_record/associations/collection_association, the loop becomes O(N^2). It is particularly slow when destroying large amount of associations (e.g. 10K records).

Either Hash or Set can optimize the loop to O(N). Hash is slightly faster in this usage.

Benchmark

class Dummy; end

num = 10_000
test1a = num.times.map { Dummy.new }; nil
test1b = test1a.dup
test2a = num.times.map { Dummy.new }; nil
test2b = test2a.dup

Benchmark.ips do |x|
  x.config(stats: :bootstrap, confidence: 95)

  x.report("hash") do
    hash = test1a.group_by { |r| r }
    test1b.select! { |r| !hash[r] }
  end
  x.report("array")  do
    test2a.each { |r| test2b.delete(r) }
  end
  x.compare!
end
Calculating -------------------------------------
                hash    11.000  i/100ms
               array     1.000  i/100ms
-------------------------------------------------
                hash    114.586  (±16.6%) i/s -    561.000
               array      1.710k (±10.3%) i/s -      8.377k

Comparison:
               array:     1710.4 i/s
                hash:      114.6 i/s - 14.93x slower
Use hash lookup for deleting existing associations from `target`
`Array#delete` searches for all occurrences in the `target` array. When performing `dependent: :destroy` in active_record/associations/collection_association, the loop becomes O(N^2). It is particularly slow when destroying large amount of associations (e.g. 10K records).

Either `Hash` or `Set` can optimize the loop to O(N). `Hash` is slightly faster in this simple usage.

```ruby
class Dummy; end

num = 10_000
test1a = num.times.map { Dummy.new }; nil
test1b = test1a.dup
test2a = num.times.map { Dummy.new }; nil
test2b = test2a.dup

Benchmark.ips do |x|
  x.config(stats: :bootstrap, confidence: 95)

  x.report("hash") do
    hash = test1a.group_by { |r| r }
    test1b.select! { |r| !hash[r] }
  end
  x.report("array")  do
    test2a.each { |r| test2b.delete(r) }
  end
  x.compare!
end
```

```
Calculating -------------------------------------
                hash    11.000  i/100ms
               array     1.000  i/100ms
-------------------------------------------------
                hash    114.586  (±16.6%) i/s -    561.000
               array      1.710k (±10.3%) i/s -      8.377k

Comparison:
               array:     1710.4 i/s
                hash:      114.6 i/s - 14.93x slower
```
@@ -392,7 +392,8 @@ def remove_records(existing_records, records, method)
records.each { |record| callback(:before_remove, record) }
delete_records(existing_records, method) if existing_records.any?
records.each { |record| target.delete(record) }
hashed_records = records.group_by { |record| record }

This comment has been minimized.

@sgrif

sgrif Jul 25, 2017

Member

How about just self.target -= records?

@sgrif

This comment has been minimized.

Member

sgrif commented Jul 25, 2017

We don't generally accept performance patches that show up only in a microbenchmark. Can you demonstrate the change in performance that occurs when this is called on an actual association? (e.g. user.posts.delete(some, records, here)?

@gmcgibbon

This comment has been minimized.

Member

gmcgibbon commented May 6, 2018

I've never benchmarked anything before in ruby but here's what I saw:

On 6.0.0.alpha without patch:

Warming up --------------------------------------
 deleting 50 records     9.000  i/100ms
Calculating -------------------------------------
 deleting 50 records     98.827  (± 1.5%) i/s -    504.000  in   5.117731s
                   with 95.0% confidence

On 6.0.0.alpha with PR'd patch:

Warming up --------------------------------------
 deleting 50 records    51.000  i/100ms
Calculating -------------------------------------
 deleting 50 records    548.534  (± 1.2%) i/s -      2.754k in   5.029883s
                   with 95.0% confidence

On 6.0.0.alpha with self.target -= records patch:

Warming up --------------------------------------
 deleting 50 records    55.000  i/100ms
Calculating -------------------------------------
 deleting 50 records    585.075  (± 2.2%) i/s -      2.915k in   5.024079s
                   with 95.0% confidence

I used this template. Am I doing it right?

@zetter

This comment has been minimized.

Contributor

zetter commented Jun 11, 2018

To give a real-world example, I recently experienced this performance problem on a project I work on (Rails 5.1.6, Ruby 2.5.1). I found that with this patch it takes about a quarter of the time to delete 18k records.

More details:

In our application a user can like comments. If a user requests their information to be removed we destroy their user account and any likes they have authored.

class User < ActiveRecord::Base
  has_many :authored_likes, class_name: 'Like', dependent: :destroy
end

class Like < ActiveRecord::Base
end

Some of our users have thousands of likes. In the case of a user with 18,600 likes, calling user.destroy takes 12mins.

After the patch is applied, it takes about 3min. The majority of that time is spent deleting the records in the database, the lines that this patch deals with run in less than a second. This is the case for both the code in the commit, and the suggested alternative of self.target -= records.

The new EU General Data Protection Regulation (GDPR) mean that it's likely that users of Rails find themselves needing to delete user data more often or wanting it easier for users to delete their own data. Improving the performance of dependent: :destroy will certainly help with that.

@rafaelfranca rafaelfranca merged commit 3466795 into rails:master Jun 11, 2018

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
codeclimate All good!
Details

rafaelfranca added a commit that referenced this pull request Jun 11, 2018

Merge pull request #29939 from arthurchui/activerecord-delete-associa…
…tions-loop

Use hash lookup for deleting existing associations from `target`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment