
Fix performance of CollectionAssociation replace_on_target and include #46652

Open · jeffcarbs wants to merge 1 commit into main from collection-association-performance
Conversation

@jeffcarbs commented Dec 6, 2022

Motivation / Background

In some production timing traces we noticed large gaps where code that looked like order.items = items (i.e. setting an association) seemed to be hanging. In this instance the association had ~7k items and the assignment hung for around 30 seconds. Even more surprising, no new records were being added or removed - it was essentially order.items = order.items.

In profiling the code, we uncovered an n-squared lookup that this PR fixes. The high-level results in iterations/second:

  • roughly the same for collections of size 10
  • 5x improvement for collections of size 100
  • 50x (🚀) improvement for collections of size 1000

Detail

The n-squared lookup was due to how the code determined the index of an existing record. For every record being assigned that was already in the existing list, we looked up its index with target.index(record), which potentially scans the entire array.

This PR introduces a method for finding the index of a record that builds a record-to-index hash on the first call, so subsequent calls are constant-time hash lookups.
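
In essence (a simplified sketch; the actual method, index_in_target, appears in the diff excerpt further down):

  def index_in_target(record)
    # Build the record => index map once (O(n)), then reuse it (O(1) per lookup).
    # `||=` keeps the first index for duplicate records, matching Array#index.
    @target_index_map ||= @target.each_with_object({}).with_index { |(r, memo), i| memo[r] ||= i }
    @target_index_map[record]
  end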

I also noticed the same array-scan behavior in order.items.include?(item), which could trivially be replaced with the same index_in_target method, so this PR updates that as well. For include? the speed improvements were:

  • 7x for collections of size 10
  • 68x for collections of size 100
  • 690x for collections of size 1000

See the associated benchmarks below.

Benchmark script for `replace_on_target`
require "bundler/setup"
require "active_record"
begin
  require "benchmark/ips"
rescue LoadError
  raise LoadError, "Run `gem install benchmark-ips` to install it"
end

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.connection.create_table(:orders) { |t| }
ActiveRecord::Base.connection.create_table(:items) { |t| t.references :order, null: false }

class Order < ActiveRecord::Base
  has_many :items
  accepts_nested_attributes_for :items
end

class Item < ActiveRecord::Base
end

Benchmark.ips do |x|
  order_10 = Order.create
  Item.insert_all(Array.new(10, { order_id: order_10.id }))
  order_10.reload

  order_100 = Order.create
  Item.insert_all(Array.new(100, { order_id: order_100.id }))
  order_100.reload

  order_1000 = Order.create
  Item.insert_all(Array.new(1000, { order_id: order_1000.id }))
  order_1000.reload

  x.report("Assoc size   10") do
    order_10.items.reload
    order_10.items = order_10.items
  end

  x.report("Assoc size  100") do
    order_100.items.reload
    order_100.items = order_100.items
  end

  x.report("Assoc size 1000") do
    order_1000.items.reload
    order_1000.items = order_1000.items
  end
end

Before

~[main] $ ruby benchmark_replace_on_target.rb
Warming up --------------------------------------
     Assoc size   10   461.000  i/100ms
     Assoc size  100    29.000  i/100ms
     Assoc size 1000     1.000  i/100ms
Calculating -------------------------------------
     Assoc size   10      4.636k (± 2.6%) i/s -     23.511k in   5.074951s
     Assoc size  100    285.265  (± 3.2%) i/s -      1.450k in   5.089038s
     Assoc size 1000      3.361  (± 0.0%) i/s -     17.000  in   5.058245s

After

~[collection-association-performance] $ ruby benchmark_replace_on_target.rb
Warming up --------------------------------------
     Assoc size   10   493.000  i/100ms
     Assoc size  100   147.000  i/100ms
     Assoc size 1000    16.000  i/100ms
Calculating -------------------------------------
     Assoc size   10      4.982k (± 1.6%) i/s -     25.143k in   5.048535s
     Assoc size  100      1.513k (± 4.9%) i/s -      7.644k in   5.065432s
     Assoc size 1000    168.608  (± 6.5%) i/s -    848.000  in   5.051481s

Speed improvements:

  • Size 10: roughly the same
  • Size 100: ~5x improvement
  • Size 1000: ~50x improvement

Benchmark script for `include?`
require "bundler/setup"
require "active_record"
begin
  require "benchmark/ips"
rescue LoadError
  raise LoadError, "Run `gem install benchmark-ips` to install it"
end

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.connection.create_table(:orders) { |t| }
ActiveRecord::Base.connection.create_table(:items) { |t| t.references :order, null: false }

class Order < ActiveRecord::Base
  has_many :items
  accepts_nested_attributes_for :items
end

class Item < ActiveRecord::Base
end

Benchmark.ips do |x|
  order_10 = Order.create
  Item.insert_all(Array.new(10, { order_id: order_10.id }))
  order_10.reload

  order_100 = Order.create
  Item.insert_all(Array.new(100, { order_id: order_100.id }))
  order_100.reload

  order_1000 = Order.create
  Item.insert_all(Array.new(1000, { order_id: order_1000.id }))
  order_1000.reload

  # Use an item that's definitely not in the order as the worst case
  non_existent_item = Item.create(order_id: Order.create.id)

  # Load association so we use an in-memory lookup
  order_10.items.load_target
  x.report("Assoc size   10") do
    order_10.items.include?(non_existent_item)
  end

  # Load association so we use an in-memory lookup
  order_100.items.load_target
  x.report("Assoc size  100") do
    order_100.items.include?(non_existent_item)
  end

  # Load association so we use an in-memory lookup
  order_1000.items.load_target
  x.report("Assoc size 1000") do
    order_1000.items.include?(non_existent_item)
  end
end

Before

~[main] $ ruby benchmark_include.rb
Warming up --------------------------------------
     Assoc size   10    14.311k i/100ms
     Assoc size  100     1.543k i/100ms
     Assoc size 1000   155.000  i/100ms
Calculating -------------------------------------
     Assoc size   10    144.877k (± 1.0%) i/s -    729.861k in   5.038337s
     Assoc size  100     15.434k (± 1.1%) i/s -     78.693k in   5.099425s
     Assoc size 1000      1.538k (± 1.4%) i/s -      7.750k in   5.039121s

After

~[collection-association-performance] $ ruby benchmark_include.rb
Warming up --------------------------------------
     Assoc size   10   107.222k i/100ms
     Assoc size  100   105.796k i/100ms
     Assoc size 1000   104.449k i/100ms
Calculating -------------------------------------
     Assoc size   10      1.068M (± 1.1%) i/s -      5.361M in   5.020763s
     Assoc size  100      1.065M (± 0.8%) i/s -      5.396M in   5.067783s
     Assoc size 1000      1.058M (± 1.5%) i/s -      5.327M in   5.035916s

Speed improvements:

  • Size 10: ~7x improvement
  • Size 100: ~68x improvement
  • Size 1000: ~690x improvement

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@@ -516,6 +519,11 @@ def find_by_scan(*args)
load_target.select { |r| ids.include?(r.id.to_s) }
end
end

def index_in_target(record)
@target_index_map ||= @target.each_with_object({}).with_index { |(r, memo), i| memo[r] ||= i }
@jeffcarbs (Author) commented Dec 6, 2022

The code this was replacing was @target.index(record), which returns the first index for duplicate records, so this implementation keeps that behavior.
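
For example (illustrative only):

  arr = ["a", "b", "a"]
  arr.index("a")  # => 0, the first match
  map = arr.each_with_object({}).with_index { |(r, memo), i| memo[r] ||= i }
  map["a"]        # => 0 as well, because ||= skips the duplicate at index 2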

@@ -465,6 +467,7 @@ def replace_on_target(record, skip_callbacks, replace:, inversing: false)
target[index] = record


Shouldn't we set @target_index_map[record] here too?

There's also a few places in this file where @target is set directly, but it's not clear to me if @target_index_map actually needs to be reset in those cases.

@jeffcarbs (Author) replied Dec 6, 2022

The two places in this method where index could have been set both look in @target_index_map[record], so there's no need to update it.

There are two other memoized objects, @association_ids and @replaced_or_added_targets, so I added updating/resetting of @target_index_map wherever we update/reset those: the reset, remove_records, and replace_on_target methods.
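
For reference, the reset bookkeeping might look roughly like this (a sketch based on the description above; only the @target_index_map line is this PR's addition, the rest of the body is assumed):

  def reset
    super
    @target = []
    @association_ids = nil
    @replaced_or_added_targets = Set.new
    @target_index_map = nil  # the line this PR adds alongside the existing resets
  end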

Looking through the control flow here, I think that's all that's needed. The two places where @target is set directly are:

  • load_target, where the association would either never have been loaded or would have been reset
  • replace_records, where we first call delete, which calls reset. Actually, looking at it, I think it's possible for association_ids to also get into a bad state in that method. Going to try to repro with a test.

@jeffcarbs (Author) followed up:

I actually think association_ids is OK because we null it out. For @target_index_map we need to be more careful since we update it in place in replace_on_target. We could also just null it out there, but that would produce an n-squared blowup when writing new records, which wouldn't be ideal for large collections.

I think the safest thing to do is just reset @target_index_map whenever we explicitly set target, which is only those two places, so it seems fine to do. This way there's no footgun lying around, and you don't have to make any assumptions about when the collection was last reset.

@ghiculescu (Member) commented:

This reminds me of a similar regression we saw when upgrading to 6.1. Just sharing in case that fix is helpful: #42524

@byroot (Member) commented Dec 8, 2022

Please squash your commits.

@jeffcarbs force-pushed the collection-association-performance branch from 590887e to 04b404e on December 8, 2022 at 14:15
@jeffcarbs force-pushed the collection-association-performance branch from 04b404e to 7cb55dc on December 8, 2022 at 14:17
@jeffcarbs (Author) commented:

I squashed the commits into one (including the suggested change) and just pushed it up. I did make one other change: for the usages of index_in_target inside the include? methods, I used !! to ensure they return a boolean.
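
In code terms, that change is roughly this (a simplified sketch of the loaded, in-memory branch only; the real include? has additional branches for unpersisted records and unloaded associations):

  def include?(record)
    !!index_in_target(record)  # index or nil, coerced to a boolean
  end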

@jonathanhefner (Member) commented Dec 8, 2022

  • 100x for collections of size 10
  • 4300x for collections of size 100
  • 460000x for collections of size 1000

These numbers are inaccurate due to the times block parameter in your benchmark scripts, e.g.:

  x.report("Assoc size 1000") do |times|
    order_1000.items.reload
    order_1000.items = order_1000.items
  end

When the benchmark block accepts an argument, benchmark-ips expects the block to execute the code the given number of times.
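
For reference, when the block takes a parameter, benchmark-ips expects the block itself to run the code that many times, along these lines (per the benchmark-ips README):

  x.report("Assoc size 1000") do |times|
    i = 0
    while i < times
      order_1000.items.reload
      order_1000.items = order_1000.items
      i += 1
    end
  end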

Would you mind re-running the benchmarks without the block parameters?

Also, does your first benchmark represent a real world use case? I had been trying to refactor some of this code in #46652. I haven't merged that PR yet (because it caused a failure in Shopify's test suite that has yet to be investigated), but it has a benchmark script based on the performance concerns from #42524. Would you mind trying that benchmark with your code? (You may need to add a few require statements; I ran it using bin/rails r with a fresh app pointing at the appropriate Rails branch.)

@jeffcarbs (Author) commented Dec 8, 2022

@jonathanhefner - Thanks for the reply and pointing out the issues with the benchmarks. I'll update my scripts and post the new benchmarks.

Also, in your comment I think you may have referenced the wrong PR? You said:

I had been trying to refactor some of this code in #46652

which is this PR. Did you mean to link #45399 by chance or maybe a different one? Or maybe I misunderstood the comment.

Also, does your first benchmark represent a real world use case

Do you mean does order.items = items on a huge collection represent a real-world use-case? If so, the answer is yes for us. We have a job that runs to ensure the correct "bars" are associated with the right "foo", so it gathers up the correct set of "bars" and does foo.bars = bars and in production this collection can be upwards of 10k things. In most cases, the result will just be a no-op or adding/removing a few records since the process runs frequently as the "bars" are modified.

Given this performance issue we uncovered, we've refactored that code to manually create/remove the join records, since we can use bulk operations like insert_all and delete_all (would love it if Rails had a built-in way to bulk update join records 🙂).
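
A rough sketch of that bulk approach (Foo, Bar, and FooBar are stand-in names, not anything from this PR):

  desired_ids = bars.map(&:id)
  current_ids = FooBar.where(foo_id: foo.id).pluck(:bar_id)

  stale_ids   = current_ids - desired_ids
  missing_ids = desired_ids - current_ids

  # Remove join rows that should no longer exist, then bulk-insert the new ones.
  FooBar.where(foo_id: foo.id, bar_id: stale_ids).delete_all if stale_ids.any?
  FooBar.insert_all(missing_ids.map { |id| { foo_id: foo.id, bar_id: id } }) if missing_ids.any?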

@jeffcarbs (Author) commented:

@jonathanhefner - I re-ran the benchmarks and updated the PR description with the correct scripts and results. Thanks again for pointing out the mistake with the block variable; billion-x improvements were definitely suspicious 😅.

Speed improvements for replace_on_target:

  • Size 10: roughly the same
  • Size 100: ~5x improvement
  • Size 1000: ~50x improvement

Speed improvements for include?:

  • Size 10: ~7x improvement
  • Size 100: ~68x improvement
  • Size 1000: ~690x improvement

I tried running the benchmarks from #45399; here are my results:

Results with main branch:

~ [main] $ ruby benchmark_from_45399.rb
-- create_table(:authors, {:force=>true})
   -> 0.0103s
-- create_table(:posts, {:force=>true})
   -> 0.0002s
has_many_inversing == false, posts.count == 1     => `posts.build` @ 0.043ms
has_many_inversing == false, posts.count == 10    => `posts.build` @ 0.043ms
has_many_inversing == false, posts.count == 100   => `posts.build` @ 0.046ms
has_many_inversing == false, posts.count == 1000  => `posts.build` @ 0.041ms
has_many_inversing == true, posts.count == 1      => `posts.build` @ 0.041ms
has_many_inversing == true, posts.count == 10     => `posts.build` @ 0.040ms
has_many_inversing == true, posts.count == 100    => `posts.build` @ 0.039ms
has_many_inversing == true, posts.count == 1000   => `posts.build` @ 0.041ms
has_many_inversing == false, posts.count == 1     => `posts << post` @ 0.063ms
has_many_inversing == false, posts.count == 10    => `posts << post` @ 0.066ms
has_many_inversing == false, posts.count == 100   => `posts << post` @ 0.066ms
has_many_inversing == false, posts.count == 1000  => `posts << post` @ 0.068ms
has_many_inversing == true, posts.count == 1      => `posts << post` @ 0.066ms
has_many_inversing == true, posts.count == 10     => `posts << post` @ 0.067ms
has_many_inversing == true, posts.count == 100    => `posts << post` @ 0.067ms
has_many_inversing == true, posts.count == 1000   => `posts << post` @ 0.068ms

Results with this branch:

~ [collection-association-performance %] $ ruby benchmark_from_45399.rb
-- create_table(:authors, {:force=>true})
   -> 0.0112s
-- create_table(:posts, {:force=>true})
   -> 0.0003s
has_many_inversing == false, posts.count == 1     => `posts.build` @ 0.043ms
has_many_inversing == false, posts.count == 10    => `posts.build` @ 0.044ms
has_many_inversing == false, posts.count == 100   => `posts.build` @ 0.045ms
has_many_inversing == false, posts.count == 1000  => `posts.build` @ 0.041ms
has_many_inversing == true, posts.count == 1      => `posts.build` @ 0.038ms
has_many_inversing == true, posts.count == 10     => `posts.build` @ 0.040ms
has_many_inversing == true, posts.count == 100    => `posts.build` @ 0.040ms
has_many_inversing == true, posts.count == 1000   => `posts.build` @ 0.042ms
has_many_inversing == false, posts.count == 1     => `posts << post` @ 0.065ms
has_many_inversing == false, posts.count == 10    => `posts << post` @ 0.067ms
has_many_inversing == false, posts.count == 100   => `posts << post` @ 0.064ms
has_many_inversing == false, posts.count == 1000  => `posts << post` @ 0.066ms
has_many_inversing == true, posts.count == 1      => `posts << post` @ 0.063ms
has_many_inversing == true, posts.count == 10     => `posts << post` @ 0.067ms
has_many_inversing == true, posts.count == 100    => `posts << post` @ 0.064ms
has_many_inversing == true, posts.count == 1000   => `posts << post` @ 0.070ms

Also, just to have more data and be able to more easily compare, I pulled down the jonathanhefner:collection_association-refactor-inversing-logic branch and ran all of the benchmarks (performance similar to main):

~[(77889e87c0...) *%] $ ruby benchmark_replace_on_target.rb
Warming up --------------------------------------
     Assoc size   10   465.000  i/100ms
     Assoc size  100    28.000  i/100ms
     Assoc size 1000     1.000  i/100ms
Calculating -------------------------------------
     Assoc size   10      4.702k (± 1.5%) i/s -     23.715k in   5.045185s
     Assoc size  100    287.962  (± 0.7%) i/s -      1.456k in   5.056606s
     Assoc size 1000      3.324  (± 0.0%) i/s -     17.000  in   5.116352s

~[(77889e87c0...) *%] $ ruby benchmark_include.rb
Warming up --------------------------------------
     Assoc size   10    14.127k i/100ms
     Assoc size  100     1.531k i/100ms
     Assoc size 1000   152.000  i/100ms
Calculating -------------------------------------
     Assoc size   10    142.562k (± 1.0%) i/s -    720.477k in   5.054247s
     Assoc size  100     15.351k (± 0.8%) i/s -     78.081k in   5.086709s
     Assoc size 1000      1.529k (± 1.2%) i/s -      7.752k in   5.070870s

~[(77889e87c0...) *%] $ ruby benchmark_from_45399.rb
-- create_table(:authors, {:force=>true})
   -> 0.0106s
-- create_table(:posts, {:force=>true})
   -> 0.0002s
has_many_inversing == false, posts.count == 1     => `posts.build` @ 0.039ms
has_many_inversing == false, posts.count == 10    => `posts.build` @ 0.038ms
has_many_inversing == false, posts.count == 100   => `posts.build` @ 0.039ms
has_many_inversing == false, posts.count == 1000  => `posts.build` @ 0.042ms
has_many_inversing == true, posts.count == 1      => `posts.build` @ 0.038ms
has_many_inversing == true, posts.count == 10     => `posts.build` @ 0.035ms
has_many_inversing == true, posts.count == 100    => `posts.build` @ 0.036ms
has_many_inversing == true, posts.count == 1000   => `posts.build` @ 0.043ms
has_many_inversing == false, posts.count == 1     => `posts << post` @ 0.067ms
has_many_inversing == false, posts.count == 10    => `posts << post` @ 0.065ms
has_many_inversing == false, posts.count == 100   => `posts << post` @ 0.065ms
has_many_inversing == false, posts.count == 1000  => `posts << post` @ 0.070ms
has_many_inversing == true, posts.count == 1      => `posts << post` @ 0.065ms
has_many_inversing == true, posts.count == 10     => `posts << post` @ 0.068ms
has_many_inversing == true, posts.count == 100    => `posts << post` @ 0.066ms
has_many_inversing == true, posts.count == 1000   => `posts << post` @ 0.070ms

@jonathanhefner (Member) replied:

which is this PR. Did you mean to link #45399 by chance or maybe a different one? Or maybe I misunderstood the comment.

Yes, that's the one! Sorry, I was juggling tabs and copy-pasted the wrong link.

We have a job that runs to ensure the correct "bars" are associated with the right "foo", so it gathers up the correct set of "bars" and does foo.bars = bars and in production this collection can be upwards of 10k things. In most cases, the result will just be a no-op or adding/removing a few records since the process runs frequently as the "bars" are modified.

In that case, maybe an optimization local to replace or replace_common_records_in_memory could help? This is just a sketch:

index e1125ff8dc..f84d735afc 100644
--- a/activerecord/lib/active_record/associations/collection_association.rb
+++ b/activerecord/lib/active_record/associations/collection_association.rb
@@ -414,10 +414,13 @@ def replace_records(new_target, original_target)
         end
 
         def replace_common_records_in_memory(new_target, original_target)
-          common_records = intersection(new_target, original_target)
-          common_records.each do |record|
-            skip_callbacks = true
-            replace_on_target(record, skip_callbacks, replace: true)
+          original_target_indexes = original_target.each_with_index.to_h
+
+          new_target.each do |record|
+            next unless (index = original_target_indexes[record])
+
+            set_inverse_instance(record)
+            target[index] = record
           end
         end

That ignores @replaced_or_added_targets. It might need a @replaced_or_added_targets.merge(new_target), but I'm not 100% sure. (With #45399, I think an additional @implicit_target_records = nil in replace would suffice.)

I'm just wary of introducing more complexity to this file. In particular, maintaining the invariants of @target_index_map. If, in the future, new code fails to set @target_index_map = nil as necessary, that might lead to subtle bugs. I realize we kind of already have this problem with @replaced_or_added_targets (or @implicit_target_records in #45399), but less is better. 😆

Also, you might find that a local optimization improves performance even more.

@jeffcarbs (Author) replied:

I'm just wary of introducing more complexity to this file. In particular, maintaining the invariants of @target_index_map

Totally hear you on that. Another thought was to make target a more sophisticated array-like object that has more efficient index and include? methods but otherwise looks like a typical array (a rough sketch of that idea follows below). That would be an even wider-reaching change.

The more localized change you suggested is interesting. I'll play around with that approach and see what it looks like.
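
A rough sketch of that wrapper idea (entirely hypothetical; IndexedTarget and everything in it is invented for illustration and is not part of this PR):

  class IndexedTarget
    include Enumerable

    def initialize(records = [])
      @records = records
      @index_map = nil  # built lazily on first index/include? call
    end

    def each(&block)
      @records.each(&block)
    end

    # O(1) after the map is built, versus Array#index's O(n) scan.
    def index(record)
      @index_map ||= @records.each_with_object({}).with_index { |(r, memo), i| memo[r] ||= i }
      @index_map[record]
    end

    def include?(record)
      !!index(record)
    end

    def <<(record)
      @index_map[record] ||= @records.size if @index_map  # keep the map in sync on append
      @records << record
      self
    end

    def []=(i, record)
      @index_map = nil  # conservative: invalidate, rebuild lazily on next lookup
      @records[i] = record
    end
  end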
