Major performance regression when preloading has_many_through association #12537

njakobsen opened this Issue · 13 comments

4 participants

Nicholas Jakobsen Aaron Patterson Ruby on Rails Bot Rafael Mendonça França
Nicholas Jakobsen

Between Rails 3.2 and 4.1.0.beta, the performance of preloading a has_many_through association has declined. A commit, #12090, brought performance back to within one third of Rails 3.2's speed. However, between a3dd738 and the current codebase, performance has dropped again by approximately 5.5 times.

Here are some benchmarks.

                                         3.2.14 - postgresql
                                              user     system      total        real
limit 100 has_many                         0.070000   0.000000   0.070000 (  0.086018)
limit 100 has_many_through                 0.030000   0.000000   0.030000 (  0.034133)
limit 100 double has_many_through          0.040000   0.010000   0.050000 (  0.057401)
limit 1000 has_many                        0.110000   0.000000   0.110000 (  0.122771)
limit 1000 has_many_through                0.230000   0.010000   0.240000 (  0.249680)
limit 1000 double has_many_through         0.460000   0.000000   0.460000 (  0.488896)

                                         4.1.0.beta (a3dd738c38df634a46a2261a0de27fd31de7ae51) - postgresql
                                              user     system      total        real
limit 100 has_many                         0.060000   0.000000   0.060000 (  0.070222)
limit 100 has_many_through                 0.070000   0.010000   0.080000 (  0.075542)
limit 100 double has_many_through          0.090000   0.010000   0.100000 (  0.107504)
limit 1000 has_many                        0.130000   0.000000   0.130000 (  0.136422)
limit 1000 has_many_through                0.680000   0.010000   0.690000 (  0.719953)
limit 1000 double has_many_through         1.320000   0.010000   1.330000 (  1.343767)

                                         4.1.0.beta - postgresql
                                              user     system      total        real
limit 100 has_many                         0.020000   0.000000   0.020000 (  0.025983)
limit 100 has_many_through                 0.130000   0.000000   0.130000 (  0.140196)
limit 100 double has_many_through          0.130000   0.010000   0.140000 (  0.149074)
limit 1000 has_many                        0.140000   0.000000   0.140000 (  0.152584)
limit 1000 has_many_through                3.490000   0.030000   3.520000 (  3.542585)
limit 1000 double has_many_through         7.250000   0.030000   7.280000 (  7.390997)

To be clear, the issue does not occur using eager_load, only when using preload.

A quick look at the call graph shows that invocations of Hash#[]= have jumped from 8,865 to almost 9.4 million.

# Rails 3.2.14     0.19%     0.19%   0.01    0.01    0.00    0.00    8865        Hash#[]=
# Rails 4.1.0 beta 76.78%    15.66%  114.09  23.27   0.00    90.82   9384486     Hash#[]=

I've created a gist so others can repeat my tests in case I've missed something major.

Through a process of elimination, I traced a likely source of the regression to commit e5299c1. All subsequent commits I tested exhibited the slow performance; all preceding commits I tested did not. I know @tenderlove has been working on the preloader recently, reducing the number of objects that are created; perhaps he can shed some light on the major structural changes he's been applying, and how we might help optimize this section of code.

Aaron Patterson

Thanks for reporting this. I believe the regression comes from these lines, which are basically there to ensure that the preloaded records maintain the same order they had from the database. Removing them fixes the regression, but breaks some tests. I'm trying to figure out a better algorithm now. :-(

Aaron Patterson

I've pushed a commit that seems to mostly fix the performance (for me anyway). Here's the test I used. I still think there's a better way to write this code such that I don't have to manually calculate offsets like this, but I haven't figured it out yet. :-(
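The "manually calculate offsets" approach can be sketched in a few lines of plain Ruby (hypothetical data standing in for records, not the actual preloader code): remember each record's position in the database result, then use those positions to restore order after grouping.

```ruby
# Records in the order the database returned them (hypothetical symbols).
db_order = [:r3, :r1, :r2]

# Precompute each record's offset in the result set.
offsets = {}
db_order.each_with_index { |rec, i| offsets[rec] = i }

# Grouping records by owner can scramble the database order...
grouped = { owner_a: [:r2, :r3], owner_b: [:r1] }

# ...so each group is re-sorted by the stored offsets.
ordered = grouped.transform_values { |recs| recs.sort_by { |rec| offsets[rec] } }
# ordered => { owner_a: [:r3, :r2], owner_b: [:r1] }
```

The hash lookup makes each sort key O(1), which is what avoids the per-record linear scans of the slower approach.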

Nicholas Jakobsen

:heart: That fixed it for me; a much more reasonable 72k invocations of Hash#[]= now. I have an app that does a lot of eager/preloading as it indexes for Solr, so I've been trying to follow along with your commits to see how close they get to Rails 3.2 speeds. Unfortunately I'm not familiar enough with the code to find anything but the major hotspots. Closing the remaining gap looks like it'll come from micro-optimizations. I'll keep the issue open while you're still working on a beautiful solution. Feel free to close it if you're done.

Aaron Patterson

@njakobsen I think I've found a better algorithm for doing this. Right now, we walk down the tree, left, center, then right. We're trying to build a hash with the left records as keys, and the right records as values, but that loses the ordering. If we could go from the right back to the left, we could keep the ordering from the db and eliminate this "indexing" rigamarole. I'm working on a spike now.

Really, thank you very much for reporting this. I was worried that my changes had perf impact, but didn't have a good way to test.

Edit: just to clarify, the reason the right ordering is lost is because the center records could be ordered in any order. We can't simply loop over the center records asking them for the RHS records as the center records will not share order with the RHS records.

Nicholas Jakobsen

I've had a chance to follow it through and I see your point. I tried a couple of methods, but unfortunately ary & ary is slower, hash.values_at uses the order of its arguments rather than the receiver, and using Set is worse than both. Like you said, the solution probably lies in reversing the direction and going rhs => middle => lhs. I'll wait for you to put up your spike and see if any lightbulbs turn on in my brain.
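The Ruby behaviors mentioned above are easy to verify in isolation:

```ruby
h = { a: 1, b: 2, c: 3 }

# Hash#values_at follows the order of its arguments, not the receiver,
# so it can't restore the receiver's ordering:
h.values_at(:c, :a)   # => [3, 1]

# Array#& does preserve the receiver's order (though it was reported
# slower in these benchmarks):
[1, 2, 3] & [3, 1]    # => [1, 3]
```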

Aaron Patterson

@njakobsen I spent about 4 hours on the spike and couldn't get it to pass the tests. I'm letting my brain rest for a bit before tackling again (though I think we should leave this ticket open until it's fixed).

Nicholas Jakobsen

I played around with organizing the incoming rhs records into a hash whose values are lists of the associated owners. Then we go through @preloaded_records and populate the records_by_owner hash in the correct order. Unfortunately this seems to have the same performance as your method. How extensive was your spike? I'm wondering if there's really any more performance to be gained in the few lines I tackled. About 25% of the total time was spent in the section pasted below, but about 95% of the total time is spent in the entire function, so there's another 70% somewhere before it.

def associated_records_by_owner(preloader)
  middle_to_pl = preloaders.each_with_object({}) do |pl, h|
    pl.owners.each { |middle| h[middle] = pl }
  end

  # Changes begin here
  rhs_to_owners = Hash.new { |h, rhs| h[rhs] = [] }

  through_records.each do |lhs, center|
    pl_to_middle = center.group_by { |record| middle_to_pl[record] }

    pl_to_middle.each do |pl, middles|
      middles.each do |mid|
        association = mid.association
        Array(association.reader).each { |rhs| rhs_to_owners[rhs] << lhs }
      end
    end
  end

  records_by_owner = Hash.new { |h, record| h[record] = [] }

  @preloaded_records.each do |rhs|
    rhs_to_owners[rhs].each do |owner|
      records_by_owner[owner] << rhs
    end
  end

  records_by_owner
end
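The two-pass idea above can be boiled down to a self-contained toy version (plain symbols standing in for ActiveRecord objects; not the actual Rails code):

```ruby
# Pass 1 (precomputed here): map each RHS record to the owners it belongs to.
preloaded = [:r1, :r2, :r3]   # database order
rhs_to_owners = { r2: [:a], r1: [:a, :b], r3: [:b] }

# Pass 2: walk the preloaded records in database order, appending each
# record to its owners, so every owner's list comes out in db order.
records_by_owner = Hash.new { |h, k| h[k] = [] }
preloaded.each do |rhs|
  rhs_to_owners.fetch(rhs, []).each { |owner| records_by_owner[owner] << rhs }
end
# records_by_owner => { a: [:r1, :r2], b: [:r1, :r3] }
```

Because the outer loop follows the database order, no per-group sorting or offset lookup is needed in the second pass.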
Aaron Patterson tenderlove referenced this issue from a commit
Aaron Patterson tenderlove Merge branch 'master' into joindep
* master: (44 commits)
  grammar fix (reverted in e9a1ecd)
  Revert "fixed a doc bug in the s/does no longer depend on/no longer depends on/"
  Add missed require making `enable_warnings` available
  Prepare generated Gemfile for Capistrano 3
  Added --model-name option scaffold_controller_generator.
  read the association instead of sending
  we should have unique sponsorable ids in the fixtures at least
  simplify populating the ordering hash
  the preloader for the RHS has all the preloaded records, so ask it
  only calculate offset index once. #12537
  Remove size alias for length validation
  Fix `singleton_class?`
  Minor Refactoring to `NumberHelper#number_to_human`
  `$SAFE = 4;` has been removed with Ruby 2.1
  scope_chain should not be mutated for other reflections
  Remove `default_primary_key_type`  and extract contains of `native_database_types` to a constant since they aren't conditional now in SQLite3Adapter. Makes it more like other adapters.
  cleanup changelog entry format. [ci skip]
  Extract a function to determine if the default value is a function
  Push default_function to superclass to avoid method check
  Dump the default function when the primary key is uuid

Nicholas Jakobsen njakobsen added the stale label
Ruby on Rails Bot

This issue has been automatically marked as stale because it has not been commented on for at least
three months.

The resources of the Rails team are limited, and so we are asking for your help.

If you can still reproduce this error on the 4-1-stable, 4-0-stable branches or on master,
please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions.

Rafael Mendonça França rafaelfranca modified the milestone: 4.2.0, 5.0.0
Rafael Mendonça França rafaelfranca removed the stale label
Rafael Mendonça França

Could you check the performance with current master?

Nicholas Jakobsen
                                         4.2.0.alpha - postgresql
                                               user     system      total        real
limit 100 has_many                         0.020000   0.000000   0.020000 (  0.019844)
limit 100 has_many_through                 0.060000   0.000000   0.060000 (  0.069341)
limit 100 double has_many_through          0.120000   0.000000   0.120000 (  0.125350)
limit 1000 has_many                        0.120000   0.010000   0.130000 (  0.136233)
limit 1000 has_many_through                0.670000   0.000000   0.670000 (  0.800994)
limit 1000 double has_many_through         1.260000   0.010000   1.270000 (  1.515393)

                                         3.2.18 - postgresql
                                               user     system      total        real
limit 100 has_many                         0.010000   0.000000   0.010000 (  0.011362)
limit 100 has_many_through                 0.020000   0.000000   0.020000 (  0.028196)
limit 100 double has_many_through          0.040000   0.000000   0.040000 (  0.045982)
limit 1000 has_many                        0.080000   0.000000   0.080000 (  0.132761)
limit 1000 has_many_through                0.190000   0.000000   0.190000 (  0.210479)
limit 1000 double has_many_through         0.370000   0.000000   0.370000 (  0.404050)
Rafael Mendonça França

@njakobsen could you use benchmark-ips so we can see the standard deviation?

Nicholas Jakobsen
ActiveRecord 3.2.18
Calculating -------------------------------------
limit 100 has_many                  7 i/100ms
limit 100 has_many_through          3 i/100ms
limit 100 double has_many_through   1 i/100ms
limit 1000 has_many                 1 i/100ms
limit 1000 has_many_through         1 i/100ms
limit 1000 double has_many_through  1 i/100ms
limit 100 has_many                  87.5 (±26.3%) i/s -        385 in   5.046093s
limit 100 has_many_through          37.7 (±23.9%) i/s -        171 in   5.057724s
limit 100 double has_many_through   22.9 (±21.8%) i/s -        105 in   5.021097s
limit 1000 has_many                 10.0 (±30.0%) i/s -         45 in   5.013717s
limit 1000 has_many_through          3.8 (±26.6%) i/s -         18 in   5.174370s
limit 1000 double has_many_through   2.1  (±0.0%) i/s -         10 in   5.061884s

ActiveRecord 4.2.0.alpha
Calculating -------------------------------------
limit 100 has_many                  4 i/100ms
limit 100 has_many_through          1 i/100ms
limit 100 double has_many_through   1 i/100ms
limit 1000 has_many                 1 i/100ms
limit 1000 has_many_through         1 i/100ms
limit 1000 double has_many_through  1 i/100ms
limit 100 has_many                  61.5 (±29.3%) i/s -       2008 in  40.109908s
limit 100 has_many_through          12.2 (±24.6%) i/s -        444 in  40.047831s
limit 100 double has_many_through    6.7 (±29.8%) i/s -        251 in  40.171552s
limit 1000 has_many                  6.5 (±30.8%) i/s -        238 in  40.083358s
limit 1000 has_many_through          1.1  (±0.0%) i/s -         45 in  40.288512s
limit 1000 double has_many_through   0.6  (±0.0%) i/s -         25 in  40.958853s
Rafael Mendonça França

OMG! Still slower.
