Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added AR associations lazy preloading #33527

Closed

Conversation

@realglebivanov
Copy link

@realglebivanov realglebivanov commented Aug 3, 2018

Feature is requested by https://cultofmartians.com/

Suppose that you have the following two Active Record models:

  class Author < ActiveRecord::Base
    # columns: name, age
    has_many :books
  end

  class Book < ActiveRecord::Base
    # columns: title, sales, author_id
  end

When you load an author with all associated books Active Record will make
only one query:

  authors = Author.lazy_preload(:books).where(name: ['bell hooks', 'Homer']).to_a
  => SELECT `authors`.* FROM `authors` WHERE `name` IN ('bell hooks', 'Homer')

Then you probably will want to retrieve books while iterating authors

  authors.each { |author| read author.books }

And only after books method invocation second query will be executed
to load all books for authors in the collection:

=> SELECT `books`.* FROM `books` WHERE `author_id` IN (2, 5)

This also works for nested association preloading, what can be useful for writing e.g. graphql apis.

@rails-bot
Copy link

@rails-bot rails-bot commented Aug 3, 2018

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @kamipo (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

Loading

@@ -31,6 +31,16 @@ def initialize(owner, reflection)
reset_scope
end

# Checks whether lazy eager load of the association is possible
# and tries to preload if it is so.
def reader
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reader is called every time we access the association; what about injecting this behaviour into load_target method, which is called only if association hasn't been loaded yet? This way you don't have to modify Association subclasses.

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a nicer solution, but I will have to modify subclasses anyway.

Loading

# Checks whether lazy eager load of the association is possible
# and tries to preload if it is so.
def reader
LazyPreloader::WeakRegistry.get(@owner).tap do |preloader|
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use accessors (we do already have them) instead of the instance variables (@owner, @reflection).

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Loading

private

def loaded_records(association)
@records.flat_map { |record| record.send association }
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why flat_map? Isn't@records always an array?

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given association of the record can be an array too, so we will have an array of arrays

Loading

Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using record.association(association).target? That would be more performant since we don't have to pass through all the reader -> load_target process (even if we do not actually need to load anything).

Loading

@@ -621,9 +621,9 @@ def alias_tracker(joins = [], aliases = nil) # :nodoc:
def preload_associations(records) # :nodoc:
preload = preload_values
preload += includes_values unless eager_loading?
preloader = nil
preloader = build_preloader
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we build Preloader for every relation, even if preload_values and lazy_preload_values are empty. We can fix this by checking for the presence of the values to preload (lazily or not); smth like: return if preload.empty? && lazy_preload_values.empty?

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Loading

require "models/tagging"

class EagerLazyPreloadNestedTest < ActiveRecord::TestCase
NUM_SIMPLE_OBJS = 50
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for creating so many records for every test?

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to be sure there were few queries executed instead of hundreds. Having hundreds records is more visual in that case and more unlikely will give false positive test results.

Loading

NUM_SIMPLE_OBJS = 50
NUM_SHAPE_EXPRESSIONS = 100

module Remembered
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding lazy_preload tests to eager_load_nested_include_test.rb to avoid this duplication?

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Loading

# => SELECT `books`.* FROM `books` WHERE `author_id` IN (2, 5)
#
class WeakRegistry
@@weakmap = ::ObjectSpace::WeakMap.new
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about WeakMap, it's considered to be private; Ruby docs suppose using WeakRef instead. But that would bring the complexity of cleaning up dead refs.

Maybe, a simpler alternative would be to store the preloader within the owner as an instance variable.

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I did it the way you asked, but I got many warnings complaining about not initialized instance variable.

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched back to an instance variable solution.

Loading

@realglebivanov
Copy link
Author

@realglebivanov realglebivanov commented Aug 4, 2018

Well, I think I have fixed all issues you noticed and this PR needs another review.

Loading

@matthewd
Copy link
Member

@matthewd matthewd commented Aug 4, 2018

This is a great idea, thanks! It was actually on my todo list for after we land #32136.

I haven't dug into the implementation yet, but first a design question: is there any reason this shouldn't be the default/only behaviour for preloading?

Loading

@realglebivanov
Copy link
Author

@realglebivanov realglebivanov commented Aug 4, 2018

Thanks! I think we should leave a choice to a client of our code - he probably wants to preload "everything at once" and doesn't want to load any extra associations unless he explicitly declares that.

It may seem like my reason is not strong, but I dont't like libraries that are trying to decide for me something should be totally optional.

Loading

@realglebivanov realglebivanov changed the title [Lazy-Preloading] Added AR associations lazy preloading Added AR associations lazy preloading Aug 4, 2018
@matthewd
Copy link
Member

@matthewd matthewd commented Aug 4, 2018

"We decided for you" is kinda Rails' thing 😉

We already decide how and when to do the preloading, and we silently defer relation queries until they're absolutely needed... so to me, it seems like a logical extension of that behaviour for us to leave these queries, too, until the association is accessed.

For a slightly more concrete version of my question: if we keep them separate, but make this behaviour preload and rename the current one to preload_immediately, and then describe the reasons you'd choose preload_immediately in the documentation... what would that documentation say?

Loading

@realglebivanov
Copy link
Author

@realglebivanov realglebivanov commented Aug 4, 2018

According to what I have seen in preloading logic, preloading runs when "main" ActiveRecord::Relation's query is executed, so preloading runs always if "main" query is executed. What I tried to achieve is to defer preloading to the moment when it is really needed.

I'm not very good at writing documentation, but I think it's possible to explain this idea.

But I agree with you that it's possible to make this behavior default without people getting mad about it.

Loading

@@ -31,6 +31,18 @@ def initialize(owner, reflection)
reset_scope
end

# Checks whether lazy eager load of the association is possible
# and tries to preload if it is so.
def reader; end
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this method stub?

Looks like the comment above is for the lazy_preload

Loading

private

def loaded_records(association)
@records.flat_map { |record| record.send association }
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using record.association(association).target? That would be more performant since we don't have to pass through all the reader -> load_target process (even if we do not actually need to load anything).

Loading

#
# => SELECT `books`.* FROM `books` WHERE `author_id` IN (2, 5)
#
class WeakRegistry
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no longer a weak registry, isn't it? This naming is a little bit confusing.

What about calling the methods store and fetch, and probably change the behaviour of the latter one to:

def fetch
  if record.instance_variable_defined?(:@_lazy_preloader)
    # here we can assume that if the variable exists then it's not nil
    # so we don't have to nil-check later
    yield record.instance_variable_get :@_lazy_preloader
  end
end

# and the use it like this
def lazy_preload
  LazyPreloader::Registry.fetch(owner) do |preloader|
    preloader.preload reflection.name if preloader.should_load?(reflection.name)
  end
end

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, thank you for your comments!

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually mistake was in naming Registry WeakRegistry, because weak is kind of low-level implementation detail, which is unnecessary to know for users of registry.

Loading

assert_queries(1) { res.records }
assert_equal NUM_SHAPE_EXPRESSIONS, res.size

assert_queries(4) do
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great to have some comments here: why four?

Also, shouldn't that be 2? One for paint records and one for non_poly records?

Loading

Copy link
Author

@realglebivanov realglebivanov Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data for each class in this test is stored in separate tables, so there are a bit more queries than you have expected to see.

Loading

Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok.

Loading

@@ -99,6 +102,25 @@ def test_include_query
end
end
end

def test_lazy_preload_query
res = ShapeExpression.all.merge!(lazy_preload: [ :shape, { paint: :non_poly } ])
Copy link
Contributor

@palkan palkan Aug 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test covers only singular association. I think we need one for collection associations too, like: Squre.all.lazy_preload(shape_expressions: { paint: :non_poly })

Loading

@palkan
Copy link
Contributor

@palkan palkan commented Aug 4, 2018

+1 for making this behaviour a default in Rails 6. Couldn't find a reason for using preload_immediately so far 🤔

Loading

@realglebivanov
Copy link
Author

@realglebivanov realglebivanov commented Aug 4, 2018

Ok. I'm working on this behavior as a default currently.

Loading

@lsylvester
Copy link
Contributor

@lsylvester lsylvester commented Aug 6, 2018

Hi, this feature looks great.

There are a couple of instances where I think that the immediate preloading would still be useful.

When calling explain on a relation with preload, the explain plans are printed for not just the main query but also each of the preloading queries. With lazy_preload I think it would become a bit tricky to generate these explain plans.

Also, if you are iterating over a large set of records with find_each or find_in_batches, and keeping references to some of them then this could result in a fair amount of memory bloat as each record would contain references to each other record in each batch. Sample code below to explain what I am talking about:

result = []
Book.lazy_preload(:author).find_each do |book|
  result << book if some_condition?(book)
end

I think that last preloading is good default behaviour, but there should be a way to get immediate preloading to occur. I can't see any reason that you would need to do both with the same relation, and I think that for the case of explain it would be good to be able to take a relation that is using lazy preloading and change it to be immediate preloading just for the purposes of running the explain plan. Possibly some thing like:

my_association.preload_strategy(:immeditate).explain

where :lazy is the default strategy.

Loading

@palkan
Copy link
Contributor

@palkan palkan commented Aug 6, 2018

Good points @lsylvester 👍

I think, the current implementation of lazy_preload should work with find_in_batches without any change (since each batch is using its own Relation instance => its own preloader). @realglebivanov WDYT?

The explain case is more tricky. Probably, we can check for ExplainRegistry.collect = true in preload_associations and fallback to immediate preload. Though it doesn't look like a good idea to check for ExplainRegistry in Preloader.

Loading

@realglebivanov
Copy link
Author

@realglebivanov realglebivanov commented Aug 6, 2018

@lsylvester, Thank you for your observations!

@palkan

Well, I think the problem is that whole batch wont be GC'ed if one record remains referenced.
This can be fixed with switching back to weak references(in Registry) and checking in preloader whether a record is still should be affected by lazy preloading(not GC'ed in @records array).

I think there's no need in letting user determine which stategy should be used. We still can swap to immediate strategy right in Relation#preload_associations by checking whether ExplainRegistry is collecting queries - we can choose between calling preload and lazy_preload on Preloader instance.

Loading

def self.store(record, instance)
record.instance_variable_set :@_lazy_preloader, instance
reference = ::WeakRef.new record
Copy link
Contributor

@lsylvester lsylvester Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the WeakRefs need to be for the collection of records stored in the LazyPreloader at https://github.com/rails/rails/pull/33527/files#diff-a96765768ab0d3346409a9af7447ccf3R56. I don't think that there was anything wrong with storing the preloaded against each of the records, unless I have missed something.

Loading

@lsylvester
Copy link
Contributor

@lsylvester lsylvester commented Aug 7, 2018

@realglebivanov I have made a test to ensure that lazy preloader isn't preventing records from being garbage collected. It is at https://github.com/lsylvester/rails/blob/884a803fca3b0857bcadc4042591ece85a17ba56/activerecord/test/cases/relations_test.rb#L2030.

I am not sure how dependent this tests passing is on the particulars of the garbage collector, but it seems to work for me reliably.

Feel free to use this test, or any of the other code in my branch at realglebivanov/rails@feature-Lazy-Preloading...lsylvester:feature-Lazy-Preloading for this work.

Loading

@realglebivanov
Copy link
Author

@realglebivanov realglebivanov commented Aug 7, 2018

@lsylvester I realized that there is no need in WeakRefs too - we can store object_ids array in LazyPreloader and retrieve objects when needed with _id2ref. No tricks with finalizers are needed in that case as I have understood. Is usage of _id2ref considered a good practice?

Loading

@matthewd
Copy link
Member

@matthewd matthewd commented Aug 7, 2018

Is usage of _id2ref considered a good practice?

No; use a WeakRef -- it's a higher level abstraction over the same concept.

I haven't thought it through properly, but I think we can do it with a single weakref from the record back to its cohort -- where each member of the cohort can reference the same instance of WeakRef.

Loading

@lsylvester
Copy link
Contributor

@lsylvester lsylvester commented Aug 8, 2018

@realglebivanov I think that the object_ids get reused once the the object has been GCed, so _id2ref would end up returning the incorrect object.

@matthewd I am not sure how a single WeakRef would work. If each record only has a week ref to the preloader, then the preloader could get GCed at any time and the app would fall back to the N+1 query strategy.

Loading

@matthewd
Copy link
Member

@matthewd matthewd commented Aug 8, 2018

@lsylvester my theory was that in the simple case of Post.preload(:comments), each Post could have a weakref back to the relation (or a separate preloader-type object, that is strongly referenced by the relation); if the relation is GCed, then it seems unlikely anyone is looping over it. Maybe that's overly presumptive though... Post.preload(:comments).select(&:good?).flat_map(&:comments).map(&:title) might be inefficient, but it should still only run two queries, no matter when GC occurs.

In that case, I'd closely consider using WeakMap directly, if it can save us making an extra allocation per record.

Loading

@realglebivanov
Copy link
Author

@realglebivanov realglebivanov commented Aug 8, 2018

@matthewd, your case is a bit tricky - we will have problems with missing preloaders(preloaders will be GC'ed because they are in a WeakMap and nothing references them but WeakMap itself what doesn't prevent them from being GC'ed) as @lsylvester have said.

Or we should decide then when our preloaders should be GC'ed, what is hardly possible, I think.

Post.preload(:comments).select(&:good?).flat_map(&:comments).map(&:title)

If garbage collection was executed between constructing query with select and executing it due to flat_map invocation preloaders would be GC'ed and n+1 problem would be back if we had a fallback behavior. But this would happen from time to time causing inconsistent behavior.

So, I think it's better to allocate N WeakRef's and holding weak references in preloader is kind of a decent decision in this situation.

Loading

@matthewd
Copy link
Member

@matthewd matthewd commented Aug 8, 2018

@realglebivanov right; the above was me explaining 1) what my earlier idea was, and 2) why it doesn't work.

Loading

@realglebivanov
Copy link
Author

@realglebivanov realglebivanov commented Aug 8, 2018

@matthewd, I understand now, thank you.

Loading

@@ -1911,7 +1918,7 @@ def test_presence
authors_count = Post.select("author_id, COUNT(author_id) AS num_posts").
group("author_id").order("author_id").includes(:author).to_a

assert_no_queries do
assert_queries(1) do
Copy link
Contributor

@robotdana robotdana Aug 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this and all other examples original query should also be wrapped in an assert_queries(1) block to ensure there are no cases where unnecessary queries are happening

Loading

@rails-bot
Copy link

@rails-bot rails-bot bot commented Dec 18, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants