Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with Associations::Preloader used as a Public API #32140

Open
xuorig opened this issue Feb 28, 2018 · 11 comments
Open

Issues with Associations::Preloader used as a Public API #32140

xuorig opened this issue Feb 28, 2018 · 11 comments
Labels

Comments

@xuorig
Copy link

xuorig commented Feb 28, 2018

Recently @eileencodes made me aware of this PullRequest #32136 and plans to possibly make Preloader a Public API.

We've been using Associations:Preloader at GitHub as part of our https://github.com/Shopify/graphql-batch setup to preload associations during GraphQL queries. Over the past month we've hit some pretty nasty issues with it. I believe these may have to be fixed before making the Preloader a public concern.

Loading already loaded records

The main issue is this line: https://github.com/rails/rails/blob/master/activerecord/lib/active_record/associations/preloader.rb#L180

When preloading an association on a list of records, if this association is already loaded on the first record of the list, it is assumed that the whole list is loaded. In these cases, a AlreadyLoaded preloader is returned, and uses association(name).target to read the preloaded records.

In a public setting, I don't think we can make this assumption, in fact, using preload with a list of records might result in this scenario:

a = Record.create!
b = Record.create!
c = Record.create!

a.some_association.create!

preloaders = ActiveRecord::Associations::Preloader.new.preload([a,b,c], [:some_association])

preloaders.first.preloaded_records
=> [Only the association for a is preloaded]

At GitHub we worked around this problem by always checking if the association was already loaded, before passing a record to Preloader, filtering them out if needed. That solved most issues until we hit issues that were out of our control.

Take through associations for example:
https://github.com/rails/rails/blob/master/activerecord/lib/active_record/associations/preloader/through_association.rb#L8-L14

If we take

has_many :patients, through: :appointments , source: :client

First, appointments are preloaded on patients. Then, client is preloaded on all patients. If the first appointment's client was already loaded, we'll get back only the client.

When returning the preloaded records, we'll try to get the target association on all records, but the whole tail of the list of records will be nil because they weren't actually loaded.

In certain versions of Rails, association.reader is used instead of target if the association is not loaded. This is almost worst because it will silently generate N+1's.

Another gotcha this behaviour introduces is that you cannot use Preloader with different types of records / associations.

For example, one issue we hit was that we were preloading an association on two different types of records at the same time:

Preloader.new.preload([doctor, patient, patient_b], [:appointment])

In our case, appoitments on one of the records was a has_one through:, and the other one was a belongs to. During the has_one through preloading logic, an association was loaded on the other kind of record. When it was time to load the second kind of record, since the association was already loaded for the first record, Preloader was skipping the whole list again.

If needed, I can provide failing tests for some of these tricky scenarios.

Preloading different object_ids

Another smaller issue is that the preloader currently uniq! the records before preloading them. This means that passing multiple objects for the same record will result in only the first record being preloaded.

We work around that by deduping the records before passing them to Preloader, and we copy over the associations after.

Takeaways

Currently, Rails assumes the preloader is ran in a certain context (its a private API right now after all). It breaks quickly when its used in other ways. It would be really awesome for us to make it a Public API, and it would be great to find a way around these gotchas!

cc @eileencodes @tenderlove

@rafaelfranca
Copy link
Member

When preloading an association on a list of records, if this association is already loaded on the first record of the list, it is assumed that the whole list is loaded.

We need to fix this case.

preloading an association on two different types of records at the same time:

This is not going to be allowed by the public API, it will raise in case you pass incompatible objects.
https://github.com/rails/rails/pull/32136/files#diff-bf6dd6226db3aab589916f09236881c7R1198

Preloading different object_ids

Also should be fixed.

@matthewd
Copy link
Member

matthewd commented Mar 1, 2018

it will raise in case you pass incompatible objects

Depends where you call it; it'd be possible to have two members of an inheritance hierarchy with different definitions of an association. And it's currently [deliberately] callable on AR::Base, which makes anything possible.

The check is there to disallow obviously-wrong things like User.load_associations(some_companies), but doesn't ensure strict homogeneity.

But I think this is just a specialization of the "first record is loaded but others aren't" problem, so if we fix that this one should be fine too.

cc @dinahshi

@dinahshi
Copy link
Contributor

dinahshi commented Mar 1, 2018

In the case of an inheritance hierarchy, I think a child would typically inherit all associations from its parent. And #load_association being called at the parent level would raise errors for child elements.

However, that doesn't address the AR::Base case. I actually like the idea of homogeneity because a call to #preload implies one query for one listed association. The cognitive load of mixing different types of records that happen to have common associations (or common associations names) is pretty high for me as I'm reading code but maybe that's just because I'm not used to it. That's another one of the reasons I like exposing Preloader via Relation rather than viewing it as a separate service. Discovered that Preloader already groups records by association class to handle non-homogeneous collections! https://github.com/rails/rails/blob/master/activerecord/lib/active_record/associations/preloader.rb#L139 I think it makes sense to expose this to public API; I will remove the incompatible object check.

@rails-bot
Copy link

rails-bot bot commented Jun 6, 2018

This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 5-2-stable branch or on master, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

@rails-bot rails-bot bot removed stale labels Jun 7, 2018
@eileencodes eileencodes added this to the 6.0.0 milestone Jun 7, 2018
stanhu pushed a commit to gitlabhq/gitlabhq that referenced this issue Aug 28, 2018
Whether the preloading belongs in the service or the controller is arguable,
here. As the service is only used for one controller action, it seems reasonable
to put it in the service, but that is not a definitive answer.

Adding the preloads for MR project routes here doesn't seem to work, perhaps
because of rails/rails#32140.
jdbean pushed a commit to jdbean/gitlab-ce that referenced this issue Oct 26, 2018
Whether the preloading belongs in the service or the controller is arguable,
here. As the service is only used for one controller action, it seems reasonable
to put it in the service, but that is not a definitive answer.

Adding the preloads for MR project routes here doesn't seem to work, perhaps
because of rails/rails#32140.
@rafaelfranca rafaelfranca removed this from the 6.0.0 milestone Apr 12, 2019
@stephenh
Copy link

FWIW we're also using Preloader as a "public" API and hitting the "if 1st record is loaded, return AlreadyLoaded" gotcha. It sounds like there are larger issues with making Preloader a completely public/blessed API, but FWIW just incrementally fixing this one aspect of it would be helpful.

Is that something that might happen?

@bogdan
Copy link
Contributor

bogdan commented May 31, 2019

You can not just change the records.first.association.loaded? into records.all?{association.loaded?} because it will just cause loaded associations to be loaded twice. I personally don't see a quick bug free way of doing that except just filter them on the application end and never pass nested associations into direct Preloader.new.preload call:

Preloader.new(array.reject{|r| r.association.loaded?), :association)

@stephenh
Copy link

@bogdan thanks for the quick reply! This is perhaps a naive question, but could that be done within Preloader itself, like preloaders_on or what not? We're looking for a single slice point, instead of having to have all of our preloader calls do the same reject-if-loaded logic.

@bogdan
Copy link
Contributor

bogdan commented May 31, 2019

No, because it will skip nested associations if any. So you would need something like this:

if records.partition{|r| r.send(association).loaded?}.all?(&:any?) && nested_associations_given_to_preloader_for?(association)
  raise NotImplementedError
else
  records = records.reject{|r| r.send(association).loaded?}
end

The logic for nested_associations_given_to_preloader_for? seems unfeasible in a trivial way.

@stephenh
Copy link

@bogdan ah okay, thanks again for the reply and explanation; that makes sense. We'll have to think about this a bit.

I think I now understand an initially-cryptic comment in one of our manual-Preloader codepaths where we were just calling records.each { |r| r.send(:clear_association_cache) } for (initially to me) unclear reasons.

In retrospect, it was probably due to this behavior, i.e. that line gives the subsequent Preloader call a clean slate that forces the preloads that it specifically wants to be executed.

But, at the same time, it's throwing away any preloads that previous callers had setup.

This seems surprisingly nuanced/hard right now with ActiveRecord; i.e. passing around a non-trivially sized (either # of instances or continually-changing-as-features-change) list/tree of active record instances and letting disparate parts of the codebase perform the specific preloads that they need.

@bogdan
Copy link
Contributor

bogdan commented Jun 3, 2019

The way we achieved that in our project: disallow nested associations in manual preloader execution.
If you would need nested associations, you can map-flatten-uniq-compact parent association records and perform non-nested preloading again.

@michaelgpearce
Copy link

I wrote a wrapper around ActiveRecord::Associations::Preloader that attempts to address the bug referenced in this issue.

https://gist.github.com/michaelgpearce/52446b4de80664cfb533ec766f601854

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants