-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support distributed preloading calls #45724
base: main
Are you sure you want to change the base?
Conversation
Prior to this commit, calling `includes` or `preload` on an already-loaded relation or collection proxy would return a new relation that ignored the loaded records. This commit passes the loaded records along to the new relation so that it can use them when preloading an association. Consider the following code: ```ruby posts.each do |post| # ... end # ...some intervening code... posts.includes(:comments).each do |post| post.comments.each do |comment| # ... end end ``` Before this commit, the code would execute 3 queries: ``` SELECT "posts".* FROM "posts" SELECT "posts".* FROM "posts" SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (?, ?, ?, ?, ?) [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]] ``` After this commit, the code executes only 2 queries: ``` SELECT "posts".* FROM "posts" SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (?, ?, ?, ?, ?) [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]] ```
This commit modifies the `Preloader` to track preloaded associations in buckets, and, when preloading a downstream association, to preload that association for all owner records from the same bucket. The goal of this change is to allow individual `includes` and `preload` calls to be placed alongside the code that depends on them. This makes it easier to add a preload when needed or remove a preload when it is no longer needed. Consider the following code: ```ruby def render_posts(posts) posts.includes(:comments).each do |post| post.comments.each do |comment| # ... end end end def render_authors(authors) authors.includes(:posts).each do |author| render_posts(author.posts) end end render_authors(Author.all) ``` Before this commit, with N authors each having at least 1 post, the code would execute N + 2 queries. For example, with 5 authors: ``` SELECT "authors".* FROM "authors" SELECT "posts".* FROM "posts" WHERE "posts"."author_id" IN (?, ?, ?, ?, ?) [["author_id", 1], ["author_id", 2], ["author_id", 3], ["author_id", 4], ["author_id", 5]] SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 11]] SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 12]] SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 13]] SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 14]] SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 15]] ``` After this commit, the code executes only 3 queries: ``` SELECT "authors".* FROM "authors" SELECT "posts".* FROM "posts" WHERE "posts"."author_id" IN (?, ?, ?, ?, ?) [["author_id", 1], ["author_id", 2], ["author_id", 3], ["author_id", 4], ["author_id", 5]] SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (?, ?, ?, ?, ?) [["post_id", 11], ["post_id", 12], ["post_id", 13], ["post_id", 14], ["post_id", 15]] ```
Very cool idea. I think the final SQL output is similar to what you'd get if using goldiloader, without any |
@ghiculescu Thanks! 😃 If my understanding is correct, I think #45413 is closer to In this PR, I wanted to take a more conservative approach, to try to avoid situations where deeply nested code must defensively disable the feature for fear of loading too much data. Granted, that can still happen with this PR, but my hope is that restricting the feature to explicit (distributed) chains of |
Yeah, glad to see another approach around this. I took a shot at this style of api before I submitted my proposed approach. The feedback I received I would like to share. Many people thought it was not a great ergonomic to have to opt in to the include since that is more frequently the preferred mode and they preferred an opt out. The tricky thing I think is balancing the shortest code to write that minimizes mistakes we care about most. Large data loads could be a problem but which is our api optimized to fix? Should it more eagerly load to prevent the n+1 or should it eagerly optimize for large data flows and make fixing n+1 queries an opt in? |
I do wonder if perhaps a new api is needed that does not exist. The concern we are working with here is fairly well understood when you maybe loading a lot of records directly from the database, we use the batch api for this. What if the associations allowed for a batch like method to be called? Something like
This would allow the user to limit the number of things brought back from the database in a familiar way and also allow dynamic includes to be the default. It also provides a familiar api for working with large data sets. |
This PR is split into two commits.
The 1st commit allows records from already-loaded relations to be used when performing post-hoc preloads of downstream associations.
The 2nd commit modifies the
Preloader
to track preloaded associations in buckets, and, when preloading a downstream association, to preload that association for all owner records from the same bucket.The goal of this PR is to allow individual
includes
andpreload
calls to be placed alongside the code that depends on them. This makes it easier to add a preload when needed or remove a preload when it is no longer needed.Consider the following code:
Before the 1st commit, with N authors each having at least 1 post, the code would execute 2N + 2 queries. For example, with 5 authors:
After the 1st commit, the code executes N + 2 queries:
After the 2nd commit, the code executes 3 queries:
Related: #45161, #45413, #45231. The primary difference of this PR is that users must still opt in to preloading via
includes
orpreload
. The intent is to allow fine-grained control over when preloading happens, but to make cascading automatic.