Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to skip joins for associations. #41937

Merged
merged 1 commit into from
Apr 19, 2021

Conversation

eileencodes
Copy link
Member

@eileencodes eileencodes commented Apr 12, 2021

In a multiple database application, associations can't join across
databases. When set, this option tells Rails to make 2 or more queries
rather than using joins for associations.

Set the option on a has many through association:

class Dog
  has_many :treats, through: :humans, disable_joins: true
  has_many :humans
end

Then instead of generating join SQL, two queries are used for @dog.treats:

SELECT "humans"."id" FROM "humans" WHERE "humans"."dog_id" = ?  [["dog_id", 1]]
SELECT "treats".* FROM "treats" WHERE "treats"."human_id" IN (?, ?, ?)  [["human_id", 1], ["human_id", 2], ["human_id", 3]]

This code is extracted from a gem we use internally at GitHub which
means the implementation here is used in production daily and isn't
experimental.

I often get the question "why can't Rails do this automatically" so I
figured I'd include the answer in the commit. Rails can't do this
automatically because associations are lazily loaded. dog.treats needs
to load Dog, then Human and then Treats. When dog.treats is
called Rails pre-generates the SQL that will be run and puts that
information into a reflection object. Because the SQL parts are pre-generated,
as soon as dog.treats is loaded it's too late to skip a join. The join
is already available on the object and that join is what's run to load
treats from dog through humans. I think the only way to avoid setting
an option on the association is to rewrite how and when the SQL is
generated for associations which is a large undertaking. Basically the
way that Active Record associations are designed, it is currently
impossible to have Rails figure out to not join (loading the association
will cause the join to occur, and that join will raise an error if the
models don't live in the same db).

The original implementation was written by me and Aaron. Lee helped port
over tests, and I refactored the extraction to better match Rails style.

Co-authored-by: Lee Quarella leequarella@gmail.com
Co-authored-by: Aaron Patterson aaron@rubyonrails.org

@eileencodes
Copy link
Member Author

I went back and forth about the option a few times - in our gem it's split: true but I didn't think that was descriptive enough. I went with skip_joins: true for this extraction.


```ruby
class Dog
has_many :treats, through: :humans, skip_joins: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think :disable_joins is more descriptive on the intention of disabling the join in the association table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, will update tomorrow.

records = @records

if records.length > TOO_MANY_RECORDS
warn("You've requested to order #{records.length} in memory. This may have an impact on the performance of this query. Use with caution.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this warning will be useful. It will be buried in a lot of logs and depending in the configuration, since it is not going to the logger but to STDOUT it will just not be seeing. I think it is better to remove.

@eileencodes eileencodes force-pushed the has_many_through_skipping_joins branch from abf5635 to 69c9e80 Compare April 13, 2021 14:09
@eileencodes
Copy link
Member Author

So the postgres tests that are failing are related to that change I mistakingly made about a month ago where the resource_id is a string (introduced in this PR #14855).

The tests that are failing aren't failing on the disabled joins, they're actually failing on the joins because it's trying to compare a bigint with a string.

HasManyThroughDisableJoinsAssociationsTest#test_order_applied_in_double_join:
ActiveRecord::StatementInvalid: PG::UndefinedFunction: ERROR:  operator does not exist: bigint = character varying
LINE 1: ..."members" INNER JOIN "comments" ON "members"."id" = "comment...
                                                             ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

Is this supposed to work or is it supposed to fail? I'm not sure what the correct fix here is.

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some skip_joins in the doc but other than that :shipit:


```ruby
class Dog < AnimalsRecord
has_many :treats, through: :humans, skip_joins: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
has_many :treats, through: :humans, skip_joins: true
has_many :treats, through: :humans, disable_joins: true


As of Rails 7.0+, Active Record has an option for handling associations that would perform
a join across multiple databases. If you have a has many through association that you want to
skip joining and perform 2 or more queries, pass the `skip_joins: true` option.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
skip joining and perform 2 or more queries, pass the `skip_joins: true` option.
skip joining and perform 2 or more queries, pass the `disable_joins: true` option.

@eileencodes eileencodes force-pushed the has_many_through_skipping_joins branch from 69c9e80 to 7fd9e5a Compare April 13, 2021 19:39
@rafaelfranca
Copy link
Member

For what I could get this test is going to fail because the join really can't work with type cast in the query. The type added in #14855 is correct thought, it will not work for joins but it should work for association accessors which is what that PR is fixing.

Maybe we should change the test of that PR to use a different model/association or this PR should use a different model/association?

@eileencodes
Copy link
Member Author

Maybe we should change the test of that PR to use a different model/association or this PR should use a different model/association?

Yea I think that makes sense. I'll do that tomorrow or later in the week.

@matthewd
Copy link
Member

Is this generally necessary? In theory we already know the two models live in different databases, so it feels like we should be able to infer that an in-DB join is inappropriate 🤔

(Note that's not "I don't think we should have this option", but "I don't think people should normally need to specify it themselves"... am I missing something that makes it impractical for us to detect this situation automatically, or is that just a future possible improvement?)

@eileencodes
Copy link
Member Author

eileencodes commented Apr 14, 2021

I don't think people should normally need to specify it themselves"... am I missing something that makes it impractical for us to detect this situation automatically, or is that just a future possible improvement?

Aaron and I spent a lot of time trying to skip the whole needing to specify the disable joins, but it's not really possible with how Rails association loading works. The problem is that associations are lazily loaded so by the time the dog.treats is actually loaded it's too late to make two queries - to know they're in different dbs means loading the models and load the models means loading the db connection and when that happens Rails automatically does a join. By the time Rails knows they're in different dbs, it's too late so we have to tell it ahead of time to not join. We can probably fix this long term, but not in any immediate future because changing it would require changing how and when associations are loaded.

Comment on lines 9 to 10
def initialize(klass, *args)
key, ids = args
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def initialize(klass, *args)
key, ids = args
def initialize(klass, key, ids)

@eileencodes eileencodes force-pushed the has_many_through_skipping_joins branch 2 times, most recently from de24b93 to 1e20d0d Compare April 15, 2021 21:19
@eileencodes eileencodes force-pushed the has_many_through_skipping_joins branch from 1e20d0d to e2302dd Compare April 19, 2021 14:52
In a multiple database application, associations can't join across
databases. When set, this option tells Rails to make 2 or more queries
rather than using joins for associations.

Set the option on a has many through association:

```ruby
class Dog
  has_many :treats, through: :humans, disable_joins: true
  has_many :humans
end
```

Then instead of generating join SQL, two queries are used for `@dog.treats`:

```
SELECT "humans"."id" FROM "humans" WHERE "humans"."dog_id" = ?  [["dog_id", 1]]
SELECT "treats".* FROM "treats" WHERE "treats"."human_id" IN (?, ?, ?)  [["human_id", 1], ["human_id", 2], ["human_id", 3]]
```

This code is extracted from a gem we use internally at GitHub which
means the implementation here is used in production daily and isn't
experimental.

I often get the question "why can't Rails do this automatically" so I
figured I'd include the answer in the commit. Rails can't do this
automatically because associations are lazily loaded. `dog.treats` needs
to load `Dog`, then `Human` and then `Treats`. When `dog.treats` is
called Rails pre-generates the SQL that will be run and puts that
information into a reflection object. Because the SQL parts are pre-generated,
as soon as `dog.treats` is loaded it's too late to skip a join. The join
is already available on the object and that join is what's run to load
`treats` from `dog` through `humans`. I think the only way to avoid setting
an option on the association is to rewrite how and when the SQL is
generated for associations which is a large undertaking. Basically the
way that Active Record associations are designed, it is currently
impossible to have Rails figure out to not join (loading the association
will cause the join to occur, and that join will raise an error if the
models don't live in the same db).

The original implementation was written by me and Aaron. Lee helped port
over tests, and I refactored the extraction to better match Rails style.

Co-authored-by: Lee Quarella <leequarella@gmail.com>
Co-authored-by: Aaron Patterson <aaron@rubyonrails.org>
@eileencodes eileencodes force-pushed the has_many_through_skipping_joins branch from e2302dd to de6b4ef Compare April 19, 2021 15:18
@eileencodes eileencodes merged commit 6ebd134 into rails:main Apr 19, 2021
@eileencodes eileencodes deleted the has_many_through_skipping_joins branch April 19, 2021 16:08
@cheshire137
Copy link
Contributor

cheshire137 commented Apr 23, 2021

Thank you for adding this! I would love to have this option for has_one ... through: relations as well at some point.

Edit: I've opened #42079 to do this.

@djpate
Copy link

djpate commented Sep 14, 2023

@eileencodes Just curious if there is a reason why this was added only on has_many through:... and not on regular has_many as well?

@eileencodes
Copy link
Member Author

Because a regular has_many doesn't produce a join query.

@djpate
Copy link

djpate commented Sep 14, 2023

Got it. thank you!

@ryders
Copy link

ryders commented Sep 29, 2023

Because a regular has_many doesn't produce a join query.

@eileencodes I was wondering if you could help me clarify this;

According to my findings, the has_many doesn't produce a join query only when there are no (WHERE) conditions.

so considering the following pseudo-rails-code

Customer->has_many->Orders
Customer->has_many->Warehouses :through => Orders
Customer->has_many->Emails

Then Customer.includes([:orders,:emails]) is correctly generating TWO sql queries.

However, Customer.joins(:warehouses).where(:warehouses => { :country_id => 123}).includes([:orders,:emails]) will generate a single SQL -- Rails essentially attempts at applying the warehouse WHERE condition on BOTH the emails, and the orders. This seems to happen only when the condition is looking up one of the tables specified in .includes(), any other where conditions (eg .where(:active => true) is fine and correctly generates multiple queries.

Are you able to confirm I'm not missing something here? Any chance you can comment on a possible workaround, or suggest any different approach?

(this is a problem for me because Warehouse connects_to a different db. We are nesting objects in an API response so we really want to lean on rails as much as possible and no do this by hand!). Works great without a where condition! 🤔

Many thanks, and my apologies for reviving an old thread..

@eileencodes
Copy link
Member Author

eileencodes commented Sep 29, 2023

I don't have the bandwidth to offer application support. If there's a bug please open a new issue, if it's just you're not sure how to use the feature start a thread on the forum.

But calling joins on a query that you don't want to join doesn't make sense. You're literally asking Rails to perform a join.

@ryders
Copy link

ryders commented Oct 1, 2023

no problem. I did not ask for application support. I'm only suggesting your feedback was not quite accurate and was wondering if you could contribute any internal Rails knowledge to confirm my findings.

disable_joins was really partially implemented. Only for has_many - through, not for has_many relationships.

Rails DOES produce a join query with normal has_many queries, when a where condition is used and that where condition references a relationship.

So

Because a regular has_many doesn't produce a join query.

is not accurate

unless I'm missing something.

@eileencodes
Copy link
Member Author

unless I'm missing something

A has many in which you are not manually calling joins does not produce a join query.

Given the following associations:

class Post
  has_many :comments
end

class Comment
  belongs_to :post
end

Calling comments for a post does not produce a join query:

Post.first.comments

The SQL:

Post Load (0.0ms)  SELECT "posts".* FROM "posts" ORDER BY "posts"."id" ASC LIMIT ?  [["LIMIT", 1]]
Comment Load (0.0ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? /* loading for inspect */ LIMIT ?  [["post_id", 1], ["LIMIT", 11]]

@ryders
Copy link

ryders commented Oct 2, 2023

Thanks @eileencodes

Anyone else landing here with the same problem, please read my previous comment:

Rails DOES produce a join query with normal has_many queries, when a where condition is used and that where condition references a relationship.

As such:

Post.joins(:comments).where(:comments => { approved:true })

is what triggers a JOIN to be created instead (or variations thereof)

Post.joins(:comments).where(:comments => { approved:true }).first
Post.joins(:comments).where(:comments => { approved:true }).first.comments

The SQL:

Post Load (1.0ms)  SELECT "posts".* FROM "posts" INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."approved" = ? ORDER BY "posts"."id" ASC LIMIT ?  [["approved", 1], ["LIMIT", 1]]

This is all valid SQL and valid operations, however the lack of a disable_join option on the has_many will kick your arse if you're storing comments is in a different db (ref connects_to) which most definitely do require NO join, but instead multiple SQL calls for each has_many relationships -- just like has_many :through :disable_joins does. Hence the need for disable_joins on has_many (without :through).

Rails Documentation is arguably not wrong, but not exactly accurate either.

@eileencodes
Copy link
Member Author

But you're calling joins. Of course Rails is going to attempt a join, you're asking it to. If you don't want a has_many to join, don't call joins on it.

@ryders
Copy link

ryders commented Oct 14, 2023

Ha! Sorry, yes, my attempt to simplify my code has obviously led to using a badly chosen example! 🤣

After subsequent investigation I discovered that the source of the problem originated from some Rails magic (.includes more specifically) which sometimes forces and joins and sometimes does two separate queries. I ended up using .preload instead which works around the lack of a :disable_joins on has_many.

I was something along the lines of (don't have my notes with me atm)

1xincludes leads to a join, 
2xincludes leads to 2 separate queries, and
3xincludes leads to a join

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants