Skip to content

Implement Relation#load_async to schedule the query on the background thread pool#41372

Merged
eileencodes merged 1 commit into
rails:mainfrom
Shopify:ar-relation-async-query
Feb 16, 2021
Merged

Implement Relation#load_async to schedule the query on the background thread pool#41372
eileencodes merged 1 commit into
rails:mainfrom
Shopify:ar-relation-async-query

Conversation

@casperisfine
Copy link
Copy Markdown
Contributor

@casperisfine casperisfine commented Feb 8, 2021

This is a followup to #40037 (see that PR for complete context)

Relation#load_async schedules the query on the background thread pool, and returns self.

Comment thread activerecord/lib/active_record/relation.rb Outdated
@casperisfine casperisfine force-pushed the ar-relation-async-query branch from 085482a to a80388d Compare February 8, 2021 19:22
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very happy with having to store this in an instance variable for later use, but apply_join_dependency can sometimes trigger a query on its own, so if we were to call it twice, we'd generate an extra query in some cases.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for this case? Or can you elaborate a bit more what you mean by it "can sometimes trigger a query on its own"? I don't follow what scenarios this might happen in.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is exactly one test case: https://buildkite.com/rails/rails/builds/74758#40f90c22-7273-4458-a0db-71ec3fbf3638/1055-1064

The extra query is triggered here:

klass.connection.distinct_relation_for_primary_key(relation)

Which mean that in such case load_async would actually trigger a query in the main thread.

There might be a way to refactor this out, but I chose to prioritize a smaller more comprehensible PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning here is that some shared code might attempt to use .defer, and sometime might be called from inside a transaction, and sometimes outside.

So have have to make an API decision of either

  • Querying immediately
  • Just do nothing

I believe the later make more sense.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new name (which I agree with), I think it's probably better that load_async will still load even if it can't do it async.

In the async-possible case, the resulting relation here will behave like it was already loaded (by blocking if necessary to make that true)... I can't think of specific examples, but my gut says an un-loaded relation will behave less similarly in some edge cases.

That said, I can also see use cases where a knowing caller might want to say "start an async query for this if you can, because I think it's going to be used later... but if you'd need to block, then we'll just wait and see whether it's actually needed". (Association preload and "all Relation-shaped instance variables that get copied from a controller into a view context" being variously-reasonable examples there.) 🤔

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can find arguments for both behaviors too.

I think the most common case this will happen, will be in tests using transactional fixtures. So I'm thinking we should chose the solution that will make this fallbacks behave closer to production, in term of assert_queries and similar test assertions.

Using this reasoning, I'd say you are right and we should fallback by either straight calling load, or maybe by calling exec_main_query(async: false), as to simulate the query being performed, but the records not being instantiated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I pushed 2042a2914535551f165378e11da19fef98ee7fbb, which does call exec_main_query(async: false) if we're inside a transaction.

The pro is that the behavior is much closer to a query that would be properly scheduled asynchronously, as the records instantiation is still delayed.

The con is that it open the door for some weird behavior, as the main query is executed inside the transaction, but if you have preloads or something like that, and you iterate on the relation outside the transaction, they might be performed outside.

So maybe falling back to load might be better?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not possible to do an async query then we should go through the regular load path rather than a new future result path that isn't actually future. This makes the feature easier to debug and harder to make accidental mistakes.

I made this mistake in my demo app by calling to_a on the load_async (ie Dog.where(name: "fido #{i}").load_async.to_a). From a user experience perspective it was pretty confusing to debug why it said it was async but it was not actually async.

Additionally I made another mistake that revealed a bug with pluck and load_async. If you do Dog.where(name: "fido #{i}").load_async.pluck(:id) it results in duplicate queries. I now know this is not the correct way to use the feature but it's an easy mistake to make that results in incorrect behavior.

  Dog Load (0.3ms)  SELECT `dogs`.* FROM `dogs` WHERE `dogs`.`name` = 'fido 1'
  Dog Pluck (0.2ms)  SELECT `dogs`.`id` FROM `dogs` WHERE `dogs`.`name` = 'fido 1'

In the non-async code path (ie Dog.where(name: "fido #{i}")load.pluck(:login) ) Rails knows what to do with an already loaded relation and we don't get duplicate queries (see https://github.com/rails/rails/blob/main/activerecord/lib/active_record/relation/calculations.rb#L184-L186)

  Dog Load (0.3ms)  SELECT `dogs`.* FROM `dogs` WHERE `dogs`.`name` = 'fido 1'

If we go the previous path for relations that can't be loaded async, we'll have less edge cases to worry about. I think the mistake I made with pluck and to_a is an easy one to make and so in that case we should make sure the app behaves as it previously did rather than having potentially surprising non-async async behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pluck is a nice find:

    def pluck(*column_names)
      if loaded? && all_attributes?(column_names)
        return records.pluck(*column_names)
      end

Since load_async doesn't set @loaded = true but @scheduled = true, pluck doesn't know it could use the future.

I either need to go over all Relation methods to make sure they handle both loaded? and scheduled? states, or we could merge both states together.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this mistake in my demo app by calling to_a on the load_async [...] it was pretty confusing to debug why it said it was async but it was not actually async.

I think this is my fault, that part of the design was explained in the initial proof of concept, but I didn't mention it in the first actual PR.

Then wehn you do use the relation, either the query is completed and it just uses the results, or it is being executed and it waits for it to complete, or the thread pool was busy and it simply execute it in the foreground.

That third case is important, because of the way Ruby thread scheduling works, unless you do other unrelated IOs between the time you did call load_async and the time you actually need the result, it's entirely possible that the query have been sitting idle in the work queue without even having been sent to the database.

We could of course wait for the thread pool to execute it, but that would be a waste because at this stage we know that the main thread will be blocked on that result, and that it's free to execute the query itself.

So yes, it's very easy to misuse load_async, and we should be on the lookout for making easier to understand what's happening, but I do believe that behavior is the current one.

Now if the database clients had true native async APIs, we could send the query and return a response promise/future, but that's unlikely to happen any time soon.

Let me know if that behavior is still unclear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: pluck

Now load_async marks the Relation as loaded?, which fixes pluck and similar methods that behave differently based on wether the Relation was loaded.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something else we could do it to call Thread.pass right after we enqueue the query, this would increase the chance that the query get performed asynchronously.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So interestingly enough, calling Thread.pass caused the test of that fallback behavior to fail, which proves that it's actually effective at producing the behavior you expected in this simple scenario.

@casperisfine casperisfine force-pushed the ar-relation-async-query branch from a80388d to 901c4e6 Compare February 9, 2021 09:31
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some thought, I think I make sense to have defer work by default. It's good to cancel queries when we know we exited the context in which they are useful, but if we're in rails console or just a random rails runner script, there's no reason to not allow it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for this case? Or can you elaborate a bit more what you mean by it "can sometimes trigger a query on its own"? I don't follow what scenarios this might happen in.

Comment thread activerecord/lib/active_record/asynchronous_queries_tracker.rb Outdated
Comment thread activerecord/lib/active_record/relation.rb Outdated
@casperisfine casperisfine force-pushed the ar-relation-async-query branch 3 times, most recently from 022015d to 56ab574 Compare February 11, 2021 08:13
@casperisfine casperisfine changed the title Implement Relation#defer to schedule the query on the background thread pool Implement Relation#load_async to schedule the query on the background thread pool Feb 11, 2021
@casperisfine
Copy link
Copy Markdown
Contributor Author

I've put more thoughts in the @async_executor.

In #40037 I made it held by the ConnectionPool, because I wanted to guarantee some level of concurrency for each database, as in I wanted to prevent one database to use all the threads, making others database wait.

But now I don't think it's really worth the tradeoff. First the vast majority of Rails applications out there have a single database, so for them, a per pool executor or a global one don't matter.

Then for apps using horizontal sharding (like Shopify), you can have a large number of pools to similar shards, but you only ever query a single one during a request cycle. If you have say 30 shards, you'd end up with 30 executor with X threads each, most of them sitting idle the vast majority of the time, and I think we should rather keep the number of threads low, as I don't quite trust the scheduler to properly handle hundreds of idle threads.

@casperisfine casperisfine force-pushed the ar-relation-async-query branch from 9f23296 to 2042a29 Compare February 11, 2021 10:50
eileencodes added a commit to eileencodes/rails that referenced this pull request Feb 11, 2021
While testing out rails#41372 in a demo application I noticed that the
`ASYNC` portion wasn't getting added to the log. I tracked it down to
this method not passing `async: true` to `execute_query` so the `sql`
subscriber was always getting `async: false` for the payload.

Before:

```
(0.3ms)  SELECT * FROM dogs where name = 'fido 1'
(0.1ms)  SELECT * FROM dogs where name = 'fido 2'
(0.1ms)  SELECT * FROM dogs where name = 'fido 3'
```

After:

```
ASYNC  (0.3ms)  SELECT * FROM dogs where name = 'fido 1'
ASYNC  (0.2ms)  SELECT * FROM dogs where name = 'fido 2'
ASYNC  (0.2ms)  SELECT * FROM dogs where name = 'fido 3'
```

I also verified that this fixes `Dog Load` to `ASYNC Dog Load` in the
new PR. It will be easier to add a test for this functionality when
we merge rails#41372.
@casperisfine casperisfine force-pushed the ar-relation-async-query branch 4 times, most recently from 753aefe to d0fc4a8 Compare February 12, 2021 15:13
Comment thread activerecord/CHANGELOG.md Outdated
Comment thread activerecord/lib/active_record/asynchronous_queries_tracker.rb Outdated
… thread pool

This is built on previous async support added to Adapter#select_all.
@casperisfine casperisfine force-pushed the ar-relation-async-query branch from d0fc4a8 to 2a90104 Compare February 16, 2021 16:44
@casperisfine
Copy link
Copy Markdown
Contributor Author

@eileencodes code squashed, rebased, and about to be green on CI!

@eileencodes eileencodes merged commit 48effc7 into rails:main Feb 16, 2021
@casperisfine
Copy link
Copy Markdown
Contributor Author

casperisfine commented Feb 16, 2021 via email

records.empty?
else
!exists?
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious - why we dont want to use guard clause in that case?

#
# Post.where(published: true).load_async # => #<ActiveRecord::Relation>
def load_async
unless loaded?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of guard clause will simplify most of the conditional statements in the codebase.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion but we prefer to not use guard clauses.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a place where the core team explains why they don't want to use guard clauses? I did a quick search, but wasn't able to find anything and would love to hear the reasons.

casperisfine pushed a commit to Shopify/rails that referenced this pull request Mar 30, 2022
Followup: rails#41372

Something we knew we'd need when we implemented `Relation#load_async`
but that we chose to delay to have time to think about it.

Right now, Active Record async support is limited to "collection results",
but among the not so fast queries that would benefit from asynchronicity
you often find aggregates (e.g. `count`, `sum`, etc) as well as hand crafted
`find_by_sql` queries.

`load_async` was easy to add as an API, because `Relation` acts as a collection
so it was trivial to simply block whenever it was iterated while retaining total
API compatibility.

For aggregates and `find_by_sql`, we have no other choice but to return something
different in async mode, with its own API.

This proof of concept showcase what this API looks like for `Relation#count`:

```ruby
Post.where(published: true).count # => 2
promise = Post.where(published: true).async.count # => #<ActiveRecord::Promise status=pending>
promise.value # => 2
```

This API should be applicable to all aggregate methods, as well as all methods
returning a single record, or anything other than a `Relation`.
casperisfine pushed a commit to Shopify/rails that referenced this pull request Apr 14, 2022
Followup: rails#41372

Something we knew we'd need when we implemented `Relation#load_async`
but that we chose to delay to have time to think about it.

Right now, Active Record async support is limited to "collection results",
but among the not so fast queries that would benefit from asynchronicity
you often find aggregates (e.g. `count`, `sum`, etc) as well as hand crafted
`find_by_sql` queries.

`load_async` was easy to add as an API, because `Relation` acts as a collection
so it was trivial to simply block whenever it was iterated while retaining total
API compatibility.

For aggregates and `find_by_sql`, we have no other choice but to return something
different in async mode, with its own API.

This proof of concept showcase what this API looks like for `Relation#count`:

```ruby
Post.where(published: true).count # => 2
promise = Post.where(published: true).count(async: true) # => #<ActiveRecord::Promise status=pending>
promise.value # => 2
```

This API should be applicable to all aggregate methods, as well as all methods
returning a single record, or anything other than a `Relation`.
gmcgibbon added a commit to gmcgibbon/rails that referenced this pull request Nov 2, 2023
@rails rails deleted a comment from abdullaachilov Jan 7, 2024
dak2 added a commit to dak2/gem_rbs_collection that referenced this pull request Jan 13, 2025
dak2 added a commit to dak2/gem_rbs_collection that referenced this pull request Jan 13, 2025
github-actions Bot pushed a commit to ruby/gem_rbs_collection that referenced this pull request Jan 16, 2025
yoones pushed a commit to yoones/rails that referenced this pull request Mar 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants