Implement Relation#load_async to schedule the query on the background thread pool#41372
Conversation
44d6dab to
e733187
Compare
085482a to
a80388d
Compare
There was a problem hiding this comment.
I'm not very happy with having to store this in an instance variable for later use, but apply_join_dependency can sometimes trigger a query on its own, so if we were to call it twice, we'd generate an extra query in some cases.
There was a problem hiding this comment.
Is there a test for this case? Or can you elaborate a bit more what you mean by it "can sometimes trigger a query on its own"? I don't follow what scenarios this might happen in.
There was a problem hiding this comment.
There is exactly one test case: https://buildkite.com/rails/rails/builds/74758#40f90c22-7273-4458-a0db-71ec3fbf3638/1055-1064
The extra query is triggered here:
Which mean that in such case load_async would actually trigger a query in the main thread.
There might be a way to refactor this out, but I chose to prioritize a smaller more comprehensible PR.
There was a problem hiding this comment.
The reasoning here is that some shared code might attempt to use .defer, and sometime might be called from inside a transaction, and sometimes outside.
So have have to make an API decision of either
- Querying immediately
- Just do nothing
I believe the later make more sense.
There was a problem hiding this comment.
With the new name (which I agree with), I think it's probably better that load_async will still load even if it can't do it async.
In the async-possible case, the resulting relation here will behave like it was already loaded (by blocking if necessary to make that true)... I can't think of specific examples, but my gut says an un-loaded relation will behave less similarly in some edge cases.
That said, I can also see use cases where a knowing caller might want to say "start an async query for this if you can, because I think it's going to be used later... but if you'd need to block, then we'll just wait and see whether it's actually needed". (Association preload and "all Relation-shaped instance variables that get copied from a controller into a view context" being variously-reasonable examples there.) 🤔
There was a problem hiding this comment.
Yes, I can find arguments for both behaviors too.
I think the most common case this will happen, will be in tests using transactional fixtures. So I'm thinking we should chose the solution that will make this fallbacks behave closer to production, in term of assert_queries and similar test assertions.
Using this reasoning, I'd say you are right and we should fallback by either straight calling load, or maybe by calling exec_main_query(async: false), as to simulate the query being performed, but the records not being instantiated.
There was a problem hiding this comment.
So I pushed 2042a2914535551f165378e11da19fef98ee7fbb, which does call exec_main_query(async: false) if we're inside a transaction.
The pro is that the behavior is much closer to a query that would be properly scheduled asynchronously, as the records instantiation is still delayed.
The con is that it open the door for some weird behavior, as the main query is executed inside the transaction, but if you have preloads or something like that, and you iterate on the relation outside the transaction, they might be performed outside.
So maybe falling back to load might be better?
There was a problem hiding this comment.
If it's not possible to do an async query then we should go through the regular load path rather than a new future result path that isn't actually future. This makes the feature easier to debug and harder to make accidental mistakes.
I made this mistake in my demo app by calling to_a on the load_async (ie Dog.where(name: "fido #{i}").load_async.to_a). From a user experience perspective it was pretty confusing to debug why it said it was async but it was not actually async.
Additionally I made another mistake that revealed a bug with pluck and load_async. If you do Dog.where(name: "fido #{i}").load_async.pluck(:id) it results in duplicate queries. I now know this is not the correct way to use the feature but it's an easy mistake to make that results in incorrect behavior.
Dog Load (0.3ms) SELECT `dogs`.* FROM `dogs` WHERE `dogs`.`name` = 'fido 1'
Dog Pluck (0.2ms) SELECT `dogs`.`id` FROM `dogs` WHERE `dogs`.`name` = 'fido 1'
In the non-async code path (ie Dog.where(name: "fido #{i}")load.pluck(:login) ) Rails knows what to do with an already loaded relation and we don't get duplicate queries (see https://github.com/rails/rails/blob/main/activerecord/lib/active_record/relation/calculations.rb#L184-L186)
Dog Load (0.3ms) SELECT `dogs`.* FROM `dogs` WHERE `dogs`.`name` = 'fido 1'
If we go the previous path for relations that can't be loaded async, we'll have less edge cases to worry about. I think the mistake I made with pluck and to_a is an easy one to make and so in that case we should make sure the app behaves as it previously did rather than having potentially surprising non-async async behavior.
There was a problem hiding this comment.
The pluck is a nice find:
def pluck(*column_names)
if loaded? && all_attributes?(column_names)
return records.pluck(*column_names)
endSince load_async doesn't set @loaded = true but @scheduled = true, pluck doesn't know it could use the future.
I either need to go over all Relation methods to make sure they handle both loaded? and scheduled? states, or we could merge both states together.
There was a problem hiding this comment.
I made this mistake in my demo app by calling to_a on the load_async [...] it was pretty confusing to debug why it said it was async but it was not actually async.
I think this is my fault, that part of the design was explained in the initial proof of concept, but I didn't mention it in the first actual PR.
Then wehn you do use the relation, either the query is completed and it just uses the results, or it is being executed and it waits for it to complete, or the thread pool was busy and it simply execute it in the foreground.
That third case is important, because of the way Ruby thread scheduling works, unless you do other unrelated IOs between the time you did call load_async and the time you actually need the result, it's entirely possible that the query have been sitting idle in the work queue without even having been sent to the database.
We could of course wait for the thread pool to execute it, but that would be a waste because at this stage we know that the main thread will be blocked on that result, and that it's free to execute the query itself.
So yes, it's very easy to misuse load_async, and we should be on the lookout for making easier to understand what's happening, but I do believe that behavior is the current one.
Now if the database clients had true native async APIs, we could send the query and return a response promise/future, but that's unlikely to happen any time soon.
Let me know if that behavior is still unclear.
There was a problem hiding this comment.
re: pluck
Now load_async marks the Relation as loaded?, which fixes pluck and similar methods that behave differently based on wether the Relation was loaded.
There was a problem hiding this comment.
Something else we could do it to call Thread.pass right after we enqueue the query, this would increase the chance that the query get performed asynchronously.
There was a problem hiding this comment.
So interestingly enough, calling Thread.pass caused the test of that fallback behavior to fail, which proves that it's actually effective at producing the behavior you expected in this simple scenario.
a80388d to
901c4e6
Compare
There was a problem hiding this comment.
After some thought, I think I make sense to have defer work by default. It's good to cancel queries when we know we exited the context in which they are useful, but if we're in rails console or just a random rails runner script, there's no reason to not allow it.
There was a problem hiding this comment.
Is there a test for this case? Or can you elaborate a bit more what you mean by it "can sometimes trigger a query on its own"? I don't follow what scenarios this might happen in.
022015d to
56ab574
Compare
|
I've put more thoughts in the In #40037 I made it held by the But now I don't think it's really worth the tradeoff. First the vast majority of Rails applications out there have a single database, so for them, a per pool executor or a global one don't matter. Then for apps using horizontal sharding (like Shopify), you can have a large number of pools to similar shards, but you only ever query a single one during a request cycle. If you have say 30 shards, you'd end up with 30 executor with X threads each, most of them sitting idle the vast majority of the time, and I think we should rather keep the number of threads low, as I don't quite trust the scheduler to properly handle hundreds of idle threads. |
9f23296 to
2042a29
Compare
While testing out rails#41372 in a demo application I noticed that the `ASYNC` portion wasn't getting added to the log. I tracked it down to this method not passing `async: true` to `execute_query` so the `sql` subscriber was always getting `async: false` for the payload. Before: ``` (0.3ms) SELECT * FROM dogs where name = 'fido 1' (0.1ms) SELECT * FROM dogs where name = 'fido 2' (0.1ms) SELECT * FROM dogs where name = 'fido 3' ``` After: ``` ASYNC (0.3ms) SELECT * FROM dogs where name = 'fido 1' ASYNC (0.2ms) SELECT * FROM dogs where name = 'fido 2' ASYNC (0.2ms) SELECT * FROM dogs where name = 'fido 3' ``` I also verified that this fixes `Dog Load` to `ASYNC Dog Load` in the new PR. It will be easier to add a test for this functionality when we merge rails#41372.
753aefe to
d0fc4a8
Compare
… thread pool This is built on previous async support added to Adapter#select_all.
d0fc4a8 to
2a90104
Compare
|
@eileencodes code squashed, rebased, and about to be green on CI! |
|
Thanks!
|
| records.empty? | ||
| else | ||
| !exists? | ||
| end |
There was a problem hiding this comment.
just curious - why we dont want to use guard clause in that case?
| # | ||
| # Post.where(published: true).load_async # => #<ActiveRecord::Relation> | ||
| def load_async | ||
| unless loaded? |
There was a problem hiding this comment.
The use of guard clause will simplify most of the conditional statements in the codebase.
There was a problem hiding this comment.
Thank you for the suggestion but we prefer to not use guard clauses.
There was a problem hiding this comment.
Is there a place where the core team explains why they don't want to use guard clauses? I did a quick search, but wasn't able to find anything and would love to hear the reasons.
Followup: rails#41372 Something we knew we'd need when we implemented `Relation#load_async` but that we chose to delay to have time to think about it. Right now, Active Record async support is limited to "collection results", but among the not so fast queries that would benefit from asynchronicity you often find aggregates (e.g. `count`, `sum`, etc) as well as hand crafted `find_by_sql` queries. `load_async` was easy to add as an API, because `Relation` acts as a collection so it was trivial to simply block whenever it was iterated while retaining total API compatibility. For aggregates and `find_by_sql`, we have no other choice but to return something different in async mode, with its own API. This proof of concept showcase what this API looks like for `Relation#count`: ```ruby Post.where(published: true).count # => 2 promise = Post.where(published: true).async.count # => #<ActiveRecord::Promise status=pending> promise.value # => 2 ``` This API should be applicable to all aggregate methods, as well as all methods returning a single record, or anything other than a `Relation`.
Followup: rails#41372 Something we knew we'd need when we implemented `Relation#load_async` but that we chose to delay to have time to think about it. Right now, Active Record async support is limited to "collection results", but among the not so fast queries that would benefit from asynchronicity you often find aggregates (e.g. `count`, `sum`, etc) as well as hand crafted `find_by_sql` queries. `load_async` was easy to add as an API, because `Relation` acts as a collection so it was trivial to simply block whenever it was iterated while retaining total API compatibility. For aggregates and `find_by_sql`, we have no other choice but to return something different in async mode, with its own API. This proof of concept showcase what this API looks like for `Relation#count`: ```ruby Post.where(published: true).count # => 2 promise = Post.where(published: true).count(async: true) # => #<ActiveRecord::Promise status=pending> promise.value # => 2 ``` This API should be applicable to all aggregate methods, as well as all methods returning a single record, or anything other than a `Relation`.
Related to #load_async queries added in rails#40037 rails#41372.
This method has been added since 7.0 - rails/rails#41372 - https://github.com/rails/rails/blob/7-0-stable/activerecord/CHANGELOG.md
This method has been added since 7.0 - rails/rails#41372 - https://github.com/rails/rails/blob/7-0-stable/activerecord/CHANGELOG.md
This method has been added since 7.0 - rails/rails#41372 - https://github.com/rails/rails/blob/7-0-stable/activerecord/CHANGELOG.md
Related to #load_async queries added in rails#40037 rails#41372.
This is a followup to #40037 (see that PR for complete context)
Relation#load_asyncschedules the query on the background thread pool, and returnsself.