Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Active Record API for general async queries #44446

Merged
merged 2 commits into from
Apr 26, 2022
Merged

Conversation

casperisfine
Copy link
Contributor

@casperisfine casperisfine commented Feb 16, 2022

Followup: #41372
Fix: #44169

Something we knew we'd need when we implemented Relation#load_async but that we chose to delay to have time to think about it.

Right now, Active Record async support is limited to "collection results", but among the not so fast queries that would benefit from asynchronicity you often find aggregates (e.g. count, sum, etc) as well as hand crafted find_by_sql queries.

load_async was easy to add as an API, because Relation acts as a collection so it was trivial to simply block whenever it was iterated while retaining total API compatibility.

For aggregates and find_by_sql, we have no other choice but to return something different in async mode, with its own API.

This proof of concept showcase what this API looks like for Relation#count:

Post.where(published: true).count # => 2
promise = Post.where(published: true).async_count # => #<ActiveRecord::Promise status=pending>
promise.value # => 2

But this API should be applicable to all aggregate methods, as well as all methods returning a single record, or anything other than a Relation.

end

def async! # :nodoc:
@async = true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went for a boolean attribute, but an AsyncRelation subclass could make sense if it makes specializing the various method easier. To be seen once and if we move forward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that a subclass would be clearer here so if we do move this forward +1 on that change.

@casperisfine casperisfine force-pushed the async-aggregates branch 3 times, most recently from 74b3835 to 552963b Compare February 16, 2022 14:34
@eileencodes eileencodes self-requested a review March 9, 2022 21:24
Copy link
Member

@eileencodes eileencodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed this and tried it out in an application console to get a feel for the API. I think the public API is fine, I was trying to think of a different way that didn't require calling async.count and then value every time, but I haven't come up with anything that wouldn't require duplicating existing APIsto enable anasync_count`. It feels awkward to need to call so many methods to essentially do the same thing faster, but I also don't have a better solution at the moment.

@@ -1,6 +1,76 @@
# frozen_string_literal: true

module ActiveRecord
class Promise < BasicObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public methods in here should be documented - I know this is WIP but I didn't want to forget to say that later when it's no longer draft.

end
end

class AsyncFallbackResult # :nodoc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsyncFallbackResult makes it sound like the result isn't Async and that it's falling back to a regular query. That doesn't seem to be what's actually happening so maybe a different name to clarify. AsyncPromiseResult maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsyncFallbackResult makes it sound like the result isn't Async and that it's falling back to a regular query.

Yes that's the case. It's used when async_enabled? is false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So while working further on this, I renamed it in ActiveRecord::FutureResult::Complete, which I think makes it bit more clear.

end

def async! # :nodoc:
@async = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that a subclass would be clearer here so if we do move this forward +1 on that change.

@matthewd
Copy link
Member

I was trying to think of a different way that didn't require calling async.count and then value every time

If relation.count knew about and checked for a potentially-in-flight async count request, at least for simple cases, that might make it feel more like "ordinary" relation API? 🤔

So more complex use cases could still build on top of promises, but for the trivial "I know I'm going to need the count later", it'd act more like a preload.

@casperisfine
Copy link
Contributor Author

If relation.count knew about and checked for a potentially-in-flight async count request, at least for simple cases, that might make it feel more like "ordinary" relation API? 🤔

Interesting idea. It would be a bit like asynchronously preloading the query cache?

It seems tricky to implement, though.

Right now I'm ok with the current relation.async.count(...), and IMO it doesn't complexify the Relation code too much. But I'm not sure wether I should aim to support all methods or not.

@casperisfine
Copy link
Contributor Author

To clarify, my main issue with the current Implementation is that since Relation#async still return a Relation, you can call all existing methods, but that doesn't mean they support async.

So that may be another argument in favor of a subclass, which would allow us to implement and document selectively the methods that would make sense to have an async version.

@casperisfine
Copy link
Contributor Author

casperisfine commented Mar 31, 2022

So ActiveRecord::Delegation makes it hard to have Relation subclasses, because it's already subclassed most of the time.

It also doesn't help only offering subset of methods, as we'd have to undef_method anything we don't want.

So the natural solution would be a decorator, but then it loses access to the private methods, so there would be quite a lot of duplication.

Another possible solution would be to either have extra methods (e.g. async_count, async_sum, etc), or to add parameters, e.g. count(..., async: true), find_by_sql(..., async: true).

I think that from a code maintenance perspective, the later (async: true parameter) would be best.

@casperisfine
Copy link
Contributor Author

I tried the named parameter approach in Shopify@37b0f60

@casperisfine casperisfine force-pushed the async-aggregates branch 2 times, most recently from 68322c8 to 1563bc7 Compare April 1, 2022 08:01
@casperisfine
Copy link
Contributor Author

After sleeping on it, I believe that the async: true named parameter is the better alternative so far:

  • It avoid duplicating the methods, so much easier to test and keep in sync.
  • It's also easier to document.
  • It explicitly fail if you are trying to use a method that doesn't have async support.

Assuming there's consensus on this, the question remain of which methods should have async support. For now I implemented:

  • ll aggregates (count, sum, etc)
  • find_by_sql and count_by_sql
  • pluck and pick

What may or may not make sense to support is first / last / take etc. (We can also delay that decision).

@eileencodes
Copy link
Member

I like the named approach too. It feels like a more natural API to me. 👍🏼

@casperisfine casperisfine force-pushed the async-aggregates branch 2 times, most recently from c1b1060 to 5ee4b80 Compare April 1, 2022 12:39
@matthewd
Copy link
Member

matthewd commented Apr 1, 2022

Of the various non-ideal options, I think I personally lean toward async_foo methods. Partly because "boolean params are bad", partly because having an argument change a method's return value can be tricky to document (and complicate downstream wrapper methods), and partly because async_* feels like it can more easily be extended to apply to associations and user-defined scopes.

@casperisfine
Copy link
Contributor Author

I think I personally lean toward async_foo methods.

It's definitely better on the API documentation side.

The downside is that it will be a bit harder to keep code in sync (except if async_foo simply calls foo(async: true) or async.foo or some shenanigan like this).

So I'm not opposed to it if there's consensus that it's the right tradeoff.

@matthewd
Copy link
Member

matthewd commented Apr 1, 2022

I think many of these public methods (especially the calculate ones) are already pretty close to one-liners into more unified private implementation methods, where a kwarg would be fine without the documentation difficulties.

(It occurs to me that documentation would be particularly tricky if some other options were only/not valid when called in async mode... I don't have a specific example right now, but it does seem fairly plausible that'd be something that comes up?)

@casperisfine
Copy link
Contributor Author

Ok, I refactored the PR again, the newly introduced methods are:

  • async_count
  • async_sum
  • async_minimum
  • async_maximum
  • async_average
  • async_pluck
  • async_pick
  • async_find_by_sql
  • async_count_by_sql

There may be other methods that make sense, but I think this PR is chunky enough, so if everyone is satisfied with this API and implementation, we can focus on documentation and eventually add some more in a followup PR.

@casperisfine casperisfine marked this pull request as ready for review April 14, 2022 07:53
Followup: rails#41372

Something we knew we'd need when we implemented `Relation#load_async`
but that we chose to delay to have time to think about it.

Right now, Active Record async support is limited to "collection results",
but among the not so fast queries that would benefit from asynchronicity
you often find aggregates (e.g. `count`, `sum`, etc) as well as hand crafted
`find_by_sql` queries.

`load_async` was easy to add as an API, because `Relation` acts as a collection
so it was trivial to simply block whenever it was iterated while retaining total
API compatibility.

For aggregates and `find_by_sql`, we have no other choice but to return something
different in async mode, with its own API.

This proof of concept showcase what this API looks like for `Relation#count`:

```ruby
Post.where(published: true).count # => 2
promise = Post.where(published: true).count(async: true) # => #<ActiveRecord::Promise status=pending>
promise.value # => 2
```

This API should be applicable to all aggregate methods, as well as all methods
returning a single record, or anything other than a `Relation`.
@nettofarah
Copy link
Contributor

I'm so excited for this PR to land. We have dozens of use cases where I think these aggregations will make a massive difference in the overall performance of our codebase.

@OskarEichler
Copy link

Thanks for the great work on this! @casperisfine

Super excited to apply this across the board!

@byroot byroot merged commit 7883100 into rails:main Apr 26, 2022
@byroot
Copy link
Member

byroot commented Apr 26, 2022

apply this across the board!

Careful though, this shouldn't be applied blindly to all queries. https://pawelurbanek.com/rails-load-async is a good write on the tradeoffs involved.

@pynixwang
Copy link

wrap model with Async, and all api stay with sync.

example:
Async(User).count
User::Async.count

@casperisfine
Copy link
Contributor Author

@pynixwang yes that's two very different things.

@pynixwang
Copy link

@casperisfine I mean keeping async api same as sync api.

@OskarEichler
Copy link

@casperisfine @byroot Is there any plans as to which new version of Rails this PR will be included in?

@byroot
Copy link
Member

byroot commented May 16, 2022

@OskarEichler, that will be Rails 7.1.

@nettofarah
Copy link
Contributor

Hey, folks, I'm curious if there are any plans for an async_group method too?
group_by and count are usually the slowest ones to run in my experience, so I believe async_group could also be extremely valuable.

@casperisfine
Copy link
Contributor Author

@nettofarah not sure what you mean, group is not a final query method like count, or even to_a.

async_* only makes sense for methods that trigger the query. What kind of query do you have in mind? Because User.group(:role).async_count does work.

@nettofarah
Copy link
Contributor

oh, you're right. I got confused.
Thank you for explaining :)

koic added a commit to koic/rubocop-rails that referenced this pull request Oct 14, 2023
…iveRecordAllMethod`

Follow up rails/rails#44446, rails/rails#37944,
rails/rails#46503, and rails/rails#47010.

This PR supports some Rails 7.1's new querying methods for `Rails/RedundantActiveRecordAllMethod`.
koic added a commit to koic/rubocop-rails that referenced this pull request Oct 14, 2023
…iveRecordAllMethod`

Follow up rails/rails#44446, rails/rails#37944,
rails/rails#46503, and rails/rails#47010.

This PR supports some Rails 7.1's new querying methods for `Rails/RedundantActiveRecordAllMethod`.
@rmosolgo
Copy link
Contributor

rmosolgo commented Jan 9, 2024

👋 I'm curious, why does ActiveRecord::Promise extend BasicObject instead of Object? For GraphQL-Ruby, I was hoping to support ActiveRecord::Promise just like a GraphQL-Batch Promise or a Concurrent::Future, but ActiveRecord::Promise doesn't implement #public_send or #hash (rmosolgo/graphql-ruby#4766), so it doesn't "fit the mold" well.

If extending BasicObject is by design, I'll find another way to support it. Or, if it's coincidental, I'd be happy to work up a PR extending Object instead, just let me know, thanks!

@byroot
Copy link
Member

byroot commented Jan 9, 2024

If extending BasicObject is by design

It is indeed by design to avoid the classic mistake of using the promise instead of its result which I've seen happen quite a lot with redis-rb.

@rmosolgo
Copy link
Contributor

rmosolgo commented Jan 9, 2024

Got it, thanks!

@bearded
Copy link

bearded commented May 20, 2024

@casperisfine, is there a way to disable this functionality globally by default?
it seem to broke behaviour of :select_value() method, which now accept :async parameter

in our app we started to get the following error:

Mysql2::Error: Commands out of sync; you can't run this command now

as a workaround we were able to use :select_values().first, but would be nice to be able to disable it globally via Rails configuration
Screenshot 2024-05-20 at 11 38 26

ref1: https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html#method-i-select_value
ref2: https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html#method-i-select_values

@byroot
Copy link
Member

byroot commented May 20, 2024

I doubt the error you are showing is related.

@bearded
Copy link

bearded commented May 20, 2024

@byroot, at this point this change is the only one we could map to the problem, maybe there were other changes related to async DB queries?

@byroot
Copy link
Member

byroot commented May 20, 2024

Are you trying to use multiple statement? Or perhaps have an explicit ; at the end of your query? https://stackoverflow.com/questions/614671/commands-out-of-sync-you-cant-run-this-command-now

@bearded
Copy link

bearded commented May 20, 2024

We run one SQL statement that trigger MySQL Function (related to column level encryption)

@byroot
Copy link
Member

byroot commented May 20, 2024

All I can tell you is that this MySQL error message has nothing to do with async queries.

@bearded
Copy link

bearded commented May 20, 2024

well, it is, before this change everything was working fine, so that is the reason I'm seeking for configuration option that would disable new behaviour of Rails..

@byroot
Copy link
Member

byroot commented May 20, 2024

If you can come up with a reproduction script, I will look at it, but at this stage I really don't believe this change is your issue.

Also there is already a setting to disable async queries globally, and they are disabled by default: https://guides.rubyonrails.org/configuring.html#config-active-record-async-query-executor

Hence why I really don't believe you are blaming the right change.

@bearded
Copy link

bearded commented May 20, 2024

@byroot, thanks a lot! (not that I wished to blame your change, was trying to find the cause, and was too fast to write about our problem here)

Setting :async_query_executor to :global_thread_pool resolves our problem (not sure why it is nil by default)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Rails 7] load_async for find_by_sql & Base.connection.execute
9 participants