New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async queries: use a single executor and default to an ImmediateExecutor #41406
Conversation
@@ -98,6 +98,22 @@ class Railtie < Rails::Railtie # :nodoc: | |||
end | |||
end | |||
|
|||
initializer "active_record.asynchronous_queries_executor" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This initializer may be totally unnecessary, or too soon, but that's an example of the kind of simplified configuration we could offer. For simple users setting a single max concurrency is probably plenty enough of control.
337af9b
to
a9103e6
Compare
min_threads: 0, | ||
max_threads: concurrency, | ||
max_queue: concurrency * 4, | ||
fallback_policy: :caller_runs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I misunderstood the direction you mentioned going in for this when we spoke earlier today.
By putting this in the initializer and having a global setting for max_threads
we can't configure the thread pool per database whereas before it was using the pool size from the database config for max threads.
I was going to send a PR that allows the db config to set these values so each db can control it's own parallelism because that's how we plan to use it at GitHub. This implementation would make it impossible to control the thread pool execution per database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The ThreadPool ownership is really something I still haven't managed to find a clear cut solution for this.
In GItHub case IIRC you are sharing vertically (ModelA is on one shard, ModelB on another), which means for you it's interesting to have a concurrency limit per database.
In Shopify case we're sharding horizontally (ModelA and ModelB are on X instances of the same schema), which means for us we'd end up with dozens if not hundreds of threads sitting idle most of the time.
Maybe we need an actual API to select the thread pool? Or maybe we need a more complex executor that would allow to have a single ThreadPool but with multiple task queues with each their own concurrency limit?
I'll have to sleep on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An important use case (that we probably both share?) would be a different concurrency between reader (replica) and writer databases.
maybe we need a more complex executor that would allow to have a single ThreadPool but with multiple task queues with each their own concurrency limit
This feels ideal but probably tricky to get right without starvation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, right. Maybe we should stick with the current solution, but just make it easy to replace the strategy?
This is a followup/alternative to rails#41406. This change wouldn't work for GitHub because we intend to implement an executor for each database and use the database configuration to set the `min_threads` and `max_threads` for each one. The changes here borrow from rails#41406 by implementing an `Concurrent::ImmediateExecutor` by default. Otherwise applications have the option of having one global thread pool that is used by all connections or a thread pool for each connection. A global thread pool can set with `config.active_record.async_query_executor = :global_thread_pool`. This will create a single `Concurrent::ThreadPoolExecutor` for applications to utilize. By default the concurrency is 4, but it can be changed for the `global_thread_pool` by setting `global_executor_concurrency` to another number. If applications want to use a thread pool per database connection they can set `config.active_record.async_query_executor = :multi_thread_pool`. This will create a `Concurrent::ThreadPoolExecutor` for each database connection and set the `min_threads` and `max_threads` by their configuration values or the defaults. I've also moved the async tests out of the adapter test and into their own tests and added tests for all the new functionality. This change would allow us at GitHub to control threads per database and per writer/reader or other apps to use one global executor. The immediate executor allows apps to no-op by default. Took the immediate executor idea from Jean's PR. Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Ref: #40037
Ref: #41372
As mentioned in #41372 (comment), I think having one executor per pool was a mistake. The idea was to isolate each connection pool so that they would never wait on each others, but it means that multi-database setups could easily end up with ridiculous amounts of idle threads.
So instead it's probably better to have a single executor for the whole process, that makes it much easier for app developers to configure.
While I was at it, I changed the default executor to
Concurrent::ImmediateExecutor
, which is basically a noop (it immediately execute jobs in the calling threads). It is probably a better default to make sure upgraded apps don't sudently have concurrency enabled. For newly generated apps, we can try to set a good default forload_defaults '7.0'
, but at this stage I don't think we can make such decision yet.I also allow to set an arbitrary concurrent-ruby executor, as I'd like to experiment with a few different configurations.
cc @eileencodes @jhawthorn @rafaelfranca @tenderlove