Async queries: use a single executor and default to an ImmediateExecutor #41406

casperisfine · 2021-02-11T14:52:22Z

As mentioned in #41372 (comment), I think having one executor per pool was a mistake. The idea was to isolate each connection pool so that they would never wait on each other, but it means that multi-database setups could easily end up with a ridiculous number of idle threads.

So instead it's probably better to have a single executor for the whole process, which makes it much easier for app developers to configure.

While I was at it, I changed the default executor to Concurrent::ImmediateExecutor, which is basically a no-op (it immediately executes jobs in the calling thread). It is probably a better default to make sure upgraded apps don't suddenly have concurrency enabled. For newly generated apps, we can try to set a good default for load_defaults '7.0', but at this stage I don't think we can make such a decision yet.

I also allow setting an arbitrary concurrent-ruby executor, as I'd like to experiment with a few different configurations.
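
To make "executes jobs in the calling thread" concrete, here is a minimal plain-Ruby sketch. Both classes are hypothetical stand-ins written for illustration; they are not concurrent-ruby's implementation, and only mimic the `post` interface that executors expose.

```ruby
# Hypothetical stand-in for an "immediate" executor: the task runs right
# away on whichever thread called #post, so no concurrency is introduced.
class InlineExecutor
  def post
    yield
    true
  end
end

# Hypothetical stand-in for a concurrent executor: every task gets a thread.
class ThreadedExecutor
  def initialize
    @threads = []
  end

  def post(&task)
    @threads << Thread.new(&task)
    true
  end

  # Block until every posted task has finished.
  def wait
    @threads.each(&:join)
  end
end

# Reports whether the executor ran the task on the caller's own thread.
def ran_on_calling_thread?(executor)
  caller_thread = Thread.current
  worker_thread = nil
  executor.post { worker_thread = Thread.current }
  executor.wait if executor.respond_to?(:wait)
  worker_thread.equal?(caller_thread)
end

puts ran_on_calling_thread?(InlineExecutor.new)   # prints "true"
puts ran_on_calling_thread?(ThreadedExecutor.new) # prints "false"
```

With the inline behavior as the default, code written against the executor interface keeps working after an upgrade, but nothing runs in the background until an app opts in.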

cc @eileencodes @jhawthorn @rafaelfranca @tenderlove

casperisfine · 2021-02-11T14:53:26Z

activerecord/lib/active_record/railtie.rb

@@ -98,6 +98,22 @@ class Railtie < Rails::Railtie # :nodoc:
      end
    end

+    initializer "active_record.asynchronous_queries_executor" do


This initializer may be totally unnecessary, or too soon, but it's an example of the kind of simplified configuration we could offer. For users with simple needs, setting a single max concurrency is probably plenty of control.
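
As a rough sketch of what "a single max concurrency" could expand to, the helper below derives thread pool options from one number, using the same ratios as the options visible in this PR's diff (`min_threads: 0`, `max_threads: concurrency`, `max_queue: concurrency * 4`, `fallback_policy: :caller_runs`). The helper name `executor_options` is hypothetical and is not part of the PR.

```ruby
# Hypothetical helper: turns one concurrency number into thread pool options.
def executor_options(concurrency)
  {
    min_threads: 0,               # the pool may shrink to zero threads when idle
    max_threads: concurrency,     # upper bound on queries running concurrently
    max_queue: concurrency * 4,   # tasks allowed to wait before the fallback applies
    fallback_policy: :caller_runs # when the queue is full, run on the calling thread
  }
end

options = executor_options(4)
puts options[:max_threads] # prints "4"
puts options[:max_queue]   # prints "16"

# With concurrent-ruby loaded, the hash could be passed along as:
#   Concurrent::ThreadPoolExecutor.new(**options)
```

The `:caller_runs` fallback means an overloaded pool degrades to synchronous execution instead of raising or dropping work.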

eileencodes · 2021-02-11T17:44:05Z

activerecord/lib/active_record/railtie.rb

+             min_threads: 0,
+             max_threads: concurrency,
+             max_queue: concurrency * 4,
+             fallback_policy: :caller_runs


I think I misunderstood the direction you mentioned going in for this when we spoke earlier today.

By putting this in the initializer and having a global setting for max_threads, we can't configure the thread pool per database, whereas before it was using the pool size from the database config for max threads.

I was going to send a PR that allows the db config to set these values so each db can control its own parallelism, because that's how we plan to use it at GitHub. This implementation would make it impossible to control the thread pool execution per database.

I see. ThreadPool ownership is really something I still haven't managed to find a clear-cut solution for.

In GitHub's case, IIRC, you are sharding vertically (ModelA is on one shard, ModelB on another), which means it's interesting for you to have a concurrency limit per database.

In Shopify's case we're sharding horizontally (ModelA and ModelB are on X instances of the same schema), which means we'd end up with dozens if not hundreds of threads sitting idle most of the time.

Maybe we need an actual API to select the thread pool? Or maybe we need a more complex executor that would allow having a single ThreadPool but with multiple task queues, each with its own concurrency limit?
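
The difference between the two sharding styles comes down to simple arithmetic. The sketch below uses made-up numbers (3 databases, 100 shards, 4 threads per pool) purely to illustrate the argument; none of them come from GitHub's or Shopify's actual setups.

```ruby
# Worst-case thread count when every database gets its own executor.
def max_threads_with_pool_per_database(database_count, threads_per_pool)
  database_count * threads_per_pool
end

# Worst-case thread count when the whole process shares one executor.
def max_threads_with_global_pool(concurrency)
  concurrency
end

puts max_threads_with_pool_per_database(3, 4)   # vertical sharding, 3 databases: prints "12"
puts max_threads_with_pool_per_database(100, 4) # horizontal sharding, 100 shards: prints "400"
puts max_threads_with_global_pool(4)            # one shared pool: prints "4"
```

A request in a horizontally sharded app typically touches only one shard, so most of those per-shard pools would sit idle.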

I'll have to sleep on this.

An important use case (that we probably both share?) would be a different concurrency between reader (replica) and writer databases.

maybe we need a more complex executor that would allow to have a single ThreadPool but with multiple task queues with each their own concurrency limit

This feels ideal but probably tricky to get right without starvation.
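
For discussion purposes, the "single ThreadPool with per-queue concurrency limits" idea could look roughly like the plain-Ruby sketch below. It is not what either PR implemented; `KeyedLimitPool` and `run_demo` are hypothetical names. Workers scan the pending list for the first task whose key is under its limit, which avoids one saturated key blocking the others, but it makes no stronger fairness guarantee.

```ruby
# Hypothetical sketch: one shared set of worker threads, with a separate
# concurrency limit per key (for example, per database role).
class KeyedLimitPool
  def initialize(size:, limits:)
    @limits = limits
    @running = Hash.new(0)
    @pending = []
    @closed = false
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @workers = Array.new(size) { Thread.new { work_loop } }
  end

  def post(key, &task)
    @limits.fetch(key) # raise KeyError early for unknown keys
    @mutex.synchronize do
      @pending << [key, task]
      @cond.broadcast
    end
    true
  end

  # Let queued tasks drain, then stop the workers.
  def shutdown
    @mutex.synchronize do
      @closed = true
      @cond.broadcast
    end
    @workers.each(&:join)
  end

  private

  def work_loop
    loop do
      key, task = next_task
      return unless task
      begin
        task.call
      ensure
        @mutex.synchronize do
          @running[key] -= 1
          @cond.broadcast # a slot was freed, let waiting workers re-check
        end
      end
    end
  end

  # Blocks until a task whose key is under its limit is available.
  def next_task
    @mutex.synchronize do
      loop do
        index = @pending.index { |key, _| @running[key] < @limits.fetch(key) }
        if index
          key, task = @pending.delete_at(index)
          @running[key] += 1
          return [key, task]
        end
        return nil if @closed && @pending.empty?
        @cond.wait(@mutex)
      end
    end
  end
end

# Runs 9 short tasks and records the peak concurrency observed per key.
def run_demo
  stats = Mutex.new
  active = Hash.new(0)
  peak = Hash.new(0)
  done = []
  pool = KeyedLimitPool.new(size: 4, limits: { primary: 1, replica: 2 })
  9.times do |i|
    key = (i % 3).zero? ? :primary : :replica
    pool.post(key) do
      stats.synchronize do
        active[key] += 1
        peak[key] = [peak[key], active[key]].max
      end
      sleep 0.01
      stats.synchronize do
        active[key] -= 1
        done << i
      end
    end
  end
  pool.shutdown
  [peak, done]
end
```

Calling `run_demo` returns the peak concurrency per key and the list of completed task indexes; the peaks should never exceed the configured limits even though four workers are available.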

Hum, right. Maybe we should stick with the current solution, but just make it easy to replace the strategy?

This is a followup/alternative to rails#41406. This change wouldn't work for GitHub because we intend to implement an executor for each database and use the database configuration to set the `min_threads` and `max_threads` for each one.

The changes here borrow from rails#41406 by implementing a `Concurrent::ImmediateExecutor` by default. Otherwise applications have the option of having one global thread pool that is used by all connections or a thread pool for each connection.

A global thread pool can be set with `config.active_record.async_query_executor = :global_thread_pool`. This will create a single `Concurrent::ThreadPoolExecutor` for applications to utilize. By default the concurrency is 4, but it can be changed for the `global_thread_pool` by setting `global_executor_concurrency` to another number.

If applications want to use a thread pool per database connection they can set `config.active_record.async_query_executor = :multi_thread_pool`. This will create a `Concurrent::ThreadPoolExecutor` for each database connection and set the `min_threads` and `max_threads` by their configuration values or the defaults.

I've also moved the async tests out of the adapter test and into their own tests and added tests for all the new functionality.

This change would allow us at GitHub to control threads per database and per writer/reader or other apps to use one global executor. The immediate executor allows apps to no-op by default. Took the immediate executor idea from Jean's PR.

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
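
Put together, the settings described in that commit message could look like the fragment below. The file location, the `MyApp` module name, and the value `8` are placeholders chosen for illustration, and the fragment assumes `global_executor_concurrency` lives under `config.active_record` alongside `async_query_executor`.

```ruby
# config/application.rb (fragment; only meaningful inside a Rails app)
module MyApp
  class Application < Rails::Application
    # One thread pool shared by all connections.
    config.active_record.async_query_executor = :global_thread_pool

    # Optional: change the shared pool's concurrency from its default of 4.
    config.active_record.global_executor_concurrency = 8

    # Alternatively, one thread pool per database connection, sized from
    # each database's min_threads / max_threads configuration:
    # config.active_record.async_query_executor = :multi_thread_pool
  end
end
```

Leaving `async_query_executor` unset keeps the immediate executor, so queries run synchronously as before.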

rails-bot bot added the activerecord label Feb 11, 2021

casperisfine commented Feb 11, 2021

Async queries: use a single executor and default to an ImmediateExecutor

a9103e6

casperisfine force-pushed the async-executor-ownership branch from 337af9b to a9103e6 Compare February 11, 2021 14:58

eileencodes reviewed Feb 11, 2021

eileencodes mentioned this pull request Feb 18, 2021

Allow async executor to be configurable #41495

Merged

casperisfine closed this Mar 2, 2021

casperisfine deleted the async-executor-ownership branch March 2, 2021 09:28
