Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async queries: use a single executor and default to an ImmediateExecutor #41406

Closed
wants to merge 1 commit into from

Conversation

casperisfine
Copy link
Contributor

Ref: #40037
Ref: #41372

As mentioned in #41372 (comment), I think having one executor per pool was a mistake. The idea was to isolate each connection pool so that they would never wait on each others, but it means that multi-database setups could easily end up with ridiculous amounts of idle threads.

So instead it's probably better to have a single executor for the whole process, that makes it much easier for app developers to configure.

While I was at it, I changed the default executor to Concurrent::ImmediateExecutor, which is basically a noop (it immediately execute jobs in the calling threads). It is probably a better default to make sure upgraded apps don't sudently have concurrency enabled. For newly generated apps, we can try to set a good default for load_defaults '7.0', but at this stage I don't think we can make such decision yet.

I also allow to set an arbitrary concurrent-ruby executor, as I'd like to experiment with a few different configurations.

cc @eileencodes @jhawthorn @rafaelfranca @tenderlove

@@ -98,6 +98,22 @@ class Railtie < Rails::Railtie # :nodoc:
end
end

initializer "active_record.asynchronous_queries_executor" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This initializer may be totally unnecessary, or too soon, but that's an example of the kind of simplified configuration we could offer. For simple users setting a single max concurrency is probably plenty enough of control.

min_threads: 0,
max_threads: concurrency,
max_queue: concurrency * 4,
fallback_policy: :caller_runs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I misunderstood the direction you mentioned going in for this when we spoke earlier today.

By putting this in the initializer and having a global setting for max_threads we can't configure the thread pool per database whereas before it was using the pool size from the database config for max threads.

I was going to send a PR that allows the db config to set these values so each db can control it's own parallelism because that's how we plan to use it at GitHub. This implementation would make it impossible to control the thread pool execution per database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The ThreadPool ownership is really something I still haven't managed to find a clear cut solution for this.

In GItHub case IIRC you are sharing vertically (ModelA is on one shard, ModelB on another), which means for you it's interesting to have a concurrency limit per database.

In Shopify case we're sharding horizontally (ModelA and ModelB are on X instances of the same schema), which means for us we'd end up with dozens if not hundreds of threads sitting idle most of the time.

Maybe we need an actual API to select the thread pool? Or maybe we need a more complex executor that would allow to have a single ThreadPool but with multiple task queues with each their own concurrency limit?

I'll have to sleep on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important use case (that we probably both share?) would be a different concurrency between reader (replica) and writer databases.

maybe we need a more complex executor that would allow to have a single ThreadPool but with multiple task queues with each their own concurrency limit

This feels ideal but probably tricky to get right without starvation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, right. Maybe we should stick with the current solution, but just make it easy to replace the strategy?

eileencodes added a commit to eileencodes/rails that referenced this pull request Feb 19, 2021
This is a followup/alternative to rails#41406. This change wouldn't work for
GitHub because we intend to implement an executor for each database and
use the database configuration to set the `min_threads` and
`max_threads` for each one.

The changes here borrow from rails#41406 by implementing an
`Concurrent::ImmediateExecutor` by default. Otherwise applications have
the option of having one global thread pool that is used by all connections
or a thread pool for each connection. A global thread pool can set with
`config.active_record.async_query_executor = :global_thread_pool`. This
will create a single `Concurrent::ThreadPoolExecutor` for applications
to utilize. By default the concurrency is 4, but it can be changed for the
`global_thread_pool` by setting `global_executor_concurrency` to another
number. If applications want to use a thread pool per database
connection they can set `config.active_record.async_query_executor =
:multi_thread_pool`. This will create a `Concurrent::ThreadPoolExecutor`
for each database connection and set the `min_threads` and `max_threads`
by their configuration values or the defaults.

I've also moved the async tests out of the adapter test and into their
own tests and added tests for all the new functionality. This change
would allow us at GitHub to control threads per database and per
writer/reader or other apps to use one global executor. The immediate
executor allows apps to no-op by default.

Took the immediate executor idea from Jean's PR.
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
@casperisfine casperisfine deleted the async-executor-ownership branch March 2, 2021 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants