
Add thread pool and concurrency_max_threads configuration option #470

Merged
jkeen merged 5 commits into graphiti-api:master from MattFenelon:add_concurrency_pool_max_size on Mar 20, 2024

Conversation

MattFenelon
Contributor

@MattFenelon MattFenelon commented Mar 19, 2024

This option limits the maximum number of resources that can be
sideloaded concurrently. With a properly configured connection pool,
this ensures that ActiveRecord's connection pool is not exhausted by
the sideloading process.

The thread pool configuration is based on ActiveRecord's
global_thread_pool_async_query_executor.

Closes #469

See: https://github.com/rails/rails/blob/94faed4dd4c915829afc2698e055a0265d9b5773/activerecord/lib/active_record.rb#L279-L287
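
For illustration, a minimal sketch of how the new option is meant to be set, following the Graphiti.config style used elsewhere in this thread (the accessor name changed during review, so treat these names as illustrative):

# config/initializers/graphiti.rb (illustrative)
Graphiti.config.concurrency = true            # existing flag: fetch sideloads concurrently
Graphiti.config.concurrency_max_threads = 4   # new option: cap on sideload threads (default 4)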

@MattFenelon MattFenelon marked this pull request as ready for review March 19, 2024 11:57
@MattFenelon
Contributor Author

Is it possible the test failure is flaky? It doesn't look related to the changes.

@jkeen
Collaborator

jkeen commented Mar 19, 2024

@MattFenelon yep, I re-ran it and the tests pass now.

I've run into some thread pool issues on the app I use graphiti with, but never traced it back to the sideloads… so this is an interesting find!

It looks like the new code sets the default at 4? I'm not seeing where that default was set before your change, and just want to ensure that this doesn't do anything unexpected for people who are running just fine. In your configuration you set the value at 1?

@MattFenelon
Contributor Author

@MattFenelon yep, I re-ran it and the tests pass now.

Nice. Thank you!

It looks like the new code sets the default at 4? I'm not seeing where that default was set before your change, and just want to ensure that this doesn't do anything unexpected for people who are running just fine. In your configuration you set the value at 1?

That's a little tricky to answer :) The short answer is that the code was using Concurrent::Promise and that uses concurrent-ruby's global io thread pool, which has a max size equal to Java's max int: https://github.com/ruby-concurrency/concurrent-ruby/blob/8b9b0da4a37585ce5eb71516aca55e93bde39115/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb#L15.

The problem with the default is that people will likely see these threading issues pop up, so I think the default needs to be set lower. However, lowering the default will mean that people may see less concurrency than they did before. I went with 4 as that's what Rails uses as its default pool size for async queries. But it's fairly arbitrary.

I have tested it configured to 1 and that works happily because the fallback option :caller_runs just runs on the calling thread. That also means no one should see any errors from the change, because the calling thread will be used to run anything that goes beyond the queue/pool size. That's also why I didn't want to set the default to 1: concurrency would be limited to 1 thread / 1 sideload at a time.
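
For reference, a rough sketch (not the PR's exact code) of the kind of bounded executor described above, built with concurrent-ruby's ThreadPoolExecutor and the :caller_runs fallback; the sizing mirrors ActiveRecord's async query executor:

require "concurrent"

max_threads = 4 # e.g. the value of the new configuration option
executor = Concurrent::ThreadPoolExecutor.new(
  min_threads: 1,
  max_threads: max_threads,
  max_queue: max_threads * 4,      # ActiveRecord uses max_threads * 4 for its async executor
  fallback_policy: :caller_runs    # overflow work runs on the calling thread instead of raising
)

# Sideload work can then be scheduled on the bounded pool:
promise = Concurrent::Promises.future_on(executor) { :sideload_result }
promise.value! # => :sideload_result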

Perhaps it would be worth updating the docs to tell people they need to set their db connection pool appropriately, otherwise even with this change they may see threading issues.

wdyt?

@jkeen
Collaborator

jkeen commented Mar 19, 2024

@MattFenelon Thanks for the details.

The problem with the default is that people will likely see these threading issues pop up, so I think the default needs to be set lower. However, lowering the default will mean that people may see less concurrency than they did before. I went with 4 as that's what Rails uses as its default pool size for async queries. But it's fairly arbitrary.

But your theory is that if they're seeing those errors with the new explicit default set at 4, they'd have been seeing them with the implicit default being at 4 currently?

I only recently became a maintainer on this repo and don't yet have access to editing docs. @richmolj's time is totally consumed by a tsunami of life so I'm not sure when he'll get me plugged into that side of things. https://www.graphiti.dev/guides/concepts/resources#concurrency

If I'm understanding the issue correctly though, is having concurrency = true (the default in Rails) and the new concurrency_pool_max_size option set to 1 effectively the same as having concurrency = false?

@MattFenelon
Contributor Author

If I'm understanding the issue correctly though, is having concurrency = true (the default in Rails) and the new concurrency_pool_max_size option set to 1 effectively the same as having concurrency = false?

Sorry, I could have been clearer. 1 will set the pool size to 1, so there'll be 1 thread in the pool to be used for sideloading. The max queue size would be 1 * 4, so any sideloads above that will just be loaded using the calling thread.

If you want to provide a maximum queue size, you may also consider the fallback_policy which defines what will happen if work is posted to a pool when the queue of waiting work has reached the maximum size and no new threads can be created. Available policies:
...
caller_runs: The work will be executed in the thread of the caller, instead of being given to another thread in the pool.
From https://ruby-concurrency.github.io/concurrent-ruby/master/file.thread_pools.html
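
A small illustrative script showing how that plays out with a pool of 1 and a queue of 4 (values and output are only for demonstration):

require "concurrent"

pool = Concurrent::ThreadPoolExecutor.new(
  min_threads: 1, max_threads: 1, max_queue: 4, fallback_policy: :caller_runs
)

# Posting more work than the pool and queue can hold does not raise; the
# overflow simply runs inline on the calling thread.
10.times { |i| pool.post { puts "sideload #{i} on thread #{Thread.current.object_id}" } }

pool.shutdown
pool.wait_for_termination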

But your theory is that if they're seeing those errors with the new explicit default set at 4, they'd have been seeing them with the implicit default being at 4 currently?

That's it exactly. There's basically no safe default without configuring the database pool correctly. This just lowers the possibility, as there'll only ever be 4 threads sideloading at once rather than DEFAULT_MAX_POOL_SIZE. But I think setting concurrency to false by default to be safer is probably a more surprising change at this point. What do you think?

We could set the default to 1, but even then there's still a chance the pool will become exhausted. For example, with puma set to 5 threads and the pool set to 5, if the system is processing 5 requests at the same time the connection pool is already exhausted, and sideloading any resources at that point will raise the error. But 1 does lower the risk at the cost of performance. I'm not sure what the right balance is.
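
As a back-of-the-envelope illustration of that sizing concern, assuming a single process-wide sideload pool (numbers are only examples):

# Each puma thread can hold a connection while up to concurrency_max_threads
# sideload threads also check one out, so the AR pool needs headroom for both.
puma_threads            = 5
concurrency_max_threads = 4
required_db_pool_size   = puma_threads + concurrency_max_threads # => 9, not 5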

@jkeen
Collaborator

jkeen commented Mar 19, 2024

I think your logic is sound? I'm not sure what default best balances performance against these issues. I feel a tiny bit out of my depth when it comes to thread-related issues, but one thing that jumps out at me is that if this new setting is just for concurrency relating to sideloads, I think naming the configuration option to more clearly reflect that would be helpful. Graphiti.config.sideload_max_threads, perhaps?

@MattFenelon
Contributor Author

I think naming the configuration option to more clearly reflect that would be helpful

Sounds good. I went with concurrency_max_threads just because the concurrency setting already says it's for sideloads, and I think it's nice that the settings sit next to each other alphabetically.

# @return [Boolean] Concurrently fetch sideloads?
# Defaults to false OR if classes are cached (Rails-only)
attr_accessor :concurrency
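
For comparison, a sketch of how the companion setting could be documented next to it (wording illustrative; the name is from this PR and the default of 4 is as discussed above):

# @return [Integer] Maximum number of threads used to sideload concurrently.
#   Defaults to 4, matching ActiveRecord's async query executor pool size.
attr_accessor :concurrency_max_threads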

Thanks for the reviews on this!

@MattFenelon
Contributor Author

I just realised that the executor initialisation wasn't thread-safe so I've fixed that.

@MattFenelon MattFenelon changed the title Add concurrency_pool_max_size configuration option Add thread pool and concurrency_max_threads configuration option Mar 20, 2024
@jkeen
Collaborator

jkeen commented Mar 20, 2024

@MattFenelon Oh didn't realize that re: the concurrency flag. Your naming makes perfect sense then—nice work, and good catch on the thread-safety issue.

I fixed a silly lint error only the CI robots care about, but the rest looks good to me and I'll cut a minor release with this change. Thanks for the contribution!

@jkeen jkeen merged commit 697d761 into graphiti-api:master Mar 20, 2024
36 checks passed
@MattFenelon
Contributor Author

I fixed a silly lint error only the CI robots care about, but the rest looks good to me and I'll cut a minor release with this change. Thanks for the contribution!

Ha thanks. My editor really wants to auto format a lot of that code 😅

Looking forward to the release!

github-actions bot pushed a commit that referenced this pull request Mar 20, 2024
# [1.6.0](v1.5.3...v1.6.0) (2024-03-20)

### Features

* add thread pool and concurrency_max_threads configuration option ([#470](#470)) ([697d761](697d761))

🎉 This PR is included in version 1.6.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@MattFenelon MattFenelon deleted the add_concurrency_pool_max_size branch March 20, 2024 17:50
@jkeen
Collaborator

jkeen commented Mar 22, 2024

@MattFenelon I had to make a small change to the thread-safety logic (1.6.1) after running into some major lockups in production (as in, the web server would stop responding) with the change as it was. When trying it out locally, the change I made to how the mutex was set up made all the difference, but redeploying this just now still causes a lock-up in production when concurrency is enabled. I don't know why that is happening. I'm at the end of a long day here, but if you see this in the morning, see if anything jumps out at you?

@MattFenelon
Contributor Author

MattFenelon commented Mar 22, 2024 via email

@MattFenelon
Contributor Author

@jkeen I've raised a new PR to fix the issue. I can't say for certain that it was what caused the issue you saw but it seems likely.

@MattFenelon
Contributor Author

Thanks again for the merge @jkeen. Apologies for the muck up there.

@jkeen
Collaborator

jkeen commented Mar 22, 2024

@MattFenelon No problem at all, thanks for the quick turnaround!

jkeen added a commit to jkeen/graphiti that referenced this pull request Mar 26, 2024
…er handle on what's causing the thread-pools to hang
jkeen added a commit that referenced this pull request Mar 26, 2024
…n what's causing thread pool hangs. refs #469

revert: "thread pool scope and mutex need to be global across all instances of Scope for it to be a global thread pool (#471)"
revert: "add thread pool and concurrency_max_threads configuration option (#470)"

This reverts commit 51fb51c.
This reverts commit 697d761.
github-actions bot pushed a commit that referenced this pull request Mar 26, 2024
## [1.6.3](v1.6.2...v1.6.3) (2024-03-26)

### Bug Fixes

* Remove thread pool executor logic until we get a better handle on what's causing thread pool hangs. refs [#469](#469) ([7941b6f](7941b6f)), closes [#471](#471) [#470](#470)
MattFenelon added a commit to MattFenelon/graphiti that referenced this pull request Mar 27, 2024
This option limits the maximum number of resources that can be
sideloaded concurrently. With a properly configured connection pool,
this ensures that ActiveRecord's connection pool is not exhausted by
the sideloading process.

The thread pool configuration is based on ActiveRecord's
global_thread_pool_async_query_executor.

This was previously attempted but there were reports of deadlocks. This
code differs from the original by using Concurrent::Delay assigned to a
constant instead of a regular Mutex. The Delay + constant is how
concurrent-ruby sets up its own global thread pools, so it's more likely
to be correct.

Closes graphiti-api#469

See: https://github.com/rails/rails/blob/94faed4dd4c915829afc2698e055a0265d9b5773/activerecord/lib/active_record.rb#L279-L287
See: graphiti-api#470
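
A sketch of the Delay-plus-constant pattern that commit describes (names and sizing are illustrative, not the committed code):

require "concurrent"

# Lazily builds the global pool exactly once, even under concurrent callers,
# because Concurrent::Delay memoises its block thread-safely.
GLOBAL_SIDELOAD_EXECUTOR = Concurrent::Delay.new do
  max_threads = Graphiti.config.concurrency_max_threads
  Concurrent::ThreadPoolExecutor.new(
    min_threads: 1,
    max_threads: max_threads,
    max_queue: max_threads * 4,
    fallback_policy: :caller_runs
  )
end

# Callers resolve the delay when they need the pool:
# GLOBAL_SIDELOAD_EXECUTOR.value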

Successfully merging this pull request may close these issues.

ActiveRecord::ConnectionTimeoutError "all pooled connections were in use" after enabling concurrency