Better integration between Sidekiq thread count and Active Record pool size. #2985
This strikes me as easier to solve with documentation than with code.

```yaml
# config/sidekiq.yml
---
:concurrency: <%= ENV['SIDEKIQ_THREADS'] %>
```

and then

```yaml
# config/database.yml
pool: <%= ENV['SIDEKIQ_THREADS'] || ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
```

All that would be required would be a change to the wiki docs, and maybe a nice announcement in that fancy Sidekiq Newsletter ;)

edit: or just set
Thanks for the video conference this morning, @schneems. I think having Sidekiq use this variable over the config settings is reasonable if it means the db pool will auto-size itself. The logic would be something like:

The Procfile would just need to be (some pretty nice symmetry here):

WDYT?
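The Procfile itself was not preserved in this thread; as a rough guess at the symmetry being described (the exact commands and thread counts are assumptions, not the original content), it might have looked like:

```
web: RAILS_MAX_THREADS=5 bundle exec puma -C config/puma.rb
worker: RAILS_MAX_THREADS=25 bundle exec sidekiq
```

With both processes keyed off the same variable name, scaling either one means changing a single number on that line.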
That Procfile does look very nice! In production, I'll bet a lot of people are running with db pools that are too small for their Sidekiq concurrency anyway. If modifying

What about non-AR ORMs? I've never really used them, though, so I don't know how they configure their connection pools.
I'm not sure about the Procfile solution; it would require you to re-deploy after every change, which is a bad experience. Maybe something like this in

Then you could change

Ideally I would like this to work out of the box without a custom Procfile. @jeremyevans any thoughts on @nateberkopec's comment? Would be cool if Sequel could Just Work (TM) with Rails 5 + Puma like Active Record. Happy to take that conversation to another thread if you want.
Sequel can usually work fine with a significantly smaller connection pool than ActiveRecord due to the fact that it doesn't leak connections. That said, if someone wants to use the number of Sidekiq threads as the maximum number of Sequel database connections, they would just need to use the

I'm against adding Rails-specific code to non-Rails-specific libraries as a matter of principle, just as I'm against adding Sequel-specific code to non-Sequel-specific libraries. Giving special treatment to Rails gives Rails an unjust advantage and makes it more difficult for alternatives to Rails to compete on merit. On the flip side, adding support for Rails, Sequel, ROM, Mongoid, etc. leads to a slippery slope. In generic terms, unless library A depends on library B (either directly or via an adapter for library B that ships with library A), I don't think library A should have library B-specific code.

As a matter of good engineering practice, I think libraries should never set environment variables. I think reading library-specific environment variables is fine, but library A should not care about library B-specific environment variables.
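For concreteness, here is a sketch of the Sequel side of this. `:max_connections` is Sequel's real pool-size option; the helper name and the `SIDEKIQ_THREADS` env var are illustrative assumptions, not anything Sequel or Sidekiq ships.

```ruby
# Illustrative helper: derive Sequel connect options from the Sidekiq
# thread count. Only :max_connections is real Sequel API; the helper
# and env var names are a made-up convention for this sketch.
def sequel_pool_options(sidekiq_threads)
  { max_connections: Integer(sidekiq_threads) }
end

# Usage (in the app's own setup code):
#   DB = Sequel.connect(ENV['DATABASE_URL'], **sequel_pool_options(25))
sequel_pool_options(ENV.fetch('SIDEKIQ_THREADS', 25))
```

The point being that this lives in application code, keeping Sidekiq-awareness out of Sequel itself.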
That tends to help, yes. XD
Ideally I agree; pragmatically I disagree. There are lots of libraries that only serve to integrate with another library, for example devise. There are also the in-between libraries that bridge an ideologically pure library and something else; for example, sprockets-rails adds the bridges needed to integrate the two libraries. In this case I actually wish that sprockets shipped with a railtie instead: yes, it would add coupling, but it would also decrease overall complexity in an end user's app. Right now sprockets-rails has to support Sprockets 2-4, which causes problems and lots of potentially unneeded flow control.

My gold standard is the end user's experience. If giving them the best possible experience doesn't introduce a huge maintenance burden on a library, I would consider doing it. I have some libraries that don't do authorization but need it; to that end I integrate with devise if it's available, however I also provide custom configuration endpoints. One example is oPRO https://github.com/opro/opro#custom-auth. This gives anyone using the current number-one auth library for Rails a drop-in solution, but doesn't prevent anyone else from using it. In the future, if other libraries wanted to add support directly into oPRO, there is already a pattern to follow. If the maintenance burden got too large, or there were too many people adding in support for their random library, then I would probably start pushing back and asking people to make bridge libraries like

Keep in mind that I make a living working for Heroku maintaining software whose core value proposition is that it is highly knowledgeable about and coupled to external software. Doing this isn't always the easiest or cleanest proposition, but sometimes it's needed to provide the absolute best experience. Over time, as standards develop, hopefully others will see that value and start using standard interfaces, for example https://blog.heroku.com/archives/2016/5/2/container_ready_rails_5.

On the note of the unfair advantage: it wouldn't be an advantage if it wasn't beneficial for users. I feel it's fine to have each library maintainer make a call on a case-by-case basis. I agree that docs can do a lot for integrations, but there's a huge difference between "1 step and it works" and working out of the box. Ideally we would have standard interfaces across libraries; a good example of this is
If the point of the library is to integrate with one or more other libraries, then as I mentioned earlier, I think it's fine if the library has code specific to the other libraries. That's not the case here, though. If you want to integrate sidekiq with rails, that sounds like a good idea for a sidekiq-rails gem. If you want to ship such support with sidekiq, have it in a separate file that is not required by default (an adapter file). While the end-user experience is important, I don't believe it trumps good library design principles. I think it is fairly easy to understand that the philosophy of "whatever is good for the majority is what we should do" is harmful to the diversity of any ecosystem.
That's what a Railtie is?
This point is far stronger, IMO. Setting an environment variable at runtime is kinda weird to begin with, and then overriding Rails' env variables from Sidekiq doubles that weirdness. My original proposed approach fixes that issue, with the downside of making it not zero-config/plug-n-play for Rails users.
I don't think there is any problem if sidekiq ships a railtie, or has Rails-specific code in a Rails-specific file that sidekiq doesn't load by default (if Rails loads sidekiq's railtie by default, that's fine). I think you only run into issues if you start doing things by default for the most popular library and not other competing alternatives. For example, I would consider code like this to be a problem: https://github.com/phusion/passenger/blob/085504c2e00b6e6322bb0ec97aa5ff43d037c729/src/ruby_supportlib/phusion_passenger/loader_shared_helpers.rb#L381
First of all, thanks for all the discussion. The original issue was me throwing things at the wall to see what stuck. I appreciate your input. I never loved setting another library's env var. Right now we have two proposed solutions: auto-setting the env var, and explicit Procfile writing. In thinking about the two proposed solutions, I think I have a middle-of-the-road, hybrid solution. First, a recap of the other solutions:

Auto Env Var

Have sidekiq set the

Procfile

Document how to set `RAILS_MAX_THREADS` in the Procfile like

Hybrid solution: Env var default and Procfile docs

I think we should have docs on how to set the Procfile:

However, we could also default Sidekiq's number of threads to match Active Record's (do this in a Railtie, perhaps). For example, if a

This still buys us explicitness through documentation. It also buys us that "just works" out-of-the-box experience, in that code run by Sidekiq won't try to check out more connections than AR has. The downside is that this drastically reduces the number of default threads from 25 to 5. However, if they're using the database while running sidekiq, they're already artificially limited, as only 5 jobs would be able to run at a time since the others will be blocked waiting on a connection. Basically you're limited by Amdahl's law anyway, so it's not really a downside. If you REALLY need all possible speed, you can't expect it from defaults; it's reasonable to expect people to look at the docs (my opinion).

What do you think?
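The hybrid default described above can be sketched in a few lines of plain Ruby. This is an illustration of the proposed behavior, not Sidekiq's actual implementation; the method name is made up.

```ruby
# Sketch of the hybrid default: if RAILS_MAX_THREADS is present, let it
# drive Sidekiq's default concurrency; otherwise keep the old default of 25.
DEFAULT_CONCURRENCY = 25

def default_concurrency(env = ENV)
  Integer(env.fetch('RAILS_MAX_THREADS', DEFAULT_CONCURRENCY))
end

default_concurrency('RAILS_MAX_THREADS' => '5')  # 5 when the Rails var is set
default_concurrency({})                          # 25 otherwise
```

An explicit `-c` flag or `sidekiq.yml` setting would still win over this default; the sketch only covers the "nothing configured" path.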
Thanks for all the input, folks. I really have been reading and considering all of the things. I like Jeremy's point that ENV vars should be considered read-only for libraries, but Sidekiq is not a library; it's the top-level framework and the binary being run. The user is configuring the Sidekiq process via command line parameters, and so tuning various libraries within the Sidekiq process by auto-setting ENV vars, if not already set, is fair game. For example, Sidekiq sets the RAILS_ENV and RACK_ENV variables based on the -e parameter.

I'm leaning toward -c mapping to RAILS_MAX_THREADS if not set. With that logic, the 90% use case is the user sets

That naming seems silly though; IMO the

to make the env vars more descriptive:
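The direction described here, -c backfilling `RAILS_MAX_THREADS` the same way -e already sets `RAILS_ENV`/`RACK_ENV`, could be sketched like this. The method and option names are assumptions for illustration, not Sidekiq's real internals:

```ruby
# Sketch: map CLI options onto env vars, but only when the user hasn't
# already set them (||= leaves any existing value untouched).
def apply_env_defaults(options, env = ENV)
  env['RAILS_ENV'] ||= options[:environment]
  env['RACK_ENV']  ||= options[:environment]
  env['RAILS_MAX_THREADS'] ||= options[:concurrency].to_s
  env
end

apply_env_defaults({ environment: 'production', concurrency: 25 }, {})
```

Because `||=` never overwrites, a user who has already exported `RAILS_MAX_THREADS` keeps their own value.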
Why not just make the db connection pool size an Active Record configuration setter?

```ruby
# config/application.rb, somewhere:
config.active_record.database_pool_size = ENV["CONCURRENCY"]

# another alternative for the foreman case
config.active_record.database_pool_size = ENV["RAILS_MAX_THREADS"] || ENV["SIDEKIQ_POOL_SIZE"] || 42
```

From what I understand, Rails doesn't care about threads and is thread-safe; these are application server / background job framework details. Why call it "max threads" when it's just the "db pool size" I want?
That's a tangential discussion. The problem with that approach is that there would now be two canonical ways to set the pool size. What happens when you set it via the config AND database.yml? Which provides the canonical value? What if the ordering of your code loading changes? We ran into this problem by suggesting people set that value in Rails pre-4.1; see https://devcenter.heroku.com/articles/rails-database-connection-behavior#prior-to-rails-4-1. It was bad and caused confusion, and this would likely make things more confusing instead of less. I would be all for getting rid of database.yml in favor of non-yml configuration entry points like you've suggested, however that's a discussion that's not really on the table.
I'd say that if you take RAILS_MAX_THREADS as the highest priority (as rails does with DATABASE_URL(?)), you at least won't break backwards compatibility, and the config would be the evangelized way (and in heroku you could just ignore the database.yml if pool is already set, which is the solution both you and I want). But that's just me. I have a hard time seeing these suggestions applied to stuff like sidekiq, which is a layer above rails, as the author suggested. Same for puma/passenger/unicorn/etc. If you make it configurable, as most of the libs do (I don't think devise relies on env vars, for example), you give freedom to the user, and it's not sidekiq's business anymore. It is indeed a breaking change from the link on heroku you sent, but suggesting that the framework should mitigate the lack of configuration in active record is unfair.

As for the author of sidekiq, I have a question: why is sidekiq handling RAILS_ENV and RACK_ENV? Why not SINATRA_ENV or UNICORN_ENV? There was a discussion before on the different meanings of these variables, and that not all of them correlate to one another. Why is sidekiq handling them, then? From what I can get from a quick github search, it's used to require some expected rails project files, all inside the cli.rb file, which is also loaded for non-rails. Isn't this an example of "giving special treatment to rails"? If you want to target rails, just
How does that look?
This is a follow up to sidekiq#2985 (52828e4) adding similar support for the client connection pool. For Rails servers, Sidekiq is not loaded from the CLI, so the prior change to support setting the concurrency via `RAILS_MAX_THREADS` does not apply. This means Rails servers which do not configure a custom size through an initializer will run with only the default 5 connections available to the pool.

When the Rails server runs, the initial Redis connection may be made through `Sidekiq::Client` (e.g. from [`ActiveJob::QueueAdapters::SidekiqAdapter`](https://github.com/rails/rails/blob/v5.1.5/activejob/lib/active_job/queue_adapters/sidekiq_adapter.rb#L20)). This causes the `redis_pool` to be initialized without any options, setting the pool size to the default of 5.

```
.gem/ruby/2.5.0/gems/sidekiq-5.1.1/lib/sidekiq.rb:125:in `redis_pool'
.gem/ruby/2.5.0/gems/sidekiq-5.1.1/lib/sidekiq/client.rb:42:in `initialize'
.gem/ruby/2.5.0/gems/sidekiq-5.1.1/lib/sidekiq/client.rb:131:in `new'
.gem/ruby/2.5.0/gems/sidekiq-5.1.1/lib/sidekiq/client.rb:131:in `push'
.gem/ruby/2.5.0/gems/activejob-5.1.5/lib/active_job/queue_adapters/sidekiq_adapter.rb:20:in `enqueue'
.gem/ruby/2.5.0/gems/activejob-5.1.5/lib/active_job/enqueuing.rb:51:in `block in enqueue'
.gem/ruby/2.5.0/gems/activesupport-5.1.5/lib/active_support/callbacks.rb:108:in `block in run_callbacks'
.gem/ruby/2.5.0/gems/activejob-5.1.5/lib/active_job/logging.rb:15:in `block (3 levels) in <module:Logging>'
.gem/ruby/2.5.0/gems/activejob-5.1.5/lib/active_job/logging.rb:44:in `block in tag_logger'
.gem/ruby/2.5.0/gems/activesupport-5.1.5/lib/active_support/tagged_logging.rb:69:in `block in tagged'
.gem/ruby/2.5.0/gems/activesupport-5.1.5/lib/active_support/tagged_logging.rb:26:in `tagged'
.gem/ruby/2.5.0/gems/activesupport-5.1.5/lib/active_support/tagged_logging.rb:69:in `tagged'
.gem/ruby/2.5.0/gems/activejob-5.1.5/lib/active_job/logging.rb:44:in `tag_logger'
.gem/ruby/2.5.0/gems/activejob-5.1.5/lib/active_job/logging.rb:14:in `block (2 levels) in <module:Logging>'
.gem/ruby/2.5.0/gems/activesupport-5.1.5/lib/active_support/callbacks.rb:117:in `instance_exec'
.gem/ruby/2.5.0/gems/activesupport-5.1.5/lib/active_support/callbacks.rb:117:in `block in run_callbacks'
.gem/ruby/2.5.0/gems/activesupport-5.1.5/lib/active_support/callbacks.rb:135:in `run_callbacks'
.gem/ruby/2.5.0/gems/activejob-5.1.5/lib/active_job/enqueuing.rb:47:in `enqueue'
.gem/ruby/2.5.0/gems/activejob-5.1.5/lib/active_job/enqueuing.rb:18:in `perform_later'
```

For the majority of cases, a client pool size of 5 is sufficient. However, servers which utilize a high number of threads, with large job payloads, and which may experience some network latency issues can see `Timeout::Error` crashes. This may be further exacerbated by the ~2-20x performance decrease through `ActiveJob` (sidekiq#3782).

Rails addresses this general type of connection issue for the main database by suggesting that the DB pool size match the number of threads running. This change applies that logic to the default client pool size by leveraging the same environment setting; this way there's a connection available per thread. This may also have the side effect of a slight performance boost, as there is less of a chance that threads will be blocked waiting on connections. The trade-off is that there may be a memory profile increase to handle the additional Redis connections in the pool; note the pool only creates new connections as necessary to handle the requests.

Resolves sidekiq#3806
Hi, may I know if the sidekiq concurrency is per client or shared? If shared, then I think it cannot serve 50 clients sending background task requests at the same time if the default is 25. If per client, then much better.
@kurtlaunchlabs I don't understand what you mean
Another approach:

```yaml
# config/database.yml
pool: <%= Sidekiq.server? ? Sidekiq.options[:concurrency] : ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
```
Right now if someone wants to use sidekiq they must configure the number of threads they want to run, or use the default (25). If they are using a database (likely), then to avoid hitting a connection timeout error they will need at least as many connections in their database pool as sidekiq threads. Currently they must configure these two values in two different places: the database pool in `database.yml` for ActiveRecord, and sidekiq concurrency either via the command line or the sidekiq yml file. This is a problem because it is difficult to remember, when you are modifying one value, that you need to modify both.

In Rails 5 there is a default environment variable to configure the number of threads that will be in the database connection pool. This convention is used alongside web server configuration values. For example, Rails 5 also ships with puma, which is configured to read from this value:
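The generated `config/puma.rb` in a new Rails 5 app reads the variable roughly like this (reconstructed from memory of the Rails 5 template, so treat the details as approximate):

```ruby
# config/puma.rb as generated by Rails 5 (approximate reconstruction)
threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }.to_i
threads threads_count, threads_count
```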
It is common that the number of sidekiq threads run in a "worker" process would be greater than the number of web threads used by a server like Puma. For this reason it does not make sense to default the sidekiq concurrency to `RAILS_MAX_THREADS`. However, it would make sense for sidekiq to use this environment variable interface and set the environment variable:

So now, when the application boots in a "worker" process via the `sidekiq` command and we know that we are using Rails, the appropriate number of database connections will be configured by default.

Potential Problems
This behavior may be slightly unexpected (for Sidekiq to change an environment variable at boot time), however I believe it is perfectly warranted. As a mitigation step we could detect when an environment variable was already set. A warning or similar would eliminate any confusion. It would also allow the use of `RAILS_MAX_THREADS` for when the web process is booted, and a different configuration option for use when sidekiq is booted.

People may want to opt out of this behavior. They can do this by modifying their `database.yml` file to use a different environment variable. So now they have the ability to not rely on the behavior of setting `RAILS_MAX_THREADS`.

Recap

Right now if you try the defaults, the database pool size of 5 will artificially limit the default sidekiq concurrency of 25. Applying this change means that Rails applications will work with sidekiq out of the box. It also means that as Sidekiq is scaled up or down, the database pool size will be set appropriately without additional action (if conventions are being followed). I believe the costs to be minimal and the benefits to outweigh them.
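The opt-out mentioned above, pointing `database.yml` at a different environment variable, might look something like this (a sketch; `DB_POOL` is an arbitrary variable name the user would choose, not a Rails or Sidekiq convention):

```yaml
# config/database.yml (sketch): DB_POOL is a user-chosen name, so the
# pool no longer keys off RAILS_MAX_THREADS and sidekiq's backfill of
# that variable has no effect on the pool size.
production:
  pool: <%= ENV.fetch("DB_POOL") { 5 } %>
```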