Pro Reliability Server

Mike Perham edited this page Mar 11, 2016 · 23 revisions

Sidekiq does what it can to avoid losing jobs. When it shuts down, it pushes any unfinished jobs back to Redis. 99% of the time, that's sufficient. But there are limits: a job is held in process memory while it executes, so if the process crashes or network connectivity goes down, the job can be lost.

To handle those edge cases, the job must remain in Redis while Sidekiq executes it. Sidekiq Pro provides two different algorithms to do just that. To activate one, add this to your initializer:

ONE_HOUR = 3600 # timed_fetch timeout in seconds; one hour is the default
Sidekiq.configure_server do |config|
  # Uncomment exactly one of the following:
  # config.reliable_fetch!
  # config.timed_fetch! ONE_HOUR
end

reliable_fetch

This is the algorithm that Sidekiq Pro has provided from Day 1. It uses the rpoplpush command and stores jobs within a private queue for each process while executing.
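
To make the private-queue idea concrete, here is a minimal sketch in plain Ruby. It is not Sidekiq Pro's implementation: `FakeRedis` is a hypothetical in-memory stand-in that mimics the semantics of the Redis LPUSH, RPOPLPUSH and LREM commands, the private queue name is an invented example, and removing the job from the private queue after it finishes is an assumption about how acknowledgement might work.

```ruby
# Hypothetical in-memory stand-in for Redis lists, for illustration only.
# A real deployment would issue these commands against a Redis server.
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # LPUSH: add a value at the head of the list.
  def lpush(key, value)
    @lists[key].unshift(value)
  end

  # RPOPLPUSH: move the tail of `source` to the head of `dest` in one step.
  def rpoplpush(source, dest)
    value = @lists[source].pop
    @lists[dest].unshift(value) unless value.nil?
    value
  end

  # LREM with a count of 1: remove the first occurrence of `value`.
  def lrem(key, value)
    index = @lists[key].index(value)
    @lists[key].delete_at(index) if index
  end

  # Return a copy of the list contents for inspection.
  def list(key)
    @lists[key].dup
  end
end

redis = FakeRedis.new

# Assumed naming scheme: the private queue is derived from a stable hostname
# and a per-process index, so a restarted process can find its own queue again.
private_queue = "queue:default_host1_0"

redis.lpush("queue:default", "job-1")
redis.lpush("queue:default", "job-2")

# Fetch: the job moves to the private queue and never leaves Redis.
job = redis.rpoplpush("queue:default", private_queue)

# If the process crashed at this point, the job would still be here
# for the restarted process to pick up and retry.
recovered = redis.list(private_queue)

# Assumed acknowledgement step once the job has finished successfully.
redis.lrem(private_queue, job)
```

The sketch also shows why a stable hostname and index matter: if the restarted process computed a different private queue name, it would never look in the list that still holds the unfinished job.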

Pros

  • Scales to 10,000+ jobs/sec because it uses O(1) operations
  • Old and battle tested

Cons

  • Requires stable hostnames and a unique index per-process
  • Does not work well with Heroku, Docker, Amazon's ECS or Elastic Beanstalk
  • Susceptible to "poison pill" jobs

Good choice if you are running in the traditional manner on your own servers, virtual or physical. Avoid it if you are using containers or a PaaS like Heroku. Because jobs are retried when the process restarts, a job that crashes the Ruby VM (a "poison pill") will crash your processes repeatedly until it is removed manually.

timed_fetch

This is a new algorithm introduced in Sidekiq Pro v3.1. It uses Lua and stores jobs within a "pending" area with a timeout. If the job execution is not finished and acknowledged by the client within that timeout period, the job can be pushed back onto the queue for another process to pick up.
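
The pending area can be pictured with a small in-memory model. This is only a sketch of the idea described above, not the Lua that Sidekiq Pro runs: the `TimedQueue` class and its method names are hypothetical, time is passed in explicitly as a number of seconds, and a plain Hash stands in for whatever Redis structure actually holds the pending jobs.

```ruby
# Hypothetical model of a queue with a timed "pending" area.
class TimedQueue
  attr_reader :queue, :pending

  def initialize(timeout)
    @timeout = timeout # seconds a job may stay pending before it is requeued
    @queue = []        # jobs waiting to be executed
    @pending = {}      # job => deadline, for jobs currently executing
  end

  def push(job)
    @queue.unshift(job)
  end

  # Take the oldest job and record its deadline. In Redis these two steps
  # would have to happen atomically, which is what a Lua script provides.
  def fetch(now)
    job = @queue.pop
    @pending[job] = now + @timeout unless job.nil?
    job
  end

  # Acknowledge a finished job so it is never requeued.
  def ack(job)
    @pending.delete(job)
  end

  # Push back every pending job whose deadline has passed.
  def requeue_expired(now)
    expired = @pending.select { |_job, deadline| deadline <= now }.keys
    expired.each do |job|
      @pending.delete(job)
      @queue.unshift(job)
    end
    expired
  end
end

one_hour = 3600
q = TimedQueue.new(one_hour)
q.push("job-1")
q.push("job-2")

start = 0
finished = q.fetch(start) # this job completes and is acknowledged
crashed = q.fetch(start)  # this job's process dies before acknowledging
q.ack(finished)

early = q.requeue_expired(start + 1800)    # half an hour in: nothing expired
late = q.requeue_expired(start + one_hour) # deadline reached: job is requeued
```

Because the unacknowledged job only returns to the queue once its deadline passes, no hostname or process index is needed to find it again; any process can pick it up.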

Pros

  • No special configuration or specialization required
  • Works in every deployment environment, containers or not
  • Handles "poison pills" gracefully

Cons

  • Less scalable because it uses O(log N) operations
  • New, unproven

Good choice for anyone processing fewer than 10M jobs/day or wanting to use containers. Jobs that crash the Ruby VM ("poison pills") are not retried until the timeout expires (one hour by default), so they can't crash Sidekiq non-stop, only once per hour.
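
For a sense of scale, the 10M jobs/day figure can be converted to an average per-second rate. This is simple arithmetic only; it assumes load is spread evenly across the day, which real workloads rarely are, so peak rates may be considerably higher.

```ruby
seconds_per_day = 24 * 60 * 60 # 86,400
jobs_per_day = 10_000_000

# Average rate if the jobs were spread evenly over 24 hours.
average_jobs_per_second = (jobs_per_day / seconds_per_day.to_f).round(1)

puts average_jobs_per_second # prints 115.7
```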