Reliability

Mike Perham edited this page Apr 21, 2016 · 53 revisions

There are three aspects of reliability with Sidekiq and Redis:

  1. Pushing jobs to Redis with the client; see the client reliability page.
  2. Fetching jobs from Redis with the server; see the server reliability page.
  3. Scheduling jobs.

Setup

TL;DR To use the Reliability features in Sidekiq Pro, add this to your initializer:

Sidekiq::Client.reliable_push! unless Rails.env.test?

Sidekiq.configure_server do |config|
  # uncomment ONE of these fetch algorithms:
  # uncomment this if you are on your own servers
  #config.reliable_fetch!
  # uncomment this if you are using containers (Docker, AWS ECS) or on a PaaS like Heroku
  #config.timed_fetch!

  config.reliable_scheduler!
end

Read on for more detail. This screencast gives a quick overview: Reliability

Using reliable_fetch

Sidekiq uses BRPOP to pop a job off the queue in Redis. This is simple and very efficient, but it has one drawback: the job is removed from Redis as soon as it is fetched. If Sidekiq crashes while processing that job, it is lost forever. This is not a problem for many applications, but some businesses need absolute reliability when processing jobs.

Sidekiq does its best to never lose jobs, but it can't guarantee it; the only way to guarantee job durability is to leave the job in Redis until it is complete. For instance, if Sidekiq is restarted mid-job, it will try to push the unfinished jobs back to Redis, but networking issues can prevent this.

Sidekiq Pro offers an alternative strategy for job processing using Redis' RPOPLPUSH command which keeps jobs in Redis. To enable reliable_fetch you must tag each process on a machine with a unique index and enable the strategy:

Start Sidekiq with a unique index for each process on the machine:

sidekiq -e production -i 0
sidekiq -e production -i 1
sidekiq -e production -i 2

Enable reliable fetch in your initializer:

Sidekiq.configure_server do |config|
  # This needs to be within the configure_server block
  config.reliable_fetch!
end

When Sidekiq starts, you should see ReliableFetch activated:

INFO: Booting Sidekiq 2.6.2 with Redis at redis://localhost:6379/0
INFO: Running in ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-darwin11.4.2]
INFO: Sidekiq Pro 0.9.0, commercially licensed.  Thanks for your support!
INFO: ReliableFetch activated
INFO: Starting processing, hit Ctrl-C to stop

Any jobs which are not fully processed (e.g. due to a segfault or network failure) are restarted upon process restart.
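The difference between the two fetch strategies described above can be modeled without Redis. The following is a minimal sketch in which plain Ruby arrays stand in for Redis lists; the method names (`destructive_fetch`, `reliable_fetch`, `acknowledge`, `recover!`) and the `working` list are hypothetical and are not part of Sidekiq Pro's API.

```ruby
# In-memory model only: plain Ruby arrays stand in for Redis lists.
# All method names here are hypothetical, not Sidekiq Pro APIs.

# Destructive fetch, analogous to BRPOP: the job leaves the queue immediately.
def destructive_fetch(queue)
  queue.pop
end

# Non-destructive fetch, analogous to RPOPLPUSH: the job moves to a
# per-process working list and stays there until it is acknowledged.
def reliable_fetch(queue, working)
  job = queue.pop
  working.unshift(job) if job
  job
end

# Called after a job finishes successfully.
def acknowledge(working, job)
  idx = working.index(job)
  working.delete_at(idx) if idx
end

# Called on process restart: anything still in the working list was
# interrupted, so it is pushed back onto the queue to run again.
def recover!(queue, working)
  queue.push(working.pop) until working.empty?
end

# Crash after a destructive fetch: nothing is left to recover.
queue = ["job-1"]
destructive_fetch(queue)
puts "after destructive fetch + crash: #{queue.inspect}"

# Crash after a reliable fetch: the job is still in `working`.
queue = ["job-1"]
working = []
reliable_fetch(queue, working)
recover!(queue, working) # simulated restart
puts "after reliable fetch + restart:  #{queue.inspect}"
```

In this model a job is only ever dropped by `acknowledge`, which is why a crash between fetch and completion leaves it recoverable.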

Limitations

Sidekiq Pro's reliable fetch does not work well with Docker, Amazon's Elastic Beanstalk and Container Services, etc. These services might not allow any way to send a unique index per process or use stable hostnames. In this case, use timed_fetch.

You must ensure that any old process using reliable_fetch is shut down before starting up a new process to replace it during deploy. If old and new processes are running at the same time, it's possible for jobs to be processed twice.

The (queue, hostname, index) tuple must be unique for each Sidekiq process. If multiple processes share the same private queue, jobs can be duplicated or executed multiple times.
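A deploy-time sanity check for this uniqueness requirement might look like the sketch below. The key format produced by `private_queue_key` is an assumption made for illustration; the actual naming used by Sidekiq Pro may differ, and both helper names are hypothetical.

```ruby
# Hypothetical helper: builds an identifier from the (queue, hostname, index)
# tuple. The real key format used by Sidekiq Pro may differ.
def private_queue_key(queue, hostname, index)
  "queue:#{queue}_#{hostname}_#{index}"
end

# Returns the keys that more than one process would claim.
def colliding_keys(processes)
  keys = processes.flat_map do |process|
    process[:queues].map do |queue|
      private_queue_key(queue, process[:hostname], process[:index])
    end
  end
  keys.group_by(&:itself).select { |_key, uses| uses.size > 1 }.keys
end

processes = [
  { hostname: "app-1", index: 0, queues: ["default"] },
  { hostname: "app-1", index: 0, queues: ["default"] }, # misconfigured: index reused
  { hostname: "app-1", index: 1, queues: ["default"] }
]

p colliding_keys(processes) # => ["queue:default_app-1_0"]
```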

Using timed_fetch

Timed fetch is an alternative algorithm to reliable_fetch. It doesn't require any per-process configuration or customization, so it works well in containers and PaaS environments, but it does not scale as well as reliable_fetch. See the server reliability page for a rundown of each algorithm and its pros and cons.

Sidekiq.configure_server do |config|
  config.timed_fetch!
end

Fetch algorithms

Reliable and timed fetch support the same two queue prioritization mechanisms as Sidekiq's basic fetch: strict priority and weighted random.

Strict queue ordering

sidekiq -e production -i 0 -q critical -q default -q bulk

Beware that strict prioritization can lead to starvation: bulk jobs will only be processed once the critical and default queues are empty. You can reverse the priorities on some processes to ensure every queue gets processed:

sidekiq -e production -i 0 -q critical -q default -q bulk
sidekiq -e production -i 1 -q bulk -q default -q critical
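The starvation effect, and why reversing the order on a second process helps, can be seen in a simplified model. This sketch uses a Hash of arrays in place of Redis queues; `strict_fetch` is a hypothetical name and not Sidekiq's actual fetch code.

```ruby
# Simplified model: a Hash of arrays stands in for the Redis queues.
# Returns [queue_name, job] from the first non-empty queue in `order`,
# or nil when every queue is empty.
def strict_fetch(queues, order)
  order.each do |name|
    job = queues.fetch(name, []).shift
    return [name, job] if job
  end
  nil
end

queues = {
  "critical" => ["c1", "c2"],
  "default"  => ["d1"],
  "bulk"     => ["b1"]
}

# Process 0 drains critical first; bulk waits until everything else is empty.
p strict_fetch(queues, %w[critical default bulk]) # => ["critical", "c1"]

# Process 1 uses the reversed order, so bulk is served even while
# critical still has work.
p strict_fetch(queues, %w[bulk default critical]) # => ["bulk", "b1"]
```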

Weighted random algorithm

sidekiq -e production -i 0 -q critical,3 -q default,2 -q bulk,1

When using weighted queues, Sidekiq randomly chooses a queue to check, without blocking, using weighted random choice. For example, in the command given above, Sidekiq will sample from the array ["critical", "critical", "critical", "default", "default", "bulk"].
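The expansion described above can be sketched in plain Ruby. `weighted_pool` and `pick_queue` are hypothetical names used for illustration; this is not Sidekiq's actual fetch code.

```ruby
# Expands { "critical" => 3, ... } into the array to sample from.
def weighted_pool(weights)
  weights.flat_map { |name, weight| [name] * weight }
end

# Picks one queue name at random; heavier queues appear more often in the
# pool, so they are chosen proportionally more often.
def pick_queue(pool, rng = Random.new)
  pool.sample(random: rng)
end

pool = weighted_pool("critical" => 3, "default" => 2, "bulk" => 1)
p pool # => ["critical", "critical", "critical", "default", "default", "bulk"]

# Over many draws, the counts approach the 3:2:1 ratio.
counts = Hash.new(0)
6_000.times { counts[pick_queue(pool)] += 1 }
p counts
```

With these weights, critical is checked about half the time, default about a third, and bulk about a sixth, so no queue is starved outright.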

Scheduler

Sidekiq's default scheduler is not atomic: it pops jobs off the scheduled queue and enqueues them with two network round trips. Sidekiq Pro offers a reliable scheduler which uses Lua to perform the same task atomically:

Sidekiq.configure_server do |config|
  config.reliable_scheduler!
end

This feature is optional, but enabling it is highly recommended. It is not safe to enable if you are running Redis Cluster.
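Why the two round trips matter can be shown with an in-memory model. In this sketch, arrays stand in for the scheduled set and the destination queue, and a raised exception simulates the process dying between the two steps. All names (`two_step_enqueue`, `atomic_enqueue`, `SimulatedCrash`) are hypothetical; the real reliable scheduler performs its work in a Lua script inside Redis.

```ruby
# In-memory model: `scheduled` stands in for the scheduled set and `queue`
# for the destination queue. All names are hypothetical.
class SimulatedCrash < StandardError; end

# Non-atomic version: two separate steps, like two network round trips.
def two_step_enqueue(scheduled, queue, job, crash_between_steps: false)
  scheduled.delete(job)                       # step 1: remove from the schedule
  raise SimulatedCrash if crash_between_steps # process dies here
  queue.push(job)                             # step 2: push onto the queue
end

# Atomic version: modeled as one indivisible step, the way Redis runs a
# server-side Lua script without interleaving other commands.
def atomic_enqueue(scheduled, queue, job)
  queue.push(job) if scheduled.delete(job)
end

scheduled = ["job-1"]
queue = []
begin
  two_step_enqueue(scheduled, queue, "job-1", crash_between_steps: true)
rescue SimulatedCrash
  # The job is now in neither structure.
end
puts "two-step after crash: scheduled=#{scheduled.inspect} queue=#{queue.inspect}"

scheduled = ["job-1"]
queue = []
atomic_enqueue(scheduled, queue, "job-1")
puts "atomic:               scheduled=#{scheduled.inspect} queue=#{queue.inspect}"
```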