Reliability

There are three aspects of reliability with Sidekiq and Redis:

Pushing jobs to Redis with the client (see the client reliability page).
Fetching jobs from Redis with the server (see below).
Scheduling jobs (see below).

Setup

TL;DR To use the Reliability features in Sidekiq Pro, add this to your initializer:

Sidekiq::Client.reliable_push! unless Rails.env.test?

Sidekiq.configure_server do |config|
  config.reliable!
end

Read on for more detail.

Using super_fetch

Sidekiq uses BRPOP to fetch a job from the queue in Redis. This is simple and very efficient, but it has one drawback: the job is removed from Redis as soon as it is fetched. If Sidekiq crashes while processing that job, it is lost forever. This is not a problem for many users, but some businesses need absolute reliability when processing jobs.

Sidekiq does its best to never lose jobs, but it can't guarantee it; the only way to guarantee job durability is to not remove a job from Redis until it is complete. For instance, if Sidekiq is restarted mid-job, it will try to push the unfinished jobs back to Redis, but networking issues can prevent this.

Sidekiq Pro offers an alternative fetch strategy for job processing, super_fetch, which uses Redis's RPOPLPUSH command to keep jobs in Redis. To enable super_fetch:

Sidekiq.configure_server do |config|
  # This needs to be within the configure_server block
  config.super_fetch!
end

When Sidekiq starts, you should see SuperFetch activated:

INFO: Sidekiq Pro 3.5.0, commercially licensed.  Thanks for your support!
INFO: Booting Sidekiq 5.0.0 with redis options {:url=>nil}
INFO: Starting processing, hit Ctrl-C to stop
INFO: SuperFetch activated
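
The difference between the two fetch styles described in this section can be sketched without Redis. The following is a minimal in-memory illustration, not Sidekiq Pro's implementation: plain Ruby arrays stand in for Redis lists, and the method names (fetch_and_remove, fetch_and_keep, acknowledge) are hypothetical.

```ruby
# Arrays stand in for Redis lists (an assumption made for illustration).
# The tail of each array plays the role of the "right" end of a list.

# BRPOP-style fetch: the job leaves the store as soon as it is fetched.
def fetch_and_remove(queue)
  queue.pop
end

# RPOPLPUSH-style fetch: the job moves from the queue to a working list,
# so a copy stays in the store until the job is acknowledged.
def fetch_and_keep(queue, working)
  job = queue.pop
  working.unshift(job) if job
  job
end

# Acknowledge a finished job by removing it from the working list.
def acknowledge(working, job)
  index = working.index(job)
  working.delete_at(index) if index
end

queue = ["job-1", "job-2"]
working = []

job = fetch_and_keep(queue, working)
# A crash at this point would leave the fetched job in `working`,
# where it could be found and retried later.
acknowledge(working, job)
```

With fetch_and_remove there is no second list, so a crash after the fetch leaves no trace of the job.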

Recovering Jobs

When a Sidekiq process dies, its jobs in progress become orphans. On process startup, super_fetch will look for orphaned jobs:

if the process's heartbeat has expired (it takes 60 seconds to expire); AND
if an hour has passed since the last orphan check

The orphan check requires a complete SCAN of the Redis database; it can take a substantial amount of time (more than a few seconds) if your Redis database has a lot of keys. As always, I recommend using a separate Redis database or instance for cache data and job data. The hour buffer prevents Sidekiq from slamming Redis with constant SCANs and ensures that you don't have a continual cycle of process death due to poison pill jobs.

In summary, super_fetch might recover jobs in 5 minutes or 3 hours; there's no guarantee. Restarting a process is the best way to signal Sidekiq Pro to look for orphans.
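
The two conditions above can be expressed as a small predicate. This is only a sketch of the rule as described here, not Sidekiq Pro's code: the constant names, the method name orphan_check_due?, and the treatment of a missing last check time (nil means "never checked") are assumptions.

```ruby
HEARTBEAT_TTL = 60            # seconds until a heartbeat expires
ORPHAN_CHECK_INTERVAL = 3600  # one hour between orphan checks

# Returns true only when both conditions hold:
#   1. the process's heartbeat is at least HEARTBEAT_TTL seconds old, and
#   2. at least an hour has passed since the last orphan check
#      (assumption: nil means no check has been recorded yet).
def orphan_check_due?(now:, last_heartbeat_at:, last_orphan_check_at:)
  heartbeat_expired = (now - last_heartbeat_at) >= HEARTBEAT_TTL
  interval_elapsed = last_orphan_check_at.nil? ||
                     (now - last_orphan_check_at) >= ORPHAN_CHECK_INTERVAL
  heartbeat_expired && interval_elapsed
end
```

Because both conditions must hold, a process that died two minutes ago is still not checked if another orphan check ran ten minutes ago, which is why recovery time varies so widely.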

Fetch algorithms

super_fetch supports the same two queue prioritization mechanisms as Sidekiq's basic fetch: strict priority and weighted random.

Strict ordering

sidekiq -e production -q critical -q default -q bulk

Beware that strict ordering can lead to starvation: bulk jobs will only be processed once the critical and default queues are empty. You can switch ordering for different processes to ensure everyone gets processed:

sidekiq -e production -q critical -q default -q bulk
sidekiq -e production -q bulk -q default -q critical
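
To see why strict ordering can starve lower queues, here is a minimal in-memory sketch. It assumes queues are modeled as a Ruby Hash whose insertion order is the priority order; next_job_strict is a hypothetical helper, not part of Sidekiq.

```ruby
# Always scan the queues in priority order and take from the first
# non-empty one. Lower queues are reached only when higher ones are empty.
def next_job_strict(queues)
  queues.each do |name, jobs|
    return [name, jobs.shift] unless jobs.empty?
  end
  nil
end

queues = {
  "critical" => ["c1", "c2"],
  "default"  => [],
  "bulk"     => ["b1"]
}

order = []
while (picked = next_job_strict(queues))
  order << picked
end
# order is [["critical", "c1"], ["critical", "c2"], ["bulk", "b1"]]
```

If critical jobs kept arriving faster than they were processed, the bulk job would never be picked.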

Weighted random

sidekiq -e production -q critical,3 -q default,2 -q bulk,1

When using weighted ordering, sidekiq will randomly choose a queue to check, without blocking, using weighted random choice. For example, in the command given above, sidekiq will sample from the array ["critical", "critical", "critical", "default", "default", "bulk"] so critical will be checked first 50% of the time.
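
The weighted array described above can be built and sampled in a few lines. This is a sketch under assumptions: weighted_queue_list and polling_order are hypothetical names, and shuffling followed by de-duplication is just one way to turn the weighted array into a full checking order.

```ruby
# Expand { "critical" => 3, ... } into the weighted array from the text.
def weighted_queue_list(weights)
  weights.flat_map { |name, weight| [name] * weight }
end

# One polling pass: shuffle the weighted array and drop duplicates.
# Every queue appears exactly once, and the first position is taken
# by a queue with probability proportional to its weight.
def polling_order(weights, random: Random.new)
  weighted_queue_list(weights).shuffle(random: random).uniq
end

weights = { "critical" => 3, "default" => 2, "bulk" => 1 }
list = weighted_queue_list(weights)
# list is ["critical", "critical", "critical", "default", "default", "bulk"]

# 3 of the 6 entries are "critical", hence the 50% figure.
critical_share = list.count("critical").fdiv(list.size)
```

Unlike strict ordering, every queue has some chance of being checked first, so the bulk queue is not starved.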

Scheduler

Sidekiq's default scheduler is not atomic: it pops jobs off the scheduled queue and enqueues them with two network round trips. Sidekiq Pro offers a reliable scheduler which uses Lua to perform the same task atomically:

Sidekiq.configure_server do |config|
  config.reliable_scheduler!
end

This feature is optional, but enabling it is highly recommended. It is not safe to enable if you are running Redis Cluster.
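
The risk with a two-step scheduler can be shown with a small in-memory simulation. This is an illustration only, not Sidekiq's scheduler: a Hash stands in for the scheduled set, an Array for the queue, and promote_in_two_steps, SimulatedCrash, and the crash_between flag are hypothetical.

```ruby
class SimulatedCrash < StandardError; end

# Move due jobs from the schedule to the queue in two separate steps.
# `crash_between` simulates the process dying between the round trips.
def promote_in_two_steps(scheduled, queue, now, crash_between: false)
  due = scheduled.select { |_job, run_at| run_at <= now }.keys
  due.each do |job|
    scheduled.delete(job)                  # step 1: remove from schedule
    raise SimulatedCrash if crash_between  # failure window
    queue << job                           # step 2: enqueue
  end
end

scheduled = { "job-1" => 100 }
queue = []

begin
  promote_in_two_steps(scheduled, queue, 150, crash_between: true)
rescue SimulatedCrash
  # The job was removed from the schedule but never enqueued.
end

lost = !scheduled.key?("job-1") && !queue.include?("job-1")
# lost is true: the job is in neither structure
```

Performing both steps as one atomic operation removes that window, because no failure can occur between the removal and the enqueue.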

Notes

Older versions of Sidekiq Pro offered reliable_fetch and timed_fetch. These algorithms are now deprecated and no longer documented.
