Reliability
There are three aspects of reliability with Sidekiq and Redis:
- pushing jobs to Redis with the client, see the client reliability page.
- fetching jobs from Redis with the server, see below.
- scheduling jobs, see below.
TL;DR To use the Reliability features in Sidekiq Pro, add this to your initializer:
Sidekiq::Client.reliable_push! unless Rails.env.test?
Sidekiq.configure_server do |config|
  config.reliable!
end
Read on for more detail.
Sidekiq uses BRPOP to fetch a job from the queue in Redis. This is very efficient and simple but it has one drawback: the job is now removed from Redis. If Sidekiq crashes while processing that job, it is lost forever. This is not a problem for many but some businesses need absolute reliability when processing jobs.
Sidekiq does its best to never lose jobs but it can't guarantee it; the only way to guarantee job durability is to not remove a job from Redis until it is complete. For instance, if Sidekiq is restarted mid-job, it will try to push the unfinished jobs back to Redis, but networking issues can prevent this.
Sidekiq Pro offers an alternative fetch strategy, super_fetch, for job processing using Redis' RPOPLPUSH command which keeps jobs in Redis. To enable super_fetch:
Sidekiq.configure_server do |config|
  # This needs to be within the configure_server block
  config.super_fetch!
end
When Sidekiq starts, you should see SuperFetch activated:
INFO: Sidekiq Pro 3.5.0, commercially licensed. Thanks for your support!
INFO: Booting Sidekiq 5.0.0 with redis options {:url=>nil}
INFO: Starting processing, hit Ctrl-C to stop
INFO: SuperFetch activated
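To illustrate why the choice between BRPOP and RPOPLPUSH matters, here is a minimal sketch that uses an in-memory stand-in for Redis lists rather than a real connection. The `FakeRedis` class and the `queue:default:working` key name are hypothetical and chosen only for this example; they are not Sidekiq Pro's actual internals.

```ruby
# In-memory stand-in for Redis lists; no real Redis connection is used.
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  def lpush(key, value)
    @lists[key].unshift(value)
  end

  # Removes and returns the tail element (what BRPOP does, minus the blocking).
  def rpop(key)
    @lists[key].pop
  end

  # Moves the tail of `source` to the head of `destination` in one step.
  def rpoplpush(source, destination)
    value = @lists[source].pop
    @lists[destination].unshift(value) unless value.nil?
    value
  end

  def lrange_all(key)
    @lists[key].dup
  end
end

redis = FakeRedis.new
redis.lpush("queue:default", "job-1")
redis.lpush("queue:default", "job-2")

# Basic fetch: the job leaves Redis the moment it is popped.
basic_job = redis.rpop("queue:default")
# If the process crashed here, "job-1" would exist only in its memory.
after_basic = redis.lrange_all("queue:default")

# RPOPLPUSH-style fetch: the job is parked in a private working list.
# (Hypothetical key name, for illustration only.)
reliable_job = redis.rpoplpush("queue:default", "queue:default:working")
# If the process crashed here, "job-2" would still be recoverable from Redis.
after_reliable = redis.lrange_all("queue:default:working")
```

In the second case the job is never absent from Redis, which is what makes later recovery possible.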
When a Sidekiq process dies, its jobs in progress become orphans. On process startup, super_fetch will look for orphaned jobs:
- if the process's heartbeat has expired (it takes 60 seconds to expire); AND
- if an hour has passed since the last orphan check
The orphan check requires a complete SCAN of the Redis database; it can take a substantial amount of time (i.e. over a few seconds) if your Redis database has a lot of keys. As always, I recommend using a separate Redis database or instance for cache data vs job data. The hour buffer prevents Sidekiq from slamming Redis with constant SCANs and ensures that you don't have a continual cycle of process death due to poison pill jobs.
In summary, super_fetch might recover jobs in 5 minutes or in 3 hours; there is no guarantee. Restarting a process is the best way to signal Sidekiq Pro to look for orphans.
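The two conditions for orphan recovery can be written as a small predicate. This is only a sketch of the rule as described above, not Sidekiq Pro's implementation: the method name, the constants, and the idea of tracking a "last heartbeat" timestamp are assumptions made for illustration.

```ruby
HEARTBEAT_TTL = 60            # seconds until a dead process's heartbeat expires
ORPHAN_CHECK_INTERVAL = 3600  # seconds that must pass between orphan checks

# Hypothetical helper: true only when BOTH conditions from the text hold.
# All arguments are timestamps in seconds.
def orphan_recovery_due?(last_heartbeat_at:, last_orphan_check_at:, now:)
  heartbeat_expired = (now - last_heartbeat_at) >= HEARTBEAT_TTL
  check_allowed = (now - last_orphan_check_at) >= ORPHAN_CHECK_INTERVAL
  heartbeat_expired && check_allowed
end

now = 10_000
# Process died 5 minutes ago, but an orphan check ran 10 minutes ago: wait.
too_soon = orphan_recovery_due?(last_heartbeat_at: now - 300,
                                last_orphan_check_at: now - 600,
                                now: now)
# Process died 5 minutes ago and the last check was over an hour ago: recover.
due = orphan_recovery_due?(last_heartbeat_at: now - 300,
                           last_orphan_check_at: now - 4000,
                           now: now)
```

The hour-long interval is why recovery time varies so widely: a job orphaned just after a check has to wait for the next one.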
super_fetch supports the same two queue prioritization mechanisms as Sidekiq's basic fetch: strict priority and weighted random. With strict priority, queues are checked in the order listed:
sidekiq -e production -q critical -q default -q bulk
Beware that strict ordering can lead to starvation: bulk jobs will only be processed once the critical and default queues are empty. You can switch ordering for different processes to ensure everyone gets processed:
sidekiq -e production -q critical -q default -q bulk
sidekiq -e production -q bulk -q default -q critical
With weighted random, each queue is given a weight:
sidekiq -e production -q critical,3 -q default,2 -q bulk,1
When using weighted ordering, sidekiq will randomly choose a queue to check, without blocking, using weighted random choice. For example, in the command given above, sidekiq will sample from the array ["critical", "critical", "critical", "default", "default", "bulk"] so critical will be checked first 50% of the time.
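The sampling described above can be sketched in a few lines of Ruby. This is one way to realize weighted random choice (shuffle the expanded array, then drop repeats), assumed here for illustration; the method names `expand_weights` and `queue_check_order` are hypothetical and the sketch does not claim to mirror super_fetch's internals.

```ruby
# Expands [["critical", 3], ...] into ["critical", "critical", "critical", ...].
def expand_weights(weighted_queues)
  weighted_queues.flat_map { |name, weight| [name] * weight }
end

# Returns the order in which queues are checked on one fetch attempt:
# shuffle the expanded array, then drop repeats, keeping first positions.
def queue_check_order(expanded, rng: Random.new)
  expanded.shuffle(random: rng).uniq
end

weights = [["critical", 3], ["default", 2], ["bulk", 1]]
expanded = expand_weights(weights)

# Estimate how often "critical" is checked first, using a seeded generator
# so the run is repeatable.
rng = Random.new(42)
trials = 10_000
critical_first = trials.times.count do
  queue_check_order(expanded, rng: rng).first == "critical"
end
critical_share = critical_first.to_f / trials
```

Because "critical" occupies three of the six slots, `critical_share` should land close to 0.5, while every queue still appears exactly once in each check order.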
Sidekiq's default scheduler is not atomic: it pops jobs off the scheduled queue and enqueues them with two network round trips. Sidekiq Pro offers a reliable scheduler which uses Lua to perform the same task atomically:
Sidekiq.configure_server do |config|
  config.reliable_scheduler!
end
This feature is optional but highly recommended. It is not safe to enable if you are running Redis Cluster.
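The gap that a non-atomic scheduler leaves open can be shown with an in-memory model. Everything in this sketch is hypothetical (`FakeStore`, the method names, the simulated crash); it does not reproduce Sidekiq's scheduler or the Lua script Sidekiq Pro uses, and only models the difference between two separate steps and a single indivisible one.

```ruby
class CrashBetweenSteps < StandardError; end

# In-memory stand-in: `scheduled` maps job => run-at time, `queue` holds ready jobs.
class FakeStore
  attr_reader :scheduled, :queue

  def initialize
    @scheduled = {}
    @queue = []
  end
end

# Two separate steps, like two network round trips.
# A crash between them loses the job.
def enqueue_due_two_step(store, now, crash_after_pop: false)
  due = store.scheduled.select { |_job, run_at| run_at <= now }.keys
  due.each do |job|
    store.scheduled.delete(job)                 # round trip 1: pop
    raise CrashBetweenSteps if crash_after_pop  # simulated process death
    store.queue.push(job)                       # round trip 2: enqueue
  end
end

# Models a server-side script: pop and enqueue happen as one unit,
# so there is no point at which a client crash can separate them.
def enqueue_due_atomic(store, now)
  due = store.scheduled.select { |_job, run_at| run_at <= now }.keys
  due.each do |job|
    store.scheduled.delete(job)
    store.queue.push(job)
  end
end

risky = FakeStore.new
risky.scheduled["job-1"] = 100
begin
  enqueue_due_two_step(risky, 200, crash_after_pop: true)
rescue CrashBetweenSteps
  # The process "died" after the pop and before the enqueue.
end
job_lost = !risky.scheduled.key?("job-1") && !risky.queue.include?("job-1")

safe = FakeStore.new
safe.scheduled["job-1"] = 100
enqueue_due_atomic(safe, 200)
job_enqueued = safe.queue.include?("job-1")
```

In the two-step version the job ends up in neither place after the simulated crash, which is the failure mode an atomic move avoids.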
Older versions of Sidekiq Pro offered reliable_fetch and timed_fetch. These algorithms are now deprecated and no longer documented.
