Error Handling

I hate to say it but some of your workers will raise exceptions when processing jobs. It's true.

Sidekiq has a number of features for handling errors of all types. Note that this page covers Sidekiq 3.0.

Best Practices

  1. Use an error service - Honeybadger, Airbrake, Rollbar, BugSnag, Sentry, Exceptiontrap, Raygun, etc. They're all similar in feature set and pricing; pick one and use it. The error service will email you every time there is an exception in a job. Note that Sidekiq 3.0 removed built-in support for Airbrake, Honeybadger, Exceptional and ExceptionNotifier. Ensure your error service supports Sidekiq or see below for adding it as a custom error handler.
  2. Let Sidekiq catch errors raised by your jobs. Sidekiq's built-in retry mechanism will catch those exceptions and retry the jobs regularly. The error service will notify you of the exception. You fix the bug, deploy the fix and Sidekiq will retry your job successfully.
  3. If you don't fix the bug within 25 retries (about 21 days), Sidekiq will stop retrying and move your job to the Dead Job Queue. You can fix the bug and retry the job manually anytime within the next 6 months using the Web UI.
  4. After 6 months, Sidekiq will discard the job.

Error Handlers

Gems can attach to Sidekiq's global error handlers so they are informed any time there is an error inside Sidekiq. Error services should all provide this integration automatically once you include their gem in your application's Gemfile.

You can create your own error handler by providing something which responds to call(exception, context_hash):

Sidekiq.configure_server do |config|
  config.error_handlers << Proc.new {|ex,ctx_hash| MyErrorService.notify(ex, ctx_hash) }
end

Note that error handlers are only relevant to the Sidekiq server process. They aren't active in Rails console, for instance.
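
If a bare Proc gets unwieldy, anything that responds to call(exception, context_hash) works, such as a Method object. A minimal sketch, where MyErrorService is the same hypothetical service as in the example above:

class MyErrorService
  # Hypothetical notifier; replace the body with your error
  # service's actual client call.
  def self.notify(ex, ctx_hash)
    # e.g. ship ex.message, ex.backtrace and ctx_hash to your service
  end
end

Sidekiq.configure_server do |config|
  # A Method object responds to call(exception, context_hash),
  # so it can be registered directly as an error handler.
  config.error_handlers << MyErrorService.method(:notify)
end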

Backtrace Logging

Enabling backtrace logging for a job causes the backtrace to be persisted with the job for its entire lifetime. If a large number of jobs fail repeatedly and are requeued, this can make your Redis memory usage grow even though no new jobs are being added.

Use caution when enabling backtraces: limit the backtrace to a few lines, or use an error service to keep track of failures.
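
As a sketch, assuming you only want a few lines persisted: the backtrace option takes true for the full backtrace or an integer to cap the number of lines (RiskyWorker is a hypothetical example):

class RiskyWorker
  include Sidekiq::Worker
  # true would persist the full backtrace with the retry payload;
  # an integer limits it to that many lines, keeping Redis usage in check.
  sidekiq_options :backtrace => 5

  def perform(...)
  end
end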

Automatic job retry

Sidekiq will retry failures with an exponential backoff using the formula (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1)) (i.e. 15, 16, 31, 96, 271, ... seconds plus a random amount of time). It will perform 25 retries over approximately 21 days. Assuming you deploy a bug fix within that time, the job will be retried and successfully processed. After 25 failed retries, Sidekiq will move the job to the Dead Job Queue, assuming it needs manual intervention to work.
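
To see what that schedule looks like concretely, you can evaluate the formula yourself; this snippet just prints the approximate delay before each of the 25 retries:

25.times do |retry_count|
  # Polynomial backoff plus jitter, exactly as defined above.
  delay = (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1))
  puts "retry ##{retry_count + 1}: ~#{delay} seconds after the previous attempt"
end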

Web UI

The Sidekiq Web UI has "Retries" and "Dead" tabs which list failed jobs and allow you to run, inspect or delete them.

Dead Job Queue

The DJQ is a holding pen for jobs which have failed all their retries. Sidekiq will not retry those jobs; you must retry them manually via the UI. The Dead Job Queue is limited by default to 10,000 jobs or 6 months so it doesn't grow without bound. Only jobs configured with 0 or more retries will go to the Dead Job Queue. Use :retry => false if you want a particular type of job to be executed only once, no matter what happens.
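
Retrying dead jobs through the Web UI is the normal route, but the same can be scripted with Sidekiq's API. A minimal sketch, assuming Sidekiq::DeadSet and the entry's retry method are available in your version (the class-name filter is a hypothetical example):

require 'sidekiq/api'

Sidekiq::DeadSet.new.each do |job|
  # Re-enqueue dead jobs belonging to one hypothetical worker class.
  job.retry if job.klass == 'SomeWorker'
end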

Configuration

You can specify the number of retries for a particular worker if 25 is too many:

class LessRetryableWorker
  include Sidekiq::Worker
  sidekiq_options :retry => 5 # Only five retries and then to the Dead Job Queue

  def perform(...)
  end
end

You can disable retry support for a particular worker. Note that with retry disabled, Sidekiq will not track or save any error data for the worker's jobs.

class NonRetryableWorker
  include Sidekiq::Worker
  sidekiq_options :retry => false # job will be discarded immediately when it fails

  def perform(...)
  end
end

You can prevent a failed job from going to the DJQ:

class NonDyingWorker
  include Sidekiq::Worker
  sidekiq_options :retry => 5, :dead => false

  def perform(...)
  end
end

The retry delay can be customized using sidekiq_retry_in, if needed.

class WorkerWithCustomRetry
  include Sidekiq::Worker
  sidekiq_options :retry => 5

  # The current retry count is yielded. The return value of the block must be 
  # an integer. It is used as the delay, in seconds. 
  sidekiq_retry_in do |count|
    10 * (count + 1) # (i.e. 10, 20, 30, 40)
  end

  def perform(...)
  end
end

Once retries are exhausted, Sidekiq will call the sidekiq_retries_exhausted hook on your Worker if you've defined one. The hook receives the queued message as an argument and is called right before Sidekiq moves the job to the DJQ.

class FailingWorker
  include Sidekiq::Worker

  sidekiq_retries_exhausted do |msg|
    Sidekiq.logger.warn "Failed #{msg['class']} with #{msg['args']}: #{msg['error_message']}"
  end

  def perform(*args)
    raise "or I don't work"
  end
end

Process Crashes

If the Sidekiq process segfaults or crashes the Ruby VM, any jobs that were being processed are lost. Sidekiq Pro offers a reliable queueing feature which does not lose those jobs.

No More Bike Shedding

Sidekiq's retry mechanism is a set of best practices but many people have suggested various knobs and options to tweak in order to handle their own edge case. This way lies madness. Design your code to work well with Sidekiq's retry mechanism as it exists today or fork the RetryJobs middleware and add your own logic. I'm no longer accepting any functional changes to the retry mechanism unless you make an extremely compelling case for why Sidekiq's thousands of users would want that change.

Previous: Using Redis Next: Advanced Options
