Monitoring

Eliot Sykes edited this page Jan 4, 2017 · 105 revisions

I recommend using a tool to monitor your Sidekiq processes in production to ensure they are always up and aren't using too much memory or CPU. I built Inspeqtor because I didn't like the existing tools that were available (e.g. monit, God and bluepill). My recommendations:

  1. Use Upstart or Systemd to start/stop Sidekiq. This ensures that if the Ruby VM crashes, the process respawns immediately.
  2. Use Inspeqtor to monitor the CPU and memory usage and restart Sidekiq if necessary.

Web UI

Sidekiq comes with a web application that can display the current state of a Sidekiq installation.

Rails

Add the following to your config/routes.rb:

require 'sidekiq/web'
mount Sidekiq::Web => '/sidekiq'

Forbidden

If you receive a Forbidden error when trying to submit a form, you do not have a valid session configured. A valid session is required to prevent CSRF attacks. You must configure the webapp to share the same session with Rails. Try putting this in your routes.rb after the require:

# Rails < 4:
Sidekiq::Web.set :session_secret, Rails.configuration.secret_token
# Rails >= 4:
Sidekiq::Web.set :session_secret, Rails.application.secrets[:secret_key_base]

Authentication

In a production application you'll likely want to protect access to this information. You can use the constraints feature of routing (in the config/routes.rb file) to accomplish this:

Devise

Allow any authenticated User

# config/routes.rb
authenticate :user do
  mount Sidekiq::Web => '/sidekiq'
end

Same as above but also ensures that User#admin? returns true

# config/routes.rb
authenticate :user, lambda { |u| u.admin? } do
  mount Sidekiq::Web => '/sidekiq'
end

Clearance

Clearance provides routing constraints to restrict access to routes.

Blog::Application.routes.draw do
  # Allows access only to signed-in users
  constraints Clearance::Constraints::SignedIn.new do
    mount Sidekiq::Web, at: '/sidekiq'
  end

  # Allows access only to signed-in admins
  constraints Clearance::Constraints::SignedIn.new { |user| user.admin? } do
    mount Sidekiq::Web, at: '/sidekiq'
  end
end

Authlogic

# lib/admin_constraint.rb
class AdminConstraint
  def matches?(request)
    return false unless request.cookie_jar['user_credentials'].present?
    user = User.find_by_persistence_token(request.cookie_jar['user_credentials'].split(':')[0])
    user && user.admin?
  end
end

# config/routes.rb
require "admin_constraint"
mount Sidekiq::Web => '/sidekiq', :constraints => AdminConstraint.new

Restful Authentication or Sorcery

Checks a User model instance that responds to admin?

# lib/admin_constraint.rb
class AdminConstraint
  def matches?(request)
    return false unless request.session[:user_id]
    # find_by_id returns nil for a missing user instead of raising RecordNotFound
    user = User.find_by_id(request.session[:user_id])
    user && user.admin?
  end
end

# config/routes.rb
require 'sidekiq/web'
require 'admin_constraint'
mount Sidekiq::Web => '/sidekiq', :constraints => AdminConstraint.new

Custom External Authentication

class AuthConstraint
  def self.admin?(request)
    return false unless (cookie = request.cookie_jar['auth'])

    Rails.cache.fetch(cookie['user'], :expires_in => 1.minute) do
      auth_data = JSON.parse(Base64.decode64(cookie['data']))
      response = HTTParty.post(Auth.validate_url, :query => auth_data)

      response.code == 200 && JSON.parse(response.body)['roles'].to_a.include?('Admin')
    end
  end
end

# config/routes.rb
constraints lambda {|request| AuthConstraint.admin?(request) } do
  mount Sidekiq::Web => '/admin/sidekiq'
end

Rails with Google authentication

@jonhyman breaks down how Appboy uses Google to protect access to Sidekiq.

Rails HTTP Basic Auth from Routes

# config/routes.rb
require "sidekiq/web"
Sidekiq::Web.use Rack::Auth::Basic do |username, password|
  # Protect against timing attacks:
  # - See https://codahale.com/a-lesson-in-timing-attacks/
  # - See https://thisdata.com/blog/timing-attacks-against-string-comparison/
  # - Use & (do not use &&) so that it doesn't short circuit.
  # - Use digests to stop length information leaking
  ActiveSupport::SecurityUtils.secure_compare(::Digest::SHA256.hexdigest(username), ::Digest::SHA256.hexdigest(ENV["SIDEKIQ_USERNAME"])) &
    ActiveSupport::SecurityUtils.secure_compare(::Digest::SHA256.hexdigest(password), ::Digest::SHA256.hexdigest(ENV["SIDEKIQ_PASSWORD"]))
end if Rails.env.production?
mount Sidekiq::Web, at: "/sidekiq"

If you get an ActionDispatch::Request::Session error, you've hit an incompatibility between Rails and Rack. See this comment for a workaround.

Standalone

Here's an example config.ru for booting Sidekiq::Web in your choice of Rack server:

require 'sidekiq'

Sidekiq.configure_client do |config|
  config.redis = { :size => 1 }
end

require 'sidekiq/web'
run Sidekiq::Web

You can also mount Sidekiq::Web in an existing Rack application:

require 'your_app'

require 'sidekiq/web'
run Rack::URLMap.new('/' => Sinatra::Application, '/sidekiq' => Sidekiq::Web)

Rack session and protection against web attacks

Note that Sidekiq::Web requires a valid Rack session to work. If you see a Forbidden error when clicking a button in the Web UI, it's because the Rack session is not configured correctly. Sidekiq cannot configure a session for you. If you do not know how to set up a valid session in your system, your best option is to search StackOverflow or post a question there with the code you are using to run the Web UI.
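For a standalone config.ru, one common approach is to enable a cookie-based Rack session with a secret that survives restarts. The sketch below covers only the secret handling, which runs on the Ruby standard library alone. The helper name load_or_create_secret and the file name .session.key are assumptions made for illustration, and the Rack::Session::Cookie line appears as a comment because it belongs in config.ru and needs the rack gem.

```ruby
require 'securerandom'

# Hypothetical helper (not part of Sidekiq or Rack): returns a session secret
# that stays the same across restarts, so existing sessions remain valid.
def load_or_create_secret(path)
  unless File.exist?(path)
    # 64 random bytes, hex-encoded to 128 characters; readable by the owner only.
    File.open(path, "w", 0o600) { |f| f.write(SecureRandom.hex(64)) }
  end
  File.read(path)
end

# In config.ru, before `run Sidekiq::Web` (assumes the rack gem is loaded):
#   use Rack::Session::Cookie,
#       secret: load_or_create_secret(".session.key"),
#       same_site: true
```

Keep the secret file out of version control. If the secret changes, existing sessions become invalid and users will see the Forbidden error again until they reload the page.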

Sidekiq::Web uses Rack::Protection to protect your application against typical web attacks (such as CSRF and XSS). Rack::Protection invalidates your session and raises a Forbidden error if a request does not satisfy its security requirements. One common cause is running your application behind a reverse proxy that does not pass the X-Forwarded-For and X-Forwarded-Proto headers on to it. This situation and its solution are described in this article and in issue #2560.

If you have wildcard domains with your Rails app and want to access the Web UI from all of them, see issue #2730.


If you do everything right, you should see this in your browser:

[Screenshot: Sidekiq Web UI]

Standalone with Basic Auth

# this code goes in your config.ru
require 'digest' # Digest::SHA256 is used in the auth block below
require 'sidekiq'

Sidekiq.configure_client do |config|
  config.redis = { :size => 1 }
end

require 'sidekiq/web'
map '/sidekiq' do
  use Rack::Auth::Basic, "Protected Area" do |username, password|
    # Protect against timing attacks:
    # - See https://codahale.com/a-lesson-in-timing-attacks/
    # - See https://thisdata.com/blog/timing-attacks-against-string-comparison/
    # - Use & (do not use &&) so that it doesn't short circuit.
    # - Use digests to stop length information leaking
    Rack::Utils.secure_compare(::Digest::SHA256.hexdigest(username), ::Digest::SHA256.hexdigest(ENV["SIDEKIQ_USERNAME"])) &
      Rack::Utils.secure_compare(::Digest::SHA256.hexdigest(password), ::Digest::SHA256.hexdigest(ENV["SIDEKIQ_PASSWORD"]))
  end

  run Sidekiq::Web
end
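The credential check used in the Basic Auth examples can be pulled out and tested on its own. The following is a minimal sketch using only the standard library: the names secure_compare, sha256 and credentials_match? are hypothetical, and the hand-written comparison stands in for Rack::Utils.secure_compare or ActiveSupport::SecurityUtils.secure_compare. It also rejects blank expected credentials, which is an added assumption so that unset environment variables never match an empty login.

```ruby
require 'digest'

# Compares two strings without stopping at the first differing byte.
# Both inputs are SHA256 hex digests here, so their lengths are always equal.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

def sha256(value)
  Digest::SHA256.hexdigest(value.to_s)
end

# Hypothetical helper: true only when both username and password match.
def credentials_match?(username, password, expected_username, expected_password)
  # Assumption: unset or empty expected credentials must never authenticate.
  return false if expected_username.to_s.empty? || expected_password.to_s.empty?

  # Single & so both comparisons always run (no short circuit).
  secure_compare(sha256(username), sha256(expected_username)) &
    secure_compare(sha256(password), sha256(expected_password))
end
```

Inside the Rack::Auth::Basic block, this would be called with the submitted values and ENV["SIDEKIQ_USERNAME"] and ENV["SIDEKIQ_PASSWORD"] as the expected ones.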

Nagios

Below is a collection of Nagios checks that includes a check_sidekiq_queue script, which validates that a given queue's depth is within a particular range. It is a simple shell script that uses the redis-cli command-line tool and has no dependency on Ruby.

https://github.com/wanelo/nagios-checks
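If you prefer to write a check in Ruby, the range logic can be sketched as follows. This is not the script from the repository above; it is an illustration that follows the standard Nagios plugin exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL). The helper name queue_depth_status and the threshold parameters are assumptions.

```ruby
# Standard Nagios plugin exit codes.
NAGIOS_OK = 0
NAGIOS_WARNING = 1
NAGIOS_CRITICAL = 2

# Hypothetical helper: maps a queue depth to [exit_code, status_line].
def queue_depth_status(queue, depth, warn:, crit:)
  if depth >= crit
    [NAGIOS_CRITICAL, "CRITICAL - #{queue} depth #{depth} >= #{crit}"]
  elsif depth >= warn
    [NAGIOS_WARNING, "WARNING - #{queue} depth #{depth} >= #{warn}"]
  else
    [NAGIOS_OK, "OK - #{queue} depth #{depth}"]
  end
end
```

A real check would read the depth from Redis (for example with Sidekiq::Queue.new(queue).size), print the status line, and exit with the returned code.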

Scout

Scout, a Rails app monitoring service, provides:

  1. Key metrics for each Sidekiq worker (mean and 95th percentile execution time, latency, error rate, etc).
  2. GitHub-enhanced transaction traces of both timing and memory allocations for individual jobs.

[Screenshot: Scout Sidekiq Monitoring]

Monitoring Queue Backlog

You can use a simple HTTP endpoint with Pingdom to check the size of your Sidekiq 'default' queue backlog. Put this in config/routes.rb:

require 'sidekiq/api'
match "queue-status" => proc { [200, {"Content-Type" => "text/plain"}, [Sidekiq::Queue.new.size < 100 ? "OK" : "UHOH" ]] }, via: :get

Now when you hit http://example.com/queue-status, the body of the response will be either 'OK' or 'UHOH'. We run a Pingdom check every minute that sends an email if the response body is 'UHOH'.
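To watch more than the 'default' queue, the same idea can be wrapped in a small Rack endpoint. This is a sketch under stated assumptions: the class name QueueBacklogCheck and its size_of: parameter are hypothetical, and the size lookup is injected so the class can be exercised without Redis.

```ruby
# Hypothetical Rack endpoint: responds "UHOH" when any queue reaches its limit.
class QueueBacklogCheck
  # limits:  hash of queue name => maximum acceptable size
  # size_of: callable that takes a queue name and returns its current size
  def initialize(limits, size_of:)
    @limits = limits
    @size_of = size_of
  end

  def call(_env)
    backed_up = @limits.any? { |name, limit| @size_of.call(name) >= limit }
    [200, { "Content-Type" => "text/plain" }, [backed_up ? "UHOH" : "OK"]]
  end
end
```

In config/routes.rb, an instance can take the place of the proc, with size_of: ->(name) { Sidekiq::Queue.new(name).size } as the lookup. The response body stays either 'OK' or 'UHOH', so an existing uptime check keeps working.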

Monitoring Queue Latency

Using a custom end-point

If you throw a lot of jobs into the queue, you can get false positives when monitoring the queue backlog. Instead, monitor the queue latency. Queue latency is the time elapsed since the oldest job still in the queue was enqueued. This code checks that jobs don't spend more than 30 seconds enqueued. Put this in config/routes.rb:

require 'sidekiq/api'
match "queue-latency" => proc { [200, {"Content-Type" => "text/plain"}, [Sidekiq::Queue.new.latency < 30 ? "OK" : "UHOH" ]] }, via: :get

Now when you hit http://example.com/queue-latency, the body of the response will be either 'OK' or 'UHOH'.
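To make the definition of latency concrete, here is a minimal sketch of the calculation on a single job payload. It illustrates the idea rather than Sidekiq's own implementation. The helper name latency_of is hypothetical, and the sketch assumes the payload is JSON with an enqueued_at field holding seconds since the epoch.

```ruby
require 'json'

# Hypothetical helper: seconds the oldest job has been waiting.
# oldest_job_json is the raw payload of the oldest job, or nil if the queue is empty.
def latency_of(oldest_job_json, now: Time.now.to_f)
  return 0.0 if oldest_job_json.nil?

  enqueued_at = JSON.parse(oldest_job_json)["enqueued_at"]
  return 0.0 unless enqueued_at

  now - enqueued_at.to_f
end
```

Because latency depends only on the oldest job, a queue holding thousands of recently enqueued jobs can still report a low latency, which is why it gives fewer false positives than the backlog size.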

Using the built-in dashboard

Sidekiq provides a JSON-formatted dashboard at /dashboard/stats. The response looks like this:

{
  "sidekiq": {
    "processed": 12345,
    "failed": 56,
    "busy": 25,
    "enqueued": 178,
    "scheduled": 0,
    "retries": 0,
    "default_latency": 12
  },
  "redis": {
    "connected_clients": "120",
    "uptime_in_days": "35",
    "used_memory_human": "602.31M",
    "used_memory_peak_human": "1.01G"
  }
}
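A monitoring script can parse this JSON and flag values that exceed a threshold. The sketch below assumes the response shape shown above. The helper name stats_alerts and the default thresholds are hypothetical.

```ruby
require 'json'

# Hypothetical helper: returns the names of the stats that exceed their limits.
def stats_alerts(stats_json, max_latency: 30, max_enqueued: 1_000)
  sidekiq = JSON.parse(stats_json).fetch("sidekiq", {})
  alerts = []
  alerts << "default_latency" if sidekiq.fetch("default_latency", 0) > max_latency
  alerts << "enqueued" if sidekiq.fetch("enqueued", 0) > max_enqueued
  alerts
end
```

With the sample response above and the default thresholds, the result is an empty array. The endpoint is served by the Web UI, so a script that fetches it has to pass whatever authentication protects the Web UI.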
