Ent Multi Process

Mike Perham edited this page Nov 2, 2016 · 26 revisions

Sidekiq Enterprise 1.2.0+ has the ability to start and manage multiple Sidekiq worker processes, a la Unicorn or Puma. With multi-process mode, you get several advantages:

  1. modest memory savings by sharing memory between processes
  2. running multiple processes on a single Heroku dyno, allowing you to minimize your dyno costs
  3. simple creation of a single Upstart or Systemd service that scales to all machine cores
  4. automated memory monitoring and restart for bloated child processes

The term for running multi-process is swarm. A swarm has a master process and N worker processes.

Starting a Swarm

Sidekiq Enterprise provides a sidekiqswarm binary. This binary is designed to run under Upstart, Systemd or Foreman as a service. It does not allow old-style options like --daemonize, --logfile or --pidfile.

sidekiqswarm [options]

Start and supervise a swarm of Sidekiq processes.
All arguments are passed to each Sidekiq instance.
Do not provide `-i` as sidekiqswarm will automatically generate it.

You may not use the `-d`, `-L` or `-P` options.

Use the COUNT and INDEX environment variables to control sidekiqswarm.

COUNT   Number of Sidekiq processes to start, defaults to number of cores
INDEX   Starting index for Sidekiq Pro's reliable fetch, defaults to 0

Example:
COUNT=5 bundle exec sidekiqswarm -r ./myworker.rb
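
The COUNT and INDEX defaults described above can be sketched in Ruby roughly as follows. This is an illustrative sketch, not sidekiqswarm's actual source: the helper names `swarm_count` and `swarm_indexes` are hypothetical, and it assumes each child receives a consecutive index starting at INDEX.

```ruby
require "etc"

# Hypothetical helper: number of children to start.
# Falls back to the machine's core count when COUNT is unset or empty.
def swarm_count(env = ENV)
  raw = env["COUNT"]
  if raw.nil? || raw.empty?
    Etc.nprocessors
  else
    Integer(raw)
  end
end

# Hypothetical helper: the index each child would receive.
# Assumption: indexes are consecutive, beginning at INDEX (default 0).
def swarm_indexes(env = ENV)
  start = Integer(env.fetch("INDEX", "0"))
  Array.new(swarm_count(env)) { |i| start + i }
end
```

Under these assumptions, `swarm_indexes("COUNT" => "5")` yields indexes 0 through 4, which is why passing `-i` yourself would conflict with the generated values.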

Running via init

systemd

Sidekiq provides a sample systemd unit file. Starting sidekiqswarm instead is almost identical: update the ExecStart line and configure the environment as necessary:

# if you want to override the default number of processes
Environment=COUNT=2
ExecStart=/usr/local/bin/bundle exec sidekiqswarm -e production

upstart

Sidekiq provides a sample upstart conf file. Starting sidekiqswarm instead is almost identical: update the exec line within the script block and configure the environment as necessary:

# if you want to override the default number of processes
env COUNT=2

exec bundle exec sidekiqswarm -e production

Signals and Controlling a Swarm

Use the standard upstart and systemd tools to manage the service for your swarm, e.g. systemctl restart sidekiq.

You can send the TERM and USR1 signals to the master process and it will pass those signals to the underlying children. Once the master process has received USR1 or TERM, it will not spawn any more children; to get new children, the master itself must be restarted. The master process does not handle the TTIN signal.
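
The forwarding behavior can be modeled with a small Ruby sketch. It is not the sidekiqswarm implementation: the `SwarmMaster` class and its `killer:` parameter are hypothetical names introduced here, and the only real API it relies on is `Process.kill(signal, pid)`.

```ruby
# Hypothetical model of a swarm master's signal handling.
class SwarmMaster
  # Only these signals are passed on to children; TTIN is not handled.
  FORWARDED = %w[TERM USR1].freeze

  def initialize(child_pids, killer: Process.method(:kill))
    @child_pids = child_pids
    @killer = killer      # injectable so the sketch can run without real processes
    @stopping = false
  end

  # Forward a handled signal to every child and stop respawning.
  # Returns false for signals the master does not handle.
  def handle(signal)
    return false unless FORWARDED.include?(signal)

    @stopping = true
    @child_pids.each { |pid| @killer.call(signal, pid) }
    true
  end

  # Once USR1 or TERM has been seen, no further children are spawned.
  def respawn?
    !@stopping
  end
end
```

In a real master, `handle` would be called from a `Signal.trap` block (or a loop draining a signal queue); the sketch keeps that wiring out so the forwarding rule itself stays visible.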

Bundler Preload

Sidekiq forks the worker processes after running Bundler.require(:default) but before booting the application so the workers can share the memory consumed by loading the gems. Your Gemfile should eager load gems where possible; using gem 'something', require: false in your Gemfile will limit any memory savings.

If you find that sidekiqswarm's default Bundler require is breaking your app on boot, you can control which groups get preloaded or disable preload completely:

# preload both the default and production groups
SIDEKIQ_PRELOAD=default,production bin/sidekiqswarm ...
# disable gem preload completely
SIDEKIQ_PRELOAD= bin/sidekiqswarm ...
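
One way to picture how SIDEKIQ_PRELOAD maps to Bundler groups is the Ruby sketch below. The `preload_groups` helper is hypothetical and the parsing rules (comma-separated names, unset meaning the default group, empty meaning no preload) are assumptions drawn from the examples above, not from sidekiqswarm's source.

```ruby
# Hypothetical helper: turn SIDEKIQ_PRELOAD into a list of Bundler groups.
#   unset            -> [:default]
#   "default,production" -> [:default, :production]
#   "" (set, empty)  -> [] (preload disabled)
def preload_groups(env = ENV)
  return [:default] unless env.key?("SIDEKIQ_PRELOAD")

  env["SIDEKIQ_PRELOAD"]
    .split(",")
    .map(&:strip)
    .reject(&:empty?)
    .map(&:to_sym)
end
```

A master process following this model would call `Bundler.require(*groups)` before forking only when the list is non-empty, so that gems loaded there are shared by all children.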

Memory Monitoring

The master process can watch all children and restart any that get above a certain memory usage. Set the MAXMEM_KB environment variable with the maximum memory in kilobytes. If a child goes over that limit, the master will detect it and do the following:

  1. Send USR1
  2. Wait 60 seconds
  3. Send TERM
  4. Fork a new child

Example:
$ MAXMEM_KB=30000 COUNT=1 bundle exec sidekiqswarm -r ./test.rb
2016-03-02T21:12:08.802Z 18308 TID-8nnh4 INFO: Running in ruby 2.0.0p598 (2014-11-13) [x86_64-linux]
2016-03-02T21:12:08.846Z 18308 TID-8nnh4 INFO: Starting processing, hit Ctrl-C to stop
Process 18308 too large at 31184KB, stopping it...
2016-03-02T21:12:43.893Z 18308 TID-8nnh4 INFO: Received USR1, no longer accepting new work
2016-03-02T21:12:43.893Z 18308 TID-8nnh4 INFO: Terminating quiet workers
2016-03-02T21:12:43.898Z 18308 TID-9xw6k INFO: Scheduler exiting...
2016-03-02T21:13:43.909Z 18308 TID-8nnh4 INFO: Shutting down
2016-03-02T21:13:44.014Z 18308 TID-8nnh4 INFO: Bye!
Child exited, PID 18308, code 0, restarting...
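
The memory check and restart sequence can be summarized in a short Ruby sketch. Both helpers (`bloated_pids`, `restart_steps`) are hypothetical names, and the sketch assumes the master already has each child's memory usage in kilobytes; how sidekiqswarm actually measures memory is not covered here.

```ruby
# Hypothetical helper: PIDs whose memory usage exceeds MAXMEM_KB.
# rss_by_pid maps child PID => memory in kilobytes.
# Returns an empty list when MAXMEM_KB is unset, i.e. monitoring is off.
def bloated_pids(rss_by_pid, env = ENV)
  limit = env["MAXMEM_KB"]
  return [] if limit.nil? || limit.empty?

  max_kb = Integer(limit)
  rss_by_pid.select { |_pid, kb| kb > max_kb }.keys
end

# Hypothetical helper: the documented restart sequence for one child,
# expressed as data so it can be inspected without touching real processes.
def restart_steps(pid, grace: 60)
  [
    [:signal, "USR1", pid], # stop accepting new work
    [:wait, grace],         # give running jobs time to finish
    [:signal, "TERM", pid], # shut the child down
    [:fork]                 # start a replacement child
  ]
end
```

With the values from the log above, `bloated_pids({ 18308 => 31184 }, "MAXMEM_KB" => "30000")` selects PID 18308, and `restart_steps(18308)` lists the four steps in the order the master performs them.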