Skip to content
This repository has been archived by the owner on Mar 14, 2019. It is now read-only.

Why resque brain can't kill a worker

David Copeland edited this page May 24, 2014 · 3 revisions

Because Resque myopically hard-codes a single Ruby VM-wide instance of Redis for its operations, and because killing a worker will run failure hooks, which could depend on that instance, resque-brain cannot safely kill a worker.

How it works in Resque

Here's the code for when you unregister a worker:

def unregister_worker(exception = nil)
  # If we're still processing a job, make sure it gets logged as a
  # failure.
  if (hash = processing) && !hash.empty?
    job = Job.new(hash['queue'], hash['payload'])
    job.data_store = data_store
    # Ensure the proper worker is attached to this job, even if
    # it's not the precise instance that died.
    job.worker = self
    job.fail(exception || DirtyExit.new)
  end

  data_store.unregister_worker(self) do
    Stat.clear("processed:#{self}")
    Stat.clear("failed:#{self}")
  end
end

This part is OK for resque-brain, because Resque::DataStore is set up to allow this to work. Even setting job.data_store = data_store seems like it will work.

Here's the code in Job for fail:

def fail(exception)
  run_failure_hooks(exception)
  Failure.create \
    :payload   => payload,
    :exception => exception,
    :worker    => worker,
    :queue     => queue
end

This is already a problem, because Failure uses the hard-coded VM-wide redis instance, and not the one that resque-brain operates. But, it gets worse when we look at run_failure_hooks:

def run_failure_hooks(exception)
  begin
    job_args = args || []
    if has_payload_class?
      failure_hooks.each { |hook| payload_class.send(hook, exception, *job_args) } unless @failure_hooks_ran
    end
  ensure
    @failure_hooks_ran = true
  end
end

Here, we fire hooks on the job class, and those hooks could be anything. Since Idiomatic Ruby™ is to use global symbols like Resque.redis with reckless abandon, there's no way to be sure what will happen in those hooks.

Why not temporarily set Resque.redis?

This is probably the only viable option. Doing so, of course, is not thread-safe even a little bit, so we'd have to surround it with mutexes and all that. Maybe we can do that, but for now, we'll just count it as a win that we can even see stale workers.