Permalink
Browse files

Fix a race condition encountered with pruning dead workers

On a single machine, starting multiple workers at the same time would cause the same job to have
it's failure hook called multiple times. In this patch I'm only calling job.Fail when we've
successfully removed the worker from the :workers set.

Effectively this synchronizes the job fail
on the first worker to successfully remove itself from the set.
  • Loading branch information...
1 parent c773e57 commit 7996c63c4f82b5010b5705a87e7a74877fe21add @raykrueger raykrueger committed Jan 19, 2012
Showing with 17 additions and 10 deletions.
  1. +14 −9 lib/resque/worker.rb
  2. +3 −1 test/worker_test.rb
View
@@ -369,17 +369,22 @@ def run_hook(name, *args)
# Unregisters ourself as a worker. Useful when shutting down.
def unregister_worker
- # If we're still processing a job, make sure it gets logged as a
- # failure.
- if (hash = processing) && !hash.empty?
- job = Job.new(hash['queue'], hash['payload'])
- # Ensure the proper worker is attached to this job, even if
- # it's not the precise instance that died.
- job.worker = self
- job.fail(DirtyExit.new)
+
+ # Multiple workers on a single machine can fail a job, causing
+ # duplicate retries when using retry plugin. Here we're effectively
+ # synchronizing on the srem call to prevent that.
+ if redis.srem(:workers, self)
+ # If we're still processing a job, make sure it gets logged as a
+ # failure.
+ if (hash = processing) && !hash.empty?
+ job = Job.new(hash['queue'], hash['payload'])
+ # Ensure the proper worker is attached to this job, even if
+ # it's not the precise instance that died.
+ job.worker = self
+ job.fail(DirtyExit.new)
+ end
end
- redis.srem(:workers, self)
redis.del("worker:#{self}")
redis.del("worker:#{self}:started")
View
@@ -9,6 +9,7 @@
Resque.after_fork = nil
@worker = Resque::Worker.new(:jobs)
+ @worker.register_worker
Resque::Job.create(:jobs, SomeJob, 20, '/tmp')
end
@@ -307,7 +308,8 @@ def self.exception
workerB.instance_variable_set(:@to_s, "#{`hostname`.chomp}:2:high,low")
workerB.register_worker
- assert_equal 2, Resque.workers.size
+ # should be 3 counting @worker
+ assert_equal 3, Resque.workers.size
# then we prune them
@worker.work(0) do

0 comments on commit 7996c63

Please sign in to comment.