Worker down when Redis cluster switches to another master. #1531

Open
evgeny-s opened this issue Jan 18, 2017 · 0 comments

evgeny-s commented Jan 18, 2017

Hello, we have the following issue (on 1.26.0).
We have a Redis cluster configured with Sentinels.
The problem: when the Redis master goes down, there are approximately 3-5 seconds of Redis downtime, and if the worker tries to talk to Redis during that window it dies and has to be started again.
We were forced to create a monkey patch like this one:

Resque::Worker.class_eval do
  def work(interval = 5.0, &block)
    interval = Float(interval)
    $0 = "resque: Starting"
    startup

    loop do
      break if shutdown?
      begin
        if not paused? and job = reserve
          log_with_severity :info, "got: #{job.inspect}"
          job.worker = self
          working_on job

          procline "Processing #{job.queue} since #{Time.now.to_i} [#{job.payload_class_name}]"
          if @child = fork(job)
            srand # Reseeding
            procline "Forked #{@child} at #{Time.now.to_i}"
            begin
              Process.waitpid(@child)
            rescue SystemCallError
              nil
            end
            job.fail(DirtyExit.new("Child process received unhandled signal #{$?.stopsig}")) if $?.signaled?
          else
            unregister_signal_handlers if will_fork? && term_child
            begin
              reconnect if will_fork?
              perform(job, &block)
            rescue Exception => exception
              report_failed_job(job,exception)
            end

            if will_fork?
              run_at_exit_hooks ? exit : exit!
            end
          end

          done_working
          @child = nil
        else
          break if interval.zero?
          log_with_severity :debug, "Sleeping for #{interval} seconds"
          procline paused? ? "Paused" : "Waiting for #{queues.join(',')}"
          sleep interval
        end
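      # The key change relative to stock Resque: swallow the connection error
      # raised while Redis fails over to a new master and poll again, instead
      # of letting the exception propagate and take the worker down.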
      rescue Redis::CannotConnectError
        sleep interval
      end
    end

    unregister_worker
  rescue Exception => exception
    unless exception.class == SystemExit && !@child && run_at_exit_hooks
      log_with_severity :error, "Failed to start worker : #{exception.inspect}"

      unregister_worker(exception)
    end
  end
end

We catch Redis::CannotConnectError, then sleep and try again. This prevents the worker from failing during the downtime while Redis switches masters.

It would be great to add an option to enable this kind of handling.
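For reference, here is a sketch of a less invasive shape such an option could take. It is purely hypothetical: Resque has no such flag today, the RESQUE_RETRY_ON_CONNECT_ERROR variable and the SentinelFailoverRetry module are names invented for illustration, and it assumes Ruby >= 2.1 for Module#prepend. The idea is to shield only reserve, so a failover is treated like an empty poll rather than a fatal error:

require 'resque'

module SentinelFailoverRetry
  # If Redis is briefly unreachable while the worker polls for a job,
  # log it and act as if the queue was empty; the work loop then sleeps
  # for its normal interval and polls again.
  def reserve
    super
  rescue Redis::CannotConnectError => e
    log_with_severity :warn, "Redis unavailable (#{e.message}), retrying on next poll"
    nil
  end
end

# Hypothetical opt-in switch; this flag does not exist in Resque.
if ENV['RESQUE_RETRY_ON_CONNECT_ERROR'] == 'true'
  Resque::Worker.prepend(SentinelFailoverRetry)
end

With prepend the stock work method stays untouched, so nothing has to be re-copied when Resque changes its work loop.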

Thank you.
