Worker down when Redis cluster switches to another master. #1531

Open
evgeny-s opened this issue Jan 18, 2017 · 0 comments

evgeny-s commented Jan 18, 2017

Hello, we have the following issue (on 1.26.0).
We have a Redis cluster configured with Sentinels.
The problem: when the Redis master goes down, there are approximately 3-5 seconds of Redis downtime, and if the worker tries to talk to Redis during that window it dies and has to be started again.
We were forced to create a monkey patch like this one:

Resque::Worker.class_eval do
  def work(interval = 5.0, &block)
    interval = Float(interval)
    $0 = "resque: Starting"
    startup

    loop do
      break if shutdown?
      begin
        if not paused? and job = reserve
          log_with_severity :info, "got: #{job.inspect}"
          job.worker = self
          working_on job

          procline "Processing #{job.queue} since #{Time.now.to_i} [#{job.payload_class_name}]"
          if @child = fork(job)
            srand # Reseeding
            procline "Forked #{@child} at #{Time.now.to_i}"
            begin
              Process.waitpid(@child)
            rescue SystemCallError
              nil
            end
            job.fail(DirtyExit.new("Child process received unhandled signal #{$?.stopsig}")) if $?.signaled?
          else
            unregister_signal_handlers if will_fork? && term_child
            begin
              reconnect if will_fork?
              perform(job, &block)
            rescue Exception => exception
              report_failed_job(job,exception)
            end

            if will_fork?
              run_at_exit_hooks ? exit : exit!
            end
          end

          done_working
          @child = nil
        else
          break if interval.zero?
          log_with_severity :debug, "Sleeping for #{interval} seconds"
          procline paused? ? "Paused" : "Waiting for #{queues.join(',')}"
          sleep interval
        end
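      # The key change relative to stock Resque: swallow the connection error
      # raised while Redis fails over to a new master and poll again, instead
      # of letting the exception propagate and take the worker down.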
      rescue Redis::CannotConnectError
        sleep interval
      end
    end

    unregister_worker
  rescue Exception => exception
    unless exception.class == SystemExit && !@child && run_at_exit_hooks
      log_with_severity :error, "Failed to start worker : #{exception.inspect}"

      unregister_worker(exception)
    end
  end
end

We catch Redis::CannotConnectError, then sleep and try again. This prevents the worker from failing during the downtime while Redis switches masters.

It would be great to add an option to enable this kind of handling.
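For reference, here is a sketch of a less invasive shape such an option could take. It is purely hypothetical: Resque has no such flag today, the RESQUE_RETRY_ON_CONNECT_ERROR variable and the SentinelFailoverRetry module are names invented for illustration, and it assumes Ruby >= 2.1 for Module#prepend. The idea is to shield only reserve, so a failover is treated like an empty poll rather than a fatal error:

require 'resque'

module SentinelFailoverRetry
  # If Redis is briefly unreachable while the worker polls for a job,
  # log it and act as if the queue was empty; the work loop then sleeps
  # for its normal interval and polls again.
  def reserve
    super
  rescue Redis::CannotConnectError => e
    log_with_severity :warn, "Redis unavailable (#{e.message}), retrying on next poll"
    nil
  end
end

# Hypothetical opt-in switch; this flag does not exist in Resque.
if ENV['RESQUE_RETRY_ON_CONNECT_ERROR'] == 'true'
  Resque::Worker.prepend(SentinelFailoverRetry)
end

With prepend the stock work method stays untouched, so nothing has to be re-copied when Resque changes its work loop.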

Thank you.
