Scheduler just terminates randomly for no apparent reason #10

Closed
troessner opened this Issue Dec 1, 2010 · 15 comments

Projects

None yet

7 participants

@troessner

Strange problem: In my scenario Rufus Scheduler terminates at different points during task execution for no apparent reason.

This is my config/initializers/task_scheduler.rb:

require 'rufus/scheduler'

$scheduler = Rufus::Scheduler.start_new

$scheduler.every("5m") do
  custom_log("Keep-Alive Rufus log, time: #{Time.now}")
end

if %w!production staging!.include? Rails.env
  $scheduler.every("45m") do
    custom_log("autocomplete:generate, time: #{Time.now}")
    Rake::Task["autocomplete:generate"].invoke
  end

  $scheduler.every("30m") do
    custom_log("ts:index, time: #{Time.now}")
    Rake::Task["ts:index"].invoke
  end

  $scheduler.every('30m') do
    custom_log("Checking for new contact requests in Contact.unsent.inspect - time: #{Time.now}")
    Contact.unsent.each do |contact|
      Mailer.deliver_mimi_contact_request(contact)
      Mailer.deliver_mimi_contact_requested(contact)
      contact.update_attribute(:notification_sent, true)
    end
  end
end

Now, when I restart passenger I always get mixed results how "far" rufus gets before he terminates:

This is from today:

# Restart passenger
Keep-Alive Rufus log, time: Wed Dec 01 12:06:36 -0800 2010
Keep-Alive Rufus log, time: Wed Dec 01 12:11:36 -0800 2010
Keep-Alive Rufus log, time: Wed Dec 01 12:16:36 -0800 2010
Keep-Alive Rufus log, time: Wed Dec 01 12:21:36 -0800 2010
Keep-Alive Rufus log, time: Wed Dec 01 12:26:36 -0800 2010
Keep-Alive Rufus log, time: Wed Dec 01 12:31:36 -0800 2010
ts:index, time: Wed Dec 01 12:31:36 -0800 2010
Checking for new contact requests in Contact.unsent.inspect - time: Wed Dec 01 12:31:36 -0800 2010
Keep-Alive Rufus log, time: Wed Dec 01 12:36:36 -0800 2010
Keep-Alive Rufus log, time: Wed Dec 01 12:41:36 -0800 2010
Keep-Alive Rufus log, time: Wed Dec 01 12:46:35 -0800 2010
autocomplete:generate, time: Wed Dec 01 12:46:35 -0800 2010
# -> nothing more after here

2 days before:

# Restart passenger
Keep-Alive Rufus log, time: Mon Nov 29 01:26:32 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 01:31:31 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 01:36:31 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 01:41:31 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 01:46:31 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 01:51:31 -0800 2010
ts:index, time: Mon Nov 29 01:51:31 -0800 2010
Checking for new contact requests in Contact.unsent.inspect - time: Mon Nov 29 01:51:31 -0800 2010
# -> nothing more after here

From the same day:

# Restart passenger
Keep-Alive Rufus log, time: Mon Nov 29 15:39:19 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 15:44:20 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 15:49:19 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 15:54:20 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 15:59:19 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 16:04:20 -0800 2010
ts:index, time: Mon Nov 29 16:04:20 -0800 2010
Checking for new contact requests in Contact.unsent.inspect - time: Mon Nov 29 16:04:20 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 16:09:20 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 16:14:19 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 16:19:20 -0800 2010
autocomplete:generate, time: Mon Nov 29 16:19:20 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 16:24:19 -0800 2010
Keep-Alive Rufus log, time: Mon Nov 29 16:29:20 -0800 2010
# -> nothing more after here

As you can see, the scheduler just kind of stops working at seemingly random points. All of the jobs in my scheduler file run fine when executed by hand and, even worse, also fine by the scheduler because I can see in the logs that every job has been executed successfully multiple times.
I tried a lot of things:

  • Playing around with intervalls as I suspected some weird concurrency issue
  • Disabling / Enabling all possible mutations of running jobs

The result is always the frustrating one from - I can't see any system / logic behind Rufus just stopping to work.

How can I debug this further?

Sys-Info:

ruby -v
ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-linux]
gem -v
1.3.6
gem list rails
rails (2.3.5)
@troessner

Ah, thanks a lot for clearing that up.
It would be great if you could add a small note in the rufus-docs warning about this behaviour - I now know that this is a passenger and not a rufus issue, however, to the untrained eye (a.k.a. me) it seems to be a rufus issue.

@jmettraux
Owner

Done

19b61e7

Best regards,

John

@ShogunPanda

Hi!
I actually found a nice way (which could, of course, be improved) to handle this issue with Passenger.
Basically, i use Passenger events and a lock (PID) file to make sure than at any time at least a Rufus::Scheduler is active.
For my servers (tested both on OSX and Linux) it works either with passenger_pool_idle_time set to 0 and > 0. Probably the only requirement is passenger_min_instances >= 1.

The code follows, hope it helps. Basically you should only redefine the execute_scheduler function basing on your needs.

# encoding: utf-8

def execute_scheduler
    # Create your scheduler here
    scheduler = Rufus::Scheduler.start_new  
    logger = Logger.new(Rails.root.to_s + "/log/scheduler.log")

    # Test job
    scheduler.every("10m") do  
        logger.info "Log"
        end
end

# Create the main logger and set some useful variables.
main_logger = Logger.new(Rails.root.to_s + "/log/scheduler.log")
pid_file = (Rails.root.to_s + "/tmp/pids/scheduler").to_s
File.delete(pid_file) if FileTest.exists?(pid_file)

if defined?(PhusionPassenger) then
    # Passenger is starting a new process
    PhusionPassenger.on_event(:starting_worker_process) do |forked| 
        # If we are forked and there's no pid file (that is no lock)
        if forked && !FileTest.exists?(pid_file) then
            main_logger.debug "SCHEDULER START ON PROCESS #{$$}"
            # Write the current PID on the file
            File.open(pid_file, "w") {
                |f| f.write($$)
            }

            # Execute the scheduler
            execute_scheduler
        end
    end
    # Passenger is shutting down a process.   
    PhusionPassenger.on_event(:stopping_worker_process) do
        # If a pid file exists and the process which 
        # is being shutted down is the same which holds the lock 
        # (in other words, the process which is executing the scheduler)
        # we remove the lock.
        if FileTest.exists?(pid_file) then
            if File.open(pid_file, "r") {|f| pid = f.read.to_i} == $$ then 
                main_logger.debug "SCHEDULER STOP ON PROCESS #{$$}"
                File.delete(pid_file)
            end
        end
    end
else # Only execute one scheduler
    execute_scheduler
end

main_logger.info "SCHEDULER START"

Best regards!
Shogun

@jmettraux
Owner

将軍 : many thanks, but nobody will read this here, in a closed issue report. If you really want to help you could post that to the mailing list (http://groups.google.com/group/rufus-ruby/).

Best regards and thanks again,

@ShogunPanda

Just done!
Bye bye! ^^

@jmettraux
Owner

Excellent !

@concept47

Did this suddenly change in Passenger 3.
I've been using Rufus with Passenger for a while without any problems.

@jmettraux
Owner

No idea, this thread is dead.

Cheers,

John

@concept47

I know this is a dead thread, but I found this via google when I was looking for the problem, so in case anyone else has the problem, just set these in your virtual host

RailsAppSpawnerIdleTime 0
PassengerMinInstances 1

@jmettraux
Owner

Excellent, thanks !

@mareczek

Unfortunately, none of the above worked for me. The only thing that is actually working for me is setting:

PassengerSpawnMethod conservative

In the VirtualHost at the apache.conf file

I will keep my eyes and ears opened for a better solution :)

@riyad
riyad commented May 30, 2012

The solution @concept47 presented works for me also. Just put

RailsAppSpawnerIdleTime 0
PassengerMinInstances 1

in my vhost configuration and it worked. Thanks again.

System:
Apache 2.2.16
Passenger 3.0.12
Ruby 1.9.3

@jproyo
jproyo commented Nov 15, 2012

I developed an Application with Sinatra+Rufus+Datamapper+Other helper Gems, and It is deployed in AWS Elastic Beanstalk.
At the beginning I've experienced the same issue, because AWS Beanstalk use Passenger.

I solved the problem with the @ShogunPanda approach and It's working great for me now.

Here is my adaptation in Sinatra's case:

`
class Scheduler

class << self
    include Logging

    def run_every(time)

        pid_file = (File.dirname(__FILE__)+"/scheduler.pid").to_s
        File.delete(pid_file) if FileTest.exists?(pid_file)

        if defined?(PhusionPassenger) then
            # Passenger is starting a new process
            PhusionPassenger.on_event(:starting_worker_process) do |forked| 
                # If we are forked and there's no pid file (that is no lock)
                if forked && !FileTest.exists?(pid_file) then
                    logger.info "SCHEDULER START ON PROCESS #{$$}"
                    # Write the current PID on the file
                    File.open(pid_file, "w") {
                        |f| f.write($$)
                    }

                    # Execute the scheduler
                    execute_scheduler(time)
                end
            end
            # Passenger is shutting down a process.   
            PhusionPassenger.on_event(:stopping_worker_process) do
                # If a pid file exists and the process which 
                # is being shutted down is the same which holds the lock 
                # (in other words, the process which is executing the scheduler)
                # we remove the lock.
                if FileTest.exists?(pid_file) then
                    if File.open(pid_file, "r") {|f| pid = f.read.to_i} == $$ then 
                        logger.info "SCHEDULER STOP ON PROCESS #{$$}"
                        File.delete(pid_file)
                    end
                end
            end
        else # Only execute one scheduler
            execute_scheduler(time)
        end

    end

    def execute_scheduler(time)
        scheduler = Rufus::Scheduler.start_new

        scheduler.every time,:allow_overlapping => false do
          logger.info "Running Scheduler Sender Emails"
          Gmail::Sender.process_emails
        end
    end
end

end
`

This helper class is call from configure ... do ... end in Sinatra::Base Class in the following way:

configure :development, :debug, :test do Scheduler.run_every '5s' end

I hope this helps to others who are using Sinatra + Rufus.

Thank you @ShogunPanda!!!

@ShogunPanda

@jproyo You're welcome! :)

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment