God appears to be leaving old god processes around indefinitely #51

Open
ajsharp opened this Issue Jun 22, 2011 · 9 comments

ajsharp commented Jun 22, 2011

When god forks processes, whether to do monitoring or to restart processes, it seems to leave a lot of them lying around. I tend to notice this after we do deploys, though not consistently. On deploys, we reload the god config (via god load) and then restart some processes. On our staging server, only four processes are being monitored, yet 17 god processes are running, which seems excessive. Also, on a recent deploy, I noticed this log output after reloading the god config.

We are using ruby 1.9.

I [2011-06-22 16:54:14]  INFO: api-unicorn-master-1 Reloaded config
I [2011-06-22 16:54:14]  INFO: resque-0 unwatched
F [2011-06-22 16:54:14] FATAL: Unhandled exception in driver loop - (NoMethodError): undefined method `handle_event' for nil:NilClass
/usr/local/rvm/gems/ruby-1.9.2-p180/gems/god-0.11.0/lib/god/driver.rb:151:in `block (2 levels) in initialize'
/usr/local/rvm/gems/ruby-1.9.2-p180/gems/god-0.11.0/lib/god/driver.rb:149:in `loop'
/usr/local/rvm/gems/ruby-1.9.2-p180/gems/god-0.11.0/lib/god/driver.rb:149:in `block in initialize'
I [2011-06-22 16:54:14]  INFO: resque-0 Reloaded config
I [2011-06-22 16:54:14]  INFO: resque-1 unwatched
F [2011-06-22 16:54:14] FATAL: Unhandled exception in driver loop - (NoMethodError): undefined method `handle_event' for nil:NilClass
/usr/local/rvm/gems/ruby-1.9.2-p180/gems/god-0.11.0/lib/god/driver.rb:151:in `block (2 levels) in initialize'
/usr/local/rvm/gems/ruby-1.9.2-p180/gems/god-0.11.0/lib/god/driver.rb:149:in `loop'
/usr/local/rvm/gems/ruby-1.9.2-p180/gems/god-0.11.0/lib/god/driver.rb:149:in `block in initialize'
I [2011-06-22 16:54:14]  INFO: resque-1 Reloaded config
I [2011-06-22 16:54:14]  INFO: resque-scheduler Loaded config
I [2011-06-22 16:54:14]  INFO: resque-scheduler move 'unmonitored' to 'init'
D [2011-06-22 16:54:14] DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x00000001cbae48> in 0 seconds
I [2011-06-22 16:54:14]  INFO: resque-scheduler moved 'unmonitored' to 'init'
I [2011-06-22 16:54:15]  INFO: resque-scheduler [trigger] process is running (ProcessRunning)
D [2011-06-22 16:54:15] DEBUG: resque-scheduler ProcessRunning [true] {true=>:up, false=>:start}
I [2011-06-22 16:54:15]  INFO: resque-scheduler move 'init' to 'up'
D [2011-06-22 16:54:15] DEBUG: driver schedule #<God::Conditions::MemoryUsage:0x00000001cbb9d8> in 0 seconds
D [2011-06-22 16:54:15] DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x00000001cb9980> in 0 seconds
I [2011-06-22 16:54:15]  INFO: resque-scheduler moved 'init' to 'up'
I [2011-06-22 16:54:15]  INFO: resque-scheduler [ok] memory within bounds [73620kb] (MemoryUsage)
D [2011-06-22 16:54:15] DEBUG: resque-scheduler MemoryUsage [false] {true=>:restart}
D [2011-06-22 16:54:15] DEBUG: driver schedule #<God::Conditions::MemoryUsage:0x00000001cbb9d8> in 30 seconds
I [2011-06-22 16:54:15]  INFO: resque-scheduler [ok] process is running (ProcessRunning)
D [2011-06-22 16:54:15] DEBUG: resque-scheduler ProcessRunning [false] {true=>:start}
D [2011-06-22 16:54:15] DEBUG: driver schedule #<God::Conditions::ProcessRunning:0x00000001cb9980> in 30 seconds
I [2011-06-22 16:54:15]  INFO: resque-1 move 'unmonitored' to 'restart'
I [2011-06-22 16:54:15]  INFO: resque-scheduler move 'up' to 'restart'
I [2011-06-22 16:54:15]  INFO: resque-0 move 'unmonitored' to 'restart'
I [2011-06-22 16:54:15]  INFO: resque-scheduler stop: kill -QUIT `cat /var/www/api/current/tmp/pids/resque-scheduler.pid`
I [2011-06-22 16:54:15]  INFO: resque-0 stop: kill -QUIT `cat /var/www/api/current/tmp/pids/resque-0.pid`
I [2011-06-22 16:54:15]  INFO: resque-1 stop: kill -QUIT `cat /var/www/api/current/tmp/pids/resque-1.pid`
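
One way to put a number on the leftover processes described above is to filter `ps` output for the god executable. The Ruby sketch below is only an illustration: the method name `god_processes`, the `GOD_COMMAND` pattern, and the sample `ps` text are assumptions, not part of god, and the pattern may need adjusting to how god is launched on a given host.

```ruby
# Matches "god" as a whole word at the start of the command line, or after
# whitespace or a "/" (for example "/usr/bin/god -c ..." or "ruby /usr/bin/god").
# Assumption: this is how the daemon shows up in ps; adjust for your host.
GOD_COMMAND = %r{(?:\A|[\s/])god(?:\s|\z)}

# Returns the lines of `ps` output (pid followed by command line) that look
# like god processes. Header lines and blank lines are skipped.
def god_processes(ps_output)
  ps_output.each_line.map(&:strip).select do |line|
    pid, command = line.split(/\s+/, 2)
    pid.to_s.match?(/\A\d+\z/) && command.to_s.match?(GOD_COMMAND)
  end
end

# Made-up sample input for illustration only, not taken from the report.
sample = <<~PS
    PID COMMAND
   1201 /usr/bin/god -c /etc/god.conf
   1340 /usr/bin/god -c /etc/god.conf
   2210 resque-scheduler
   2301 tail -f /var/log/god.log
PS

puts "god processes: #{god_processes(sample).size}"
```

On a live host the input could come from `ps -eo pid,args`. Comparing the count before and after a `god load` would show whether a reload leaves extra processes behind.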

shayfrendt commented Jun 23, 2011

I've been seeing this same issue. I've got 2 processes being monitored by god, but 6 leftover god processes running.

I'm running ruby-1.9.2-p180, god 0.11.0, and Ubuntu 10.04.2 and am also reloading the god config using the god load method.

Also - just saw issue #50 related to this.

mezis commented Apr 25, 2012

Still having this issue.

$ god -V
Version: 0.12.1
$ ruby -v
ruby 1.8.7 (2011-12-28 MBARI 8/0x6770 on patchlevel 357) [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2011.12

Happens every other load/unmonitor cycle, seemingly at random ...

Unfortunately makes God kinda useless as a watchdog ;(

emirkin commented May 19, 2012

+1

emirkin commented May 19, 2012

By the way, given this bug, it's particularly unclear whether "god load" is enough to load the new config AND restart current tasks or whether another "god restart" is needed. Another question I can't answer until this issue is resolved is: what's "god monitor" for? I thought putting monitoring behaviors inside my .god file was enough?

mezis commented May 19, 2012

FYI this bug completely disappeared for us after downgrading to REE 2011.03 (appears to be a known bug in the RVM affecting Ruby programs that combine forking and threading).

@emirkin it's also unclear to me, although in my experience god load <config> always starts the watch, unless you add the (undocumented) state argument:

god load <config> stop

Doesn't behave very consistently though.

mezis commented May 19, 2012

@emirkin I misunderstood your comment, I think. Given this bug, if God crashes after spawning processes (e.g. Rainbows! in our case) and you restart God and reload your config, a new process will be spawned.

Bottom line: a watchdog that crashes so often and doesn't behave properly when restarting isn't very helpful.
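
The duplicate spawn described above comes down to starting a process without first checking whether the previous one is still alive. As a hedged illustration only (this is not god's implementation, and the helper names `read_pid`, `process_alive?`, and `already_running?` are made up for this sketch), a pid-file check in plain Ruby could look like this:

```ruby
# Returns the integer PID stored in a pid file, or nil if the file is
# missing or does not contain a positive integer.
def read_pid(path)
  pid = Integer(File.read(path).strip, 10)
  pid.positive? ? pid : nil
rescue Errno::ENOENT, ArgumentError
  nil
end

# True if a process with this PID currently exists. Signal 0 performs the
# existence and permission checks without delivering a signal.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# True if the pid file points at a live process, in which case a
# supervisor could skip spawning a second copy.
def already_running?(pid_file)
  pid = read_pid(pid_file)
  !pid.nil? && process_alive?(pid)
end
```

A caller could run `already_running?` against a pid file such as the `tmp/pids/resque-0.pid` path seen in the log before issuing a start command. The check is best-effort: a stale pid file whose PID has been reused by an unrelated process would still report as running.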

emirkin commented May 20, 2012

@mezis Based on what you're saying, you did understand and answer my concerns. So now I'm just waiting for the fix too.

RobertLowe commented May 30, 2012

+1

Aethelflaed commented Apr 2, 2016

👍
