God reports incorrect up status when it fails to fork #108

Open
b4hand opened this Issue · 1 comment

@b4hand

I've now seen this a couple of times on different machines, during or after an OOM killer invocation. It is probably due to runaway memory use in either my service or memcached, but I would expect god to at least report the correct status. Instead, it shows the corresponding process as "up" even though ps shows that it is not running.

F [2012-09-07 02:30:07] FATAL: Unhandled exception in driver loop - (Errno::ENOMEM): Cannot allocate memory - fork(2)
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/process.rb:238:in `fork'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/process.rb:238:in `call_action'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/watch.rb:305:in `call_action'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/watch.rb:261:in `action'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/task.rb:215:in `move'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/task.rb:444:in `handle_event'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/driver.rb:87:in `send'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/driver.rb:87:in `handle_event'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/driver.rb:181:in `initialize'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/driver.rb:179:in `loop'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/driver.rb:179:in `initialize'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/driver.rb:178:in `new'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/driver.rb:178:in `initialize'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/task.rb:51:in `new'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/task.rb:51:in `initialize'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god/watch.rb:39:in `initialize'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god.rb:283:in `new'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god.rb:283:in `task'
/var/lib/gems/1.8/gems/god-0.12.1/bin/../lib/god.rb:271:in `watch'
@scomma

Getting this and it really burns because we trusted god to keep the production system alive. :disappointed:

I'm not sure anything can be done about it: god is written in Ruby, where the options for handling an out-of-memory error are limited. Unlike in C, attempting almost anything else will involve allocating more memory. I hope somebody proves me wrong.
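For what it's worth, the failure in the trace above is an `Errno::ENOMEM` raised by `fork`, which is an ordinary exception that can be rescued. Below is a minimal sketch of that idea, not god's actual code: `spawn_or_report` and its `forker:` parameter are hypothetical names I made up for illustration, and whether a rescue like this still has enough memory to run under real OOM pressure is an open question.

```ruby
# Hypothetical helper, not part of god: start an action in a child
# process and return a status pair instead of letting a failed
# fork(2) propagate up and kill the calling loop.
#
# forker: is injectable so the failure path can be exercised without
# actually exhausting memory; by default it is Process.fork.
def spawn_or_report(forker: Process.method(:fork), &action)
  pid = forker.call(&action)
  # The caller is responsible for reaping the child,
  # e.g. with Process.detach(pid) or Process.wait(pid).
  [:started, pid]
rescue Errno::ENOMEM, Errno::EAGAIN => e
  # fork(2) could not allocate memory or hit a process limit:
  # report the failure so the caller does not record the process as up.
  [:failed, e.class.name]
end
```

A caller would only mark the watch as up when the first element is `:started`, and could schedule a retry when it is `:failed`.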

To add insult to injury, after the crisis is over and memory has been freed, god tells me (in god status) that all workers are up when in fact many have failed to spawn. A simple ps query with their PIDs reveals this.
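The manual ps check can also be scripted. This is a rough sketch that assumes you already have the PIDs god believes are running (for example, from its pid files); `pid_alive?` and `stale_workers` are hypothetical helper names, not part of god. It relies on the standard trick of sending signal 0, which checks whether a process exists without delivering a signal.

```ruby
# Hypothetical helper: true if a process with this PID exists.
def pid_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks existence and permissions
  true
rescue Errno::ESRCH
  false                # no such process
rescue Errno::EPERM
  true                 # process exists but belongs to another user
end

# Hypothetical helper: given a { name => pid } hash of what the monitor
# believes is running, return the names whose process is gone.
def stale_workers(expected)
  expected.reject { |_name, pid| pid_alive?(pid) }.keys
end
```

Note that a reused PID would make a dead worker look alive, so this only narrows things down; it does not replace a correct status from god itself.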
