Hosts get stuck on "DOWN" after a memory allocation error #1598
Comments
The first one is normal: there is no UNKNOWN state for hosts, only UP or DOWN. But the second is more problematic. Was the error on the check output or in
On Wed, Apr 29, 2015 at 4:29 PM, Arthur Lutz notifications@github.com
This is a critical failure of Shinken on Debian jessie. We cannot find any way to get Shinken to restart checks. After trying to force checks by scheduling downtimes and by forcing checks via Livestatus, we tried removing the retention data in /var/lib/shinken/*.dat (after stopping Shinken), then restarting. Everything stays as pending checks. There is no usable information in the logs, and putting the logs in debug mode drowns the information in performance printouts.
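For reference, forcing a check through Livestatus amounts to sending the standard Nagios external command `SCHEDULE_FORCED_HOST_CHECK` over the Livestatus socket. Below is a minimal Python sketch of that step, not the exact commands used in this report. It assumes the Livestatus module listens on TCP at `localhost:50000` (an assumption, adjust to the local configuration); the helper names and the host name `web01` are hypothetical.

```python
import socket
import time


def forced_host_check_command(host_name, now=None):
    """Build a Livestatus COMMAND line requesting an immediate forced host check."""
    ts = int(time.time() if now is None else now)
    # Nagios external command syntax:
    #   SCHEDULE_FORCED_HOST_CHECK;<host_name>;<check_time>
    return "COMMAND [%d] SCHEDULE_FORCED_HOST_CHECK;%s;%d\n" % (ts, host_name, ts)


def send_livestatus(line, address=("localhost", 50000), timeout=5.0):
    """Send one command line to a Livestatus TCP socket.

    The default address is an assumption, not taken from the report.
    """
    sock = socket.create_connection(address, timeout=timeout)
    try:
        sock.sendall(line.encode("utf-8"))
        # Signal end of request; external commands return no response body.
        sock.shutdown(socket.SHUT_WR)
    finally:
        sock.close()
```

Calling `send_livestatus(forced_host_check_command("web01"))` would submit the request. Whether the check then actually runs depends on the scheduler and pollers being healthy, which is the problem described here.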
Wow... we ended up getting it to work (after much debugging) by restarting the services in a given order. Restarting the poller while all other services were running got it to work again. Could it be an init script bug? Is this bug specific to Debian?
Bug report for the Debian maintainers: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=784624
cool :) On Thu, May 7, 2015 at 11:05 AM, Arthur Lutz notifications@github.com
At some point we had a memory shortage on the Shinken server; a lot of hosts then went to DOWN status with the following error:
I believe the status should be UNKNOWN in this case.
Second problem: once the memory shortage was fixed, we could not find a way to get the hosts checked again. The one workaround I've found is to give a host a downtime of 1 minute, after which it goes back to green.
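The 1-minute downtime workaround can also be expressed as a Nagios-style external command, `SCHEDULE_HOST_DOWNTIME`. The sketch below only builds the command line; it is an illustration under assumptions, not the exact procedure used above. The helper name, the host name, the author and the comment text are hypothetical.

```python
import time


def host_downtime_command(host_name, duration_s=60, author="admin",
                          comment="force recheck", now=None):
    """Build a COMMAND line scheduling a short fixed downtime on one host."""
    start = int(time.time() if now is None else now)
    end = start + duration_s
    # Nagios external command syntax:
    #   SCHEDULE_HOST_DOWNTIME;<host_name>;<start_time>;<end_time>;
    #   <fixed>;<trigger_id>;<duration>;<author>;<comment>
    fields = [host_name, start, end, 1, 0, duration_s, author, comment]
    return "COMMAND [%d] SCHEDULE_HOST_DOWNTIME;%s\n" % (
        start, ";".join(str(f) for f in fields))
```

The resulting line can be written to the Livestatus socket or to the external command pipe, depending on which one the installation exposes.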