Dispatcher watchdog is always disabled #149
Comments
@haydenseitz The watchdog scheduler is disabled for the Docker image because the polling service is already handled by Docker itself.
Service recovery is not handled by Docker unless there's a container health check. When the poller disconnects from redis, the polling threads die, but the main dispatcher thread stays alive. The result is a "healthy" container that stops polling. Thoughts on a health check to verify that the service is still polling? If not, I can submit a PR to enable the watchdog.
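The failure mode described above (worker threads die while the main thread keeps the process, and therefore the container, alive) can be illustrated with a minimal Python sketch. This is not the LibreNMS dispatcher code; `worker`, `run_workers`, and `alive_count` are hypothetical names, and the `ConnectionError` only stands in for a lost redis connection.

```python
import threading


def worker():
    # Stand-in for a polling thread whose redis connection drops.
    # The uncaught exception ends this thread only; Python prints the
    # traceback to stderr and the rest of the process keeps running.
    raise ConnectionError("lost connection to redis")


def run_workers(count):
    """Start `count` worker threads and wait until each one has exited."""
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return threads


def alive_count(threads):
    """Number of worker threads that are still running."""
    return sum(thread.is_alive() for thread in threads)


threads = run_workers(3)
print("workers alive:", alive_count(threads))
print("main thread alive:", threading.main_thread().is_alive())
```

Because the process itself never exits, a restart policy that only watches the container's main process has nothing to react to, which is why a check on actual polling activity is needed.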
Ok, then that's an issue with the dispatcher service itself.
I don't think the watchdog is the proper way to handle this for the Docker image, as it relies on a log file. A Docker healthcheck instruction for the dispatcher service would be the right enhancement for this, I think.
Do you have some logs?
Is there an existing HTTP endpoint on the dispatcher we can hit to see if it's healthy? I'm running LibreNMS in Kubernetes and would like an HTTP endpoint I can hit to see if it's healthy, and restart it if not; or a command I can run to check if it's healthy.
@chancez No endpoint that I know of. My way around this is to copy a health check script into the container image. The script runs a SQL query to make sure the dispatcher in question has polled more than X devices in the last poll period. Here's the SQL query:
SELECT pc.node_id, devices FROM poller_cluster pc JOIN poller_cluster_stats pcs ON pc.id = pcs.parent_poller WHERE poller_type = 'poller' AND node_id = '$NODE_ID'
where NODE_ID is sourced from the LibreNMS .env file.
Somewhat related: I will try to get back to the upstream LibreNMS project to fix the current "watchdog" process so that it counts polled devices in the Python dispatcher and stops watching log files. That would be cleaner and should be fit to enable in the Docker image.
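A health check along the lines described above might look like the following Python sketch. It is an assumption-laden illustration, not the script from the comment: `polled_devices`, `exit_code`, and `demo_connection` are hypothetical names, the node id is bound as a query parameter rather than interpolated from `$NODE_ID`, and the in-memory SQLite tables contain only the columns the query touches so that the example runs without a LibreNMS database. Against a real MySQL/MariaDB database the connection would come from a MySQL driver, and the placeholder would typically be `%s` instead of `?`.

```python
import sqlite3

# Query from the comment above, with the node id as a bound parameter.
# "?" is sqlite3's placeholder style; MySQL drivers usually expect "%s".
QUERY = (
    "SELECT pc.node_id, devices "
    "FROM poller_cluster pc "
    "JOIN poller_cluster_stats pcs ON pc.id = pcs.parent_poller "
    "WHERE poller_type = 'poller' AND node_id = {placeholder}"
)


def polled_devices(conn, node_id, placeholder="?"):
    """Return the device count reported for this node's poller (0 if no row)."""
    cur = conn.cursor()
    cur.execute(QUERY.format(placeholder=placeholder), (node_id,))
    return sum(devices for _node, devices in cur.fetchall())


def exit_code(conn, node_id, min_devices=0, placeholder="?"):
    """Return 0 (healthy) if more than min_devices were polled, else 1."""
    polled = polled_devices(conn, node_id, placeholder)
    return 0 if polled > min_devices else 1


def demo_connection():
    """In-memory stand-in for the database (hypothetical minimal schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE poller_cluster (id INTEGER PRIMARY KEY, node_id TEXT);
        CREATE TABLE poller_cluster_stats (
            parent_poller INTEGER, poller_type TEXT, devices INTEGER);
        INSERT INTO poller_cluster VALUES (1, 'node-a'), (2, 'node-b');
        INSERT INTO poller_cluster_stats VALUES
            (1, 'poller', 42), (1, 'discovery', 3), (2, 'poller', 0);
        """
    )
    return conn


conn = demo_connection()
print("node-a:", exit_code(conn, "node-a"))
print("node-b:", exit_code(conn, "node-b"))
```

A container health check or Kubernetes exec probe could pass the returned value to `sys.exit`, so that a non-zero status marks the dispatcher as unhealthy. The threshold `min_devices` plays the role of "X" in the comment and would need tuning per deployment.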
Behaviour
Dispatcher watchdog service (service_watchdog_enabled) is disabled in the local config

Steps to reproduce this issue
1. Set service_watchdog_enabled to enabled in the global config
2. service_watchdog_enabled is set to False

Expected behaviour
To follow the configured service_watchdog_enabled setting.

Is there a reason the dispatcher watchdog should be disabled in the container? I'm seeing an issue where my dispatchers lose their connection to my redis container and the dispatcher completely stops polling. I'm thinking the watchdog would help with this, but I'm wondering if there would be a bigger impact.