Dispatcher watchdog is always disabled #149

Open
haydenseitz opened this issue Dec 8, 2020 · 5 comments

Comments

@haydenseitz
Contributor

Behaviour

Dispatcher watchdog service (service_watchdog_enabled) is disabled in the local config

Steps to reproduce this issue

  1. Set service_watchdog_enabled to enabled in the global config
  2. Start container with dispatcher enabled
  3. service_watchdog_enabled is set to False

Expected behaviour

To follow the configured service_watchdog_enabled setting.

Is there a reason the dispatcher watchdog should be disabled in the container? I'm seeing an issue that my dispatchers are losing connection with my redis container, and the dispatcher completely stops polling. I think the watchdog would help with this, but I'm wondering whether enabling it would have a bigger impact.

@crazy-max
Member

@haydenseitz The watchdog scheduler is disabled in the Docker image because the polling service is already handled by Docker itself.

@haydenseitz
Contributor Author

haydenseitz commented Dec 10, 2020

Service recovery is not handled by Docker unless there's a container health check. When the poller disconnects from Redis, the polling threads die, but the main dispatcher thread stays alive. The result is a "healthy" container that stops polling.

Thoughts on a health check to verify that the service is still polling? If not, I can submit a PR to enable the watchdog.
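The failure mode described above can be reproduced in miniature. This is only an illustrative sketch, not the dispatcher's actual code: `poll_worker`, `start_workers`, and `alive_workers` are hypothetical names, and the `ConnectionError` merely stands in for a lost Redis connection. It shows that worker threads can die while the main thread (and therefore the process and the container) keeps running.

```python
import threading

def quiet_excepthook(args):
    # Replace the default traceback with a one-line report of the dead thread.
    print(f"{args.thread.name} died: {args.exc_value}")

threading.excepthook = quiet_excepthook

def poll_worker():
    # Hypothetical stand-in for a polling thread; the exception simulates
    # losing the connection to Redis and is deliberately left unhandled.
    raise ConnectionError("lost connection to redis (simulated)")

def start_workers(count=2):
    """Start `count` worker threads and return them."""
    workers = [
        threading.Thread(target=poll_worker, name=f"poller-{i}")
        for i in range(count)
    ]
    for worker in workers:
        worker.start()
    return workers

def alive_workers(workers):
    """Return the names of the workers that are still running."""
    return [worker.name for worker in workers if worker.is_alive()]

if __name__ == "__main__":
    workers = start_workers()
    for worker in workers:
        worker.join()
    # All workers are gone, yet the main thread is still alive, so a
    # process-level liveness check would still report success.
    print("alive workers:", alive_workers(workers))
    print("main thread alive:", threading.main_thread().is_alive())
```

A check that only asks "is the process running?" cannot see this state, which is why the check needs to look at polling activity instead.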

@crazy-max
Member

@haydenseitz

When the poller disconnects from redis, the polling threads die, and the main dispatcher thread stays alive.

Ok then that's an issue with the dispatcher service itself.

If not, I can submit a PR to enable the watchdog

I don't think the watchdog is the proper way to handle this in the Docker image, as it relies on a log file. I think a Docker healthcheck instruction for the dispatcher service would be the right enhancement.
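For reference, a Docker `HEALTHCHECK` command reports its result through its exit status: 0 means healthy and 1 means unhealthy. A minimal sketch of a wrapper that follows this convention might look like the following; `healthcheck_exit_code` and the `check` callable are hypothetical names used for illustration and are not part of the image.

```python
import sys

def healthcheck_exit_code(check):
    """Map a boolean check to a HEALTHCHECK exit status.

    `check` is any zero-argument callable returning True when the service
    is healthy. An exception raised by the check counts as unhealthy.
    """
    try:
        return 0 if check() else 1
    except Exception as exc:
        # Docker keeps the command's output alongside the health status,
        # so a short message helps when inspecting the container.
        print(f"health check failed: {exc}")
        return 1

if __name__ == "__main__":
    # Placeholder check; a real one would verify recent polling activity.
    sys.exit(healthcheck_exit_code(lambda: True))
```

Treating an exception as unhealthy is a design choice: if the check itself cannot reach the database, the dispatcher probably cannot either.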

I'm seeing an issue that my dispatchers are losing connection with my redis container, and the dispatcher completely stops polling.

Do you have some logs?

@crazy-max crazy-max reopened this Dec 10, 2020
@chancez

chancez commented Feb 22, 2021

Is there an existing HTTP endpoint on the dispatcher we can hit to see if it's healthy? I'm running LibreNMS in Kubernetes and would like an HTTP endpoint I can hit, or a command I can run, to check whether the dispatcher is healthy and restart it if not.

@haydenseitz
Contributor Author

@chancez No endpoint that I know of. My way around this is to copy a health check script into the container image. The script runs a SQL query to make sure the dispatcher in question has polled more than X devices in the last poll period.

Here's the SQL query:

SELECT pc.node_id, devices
FROM poller_cluster pc
JOIN poller_cluster_stats pcs ON pc.id = pcs.parent_poller
WHERE poller_type = 'poller'
  AND node_id = '$NODE_ID'

where NODE_ID is sourced from the librenms .env file.
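A rough sketch of how such a check could be written in Python is shown below. It is not the script described above: the function names, the threshold handling, and the demo schema (only the columns the query references) are assumptions, and `sqlite3` is used purely as a stand-in so the example runs without a LibreNMS database. With a MySQL/MariaDB driver the placeholder would typically be `%s` rather than `?`.

```python
import sqlite3  # stand-in for a MySQL/MariaDB driver so the sketch is runnable

QUERY = """
SELECT pc.node_id, devices
FROM poller_cluster pc
JOIN poller_cluster_stats pcs ON pc.id = pcs.parent_poller
WHERE poller_type = 'poller' AND node_id = ?
"""

def node_id_from_env(env_text):
    """Extract NODE_ID from the text of a .env file, or None if absent."""
    for line in env_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "NODE_ID":
            return value.strip().strip("\"'")
    return None

def is_healthy(conn, node_id, threshold=0):
    """True if this node reports more than `threshold` polled devices."""
    rows = conn.execute(QUERY, (node_id,)).fetchall()
    return any(
        devices is not None and devices > threshold
        for _node, devices in rows
    )

def demo_db():
    """Build an in-memory database with an assumed, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE poller_cluster (id INTEGER PRIMARY KEY, node_id TEXT);
        CREATE TABLE poller_cluster_stats (
            parent_poller INTEGER, poller_type TEXT, devices INTEGER
        );
        INSERT INTO poller_cluster VALUES (1, 'node-a'), (2, 'node-b');
        INSERT INTO poller_cluster_stats VALUES
            (1, 'poller', 42), (2, 'poller', 0);
    """)
    return conn

if __name__ == "__main__":
    conn = demo_db()
    node = node_id_from_env("APP_ENV=production\nNODE_ID=node-a\n")
    print(node, "healthy" if is_healthy(conn, node) else "unhealthy")
```

Passing the node ID as a query parameter, rather than interpolating `$NODE_ID` into the SQL string, avoids quoting problems if the value ever contains unexpected characters.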

Somewhat related: I will try to get back to the upstream LibreNMS project and fix the current "watchdog" process so that it counts polled devices in the Python dispatcher instead of watching log files. That would be cleaner and should be suitable to enable in the Docker image.
