Dispatcher watchdog is always disabled #149

haydenseitz · 2020-12-08T20:37:45Z

Behaviour

Dispatcher watchdog service (service_watchdog_enabled) is disabled in the local config

Steps to reproduce this issue

Set service_watchdog_enabled to enabled in the global config
Start container with dispatcher enabled
service_watchdog_enabled is set to False

Expected behaviour

To follow the configured service_watchdog_enabled setting.

Is there a reason the dispatcher watchdog should be disabled in the container? I'm seeing an issue where my dispatchers lose connection with my Redis container and the dispatcher completely stops polling. I think the watchdog would help with this, but I'm wondering whether enabling it would have a bigger impact.


crazy-max · 2020-12-10T02:23:31Z

@haydenseitz The watchdog scheduler is disabled in the Docker image because the polling service is already handled by Docker itself.

haydenseitz · 2020-12-10T03:20:32Z

Service recovery is not handled by Docker unless there's a container health check. When the poller disconnects from Redis, the polling threads die, but the main dispatcher thread stays alive. The result is a "healthy" container that has stopped polling.

Thoughts on a health check to verify that the service is still polling? If not, I can submit a PR to enable the watchdog.
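
To illustrate the failure mode described above, here is a minimal sketch of how a worker thread can die while the main thread (and therefore the container's main process) keeps running. It is not the LibreNMS dispatcher code: `failing_poller` and `dead_workers` are hypothetical names used only for this example, and the `ConnectionError` stands in for a lost Redis connection.

```python
import threading


def dead_workers(threads):
    """Return the names of worker threads that are no longer running."""
    return [t.name for t in threads if not t.is_alive()]


def failing_poller():
    # Hypothetical stand-in for a polling thread that loses its Redis connection.
    raise ConnectionError("lost connection to redis")


# Suppress the default traceback output for uncaught thread exceptions
# so the demo stays quiet (threading.excepthook requires Python 3.8+).
threading.excepthook = lambda args: None

worker = threading.Thread(target=failing_poller, name="poller-0")
worker.start()
worker.join()

# The uncaught exception ended only the worker; the main thread is still
# running, so from the outside the process still looks alive.
print(dead_workers([worker]))  # ['poller-0']
```

A check along these lines would have to run inside the dispatcher process itself, which is why an external health check based on polling results may be the more practical option for a container.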

crazy-max · 2020-12-10T03:27:09Z

@haydenseitz

> When the poller disconnects from redis,the polling threads die, and the main dispatcher thread stays alive.

Ok then that's an issue with the dispatcher service itself.

> If not I can submit PR to enable watchdog

I don't think the watchdog is the proper way to handle this for the Docker image, as it relies on a log file. I think a Docker healthcheck instruction for the dispatcher service would be the right enhancement here.

> I'm seeing an issue that my dispatchers are losing connection with my redis container, and the dispatcher completely stops polling.

Do you have some logs?

chancez · 2021-02-22T18:51:50Z

Is there an existing HTTP endpoint on the dispatcher that reports whether it's healthy? I'm running LibreNMS in Kubernetes and would like either an HTTP endpoint I can hit or a command I can run to check its health, so the dispatcher can be restarted if it's unhealthy.

haydenseitz · 2021-03-03T00:04:35Z

@chancez No endpoint that I know of. My workaround is to copy a health check script into the container image. The script runs a SQL query to make sure the dispatcher in question has polled more than X devices in the last poll period.

Here's the SQL query:

SELECT pc.node_id, devices FROM poller_cluster pc JOIN poller_cluster_stats pcs ON pc.id = pcs.parent_poller WHERE poller_type = 'poller' AND node_id = '$NODE_ID'

where NODE_ID is sourced from the librenms .env file.
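
The decision logic of such a check could be sketched roughly as follows. This is an illustration under assumptions, not the script actually used: the function names and the `min_devices` threshold are hypothetical, the `%s` placeholder assumes a MySQL driver that uses format-style parameters, and `devices` is assumed to hold the number of devices polled in the last poll period, as described above. Fetching the rows (connecting to the database and reading `NODE_ID` from the `.env` file) is left out.

```python
# Same query as above, with the node ID passed as a bound parameter
# instead of being interpolated by the shell.
QUERY = (
    "SELECT pc.node_id, devices "
    "FROM poller_cluster pc "
    "JOIN poller_cluster_stats pcs ON pc.id = pcs.parent_poller "
    "WHERE poller_type = 'poller' AND node_id = %s"
)


def polled_devices(rows):
    """Sum the `devices` column over (node_id, devices) result rows.

    NULL values (None) count as zero.
    """
    return sum(int(devices or 0) for _node_id, devices in rows)


def exit_code(rows, min_devices):
    """Return 0 if more than `min_devices` devices were polled, else 1.

    Docker's HEALTHCHECK treats exit status 0 as healthy and 1 as unhealthy.
    """
    return 0 if polled_devices(rows) > min_devices else 1
```

A wrapper script would run `QUERY` for the local node, pass the fetched rows to `exit_code`, and exit with the returned status. Note that an empty result (no stats row for the node) is reported as unhealthy.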

Somewhat related: I will try to get back to the upstream LibreNMS project and fix the current "watchdog" process so that it counts polled devices in the Python dispatcher instead of watching log files. That would be cleaner and should be suitable to enable in the Docker image.

crazy-max closed this as completed Dec 10, 2020

crazy-max reopened this Dec 10, 2020

crazy-max added the status/needs-investigation label Mar 18, 2021

crazy-max added kind/upstream and removed status/needs-investigation labels Apr 18, 2021

crazy-max mentioned this issue May 26, 2021

Add HEALTHCHECK instruction to Dockerfile #206

Open
