Dispatcher service stops updating poller-stats after losing Redis-connection #12707

Closed
ottorei opened this issue Apr 6, 2021 · 3 comments · Fixed by #13478

@ottorei
Contributor

ottorei commented Apr 6, 2021

We encountered an issue with the dispatcher service on a distributed setup with four pollers, where the Redis server was offline for a few hours. This caused two of the four pollers to fail completely. For some reason, some of the pollers reconnected but the others did not. Perhaps they tried to reconnect at slightly different times when Redis came back.

The issue can be worked around by manually restarting the service with systemctl. These are the last lines in the log:

[Image: lnms-issue-rc]

@maesbrisa
Contributor

Hi!

We found the same issue in the latest version (21.5.1). Metrics from the devices stop being recorded until we restart the dispatcher container. As a temporary workaround, we added a service that restarts all unhealthy containers affected by Redis (we also had to build a new librenms image that supports this). Documentation about Docker healthcheck.

Hope it helps.
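
For readers who want to try a similar workaround, here is a minimal sketch of a Redis reachability probe that a container health check could call. It is not the probe used in the comment above. The function name `redis_is_reachable` is hypothetical, and the sketch assumes a Redis instance without authentication or TLS. It relies only on the fact that Redis accepts an inline `PING` and answers `+PONG`.

```python
import socket


def redis_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the server at host:port answers PING with +PONG.

    Hypothetical helper for illustration; assumes no AUTH and no TLS.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            # Redis accepts inline commands, so a bare PING line is enough.
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
    return reply.startswith(b"+PONG")
```

A health check script could call this function and exit with a non-zero status when it returns `False`, so that the container is marked unhealthy and can be restarted by whatever supervises it.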

@ottorei
Contributor Author

ottorei commented Aug 12, 2021

#13094 may possibly fix this

Edit: Did not fix the issue.

@ottorei
Contributor Author

ottorei commented Nov 4, 2021

This issue seems to be caused by unhandled exceptions in some Redis functions. When using the dispatcher service, this also causes nodes to stop reporting their stats if they lose their connection to the Redis instance.

[Image: poller_stats_error]
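
To illustrate the failure mode described above, here is a minimal sketch of a periodic stats loop that survives a failing update instead of stopping. It is not the LibreNMS code and not the fix from #13478. The names `run_stats_cycles` and `update_stats` are hypothetical, and the broad `except Exception` is an assumption that a real service would probably narrow to its Redis client's connection errors.

```python
import logging
from typing import Callable

logger = logging.getLogger("stats_sketch")


def run_stats_cycles(update_stats: Callable[[], None], cycles: int) -> int:
    """Call update_stats `cycles` times and return how many calls succeeded.

    A failing call is logged and skipped, so one lost connection does not
    end the loop. Without the try/except, the first exception would
    propagate and no further updates would be attempted.
    """
    succeeded = 0
    for _ in range(cycles):
        try:
            update_stats()
            succeeded += 1
        except Exception:
            # Assumption: broad catch for the sketch only.
            logger.exception("stats update failed; retrying on next cycle")
    return succeeded
```

With an `update_stats` callable that raises while the backing store is down, the loop keeps running and resumes reporting once the calls succeed again.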

ottorei changed the title from "Dispatcher service does not reliably restart when Redis-connection is lost" to "Dispatcher service stops updating poller-stats after losing Redis-connection" on Nov 4, 2021