munin slack notification broken since setup #137

Closed
darkk opened this issue Aug 12, 2017 · 2 comments

darkk commented Aug 12, 2017

Impact: munin alerts were non-functional from May 17 until Aug 11

Detection: unexpected alert flood from ooni-munin to slack, noticed by @hellais

Timeline UTC:
17 May 03:30: notify_slack_munin deployed at munin.ooni.io
11 Aug 17:22: darkk deployed dom0-defaults to all:!no_passwd from #136 without filters
11 Aug 17:25: alert flood started (as there were a couple of boxes in warning state and munin alerts on all issues every tick)
11 Aug 18:22: hellais disabled the ooni-munin integration in the #ooni-bots channel
12 Aug 07:00: incident published

What went well:

  • it was quite easy to silence munin alerting :-)

What went wrong:

  • notify_slack_munin required curl, which was not installed, so it had been broken since initial setup
  • darkk did not notice the alert flood, as it is not relayed to IRC, and went AFK half an hour later
  • an innocent-looking apt-get install curl changed the behavior of a running system
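
A missing dependency like this could be caught at deploy time with a small preflight check. The sketch below is only an illustration, not what notify_slack_munin actually does: the tool list and the function names are hypothetical, and the only real call is Python's shutil.which.

```python
import shutil
import sys

# Hypothetical list of external tools the notifier shells out to.
REQUIRED_TOOLS = ["curl"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def preflight(tools=REQUIRED_TOOLS, which=shutil.which, stream=sys.stderr):
    """Return 0 if every tool is available, otherwise report and return 1."""
    missing = missing_tools(tools, which)
    if missing:
        print("notifier preflight failed, missing: " + ", ".join(missing),
              file=stream)
        return 1
    return 0


if __name__ == "__main__":
    # A non-zero exit code makes the deployment fail loudly instead of
    # leaving a silently broken notifier behind.
    sys.exit(preflight())
```

Running something like this as part of the deployment would turn "broken since initial setup" into an immediate, visible failure.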

What could be done to prevent relapse and decrease impact:

  • general rule: avoid any changes to live systems if you're going AFK soon :)
  • another one: test alerting (e.g. by lowering thresholds) while deploying it
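
Besides lowering thresholds, the alert path can be exercised directly by pushing one clearly marked test message through it. This is a rough sketch under assumptions: it posts a JSON body with a "text" field the way Slack incoming webhooks expect, the webhook URL is a placeholder, and the host and plugin names as well as all function names are made up for illustration. It does not reproduce how notify_slack_munin formats its messages.

```python
import json
import urllib.request


def build_test_payload(host, plugin):
    """Build a Slack incoming-webhook payload clearly marked as a test."""
    text = "[TEST] munin alert path check for {} ({})".format(host, plugin)
    return {"text": text}


def http_post_json(url, payload, timeout=10):
    """POST `payload` as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_test_alert(url, host, plugin, post=http_post_json):
    """Send one test alert; return True only if the endpoint answered 200."""
    try:
        return post(url, build_test_payload(host, plugin)) == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so network and
        # HTTP failures both count as "alert was not delivered".
        return False


if __name__ == "__main__":
    # Placeholder URL: substitute the real webhook when deploying.
    ok = send_test_alert("https://hooks.slack.com/services/PLACEHOLDER",
                         "munin.ooni.io", "df")
    print("test alert delivered" if ok else "test alert FAILED")
```

The `post` parameter is injectable so the delivery logic can be checked without touching the network; the real check still needs a human to confirm the message showed up in the channel.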
@darkk darkk added the incident label Aug 12, 2017
@darkk darkk closed this as completed Aug 30, 2018

hellais commented Sep 3, 2018

What was the resolution to this? Has munin been taken down and deprecated?


darkk commented Sep 3, 2018

> the resolution

None. This ticket was written down for historical and "knowledge share" purposes. It had no action points.

As for munin's fate: it has been deprecated, but that node was re-used for the tor test-helper. It should be re-deployed with a clean and up-to-date Debian, but that is tricky to do within current GH limitations.
