Have an alerting system for things like member lookup failures #34

Open
codersquid opened this issue Nov 26, 2016 · 4 comments

Comments

@codersquid
Collaborator

It would be good for the Go server to fire off an alert when it hits errors while trying to contact the member database. The alerts could be aggregated by something on the network, but they should also show up on a physical display or blinkenlight in case the network is down.

@codersquid codersquid changed the title Have an alerting system for things like the member lookup fails Have an alerting system for things like member lookup failures Nov 26, 2016
@loansindi
Owner

loansindi commented Nov 27, 2016 via email

@codersquid
Collaborator Author

I am just brainstorming here.

I am not sure about the blinkenlight, but some hardware thing would be good? What about an e-ink display that lists some status stuff? Because you are right about temporary failures. Things that alert all the time are annoying as hell. So, some transient thing that just shows a status description is probably good.

For email, I think maybe let something else handle it. The Go server can write to syslog, other things write to syslog too, and there are tools that take syslog messages and send them off to get collected. We can have something that sits around aggregating messages. The Go server can, for example, log non-2xx responses from ps1auth, and the aggregator can decide at some threshold that it's important enough to send a message.

I think at some point kuroishi or someone set up a Nagios thing too, which has some plugins for doing HTTP checks. We can have it hit some endpoint on the Go server (and other things, like an endpoint on the ps1auth site).
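As a sketch of what such an endpoint could look like on the Go side: the handler below answers 200 while the last member lookup worked and 503 once it failed, which an HTTP check can then pick up. The `health` type, the `probe` helper, and the `/healthz` path are all hypothetical names for illustration; nothing like this exists in the server yet as far as I know.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// health (hypothetical) tracks whether the most recent member lookup succeeded.
type health struct {
	mu     sync.Mutex
	lastOK bool
	detail string
}

// set records the outcome of the latest lookup.
func (h *health) set(ok bool, detail string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.lastOK, h.detail = ok, detail
}

// ServeHTTP answers 200 when the last lookup worked and 503 otherwise.
func (h *health) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	h.mu.Lock()
	ok, detail := h.lastOK, h.detail
	h.mu.Unlock()
	if !ok {
		http.Error(w, "member lookup failing: "+detail, http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

// probe calls the handler in-process and returns the status code it produced.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	h := &health{}
	// In the real server this would be mounted with something like
	// http.Handle("/healthz", h); the path is an assumption.
	h.set(true, "")
	fmt.Println("after a good lookup:", probe(h))
	h.set(false, "ps1auth returned 502")
	fmt.Println("after a failed lookup:", probe(h))
}
```

The same status could also feed a physical display, since it is just one flag plus a short description.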

For email, I've used the one that comes free with Rackspace. It has a freemium model. You have to watch out that things don't get classified as spam (I seem to remember that happening). I hate email.

@codersquid
Collaborator Author

Sorry for handwaving. Anyway, when the bbb is off the network, a physically connected display/something/counter might be helpful, but maybe there should be a label next to it with a legend for what the stuff means.

For when the bbb is back on the network... handwavy again. I've used collectd to send messages to Graphite. collectd has a plugin that can tail logs and send off messages. It has been a while, so I don't remember how things worked. Maybe there is a setting to have it retry until the network is back. That way it has a reasonable chance of collecting data (and if data gets lost, oh well. This isn't a pacemaker, right?). It might give us enough to go on when trying to figure stuff out.

@loansindi
Owner

loansindi commented Nov 27, 2016 via email
