Have an alerting system for things like member lookup failures #34
I don't know if a blinkenlight would be useful; it'd probably just generate emails saying "the light is blinking", which wouldn't be all that helpful. So many of our network outages are brief but catastrophic that it's hard to plan around them. Shooting off an email is, at the least, a good first step. Are there still free email providers like Mandrill? (They eliminated their free service recently, I think.)
On Sat, Nov 26, 2016, 3:30 PM Sheila Miguez wrote:
It would be good for the go server to fire off an alert when it hits
errors when trying to contact the member database. The alerts could be
aggregated by something on the network, but should also show up on a
physical display or blinkenlight in case the network is down.
I am just brainstorming here. I am not sure about the blinkenlight, but some hardware thing would be good? What about an e-ink display that lists some status stuff? Because you are right about temporary failures: things that alert all the time are annoying as hell. So some transient thing that just shows a status description is probably good. For email, I think we should let something else handle it. The go server can write to syslog, other things already go to syslog, and there are tools that take syslog messages and send them off to get collected. We can have something that sits around aggregating messages; the go server can, for example, log non-2xx responses from ps1auth, and the aggregator can decide at some threshold that it's important to send a message. I think at some point kuroishi or someone set up Nagios too, which has plugins for doing HTTP checks, and we can have it hit some endpoint on the go server (and other things, like an endpoint on the ps1auth site). For email, I've used the one that comes free with Rackspace. It has a freemium model. You have to watch out that things don't get classified as spam (I seem to remember that happening). I hate email.
Sorry for handwaving. Anyway, when the bbb is off the network, a physically connected display/something/counter might be helpful, maybe with a label next to it giving a legend for what the stuff means. For when the bbb is back on the network... handwavy again. I've used collectd to send messages to Graphite. collectd has a plugin that can tail logs and send off messages. It has been a while, so I don't remember how things worked. Maybe there is a setting to have it retry until the network is back. That way it has a reasonable chance of collecting data (and if data gets lost, oh well; this isn't a pacemaker, right?). It might give us enough to go on when trying to figure stuff out.
Some kind of display wouldn't be bad, for sure. I like the ideas; just spitballing.
On Sun, Nov 27, 2016, 12:06 PM Sheila Miguez wrote:
sorry for handwaving. anyway, when the bbb is off the network, a
physically connected display/something/counter might be helpful but maybe
there is a label to put next to it with a legend for what the stuff means.
for when the bbb is back on the network... handwavy again. I've used
collectd to send messages to graphite. collectd has a plugin that can tail
logs and send off messages. it has been a while so I don't remember how
things worked. maybe there is a setting to have it retry until the network
is back. that way it has a reasonable chance of collecting data (and if
data gets loss, oh well. this isn't a pacemaker right?). but it might give
us enough to go on when trying to figure stuff out.
It would be good for the go server to fire off an alert when it hits errors when trying to contact the member database. The alerts could be aggregated by something on the network, but should also show up on a physical display or blinkenlight in case the network is down.