This repository has been archived by the owner on Feb 16, 2021. It is now read-only.

Add Service Monitoring #103

Open
claudijd opened this issue Aug 14, 2017 · 4 comments

Comments

@claudijd
Contributor

Usually, April is the first person to hear about Mozilla SSH Observatory issues because she works on Observatory a lot more than I do. However, these issues generally boil down to one of two areas, so I should just add monitoring for both so that I'm the first person to know.

1.) Alert me when the site is not responding (this is usually nginx failing to come back after a restart or a failed Let's Encrypt renewal)
2.) Alert me when the queues are non-zero and not changing (this is usually an indication that something is broken or that the site is being abused)
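
As a rough sketch of what those two conditions could look like as checks (in Python, purely for illustration): the function names, the "5xx or no response means down" rule, and the three-sample window are all assumptions, not anything the service or MOC defines. Note that the second check needs the monitor to remember earlier queue sizes between runs.

```python
from typing import Optional, Sequence


def site_unresponsive(status: Optional[int]) -> bool:
    """Condition 1: the site is not responding.

    `status` is the HTTP status code from a probe, or None when the
    request failed outright (timeout, TLS error, connection refused).
    Treating any 5xx as "down" is an assumption for this sketch.
    """
    return status is None or status >= 500


def queue_stalled(samples: Sequence[int], min_samples: int = 3) -> bool:
    """Condition 2: the queue is non-zero and not changing.

    `samples` holds queue sizes from consecutive checks, oldest first.
    The window of 3 samples is a hypothetical default.
    """
    if len(samples) < min_samples:
        return False  # not enough history to call it stalled
    recent = list(samples[-min_samples:])
    # Stalled means every recent sample is the same non-zero value.
    return recent[0] > 0 and all(s == recent[0] for s in recent)
```

An empty queue that stays empty is healthy, so `queue_stalled([0, 0, 0])` is false, while `queue_stalled([5, 5, 5])` is true.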

@claudijd
Contributor Author

Requested via MOC in bug https://bugzilla.mozilla.org/show_bug.cgi?id=1390296

@floatingatoll

floatingatoll commented Aug 14, 2017 via email

@claudijd
Contributor Author

@floatingatoll good point, I'll need to add a reporting attribute to the stats to make this visible. I like it a lot because it doesn't require the monitoring endpoint to maintain state between checks. The check would simply be: if "max queue age" gets past X, then alert.

@claudijd
Contributor Author

The QUEUED_MAX_AGE attribute has been deployed to production and can be seen here:

https://sshscan.rubidus.com/api/v1/stats

The acceptable tolerance requested of MOC is 0-30 seconds. Anything outside that range is either an infrastructure issue or an abuse scenario, either of which fundamentally affects the user experience.
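
For illustration, a stateless check against that tolerance might look like the sketch below (Python). It assumes the stats response is a JSON object with QUEUED_MAX_AGE as a top-level numeric value in seconds; the exact response shape is not shown in this thread, and the function name and the "missing value means alert" rule are hypothetical choices.

```python
import json

# Upper bound of the 0-30 second tolerance requested of MOC.
MAX_QUEUE_AGE_SECONDS = 30


def queue_age_alert(stats_json: str, limit: float = MAX_QUEUE_AGE_SECONDS) -> bool:
    """Return True when an alert should fire for the given stats body.

    Assumes (not confirmed here) that the body is a JSON object with a
    top-level "QUEUED_MAX_AGE" value expressed in seconds.
    """
    stats = json.loads(stats_json)
    age = stats.get("QUEUED_MAX_AGE")
    if age is None:
        # Assumption: a missing attribute is treated as a problem
        # rather than silently passing the check.
        return True
    return not (0 <= float(age) <= limit)
```

Because the check looks at a single response, the monitor does not have to keep any history between runs: `queue_age_alert('{"QUEUED_MAX_AGE": 45}')` alerts on its own.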


2 participants