feat(Health.Checker.Alerts): Evaluate within grace period #311
Summary
What is this PR for?
No ticket. This is in response to some recent noisy health check count pages and is intended to avoid noisy failing health check pages in the future.
Previously, we checked whether alert data was stale by comparing the count of alerts in our store to the count of alerts returned from the API. We knew this first iteration would be flaky: it can take a bit of time for our store to catch up, which causes frequent health check failures (65 in the last 4 hours on prod). So far this hasn't resulted in any pages, but it does create noise in our dashboard and has the potential to erroneously page someone external to our team, who wouldn't immediately recognize it as noise.
This PR adds a grace period of 5 minutes to our alert count check to reduce health check failure noisiness.
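As a rough illustration of the idea (not the code in this PR), the sketch below shows one way a count check with a grace period could behave: a mismatch between the store count and the API count is only reported as a failure once it has persisted for the full grace period. The class name `AlertCountChecker`, its fields, and the `healthy` method are hypothetical, and Python is used only for illustration. It also assumes the grace period is measured from the first observed mismatch, which the description above does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# 5 minutes, per the PR description. Everything else here is an assumption.
GRACE_PERIOD = timedelta(minutes=5)


@dataclass
class AlertCountChecker:
    """Hypothetical checker: tolerates count mismatches for a grace period."""

    grace_period: timedelta = GRACE_PERIOD
    # Time at which the current run of mismatches was first observed.
    mismatch_since: Optional[datetime] = None

    def healthy(self, store_count: int, api_count: int, now: datetime) -> bool:
        if store_count == api_count:
            # The store has caught up, so forget any earlier mismatch.
            self.mismatch_since = None
            return True
        if self.mismatch_since is None:
            # First mismatch in this run: start the grace period clock.
            self.mismatch_since = now
        # Still healthy while the mismatch is younger than the grace period.
        return now - self.mismatch_since < self.grace_period
```

With this shape, a brief lag while the store catches up never fails the check, while a mismatch that lasts 5 minutes or longer still does.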
Testing