Increased CortexIngesterReachingSeriesLimit critical alert threshold from 80% to 85% #363


pracucci
Collaborator

What this PR does:
The CortexIngesterReachingSeriesLimit alert currently has two thresholds:

  • warning alert: > 70%
  • critical alert: > 80%
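As a rough illustration, the two-threshold scheme can be modelled as a function that maps one ingester's series utilization (in-memory series divided by its series limit) to a severity. This is a hypothetical Python sketch, not the actual alert definition (which is a Prometheus alerting rule); the names `alert_severity`, `WARNING_THRESHOLD`, `OLD_CRITICAL_THRESHOLD` and `NEW_CRITICAL_THRESHOLD` are illustrative assumptions.

```python
from typing import Optional

# Hypothetical constants mirroring the thresholds in this PR (fractions of the limit).
WARNING_THRESHOLD = 0.70
OLD_CRITICAL_THRESHOLD = 0.80
NEW_CRITICAL_THRESHOLD = 0.85


def alert_severity(in_memory_series: int, series_limit: int,
                   critical_threshold: float = NEW_CRITICAL_THRESHOLD,
                   warning_threshold: float = WARNING_THRESHOLD) -> Optional[str]:
    """Return "critical", "warning" or None for a single ingester.

    Comparisons are strict (>), matching the "> 70%" / "> 80%" wording above.
    """
    if series_limit <= 0:
        # Assumption: no limit configured means there is nothing to alert on.
        return None
    utilization = in_memory_series / series_limit
    if utilization > critical_threshold:
        return "critical"
    if utilization > warning_threshold:
        return "warning"
    return None


# An ingester at 82% of a 2.5M limit: critical before this PR, warning after it.
print(alert_severity(2_050_000, 2_500_000, critical_threshold=OLD_CRITICAL_THRESHOLD))  # critical
print(alert_severity(2_050_000, 2_500_000))  # warning
```

The 80% to 85% band is exactly the range in which an ingester stops paging as critical and only raises the warning.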

The warning alert is being discussed in #362, while this PR focuses on the critical alert.

What we've learned from production about ingesters reaching the series limit:

  • Series growth is typically slow, and the alert fires well before we get anywhere close to the limit (e.g. in a Cortex cluster with 50 ingesters and a limit of 2.5M series per ingester, ingester utilization grows by 1% for every ~1.2M in-memory series added across the cluster)
  • In case of emergency, the max series limit can be increased very quickly, since it's part of the runtime config (no ingester restart is required)
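The growth figure in the example above can be sanity-checked with a bit of arithmetic. The sketch below is a hypothetical helper (`series_per_utilization_point` is not part of Cortex) and assumes in-memory series are spread evenly across ingesters.

```python
def series_per_utilization_point(num_ingesters: int, limit_per_ingester: int) -> int:
    """In-memory series the whole cluster must gain to raise the average
    ingester utilization by one percentage point.

    Assumes series are distributed evenly across ingesters.
    """
    per_ingester = limit_per_ingester // 100  # 1% of the per-ingester limit
    return per_ingester * num_ingesters


print(series_per_utilization_point(50, 2_500_000))  # 1250000
```

That gives about 1.25M in-memory series per percentage point, in line with the ~1.2M quoted above.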

Because of this, I propose firing the critical alert once we reach 85% utilization (instead of the current 80% threshold). This still leaves 15% of headroom before hitting the limit, which typically gives us enough time to address the issue without any customer-facing disruption.
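To put the 15% headroom in absolute terms, the sketch below computes how many series can still be added after the critical alert fires, reusing the example cluster from above (50 ingesters, 2.5M series limit each). `headroom_series` is a hypothetical helper; thresholds are passed as integer percentages to avoid floating-point rounding.

```python
def headroom_series(limit_per_ingester: int, threshold_percent: int,
                    num_ingesters: int = 1) -> int:
    """Series that can still be added, once the critical alert fires,
    before the limit is reached (per ingester, times num_ingesters)."""
    return limit_per_ingester * (100 - threshold_percent) // 100 * num_ingesters


print(headroom_series(2_500_000, 80, 50))  # 25000000 (old threshold)
print(headroom_series(2_500_000, 85, 50))  # 18750000 (new threshold)
```

At roughly 1.25M in-memory series per percentage point, the new threshold still leaves room for about 18.75M additional in-memory series across the cluster.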

This change, together with the change we're making in #362, should help the CortexIngesterReachingSeriesLimit alert fire only when there's an actual action to take.

Which issue(s) this PR fixes:
N/A

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

…from 80% to 85%

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci requested a review from a team as a code owner July 27, 2021 17:10
@pracucci pracucci requested a review from beorn7 July 27, 2021 17:10
@pracucci pracucci merged commit da183ef into main Jul 28, 2021
@pracucci pracucci deleted the increase-CortexIngesterReachingSeriesLimit-critical-alert-threshold branch July 28, 2021 07:24
simonswine pushed a commit to grafana/mimir that referenced this pull request Oct 18, 2021
…ortexIngesterReachingSeriesLimit-critical-alert-threshold

Increased CortexIngesterReachingSeriesLimit critical alert threshold from 80% to 85%