Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mixin alert for ring member mismatch #2404

Merged
merged 16 commits into from
Oct 11, 2022

Conversation

jhesketh
Copy link
Contributor

@jhesketh jhesketh commented Jul 13, 2022

This alert fires when the expected number of ring members does not
match the running number for each component (one of compactor,
distributor, ingester, ruler, or store-gateway).

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #1960

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci pracucci self-requested a review July 14, 2022 12:26
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I left few comments, then LGTM!

Please add a CHANGELOG entry under ### Mixin. You can consider this PR as [ENHANCEMENT].

operations/mimir-mixin/alerts/alerts.libsonnet Outdated Show resolved Hide resolved
docs/sources/operators-guide/mimir-runbooks/_index.md Outdated Show resolved Hide resolved
@pracucci pracucci self-requested a review September 12, 2022 13:23
@jhesketh jhesketh requested review from osg-grafana and a team as code owners September 19, 2022 11:23
@jhesketh
Copy link
Contributor Author

jhesketh commented Oct 4, 2022

@pracucci bump for when you have time please :-)

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my late. It backtested this alert again and I would keep it only for ingester, which is realistically the only critical component (where things would really go nuts if there's this mismatch).

docs/sources/operators-guide/mimir-runbooks/_index.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

@osg-grafana osg-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unblocking with minor feedback

CHANGELOG.md Outdated Show resolved Hide resolved
docs/sources/operators-guide/mimir-runbooks/_index.md Outdated Show resolved Hide resolved

How to **investigate**:

- Check the [hash ring web page]({{< relref "../reference-http-api/index.md#ingesters-ring-status" >}}) for the component for which the alert has fired, and look for unexpected instances in the list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Check the [hash ring web page]({{< relref "../reference-http-api/index.md#ingesters-ring-status" >}}) for the component for which the alert has fired, and look for unexpected instances in the list.
- For the component for which the alert has fired, and to look for unexpected instances in the list, see [Ingesters ring status]({{< relref "../reference-http-api/index.md#ingesters-ring-status" >}}).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this suggestion makes sense. This is an action to take once you know a component that is mis-matched.

docs/sources/operators-guide/mimir-runbooks/_index.md Outdated Show resolved Hide resolved
@pracucci
Copy link
Collaborator

pracucci commented Oct 7, 2022

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally prefer to no keep commented code, but this PR has been open for too long so let's start using it ;)

@pracucci pracucci enabled auto-merge (squash) October 11, 2022 19:01
@pracucci pracucci merged commit 3c8fabd into grafana:main Oct 11, 2022
@jhesketh jhesketh deleted the active_member_alert branch October 12, 2022 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add alert if number of ring ACTIVE members doesn't match expected one
3 participants