[Stack Monitoring] Account for cloud instance migration in missing_monitoring_data_rule #112369

Open
matschaffer opened this issue Sep 16, 2021 · 2 comments
Labels
Team:Monitoring Stack Monitoring team

Comments

@matschaffer
Contributor

matschaffer commented Sep 16, 2021

Description of the problem including expected versus actual behavior:

We have an alert that fires whenever a node goes missing: https://github.com/elastic/kibana/blob/master/x-pack/plugins/monitoring/server/alerts/missing_monitoring_data_rule.ts

But in cloud environments this is quite common, since VMs fail or need to be removed for maintenance.

This triggers an alert like the following, which can be confusing.

[Screenshot: Screen_Shot_2021-09-16_at_10_00_56]

Describe the feature:

We should look at either disabling this or reducing the priority of the alert in cloud environments to avoid confusion.

Ideally we should be alerting on a discrepancy between the actual and intended cluster topology.

For example, if I expect to have 3 masters and 6 data nodes, but I still have only 2 masters and 5 data nodes after the expected recovery window has passed, then I'd like an alert.

But for that we'll need a way to make stack monitoring aware of not only the expected cluster topology, but also the expected recovery time for each class of node. Masters should recover on the order of minutes, probably no more than an hour. Warm data nodes could take on the order of hours if a new VM is needed.
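
To make the idea concrete, here is a rough sketch of what such a topology check might look like. Everything in it is hypothetical: the `RoleExpectation`, `RoleObservation`, and `TopologyAlert` shapes and the `findTopologyAlerts` function do not exist in the monitoring plugin, and the recovery windows (1 hour for masters, 4 hours for data nodes) are only illustrative values based on the rough estimates above.

```typescript
// Hypothetical shapes for illustration only; none of this exists in the monitoring plugin.
interface RoleExpectation {
  expectedCount: number; // intended number of nodes for this role
  recoveryWindowMs: number; // how long a shortfall is tolerated before alerting
}

interface RoleObservation {
  actualCount: number; // nodes currently reporting monitoring data
  shortSinceMs: number | null; // when the shortfall began, or null if there is none
}

interface TopologyAlert {
  role: string;
  expectedCount: number;
  actualCount: number;
  shortForMs: number;
}

// Returns one alert per role whose node count has been below the expected
// count for longer than that role's recovery window.
function findTopologyAlerts(
  expected: Record<string, RoleExpectation>,
  observed: Record<string, RoleObservation | undefined>,
  nowMs: number
): TopologyAlert[] {
  const alerts: TopologyAlert[] = [];
  for (const role of Object.keys(expected)) {
    const { expectedCount, recoveryWindowMs } = expected[role];
    const obs = observed[role];
    // In this sketch, roles without an observation or without a shortfall are skipped.
    if (!obs || obs.shortSinceMs === null || obs.actualCount >= expectedCount) {
      continue;
    }
    const shortForMs = nowMs - obs.shortSinceMs;
    if (shortForMs > recoveryWindowMs) {
      alerts.push({ role, expectedCount, actualCount: obs.actualCount, shortForMs });
    }
  }
  return alerts;
}

const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// Assumed topology: 3 masters and 6 data nodes, with illustrative recovery windows.
const expectedTopology: Record<string, RoleExpectation> = {
  master: { expectedCount: 3, recoveryWindowMs: HOUR_MS },
  data: { expectedCount: 6, recoveryWindowMs: 4 * HOUR_MS },
};

const nowMs = 10 * HOUR_MS;
const topologyAlerts = findTopologyAlerts(
  expectedTopology,
  {
    master: { actualCount: 2, shortSinceMs: nowMs - 2 * HOUR_MS }, // past its 1 hour window
    data: { actualCount: 5, shortSinceMs: nowMs - 30 * MINUTE_MS }, // still within its 4 hour window
  },
  nowMs
);

console.log(topologyAlerts.map((a) => a.role));
```

With these inputs only the master shortfall is reported, because the missing data node is still inside its recovery window. Where the expected topology and windows would come from (user configuration, the cloud deployment plan, or something else) is left open here.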

@matschaffer matschaffer added the Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more label Sep 16, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-stack-management (Team:Stack Management)

@matschaffer matschaffer added Team:Monitoring Stack Monitoring team and removed Team:Kibana Management Dev Tools, Index Management, Upgrade Assistant, ILM, Ingest Node Pipelines, and more labels Sep 16, 2021
@elasticmachine
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)
