Description of the problem including expected versus actual behavior:
We have an alert that fires whenever a node goes missing: https://github.com/elastic/kibana/blob/master/x-pack/plugins/monitoring/server/alerts/missing_monitoring_data_rule.ts
But in cloud environments this can be quite common as VMs will fail or need to be removed for maintenance purposes.
This triggers an alert like the one above, which can be confusing.
Describe the feature:
We should look at either disabling this or reducing the priority of the alert in cloud environments to avoid confusion.
Ideally we should be alerting on a discrepancy between the actual and intended cluster topology.
For example, if I expect to have 3 masters and 6 data nodes but only have 2 masters and 5 data nodes beyond an expected recovery window, then I'd like an alert.
But for that we'll need a way for Stack Monitoring to be aware of not only the expected cluster topology, but also the expected recovery time for each class of node. Masters should recover on the order of minutes, probably no more than an hour. Warm data nodes could take on the order of hours in the event a new VM is needed.
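To make the proposal concrete, here is a minimal sketch of what such a topology-discrepancy check could look like. This is purely illustrative: the names (`ExpectedTopology`, `findTopologyAlerts`, the per-class `recoveryWindowMs`) are hypothetical and not part of any existing Kibana or Stack Monitoring API.

```typescript
// Hypothetical sketch: alert only when the observed topology falls short of
// the intended topology for longer than that node class's recovery window.
// None of these types or functions exist in Kibana today.

interface NodeClassExpectation {
  expectedCount: number;
  recoveryWindowMs: number; // how long a shortfall is tolerated before alerting
}

type ExpectedTopology = Record<string, NodeClassExpectation>;

interface ObservedClass {
  count: number;
  shortfallSinceMs?: number; // epoch ms when count first dropped below expected
}

function findTopologyAlerts(
  expected: ExpectedTopology,
  observed: Record<string, ObservedClass>,
  nowMs: number
): string[] {
  const alerts: string[] = [];
  for (const [nodeClass, exp] of Object.entries(expected)) {
    const obs = observed[nodeClass] ?? { count: 0, shortfallSinceMs: nowMs };
    if (obs.count >= exp.expectedCount) continue; // topology matches; no alert
    const since = obs.shortfallSinceMs ?? nowMs;
    if (nowMs - since > exp.recoveryWindowMs) {
      alerts.push(
        `${nodeClass}: expected ${exp.expectedCount}, saw ${obs.count} ` +
          `beyond recovery window`
      );
    }
  }
  return alerts;
}
```

With this shape, a transient VM replacement (e.g. a data node missing for an hour when its class tolerates several hours) produces no alert, while a master missing beyond its shorter window does.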