"Large shard size" Stack Monitoring rule is missing "Look at the average over X minutes" option #111889

ravikesarwani · 2021-09-10T17:42:16Z

Add a "Look at the average over X minutes" option to the "Large shard size" rule, with a default value of 15 minutes.
Change the default threshold for the "Large shard size" rule from 55gb to 75gb.

The Stack Monitoring documentation for the Large shard size alert says "The condition is met if an index’s average shard size is 55gb or higher in the last 5 minutes", but the rule definition has no parameter for specifying that time period.

We don't want a single spike over 55gb in primary shard size to cause an alert.
Force merges can cause a shard to grow well beyond 50 GB (in some cases it may double) for a short while and potentially trigger an alert that would be considered a false positive.
We want the alert to fire only when the size over the last X minutes (default 15 minutes) averages over 75gb.
This gives users an additional control point and avoids unneeded noise at times.

This would be similar to the "Disk usage" rule.
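
To make the requested behavior concrete, here is a minimal sketch of a windowed-average check. It is not the Kibana rule implementation: `ShardSizeSample`, `average_exceeds_threshold`, and the sample data are hypothetical names and values made up for illustration, and treating "75gb or higher" as the firing condition is an assumption borrowed from the wording in the documentation quoted above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ShardSizeSample:
    """One hypothetical monitoring sample for a primary shard."""
    minutes_ago: float  # age of the sample relative to the rule run
    size_gb: float      # shard size observed at that time


def average_exceeds_threshold(
    samples: Sequence[ShardSizeSample],
    threshold_gb: float = 75.0,    # proposed default threshold
    window_minutes: float = 15.0,  # proposed default "average over X minutes"
) -> bool:
    """Return True when the mean size inside the window meets the threshold."""
    in_window = [
        s.size_gb for s in samples if 0 <= s.minutes_ago <= window_minutes
    ]
    if not in_window:
        # No data in the window: assume the rule should not fire.
        return False
    return sum(in_window) / len(in_window) >= threshold_gb


# A short force-merge spike: one sample at 90 GB, the rest at 40 GB.
spike = [
    ShardSizeSample(minutes_ago=m, size_gb=g)
    for m, g in [(0, 40), (3, 40), (6, 90), (9, 40), (12, 40)]
]

# Sustained growth: every sample in the window is above 75 GB.
sustained = [
    ShardSizeSample(minutes_ago=m, size_gb=g)
    for m, g in [(0, 80), (3, 78), (6, 76), (9, 79), (12, 77)]
]

print(average_exceeds_threshold(spike))      # average is 50 GB
print(average_exceeds_threshold(sustained))  # average is 78 GB
```

With these made-up samples, a check against the single latest or largest value would flag the spike, while the 15-minute average does not; only the sustained case crosses the threshold.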

elasticmachine · 2021-09-10T17:42:18Z

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

elasticmachine · 2021-09-10T17:42:18Z

Pinging @elastic/stack-monitoring (Team:Monitoring)

ravikesarwani · 2021-09-15T17:50:20Z

Force merges can cause a shard to grow well beyond 50 GB for a short while and potentially trigger an alert that would be considered a false positive. The change in this issue (checking the average over the last X minutes) will help with this temporary condition.

elasticmachine · 2023-11-13T23:55:50Z

Pinging @elastic/infra-monitoring-ui (Team:Monitoring)

ravikesarwani added the Team:Monitoring, Team:Infra Monitoring UI - DEPRECATED, and Feature:Stack Monitoring labels Sep 10, 2021

paulb-elastic added this to Features Backlog in [INACTIVE] Metrics / Red Team Backlog Sep 13, 2021

ravikesarwani added the SM alerting improvements label Sep 15, 2021

ravikesarwani mentioned this issue Dec 12, 2021

[Stack Monitoring] Alerts firing for default values #105659

Closed

2 tasks

jasonrhodes removed this from Features Backlog in [INACTIVE] Metrics / Red Team Backlog Mar 3, 2022

jasonrhodes added the bug label and removed the Team:Monitoring label Mar 3, 2022

jasonrhodes mentioned this issue Mar 8, 2022

Stack Monitoring Tech Debt Plan #127224

Closed

39 tasks

pmeresanu85 added this to the ER Archive milestone Aug 31, 2022

pmeresanu85 changed the title ~~"Large shard size" rule is missing "Look at the average over X minutes" option~~ [Stack Monitoring - Tech Debt] "Large shard size" rule is missing "Look at the average over X minutes" option Aug 31, 2022

smith changed the title ~~[Stack Monitoring - Tech Debt] "Large shard size" rule is missing "Look at the average over X minutes" option~~ "Large shard size" Stack Monitoring rule is missing "Look at the average over X minutes" option Feb 24, 2023

sophiec20 removed this from the ER Archive milestone Aug 4, 2023

smith added the Team:Monitoring label and removed the Team:Infra Monitoring UI - DEPRECATED label Nov 13, 2023
