[Stack Monitoring] Alerting Phase 1 #42960

Open · 8 of 10 tasks
cachedout opened this issue Aug 8, 2019 · 12 comments
Labels: enhancement (New value added to drive a business result), Meta, Team:Monitoring (Stack Monitoring team)

@cachedout (Contributor) commented Aug 8, 2019

This ticket tracks the work that needs to be completed to achieve Phase 1, which is outlined in the proposal document.

To complete this phase, we need to build out the plumbing to connect the Stack Monitoring application to the Kibana Alerting Framework.

All watches need to be present and functional using the new framework:

cachedout added the Meta, enhancement, and Team:Monitoring labels on Aug 8, 2019
@elasticmachine (Contributor)

Pinging @elastic/stack-monitoring

@chrisronline (Contributor)

Update here.

I found a couple of blockers while taking a first stab at this and raised them here: #45571

@chrisronline (Contributor) commented Oct 16, 2019

The effort is going well here. I don't have a PR ready yet, but I hope to have it this week. (Update: Draft PR available)

Some updated notes on this effort:

  • We need to figure out how we handle the state of alerts firing - with Watcher, we write to the .monitoring-alerts-* index, but I think we can avoid an additional index by leveraging the persisted state for actions (see the sketch after this list). We are blocked on this because we need a way to access this state; see Ability to fetch alert state / alert instance state #48442
  • We need to figure out the right way to disable cluster alerts (watches). I've outlined some thoughts on this issue
  • I'm thinking we'll want to progressively add these into master (instead of one big merge). If so, we should decide whether to keep them disabled until they are all in, or enable at least one from the start and have it co-exist with the other watches.
  • With Watcher, we require users to specify an email address to receive alerts in their kibana.yml. We can continue this trend, or we can allow them to specify it in the UI when they enable Kibana alerts and store it in a saved object or something.
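
On the first point, here is a rough sketch, assuming the executor / alertInstanceFactory shape the Kibana alerting plugin exposes; the alert type id, params, state fields, and `checkLicenseExpiration` helper are made up for illustration and are not the actual implementation:

```ts
// Hypothetical sketch: keep firing/resolved status in the framework's persisted
// state instead of writing documents to .monitoring-alerts-*.
declare const alerting: { registerType(alertType: unknown): void }; // stand-in for the plugin contract
declare function checkLicenseExpiration(
  callCluster: (endpoint: string, options?: unknown) => Promise<unknown>,
  clusterUuid: string
): Promise<number>; // stand-in helper that reads the license doc from monitoring indices

interface LicenseAlertState {
  isFiring: boolean;
  lastCheckedMS: number;
  lastFiredMS?: number;
  resolvedMS?: number;
}

alerting.registerType({
  id: 'monitoring_alert_license_expiration', // illustrative id
  name: 'X-Pack license expiration',
  actionGroups: ['default'],
  async executor({ services, params, state }: any): Promise<LicenseAlertState> {
    const prev: Partial<LicenseAlertState> = state ?? {};
    const expiresInDays = await checkLicenseExpiration(services.callCluster, params.clusterUuid);
    const isFiring = expiresInDays < params.thresholdDays;

    if (isFiring && !prev.isFiring) {
      // Schedule actions (e.g. email) only on the transition into the firing state.
      services.alertInstanceFactory(params.clusterUuid).scheduleActions('default', { expiresInDays });
    }

    // The returned object is what the framework persists between runs; reading it
    // back (issue #48442) is what would let the UI show fired/resolved status.
    return {
      isFiring,
      lastCheckedMS: Date.now(),
      lastFiredMS: isFiring ? Date.now() : prev.lastFiredMS,
      resolvedMS: !isFiring && prev.isFiring ? Date.now() : prev.resolvedMS,
    };
  },
});
```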

@igoristic (Contributor)

Nice work @chrisronline 💪 Can't wait to see it!

We need to figure out how we handle the state of alerts firing - with Watcher, we write to the .monitoring-alerts-* index

Once "Kibana Alerting" is live are we completely deprecating/removing the current/old Alerting?

I think we might still want a new index, just in case some setups still have the old .monitoring-alerts-* with legacy documents (or for some reason we need to support both ES and Kibana alerting). We can suffix it with something like -kb, like we do -mb for Metricbeat.

I'm thinking we'll want to progressively add these into master (instead of one big merge)

💯

With Watcher, we require users to specify an email address to receive alerts in their kibana.yml

I prefer the Kibana UI, just because it's more user friendly and they can modify the info without restarting, but I don't mind continuing the yml trend.

@chrisronline (Contributor)

Thanks for the thoughts @igoristic!

Once "Kibana Alerting" is live are we completely deprecating/removing the current/old Alerting?

I guess it depends on whether we want a slow rollout of these migrations. If so, we will be living in a world where both are running at the same time (not for the same alert check, but we'll have some Watcher-based cluster alerts and some Kibana alerts).

I think we might still want a new index, just in case some setups still have the old .monitoring-alerts-* with legacy documents (or for some reason we need to support both ES and Kibana alerting). We can suffix it with something like -kb, like we do -mb for Metricbeat.

You don't think we can accomplish the same UI just by using the state provided by the alerting framework? I think that's really all we need, since we'll store data in there that tells us when the alert fired and whether it's been resolved yet.
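
For instance, if #48442 lands, the UI side could read that persisted state back instead of querying a dedicated index. A hypothetical consumer (the fetch function does not exist yet; it is exactly the gap tracked in that issue):

```ts
// Hypothetical UI-side helper; `fetchAlertState` is a stand-in for whatever API
// issue #48442 eventually provides, and the state shape mirrors the executor
// sketch earlier in this thread.
declare function fetchAlertState(alertId: string): Promise<{
  isFiring: boolean;
  lastFiredMS?: number;
  resolvedMS?: number;
}>;

export async function getAlertStatusLabel(alertId: string): Promise<string> {
  const state = await fetchAlertState(alertId);
  if (state.isFiring) {
    return `Firing since ${new Date(state.lastFiredMS ?? Date.now()).toISOString()}`;
  }
  return state.resolvedMS ? `Resolved at ${new Date(state.resolvedMS).toISOString()}` : 'OK';
}
```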

I prefer the Kibana UI, just because it's more user friendly and they can modify the info without restarting, but I don't mind continuing the yml trend.

Yea, I agree the UI route is better, but if we do a slow rollout, it might be confusing for folks who already have the kibana.yml config set. I think we need to make a call on the slow rollout, and that will help inform how we handle these other issues.

@igoristic (Contributor)

You don't think we can accomplish the same UI just by using the state provided by the alerting framework? I think that's really all we need, since we'll store data in there that tells us when the alert fired and whether it's been resolved yet.

I guess I don't really know the current implementation well enough to validate my concern. My worry is that if an ES alert is triggered, it'll be added to the index, which will then be picked up by both ES alerts and Kibana alerts, and might duplicate some actions like sending two emails, etc.

I just think a new index can help avoid any of these issues we might not yet foresee (maybe for the same reason Metricbeat has its own -mb indices?).

This is all based on speculation though.

@chrisronline (Contributor)

I guess I don't really know the current implementation well enough to validate my concern. My worry is that if an ES alert is triggered, it'll be added to the index, which will then be picked up by both ES alerts and Kibana alerts, and might duplicate some actions like sending two emails, etc.

Ah, I see the confusion here.

Part of this work involves disabling (or blacklisting, per @cachedout's idea) the cluster alert when we enable the Kibana alert. We'd never (intentionally) have a situation where both the cluster alert for X-Pack license expiration and the Kibana alert for X-Pack license expiration are running at the same time.
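
As a rough sketch of one way that could work (not a settled design: the actual mechanism, e.g. a blacklist setting, is still being discussed, and the watch id and mapping below are illustrative only):

```ts
// Hypothetical: deactivate the legacy watch when its Kibana alert counterpart is
// enabled, using the legacy ES client's generic transport escape hatch so no
// specific watcher client method is assumed.
declare function callCluster(endpoint: string, options?: unknown): Promise<unknown>;

// Illustrative mapping from a Kibana alert type to the legacy watch it replaces.
const WATCH_ID_BY_ALERT_TYPE: Record<string, string> = {
  monitoring_alert_license_expiration: 'CLUSTER_UUID_xpack_license_expiration',
};

export async function deactivateLegacyWatch(alertTypeId: string): Promise<void> {
  const watchId = WATCH_ID_BY_ALERT_TYPE[alertTypeId];
  if (!watchId) {
    return; // nothing to disable for this alert type
  }
  // PUT _watcher/watch/<id>/_deactivate keeps the watch but stops it from running,
  // so only the Kibana alert fires for this check.
  await callCluster('transport.request', {
    method: 'PUT',
    path: `/_watcher/watch/${encodeURIComponent(watchId)}/_deactivate`,
  });
}
```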

@cachedout (Contributor, Author)

I'm thinking we'll want to progressively add these into master (instead of one big merge). If so, we should decide whether to keep them disabled until they are all in, or enable at least one from the start and have it co-exist with the other watches.

I think that gradually merging these and leaving them disabled until we are ready to switch the new alerting on in the application is the right thing to do. It gives us time to develop and test the alerts while minimizing the disruption for the user.

@ypid-geberit

I was forwarded to this issue from elastic/elasticsearch#34814 (comment). The "Phase 1 which is outlined in the proposal document" is not linked, so I don't have knowledge of it; excuse me if this is beyond the scope of "Phase 1".

As an Elastic Stack admin, I feel that Stack Monitoring falls short compared to other monitoring systems. For example, there is no concept of hard and soft states, and I am not convinced that it would be a good idea to replicate this using Elastic Watcher (I tried for my own use and failed). See elastic/elasticsearch#34814 (comment) for more details.

@igoristic (Contributor) commented Feb 16, 2021

Thank you @ypid-geberit for your feedback

As an Elastic Stack admin, I feel that Stack Monitoring falls short compared to other monitoring systems. For example, there is no concept of hard and soft states

I think this is a good feature request, but perhaps out of scope for this ticket.

@ravikesarwani Maybe this is something we can add a ticket for in the SM feature-request roadmap.

igoristic reopened this on Feb 16, 2021
@ravikesarwani (Contributor)

Many of the out-of-the-box Stack Monitoring alerts give users full flexibility to control the notifications (including which method they are notified with, based on license level) and when they are generated. For example, "CPU Usage" defaults to alerting when CPU is over 85%, averaged over the last 5 minutes. Both the 85% threshold and the 5-minute duration can easily be adjusted by users.

Also, with #91145 we will allow users to create multiple alerts and handle something similar to soft and hard states. For example: say a user wants to alert and send an email when CPU is over 75% for the last 5 minutes, and send a PagerDuty alert when it's over 85% for the last 10 minutes.
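
Expressed loosely, that two-rule setup might look like the sketch below; the field names are illustrative, not the actual alert params schema:

```ts
// Illustrative only: two hypothetical CPU usage rule configurations of the kind
// #91145 would allow, expressed as plain objects rather than real API calls.
interface CpuUsageRuleConfig {
  name: string;
  threshold: number;  // percent CPU, averaged over the duration
  duration: string;   // look-back window
  actions: Array<{ type: 'email' | 'pagerduty'; to?: string }>;
}

const cpuRules: CpuUsageRuleConfig[] = [
  {
    // "Soft" state: warn early by email.
    name: 'CPU usage warning',
    threshold: 75,
    duration: '5m',
    actions: [{ type: 'email', to: 'ops@example.com' }],
  },
  {
    // "Hard" state: page someone.
    name: 'CPU usage critical',
    threshold: 85,
    duration: '10m',
    actions: [{ type: 'pagerduty' }],
  },
];
```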

@ypid-geberit

Sounds like what @ravikesarwani wrote addresses it. I am looking forward to it :)
