Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ResponseOps][BE] Alert creation delay based on user definition #174657

Merged
merged 10 commits into from
Jan 23, 2024

Conversation

doakalexi
Copy link
Contributor

@doakalexi doakalexi commented Jan 11, 2024

Related to #173009

Summary

This is the first of two PRs and only focuses on the backend implementation. This PR adds a new notificationDelay field to the Rule object. With the delay the rule will run X times and has to match the threshold X times before triggering actions. It won't affect the alert recovery, but it can be expanded on easily if we want to include recovered alerts in the future.

Checklist

To verify

  • Use Dev Tools to create a rule with the notificationDelay
POST kbn:/api/alerting/rule
{
  "params": {
    "searchType": "esQuery",
    "timeWindowSize": 5,
    "timeWindowUnit": "m",
    "threshold": [
      -1
    ],
    "thresholdComparator": ">",
    "size": 100,
    "esQuery": """{
    "query":{
      "match_all" : {}
    }
  }""",
    "aggType": "count",
    "groupBy": "all",
    "termSize": 5,
    "excludeHitsFromPreviousRun": false,
    "sourceFields": [],
    "index": [
      ".kibana-event-log*"
    ],
    "timeField": "@timestamp"
  },
  "consumer": "stackAlerts",
  "schedule": {
    "interval": "1m"
  },
  "tags": [],
  "name": "test",
  "rule_type_id": ".es-query",
  "actions": [
    {
      "group": "query matched",
      "id": "${ACTION_ID}",
      "params": {
        "level": "info",
        "message": """Elasticsearch query rule '{{rule.name}}' is active:

- Value: {{context.value}}
- Conditions Met: {{context.conditions}} over {{rule.params.timeWindowSize}}{{rule.params.timeWindowUnit}}
- Timestamp: {{context.date}}
- Link: {{context.link}}"""
      },
      "frequency": {
        "notify_when": "onActionGroupChange",
        "throttle": null,
        "summary": false
      }
    }
  ],
  "notification_delay": {
    "active": 3
  }
}
  • Verify that the rule will not trigger actions until it has matched the delay threshold. It might be helpful to look at rule details page and add the Triggered actions column to easily see the action was triggered after X consecutive active alerts
Screen Shot 2024-01-16 at 1 18 52 PM - Verify that the delay does not affect recovered alerts

@doakalexi
Copy link
Contributor Author

/ci

@doakalexi
Copy link
Contributor Author

/ci

@doakalexi
Copy link
Contributor Author

/ci

@doakalexi
Copy link
Contributor Author

/ci

@doakalexi
Copy link
Contributor Author

/ci

@doakalexi
Copy link
Contributor Author

/ci

@doakalexi doakalexi added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.13.0 release_note:skip Skip the PR/issue when compiling release notes labels Jan 16, 2024
@doakalexi doakalexi changed the title Adding the notification delay [ResponseOps][BE] Alert creation delay based on user definition Jan 16, 2024
@doakalexi doakalexi marked this pull request as ready for review January 16, 2024 19:40
@doakalexi doakalexi requested a review from a team as a code owner January 16, 2024 19:40
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1
Copy link
Contributor

ymao1 commented Jan 22, 2024

Should the update API support the notification delay parameter as well?

@doakalexi
Copy link
Contributor Author

Should the update API support the notification delay parameter as well?

I added that code in the ui PR, I can add it here instead if you think that is better

@ymao1
Copy link
Contributor

ymao1 commented Jan 22, 2024

When I migrate an existing rule to this branch, and let it run with an active alert, I see the activeCount get set to 0 the first run, and then it stays zero in subsequent runs (in each run, the alert is active). Should the activeCount be getting incremented? The rule has no notification delay setting but that doesn't seem like it should impact the activeCount parameter?

@ymao1
Copy link
Contributor

ymao1 commented Jan 22, 2024

I added that code in the ui PR, I can add it here instead if you think that is better

I think that's fine...I was just trying to update an existing rule with a notification delay parameter :)

@doakalexi
Copy link
Contributor Author

I added that code in the ui PR, I can add it here instead if you think that is better

I think that's fine...I was just trying to update an existing rule with a notification delay parameter :)

ohh okay, I can add that code here if needed!

@doakalexi
Copy link
Contributor Author

When I migrate an existing rule to this branch, and let it run with an active alert, I see the activeCount get set to 0 the first run, and then it stays zero in subsequent runs (in each run, the alert is active). Should the activeCount be getting incremented? The rule has no notification delay setting but that doesn't seem like it should impact the activeCount parameter?

If there is not a notificationDelay field it will be reset in the execution handler: https://github.com/elastic/kibana/pull/174657/files#diff-8656f6dc8acf27d0e712d9c277de92a693b6e2ee3bcdae9f08f8395141f7b38cR641
I can change this code so it's incrementing, I had decided that if the user did not want to delay the alert then we didn't need to track it.

@ymao1
Copy link
Contributor

ymao1 commented Jan 22, 2024

If there is not a notificationDelay field it will be reset in the execution handler:
Gotcha 👍

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
alerting 799 803 +4
Unknown metric groups

API count

id before after diff
alerting 830 834 +4

References to deprecated APIs

id before after diff
alerting 114 116 +2

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@ersin-erdal ersin-erdal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Tested locally, works as expected.
Just not sure if the notification should be scheduled on the third or fourth active alert when the notificationDelay is 3.

alert.getActiveCount() < this.rule.notificationDelay.active
or
alert.getActiveCount() <= this.rule.notificationDelay.active

other than this, good job :)

@doakalexi
Copy link
Contributor Author

Tested locally, works as expected. Just not sure if the notification should be scheduled on the third or fourth active alert when the notificationDelay is 3.

alert.getActiveCount() < this.rule.notificationDelay.active or alert.getActiveCount() <= this.rule.notificationDelay.active

I am not 100% sure which is correct, but I implemented this similar to how we are doing the pending recovered count for flapping.
if (alert.getPendingRecoveredCount() < flappingSettings.statusChangeThreshold) {

@doakalexi doakalexi merged commit 80640cf into elastic:main Jan 23, 2024
33 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jan 23, 2024
lcawl pushed a commit to lcawl/kibana that referenced this pull request Jan 26, 2024
…tic#174657)

Related to elastic#173009

## Summary

This is the first of two PRs and only focuses on the backend
implementation. This PR adds a new `notificationDelay` field to the
`Rule` object. With the delay the rule will run X times and has to match
the threshold X times before triggering actions. It won't affect the
alert recovery, but it can be expanded on easily if we want to include
recovered alerts in the future.


### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### To verify

- Use [Dev Tools](http://localhost:5601/app/dev_tools#/console) to
create a rule with the `notificationDelay`

```
POST kbn:/api/alerting/rule
{
  "params": {
    "searchType": "esQuery",
    "timeWindowSize": 5,
    "timeWindowUnit": "m",
    "threshold": [
      -1
    ],
    "thresholdComparator": ">",
    "size": 100,
    "esQuery": """{
    "query":{
      "match_all" : {}
    }
  }""",
    "aggType": "count",
    "groupBy": "all",
    "termSize": 5,
    "excludeHitsFromPreviousRun": false,
    "sourceFields": [],
    "index": [
      ".kibana-event-log*"
    ],
    "timeField": "@timestamp"
  },
  "consumer": "stackAlerts",
  "schedule": {
    "interval": "1m"
  },
  "tags": [],
  "name": "test",
  "rule_type_id": ".es-query",
  "actions": [
    {
      "group": "query matched",
      "id": "${ACTION_ID}",
      "params": {
        "level": "info",
        "message": """Elasticsearch query rule '{{rule.name}}' is active:

- Value: {{context.value}}
- Conditions Met: {{context.conditions}} over {{rule.params.timeWindowSize}}{{rule.params.timeWindowUnit}}
- Timestamp: {{context.date}}
- Link: {{context.link}}"""
      },
      "frequency": {
        "notify_when": "onActionGroupChange",
        "throttle": null,
        "summary": false
      }
    }
  ],
  "notification_delay": {
    "active": 3
  }
}
```

- Verify that the rule will not trigger actions until it has matched the
delay threshold. It might be helpful to look at rule details page and
add the Triggered actions column to easily see the action was triggered
after X consecutive active alerts
<img width="1420" alt="Screen Shot 2024-01-16 at 1 18 52 PM"
src="https://github.com/elastic/kibana/assets/109488926/85d8ceef-042c-4a52-950e-24492dc0e79f">
- Verify that the delay does not affect recovered alerts
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
…tic#174657)

Related to elastic#173009

## Summary

This is the first of two PRs and only focuses on the backend
implementation. This PR adds a new `notificationDelay` field to the
`Rule` object. With the delay the rule will run X times and has to match
the threshold X times before triggering actions. It won't affect the
alert recovery, but it can be expanded on easily if we want to include
recovered alerts in the future.


### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### To verify

- Use [Dev Tools](http://localhost:5601/app/dev_tools#/console) to
create a rule with the `notificationDelay`

```
POST kbn:/api/alerting/rule
{
  "params": {
    "searchType": "esQuery",
    "timeWindowSize": 5,
    "timeWindowUnit": "m",
    "threshold": [
      -1
    ],
    "thresholdComparator": ">",
    "size": 100,
    "esQuery": """{
    "query":{
      "match_all" : {}
    }
  }""",
    "aggType": "count",
    "groupBy": "all",
    "termSize": 5,
    "excludeHitsFromPreviousRun": false,
    "sourceFields": [],
    "index": [
      ".kibana-event-log*"
    ],
    "timeField": "@timestamp"
  },
  "consumer": "stackAlerts",
  "schedule": {
    "interval": "1m"
  },
  "tags": [],
  "name": "test",
  "rule_type_id": ".es-query",
  "actions": [
    {
      "group": "query matched",
      "id": "${ACTION_ID}",
      "params": {
        "level": "info",
        "message": """Elasticsearch query rule '{{rule.name}}' is active:

- Value: {{context.value}}
- Conditions Met: {{context.conditions}} over {{rule.params.timeWindowSize}}{{rule.params.timeWindowUnit}}
- Timestamp: {{context.date}}
- Link: {{context.link}}"""
      },
      "frequency": {
        "notify_when": "onActionGroupChange",
        "throttle": null,
        "summary": false
      }
    }
  ],
  "notification_delay": {
    "active": 3
  }
}
```

- Verify that the rule will not trigger actions until it has matched the
delay threshold. It might be helpful to look at rule details page and
add the Triggered actions column to easily see the action was triggered
after X consecutive active alerts
<img width="1420" alt="Screen Shot 2024-01-16 at 1 18 52 PM"
src="https://github.com/elastic/kibana/assets/109488926/85d8ceef-042c-4a52-950e-24492dc0e79f">
- Verify that the delay does not affect recovered alerts
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
…tic#174657)

Related to elastic#173009

## Summary

This is the first of two PRs and only focuses on the backend
implementation. This PR adds a new `notificationDelay` field to the
`Rule` object. With the delay the rule will run X times and has to match
the threshold X times before triggering actions. It won't affect the
alert recovery, but it can be expanded on easily if we want to include
recovered alerts in the future.


### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios


### To verify

- Use [Dev Tools](http://localhost:5601/app/dev_tools#/console) to
create a rule with the `notificationDelay`

```
POST kbn:/api/alerting/rule
{
  "params": {
    "searchType": "esQuery",
    "timeWindowSize": 5,
    "timeWindowUnit": "m",
    "threshold": [
      -1
    ],
    "thresholdComparator": ">",
    "size": 100,
    "esQuery": """{
    "query":{
      "match_all" : {}
    }
  }""",
    "aggType": "count",
    "groupBy": "all",
    "termSize": 5,
    "excludeHitsFromPreviousRun": false,
    "sourceFields": [],
    "index": [
      ".kibana-event-log*"
    ],
    "timeField": "@timestamp"
  },
  "consumer": "stackAlerts",
  "schedule": {
    "interval": "1m"
  },
  "tags": [],
  "name": "test",
  "rule_type_id": ".es-query",
  "actions": [
    {
      "group": "query matched",
      "id": "${ACTION_ID}",
      "params": {
        "level": "info",
        "message": """Elasticsearch query rule '{{rule.name}}' is active:

- Value: {{context.value}}
- Conditions Met: {{context.conditions}} over {{rule.params.timeWindowSize}}{{rule.params.timeWindowUnit}}
- Timestamp: {{context.date}}
- Link: {{context.link}}"""
      },
      "frequency": {
        "notify_when": "onActionGroupChange",
        "throttle": null,
        "summary": false
      }
    }
  ],
  "notification_delay": {
    "active": 3
  }
}
```

- Verify that the rule will not trigger actions until it has matched the
delay threshold. It might be helpful to look at rule details page and
add the Triggered actions column to easily see the action was triggered
after X consecutive active alerts
<img width="1420" alt="Screen Shot 2024-01-16 at 1 18 52 PM"
src="https://github.com/elastic/kibana/assets/109488926/85d8ceef-042c-4a52-950e-24492dc0e79f">
- Verify that the delay does not affect recovered alerts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.13.0
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants