Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] Search alert #88528

Merged
merged 51 commits into from
Jan 29, 2021
Merged

[Alerting] Search alert #88528

merged 51 commits into from
Jan 29, 2021

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Jan 15, 2021

Resolves #61313

Summary

New stack alert for executing ES DSL (query only, no aggregation support) and evaluating the number of matches against a threshold condition.

UI

Alert params expression contains a JSON editor allowing users to enter query DSL and test their query. Screenshots attached.

Search alert in alert types list Screen Shot 2021-01-22 at 11 05 17 AM
Search alert params Screen Shot 2021-01-25 at 12 11 45 PM
Test query success Screen Shot 2021-01-25 at 12 11 54 PM
Test query failure Screen Shot 2021-01-25 at 12 12 06 PM

Alert executor

Each run of the alert executor builds an ES query with the user-defined query clause and the user defined time window relative to the date/time of the alert execution. In order to avoid counting a single document multiple times during multiple executions of the alert, we keep track of the timestamp of the last document that matched a threshold condition. This is passed through the alert state and used during the next execution as an additional filter for the query.

Action context

Matching documents are passed through to actions using context.hits. This can be used with mustache templates to create a message for each matching document containing information about that document.

Example action message
You have {{context.value}} matches!
{{#context.hits}}
  Document with {{_id}} and hostname {{_source.host.name}} has {{_source.system.memory.actual.free}} bytes of memory free
{{/context.hits}}
Resulting Slack Notification Screen Shot 2021-01-25 at 12 09 54 PM

Docs

Documentation Screen Shot 2021-01-25 at 1 53 24 PM Screen Shot 2021-01-25 at 1 53 37 PM Screen Shot 2021-01-25 at 1 53 49 PM

Checklist

Delete any items that are not applicable to this PR.

… index popoover into component for reuse between index threshold and es query alert types
…r every document if threshold is not defined
@mdefazio
Copy link
Contributor

I think this implementation works. I like simply having text that shows the result of the test. It's right next to the test button and toasts tend to cover the flyout buttons anyway.

Copy link
Contributor

@gmmorris gmmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!
I've left a bunch of smaller notes, but other than a crash and a couple of missing pieces (like root cause of the failure to parse a query), I'd say this looks great for the first piece of the puzzle. Well done :)

Copy link
Contributor

@YulNaumenko YulNaumenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! One more builtin alert type!! 👍

@ymao1
Copy link
Contributor Author

ymao1 commented Jan 28, 2021

@elasticmachine merge upstream

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do a complete review yet (sorry!), but did poke through the alert type itself, and have some questions. Will add another (more complete) review, but don't wait for me to merge ...

// During each alert execution, we run the configured query, get a hit count
// (hits.total) and retrieve up to DEFAULT_MAX_HITS_PER_EXECUTION hits. We
// evaluate the threshold condition using the value of hits.total. If the threshold
// condition is met, the hits are counted toward the query match and we update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should call this alert "query threshold" instead of "query". At some point, we'll probably need to create a completely generic query alert, no threshold, no windowing, just whatever the customer wants to query for, and presumably provide the complete ES response in a context variable.

// evaluate the threshold condition using the value of hits.total. If the threshold
// condition is met, the hits are counted toward the query match and we update
// the alert state with the timestamp of the latest hit. In the next execution
// of the alert, the latestTimestamp will be used to gate the query in order to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem with this is it will miss documents which got ingested late. Eg, alert runs at 12:00. A document gets added at 12:01, but it's actual date (being queried over) is 11:59. The alert would presumably never see it.

Since we already have the "window" parameters like other alerts, I wonder if we just shouldn't do this at all. Eg, we could do the same for index threshold, but don't. Should we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting point. I based this logic off of how the Detection Rule Threshold Alert works. I can ask them how they handle this type of scenario.

@ymao1
Copy link
Contributor Author

ymao1 commented Jan 28, 2021

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
stackAlerts 41 61 +20

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
stackAlerts 89.5KB 712.4KB ⚠️ +622.9KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
alerts 64.7KB 65.4KB +698.0B
stackAlerts 16.6KB 21.5KB +4.9KB
total +5.6KB
Unknown metric groups

async chunk count

id before after diff
stackAlerts 3 4 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@ymao1 ymao1 merged commit 0491351 into elastic:master Jan 29, 2021
ymao1 added a commit to ymao1/kibana that referenced this pull request Jan 29, 2021
* Adding es query alert type to server with commented out executor

* Adding skeleton es query alert to client with JSON editor. Pulled out index popoover into component for reuse between index threshold and es query alert types

* Implementing alert executor that performs query and matches condition against doc count

* Added tests for server side alert type

* Updated alert executor to de-duplicate matches and create instance for every document if threshold is not defined

* Moving more index popover code out of index threshold and es query expression components

* Ability to remove threshold condition from es query alert

* Validation tests

* Adding ability to test out query. Need to add error handling and it looks ugly

* Fixing bug with creating alert with threshold and i18n

* wip

* Fixing tests

* Simplifying executor logic to only handle threshold and store hits in action context

* Adding functional test for es query alert

* Types

* Adding functional test for query testing

* Fixing unit test

* Adding link to ES docs. Cleaning up logger statements

* Adding docs

* Updating docs based on feedback

* PR fixes

* Using ES client typings

* Fixing unit test

* Fixing copy based on comments

* Fixing copy based on comments

* Fixing bug in index select popover

* Fixing unit tests

* Making track_total_hits configurable

* Fixing functional test

* PR fixes

* Added unit test

* Removing unused import

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@111andre111
Copy link

111andre111 commented Jan 29, 2021

@ymao1 Thank you this is extremely mega cool. ❤️

ymao1 added a commit that referenced this pull request Jan 29, 2021
* [Alerting] Search alert (#88528)

* Adding es query alert type to server with commented out executor

* Adding skeleton es query alert to client with JSON editor. Pulled out index popoover into component for reuse between index threshold and es query alert types

* Implementing alert executor that performs query and matches condition against doc count

* Added tests for server side alert type

* Updated alert executor to de-duplicate matches and create instance for every document if threshold is not defined

* Moving more index popover code out of index threshold and es query expression components

* Ability to remove threshold condition from es query alert

* Validation tests

* Adding ability to test out query. Need to add error handling and it looks ugly

* Fixing bug with creating alert with threshold and i18n

* wip

* Fixing tests

* Simplifying executor logic to only handle threshold and store hits in action context

* Adding functional test for es query alert

* Types

* Adding functional test for query testing

* Fixing unit test

* Adding link to ES docs. Cleaning up logger statements

* Adding docs

* Updating docs based on feedback

* PR fixes

* Using ES client typings

* Fixing unit test

* Fixing copy based on comments

* Fixing copy based on comments

* Fixing bug in index select popover

* Fixing unit tests

* Making track_total_hits configurable

* Fixing functional test

* PR fixes

* Added unit test

* Removing unused import

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

* Fixing typings

* Fixing query to use ISO string instead of millis

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@mikecote mikecote added release_note:feature Makes this part of the condensed release notes needs_docs and removed release_note:skip Skip the PR/issue when compiling release notes labels Feb 2, 2021
@alexfrancoeur
Copy link

I'll echo that last comment, extremely mega mega cool. I can't say how excited I am about this alert type and the functionality it unlocks!

I also have to applaud you @ymao1 for the detail in the PR description. That takes time. As someone who isn't able to follow as closely, I was able to clearly understand what this PR provided. So, thanks for taking the time to share the details 👏👏👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Alerting needs_docs release_note:feature Makes this part of the condensed release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.12.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New "search alert" type based on ES dsl query and hit count