Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detection Engine] Signals detected between 0 and 100 go into a degraded state #77342

Closed
FrankHassanabad opened this issue Sep 14, 2020 · 2 comments · Fixed by #77658
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Detection Rules Anything related to Security Solution's Detection Rules Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:SIEM v7.10.0

Comments

@FrankHassanabad
Copy link
Contributor

Kibana version:
7.9.2

Describe the bug:

When detection engine detects signals between 0 and 100 (max signal count) it will go into a tight loop and re-examine almost the same time frame for a long time. In a way it goes through a very odd "count down" of the found signals one by one. It still operates but goes in a very slow manner "counting down" the signals and trying to index them one at a time until it exhausts them.

This is probably a search_after that is pointing to the existing first found signal and then slowly marching down or a loop somewhere that should have a "break" with a piece of boolean logic that was missed.

Steps to reproduce:
First ensure you have the debug mode of signals enabled like so in your kibana.yml:

logging.verbose: true
logging.events:
  {
    log: ['securitySolution', 'alerting', info', 'warning', 'error', 'fatal'],
    request: ['info', 'warning', 'error', 'fatal'],
    error: '*',
    ops: __no-ops__,
  }

Then create any rule which will find some signals but not zero nor 100. Make sure you do a somewhat larger look back time to hit existing already created signals to cause the odd count down.

In my example I am using the query of: source.port < 2000, a runtime of 1m and an additional look-back of 3m
Screen Shot 2020-09-14 at 7 11 21 AM

Allow the rule to run at least "twice" and then then in your logs will you see this type of behavior where it is doing the count down and the rule execution is slowing down:

server    log   [07:17:23.448] [debug][plugins][plugins][securitySolution][securitySolution] individual bulk process time took: 67.05 milliseconds
server    log   [07:17:23.449] [debug][plugins][plugins][securitySolution][securitySolution] took property says bulk took: 5 milliseconds
server    log   [07:17:23.449] [debug][plugins][plugins][securitySolution][securitySolution] ignored 5 duplicate signals
server    log   [07:17:23.449] [debug][plugins][plugins][securitySolution][securitySolution] bulk created 0 signals
server    log   [07:17:23.665] [debug][plugins][plugins][securitySolution][securitySolution] about to bulk create 4 events
server    log   [07:17:23.730] [debug][plugins][plugins][securitySolution][securitySolution] individual bulk process time took: 64.85 milliseconds
server    log   [07:17:23.730] [debug][plugins][plugins][securitySolution][securitySolution] took property says bulk took: 5 milliseconds
server    log   [07:17:23.730] [debug][plugins][plugins][securitySolution][securitySolution] ignored 4 duplicate signals
server    log   [07:17:23.731] [debug][plugins][plugins][securitySolution][securitySolution] bulk created 0 signals
server    log   [07:17:23.731] [debug][plugins][plugins][securitySolution][securitySolution] created 0 signals name: "siem kibana" id: "ffcf4e04-7c85-4c35-8428-531bd9fe9377" rule id: "66a0f11d-bcc5-44ba-a74b-2bcd5b3a4388" signals index: ".siem-signals-hassanabad3-default"
server    log   [07:17:23.972] [debug][plugins][plugins][securitySolution][securitySolution] about to bulk create 3 events
server    log   [07:17:24.023] [debug][plugins][plugins][securitySolution][securitySolution] individual bulk process time took: 51.07 milliseconds
server    log   [07:17:24.024] [debug][plugins][plugins][securitySolution][securitySolution] took property says bulk took: 4 milliseconds
server    log   [07:17:24.024] [debug][plugins][plugins][securitySolution][securitySolution] ignored 3 duplicate signals
server    log   [07:17:24.024] [debug][plugins][plugins][securitySolution][securitySolution] bulk created 0 signals
server    log   [07:17:24.024] [debug][plugins][plugins][securitySolution][securitySolution] created 0 signals name: "siem kibana" id: "ffcf4e04-7c85-4c35-8428-531bd9fe9377" rule id: "66a0f11d-bcc5-44ba-a74b-2bcd5b3a4388" signals index: ".siem-signals-hassanabad3-default"
server    log   [07:17:24.314] [debug][plugins][plugins][securitySolution][securitySolution] about to bulk create 2 events
server    log   [07:17:24.365] [debug][plugins][plugins][securitySolution][securitySolution] individual bulk process time took: 49.72 milliseconds
server    log   [07:17:24.365] [debug][plugins][plugins][securitySolution][securitySolution] took property says bulk took: 2 milliseconds
server    log   [07:17:24.365] [debug][plugins][plugins][securitySolution][securitySolution] ignored 2 duplicate signals
server    log   [07:17:24.365] [debug][plugins][plugins][securitySolution][securitySolution] bulk created 0 signals
server    log   [07:17:24.365] [debug][plugins][plugins][securitySolution][securitySolution] created 0 signals name: "siem kibana" id: "ffcf4e04-7c85-4c35-8428-531bd9fe9377" rule id: "66a0f11d-bcc5-44ba-a74b-2bcd5b3a4388" signals index: ".siem-signals-hassanabad3-default"
server    log   [07:17:24.688] [debug][plugins][plugins][securitySolution][securitySolution] about to bulk create 1 events
server    log   [07:17:24.713] [debug][plugins][plugins][securitySolution][securitySolution] individual bulk process time took: 24.77 milliseconds
server    log   [07:17:24.713] [debug][plugins][plugins][securitySolution][securitySolution] took property says bulk took: 3 milliseconds
server    log   [07:17:24.713] [debug][plugins][plugins][securitySolution][securitySolution] ignored 1 duplicate signals
server    log   [07:17:24.714] [debug][plugins][plugins][securitySolution][securitySolution] bulk created 0 signals
server    log   [07:17:24.714] [debug][plugins][plugins][securitySolution][securitySolution] created 0 signals name: "siem kibana" id: "ffcf4e04-7c85-4c35-8428-531bd9fe9377" rule id: "66a0f11d-bcc5-44ba-a74b-2bcd5b3a4388" signals index: ".siem-signals-hassanabad3-default"

Expected behavior:
It should be performant and quick to skip over the signals it already found.

@FrankHassanabad FrankHassanabad added bug Fixes for quality problems that affect the customer experience Team:SIEM Feature:Detection Rules Anything related to Security Solution's Detection Rules labels Sep 14, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/siem (Team:SIEM)

FrankHassanabad added a commit that referenced this issue Sep 17, 2020
…ls detected are between 0 and 100 (#77658)

## Summary

Fixes: #77342

No easy way to unit test/end to end test this as it was operating correctly before and passed all of our tests, it was just running in a slow state if you had between 0 and 100 signals. The best bet is that you hand run the tests from 77342 or use a large look back time to ensure de-duplicate does not run as outlined in 77342.


Also this PR removes a TODO block, a complexity linter issue we had, a few await that were there by accident, and pushes down arrays to make things to be cleaner.
FrankHassanabad added a commit to FrankHassanabad/kibana that referenced this issue Sep 17, 2020
…ls detected are between 0 and 100 (elastic#77658)

## Summary

Fixes: elastic#77342

No easy way to unit test/end to end test this as it was operating correctly before and passed all of our tests, it was just running in a slow state if you had between 0 and 100 signals. The best bet is that you hand run the tests from 77342 or use a large look back time to ensure de-duplicate does not run as outlined in 77342.


Also this PR removes a TODO block, a complexity linter issue we had, a few await that were there by accident, and pushes down arrays to make things to be cleaner.
FrankHassanabad added a commit that referenced this issue Sep 17, 2020
…ls detected are between 0 and 100 (#77658) (#77797)

## Summary

Fixes: #77342

No easy way to unit test/end to end test this as it was operating correctly before and passed all of our tests, it was just running in a slow state if you had between 0 and 100 signals. The best bet is that you hand run the tests from 77342 or use a large look back time to ensure de-duplicate does not run as outlined in 77342.


Also this PR removes a TODO block, a complexity linter issue we had, a few await that were there by accident, and pushes down arrays to make things to be cleaner.
@MindyRS MindyRS added the Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. label Oct 27, 2020
@ghost
Copy link

ghost commented Nov 25, 2020

Bug Conversion:

This ticket requires Dev validation

Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience Feature:Detection Rules Anything related to Security Solution's Detection Rules Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:SIEM v7.10.0
Projects
None yet
5 participants