Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detection Engine] adds alert suppression to ES|QL rule type #180927

Open
wants to merge 63 commits into
base: main
Choose a base branch
from

Conversation

vitaliidm
Copy link
Contributor

@vitaliidm vitaliidm commented Apr 16, 2024

Summary

  • addresses https://github.com/elastic/security-team/issues/9203
  • adds alert suppression for new terms rule type
  • similarly to custom investigation fields list of available suppression fields:
    • shows only ES|QL fields returned in query for aggregating queries
    • shows ES|QL fields returned in query + index fields for non-aggregating queries. Since resulted alerts for this type of query, are enriched with source documents.

Demo

  1. run esql rule w/o suppression
  2. run esql rule w/ suppression per rule execution. Since ES|QL query is aggregating, no alerts suppressed on already agrregated field host.ip
  3. run suppression on interval 20m
  4. run suppression for custom ES|QL field which is the same as host.ip, hence same results
  5. run suppression on interval 100m
_esql.suppression.mov

Limitations

Since suppressed alerts deduplication relies on alert timestamps, sorting of results other than @timestamp asc in ES|QL query may impact on number of suppressed alerts, when number of possible alerts more than max_signals.
This affects only non-aggregating queries, since suppression boundaries for these alerts set as rule execution time

Checklist

  • Functional changes are hidden behind a feature flag

    Feature flag alertSuppressionForEsqlRuleEnabled

  • Functional changes are covered with a test plan and automated tests.

  • Stability of new and changed tests is verified using the Flaky Test Runner.

  • Comprehensive manual testing is done by two engineers: the PR author and one of the PR reviewers. Changes are tested in both ESS and Serverless.

  • Mapping changes are accompanied by a technical design document. It can be a GitHub issue or an RFC explaining the changes. The design document is shared with and approved by the appropriate teams and individual stakeholders.

    Existing AlertSuppression schema field is used for ES|QL rule, the one that already used for Query, New terms and IM rules.

        alert_suppression:
          $ref: './common_attributes.schema.yaml#/components/schemas/AlertSuppression'

    where

        AlertSuppression:
          type: object
          properties:
            group_by:
              $ref: '#/components/schemas/AlertSuppressionGroupBy'
            duration:
              $ref: '#/components/schemas/AlertSuppressionDuration'
            missing_fields_strategy:
              $ref: '#/components/schemas/AlertSuppressionMissingFieldsStrategy'
          required:
            - group_by
  • Functional changes are communicated to the Docs team. A ticket or PR is opened in https://github.com/elastic/security-docs. The following information is included: any feature flags used, affected environments (Serverless, ESS, or both).

@vitaliidm vitaliidm added Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Engine Security Solution Detection Engine Area v8.15.0 labels Apr 16, 2024
@vitaliidm vitaliidm self-assigned this Apr 16, 2024
vitaliidm and others added 24 commits April 16, 2024 15:09
# Conflicts:
#	x-pack/plugins/security_solution/public/detection_engine/rule_creation_ui/components/step_define_rule/use_experimental_feature_fields_transform.ts
#	x-pack/plugins/security_solution/public/detection_engine/rule_management/logic/use_alert_suppression.test.tsx
#	x-pack/plugins/security_solution/public/detection_engine/rule_management/logic/use_alert_suppression.tsx
#	x-pack/test/security_solution_cypress/config.ts
@vitaliidm vitaliidm requested review from a team as code owners May 9, 2024 10:28
@vitaliidm vitaliidm requested a review from nkhristinin May 9, 2024 10:28
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@vitaliidm vitaliidm requested a review from nikitaindik May 9, 2024 10:28
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detection-engine (Team:Detection Engine)

# Conflicts:
#	x-pack/plugins/security_solution/public/detection_engine/rule_creation/logic/esql_validator.ts
#	x-pack/plugins/security_solution/public/detection_engine/rule_creation_ui/hooks/use_all_esql_rule_fields.test.ts
#	x-pack/plugins/security_solution/public/detection_engine/rule_creation_ui/hooks/use_all_esql_rule_fields.ts
/**
* generates id for ES|QL alert
*/
export const generateAlertId = ({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add comment here and describe why we need to use hash, and not just provide unique id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkhristinin
Copy link
Contributor

Hey, thanks for you work! I did test it locally and have some questions.

I do have this index:

{
  "my-index": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "host": {
          "properties": {
            "name": {
              "type": "keyword"
            }
          }
        },
        "user": {
          "properties": {
            "name": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

this what I have without suppression:

Screenshot 2024-05-14 at 12 04 52

suppression with host.name works as expected.

Question 1
I think we already discussed it, but want to just double check. It's ok, than in user.name we do have some "random" values?

Screenshot 2024-05-14 at 12 04 39

Question 2

When I make change query a bit, suppression works - but there some not unique values still.
Screenshot 2024-05-14 at 12 04 23

Question 3

There some duplication when you choose a field

Screen.Recording.2024-05-14.at.12.00.05.mov

@vitaliidm
Copy link
Contributor Author

@nkhristinin

Question 1
I think we already discussed it, but want to just double check. It's ok, than in user.name we do have some "random" values?

Yes, this is expected as only one alert is created. The rest are supppressed. So, the value you see is the value from event, which is used to create alert.
Partly it is covered in https://www.elastic.co/guide/en/security/8.13/alert-suppression.html,

Normally, when a rule meets its criteria repeatedly, it creates multiple alerts, one for each time the rule’s criteria are met. When alert suppression is configured, duplicate qualifying events are grouped, and only one alert is created for each group.

So, these "random" values are just values from the only created alert.

Question 2
When I make change query a bit, suppression works - but there some not unique values still.

You suppression configuration is to suppress per rule execution only. That means, at least one alert per group would be created during the rule execution. By default you would have 12(5m interval) rule execution in preview, so if your events appear in different executions, an alert for each one would be created.
You can switch suppression to Suppress on time interval, increase it to cover multiple rule executions(let's say 30m) and you should not see any alerts with the same values on that interval (30m)

@vitaliidm
Copy link
Contributor Author

@nkhristinin

Question 3
There some duplication when you choose a field

Duplicate fields removed in 131929e

Copy link
Member

@MadameSheema MadameSheema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sec-eng-prod changes LGTM!


return getIndexListFromIndexString(indexString);
return getIndexListFromIndexString(indexString);
} catch (e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we log this error somewhere? Say a user thought their rule is correct, would they ever be alerted that something could be going wrong at this step? It'd likely error if there was some kind of syntax error, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if there is a syntax error, AST helper should still return valid AST tree.
But there is a particular case when it crashes - typing [ after index.

The error is shown already in console.

Screenshot 2024-05-16 at 12 39 55

Also reported it way back for Kibana team: #182012

@vitaliidm vitaliidm requested a review from yctercero May 16, 2024 11:47
@kibana-ci
Copy link
Collaborator

kibana-ci commented May 16, 2024

⏳ Build in-progress, with failures

Failed CI Steps

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @vitaliidm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:feature Makes this part of the condensed release notes Team:Detection Engine Security Solution Detection Engine Area Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants