Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detections] Rule execution logging overhaul #121644

Merged

Conversation

banderror
Copy link
Contributor

@banderror banderror commented Dec 20, 2021

Epic: #118324
Tickets: #119603, #119597, #91265, #118511

Summary

The legacy rule execution logging implementation is replaced by a new one that introduces a new model for execution-related data, a new saved object and a new, cleaner interface and implementation.

  • The legacy data model is deleted (IRuleStatusResponseAttributes, IRuleStatusSOAttributes)
  • The legacy siem-detection-engine-rule-status saved object type is deleted and marked as deleted in src/core
  • A new data model is introduced (x-pack/plugins/security_solution/common/detection_engine/schemas/common/rule_monitoring.ts). This data model doesn't contain a mixture of successful and failed statuses, which should simplify client-side code (e.g. the code of Rule Management and Monitoring tables, as well as Rule Details page).
  • A new siem-detection-engine-rule-execution-info saved object is introduced (x-pack/plugins/security_solution/server/lib/detection_engine/rule_execution_log/rule_execution_info/saved_object.ts).
    • This SO has 1:1 association with the rule SO, so every rule can have 0 or 1 execution info associated with it. This SO is used in order to 1) update the last execution status and metrics and 2) fetch execution data for N rules more efficiently comparing to the legacy SO.
    • The logic of creating or updating this SOs is based on the "upsert" approach (planned in [Security Solution] Improve RuleExecutionLog performance #118511). It does not fetch the SO by rule id before updating it anymore.
  • Rule execution logging logic is rewritten (see x-pack/plugins/security_solution/server/lib/detection_engine/rule_execution_log). The previous rule execution log client is split into two objects: IRuleExecutionLogClient for using it from route handlers, and IRuleExecutionLogger for writing logs from rule executors.
    • IRuleExecutionLogger instance is scoped to the currently executing rule and space id. There's no need to pass rule id, name, type etc to .logStatusChange() every time.
  • Rule executors and related functions are updated.
  • API routes are updated, including the rule preview route which uses a special "spy" implementation of IRuleExecutionLogger. A rule returned from an API endpoint now has optional execution_summary field of type RuleExecutionSummary.
  • UI is updated to use the new data model of RuleExecutionSummary:
    • Rule Management and Monitoring tables
    • Rule Details page
  • A new API route is introduced for fetching rule execution events: /internal/detection_engine/rules/{ruleId}/execution/events. It is used for rendering the Failure History tab (last 5 failures) and is intended to be used in the coming UI of Rule Execution Log on the Details page.
  • Rule Details page and Failure History tab are updated to use the new data models and API routes.
  • I used react-query for fetching execution events
    • See x-pack/plugins/security_solution/public/detections/containers/detection_engine/rules/use_rule_execution_events.tsx
    • The lib is updated to the latest version
  • Tests and fixed and updated according to all the changes
  • Components related to rule execution statuses are all moved to x-pack/plugins/security_solution/public/detections/components/rules/rule_execution_status.
  • I left a lot of // TODO: https://github.com/elastic/kibana/pull/121644 comments in the code which I'm planning to address and remove in a follow-up PR. Lots of clean up work is needed, but I'd like to unblock the work on Rule Execution Log UI.

In the next episodes

  • Address and remove // TODO: https://github.com/elastic/kibana/pull/121644 comments in the code
  • Make sure that SO id generation for siem-detection-engine-rule-execution-info is safe and future-proof. Sync with the Core team. If there are risks, we will need to choose between risks and performance (reading the SO before updating it). It would be easy to submit a fix if needed.
  • Add APM integration. Use withSecuritySpan in methods of rule_execution_log citizens.
  • Add comments to the code and README.
  • Add test coverage.
  • Etc...

Checklist

Delete any items that are not applicable to this PR.

  • Unit or functional tests were updated or added to match the most common scenarios
  • Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)
  • Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)
  • If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)
  • This was checked for cross-browser compatibility

For maintainers

@banderror banderror added performance refactoring technical debt Improvement of the software architecture and operational architecture release_note:skip Skip the PR/issue when compiling release notes Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.1.0 Feature:Rule Monitoring Security Solution Detection Rule Monitoring Team:Detection Rule Management Security Detection Rule Management Team labels Dec 20, 2021
@banderror banderror self-assigned this Dec 20, 2021
@banderror banderror force-pushed the new-data-model-for-rule-statuses-and-metrics branch 3 times, most recently from 771a9ee to df40deb Compare December 22, 2021 14:33
Copy link
Contributor

@xcrzx xcrzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this PR a cursory review, looks good so far 👍
Left a couple of comments.

@banderror banderror force-pushed the new-data-model-for-rule-statuses-and-metrics branch 6 times, most recently from 3ef5434 to 95f66dc Compare December 29, 2021 21:50
@banderror banderror removed the skip-ci label Dec 30, 2021
@banderror banderror force-pushed the new-data-model-for-rule-statuses-and-metrics branch from 994be34 to 927f810 Compare December 30, 2021 03:26
@banderror banderror force-pushed the new-data-model-for-rule-statuses-and-metrics branch 7 times, most recently from f62a19e to 0d884be Compare January 14, 2022 14:56
Copy link
Member

@spong spong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked out, desk tested locally, and code reviewed concepts and arch changes. Verified upgrade that the old status SO's aren't migrated over (though one nit about potential leftover mappings we need to talk to core about), and tested rule failures and UI around the the new rule-execution-info SO.

You've packed quite a bit in here @banderror! Thank you for all the cleanup and added plumbing for the new Rule Execution Events table! Approving now as we've discussed follow-up changes offline in a team architecture review and this is a hefty PR to be left open for too long. Appreciate the TODO:'s and contextual comments as well @banderror -- LGTM! 👍 🚀

FrankHassanabad pushed a commit that referenced this pull request Jan 20, 2022
## Summary

New ECS FieldMap was generated in #123012, however since it only contained changes to `Rule Registry` code the `Security Solution` Cypress tests were not run, and thus did not catch this field change.

See #122661 (comment) for details. Confirmed w/ @madirey that expected value is indeed `5` now that `host.geo.continent_code` has been [added](https://github.com/elastic/kibana/pull/123012/files#diff-a1647ccb73ef26c8c8b6aefd87084504b146af72fcb088ccacad93fcaad15b69R1524-R1528).


Some failing PR's from `main`:
#123357
#121644
#123352

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jan 20, 2022
…123429)

## Summary

New ECS FieldMap was generated in elastic#123012, however since it only contained changes to `Rule Registry` code the `Security Solution` Cypress tests were not run, and thus did not catch this field change.

See elastic#122661 (comment) for details. Confirmed w/ @madirey that expected value is indeed `5` now that `host.geo.continent_code` has been [added](https://github.com/elastic/kibana/pull/123012/files#diff-a1647ccb73ef26c8c8b6aefd87084504b146af72fcb088ccacad93fcaad15b69R1524-R1528).

Some failing PR's from `main`:
elastic#123357
elastic#121644
elastic#123352

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

(cherry picked from commit d6917fc)
@spong
Copy link
Member

spong commented Jan 20, 2022

@elasticmachine merge upstream

kibanamachine added a commit that referenced this pull request Jan 20, 2022
…#123433)

## Summary

New ECS FieldMap was generated in #123012, however since it only contained changes to `Rule Registry` code the `Security Solution` Cypress tests were not run, and thus did not catch this field change.

See #122661 (comment) for details. Confirmed w/ @madirey that expected value is indeed `5` now that `host.geo.continent_code` has been [added](https://github.com/elastic/kibana/pull/123012/files#diff-a1647ccb73ef26c8c8b6aefd87084504b146af72fcb088ccacad93fcaad15b69R1524-R1528).

Some failing PR's from `main`:
#123357
#121644
#123352

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

(cherry picked from commit d6917fc)

Co-authored-by: Garrett Spong <spong@users.noreply.github.com>
Copy link
Contributor

@xcrzx xcrzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we can clean up this implementation in follow-up PRs. Let's merge this guy to unblock user-facing features development. Thanks for your refactoring efforts, Georgii!

@banderror banderror force-pushed the new-data-model-for-rule-statuses-and-metrics branch from 7a3d986 to fb564af Compare January 20, 2022 12:17
@banderror banderror force-pushed the new-data-model-for-rule-statuses-and-metrics branch from fb564af to f2fe48b Compare January 20, 2022 17:56
@banderror banderror force-pushed the new-data-model-for-rule-statuses-and-metrics branch from f2fe48b to f4795b5 Compare January 20, 2022 19:18
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
securitySolution 2852 2855 +3

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
osquery 939.1KB 941.1KB +2.0KB
securitySolution 4.6MB 4.6MB +1.8KB
total +3.8KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
securitySolution 19 18 -1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
securitySolution 245.9KB 245.6KB -345.0B

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/development-plugin-saved-objects.html#_mappings

id before after diff
siem-detection-engine-rule-execution-info - 10 +10
siem-detection-engine-rule-status 11 - -11
total -1
Unknown metric groups

ESLint disabled line counts

id before after diff
securitySolution 455 448 -7

Total ESLint disabled count

id before after diff
securitySolution 523 516 -7

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

@banderror
Copy link
Contributor Author

banderror commented Jan 20, 2022

Thanks everyone for your feedback! I'm merging this PR and will start addressing the remaining comments here and items from the In the next episodes sections on Monday. It will go in a follow-up PR which we'll need to merge before 8.1 FF. This is needed to unblock #119598 and #119599 so that @spong could build the new Rule Execution Log UI on top of this PR much easier.

@banderror banderror added the backport:skip This commit does not require backporting label Jan 20, 2022
@banderror banderror merged commit ba0833f into elastic:main Jan 20, 2022
@banderror banderror deleted the new-data-model-for-rule-statuses-and-metrics branch January 20, 2022 21:17
@banderror banderror linked an issue Jan 20, 2022 that may be closed by this pull request
6 tasks
ogupte pushed a commit to ogupte/kibana that referenced this pull request Jan 28, 2022
…123429)

## Summary

New ECS FieldMap was generated in elastic#123012, however since it only contained changes to `Rule Registry` code the `Security Solution` Cypress tests were not run, and thus did not catch this field change.

See elastic#122661 (comment) for details. Confirmed w/ @madirey that expected value is indeed `5` now that `host.geo.continent_code` has been [added](https://github.com/elastic/kibana/pull/123012/files#diff-a1647ccb73ef26c8c8b6aefd87084504b146af72fcb088ccacad93fcaad15b69R1524-R1528).


Some failing PR's from `main`:
elastic#123357
elastic#121644
elastic#123352

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
spong pushed a commit that referenced this pull request Feb 2, 2022
…w-up (#124194)

**Related to:** #121644
**Addresses:** #86202

## Summary

Done in this PR:

- Removed the deprecated `warning` rule execution status ([comment](#121644 (comment))).
- Added a new `running` status ([ticket](#86202)).
- Simplified the internal implementation of the `rule_execution_log` folder. Hopefully naming of folders, files and interfaces is clearer now as well. ([comment](#121644 (comment)), [comment](#121644 (comment)))
- Added APM measurements with `withSecuritySpan`.
- Added rule id to the react-query key used for loading last rule failures ([comment](#124198 (comment)))
- Addressed most of the `// TODO: #121644 comments

In the next PR that could be merged after the FF I'd address the rest of the stuff:

- Add comments to all the interfaces and methods in the `rule_execution_log` folder. Write a readme for it.
- Address the remaining of the `// TODO: #121644 comments. All of them are related to tests.
- Fix for the gap column ([comment](#121644 (comment)))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Rule Monitoring Security Solution Detection Rule Monitoring performance refactoring release_note:skip Skip the PR/issue when compiling release notes Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. technical debt Improvement of the software architecture and operational architecture v8.1.0
Projects
None yet