[Security Solution][Detections] Reading last 5 failures from Event Log v1 - raw implementation #115574

Merged

Conversation

@banderror (Contributor) commented Oct 19, 2021

Tickets: #106469, #101013

Summary

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

With this PR we now read the Failure History (last 5 failures) on the Rule Details page from the Event Log. We continue getting the Current Status from the legacy siem-detection-engine-rule-status saved objects. The Rule Management page also gets its data from the legacy saved objects.

  • Deprecate the existing methods for reading data in IRuleExecutionLogClient: .find() and .findBulk()
  • Introduce new methods for reading data in IRuleExecutionLogClient (a possible shape is sketched after this list):
    • for reading last N execution events for 1 rule from event log
    • for reading current status and metrics for 1 rule from legacy status SOs
    • for reading current statuses and metrics for N rules from legacy status SOs
  • New methods should return data in the legacy status SO format.
  • Update all the existing endpoints that depend on IRuleExecutionLogClient to use the new methods.
  • Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
  • The API of the new endpoint should be the same as rules/_find_statuses to minimise changes in the app.
  • Use the new endpoint on the Rule Details page.
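
A minimal sketch of what the reworked client could look like, purely for illustration - the method names, argument shapes, and status SO fields below are assumptions, not the exact signatures merged in this PR:

```ts
// Illustrative sketch only; names and shapes are assumptions.

// Data is returned in the legacy siem-detection-engine-rule-status SO format
// so that existing endpoints and UI code keep working unchanged.
interface LegacyRuleStatus {
  statusDate: string;
  status: 'succeeded' | 'failed' | 'partial failure' | 'going to run' | null;
  lastFailureAt: string | null;
  lastFailureMessage: string | null;
  lastSuccessAt: string | null;
  lastSuccessMessage: string | null;
  gap: string | null;
}

interface IRuleExecutionLogClient {
  /** @deprecated Superseded by the dedicated read methods below. */
  find(args: { ruleId: string; spaceId: string }): Promise<LegacyRuleStatus[]>;

  /** @deprecated Superseded by the dedicated read methods below. */
  findBulk(args: {
    ruleIds: string[];
    spaceId: string;
  }): Promise<Record<string, LegacyRuleStatus[]>>;

  /** Reads the last N execution events for 1 rule from the event log. */
  getLastFailures(args: {
    ruleId: string;
    spaceId: string;
    count: number;
  }): Promise<LegacyRuleStatus[]>;

  /** Reads the current status and metrics for 1 rule from the legacy status SOs. */
  getCurrentStatus(args: { ruleId: string; spaceId: string }): Promise<LegacyRuleStatus>;

  /** Reads the current statuses and metrics for N rules from the legacy status SOs. */
  getCurrentStatusBulk(args: {
    ruleIds: string[];
    spaceId: string;
  }): Promise<Record<string, LegacyRuleStatus>>;
}
```

The new internal endpoint would then combine a "current status" read with a "last N failures" read for the given rule and return them in the same response shape as rules/_find_statuses, which is what keeps the app-side changes minimal.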

Near-term plan for technical implementation of the Rule Execution Log (#101013)

Stage 1. Reading last 5 failures from Event Log v1 - raw implementation - ✔️ done in this PR

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

  • Deprecate existing methods for reading data in IRuleExecutionLogClient: .find() and .findBulk()
  • Introduce new methods for reading data in IRuleExecutionLogClient:
    • for reading last N execution events for 1 rule from event log
    • for reading current status and metrics for 1 rule from legacy status SOs
    • for reading current statuses and metrics for N rules from legacy status SOs
  • New methods should return data in the legacy status SO format.
  • Update all the existing endpoints that depend on IRuleExecutionLogClient to use the new methods.
  • Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
  • The API of the new endpoint should be the same as rules/_find_statuses to minimise changes in the app.
  • Use the new endpoint on the Rule Details page.

Stage 2: Reading last 5 failures from Event Log v2 - clean implementation

TL;DR: Clean HTTP API, legacy Rule Status SO under the hood.

🚨🚨🚨 Possible breaking changes in Detections API 🚨🚨🚨

  • Design a new data model for the Current Rule Execution Info (the to-be new SO type and, later, the to-be data in the rule object itself).
  • Design a new data model for the Rule Execution Event (the read model to be used on the Rule Details page). Possible shapes for both are sketched after this list.
  • Think over changes in IRuleExecutionLogClient to support the new data model.
  • Think over changes in all the endpoints that return any data related to rule monitoring (statuses, metrics, etc). Make sure to check our docs to identify what's documented there regarding rule monitoring.
  • Update IRuleExecutionLogClient to return data in the new format.
  • Update all the endpoints (including the raw new one) to return data in the new format.
  • Update Rule Details page to consume data in the new format.
  • Update Rule Management page to consume data in the new format.
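
For concreteness, the two new read models could be shaped roughly as below; every name here is a hypothetical placeholder for the design work described above, not a committed schema:

```ts
// Hypothetical shapes for the new data model; all names are placeholders.

// Current Rule Execution Info: the to-be new SO type, and later the data
// embedded in the rule object itself.
interface RuleExecutionInfo {
  lastExecution: {
    date: string;
    status: 'succeeded' | 'failed' | 'partial failure' | 'going to run';
    message: string;
    metrics: {
      totalSearchDurationMs?: number;
      totalIndexingDurationMs?: number;
      executionGapDurationS?: number;
    };
  };
}

// Rule Execution Event: the read model consumed by the Failure History
// table on the Rule Details page.
interface RuleExecutionEvent {
  date: string;
  status: 'succeeded' | 'failed' | 'partial failure' | 'going to run';
  message: string;
}
```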

Stage 3: Reading last 5 failures from Event Log v3 - new SO

TL;DR: Clean HTTP API, new Rule Execution Info SO under the hood.

  • Implement a new SO type for storing the current rule execution info. Relation type: 1 rule - 1 current execution info. A registration sketch follows this list.
  • Swap the legacy SO with the new SO in the implementation of IRuleExecutionLogClient.
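
Registering the new SO type could look roughly like the sketch below; the type name, the mappings, and the idea of enforcing the 1:1 relation via a deterministic SO id are all assumptions:

```ts
import type { CoreSetup, SavedObjectsType } from 'src/core/server';

// Hypothetical SO type for the current rule execution info. The "1 rule -
// 1 execution info" relation can be enforced by deriving the SO id from
// the rule id, so each rule has at most one such object.
const ruleExecutionInfoType: SavedObjectsType = {
  name: 'siem-detection-engine-rule-execution-info',
  hidden: false,
  namespaceType: 'single',
  mappings: {
    properties: {
      lastExecutionDate: { type: 'date' },
      lastExecutionStatus: { type: 'keyword' },
      lastExecutionMessage: { type: 'text' },
    },
  },
};

export const registerRuleExecutionInfoType = (core: CoreSetup): void => {
  core.savedObjects.registerType(ruleExecutionInfoType);
};
```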

Stage 4: Cleanup and misc

  • Revisit the problem of deterministic ordering (comment)
  • Remove rule execution log's glue code: adapters, feature switch.
  • Remove the legacy rule status SO.
  • Mark the legacy rule status SO as deleted in Kibana Core.
  • Encapsulate the current space id in the instance of IRuleExecutionLogClient and remove it from the parameters of its methods.
  • Introduce a Rule Execution Logger scoped to a rule instance, for use in rule executors. Both ideas are sketched after this list.
  • Add test coverage.
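
The last two items could be addressed with a thin wrapper that captures the rule id and space id once, so executors don't have to thread them through every call. A sketch under those assumptions (the write-side method name is hypothetical):

```ts
// Hypothetical write-side interface; the real client's methods may differ.
interface StatusChangeArgs {
  ruleId: string;
  spaceId: string;
  newStatus: string;
  message?: string;
}

interface RuleExecutionLogWriter {
  logStatusChange(args: StatusChangeArgs): Promise<void>;
}

// Rule-scoped logger for use inside rule executors: the rule/space context
// is captured at construction and baked into every call.
interface IRuleExecutionLogger {
  logStatusChange(newStatus: string, message?: string): Promise<void>;
}

const createRuleExecutionLogger = (
  writer: RuleExecutionLogWriter,
  context: { ruleId: string; spaceId: string }
): IRuleExecutionLogger => ({
  logStatusChange: (newStatus, message) =>
    writer.logStatusChange({ ...context, newStatus, message }),
});

// Usage in an executor (illustrative):
//   const execLogger = createRuleExecutionLogger(writer, { ruleId, spaceId });
//   await execLogger.logStatusChange('failed', 'Search exceeded the rule interval');
```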

Checklist

Delete any items that are not applicable to this PR.

  • ✔️ Unit or functional tests were updated or added to match the most common scenarios

@banderror banderror self-assigned this Oct 19, 2021
@banderror banderror added labels Team: SecuritySolution, Team:Detection Rule Management, Team:Detections and Resp, Feature:Rule Monitoring, release_note:skip, v7.16.0, v8.0.0 on Oct 19, 2021
@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch 6 times, most recently from dff0590 to 66be0c7 Compare October 20, 2021 04:16
@kibanamachine (Contributor) commented Oct 20, 2021

💔 Build Failed

Failed CI Steps


Test Failures

Kibana Pipeline / general / X-Pack API Integration Tests.x-pack/test/api_integration/apis/ml/jobs/categorization_field_examples·ts.apis Machine Learning jobs Categorization example endpoint - partially valid, more than 75% are null

Link to Jenkins

Standard Out

Failed Tests Reporter:
  - Test has not failed recently on tracked branches

[00:00:00]     │
[00:00:00]       └-: apis
[00:00:00]         └-> "before all" hook in "apis"
[00:11:28]         └-: Machine Learning
[00:11:28]           └-> "before all" hook in "Machine Learning"
[00:11:28]           └-> "before all" hook in "Machine Learning"
[00:11:28]             │ debg creating role ft_ml_source
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_source]
[00:11:28]             │ debg creating role ft_ml_source_readonly
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_source_readonly]
[00:11:28]             │ debg creating role ft_ml_dest
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_dest]
[00:11:28]             │ debg creating role ft_ml_dest_readonly
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_dest_readonly]
[00:11:28]             │ debg creating role ft_ml_ui_extras
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_ui_extras]
[00:11:28]             │ debg creating role ft_default_space_ml_all
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space_ml_all]
[00:11:28]             │ debg creating role ft_default_space1_ml_all
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space1_ml_all]
[00:11:28]             │ debg creating role ft_all_spaces_ml_all
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_all_spaces_ml_all]
[00:11:28]             │ debg creating role ft_default_space_ml_read
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space_ml_read]
[00:11:28]             │ debg creating role ft_default_space1_ml_read
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space1_ml_read]
[00:11:28]             │ debg creating role ft_all_spaces_ml_read
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_all_spaces_ml_read]
[00:11:28]             │ debg creating role ft_default_space_ml_none
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space_ml_none]
[00:11:28]             │ debg creating user ft_ml_poweruser
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_poweruser]
[00:11:29]             │ debg created user ft_ml_poweruser
[00:11:29]             │ debg creating user ft_ml_poweruser_spaces
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_poweruser_spaces]
[00:11:29]             │ debg created user ft_ml_poweruser_spaces
[00:11:29]             │ debg creating user ft_ml_poweruser_space1
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_poweruser_space1]
[00:11:29]             │ debg created user ft_ml_poweruser_space1
[00:11:29]             │ debg creating user ft_ml_poweruser_all_spaces
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_poweruser_all_spaces]
[00:11:29]             │ debg created user ft_ml_poweruser_all_spaces
[00:11:29]             │ debg creating user ft_ml_viewer
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_viewer]
[00:11:29]             │ debg created user ft_ml_viewer
[00:11:29]             │ debg creating user ft_ml_viewer_spaces
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_viewer_spaces]
[00:11:29]             │ debg created user ft_ml_viewer_spaces
[00:11:29]             │ debg creating user ft_ml_viewer_space1
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_viewer_space1]
[00:11:29]             │ debg created user ft_ml_viewer_space1
[00:11:29]             │ debg creating user ft_ml_viewer_all_spaces
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_viewer_all_spaces]
[00:11:29]             │ debg created user ft_ml_viewer_all_spaces
[00:11:29]             │ debg creating user ft_ml_unauthorized
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_unauthorized]
[00:11:29]             │ debg created user ft_ml_unauthorized
[00:11:29]             │ debg creating user ft_ml_unauthorized_spaces
[00:11:30]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_unauthorized_spaces]
[00:11:30]             │ debg created user ft_ml_unauthorized_spaces
[00:16:01]           └-: jobs
[00:16:01]             └-> "before all" hook in "jobs"
[00:16:01]             └-: Categorization example endpoint - 
[00:16:01]               └-> "before all" hook for "valid with good number of tokens"
[00:16:01]               └-> "before all" hook for "valid with good number of tokens"
[00:16:01]                 │ info [x-pack/test/functional/es_archives/ml/categorization] Loading "mappings.json"
[00:16:01]                 │ info [x-pack/test/functional/es_archives/ml/categorization] Loading "data.json.gz"
[00:16:01]                 │ info [o.e.c.m.MetadataCreateIndexService] [node-01] [ft_categorization] creating index, cause [api], templates [], shards [1]/[0]
[00:16:02]                 │ info [x-pack/test/functional/es_archives/ml/categorization] Created index "ft_categorization"
[00:16:02]                 │ debg [x-pack/test/functional/es_archives/ml/categorization] "ft_categorization" settings {"index":{"number_of_replicas":"0","number_of_shards":"1"}}
[00:16:03]                 │ info [x-pack/test/functional/es_archives/ml/categorization] Indexed 1501 docs into "ft_categorization"
[00:16:03]                 │ debg applying update to kibana config: {"dateFormat:tz":"UTC"}
[00:16:03]               └-> valid with good number of tokens
[00:16:03]                 └-> "before each" hook: global before each for "valid with good number of tokens"
[00:16:03]                 └- ✓ pass  (215ms)
[00:16:03]               └-> invalid, too many tokens.
[00:16:03]                 └-> "before each" hook: global before each for "invalid, too many tokens."
[00:16:03]                 │ info [r.suppressed] [node-01] path: /_analyze, params: {}
[00:16:03]                 │      org.elasticsearch.transport.RemoteTransportException: [node-01][127.0.0.1:63101][indices:admin/analyze[s]]
[00:16:03]                 │      Caused by: java.lang.IllegalStateException: The number of tokens produced by calling _analyze has exceeded the allowed maximum of [10000]. This limit can be set by changing the [index.analyze.max_token_count] index level setting.
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction$TokenCounter.increment(TransportAnalyzeAction.java:397) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.simpleAnalyze(TransportAnalyzeAction.java:229) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:204) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:122) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:110) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.lambda$asyncShardOperation$0(TransportSingleShardAction.java:99) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
[00:16:03]                 │      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
[00:16:03]                 │      	at java.lang.Thread.run(Thread.java:833) [?:?]
[00:16:03]                 │ info [r.suppressed] [node-01] path: /_analyze, params: {}
[00:16:03]                 │      org.elasticsearch.transport.RemoteTransportException: [node-01][127.0.0.1:63101][indices:admin/analyze[s]]
[00:16:03]                 │      Caused by: java.lang.IllegalStateException: The number of tokens produced by calling _analyze has exceeded the allowed maximum of [10000]. This limit can be set by changing the [index.analyze.max_token_count] index level setting.
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction$TokenCounter.increment(TransportAnalyzeAction.java:397) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.simpleAnalyze(TransportAnalyzeAction.java:229) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:204) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:122) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:110) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.lambda$asyncShardOperation$0(TransportSingleShardAction.java:99) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
[00:16:03]                 │      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
[00:16:03]                 │      	at java.lang.Thread.run(Thread.java:833) [?:?]
[00:16:03]                 └- ✓ pass  (197ms)
[00:16:03]               └-> partially valid, more than 75% are null
[00:16:03]                 └-> "before each" hook: global before each for "partially valid, more than 75% are null"
[00:16:03]                 └- ✖ fail: apis Machine Learning jobs Categorization example endpoint -  partially valid, more than 75% are null
[00:16:03]                 │       Error: expected 249 to sort of equal 250
[00:16:03]                 │       + expected - actual
[00:16:03]                 │ 
[00:16:03]                 │       -249
[00:16:03]                 │       +250
[00:16:03]                 │       
[00:16:03]                 │       at Assertion.assert (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/expect/expect.js:100:11)
[00:16:03]                 │       at Assertion.eql (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/expect/expect.js:244:8)
[00:16:03]                 │       at Context.<anonymous> (test/api_integration/apis/ml/jobs/categorization_field_examples.ts:303:36)
[00:16:03]                 │       at runMicrotasks (<anonymous>)
[00:16:03]                 │       at processTicksAndRejections (node:internal/process/task_queues:96:5)
[00:16:03]                 │       at Object.apply (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/test/target_node/functional_test_runner/lib/mocha/wrap_function.js:87:16)
[00:16:03]                 │ 
[00:16:03]                 │ 

Stack Trace

Error: expected 249 to sort of equal 250
    at Assertion.assert (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/expect/expect.js:100:11)
    at Assertion.eql (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/expect/expect.js:244:8)
    at Context.<anonymous> (test/api_integration/apis/ml/jobs/categorization_field_examples.ts:303:36)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at Object.apply (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/test/target_node/functional_test_runner/lib/mocha/wrap_function.js:87:16) {
  actual: '249',
  expected: '250',
  showDiff: true
}

Kibana Pipeline / general / Creates and activates a new custom rule with override option.Detection rules, override Creates and activates a new custom rule with override option

Link to Jenkins

Stack Trace

Failed Tests Reporter:
  - Test has failed 18 times on tracked branches: https://github.com/elastic/kibana/issues/84020

AssertionError: Timed out retrying after 60000ms: Expected to find content: '80' within the element: <div.euiDataGridRowCell.euiDataGridRowCell--numeric> but never did.
    at Context.eval (http://localhost:6181/__cypress/tests?p=cypress/integration/detection_rules/override.spec.ts:21940:48)

Kibana Pipeline / general / displays the data provider action menu when Enter is pressed.timeline data providers displays the data provider action menu when Enter is pressed

Link to Jenkins

Stack Trace

Failed Tests Reporter:
  - Test has not failed recently on tracked branches

CypressError: Timed out retrying after 60050ms: `cy.click()` failed because this element is `disabled`:

`<button class="euiButton euiButton--primary euiButton--fill euiButton-isDisabled edit-data-provider-save" disabled="" type="button" data-test-subj="save">...</button>`

Fix this problem, or use `{force: true}` to disable error checking.

https://on.cypress.io/element-cannot-be-interacted-with
    at $Cy.ensureNotDisabled (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:160481:85)
    at runAllChecks (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:147820:14)
    at retryActionability (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:147894:16)
    at tryCatcher (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:13212:23)
    at Function.Promise.attempt.Promise.try (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:10486:29)
    at tryFn (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:165329:61)
    at whenStable (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:165368:14)
    at http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:164855:18
    at tryCatcher (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:13212:23)
    at Promise._settlePromiseFromHandler (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11147:31)
    at Promise._settlePromise (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11204:18)
    at Promise._settlePromise0 (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11249:10)
    at Promise._settlePromises (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11329:18)
    at Promise._fulfill (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11273:18)
    at http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:12887:46
From Your Spec Code:
    at Object.addDataProvider (http://localhost:6181/__cypress/tests?p=cypress/integration/timelines/data_providers.spec.ts:16703:54)
    at Context.eval (http://localhost:6181/__cypress/tests?p=cypress/integration/timelines/data_providers.spec.ts:15559:20)

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id                 before     after      diff
securitySolution   104.2KB    104.3KB    +89.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

@banderror (Contributor, Author) commented:
There's a failed Cypress test for Timelines and it's skipped in #115738. I will rebase when the skip is merged.

@banderror banderror removed the v7.16.0 label Oct 20, 2021
@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch from 66be0c7 to 7aaa316 Compare October 20, 2021 17:47
@banderror banderror changed the title [Security Solution][Detections] Reading last 5 failures from Event Log [Security Solution][Detections] Reading last 5 failures from Event Log v1 - raw implementation Oct 20, 2021
@banderror banderror requested a review from a team October 20, 2021 19:02
@banderror banderror marked this pull request as ready for review October 20, 2021 19:03
@banderror banderror requested a review from a team as a code owner October 20, 2021 19:03
@elasticmachine (Contributor) commented:
Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine (Contributor) commented:
Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch from 7aaa316 to b9a4e4c Compare October 21, 2021 18:35
@spong (Member) commented Oct 21, 2021

Reviewed stages 2-4 in the description, and the proposed path forward sounds good to me. 👍 I do wonder whether @xcrzx's work this cycle to allow generic fields on solution Rule SOs has any play here, and whether we might be able to skip the single status SO approach in favor of just storing any necessary fields on our Security Rule SO directly. I'm probably being too ambitious here (😅), especially given the flexibility a custom SO will provide for quickly implementing features in the next few minor releases. That said, it'll still be a join, so we'd still be limited on features like sorting, and of course there are potential perf considerations (no worse than today, at least).

I think that was just a long-winded "LGTM 👍" and me convincing myself that the near-term flexibility of another Status SO outweighs the risk/timing of reaching consensus on and implementing generic fields on rules... 🙂

@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch from b9a4e4c to a79477f Compare October 25, 2021 13:56
@xcrzx (Contributor) left a comment:

Thanks for implementing these changes, Georgii 🚀 I love seeing how the ExecLog usages become cleaner with the new, improved interface. Looking forward to further refinements in that direction!

I checked out the PR locally and tested it. All affected API routes work as expected with both the event-log-based and the SO-based implementations 👍 I added some comments to discuss before we merge this in; not everything requires immediate action, and we could address some of it in follow-up PRs.

@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch 2 times, most recently from a854593 to 5baac92 Compare October 27, 2021 15:04
@banderror (Contributor, Author) commented:

> I do wonder whether @xcrzx's work this cycle to allow generic fields on solution Rule SOs has any play here, and whether we might be able to skip the single status SO approach in favor of just storing any necessary fields on our Security Rule SO directly. I'm probably being too ambitious here (😅), especially given the flexibility a custom SO will provide for quickly implementing features in the next few minor releases.

@spong I think eventually we definitely need to store this data in the rule object itself (the simplest way to enable sorting etc), but it's difficult to estimate when this could happen, because IMHO it's a much more complicated piece of work + cross-team dependencies + an RFC. Meanwhile, we could work on improving the data model and the HTTP API, so that when the mechanism for storing this data in the rule is ready, we'll already have a working "blueprint" for it - the new SO, which we will ideally just transfer 1:1 to the rule. So yeah, I hope that the new custom SO will give us some flexibility + the ability to clean up the code for the time being + an opportunity to introduce breaking changes in the API in 8.0.

> That said, it'll still be a join, so we'd still be limited on features like sorting, and of course there are potential perf considerations (no worse than today, at least).

Unfortunately, yeah - so we need this data in the rule itself.

@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch 3 times, most recently from c1f182a to 59084ed Compare October 28, 2021 13:08
@banderror banderror requested a review from xcrzx October 28, 2021 13:09
@xcrzx (Contributor) left a comment:

LGTM 👍

@banderror banderror added labels v8.1.0, auto-backport on Oct 28, 2021
@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch 4 times, most recently from 67d57e1 to 8f32a94 Compare October 29, 2021 18:30
@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch from 8f32a94 to dc16bb8 Compare November 1, 2021 11:22
@kibanamachine (Contributor) commented:

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id                 before    after     diff
securitySolution   4.5MB     4.5MB     +1.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id                 before     after      diff
securitySolution   246.6KB    246.6KB    +87.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

@banderror banderror merged commit 2431a08 into elastic:main Nov 1, 2021
@banderror banderror deleted the reading-last-5-failures-from-event-log branch November 1, 2021 13:40
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 1, 2021
…g v1 - raw implementation (elastic#115574)

@kibanamachine (Contributor) commented:

💚 Backport successful

The 8.0 backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Nov 1, 2021
…g v1 - raw implementation (#115574) (#116947)
