[Security Solution][Detections] Reading last 5 failures from Event Log v1 - raw implementation #115574

Merged

Conversation

@banderror (Contributor) commented Oct 19, 2021

Tickets: #106469, #101013

Summary

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

With this PR we now read the Failure History (last 5 failures) on the Rule Details page from the Event Log. We continue getting the Current Status from the legacy siem-detection-engine-rule-status saved objects. The Rule Management page also gets its data from the legacy saved objects.

  • Deprecate the existing methods for reading data in IRuleExecutionLogClient: .find() and .findBulk()
  • Introduce new methods for reading data in IRuleExecutionLogClient (a possible shape is sketched after this list):
    • for reading last N execution events for 1 rule from event log
    • for reading current status and metrics for 1 rule from legacy status SOs
    • for reading current statuses and metrics for N rules from legacy status SOs
  • New methods should return data in the legacy status SO format.
  • Update all the existing endpoints that depend on IRuleExecutionLogClient to use the new methods.
  • Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
  • The API of the new endpoint should be the same as rules/_find_statuses to minimise changes in the app.
  • Use the new endpoint on the Rule Details page.
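
A minimal sketch of what the reworked client could look like, purely for illustration - the method names, argument shapes, and status SO fields below are assumptions, not the exact signatures merged in this PR:

```ts
// Illustrative sketch only; names and shapes are assumptions.

// Data is returned in the legacy siem-detection-engine-rule-status SO format
// so that existing endpoints and UI code keep working unchanged.
interface LegacyRuleStatus {
  statusDate: string;
  status: 'succeeded' | 'failed' | 'partial failure' | 'going to run' | null;
  lastFailureAt: string | null;
  lastFailureMessage: string | null;
  lastSuccessAt: string | null;
  lastSuccessMessage: string | null;
  gap: string | null;
}

interface IRuleExecutionLogClient {
  /** @deprecated Superseded by the dedicated read methods below. */
  find(args: { ruleId: string; spaceId: string }): Promise<LegacyRuleStatus[]>;

  /** @deprecated Superseded by the dedicated read methods below. */
  findBulk(args: {
    ruleIds: string[];
    spaceId: string;
  }): Promise<Record<string, LegacyRuleStatus[]>>;

  /** Reads the last N execution events for 1 rule from the event log. */
  getLastFailures(args: {
    ruleId: string;
    spaceId: string;
    count: number;
  }): Promise<LegacyRuleStatus[]>;

  /** Reads the current status and metrics for 1 rule from the legacy status SOs. */
  getCurrentStatus(args: { ruleId: string; spaceId: string }): Promise<LegacyRuleStatus>;

  /** Reads the current statuses and metrics for N rules from the legacy status SOs. */
  getCurrentStatusBulk(args: {
    ruleIds: string[];
    spaceId: string;
  }): Promise<Record<string, LegacyRuleStatus>>;
}
```

The new internal endpoint would then combine a "current status" read with a "last N failures" read for the given rule and return them in the same response shape as rules/_find_statuses, which is what keeps the app-side changes minimal.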

Near-term plan for technical implementation of the Rule Execution Log (#101013)

Stage 1. Reading last 5 failures from Event Log v1 - raw implementation - ✔️ done in this PR

TL;DR: New internal endpoint for reading data from Event Log (raw version), legacy status SO under the hood.

  • Deprecate existing methods for reading data in IRuleExecutionLogClient: .find() and .findBulk()
  • Introduce new methods for reading data in IRuleExecutionLogClient:
    • for reading last N execution events for 1 rule from event log
    • for reading current status and metrics for 1 rule from legacy status SOs
    • for reading current statuses and metrics for N rules from legacy status SOs
  • New methods should return data in the legacy status SO format.
  • Update all the existing endpoints that depend on IRuleExecutionLogClient to use the new methods.
  • Implement a new internal endpoint for fetching current status of the rule execution and execution events from Event Log for a given rule.
  • The API of the new endpoint should be the same as rules/_find_statuses to minimise changes in the app.
  • Use the new endpoint on the Rule Details page.

Stage 2: Reading last 5 failures from Event Log v2 - clean implementation

TL;DR: Clean HTTP API, legacy Rule Status SO under the hood.

🚨🚨🚨 Possible breaking changes in Detections API 🚨🚨🚨

  • Design a new data model for the Current Rule Execution Info (the to-be new SO type and, later, the to-be data in the rule object itself).
  • Design a new data model for the Rule Execution Event (the read model to be used on the Rule Details page). Possible shapes for both are sketched after this list.
  • Think over changes in IRuleExecutionLogClient to support the new data model.
  • Think over changes in all the endpoints that return any data related to rule monitoring (statuses, metrics, etc). Make sure to check our docs to identify what's documented there regarding rule monitoring.
  • Update IRuleExecutionLogClient to return data in the new format.
  • Update all the endpoints (including the raw new one) to return data in the new format.
  • Update Rule Details page to consume data in the new format.
  • Update Rule Management page to consume data in the new format.
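
For concreteness, the two new read models could be shaped roughly as below; every name here is a hypothetical placeholder for the design work described above, not a committed schema:

```ts
// Hypothetical shapes for the new data model; all names are placeholders.

// Current Rule Execution Info: the to-be new SO type, and later the data
// embedded in the rule object itself.
interface RuleExecutionInfo {
  lastExecution: {
    date: string;
    status: 'succeeded' | 'failed' | 'partial failure' | 'going to run';
    message: string;
    metrics: {
      totalSearchDurationMs?: number;
      totalIndexingDurationMs?: number;
      executionGapDurationS?: number;
    };
  };
}

// Rule Execution Event: the read model consumed by the Failure History
// table on the Rule Details page.
interface RuleExecutionEvent {
  date: string;
  status: 'succeeded' | 'failed' | 'partial failure' | 'going to run';
  message: string;
}
```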

Stage 3: Reading last 5 failures from Event Log v3 - new SO

TL;DR: Clean HTTP API, new Rule Execution Info SO under the hood.

  • Implement a new SO type for storing the current rule execution info. Relation type: 1 rule - 1 current execution info. A registration sketch follows this list.
  • Swap the legacy SO with the new SO in the implementation of IRuleExecutionLogClient.
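
Registering the new SO type could look roughly like the sketch below; the type name, the mappings, and the idea of enforcing the 1:1 relation via a deterministic SO id are all assumptions:

```ts
import type { CoreSetup, SavedObjectsType } from 'src/core/server';

// Hypothetical SO type for the current rule execution info. The "1 rule -
// 1 execution info" relation can be enforced by deriving the SO id from
// the rule id, so each rule has at most one such object.
const ruleExecutionInfoType: SavedObjectsType = {
  name: 'siem-detection-engine-rule-execution-info',
  hidden: false,
  namespaceType: 'single',
  mappings: {
    properties: {
      lastExecutionDate: { type: 'date' },
      lastExecutionStatus: { type: 'keyword' },
      lastExecutionMessage: { type: 'text' },
    },
  },
};

export const registerRuleExecutionInfoType = (core: CoreSetup): void => {
  core.savedObjects.registerType(ruleExecutionInfoType);
};
```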

Stage 4: Cleanup and misc

  • Revisit the problem of deterministic ordering (comment)
  • Remove rule execution log's glue code: adapters, feature switch.
  • Remove the legacy rule status SO.
  • Mark the legacy rule status SO as deleted in Kibana Core.
  • Encapsulate the current space id in the instance of IRuleExecutionLogClient and remove it from the parameters of its methods.
  • Introduce a Rule Execution Logger scoped to a rule instance, for use in rule executors. Both ideas are sketched after this list.
  • Add test coverage.
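
The last two items could be addressed with a thin wrapper that captures the rule id and space id once, so executors don't have to thread them through every call. A sketch under those assumptions (the write-side method name is hypothetical):

```ts
// Hypothetical write-side interface; the real client's methods may differ.
interface StatusChangeArgs {
  ruleId: string;
  spaceId: string;
  newStatus: string;
  message?: string;
}

interface RuleExecutionLogWriter {
  logStatusChange(args: StatusChangeArgs): Promise<void>;
}

// Rule-scoped logger for use inside rule executors: the rule/space context
// is captured at construction and baked into every call.
interface IRuleExecutionLogger {
  logStatusChange(newStatus: string, message?: string): Promise<void>;
}

const createRuleExecutionLogger = (
  writer: RuleExecutionLogWriter,
  context: { ruleId: string; spaceId: string }
): IRuleExecutionLogger => ({
  logStatusChange: (newStatus, message) =>
    writer.logStatusChange({ ...context, newStatus, message }),
});

// Usage in an executor (illustrative):
//   const execLogger = createRuleExecutionLogger(writer, { ruleId, spaceId });
//   await execLogger.logStatusChange('failed', 'Search exceeded the rule interval');
```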

Checklist

Delete any items that are not applicable to this PR.

  • ✔️ Unit or functional tests were updated or added to match the most common scenarios

@banderror banderror self-assigned this Oct 19, 2021
@banderror banderror added labels Team: SecuritySolution, Team:Detection Rule Management, Team:Detections and Resp, Feature:Rule Monitoring, release_note:skip, v7.16.0, v8.0.0 on Oct 19, 2021
@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch 6 times, most recently from dff0590 to 66be0c7 Compare October 20, 2021 04:16
@kibanamachine (Contributor) commented Oct 20, 2021

💔 Build Failed

Failed CI Steps


Test Failures

Kibana Pipeline / general / X-Pack API Integration Tests.x-pack/test/api_integration/apis/ml/jobs/categorization_field_examples·ts.apis Machine Learning jobs Categorization example endpoint - partially valid, more than 75% are null

Link to Jenkins

Standard Out

Failed Tests Reporter:
  - Test has not failed recently on tracked branches

[00:00:00]     │
[00:00:00]       └-: apis
[00:00:00]         └-> "before all" hook in "apis"
[00:11:28]         └-: Machine Learning
[00:11:28]           └-> "before all" hook in "Machine Learning"
[00:11:28]           └-> "before all" hook in "Machine Learning"
[00:11:28]             │ debg creating role ft_ml_source
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_source]
[00:11:28]             │ debg creating role ft_ml_source_readonly
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_source_readonly]
[00:11:28]             │ debg creating role ft_ml_dest
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_dest]
[00:11:28]             │ debg creating role ft_ml_dest_readonly
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_dest_readonly]
[00:11:28]             │ debg creating role ft_ml_ui_extras
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_ml_ui_extras]
[00:11:28]             │ debg creating role ft_default_space_ml_all
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space_ml_all]
[00:11:28]             │ debg creating role ft_default_space1_ml_all
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space1_ml_all]
[00:11:28]             │ debg creating role ft_all_spaces_ml_all
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_all_spaces_ml_all]
[00:11:28]             │ debg creating role ft_default_space_ml_read
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space_ml_read]
[00:11:28]             │ debg creating role ft_default_space1_ml_read
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space1_ml_read]
[00:11:28]             │ debg creating role ft_all_spaces_ml_read
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_all_spaces_ml_read]
[00:11:28]             │ debg creating role ft_default_space_ml_none
[00:11:28]             │ info [o.e.x.s.a.r.TransportPutRoleAction] [node-01] added role [ft_default_space_ml_none]
[00:11:28]             │ debg creating user ft_ml_poweruser
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_poweruser]
[00:11:29]             │ debg created user ft_ml_poweruser
[00:11:29]             │ debg creating user ft_ml_poweruser_spaces
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_poweruser_spaces]
[00:11:29]             │ debg created user ft_ml_poweruser_spaces
[00:11:29]             │ debg creating user ft_ml_poweruser_space1
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_poweruser_space1]
[00:11:29]             │ debg created user ft_ml_poweruser_space1
[00:11:29]             │ debg creating user ft_ml_poweruser_all_spaces
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_poweruser_all_spaces]
[00:11:29]             │ debg created user ft_ml_poweruser_all_spaces
[00:11:29]             │ debg creating user ft_ml_viewer
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_viewer]
[00:11:29]             │ debg created user ft_ml_viewer
[00:11:29]             │ debg creating user ft_ml_viewer_spaces
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_viewer_spaces]
[00:11:29]             │ debg created user ft_ml_viewer_spaces
[00:11:29]             │ debg creating user ft_ml_viewer_space1
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_viewer_space1]
[00:11:29]             │ debg created user ft_ml_viewer_space1
[00:11:29]             │ debg creating user ft_ml_viewer_all_spaces
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_viewer_all_spaces]
[00:11:29]             │ debg created user ft_ml_viewer_all_spaces
[00:11:29]             │ debg creating user ft_ml_unauthorized
[00:11:29]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_unauthorized]
[00:11:29]             │ debg created user ft_ml_unauthorized
[00:11:29]             │ debg creating user ft_ml_unauthorized_spaces
[00:11:30]             │ info [o.e.x.s.a.u.TransportPutUserAction] [node-01] added user [ft_ml_unauthorized_spaces]
[00:11:30]             │ debg created user ft_ml_unauthorized_spaces
[00:16:01]           └-: jobs
[00:16:01]             └-> "before all" hook in "jobs"
[00:16:01]             └-: Categorization example endpoint - 
[00:16:01]               └-> "before all" hook for "valid with good number of tokens"
[00:16:01]               └-> "before all" hook for "valid with good number of tokens"
[00:16:01]                 │ info [x-pack/test/functional/es_archives/ml/categorization] Loading "mappings.json"
[00:16:01]                 │ info [x-pack/test/functional/es_archives/ml/categorization] Loading "data.json.gz"
[00:16:01]                 │ info [o.e.c.m.MetadataCreateIndexService] [node-01] [ft_categorization] creating index, cause [api], templates [], shards [1]/[0]
[00:16:02]                 │ info [x-pack/test/functional/es_archives/ml/categorization] Created index "ft_categorization"
[00:16:02]                 │ debg [x-pack/test/functional/es_archives/ml/categorization] "ft_categorization" settings {"index":{"number_of_replicas":"0","number_of_shards":"1"}}
[00:16:03]                 │ info [x-pack/test/functional/es_archives/ml/categorization] Indexed 1501 docs into "ft_categorization"
[00:16:03]                 │ debg applying update to kibana config: {"dateFormat:tz":"UTC"}
[00:16:03]               └-> valid with good number of tokens
[00:16:03]                 └-> "before each" hook: global before each for "valid with good number of tokens"
[00:16:03]                 └- ✓ pass  (215ms)
[00:16:03]               └-> invalid, too many tokens.
[00:16:03]                 └-> "before each" hook: global before each for "invalid, too many tokens."
[00:16:03]                 │ info [r.suppressed] [node-01] path: /_analyze, params: {}
[00:16:03]                 │      org.elasticsearch.transport.RemoteTransportException: [node-01][127.0.0.1:63101][indices:admin/analyze[s]]
[00:16:03]                 │      Caused by: java.lang.IllegalStateException: The number of tokens produced by calling _analyze has exceeded the allowed maximum of [10000]. This limit can be set by changing the [index.analyze.max_token_count] index level setting.
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction$TokenCounter.increment(TransportAnalyzeAction.java:397) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.simpleAnalyze(TransportAnalyzeAction.java:229) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:204) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:122) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:110) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.lambda$asyncShardOperation$0(TransportSingleShardAction.java:99) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
[00:16:03]                 │      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
[00:16:03]                 │      	at java.lang.Thread.run(Thread.java:833) [?:?]
[00:16:03]                 │ info [r.suppressed] [node-01] path: /_analyze, params: {}
[00:16:03]                 │      org.elasticsearch.transport.RemoteTransportException: [node-01][127.0.0.1:63101][indices:admin/analyze[s]]
[00:16:03]                 │      Caused by: java.lang.IllegalStateException: The number of tokens produced by calling _analyze has exceeded the allowed maximum of [10000]. This limit can be set by changing the [index.analyze.max_token_count] index level setting.
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction$TokenCounter.increment(TransportAnalyzeAction.java:397) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.simpleAnalyze(TransportAnalyzeAction.java:229) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:204) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.analyze(TransportAnalyzeAction.java:122) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:110) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.lambda$asyncShardOperation$0(TransportSingleShardAction.java:99) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
[00:16:03]                 │      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
[00:16:03]                 │      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
[00:16:03]                 │      	at java.lang.Thread.run(Thread.java:833) [?:?]
[00:16:03]                 └- ✓ pass  (197ms)
[00:16:03]               └-> partially valid, more than 75% are null
[00:16:03]                 └-> "before each" hook: global before each for "partially valid, more than 75% are null"
[00:16:03]                 └- ✖ fail: apis Machine Learning jobs Categorization example endpoint -  partially valid, more than 75% are null
[00:16:03]                 │       Error: expected 249 to sort of equal 250
[00:16:03]                 │       + expected - actual
[00:16:03]                 │ 
[00:16:03]                 │       -249
[00:16:03]                 │       +250
[00:16:03]                 │       
[00:16:03]                 │       at Assertion.assert (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/expect/expect.js:100:11)
[00:16:03]                 │       at Assertion.eql (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/expect/expect.js:244:8)
[00:16:03]                 │       at Context.<anonymous> (test/api_integration/apis/ml/jobs/categorization_field_examples.ts:303:36)
[00:16:03]                 │       at runMicrotasks (<anonymous>)
[00:16:03]                 │       at processTicksAndRejections (node:internal/process/task_queues:96:5)
[00:16:03]                 │       at Object.apply (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/test/target_node/functional_test_runner/lib/mocha/wrap_function.js:87:16)
[00:16:03]                 │ 
[00:16:03]                 │ 

Stack Trace

Error: expected 249 to sort of equal 250
    at Assertion.assert (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/expect/expect.js:100:11)
    at Assertion.eql (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/expect/expect.js:244:8)
    at Context.<anonymous> (test/api_integration/apis/ml/jobs/categorization_field_examples.ts:303:36)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at Object.apply (/dev/shm/workspace/parallel/10/kibana/node_modules/@kbn/test/target_node/functional_test_runner/lib/mocha/wrap_function.js:87:16) {
  actual: '249',
  expected: '250',
  showDiff: true
}

Kibana Pipeline / general / Creates and activates a new custom rule with override option.Detection rules, override Creates and activates a new custom rule with override option

Link to Jenkins

Stack Trace

Failed Tests Reporter:
  - Test has failed 18 times on tracked branches: https://github.com/elastic/kibana/issues/84020

AssertionError: Timed out retrying after 60000ms: Expected to find content: '80' within the element: <div.euiDataGridRowCell.euiDataGridRowCell--numeric> but never did.
    at Context.eval (http://localhost:6181/__cypress/tests?p=cypress/integration/detection_rules/override.spec.ts:21940:48)

Kibana Pipeline / general / displays the data provider action menu when Enter is pressed.timeline data providers displays the data provider action menu when Enter is pressed

Link to Jenkins

Stack Trace

Failed Tests Reporter:
  - Test has not failed recently on tracked branches

CypressError: Timed out retrying after 60050ms: `cy.click()` failed because this element is `disabled`:

`<button class="euiButton euiButton--primary euiButton--fill euiButton-isDisabled edit-data-provider-save" disabled="" type="button" data-test-subj="save">...</button>`

Fix this problem, or use `{force: true}` to disable error checking.

https://on.cypress.io/element-cannot-be-interacted-with
    at $Cy.ensureNotDisabled (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:160481:85)
    at runAllChecks (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:147820:14)
    at retryActionability (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:147894:16)
    at tryCatcher (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:13212:23)
    at Function.Promise.attempt.Promise.try (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:10486:29)
    at tryFn (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:165329:61)
    at whenStable (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:165368:14)
    at http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:164855:18
    at tryCatcher (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:13212:23)
    at Promise._settlePromiseFromHandler (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11147:31)
    at Promise._settlePromise (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11204:18)
    at Promise._settlePromise0 (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11249:10)
    at Promise._settlePromises (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11329:18)
    at Promise._fulfill (http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:11273:18)
    at http://elastic:changeme@localhost:6181/__cypress/runner/cypress_runner.js:12887:46
From Your Spec Code:
    at Object.addDataProvider (http://localhost:6181/__cypress/tests?p=cypress/integration/timelines/data_providers.spec.ts:16703:54)
    at Context.eval (http://localhost:6181/__cypress/tests?p=cypress/integration/timelines/data_providers.spec.ts:15559:20)

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id                 before     after      diff
securitySolution   104.2KB    104.3KB    +89.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

@banderror (Contributor, Author) commented:
There's a failed Cypress test for Timelines and it's skipped in #115738. I will rebase when the skip is merged.

@banderror banderror removed the v7.16.0 label Oct 20, 2021
@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch from 66be0c7 to 7aaa316 Compare October 20, 2021 17:47
@banderror banderror changed the title [Security Solution][Detections] Reading last 5 failures from Event Log [Security Solution][Detections] Reading last 5 failures from Event Log v1 - raw implementation Oct 20, 2021
@banderror banderror requested a review from a team October 20, 2021 19:02
@banderror banderror marked this pull request as ready for review October 20, 2021 19:03
@banderror banderror requested a review from a team as a code owner October 20, 2021 19:03
@elasticmachine (Contributor) commented:
Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine (Contributor) commented:
Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch from 7aaa316 to b9a4e4c Compare October 21, 2021 18:35
@spong (Member) commented Oct 21, 2021

Reviewed stages 2-4 in the description, and the proposed path forward sounds good to me. 👍 I do wonder whether @xcrzx's work this cycle to allow generic fields on solution Rule SOs has any play here, and whether we might be able to skip the single status SO approach in favor of just storing any necessary fields on our Security Rule SO directly. I'm probably being too ambitious here (😅), especially given the flexibility a custom SO will provide for quickly implementing features in the next few minor releases. That said, it'll still be a join, so we'd still be limited on features like sorting, and of course there are potential perf considerations (no worse than today, at least).

I think that was just a long-winded "LGTM 👍" and me convincing myself that the near-term flexibility of another Status SO outweighs the risk/timing of reaching consensus on and implementing generic fields on rules... 🙂

@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch from b9a4e4c to a79477f Compare October 25, 2021 13:56
@xcrzx (Contributor) left a comment:

Thanks for implementing these changes, Georgii 🚀 I love seeing how the ExecLog usages become cleaner with the new, improved interface. Looking forward to further refinements in that direction!

I checked out the PR locally and tested it. All affected API routes work as expected with both the event-log-based and the SO-based implementations 👍 I added some comments to discuss before we merge this in; not everything requires immediate action, and we could address some of it in follow-up PRs.

@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch 2 times, most recently from a854593 to 5baac92 Compare October 27, 2021 15:04
@banderror (Contributor, Author) commented:

> I do wonder whether @xcrzx's work this cycle to allow generic fields on solution Rule SOs has any play here, and whether we might be able to skip the single status SO approach in favor of just storing any necessary fields on our Security Rule SO directly. I'm probably being too ambitious here (😅), especially given the flexibility a custom SO will provide for quickly implementing features in the next few minor releases.

@spong I think eventually we definitely need to store this data in the rule object itself (the simplest way to enable sorting etc), but it's difficult to estimate when this could happen, because IMHO it's a much more complicated piece of work + cross-team dependencies + an RFC. Meanwhile, we could work on improving the data model and the HTTP API, so that when the mechanism for storing this data in the rule is ready, we'll already have a working "blueprint" for it - the new SO, which we will ideally just transfer 1:1 to the rule. So yeah, I hope that the new custom SO will give us some flexibility + the ability to clean up the code for the time being + an opportunity to introduce breaking changes in the API in 8.0.

> That said, it'll still be a join, so we'd still be limited on features like sorting, and of course there are potential perf considerations (no worse than today, at least).

Unfortunately, yeah - so we need this data in the rule itself.

@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch 3 times, most recently from c1f182a to 59084ed Compare October 28, 2021 13:08
@banderror banderror requested a review from xcrzx October 28, 2021 13:09
@xcrzx (Contributor) left a comment:

LGTM 👍

@banderror banderror added labels v8.1.0, auto-backport on Oct 28, 2021
@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch 4 times, most recently from 67d57e1 to 8f32a94 Compare October 29, 2021 18:30
@banderror banderror force-pushed the reading-last-5-failures-from-event-log branch from 8f32a94 to dc16bb8 Compare November 1, 2021 11:22
@kibanamachine (Contributor) commented:

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id                 before    after     diff
securitySolution   4.5MB     4.5MB     +1.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id                 before     after      diff
securitySolution   246.6KB    246.6KB    +87.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

@banderror banderror merged commit 2431a08 into elastic:main Nov 1, 2021
@banderror banderror deleted the reading-last-5-failures-from-event-log branch November 1, 2021 13:40
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 1, 2021
…g v1 - raw implementation (elastic#115574)

@kibanamachine (Contributor) commented:

💚 Backport successful

The 8.0 backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Nov 1, 2021
…g v1 - raw implementation (#115574) (#116947)
