Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RAM] Update rule status #140882

Merged
merged 42 commits into from Nov 14, 2022
Merged

[RAM] Update rule status #140882

merged 42 commits into from Nov 14, 2022

Conversation

XavierM
Copy link
Contributor

@XavierM XavierM commented Sep 16, 2022

Summary

Resolves the parent issue: #136039

Also resolves the subtasks:

This is the backend portion of the consolidated rule status feature. It mainly contains changes to the rules_client.ts and task_runner.ts to support the new consolidated rule statuses.

This PR added a new property: lastRun to the rules saved object to hold the new rule outcome statuses (succeeded, warning, and failed) as the new simplified rule status over the existing executionStatus property. However, we are keeping the old executionStatus so we can slowly migrate the rest of the application to use the new lastRun outcomes.

In addition, we have enriched the monitoring property to be the source of truth for metrics related to the last run (as well as new fields that other plugins will find useful). We also added a monitoring service that allows other plugins to easily add data to the monitoring field.

To test this PR, please use #144466 since it has both the frontend and backend changes.

Checklist

@XavierM XavierM requested a review from a team as a code owner September 16, 2022 17:16
@XavierM XavierM marked this pull request as draft September 16, 2022 17:17
@XavierM XavierM added release_note:enhancement Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.5.0 labels Sep 16, 2022
@XavierM XavierM changed the title [RAM] Add Stats on top of rules log [RAM] Update rule status Sep 16, 2022
@XavierM XavierM marked this pull request as ready for review November 2, 2022 17:08
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@@ -255,5 +251,80 @@ export const alertMappings: SavedObjectsTypeMappingDefinition = {
},
},
},
running: {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's delete running, we are not using it and too scare to make another call to SO. Let's wait and see

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was going to be my question actually...if I create a rule, what is the status prior to the first run? Or if I disable it before its run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were thinking that we can tell an approximation time when the rule is going to run. if you disable it before its run, we won't show anything, do you have a better idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After chatting, we opted to simply show -- if the rule has never run. My initial concern was if I create a rule and see nothing in this column, I would wonder if the rule is starting or not. So for this scenario, we'll also show the -- but do a similar treatment as we do for Stat loading indicator. We felt this was the simplest solution at this time and still indicated to the user that something is happening with the rule after I create it.

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did initial code review only of the task runner. Still need to look at the rules client and migration and pull it down to run it. Looking good tho!

monitoring: getDefaultRuleMonitoring(),
running: false,
executionStatus: getRuleExecutionStatusPending(lastRunTimestamp.toISOString()),
monitoring: getDefaultMonitoring(lastRunTimestamp.toISOString()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a rule is created, before running, it looks like monitoring is set to this:

"monitoring": {
  "run": {
    "history": [],
    "calculated_metrics": {
      "success_ratio": 0
    },
    "last_run": {
      "timestamp": "2022-11-03T13:06:31.486Z",
      "metrics": {
      "duration": 0,
        "total_search_duration_ms": null,
        "total_indexing_duration_ms": null,
        "total_alerts_detected": null,
        "total_alerts_created": null,
        "gap_duration_s": null
      }
    }
  }
}

Should those initial duration be null since we don't know it? Also is setting last_run.timestamp to the current date confusing since it hasn't run yet?

I'm thinking of a case where a rule is created in an overloaded Kibana cluster, so it might take longer than expected for the task to get claimed and run. In the UI, this last_run.timestamp would show up and give the impression that the rule has already run? Maybe I'm overthinking :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point, having 0 for the duration is definitely misleading

As for the timestamp, we're essentially mirroring existing executionStatus -> lastExecutionDate field with the lastRun -> timestamp field. So I guess setting the timestamp during create was existing behavior but we could fix it for both of the existing behavior if it doesn't make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That make sense!

@@ -783,7 +799,7 @@ export class TaskRunner<

return { interval: retryInterval };
}),
monitoring,
monitoring: this.ruleMonitoring.getMonitoring(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After 1 execution, my rule saved object looks like:

{
	... other fields
	"executionStatus": {
		"status": "active",
		"lastExecutionDate": "2022-11-03T13:09:15.575Z",
		"error": null,
		"warning": null,
		"lastDuration": 7270
	},
	"monitoring": {
		"run": {
			"history": [{
				"duration": 7270,
				"success": true,
				"timestamp": 1667480955574
			}],
			"calculated_metrics": {
				"success_ratio": 1,
				"p99": 7270,
				"p50": 7270,
				"p95": 7270
			},
			"last_run": {
				"timestamp": "2022-11-03T13:09:15.574Z",
				"metrics": {
					"duration": 0,
					"total_search_duration_ms": null,
					"total_indexing_duration_ms": null,
					"total_alerts_detected": null,
					"total_alerts_created": null,
					"gap_duration_s": null
				}
			}
		}
	},
	"lastRun": {
		"alertsCount": {
			"new": 5,
			"ignored": 0,
			"recovered": 0,
			"active": 5
		},
		"outcomeMsg": null,
		"warning": null,
		"outcome": "succeeded"
	},
}

The monitoring.run.last_run.duration field doesn't look like it's been updated to the last duration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure this was covered in the RFC and I'm behind, but why a separate warning and outcomeMsg field? The outcome can be succeeded | failed | warning so I would expect any warnings to show up as outcome: 'warning', outcomeMsg: <warning message>?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good catch WRT the duration, will fix that.

From the RFC, the warning field is described to act sort of like a decorator to the outcome. My understanding is the outcome could be successful but there might have been a special flag raised in the rule run that should be noted. So in this case the warning field gets mapped to reasons. @XavierM could correct me if I'm wrong here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@JiaweiWu
Copy link
Contributor

@elasticmachine merge upstream

@JiaweiWu JiaweiWu removed the request for review from a team November 14, 2022 06:37
Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@JiaweiWu
Copy link
Contributor

@elasticmachine merge upstream

@JiaweiWu
Copy link
Contributor

@elasticmachine merge upstream

@JiaweiWu
Copy link
Contributor

@elasticmachine merge upstream

@XavierM XavierM enabled auto-merge (squash) November 14, 2022 20:13
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
alerting 379 406 +27

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
alerting 26 27 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
alerting 38.7KB 39.2KB +443.0B

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/saved-objects-service.html#_mappings

id before after diff
alert 74 95 +21
Unknown metric groups

API count

id before after diff
alerting 388 415 +27

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
alerting 68 70 +2
enterpriseSearch 19 21 +2
fleet 59 65 +6
osquery 108 113 +5
securitySolution 441 447 +6
total +21

References to deprecated APIs

id before after diff
alerting 84 96 +12

Total ESLint disabled count

id before after diff
alerting 69 71 +2
enterpriseSearch 20 22 +2
fleet 67 73 +6
osquery 109 115 +6
securitySolution 518 524 +6
total +22

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@XavierM XavierM merged commit e9feb06 into elastic:main Nov 14, 2022
@kibanamachine kibanamachine added v8.6.0 backport:skip This commit does not require backporting labels Nov 14, 2022
jloleysens added a commit to jloleysens/kibana that referenced this pull request Nov 15, 2022
* main: (65 commits)
  Migrate server-side `Root` and `Server` to packages (elastic#144990)
  [Discover] Handle no data views state for `esQuery` alert (elastic#145052)
  [ML] Allow updates for number of allocations and priority for trained model deployments (elastic#144704)
  [api-docs] 2022-11-15 Daily api_docs build (elastic#145203)
  [Security solution] remove guided onboarding feature flag (elastic#144247)
  [DOCS] Automate final case APIs (elastic#145007)
  [Enterprise Search] Name and description flyout for connectors (elastic#143827)
  [Guided onboarding] Update header button logic (elastic#144634)
  [Lens] Multi metric partition charts (elastic#143966)
  [Dashboard] [Controls] Add unmapped runtime field support to options list (elastic#144947)
  [Security Solution] Add Task Metric Collection to New Tasks (elastic#145181)
  [TriggersActionsUi] disable jest config in CI (elastic#145186)
  [TableListView] Enhance tag filtering (elastic#142108)
  [Cloud Posture] Compliance by CIS section table (elastic#145114)
  [8.6][Session View] Fix hidden alert flyout  in session view (elastic#145141)
  [customIntegrations] async load all components (elastic#145166)
  Fix time for logs smoke tests in integration test (elastic#145130)
  [RAM] Update rule status (elastic#140882)
  Update babel (main) (elastic#145060)
  [Actionable Observability] Add context.alertDetailsUrl variable to action connector template for APM rule types (elastic#144791)
  ...
JiaweiWu added a commit that referenced this pull request Nov 15, 2022
## Summary
Parent issue for updating rule status:
#136039
Frontend issue: #145191

Backend PR: #140882

Updates the rules list and rules details page to support the new
consolidated statuses. With E2E and unit testing.

Rules list:
- Table cell values
- Last response filter
- Table cell filtering
- Status aggregations

Rule details:
- Rule status summary
- KPI headers renaming
- Event log cells renaming


![dashdash](https://user-images.githubusercontent.com/74562234/201778676-775f58e9-6707-4972-a1ca-2dcf71befc5b.png)


![rule_details_consolidate](https://user-images.githubusercontent.com/74562234/201778792-f03c368a-3b0d-43cf-805e-f8151b4b96ae.png)

### Checklist
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: Xavier Mouligneau <xavier.mouligneau@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
benakansara pushed a commit to benakansara/kibana that referenced this pull request Nov 17, 2022
## Summary
Parent issue for updating rule status:
elastic#136039
Frontend issue: elastic#145191

Backend PR: elastic#140882

Updates the rules list and rules details page to support the new
consolidated statuses. With E2E and unit testing.

Rules list:
- Table cell values
- Last response filter
- Table cell filtering
- Status aggregations

Rule details:
- Rule status summary
- KPI headers renaming
- Event log cells renaming


![dashdash](https://user-images.githubusercontent.com/74562234/201778676-775f58e9-6707-4972-a1ca-2dcf71befc5b.png)


![rule_details_consolidate](https://user-images.githubusercontent.com/74562234/201778792-f03c368a-3b0d-43cf-805e-f8151b4b96ae.png)

### Checklist
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: Xavier Mouligneau <xavier.mouligneau@elastic.co>
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
maximpn pushed a commit that referenced this pull request Jan 27, 2023
…ead of saved object (#147035)

**Addresses:** #130966
**Based on:** #135127

## Summary

This PR deprecates the Sidecar SO of type `siem-detection-engine-rule-execution-info` in favour of storing Rule Execution Logging data within the rule itself, making use of the work previously done in the Alerting Framework:
- #140882
- #147278

Work done:
- **Pass execution statuses and metrics from rule executors to the Framework:** through the use of `RuleMonitoringService` and `RuleResultService` from within the rule execution log client for executor. `x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/client_for_executors/client.ts`
- **Fetch execution statuses and metrics from rules themselves instead of the sidecar `siem-detection-engine-rule-execution-info` saved objects**: through the use of the new function `createRuleExecutionSummary` in `x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/create_rule_execution_summary.ts`, which extracts last execution information from the rule itself.
- **Remove the siem-detection-engine-rule-execution-info saved objects type from the codebase. Mark it as deleted in Kibana Core:** added `siem-detection-engine-rule-execution-info` to `packages/core/saved-objects/core-saved-objects-migration-server-internal/src/core/unused_types.ts`; and got rid of the related Saved Object client.
- **Make sure to keep backward compatibility in the Detection API endpoints and rule execution events we write into the Event Log**: API compatibility is maintained. No breaking changes.


### Checklist

- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
kqualters-elastic pushed a commit to kqualters-elastic/kibana that referenced this pull request Feb 6, 2023
…ead of saved object (elastic#147035)

**Addresses:** elastic#130966
**Based on:** elastic#135127

## Summary

This PR deprecates the Sidecar SO of type `siem-detection-engine-rule-execution-info` in favour of storing Rule Execution Logging data within the rule itself, making use of the work previously done in the Alerting Framework:
- elastic#140882
- elastic#147278

Work done:
- **Pass execution statuses and metrics from rule executors to the Framework:** through the use of `RuleMonitoringService` and `RuleResultService` from within the rule execution log client for executor. `x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/client_for_executors/client.ts`
- **Fetch execution statuses and metrics from rules themselves instead of the sidecar `siem-detection-engine-rule-execution-info` saved objects**: through the use of the new function `createRuleExecutionSummary` in `x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/create_rule_execution_summary.ts`, which extracts last execution information from the rule itself.
- **Remove the siem-detection-engine-rule-execution-info saved objects type from the codebase. Mark it as deleted in Kibana Core:** added `siem-detection-engine-rule-execution-info` to `packages/core/saved-objects/core-saved-objects-migration-server-internal/src/core/unused_types.ts`; and got rid of the related Saved Object client.
- **Make sure to keep backward compatibility in the Detection API endpoints and rule execution events we write into the Event Log**: API compatibility is maintained. No breaking changes.


### Checklist

- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
darnautov pushed a commit to darnautov/kibana that referenced this pull request Feb 7, 2023
…ead of saved object (elastic#147035)

**Addresses:** elastic#130966
**Based on:** elastic#135127

## Summary

This PR deprecates the Sidecar SO of type `siem-detection-engine-rule-execution-info` in favour of storing Rule Execution Logging data within the rule itself, making use of the work previously done in the Alerting Framework:
- elastic#140882
- elastic#147278

Work done:
- **Pass execution statuses and metrics from rule executors to the Framework:** through the use of `RuleMonitoringService` and `RuleResultService` from within the rule execution log client for executor. `x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/client_for_executors/client.ts`
- **Fetch execution statuses and metrics from rules themselves instead of the sidecar `siem-detection-engine-rule-execution-info` saved objects**: through the use of the new function `createRuleExecutionSummary` in `x-pack/plugins/security_solution/server/lib/detection_engine/rule_monitoring/logic/rule_execution_log/create_rule_execution_summary.ts`, which extracts last execution information from the rule itself.
- **Remove the siem-detection-engine-rule-execution-info saved objects type from the codebase. Mark it as deleted in Kibana Core:** added `siem-detection-engine-rule-execution-info` to `packages/core/saved-objects/core-saved-objects-migration-server-internal/src/core/unused_types.ts`; and got rid of the related Saved Object client.
- **Make sure to keep backward compatibility in the Detection API endpoints and rule execution events we write into the Event Log**: API compatibility is maintained. No breaking changes.


### Checklist

- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.6 candidate backport:skip This commit does not require backporting release_note:enhancement Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants