Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ResponseOps]: Refactor alerting task runner - combine loadRuleAttributesAndRun() and validateAndExecuteRule() #132079

Merged
merged 6 commits into from
Jun 14, 2022

Conversation

pmuellr
Copy link
Member

@pmuellr pmuellr commented May 11, 2022

resolves #131544

Summary

Extract the loadRuleAttributesAndRun() and validateAndExecuteRule() methods from the alerting task manager, into a separate module.

Also adds a new RuleExecutionStatusErrorReasons - validate. Previously, validation errors ended up with an unknown reason, now validation is a first class error reason.

meta issue: #124206

Checklist

Delete any items that are not applicable to this PR.

@pmuellr pmuellr force-pushed the alerting/refactor-tr-131544 branch 4 times, most recently from 67b868f to ee073ca Compare May 19, 2022 19:05
@pmuellr pmuellr force-pushed the alerting/refactor-tr-131544 branch 2 times, most recently from 25a3ab2 to 4de3842 Compare May 26, 2022 19:49
…tributesAndRun() and validateAndExecuteRule()

resolves elastic#131544

Extract the `loadRuleAttributesAndRun()` and `validateAndExecuteRule()`
methods from the alerting task manager, into a separate module.

meta issue: elastic#124206
@pmuellr pmuellr force-pushed the alerting/refactor-tr-131544 branch from 4de3842 to a152bfd Compare June 2, 2022 14:22
try {
validatedParams = validateRuleTypeParams<Params>(rule.params, paramValidator);
} catch (err) {
throw new ErrorWithReason(RuleExecutionStatusErrorReasons.Validate, err);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in task_runner this was abstracted from didn't catch a validation exception, and previously ended up with a reason of Unknown set back in task runner. Now, it's an explicit reason.

@@ -183,7 +183,7 @@ export default function executionStatusAlertTests({ getService }: FtrProviderCon
await ensureAlertUpdatedAtHasNotChanged(alertId, alertUpdatedAt);
});

it('should eventually have error reason "unknown" when appropriate', async () => {
it('should eventually have error reason "validate" when appropriate', async () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned ^^^, previously validation errors returned a reason of unknown, but now return the new reason. I didn't see an obvious way to test the unknown status anymore, which I guess is kinda good!

@@ -136,6 +143,7 @@ export const rulesErrorReasonTranslationsMapping = {
license: ALERT_ERROR_LICENSE_REASON,
timeout: ALERT_ERROR_TIMEOUT_REASON,
disabled: ALERT_ERROR_DISABLED_REASON,
validate: ALERT_ERROR_VALIDATE_REASON,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the reason type is defined, this code generated an error until I added this new reason. Quite nice - I knew I'd have to do something in the UX, wasn't sure where, and didn't have to go hunting for it, just run the type checker!

@pmuellr pmuellr changed the title WIP [ResponseOps]: Refactor alerting task runner - combine loadRuleAttributesAndRun() and validateAndExecuteRule() [ResponseOps]: Refactor alerting task runner - combine loadRuleAttributesAndRun() and validateAndExecuteRule() Jun 2, 2022
@pmuellr pmuellr marked this pull request as ready for review June 2, 2022 16:22
@pmuellr pmuellr requested a review from a team as a code owner June 2, 2022 16:22
@pmuellr pmuellr added the Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) label Jun 2, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@pmuellr pmuellr added Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework v8.4.0 release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting labels Jun 2, 2022
@@ -599,12 +549,12 @@ export class TaskRunner<
params: { alertId: ruleId, spaceId },
} = this.taskInstance;
try {
const decryptedAttributes = await this.getDecryptedAttributes(ruleId, spaceId);
const decryptedAttributes = await getDecryptedAttributes(this.context, ruleId, spaceId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to get a new fake request and rules client again here? Could we pass the rules client we get back from loadRule and use it here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, ya. Good catch - we should have caught that when the method was originally added, but who would have caught it in this maze‽‽‽. We only need fakeRequest to create the rulesClient, so was able to cut out a bit more ...

I've had to now use PublicMethodsOf<RulesClient> numerous times now, so going to create an alias of that as well named RulesClientApi.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code in commit 24da47b

jest.restoreAllMocks();
});

describe('loadRule()', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could remove some of the tests inside task_runner.test.ts that test for errors when loading the rule since we have these tests here? I think as long as we have test coverage for what happens in the event of these various failures and then one test inside the task runner test for what happens when loadRule throws an error, we should be covered right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not real comfortable with that, though it would be a nice goal. The error tests I saw were also checking some other side effects of the error processing, so we'd be missing those tests.

Were there some specific tests you were thinking of? Maybe I missed some obvious ones we could remove ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just quickly looking at the tests, I was thinking:

  • validates params before running the rule type
  • uses API key when provided
  • doesn't use API key when not provided
  • recovers gracefully when the Alert Task Runner throws an exception when fetching the encrypted attributes
  • recovers gracefully when the Alert Task Runner throws an exception when license is higher than supported
  • recovers gracefully when the Alert Task Runner throws an exception when getting internal Services
  • recovers gracefully when the Alert Task Runner throws an exception when fetching attributes
  • successfully bails on execution if the rule is disabled

These all seem to test things inside the rule loading? But if you're not comfortable with it, we can save it for a later time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at validates params before running the rule type, it's testing the result of taskRunner.run(), so we'd lose the test of that. I'm not sure how critical that is, and we'd have to hunt down if we are testing all these in another way. Some of the other tests are checking the usageCounter and eventLog - again, not sure.

My current thought is that once we've boiled this module down a bit more, we'll probably see some streamlining we can do, including with the tests. So feels like not the best time to be doing this.

I can open an issue to track, we could add it to the task runner meta issue ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created #133831 to track and added to the meta issue #124206

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pmuellr
Copy link
Member Author

pmuellr commented Jun 7, 2022

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
triggersActionsUi 810.8KB 811.0KB +174.0B

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
alerting 19 20 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
alerting 38.4KB 38.4KB +22.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@@ -567,17 +521,17 @@ export class TaskRunner<
};
}

private async validateAndExecuteRule(
private async prepareAndExecuteRule(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cant we just move getExecutionHandler into executeRule and get rid of this prepareAndExecuteRule function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems possible; let me give it a go ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seemed to work fine - this was done in commit 5d00b9c

@pmuellr
Copy link
Member Author

pmuellr commented Jun 13, 2022

@elasticmachine merge upstream

@pmuellr pmuellr requested a review from ersin-erdal June 13, 2022 19:37
@pmuellr
Copy link
Member Author

pmuellr commented Jun 14, 2022

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
triggersActionsUi 831.7KB 831.9KB +174.0B

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
alerting 19 20 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
alerting 38.4KB 38.4KB +22.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Response Ops] Refactor alerting task runner - combine loadRuleAttributesAndRun() and validateAndExecuteRule()
6 participants