Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement wanted error handling for alerting and actions #41917

Merged
merged 13 commits into from
Jul 30, 2019

Conversation

mikecote
Copy link
Contributor

@mikecote mikecote commented Jul 24, 2019

This PR implements what is wanted for error handling in alerting and actions as outlined in #39349.

The following changes have been made:

  • Action types have a new maxAttempts attribute that is optional
  • Executions returning status of error will be able to recommend retry logic via retry attribute
  • Executors are no longer passed scheduledRunAt and previousScheduledRunAt. Those are replaced with startedAt and previousStartedAt.

@mikecote mikecote self-assigned this Jul 24, 2019
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-stack-services

@elasticmachine

This comment has been minimized.

@mikecote mikecote force-pushed the alerting/error-handling branch 3 times, most recently from 88012ae to d145f25 Compare July 26, 2019 16:47
@elasticmachine

This comment has been minimized.

@elasticmachine

This comment has been minimized.

@mikecote mikecote marked this pull request as ready for review July 29, 2019 13:28
@mikecote mikecote requested a review from a team July 29, 2019 13:28
@elasticmachine

This comment has been minimized.

return error.retry == null ? true : error.retry;
}
// Retry other kinds of errors
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to have this return false by default. Idea being that an action will know when something is retry-able - setting retry to true or a Date - and any other case, should not be retryable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess by looking here https://github.com/elastic/kibana/blob/master/x-pack/legacy/plugins/actions/server/lib/execute.ts#L29-L74 any code that could throw here is already being handled and wrapped into a proper ExecutionError. Some may need to return retry: false ex validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and made it false for other kinds of errors. I can't think of a reason to retry totally unhandled errors.

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a few comments - just one in the review, a few "single" comments.

My only real concern is defaulting getRetry() to true in x-pack/legacy/plugins/actions/server/action_type_registry.ts; perhaps I'm not thinking about that right. And we could always tweak it later ...

@@ -85,23 +94,22 @@ test('successfully executes the task', async () => {
const runnerResult = await runner.run();
expect(runnerResult).toMatchInlineSnapshot(`
Object {
"runAt": 2019-06-03T18:55:30.982Z,
"runAt": 1970-01-01T00:00:10.000Z,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting the date is set to such an old epoch-y date. My only thought is that at some point in the future, if we ended up needing to calculate a previous date from this, somehow, the epoch millis might go negative. Which node should handle, but I wonder if some other part of the system might choke on negative numbers.

$ node -p 'new Date(-1)'
1969-12-31T23:59:59.999Z

OTOH, I guess would actually be a good test, so ... leave it in!?!?!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems to be what happens when you use fake timers, everything becomes new Date(0) by default 🤔

@elasticmachine

This comment has been minimized.

Copy link
Contributor

@bmcconaghy bmcconaghy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, just wondering about defaulting to retrying.

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@mikecote mikecote merged commit 9a321ae into elastic:master Jul 30, 2019
mikecote added a commit to mikecote/kibana that referenced this pull request Jul 30, 2019
)

* Implement wanted error handling

* Cleanup

* Add retry logic to actions

* Leverage startedAt from task manager

* Fix broken jest tests

* Add missing unit test

* Add unit tests for getRetry

* Add test for rate limit

* Remove fake timers

* Don't retry errors by default for actions unless the proper result format is returned with status: error

* Don't retry unless attribute specified

* Fix tests
mikecote added a commit to mikecote/kibana that referenced this pull request Jul 30, 2019
)

* Implement wanted error handling

* Cleanup

* Add retry logic to actions

* Leverage startedAt from task manager

* Fix broken jest tests

* Add missing unit test

* Add unit tests for getRetry

* Add test for rate limit

* Remove fake timers

* Don't retry errors by default for actions unless the proper result format is returned with status: error

* Don't retry unless attribute specified

* Fix tests
mikecote added a commit that referenced this pull request Jul 30, 2019
… (#42255)

*  Implement wanted error handling for alerting and actions (#41917)

* Implement wanted error handling

* Cleanup

* Add retry logic to actions

* Leverage startedAt from task manager

* Fix broken jest tests

* Add missing unit test

* Add unit tests for getRetry

* Add test for rate limit

* Remove fake timers

* Don't retry errors by default for actions unless the proper result format is returned with status: error

* Don't retry unless attribute specified

* Fix tests

* Increase retry timeout to prevent flaky tests (#42291)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes review v7.4.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants