Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancer and HealthUtil now differentiate between fatal and transient errors #1427

Merged
merged 5 commits into from
Mar 18, 2020

Conversation

zen-dog
Copy link
Contributor

@zen-dog zen-dog commented Mar 17, 2020

Summary:
previously any error returned by the Enhancer was treated as fatal. It made sense because it would act deterministically on a prepared set of resources. However, since #1319 enhancer also uses the discovery interface to determine whether or not the given resource is namespaced. These API requests may fail and must be retried. With this fix, DefaultEnhancer clearly marks certain errors as fatal.

Fixes #1413

Signed-off-by: Aleksey Dukhovniy alex.dukhovniy@googlemail.com

Summary:
previously any error returned by the `DefaultEnhancer` was treated as fatal. It made sense because it would act deterministically on a prepared set of resources. However, since #1319 enhancer also uses discovery interface to determine whether or not given resource is namespaced. These API requests may fail and must be retried, hence the introduction of a transient error.

Fixes #1413

Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
Removed `engine.ErrTransientExecution` as all non-fatal errors are now treated as transient

Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
@zen-dog zen-dog changed the title Fixing for enhancer's transient errors Enhancer now differentiates between fatal and transient errors Mar 17, 2020
Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
…at are not recoverable

Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
@@ -83,6 +70,10 @@ func IsHealthy(obj runtime.Object) error {
log.Printf("HealthUtil: Job \"%v\" is marked healthy", obj.Name)
return nil
}
if terminal, msg := isJobTerminallyFailed(obj); terminal {
return fmt.Errorf("%wHealthUtil: Job \"%v\" has failed terminally: %s", engine.ErrFatalExecution, obj.Name, msg)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we use engine.ErrFatalExecution marker to denote health errors that are likely not recoverable. The caller may or may not act upon this information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I don't like about it is that ErrFatalExecution is defined as errors.New("fatal error: ") and that fatal error: string breaks error message readability.

@@ -58,8 +60,8 @@ func (at ApplyTask) Run(ctx Context) (bool, error) {
// 4. - Check health for all resources -
err = isHealthy(applied)
if err != nil {
if fatal := isTerminallyFailed(applied); fatal != nil {
return false, fatalExecutionError(fatal, failedTerminalState, ctx.Meta)
if errors.Is(err, engine.ErrFatalExecution) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We return a type engine.ExecutionError struct if we encountered a fatal health error.

@@ -239,16 +241,6 @@ func isHealthy(ro []runtime.Object) error {
return nil
}

func isTerminallyFailed(ro []runtime.Object) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably this is a better separation of concerns: determining whether a resource health error is terminal is better left to the HealthUtil

@zen-dog zen-dog changed the title Enhancer now differentiates between fatal and transient errors Enhancer and HealthUtil now differentiate between fatal and transient errors Mar 17, 2020
Copy link
Member

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is that (AFAICT) the unit of retrying is the whole Apply operation, which runs on potentially dozens of templates/objects. This might amplify transient issues and not really help, unless we have caching for IsNamespacedObject 🤔

pkg/engine/health/health.go Outdated Show resolved Hide resolved
@@ -193,5 +196,5 @@ func (k *testEnhancer) Apply(templates map[string]string, metadata renderer.Meta
type errorEnhancer struct{}

func (k *errorEnhancer) Apply(templates map[string]string, metadata renderer.Metadata) ([]runtime.Object, error) {
return nil, errors.New("always error")
return nil, fmt.Errorf("%wsomething bad happens", engine.ErrFatalExecution)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("%wsomething bad happens", engine.ErrFatalExecution)
return nil, fmt.Errorf("%wsomething bad happens every time", engine.ErrFatalExecution)

Signed-off-by: Aleksey Dukhovniy <alex.dukhovniy@googlemail.com>
@zen-dog
Copy link
Contributor Author

zen-dog commented Mar 18, 2020

This might amplify transient issues and not really help, unless we have caching for IsNamespacedObject

Correct. Thought fatally failing a plan, because an API request timeout is less than ideal too. Caching for the rescue!

@zen-dog zen-dog merged commit b5f1f29 into master Mar 18, 2020
@zen-dog zen-dog deleted the ad/transient-enhancer-err branch March 18, 2020 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

failed to determine if [Service] is namespaced: "server could not find the requested resource"
3 participants