Calling abort rather than finalize on permanent failure #449

hamersaw · 2022-06-03T01:21:58Z

TL;DR

Currently FlytePropeller calls "finalize" rather than "abort" on permanent failures. Consequently, during instances where a resource is actively running we fail to abort it. For example when the number of timeouts exceeds the maximum number of retries we automatically transition from a retryable failure to a permanent failure.

Type

Bug Fix
Feature
Plugin

Are all requirements met?

Complete description

I do not believe there are any unintended side-effects by calling abort. If a node has not yet been executed (ex. dynamic, subworkflow, etc) the abort should still be successful. If this is not the case, we will need to revisit this logic to ensure we fail correctly.

Tracking Issue

fixes flyteorg/flyte#2298

Follow-up issue

NA

Signed-off-by: Daniel Rammer <daniel@union.ai>

codecov · 2022-06-03T01:27:08Z

Codecov Report

Merging #449 (65abf93) into master (ac88f95) will decrease coverage by 0.00%.
The diff coverage is 42.85%.

❗ Current head 65abf93 differs from pull request most recent head 1069fe2. Consider uploading reports for the commit 1069fe2 to get more accurate results

EngHabu · 2022-06-03T02:11:23Z

pkg/controller/nodes/executor.go

@@ -659,7 +659,7 @@ func (c *nodeExecutor) handleNode(ctx context.Context, dag executors.DAGStructur

 	if currentPhase == v1alpha1.NodePhaseFailing {
 		logger.Debugf(ctx, "node failing")
-		if err := c.finalize(ctx, h, nCtx); err != nil {


Should we, instead, change finalize implementation in webapi to call delete?

Having both finalize and abort has always bothered me.. I think we only need one (perhaps finalize)... If you look at the implementations in all plugins, the code is pretty much the same... And when it isn't, it's usually a bug waiting to happen

I can certainly do that. I had two thoughts:
(1) The plugin abstraction, especially since users may write there own, makes it difficult to know if this issue exists in other plugins as well as the webapi async. On a permanent failure we should probably be attempting to abort anyways right? If we make the change in the webapi plugin are we potentially missing out on this fix in other plugins?
(2) I agree the two calls is confusing - especially when they are very similar. Since the existing node-level abort implementation calls both Abort and Finaliz if we introduce a Delete call in the webapi plugin are there any concerns about handling ResourceNotFound errors in webapi plugins? Just want to make sure we are not going to introduce downstream issues in custom plugins. This should already be handled right?

EngHabu

👍🏽

Signed-off-by: Daniel Rammer <daniel@union.ai>

calling abort rather than finalize on permanent failure

1069fe2

Signed-off-by: Daniel Rammer <daniel@union.ai>

hamersaw requested review from kumare3 and EngHabu as code owners June 3, 2022 01:21

EngHabu reviewed Jun 3, 2022

View reviewed changes

EngHabu approved these changes Jun 6, 2022

View reviewed changes

hamersaw merged commit ac293ed into master Jun 6, 2022

hamersaw deleted the bug/abort-on-permanent-failure branch June 6, 2022 22:30

eapolinario pushed a commit to eapolinario/flytepropeller that referenced this pull request Aug 9, 2023

calling abort rather than finalize on permanent failure (flyteorg#449)

d0a8c57

Signed-off-by: Daniel Rammer <daniel@union.ai>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calling abort rather than finalize on permanent failure #449

Calling abort rather than finalize on permanent failure #449

hamersaw commented Jun 3, 2022 •

edited

codecov bot commented Jun 3, 2022

EngHabu Jun 3, 2022

hamersaw Jun 3, 2022

EngHabu left a comment

Calling abort rather than finalize on permanent failure #449

Calling abort rather than finalize on permanent failure #449

Conversation

hamersaw commented Jun 3, 2022 • edited

TL;DR

Type

Are all requirements met?

Complete description

Tracking Issue

Follow-up issue

codecov bot commented Jun 3, 2022

Codecov Report

EngHabu Jun 3, 2022

Choose a reason for hiding this comment

hamersaw Jun 3, 2022

Choose a reason for hiding this comment

EngHabu left a comment

Choose a reason for hiding this comment

hamersaw commented Jun 3, 2022 •

edited