Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Added IsFailurePermanent flag on DynamicTaskStatus #567

Merged
merged 8 commits into from
Jul 6, 2023

Conversation

hamersaw
Copy link
Contributor

@hamersaw hamersaw commented May 25, 2023

TL;DR

Currently dynamic tasks treat all failures and retryable which results in unintended behavior when subtasks are attempted multiple times. This PR adds a IsFailurePermanent field on the DynamicTaskStatus struct to indicate that the failure is permanent, and therefore should not be recovered.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

The choice to use a new flag is entirely out of backwards compatibility. Other options include:
(1) Using the ExecutionError in flyteplugins io rather than flyteidl ExecutionError: This would make retrieval of in progress dynamic task status fail when it expects a different error type on the field.
(2) Naming the flag IsRecoverable vs isPermanent: This is difficult because intuition says that we should use IsRecoverable which adheres more to the Flyte standard. The biggest issue is that dynamics treat all failures as retryable by default now, so this would change the default to be non-recoverable (ie. false for IsRecoverable) which changes the default behavior. It is VERY difficult to ensure there will be no regressions.

Tracking Issue

fixes flyteorg/flyte#3606

Follow-up issue

NA

Signed-off-by: Daniel Rammer <daniel@union.ai>
@codecov
Copy link

codecov bot commented May 25, 2023

Codecov Report

Merging #567 (8e18bfe) into master (206013a) will increase coverage by 0.39%.
The diff coverage is 33.33%.

❗ Current head 8e18bfe differs from pull request most recent head f450131. Consider uploading reports for the commit f450131 to get more accurate results

Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Daniel Rammer <daniel@union.ai>
@hamersaw hamersaw marked this pull request as ready for review May 25, 2023 17:51
Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Daniel Rammer <daniel@union.ai>
Signed-off-by: Daniel Rammer <daniel@union.ai>
@hamersaw hamersaw merged commit 0cfb0a0 into master Jul 6, 2023
14 checks passed
@hamersaw hamersaw deleted the bug/dynamic-task-retries branch July 6, 2023 14:25
eapolinario pushed a commit to eapolinario/flytepropeller that referenced this pull request Aug 9, 2023
* added IsFailurePermanent flag on DynamicTaskStatus

Signed-off-by: Daniel Rammer <daniel@union.ai>

* fixed linter

Signed-off-by: Daniel Rammer <daniel@union.ai>

* cleaned up retryable vs permanent reporting

Signed-off-by: Daniel Rammer <daniel@union.ai>

* make generate

Signed-off-by: Daniel Rammer <daniel@union.ai>

---------

Signed-off-by: Daniel Rammer <daniel@union.ai>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
2 participants