feat(connector): handle TaskExecution_RETRYABLE_FAILED phase #7409
Merged
Map the RETRYABLE_FAILED phase returned by connectors to PhaseInfoRetryableFailure so propeller honors the task's retry policy instead of treating the failure as permanent, and treat it as terminal in IsTerminal() to avoid an extra network call to the connector once it has returned.
Signed-off-by: Kevin Su <pingsutw@apache.org>
Pull request overview
This PR updates the WebAPI connector task plugin to correctly recognize and translate the TaskExecution_RETRYABLE_FAILED phase from connector responses, ensuring Propeller treats it as a retryable task failure (honoring task retry policy) instead of surfacing an “unknown execution phase” system error.
Changes:
- Treat TaskExecution_RETRYABLE_FAILED as terminal in ResourceWrapper.IsTerminal() to avoid extra connector Get polling once the phase is reached.
- Map TaskExecution_RETRYABLE_FAILED to core.PhaseInfoRetryableFailure(...) in Status() so the failure is retryable.
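The phase mapping can be pictured with a minimal, self-contained sketch. It does not use the real Flyte packages: Phase, Outcome, and MapPhase are hypothetical stand-ins for the generated phase enum, the plugin-side phase info, and the switch inside Status(), and treating ABORTED as a permanent failure is an assumption based on the PR description. The point is only that a phase without its own case falls through to an "unknown execution phase" error, while an explicit case lets the failure be reported as retryable.

```go
package main

import "fmt"

// Phase is a local stand-in for the connector's task execution phase enum.
// It is NOT the generated flyteidl type, and the numeric values are arbitrary.
type Phase int

const (
	Running Phase = iota
	Succeeded
	Aborted
	Failed
	RetryableFailed
)

// Outcome is a hypothetical stand-in for the plugin-side phase that
// propeller acts on.
type Outcome string

const (
	OutcomeRunning          Outcome = "running"
	OutcomeSuccess          Outcome = "success"
	OutcomePermanentFailure Outcome = "permanent-failure"
	OutcomeRetryableFailure Outcome = "retryable-failure"
)

// MapPhase mimics the shape of the switch in Status(): every known phase
// maps to an outcome, and anything unrecognized falls through to an error.
func MapPhase(p Phase) (Outcome, error) {
	switch p {
	case Running:
		return OutcomeRunning, nil
	case Succeeded:
		return OutcomeSuccess, nil
	case Aborted, Failed:
		return OutcomePermanentFailure, nil
	case RetryableFailed:
		// Without this case the phase would hit the error below.
		return OutcomeRetryableFailure, nil
	}
	return "", fmt.Errorf("unknown execution phase [%v]", p)
}

func main() {
	for _, p := range []Phase{Failed, RetryableFailed, Phase(99)} {
		out, err := MapPhase(p)
		fmt.Println(p, out, err)
	}
}
```

Removing the RetryableFailed case from the switch reproduces, in miniature, the fall-through that this PR addresses.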
Comments suppressed due to low confidence (2)
flyteplugins/go/tasks/plugins/webapi/connector/plugin.go:366
- The new TaskExecution_RETRYABLE_FAILED → PhaseInfoRetryableFailure mapping isn’t covered by the existing plugin_test.go status-phase tests (which already cover FAILED/ABORTED). Please add a test that asserts Status() returns pluginsCore.PhaseRetryableFailure for this phase so the retry behavior remains protected.
case flyteIdl.TaskExecution_FAILED:
return core.PhaseInfoFailure(errorCode, fmt.Sprintf("failed to run the job: %s", resource.Message), taskInfo), nil
case flyteIdl.TaskExecution_RETRYABLE_FAILED:
return core.PhaseInfoRetryableFailure(errorCode, fmt.Sprintf("failed to run the job: %s", resource.Message), taskInfo), nil
flyteplugins/go/tasks/plugins/webapi/connector/plugin.go:366
- Both TaskExecution_FAILED and TaskExecution_RETRYABLE_FAILED currently produce the same reason string prefix ("failed to run the job: …"), which makes it hard to tell from logs/events whether the failure is retryable vs permanent. Consider making the retryable-failure reason explicitly distinguishable (e.g., include "retryable" in the message).
case flyteIdl.TaskExecution_FAILED:
return core.PhaseInfoFailure(errorCode, fmt.Sprintf("failed to run the job: %s", resource.Message), taskInfo), nil
case flyteIdl.TaskExecution_RETRYABLE_FAILED:
return core.PhaseInfoRetryableFailure(errorCode, fmt.Sprintf("failed to run the job: %s", resource.Message), taskInfo), nil
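One way to act on this suggestion is sketched below. The helper failureReason and the exact wording "(retryable)" are hypothetical; nothing in this conversation shows that the merged code changed the message, so treat this purely as an illustration of how the two reasons could be told apart in logs.

```go
package main

import "fmt"

// failureReason is a hypothetical helper that builds the reason string for
// a failed connector job. The "(retryable)" marker is an assumed wording,
// not text taken from the plugin.
func failureReason(retryable bool, message string) string {
	if retryable {
		return fmt.Sprintf("failed to run the job (retryable): %s", message)
	}
	return fmt.Sprintf("failed to run the job: %s", message)
}

func main() {
	fmt.Println(failureReason(false, "quota exceeded"))
	fmt.Println(failureReason(true, "quota exceeded"))
}
```

Keeping the shared prefix means existing log searches for "failed to run the job" would still match both cases.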
Add a Status subtest asserting RETRYABLE_FAILED maps to PhaseRetryableFailure with the expected error message, and a table-driven TestResourceWrapper_IsTerminal that pins the terminal phase set (SUCCEEDED, FAILED, RETRYABLE_FAILED, ABORTED).
Signed-off-by: Kevin Su <pingsutw@apache.org>
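The table-driven idea in this commit can be sketched without the real plugin types. Phase, IsTerminal, terminalCases, and CheckTerminalTable below are hypothetical stand-ins, written as a plain program so the sketch runs on its own; the actual test in plugin_test.go is not shown in this conversation and may look different.

```go
package main

import "fmt"

// Phase is a local stand-in for the connector's phase enum (not flyteidl).
type Phase int

const (
	Queued Phase = iota
	Running
	Succeeded
	Aborted
	Failed
	RetryableFailed
)

// IsTerminal stands in for ResourceWrapper.IsTerminal(): it reports whether
// the connector has reached a phase after which no further Get is needed.
func IsTerminal(p Phase) bool {
	switch p {
	case Succeeded, Failed, RetryableFailed, Aborted:
		return true
	}
	return false
}

// terminalCases pins the expected terminal set, one row per phase.
var terminalCases = []struct {
	name  string
	phase Phase
	want  bool
}{
	{"queued", Queued, false},
	{"running", Running, false},
	{"succeeded", Succeeded, true},
	{"aborted", Aborted, true},
	{"failed", Failed, true},
	{"retryable failed", RetryableFailed, true},
}

// CheckTerminalTable returns one message per row whose result differs
// from the expectation; an empty slice means every row passed.
func CheckTerminalTable() []string {
	var failures []string
	for _, tc := range terminalCases {
		if got := IsTerminal(tc.phase); got != tc.want {
			failures = append(failures,
				fmt.Sprintf("%s: got %v, want %v", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	if failures := CheckTerminalTable(); len(failures) > 0 {
		fmt.Println(failures)
		return
	}
	fmt.Println("all terminal-phase rows pass")
}
```

In a real _test.go file the same table would typically be iterated with t.Run(tc.name, ...) from the standard testing package, so each phase shows up as its own subtest.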
wild-endeavor
approved these changes
May 21, 2026
Tracking issue
Why are the changes needed?
The webapi connector plugin currently only handles TaskExecution_FAILED and TaskExecution_ABORTED as failure phases. The TaskExecution_RETRYABLE_FAILED phase (value 8 in flyteidl2/core/execution.proto) is silently ignored, falling through to the default branch which returns an "unknown execution phase" system error. As a result, a connector that legitimately reports a retryable failure surfaces as a system error to propeller and the task's retry policy is never honored.
What changes were proposed in this pull request?
In flyteplugins/go/tasks/plugins/webapi/connector/plugin.go:
- ResourceWrapper.IsTerminal() now treats TaskExecution_RETRYABLE_FAILED as terminal alongside SUCCEEDED/FAILED/ABORTED, so the framework skips an extra Get round-trip to the connector once it has returned this phase.
- Status maps TaskExecution_RETRYABLE_FAILED to core.PhaseInfoRetryableFailure(...), so propeller retries per the task's retry policy instead of marking the node as permanently failed.
How was this patch tested?
Built locally with go build ./flyteplugins/go/tasks/plugins/webapi/connector/.... No existing tests covered the previous default-branch behavior; happy to add a unit test for both IsTerminal() and the phase mapping if reviewers prefer.
Labels
Setup process
N/A
Screenshots
N/A
Check all the applicable boxes
Related PRs
Stack
Docs link