Store failed execution in flyteadmin #4390
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4390      +/-   ##
==========================================
+ Coverage   59.24%   59.74%   +0.49%
==========================================
  Files         618      636      +18
  Lines       52334    53870    +1536
==========================================
+ Hits        31007    32184    +1177
- Misses      18859    19154     +295
- Partials     2468     2532      +64
Force-pushed from a64362b to 0c2ade1.
flyteadmin/pkg/errors/errors.go (outdated)

// ExecutionRuntimeError is a special error that can be returned by plugins denoting that
// execution failed during runtime and should still be saved to database
type ExecutionRuntimeError struct {
Is this specific to Create calls, and if so, should it be named accordingly, i.e., CreateExecutionRuntimeError? Otherwise I imagine most execution errors will be runtime errors (conceptually).
I don't think it has to be specific to Create calls. The word "Runtime" means the failure happened while actually executing, no matter how the execution was triggered.
Instead of creating a new error type here and checking for it with errors.As, does it make sense to add a code to the FlyteAdmin error codes for "BuildFailure" or something and just wrap the build error here? It seems a little weird to me that we're just checking whether the error contains a flyteidl.core.ExecutionError and using that to indicate that we failed to correctly build the workflow.
Sorry, I'm not following. What do you actually suggest changing? Can you elaborate in more detail? In Union Cloud, we have our own executor plugin, which flyteadmin invokes during execution creation. It's shown in this PR. I have wrapped flyteidl.core.ExecutionError because this is what is needed to store a failed execution in the ExecutionClosure in the database. And I think it's the plugin's responsibility to explicitly set the code, message and kind on this flyteidl.core.ExecutionError, rather than have flyteadmin deduce them from the error it receives.
IIUC the goal here is that every time the build process fails, we write a record to the DB indicating that the build process failed. In this scenario, we are only performing this failure maintenance when the error contains a flyteidlcore.ExecutionError. This covers the case in your PR because you are explicitly throwing the correct error type. Alternatively, we could wrap this error, similar to how we're doing elsewhere, with a code that indicates a build error. Then every time the build process fails, we persist it in the DB.
There are plenty of other ways to handle this as well, but I'm just wondering whether we care about all failure scenarios, or just the one where you're explicitly throwing (which may be more accurately captured using error codes rather than types). If this is not the intended behavior, then disregard my comments. Also, I don't feel strongly about this, so no need to overanalyze.
I refactored this to store all failed executions in the database. Please take a look and let me know what you think.
DISCLAIMER: I am not experienced with the flyteadmin codebase, so I may not understand all the consequences of storing ALL failed executions. For example, are there any clients expecting to receive an immediate error when an execution is invalid? This change now has a bigger scope than my initial intent of just storing task validation errors, so don't expect me to understand all the implications of the PR, at least at this moment 😄
Force-pushed from 523c606 to 1c8f6ee.
Force-pushed from 00c025f to 14f534a.
Signed-off-by: Iaroslav Ciupin <iaroslav@union.ai>
Force-pushed from 14f534a to 1888220.
Signed-off-by: Iaroslav Ciupin <iaroslav@union.ai>
Force-pushed from 4999c2a to fdc273c.
Describe your changes
When creating an execution fails, store the execution model in the database with the corresponding error.