Health status of Jobs is incomplete #1367

Closed
ANeumann82 opened this issue Feb 27, 2020 · 1 comment · Fixed by #1370

@ANeumann82
Member

What happened:
I'm currently working on the Cassandra Backup. This includes a plan that is triggered to execute the backups.
This plan deploys one or more Jobs that do the backup. They are not supposed to run multiple times and therefore have a backoffLimit of 0. So when the backup process fails, the Job has a status of Failed and will never complete.

KUDO currently waits for the Job to be healthy and have a status of Success. As a result, the plan never completes or fails; it stays in IN_PROGRESS indefinitely.

What you expected to happen:
KUDO should acknowledge the fact that the job has failed and will not recover and set the step status to FATAL_ERROR with a descriptive error message.
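
To illustrate the expected behavior, here is a rough sketch of such a health check. It is not KUDO's actual code: `JobCondition` is a simplified, hypothetical stand-in for the Kubernetes batch/v1 Job condition type, and `jobStepStatus` and the `StepStatus` constants are names made up for this example. It assumes a terminally failed Job carries a condition of type `Failed` with status `True`, and a finished one a condition of type `Complete`.

```go
package main

import "fmt"

// JobCondition is a simplified, hypothetical stand-in for the Kubernetes
// batch/v1 JobCondition type, reduced to the fields this sketch needs.
type JobCondition struct {
	Type    string // e.g. "Complete" or "Failed"
	Status  string // "True", "False" or "Unknown"
	Reason  string // e.g. "BackoffLimitExceeded"
	Message string
}

// StepStatus is a hypothetical result type; the names are illustrative only.
type StepStatus string

const (
	StepInProgress StepStatus = "IN_PROGRESS"
	StepComplete   StepStatus = "COMPLETE"
	StepFatalError StepStatus = "FATAL_ERROR"
)

// jobStepStatus maps Job conditions to a step status. A Job with an active
// "Failed" condition is treated as terminal instead of being waited on.
func jobStepStatus(jobName string, conditions []JobCondition) (StepStatus, string) {
	for _, c := range conditions {
		if c.Status != "True" {
			continue // only conditions that currently hold are relevant
		}
		switch c.Type {
		case "Complete":
			return StepComplete, ""
		case "Failed":
			msg := fmt.Sprintf("job %q failed and will not recover: %s: %s",
				jobName, c.Reason, c.Message)
			return StepFatalError, msg
		}
	}
	// Neither terminal condition is set, so the Job is still running.
	return StepInProgress, ""
}

func main() {
	failed := []JobCondition{{
		Type:    "Failed",
		Status:  "True",
		Reason:  "BackoffLimitExceeded",
		Message: "Job has reached the specified backoff limit",
	}}
	status, msg := jobStepStatus("cassandra-backup", failed)
	fmt.Println(status, msg)
}
```

The point of the sketch is only that a failed Job gets its own terminal branch with a descriptive message, rather than falling through to the "still in progress" case.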

How to reproduce it (as minimally and precisely as possible):

  • Create a plan that deploys a Job that has a backoffLimit of 0 and fails.

@ANeumann82
Member Author

Hmmm. I'm not sure if we should use FATAL_ERROR in this case. I think there's a difference between a KUDO FATAL_ERROR and an unrecoverable error outside of KUDO. Maybe a new error code?
