Throw an error from jobReady() if the job exceeds its BackoffLimit #9950
Closes helm#9285 Signed-off-by: Rosenberg, Jeff <jeff.rosenberg@icfnext.com>
In local testing, this speeds up how quickly we fail, but not yet to the extent I'd like. This still waits for all deployments to be ready, and then once the failed job is the last thing we're waiting for, it fails immediately rather than spinning on the failed job until reaching the timeout. That's still a significant improvement, but I'd love to figure out if there's a way to fail the install immediately on job failure.
Yes, that is because of the ResourceList ordering: https://github.com/helm/helm/blob/main/pkg/kube/wait.go#L54 Test cases passed locally as expected. I think this PR is sane and fixes the original issue.
Thanks for the review @zonggen; FYI, I've resolved the merge conflict.
lgtm, but I'm not an approver.
End user test Job:
Baseline Test: Released Helm version 3.9.1
Summary: Job failed but Helm waited until timeout before exiting. Not the desired response but expected.
PR Test
Summary: Helm exited correctly within timeout period
Is there another blocker for this PR? I could help if needed :)
This PR works for me. 👍
@mattfarina, we need your help on this PR.
@jouve, we need your help on this PR.
/assign mattfarina
Closes #9285
Signed-off-by: Rosenberg, Jeff <jeff.rosenberg@icfnext.com>
What this PR does / why we need it:
As noted in #9285, if `--wait-for-jobs` is specified and a job exceeds its backoff limit, Helm will always wait for the full timeout before failing, even though there is no chance of the job succeeding. This PR modifies the `jobReady()` function to return an error. Based on the comments on k8s.io/apimachinery/pkg/util/wait#PollImmediateUntil, it appears that returning an error should stop that function from polling.
Special notes for your reviewer:
I've updated the tests in `ready_test.go`, but there is no test coverage in `client_test.go` to validate that this works as expected. Ideally this would have a test in `client_test.go`, but these comments indicate that there haven't been tests for waits since the feature was introduced.