Fix job backoffLimit #63990
looks good
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: janetkuo, kow3ns. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
New changes are detected. LGTM label has been removed.
The test seems to be flaky. Sometimes there is only 1 failed pod when the job allows 1 retry (there should be 2 failed pods). I wasn't able to reproduce it locally, so I added a temporary commit for logging.
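For reference, the expectation the test checks can be written down as a minimal sketch. This is not the controller's or the e2e test's actual code: `expectedFailedPods` and `exceedsBackoffLimit` are hypothetical helpers, and the sketch assumes a Job whose pods always fail, run one at a time, with each failure producing a new failed pod rather than an in-place container restart.

```go
package main

import "fmt"

// expectedFailedPods returns how many failed pods an always-failing Job
// should leave behind: the first attempt plus one pod per allowed retry.
// Hypothetical helper for illustration, not part of Kubernetes.
func expectedFailedPods(backoffLimit int32) int32 {
	return backoffLimit + 1
}

// exceedsBackoffLimit reports whether the number of observed failed pods
// has gone past the allowed number of retries, i.e. the point at which
// the Job should be marked failed and stop creating pods.
// Hypothetical helper for illustration, not part of Kubernetes.
func exceedsBackoffLimit(failedPods, backoffLimit int32) bool {
	return failedPods > backoffLimit
}

func main() {
	// backoffLimit = 1 means "retry once": two failed pods in total.
	fmt.Println(expectedFailedPods(1))

	// One failed pod with backoffLimit = 1: a retry is still allowed.
	fmt.Println(exceedsBackoffLimit(1, 1))

	// Two failed pods with backoffLimit = 1: the limit is exceeded.
	fmt.Println(exceedsBackoffLimit(2, 1))
}
```

Under these assumptions, the flaky runs correspond to the Job being marked failed while only one failed pod exists, which is one pod short of what `expectedFailedPods(1)` gives.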
/retest Unrelated failures
From the test of a failing Job with backoffLimit = 1 (retry once), I saw two different kinds of failures:
I think Job backoffLimit is still broken with the fix.
Any status on this PR? We've got jobs with activeDeadlineSeconds set to 86400 and backoffLimit set to 3. If they fail immediately in 1.10.X, they can spawn enough failed jobs to take down the cluster. A partial fix may be better than no fix.
/milestone v1.11
/retest
Raising priority so that this stays in 1.11. /priority critical-urgent
[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process Pull Request Labels
/test pull-kubernetes-e2e-gce
/test pull-kubernetes-e2e-kops-aws
/hold holding in favor of #63650
/close
What this PR does / why we need it:
Job backoffLimit is broken in 1.10, which is a regression. This PR does:
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #62382
Special notes for your reviewer:
Release note: