Do not bump API requests backoff in the Job controller due to pod failures #118759
/assign @alculquicondor
9fcbb2c to 784a309
/test pull-kubernetes-e2e-kind
I'm not sure the release note is very user friendly. Maybe something simpler like: "Reduce delay when processing jobs after a transient API error"
failed int
}

func TestJobBackoffReset(t *testing.T) {
Can you summarize why this test is changing so drastically?
This test checks that the rate-limiter backoff in the syncJob queue is reset after a successful execution. To do so, it used to add an item with AddRateLimited based on the error caused by pod failures, and then checked that the queue is emptied after a succeeded pod (essentially, that Forget is called).
However, pod failures no longer enqueue the job in the rate limiter. Still, the queue is emptied after a successful syncJob, so it makes sense to preserve the check that the rate limiter is emptied (that Forget is called).
Additionally, the test used to run in two variants (parallelism=1 and parallelism=2); I don't think that matters for the current scenario.
t.Errorf("%s: unexpected job failure", name)
}
// the queue is emptied on success
fakePodControl.Err = nil
Should this be a different test case instead?
I don't think so; this is part of the scenario. First an element is put into the queue (due to an error), and here we empty the queue through a successful sync (so Forget is called).
/lgtm
/approve
LGTM label has been added. Git tree hash: baae73c0691b893c019f7667df86c2fd39d1da92
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor, mimowo
What type of PR is this?
/kind cleanup
/kind bug
What this PR does / why we need it:
Pod failures shouldn't increase the backoff for API requests, as they have been counted independently since #114768 and have used different constants since #118615.
Which issue(s) this PR fixes:
Part of #118527
Special notes for your reviewer:
Still, pod failures bump the exponential pod failure backoff delay used for pod recreations.
Also, clean up the stale expectations for expectedForGetKey, as syncJob no longer returns forget since #114768.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: