Job back-offs get reset on controller-manager restart #114650
@sathyanarays: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/wg batch
There is a more immediate problem for me. I think the solution is somewhat simple: keep track of the elapsed time within the sync, instead of in the workqueue, and restrict pod creation specifically. The time should be measured based on the Pod status, to be resilient to restarts. But even if we can't do that immediately, just tracking it in memory shouldn't be too bad to start.
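A rough sketch of that idea, with hypothetical helper names (the real job controller code differs): derive the remaining back-off from the failure count and the last failed pod's termination timestamp, both of which are persisted in the API server and therefore survive a controller-manager restart.

```go
package jobbackoff

import (
	"time"

	v1 "k8s.io/api/core/v1"
)

// backoffRemaining is a hypothetical helper: given the number of failed
// pods observed for a Job and the finish time of the most recent failure,
// it returns how much longer to wait before creating a replacement pod.
// Because both inputs are derived from persisted Pod status, the result
// is the same before and after a controller-manager restart.
func backoffRemaining(failures int, lastFailure time.Time, base, maxDelay time.Duration) time.Duration {
	if failures == 0 {
		return 0
	}
	// Exponential back-off: base * 2^(failures-1), capped at maxDelay.
	delay := base << uint(failures-1)
	if delay <= 0 || delay > maxDelay { // "<= 0" guards against shift overflow
		delay = maxDelay
	}
	if remaining := delay - time.Since(lastFailure); remaining > 0 {
		return remaining
	}
	return 0
}

// lastTerminationTime returns the most recent container finish time from a
// failed pod's status, or the zero time if no container has terminated.
func lastTerminationTime(pod *v1.Pod) time.Time {
	var t time.Time
	for _, cs := range pod.Status.ContainerStatuses {
		if term := cs.State.Terminated; term != nil && term.FinishedAt.Time.After(t) {
			t = term.FinishedAt.Time
		}
	}
	return t
}
```

The sync handler would then requeue the Job after backoffRemaining(...) instead of relying on the workqueue's in-memory failure counter.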
cc @mimowo
@alculquicondor, @mimowo, please provide early feedback on whether these changes are in the right direction!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/close
I can't remember where we said this, but I think we agreed that handling this is overkill.
@alculquicondor: Closing this issue.
In response to this:
> /close I can't remember where we said this, but I think we agreed that handling this is overkill.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened?
Exponential back-off kicks in when a pod fails. The back-off logic is offloaded to the in-memory workqueue data structure. On controller-manager restarts, the information in the workqueue is lost, so the back-off state is lost and the back-off is calculated as if the job were brand new.

What did you expect to happen?
Pods to be created with the correct back-offs even when the controller-manager restarts.
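For context on the mechanism described under "What happened?", here is a minimal sketch of how the per-item back-off state lives only in process memory, assuming the client-go workqueue rate limiter (the key and delays are illustrative):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// The job controller delays retries through an in-memory rate
	// limiter like this one; none of its state is persisted anywhere.
	limiter := workqueue.NewItemExponentialFailureRateLimiter(
		10*time.Second, // base delay after the first failure
		6*time.Minute,  // cap on the delay
	)

	key := "default/my-job" // illustrative queue key

	for i := 0; i < 4; i++ {
		// Each call records one more failure for the key and returns
		// the next delay: 10s, 20s, 40s, 80s, ...
		fmt.Println(limiter.When(key))
	}

	// If the controller-manager restarts at this point, the limiter is
	// rebuilt empty, so the next When(key) returns the 10s base delay
	// again: exactly the behavior reported in this issue.
}
```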
How can we reproduce it (as minimally and precisely as possible)?
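The original reproduction steps are not preserved in this copy; the following is a plausible sketch inferred from the report (the manifest and all names in it are hypothetical). Create a Job whose pods always fail, let a few back-off cycles elapse, restart the kube-controller-manager, and compare the delay before the next pod is created:

```yaml
# Hypothetical Job whose pods always fail, used to exercise back-off.
apiVersion: batch/v1
kind: Job
metadata:
  name: backoff-repro
spec:
  backoffLimit: 6
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fail
        image: busybox
        command: ["sh", "-c", "exit 1"]
```

Before the restart, the gaps between successive failed pods should grow exponentially (roughly 10s, 20s, 40s, ... by default); after the restart, the next pod appears after the initial delay rather than the accumulated one.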
Anything else we need to know?
No response
Kubernetes version
In master as of commit d2504c9
Also present in the latest release.
Cloud provider
N/A
OS version
No response
Install tools
No response
Container runtime (CRI) and version (if applicable)
No response
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response