[AWS] Failed ELB creation retried indefinitely after LB service deletion #17790
Comments
We experienced this with Kubernetes 1.1.1 this morning, and the infinite retry coupled with the aggressiveness of the retry caused other parts of our system to have difficulties. Limiting the number of retries and exponentially backing off on subsequent retries would be fantastic. I'm willing to submit a PR to fix this if provided some guidance on whether my suggestions would be acceptable.
+1 Experienced this on k8s 1.1.1 - a previous occurrence caused AWS to disable ELB creation on a particular account. Exponential cool down timer seems to be what AWS recommends but at the very least this should have a retry limit and a default delay between retries.
+1 on this. @kelcecil Is the PR already submitted for this?
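The two suggestions above (a retry limit plus an exponentially growing delay) can be sketched roughly as follows. This is only an illustration, not the controller's actual code: `backoffConfig`, `delayFor`, `retryWithBackoff`, `noSleep`, and the simulated `createELB` function are all hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRetriesExhausted is returned once every allowed attempt has failed.
var errRetriesExhausted = errors.New("retries exhausted")

// backoffConfig holds hypothetical retry settings (not real Kubernetes flags).
type backoffConfig struct {
	Initial     time.Duration // delay after the first failure
	Max         time.Duration // upper bound on any single delay
	MaxAttempts int           // total number of calls allowed
}

// delayFor returns the wait after failed attempt n (0-based):
// Initial doubled n times, capped at Max.
func (c backoffConfig) delayFor(n int) time.Duration {
	d := c.Initial
	if d > c.Max {
		return c.Max
	}
	for i := 0; i < n; i++ {
		d *= 2
		if d >= c.Max {
			return c.Max
		}
	}
	return d
}

// noSleep is a stand-in sleeper for callers that do not want to wait.
func noSleep(time.Duration) {}

// retryWithBackoff calls op until it succeeds or MaxAttempts is reached.
// It returns the number of calls made. sleep is injected so that callers
// can substitute time.Sleep or a no-op.
func retryWithBackoff(c backoffConfig, op func() error, sleep func(time.Duration)) (int, error) {
	var lastErr error
	for attempt := 0; attempt < c.MaxAttempts; attempt++ {
		if lastErr = op(); lastErr == nil {
			return attempt + 1, nil
		}
		// Only wait if another attempt will follow.
		if attempt < c.MaxAttempts-1 {
			sleep(c.delayFor(attempt))
		}
	}
	return c.MaxAttempts, fmt.Errorf("%w after %d attempts: %v", errRetriesExhausted, c.MaxAttempts, lastErr)
}

func main() {
	cfg := backoffConfig{Initial: 100 * time.Millisecond, Max: 2 * time.Second, MaxAttempts: 5}
	calls := 0
	// createELB simulates a cloud call that fails twice, then succeeds.
	createELB := func() error {
		calls++
		if calls < 3 {
			return errors.New("simulated ELB creation failure")
		}
		return nil
	}
	attempts, err := retryWithBackoff(cfg, createELB, time.Sleep)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Capping the delay at `Max` keeps the wait bounded, while `MaxAttempts` guarantees the loop eventually gives up instead of hammering the cloud API.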
@kelcecil Sure, a backoff PR sounds like a good fix, if you're still interested. Sorry for the delay. @kubernetes/goog-cluster FYI
@gopinatht @bprashanth The PR isn't submitted yet, but I can possibly get to it in the next week. If someone else wants to take it, then please feel free.
Next week sounds good. I'll wait for it unless @gopinatht wants to jump in.
@bprashanth @kelcecil I have absolutely no experience with this code base, so if I take it on, it will take a lot more than a week. I can offer to review the PR if that helps.
I talked to @justinsb about this on the #sig-aws Slack channel a little while ago. I'm going to look at generalizing this backoff for everything instead of just AWS. Should make things easier. I'm going to start hacking on it this week.
This should now be fixed in 1.2: we have a backoff. I believe we still retry indefinitely, but we should not hit rate limits. Please reopen if it continues in 1.2 or later.
(k8s 1.1.2)
I created a service of type LoadBalancer on AWS, which failed because of #12381.
After deleting the service, `kube-controller-manager` keeps trying to create the ELB indefinitely:
This eventually leads to (see #12121):
The only way I found to stop the hemorrhage was to restart `kube-controller-manager`.
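The behavior the report asks for, namely that retries stop once the service has been deleted, might look something like the sketch below. It is a simplified illustration under stated assumptions, not how the service controller is actually written: `createWhileServiceExists`, `errServiceDeleted`, and the `exists`, `create`, and `wait` callbacks are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errServiceDeleted signals that retries stopped because the service is gone.
var errServiceDeleted = errors.New("service deleted; giving up on load balancer")

// createWhileServiceExists retries create until it succeeds, but checks
// exists before every attempt so that a deleted service never triggers
// another create call. It returns the number of create calls made.
func createWhileServiceExists(exists func() bool, create func() error, wait func()) (int, error) {
	attempts := 0
	for {
		if !exists() {
			return attempts, errServiceDeleted
		}
		attempts++
		if err := create(); err == nil {
			return attempts, nil
		}
		// Pause between attempts; a real caller would back off here.
		wait()
	}
}

func main() {
	// Pretend the service is deleted after three failed creation attempts.
	remaining := 3
	exists := func() bool { return remaining > 0 }
	create := func() error {
		remaining--
		return errors.New("simulated ELB creation failure")
	}
	wait := func() { time.Sleep(10 * time.Millisecond) }

	attempts, err := createWhileServiceExists(exists, create, wait)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Checking for existence at the top of the loop, rather than only once before it, is what lets a deletion that happens mid-retry end the loop without restarting the process.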