kops will keep retrying to update cluster when AWS hits launchconfigurations limit #1058
Comments
I am not sure if we actually have the notion of a non-recoverable error (or trust AWS that when it says something is permanent that it genuinely is). But we can have a look, or at least provide a hint if we encounter the error :-)
It would be very handy if we could use IAM profiling and a dry run to verify that AWS will even allow us to create a cluster. We could check things like limits on resources, permissions, etc.
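For the limits part of such a dry run, Auto Scaling's `DescribeAccountLimits` API already reports `NumberOfLaunchConfigurations` and `MaxNumberOfLaunchConfigurations`. A minimal sketch of the headroom check that a preflight could run on those figures (the function name and signature are hypothetical, and the numbers are fed in as plain ints so the sketch needs no AWS credentials):

```go
package main

import "fmt"

// launchConfigHeadroom checks whether creating `needed` more launch
// configurations would exceed the account quota, given the current
// usage and limit as reported by DescribeAccountLimits. Failing fast
// here would avoid the endless retry loop described in this issue.
func launchConfigHeadroom(current, limit, needed int) error {
	if current+needed > limit {
		return fmt.Errorf("launch configuration quota too low: %d in use of %d, need %d more",
			current, limit, needed)
	}
	return nil
}

func main() {
	// Illustrative figures: 198 of 200 used, cluster needs 3 more.
	if err := launchConfigHeadroom(198, 200, 3); err != nil {
		fmt.Println("preflight failed:", err)
	}
}
```

The same pattern would extend to other quotas (VPCs, EIPs, instances) by feeding in the corresponding limit figures.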
I think the root cause here is #329.
@justinsb I would sorta agree. We are not cleaning up, so yes, that is a problem. But the call to create a new launch config loops when your account cannot create another. I have seen kops hang with quota issues. We can run into this if you have a bunch of clusters and hit your quota for launch configs.
Wondering how much of this is related to #1051. Can we test and close if so?
Sort of. We will still loop for eternity when we hit certain limits. We are not timing out properly somewhere.
In kops 1.5.0 we have much clearer logging for errors during retries: #1658. I think we should consider the idea of retryable errors, but it isn't clear when errors are retryable. If another cluster is being deleted, resources may become available.
@justinsb since we put #1658 in place, I have seen that we seem to be hammering the API pretty hard on deletes. Not sure if this is expected? For example, deleting a private topology cluster with HA masters. I am on master.
The cluster delete succeeded, but to a user this may seem odd.
The interesting thing is that I am getting the errors only on the first delete. If I delete another cluster right after, I do not get the errors. Maybe an oddness with the API.
Issues go stale after 90d of inactivity. Mark the issue as fresh with `/remove-lifecycle stale` to prevent issues from auto-closing. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. Mark the issue as fresh with `/remove-lifecycle rotten`. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Reopen the issue with `/reopen`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Was this issue fixed?
@amadev I'm running under v1.17.0 and I still see A LOT of
How to reproduce:
Create launch configurations until you reach the limit for that AWS account, then try to upgrade a Kubernetes cluster with kops; it will keep retrying in this loop:
...
...
I don't think this is a retryable error, though.
Regards