
kops will keep retrying to update cluster when AWS hits launchconfigurations limit #1058

Closed
felipejfc opened this issue Dec 4, 2016 · 14 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@felipejfc
Contributor

How to reproduce:

Create launch configurations until you reach the limit for that AWS account, then try to upgrade a Kubernetes cluster with kops; it will keep retrying in this loop:

...

I1204 16:03:16.701326   45183 aws_cloud.go:570] Resolved image "ami-4bb3e05c"
I1204 16:03:16.702746   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:16.941540   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.092591   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.755397   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:17.863162   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:18.537845   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:18.690481   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:19.313920   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:19.432232   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 705ms
I1204 16:03:20.034683   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 604ms
I1204 16:03:20.139128   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:20.640814   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:20.972542   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:21.427645   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:21.758880   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:22.212730   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:22.480970   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 655ms
I1204 16:03:23.046056   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.139135   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.839591   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:23.920233   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:24.627519   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:24.695294   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:25.395123   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:25.401989   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 732ms
I1204 16:03:26.115919   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 889ms
I1204 16:03:26.138282   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:26.922574   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.010358   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.721469   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:27.802537   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:28.516712   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:28.612808   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:29.271350   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:29.441639   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:30.075490   45183 request_logger.go:45] AWS request: autoscaling/CreateLaunchConfiguration
I1204 16:03:30.212078   45183 logging_retryer.go:50] Retryable error 400 from autoscaling/CreateLaunchConfiguration - will retry after delay of 709ms

...

I don't think this is a retryable error, though.

Regards
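
A minimal sketch of what treating that as a non-retryable error could look like: wrap aws-sdk-go's default retryer and refuse to retry hard quota errors. This is not kops' actual logging_retryer, and the "LimitExceeded"/"LimitExceededFault" codes below are assumptions about what the Auto Scaling API returns here.

```go
// Package retryutil is a hypothetical home for this sketch.
package retryutil

import (
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/client"
	"github.com/aws/aws-sdk-go/aws/request"
)

// quotaAwareRetryer embeds the SDK's default retryer and overrides only
// the decision of whether an error is worth retrying.
type quotaAwareRetryer struct {
	client.DefaultRetryer
}

// ShouldRetry refuses to retry when the error looks like a hard quota
// limit, and otherwise falls back to the default retry behaviour.
func (r quotaAwareRetryer) ShouldRetry(req *request.Request) bool {
	if aerr, ok := req.Error.(awserr.Error); ok {
		switch aerr.Code() {
		case "LimitExceeded", "LimitExceededFault": // assumed quota error codes
			return false
		}
	}
	return r.DefaultRetryer.ShouldRetry(req)
}
```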

@justinsb justinsb added this to the 1.5.0 milestone Dec 28, 2016
@justinsb
Member

I am not sure that we actually have the notion of a non-recoverable error (or that we trust AWS that, when it says something is permanent, it genuinely is). But we can have a look, or at least provide a hint if we encounter the error :-)

@krisnova
Contributor

It would be very handy if we could use IAM profiling and a dry run to verify whether AWS will even allow us to create a cluster.

We could check things like limits on resources, permissions, etc.
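
A rough sketch of that kind of preflight check: ask the Auto Scaling API for account limits before attempting to create another launch configuration. This is illustrative only, not existing kops code; the region and error handling are simplified.

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	// Region hard-coded for the example.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := autoscaling.New(sess)

	limits, err := svc.DescribeAccountLimits(&autoscaling.DescribeAccountLimitsInput{})
	if err != nil {
		log.Fatal(err)
	}

	used := aws.Int64Value(limits.NumberOfLaunchConfigurations)
	max := aws.Int64Value(limits.MaxNumberOfLaunchConfigurations)
	fmt.Printf("launch configurations: %d of %d in use\n", used, max)
	if used >= max {
		log.Fatal("launch configuration quota reached; CreateLaunchConfiguration will fail")
	}
}
```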

@justinsb
Member

I think the root cause here is #329.

@chrislovecnm
Contributor

chrislovecnm commented Dec 30, 2016

@justinsb I would sorta agree. We are not cleaning up, so yes that is a problem. But the call to create a new launch config, when your account cannot create another, loops. I have seen kops hang with quota issues. You can run into this if you have a bunch of clusters and hit your quota for launch configs.

@krisnova
Contributor

krisnova commented Jan 5, 2017

Wondering how much of this is related to #1051

Can we test and close if so?

@chrislovecnm
Contributor

Sort of. We will still loop for eternity when we hit certain limits. We are not timing out properly somewhere.
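
A minimal sketch of bounding that loop with an overall deadline, assuming a hypothetical attempt() callback standing in for whatever call keeps failing; this is not how kops currently structures its update loop.

```go
package retryutil

import (
	"fmt"
	"time"
)

// RetryWithDeadline retries attempt until it succeeds or timeout elapses,
// instead of looping forever on a persistent error.
func RetryWithDeadline(attempt func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := attempt()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("giving up after %s of retries: %v", timeout, err)
		}
		time.Sleep(interval)
	}
}
```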

@justinsb
Member

In kops 1.5.0 we have much clearer logging for errors during retries: #1658

I think we should consider classifying errors as retryable or not, but it isn't always clear which they are: if another cluster is being deleted, resources may become available again.

@justinsb justinsb modified the milestones: 1.5.1, 1.5.0 Jan 30, 2017
@chrislovecnm
Contributor

chrislovecnm commented Feb 16, 2017

@justinsb since we put #1658 in place, I have seen that we seem to be hammering the API pretty hard on deletes. Not sure if this is expected?

For example, deleting a private topo cluster with HA masters. I am on master.

I0216 12:00:38.022397   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 97663fe8-8cff-421f-b009-5cda8131b7e2) from ec2/DeleteSubnet - will retry after delay of 5.072s
I0216 12:00:38.258773   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 7531b2af-9fce-48a0-9fd4-6d008074029f) from ec2/DeleteSubnet - will retry after delay of 6.432s
I0216 12:00:38.509773   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 0ded66e4-96da-4e4d-bf7d-53f4b2cadb6f) from ec2/DeleteVolume - will retry after delay of 4.664s
I0216 12:00:38.582814   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 489434b5-5ae8-4ac3-8a84-c2158abb3ca3) from ec2/DeleteVolume - will retry after delay of 4.848s
I0216 12:00:38.595522   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: e6a789a9-4ee8-4d63-936c-6bb01ce7d133) from ec2/DetachInternetGateway - will retry after delay of 6.96s
I0216 12:00:38.663642   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: d98d4f82-04e6-4c74-891e-f83e7165f0de) from ec2/DeleteSubnet - will retry after delay of 6.52s
I0216 12:00:38.744053   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 3cbcb968-456c-4e3a-ba3c-c9c3d3b084ff) from ec2/DeleteVolume - will retry after delay of 7.392s
I0216 12:00:38.758525   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 2487a0a7-2e31-4841-ae70-78db8c43b21b) from ec2/DeleteSecurityGroup - will retry after delay of 6.112s
I0216 12:00:38.777200   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: ae7ebf42-fbfe-41ec-b7b5-d00e02c01b81) from ec2/DeleteSecurityGroup - will retry after delay of 4.064s
I0216 12:00:39.027556   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: c440fc41-dff3-4af4-9258-8d95e248a20a) from ec2/ReleaseAddress - will retry after delay of 7.944s
I0216 12:00:39.061817   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 04d49bb8-f691-4c00-a8c9-11464f5355b8) from ec2/ReleaseAddress - will retry after delay of 5.744s
I0216 12:00:39.183454   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 3915504c-0b89-44ef-aef8-f2dd5268dbf0) from ec2/DeleteVolume - will retry after delay of 6.216s
I0216 12:00:39.208595   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: b329f48c-2140-4d2b-bbb8-50ac5431b09d) from ec2/DeleteSecurityGroup - will retry after delay of 7.264s
I0216 12:00:39.224426   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 16324629-c166-475b-8de7-d1ca5583e24d) from ec2/DeleteVolume - will retry after delay of 5.36s
I0216 12:00:39.245337   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: eabec3cf-32bb-43af-9264-d8a25e019939) from ec2/DeleteVolume - will retry after delay of 6.904s
I0216 12:00:39.431604   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 462353db-052f-4a90-8566-c241c4d1b903) from ec2/DeleteVolume - will retry after delay of 5.368s
I0216 12:00:39.523822   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 3dc5130e-4661-4769-bda8-0e9ccfd2491e) from ec2/DeleteSubnet - will retry after delay of 4.04s
I0216 12:00:39.683042   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 1d11d8f6-3313-4740-8764-0ecc4cfe9057) from ec2/DeleteVolume - will retry after delay of 7.128s
I0216 12:00:39.823968   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: dd1196ad-f9c0-4b67-b4f5-1043591dc5ea) from ec2/DeleteSubnet - will retry after delay of 5.04s
I0216 12:00:39.836455   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 2e7e868c-f3ca-4115-b51a-33174061c85d) from ec2/ReleaseAddress - will retry after delay of 4.232s
I0216 12:00:40.073099   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 277036c4-391b-4264-95be-c9a3890f4592) from ec2/DeleteSecurityGroup - will retry after delay of 6.088s
I0216 12:00:40.386382   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 299e20d4-9b71-484c-819c-f249ea2e514f) from ec2/DeleteVolume - will retry after delay of 6.032s
I0216 12:00:40.388647   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 2e3679ce-eefa-4faa-ac79-66bb344184ae) from ec2/DeleteSecurityGroup - will retry after delay of 7.288s
I0216 12:00:40.568829   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 0e17a1ed-0e35-4a45-b517-9f4ff888b5e1) from ec2/DeleteSecurityGroup - will retry after delay of 6.4s
security-group:sg-4b611533	ok
volume:vol-051f2f335d9841db9	still has dependencies, will retry
subnet:subnet-ab1e28f3	ok
volume:vol-069114779a254da90	ok
subnet:subnet-f878b2b1	still has dependencies, will retry
I0216 12:00:44.145238   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 9c3660dd-2353-4f03-bc61-d59360d9e92f) from ec2/ReleaseAddress - will retry after delay of 14.592s
I0216 12:00:44.664229   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: f72a2627-649b-4dfb-abb7-4af98a7fb8b3) from ec2/DeleteVolume - will retry after delay of 10.336s
I0216 12:00:44.965389   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: b311dd30-54b2-437f-8b26-105a703edab8) from ec2/DeleteSubnet - will retry after delay of 12.864s
I0216 12:00:45.080082   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 14fb2023-b1a6-4732-bef5-0bfa2a702b1f) from ec2/DeleteVolume - will retry after delay of 9.808s
I0216 12:00:45.097391   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 53d241af-a4aa-42b1-840a-064beee8126d) from ec2/ReleaseAddress - will retry after delay of 14.864s
I0216 12:00:45.155219   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 7fc4d788-ccb3-488e-a86d-89b6b9a75837) from ec2/DeleteSecurityGroup - will retry after delay of 9.648s
I0216 12:00:45.256597   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 0251cff4-1367-42c7-9f3d-d59681d0f5d9) from ec2/DeleteSubnet - will retry after delay of 11.792s
I0216 12:00:45.472431   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 42eba90a-8767-4983-ad5e-8a72b9fc1244) from ec2/DeleteSubnet - will retry after delay of 15.504s
I0216 12:00:45.696224   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 1f852c3c-b632-4e40-8797-9d78e0f0f7bb) from ec2/DeleteVolume - will retry after delay of 15.888s
I0216 12:00:45.838773   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 34808969-1e52-4ea1-ad6f-b105e0896b30) from ec2/DetachInternetGateway - will retry after delay of 14.224s
I0216 12:00:46.418787   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: f4c0409c-96c5-4881-9949-01f1a4b448cc) from ec2/DeleteVolume - will retry after delay of 12.896s
I0216 12:00:46.426612   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 8f8aa57b-1a44-4459-aad8-c4650f84544d) from ec2/DeleteVolume - will retry after delay of 11.12s
I0216 12:00:46.432869   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: f4b002a2-c6fe-4fdc-b3fb-c2c663eaebed) from ec2/DeleteSecurityGroup - will retry after delay of 8.992s
I0216 12:00:46.697706   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: af0b59b6-fdf9-41d3-ac24-0219b1925a83) from ec2/DeleteVolume - will retry after delay of 11.936s
I0216 12:00:46.749192   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: c9a79407-c117-44ea-9644-ed17524fb7c0) from ec2/DeleteSecurityGroup - will retry after delay of 9.92s
I0216 12:00:47.093747   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: 7dc61333-e1ef-4489-b355-2c0794372a1c) from ec2/DeleteVolume - will retry after delay of 8.608s
I0216 12:00:47.253045   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: ef7a4f80-f594-4e3b-874b-1c75826be7c3) from ec2/DeleteSecurityGroup - will retry after delay of 11.12s
I0216 12:00:47.254923   36530 logging_retryer.go:59] Retryable error (RequestLimitExceeded: Request limit exceeded.
	status code: 503, request id: d346b40c-6eb5-425b-944a-eca4d5989c5a) from ec2/ReleaseAddress - will retry after delay of 11.904s

The cluster delete succeeded, but to a user this may seem odd.
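
A rough illustration of client-side rate limiting for bulk deletes using golang.org/x/time/rate; the limiter values, resource IDs, and deleteResource helper are made up for the example and are not current kops behaviour.

```go
package main

import (
	"context"
	"log"

	"golang.org/x/time/rate"
)

func main() {
	// Allow roughly 5 delete calls per second with a small burst, so a big
	// teardown doesn't immediately trip RequestLimitExceeded.
	limiter := rate.NewLimiter(rate.Limit(5), 2)
	resources := []string{"subnet-ab1e28f3", "vol-051f2f335d9841db9"} // IDs taken from the log above

	for _, id := range resources {
		if err := limiter.Wait(context.Background()); err != nil {
			log.Fatal(err)
		}
		// deleteResource(id) would issue the actual ec2 delete call here.
		log.Printf("deleting %s", id)
	}
}
```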

@chrislovecnm
Contributor

The interesting thing is that I am getting the errors only on the first delete. If I delete another cluster right after, I do not get the errors. Maybe an oddness with the API.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 21, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 20, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@amadav

amadav commented Jun 24, 2019

Was this issue fixed?

@phspagiari
Contributor

@amadav I'm running v1.17.0 and I still see A LOT of "Got RequestLimitExceeded error on AWS request" messages during my updates, for 3 minutes or more. For me this isn't fixed.
