
AWS: API RequestLimit delayer was not triggering #22906

Closed
justinsb opened this issue Mar 12, 2016 · 15 comments
Assignees
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@justinsb
Member

@benmcrae reported that while trying beta.0 he was hitting the AWS ELB rate limit exceeded error because of a security group tagging problem (Error creating load balancer (will retry): Failed to create load balancer for service default/my-nginx: error creating AWS loadbalancer listeners: Throttling: Rate exceeded). However, he did not see Inserting delay before AWS request (%s) to avoid RequestLimitExceeded: %s in the kube-controller-manager logs.

The error isn't unexpected because the fix in 5b3bb56 did not make beta.0 (it did make beta.1).

But we still would have expected to see our own AWS API throttling kick in. Possible solutions:

  1. We have a single pool for all API requests per region, which I don't think is right. We should probably have one per service per region: one for EC2 us-west-2, one for ELB us-west-2, etc. (see the sketch below).
  2. We should probably log the current statistics whenever we hit a RateLimit error (perhaps at v=2). That way we can tune our statistics based upon what we see in the real world.
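
As a concrete illustration of point 1, a rough, hypothetical sketch of a per-service, per-region delayer pool might look like the following (the package, type, and method names are illustrative, not the actual kubernetes cloud provider code):

package throttle

import (
	"sync"
	"time"
)

// delayer tracks the current backoff for one (service, region) pair.
type delayer struct {
	mu    sync.Mutex
	delay time.Duration
}

// BeforeRequest sleeps for the current delay, if any, before an API call.
func (d *delayer) BeforeRequest() {
	d.mu.Lock()
	wait := d.delay
	d.mu.Unlock()
	if wait > 0 {
		time.Sleep(wait)
	}
}

// OnThrottle increases the delay (capped) when AWS returns
// RequestLimitExceeded / Throttling.
func (d *delayer) OnThrottle() {
	d.mu.Lock()
	defer d.mu.Unlock()
	switch {
	case d.delay == 0:
		d.delay = 100 * time.Millisecond
	case d.delay < 10*time.Second:
		d.delay *= 2
	}
}

// Pool keys delayers by "service/region" (e.g. "elb/us-west-2"), so heavy EC2
// traffic does not delay ELB calls and vice versa.
type Pool struct {
	mu       sync.Mutex
	delayers map[string]*delayer
}

func (p *Pool) Get(service, region string) *delayer {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.delayers == nil {
		p.delayers = map[string]*delayer{}
	}
	key := service + "/" + region
	d, ok := p.delayers[key]
	if !ok {
		d = &delayer{}
		p.delayers[key] = d
	}
	return d
}
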
@justinsb justinsb added this to the v1.2 milestone Mar 12, 2016
@justinsb justinsb added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 12, 2016
@justinsb justinsb self-assigned this Mar 12, 2016
@mzupan
Contributor

mzupan commented Mar 20, 2016

I'm running Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"5cb86ee022267586db386f62781338b0483733b3", GitTreeState:"clean"}

I'm running into a rate limit issue after setting --cloud-provider=aws on the controller-manager, kubelet, and apiserver.

@justinsb
Member Author

@mzupan can you post / send me the relevant bits of your kube-controller-manager log (/var/log/kube-controller-manager.log) from the master? We increased the logging in 1.2, so it should now be much more informative.

@mzupan
Contributor

mzupan commented Mar 20, 2016

@justinsb is there a particular log level you want to see? Right now most of the logging is set to 0 because I don't want to kill the system with IO, but I can turn it back up.
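
(Aside for readers: verbosity on the kubernetes components is controlled by the glog --v flag, and glog warnings are emitted regardless of that level, so even a low-verbosity setup such as the hypothetical flags below should still capture the throttling warnings:)

kube-controller-manager --cloud-provider=aws --v=2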

@justinsb
Member Author

@mzupan I think that in 1.2, when we hit the RateLimit we log the failing API request as a warning https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/retry_handler.go#L86. So I hope you basically can't turn it off :-)

I'm really trying to figure out which particular call is hitting the rate limit ... whether it is an ELB call or something else. And of course there may be other hints in the log.

We can start with v=0 - I believe that still logs warnings!
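
(For context, retry_handler.go plugs into the aws-sdk-go request handler chain; a minimal, hypothetical sketch of that kind of hook - not the actual kubernetes code - could look like this:)

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elb"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-west-2")))
	svc := elb.New(sess)

	// After each (possibly retried) request, log the operation name if AWS
	// reported throttling, so the offending call shows up even at low -v.
	svc.Handlers.AfterRetry.PushBackNamed(request.NamedHandler{
		Name: "k8s/log-throttling", // illustrative name
		Fn: func(r *request.Request) {
			if awsErr, ok := r.Error.(awserr.Error); ok {
				if awsErr.Code() == "Throttling" || awsErr.Code() == "RequestLimitExceeded" {
					log.Printf("AWS request throttled: %s %s (retries: %d)",
						r.ClientInfo.ServiceName, r.Operation.Name, r.RetryCount)
				}
			}
		},
	})

	// ... use svc as normal; throttled calls such as DescribeLoadBalancers will be logged.
}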

@mzupan
Contributor

mzupan commented Mar 20, 2016

I've been digging around too and just enabled aws on the kubelet till I figured out what process was causing the limit to be hit. On each node's kubelet I notice this:

I0320 14:06:36.257691   13777 aws.go:745] Could not determine public IP from AWS metadata.

That happens once per second per node. The only services I'm running are normal k8s services like internal redis instances and NodePorts. I'm not running any LoadBalancer services that would try to create an ELB or expose an external IP.

@justinsb
Member Author

@mzupan do you have any logs that show a rate limit error?

I'm not sure whether the "Could not determine public IP from AWS metadata." message is related. As I recall, that happens when your instances don't have public IPs, and we don't have any great way to distinguish that case - we just get back a generic error. I take it you aren't giving your nodes public IPs? I'll look into why it is happening so often though....
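
(If anyone wants to check what a node actually sees: the lookup is against the EC2 instance metadata service, so querying the standard public-ipv4 path directly from an affected node is a quick test - on an instance without a public IP that path typically returns a 404:)

curl -s -w "\n%{http_code}\n" http://169.254.169.254/latest/meta-data/public-ipv4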

@mzupan
Contributor

mzupan commented Mar 20, 2016

So I have aws enabled for the controller and kubelet, and so far nothing has sprung up. Another issue is that the controller is looking up ELBs for services that aren't of type LoadBalancer.

E0320 14:26:49.822276       1 servicecontroller.go:196] Failed to process service delta. Retrying in 5m0s: Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::xxxx:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
    status code: 403, request id: c8a825ae-eea7-11e5-84ed-7df29816bfc6
# kubectl describe svc redis --namespace=www-test
Name:           redis
Namespace:      www-test
Labels:         name=redis
Selector:       name=redis
Type:           ClusterIP
IP:         10.100.255.71
Port:           <unset> 6379/TCP
Endpoints:      192.168.2.228:6379
Session Affinity:   None
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason              Message
  --------- --------    -----   ----            -------------   --------    ------              -------
  7m        7m      1   {service-controller }           Warning     CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 16e73c14-eea7-11e5-a5ed-072f5af57d33
  7m        7m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 1a0ff2a3-eea7-11e5-9385-3df1cb6d9d84
  7m        7m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 202765a3-eea7-11e5-8d57-755a5f7005e2
  6m        6m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 2c17e291-eea7-11e5-ab02-29614ddbc8df
  6m        6m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 4405fbe9-eea7-11e5-8b35-db21d044e504
  4m        4m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 73cbe79d-eea7-11e5-827d-f560c975a3b0
  2m        2m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: c8a825ae-eea7-11e5-84ed-7df29816bfc6
  2m        2m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: d34101e6-eea7-11e5-acfb-c3e4b8739954

@mzupan
Contributor

mzupan commented Mar 20, 2016

I have yet to see any throttling errors in the controller

@justinsb
Member Author

Opened those two issues for the two other problems you've found @mzupan - let's try to keep this bug about rate limits / throttling :-)

@mzupan
Contributor

mzupan commented Mar 21, 2016

Little bit of an update: I ran 1.2.1-beta.0 most of the weekend and I don't think I'm hitting the API limits anymore. Before, I would see limit errors when I browsed the console, and I have yet to see one since.

@justinsb justinsb modified the milestones: v1.2, v1.3 Apr 1, 2016
@mzupan
Contributor

mzupan commented May 21, 2016

Ok, found out my issue finally. We had probably 50 or so services in our cluster. We don't have any LoadBalancer-type services, so we never gave the IAM role any access to ELBs.

Looking at the code, my guess is it checks whether a service already has an ELB so it can reconcile any edits.

Once I granted the elasticloadbalancing:DescribeLoadBalancers permission to the role, the controller made one API check per service, and after that the errors stopped.
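
(For anyone hitting the same AccessDenied loop: granting just that one read-only action to the node/controller IAM role appears to be enough for this check. A minimal policy statement sketch, using only the action named in the errors above, would be something like:)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "elasticloadbalancing:DescribeLoadBalancers",
      "Resource": "*"
    }
  ]
}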

@guoshimin
Contributor

Sounds like #25401

@goltermann goltermann modified the milestones: v1.3, next-candidate Jun 13, 2016
@fejta-bot

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 15, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
