
AWS: API RequestLimit delayer was not triggering #22906

Closed
justinsb opened this issue Mar 12, 2016 · 15 comments
Assignees
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@justinsb
Member

@benmcrae reported that while trying beta.0 he was hitting the AWS ELB rate limit exceeded error because of a security group tagging problem (Error creating load balancer (will retry): Failed to create load balancer for service default/my-nginx: error creating AWS loadbalancer listeners: Throttling: Rate exceeded). However, he did not see Inserting delay before AWS request (%s) to avoid RequestLimitExceeded: %s in the kube-controller-manager logs.

The error isn't unexpected because the fix in 5b3bb56 did not make beta.0 (it did make beta.1).

But we still would have expected to see our own AWS API throttling kick in. Possible solutions:

  1. We have a single pool for all API requests per region, which I don't think is right. We should probably have one per service per region: one for EC2 us-west-2, one for ELB us-west-2, etc. (see the sketch below).
  2. We should probably log the current statistics whenever we hit a RateLimit error (perhaps at v=2). That way we can tune our statistics based upon what we see in the real world.
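
As a concrete illustration of point 1, a rough, hypothetical sketch of a per-service, per-region delayer pool might look like the following (the package, type, and method names are illustrative, not the actual kubernetes cloud provider code):

package throttle

import (
	"sync"
	"time"
)

// delayer tracks the current backoff for one (service, region) pair.
type delayer struct {
	mu    sync.Mutex
	delay time.Duration
}

// BeforeRequest sleeps for the current delay, if any, before an API call.
func (d *delayer) BeforeRequest() {
	d.mu.Lock()
	wait := d.delay
	d.mu.Unlock()
	if wait > 0 {
		time.Sleep(wait)
	}
}

// OnThrottle increases the delay (capped) when AWS returns
// RequestLimitExceeded / Throttling.
func (d *delayer) OnThrottle() {
	d.mu.Lock()
	defer d.mu.Unlock()
	switch {
	case d.delay == 0:
		d.delay = 100 * time.Millisecond
	case d.delay < 10*time.Second:
		d.delay *= 2
	}
}

// Pool keys delayers by "service/region" (e.g. "elb/us-west-2"), so heavy EC2
// traffic does not delay ELB calls and vice versa.
type Pool struct {
	mu       sync.Mutex
	delayers map[string]*delayer
}

func (p *Pool) Get(service, region string) *delayer {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.delayers == nil {
		p.delayers = map[string]*delayer{}
	}
	key := service + "/" + region
	d, ok := p.delayers[key]
	if !ok {
		d = &delayer{}
		p.delayers[key] = d
	}
	return d
}
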
@justinsb justinsb added this to the v1.2 milestone Mar 12, 2016
@justinsb justinsb added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 12, 2016
@justinsb justinsb self-assigned this Mar 12, 2016
@mzupan
Contributor

mzupan commented Mar 20, 2016

I'm running Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"5cb86ee022267586db386f62781338b0483733b3", GitTreeState:"clean"}

I'm running into a rate limit issue after setting --cloud-provider=aws on the controller-manager, kubelet, and apiserver.

@justinsb
Member Author

@mzupan can you post / send me the relevant bits of your kube-controller-manager log (/var/log/kube-controller-manager.log) from the master? We increased the logging in 1.2, so it should now be much more informative.

@mzupan
Contributor

mzupan commented Mar 20, 2016

@justinsb is there a particular log level you want to see? Right now most of the logging is set to 0 because I don't want to kill the system with IO, but I can turn it back up.
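
(Aside for readers: verbosity on the kubernetes components is controlled by the glog --v flag, and glog warnings are emitted regardless of that level, so even a low-verbosity setup such as the hypothetical flags below should still capture the throttling warnings:)

kube-controller-manager --cloud-provider=aws --v=2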

@justinsb
Member Author

@mzupan I think that in 1.2, when we hit the RateLimit we log the failing API request as a warning https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/retry_handler.go#L86. So I hope you basically can't turn it off :-)

I'm really trying to figure out which particular call is hitting the rate limit ... whether it is an ELB call or something else. And of course there may be other hints in the log.

We can start with v=0 - I believe that still logs warnings!
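
(For context, retry_handler.go plugs into the aws-sdk-go request handler chain; a minimal, hypothetical sketch of that kind of hook - not the actual kubernetes code - could look like this:)

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elb"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-west-2")))
	svc := elb.New(sess)

	// After each (possibly retried) request, log the operation name if AWS
	// reported throttling, so the offending call shows up even at low -v.
	svc.Handlers.AfterRetry.PushBackNamed(request.NamedHandler{
		Name: "k8s/log-throttling", // illustrative name
		Fn: func(r *request.Request) {
			if awsErr, ok := r.Error.(awserr.Error); ok {
				if awsErr.Code() == "Throttling" || awsErr.Code() == "RequestLimitExceeded" {
					log.Printf("AWS request throttled: %s %s (retries: %d)",
						r.ClientInfo.ServiceName, r.Operation.Name, r.RetryCount)
				}
			}
		},
	})

	// ... use svc as normal; throttled calls such as DescribeLoadBalancers will be logged.
}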

@mzupan
Contributor

mzupan commented Mar 20, 2016

I've been digging around too and just enabled aws on the kubelet till I figured out what process was causing the limit to be hit. On each node's kubelet I notice this:

I0320 14:06:36.257691   13777 aws.go:745] Could not determine public IP from AWS metadata.

That happens once per second per node. The only services I'm running are normal k8s services like internal redis instances and NodePorts. I'm not running any LoadBalancer services that would try to create an ELB or expose an external IP.

@justinsb
Member Author

@mzupan do you have any logs that show a rate limit error?

I'm not sure whether the "Could not determine public IP from AWS metadata." message is related. As I recall, that happens when your instances don't have public IPs, and we don't have any great way to distinguish that case - we just get back a generic error. I take it you aren't giving your nodes public IPs? I'll look into why it is happening so often though....
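
(If anyone wants to check what a node actually sees: the lookup is against the EC2 instance metadata service, so querying the standard public-ipv4 path directly from an affected node is a quick test - on an instance without a public IP that path typically returns a 404:)

curl -s -w "\n%{http_code}\n" http://169.254.169.254/latest/meta-data/public-ipv4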

@mzupan
Contributor

mzupan commented Mar 20, 2016

So I have aws enabled for the controller and kubelet, and so far nothing has sprung up. Another issue is that the controller is looking up ELBs for services that aren't of type LoadBalancer.

E0320 14:26:49.822276       1 servicecontroller.go:196] Failed to process service delta. Retrying in 5m0s: Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::xxxx:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
    status code: 403, request id: c8a825ae-eea7-11e5-84ed-7df29816bfc6
# kubectl describe svc redis --namespace=www-test
Name:           redis
Namespace:      www-test
Labels:         name=redis
Selector:       name=redis
Type:           ClusterIP
IP:         10.100.255.71
Port:           <unset> 6379/TCP
Endpoints:      192.168.2.228:6379
Session Affinity:   None
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason              Message
  --------- --------    -----   ----            -------------   --------    ------              -------
  7m        7m      1   {service-controller }           Warning     CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 16e73c14-eea7-11e5-a5ed-072f5af57d33
  7m        7m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 1a0ff2a3-eea7-11e5-9385-3df1cb6d9d84
  7m        7m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 202765a3-eea7-11e5-8d57-755a5f7005e2
  6m        6m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 2c17e291-eea7-11e5-ab02-29614ddbc8df
  6m        6m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 4405fbe9-eea7-11e5-8b35-db21d044e504
  4m        4m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: 73cbe79d-eea7-11e5-827d-f560c975a3b0
  2m        2m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: c8a825ae-eea7-11e5-84ed-7df29816bfc6
  2m        2m  1   {service-controller }       Warning CreatingLoadBalancerFailed  Error creating load balancer (will retry): Error getting LB for service www-test/redis: AccessDenied: User: arn:aws:sts::181547680117:assumed-role/WEB-kubernetes/i-7393e7c2 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers
        status code: 403, request id: d34101e6-eea7-11e5-acfb-c3e4b8739954

@mzupan
Contributor

mzupan commented Mar 20, 2016

I have yet to see any throttling errors in the controller

@justinsb
Member Author

Opened those two issues for the two other problems you've found @mzupan - let's try to keep this bug about rate limits / throttling :-)

@mzupan
Contributor

mzupan commented Mar 21, 2016

Little bit of an update: I ran 1.2.1-beta.0 most of the weekend and I don't think I'm hitting the API limits anymore. Before, I would see limit errors when I browsed the console, and I have yet to see one since.

@justinsb justinsb modified the milestones: v1.2, v1.3 Apr 1, 2016
@mzupan
Contributor

mzupan commented May 21, 2016

Ok, found out my issue finally. We had probably 50 or so services in our cluster. We don't have any LoadBalancer-type services, so we never gave the IAM role any access to ELBs.

Looking at the code, my guess is it checks whether a service already has an ELB so it can reconcile any edits.

Once I granted the elasticloadbalancing:DescribeLoadBalancers permission to the role, the controller made one API check per service, and after that the errors stopped.
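
(For anyone hitting the same AccessDenied loop: granting just that one read-only action to the node/controller IAM role appears to be enough for this check. A minimal policy statement sketch, using only the action named in the errors above, would be something like:)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "elasticloadbalancing:DescribeLoadBalancers",
      "Resource": "*"
    }
  ]
}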

@guoshimin
Contributor

Sounds like #25401

@goltermann goltermann modified the milestones: v1.3, next-candidate Jun 13, 2016
@fejta-bot

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 15, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
