[AWS] Failed ELB creation retried indefinitely after LB service deletion #17790

Closed · antoineco opened this issue Nov 25, 2015 · 11 comments

@antoineco (Contributor)

(k8s 1.1.2)

I created a service of type LoadBalancer on AWS, which failed because of #12381.

After deleting the service, kube-controller-manager keeps trying to create the ELB indefinitely:

kube-controller-manager[1228]: I1125 18:26:53.922152    1228 servicecontroller.go:222] Got new Sync delta for service: &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:api-lb GenerateName: Namespace:user-api SelfLink:/api/v1/namespaces/user-api/services/api-lb UID:a0827a39-939f-11e5-8b77-0a4c87eff515 ResourceVersion:24376398 Generation:0 CreationTimestamp:2015-11-25 18:09:10 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[app:core-user-rails component:api] Annotations:map[]} Spec:{Type:LoadBalancer Ports:[{Name:http Protocol:TCP Port:9080 TargetPort:{Kind:1 IntVal:0 StrVal:http} NodePort:30131}] Selector:map[app:core-user-rails component:api] ClusterIP:10.10.203.255 ExternalIPs:[] LoadBalancerIP: SessionAffinity:None} Status:{LoadBalancer:{Ingress:[]}}}
kube-controller-manager[1228]: I1125 18:26:53.922308    1228 servicecontroller.go:317] Ensuring LB for service user-api/api-lb
kube-controller-manager[1228]: I1125 18:26:53.922346    1228 aws.go:1582] EnsureTCPLoadBalancer(aa0827a39939f11e58b770a4c87eff51, eu-west-1, <nil>, [0xc208a27bd0], [ip-10-0-12-111.eu-west-1.compute.internal ip-10-0-12-147.eu-west-1.compute.internal ip-10-0-12-110.eu-west-1.compute.internal])
kube-controller-manager[1228]: I1125 18:26:54.470111    1228 aws_loadbalancer.go:50] Creating load balancer with name: aa0827a39939f11e58b770a4c87eff51
kube-controller-manager[1228]: E1125 18:26:54.818167    1228 servicecontroller.go:187] Failed to process service delta. Retrying: Failed to create load balancer for service user-api/api-lb: InvalidConfigurationRequest: ELB cannot be attached to multiple subnets in the same AZ.
kube-controller-manager[1228]: status code: 409, request id: 1a9adde7-93a2-11e5-b3ea-5bdb992c6083
...

This eventually leads to throttling (see #12121):

kube-controller-manager[1228]: E1125 18:35:11.765525    1228 servicecontroller.go:187] Failed to process service delta. Retrying: Failed to create load balancer for service user-api/api-lb: Throttling: Rate exceeded

The only way I found to stop the hemorrhage was to restart kube-controller-manager.

@davidopp (Member)

@bprashanth

@kelcecil (Contributor) commented Dec 4, 2015

We experienced this with Kubernetes 1.1.1 this morning, and the infinite retry, combined with how aggressive the retries are, caused difficulties for other parts of our system. Limiting the number of retries and exponentially backing off on subsequent retries would be fantastic.

I'm willing to submit a PR to fix this if provided some guidance on whether my suggestions would be acceptable.
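
For context, the bounded exponential backoff proposed above amounts to something like the sketch below. It is a minimal, self-contained Go illustration; `retryWithBackoff`, `createELB`, and the particular delays and retry limit are hypothetical placeholders, not the service controller's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// retryWithBackoff tries op once and retries up to maxRetries more times,
// doubling the delay after each failure (plus a little jitter) and capping it
// at maxDelay, then gives up instead of retrying forever.
func retryWithBackoff(op func() error, maxRetries int, baseDelay, maxDelay time.Duration) error {
	delay := baseDelay
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if attempt == maxRetries {
			break
		}
		// Up to 20% jitter so many controllers don't retry in lockstep.
		time.Sleep(delay + time.Duration(rand.Int63n(int64(delay)/5+1)))
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

func main() {
	// Hypothetical stand-in for the ELB creation call that keeps failing here.
	createELB := func() error {
		return fmt.Errorf("InvalidConfigurationRequest: ELB cannot be attached to multiple subnets in the same AZ")
	}
	// Short delays for the demo; a real controller would use seconds to minutes.
	if err := retryWithBackoff(createELB, 5, 100*time.Millisecond, 2*time.Second); err != nil {
		fmt.Println(err)
	}
}
```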

@harsha-y

+1

Experienced this on k8s 1.1.1 - a previous occurrence caused AWS to disable ELB creation on a particular account.

An exponential cool-down timer seems to be what AWS recommends, but at the very least this should have a retry limit and a default delay between retries.

@gopinatht

+1 on this. @kelcecil Has the PR for this been submitted already?

@bprashanth (Contributor)

@kelcecil Sure, a backoff PR sounds like a good fix if you're still interested. Sorry for the delay. @kubernetes/goog-cluster FYI.

@kelcecil (Contributor) commented Feb 3, 2016

@gopinatht @bprashanth The PR isn't submitted yet, but I can possibly get to it in the next week. If someone else wants to take it, then please feel free.

@bprashanth (Contributor)

Next week sounds good. I'll wait for it unless @gopinatht wants to jump in.

@gopinatht

@bprashanth @kelcecil I have absolutely no experience with this code base, so if I were to do it, it would take a lot more than a week. I can offer to review the PR if that helps.

@kelcecil (Contributor) commented Mar 1, 2016

I talked to @justinsb about this on the #sig-aws Slack channel a little while ago. I'm going to look at generalizing this backoff so it isn't AWS-specific, which should make things easier. I'll start hacking on it this week.

@justinsb (Member) commented Mar 2, 2016

@kelcecil I've got a PR pending for backoff in the servicecontroller: #21982. It isn't great (it uses a goroutine to defer the retry), but it should be doable for 1.2, whereas a better implementation would probably be too invasive.
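
The "goroutine to defer" approach described above is roughly the following pattern: when a sync fails, grow that service's delay and re-queue the key from a goroutine after sleeping, so the main loop never blocks on a broken service. This is a toy sketch, not the code from #21982; the `controller` type, `workQueue` channel, and delay values are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// controller is a toy stand-in for the service controller: a queue of service
// keys plus a per-service retry delay that doubles on each failure, capped.
type controller struct {
	workQueue chan string
	delays    map[string]time.Duration
}

// retryLater defers the retry in a goroutine instead of re-queueing the key
// immediately, so the sync loop itself never sleeps or spins.
func (c *controller) retryLater(key string, cause error) {
	delay := c.delays[key] * 2
	if delay == 0 {
		delay = time.Second
	}
	if delay > 5*time.Minute {
		delay = 5 * time.Minute
	}
	c.delays[key] = delay

	fmt.Printf("sync of %s failed (%v); retrying in %s\n", key, cause, delay)
	go func() {
		time.Sleep(delay)
		c.workQueue <- key // the main sync loop will pick it up again
	}()
}

func main() {
	c := &controller{workQueue: make(chan string, 16), delays: map[string]time.Duration{}}
	c.retryLater("user-api/api-lb", fmt.Errorf("InvalidConfigurationRequest: status code 409"))

	key := <-c.workQueue // blocks until the deferred retry fires (~1s here)
	fmt.Println("re-syncing", key)
}
```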

@justinsb (Member) commented Jun 4, 2016

This should now be fixed in 1.2: we have a backoff. I believe we still retry indefinitely, but we should not hit rate limits. Please reopen if it continues in 1.2 or later.
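
For readers hitting this on later releases: the general shape of "retry indefinitely, but with per-item backoff so cloud API rate limits aren't hit" is what client-go's rate-limited workqueue provides, and is how Kubernetes controllers commonly express it today. The sketch below shows that pattern; it is not necessarily the exact servicecontroller code in 1.2, and `syncService` plus the queue wiring are illustrative.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

// syncService is a hypothetical stand-in for the controller's per-service
// sync (e.g. the EnsureTCPLoadBalancer call failing in this issue).
func syncService(key string) error {
	return fmt.Errorf("InvalidConfigurationRequest: status code 409")
}

func main() {
	// The default controller rate limiter combines per-item exponential
	// backoff with an overall token bucket, so one broken service can be
	// retried indefinitely without hammering the cloud provider API.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	defer queue.ShutDown()

	queue.Add("user-api/api-lb")

	for {
		item, shutdown := queue.Get()
		if shutdown {
			return
		}
		key := item.(string)
		if err := syncService(key); err != nil {
			// Re-queue with backoff; the delay grows with NumRequeues(key).
			fmt.Printf("sync of %s failed (%v); requeues so far: %d\n", key, err, queue.NumRequeues(key))
			queue.AddRateLimited(key)
		} else {
			queue.Forget(key) // success resets the per-item backoff
		}
		queue.Done(item)
	}
}
```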

@justinsb closed this as completed Jun 4, 2016