
Leaking GCE load balancer target pools detected in services e2e test #8377

Closed
ghost opened this issue May 16, 2015 · 7 comments
Labels
area/test-infra · priority/important-soon

Comments


ghost commented May 16, 2015

This started failing consistently on our continuous integration system (kubernetes-e2e-gce) at Build #352 (May 16, 2015 1:08:38 AM). There are no obvious culprit PRs in the vicinity. Perhaps an underlying GCE issue?
I noticed a bunch of GCE load balancers seemingly left lying around in the relevant GCE project. Perhaps the system or test is leaking LBs and hitting its GCE quota. I'll look into that...

cc: @a-robinson

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:348

```
Expected error:
    <*errors.errorString | 0xc208e160c0>: {
        s: "service external-lb-test in namespace e2e-tests-service-0-5460a388-2c1d-4be9-a258-a19ee3ad8ea0 doesn't have a public IP after 240.00 seconds",
    }
    service external-lb-test in namespace e2e-tests-service-0-5460a388-2c1d-4be9-a258-a19ee3ad8ea0 doesn't have a public IP after 240.00 seconds
not to have occurred
```
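A quick way to confirm whether we're up against the quota (a sketch; target pools are a regional resource, and the region below is a placeholder for whichever one the CI project uses):

```sh
# Show the region's TARGET_POOLS quota usage against its limit.
gcloud compute regions describe us-central1 | grep -B1 -A1 'metric: TARGET_POOLS'
```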

ghost added the priority/critical-urgent and area/test-infra labels on May 16, 2015
ghost added this to the v1.0 milestone on May 16, 2015

ghost commented May 16, 2015

There were 29 GCE forwarding rules but 50 GCE target pools, so target pools seem to be leaking. As far as I know, 50 is also our project's quota limit for target pools, so that's probably the problem. I've deleted all of the above - let's see whether that sorts out the problem.
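For the record, roughly how to spot the orphans (a sketch, not the exact commands I ran; `<pool-name>` and `<region>` are placeholders, and it relies on Kubernetes giving a service's forwarding rule and target pool the same name, plus a gcloud new enough to support `--format='value(name)'`):

```sh
# List both resource types by name; a target pool with no matching
# forwarding rule is a leak candidate.
gcloud compute forwarding-rules list --format='value(name)' | sort > rules.txt
gcloud compute target-pools list --format='value(name)' | sort > pools.txt
comm -13 rules.txt pools.txt    # names only in pools.txt, i.e. orphaned pools

# Delete a leaked pool (the region must match where it was created).
gcloud compute target-pools delete <pool-name> --region <region>
```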


ghost commented May 16, 2015

To be clear, the 29 forwarding rules are probably from e2e tests that were running - there are many, and I didn't check. But that number should match the number of target pools, and the discrepancy indicates a probable leak.


ghost commented May 16, 2015

Yes, that fixed it. The e2e tests are all green again. Dropping the priority, but keeping this open to fix the source of the target pool leak.

ghost added the priority/important-soon label and removed the priority/critical-urgent label on May 16, 2015
ghost modified the milestones: v1.0-candidate, v1.0 on May 16, 2015
ghost added the team/cluster label on May 16, 2015
ghost changed the title from "e2e regression: Service does not get a public IP" to "Leaking GCE load balancer target pools detected in services e2e test" on May 16, 2015

ghost commented May 16, 2015

Yup, we're leaking target pools again. All of the leaked pools are named k8s-jenkins-gke-e2e-*, so the problem appears to be specific to GKE.


ghost commented May 16, 2015

Correction - the instances behind the leaked pools are named k8s-jenkins-gke-e2e-*.


ghost commented May 16, 2015

cc: @roberthbailey @brendandburns FYI

a-robinson (Contributor) commented

This is a dupe of #7753, which #7852 should at least help with. Closing this in favor of that.

It's very odd that it's affecting GKE so disproportionately, though. After cleaning up the target pools whose forwarding rules were gone, all the non-soak-test target pools belonged to GKE, which indicates that GKE isn't just leaking target pools, it's also leaking forwarding rules.

My first guess is that this is happening when the services tests are the last ones run and the cluster gets torn down before the service controller has had time to clean up. It affects GKE but not GCE because GKE tears down all resources in parallel, while the GCE script synchronously waits for all nodes to be deleted before deleting the master.
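To illustrate the ordering difference (a sketch with hypothetical instance names; the real teardown goes through kube-down/the GKE API rather than raw instance deletes):

```sh
# GCE-style teardown: delete the nodes and wait for completion, then delete
# the master, so the service controller running there still has time to
# remove the service's forwarding rules and target pools.
gcloud compute instances delete <node-1> <node-2> --zone <zone> --quiet
gcloud compute instances delete <master> --zone <zone> --quiet

# GKE-style teardown removes everything concurrently, so the master (and the
# service controller with it) can disappear before LB cleanup finishes.
gcloud compute instances delete <node-1> <node-2> --zone <zone> --quiet &
gcloud compute instances delete <master> --zone <zone> --quiet &
wait
```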
