
Deployment Integration Test Goroutine Limit Exceeded #53617

Closed
crimsonfaith91 opened this issue Oct 9, 2017 · 19 comments

crimsonfaith91 (Contributor) commented Oct 9, 2017

What happened:
When the number of deployment integration tests grows beyond a threshold, running them locally with bazel fails with the error "race: limit on 8192 simultaneously alive goroutines is exceeded, dying". The error does not happen when the number of tests is small.

What you expected to happen:
The integration tests should not keep so many goroutines alive (more than 8192).

How to reproduce it (as minimally and precisely as possible):
(1) Duplicate each deployment test under the test/integration/deployment directory twice, using a digit identifier to distinguish the copies
(2) bazel build //test/integration/deployment/...
(3) bazel test //test/integration/deployment/...

Anything else we need to know?:
The error also happens for ReplicaSet. It may be related to how the integration test environment is set up.
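
For reference, a minimal sketch of how the leak can be observed from inside a test by logging the live goroutine count before and after each run (the helper name and placement are hypothetical, not part of the existing test code):

// goroutinecheck_test.go (hypothetical helper, for illustration only)
package deployment

import (
	"runtime"
	"testing"
)

// logGoroutines records the number of live goroutines; a leak across
// tests shows up as a steadily growing count.
func logGoroutines(t *testing.T, label string) {
	t.Logf("%s: %d live goroutines", label, runtime.NumGoroutine())
}

func TestDeploymentExample(t *testing.T) {
	logGoroutines(t, "before")
	// ... run the actual deployment integration test here ...
	logGoroutines(t, "after")
}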

/kind bug
/sig apps

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Oct 9, 2017
@crimsonfaith91 crimsonfaith91 changed the title Controller Integration Test Goroutine Limit Exceeded Deployment Integration Test Goroutine Limit Exceeded Oct 9, 2017
enisoc (Member) commented Oct 9, 2017

@kubernetes/sig-api-machinery-bugs Is it expected that an integration test would exceed 8192 goroutines (mostly started in apiserver code) if it starts a number of apiservers? That seems excessive to me, but if it's normal we should probably limit the concurrency of integration tests. If it's not normal, it seems like we are leaking goroutines.

Some examples of what those 8192 goroutines are doing:

k8s.io/client-go/tools/cache.(*Reflector).watchHandler(0xc420f625a0, 0xa8a2700, 0xc4210f5800, 0xc423939ba0, 0xc4210f55c0, 0xc420f485a0, 0x0, 0x0)
        vendor/k8s.io/client-go/tools/cache/reflector.go:366 +0x16f2
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc420f625a0, 0xc420f485a0, 0x0, 0x0)
        vendor/k8s.io/client-go/tools/cache/reflector.go:332 +0x1560
k8s.io/apiserver/pkg/storage.(*Cacher).startCaching(0xc4209121c0, 0xc420f485a0)
        vendor/k8s.io/apiserver/pkg/storage/cacher.go:276 +0x1a4
k8s.io/apiserver/pkg/storage.NewCacherFromConfig.func1.1()
        vendor/k8s.io/apiserver/pkg/storage/cacher.go:245 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc42003e7a8)
        vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x70
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc423939fa8, 0x3b9aca00, 0x0, 0xc42003e701, 0xc420f485a0)
        vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xce
k8s.io/apimachinery/pkg/util/wait.Until(0xc42003e7a8, 0x3b9aca00, 0xc420f485a0)
        vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x5b
k8s.io/apiserver/pkg/storage.NewCacherFromConfig.func1(0xc4209121c0, 0xc420f485a0)
        vendor/k8s.io/apiserver/pkg/storage/cacher.go:248 +0xe3
created by k8s.io/apiserver/pkg/storage.NewCacherFromConfig
        vendor/k8s.io/apiserver/pkg/storage/cacher.go:249 +0xfa7
k8s.io/apiserver/pkg/storage.(*Cacher).dispatchEvents(0xc4209121c0)
        vendor/k8s.io/apiserver/pkg/storage/cacher.go:595 +0x24a
created by k8s.io/apiserver/pkg/storage.NewCacherFromConfig
        vendor/k8s.io/apiserver/pkg/storage/cacher.go:237 +0xf54
github.com/coreos/etcd/clientv3.(*lessor).deadlineLoop(0xc4206948c0)
        vendor/github.com/coreos/etcd/clientv3/lease.go:434 +0x2fd
created by github.com/coreos/etcd/clientv3.NewLease
        vendor/github.com/coreos/etcd/clientv3/lease.go:156 +0x4da
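
For context, the 8192 figure is a hard cap in the Go race detector on simultaneously alive goroutines; the following standalone sketch (unrelated to Kubernetes, purely illustrative) triggers the same "limit on 8192 simultaneously alive goroutines is exceeded, dying" failure when run with go test -race:

package example

import (
	"testing"
	"time"
)

// Run with: go test -race
// More than 8192 goroutines are alive at once, so the race detector aborts
// with "race: limit on 8192 simultaneously alive goroutines is exceeded, dying".
// Without -race the test simply passes and the goroutines exit after close().
func TestGoroutineLimit(t *testing.T) {
	block := make(chan struct{})
	for i := 0; i < 10000; i++ {
		go func() { <-block }()
	}
	time.Sleep(time.Second) // keep the goroutines alive long enough to be counted
	close(block)
}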

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Oct 9, 2017
ncdc (Member) commented Oct 10, 2017

Do you have a full stack dump of all the goroutines? We could run them through panicparse to get some summarized details.
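
For anyone reproducing this, one way to capture such a dump from within the test process is the runtime/pprof goroutine profile (a minimal sketch; where to hook it into the integration framework is left open):

package example

import (
	"os"
	"runtime/pprof"
)

// dumpGoroutines writes the stack trace of every live goroutine to the
// given file; the output can then be fed to panicparse for aggregation.
func dumpGoroutines(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	// Debug level 2 prints goroutine stacks in the same format as an
	// unrecovered panic, which is what panicparse expects.
	return pprof.Lookup("goroutine").WriteTo(f, 2)
}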

crimsonfaith91 (Contributor, Author) commented

@ncdc Yes, but the file is very big (around 4MB). Most of the goroutines produce the same output; the partial stack dump above covers most of it.

enisoc (Member) commented Oct 11, 2017

The goroutines sampled above seem to be part of the client sitting between the REST Store and etcd. Perhaps it will help to incorporate calls to DestroyFunc somewhere in the integration framework?

// Called to cleanup clients used by the underlying Storage; optional.
DestroyFunc func()
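
A possible shape for that, sketched under the assumption that the integration framework could collect the DestroyFunc of each storage backend it creates and invoke them at teardown (the type and method names below are hypothetical, not existing framework APIs):

package framework

import "sync"

// storageCleanup accumulates the DestroyFunc closures handed out by the
// storage layer so a test can release etcd clients and cacher goroutines
// when it finishes.
type storageCleanup struct {
	mu       sync.Mutex
	destroys []func()
}

func (c *storageCleanup) Add(destroy func()) {
	if destroy == nil {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.destroys = append(c.destroys, destroy)
}

// TearDown runs the collected DestroyFuncs in reverse order, mirroring
// the order in which the backends were created.
func (c *storageCleanup) TearDown() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for i := len(c.destroys) - 1; i >= 0; i-- {
		c.destroys[i]()
	}
	c.destroys = nil
}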

mml (Contributor) commented Oct 12, 2017

cc @jpbetz

enisoc (Member) commented Oct 18, 2017

This may be related:

#50690
#49489

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 16, 2018
MHBauer (Contributor) commented Jan 19, 2018

To me this looks like a duplicate of the root cause in #49489.

MHBauer (Contributor) commented Jan 19, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2018
MHBauer (Contributor) commented Jan 19, 2018

@ncdc we have a similar problem in service-catalog. Goroutines are not being reclaimed.

Not sure how to get an appropriate stack dump to help, but there is some information in this gist.

crimsonfaith91 (Contributor, Author) commented Feb 1, 2018

I also encountered the error when working on a DaemonSet integration test: #59013

@kow3ns kow3ns added this to Backlog in Workloads Feb 27, 2018
@sttts sttts added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed kind/bug Categorizes issue or PR as related to a bug. labels Mar 1, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2018
nikhita (Member) commented Jun 13, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 13, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 11, 2018
nikhita (Member) commented Sep 13, 2018

/remove-lifecycle stale

Removing help-wanted because the direction is not clear.

/remove-help

@k8s-ci-robot k8s-ci-robot removed help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 13, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 12, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Workloads automation moved this from Backlog to Done Feb 10, 2019