Kube 1.7 against etcd 3.2.9 can result in hung etcd connections at large scales #57061

Closed
smarterclayton opened this Issue Dec 12, 2017 · 19 comments

smarterclayton commented Dec 12, 2017

On a very large 1.7 cluster with etcd 3.2.9, a large number of goroutines in the kube-apiserver are waiting for a new HTTP/2 stream for gRPC against etcd. Service account token requests hang and then time out (504) because they cannot acquire a gRPC stream within the default 30s timeout, which causes the service account token authenticator to fail. It also causes almost 100% of requests to be rejected with 429, because when request rate * duration / max-in-flight > 1 we exhaust the in-flight request limit.
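
To make the saturation condition concrete, here is a back-of-the-envelope sketch in Go. It applies Little's law (requests held open ≈ arrival rate × time each request is held) to the figures in this report: 60-120 secret requests per second, each hanging for the 30s timeout. The limit of 400 is an assumption standing in for the apiserver's max-in-flight setting, which was not recorded for this cluster, and the function names are made up for illustration.

```go
package main

import "fmt"

// inFlight estimates how many requests are held open at once when
// ratePerSec requests arrive each second and each one is held for
// holdSeconds before it completes or times out (Little's law).
func inFlight(ratePerSec, holdSeconds float64) float64 {
	return ratePerSec * holdSeconds
}

// saturated reports whether the estimated concurrency exceeds the limit,
// i.e. whether rate * duration / max-in-flight > 1.
func saturated(ratePerSec, holdSeconds, maxInFlight float64) bool {
	return inFlight(ratePerSec, holdSeconds)/maxInFlight > 1
}

func main() {
	const holdSeconds = 30  // hung requests are held until the 30s timeout
	const maxInFlight = 400 // ASSUMPTION: not taken from the affected cluster

	for _, rate := range []float64{60, 120} { // secret requests per second
		fmt.Printf("rate=%v/s in-flight=%v saturated=%v\n",
			rate, inFlight(rate, holdSeconds),
			saturated(rate, holdSeconds, maxInFlight))
	}
}
```

Under these assumed numbers, even the lower request rate holds far more requests open than the limit allows, which is consistent with nearly every other request being rejected.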

Each storage backend (usually) has its own etcd connection, so a transport is not shared across resource types. Only secret retrieval was affected, not pods or nodes.

This was an 80-node cluster performing ~60-120 secret requests per second.

Upstream gRPC has fixed a few known hangs, but those only occur when Keepalive or the server-side InTapHandle is on (neither of which we turn on in Kube). I don't see any existing etcd issues.

goroutine profile: total 66730
26058 @ 0x430bea 0x440054 0x43ecbc 0x2286c54 0x22777f5 0x2289c85 0x228a72a 0x228a11c 0x22bdbc2 0x2335ed2 0x232ee2a 0x232ec87 0x2326586 0x231ed18 0x231e5e5 0x231e112 0x2338535 0x3a6c9f4 0x3a1ae74 0x3a1a991 0x3a45837 0x38cc377 0x8740ed 0x8734b7 0x3a51c37 0x3a56e06 0x4184222 0x38d382d 0x38d3412 0x3a51851 0x3a56e06 0x472da22
#	0x2286c53	github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc/transport.wait+0x2e3				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc/transport/transport.go:577
#	0x22777f4	github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc/transport.(*http2Client).NewStream+0x21a4	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc/transport/http2_client.go:319
#	0x2289c84	github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc.sendRequest+0x94				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc/call.go:80
#	0x228a729	github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc.invoke+0x5d9					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc/call.go:191
#	0x228a11b	github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc.Invoke+0x19b					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/vendor/google.golang.org/grpc/call.go:118
#	0x22bdbc1	github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb.(*kVClient).Range+0xd1				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/etcdserverpb/rpc.pb.go:2239
#	0x2335ed1	github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3.(*retryWriteKVClient).Range+0x91					<autogenerated>:177
#	0x232ee29	github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3.(*retryKVClient).Range.func1+0x89					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3/retry.go:92
#	0x232ec86	github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3.(*Client).newAuthRetryWrapper.func1+0x46				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3/retry.go:61
#	0x2326585	github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3.(*retryKVClient).Range+0x155						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3/retry.go:94
#	0x231ed17	github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3.(*kv).do+0x4f7							/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3/kv.go:140
#	0x231e5e4	github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3.(*kv).Do+0x84							/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3/kv.go:119
#	0x231e111	github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3.(*kv).Get+0xe1							/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/clientv3/kv.go:93
#	0x2338534	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/etcd3.(*store).Get+0x134						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/storage/etcd3/store.go:125
#	0x3a6c9f3	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*Store).Get+0x183					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/registry/generic/registry/store.go:617
#	0x3a1ae73	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.GetResource.func1+0x193					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/rest.go:175
#	0x3a1a990	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.getResourceHandler.func1+0x200				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/rest.go:129
#	0x3a45836	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints.restfulGetResource.func1+0xb6						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/installer.go:1097
#	0x38cc376	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics.InstrumentRouteFunc.func1+0x206				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:144
#	0x8740ec	github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).dispatch+0xb8c						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:277
#	0x8734b6	github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).Dispatch+0x56						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:199
#	0x3a51c36	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP+0x6e6							/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:153
#	0x3a56e05	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*director).ServeHTTP+0x85						<autogenerated>:72
#	0x4184221	github.com/openshift/origin/vendor/k8s.io/kube-aggregator/pkg/apiserver.(*proxyHandler).ServeHTTP+0x121					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kube-aggregator/pkg/apiserver/handler_proxy.go:91
#	0x38d382c	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP+0x3dc					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:248
#	0x38d3411	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP+0x71					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:234
#	0x3a51850	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP+0x300							/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:161
#	0x3a56e05	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*director).ServeHTTP+0x85						<autogenerated>:72
#	0x472da21	github.com/openshift/origin/pkg/cmd/server/origin.namespacingFilter.func1+0xd1								/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/handlers.go:52


7542 @ 0x430bea 0x440054 0x43ecbc 0x38ce16d 0x38c6739 0x71cfa4 0x170d10f 0x71cfa4 0x3d77138 0x71cfa4 0x472d932 0x71cfa4 0x45980b7 0x71cfa4 0x472e8b1 0x71cfa4 0x472e8b1 0x71cfa4 0x3a52471 0x71f8b2 0x7212b3 0x746494 0x73da1d 0x6f6199 0x45e211
#	0x38ce16c	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP+0x2ac	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:109
#	0x38c6738	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1+0x218		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/requestinfo.go:45
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43										/usr/lib/golang/src/net/http/server.go:1942
#	0x170d10e	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request.WithRequestContext.func1+0xee		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request/requestcontext.go:110
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43										/usr/lib/golang/src/net/http/server.go:1942
#	0x3d77137	github.com/openshift/origin/pkg/cmd/server/handlers.TranslateLegacyScopeImpersonation.func1+0x227		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/handlers/impersonation_scopes.go:20
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43										/usr/lib/golang/src/net/http/server.go:1942
#	0x472d931	github.com/openshift/origin/pkg/cmd/server/origin.cacheControlFilter.func1+0xc1					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/handlers.go:20
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43										/usr/lib/golang/src/net/http/server.go:1942
#	0x45980b6	github.com/openshift/origin/pkg/assets/apiserver.WithAssetServerRedirect.func1+0x86				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/pkg/assets/apiserver/asset_apiserver.go:321
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43										/usr/lib/golang/src/net/http/server.go:1942
#	0x472e8b0	github.com/openshift/origin/pkg/cmd/server/origin.WithPatternPrefixHandler.func1+0xd0				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/master.go:364
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43										/usr/lib/golang/src/net/http/server.go:1942
#	0x472e8b0	github.com/openshift/origin/pkg/cmd/server/origin.WithPatternPrefixHandler.func1+0xd0				/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/master.go:364
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43										/usr/lib/golang/src/net/http/server.go:1942
#	0x3a52470	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP+0x50		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:198
#	0x71f8b1	net/http.serverHandler.ServeHTTP+0x91										/usr/lib/golang/src/net/http/server.go:2568
#	0x7212b2	net/http.initNPNRequest.ServeHTTP+0x92										/usr/lib/golang/src/net/http/server.go:3088
#	0x746493	net/http.(*initNPNRequest).ServeHTTP+0x73									<autogenerated>:312
#	0x73da1c	net/http.(Handler).ServeHTTP-fm+0x4c										/usr/lib/golang/src/net/http/h2_bundle.go:4331
#	0x6f6198	net/http.(*http2serverConn).runHandler+0x88									/usr/lib/golang/src/net/http/h2_bundle.go:4611

6393 @ 0x430bea 0x440054 0x43ecbc 0x922fe4 0x91f073 0x91eba0 0x9035e9 0x728979 0x100586c 0x6d29d2 0x6d2658 0x6d3cf4 0x1011f0d 0x10127f0 0x14d8bf1 0x3b6a1f5 0x3b72a6e 0x16be369 0x3db59ce 0x3db502a 0x25ef382 0x25f13a1 0x25f13a1 0x25ee695 0x38c28d4 0x71cfa4 0x170d10f 0x71cfa4 0x38cefe9 0x71cfa4 0x38d08bb 0x71cfa4
#	0x922fe3	github.com/openshift/origin/vendor/golang.org/x/net/http2.(*ClientConn).RoundTrip+0xb83								/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/transport.go:835
#	0x91f072	github.com/openshift/origin/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt+0x172							/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/transport.go:339
#	0x91eb9f	github.com/openshift/origin/vendor/golang.org/x/net/http2.(*Transport).RoundTrip+0x3f								/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/transport.go:301
#	0x9035e8	github.com/openshift/origin/vendor/golang.org/x/net/http2.noDialH2RoundTripper.RoundTrip+0x38							/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/golang.org/x/net/http2/configure_transport.go:75
#	0x728978	net/http.(*Transport).RoundTrip+0xaf8														/usr/lib/golang/src/net/http/transport.go:349
#	0x100586b	github.com/openshift/origin/vendor/k8s.io/client-go/transport.(*userAgentRoundTripper).RoundTrip+0x10b						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/transport/round_trippers.go:162
#	0x6d29d1	net/http.send+0x161																/usr/lib/golang/src/net/http/client.go:249
#	0x6d2657	net/http.(*Client).send+0x107															/usr/lib/golang/src/net/http/client.go:173
#	0x6d3cf3	net/http.(*Client).Do+0x253															/usr/lib/golang/src/net/http/client.go:595
#	0x1011f0c	github.com/openshift/origin/vendor/k8s.io/client-go/rest.(*Request).request+0x2cc								/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/rest/request.go:824
#	0x10127ef	github.com/openshift/origin/vendor/k8s.io/client-go/rest.(*Request).Do+0xbf									/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/client-go/rest/request.go:898
#	0x14d8bf0	github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/clientset_generated/clientset/typed/core/v1.(*secrets).Get+0x170		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/clientset_generated/clientset/typed/core/v1/secret.go:69
#	0x3b6a1f4	github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/serviceaccount.clientGetter.GetSecret+0xc4					/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/serviceaccount/tokengetter.go:49
#	0x3b72a6d	github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/serviceaccount.(*clientGetter).GetSecret+0x7d				<autogenerated>:2
#	0x16be368	github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/serviceaccount.(*jwtTokenAuthenticator).AuthenticateToken+0xf18			/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/serviceaccount/jwt.go:288
#	0x3db59cd	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/token/union.(*unionAuthTokenHandler).AuthenticateToken+0x9d		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/token/union/union.go:55
#	0x3db5029	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/token/cache.(*cachedTokenAuthenticator).AuthenticateToken+0x99		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/token/cache/cached_token_authenticator.go:71
#	0x25ef381	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/request/bearertoken.(*Authenticator).AuthenticateRequest+0x191		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/request/bearertoken/bearertoken.go:55
#	0x25f13a0	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/request/union.(*unionAuthRequestHandler).AuthenticateRequest+0x90	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/request/union/union.go:57
#	0x25f13a0	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/request/union.(*unionAuthRequestHandler).AuthenticateRequest+0x90	/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/request/union/union.go:57
#	0x25ee694	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/group.(*AuthenticatedGroupAdder).AuthenticateRequest+0x54		/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/authentication/group/authenticated_group_adder.go:40
#	0x38c28d3	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthentication.func1+0x83						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authentication.go:55
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43														/usr/lib/golang/src/net/http/server.go:1942
#	0x170d10e	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request.WithRequestContext.func1+0xee						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request/requestcontext.go:110
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43														/usr/lib/golang/src/net/http/server.go:1942
#	0x38cefe8	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithCORS.func1+0x188							/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/cors.go:75
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43														/usr/lib/golang/src/net/http/server.go:1942
#	0x38d08ba	github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithPanicRecovery.func1+0x11a						/builddir/build/BUILD/atomic-openshift-git-0.7c71a2d/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/wrap.go:41
#	0x71cfa3	net/http.HandlerFunc.ServeHTTP+0x43														/usr/lib/golang/src/net/http/server.go:1942

This is very similar to #53361, although that issue involved watches and this one has different symptoms.
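
For a dump of this size it can help to rank the stack groups by goroutine count before reading individual frames. The following is a minimal sketch, assuming the text layout shown above (a `<count> @ <pcs>` header followed by `#`-prefixed frame lines, as Go's goroutine profile prints in its summarized text form). The names `stackGroup` and `parseGroups` are hypothetical, and the sample input is heavily abbreviated from the dump above, with package paths and file paths shortened.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// stackGroup is one "<count> @ <pcs...>" entry plus its innermost frame.
type stackGroup struct {
	count int
	top   string // function+offset of the first frame listed for the group
}

// parseGroups reads goroutine profile text and returns the stack groups
// sorted by goroutine count, largest first.
func parseGroups(profile string) []stackGroup {
	var groups []stackGroup
	sc := bufio.NewScanner(strings.NewReader(profile))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // frame lines can be long
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		switch {
		case len(fields) >= 2 && fields[1] == "@":
			// Group header: "<count> @ <pc> <pc> ..."
			if n, err := strconv.Atoi(fields[0]); err == nil {
				groups = append(groups, stackGroup{count: n})
			}
		case len(fields) >= 3 && fields[0] == "#" && len(groups) > 0:
			// Frame line: "# <pc> <function+offset> <file:line>"
			if g := &groups[len(groups)-1]; g.top == "" {
				g.top = fields[2]
			}
		}
	}
	sort.SliceStable(groups, func(i, j int) bool {
		return groups[i].count > groups[j].count
	})
	return groups
}

// sample is abbreviated from the dump above; names and paths are shortened.
const sample = "goroutine profile: total 66730\n" +
	"7542 @ 0x430bea 0x440054\n" +
	"#\t0x38ce16c\tfilters.(*timeoutHandler).ServeHTTP+0x2ac\ttimeout.go:109\n" +
	"\n" +
	"26058 @ 0x430bea 0x440054\n" +
	"#\t0x2286c53\ttransport.wait+0x2e3\ttransport.go:577\n" +
	"#\t0x22777f4\ttransport.(*http2Client).NewStream+0x21a4\thttp2_client.go:319\n"

func main() {
	for _, g := range parseGroups(sample) {
		fmt.Printf("%6d  %s\n", g.count, g.top)
	}
}
```

Sorting by count puts the group blocked in `transport.wait` under `NewStream` first, which is the stack this report focuses on.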

smarterclayton commented Dec 12, 2017

We're still on gRPC 1.0.4 (grpc/grpc@777daa1). I'm going to look through newer commits and see if anything beyond the keepalive and tap changes looks similar.

smarterclayton commented Dec 12, 2017

smarterclayton commented Dec 12, 2017

This might be grpc/grpc-go#1005

smarterclayton commented Dec 12, 2017

We are fairly sure this is that issue.

redbaron commented Dec 12, 2017

Do you know if it can happen on Kube > 1.7 and etcd > 3.2.9?

smarterclayton commented Dec 13, 2017

On 1.8 or newer it's likely to be a different issue.

smarterclayton commented Dec 13, 2017

In the scenario where we encountered these hangs, we had an unusually high rate of watch establishment. That likely increased contention on the code path in the gRPC issue that was dropping stream quota, which in turn likely led to the hangs building up at such a rate.

yliaog commented Dec 18, 2017

/cc @jpbetz

joejulian commented Dec 22, 2017

@smarterclayton is that documented in an issue somewhere?

smarterclayton commented Jan 3, 2018

Against gRPC? I linked one above.

zhouhaibing089 commented Mar 19, 2018

@smarterclayton: I am curious about the etcd version. Is it also required to upgrade etcd to v3.2.x to fix this issue? The gRPC commit you mentioned only exists on v3.2.x. I am not sure whether this should be fixed on both the client and the server side.

smarterclayton commented Mar 19, 2018

The problem here requires client-side fixes.

fejta-bot commented Jun 17, 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

joejulian commented Jun 17, 2018

/remove-lifecycle stale

joejulian commented Jun 17, 2018

/close

k8s-ci-robot commented Jun 17, 2018

@joejulian: you can't close an active issue unless you authored it or you are assigned to it, Can only assign issues to org members and/or repo collaborators..

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

joejulian commented Jun 17, 2018

Note to self: do not interact with github before coffee.

/lifecycle stale

fejta-bot commented Jul 17, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

fejta-bot commented Aug 16, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
