e2e flake: kubemark-gce-scale #28537

2000-node kubemark failed with:
http://kubekins.dls.corp.google.com/view/Scalability/job/kubernetes-kubemark-gce-scale/1263/

Seems to be related to http2 - any ideas?
@timothysc @krousey
Seems that way. We create a non-dialing client pool when we configure the transport, and that seems to be what causes the error to be returned - but only because all of the current connections were unavailable. A connection is considered unavailable when any of several conditions hold. Given that this is a scale test, I'm willing to bet it's the number of active streams: there's a default value for the maximum.
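For reference, here is a minimal sketch of the kind of availability check described above, loosely modeled on the x/net/http2 client-connection logic. The type and field names are illustrative, not the library's actual internals:

```go
package main

import "fmt"

// clientConn is a hypothetical stand-in for an HTTP/2 client connection.
// maxConcurrentStreams comes from the server's SETTINGS frame (the
// x/net/http2 server advertises 250 by default).
type clientConn struct {
	closed               bool
	goAwayReceived       bool
	activeStreams        int
	maxConcurrentStreams int
}

// canTakeNewRequest mirrors the conditions described above: a connection is
// unavailable if it is closed, has received a GOAWAY, or is already at the
// stream limit.
func (cc *clientConn) canTakeNewRequest() bool {
	return !cc.closed &&
		!cc.goAwayReceived &&
		cc.activeStreams < cc.maxConcurrentStreams
}

func main() {
	cc := &clientConn{activeStreams: 250, maxConcurrentStreams: 250}
	fmt.Println(cc.canTakeNewRequest()) // false: the connection is at its stream limit
}
```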
This value is also negotiated through a SETTINGS frame. The client doesn't seem to have any way to change this, but the server does, and we should be changing this on the server side anyway: https://godoc.org/golang.org/x/net/http2#Server
@timothysc I'm on call this week and out for a bit after that. Would you mind putting together a fix? If you get it done this week, I can review.
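A minimal sketch of the server-side change being suggested, assuming the server can be wrapped with http2.ConfigureServer; the stream limit of 1000 is an illustrative value, not the fix that was actually merged:

```go
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	srv := &http.Server{Addr: ":8443", Handler: http.DefaultServeMux}

	// Raise the per-connection stream limit the server advertises in its
	// SETTINGS frame; x/net/http2 falls back to 250 when this is zero.
	if err := http2.ConfigureServer(srv, &http2.Server{
		MaxConcurrentStreams: 1000, // illustrative value
	}); err != nil {
		log.Fatal(err)
	}

	// HTTP/2 requires TLS in this setup; the cert paths are placeholders.
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```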
Also, that's like... at least 249 TLS handshakes we didn't have to do. So yay?
@wojtek-t It might be worth considering the validity of these scale test results, as they may no longer be stressing actual concurrent connections.
@krousey - thanks for debugging. However, I think I'm missing something. This was a flake - it's not failing constantly. And this is kubemark-2000, which means we have 2000 fake nodes in kubemark, and each of them has at least one open connection to the apiserver. So the apiserver has at least 2000 open connections all the time. And you are saying the limit is 250? Regarding our scale tests - we are not going to stress the connections; we are focusing on the apiserver/controllers from the number-of-requests point of view; we've never gotten to the point of thinking about connections more deeply.
This limit is on HTTP/2 streams within a single connection. My guess is the e2e suite just happened to have 251 concurrent requests over a single transport this one time. Probably not related to the scale.
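A minimal sketch of how such a burst could be reproduced, assuming a single HTTP/2-enabled transport shared by all requests; the target URL and the count of 251 are illustrative:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"

	"golang.org/x/net/http2"
)

func main() {
	// A single transport: all requests share one HTTP/2 connection pool.
	// ConfigureTransport registers the non-dialing client pool mentioned above.
	tr := &http.Transport{}
	if err := http2.ConfigureTransport(tr); err != nil {
		panic(err)
	}
	client := &http.Client{Transport: tr}

	const n = 251 // one more than the default limit of 250 streams per connection
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// https://example.com stands in for the apiserver endpoint.
			resp, err := client.Get("https://example.com")
			if err != nil {
				fmt.Println("error:", err) // requests beyond the stream limit may fail
				return
			}
			resp.Body.Close()
		}()
	}
	wg.Wait()
}
```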
I see - yeah, that's probably possible.
251 concurrent requests from the e2e client seems... odd. |
Well - whatever it is, we need to fix it. I've seen at least 5 (I think it was closer to 10) failures because of it in the last few days.
Is it only this test that is causing this issue? b/c the clients can be overridden via environment variable.
I've seen it both in real-cluster-related tests and in kubemarks. But so far I've seen it only in large ones, so it seems to be scale-related.
@timothysc The panic is not related to this one (I mean I checked for panics in those runs and there weren't any). However, the godep update that you did may potentially help (it was just merged, so we don't have data for it yet - it was flaking about once every day or two).
This has just happened in 100-node cluster. |
@kubernetes/sig-api-machinery |
k, I'll look into the test itself soon.
closed via #29283 |