addon-resizer stops working after http/2 connection is closed by the API Server #3294

Closed
44past4 opened this issue Jul 6, 2020 · 8 comments

44past4 (Contributor) commented Jul 6, 2020

addon-resizer version 1.8.10 stops working after its HTTP/2 connection to the API server is closed, and starts logging:

ERROR: logging before flag.Parse: E0706 15:52:22.196569       1 nanny_lib.go:128] Get "https://10.0.0.1:443/api/v1/nodes?resourceVersion=0": http2: no cached connection was available

On a production cluster this can take hours or days to happen, so it is easy to miss.

To reproduce the issue quickly, restart the API server manually.

With HTTP/2 debug logging enabled (GODEBUG=http2debug=2), addon-resizer logs the following just before the first error message:

http2: Transport readFrame error on conn 0xc0002d2a80: (*net.OpError) read tcp 10.11.0.4:37488->10.0.0.1:443: read: connection reset by peer
RoundTrip failure: read tcp 10.11.0.4:37488->10.0.0.1:443: read: connection reset by peer
http2: Transport failed to get client conn for 10.0.0.1:443: http2: no cached connection was available
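Go's HTTP/2 code (both the copy bundled into net/http and golang.org/x/net/http2) reads GODEBUG in a package init function, so the setting has to be in the process environment at startup (for a pod, in the container's env); calling os.Setenv from main is too late. A small Go sketch that starts a child process with http2debug=2 merged into any existing GODEBUG settings; the binary path ./addon-resizer and the helper withGODEBUG are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withGODEBUG returns a copy of env in which GODEBUG also contains setting
// (e.g. "http2debug=2"), keeping any existing comma-separated settings.
func withGODEBUG(env []string, setting string) []string {
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			found = true
			if cur := strings.TrimPrefix(kv, "GODEBUG="); cur != "" {
				kv = "GODEBUG=" + cur + "," + setting
			} else {
				kv = "GODEBUG=" + setting
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
	}
	return out
}

func main() {
	// Hypothetical binary path; replace with the program to debug.
	cmd := exec.Command("./addon-resizer")
	cmd.Env = withGODEBUG(os.Environ(), "http2debug=2")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err)
	}
}
```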

This problem is very similar to kiali/kiali#1953, which suggests that it most likely comes from an incompatible combination of Go, x/net/http2 and Kubernetes client-go versions.

I confirmed this by building the addon-resizer 1.8.10 source with Go 1.12.3; that build does not show the problem described above.

I therefore suggest reverting the commit that upgraded to Go 1.14.3 and releasing a new version of addon-resizer. After that, x/net/http2 and/or Kubernetes client-go should be upgraded to versions that work with Go 1.14.

The same problem might also affect other addon-resizer versions, such as 1.8.7, 1.8.8 and 1.8.9.

44past4 (Contributor, Author) commented Jul 6, 2020

/cc @jkaniuk @mm4tt @wojtek-t

mm4tt commented Jul 7, 2020

> Therefore I would suggest to revert commit with an upgrade to go 1.14.3 and release a new version of addon-resizer. After this x/net/http2 and/or kubernetes go-client should be upgraded to versions which works with go 1.14.

SGTM from the SIG Scalability perspective. Going back to Go 1.12 will most likely increase memory usage, but I don't expect the increase to be significant. I'll keep an eye on memory usage in our 5k-node tests.
Thanks, Pawel, for debugging and taking care of this!

44past4 added a commit to 44past4/autoscaler that referenced this issue Jul 7, 2020
This is required due to kubernetes#3294

This reverts commit 0410b57.

mm4tt commented Jul 8, 2020

@bskiba, can we please release the change from #3298 as 1.8.11? I'm not sure what the process is for that; for example, I know nothing about the GitHub release that was made for 1.8.10: https://github.com/kubernetes/autoscaler/releases/tag/addon-resizer-1.8.10

bskiba (Member) commented Jul 8, 2020

Can someone bump the image tag, as was done in #3193? Then I'll tag and create the release.

mm4tt added a commit to mm4tt/kubernetes that referenced this issue Jul 10, 2020

davidxia commented

Has 1.8.11 been released yet? I'm using GKE 1.15.12-gke.3, which uses gke.gcr.io/addon-resizer:1.8.8-gke.0 and suffers from this bug.

bskiba (Member) commented Sep 21, 2020

Yes, you can use k8s.gcr.io/addon-resizer:1.8.11

bskiba (Member) commented Sep 21, 2020

/close

k8s-ci-robot (Contributor) commented

@bskiba: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sharadg pushed a commit to sharadg/kubernetes that referenced this issue Oct 23, 2020