-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase upup http response header timeout #12694
Conversation
Hi @AlexLast. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
@@ -91,7 +91,7 @@ func downloadURLAlways(url string, destPath string, dirMode os.FileMode) error { | |||
KeepAlive: 30 * time.Second, | |||
}).DialContext, | |||
TLSHandshakeTimeout: 10 * time.Second, | |||
ResponseHeaderTimeout: 10 * time.Second, | |||
ResponseHeaderTimeout: 30 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should use the default transport directly
https://cs.opensource.google/go/go/+/refs/tags/go1.17.3:src/net/http/transport.go;bpv=1;bpt=1;l=43?gsn=DefaultTransport&gs=kythe%3A%2F%2Fgolang.org%3Flang%3Dgo%3Fpath%3Dnet%2Fhttp%23var%2520DefaultTransport
httpClient := &http.Client{
Transport: http.DefaultTransport
},
I can't remember where these values are coming, if there are no special reasons and the default are more conservative we should use them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this was implemented to support slow downloads of large files - 19f47d4
Although I'm unsure of the history around this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it, but I can't remember where did I get those values, most probably from some kubernetes code, but they don't seem to apply here so I think that better to use the more conservative values from the default client, that in this case doesn't have any timeout at all for response header.
httpClient := &http.Client{
Transport: &http.Transport{
Proxy: ProxyFromEnvironment,
DialContext: (&net.Dialer{
Timeout: 30 * time.Second,
KeepAlive: 30 * time.Second,
}).DialContext,
ForceAttemptHTTP2: true,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
},
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the diff is removing the ResponseHeaderTimeout: 30 * time.Second,
and increasing the IdleConnTimeout: 90 * time.Second,
and set ForceAttemptHTTP2: true,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the goal was to reduce the IdleConnTimeout mostly. Not sure if ForceAttemptHTTP2
affected things in any way in the past, but don't see a real need for in in the case of downloading various binaries.
This change looks pretty good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you know better than me the network requirements here, the change lgtm too, just wanted to add more context in case you think some other change will needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate. Thanks @aojea! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback and for testing this before submitting the PR @AlexLast.
Will try to get it to 1.21, but not sure when there will be another release.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hakman The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Awesome, thank you 👍 |
…-upstream-release-1.22 Automated cherry pick of #12694: Increase upup http response header timeout
…-upstream-release-1.21 Automated cherry pick of #12694: Increase upup http response header timeout
We deploy multiple clusters in different AWS regions, and since upgrading kops to
v1.21.1
we've noticed issues during thenodeup
phase for nodes in theap-southeast-2
region.After some debugging we've tracked it down to an asset that is taking > 10 seconds just to return response headers, the specific asset in question is:
https://github.com/containerd/containerd/releases/download/v1.4.9/cri-containerd-cni-1.4.9-linux-amd64.tar.gz
.While it's highly likely this is caused by some networking issues between the AWS
ap-southeast-2
region and the above github mirror, we are seeing nodes fail to join the cluster and response headers for that asset constantly taking > 10 seconds.We didn't previously hit any issues with these assets when managing our clusters with older versions of kops, where the timeout for the http client was just set to:
If we're happy with this change, could we please get this cherrypicked as a new
1.21
release? Currently we're hosting our own build ofnodeup
with these changes as a temporary solution.