Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase upup http response header timeout #12694

Merged
merged 1 commit into from
Nov 8, 2021

Conversation

AlexLast
Copy link
Contributor

@AlexLast AlexLast commented Nov 8, 2021

We deploy multiple clusters in different AWS regions, and since upgrading kops to v1.21.1 we've noticed issues during the nodeup phase for nodes in the ap-southeast-2 region.

After some debugging we've tracked it down to an asset that is taking > 10 seconds just to return response headers, the specific asset in question is: https://github.com/containerd/containerd/releases/download/v1.4.9/cri-containerd-cni-1.4.9-linux-amd64.tar.gz.

While it's highly likely this is caused by some networking issues between the AWS ap-southeast-2 region and the above github mirror, we are seeing nodes fail to join the cluster and response headers for that asset constantly taking > 10 seconds.

We didn't previously hit any issues with these assets when managing our clusters with older versions of kops, where the timeout for the http client was just set to:

Timeout: 2 * time.Minute,

If we're happy with this change, could we please get this cherrypicked as a new 1.21 release? Currently we're hosting our own build of nodeup with these changes as a temporary solution.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 8, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @AlexLast. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hakman
Copy link
Member

hakman commented Nov 8, 2021

/ok-to-test
/cc @aojea

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 8, 2021
@@ -91,7 +91,7 @@ func downloadURLAlways(url string, destPath string, dirMode os.FileMode) error {
KeepAlive: 30 * time.Second,
}).DialContext,
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 10 * time.Second,
ResponseHeaderTimeout: 30 * time.Second,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use the default transport directly

https://cs.opensource.google/go/go/+/refs/tags/go1.17.3:src/net/http/transport.go;bpv=1;bpt=1;l=43?gsn=DefaultTransport&gs=kythe%3A%2F%2Fgolang.org%3Flang%3Dgo%3Fpath%3Dnet%2Fhttp%23var%2520DefaultTransport
httpClient := &http.Client{
Transport: http.DefaultTransport
},

I can't remember where these values are coming, if there are no special reasons and the default are more conservative we should use them

Copy link
Contributor Author

@AlexLast AlexLast Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was implemented to support slow downloads of large files - 19f47d4

Although I'm unsure of the history around this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it, but I can't remember where did I get those values, most probably from some kubernetes code, but they don't seem to apply here so I think that better to use the more conservative values from the default client, that in this case doesn't have any timeout at all for response header.

	httpClient := &http.Client{
		Transport: &http.Transport{
	Proxy: ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
	}).DialContext,
	ForceAttemptHTTP2:     true,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
		},
	}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the diff is removing the ResponseHeaderTimeout: 30 * time.Second, and increasing the IdleConnTimeout: 90 * time.Second, and set ForceAttemptHTTP2: true,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the goal was to reduce the IdleConnTimeout mostly. Not sure if ForceAttemptHTTP2 affected things in any way in the past, but don't see a real need for in in the case of downloading various binaries.
This change looks pretty good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you know better than me the network requirements here, the change lgtm too, just wanted to add more context in case you think some other change will needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate. Thanks @aojea! 😄

Copy link
Member

@hakman hakman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback and for testing this before submitting the PR @AlexLast.
Will try to get it to 1.21, but not sure when there will be another release.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2021
@k8s-ci-robot k8s-ci-robot merged commit 31737c6 into kubernetes:master Nov 8, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Nov 8, 2021
@AlexLast
Copy link
Contributor Author

AlexLast commented Nov 8, 2021

Thanks for the feedback and for testing this before submitting the PR @AlexLast. Will try to get it to 1.21, but not sure when there will be another release.

Awesome, thank you 👍

k8s-ci-robot added a commit that referenced this pull request Nov 9, 2021
…-upstream-release-1.22

Automated cherry pick of #12694: Increase upup http response header timeout
k8s-ci-robot added a commit that referenced this pull request Nov 9, 2021
…-upstream-release-1.21

Automated cherry pick of #12694: Increase upup http response header timeout
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants