Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend timeout to workaround slow arm64 math #66264

Merged
merged 1 commit into from Jul 20, 2018

Conversation

@joejulian
Copy link
Contributor

joejulian commented Jul 17, 2018

What this PR does / why we need it:

The math/big functions are slow on arm64. There is improvement coming
with go1.11 but until such time as that version can be used to build releases,
if a server uses rsa certificates on arm64, the math load for the multitude
of watches over-taxes the ability of the processor and the TLS connections
time out. Retries will also not succeed and serve to exacerbate the problem.

By extending the timeout, the TLS connections will eventually be
successful and the load will drop.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #64649

Special notes for your reviewer:
This was tested on a Raspberry Pi 3

Release note:

Extend TLS timeouts to work around slow arm64 math/big
@joejulian

This comment has been minimized.

Copy link
Contributor Author

joejulian commented Jul 17, 2018

/assign @liggitt

@smarterclayton

This comment has been minimized.

Copy link
Contributor

smarterclayton commented Jul 17, 2018

You say "if a server uses RSA certificates". Does this also happen with elliptic curves?

@joejulian

This comment has been minimized.

Copy link
Contributor Author

joejulian commented Jul 17, 2018

I haven't tested that yet but according to the conversations I found in the golang community, ecdsa should not be a problem.

@anguslees

This comment has been minimized.

Copy link
Member

anguslees commented Jul 17, 2018

(Nice find! I noticed that TLS handshaking was very slow on my rPi2s (arm32), and ended up disabling etcd TLS encryption altogether (!) because etcd at the time didn't offer a way to extend the extremely aggressive 1s connect timeout to peers (typically completed in ~1.1s on my rpi2s, worse when under load). I shall have to retry with ecdsa certs to see how they perform...)

@DJGummikuh

This comment has been minimized.

Copy link

DJGummikuh commented Jul 17, 2018

Hey! I used your changes to build kubernetes for me to test with but how do I actually patch the changes onto my raspberry pi? I have the kube-apiserver binary here but I can't seem to figure out where to place it so that kubernetes uses it - can I create the necessary docker images on my pc or would I have to run that process on my Raspberry Pi?

@dims

This comment has been minimized.

Copy link
Member

dims commented Jul 17, 2018

/ok-to-test

@dims

This comment has been minimized.

Copy link
Member

dims commented Jul 17, 2018

/assign @xiang90

@joejulian

This comment has been minimized.

Copy link
Contributor Author

joejulian commented Jul 17, 2018

/test pull-kubernetes-e2e-gce

@@ -40,7 +40,7 @@ var (
keepaliveTime = 30 * time.Second
keepaliveTimeout = 10 * time.Second
// dialTimeout is the timeout for failing to establish a connection.
dialTimeout = 10 * time.Second
dialTimeout = 20 * time.Second

This comment has been minimized.

@timothysc

timothysc Jul 17, 2018

Member

These are really manifest constants then vars. We should also cross-link the issue or some meta-data to outline how this magic number was chosen.

We have O(100)s of magic numbers in the system with little/no history.

This comment has been minimized.

@joejulian

joejulian Jul 17, 2018

Author Contributor

I considered that as well.

The original number seemed to be arbitrary and the new number was just a double of that so equally as arbitrary. When go1.11 is released and is used to build the binaries, this workaround will be moot and can be reverted.

I did reference the hardware to which this was a problem and that this number was chosen as it alleviated that problem for said hardware. There is an issue linked. What further meta-data would you be interested in seeing?

@joejulian joejulian force-pushed the joejulian:workaround_for_slow_arm64_math branch from cc649f8 to d8bf732 Jul 19, 2018

@k8s-ci-robot k8s-ci-robot added size/S and removed size/XS labels Jul 19, 2018

@joejulian joejulian force-pushed the joejulian:workaround_for_slow_arm64_math branch from d8bf732 to 62b9d37 Jul 19, 2018

extend timeout to workaround slow arm64 math
The math/big functions are slow on arm64. There is improvement coming
with go1.11 but in the mean time if a server uses rsa certificates on
arm64, the math load for the multitude of watches over taxes the ability
of the processor and the TLS connections time out. Retries will also not
succeed and serve to exacerbate the problem.

By extending the timeout, the TLS connections will eventually be
successful and the load will drop.

Fixes #64649
@joejulian

This comment has been minimized.

Copy link
Contributor Author

joejulian commented Jul 19, 2018

/test pull-kubernetes-kubemark-e2e-gce-big

@timothysc
Copy link
Member

timothysc left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 20, 2018

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jul 20, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joejulian, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented Jul 20, 2018

Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit b914542 into kubernetes:master Jul 20, 2018

18 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation joejulian authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
@fedebongio

This comment has been minimized.

Copy link
Contributor

fedebongio commented Jul 23, 2018

/cc @cheftako

@k8s-ci-robot k8s-ci-robot requested a review from cheftako Jul 23, 2018

k8s-github-robot pushed a commit that referenced this pull request Jul 25, 2018

Kubernetes Submit Queue
Merge pull request #66459 from joejulian/automated-cherry-pick-of-#66…
…264-upstream-release-1.10

Automatic merge from submit-queue.

Automated cherry pick of #66264: extend timeout to workaround slow arm64 math

Cherry pick of #66264 on release-1.10.

#66264: extend timeout to workaround slow arm64 math

k8s-github-robot pushed a commit that referenced this pull request Jul 25, 2018

Kubernetes Submit Queue
Merge pull request #66458 from joejulian/automated-cherry-pick-of-#66…
…264-upstream-release-1.11

Automatic merge from submit-queue.

Automated cherry pick of #66264: extend timeout to workaround slow arm64 math

Cherry pick of #66264 on release-1.11.

#66264: extend timeout to workaround slow arm64 math

@joejulian joejulian deleted the joejulian:workaround_for_slow_arm64_math branch Jul 25, 2018

@neolit123

This comment has been minimized.

Copy link
Member

neolit123 commented Aug 2, 2018

/sig api-machinery
adding sig label for release notes tools to fetch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.