Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend timeout to workaround slow arm64 math #66264

Merged

Conversation

joejulian
Copy link
Contributor

What this PR does / why we need it:

The math/big functions are slow on arm64. There is improvement coming
with go1.11 but until such time as that version can be used to build releases,
if a server uses rsa certificates on arm64, the math load for the multitude
of watches over-taxes the ability of the processor and the TLS connections
time out. Retries will also not succeed and serve to exacerbate the problem.

By extending the timeout, the TLS connections will eventually be
successful and the load will drop.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #64649

Special notes for your reviewer:
This was tested on a Raspberry Pi 3

Release note:

Extend TLS timeouts to work around slow arm64 math/big

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 17, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 17, 2018
@joejulian
Copy link
Contributor Author

/assign @liggitt

@smarterclayton
Copy link
Contributor

You say "if a server uses RSA certificates". Does this also happen with elliptic curves?

@joejulian
Copy link
Contributor Author

I haven't tested that yet but according to the conversations I found in the golang community, ecdsa should not be a problem.

@anguslees
Copy link
Member

anguslees commented Jul 17, 2018

(Nice find! I noticed that TLS handshaking was very slow on my rPi2s (arm32), and ended up disabling etcd TLS encryption altogether (!) because etcd at the time didn't offer a way to extend the extremely aggressive 1s connect timeout to peers (typically completed in ~1.1s on my rpi2s, worse when under load). I shall have to retry with ecdsa certs to see how they perform...)

@DJGummikuh
Copy link

Hey! I used your changes to build kubernetes for me to test with but how do I actually patch the changes onto my raspberry pi? I have the kube-apiserver binary here but I can't seem to figure out where to place it so that kubernetes uses it - can I create the necessary docker images on my pc or would I have to run that process on my Raspberry Pi?

@dims
Copy link
Member

dims commented Jul 17, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 17, 2018
@dims
Copy link
Member

dims commented Jul 17, 2018

/assign @xiang90

@joejulian
Copy link
Contributor Author

/test pull-kubernetes-e2e-gce

@@ -40,7 +40,7 @@ var (
keepaliveTime = 30 * time.Second
keepaliveTimeout = 10 * time.Second
// dialTimeout is the timeout for failing to establish a connection.
dialTimeout = 10 * time.Second
dialTimeout = 20 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are really manifest constants then vars. We should also cross-link the issue or some meta-data to outline how this magic number was chosen.

We have O(100)s of magic numbers in the system with little/no history.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that as well.

The original number seemed to be arbitrary and the new number was just a double of that so equally as arbitrary. When go1.11 is released and is used to build the binaries, this workaround will be moot and can be reverted.

I did reference the hardware to which this was a problem and that this number was chosen as it alleviated that problem for said hardware. There is an issue linked. What further meta-data would you be interested in seeing?

@joejulian joejulian force-pushed the workaround_for_slow_arm64_math branch from cc649f8 to d8bf732 Compare July 19, 2018 17:50
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 19, 2018
@joejulian joejulian force-pushed the workaround_for_slow_arm64_math branch from d8bf732 to 62b9d37 Compare July 19, 2018 17:52
The math/big functions are slow on arm64. There is improvement coming
with go1.11 but in the mean time if a server uses rsa certificates on
arm64, the math load for the multitude of watches over taxes the ability
of the processor and the TLS connections time out. Retries will also not
succeed and serve to exacerbate the problem.

By extending the timeout, the TLS connections will eventually be
successful and the load will drop.

Fixes kubernetes#64649
@joejulian
Copy link
Contributor Author

/test pull-kubernetes-kubemark-e2e-gce-big

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 20, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joejulian, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 20, 2018
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit b914542 into kubernetes:master Jul 20, 2018
@fedebongio
Copy link
Contributor

/cc @cheftako

k8s-github-robot pushed a commit that referenced this pull request Jul 25, 2018
…264-upstream-release-1.10

Automatic merge from submit-queue.

Automated cherry pick of #66264: extend timeout to workaround slow arm64 math

Cherry pick of #66264 on release-1.10.

#66264: extend timeout to workaround slow arm64 math
k8s-github-robot pushed a commit that referenced this pull request Jul 25, 2018
…264-upstream-release-1.11

Automatic merge from submit-queue.

Automated cherry pick of #66264: extend timeout to workaround slow arm64 math

Cherry pick of #66264 on release-1.11.

#66264: extend timeout to workaround slow arm64 math
@joejulian joejulian deleted the workaround_for_slow_arm64_math branch July 25, 2018 23:59
@neolit123
Copy link
Member

/sig api-machinery
adding sig label for release notes tools to fetch.

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Aug 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kube-apiserver 1.10.[0-5] & 1.11.0 uses up all available cpu on arm64