kube-apiserver 1.10.[0-5] & 1.11.0 uses up all available cpu on arm64 #64649
I've tried profiling this, but since it doesn't respond on the HTTP interface, I can't get a profile that way. I've tried hacking in the ability to write profile data to a file, but the os.Exit in the fatal error log causes the deferred calls to be skipped, so the profile file stays empty, or contains corrupted data if I do get it to write.
/sig api-machinery
@joe2far Do you use etcd v2?
Nope, etcd 3.2.13 per #57480
/cc @fedebongio
Tested 1.10.5 and 1.11.0... still have the same problem.
Actually, 1.11 is not causing the problem. My test method was flawed. 1.10 is still broken, but I can upgrade. /close
I'm still having the issue with 1.11. I'd like to know what your fix was.
I don't know. It worked fine for a while, then I needed to restart that node for unrelated reasons and it failed again, so I'm still having the problem, too. /reopen
I've built a binary with go 1.10 and it still has the same problem, fwiw.
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#scalability says
I'll try that next.
Running etcd 3.2.18:
Still have the issue.
Reverted 57480 and commit 31ff8c6 on v1.11.0 and have a working 1.11 image on arm64: https://github.com/joejulian/kubernetes/tree/v1.11.0_undo_pr57480
Built with go1.11beta1, which contains several changes to math/big specifically to address performance issues on arm64. Since these functions are used for encryption, they have a direct effect on establishing TLS connections. That build allowed the TLS connections to be established within the timeout (10s), whereas they were not before. I'll try extending the timeouts and building with go1.10 again to see whether the problem can be worked around without using the beta compiler. @sebt3 Part of this may be that I was using RSA certificates instead of ECDSA. If you are too, you may be able to work around this problem by regenerating your certificates to use ECDSA.
The math/big functions are slow on arm64. An improvement is coming with go1.11, but in the meantime, if a server uses RSA certificates on arm64, the math load from the multitude of watches overtaxes the processor and the TLS connections time out. Retries will not succeed either, and only exacerbate the problem. By extending the timeout, the TLS connections will eventually succeed and the load will drop. Fixes kubernetes#64649
Automatic merge from submit-queue (batch tested with PRs 66341, 66405, 66403, 66264, 66447). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

extend timeout to workaround slow arm64 math

**What this PR does / why we need it**:
The math/big functions are slow on arm64. There is improvement coming with go1.11, but until that version can be used to build releases, if a server uses RSA certificates on arm64, the math load from the multitude of watches over-taxes the processor and the TLS connections time out. Retries will not succeed either, and only exacerbate the problem. By extending the timeout, the TLS connections will eventually succeed and the load will drop.

**Which issue(s) this PR fixes**:
Fixes #64649

**Special notes for your reviewer**:
This was tested on a Raspberry Pi 3.

**Release note**:
```release-note
Extend TLS timeouts to work around slow arm64 math/big
```
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
I upgraded kube-apiserver on a Raspberry Pi 3 cluster, and after it begins adding the HTTP listeners, CPU usage rises on all processors until every available cycle is consumed, and all attempts to connect to the API via HTTPS (6443) or HTTP (8080) time out.
Since I host etcd on the controller nodes, it also becomes unresponsive.
If I set GOMAXPROCS=1, it does limit the CPU usage and keeps etcd from timing out, but all attempts to connect to kube-apiserver still time out. Eventually the initial IP allocation check times out, causing a fatal error.
What you expected to happen:
It should idle at about 4%.
How to reproduce it (as minimally and precisely as possible):
On a Raspberry Pi 3 running Debian jessie (arm64), I run:
Anything else we need to know?:
Environment:
Kubernetes version (kubectl version): 1.10.3 arm64
Kernel (uname -a): Linux kubecon1 4.9.13-bee42-v8 #1 SMP PREEMPT Fri Mar 3 16:42:37 UTC 2017 aarch64 GNU/Linux