Upgrade to Rancher 2.5.2 and 2.4.10: cattle-cluster-agent hammers apiserver, very high CPU usage #30048
Comments
I'm not sure if it's the same problem, but since the update from 2.5.1 to 2.5.2 I've also had problems with high CPU usage. I have a Rancher cluster based on VMs (Hyper-V) with 3 nodes and a bare-metal cluster with 3 nodes created with Rancher.
I rolled back Rancher to 2.4.8 and the problem is gone.
How did you roll back Rancher?
As I'm using a Docker install, I basically followed this doc: https://rancher.com/docs/rancher/v2.x/en/installation/other-installation-methods/single-node-docker/single-node-rollbacks/
@horihel @lucky4ever2 Do you have Project Network Isolation enabled on these clusters? And do you see the logs reported in this issue?
Yes, I have Network Isolation enabled. I don't have any logs because I wanted to downgrade again quickly.
@lucky4ever2 Okay, it's a known issue with 2.5.2, and we're tracking it with this issue. The workaround is to turn off Project Network Isolation. But I see that you have already successfully rolled back to 2.4.x, so you don't need to turn off network isolation on 2.4.x.
I can confirm that network isolation was enabled on both clusters on my side too. As I have already rolled back, I cannot immediately confirm whether disabling it solves the problem. I'll try to set up a new test environment.
Closing in favor of #30045.
Is it possible to re-open this issue? Cluster operation still seems to be working, although slower than usual. Please advise, otherwise I need to downgrade again.
Tried the upgrade to 2.5.3 again today, this time turning PNI off before upgrading. Still misbehaving. Downgraded and upgraded to 2.4.11 instead; this time the behaviour of cattle-cluster-agent seems to be normal. So 2.4.11 seems to be fine, while 2.5.3 (stable) still seems to be misbehaving.
I also have high CPU usage on 2.5.3 if I try to go to the System project. The server has less than 10% CPU usage while idle; if I try to access the System project, CPU goes to 300% and the request fails after a timeout.
Please reopen, there are still issues with 2.5.3.
What kind of request is this (question/bug/enhancement/feature request):
possible bug
Steps to reproduce (least amount of steps as possible):
upgrade existing rancher from 2.4.8 to 2.5.2
Result:
![grafik](https://user-images.githubusercontent.com/6735079/98927075-5dca3180-24d8-11eb-96ac-db854a2b179b.png)
cattle-agent CPU utilization skyrockets.
The upgrade took place at 08:00; since then cattle-agent has been keeping the CPU busy. It also seems to affect kube-apiserver (also taking >100% CPU).
I have this on 2 clusters, one using vsphere and one custom bare metal provisioned by rancher.
Requests to kube-apiserver also went up at the same time
![grafik](https://user-images.githubusercontent.com/6735079/98927809-4ccdf000-24d9-11eb-9010-579277571834.png)
Judging by further Prometheus metrics, it seems to write a lot to "apps" resources:
![grafik](https://user-images.githubusercontent.com/6735079/98928619-44c28000-24da-11eb-868e-462cb2826ec7.png)
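For anyone trying to reproduce the graphs above, here is a minimal sketch of how such a query could be built. It assumes the panels are based on the standard kube-apiserver counter `apiserver_request_total` with its `group`, `resource` and `verb` labels; the thread does not say which metric the screenshots actually use, and the helper name `request_rate_query` is made up for illustration.

```python
def request_rate_query(window="5m", group=None, write_only=False):
    """Build a PromQL expression for the kube-apiserver request rate,
    broken down by API group, resource and verb.

    Assumption: the cluster exposes the apiserver_request_total counter.
    """
    matchers = []
    if group is not None:
        # Restrict to one API group, e.g. "apps".
        matchers.append(f'group="{group}"')
    if write_only:
        # Keep only mutating verbs to see what is being written.
        matchers.append('verb=~"POST|PUT|PATCH|DELETE"')
    selector = "{" + ",".join(matchers) + "}" if matchers else ""
    return (
        "sum by (group, resource, verb) "
        f"(rate(apiserver_request_total{selector}[{window}]))"
    )


if __name__ == "__main__":
    # Paste the printed expression into the Prometheus or Grafana query box.
    print(request_rate_query(group="apps", write_only=True))
```

Comparing the result for `group="apps"` before and after the upgrade time (08:00 here) should show whether the extra load really is writes to "apps" resources.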
Other details that may be helpful:
Environment information
Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): v2.5.2
Installation option: single install
Cluster information
2 Clusters show the problem: one rancher provisioned on Vsphere and one Custom/Bare Metal, also rancher provisioned.
Vsphere-Cluster:
VMs (3 mgmt nodes, 4 CPU, 16GB RAM each, 6 workers, 4 CPUs, 16GB RAM each), based on RancherOS
Custom Cluster:
Metal, 5 mixed nodes, (4-8 CPUs, 8-64GB memory).
Kubernetes version (use kubectl version): 1.18.10
Docker version (use docker version):