
Setting control-plane-resource-requests in config.yaml has no effect #4323

Closed
hostalp opened this issue Jun 7, 2023 · 15 comments

hostalp commented Jun 7, 2023

Environmental Info:
rke2 version v1.25.9+rke2r1 (842d05e)
go version go1.19.8 X:boringcrypto

Node(s) CPU architecture, OS, and Version:
Rocky Linux release 9.1 (Blue Onyx)
Linux kernel 5.14.0-162.23.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Apr 11 19:09:37 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
2 CPUs

Cluster Configuration:
1 server

Describe the bug:
I'm trying to set control-plane-resource-requests in /etc/rancher/rke2/config.yaml as described at https://docs.rke2.io/advanced#control-plane-component-resource-requestslimits
Basic example I tried:

control-plane-resource-requests:
  - kube-apiserver-cpu=500m

However, even after restarting the whole host, the output from kubectl describe nodes still shows that the default value (250m in this case) is used.

I suppose that these settings may take effect only when the node is being created. Is that correct? How can these settings be changed later, then? Do I have to find and modify the data directly in the etcd database?

brandond commented Jun 7, 2023

Please check the static pod manifest on the node; the kubelet has a bug that can cause the mirror pods (visible from kubectl get pod) to be out of sync with the actual static pod.
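
For example, something like this compares what's on disk with what the apiserver reports (default data directory assumed; the node name is a placeholder):

grep -A3 'requests:' /var/lib/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml
kubectl get pod -n kube-system kube-apiserver-<node-name> -o jsonpath='{.spec.containers[0].resources.requests}'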

hostalp commented Jun 7, 2023

If you mean the files in /var/lib/rancher/rke2/agent/pod-manifests/, then those do get updated. However, kubectl (kubectl describe nodes or kubectl get pod -n kube-system <pod-name> -o yaml) still shows the default values. So even though everything looks incorrect via kubectl, the pods appear to actually run with the new values.

This is quite strange, however, and it affects pod scheduling. I'm now at 112% allocation of CPU requests and can't start anything that specifies CPU requests; even 100m blocks the pod from starting. If the reported values were correct, there wouldn't be such an issue.

I'm basically trying to decrease most of these CPU request values in order to get it running on one small and resource-constrained host, which would be used for some lightweight workload.

By the way, it looks like the cloud-controller-manager CPU request value can't be set below 100m - even if it's set to a lower value, it still shows 100m in the pod manifest file.

brandond commented Jun 7, 2023

> I'm basically trying to decrease most of these CPU request values in order to get it running on one small and resource-constrained host, which would be used for some lightweight workload.

You'd be better off running K3s.

hostalp commented Jun 7, 2023

I tested that and the overall difference in the resource consumption isn't actually that big.
This is supposed to serve as a small environment to test things (including the whole environment setup) for later reuse. It will also grow bigger, so it's only the starting point.

Still, I believe the resource settings should work properly.

@GwynHannay

Hi, I have the same issue here. I can see that the values from my config.yaml have come through to the RKE2 server (as seen in the "default" values for the --control-plane-resource-requests flag in the first screenshot), but the pods don't change their requested resources (second screenshot).

Is there any consideration given to this bug, or is this unlikely to be fixed?

[Screenshot 1: 2023-07-16 at 04 12 42]

[Screenshot 2: 2023-07-16 at 04 16 43]

brandond commented Jul 15, 2023

There is a bug in the upstream Kubernetes code that causes the kubelet not to update the mirror pods on the apiserver, which is what you see when you use kubectl to view the static pods. The actual static pod definitions, including the resources used by the pods, have been updated as per #4323 (comment)

@GwynHannay

Ah okay, sorry, I misunderstood. So is there no way to tell the scheduler the correct amount of resources reserved? I'm using Rancher for the front end, and I was trying to lower the amount of CPU reserved on this node to reflect actual usage.

[Screenshot: 2023-07-16 at 07 54 13]

@brandond

You might try running the rke2-killall.sh script to force the pods to be recreated? I believe that should resync the mirror pods. Or just reboot the node.
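
For example (the script path assumes a default tarball/script install; adjust if your install method differs):

sudo /usr/local/bin/rke2-killall.sh   # stops rke2 and kills all containers on this node
sudo systemctl start rke2-server      # brings the server back up and recreates the static pods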

hostalp commented Jul 17, 2023

Unfortunately, none of that helped in our case; it's still incorrect.

brandond commented Jul 17, 2023

Hmm. Doing this should delete the mirror pods from the apiserver and allow the kubelet to re-sync them; give it a try?

kubectl delete pod -n kube-system -l tier=control-plane

If that works, it's a good workaround, but I don't like the fact that the mirror pods get out of sync. We statically generate the pod UIDs for reasons, but that does trigger the upstream issue with the mirror pods getting out of sync with the actual static pod config.
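
To confirm the mirror pods re-synced afterwards, something like this should show the values from config.yaml for each control-plane pod:

kubectl get pod -n kube-system -l tier=control-plane -o custom-columns='NAME:.metadata.name,CPU:.spec.containers[0].resources.requests.cpu,MEMORY:.spec.containers[0].resources.requests.memory'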

brandond self-assigned this Jul 17, 2023
brandond added this to the v1.27.5+rke2r1 milestone Jul 17, 2023

@brandond

Please let me know if that workaround works. I've put this in next-up as a reminder to myself to see if there's a better way to allow the pods to reconcile while the apiserver is down without getting the mirror pods out of sync.

@GwynHannay

> You might try running the rke2-killall.sh script to force the pods to be recreated? I believe that should resync the mirror pods. Or just reboot the node.

I rebooted the node and ran into some issues with things not coming back up properly, ran the rke2-killall.sh script "to be safe", and then spent the rest of the day trying to fix everything, haha.

Today I just backed up all my data, deleted my VM, and started from scratch because the cloud-controller-manager and kube-controller-manager were stuck in an endless container-creation loop (due to race conditions, apparently) and I didn't know how to resolve it otherwise.

I created my config.yaml before installing RKE2 though, and have successfully changed the CPU requests for the control plane components.

@brandond

Closing this out in favor of tracking this in #3725

@brandond

Actually, let's track them both, since the steps to reproduce are different (upgrades vs changing resources).

est-suse commented Aug 31, 2023

Validated on release-1.27 branch with version v1.27.5-rc2+rke2r1

cat /etc/os-release | grep PRETTY
PRETTY_NAME="Ubuntu 22.04.2 LTS"

Cluster Configuration:

1 server

Config.yaml:

cat config.yaml
token: xxx
control-plane-resource-requests:
  - kube-apiserver-cpu=300m
  - kube-apiserver-memory=512M
  - kube-scheduler-cpu=300m
  - kube-scheduler-memory=512M
  - etcd-cpu=200m
disable-apiserver: true
disable-controller-manager: true
disable-scheduler: true
write-kubeconfig-mode: "0644"
cni: canal
node-external-ip: 1.1.2.3

For the initial install, the control plane values are kept at the default values.

Testing Steps:

Copy config.yaml example provided above ^^

$ sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2

Install RKE2
To reproduce the issue, run the following (change server/agent per the node being installed):

curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_VERSION=1.27.5-rc2+rke2r1 INSTALL_RKE2_TYPE='server' INSTALL_RKE2_METHOD=tar sh -
sudo systemctl enable --now rke2-server
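
Optionally, check the requests that were rendered into the static pod manifests on disk, e.g. (default data directory assumed):

sudo grep -A3 'requests:' /var/lib/rancher/rke2/agent/pod-manifests/etcd.yaml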

Validation Results:
rke2 version used for validation:

rke2 version v1.27.5-rc2+rke2r1 (eea2dfbd1f4936316a6af67f7d232813fd3e7db7)
go version go1.20.7 X:boringcrypto

Edit config.yaml and restart the rke2 server/agent services (restart command shown after the example below), then verify the values for the different pods:

control-plane-resource-requests:
  - kube-apiserver-cpu=400m
  - kube-apiserver-memory=400M
  - kube-scheduler-cpu=400m
  - kube-scheduler-memory=400M
  - etcd-cpu=300m
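
After editing, restart the service; on a server node that would be, for example:

sudo systemctl restart rke2-server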

Verify that the value changes are reflected in the corresponding pods:

kubectl describe pod kube-scheduler-ip-x-x-4-x -n kube-system | grep -e "Name:" -e memory -e Requests -e cpu -e Namespace -e kubernetes
Name:                 kube-scheduler-ip-x-x-4-x
Namespace:            kube-system
Priority Class Name:  system-cluster-critical
Annotations:          kubernetes.io/config.hash: 9358e203d9c52a914de8dd0eb129aaf1
                      kubernetes.io/config.mirror: 9358e203d9c52a914de8dd0eb129aaf1
                      kubernetes.io/config.seen: 2023-08-31T01:07:40.148649069Z
                      kubernetes.io/config.source: file
    Image:         index.docker.io/rancher/hardened-kubernetes:v1.27.5-rke2r1-build20230824
    Image ID:      docker.io/rancher/hardened-kubernetes@sha256:fa3a842b35c71e82e290a2f922a409f1e252e56da8190422e9401fdb88a58cad
    Requests:
      cpu:     400m
      memory:  400M
  Normal  Pulled   5m18s  kubelet  Container image "index.docker.io/rancher/hardened-kubernetes:v1.27.5-rke2r1-build20230824" already present on machine


$ kubectl describe pod kube-apiserver-ip-x-x-x -n kube-system | grep -e "Name:" -e memory -e Requests -e cpu -e Namespace -e kubernetes
Name:                 kube-apiserver-ip-x-x-4-x
Namespace:            kube-system
Priority Class Name:  system-cluster-critical
Annotations:          kubernetes.io/config.hash: 44704c8fbead7f623eca95e406959c86
                      kubernetes.io/config.mirror: 44704c8fbead7f623eca95e406959c86
                      kubernetes.io/config.seen: 2023-08-31T01:07:35.144282879Z
                      kubernetes.io/config.source: file
    Image:         index.docker.io/rancher/hardened-kubernetes:v1.27.5-rke2r1-build20230824
    Image ID:      docker.io/rancher/hardened-kubernetes@sha256:fa3a842b35c71e82e290a2f922a409f1e252e56da8190422e9401fdb88a58cad
      --api-audiences=https://kubernetes.default.svc.cluster.local,rke2
      --service-account-issuer=https://kubernetes.default.svc.cluster.local
    Requests:
      cpu:      400m
      memory:   400M
  Normal   Pulled     5m27s  kubelet  Container image "index.docker.io/rancher/hardened-kubernetes:v1.27.5-rke2r1-build20230824" already present on machine


kubectl describe pod etcd-ip-x-x-4-x -n kube-system | grep -e "Name:" -e memory -e Requests -e cpu -e Namespace -e kubernetes
Name:                 etcd-ip-x-x-4-x
Namespace:            kube-system
Priority Class Name:  system-cluster-critical
                      kubernetes.io/config.hash: d70c34b267c78b4032851257da63e839
                      kubernetes.io/config.mirror: d70c34b267c78b4032851257da63e839
                      kubernetes.io/config.seen: 2023-08-31T01:11:46.654639101Z
                      kubernetes.io/config.source: file

    Requests:
      cpu:     300m
      memory:  512Mi
