Add memory requests for nodes and controlplane workloads #49

invidian · 2020-02-25T17:47:04Z

This commit adds memory requests for all nodes and controlplane
workloads. The goal is to give users a more accurate view of the
resources available on both worker and controller nodes, e.g. when
running 'kubectl describe node'. This matters when scaling the
controlplane deployments and may prevent node eviction.

The measurement was done on a freshly created cluster with
prometheus-operator and metrics-server deployed, on a controller node
and on a worker node, so the numbers might be lower than on a
long-running cluster, but they give at least some initial visibility.

The values were measured using 'systemd-cgtop -m -1 / --depth=1' rather
than 'free', as 'systemd-cgtop' also includes page cache usage, the
same way 'kubelet' measures memory usage.

Before the measurement, the following command was executed to make sure
that only active memory was captured:
'sync; echo 1 | sudo tee /proc/sys/vm/drop_caches; sleep 10'

system.slice uses ~250Mi and init.scope uses ~200Mi, which sums up to
roughly 500Mi needed for the system.

Kubelet in the /docker slice was using ~100Mi and etcd in the /docker
slice was using ~200Mi, so workers have 100Mi reserved for 'kube' and
controllers have 300Mi.
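
The reservation arithmetic above can be sketched roughly as follows. This is only an illustration: the `round_up` helper and the 100Mi rounding step are assumptions made here, not something the commit specifies, and the inputs are the approximate measurements quoted above.

```python
# Approximate per-cgroup memory usage in Mi, as quoted in the commit message.
MEASURED_MI = {
    "system.slice": 250,
    "init.scope": 200,
    "kubelet": 100,
    "etcd": 200,
}


def round_up(value_mi, step_mi=100):
    """Round value_mi up to the next multiple of step_mi (assumed step)."""
    return -(-value_mi // step_mi) * step_mi


# Memory needed for the system: system.slice + init.scope, rounded up.
system_mi = round_up(MEASURED_MI["system.slice"] + MEASURED_MI["init.scope"])

# Workers only run the kubelet; controllers additionally run etcd.
worker_kube_mi = round_up(MEASURED_MI["kubelet"])
controller_kube_mi = round_up(MEASURED_MI["kubelet"] + MEASURED_MI["etcd"])

print(f"system: {system_mi}Mi")                      # system: 500Mi
print(f"kube (worker): {worker_kube_mi}Mi")          # kube (worker): 100Mi
print(f"kube (controller): {controller_kube_mi}Mi")  # kube (controller): 300Mi
```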

Memory usage for self-hosted components was measured using the
following command: 'kubectl top pods --sort-by=memory | sort -h -k3 -r'.

The measured values were then rounded up a bit.
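
A rough sketch of that rounding step, for illustration only. The sample text follows the column layout printed by 'kubectl top pods', but the pod names, the numbers, and the 50Mi rounding step are all hypothetical; the commit does not say which step was used.

```python
from typing import Dict

# Hypothetical sample in the column layout of 'kubectl top pods';
# pod names and values are made up for illustration.
SAMPLE = """\
NAME                 CPU(cores)   MEMORY(bytes)
example-pod-a        35m          287Mi
example-pod-b        4m           43Mi
example-pod-c        2m           18Mi
"""


def suggest_requests(top_output: str, step_mi: int = 50) -> Dict[str, str]:
    """Map each pod name to its memory usage rounded up to a multiple of step_mi."""
    requests = {}
    for line in top_output.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        name, _cpu, memory = line.split()
        if not memory.endswith("Mi"):
            raise ValueError(f"unexpected memory unit: {memory}")
        used_mi = int(memory[:-2])
        rounded = -(-used_mi // step_mi) * step_mi  # ceiling to the step
        requests[name] = f"{rounded}Mi"
    return requests


for pod, request in suggest_requests(SAMPLE).items():
    print(pod, request)
```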

Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>

assets/lokomotive-kubernetes/bootkube/resources/charts/kubernetes/templates/kube-scheduler.yaml

rata

I think adding these fields makes sense, but I'm not sure what the reasoning was for picking these values. Can you please share the reasoning (and add it to the commit)?

I can't really review if the numbers make sense without knowing how they were calculated :)

rata · 2020-03-19T18:33:34Z

assets/lokomotive-kubernetes/aws/flatcar-linux/kubernetes/cl/controller.yaml.tmpl

@@ -95,7 +95,8 @@ systemd:
          --pod-manifest-path=/etc/kubernetes/manifests \
          --read-only-port=0 \
          --register-with-taints=$${NODE_TAINTS} \
-          --volume-plugin-dir=/var/lib/kubelet/volumeplugins
+          --volume-plugin-dir=/var/lib/kubelet/volumeplugins \
+          --kube-reserved=memory=500Mi


Where does this value come from?

The values are a snapshot of 'kubectl top pods --sort-by=memory | sort -h -k3 -r' after deploying a cluster from the example configuration. They should perhaps be configurable, as they will change depending on the cluster size.

Those values are minimal, just to have something in place.

Should I add such a comment to the commit message?

Yes, please do.

I'd like to get some values from production clusters maybe too... But we can update later, maybe?

Wow, kubelet+docker was using 500mb?

Note that ssh, etc. should go under --system-reserved, according to the docs.

Ah right. I'll use system-reserved then.

Done. PTAL.
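
For context on what these flags change: the kubelet reports allocatable memory as node capacity minus `--kube-reserved`, `--system-reserved`, and the hard eviction threshold. Below is a minimal sketch of that arithmetic. The 4096Mi node size is hypothetical, the 100Mi threshold assumes the kubelet's default `memory.available<100Mi`, and the reservation values are the ones quoted in the commit message, which may differ from what was finally merged.

```python
def allocatable_mi(capacity_mi, system_reserved_mi, kube_reserved_mi,
                   eviction_hard_mi=100):
    """Allocatable = capacity - system-reserved - kube-reserved - eviction threshold."""
    return capacity_mi - system_reserved_mi - kube_reserved_mi - eviction_hard_mi


def kubelet_flags(system_reserved_mi, kube_reserved_mi):
    """Build the two kubelet reservation flags for the given values."""
    return [
        f"--system-reserved=memory={system_reserved_mi}Mi",
        f"--kube-reserved=memory={kube_reserved_mi}Mi",
    ]


CAPACITY_MI = 4096  # hypothetical node size, not taken from the PR

# Values from the commit message: 500Mi for the system,
# 100Mi 'kube' on workers and 300Mi on controllers.
worker = allocatable_mi(CAPACITY_MI, 500, 100)
controller = allocatable_mi(CAPACITY_MI, 500, 300)

print(" ".join(kubelet_flags(500, 100)), "->", f"{worker}Mi allocatable")
print(" ".join(kubelet_flags(500, 300)), "->", f"{controller}Mi allocatable")
```

With these inputs, a worker would report 3396Mi allocatable and a controller 3196Mi, which is the number that 'kubectl describe node' shows under Allocatable.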

assets/lokomotive-kubernetes/bootkube/resources/charts/calico/templates/daemonset.yaml

invidian · 2020-03-27T14:34:22Z

I think adding these fields makes sense, but I'm not sure what the reasoning was for picking these values. Can you please share the reasoning (and add it to the commit)?

I can't really review if the numbers make sense without knowing how they were calculated :)

Right, sorry for not providing that straight away. I've now added the measurement methods etc. to the commit message. Please take a look.

vbatts · 2020-08-06T20:16:13Z

This seems like something that would be done inside slices of cgroups on the host, to ensure the system can still operate. I guess it seems arbitrary and inflexible to declare numbers like this. Right?

invidian force-pushed the invidian/memory-requests branch 6 times, most recently from c24096b to 7133e9e on March 6, 2020 12:46

invidian requested review from iaguis, ipochi, rata and surajssd March 6, 2020 13:20

surajssd reviewed Mar 6, 2020

assets/lokomotive-kubernetes/bootkube/resources/charts/kubernetes/templates/kube-scheduler.yaml

invidian requested a review from surajssd March 10, 2020 09:50

rata reviewed Mar 19, 2020

invidian force-pushed the invidian/memory-requests branch from 7133e9e to 1357be0 on March 27, 2020 09:45

invidian force-pushed the invidian/memory-requests branch 2 times, most recently from aefa784 to ac21d82 on March 30, 2020 09:52

invidian requested a review from rata March 30, 2020 09:56

invidian force-pushed the invidian/memory-requests branch from ac21d82 to 5bb18fa on March 30, 2020 10:44

invidian added 2 commits March 30, 2020 13:37

packet worker: move getting BGP_PEER_ADDRESS inline

f907568

To avoid duplicating the template logic and to make the configuration more readable, as having quote (") right before the template logic is very confusing. Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>

invidian force-pushed the invidian/memory-requests branch from 5bb18fa to f907568 on March 30, 2020 11:38

invidian mentioned this pull request Apr 22, 2020

Resource requests (and possibly limits) for components #311

Open

surajssd mentioned this pull request Jan 20, 2021

Allow setting CPU pinning per worker pool #1337

Closed
