
Idle master constantly burns CPU/disk polling itself #75565

Closed
Foritus opened this issue Mar 21, 2019 · 12 comments

@Foritus commented Mar 21, 2019

What happened: After setting up a Kubernetes cluster using kubeadm (following this guide: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ ), the master constantly polls itself (observable via a -v=4 log from the kube-apiserver pod). This load averages 10-11 requests per second against the API server and, in turn, etcd, which places a lot of unnecessary stress on the CPU and I/O subsystems. See api-server.log for a log of the observed behaviour.
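(For anyone who wants to confirm that rate without raising the log verbosity, the API server's /metrics endpoint exposes cumulative request counters; the snippet below is only a sketch, and the exact metric name depends on the release: 1.13 still uses apiserver_request_count, later versions renamed it to apiserver_request_total.)

    # Sample the request counters twice, ~60s apart, and compare the deltas
    kubectl get --raw /metrics | grep -E '^apiserver_request_(count|total)' > /tmp/req1
    sleep 60
    kubectl get --raw /metrics | grep -E '^apiserver_request_(count|total)' > /tmp/req2
    diff /tmp/req1 /tmp/req2 | head -20   # counter growth over ~60s gives requests per second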

What you expected to happen:
For Kubernetes not to burn lots of system resources on pointless busywork. My goal as a user is to run my workloads, not just to run Kubernetes :)

How to reproduce it (as minimally and precisely as possible):
Install a small cluster by following this guide: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
Overview of steps to set up master:

  1. sudo apt update
  2. sudo apt install apt-transport-https
  3. curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
  4. echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
  5. sudo apt update
  6. sudo apt install kubeadm kubelet kubectl
  7. sudo kubeadm init --token-ttl=0
  8. kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
  9. Join some workers by following steps 1-6 and then running the join command output from step 7 on them.
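
For reference, the join command printed at the end of step 7 looks roughly like this (the address, token and hash here are placeholders, not values from a real cluster):

    kubeadm join <master-ip>:6443 --token <token> \
        --discovery-token-ca-cert-hash sha256:<hash>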

Let the cluster fully initialise (takes a minute or two for all the CNI stuff to initialise).

Observe the master has high CPU usage despite doing very little.

If you set the api-server log verbosity to 4 (edit /etc/kubernetes/manifests/kube-apiserver.yaml and add - --v=4 to the arguments list), you will see logs that look a lot like these: api-server.log
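
For concreteness, the edited manifest ends up looking roughly like this (only the first entries of the command list are shown; the existing flags are cluster-specific and omitted here). The kubelet watches the static manifest directory, so saving the file restarts the apiserver pod automatically.

    spec:
      containers:
      - command:
        - kube-apiserver
        - --advertise-address=192.168.0.10   # existing, cluster-specific flags...
        - --v=4                              # added: raise log verbosity to 4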

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: On-prem (at home). 3 virtual machines (1 master, 2 workers) and 3 Raspberry Pis (all workers).

  • OS (e.g: cat /etc/os-release): Ubuntu 18.04.2 LTS (Bionic Beaver)

  • Kernel (e.g. uname -a): Linux kube-master-1 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools: apt/dpkg, kubeadm

@Foritus (Author) commented Mar 21, 2019

/sig scalability

@neolit123 (Member) commented Mar 23, 2019

> the master constantly polls itself (observable via a -v=4 log from the kube-apiserver pod).

/sig api-machinery

@caesarxuchao (Member) commented Mar 25, 2019

/assign @lavalamp

@lavalamp (Member) commented Mar 25, 2019

Thanks for the example log. How big is the machine you're running this on? How much CPU was apiserver using? RAM?

apiserver / controllers don't use disk for things other than logs, so setting -v=4 may actually be the cause of that.

The traffic in the log file appears to include a number of controllers doing dynamic discovery. It's not clear to me if that's more often than expected or not. @yliaog might know.

@Foritus (Author) commented Mar 25, 2019

The disk load appears to be entirely caused by etcd, presumably a second-order effect from the API server chatting to it a lot? Docker stats output: [screenshot not reproduced]
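
(For anyone reproducing this, per-container figures like the ones in that screenshot can be pulled with something along these lines; the name filter assumes the Docker container names kubeadm produces and may need adjusting:)

    sudo docker stats --no-stream $(sudo docker ps --format '{{.Names}}' | grep -E 'etcd|kube-apiserver')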

The master runs in a VM with 2 cores and 4 GB of RAM. Kubernetes is using about 700 MB of that; the rest is free/disk cache. The host box is a 16-core Xeon with 64 GB of RAM. The underlying storage is a pair of NVMe SSDs running ZFS in RAID 1.

@lavalamp (Member) commented Mar 25, 2019

The only thing there that seems potentially unusual is the amount of I/O that etcd has done, but if this system has been running for a long enough time, then that's to be expected. E.g., node status updates and events cause churn, more or less by design.

The informers I mentioned are all hitting endpoints that don't talk to etcd (just reading discovery apis).
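
(Those discovery endpoints can be poked at directly if anyone is curious what the controllers keep re-fetching, e.g.:)

    kubectl get --raw /apis | head -c 400          # list of API groups
    kubectl get --raw /apis/apps/v1 | head -c 400  # resources within one group/version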

If you want to figure out what exactly is in your etcd, take a look at: https://github.com/jpbetz/auger
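
(A lighter-weight option, if you only want to see which keys exist rather than decode their values, is etcdctl v3 against the local etcd. This is a sketch that assumes etcdctl is installed on the master and uses the default kubeadm endpoint and certificate paths, so adjust as needed:)

    sudo ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      get / --prefix --keys-only \
      | grep '^/registry' | cut -d/ -f1-3 | sort | uniq -c | sort -rn | head   # key count per resource type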

Kubernetes hasn't really been optimized for the "scale-down case". So, yeah, the control plane overhead looks bad as a percentage of the cluster if we're talking about < 5 node clusters. You could maybe scale it down to half that CPU/RAM if you just have 2 nodes; at that point you'd be at or near the minimum system needed to run the control plane.

@Foritus (Author) commented Mar 26, 2019

Closing as "hungry by design" then :)

@Foritus closed this Mar 26, 2019

@lavalamp (Member) commented Mar 26, 2019

At least for the moment. Thanks for understanding :)

(I'm definitely open to changes which reduce resource usage in small clusters, btw--it's just not been a priority yet.)

@wuestkamp commented Apr 5, 2019

There are quite a few issues about high CPU usage when using Kubernetes for local development:

docker/for-mac#2601
docker/for-mac#3065

Could these have something to do with this ticket?

@ranman commented Jun 19, 2019

As a follow-up to @wuestkamp's comments: the high CPU usage and constant work while running no containers make local development very difficult and destroy battery life on laptops.

@HarryWeppner commented Jun 19, 2019

This is the major complaint from our users as well, and it is hindering K8s adoption in a significant way.

@planetf1 commented Jul 10, 2019

++ for the issue above. I'm fortunate enough to have easy access to a cloud environment, but for getting people on board with the idea of using k8s, or using k8s to showcase what I work on, it's a real inhibitor.
