Description
@kubernetes/sig-api-machinery-bugs
What happened:
A race condition in the kubelet causes the Kubernetes cluster to become unstable due to high CPU load when two nodes share the same hostname or `--hostname-override` value.
What you expected to happen:
Since the node hostname or `--hostname-override` value is used to compute the etcd key name (`/registry/minions/[NODE-HOSTNAME]`), the system should either:
- prevent nodes with a duplicate hostname or `--hostname-override` value from joining the cluster, or
- add a prefix/suffix to the etcd key name (e.g. `/registry/minions/[NODE-HOSTNAME]-[SUFFIX]`)
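The collision and the suggested suffix scheme can be sketched as follows. This is only an illustration, not current kubelet behavior; the suffix source (a per-node machine ID) and its value are assumptions:

```shell
# Today (simplified): the etcd key is derived from the hostname alone,
# so two nodes that both report "k8s-master" map to the same key.
hostname="k8s-master"
echo "/registry/minions/${hostname}"

# Proposed (hypothetical): append a short per-node unique suffix, e.g.
# derived from a machine ID, so duplicate hostnames still yield distinct keys.
machine_id="9ea3f5c16e36"                      # assumed example value
suffix="$(printf '%s' "${machine_id}" | cut -c1-6)"
echo "/registry/minions/${hostname}-${suffix}"
```

With such a scheme, two nodes named `k8s-master` would write to two different keys instead of fighting over one.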
How to reproduce it (as minimally and precisely as possible):
1. Spawn 3 new servers.
2. Set up Kubernetes on all 3 servers, according to the following table:

   | Server ID | Hostname   | Cluster Role |
   |-----------|------------|--------------|
   | 0         | k8s-master | Master       |
   | 1         | k8s-node-1 | Worker       |
   | 2         | k8s-node-2 | Worker       |

3. Start all 3 servers and initialize the cluster.
4. Access the server w/ ID 1 and change its hostname to `k8s-master`.
5. Run `kubeadm join` with the appropriate flags and tokens. You should see the following messages on stdout:

   ```
   This node has joined the cluster:
   * Certificate signing request was sent to apiserver and a response was received.
   * The Kubelet was informed of the new secure connection details.

   Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
   ```

6. Access the server w/ ID 0 and run `kubectl get nodes`. The output should have a single entry.
7. Access the server w/ ID 2 and change its hostname to `k8s-master`.
8. Repeat steps 5 and 6.
9. Keep monitoring the load average on server w/ ID 0.
Anything else we need to know?:
Running `kubeadm join` on nodes sharing the same hostname exits with a success exit code, but `kubectl get nodes` on the control plane fails to list them all (only one is reported). The same happens when changing the hostname of a node that already belongs to the cluster (e.g. `sudo hostnamectl set-hostname DUPLICATE_HOSTNAME`).
In this scenario, we can observe a severe CPU load increase on the k8s master node, caused by concurrent updates to the etcd key `/registry/minions/[NODE-HOSTNAME]`, which in turn generate several etcd events to be handled by k8s components.
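Until the join is guarded upstream, a duplicate-hostname pre-flight check along these lines could catch the problem before it happens. This is a sketch: in a live cluster the hostname list would come from `kubectl get nodes -o custom-columns=NAME:.metadata.name --no-headers` plus the joining node's own `hostname`; here it is hard-coded to mirror the reproduction above:

```shell
# Sketch: detect duplicate node hostnames before running `kubeadm join`.
# Hard-coded list stands in for the names of existing cluster nodes plus
# the hostname of the node about to join (assumption for illustration).
hostnames="k8s-master
k8s-node-1
k8s-master"

# `uniq -d` prints only the names that appear more than once.
dupes="$(printf '%s\n' "${hostnames}" | sort | uniq -d)"
if [ -n "${dupes}" ]; then
  echo "duplicate hostname(s) detected: ${dupes}"
fi
```

Anything reported by this check would collide on the same `/registry/minions/...` key.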
Environment:
- Kubernetes version (use `kubectl version`):

  ```
  Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  ```
- Cloud provider or hardware configuration:

  ```
  # lscpu
  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
  CPU(s):                2
  On-line CPU(s) list:   0,1
  Thread(s) per core:    1
  Core(s) per socket:    1
  Socket(s):             2
  NUMA node(s):          1
  Vendor ID:             GenuineIntel
  CPU family:            6
  Model:                 6
  Model name:            QEMU Virtual CPU version 2.5+
  Stepping:              3
  CPU MHz:               2394.454
  BogoMIPS:              4788.90
  Hypervisor vendor:     KVM
  Virtualization type:   full
  L1d cache:             32K
  L1i cache:             32K
  L2 cache:              4096K
  L3 cache:              16384K
  NUMA node0 CPU(s):     0,1
  Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
  ```

  ```
  # free -h
                total        used        free      shared  buff/cache   available
  Mem:           990M        552M         68M         14M        369M        255M
  Swap:            0B          0B          0B
  ```

  ```
  # lshw -class network
    *-network
         description: Ethernet controller
         product: Virtio network device
         vendor: Red Hat, Inc.
         physical id: 3
         bus info: pci@0000:00:03.0
         version: 00
         width: 64 bits
         clock: 33MHz
         capabilities: msix bus_master cap_list rom
         configuration: driver=virtio-pci latency=0
         resources: irq:10 ioport:c0a0(size=32) memory:febd1000-febd1fff memory:fe000000-fe003fff memory:feb80000-febbffff
       *-virtio0
            description: Ethernet interface
            physical id: 0
            bus info: virtio@0
            logical name: eth0
            serial: 9e:a3:f5:c1:6e:36
            capabilities: ethernet physical
            configuration: broadcast=yes driver=virtio_net driverversion=1.0.0 ip=192.168.122.81 link=yes multicast=yes
  ```
- OS (e.g. `cat /etc/os-release`):

  ```
  NAME="CentOS Linux"
  VERSION="7 (Core)"
  ID="centos"
  ID_LIKE="rhel fedora"
  VERSION_ID="7"
  PRETTY_NAME="CentOS Linux 7 (Core)"
  ANSI_COLOR="0;31"
  CPE_NAME="cpe:/o:centos:centos:7"
  HOME_URL="https://www.centos.org/"
  BUG_REPORT_URL="https://bugs.centos.org/"
  CENTOS_MANTISBT_PROJECT="CentOS-7"
  CENTOS_MANTISBT_PROJECT_VERSION="7"
  REDHAT_SUPPORT_PRODUCT="centos"
  REDHAT_SUPPORT_PRODUCT_VERSION="7"
  ```
- Kernel (e.g. `uname -a`):

  ```
  Linux k8s-master 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  ```
- Install tools:

  ```shell
  cat <<EOF > /etc/yum.repos.d/kubernetes.repo
  [kubernetes]
  name=Kubernetes
  baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
  enabled=1
  gpgcheck=1
  repo_gpgcheck=1
  gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
  EOF
  yum install kubeadm --nogpgcheck -y && \
    systemctl restart kubelet && systemctl enable kubelet
  ```
Cheers,
Paulo A. Silva