
Race condition in Kubelet service #79779

Closed

@PauloASilva

Description

@kubernetes/sig-api-machinery-bugs

What happened:

A race condition in the Kubelet causes the Kubernetes cluster to become unstable due to high CPU load when two nodes share the same hostname or --hostname-override value.

What you expected to happen:

Since the node hostname or --hostname-override value is used to compute the etcd key name (/registry/minions/[NODE-HOSTNAME]), the system should either:

  1. prevent nodes with a duplicate hostname or --hostname-override value from joining the cluster, or
  2. add a prefix/suffix to the etcd key name (e.g. /registry/minions/[NODE-HOSTNAME]-[SUFFIX]).
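
The key collision can be confirmed by listing the registry keys in etcd directly. A minimal sketch, assuming a kubeadm-provisioned stacked etcd with the default certificate paths; adjust the endpoint and paths for other setups:

    # On the master (server ID 0); paths assume a default kubeadm setup
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      get /registry/minions/ --prefix --keys-only

    # With two nodes joined under the same hostname, only one key comes back:
    # /registry/minions/k8s-master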

How to reproduce it (as minimally and precisely as possible):

  1. Spawn 3 new servers

  2. Set up Kubernetes on all 3 servers, according to the following table:

    Server ID   Hostname     Cluster Role
    0           k8s-master   Master
    1           k8s-node-1   Worker
    2           k8s-node-2   Worker
  3. Start all 3 servers and initialize the cluster

  4. Access the server with ID 1 and change its hostname to k8s-master

  5. Run kubeadm join with the appropriate flags and token (the exact commands are sketched after this list).

    You should see the following messages on stdout:

    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.
    
    Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
    
  6. Access the server with ID 0 and run $ kubectl get nodes. The output should contain a single entry.

  7. Access the server with ID 2 and change its hostname to k8s-master

  8. Repeat steps 5 and 6

  9. Keep monitoring the load average on the server with ID 0.
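
The hostname change and join in steps 4-5 (and again in steps 7-8) boil down to the commands below. This is a sketch with placeholders: <MASTER-IP>, <TOKEN> and <CA-HASH> stand in for the values printed by kubeadm init on the master.

    # On server ID 1 (later repeated on server ID 2); the placeholders are
    # not real values, substitute the output of 'kubeadm init'
    sudo hostnamectl set-hostname k8s-master
    sudo kubeadm join <MASTER-IP>:6443 \
      --token <TOKEN> \
      --discovery-token-ca-cert-hash sha256:<CA-HASH>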

Anything else we need to know?:

Running kubeadm join on nodes sharing the same hostname exits with a success exit code, but kubectl get nodes on the control-plane fails to list them (only one is reported). The same applies when changing the hostname of a node that already belongs to the cluster (e.g. sudo hostnamectl set-hostname DUPLICATE_HOSTNAME).

In this scenario, we can observe a severe CPU load increase on the k8s master node, caused by concurrent updates to the etcd key /registry/minions/[NODE-HOSTNAME], which in turn generate a steady stream of etcd events to be handled by the Kubernetes components.
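
The concurrent updates are also visible through the API: both kubelets keep writing to the same Node object, so its resourceVersion churns continuously. A minimal way to watch this from the master, assuming working kubectl access:

    # Stream writes to the shared Node object; each printed row is an update
    # from one of the two kubelets fighting over /registry/minions/k8s-master
    kubectl get node k8s-master --watch \
      -o custom-columns=NAME:.metadata.name,RV:.metadata.resourceVersion

    # In parallel, watch the load climb on the master
    watch -n 2 uptime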

Environment:

  • Kubernetes version (use kubectl version):

    Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
    
  • Cloud provider or hardware configuration:

    # lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                2
    On-line CPU(s) list:   0,1
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             2
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 6
    Model name:            QEMU Virtual CPU version 2.5+
    Stepping:              3
    CPU MHz:               2394.454
    BogoMIPS:              4788.90
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    L3 cache:              16384K
    NUMA node0 CPU(s):     0,1
    Flags:                 fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl xtopology eagerfpu pni cx16 x2apic hypervisor lahf_lm
    
    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:           990M        552M         68M         14M        369M        255M
    Swap:            0B          0B          0B
    
    # lshw -class network
    *-network
         description: Ethernet controller
         product: Virtio network device
         vendor: Red Hat, Inc.
         physical id: 3
         bus info: pci@0000:00:03.0
         version: 00
         width: 64 bits
         clock: 33MHz
         capabilities: msix bus_master cap_list rom
         configuration: driver=virtio-pci latency=0
         resources: irq:10 ioport:c0a0(size=32) memory:febd1000-febd1fff memory:fe000000-fe003fff memory:feb80000-febbffff
       *-virtio0
            description: Ethernet interface
            physical id: 0
            bus info: virtio@0
            logical name: eth0
            serial: 9e:a3:f5:c1:6e:36
            capabilities: ethernet physical
            configuration: broadcast=yes driver=virtio_net driverversion=1.0.0 ip=192.168.122.81 link=yes multicast=yes
    
  • OS (e.g: cat /etc/os-release):

    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"
    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"
    
  • Kernel (e.g. uname -a):

    Linux k8s-master 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    
  • Install tools:

    cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
           https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
    EOF
    
    yum install kubeadm --nogpgcheck -y && \
      systemctl restart kubelet && systemctl enable kubelet
    

Cheers,
Paulo A. Silva
