
Constant kube-system pod restart due to SandboxChanged under Ubuntu Jammy #110177

Closed
@Congelli501

Description


What happened?

When installing a master node under Ubuntu Jammy, the base kube-system pods (etcd, kube-apiserver, kube-proxy...) restart every few minutes.
The etcd pod restarts because of a SandboxChanged event:

  Normal   SandboxChanged  57s                    kubelet  Pod sandbox changed, it will be killed and re-created.

The other pods restart because of the SandboxChanged event and then fail because etcd / the API server is unreachable (as a consequence).

I haven't found the origin of the SandboxChanged event.
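
A generic way to look for the trigger, via the API server and the kubelet journal (standard kubectl / journalctl invocations, not specific to this setup):

```
# List only the SandboxChanged events in the kube-system namespace
kubectl get events -n kube-system --field-selector reason=SandboxChanged

# Search the kubelet logs for the sandbox re-creation decision
journalctl -u kubelet --no-pager | grep -i sandbox
```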

What did you expect to happen?

The kube-system pods shouldn't constantly restart because of SandboxChanged under Ubuntu Jammy.

How can we reproduce it (as minimally and precisely as possible)?

  • Set up an Ubuntu Jammy VM with 2 network interfaces
  • Set up the Kubernetes repository (see the sketch after the commands below)
  • Install containerd, kubeadm & kubelet
  • modprobe br_netfilter, enable routing & iptables for bridges
  • Run kubeadm init
```
apt-get --yes install containerd kubelet kubectl kubeadm
modprobe br_netfilter
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
sysctl -w net.bridge.bridge-nf-call-iptables=1
kubeadm config images pull
kubeadm init --cri-socket unix:///run/containerd/containerd.sock --apiserver-advertise-address 10.88.3.60 --pod-network-cidr=10.88.3.60/16
```
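
The repository step from the list above is not shown in the snippet; a minimal sketch of what it typically looked like at the time, using the legacy apt.kubernetes.io repository (an assumption, the actual automation may differ), together with a way to persist the module and sysctl settings across reboots:

```
# Add the (legacy) Kubernetes apt repository -- assumed setup, not taken from the report
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg \
    https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" \
    > /etc/apt/sources.list.d/kubernetes.list
apt-get update

# Persist the bridge module and sysctls so they survive a reboot
echo br_netfilter > /etc/modules-load.d/k8s.conf
cat > /etc/sysctl.d/k8s.conf <<'EOF'
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
```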

Result:

```
root@master0:~# crictl -r unix:///run/containerd/containerd.sock ps -a
CONTAINER       IMAGE           CREATED              STATE    NAME                      ATTEMPT  POD ID
0426a01c0d933   aebe758cef4cd   17 seconds ago       Running  etcd                      12       3081199357710
a46307ecd7db9   529072250ccc6   39 seconds ago       Exited   kube-apiserver            11       bd9d95f33a01c
75facd3e2c943   77b49675beae1   47 seconds ago       Running  kube-proxy                13       87642cfed70e1
508f16a359c66   e3ed7dee73e93   About a minute ago   Exited   kube-scheduler            15       80d6114fb09b4
fefd165b18546   88784fb4ac2f6   3 minutes ago        Running  kube-controller-manager   12       ebaca3ae71ac8
679b2f042eab1   aebe758cef4cd   6 minutes ago        Exited   etcd                      11       da070257e5ea7
4e6452b46dce2   77b49675beae1   7 minutes ago        Exited   kube-proxy                12       417f51b4372b9
6dde3dee7c82d   88784fb4ac2f6   8 minutes ago        Exited   kube-controller-manager   11       ca85f97a018ed
```
```
root@master0:~# kubectl describe pod -n kube-system etcd-master0
Name:                 etcd-master0
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master0/10.88.3.60
Start Time:           Mon, 23 May 2022 18:36:48 +0000
Labels:               component=etcd
                      tier=control-plane
Annotations:          kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.88.3.60:2379
                      kubernetes.io/config.hash: 3da0b3b3f5b168e56c71dd2c6212a28e
                      kubernetes.io/config.mirror: 3da0b3b3f5b168e56c71dd2c6212a28e
                      kubernetes.io/config.seen: 2022-05-23T18:36:48.138775775Z
                      kubernetes.io/config.source: file
                      seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:               Running
IP:                   10.88.3.60
IPs:
  IP:           10.88.3.60
Controlled By:  Node/master0
Containers:
  etcd:
    Container ID:  containerd://166072295fa6a53a77d887a713d7e711dadc71976f25278b671e382cf5ae82e8
    Image:         k8s.gcr.io/etcd:3.5.3-0
    Image ID:      k8s.gcr.io/etcd@sha256:13f53ed1d91e2e11aac476ee9a0269fdda6cc4874eba903efd40daf50c55eee5
    Port:          <none>
    Host Port:     <none>
    Command:
      etcd
      --advertise-client-urls=https://10.88.3.60:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --experimental-initial-corrupt-check=true
      --initial-advertise-peer-urls=https://10.88.3.60:2380
      --initial-cluster=master0=https://10.88.3.60:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://10.88.3.60:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://10.88.3.60:2380
      --name=master0
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    State:          Running
      Started:      Mon, 23 May 2022 18:42:12 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 23 May 2022 18:40:05 +0000
      Finished:     Mon, 23 May 2022 18:41:25 +0000
    Ready:          True
    Restart Count:  4
    Requests:
      cpu:        100m
      memory:     100Mi
    Liveness:     http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=8
    Startup:      http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=24
    Environment:  <none>
    Mounts:
      /etc/kubernetes/pki/etcd from etcd-certs (rw)
      /var/lib/etcd from etcd-data (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  etcd-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/pki/etcd
    HostPathType:  DirectoryOrCreate
  etcd-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/etcd
    HostPathType:  DirectoryOrCreate
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       :NoExecute op=Exists
Events:
  Type     Reason          Age                    From     Message
  ----     ------          ----                   ----     -------
  Warning  Unhealthy       7m30s (x4 over 8m)     kubelet  Liveness probe failed: Get "http://127.0.0.1:2381/health": dial tcp 127.0.0.1:2381: connect: connection refused
  Warning  Unhealthy       6m25s (x3 over 9m25s)  kubelet  Startup probe failed: Get "http://127.0.0.1:2381/health": dial tcp 127.0.0.1:2381: connect: connection refused
  Normal   SandboxChanged  6m18s (x3 over 9m15s)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          6m18s (x3 over 9m15s)  kubelet  Container image "k8s.gcr.io/etcd:3.5.3-0" already present on machine
  Normal   Created         6m18s (x3 over 9m15s)  kubelet  Created container etcd
  Normal   Started         6m18s (x3 over 9m15s)  kubelet  Started container etcd
  Normal   Killing         4m58s (x4 over 9m35s)  kubelet  Stopping container etcd
  Warning  BackOff         4m57s (x2 over 4m58s)  kubelet  Back-off restarting failed container
  Normal   Killing         33s                    kubelet  Stopping container etcd
```
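
To correlate these SandboxChanged events with what the runtime sees, the sandbox list and status can be compared against the event timestamps (generic crictl usage; <POD_ID> is a placeholder taken from the first command's output):

```
# List pod sandboxes with their creation times and states
crictl -r unix:///run/containerd/containerd.sock pods

# Dump the full status (state, network, timestamps) of one sandbox
crictl -r unix:///run/containerd/containerd.sock inspectp <POD_ID>
```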

Anything else we need to know?

The same install works without any problem under Ubuntu Focal (20.04).
The install is 100% automated (the only change is the Ubuntu version).

Kubernetes version

```
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:38:19Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
```

Cloud provider

None

OS version

```
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux master0 5.15.0-33-generic #34-Ubuntu SMP Wed May 18 13:34:26 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
```



Install tools

kubeadm


Container runtime (CRI) and version (if applicable)

containerd

```
root@master0:~# containerd --version
containerd github.com/containerd/containerd 1.5.9-0ubuntu3
```
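
Since the only variable here is the Ubuntu release, the node's cgroup and containerd CRI configuration are worth recording; a couple of generic checks (offered as a diagnostic aid, not a confirmed cause):

```
# Jammy boots with cgroup v2 by default (prints "cgroup2fs"); Focal uses v1 ("tmpfs")
stat -fc %T /sys/fs/cgroup/

# Show the effective containerd config, including the CRI cgroup driver setting
containerd config dump | grep -iA1 systemdcgroup
```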


Related plugins (CNI, CSI, ...) and versions (if applicable)

None (Weave should have been installed, but the cluster was not stable enough to run the kubectl apply step during the installation process)

Labels

kind/bug, kind/support, needs-triage, sig/cluster-lifecycle, sig/network
