Closed
### What happened?
When installing a master (control-plane) node under Ubuntu Jammy, the base kube-system pods (etcd, kube-apiserver, kube-proxy, ...) restart every few minutes.

The etcd pod restarts because of a `SandboxChanged` event:

> Normal SandboxChanged 57s kubelet Pod sandbox changed, it will be killed and re-created.

The other pods restart because of the same `SandboxChanged` event and, as a consequence, fail with an unreachable etcd / API server. I haven't found the origin of the `SandboxChanged` event.
### What did you expect to happen?
The kube-system pods shouldn't constantly restart because of `SandboxChanged` under Ubuntu Jammy.
### How can we reproduce it (as minimally and precisely as possible)?
- Set up an Ubuntu Jammy VM with 2 network interfaces
- Set up the Kubernetes apt repository
- Install containerd, kubeadm & kubelet
- `modprobe br_netfilter`, enable IP forwarding & iptables processing for bridged traffic
- Run `kubeadm init`
```
apt-get --yes install containerd kubelet kubectl kubeadm
modprobe br_netfilter
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
sysctl -w net.bridge.bridge-nf-call-iptables=1
kubeadm config images pull
kubeadm init --cri-socket unix:///run/containerd/containerd.sock --apiserver-advertise-address 10.88.3.60 --pod-network-cidr=10.88.3.60/16
```
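Note that the `sysctl -w` calls above only last until the next reboot. A sketch of persisting them via the standard drop-in directories (the file names are illustrative, not taken from the report):

```shell
# Load br_netfilter at boot (illustrative file name).
cat <<'EOF' >/etc/modules-load.d/k8s.conf
br_netfilter
EOF

# Persist the forwarding / bridge-netfilter settings (illustrative file name).
cat <<'EOF' >/etc/sysctl.d/99-kubernetes.conf
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# Re-apply all sysctl drop-in files without rebooting.
sysctl --system
```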
Result:
```
root@master0:~# crictl -r unix:///run/containerd/containerd.sock ps -a
CONTAINER       IMAGE           CREATED              STATE     NAME                      ATTEMPT   POD ID
0426a01c0d933   aebe758cef4cd   17 seconds ago       Running   etcd                      12        3081199357710
a46307ecd7db9   529072250ccc6   39 seconds ago       Exited    kube-apiserver            11        bd9d95f33a01c
75facd3e2c943   77b49675beae1   47 seconds ago       Running   kube-proxy                13        87642cfed70e1
508f16a359c66   e3ed7dee73e93   About a minute ago   Exited    kube-scheduler            15        80d6114fb09b4
fefd165b18546   88784fb4ac2f6   3 minutes ago        Running   kube-controller-manager   12        ebaca3ae71ac8
679b2f042eab1   aebe758cef4cd   6 minutes ago        Exited    etcd                      11        da070257e5ea7
4e6452b46dce2   77b49675beae1   7 minutes ago        Exited    kube-proxy                12        417f51b4372b9
6dde3dee7c82d   88784fb4ac2f6   8 minutes ago        Exited    kube-controller-manager   11        ca85f97a018ed
```
```
root@master0:~# kubectl describe pod -n kube-system etcd-master0
Name: etcd-master0
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: master0/10.88.3.60
Start Time: Mon, 23 May 2022 18:36:48 +0000
Labels: component=etcd
tier=control-plane
Annotations: kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.88.3.60:2379
kubernetes.io/config.hash: 3da0b3b3f5b168e56c71dd2c6212a28e
kubernetes.io/config.mirror: 3da0b3b3f5b168e56c71dd2c6212a28e
kubernetes.io/config.seen: 2022-05-23T18:36:48.138775775Z
kubernetes.io/config.source: file
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Running
IP: 10.88.3.60
IPs:
IP: 10.88.3.60
Controlled By: Node/master0
Containers:
etcd:
Container ID: containerd://166072295fa6a53a77d887a713d7e711dadc71976f25278b671e382cf5ae82e8
Image: k8s.gcr.io/etcd:3.5.3-0
Image ID: k8s.gcr.io/etcd@sha256:13f53ed1d91e2e11aac476ee9a0269fdda6cc4874eba903efd40daf50c55eee5
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://10.88.3.60:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--experimental-initial-corrupt-check=true
--initial-advertise-peer-urls=https://10.88.3.60:2380
--initial-cluster=master0=https://10.88.3.60:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://10.88.3.60:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://10.88.3.60:2380
--name=master0
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Running
Started: Mon, 23 May 2022 18:42:12 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 23 May 2022 18:40:05 +0000
Finished: Mon, 23 May 2022 18:41:25 +0000
Ready: True
Restart Count: 4
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=8
Startup: http-get http://127.0.0.1:2381/health delay=10s timeout=15s period=10s #success=1 #failure=24
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
etcd-certs:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/pki/etcd
HostPathType: DirectoryOrCreate
etcd-data:
Type: HostPath (bare host directory volume)
Path: /var/lib/etcd
HostPathType: DirectoryOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 7m30s (x4 over 8m) kubelet Liveness probe failed: Get "http://127.0.0.1:2381/health": dial tcp 127.0.0.1:2381: connect: connection refused
Warning Unhealthy 6m25s (x3 over 9m25s) kubelet Startup probe failed: Get "http://127.0.0.1:2381/health": dial tcp 127.0.0.1:2381: connect: connection refused
Normal SandboxChanged 6m18s (x3 over 9m15s) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 6m18s (x3 over 9m15s) kubelet Container image "k8s.gcr.io/etcd:3.5.3-0" already present on machine
Normal Created 6m18s (x3 over 9m15s) kubelet Created container etcd
Normal Started 6m18s (x3 over 9m15s) kubelet Started container etcd
Normal Killing 4m58s (x4 over 9m35s) kubelet Stopping container etcd
Warning BackOff 4m57s (x2 over 4m58s) kubelet Back-off restarting failed container
Normal Killing 33s kubelet Stopping container etcd
```
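To locate the origin of the `SandboxChanged` events, a hedged diagnostic sketch run on the node (these commands are assumptions about the environment, not taken from the report):

```shell
# Look for sandbox-related messages in the kubelet log around the restart window.
journalctl -u kubelet --no-pager --since "15 min ago" | grep -i sandbox

# Check the container runtime's own log for sandbox/task events at the same timestamps.
journalctl -u containerd --no-pager --since "15 min ago"

# List pod sandboxes; multiple sandbox IDs for the same pod confirm the re-creation loop.
crictl -r unix:///run/containerd/containerd.sock pods
```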
### Anything else we need to know?
The same install works without any problem under Ubuntu Focal (20.04). The install is 100% automated (the only change is the Ubuntu version).
### Kubernetes version

<details>

```
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:38:19Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
```

</details>
### Cloud provider

<details>

None

</details>
### OS version

<details>

```
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux master0 5.15.0-33-generic #34-Ubuntu SMP Wed May 18 13:34:26 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
```

</details>
### Install tools
<details>
kubeadm
</details>
### Container runtime (CRI) and version (if applicable)
<details>
containerd
root@master0:~# containerd --version
containerd github.com/containerd/containerd 1.5.9-0ubuntu3
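One setting commonly checked in this situation (a hedged sketch, not a confirmed cause for this report): on cgroup-v2 hosts such as Jammy, containerd's CRI runc runtime should use the systemd cgroup driver to match the kubelet's default of `systemd`. The relevant fragment of `/etc/containerd/config.toml`:

```toml
# Use the systemd cgroup driver for the CRI runc runtime,
# matching the kubelet's cgroupDriver default.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```

After changing it, `systemctl restart containerd` is needed for the setting to take effect.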
</details>
### Related plugins (CNI, CSI, ...) and versions (if applicable)
<details>
None (weave should have been installed, but the cluster is not stable enough to run the `kubectl apply` during the installation process)
</details>