
Changing cri.name=containerd to cri=nil still leaves containerd running on the node as systemd service #4254

Closed · voelzmo opened this issue Jun 23, 2021 · 6 comments · Fixed by #4289
Labels: kind/bug, priority/3 (lower number equals higher priority)

voelzmo (Member) commented Jun 23, 2021

How to categorize this issue?
/kind bug
/priority 3

What happened:
When changing a worker pool from using containerd as a container runtime to docker, the new nodes still run containerd as a systemd service:

$ systemctl list-units | grep containerd
  containerd.service   loaded    active     running      containerd container runtime

The kubelet is correctly configured not to use containerd as an external container runtime:

$ ps -aux | grep kubelet
root        3962  4.7  1.5 1967744 116840 ?      SLsl 15:18   0:14 /opt/bin/kubelet --bootstrap-kubeconfig=/var/lib/kubelet/kubeconfig-bootstrap --config=/var/lib/kubelet/config/kubelet --cni-bin-dir=/opt/cni/bin/ --cni-conf-dir=/etc/cni/net.d/ --image-pull-progress-deadline=1m0s --kubeconfig=/var/lib/kubelet/kubeconfig-real --network-plugin=cni --pod-infra-container-image=<redacted> --v=2 --cloud-provider=external --enable-controller-attach-detach=true
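
For contrast: a containerd-backed node would carry the remote-runtime flags on the kubelet command line. A hedged check (flag names as in kubelet <= 1.23; the socket path is the common default, not taken from this node):

$ ps aux | grep '[k]ubelet' | grep -oE -- '--container-runtime(-endpoint)?=[^ ]+'
# a containerd node would print something like:
#   --container-runtime=remote
#   --container-runtime-endpoint=unix:///run/containerd/containerd.sock
# on this node the grep prints nothing, matching the dockershim default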

The node is correctly reported to use the docker runtime:

$ kubectl get nodes -o wide
NAME                                                      STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION        CONTAINER-RUNTIME
shoot--d058546--cri-migrate-worker-docke-z1-f8f48-c6bq2   Ready    <none>   5m24s   v1.20.6   10.250.0.8    <none>        Garden Linux 184.0   5.4.0-5-cloud-amd64   docker://19.3.13

What you expected to happen:
Only docker should be started on the new nodes; there should be no signs of containerd.

How to reproduce it (as minimally and precisely as possible):

  • Create a cluster with cri.name=containerd. This gets you a node using containerd as its container runtime.
  • Remove the cri and cri.name properties from your worker pool. This gets you a node using docker as its container runtime, since that is the current default when cri==nil.
  • Look at the output of kubectl get nodes -o wide to verify that you get a new node with docker as its container runtime while the old node with the containerd runtime is drained and deleted.
  • ssh into the new node and look at the started systemd units (see the sketch below).
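
A minimal sketch of the spec change and the checks (the worker name and the Shoot snippet are illustrative, not taken from a real cluster):

# the relevant worker-pool snippet in the Shoot spec:
#   spec:
#     provider:
#       workers:
#       - name: worker-docke
#         cri:               # <- remove this block so cri==nil defaults to docker
#           name: containerd

$ kubectl get nodes -o wide --watch                        # wait for the new docker node
$ systemctl list-units --type=service | grep containerd    # on the new node; expect no output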

Anything else we need to know?:

  • This really is a new node:
$ uptime
 15:28:09 up 10 min,  0 users,  load average: 0.34, 0.64, 0.52
  • The userData for the new node seems to incorrectly contain containerd (thanks @prashanth26):
$ kubectl get secret -n shoot--d058546--cri-migrate shoot--d058546--cri-migrate-worker-docke-z1-7c48e -o json | jq -r .data.userData | base64 -d
#!/bin/bash
mkdir -p /etc/systemd/system/containerd.service.d
cat <<EOF > /etc/systemd/system/containerd.service.d/11-exec_config.conf
[Service]
ExecStart=
ExecStart=/usr/bin/containerd --config=/etc/containerd/config.toml
EOF
chmod 0644 /etc/systemd/system/containerd.service.d/11-exec_config.conf
[...]
systemctl daemon-reload
systemctl enable containerd && systemctl restart containerd
systemctl enable docker && systemctl restart docker
systemctl enable 'cloud-config-downloader.service' && systemctl restart 'cloud-config-downloader.service'
  • operatingsystemconfig-original is older than the machineset and has spec.criConfig.name=containerd
  • However, it does not contain containerd as a unit; status.units has (a quick cross-check is sketched after this list):
updatecacerts.service
docker-monitor.service
docker-logrotate.service
docker-logrotate.timer
systemd-sysctl.service
kubelet.service
kubelet-monitor.service
gardener-user.service
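
A quick way to cross-check both observations (namespace and secret name as in the command above; the OperatingSystemConfig is an extension resource in the shoot's seed namespace, so run the second command against the seed — a hedged sketch):

$ kubectl get secret -n shoot--d058546--cri-migrate shoot--d058546--cri-migrate-worker-docke-z1-7c48e \
    -o json | jq -r .data.userData | base64 -d | grep -c containerd
# expect 0 for a docker-only pool; here it is non-zero

$ kubectl get operatingsystemconfig -n shoot--d058546--cri-migrate \
    -o jsonpath='{.items[*].status.units}'
# lists the units each OSC carries; containerd is absent despite spec.criConfig.name=containerd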

Environment:

  • Gardener version: 1.25.0
  • Kubernetes version (use kubectl version): 1.21.0
  • Cloud provider or hardware configuration: GCP
  • Others:
voelzmo added the kind/bug label Jun 23, 2021
gardener-robot added the priority/3 (lower number equals higher priority) label Jun 23, 2021
voelzmo (Member, Author) commented Jun 25, 2021

I think what we're seeing here is two things coming together.

WDYT?

voelzmo (Member, Author) commented Jun 28, 2021

/assign

voelzmo (Member, Author) commented Jun 30, 2021

As the operatingsystemconfig is shared between the old worker pool and the new worker pool, this leads to a reconfiguration of the old worker pool during the node rollout. I think this is also what @vpnachev observed in the first place when looking at this. Here's the scenario for a single node in a worker pool configured to use containerd; afterwards, the worker pool is changed to cri==nil, meaning it defaults to docker:

$ kubectl get nodes -o wide
NAME                                                   STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION        CONTAINER-RUNTIME
shoot--dev--shoot-gcp-cri-cpu-worker1-z1-6445c-24c5s   Ready    <none>   44m   v1.20.6   10.242.0.2    <none>        Garden Linux 184.0   5.4.0-5-cloud-amd64   containerd://1.4.1

After re-configuring the pool, a new node gets created and the old one gets reconfigured to also use docker(!):

$ kubectl get nodes -o wide
NAME                                                   STATUS                     ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION        CONTAINER-RUNTIME
shoot--dev--shoot-gcp-cri-cpu-worker1-z1-6445c-24c5s   Ready,SchedulingDisabled   <none>   47m     v1.20.6   10.242.0.2    <none>        Garden Linux 184.0   5.4.0-5-cloud-amd64   docker://19.3.13
shoot--dev--shoot-gcp-cri-cpu-worker1-z1-74fbd-7vhht   Ready                      <none>   2m56s   v1.20.6   10.242.0.3    <none>        Garden Linux 184.0   5.4.0-5-cloud-amd64   docker://19.3.13
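
One way to watch this reconfiguration happen on the old node (hedged; the unit name is taken from the userData above):

$ journalctl -u cloud-config-downloader.service -n 50 --no-pager
# the downloader re-applies the shared, now docker-only, cloud-config
$ systemctl status containerd docker --no-pager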

So in contrast to my initial assumptions above, the only solution seems to be to include the cri in the operatingsystemconfig name in order to avoid unwanted side-effects on nodes in the old worker pool.
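
A minimal sketch of that idea (hypothetical hash inputs, only to illustrate why a CRI-dependent name would force a fresh operatingsystemconfig for the new pool instead of mutating the shared one):

# fold the CRI name into the hash that suffixes the OSC/machine names, so that
# switching containerd -> nil/docker yields a different name and hence a node
# rollout rather than an in-place reconfiguration of the old pool
$ pool="worker1-z1"; cri="docker"
$ printf '%s|%s' "$pool" "$cri" | sha256sum | cut -c1-5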

Note that I wasn't able to reproduce this the other way around: when changing from docker to containerd, the old node still had this right before its deletion:

shoot--dev--shoot-gcp-cri-cpu-worker1-z1-74fbd-7vhht   NotReady,SchedulingDisabled   <none>   25m     v1.20.6   10.242.0.3    <none>        Garden Linux 184.0   5.4.0-5-cloud-amd64   docker://19.3.13

I'm not sure why this is different.

voelzmo (Member, Author) commented Jul 19, 2021

Our change didn't quite work as expected: #4390

voelzmo reopened this Jul 19, 2021
voelzmo changed the title from "Changing cri.name=containerd to cri=nil doesn't work as expected" to "Changing cri.name=containerd to cri=nil still leaves containerd running on the node as systemd service" Jul 26, 2021
voelzmo (Member, Author) commented Dec 6, 2021

We agreed that we're not going to fix this one, but rather wait until everyone is on k8s >= 1.22, where this issue goes away automatically (as there's no way to configure cri=nil anymore).

rfranzke (Member) commented
OK, thanks @voelzmo, then let's
/close
this issue for now given that there is nothing left to be done.
