Deprecate usage of docker as container runtime in gardener #4110

Closed
28 of 31 tasks
voelzmo opened this issue May 26, 2021 · 10 comments


voelzmo commented May 26, 2021

How to categorize this issue?
/kind epic
/priority 3

What would you like to be added:

  • We need to switch to a CRI container runtime and remove the option to configure docker as a runtime from a yet-to-be-determined gardener version on (at the latest in April 2022).
  • We need to provide a migration path for existing shoots switching from docker to a new CRI container runtime

Why is this needed:
docker is the default container runtime in gardener. k8s deprecated the in-tree dockershim in December 2020 and is about to remove it with release 1.24 (due ~April 2022).

Decisions made

  • Assuming that all current dependencies on the docker daemon can be migrated to different approaches, we decided against using the out-of-tree dockershim maintained by Mirantis and docker. This allows us to get rid of unnecessary dependencies on the workers and simplifies the interaction between kubelet and the container runtime.
  • We decided to use containerd for a couple of reasons:
    • It is already installed in all supported operating systems, because it is part of docker
    • It is already enabled as an option during shoot creation
    • containerd seems to be much more mainstream than the alternatives according to surveys and other k8s distros (k3s, k0s, kind)
    • As docker uses containerd underneath, we expect this to be the most compatible choice for workloads
  • Don't switch the default container runtime for new shoots or the container runtime for existing shoots without explicit user action (either a k8s version update or selecting containerd explicitly)
  • Eventually, all MachineImageVersions will need to have docker explicitly specified in their supported container runtimes, otherwise shoots still using docker will stop working correctly. Our own users don't need to care about this, as we do it for them. We need to make clear to our open-source users that this is an operator action at some point, so their users can continue to use their gardener installations correctly.
    • After discussing this with @BeckerMax and @timuthy we agreed this should be a one-time migration that Gardener does for them: I've added a story below with the two tasks of migrating and validating the CloudProfile (see the sketch after this list)
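
A minimal sketch of what such a one-time CloudProfile migration could look like; the types and the ensureDockerCRI helper below are simplified, illustrative stand-ins, not the actual Gardener API or migration code:

package main

import "fmt"

// Simplified stand-ins for the CloudProfile machine image types;
// the real Gardener API types differ (illustrative only).
type CRI struct {
    Name string // e.g. "docker", "containerd"
}

type MachineImageVersion struct {
    Version string
    CRI     []CRI
}

// ensureDockerCRI adds "docker" to every MachineImageVersion that does
// not list it yet, so shoots still running docker keep working once the
// CRI list becomes authoritative.
func ensureDockerCRI(versions []MachineImageVersion) []MachineImageVersion {
    for i, v := range versions {
        hasDocker := false
        for _, cri := range v.CRI {
            if cri.Name == "docker" {
                hasDocker = true
                break
            }
        }
        if !hasDocker {
            versions[i].CRI = append(versions[i].CRI, CRI{Name: "docker"})
        }
    }
    return versions
}

func main() {
    versions := []MachineImageVersion{
        {Version: "318.9.0", CRI: []CRI{{Name: "containerd"}}},
    }
    fmt.Printf("%+v\n", ensureDockerCRI(versions))
}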

Next steps

  • Make usage of containerd production-ready
    • Identify gaps and issues when configuring a shoot to use containerd instead of docker
  • Switch default container runtime to containerd instead of docker with k8s release 1.22 (planned for August 2021). Introduce the option to configure docker as container runtime explicitly, such that users experiencing issues don't have to downgrade their k8s cluster in order to revert the container runtime change.
  • Remove option to create shoots with docker container runtime with k8s release 1.23
  • Communicate change to users

Further work

Open Questions

  • When moving from an in-tree container runtime shim to an out-of-tree one, can we run into incompatibilities between CRI and containerd, as the two have different lifecycles? I.e., is there a compatibility restriction on which k8s version runs with which containerd version? Watch out for this matrix before updating containerd
  • Are we using systemd as cgroup driver as recommended?
    • As @vpnachev points out we are using cgroupfs.
    • Should we decouple the switch from cgroupfs to systemd cgroup driver from changing the container runtime?
      • What are the benefits?
      • What are the drawbacks?

Tasks

@voelzmo voelzmo added the kind/enhancement Enhancement, improvement, extension label May 26, 2021
@gardener-robot gardener-robot added kind/epic Large multi-story topic priority/3 Priority (lower number equals higher priority) labels May 26, 2021

voelzmo commented May 26, 2021

/assign
/assign @BeckerMax

@rfranzke

I think we should only enforce using containerd for shoots >= Kubernetes v1.22. All shoots < v1.22 should be allowed to use either docker or containerd like today, hence,

announce when it will be removed and all existing clusters will be switched to containerd.

should not be necessary, should it? Do you see a strong need to enforce containerd for all shoots, even those < v1.22?

@vpnachev

Are we using systemd as cgroup driver as recommended?

We are using cgroupfs, which is still the default in k8s even though the recommended one is systemd. This change requires a node rollout, as well as proper configuration of the CRI, as the kubelet and the CRI are required to use the same cgroup driver.

announce when it will be removed and all existing clusters will be switched to containerd.

should not be necessary, should it? Do you see a strong need to enforce containerd for all shoots, even those < v1.22?

Such a change is not backward compatible and I am against changing anything in existing clusters. Defaulting to containerd for shoots running on k8s >= 1.22 sounds like a better approach. Or, if we are also talking about:

Switch default container runtime to containerd instead of docker

we should do it via a new API version, e.g. v1beta2, but I doubt we will go this way.


vlerenc commented May 27, 2021

Too many people have made themselves dependent on an implementation detail (dockerd underneath), so we cannot change it without users either selecting the container runtime explicitly or moving to a new Kubernetes version, which is also an explicit choice.

I would suggest the latter, i.e. piggy-back on a new Kubernetes version and the deprecation of the dockershim to ride on that wave and avoid (to some degree) the frustration when these users realise we are taking away dockerd for good.


voelzmo commented May 28, 2021

Thanks for the comments y'all! Most of the discussion seems to deal with if/how/when to switch the container runtime for new/existing clusters. Please help me understand your opinions in a bit more detail, so that we can make the right decisions here.

@vlerenc

so we cannot change it without users [making an explicit choice]

My understanding is that you're referring to changing the container runtime for already existing clusters here, is that correct? Would you see a similar requirement of an explicit choice for new shoots, even if they are created with a k8s version which technically still allows for using the in-tree dockershim? Our proposal separated switching the default container runtime for new shoots from dealing with already existing clusters.

@vpnachev
Thanks for clarifying the current state of the cgroup driver! My understanding of the documentation about cgroup drivers is that the best opportunity to change the cgroup driver would be when we change the container runtime - is this correct? What's your opinion on this topic? Should we couple changing to containerd with a change of the cgroup driver? Should we do things differently?

we should do it via a new API version, e.g. v1beta2, but I doubt we will go this way.

Just so I get this right: You're saying that when we switch the default container runtime, we would need to update the Shoot's API version, because you're regarding this as an API change?

@rfranzke, @vpnachev, @vlerenc
From all of your comments I'm reading that you are concerned with switching the container runtime of existing clusters without users updating to a k8s version which actually requires this. So I guess this is a point we'll update the issue with. Thanks!

@rfranzke

we should do it via a new API version, e.g. v1beta2, but I doubt we will go this way.

Just so I get this right: You're saying that when we switch the default container runtime, we would need to update the Shoot's API version, because you're regarding this as an API change?

IMO, we don't need a new API version. We change the defaulting in the API for Shoots using a .spec.kubernetes.version of at least v1.22 (if that's the version which doesn't support the dockershim anymore) such that .spec.provider.workers[].cri.name defaults to containerd (while containerd is also the sole allowed value).
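
A minimal sketch of such version-dependent defaulting, with simplified stand-in types rather than the actual gardener API and defaulting code (defaultWorkerCRI and the boolean flag are illustrative assumptions):

package main

import "fmt"

// Simplified stand-ins for the relevant Shoot API fields; the real
// Gardener types live in the core.gardener.cloud API group.
type CRI struct{ Name string }

type Worker struct {
    Name string
    CRI  *CRI
}

// defaultWorkerCRI defaults the CRI to containerd for shoots on
// k8s >= 1.22; older shoots keep the implicit docker default.
func defaultWorkerCRI(k8sAtLeast122 bool, workers []Worker) {
    for i := range workers {
        if workers[i].CRI == nil && k8sAtLeast122 {
            workers[i].CRI = &CRI{Name: "containerd"}
        }
    }
}

func main() {
    workers := []Worker{{Name: "worker-pool-1"}}
    defaultWorkerCRI(true, workers)
    fmt.Printf("%+v\n", *workers[0].CRI) // {Name:containerd}
}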


vpnachev commented May 28, 2021

API convention

@voelzmo if we try to strictly follow the well-established API conventions in the k8s community https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md, we must also bump the apiVersion. Why? Because changing the default value from the current implicit docker-shim to the explicit containerd is considered a backward-incompatible change. Let's assume we change the default in Gardener v1.30.0: then creating a shoot cluster with Gardener v1.30 and with v1.29 from exactly the same shoot manifest will result in completely different clusters - one using the docker-shim and the other containerd. However, if we use different apiVersions, then the Gardener version does not matter: for a shoot created with a manifest with apiVersion=core.gardener.cloud/v1beta1 the docker-shim will be used, and for exactly the same shoot but with apiVersion=core.gardener.cloud/v1beta2 it will be containerd.

For our purposes, creating a new apiVersion is overkill, as

  • we have to maintain both versions for some time
  • any client needs time to migrate to the new version
  • it will duplicate most of the code of the current version with very little change

So, to provide more control to the users, we can use the fact that the docker-shim is deprecated in a particular k8s version and start denying shoot create/update requests for shoot clusters that are on this particular k8s version or higher, if they have not explicitly configured containerd to be the CRI for their cluster. Simply doing this is also not a backward-compatible change, because we would be making an optional field required; therefore the solution is to also default the CRI to containerd, but only for shoots on this particular or a higher k8s version.

Just one example of a similar change in the shoot API that was bound to a k8s version: the default value of shoot.spec.kubernetes.kubeAPIServer.enableBasicAuthentication is true for shoots with k8s < 1.16, and false for k8s >= 1.16.

if obj.Spec.Kubernetes.KubeAPIServer.EnableBasicAuthentication == nil {
    if k8sVersionLessThan116 {
        obj.Spec.Kubernetes.KubeAPIServer.EnableBasicAuthentication = pointer.BoolPtr(true)
    } else {
        obj.Spec.Kubernetes.KubeAPIServer.EnableBasicAuthentication = pointer.BoolPtr(false)
    }
}

For shoots on k8s 1.19 or higher it is not even possible to enable basic authentication.

if geqKubernetes119 && helper.ShootWantsBasicAuthentication(kubeAPIServer) {
    allErrs = append(allErrs, field.Forbidden(fldPath.Child("kubeAPIServer", "enableBasicAuthentication"), "basic authentication has been removed in Kubernetes v1.19+"))
}
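
The analogous check for the CRI could look roughly like this; validateCRI and the geqKubernetes123 flag are illustrative assumptions, not the actual gardener validation code:

package main

import "fmt"

// Illustrative sketch: reject docker as CRI for shoots on a k8s version
// that no longer ships the dockershim (names are hypothetical).
func validateCRI(geqKubernetes123 bool, criName string) error {
    if geqKubernetes123 && criName == "docker" {
        return fmt.Errorf("spec.provider.workers[].cri.name: docker is not supported for Kubernetes v1.23+, use containerd")
    }
    return nil
}

func main() {
    fmt.Println(validateCRI(true, "docker"))
}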

As Gardener already supports k8s 1.21, the earliest it can switch the default is for shoots on k8s 1.22.
If k8s really removes the docker-shim in 1.22 (ref), we should forbid it for this version as well.

When Docker runtime support is removed in a future release (currently planned for the 1.22 release in late 2021) of Kubernetes it will no longer be supported and you will need to switch to one of the other compliant container runtimes, like containerd or CRI-O.

Existing shoot clusters

Initially Gardener supported only the docker-shim and it was not possible to use containerd, hence I guess most of the shoot clusters out in the world are using docker, not containerd. Some shoot owners have seen the running docker daemon and, as they have full access to their nodes, they added a hard dependency of their applications on the docker daemon. Thus we cannot blindly migrate their already existing clusters to containerd, as we would break their applications. The way forward here is to inform them about the deprecation and removal of docker and let them explicitly migrate to containerd when they are ready. Of course we cannot wait for them forever, therefore they can keep using docker as long as their shoot cluster is on an older k8s version than the particular one.

From all of your comments I'm reading that you are concerned with switching the container runtime of existing clusters without users updating to a k8s version which actually requires this. So I guess this is a point we'll update the issue with. Thanks!

Correct.

Cgroups

I think this is a completely different topic and should not be handled in this issue.
We are aware of this recommendation from the k8s community, see gardener/gardener-extension-os-ubuntu#1, and of the initial source of this recommendation, kubernetes/kubeadm#1394 (comment). A lot of k8s distributions have started switching or have already switched the default to systemd; however, the default in k8s is still cgroupfs. Anyway, changing the cgroup driver of an existing node is not a safe operation:

Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation. If the kubelet has created Pods using the semantics of one cgroup driver, changing the container runtime to another cgroup driver can cause errors when trying to re-create the Pod sandbox for such existing Pods. Restarting the kubelet may not solve such errors.

This means that we will want to replace the nodes of the cluster when this change is triggered.
Currently, Gardener is setting the cgroup driver only on the kubelet and relies on the CRI on the node also using cgroupfs. So, first Gardener has to take full control over the cgroup driver of the CRI, and only after that can it start switching to systemd. To avoid unwanted disruption, this switch should be applied only to newly created worker pools.
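
For illustration: the kubelet side of such a switch is a single cgroupDriver field in the kubelet configuration, and the containerd side is the matching SystemdCgroup runc option in its config.toml. A minimal sketch with a simplified stand-in type (the real kubelet configuration type lives in k8s.io/kubelet; the containerd snippet in the comment assumes containerd's current CRI plugin config layout):

package main

import "fmt"

// Minimal stand-in for the kubelet configuration; the real field is
// cgroupDriver in kubelet's KubeletConfiguration.
type KubeletConfig struct {
    CgroupDriver string // "cgroupfs" (current gardener default) or "systemd"
}

func main() {
    // The kubelet and the CRI must agree on the cgroup driver.
    // For containerd, the matching option in config.toml would be:
    //
    //   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    //     SystemdCgroup = true
    cfg := KubeletConfig{CgroupDriver: "systemd"}
    fmt.Printf("kubelet cgroupDriver: %s\n", cfg.CgroupDriver)
}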


vlerenc commented May 29, 2021

Switch default container runtime to containerd instead of docker

@voelzmo We may not do that, unless we bind it to the Kubernetes version (we should amend this line in the ticket description's "next steps" with the Kubernetes version condition for clarity - the line after that about the removal as well).

Reason being: We have many stakeholders with automation in place, i.e. we cannot change the default behaviour for new clusters. The clusters people would then get would differ, even though the spec they sent would be the same. That is a call for trouble. Hence the answer to your question is...

My understanding is that you're referring to changing the container runtime for already existing clusters here, is that correct?

Existing clusters may not be changed, but we also cannot change how new clusters are spun up. That's what I wanted to express with "it must be explicit". So:

  • If you want to offer both or rather all three container runtime options in parallel (dockerd (that under the hood uses containerd+runc as well, but will be omitted here), CRI+containerd+runc, CRI+containerd+gvisor), it must be explicit via containerRuntimes in all Kubernetes versions we currently offer, including the latest 1.21
  • Changing the default behaviour is best bundled with a new Kubernetes version, as those must be validated prior by stakeholders and force them to send different specs (with the new Kubernetes version, which is a mandatory field). We cannot do that anymore for 1.21, as it is already offered, so the next possible version with which we can change the default container runtime is 1.22. That's our chance. The next one (and last one) is 1.23, but then it would be rather sudden.

Changing the Gardener API version was not proposed in earnest/the thought was only entertained in theory, I believe - @vpnachev listed it merely as an option to emphasise the difficulty of changing the default "mid-way", should we have to, but we don't have to/there is no reason to. We can easily bundle it with the next Kubernetes version that we support, so we should. We don't want to create new Gardener API versions, as this is a massive undertaking, not common practice in Kubernetes (it is avoided there as well, with the exception of alpha->beta->GA), and annoying to our stakeholders, because they need to switch/adapt one more field; this specifically will make them nervous, trigger questions about what else has changed, and generally create uncertainty. We have the luxury (as all in Kubernetes have) to piggy-back such changes on the Kubernetes minor version, and so we should always make use of it. Every 4 months, Kubernetes (and therefore Gardener) can rejuvenate itself to some "fair" degree. Not the API specs, of course, but such things as potentially impactful implementation details, defaults (when fields are omitted, like in our case), etc. are more easily changed then.

To me, the only remaining question was whether we want to offer dockerd and CRI+containerd+runc in parallel or not and, if yes, for how long. I think you already made the decision that you want to offer both, so that we learn of issues early on. That seems reasonable. You also already decided not to use a fork/the Mirantis dockershim. That also sounds reasonable. Then, I would think, the key questions are settled:

  • You would offer CRI+containerd+runc as an explicit option via containerRuntimes whenever you are ready, and for the Kubernetes versions you can easily support; that could be anything: all versions, from 1.21 on, or from 1.22 on
  • The default will switch from dockerd to CRI+containerd+runc only with a Kubernetes minor version, which could be 1.22, if we have had enough head-time and saw no issues, or 1.23 if we need more time
  • You would drop support for dockerd with 1.22 if everything runs really smoothly, or from 1.23 at the latest when we have to, because then the dockershim will be gone. That will give users 8 months to adapt / learn how to live without dockerd if they are late to the party (yes, I know, some will miss also this date and some will only notice that they cannot upgrade to 1.23 when 1.25 gets out, and they will be caught on a no-longer-supported 1.22 until they free themselves of dockerd, but we cannot help them then anymore - without such cleansing steps every 4 months, Kubernetes/Gardener would become unmanageable - in order to evolve healthily, you must be allowed to drop baggage now and then, and Kubernetes has defined how that is done to the benefit of all in the long run)


voelzmo commented Jun 1, 2021

Thanks for all the detailed comments, I really appreciate you all taking the time to provide more context on your thoughts!
To keep you in the loop, here's what we changed in the original issue, hoping that your input is well considered. Let me know if that didn't quite capture it!

  • Added concrete k8s versions to the steps of changing the default container runtime for new shoots (1.22) and removing the option to configure docker as container runtime (1.23). Added some reasoning why we want to separate these two.
  • Clarified the decision to not change the default container runtime without explicit user action
  • Added more information regarding the cgroup driver. Note that we didn't defer this to the 'future work' section immediately, as we want to better understand the consequences of switching container runtime and cgroup driver at the same time to be able to compare it with the alternative of pushing this change to the future


voelzmo commented Nov 30, 2021

Deprecation of docker as container runtime is completed. We created a few tasks about being able to produce OS images without docker runtime support, and other things which might be helpful, but those will be tackled outside this epic.
