Deprecate usage of docker as container runtime in gardener #4110

Closed
28 of 31 tasks
voelzmo opened this issue May 26, 2021 · 10 comments


voelzmo commented May 26, 2021

How to categorize this issue?
/kind epic
/priority 3

What would you like to be added:

  • We need to switch to a CRI container runtime and remove the option to configure docker as a runtime from a yet-to-be-determined gardener version on (at the latest in April 2022).
  • We need to provide a migration path for existing shoots switching from docker to a new CRI container runtime

Why is this needed:
docker is the default container runtime in gardener. k8s deprecated the in-tree dockershim in December 2020 and is about to remove it with release 1.24 (due ~April 2022).

Decisions made

  • Assuming that all current dependencies on the docker daemon can be migrated to different approaches, we decided against using the out-of-tree dockershim maintained by Mirantis and docker. This allows us to get rid of unnecessary dependencies on the workers and simplifies the interaction between kubelet and the container runtime.
  • We decided to use containerd for a couple of reasons:
    • It is already installed in all supported operating systems, because it is part of docker
    • It is already enabled as an option during shoot creation
    • containerd seems to be much more mainstream than the alternatives according to surveys and other k8s distros (k3s, k0s, kind)
    • As docker uses containerd underneath, we expect this to be the most compatible choice for workloads
  • Don't switch the default container runtime for new shoots or the container runtime for existing shoots without explicit user action (either a k8s version update or selecting containerd explicitly)
  • Eventually, all MachineImageVersions will need to have docker explicitly specified in their supported container runtimes, otherwise shoots still using docker will stop working correctly. Our own users don't need to care about this, as we do it for them. We need to make clear to our open-source users that this is an operator action at some point, so their users can continue to use their gardener installations correctly.
    • After discussing this with @BeckerMax and @timuthy we agreed this should be a one-time migration that Gardener does for them: I've added a story below with the two tasks of migrating and validating the CloudProfile (see the sketch after this list)
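
A minimal sketch of what such a one-time CloudProfile migration could look like; the types and the ensureDockerCRI helper below are simplified, illustrative stand-ins, not the actual Gardener API or migration code:

package main

import "fmt"

// Simplified stand-ins for the CloudProfile machine image types;
// the real Gardener API types differ (illustrative only).
type CRI struct {
    Name string // e.g. "docker", "containerd"
}

type MachineImageVersion struct {
    Version string
    CRI     []CRI
}

// ensureDockerCRI adds "docker" to every MachineImageVersion that does
// not list it yet, so shoots still running docker keep working once the
// CRI list becomes authoritative.
func ensureDockerCRI(versions []MachineImageVersion) []MachineImageVersion {
    for i, v := range versions {
        hasDocker := false
        for _, cri := range v.CRI {
            if cri.Name == "docker" {
                hasDocker = true
                break
            }
        }
        if !hasDocker {
            versions[i].CRI = append(versions[i].CRI, CRI{Name: "docker"})
        }
    }
    return versions
}

func main() {
    versions := []MachineImageVersion{
        {Version: "318.9.0", CRI: []CRI{{Name: "containerd"}}},
    }
    fmt.Printf("%+v\n", ensureDockerCRI(versions))
}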

Next steps

  • Make usage of containerd production-ready
    • Identify gaps and issues when configuring a shoot to use containerd instead of docker
  • Switch default container runtime to containerd instead of docker with k8s release 1.22 (planned for August 2021). Introduce the option to configure docker as container runtime explicitly, such that users experiencing issues don't have to downgrade their k8s cluster in order to revert the container runtime change.
  • Remove option to create shoots with docker container runtime with k8s release 1.23
  • Communicate change to users

Further work

Open Questions

  • When moving from an in-tree container runtime shim to an out-of-tree one, can we run into incompatibilities between CRI and containerd, as the two have different lifecycles? I.e., is there a compatibility restriction on which k8s version runs with which containerd version? Watch out for this matrix before updating containerd
  • Are we using systemd as cgroup driver as recommended?
    • As @vpnachev points out we are using cgroupfs.
    • Should we decouple the switch from cgroupfs to systemd cgroup driver from changing the container runtime?
      • What are the benefits?
      • What are the drawbacks?

Tasks

@voelzmo voelzmo added the kind/enhancement Enhancement, improvement, extension label May 26, 2021
@gardener-robot gardener-robot added kind/epic Large multi-story topic priority/3 Priority (lower number equals higher priority) labels May 26, 2021

voelzmo commented May 26, 2021

/assign
/assign @BeckerMax

@rfranzke

I think we should only enforce using containerd for shoots >= Kubernetes v1.22. All shoots < v1.22 should be allowed to use either docker or containerd like today, hence,

announce when it will be removed and all existing clusters will be switched to containerd.

should not be necessary, should it? Do you see a strong need to enforce containerd for all shoots, even those < v1.22?

@vpnachev

Are we using systemd as cgroup driver as recommended?

We are using cgroupfs, which is still the default in k8s even though the recommended one is systemd. This change requires a node rollout, as well as proper configuration of the CRI, as the kubelet and the CRI are required to use the same cgroup driver.

announce when it will be removed and all existing clusters will be switched to containerd.

should not be necessary, should it? Do you see a strong need to enforce containerd for all shoots, even those < v1.22?

Such a change is not backward compatible and I am against changing anything in existing clusters. Defaulting to containerd for shoots running on k8s >= 1.22 sounds like a better approach. Or, if we are also talking about:

Switch default container runtime to containerd instead of docker

we should do it via a new API version, e.g. v1beta2, but I doubt we will go this way.


vlerenc commented May 27, 2021

Too many people have made themselves dependent on an implementation detail (dockerd underneath), so we cannot change it without users either selecting the container runtime explicitly or moving to a new Kubernetes version, which is also an explicit choice.

I would suggest the latter, i.e. piggy-back on a new Kubernetes version and the deprecation of the dockershim to ride on that wave and avoid (to some degree) the frustration when these users realise we are taking away dockerd for good.


voelzmo commented May 28, 2021

Thanks for the comments y'all! Most of the discussion seems to deal with if/how/when to switch the container runtime for new/existing clusters. Please help me understand your opinions in a bit more detail, so that we can make the right decisions here.

@vlerenc

so we cannot change it without users [making an explicit choice]

My understanding is that you're referring to changing the container runtime for already existing clusters here, is that correct? Would you see a similar requirement of an explicit choice for new shoots, even if they are created with a k8s version which technically still allows for using the in-tree dockershim? Our proposal separated switching the default container runtime for new shoots from dealing with already existing clusters.

@vpnachev
Thanks for clarifying the current state of the cgroup driver! My understanding of the documentation about cgroup drivers is that the best opportunity to change the cgroup driver would be when we change the container runtime - is this correct? What's your opinion on this topic? Should we couple changing to containerd with a change of the cgroup driver? Should we do things differently?

we should do it via a new API version, e.g. v1beta2, but I doubt we will go this way.

Just so I get this right: You're saying that when we switch the default container runtime, we would need to update the Shoot's API version, because you're regarding this as an API change?

@rfranzke, @vpnachev, @vlerenc
From all of your comments I'm reading that you are concerned with switching the container runtime of existing clusters without users updating to a k8s version which actually requires this. So I guess this is a point we'll update the issue with. Thanks!

@rfranzke

we should do it via a new API version, e.g. v1beta2, but I doubt we will go this way.

Just so I get this right: You're saying that when we switch the default container runtime, we would need to update the Shoot's API version, because you're regarding this as an API change?

IMO, we don't need a new API version. We change the defaulting in the API for Shoots using a .spec.kubernetes.version of at least v1.22 (if that's the version which doesn't support the dockershim anymore) such that .spec.provider.workers[].cri.name defaults to containerd (while containerd is also the sole allowed value).
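
A minimal sketch of such version-dependent defaulting, with simplified stand-in types rather than the actual gardener API and defaulting code (defaultWorkerCRI and the boolean flag are illustrative assumptions):

package main

import "fmt"

// Simplified stand-ins for the relevant Shoot API fields; the real
// Gardener types live in the core.gardener.cloud API group.
type CRI struct{ Name string }

type Worker struct {
    Name string
    CRI  *CRI
}

// defaultWorkerCRI defaults the CRI to containerd for shoots on
// k8s >= 1.22; older shoots keep the implicit docker default.
func defaultWorkerCRI(k8sAtLeast122 bool, workers []Worker) {
    for i := range workers {
        if workers[i].CRI == nil && k8sAtLeast122 {
            workers[i].CRI = &CRI{Name: "containerd"}
        }
    }
}

func main() {
    workers := []Worker{{Name: "worker-pool-1"}}
    defaultWorkerCRI(true, workers)
    fmt.Printf("%+v\n", *workers[0].CRI) // {Name:containerd}
}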


vpnachev commented May 28, 2021

API convention

@voelzmo if we try to strictly follow the well-established API conventions in the k8s community https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md, we must also bump the apiVersion. Why? Because changing the default value from the current implicit docker-shim to the explicit containerd is considered a backward-incompatible change. Let's assume we change the default in Gardener v1.30.0: then creating a shoot cluster with Gardener v1.30 and with v1.29 from exactly the same shoot manifest will result in completely different clusters - one using the docker-shim and the other containerd. However, if we use different apiVersions, then the Gardener version does not matter: for a shoot created with a manifest with apiVersion=core.gardener.cloud/v1beta1 the docker-shim will be used, and for exactly the same shoot but with apiVersion=core.gardener.cloud/v1beta2 it will be containerd.

For our purposes, creating a new apiVersion is overkill, as

  • we have to maintain both versions for some time
  • any client needs time to migrate to the new version
  • it will duplicate most of the code of the current version with very little change

So, to provide more control to the users, we can use the fact that the docker-shim is deprecated in a particular k8s version and start denying shoot create/update requests for shoot clusters that are on this particular k8s version or higher, if they have not explicitly configured containerd to be the CRI for their cluster. Simply doing this is also not a backward-compatible change, because we would be making an optional field required; therefore the solution is to also default the CRI to containerd, but only for shoots on this particular or a higher k8s version.

Just one example of a similar change in the shoot API that was bound to a k8s version: the default value of shoot.spec.kubernetes.kubeAPIServer.enableBasicAuthentication is true for shoots with k8s < 1.16, and false for k8s >= 1.16.

if obj.Spec.Kubernetes.KubeAPIServer.EnableBasicAuthentication == nil {
    if k8sVersionLessThan116 {
        obj.Spec.Kubernetes.KubeAPIServer.EnableBasicAuthentication = pointer.BoolPtr(true)
    } else {
        obj.Spec.Kubernetes.KubeAPIServer.EnableBasicAuthentication = pointer.BoolPtr(false)
    }
}

For shoots on k8s 1.19 or higher it is not even possible to enable basic authentication.

if geqKubernetes119 && helper.ShootWantsBasicAuthentication(kubeAPIServer) {
    allErrs = append(allErrs, field.Forbidden(fldPath.Child("kubeAPIServer", "enableBasicAuthentication"), "basic authentication has been removed in Kubernetes v1.19+"))
}
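
The analogous check for the CRI could look roughly like this; validateCRI and the geqKubernetes123 flag are illustrative assumptions, not the actual gardener validation code:

package main

import "fmt"

// Illustrative sketch: reject docker as CRI for shoots on a k8s version
// that no longer ships the dockershim (names are hypothetical).
func validateCRI(geqKubernetes123 bool, criName string) error {
    if geqKubernetes123 && criName == "docker" {
        return fmt.Errorf("spec.provider.workers[].cri.name: docker is not supported for Kubernetes v1.23+, use containerd")
    }
    return nil
}

func main() {
    fmt.Println(validateCRI(true, "docker"))
}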

As Gardener already supports k8s 1.21, the earliest it can switch the default is for shoots on k8s 1.22.
If k8s really removes the docker-shim in 1.22 (ref), we should forbid it for this version as well.

When Docker runtime support is removed in a future release (currently planned for the 1.22 release in late 2021) of Kubernetes it will no longer be supported and you will need to switch to one of the other compliant container runtimes, like containerd or CRI-O.

Existing shoot clusters

Initially Gardener supported only the docker-shim and it was not possible to use containerd, hence I guess most of the shoot clusters out in the world are using docker, not containerd. Some shoot owners have seen the running docker daemon and, as they have full access to their nodes, they added a hard dependency of their applications on the docker daemon. Thus we cannot blindly migrate their already existing clusters to containerd, as we would break their applications. The way forward here is to inform them about the deprecation and removal of docker and let them explicitly migrate to containerd when they are ready. Of course we cannot wait for them forever, therefore they can keep using docker as long as their shoot cluster is on an older k8s version than the particular one.

From all of your comments I'm reading that you are concerned with switching the container runtime of existing clusters without users updating to a k8s version which actually requires this. So I guess this is a point we'll update the issue with. Thanks!

Correct.

Cgroups

I think this is a completely different topic and should not be handled in this issue.
We are aware of this recommendation from the k8s community, see gardener/gardener-extension-os-ubuntu#1, and of the initial source of this recommendation, kubernetes/kubeadm#1394 (comment). A lot of k8s distributions have started switching or have already switched the default to systemd; however, the default in k8s is still cgroupfs. Anyway, changing the cgroup driver of an existing node is not a safe operation:

Changing the cgroup driver of a Node that has joined a cluster is a sensitive operation. If the kubelet has created Pods using the semantics of one cgroup driver, changing the container runtime to another cgroup driver can cause errors when trying to re-create the Pod sandbox for such existing Pods. Restarting the kubelet may not solve such errors.

This means that we will want to replace the nodes of the cluster when this change is triggered.
Currently, Gardener is setting the cgroup driver only on the kubelet and relies on the CRI on the node also using cgroupfs. So, first Gardener has to take full control over the cgroup driver of the CRI, and only after that can it start switching to systemd. To avoid unwanted disruption, this switch should be applied only to newly created worker pools.
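
For illustration: the kubelet side of such a switch is a single cgroupDriver field in the kubelet configuration, and the containerd side is the matching SystemdCgroup runc option in its config.toml. A minimal sketch with a simplified stand-in type (the real kubelet configuration type lives in k8s.io/kubelet; the containerd snippet in the comment assumes containerd's current CRI plugin config layout):

package main

import "fmt"

// Minimal stand-in for the kubelet configuration; the real field is
// cgroupDriver in kubelet's KubeletConfiguration.
type KubeletConfig struct {
    CgroupDriver string // "cgroupfs" (current gardener default) or "systemd"
}

func main() {
    // The kubelet and the CRI must agree on the cgroup driver.
    // For containerd, the matching option in config.toml would be:
    //
    //   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    //     SystemdCgroup = true
    cfg := KubeletConfig{CgroupDriver: "systemd"}
    fmt.Printf("kubelet cgroupDriver: %s\n", cfg.CgroupDriver)
}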


vlerenc commented May 29, 2021

Switch default container runtime to containerd instead of docker

@voelzmo We may not do that, unless we bind it to the Kubernetes version (we should amend this line in the ticket description's "next steps" with the Kubernetes version condition for clarity - the line after that about the removal as well).

Reason being: We have many stakeholders with automation in place, i.e. we cannot change the default behaviour for new clusters. The clusters people would then get would differ, even though the spec they sent would be the same. That is a call for trouble. Hence the answer to your question is...

My understanding is that you're referring to changing the container runtime for already existing clusters here, is that correct?

Existing clusters may not be changed, but we also cannot change how new clusters are spun up. That's what I wanted to express with "it must be explicit". So:

  • If you want to offer both or rather all three container runtime options in parallel (dockerd (that under the hood uses containerd+runc as well, but will be omitted here), CRI+containerd+runc, CRI+containerd+gvisor), it must be explicit via containerRuntimes in all Kubernetes versions we currently offer, including the latest 1.21
  • Changing the default behaviour is best bundled with a new Kubernetes version, as those must be validated prior by stakeholders and force them to send different specs (with the new Kubernetes version, which is a mandatory field). We cannot do that anymore for 1.21, as it is already offered, so the next possible version with which we can change the default container runtime is 1.22. That's our chance. The next one (and last one) is 1.23, but then it would be rather sudden.

Changing the Gardener API version was not proposed in earnest/the thought was only entertained in theory, I believe - @vpnachev listed it merely as an option to emphasise the difficulty of changing the default "mid-way", should we have to, but we don't have to/there is no reason to. We can easily bundle it with the next Kubernetes version that we support, so we should. We don't want to create new Gardener API versions, as this is a massive undertaking, not common practice in Kubernetes (it is avoided there as well, with the exception of alpha->beta->GA), and annoying to our stakeholders, because they need to switch/adapt one more field; this specifically will make them nervous, trigger questions about what else has changed, and generally create uncertainty. We have the luxury (as all in Kubernetes have) to piggy-back such changes on the Kubernetes minor version, and so we should always make use of it. Every 4 months, Kubernetes (and therefore Gardener) can rejuvenate itself to some "fair" degree. Not the API specs, of course, but such things as potentially impactful implementation details, defaults (when fields are omitted, like in our case), etc. are more easily changed then.

To me, the only remaining question was whether we want to offer dockerd and CRI+containerd+runc in parallel or not and, if yes, for how long. I think you already made the decision that you want to offer both, so that we learn of issues early on. That seems reasonable. You also already decided not to use a fork/the Mirantis dockershim. That also sounds reasonable. Then, I would think, the key questions are settled:

  • You would offer CRI+containerd+runc as an explicit option via containerRuntimes whenever you are ready, and for the Kubernetes versions you can easily support; that could be anything: all versions, from 1.21 on, or from 1.22 on
  • The default will switch from dockerd to CRI+containerd+runc only with a Kubernetes minor version, which could be 1.22, if we have had enough head-time and saw no issues, or 1.23 if we need more time
  • You would drop support for dockerd with 1.22 if everything runs really smoothly, or from 1.23 at the latest when we have to, because then the dockershim will be gone. That will give users 8 months to adapt / learn how to live without dockerd if they are late to the party (yes, I know, some will miss also this date and some will only notice that they cannot upgrade to 1.23 when 1.25 gets out, and they will be caught on a no-longer-supported 1.22 until they free themselves of dockerd, but we cannot help them then anymore - without such cleansing steps every 4 months, Kubernetes/Gardener would become unmanageable - in order to evolve healthily, you must be allowed to drop baggage now and then, and Kubernetes has defined how that is done to the benefit of all in the long run)


voelzmo commented Jun 1, 2021

Thanks for all the detailed comments, I really appreciate you all taking the time to provide more context on your thoughts!
To keep you in the loop, here's what we changed in the original issue, hoping that your input is well considered. Let me know if that didn't quite capture it!

  • Added concrete k8s versions to the steps of changing the default container runtime for new shoots (1.22) and removing the option to configure docker as container runtime (1.23). Added some reasoning why we want to separate these two.
  • Clarified the decision to not change the default container runtime without explicit user action
  • Added more information regarding the cgroup driver. Note that we didn't defer this to the 'future work' section immediately, as we want to better understand the consequences of switching container runtime and cgroup driver at the same time to be able to compare it with the alternative of pushing this change to the future


voelzmo commented Nov 30, 2021

Deprecation of docker as container runtime is completed. We created a few tasks about being able to produce OS images without docker runtime support, and other things which might be helpful, but those will be tackled outside this epic.
