WINC-505: Windows containerd runtime enablement
  Enhancement proposal for making containerd as default runtime in
  windows node.

Signed-off-by: selansen <esiva@redhat.com>
selansen committed Nov 24, 2021
1 parent 05f2817 commit a1d9d1d
Showing 1 changed file with 175 additions and 0 deletions.
175 changes: 175 additions & 0 deletions enhancements/windows-containers/container-runtime-containerd.md
@@ -0,0 +1,175 @@
---
title: container-runtime-containerd
authors:
- "@selansen"
reviewers:
- "@aravindhp"
- "@openshift/openshift-team-windows-containers"
approvers:
- "@aravindhp"
- "@mrunalp"
creation-date: 2021-11-19
last-updated: 2021-11-20
status: implementable
---

# containerd - new container runtime

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [x] Design details are appropriately documented from clear requirements
- [x] Test plan is defined
- [x] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [OpenShift-docs](https://github.com/OpenShift/OpenShift-docs/)

## Summary

The intent of this enhancement is to allow customers to bring up Windows nodes with containerd
as the default runtime from Kubernetes 1.24 based OpenShift 4.11 onwards. When customers upgrade
an existing cluster to Kubernetes 1.24 based OpenShift 4.11, the runtime will be migrated from Docker to containerd.

## Motivation

In Kubernetes, the Container Runtime Interface (CRI) is used to talk to a container runtime. CRI is designed so
that a CRI implementation can run as a separate binary. Currently, however, the CRI implementation for Docker
(a.k.a. dockershim) is part of the kubelet code, runs as part of kubelet, and is tightly coupled with kubelet's
lifecycle. From Kubernetes 1.24 onwards, dockershim will be removed from the kubelet code. WMCO currently uses
Docker as the default runtime. The aim is to make containerd the default runtime and move away from Docker
before dockershim is removed from kubelet.

### Goals

As part of this enhancement we plan to do the following:
* Make containerd the default runtime.
* When an upgrade happens from an older release to a newer one, containerd will become the default runtime.

### Non-Goals

* De-configuring the Docker runtime is not part of this enhancement.
* Refactoring or re-design of WMCO due to dockershim deprecation.
* Switching between docker and containerd runtimes is not supported.

## Proposal

To make containerd the default runtime, containerd must be installed before kubelet, and the
kubelet parameters need to be updated so that kubelet uses containerd. In the
upgrade case, each Windows VM configured by a previous version of WMCO has its Machine
object deleted, resulting in the drain and deletion of the Windows node, and the termination
followed by recreation of the VM. The upgraded WMCO instance will then configure the newly
created VMs with containerd as the default runtime. All minimum requirements stated
for WMCO upgrades apply here as well.
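
The kubelet parameter change mentioned above can be sketched as follows. This is a minimal illustration, not WMCO code: the helper name `kubelet_runtime_args` is hypothetical, and it assumes containerd listens on its default Windows named pipe. Kubelet versions that still ship dockershim also need `--container-runtime=remote`; later versions may only need the endpoint flag.

```python
from typing import List

# Assumption: containerd's default named pipe on Windows.
CONTAINERD_PIPE = "npipe:////./pipe/containerd-containerd"


def kubelet_runtime_args(endpoint: str = CONTAINERD_PIPE) -> List[str]:
    """Return the kubelet flags that select a remote CRI runtime (hypothetical helper)."""
    return [
        # Tell kubelet not to use the built-in dockershim.
        "--container-runtime=remote",
        # Point kubelet at the CRI endpoint served by containerd.
        f"--container-runtime-endpoint={endpoint}",
    ]


if __name__ == "__main__":
    print(" ".join(kubelet_runtime_args()))
```

These flags would replace whatever Docker-specific runtime arguments kubelet is started with today.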

Containerd migration plan:
* Containerd will become the default runtime as part of OpenShift 4.11.
* For releases before 4.11, we plan to introduce a feature flag and release it only in the community operator.
* If WMCO is downgraded, the same applies: the Windows VM will be de-configured and re-configured with
the runtime supported by that WMCO version. (Note: downgrades are not supported by OLM.)

### User Stories

Stories can be found within the [Windows Containers: containerd](https://issues.redhat.com/browse/WINC-505) epic.

### Justification

If we don't make containerd the default runtime, we will be left with the Mirantis-supported dockershim
and will have to redesign WMCO to incorporate it. If we wanted to use CRI-O for Windows,
the time and engineering effort involved in making CRI-O work on Windows would be huge, and the
high cost cannot be justified from a business standpoint. Containerd is widely adopted and supported
by the open source community. Microsoft is also a major contributor to containerd development on Windows and
is making containerd the default runtime for its Kubernetes offerings. Most Kubernetes orchestrators that
support Windows have already moved to containerd.

### Design Details

We plan to target containerd 1.6.0 for integration into WMCO. Containerd will be installed
as the first service, before kubelet, because kubelet requires containerd to be
running. The containerd package is already bundled with WMCO. We may include crictl in
the package for debugging purposes. Once containerd is installed, the rest of the service
installation steps remain the same. kubelet's dependency on the Docker runtime will be removed and
replaced by containerd. A separate folder will be created under C:\k\ to store the containerd config
and related files.

Steps to install containerd as a service:
* scp containerd and related executables to the Windows VM
* copy the files into the appropriate folder
* create the containerd config file
* run an extra command for performance: `Add-MpPreference -ExclusionProcess "$Env:containerd.exe"`
* register containerd as a service
* start the containerd service
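
The on-VM part of the steps above could look roughly like the following sketch, which only assembles the PowerShell commands in order and does not run them. The function name, the staging directory and the `C:\k\containerd` install directory are assumptions made for illustration. `containerd.exe config default`, `--config` and `--register-service` are existing containerd options, but the exact invocation WMCO ends up using may differ.

```python
from typing import List


def containerd_install_commands(
    staging_dir: str = r"C:\Temp\containerd",  # hypothetical: where scp dropped the files
    install_dir: str = r"C:\k\containerd",     # hypothetical folder under C:\k\
) -> List[str]:
    """Return the PowerShell commands, in order, that set up containerd as a service."""
    exe = install_dir + r"\containerd.exe"
    config = install_dir + r"\config.toml"
    return [
        # Copy the executables into the appropriate folder.
        f'New-Item -ItemType Directory -Force -Path "{install_dir}"',
        f'Copy-Item -Path "{staging_dir}\\*" -Destination "{install_dir}" -Force',
        # Create the containerd config file from the built-in defaults.
        f'& "{exe}" config default | Out-File "{config}" -Encoding ascii',
        # Exclude the containerd process from Windows Defender scans for performance.
        f'Add-MpPreference -ExclusionProcess "{exe}"',
        # Register containerd as a Windows service, then start it.
        f'& "{exe}" --register-service --config "{config}"',
        "Start-Service containerd",
    ]


if __name__ == "__main__":
    for command in containerd_install_commands():
        print(command)
```

Keeping the commands as an ordered list makes the constraint explicit: the service is registered and started only after the config file exists, and all of this happens before kubelet is installed.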

## Network changes
The current CNI/IPAM plugins will be used with containerd, and the HNS-Network and HNS-Endpoint creation
steps do not change. The config file used by containerd will point to the same CNI/IPAM executables.
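
As an illustration of how the containerd config can point at the existing CNI/IPAM executables, the sketch below renders the CNI section of a version 2 containerd config. The `bin_dir` and `conf_dir` keys belong to containerd's CRI plugin; the helper name and the example paths under `C:\k\cni` are assumptions.

```python
def cni_config_fragment(bin_dir: str, conf_dir: str) -> str:
    """Render the CNI section of a containerd (config version 2) TOML file."""

    def toml_escape(path: str) -> str:
        # TOML basic strings treat backslash as an escape character.
        return path.replace("\\", "\\\\")

    return (
        '[plugins."io.containerd.grpc.v1.cri".cni]\n'
        f'  bin_dir = "{toml_escape(bin_dir)}"\n'
        f'  conf_dir = "{toml_escape(conf_dir)}"\n'
    )


if __name__ == "__main__":
    # Hypothetical locations of the CNI binaries and CNI config.
    print(cni_config_fragment(r"C:\k\cni", r"C:\k\cni\config"))
```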

## Feature flag
A containerd feature flag will be used to enable this feature. It will be enabled by default in
the community operator so that customers and developers can try it out. Containerd will become the default in
OpenShift 4.11. At any given point in time only one runtime is supported, and switching between runtimes is not allowed.

## Logging
Containerd can be started with parameters that enable logging and specify the file path for
warnings and errors. Log files will be stored in c:\var\log\containerd.
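
A minimal sketch of the logging parameters, assuming containerd's `--log-file` (Windows) and `--log-level` flags are used. The helper name and the `containerd.log` file name are hypothetical.

```python
from typing import List

# Log levels accepted by containerd's --log-level flag.
VALID_LEVELS = ("trace", "debug", "info", "warn", "error", "fatal", "panic")


def containerd_logging_args(
    log_dir: str = r"c:\var\log\containerd",
    file_name: str = "containerd.log",  # hypothetical file name
    level: str = "warn",
) -> List[str]:
    """Return the containerd arguments that send warnings and errors to a log file."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {level}")
    return ["--log-file", log_dir + "\\" + file_name, "--log-level", level]


if __name__ == "__main__":
    print(" ".join(containerd_logging_args()))
```

With the level set to `warn`, only warnings and more severe messages are written to the file.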

## Upgrade
When a cluster is upgraded, OLM will switch to using a new Red Hat operators
index. Because WMCO has the same name in both indexes, OLM will upgrade WMCO
from the previous version to the latest version available in the new
cluster.

The procedure for an upgrade is as follows:
1) As part of the upgrade, basic validation is performed and de-configuration takes place.
2) During de-configuration, the older versions of kubelet, kube-proxy and CNI are uninstalled.
3) The newer version of WMCO installs all the required components along with containerd.
4) Kubelet starts using containerd for pulling and managing container images.
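
The ordering in the procedure above can be modelled as a small plan: uninstall what the old WMCO configured, then install the new component set with containerd first. This is a sketch only; the function and the component names are illustrative and do not reflect WMCO's actual implementation.

```python
from typing import List, Tuple


def upgrade_plan(installed: List[str], target: List[str]) -> List[Tuple[str, str]]:
    """Return (action, component) pairs for a de-configure then re-configure upgrade."""
    removals = [("uninstall", name) for name in installed]
    # Stable sort: containerd moves to the front, the other components keep their order.
    ordered = sorted(target, key=lambda name: name != "containerd")
    installs = [("install", name) for name in ordered]
    return removals + installs


if __name__ == "__main__":
    old = ["kubelet", "kube-proxy", "cni"]
    new = ["kubelet", "kube-proxy", "cni", "containerd"]
    for action, component in upgrade_plan(old, new):
        print(action, component)
```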


### Risks and Mitigations

* If the cluster is upgraded and the new version introduces an issue caused by containerd,
that issue has to be addressed, because Docker runtime support will no longer be available in kubelet.
Going back to an older WMCO might also run into issues due to a version mismatch
between the API server and kubelet. As we plan to deliver this feature before 4.11,
it can be tested well in advance, and we should make sure all issues are addressed by working
with the containerd open source community.
* containerd does not support image-pull-progress-deadline as of now. A PR,
https://github.com/containerd/containerd/pull/6150, is in progress. Until it
is merged, if a Windows image pull takes longer than the default value, we might
run into an image pull timeout error. The proposed workaround is to pull the image first
with the ctr or crictl command-line tool and then create the pods.
* windows_exporter is currently used to collect metrics from Windows nodes. Containerd
support is available in https://github.com/prometheus-community/windows_exporter/releases/tag/v0.16.0
All functionality supported with the Docker runtime needs to be verified with containerd.
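
For the image pull timeout workaround described above, a pre-pull step could build one of the following commands. This is a sketch: the helper name is hypothetical, and the `k8s.io` namespace for `ctr` is included on the assumption that kubelet's CRI images live in that containerd namespace.

```python
from typing import List


def prepull_command(image: str, tool: str = "crictl") -> List[str]:
    """Return the command line that pulls an image ahead of pod creation."""
    if tool == "crictl":
        return ["crictl", "pull", image]
    if tool == "ctr":
        # Pull into the namespace used by the CRI plugin so kubelet can see the image.
        return ["ctr", "--namespace", "k8s.io", "images", "pull", image]
    raise ValueError(f"unsupported tool: {tool}")


if __name__ == "__main__":
    # Example image only; any large Windows base image has the same problem.
    print(" ".join(prepull_command("mcr.microsoft.com/windows/servercore:ltsc2019")))
```

Once the image is already present on the node, pod creation no longer depends on the pull finishing within the default deadline.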

### Test Plan

* New e2e test cases will be added for containerd, replicating the existing
test cases that cover WMCO functionality.
* Containerd is platform agnostic, so testing on any platform should be sufficient.
* Update the WMCO community image in the release repo so that the CI workflow uses the containerd
based WMCO community operator.

### Graduation Criteria

This enhancement will start with the WMCO community operator. It will become the default from
the OpenShift 4.11 release onwards.

### Upgrade / Downgrade Strategy

Upgrades are already discussed in the design section. Downgrades are [not supported](https://github.com/operator-framework/operator-lifecycle-manager/issues/1177)
by OLM.

### Version Skew Strategy
We plan to maintain parity with upstream [containerd](https://github.com/containerd/containerd/releases) releases.

## Implementation History

v1: Initial Proposal

## Alternatives

There are a few alternatives, but they are either not cost-effective or depend on competitors' less modular
components.
* Implementing a CRI-O runtime for Windows involves a huge engineering effort and has less community
support (most community contributors have already moved to containerd).
* There is an ongoing effort to continue using dockershim and the Docker runtime. As kubelet is going to
remove the dockershim-specific code, we would still have to come up with a design change to make it work
from Kubernetes 1.24 onwards.
