This repository has been archived by the owner on May 30, 2023. It is now read-only.

Move to unified cgroupv2 hierarchy and upgrade docker to 20.10 #931

Merged
merged 16 commits into from
Aug 12, 2021

Conversation

jepio
Contributor

@jepio jepio commented Apr 5, 2021

Move to unified cgroupv2 hierarchy and update docker to 20.10

This PR updates Docker to 20.10 and switches systemd to the unified cgroupv2 hierarchy. The Docker dependencies (tini, docker-proxy, docker-cli) have been updated following the comments and dependencies in the Gentoo docker ebuild (https://github.com/gentoo/gentoo/blob/69d01a4273a556b1205a7a575cb3811ab7e2443d/app-emulation/docker/docker-20.10.5.ebuild).

This PR does not touch the GitHub Actions workflow for Docker updates; that will need to be adjusted for the docker-ce -> docker repo rename and extended to cover docker-cli updates as well. I also treated the docker ebuild update as a PoC and made no attempt to reduce divergence from Gentoo.

How to use

Apply the following change to build_torcx_store from https://github.com/kinvolk/flatcar-scripts

diff --git a/build_torcx_store b/build_torcx_store
index 381d5c11..ecfd813e 100755
--- a/build_torcx_store
+++ b/build_torcx_store
@@ -237,13 +237,13 @@ function torcx_package() {
 # for each package will point at the last version specified.  This can handle
 # swapping default package versions for different OS releases by reordering.
 DEFAULT_IMAGES=(
-        =app-torcx/docker-19.03
+        =app-torcx/docker-20.10
 )
 
 # This list contains extra images which will be uploaded and included in the
 # generated manifest, but won't be included in the vendor store.
 EXTRA_IMAGES=(
-       =app-torcx/docker-17.03
+       =app-torcx/docker-19.03
 )
 
 mkdir -p "${BUILD_DIR}"

Pull this branch and then:

./build_packages
# rebuild initramfs to pick up changed systemd
emerge-amd64-usr coreos-kernel
./build_image

Testing done

$ docker info
...
 Server Version: 20.10.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
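
Independently of `docker info`, the hierarchy in use can be read off the filesystem type mounted at /sys/fs/cgroup. The following is a minimal sketch, assuming GNU `stat` is available on the node; the helper name `cgroup_mode_from_fstype` is made up for illustration and is not part of this PR.

```shell
#!/usr/bin/env bash
# Map the filesystem type mounted at /sys/fs/cgroup to a cgroup mode.
# "cgroup2fs" means the unified (v2) hierarchy; "tmpfs" means the legacy or
# hybrid layout, where the v1 controller hierarchies are mounted below it.
# (Hypothetical helper, for illustration only.)
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Only query the live system when the mount point exists.
if [ -d /sys/fs/cgroup ]; then
    echo "cgroup mode: $(cgroup_mode_from_fstype "$(stat -fc %T /sys/fs/cgroup)")"
fi
```

On a node built from this branch the check would be expected to report v2, matching the "Cgroup Version: 2" line above.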

Set up Kubernetes following https://suraj.io/post/2021/01/kubeadm-flatcar/, then verify that Kubernetes is also making proper use of the systemd cgroup driver and cgroupv2:

$ systemctl status containerd  | cat
● containerd.service - containerd container runtime
     Loaded: loaded (/run/systemd/system/containerd.service; disabled; vendor preset: disabled)
     Active: active (running) since Mon 2021-04-05 17:27:13 UTC; 15min ago
       Docs: https://containerd.io
    Process: 739 ExecStartPre=mkdir -p /run/docker/libcontainerd (code=exited, status=0/SUCCESS)
    Process: 742 ExecStartPre=ln -fs /run/containerd/containerd.sock /run/docker/libcontainerd/docker-containerd.sock (code=exited, status=0/SUCCESS)
    Process: 743 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 744 (containerd)
      Tasks: 309
     Memory: 160.4M
        CPU: 16.020s
     CGroup: /system.slice/containerd.service
             ├─ 744 /run/torcx/bin/containerd --config /run/torcx/unpack/docker/usr/share/containerd/config.toml
             ├─1858 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id bf629475ee9097427dd7f2d53f1d7ed72c4d5900d22eba876a80494f569e3f27 -address /run/containerd/containerd.sock
             ├─1910 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id 4d3d5f32637f90dca4473609040450306d7f8a1d78b15f677e95e467fd5e77fb -address /run/containerd/containerd.sock
             ├─2092 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id 37db84e6a6d705380306b7a4f89b94b60ee8f3135f5765c92c23d67098e2705d -address /run/containerd/containerd.sock
             ├─2138 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id 2d1e1f9f40663453a3fa54fc01a913766c598e63c954425acf2e5fe9eb963bae -address /run/containerd/containerd.sock
             ├─2345 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id 31f59d30117def4ea1b57ddf0aca0fb9d92345acf44c7096acc24b404d286d00 -address /run/containerd/containerd.sock
             ├─2391 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id 81489b4ac8194c657ef52e93f098b7c94918a6c61fd0643442822542a6dfec76 -address /run/containerd/containerd.sock
             ├─2598 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id d3e3e4b693fac62ebc26a56edb3900e6adb6cfe50eb016eb6bd5283101d332fa -address /run/containerd/containerd.sock
             ├─2639 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id 94d4ea3bc0841b81c0218a1380810e363fa3ab833ee12b4ea3dd4693b7e14799 -address /run/containerd/containerd.sock
             ├─2881 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id dff07ea20c18b7c9ecc670253b2617f178da674bfae6aecc0fcc0cfe23ef95ce -address /run/containerd/containerd.sock
             ├─2912 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id 25a9350fb93e39f84bf9b3e84780a8d1b90e79d36c2d8535c9e3859fd6cf3d55 -address /run/containerd/containerd.sock
             ├─3069 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id 09a054176ba1c9a2b402248958d379bbe31bcfc0d6a52783d3efbd0247e89550 -address /run/containerd/containerd.sock
             ├─3126 /run/torcx/unpack/docker/bin/containerd-shim-runc-v2 -namespace moby -id ae468167c4de6038b102e5f45bf2e81536cbb6e6ef0dfc8940dba7fe4ff4758a -address /run/containerd/containerd.sock
...

$ systemd-cgls | grep '[k]ubepods.*slice'
└─kubepods.slice 
  ├─kubepods-burstable.slice 
  │ ├─kubepods-burstable-pod350dc56df6e020578314229ed3c3f49e.slice 
  │ ├─kubepods-burstable-pod8555110d3e24e1ce595b74110fcb757e.slice 
  │ ├─kubepods-burstable-pod561b7082_09ee_48d9_8bbf_a654f17cab19.slice 
  │ ├─kubepods-burstable-pod8600f0e2_28c8_4d1f_ae02_a1eb80288ac5.slice 
  │ ├─kubepods-burstable-podad4827c79ef2a3e7bc337afe5ac3b538.slice 
  │ └─kubepods-burstable-pod9e23b1a40191518b4ea2c75208418b49.slice 
  └─kubepods-besteffort.slice 
    ├─kubepods-besteffort-pod2d9e1469_3f8f_4324_ab84_e0eabdc7664e.slice 
    ├─kubepods-besteffort-podd4ad0648_427a_49c1_aba6_07ab039abbed.slice 
    ├─kubepods-besteffort-pod66848903_0b4d_464b_86c7_fa57f2cd9b16.slice 
    ├─kubepods-besteffort-podf2cf7ff5_e1f4_4fa1_80bc_fd2eeefc70ca.slice 
    └─kubepods-besteffort-pod20ecfc23_4261_4b15_96cd_4d8de4c01884.slice

TODOs

@dongsupark dongsupark requested a review from a team April 6, 2021 15:16
@pothos
Contributor

pothos commented Apr 8, 2021

Hi,
overall it looks very good. I didn't go into the details, but I realize that the current git situation of the folder is a bit confusing with the symlink. Normally the import from Gentoo happens in one commit, resetting the ebuild file (or better, the whole folder), and the downstream changes are made in the following commits. This has not really been the case here, but the major update is a good point in time to start following this practice for the folder (it can be one big downstream commit). Also, Gentoo upstream does not have a 9999 file, so maybe it makes sense to drop it?

@goochjj

goochjj commented Apr 30, 2021

If this were available as a torcx package somewhere right now, I'd be running it already.

I packaged up a torcx package of 20.10.5 based on Docker-CE's static binaries, but I'd rather have something vendor-managed. My use case is the ip6tables support in dockerd, plus cgroups v2.

Is this something that's going to go on the roadmap for inclusion?

@mback2k

mback2k commented Jul 2, 2021

I am also looking forward to this regarding the capabilities support in Docker 20.10.

@jepio
Contributor Author

jepio commented Jul 9, 2021

I've resumed working on this, I have a couple more TODOs:

@jepio jepio changed the title [RFC] Move to unified cgroupv2 hierarchy and upgrade docker to 20.10 [WIP] Move to unified cgroupv2 hierarchy and upgrade docker to 20.10 Jul 14, 2021
@jepio
Contributor Author

jepio commented Jul 14, 2021

The docker.oldclient test fails with:

--- FAIL: docker.oldclient (29.22s)
        cluster.go:117: Error response from daemon: 400 Bad Request: malformed Host header
        cluster.go:130: "/home/core/docker-1.9.1 run echo echo 'IT WORKED'" failed: output , status Process exited with status 1

We will need to retire this test:

@jepio
Contributor Author

jepio commented Jul 15, 2021

The google.kubernetes.basic.docker.v1.18.0 test cases fail with:

Jul 15 14:17:42.446942 kubelet[4663]: F0715 14:17:42.446505    4663 server.go:274] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

Unified cgroup layout is only supported from kubernetes 1.19. So we also have to disable these tests.

The kubeadm tests pass 👍
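
The 1.19 cutoff mentioned above can be turned into a quick gate when deciding which Kubernetes versions to keep testing. This is a rough sketch, assuming a `sort` that supports `-V` (GNU coreutils or busybox); the function name `k8s_supports_cgroupv2` is hypothetical and not taken from the test suite.

```shell
#!/usr/bin/env bash
# Return success if the given Kubernetes version is at least 1.19, the first
# release this thread names as supporting the unified cgroup hierarchy.
# A leading "v" (as in v1.18.0) is stripped before comparing.
# (Hypothetical helper, for illustration only.)
k8s_supports_cgroupv2() {
    k8s_version="${1#v}"
    k8s_minimum="1.19"
    # sort -V orders version strings; if the minimum sorts first (or ties),
    # the given version is new enough.
    [ "$(printf '%s\n%s\n' "$k8s_minimum" "$k8s_version" | sort -V | head -n1)" = "$k8s_minimum" ]
}

for v in v1.18.0 1.19.6 1.21.0; do
    if k8s_supports_cgroupv2 "$v"; then
        echo "$v: ok with cgroup v2"
    else
        echo "$v: needs the legacy hierarchy"
    fi
done
```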

@jepio
Contributor Author

jepio commented Jul 15, 2021

We talked about it on Matrix and are going to stop these tests from running on the alpha channel.

@pothos
Contributor

pothos commented Jul 15, 2021

What the tini update commit does is a bit unclear, as tini doesn't follow the split update + downstream commit workflow. The existing files under files/ are downstream and were created when it was first added to coreos-overlay. If you can, it would be nice to address this here, as it helps to understand which downstream changes are made compared to the upstream version (https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-process/tini/tini-0.19.0-r1.ebuild).

@pothos
Contributor

pothos commented Jul 15, 2021

Looks like the docker-cli-20.10.7.ebuild file needs to get a patch applied that sets the COREOS_GO_VERSION, COREOS_GO_PACKAGE and uses inherit coreos-go and go_build to fix the Go version. Docker releases are tested and done with a particular Go version and we had a problem once already when we took a newer one.

jepio added a commit to flatcar/mantle that referenced this pull request Jul 16, 2021
These tests are 'docker.oldclient' and 'google.kubernetes.basic.docker.*'.

'docker.oldclient' tries to run docker cli 1.9 against daemon in the
image, and fails with:

  --- FAIL: docker.oldclient (29.22s)
          cluster.go:117: Error response from daemon: 400 Bad Request: malformed Host header
          cluster.go:130: "/home/core/docker-1.9.1 run echo echo 'IT WORKED'" failed: output , status Process exited with status 1

This is related to moby/moby#39076, merged into
20.10 which removed some backwards compatibility.

The 'google.kubernetes.basic.docker.*' tests fail with the following
message in the journal:

  Jul 15 14:17:42.446942 kubelet[4663]: F0715 14:17:42.446505    4663 server.go:274] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

Kubernetes release 1.19 is the first one that properly supports the unified cgroup hierarchy.
We also have other tests that test that kubernetes works (kubeadm) so we
can disable the legacy ones.

The old tests should be removed once the docker 20.10 upgrade has
propagated to all channels.

See also flatcar-archive/coreos-overlay#931

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
@jepio
Contributor Author

jepio commented Jul 23, 2021

Progress update:

  • I fixed the docker update github action and it should now work
  • I rebased on master to adapt to the containerd bump
  • I reverted the systemd ebuild change - I believe we should enable the cgroup hierarchy setting through grub. This way we don't break existing nodes that may have kubelet running on them.

@pothos
Contributor

pothos commented Jul 23, 2021

Good idea to use GRUB to separate new from old deployments, but when would we be able to do the switch in systemd, so that PXE boots can also default to v2?
I also wonder if it will add confusion if people have the same Flatcar release and get different behaviors. I think it would be good to bake the setting into systemd because once GRUB got installed, we can't update the settings anymore – Yes, this requires communicating the breaking change and it would be good to practice this well. Another breaking change will be the nftables backend and there, too, we need to follow the same process of publishing a Beta and raising awareness that users need to deploy changes first before updating.

@mback2k

mback2k commented Jul 23, 2021

So how exactly does this affect those using docker and kubelet (e.g. with kube-proxy and kube-router) including iptables directly?

@jepio
Contributor Author

jepio commented Jul 27, 2021

So how exactly does this affect those using docker and kubelet (e.g. with kube-proxy and kube-router) including iptables directly?

This PR has no impact on iptables. For the other points I would like to ask some questions:

  • are you using system docker or a custom version (which one?)
  • what are you using to deploy kubelet and what version of kubernetes are you running?

We will be upgrading system docker to v20.10 and giving folks a transition period, with warnings about checking the impact of enabling cgroupv2. You will most likely need to switch your kubelet to the systemd cgroup driver; check your kubelet configuration.
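
One way to check a kubelet configuration is to compare its `cgroupDriver` field with the driver Docker reports. The sketch below is only an illustration: the helper name `kubelet_cgroup_driver` and the config path are assumptions (kubeadm setups typically use a different path), and the fallback to `cgroupfs` assumes the kubelet default when the field is absent.

```shell
#!/usr/bin/env bash
# Print the cgroupDriver set in a KubeletConfiguration YAML file.
# Falls back to "cgroupfs" when the field is absent (assumed kubelet default).
# (Hypothetical helper, for illustration only.)
kubelet_cgroup_driver() {
    kubelet_driver=$(awk '$1 == "cgroupDriver:" { gsub(/"/, "", $2); print $2; exit }' "$1")
    echo "${kubelet_driver:-cgroupfs}"
}

# Example path; adjust for your deployment.
config=/etc/kubernetes/kubelet.yaml
if [ -f "$config" ] && command -v docker >/dev/null 2>&1; then
    docker_driver=$(docker info --format '{{.CgroupDriver}}' 2>/dev/null)
    if [ "$docker_driver" != "$(kubelet_cgroup_driver "$config")" ]; then
        echo "mismatch: kubelet=$(kubelet_cgroup_driver "$config") docker=$docker_driver" >&2
    fi
fi
```

A mismatch here corresponds to the "kubelet cgroup driver ... is different from docker cgroup driver" failure quoted earlier in this thread.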

@mback2k

mback2k commented Jul 29, 2021

@jepio sorry for the late response, but here it is:

This PR has no impact on iptables.

Good to hear that.

For the other points I would like to ask some questions:
* are you using system docker or a custom version (which one?)

System one, the torcx package shipping with flatcar.

* what are you using to deploy kubelet and what version of kubernetes are you running?

I am deploying kubernetes manually using a custom Ansible playbook which then uses a custom torcx package to deploy kubelet and kube-proxy on the flatcar host. All other kubernetes components are running in containers.

You will most likely need to switch your kubelet to systemd cgroup driver - check your kubelet configuration.

Currently looks like this: /run/torcx/bin/kubelet --config=/etc/kubernetes/kubelet.yaml --container-runtime=remote --container-runtime-endpoint=unix:///run/docker/libcontainerd/docker-containerd.sock --kubeconfig=/etc/kubernetes/kubelet.kubeconfig --node-ip a.b.c.d

@jepio jepio changed the title [WIP] Move to unified cgroupv2 hierarchy and upgrade docker to 20.10 Move to unified cgroupv2 hierarchy and upgrade docker to 20.10 Aug 6, 2021
@jepio
Contributor Author

jepio commented Aug 6, 2021

I've removed the WIP from the title. CI is running http://jenkins.infra.kinvolk.io:8080/job/os/job/manifest/3230/

jepio added a commit to flatcar/mantle that referenced this pull request Aug 9, 2021
;;
1.19)
S3_PATH="1.19.6/2021-01-05"
;;
Contributor Author

@pothos
I've added fetching code for 1.20 and 1.19 (and 1.21 in a fixup that is not pushed yet), but I'm wondering if it makes sense to keep the <1.19 versions around. I'm not familiar with EKS deployment - is it possible to pass an ignition snippet? Does it make sense to keep a configuration around that requires a reboot to apply (it won't play nicely with autoscaling, right?)

Contributor

@marga-kinvolk can answer better but yes, I think without baking the immediate reboot into the configuration it doesn't make much sense to keep the old ones available

Contributor Author

@marga-kinvolk let me know if you have an opinion here - we can still change this before the coming alpha release.

jepio and others added 15 commits August 12, 2021 09:57
For tini-0.19. Upstream commit 2e10a957da8a8a93c1f5d82011e3f6692f7b765c.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
We use a custom build system to remove the cmake dependency and hardcode
relevant configuration.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Docker upstream split the cli component into a separate repo, so there is
a separate ebuild that builds the docker utility. This is a prerequisite
of the update of docker to 20.10.

This is an import from portage commit 69d01a4273a556b1205a7a575cb3811ab7e2443d.

Signed-off-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
We use coreos-go* eclass so that we can override several environment
variables and build with the same go version as docker upstream. These
changes are modeled after what was previously done in app-emulation/docker,
the cli ebuild has only been split out since v20.10.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This is the version used by docker-19.03. We will be updating the live
ebuild to build docker 20.10 dependencies.

Signed-off-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
This is the version needed by docker 20.10.7. ROADMAP.md doesn't exist so it
has been removed from src_install.

Signed-off-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
For update to docker-20.10.7.

gentoo/portage commit 0ed05ce0a8f0d1c3dfa6151e7ebb25b67c4aae16

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The patches do the following:

* install flatcar specific wrappers and systemd config
* force some USE flags to default on
* allow injecting CFLAGS/LDFLAGS so that torcx can work
* force building with go1.13 (like upstream does) - this won't be
  necessary next time because docker master already uses go1.16
Compared to previous torcx images the docker-cli package is a separate
package, following upstream Docker repo layout changes.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
…oupv2

We are switching flatcar to cgroupv2 which is supported by docker 20.10 and
kubernetes 1.19. This requires setting the systemd cgroup driver in the kubelet
config.

Due to the unified cgroup hierarchy, kubernetes <1.19 will not work so
remove all older versions.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The upstream docker repository location has changed to docker/docker.
Additionally, the cli component has been split out, which requires
fetching two hashes and updating two ebuilds. We also took the chance to
align the ebuild with gentoo's, which means there is no more live ebuild
and no symlink.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Now that Docker has been updated to 20.10, we can use cgroupv2, so have
systemd mount the unified cgroup hierarchy by default. Another way of
achieving the same would have been to pass 'systemd.unified_cgroup_hierarchy=1'
on the kernel cmdline, but this way the change propagates nicely to all
OEM consumers.

Signed-off-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
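
To see whether a node overrides the new default through the kernel command line, the parameter named in the commit message above can be looked up in /proc/cmdline. A minimal sketch; the helper name `unified_hierarchy_param` is hypothetical, and it only handles the `key=value` form of the parameter.

```shell
#!/usr/bin/env bash
# Print the value of systemd.unified_cgroup_hierarchy from a kernel command
# line string, or "unset" when the parameter is not present.
# (Hypothetical helper, for illustration only.)
unified_hierarchy_param() {
    # Intentionally unquoted: split the command line on whitespace.
    for word in $1; do
        case "$word" in
            systemd.unified_cgroup_hierarchy=*)
                echo "${word#*=}"
                return 0 ;;
        esac
    done
    echo "unset"
}

if [ -r /proc/cmdline ]; then
    echo "systemd.unified_cgroup_hierarchy: $(unified_hierarchy_param "$(cat /proc/cmdline)")"
fi
```

With the default baked into systemd as described above, "unset" would mean the node follows the built-in default rather than an explicit override.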
This pulls in flatcar/update_engine#13

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This pulls in flatcar/init#44

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The SystemdCgroup=true setting is incompatible with kubelet
cgroupDriver: cgroupfs. So to prevent kube clusters from failing, we
will be freezing a node's config.toml during an update. For that purpose,
we install a second configuration file that can then be selected using a
systemd drop-in unit.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
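
Whether a given containerd configuration enables the setting from the last commit message can be checked with a simple grep. This is a sketch only: the helper name `containerd_uses_systemd_cgroup` is made up, the config path is the one visible in the `systemctl status containerd` output earlier in this PR, and the name of the second (frozen) configuration file is not given here, so it is not guessed.

```shell
#!/usr/bin/env bash
# Return success if a containerd config.toml sets SystemdCgroup = true
# (the runc runtime option that must agree with the kubelet cgroupDriver).
# (Hypothetical helper, for illustration only.)
containerd_uses_systemd_cgroup() {
    grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$1"
}

# Path taken from the containerd command line shown in the PR description.
cfg=/run/torcx/unpack/docker/usr/share/containerd/config.toml
if [ -f "$cfg" ]; then
    if containerd_uses_systemd_cgroup "$cfg"; then
        echo "containerd: systemd cgroup driver (kubelet needs cgroupDriver: systemd)"
    else
        echo "containerd: cgroupfs driver"
    fi
fi
```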
@jepio jepio merged commit ff72a85 into flatcar-archive:main Aug 12, 2021