Move to unified cgroupv2 hierarchy and upgrade docker to 20.10 #931
Hi,
If this were available as a torcx package somewhere right now, I'd be running it already. I packaged up a torcx package of 20.10.5 based on Docker-CE's static binaries, but I'd rather have something vendor-managed. My use case is the ip6tables support in dockerd, plus cgroups v2. Is this something that's going to go on the roadmap for inclusion?
I am also looking forward to this, for the capabilities support in Docker 20.10.
I've resumed working on this; I have a couple more TODOs:
The
We will need to retire this test:
The
The unified cgroup layout is only supported from kubernetes 1.19, so we also have to disable these tests. The
We talked about it on Matrix and are going to disable the tests from running on the alpha channel.
What the tini update commit does is a bit unclear, as tini doesn't follow the split update + downstream commit workflow. The existing files under
Looks like the
These tests are 'docker.oldclient' and 'google.kubernetes.basic.docker.*'.
'docker.oldclient' tries to run docker cli 1.9 against the daemon in the image, and fails with:
--- FAIL: docker.oldclient (29.22s)
cluster.go:117: Error response from daemon: 400 Bad Request: malformed Host header
cluster.go:130: "/home/core/docker-1.9.1 run echo echo 'IT WORKED'" failed: output , status Process exited with status 1
This is related to moby/moby#39076, merged into 20.10, which removed some backwards compatibility.
The 'google.kubernetes.basic.docker.*' tests fail with the following message in the journal:
Jul 15 14:17:42.446942 kubelet[4663]: F0715 14:17:42.446505 4663 server.go:274] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
Kubernetes release 1.19 is the first one that properly supports the unified cgroup hierarchy. We also have other tests that check that kubernetes works (kubeadm), so we can disable the legacy ones. The old tests should be removed once the docker 20.10 upgrade has propagated to all channels.
See also flatcar-archive/coreos-overlay#931
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
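For anyone hitting the same kubelet failure on their own nodes, here is a minimal POSIX sh sketch of the comparison behind that "misconfiguration" message: read cgroupDriver from a KubeletConfiguration file and compare it with the driver Docker reports. The function names are hypothetical, and the kubeadm config path in the example is an assumption about your setup; a --cgroup-driver command-line flag, if you use one, is not considered.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch, not part of Flatcar or Kubernetes.

# check_cgroup_drivers KUBELET_DRIVER RUNTIME_DRIVER
# Prints "ok" and returns 0 if both drivers match, otherwise reports the
# mismatch on stderr and returns 1.
check_cgroup_drivers() {
    kubelet_driver=$1
    runtime_driver=$2
    if [ "$kubelet_driver" != "$runtime_driver" ]; then
        echo "mismatch: kubelet cgroup driver \"$kubelet_driver\" is different from docker cgroup driver \"$runtime_driver\"" >&2
        return 1
    fi
    echo "ok: both use \"$kubelet_driver\""
}

# kubelet_driver_from_config FILE
# Reads a top-level "cgroupDriver:" key from a KubeletConfiguration YAML
# file (line-based, not a YAML parser). If the key is absent, prints
# "cgroupfs", which is the kubelet's own default.
kubelet_driver_from_config() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    driver=$(awk '$1 == "cgroupDriver:" { gsub(/"/, "", $2); print $2; exit }' "$1")
    echo "${driver:-cgroupfs}"
}

# Example on a node (not run here; the config path is an assumption):
#   check_cgroup_drivers \
#       "$(kubelet_driver_from_config /var/lib/kubelet/config.yaml)" \
#       "$(docker info --format '{{.CgroupDriver}}')"
```

With docker 20.10 on the unified hierarchy reporting "systemd", a kubelet still on the default driver would be flagged as a mismatch, which matches the journal message above.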
Progress update:
Good idea to use GRUB to separate new from old deployments, but when would we be able to do the switch in systemd, so that PXE boots can also default to v2?
So how exactly does this affect those using docker and kubelet (e.g. with kube-proxy and kube-router), including those using iptables directly?
This PR has no impact on iptables. For the other points I would like to ask some questions:
We will be upgrading the system docker to v20.10 and giving folks a transition period, with warnings to check the impact of enabling cgroupv2. You will most likely need to switch your kubelet to the systemd cgroup driver, so check your kubelet configuration.
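A minimal sketch of the kubelet side of that switch, assuming the kubelet reads a KubeletConfiguration file (passed with --config). It only prints the fields involved, so the result should be merged into an existing configuration rather than replace it. The function name and the output path in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal KubeletConfiguration that selects a
# cgroup driver (systemd unless another driver is given as $1).
render_kubelet_config() {
    cat <<EOF
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: ${1:-systemd}
EOF
}

# Example (not run here; the path is an assumption, use whatever file
# your kubelet is started with via --config):
#   render_kubelet_config > /etc/kubernetes/kubelet-config.yaml
```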
@jepio sorry for the late response, but here it is:
Good to hear that.
System one, the torcx package shipping with flatcar.
I am deploying kubernetes manually using a custom Ansible playbook which then uses a custom torcx package to deploy kubelet and kube-proxy on the flatcar host. All other kubernetes components are running in containers.
Currently it looks like this:
I've removed the WIP from the title. CI is running http://jenkins.infra.kinvolk.io:8080/job/os/job/manifest/3230/
;;
1.19)
S3_PATH="1.19.6/2021-01-05"
;;
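The case arm above presumably sits inside a version dispatch in the EKS fetching code. A self-contained sketch of such a dispatch follows; the function name is hypothetical, and only the 1.19 path that appears in this thread is filled in, so any other version fails loudly instead of falling back to a guessed path.

```shell
#!/bin/sh
# Hypothetical helper: map a kubernetes minor version to an S3 path.
# Only the 1.19 entry is taken from this thread.
s3_path_for() {
    case "$1" in
        1.19)
            S3_PATH="1.19.6/2021-01-05"
            ;;
        *)
            # Unknown (or intentionally dropped) version: refuse to guess.
            echo "no S3 path known for kubernetes $1" >&2
            return 1
            ;;
    esac
    echo "$S3_PATH"
}
```

Failing on unknown versions makes dropping the <1.19 entries visible to callers instead of silently fetching something else.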
@pothos
I've added fetching code for 1.20 and 1.19 (and 1.21 in a not-yet-pushed fixup), but I'm wondering whether it makes sense to keep the <1.19 versions around. I'm not familiar with EKS deployment: is it possible to pass an ignition snippet? Does it make sense to keep a configuration around that requires a reboot to apply (that won't play nicely with autoscaling, right?)?
@marga-kinvolk can answer better, but yes, I think that without baking the immediate reboot into the configuration, it doesn't make much sense to keep the old ones available.
@marga-kinvolk let me know if you have an opinion here - we can still change this before the coming alpha release.
For tini-0.19. Upstream commit 2e10a957da8a8a93c1f5d82011e3f6692f7b765c. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
We use a custom build system to remove the cmake dependency and hardcode relevant configuration. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Docker upstream split the cli component into a separate repo, so there is a separate ebuild that builds the docker utility. This is a prerequisite of the update of docker to 20.10. This is an import from portage commit 69d01a4273a556b1205a7a575cb3811ab7e2443d. Signed-off-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
We use coreos-go* eclass so that we can override several environment variables and build with the same go version as docker upstream. These changes are modeled after what was previously done in app-emulation/docker, the cli ebuild has only been split out since v20.10. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This is the version used by docker-19.03. We will be updating the live ebuild to build docker 20.10 dependencies. Signed-off-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
This is the version needed by docker 20.10.7. ROADMAP.md doesn't exist so it has been removed from src_install. Signed-off-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
For update to docker-20.10.7. gentoo/portage commit 0ed05ce0a8f0d1c3dfa6151e7ebb25b67c4aae16 Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The patches do the following:
* install flatcar specific wrappers and systemd config
* force some USE flags to default on
* allow injecting CFLAGS/LDFLAGS so that torcx can work
* force building with go1.13 (like upstream does); this won't be necessary next time because docker master already uses go1.16
Compared to previous torcx images the docker-cli package is a separate package, following upstream Docker repo layout changes. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
…oupv2 We are switching flatcar to cgroupv2, which is supported by docker 20.10 and kubernetes 1.19. This requires setting the systemd cgroup driver in the kubelet config. Due to the unified cgroup hierarchy, kubernetes <1.19 will not work, so remove all older versions. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
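Where provisioning scripts still accept arbitrary kubernetes versions, the 1.19 cutoff can be enforced up front. A sketch in POSIX sh; the function name is hypothetical, and only the major and minor components are compared.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if a kubernetes version ("1.21.2" or
# "v1.21.2") is at least 1.19, 1 if it is older, 2 if it cannot be parsed.
supports_unified_cgroup() {
    v=${1#v}                  # strip an optional leading "v"
    case "$v" in
        *.*) ;;
        *) echo "cannot parse version: $1" >&2; return 2 ;;
    esac
    major=${v%%.*}            # text before the first dot
    rest=${v#*.}
    minor=${rest%%.*}         # text between the first and second dot
    case "$major.$minor" in
        .*|*.|*[!0-9.]*) echo "cannot parse version: $1" >&2; return 2 ;;
    esac
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 19 ]; }
}
```

Pre-release suffixes such as "1.19.0-rc.1" still parse because only the first two components are inspected; a suffix directly on the minor number is rejected as unparsable.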
The upstream docker repository location has changed to docker/docker. Additionally, the cli component has been split out, which requires fetching two hashes and updating two ebuilds. We also took the chance to align the ebuild with gentoo's, which means there is no longer a live ebuild or a symlink. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
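Because an update now has to resolve one hash per repository, here is a sketch of extracting the commit for a release tag from git ls-remote output. It is an illustration, not the actual update tooling. The function name is hypothetical, and the repository URLs in the example are assumptions based on the rename described above. For annotated tags, ls-remote also prints a peeled "<tag>^{}" line that points at the commit itself, so that line is preferred.

```shell
#!/bin/sh
# Hypothetical helper: read `git ls-remote` output on stdin and print the
# commit hash for refs/tags/$1, preferring the peeled "^{}" entry.
# Exits 1 if the tag is not present.
commit_for_tag() {
    awk -v ref="refs/tags/$1" '
        $2 == ref "^{}" { peeled = $1 }
        $2 == ref       { plain = $1 }
        END {
            if (peeled != "") print peeled
            else if (plain != "") print plain
            else exit 1
        }'
}

# Example (not run here; URLs are assumptions):
#   git ls-remote --tags https://github.com/docker/docker v20.10.7 | commit_for_tag v20.10.7
#   git ls-remote --tags https://github.com/docker/cli v20.10.7 | commit_for_tag v20.10.7
```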
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Now that Docker has been updated to 20.10, we can use cgroupv2, so have systemd mount the unified cgroup hierarchy by default. Another way of achieving the same would have been to pass 'systemd.unified_cgroup_hierarchy=1' on the kernel cmdline, but this way the change propagates nicely to all OEM consumers. Signed-off-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
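To see which layout a given kernel command line asks for, here is a sketch that interprets the systemd.unified_cgroup_hierarchy parameter mentioned above. The function name is hypothetical. The boolean spellings follow systemd's usual parsing, a bare parameter without a value counts as true, and when the parameter repeats, this sketch uses the last occurrence.

```shell
#!/bin/sh
# Hypothetical helper: print "unified", "legacy" (v1: legacy or hybrid,
# depending on other options) or "default" (parameter absent or value not
# recognized, so systemd's built-in default applies).
requested_cgroup_layout() {
    value=default
    # Unquoted on purpose: split the command line into words.
    for arg in $1; do
        case "$arg" in
            systemd.unified_cgroup_hierarchy=*) value=${arg#*=} ;;
            systemd.unified_cgroup_hierarchy) value=1 ;;
        esac
    done
    case "$value" in
        1|yes|y|true|t|on) echo unified ;;
        0|no|n|false|f|off) echo legacy ;;
        *) echo default ;;
    esac
}

# Example on a running node (not run here):
#   requested_cgroup_layout "$(cat /proc/cmdline)"
```

With this change, "default" on Flatcar should mean the unified hierarchy, which is why no cmdline parameter is needed for new deployments.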
This pulls in flatcar/update_engine#13 Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This pulls in flatcar/init#44 Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The SystemdCgroup=true setting is incompatible with kubelet cgroupDriver: cgroupfs. So to prevent kube clusters from failing, we will be freezing a node's config.toml during an update. For that purpose, we install a second configuration file that can then be selected using a systemd drop-in unit. Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
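Before an update, it can help to know which cgroup driver a node's containerd config.toml selects, so it can be checked against the kubelet's cgroupDriver. Below is a line-based sketch (not a TOML parser). The function name is hypothetical, and it assumes the only relevant setting is the runc SystemdCgroup option; the drop-in that selects the second configuration file is not modelled here.

```shell
#!/bin/sh
# Hypothetical helper: print "systemd" if an uncommented
# "SystemdCgroup = true" line is present in the given config.toml,
# otherwise "cgroupfs".
containerd_cgroup_driver() {
    if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true([[:space:]]|$)' "$1"; then
        echo systemd
    else
        echo cgroupfs
    fi
}
```

If this prints "systemd" while the kubelet is configured with cgroupfs, the node is in exactly the state the frozen config.toml is meant to prevent.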
Move to unified cgroupv2 hierarchy and update docker to 20.10
This PR updates Docker to 20.10 and switches systemd to the unified cgroupv2 hierarchy. Docker deps (tini, docker-proxy, docker-cli) have been updated following comments/deps from the gentoo docker ebuild (https://github.com/gentoo/gentoo/blob/69d01a4273a556b1205a7a575cb3811ab7e2443d/app-emulation/docker/docker-20.10.5.ebuild).
This PR does not touch the github actions for docker updates; they will need to be adjusted for the docker-ce -> docker repo rename and to add docker-cli updates as well. I also treated the docker ebuild update as a PoC, and made no attempt to reduce divergence from gentoo.
How to use
Apply the following change to build_torcx_store from https://github.com/kinvolk/flatcar-scripts
Pull this branch and then:
Testing done
Set up kubernetes following https://suraj.io/post/2021/01/kubeadm-flatcar/, then verify that kubernetes is also making proper use of the systemd cgroup driver and cgroupv2:
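One way to run such a check, sketched in POSIX sh: classify the hierarchy mounted at /sys/fs/cgroup from its filesystem type (cgroup2fs on the unified hierarchy, a tmpfs holding per-controller mounts on legacy and hybrid layouts), with the Docker driver query shown as a comment. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a
# cgroup layout. Returns 1 for types it does not recognize.
cgroup_layout_from_fstype() {
    case "$1" in
        cgroup2fs) echo "unified (cgroup v2)" ;;
        tmpfs)     echo "legacy or hybrid (cgroup v1)" ;;
        *)         echo "unknown: $1"; return 1 ;;
    esac
}

# Example on a node (not run here); docker should report "systemd":
#   cgroup_layout_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
#   docker info --format '{{.CgroupDriver}}'
```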
TODOs
google.kubernetes.docker.* and docker.oldclient