fix cgroup2 support #2844

Merged 1 commit into k3s-io:master on Jan 26, 2021

Conversation

AkihiroSuda (Contributor)

Proposed Changes

cgroup2 support was introduced in #2584, but got broken in f3de60f

It was failing with "F1210 19:13:37.305388 4955 server.go:181] cannot set feature gate SupportPodPidsLimit to false, feature is locked to true"
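For context, a minimal sketch of the failure mode (hypothetical, not the actual k3s code): the SupportPodPidsLimit feature gate is locked to true upstream, so the kubelet arguments may only carry the gate when the pids cgroup controller is genuinely absent; passing "=false" unconditionally produces the fatal error quoted above.

```go
package main

import "fmt"

// Hypothetical sketch, not the k3s implementation: only disable the
// (locked-to-true) SupportPodPidsLimit gate when the pids controller is
// really unavailable. The bug amounted to hasPIDs being false on cgroup2.
func kubeletFeatureGateArgs(hasPIDs bool) []string {
	var args []string
	if !hasPIDs {
		args = append(args, "--feature-gates=SupportPodPidsLimit=false")
	}
	return args
}

func main() {
	fmt.Println(kubeletFeatureGateArgs(true))  // cgroup2 with pids detected
	fmt.Println(kubeletFeatureGateArgs(false)) // pids controller missing
}
```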

Types of Changes

BugFix

Verification

sudo k3s server on Ubuntu 20.10 with cgroup2

Linked Issues

Fix #900

Further Comments

@AkihiroSuda AkihiroSuda requested a review from a team as a code owner January 22, 2021 13:18
@AkihiroSuda AkihiroSuda mentioned this pull request Jan 22, 2021

@AkihiroSuda (Contributor Author)

Opened #2845 for requesting cgroup v2 CI infra.

Fix issue 900

cgroup2 support was introduced in PR 2584, but got broken in f3de60f

It was failing with "F1210 19:13:37.305388    4955 server.go:181] cannot set feature gate SupportPodPidsLimit to false, feature is locked to true"

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@brandond brandond self-assigned this Jan 22, 2021

@dweomer (Contributor) commented Jan 22, 2021

This looks fine to me but we really need @Oats87 to review.

@Oats87 (Member) left a comment

Overall, this PR looks good to me, but I do have some concerns around functionality. As far as I can tell this will not break our existing cgroupsv1 support, and so should be safe to merge, although we should do regression testing to verify this.

@@ -182,13 +184,34 @@ func checkCgroups() (kubeletRoot, runtimeRoot string, hasCFS, hasPIDs bool) {
}
defer f.Close()

v2 := cgroups.Mode() == cgroups.Unified

Member

I am not very familiar with github.com/containerd/cgroups behavior -- is it possible that we'll receive a cgroups.Unified result from cgroups.Mode() even if we don't have systemd as our init process?

i.e. on a system with unified cgroups but no systemd

Contributor Author

Yes, that is possible, as long as cgroup2 is mounted on /sys/fs/cgroup.
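To illustrate (a hypothetical sketch, not the github.com/containerd/cgroups source): cgroups.Mode() is driven by what filesystem is mounted where, not by the init system, which is why a non-systemd host with cgroup2 at /sys/fs/cgroup still reports Unified.

```go
package main

import "fmt"

// Hypothetical sketch of the classification cgroups.Mode() performs, keyed
// on filesystem types instead of a live statfs call so it can run anywhere.
// rootFS is the fs type at /sys/fs/cgroup; unifiedFS the one at
// /sys/fs/cgroup/unified (empty string if nothing is mounted there).
func cgroupMode(rootFS, unifiedFS string) string {
	switch {
	case rootFS == "cgroup2":
		return "unified" // pure v2, regardless of whether systemd is PID 1
	case unifiedFS == "cgroup2":
		return "hybrid" // tmpfs root, v1 mounts plus a v2 mount at .../unified
	default:
		return "legacy" // v1 only
	}
}

func main() {
	fmt.Println(cgroupMode("cgroup2", ""))      // e.g. Fedora 33 layout
	fmt.Println(cgroupMode("tmpfs", "cgroup2")) // e.g. Devuan Beowulf layout
}
```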

Member

@AkihiroSuda do you know if there is a non-systemd based system that only supports cgroups v2 that we could test this on?


@brandond (Member) commented Jan 23, 2021

I couldn't actually find an OpenRC-based distro that uses unified mode. Fedora 33 uses unified mode with systemd, and k3s works fine after applying this PR:

[sysadm@fedora01 ~]$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)

Run as a systemd unit:

[sysadm@fedora01 ~]$ systemd-cgls
...
└─system.slice
  ├─k3s.service
  │ ├─3742 /usr/local/bin/k3s server
  │ └─3755 containerd

Run from a shell:

[sysadm@fedora01 ~]$ systemd-cgls
...
├─user.slice
│ └─user-1000.slice
│   ├─user@1000.service
│   │ └─init.scope
│   │   ├─747 /usr/lib/systemd/systemd --user
│   │   └─749 (sd-pam)
│   ├─session-3.scope
│   │ ├─ 773 sshd: sysadm [priv]
│   │ ├─ 777 sshd: sysadm@pts/0
│   │ ├─ 778 -bash
│   │ ├─ 903 sudo -i
│   │ ├─ 905 -bash
│   │ ├─1842 tee k3s.log
│   │ ├─1864 containerd
│   │ ├─2273 /var/lib/rancher/k3s/data/8e9ca8da90942295d88eb1e4277ae10c3cc9b17b43853382ee1d0dd19e4caa10/bin/containerd-shim-runc-v2 -namespace k8s.io -id 605179ece3>
│   │ ├─2300 /var/lib/rancher/k3s/data/8e9ca8da90942295d88eb1e4277ae10c3cc9b17b43853382ee1d0dd19e4caa10/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8549898bd0>
│   │ ├─2330 /var/lib/rancher/k3s/data/8e9ca8da90942295d88eb1e4277ae10c3cc9b17b43853382ee1d0dd19e4caa10/bin/containerd-shim-runc-v2 -namespace k8s.io -id 566e62505e>
│   │ ├─2924 /var/lib/rancher/k3s/data/8e9ca8da90942295d88eb1e4277ae10c3cc9b17b43853382ee1d0dd19e4caa10/bin/containerd-shim-runc-v2 -namespace k8s.io -id 41bac9411d>
│   │ └─3036 /var/lib/rancher/k3s/data/8e9ca8da90942295d88eb1e4277ae10c3cc9b17b43853382ee1d0dd19e4caa10/bin/containerd-shim-runc-v2 -namespace k8s.io -id fac53cab8b>
└─k3s
  └─1841 /usr/local/bin/k3s server

I tried Devuan Beowulf with OpenRC; it is in hybrid mode with all the controllers running under v2 by default. I tried editing /etc/rc.conf, but it appears that none of the settings in there make any difference: I can't put it into legacy OR unified mode, or move any of the required controllers from v2 to v1.

sysadm@devuan01:~$ mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/elogind type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/elogind/elogind-cgroups-agent,name=elogind)

K3s does NOT work on Devuan (with or without this PR) because we don't support hybrid mode with the controllers under v2: as currently implemented it only works if the pids and cpu controllers are in v1 (legacy or hybrid mode), or if we are in 'pure' unified mode.

Contributor

@brandond FYI, I built a version of k3OS which combines this PR with OpenRC in "unified" mode. You can find it at: https://github.com/olljanat/k3os/releases/tag/rootless-cgroup-v2-test

As far as I understand, cgroup is using v2 mode and k3s still works:

k3os-11941 [~]$ ls -l /sys/fs/cgroup/
total 0
drwxr-xr-x 2 root root 0 Jan 25 22:04 acpid
drwxr-xr-x 2 root root 0 Jan 25 22:04 ccapply
-r--r--r-- 1 root root 0 Jan 25 22:04 cgroup.controllers
-rw-r--r-- 1 root root 0 Jan 25 22:04 cgroup.max.depth
-rw-r--r-- 1 root root 0 Jan 25 22:04 cgroup.max.descendants
-rw-r--r-- 1 root root 0 Jan 25 22:04 cgroup.procs
-r--r--r-- 1 root root 0 Jan 25 22:04 cgroup.stat
-rw-r--r-- 1 root root 0 Jan 25 22:11 cgroup.subtree_control
-rw-r--r-- 1 root root 0 Jan 25 22:04 cgroup.threads
drwxr-xr-x 2 root root 0 Jan 25 22:04 connman
-rw-r--r-- 1 root root 0 Jan 25 22:04 cpu.pressure
-r--r--r-- 1 root root 0 Jan 25 22:04 cpuset.cpus.effective
-r--r--r-- 1 root root 0 Jan 25 22:04 cpuset.mems.effective
drwxr-xr-x 2 root root 0 Jan 25 22:04 dbus
drwxr-xr-x 2 root root 0 Jan 25 22:04 haveged
-rw-r--r-- 1 root root 0 Jan 25 22:04 io.cost.model
-rw-r--r-- 1 root root 0 Jan 25 22:04 io.cost.qos
-rw-r--r-- 1 root root 0 Jan 25 22:04 io.pressure
drwxr-xr-x 2 root root 0 Jan 25 22:04 iscsid
drwxr-xr-x 2 root root 0 Jan 25 22:04 issue
drwxr-xr-x 2 root root 0 Jan 25 22:04 k3s-service
drwxr-xr-x 4 root root 0 Jan 25 22:04 kubepods
-rw-r--r-- 1 root root 0 Jan 25 22:04 memory.pressure
drwxr-xr-x 2 root root 0 Jan 25 22:04 sshd
drwxr-xr-x 2 root root 0 Jan 25 22:04 syslog
drwxr-xr-x 2 root root 0 Jan 25 22:04 udev
k3os-11941 [~]$ kubectl get nodes
NAME         STATUS   ROLES                  AGE     VERSION
k3os-11941   Ready    control-plane,master   7m41s   v1.20.2+k3s-127b96e8

@brandond (Member) commented Jan 25, 2021

@olljanat if you are confirming that this works with OpenRC in unified mode then I'm satisfied. We can open another issue to track hybrid mode with the cpu,pids controllers in v2.

Contributor Author

"hybrid" mode is almost meaningless in the container ecosystem, because a controller cannot be enabled for both v1 and v2 simultaneously.

Member

Yeah, hybrid mostly just means we should check v2 and fall back to v1. Right now we just assume the controllers are available in v1 in hybrid mode.
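That fallback could look something like the following (a hypothetical sketch with illustrative names, not the k3s code): consult the v2 cgroup.controllers list first, then fall back to the v1 mount.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical sketch of "check v2, fall back to v1": v2Controllers is the
// space-separated contents of /sys/fs/cgroup/unified/cgroup.controllers,
// and v1Mounted says whether the controller has its own v1 hierarchy mounted.
func hasController(v2Controllers string, v1Mounted bool, name string) bool {
	for _, c := range strings.Fields(v2Controllers) {
		if c == name {
			return true // controller is enabled in the v2 hierarchy
		}
	}
	return v1Mounted // otherwise fall back to the v1 mount
}

func main() {
	// Devuan-style hybrid: pids lives under v2, with no v1 pids mount.
	fmt.Println(hasController("cpuset cpu io memory pids", false, "pids"))
}
```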

Contributor

If the case is that, without this change, k3s fails to start when cgroup v2 is used, then I can confirm that it works with OpenRC in unified mode. Or is there something else I should check?

@@ -197,7 +220,7 @@ func checkCgroups() (kubeletRoot, runtimeRoot string, hasCFS, hasPIDs bool) {
if _, err := os.Stat(p); err == nil {
hasCFS = true
}
} else if system == "name=systemd" {
} else if system == "name=systemd" || v2 {

Member

When on a non-systemd system but with unified cgroups, is it OK that we aren't setting kubeletRoot if we don't have a .scope?

@AkihiroSuda (Contributor Author) commented Jan 22, 2021

Honestly, I'm not familiar with the kubeletRoot stuff, but this PR matches the v1 behavior.
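For reference, a hypothetical sketch (not the k3s implementation) of the /proc/self/cgroup parsing the name=systemd branch above feeds into: on cgroup v2 the file collapses to a single "0::&lt;path&gt;" entry, so the same branch can derive kubeletRoot from that path when the process runs inside a systemd .scope.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical sketch: on cgroup v2, /proc/self/cgroup contains one
// "0::<path>" line; on v1 each line is "<id>:<controllers>:<path>".
// Returns the v2 path and whether a v2 entry was found.
func v2CgroupPath(procSelfCgroup string) (string, bool) {
	for _, line := range strings.Split(procSelfCgroup, "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
			return parts[2], true
		}
	}
	return "", false
}

func main() {
	p, ok := v2CgroupPath("0::/user.slice/user-1000.slice/session-3.scope")
	fmt.Println(p, ok)
}
```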

olljanat added a commit to olljanat/k3os that referenced this pull request Jan 25, 2021
@brandond brandond merged commit f3c41b7 into k3s-io:master Jan 26, 2021

@AkihiroSuda (Contributor Author) commented Jan 26, 2021

Would it be possible to get CI for this? #2845

GitHub Actions with Vagrant can be used for smoke testing, though it is probably too heavy for the full sonobuoy test suite.
e.g. https://github.com/rootless-containers/usernetes/blob/v20201118.0/.github/workflows/main.yaml#L31-L50

@phedders

Would it be possible to expedite a new release to include #2844 ?
Many thanks!

@brandond (Member) commented Jan 28, 2021

We will probably just wait for the next upstream patch release: #2857
You can always run a CI build using the INSTALL_K3S_COMMIT variable for the installer, if you want to use this before we have cut an official release.

@sjkeerthi

I know it's a bit late; I just want clarification: will this feature be supported only on the latest k3s with the latest Kubernetes, or will it also be available on the last three or four supported Kubernetes versions?

For example, will it be available on:
1.19.x
1.20.x
1.21.x
1.22.x

Successfully merging this pull request may close these issues.

Support cgroups v2