Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroupv2: fix fs2 driver default path #2305

Merged
merged 1 commit into from
Apr 16, 2020

Conversation

kolyshkin
Copy link
Contributor

When the cgroupv2 fs driver is used without setting cgroupsPath,
it picks up a path from /proc/self/cgroup. On a host with systemd,
such a path can look like (examples from my machines):

  • /user.slice/user-1000.slice/session-4.scope
  • /user.slice/user-1000.slice/user@1000.service/gnome-launched-xfce4-terminal.desktop-4260.scope
  • /user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

This cgroup already contains processes in it, which prevents to enable
controllers for a sub-cgroup (writing to cgroup.subtree_control fails
with EBUSY or EOPNOTSUPP).

Obviously, a parent cgroup (which does not contain tasks) should be used.

Fixes: #2298

When the cgroupv2 fs driver is used without setting cgroupsPath,
it picks up a path from /proc/self/cgroup. On a host with systemd,
such a path can look like (examples from my machines):

 - /user.slice/user-1000.slice/session-4.scope
 - /user.slice/user-1000.slice/user@1000.service/gnome-launched-xfce4-terminal.desktop-4260.scope
 - /user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

This cgroup already contains processes in it, which prevents to enable
controllers for a sub-cgroup (writing to cgroup.subtree_control fails
with EBUSY or EOPNOTSUPP).

Obviously, a parent cgroup (which does not contain tasks) should be used.

Fixes opencontainers/issues/2298

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

kolyshkin commented Apr 9, 2020

@AkihiroSuda PTAL. You might want a similar change in containers/cgroups's NestedGroupPath(), although I'm not sure if it has any users (yet?).

This patch can be amended by checking first if a cgroup has any users, and if so, moving up, but I don't think it's necessary.

@AkihiroSuda AkihiroSuda self-requested a review April 10, 2020 01:00
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Apr 12, 2020
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Some tests that are not yet working:

    - ps.bats with RUNC_USE_SYSTEMD=yes (investigating)
    - events.bats (need cgroupv2 memory.events support)

[v2: add -t to ssh to enforce pty]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Apr 13, 2020
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Tests in events.bats do not work (need cgroupv2
    memory.events support)

[v2: add -t to ssh to enforce pty]
[v3: disable events tests for cgroupv2]
[v4: update container-selinux to fix ps tests]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Apr 14, 2020
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Tests in events.bats do not work (need cgroupv2
    memory.events support)

 3. Fedora 31 image is 6 months old (and has broken
    container-selinux policy) so we need `dnf update`.

[v2: add -t to ssh to enforce pty]
[v3: disable events tests for cgroupv2]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Apr 14, 2020
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Tests in events.bats do not work (need cgroupv2
    memory.events support)

 3. Fedora 31 image is 6 months old (and has broken
    container-selinux policy) so we need `dnf update`.

[v2: add -t to ssh to enforce pty]
[v3: disable events tests for cgroupv2]
[v4: update fedora packages, use a single dnf transation]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Apr 14, 2020
Those needs to be run on the (Vagrant Fedora 31) host
(since we need real systemd running), and so we have
to have all the tools needed to compile runc and run
the tests.

The good news is Fedora packages a decent and recent release
of bats-core (1.1.0), which we can use (Debian does not),
and we can also use golang (currently 1.13.9) from Fedora.

The bad news are

 1. Currently cgroups tests are only working with
    RUNC_USE_SYSTEMD=yes (addressed by opencontainers#2299, opencontainers#2305)

 2. Tests in events.bats do not work (need cgroupv2
    memory.events support)

 3. Fedora 31 image is 6 months old (and has broken
    container-selinux policy) so we need `dnf update`,
    which adds ~5 min to test time.

[v2: add -t to ssh to enforce pty]
[v3: disable events tests for cgroupv2]
[v4: update fedora packages, use a single dnf transation]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@mrunalp
Copy link
Contributor

mrunalp commented Apr 14, 2020

LGTM

Approved with PullApprove

@kolyshkin
Copy link
Contributor Author

kolyshkin commented Apr 14, 2020

So, @brauner says that if a host is running systemd, we must use systemd:
#1708 (comment)
and he also describes the approach of "going one level up" (item 2 item 3) which is used here.

Edited: fix item number

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Apr 15, 2020

LGTM

Approved with PullApprove

@h-vetinari
Copy link

@kolyshkin: and he also describes the approach of "going one level up" (item 2) which is used here.

Ummmm... (not sure if "describes" is the right word, plus "one level up" is his Oprion 3?):

@brauner:

  1. [...]
  2. you escape to the root cgroup
  3. you escape one level up form the cgroup the processes are located in
  4. [...]

[...] Option 2 is a big nono as the root cgroup is owned by systemd and is free to do whatever it wants with your processes. You're also violating the single writer rule. Option 3 is another big nono for all the reasons 2 is. [...]

@brauner
Copy link
Contributor

brauner commented Apr 15, 2020

So, @brauner says that if a host is running systemd, we must use systemd:
#1708 (comment)
and he also describes the approach of "going one level up" (item 2) which is used here.

Hey Kir,

You don't necessarily need to run systemd in the container. I might've not expressed myself well enough. The generic problem is that on a v2-only host you can't run an init process in the container that does cgroup management and which is v1 or hybrid mode aware only. This is e.g. is the case for old systemds as systemd will simply freeze if it can't mount named=systemd which it uses for process tracking if the cgroup2 hierarchy is not available. Though nowadays on most supported distros systemd can be told to use the unified hierarchy only.

@kolyshkin
Copy link
Contributor Author

Though nowadays on most supported distros systemd can be told to use the unified hierarchy only.

Yes, I am targeting the v2-only unified hierarchy in mind, as this is the way it's going to be (and it already is in F31 by default).

@kolyshkin
Copy link
Contributor Author

With checkpoint/restore tests disabled for v2 but no systemd, CI passes.

@AkihiroSuda AkihiroSuda merged commit 9f6a2d4 into opencontainers:master Apr 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cgroupv2 fs driver does not work with default cgroups path
5 participants