Cgroupv2 - cgroup.subtree_control is empty in the root cgroup and can't be populated #126

Closed
MarkKoz opened this issue Dec 19, 2021 · 0 comments · Fixed by #127
Labels
area: nsjail Related to NsJail and its configuration priority: 1 - high status: planning Discussing details type: bug Something isn't working

Comments

Member

MarkKoz commented Dec 19, 2021

By default, Docker uses a private cgroup namespace when the host system uses cgroupv2. As a result, the root cgroup within the container has an empty cgroup.subtree_control, so the child cgroups NsJail creates will not have any controllers enabled. Attempting to write to the root cgroup.subtree_control fails with a "device or resource busy" error (EBUSY), seemingly because the cgroup already has processes in it (it is the container's root cgroup, after all). However, the exact cause of this error has not been confirmed.
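For anyone wanting to check whether a container is affected, here is a minimal Python sketch that reads the cgroupv2 interface files. The helper names are made up for illustration, and the default path assumes the cgroupv2 hierarchy is mounted at /sys/fs/cgroup.

```python
from pathlib import Path


def read_controllers(cgroup_dir, filename):
    """Return the controller names listed in a cgroup v2 interface file.

    Files such as cgroup.controllers and cgroup.subtree_control contain a
    space-separated list of controller names (possibly empty).
    """
    text = (Path(cgroup_dir) / filename).read_text()
    return set(text.split())


def subtree_control_is_empty(cgroup_dir="/sys/fs/cgroup"):
    """True if no controllers are delegated to the children of cgroup_dir.

    The default path is an assumption about where cgroupv2 is mounted.
    """
    return not read_controllers(cgroup_dir, "cgroup.subtree_control")
```

Comparing `read_controllers(path, "cgroup.controllers")` with `read_controllers(path, "cgroup.subtree_control")` shows which controllers are available to the cgroup but not passed on to its children.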

docker run has a --cgroupns option which can be set to host to use the host's cgroup namespace instead of a private one. This works around the empty cgroup.subtree_control, but at the cost of not having a private namespace. Using the host's cgroup namespace is in fact the default behaviour when cgroupv1 is used. Another downside is that this option cannot be configured in the Docker Compose file (compose-spec/compose-spec#148). To keep using Docker Compose, the mode would need to be set globally via the default-cgroupns-mode Docker daemon option. Otherwise, the container would have to be started with docker run instead.
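A rough sketch of the two workarounds in Python, in case it helps with scripting them. The function names, the image name and the sample existing config are placeholders; only the --cgroupns flag and the default-cgroupns-mode key come from the description above. On Linux the daemon configuration normally lives in /etc/docker/daemon.json, and the daemon has to be restarted for a change to take effect.

```python
import json


def docker_run_argv(image):
    """Build a `docker run` command line that uses the host's cgroup namespace."""
    return ["docker", "run", "--cgroupns", "host", image]


def with_host_cgroupns(daemon_config):
    """Return a copy of a daemon.json mapping with the global default set.

    The input mapping is not modified, so existing settings are preserved.
    """
    updated = dict(daemon_config)
    updated["default-cgroupns-mode"] = "host"
    return updated


if __name__ == "__main__":
    # Hypothetical existing daemon configuration, for illustration only.
    existing = {"log-driver": "json-file"}
    print(json.dumps(with_host_cgroupns(existing), indent=2))
    print(" ".join(docker_run_argv("example/image:latest")))
```

The per-container flag is the narrower change. The daemon option affects every container on the host, which may not be desirable on a shared machine.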

I looked into the --cgroup-parent option as well. I created a cgroup /sys/fs/cgroup/NSJAIL.slice and then enabled some controllers in its cgroup.subtree_control before starting the container. However, the cgroup Docker creates within NSJAIL.slice still ends up having an empty cgroup.subtree_control. That makes sense, since when I created NSJAIL.slice, cgroup.subtree_control also started out empty (as is documented by the various manpages on cgroupv2). I was hoping that NSJAIL.slice would become the root cgroup in the container, which is not the case.
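For reference, the experiment above can be sketched roughly like this in Python. The controller names (memory, pids) are only examples, since the exact set that was enabled is not stated here, and the helper names are hypothetical. Writing to the real file needs root, and the kernel may reject the write (for example with EBUSY) rather than accept it.

```python
from pathlib import Path


def subtree_control_request(controllers):
    """Format the string written to cgroup.subtree_control.

    Each controller is prefixed with '+' to enable it for child cgroups,
    e.g. ["memory", "pids"] becomes "+memory +pids".
    """
    return " ".join(f"+{name}" for name in controllers)


def enable_controllers(cgroup_dir, controllers):
    """Create cgroup_dir if needed and request controllers for its children.

    On a real cgroupv2 mount this raises OSError if the kernel refuses
    the request.
    """
    cgroup_path = Path(cgroup_dir)
    cgroup_path.mkdir(parents=True, exist_ok=True)
    control_file = cgroup_path / "cgroup.subtree_control"
    control_file.write_text(subtree_control_request(controllers))
```

For example, `enable_controllers("/sys/fs/cgroup/NSJAIL.slice", ["memory", "pids"])` would mirror the setup described above. The container would then be started with --cgroup-parent pointing at that cgroup.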

As mentioned in the cgroup namespaces manpage:

"When a process creates a new cgroup namespace using clone(2) or unshare(2) with the CLONE_NEWCGROUP flag, its current cgroups directories become the cgroup root directories of the new namespace."

However, Docker seems to create a new cgroup for each container, with some hash in its name, right before starting the container. There therefore seems to be no opportunity to write to cgroup.subtree_control before the cgroup is populated with processes. Consequently, the only solution I currently see is to rely on --cgroupns host.
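One way to see which namespace mode a container ended up with is to look at /proc/self/cgroup from inside it. The sketch below (function name hypothetical) extracts the cgroupv2 entry. My understanding is that with a private cgroup namespace the path is shown relative to the namespace root, so a container's processes typically report "/", whereas with --cgroupns host the host-relative path of the container's cgroup appears instead. Treat that interpretation as an assumption to verify on your own system.

```python
def cgroup_v2_path(proc_cgroup_text):
    """Return the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controller-list:path". The
    cgroup v2 entry uses hierarchy ID 0 and an empty controller list.
    Returns None if no such entry is present (e.g. a pure cgroupv1 host).
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


if __name__ == "__main__":
    with open("/proc/self/cgroup") as f:
        print(cgroup_v2_path(f.read()))
```

A result of "/" alone does not prove a private namespace, since a process in the host's real root cgroup reports the same path.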

@MarkKoz MarkKoz added type: bug Something isn't working status: planning Discussing details area: nsjail Related to NsJail and its configuration priority: 1 - high labels Dec 19, 2021
@MarkKoz MarkKoz self-assigned this Dec 20, 2021
MarkKoz added 5 commits that referenced this issue on Dec 20, Dec 21, Dec 22 (twice), and Dec 25, 2021, each with the same message:
Ensure the cgroupv2 mount exists, subtree_control is not empty, and
swap is disabled.

Fix #126
Fix #102
@jb3 jb3 closed this as completed in #127 Dec 26, 2021