Unable to run systemd in docker with ro /sys/fs/cgroup after systemd 248 host upgrade #42275
Comments
Same here
Is there already a fix for this?
For reference, it is possible with namespace isolation. https://docs.docker.com/engine/security/userns-remap/
It didn't help. I'm running Ubuntu 21.10 (Impish Indri).
@skast96, it didn't help either. I edited {"userns-remap": "default"} Restarted
So the only workaround is supposedly to switch to the cgroup v1 mode (
GRUB_CMDLINE_LINUX_DEFAULT="systemd.unified_cgroup_hierarchy=0"
UPD And
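A minimal sketch for confirming that the kernel parameter quoted above actually reached the running kernel after a reboot. The function name `forces_cgroup_v1` is hypothetical, and the check only looks for the exact string `systemd.unified_cgroup_hierarchy=0`; other spellings that systemd may accept are not covered.

```shell
#!/bin/sh
# Report whether a kernel command line contains the cgroup v1 switch.
# Usage: forces_cgroup_v1 [cmdline-file]   (defaults to /proc/cmdline)
# The function name is an assumption made for this sketch.
forces_cgroup_v1() {
    cmdline_file=${1:-/proc/cmdline}
    # Put each parameter on its own line, then look for an exact,
    # fixed-string match of the switch quoted in this thread.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF 'systemd.unified_cgroup_hierarchy=0'
}
```

Called without arguments it reads `/proc/cmdline`, so something like `forces_cgroup_v1 && echo "cgroup v1 requested"` can be run on the host after updating GRUB and rebooting.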
Under cgroups v2 the default for As you noted, passing Aside from workarounds, it would be good to know what the Docker community's official advice on the matter is (where fundamentally we just want the container's
Related issue: #42040
@x-yuri the docker approach is not working that great tbh. It is working with namespace isolation when creating an extra slice for docker and adding this slice to the
That kinda worked for me. However, our other containers stopped working with namespace isolation because they were not configured for that. That meant too much work in order to run one container with systemd. So I suggest you to just install
Actually for now I'm planning to employ the hybrid/legacy systemd mode (cgroup v1), which seems tolerable in my case. But
@x-yuri sounds like a plan. My reason for not using v1 is that I needed cgroups v2 to work.
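To tell which of the two modes discussed here a host is actually running, one common check is the filesystem type mounted at `/sys/fs/cgroup`. The sketch below assumes GNU coreutils `stat`, which reports `cgroup2fs` for the unified (v2) hierarchy and `tmpfs` for the legacy or hybrid layout; both function names are hypothetical.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup mode label.
# Function names and labels are assumptions made for this sketch.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;            # unified hierarchy
        tmpfs)     echo "v1-or-hybrid" ;;  # legacy or hybrid layout
        *)         echo "unknown" ;;
    esac
}

# Inspect a cgroup root (default /sys/fs/cgroup) on the current machine.
detect_cgroup_mode() {
    # stat -f -c %T is GNU coreutils syntax; other stat implementations differ.
    cgroup_mode_from_fstype "$(stat -f -c %T "${1:-/sys/fs/cgroup}" 2>/dev/null)"
}
```

Running `detect_cgroup_mode` on the host and again inside the container shows whether both sides see the same hierarchy type.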
In the past, I used vagrant -> libvirt to run acceptance tests, but after upgrading the workstation to Ubuntu 22.04 LTS (jammy), this stopped working because of incompatibilities. Tests are run using pdk bundle exec rake ... and this uses the pdk environment with ruby 2.7.x, while my operating system (and thus vagrant) has been ported to ruby 3. This creates confusion within the tools, because the Gemfile (from PDK) depends on modules that carry the ruby version in their name (puppet-module-posix-...-r3.0). After days of debugging and finding seemingly inactive issues on github and jira, I decided that I do not want to waste even more time on this. I like docker as much as or less than vagrant (not much) and converted all litmus tests to use docker. Docker itself breaks on Ubuntu 22.04 LTS in unprivileged mode, too. Luckily, from my experience with Gentoo, I already suspected trouble with cgroups. Since systemd 248 (no likey either) something changed in the cgroup handling which causes containers using systemd to fail under further conditions. As far as I understand the discussion, systemd devs expect the container people to change their containers or container infrastructure to fit their ideas. I don't want to investigate this any deeper. The solution for using unprivileged docker on Ubuntu 22.04 LTS? Add "systemd.unified_cgroup_hierarchy=0" to your kernel cmdline.
References for the ruby3 issues: puppetlabs/puppet-module-gems#166 => https://tickets.puppetlabs.com/browse/MODULES-11161
References for the docker/systemd/ubuntu22 issue: moby/moby#42275
This is the mount shown in the container:
This is ok (mode is rw). However I assume that you obtained this result with userns-remapping. I think that it should be possible to have the same result without such daemon option, with the proper modifications on the docker engine, like podman does.
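The ro/rw mode mentioned here can be read from the container's mount table rather than eyeballed. A minimal sketch, assuming the usual `/proc/self/mounts` layout (device, mount point, type, comma-separated options); the function name `cgroup_mount_mode` is hypothetical.

```shell
#!/bin/sh
# Print "ro" or "rw" for the mount at /sys/fs/cgroup.
# Usage: cgroup_mount_mode [mounts-file]   (defaults to /proc/self/mounts)
# Returns non-zero when no such mount is listed.
cgroup_mount_mode() {
    mounts_file=${1:-/proc/self/mounts}
    awk '$2 == "/sys/fs/cgroup" {
             # Field 4 holds the mount options; keep the ro/rw flag.
             # If the path is mounted more than once, the last entry wins.
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++)
                 if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
         }
         END { if (mode != "") print mode; else exit 1 }' "$mounts_file"
}
```

Inside a container, `cgroup_mount_mode` printing `ro` would match the failure described in this issue, while `rw` matches the working setup shown above.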
That's correct
For everyone lurking around
As the discussion seems to continue and people are not able to find the stuff...
Spent some time looking at this today trying to run a systemd container under rootless docker. The docker daemon is running under a
docker run -it --rm --tmpfs /tmp --tmpfs /run registry.access.redhat.com/ubi8/ubi-init:8.8
docker run -it --rm --tmpfs /tmp --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup registry.access.redhat.com/ubi8/ubi-init:8.8
docker run -it --rm --tmpfs /tmp --tmpfs /run --cgroupns=host registry.access.redhat.com/ubi8/ubi-init:8.8
docker run -it --rm --tmpfs /tmp --tmpfs /run -v /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service:/sys/fs/cgroup/user.slice/user@1000.service --cgroupns=host registry.access.redhat.com/ubi8/ubi-init:8.8
docker run -it --rm --tmpfs /tmp --tmpfs /run -v /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service:/sys/fs/cgroup/user.slice/user@1000.service registry.access.redhat.com/ubi8/ubi-init:8.8
I cannot find any documentation at all of what the expected behavior of cgroupns=private is supposed to be. Should it be a transparent mapping to a parent context? If so, it should probably be mounted rw rather than ro. Also, the systemd docs https://systemd.io/CONTAINER_INTERFACE/ seem to imply that's not a best practice anyway. It seems to me that the best approach for my situation is to just set the default cgroupns back to 'host' to get this working properly.
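For comparing the `--cgroupns` variants side by side, a small helper can print the command for each mode instead of retyping it. This is only a sketch built from the flags quoted in this comment; the function name `systemd_run_cmd` is hypothetical, and it prints the command rather than running it.

```shell
#!/bin/sh
# Print a docker run command for a systemd container with the given
# cgroup namespace mode ("host" or "private").
# Usage: systemd_run_cmd MODE [IMAGE]
# The function name is an assumption; the default image is the one
# used in the comment above.
systemd_run_cmd() {
    mode=$1
    image=${2:-registry.access.redhat.com/ubi8/ubi-init:8.8}
    case "$mode" in
        host|private) ;;
        *) echo "unsupported cgroupns mode: $mode" >&2; return 1 ;;
    esac
    printf 'docker run -it --rm --tmpfs /tmp --tmpfs /run --cgroupns=%s %s\n' "$mode" "$image"
}
```

For example, `systemd_run_cmd host` prints the third command from the list above. To change the default for every container instead of per run, dockerd has a `default-cgroupns-mode` setting, which appears to be what "set the default cgroupns back to 'host'" refers to.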
You may find my blog post informative: https://lewisgaul.co.uk/blog/coding/2022/05/13/cgroups-intro/ I also have some tests that exercise different container setup modes for running systemd: https://github.com/LewisGaul/systemd-containers
We need something like what's described in containers/podman#14322 (reply in thread) (
I've just tested it, it seems to work flawlessly with docker version: 26.1.4
Note that if host is running older cgroupv1, the
I'm having the same problem with Docker Desktop on macOS (M1), and I'm wondering if anyone has a workaround already?
Work ideally
I just faced the same issue and running the container with sysbox-runc runtime helped. With
How did you create
The scope is an ordinary folder, so can be created by Docker itself during volume mount, but this does not work for me - you have a scope, but systemd is not running within the mounted cgroup.
Change the server in container url to 0.0.0.0, which should be safer long-term and resolve some odd errors found with podman related to pasta.
Log the container run command for easier troubleshooting locally outside the test suite.
Add an execution environment build test
Note the failure in this test run:
opening file /sys/fs/cgroup/cgroup.subtree_control for writing: Read-only file system
https://github.com/ansible/ansible-dev-tools/actions/runs/10930266208/job/30342982168?pr=377
This is why unmask=/sys/fs/cgroup is added after the initial addition of the EE test which works for podman.
For docker based on: moby/moby#42275 (comment) --privileged was added (not ideal, but few options)
On macOS/intel/podman desktop the following errors were found:
Error: crun: mknod /dev/null: Operation not permitted: OCI permission denied
the following was added to resolve this error: --cap-add=mknod (docker gets this by default)
this allowed all tests to pass on macOS/intel/podman desktop
277.32s call tests/integration/test_container.py::test_builder
6.21s call tests/integration/test_container.py::test_nav_playbook
4.99s call tests/integration/test_container.py::test_nav_collections
3.56s call tests/integration/test_container.py::test_navigator_simple_c_in_c
3.18s call tests/integration/test_container.py::test_nav_collection
2.77s call tests/integration/test_container.py::test_navigator_simple
2.58s call tests/integration/test_container.py::test_podman
1.23s call tests/integration/test_container.py::test_nav_images
1.15s setup tests/integration/test_container.py::test_nav_collections
0.78s setup tests/integration/test_container.py::test_nav_playbook
======================================= 34 passed, 1 warning in 310.65s (0:05:10) =======================================
Additional changes necessary for Windows user include the addition of "--cap-add=NET_ADMIN", to avoid bpf query: Operation failed errors when building an EE
---------
Co-authored-by: Brad Thornton <bthornto@bthornto-mac.lan>
BUG REPORT INFORMATION
I used to run docker containers with systemd as CMD without having to expose /sys/fs/cgroup as rw; this worked until systemd 248 on the host. Now it fails with
I opened a related issue on the systemd github repo: systemd/systemd#19245
Workarounds
Steps to reproduce the issue:
Dockerfile:
Expected behaviour
Actual behaviour
Since systemd v248
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
x86_64 Intel hw, Arch Linux 5.11.11-arch1-1