systemd 226 breaks privileged and unprivileged containers when run inside a container #663

Closed
brauner opened this Issue Sep 26, 2015 · 9 comments

Member

brauner commented Sep 26, 2015

systemd 226 breaks both privileged and unprivileged containers for me when it runs inside a container. Apparently journald cannot be started; systemd just keeps trying to start it endlessly. This happens on Debian Sid and Arch Linux at least.

Owner

hallyn commented Sep 26, 2015

Member

brauner commented Sep 28, 2015

If you force systemd-journald.socket to close with:

systemctl stop systemd-journald.socket

the next service that fails is dbus (I think I have seen this behaviour with systemd < 220, where it had something to do with OOM; this was fixed in systemd > 220). Afterwards, systemd-logind fails badly and systemd just prints:

Looping too fast. Throttling execution a little.

endlessly.

Contributor

martinpitt commented Oct 13, 2015

bisect says this introduced the failure: systemd/systemd@efdb023 ("core: unified cgroup hierarchy support").

Unified cgroup hierarchy is disabled by default, but I figure the part that matters is that this commit also moves pid1 from the hierarchy root into a new /init.scope cgroup (only on the name=systemd controller though, as usual). As everything is proxied through cgmanager/lxcfs, could it be that these need to be adjusted for this?
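To check where pid1 sits on the name=systemd hierarchy, one option is to read /proc/1/cgroup, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`. The following is only a minimal sketch: the helper name `systemd_cgroup_path` and the sample text are made up for illustration and are not taken from a real container.

```python
def systemd_cgroup_path(proc_cgroup_text):
    """Return the cgroup path on the name=systemd hierarchy, or None."""
    for line in proc_cgroup_text.splitlines():
        # Each line looks like "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _hierarchy_id, controllers, path = parts
        if "name=systemd" in controllers.split(","):
            return path
    return None


# Hypothetical sample content, not captured from an actual system.
sample = "2:cpuacct:/lxc/s\n1:name=systemd:/lxc/s/init.scope\n"
print(systemd_cgroup_path(sample))  # prints /lxc/s/init.scope
```

On a live system the text would come from reading /proc/1/cgroup; a path ending in /init.scope would suggest the new layout is in effect.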

Owner

hallyn commented Oct 13, 2015

To deal with the init.scope I tried http://paste.ubuntu.com/12776710/ but that did not suffice. I think it gets me further, but I don't know how to figure out why the services systemd starts are failing.

Contributor

martinpitt commented Oct 14, 2015

Told @hallyn on IRC, but for the record:

I changed lib/systemd/system/systemd-journald.service to run ExecStart=/usr/bin/strace -fvvs1024 /lib/systemd/systemd-journald. But that doesn't help: it already fails while setting up the cgroup for the new service, apparently between fork and exec.

Attaching strace to the container's init itself is more insightful:

mkdir("/sys/fs/cgroup/systemd/lxc/s/system.slice/systemd-journald.service", 0755) = 0
open("/sys/fs/cgroup/systemd/lxc/s/system.slice/systemd-journald.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[...]
open("/sys/fs/cgroup/systemd/lxc/s/system.slice/systemd-journald.service/cgroup.procs", O_WRONLY|O_NOCTTY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[...]
sendmsg(3, {msg_name(0)=NULL, msg_iov(9)=[{"PRIORITY=3\nSYSLOG_FACILITY=3\nCODE_FILE=../src/core/execute.c\nCODE_LINE=2040\nCODE_FUNCTION=exec_spawn\nERRNO=2\nSYSLOG_IDENTIFIER=systemd\n", 135}, {"MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7", 43}, {"\n", 1}, {"UNIT=systemd-journald.service", 29}, {"\n", 1}, {"MESSAGE=systemd-journald.service: Failed at step CGROUP spawning /lib/systemd/systemd-journald: No such file or directory", 121}, {"\n", 1}, {"EXECUTABLE=/lib/systemd/systemd-journald", 40}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL)

I.e. creating the new cgroup (mkdir) worked, but the loop that waits for cgroup.procs to exist eventually gives up, and it seems the file never appears. The sendmsg is a log attempt in which you can see the error message (but there's no journald running yet).
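The "directory exists but cgroup.procs never shows up" condition can be checked from outside systemd with a small poll loop. This is a diagnostic sketch only, not how systemd itself implements the wait; the function name `wait_for_cgroup_procs` and its `attempts` and `delay` parameters are assumptions for illustration.

```python
import os
import time


def wait_for_cgroup_procs(cgroup_dir, attempts=5, delay=0.1):
    """Poll for cgroup_dir/cgroup.procs; return True once it exists."""
    procs = os.path.join(cgroup_dir, "cgroup.procs")
    for _ in range(attempts):
        if os.path.exists(procs):
            return True
        # Give the (possibly virtualized) cgroup filesystem time to catch up.
        time.sleep(delay)
    return False
```

Calling it on a freshly created directory under /sys/fs/cgroup/systemd/ and getting False would match the ENOENT results in the strace above.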

I don't think /sys/fs/cgroup/systemd/lxc/s/system.slice/systemd-journald.service/cgroup.procs is the real cgroupfs here, right? That's lxcfs. At least cgmanager and lxcfs spin to 100% CPU while a container is booting, and cgroup operations take ages, so it seems this is a "virtualized" cgroup fs.

Does that help at all?

Contributor

martinpitt commented Oct 14, 2015

If I lxc-attach to the container while it's trying to boot, the simulated cgroupfs seems suspiciously empty:

# find /sys/fs/cgroup/systemd/
/sys/fs/cgroup/systemd/
/sys/fs/cgroup/systemd/lxc
/sys/fs/cgroup/systemd/lxc/s
/sys/fs/cgroup/systemd/lxc/s/init.scope
/sys/fs/cgroup/systemd/lxc/s/init.scope/tasks
/sys/fs/cgroup/systemd/lxc/s/init.scope/cgroup.procs
/sys/fs/cgroup/systemd/lxc/s/init.scope/cgroup.clone_children
/sys/fs/cgroup/systemd/lxc/s/init.scope/notify_on_release

So maybe the successful mkdir() is wrong, and it never actually created the new cgroup? I see the same in systemd-cgls, there's never any other cgroup than the above.
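A rough way to spot such "half-created" cgroups is to walk the tree and report every directory that has no cgroup.procs file. This is a sketch under the assumption that a healthy cgroup v1 directory always contains cgroup.procs; the helper name `cgroups_missing_procs` is made up here. On the lxcfs view shown above, the ancestor directories (lxc, lxc/s) would be reported as well, since lxcfs lists them without any files.

```python
import os


def cgroups_missing_procs(root):
    """Return directories under root (inclusive) lacking cgroup.procs."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cgroup.procs" not in filenames:
            missing.append(dirpath)
    return sorted(missing)
```

For example, `cgroups_missing_procs("/sys/fs/cgroup/systemd")` could be run inside the container while it is trying to boot.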

Member

brauner commented Oct 14, 2015

/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/ inside the container on the other hand is fully populated at boot:

[chb@conventiont system.slice]$ pwd
/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice
[chb@conventiont system.slice]$ ls -al
total 0
drwxr-xr-x 20 root root 0 Oct 14 11:44 .
drwxr-xr-x  5 root root 0 Oct 14 11:38 ..
-rw-r--r--  1 root root 0 Oct 14 11:38 cgroup.clone_children
-rw-r--r--  1 root root 0 Oct 14 11:38 cgroup.procs
drwxr-xr-x  2 root root 0 Oct 14 11:38 dev-lxc-console.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 dev-lxc-tty1.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 dev-lxc-tty2.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 dev-lxc-tty3.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 dev-lxc-tty4.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 -.mount
-rw-r--r--  1 root root 0 Oct 14 11:38 notify_on_release
drwxr-xr-x  2 root root 0 Oct 14 11:38 proc-cpuinfo.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 proc-diskstats.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 proc-meminfo.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 proc-stat.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 proc-sys-net.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 proc-sysrq\x2dtrigger.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 proc-uptime.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 sys-devices-virtual-net.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 sys-fs-fuse-connections.mount
drwxr-xr-x  2 root root 0 Oct 14 11:38 system-container\x2dgetty.slice
drwxr-xr-x  2 root root 0 Oct 14 11:44 systemd-journald.service
drwxr-xr-x  2 root root 0 Oct 14 11:38 system-getty.slice
-rw-r--r--  1 root root 0 Oct 14 11:38 tasks

But the systemd-journald.service folder gets constantly created, deleted and recreated. Stracing init reveals:

lstat("/sys/fs/cgroup/cpuacct/lxc/arch_priv/system.slice/systemd-journald.service", 0x7fff7851adc0) = -1 ENOENT (No such file or directory)
rmdir("/sys/fs/cgroup/cpuacct/lxc/arch_priv/system.slice/systemd-journald.service") = -1 ENOENT (No such file or directory)
lstat("/sys/fs/cgroup/blkio/lxc/arch_priv/system.slice/systemd-journald.service", 0x7fff7851adc0) = -1 ENOENT (No such file or directory)
rmdir("/sys/fs/cgroup/blkio/lxc/arch_priv/system.slice/systemd-journald.service") = -1 ENOENT (No such file or directory)
lstat("/sys/fs/cgroup/memory/lxc/arch_priv/system.slice/systemd-journald.service", 0x7fff7851adc0) = -1 ENOENT (No such file or directory)
rmdir("/sys/fs/cgroup/memory/lxc/arch_priv/system.slice/systemd-journald.service") = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sys/fs/cgroup/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
stat("/sys/fs/cgroup/devices/lxc/arch_priv/system.slice", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
mkdir("/sys/fs/cgroup/devices/lxc/arch_priv/system.slice/systemd-journald.service", 0755) = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27
fstat(27, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents(27, /* 0 entries */, 32768)    = 0
close(27)                               = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27
fstat(27, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents(27, /* 0 entries */, 32768)    = 0
close(27)                               = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27
fstat(27, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents(27, /* 0 entries */, 32768)    = 0
close(27)                               = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27
fstat(27, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents(27, /* 0 entries */, 32768)    = 0
close(27)                               = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service/cgroup.procs", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 27
fstat(27, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents(27, /* 0 entries */, 32768)    = 0
close(27)                               = 0
open("/sys/fs/cgroup/devices/lxc/arch_priv/system.slice/systemd-journald.service/devices.allow", O_WRONLY|O_NOCTTY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc8af704ad0) = 251
newfstatat(AT_FDCWD, "/sys/fs/cgroup/systemd", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
open("/sys/fs/cgroup/systemd/lxc/arch_priv/system.slice/systemd-journald.service/cgroup.procs", O_WRONLY|O_NOCTTY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
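
Long traces like this are easier to read once the failing calls are tallied. The sketch below counts failed syscalls by syscall name, path and errno. It assumes lines shaped like the excerpt above (no `[pid N]` prefix, path as the first quoted argument); the names `FAILED_CALL` and `summarize_failures` are illustrative only.

```python
import re
from collections import Counter

# Matches e.g.: open("/path", FLAGS) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(r'^(\w+)\("([^"]+)".*= -1 (E[A-Z]+)')


def summarize_failures(strace_text):
    """Count failed syscalls as (syscall, path, errno) tuples."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts
```

Feeding it a saved trace and printing `summarize_failures(text).most_common(5)` would show which paths fail most often, such as the repeated opens of cgroup.procs here.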

Owner

hallyn commented Oct 15, 2015

This should be fixed with the newest commits to lxcfs and cgmanager. I can now start debian unstable containers as well as ubuntu wily containers with ppa:pitti/systemd (which is 226).

hallyn closed this Oct 15, 2015
