LXC container won't start after systemd upgrade. #1554

Closed
linas opened this Issue May 12, 2017 · 13 comments

linas commented May 12, 2017

I've been successfully using a dozen userland LXC containers for years. The host OS is Debian unstable. The containers run various flavors of Ubuntu. Last night, I apt-get upgraded two of the Ubuntu Xenial 16.04 containers, and after that, neither boots. To debug, I see the following:

$ lxc-start -F -l info -n opencog-learn 
systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Ubuntu 16.04.2 LTS!

Set hostname to <opencog-learn>.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Failed to install release agent, ignoring: No such file or directory
Failed to create /init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object, freezing.
Freezing execution.

Scrolling back through the apt-get upgrade output (the upgrade was done in the container, not on the host), I see that systemd-sysv and udev were both upgraded, and that during the install I got this message:

Setting up systemd (229-4ubuntu17) ...
addgroup: The group `systemd-journal' already exists as a system group.
Exiting.
Failed to set capabilities on file `/usr/bin/systemd-detect-virt'
(Invalid argument)
The value of the capability argument is not permitted for a file. Or the
file is not a regular (non-symlink) file

I assume that this failed install is the root cause of the non-bootable container. Am debugging more and will post news as I find it.

Member

brauner commented May 12, 2017

Was there an update to Debian Unstable's systemd?

linas commented May 12, 2017

No, the update was to systemd inside the container (Ubuntu Xenial); I did NOT upgrade the host.

Member

brauner commented May 12, 2017

What happens when you kill and restart the container? Same thing?

Member

brauner commented May 12, 2017

Ok, can you show the info that we request in the issue template, please? Especially

  • cat /proc/self/cgroup
  • cat /proc/self/mountinfo

linas commented May 12, 2017

fuuuuu ... closing; user error. I failed to perform the magic incantation in advance:

sudo cgm create all linas
sudo cgm chown all linas $(id -u) $(id -g)
cgm movepid all linas $$

I forget exactly why I have to do the above: it's a workaround for some patch(es) that have not yet hit my host distro (Debian unstable).

My apologies for the noise.

@linas linas closed this May 12, 2017

Member

brauner commented May 12, 2017

All good. :)

Contributor

evgeni commented May 12, 2017

What patches do you miss in Debian? (Debian maintainer here ;))

Contributor

jannic commented Jul 12, 2017

It looks like I have the same issue, but the solution proposed by @linas doesn't work for me:
The host is Debian Stretch, the guest is Debian Sid. After upgrading systemd in the guest from 232-25 to 233-10, the guest is unable to start, with exactly the same error message as in the original report.

Trying the cgm create gives another error message, so the proposed fix doesn't work:
Failed opening dbus connection: org.freedesktop.DBus.Error.FileNotFound: Failed to connect to socket /sys/fs/cgroup/cgmanager/sock: No such file or directory.

This is probably caused by cgmanager not running, but service cgmanager start doesn't work, either:

cgmanager: Error mounting unified: No such file or directory
cgmanager: failed to collect cgroup subsystems

This may be related to lxc/cgmanager#32, where @hallyn wrote: "If you are running systemd you should not need cgmanager."

Please note that I'm not implying there is a bug in lxc. I'm just adding my comment to this ticket, because it seems to be the same or a closely related issue. Any pointers to the core issue are very welcome!

Contributor

jannic commented Jul 12, 2017

As a workaround, passing the command line /sbin/init systemd.legacy_systemd_cgroup_controller to lxc-start works with systemd 233.
It looks like this is a side effect of the changes in systemd 233 (see the NEWS file at https://github.com/systemd/systemd/blob/v233/NEWS).

systemd/systemd#3388 (comment) may be related?
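
For anyone wanting to try this: lxc-start accepts a replacement init command after the container options, so the invocation would look something like the sketch below (the container name "mycontainer" is a placeholder; adjust for your setup):

# Sketch: boot the container's init with the legacy-cgroup flag
lxc-start -F -n mycontainer /sbin/init systemd.legacy_systemd_cgroup_controller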

jjb2016 commented Jul 20, 2017

I have just come across this issue and initially posted it on the Arch forum here ...
https://bbs.archlinux.org/viewtopic.php?pid=1725626#p1725626

And also reported it on systemd github here ...
systemd/systemd#6408 (comment)

Am I right to think that this will be fixed by the systemd guys? From what I've read, it seems like a change in systemd caused this.

saivert commented Oct 2, 2017

So as far as I can tell, we will just have to keep booting Arch Linux with the kernel arg "systemd.unified_cgroup_hierarchy=false" until the LXC developers have started using the new hybrid cgroup support and dropped the use of cgmanager (which the current Arch Linux packages still depend on).
I'm running Arch Linux as the host, btw, which means containers do work even with hybrid cgroups, but I can't start the lxcfs service because cgmanager doesn't work with hybrid cgroups. lxc itself seems to no longer use cgmanager.
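
On a GRUB-based host, the kernel argument can be made persistent with something like the following (a sketch; paths assume a standard Arch GRUB install, and your existing GRUB_CMDLINE_LINUX_DEFAULT contents will differ):

# /etc/default/grub: append the flag to the default kernel cmdline
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=false"

# then regenerate grub.cfg and reboot
grub-mkconfig -o /boot/grub/grub.cfg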

Member

brauner commented Oct 2, 2017

@saivert, LXC 2.1 comes with support for the standard systemd hybrid cgroup layout. However, Arch Linux should kick cgmanager; we marked that project as deprecated a long time ago. Instead they should switch to the PAM module which ships with LXCFS and which creates writable cgroups for unprivileged users on login. This is definitely required for hybrid cgroup support!
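
For reference, enabling the LXCFS PAM module is typically a one-line addition to the PAM session stack (a sketch; the file to edit and the controller list vary by distribution, e.g. /etc/pam.d/system-login on Arch or common-session on Debian):

# pam_cgfs ships with LXCFS and creates writable cgroups at login
session optional pam_cgfs.so -c freezer,memory,name=systemd,unified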

saivert commented Oct 2, 2017

I should look into the issue with the current Arch Linux packages then. I will try building this from source first and then suggest fixes. The lxc package also currently depends on the cgmanager package, so I will have to build lxc from source too.
