userns doesn't work on ubuntu-xenial #769

Closed
tonistiigi opened this Issue Apr 21, 2016 · 16 comments

Contributor

tonistiigi commented Apr 21, 2016

from master:

rootfs_linux.go:53: mounting "/dev/mqueue" to rootfs "/home/vagrant/a/rootfs" caused "device or resource busy"

from docker v1.11 (slightly older runc):

docker: Error response from daemon: rpc error: code = 2 desc = "oci runtime error: could not synchronise with container process: device or resource busy".

root@vagrant-VirtualBox:/home/vagrant/a# uname -a
Linux vagrant-VirtualBox 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

iavael commented Apr 24, 2016

Same problem on my host with docker 1.10.3-cs2 after upgrading to xenial (4.4.0-21-generic).

mount(2) of mqueue returns EPERM with --ipc=host and EBUSY without.

Contributor

mrunalp commented Apr 26, 2016

I suspect a bug in the Xenial kernel as there is no such issue on Fedora 23 with 4.4.6 kernel or rawhide that tracks upstream kernel 4.6.0.rc*.

Member

cyphar commented Apr 26, 2016

I agree that it's almost certainly a Xenial-specific kernel bug. Tumbleweed doesn't have this problem with its stock kernel (4.5.x) and I compiled 4.6-rc5 yesterday to play with cgroup namespaces and that didn't have this problem either.

iavael commented May 1, 2016

I can start lxd userns containers on xenial and /dev/mqueue is mounted inside them successfully, so somehow this bug is specific to runc:

[pid 23026] mount("/dev/mqueue", "/proc/self/fd/12", 0x7ffda1a1b847, MS_BIND|MS_REC, NULL) = 0

iavael commented May 1, 2016

And this is what I get in docker:

without --ipc=host

[pid 24982] mount("mqueue", "/var/lib/docker/231072.231072/overlay/b3b009feb3bc6bfa609e0f550546a0eefc275ca0cc27fd3861526fdef37d1687/merged/dev/mqueue", "mqueue", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EBUSY (Device or resource busy)

and with --ipc=host

[pid 25130] mount("mqueue", "/var/lib/docker/231072.231072/overlay/04c0de3fdc0fc10d20f596d34a8f8eb890498b35cf303677007d8341322e0c2f/merged/dev/mqueue", "mqueue", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EPERM (Operation not permitted)
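
For anyone comparing traces like the two above, here is a minimal Python sketch that pulls the arguments and the errno name out of strace mount lines. It only assumes the line shape quoted in this thread; `MOUNT_RE`, `MountCall` and `parse_mount` are hypothetical names, not part of runc, docker or strace, and the paths in the usage note are shortened placeholders.

```python
import re
from typing import List, NamedTuple, Optional

# Matches the shape of the strace lines quoted in this thread, e.g.
# [pid 1] mount("mqueue", "/path", "mqueue", FLAGS, NULL) = -1 EBUSY (...)
# The third argument may be a quoted string, a raw pointer (0x...) or NULL.
MOUNT_RE = re.compile(
    r'mount\("(?P<source>[^"]*)", "(?P<target>[^"]*)", '
    r'(?P<fstype>"[^"]*"|0x[0-9a-f]+|NULL), '
    r'(?P<flags>[^,]+), .*\) = (?P<ret>-?\d+)(?: (?P<errno>E[A-Z]+))?'
)


class MountCall(NamedTuple):
    source: str
    target: str
    fstype: str
    flags: List[str]
    errno: Optional[str]  # None when the call returned without an error name


def parse_mount(line: str) -> Optional[MountCall]:
    """Parse one strace line; return None if it is not a mount() call."""
    m = MOUNT_RE.search(line)
    if m is None:
        return None
    return MountCall(
        source=m["source"],
        target=m["target"],
        fstype=m["fstype"].strip('"'),
        flags=m["flags"].split("|"),
        errno=m["errno"],
    )
```

Feeding it a line such as `mount("mqueue", "/rootfs/dev/mqueue", "mqueue", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EBUSY (Device or resource busy)` gives a `MountCall` whose `errno` field is `"EBUSY"`, which makes it easy to filter a long trace down to the failing mounts.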
Contributor

mrunalp commented May 2, 2016

@iavael Can you reproduce this just with runc and no overlay rootfs (to rule out overlayfs)?

iavael commented May 2, 2016

Did it with runc 0.1.1 from github releases

# strace -qqyff -e trace=mount ./runc-amd64 start test
[pid 24259] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24260, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 24262] mount("", "/", 0xc8200b4fae, MS_REC|MS_PRIVATE, NULL) = 0
[pid 24262] mount("/tmp/runc-test/rootfs", "/tmp/runc-test/rootfs", 0xc8200b5a28, MS_BIND|MS_REC, NULL) = 0
[pid 24262] mount("proc", "/tmp/runc-test/rootfs/proc", "proc", 0, NULL) = 0
[pid 24262] mount("tmpfs", "/tmp/runc-test/rootfs/dev", "tmpfs", MS_NOSUID|MS_STRICTATIME, "mode=755,size=65536k") = 0
[pid 24262] mount("devpts", "/tmp/runc-test/rootfs/dev/pts", "devpts", MS_NOSUID|MS_NOEXEC, "newinstance,ptmxmode=0666,mode=0"...) = 0
[pid 24262] mount("shm", "/tmp/runc-test/rootfs/dev/shm", "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, "mode=1777,size=65536k") = 0
[pid 24262] mount("mqueue", "/tmp/runc-test/rootfs/dev/mqueue", "mqueue", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EBUSY (Device or resource busy)
[pid 24262] mount("mqueue", "/tmp/runc-test/rootfs/dev/mqueue", "mqueue", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EBUSY (Device or resource busy)
[pid 24259] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24262, si_uid=65535, si_status=1, si_utime=0, si_stime=0} ---
could not synchronise with container process: device or resource busy
#

Tested with an exported docker busybox image, chown-ed to uid/gid +65535, and the default config.json with userns and uid/gid mapping sections added.
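
A rough Python sketch of the config change described here, for readers who want to reproduce the setup. It assumes the OCI runtime-spec field names `linux.namespaces`, `linux.uidMappings` and `linux.gidMappings` with `containerID`/`hostID`/`size` keys; check them against the spec version your runc uses. The helper name `add_userns` is hypothetical, and the mapping size is a guess, since only the 65535 offset is mentioned in this thread.

```python
import copy


def add_userns(config: dict, host_id: int, size: int) -> dict:
    """Return a copy of an OCI config with a user namespace and ID mappings.

    Container ID 0 is mapped to `host_id` on the host; `size` is the number
    of consecutive IDs mapped (an assumption, not stated in the thread).
    """
    cfg = copy.deepcopy(config)
    linux = cfg.setdefault("linux", {})
    namespaces = linux.setdefault("namespaces", [])
    # Add the user namespace only if it is not already requested.
    if not any(ns.get("type") == "user" for ns in namespaces):
        namespaces.append({"type": "user"})
    mapping = {"containerID": 0, "hostID": host_id, "size": size}
    linux["uidMappings"] = [dict(mapping)]
    linux["gidMappings"] = [dict(mapping)]
    return cfg
```

For example, `add_userns(json.load(open("config.json")), 65535, 65536)` would produce a config whose container root maps to host uid/gid 65535, matching the `si_uid=65535` seen in the trace above.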

runc-test.txz.zip

iavael commented May 2, 2016

Changing the mqueue mount section to

{
        "destination": "/dev/mqueue",
        "type": "bind",
        "source": "/dev/mqueue",
        "options": [
                "rbind"
        ]
},

resolves (or, more likely, kludges around) the problem in runc, at the cost of POSIX message queue isolation.
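
The same edit can be scripted. This is a small Python sketch that swaps whatever mount targets /dev/mqueue for the bind mount shown above; `bind_mount_mqueue` and `BIND_MQUEUE` are hypothetical names, and the trade-off is unchanged: the container shares the host's message queues.

```python
import copy

# The replacement entry, copied from the JSON fragment in this comment.
BIND_MQUEUE = {
    "destination": "/dev/mqueue",
    "type": "bind",
    "source": "/dev/mqueue",
    "options": ["rbind"],
}


def bind_mount_mqueue(config: dict) -> dict:
    """Return a copy of the config with the /dev/mqueue mount replaced."""
    cfg = copy.deepcopy(config)
    cfg["mounts"] = [
        copy.deepcopy(BIND_MQUEUE) if m.get("destination") == "/dev/mqueue" else m
        for m in cfg.get("mounts", [])
    ]
    return cfg
```

Other mounts are left untouched, and a config without a /dev/mqueue entry comes back unchanged.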

Any news on that?

Member

cyphar commented May 13, 2016

@mrunalp Yeah, it looks like the problems with /proc they mentioned weren't actually fixed in that commit -- so we fail with -EBUSY when setting up the container's mounts.

CRTX commented Aug 11, 2016

It's been quite a few months. No update on this?

iavael commented Aug 12, 2016

@CRTX The bug was fixed in the linux-image-4.4.0-25-generic kernel.
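
A quick way to check whether a host still runs a kernel older than that one, sketched in Python. It assumes Ubuntu-style release strings such as `4.4.0-21-generic`, and it assumes every 4.4.0 build below -25 is affected; this thread only confirms -21 as broken and -25 as fixed. `predates_fix` and `FIXED_ABI` are hypothetical names.

```python
import re

FIXED_ABI = 25  # per this thread: fixed in linux-image-4.4.0-25-generic


def predates_fix(release):
    """Return True/False for Ubuntu 4.4.0-N kernels, None for anything else."""
    m = re.fullmatch(r"4\.4\.0-(\d+)-\w+", release)
    if m is None:
        return None  # not a kernel this check knows how to judge
    return int(m.group(1)) < FIXED_ABI
```

Calling `predates_fix(platform.release())` on the machine from the original report (`4.4.0-21-generic`) returns `True`; other kernel series return `None` rather than a guess.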

Contributor

hqhq commented Aug 12, 2016

@iavael Can you add some links so we can close this issue?

iavael commented Aug 12, 2016

Contributor

hqhq commented Aug 12, 2016

@iavael Thanks. docker/docker#22633 is still open, but I think that is a different issue, so I'm closing this now.

hqhq closed this Aug 12, 2016

stefanberger pushed a commit to stefanberger/runc that referenced this issue Sep 8, 2017

Merge pull request #769 from wking/require-syscall-names
config-linux: Require at least one entry in linux.seccomp.sycalls[].names