starting any container with umask 007 causes lxc-stop to hang and prevents clean shutdown of host system #1403

foresto · 2017-01-30T22:50:29Z

If I have umask 007 (or any other value that masks the world-execute bit) when I run lxc-start for the first time after logging in, my host system enters a state with the following problems:

lxc-stop hangs forever instead of stopping any container, even one that wasn't started with umask 007.
lxc-stop --kill --nolock hangs in the same way.
Attempts to reboot or shut down the host system fail, requiring a hard reset to recover.
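
To check whether a given umask falls into the affected category, a minimal sketch follows. It assumes the trigger is exactly the condition described above, namely that the world-execute bit (octal 001) is masked. The function name masks_world_execute is made up for illustration.

```shell
# Return success (0) if the given umask clears the world-execute bit (octal 001).
# Assumes the argument consists of octal digits, as printed by the `umask` builtin.
masks_world_execute() {
    # The leading 0 makes shell arithmetic read the digits as octal.
    [ $(( 0$1 & 1 )) -ne 0 ]
}

# Report on the umask of the current shell.
if masks_world_execute "$(umask)"; then
    echo "umask $(umask) masks the world-execute bit"
else
    echo "umask $(umask) leaves the world-execute bit alone"
fi
```

By this test, the values 007, 027 and 077 all mask the bit, while the common default 022 does not.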

When lxc-stop hangs, messages like these appear in syslog every couple of minutes:

Nov 17 01:22:11 hostbox kernel: [ 3360.091624] INFO: task systemd:12179 blocked for more than 120 seconds.
Nov 17 01:22:11 hostbox kernel: [ 3360.091629] Tainted: P OE 4.4.0-47-generic #68-Ubuntu
Nov 17 01:22:11 hostbox kernel: [ 3360.091631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 17 01:22:11 hostbox kernel: [ 3360.091633] systemd D ffff8800c6febb58 0 12179 12168 0x00000104
Nov 17 01:22:11 hostbox kernel: [ 3360.091638] ffff8800c6febb58 ffff8800d318d280 ffff88040c649b80 ffff8800d318d280
Nov 17 01:22:11 hostbox kernel: [ 3360.091641] ffff8800c6fec000 ffff8800345bc088 ffff8800345bc070 ffffffff00000000
Nov 17 01:22:11 hostbox kernel: [ 3360.091644] fffffffe00000001 ffff8800c6febb70 ffffffff81830f15 ffff8800d318d280
Nov 17 01:22:11 hostbox kernel: [ 3360.091647] Call Trace:
Nov 17 01:22:11 hostbox kernel: [ 3360.091653] [<ffffffff81830f15>] schedule+0x35/0x80
Nov 17 01:22:11 hostbox kernel: [ 3360.091657] [<ffffffff81833b62>] rwsem_down_write_failed+0x202/0x350
Nov 17 01:22:11 hostbox kernel: [ 3360.091662] [<ffffffff812899a0>] ? kernfs_sop_show_options+0x40/0x40
Nov 17 01:22:11 hostbox kernel: [ 3360.091666] [<ffffffff81403fa3>] call_rwsem_down_write_failed+0x13/0x20
Nov 17 01:22:11 hostbox kernel: [ 3360.091669] [<ffffffff8183339d>] ? down_write+0x2d/0x40
Nov 17 01:22:11 hostbox kernel: [ 3360.091672] [<ffffffff812104a0>] grab_super+0x30/0xa0
Nov 17 01:22:11 hostbox kernel: [ 3360.091674] [<ffffffff81210a32>] sget_userns+0x152/0x450
Nov 17 01:22:11 hostbox kernel: [ 3360.091677] [<ffffffff81289a20>] ? kernfs_sop_show_path+0x50/0x50
Nov 17 01:22:11 hostbox kernel: [ 3360.091680] [<ffffffff81289c8e>] kernfs_mount_ns+0x7e/0x230
Nov 17 01:22:11 hostbox kernel: [ 3360.091685] [<ffffffff811187ab>] cgroup_mount+0x2eb/0x7f0
Nov 17 01:22:11 hostbox kernel: [ 3360.091687] [<ffffffff81211af8>] mount_fs+0x38/0x160
Nov 17 01:22:11 hostbox kernel: [ 3360.091691] [<ffffffff8122db57>] vfs_kern_mount+0x67/0x110
Nov 17 01:22:11 hostbox kernel: [ 3360.091694] [<ffffffff81230329>] do_mount+0x269/0xde0
Nov 17 01:22:11 hostbox kernel: [ 3360.091698] [<ffffffff812311cf>] SyS_mount+0x9f/0x100
Nov 17 01:22:11 hostbox kernel: [ 3360.091701] [<ffffffff81834ff2>] entry_SYSCALL_64_fastpath+0x16/0x71

When system shutdown hangs, similar messages appear on the console every couple of minutes.

I can reproduce this at will with a freshly-installed and fully-updated host OS in VirtualBox, and with either an old-ish container or a new one.

I'm running lxc 2.0.5-0ubuntu1~ubuntu16.04.2 on xubuntu 16.04.1 LTS amd64.

My containers are all unprivileged.

My umask at container creation time does not seem to matter. As far as I have seen, my umask only matters the first time I start a container in my login session.

I can work around the bug by manually setting my umask to something more permissive before I start my first container of the day, and then setting it back again, but that's rather a hassle. (Even worse, it's very easy to forget this workaround and be left with containers that can't be stopped and a host system that won't shut down cleanly.)
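
One way to make this workaround harder to forget is to wrap the first start in a helper that changes the umask only for that one command. This is a sketch, not a tested fix: with_umask is a hypothetical helper name, "mycontainer" is a placeholder container name, and 022 is assumed to be permissive enough because it does not mask the world-execute bit.

```shell
# Run a command under a temporary umask without touching the caller's umask.
# The parentheses start a subshell, so the umask change ends with the command.
with_umask() {
    (
        umask "$1" || exit   # apply the temporary umask (first argument)
        shift                # the remaining arguments are the command to run
        "$@"
    )
}

# Hypothetical usage; "mycontainer" is a placeholder name:
#   with_umask 022 lxc-start -n mycontainer
```

Because the change is confined to the subshell, there is no second step of setting the umask back afterwards.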


ogai · 2017-04-28T15:14:29Z

I can also confirm this with umask 077.

ivan · 2017-12-17T09:39:47Z

I am seeing the same thing with LXC 2.0.7 on Debian stretch + Linux 4.14.6, umask 027:

kernel: INFO: task systemd:21698 blocked for more than 120 seconds.
kernel:       Tainted: G           O    4.14.0-15-amd64 #1 Debian 4.14.6-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: systemd         D    0 21698  21690 0x00000100
kernel: Call Trace:
kernel:  __schedule+0x3cc/0x850
kernel:  schedule+0x36/0x80
kernel:  rwsem_down_write_failed+0x220/0x400
kernel:  call_rwsem_down_write_failed+0x17/0x30
kernel:  ? up+0x32/0x50
kernel:  ? call_rwsem_down_write_failed+0x17/0x30
kernel:  down_write+0x2d/0x40
kernel:  grab_super+0x30/0x90
kernel:  ? kernfs_sop_show_options+0x50/0x50
kernel:  sget_userns+0x16a/0x4b0
kernel:  ? kernfs_sop_show_path+0x60/0x60
kernel:  kernfs_mount_ns+0x7e/0x240
kernel:  cgroup_do_mount+0x36/0x120
kernel:  cgroup1_mount+0x3e1/0x490
kernel:  cgroup_mount+0xa3/0x380
kernel:  ? alloc_pages_current+0x6a/0xd0
kernel:  mount_fs+0x38/0x140
kernel:  vfs_kern_mount+0x67/0x130
kernel:  do_mount+0x202/0xd10
kernel:  ? __check_object_size+0xb3/0x190
kernel:  ? copy_mount_options+0x2c/0x220
kernel:  SyS_mount+0x83/0xd0
kernel:  do_syscall_64+0x80/0x120
kernel:  entry_SYSCALL64_slow_path+0x25/0x25
kernel: RIP: 0033:0x7f94df86027a
kernel: RSP: 002b:00007ffda4fcab58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
kernel: RAX: ffffffffffffffda RBX: 0000555fa09a5e48 RCX: 00007f94df86027a
kernel: RDX: 0000555fa095ee26 RSI: 0000555fa09621f3 RDI: 0000555fa095ee26
kernel: RBP: 0000000000000000 R08: 0000555fa0962222 R09: 0000000000000000
kernel: R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000
kernel: R13: 0000555fa09a5ef0 R14: 0000000000000000 R15: 0000000000000000

ShellCode33 · 2018-02-20T22:59:42Z

Up, same problem here on Debian Stretch, Kernel 4.9.0-5-amd64

brauner · 2018-02-20T23:16:51Z

On Tue, Feb 20, 2018 at 10:59:45PM +0000, ShellCode wrote:
> Up, same problem here on Debian Stretch

I have some time planned to debug this in the next few days. Thanks!
Christian

geez0x1 · 2019-05-23T12:21:16Z

I'm running umask 027 (set in .bashrc for my user, which is copied to my root session after using su) and can report that this issue still occurs for unprivileged containers under Debian 9 (stretch, stable) using LXC 2.0.7-2+deb9u2 and kernel 4.19.0-0.bpo.4-amd64. I can also confirm that the umask at container creation time does not matter. Another symptom is that once the hang has occurred, sync (both the utility and the system call) and update-initramfs (which calls sync) also start hanging.
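
The kernel messages in this thread show the hung task in state D (uninterruptible sleep), so listing tasks in that state is one way to see what else is stuck once sync starts hanging. The sketch below is only a diagnostic aid under that assumption; filter_blocked is a hypothetical helper name, and it expects "PID STAT COMMAND" lines on standard input.

```shell
# Print every input line whose second field (the process state) starts with "D",
# i.e. tasks in uninterruptible sleep.
filter_blocked() {
    awk '$2 ~ /^D/ { print }'
}

# Possible usage with procps ps (the trailing "=" suppresses the column headers):
#   ps -eo pid=,stat=,comm= | filter_blocked
```

A task that shows up briefly in state D is normal. It is tasks that stay there, like the ones in the traces above, that indicate the hang.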

The comments on the Launchpad bug filed by the original author of this issue, as well as this thread, discuss a few possible reasons why this occurs. Also, is #2277 perhaps related?

It took me a long time to track this down to the umask, because the symptoms are bewildering. My original issue:
https://superuser.com/questions/1439108/lxc-start-stop-hangs-and-filesystem-sync-hangs/1440273

brauner mentioned this issue Apr 27, 2017

unprivilaged lxc invalid pid for SIGCHLD in Ubuntu 16.04.02 #1496

Closed

ghost mentioned this issue Dec 12, 2017

Avoid Timed out error when umask is 027 or 077 fgrehm/vagrant-lxc#435

Merged

brauner self-assigned this Feb 20, 2018

lpirl mentioned this issue Jul 22, 2019

unprivileged LXC 3 containers do not start with umask 027 or stricter #3100

Closed

stgraber added the "Bug" label (Confirmed to be a bug) Mar 18, 2020
