starting any container with umask 007 causes lxc-stop to hang and prevents clean shutdown of host system #1403

Open
foresto opened this issue Jan 30, 2017 · 5 comments
Labels: Bug (Confirmed to be a bug)

foresto commented Jan 30, 2017

[copied from launchpad bug 1642767...]

If I have umask 007 (or any other value that masks the world-execute bit) when I run lxc-start for the first time after logging in, my host system enters a state with the following problems:

  • lxc-stop hangs forever instead of stopping any container, even one that wasn't started with umask 007.
  • lxc-stop --kill --nolock hangs in the same way.
  • Attempts to reboot or shut down the host system fail, requiring a hard reset to recover.
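
The triggering condition described above (a umask that masks the world-execute bit, octal 001) can be checked with a small shell test. This is only a sketch: `masks_world_exec` is a hypothetical helper name, not part of LXC, and the sample values are 007 from this report, 027 and 077 from the comments further down, and the common default 022 for contrast.

```shell
# Hypothetical helper, not part of LXC: succeeds when the given octal umask
# masks the world-execute bit (octal 001).
masks_world_exec() {
    # The leading 0 makes the shell parse the argument as an octal constant.
    [ $(( 0$1 & 1 )) -ne 0 ]
}

# Check a few sample umask values.
for m in 007 027 077 022; do
    if masks_world_exec "$m"; then
        echo "umask $m masks world-execute"
    else
        echo "umask $m does not mask world-execute"
    fi
done
```

To test the current session, pass the shell's own value: `masks_world_exec "$(umask)"`.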

When lxc-stop hangs, messages like these appear in syslog every couple of minutes:

Nov 17 01:22:11 hostbox kernel: [ 3360.091624] INFO: task systemd:12179 blocked for more than 120 seconds.
Nov 17 01:22:11 hostbox kernel: [ 3360.091629] Tainted: P OE 4.4.0-47-generic #68-Ubuntu
Nov 17 01:22:11 hostbox kernel: [ 3360.091631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 17 01:22:11 hostbox kernel: [ 3360.091633] systemd D ffff8800c6febb58 0 12179 12168 0x00000104
Nov 17 01:22:11 hostbox kernel: [ 3360.091638] ffff8800c6febb58 ffff8800d318d280 ffff88040c649b80 ffff8800d318d280
Nov 17 01:22:11 hostbox kernel: [ 3360.091641] ffff8800c6fec000 ffff8800345bc088 ffff8800345bc070 ffffffff00000000
Nov 17 01:22:11 hostbox kernel: [ 3360.091644] fffffffe00000001 ffff8800c6febb70 ffffffff81830f15 ffff8800d318d280
Nov 17 01:22:11 hostbox kernel: [ 3360.091647] Call Trace:
Nov 17 01:22:11 hostbox kernel: [ 3360.091653] [<ffffffff81830f15>] schedule+0x35/0x80
Nov 17 01:22:11 hostbox kernel: [ 3360.091657] [<ffffffff81833b62>] rwsem_down_write_failed+0x202/0x350
Nov 17 01:22:11 hostbox kernel: [ 3360.091662] [<ffffffff812899a0>] ? kernfs_sop_show_options+0x40/0x40
Nov 17 01:22:11 hostbox kernel: [ 3360.091666] [<ffffffff81403fa3>] call_rwsem_down_write_failed+0x13/0x20
Nov 17 01:22:11 hostbox kernel: [ 3360.091669] [<ffffffff8183339d>] ? down_write+0x2d/0x40
Nov 17 01:22:11 hostbox kernel: [ 3360.091672] [<ffffffff812104a0>] grab_super+0x30/0xa0
Nov 17 01:22:11 hostbox kernel: [ 3360.091674] [<ffffffff81210a32>] sget_userns+0x152/0x450
Nov 17 01:22:11 hostbox kernel: [ 3360.091677] [<ffffffff81289a20>] ? kernfs_sop_show_path+0x50/0x50
Nov 17 01:22:11 hostbox kernel: [ 3360.091680] [<ffffffff81289c8e>] kernfs_mount_ns+0x7e/0x230
Nov 17 01:22:11 hostbox kernel: [ 3360.091685] [<ffffffff811187ab>] cgroup_mount+0x2eb/0x7f0
Nov 17 01:22:11 hostbox kernel: [ 3360.091687] [<ffffffff81211af8>] mount_fs+0x38/0x160
Nov 17 01:22:11 hostbox kernel: [ 3360.091691] [<ffffffff8122db57>] vfs_kern_mount+0x67/0x110
Nov 17 01:22:11 hostbox kernel: [ 3360.091694] [<ffffffff81230329>] do_mount+0x269/0xde0
Nov 17 01:22:11 hostbox kernel: [ 3360.091698] [<ffffffff812311cf>] SyS_mount+0x9f/0x100
Nov 17 01:22:11 hostbox kernel: [ 3360.091701] [<ffffffff81834ff2>] entry_SYSCALL_64_fastpath+0x16/0x71

When system shutdown hangs, similar messages appear on the console every couple of minutes.
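
The blocked task in the trace above is shown in state D (uninterruptible sleep). One way to look for other tasks stuck in that state, without waiting for the next hung-task message, is to filter `ps` output on its STAT column. This is a minimal sketch; `list_blocked_tasks` is a hypothetical helper name, and a task that is only briefly in state D (for example during normal disk I/O) will also show up.

```shell
# Hypothetical helper: reads "pid stat comm" rows on stdin and keeps the rows
# whose STAT column starts with "D" (uninterruptible sleep).
list_blocked_tasks() {
    awk '$2 ~ /^D/'
}

# Usage (the header row is dropped because "STAT" does not start with "D"):
ps -eo pid,stat,comm | list_blocked_tasks
```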

I can reproduce this at will with a freshly-installed and fully-updated host OS in VirtualBox, and with either an old-ish container or a new one.

I'm running lxc 2.0.5-0ubuntu1~ubuntu16.04.2 on xubuntu 16.04.1 LTS amd64.

My containers are all unprivileged.

My umask at container creation time does not seem to matter. As far as I have seen, my umask only matters the first time I start a container in my login session.

I can work around the bug by manually setting my umask to something more permissive before I start my first container of the day, and then setting it back again, but that's rather a hassle. (Even worse, it's very easy to forget this workaround and be left with containers that can't be stopped and a host system that won't shut down cleanly.)
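
The workaround above can be made less error-prone by running the first start in a subshell with a relaxed umask, so the login shell's umask is never changed and does not need to be restored. This is a sketch under assumptions: `with_umask` is a hypothetical helper, `mycontainer` is a placeholder name, and it has not been verified here that starting the container this way avoids the hang in every setup.

```shell
# Hypothetical helper: run a command with the given umask in a subshell,
# leaving the calling shell's umask untouched.
with_umask() {
    mask="$1"
    shift
    ( umask "$mask" && exec "$@" )
}

# Example usage (placeholder container name):
#   with_umask 022 lxc-start -n mycontainer
```

Because the umask change happens inside the parentheses, it is discarded when the subshell exits, which removes the "set it back again" step.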

ogai commented Apr 28, 2017

I can also confirm this with umask 077.

ivan commented Dec 17, 2017

I am seeing the same thing with LXC 2.0.7 on Debian stretch + Linux 4.14.6, umask 027:

kernel: INFO: task systemd:21698 blocked for more than 120 seconds.
kernel:       Tainted: G           O    4.14.0-15-amd64 #1 Debian 4.14.6-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: systemd         D    0 21698  21690 0x00000100
kernel: Call Trace:
kernel:  __schedule+0x3cc/0x850
kernel:  schedule+0x36/0x80
kernel:  rwsem_down_write_failed+0x220/0x400
kernel:  call_rwsem_down_write_failed+0x17/0x30
kernel:  ? up+0x32/0x50
kernel:  ? call_rwsem_down_write_failed+0x17/0x30
kernel:  down_write+0x2d/0x40
kernel:  grab_super+0x30/0x90
kernel:  ? kernfs_sop_show_options+0x50/0x50
kernel:  sget_userns+0x16a/0x4b0
kernel:  ? kernfs_sop_show_path+0x60/0x60
kernel:  kernfs_mount_ns+0x7e/0x240
kernel:  cgroup_do_mount+0x36/0x120
kernel:  cgroup1_mount+0x3e1/0x490
kernel:  cgroup_mount+0xa3/0x380
kernel:  ? alloc_pages_current+0x6a/0xd0
kernel:  mount_fs+0x38/0x140
kernel:  vfs_kern_mount+0x67/0x130
kernel:  do_mount+0x202/0xd10
kernel:  ? __check_object_size+0xb3/0x190
kernel:  ? copy_mount_options+0x2c/0x220
kernel:  SyS_mount+0x83/0xd0
kernel:  do_syscall_64+0x80/0x120
kernel:  entry_SYSCALL64_slow_path+0x25/0x25
kernel: RIP: 0033:0x7f94df86027a
kernel: RSP: 002b:00007ffda4fcab58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
kernel: RAX: ffffffffffffffda RBX: 0000555fa09a5e48 RCX: 00007f94df86027a
kernel: RDX: 0000555fa095ee26 RSI: 0000555fa09621f3 RDI: 0000555fa095ee26
kernel: RBP: 0000000000000000 R08: 0000555fa0962222 R09: 0000000000000000
kernel: R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000
kernel: R13: 0000555fa09a5ef0 R14: 0000000000000000 R15: 0000000000000000

ShellCode33 commented Feb 20, 2018

Bump: same problem here on Debian Stretch, kernel 4.9.0-5-amd64.

brauner (Member) commented Feb 20, 2018 via email

brauner self-assigned this Feb 20, 2018

geez0x1 commented May 23, 2019

I'm running with umask 027 (set in my user's .bashrc and inherited by my root session after using su), and can report that this issue still occurs for unprivileged containers on Debian 9 (stretch, stable) with LXC 2.0.7-2+deb9u2 and kernel 4.19.0-0.bpo.4-amd64. I can also confirm that the umask at container creation time does not matter. Another symptom is that once the issue has occurred, sync (both the utility and the system call) and update-initramfs (which calls sync) start hanging.

The original author's comments on the Launchpad bug, as well as this thread, discuss a few possible reasons why this occurs. Also, is #2277 perhaps related?

It took me a long time to track this down to the umask, because the symptoms are bewildering. My original question:
https://superuser.com/questions/1439108/lxc-start-stop-hangs-and-filesystem-sync-hangs/1440273
