Deadlock between zfs umount and snapentry_expiry #7751

Closed
rohan-puri opened this issue Jul 28, 2018 · 0 comments
rohan-puri commented Jul 28, 2018

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 16.04 |
| Linux Kernel | 4.8.0-36-generic |
| Architecture | x86_64 |
| ZFS Version | v0.7.0-1483_gfb7307b |
| SPL Version | |

Describe the problem you're observing

zfs umount calls zfsctl_destroy(), which takes zfs_snapshot_lock as
WRITER and then calls zfsctl_snapshot_unmount_cancel(). That call waits
for a pending snapentry_expire() task to finish (one is present only
when the snapshot was automounted). snapentry_expire() in turn waits
for zfs_snapshot_lock as READER, so neither side can make progress and
the result is a deadlock.

Describe how to reproduce the problem

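Reproducing the deadlock reliably needs the patch below, which forces the automount expiry delay to 1 second and makes snapentry_expire() sleep for 30 seconds before it unmounts: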
rohan@rohan-vm:~$ cat bug-repro.diff

diff --git a/module/zfs/zfs_ctldir.c b/module/zfs/zfs_ctldir.c
index baa7286..c7e615f 100644
--- a/module/zfs/zfs_ctldir.c
+++ b/module/zfs/zfs_ctldir.c
@@ -331,6 +331,7 @@ snapentry_expire(void *data)
 		return;
 	}
 
+	msleep(1000 * 30);
 	se->se_taskqid = TASKQID_INVALID;
 	(void) zfsctl_snapshot_unmount(se->se_name, MNT_EXPIRE);
 	zfsctl_snapshot_rele(se);
@@ -375,6 +376,7 @@ zfsctl_snapshot_unmount_delay_impl(zfs_snapentry_t *se, int delay)
 		return;
 
 	zfsctl_snapshot_hold(se);
+	delay = 1;
 	se->se_taskqid = taskq_dispatch_delay(system_delay_taskq,
 	    snapentry_expire, se, TQ_SLEEP, ddi_get_lbolt() + delay * HZ);
 }

Then run the following series of commands:

sudo zpool/zpool create pool /dev/sdb
sudo zpool/zpool list
sudo zfs/zfs snapshot pool@snap1
mount | grep pool
cd /pool/.zfs/snapshot/snap1/
cd -
mount | grep pool
sudo zfs/zfs umount pool

After 120 seconds, check the kernel log:
dmesg

Include any warning/errors/backtraces from the system logs

[ 2055.506502] INFO: task spl_delay_taskq:3742 blocked for more than 120 seconds.
[ 2055.506522]       Tainted: P           OE   4.8.0-36-generic #36~16.04.1-Ubuntu
[ 2055.506528] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.506536] spl_delay_taskq D ffff9015b6befbd8     0  3742      2 0x00000000
[ 2055.506558]  ffff9015b6befbd8 ffff9015b6befc10 ffffffffaf60d540 ffff901534380e80
[ 2055.506575]  ffff9015d020cc00 ffff9015b6bf0000 ffffffffc0803700 ffffffffc0803718
[ 2055.506591]  ffff9015b6befc10 ffff9015b6793c00 ffff9015b6befbf0 ffffffffaf091d45
[ 2055.506607] Call Trace:
[ 2055.506667]  [<ffffffffaf091d45>] schedule+0x35/0x80
[ 2055.506679]  [<ffffffffaf0949c3>] rwsem_down_read_failed+0x103/0x160
[ 2055.506707]  [<ffffffffaec3c898>] call_rwsem_down_read_failed+0x18/0x30
[ 2055.506718]  [<ffffffffaf094200>] down_read+0x20/0x40
[ 2055.506939]  [<ffffffffc0572c41>] zfsctl_snapshot_unmount+0x71/0x260 [zfs]
[ 2055.507231]  [<ffffffffc0572e76>] snapentry_expire+0x46/0x100 [zfs]
[ 2055.507265]  [<ffffffffc036dec0>] taskq_thread+0x2f0/0x600 [spl]
[ 2055.507282]  [<ffffffffae8af050>] ? wake_up_q+0x70/0x70
[ 2055.507310]  [<ffffffffc036dbd0>] ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 2055.507320]  [<ffffffffae8a4008>] kthread+0xd8/0xf0
[ 2055.507336]  [<ffffffffaf09679f>] ret_from_fork+0x1f/0x40
[ 2055.507346]  [<ffffffffae8a3f30>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2055.507388] INFO: task umount:4163 blocked for more than 120 seconds.
[ 2055.507400]       Tainted: P           OE   4.8.0-36-generic #36~16.04.1-Ubuntu
[ 2055.507405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.507412] umount          D ffff9015a6e0bd38     0  4163   4149 0x00000000
[ 2055.507429]  ffff9015a6e0bd38 0000000000000286 ffff90161ab73a00 ffff9015d032ab80
[ 2055.507446]  0000000000000000 ffff9015a6e0c000 fffffffffffffff0 0000000000000286
[ 2055.507462]  0000000000000014 ffff9015d020ccd8 ffff9015a6e0bd50 ffffffffaf091d45
[ 2055.507477] Call Trace:
[ 2055.507490]  [<ffffffffaf091d45>] schedule+0x35/0x80
[ 2055.507511]  [<ffffffffc036ed9e>] taskq_wait_id+0x7e/0xf0 [spl]
[ 2055.507521]  [<ffffffffae8c74d0>] ? wake_atomic_t_function+0x60/0x60
[ 2055.507542]  [<ffffffffc036ef0e>] taskq_cancel_id+0xfe/0x180 [spl]
[ 2055.507738]  [<ffffffffc0571859>] zfsctl_snapshot_unmount_cancel+0x29/0x70 [zfs]
[ 2055.507929]  [<ffffffffc05720e1>] zfsctl_destroy+0x71/0x100 [zfs]
[ 2055.508189]  [<ffffffffc058ce5e>] zfs_preumount+0x1e/0x60 [zfs]
[ 2055.508399]  [<ffffffffc05c7722>] zpl_kill_sb+0x12/0x20 [zfs]
[ 2055.508433]  [<ffffffffaea35a63>] deactivate_locked_super+0x43/0x70
[ 2055.508442]  [<ffffffffaea35f3c>] deactivate_super+0x5c/0x60
[ 2055.508452]  [<ffffffffaea553ef>] cleanup_mnt+0x3f/0x90
[ 2055.508461]  [<ffffffffaea55482>] __cleanup_mnt+0x12/0x20
[ 2055.508470]  [<ffffffffae8a235e>] task_work_run+0x7e/0xa0
[ 2055.508481]  [<ffffffffae8032d2>] exit_to_usermode_loop+0xc2/0xd0
[ 2055.508492]  [<ffffffffae803b3e>] syscall_return_slowpath+0x4e/0x60
[ 2055.508503]  [<ffffffffaf0965fe>] entry_SYSCALL_64_fastpath+0xa6/0xa8
rohan-puri added a commit to rohan-puri/zfs that referenced this issue Aug 1, 2018
zfs umount calls zfsctl_destroy(), which takes zfs_snapshot_lock as
WRITER and then calls zfsctl_snapshot_unmount_cancel(). That call waits
for a pending snapentry_expire() task to finish (one is present only
when the snapshot was automounted). snapentry_expire() in turn waits
for zfs_snapshot_lock as READER, resulting in a deadlock.

The fix: in zfsctl_destroy(), do the avl_tree lookup and removal while
holding zfs_snapshot_lock as WRITER, then drop the lock before calling
zfsctl_snapshot_unmount_cancel(). This is safe because the validity of
se is protected by se->se_refcount.
Also remove the corresponding lock assertion from
zfsctl_snapshot_unmount_cancel().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rohan Puri <rohan.puri15@gmail.com>
Closes openzfs#7751