Direct reclaim can deadlock through recursive ZFS_OBJ_HOLD_ENTER() calls #3331

Closed
ryao opened this issue Apr 22, 2015 · 12 comments

ryao commented Apr 22, 2015

The following is from the buildbot:

[ 3774.649030] VERIFY3(((*(volatile typeof((((&((zsb))->z_hold_mtx[(((z_id)) & (256 - 1))])))->m_owner) *)&((((&((zsb))->z_hold_mtx[(((z_id)) & (256 - 1))])))->m_owner))) != get_current()) failed (ffff880036362dc0 != ffff880036362dc0)
[ 3774.649407] PANIC at zfs_znode.c:1108:zfs_zinactive()
[ 3774.649415] Showing stack for process 32119
[ 3774.649425] CPU: 3 PID: 32119 Comm: filebench Tainted: PF          O 3.11.10-100.fc18.x86_64 #1
[ 3774.649428] Hardware name: Red Hat RHEV Hypervisor, BIOS 0.5.1 01/01/2007
[ 3774.649428]  ffffffffa03a3af8 ffff880047cf2bb8 ffffffff81666676 0000000000000007
[ 3774.649430]  ffffffffa03a3b73 ffff880047cf2bc8 ffffffffa01c73e4 ffff880047cf2d68
[ 3774.649435]  ffffffffa01c761d 0000000000000003 ffff88004b1accc0 0000000000000030
[ 3774.649447] Call Trace:
[ 3774.649457]  [<ffffffff81666676>] dump_stack+0x46/0x58
[ 3774.649465]  [<ffffffffa01c73e4>] spl_dumpstack+0x44/0x50 [spl]
[ 3774.649468]  [<ffffffffa01c761d>] spl_panic+0xbd/0x100 [spl]
[ 3774.649476]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649493]  [<ffffffffa03369d5>] zfs_zinactive+0x1f5/0x240 [zfs]
[ 3774.649538]  [<ffffffffa032fb9c>] zfs_inactive+0x7c/0x430 [zfs]
[ 3774.649546]  [<ffffffffa03506fe>] zpl_evict_inode+0x4e/0xa0 [zfs]
[ 3774.649546]  [<ffffffff811c8e12>] evict+0xa2/0x1a0
[ 3774.649546]  [<ffffffff811c8f4e>] dispose_list+0x3e/0x60
[ 3774.649546]  [<ffffffff811c9cd1>] prune_icache_sb+0x161/0x300
[ 3774.649546]  [<ffffffff811b2e35>] prune_super+0xe5/0x1b0
[ 3774.649546]  [<ffffffff81153771>] shrink_slab+0x151/0x2e0
[ 3774.649546]  [<ffffffff811a9809>] ? vmpressure+0x29/0x90
[ 3774.649546]  [<ffffffff811a97e5>] ? vmpressure+0x5/0x90
[ 3774.649546]  [<ffffffff81156979>] do_try_to_free_pages+0x3e9/0x5a0
[ 3774.649548]  [<ffffffff811527ff>] ? throttle_direct_reclaim.isra.45+0x8f/0x280
[ 3774.649552]  [<ffffffff81156e38>] try_to_free_pages+0xf8/0x180
[ 3774.649556]  [<ffffffff8114ae3a>] __alloc_pages_nodemask+0x6aa/0xae0
[ 3774.649562]  [<ffffffff81189fb8>] alloc_pages_current+0xb8/0x190
[ 3774.649565]  [<ffffffff81193e30>] new_slab+0x2d0/0x3a0
[ 3774.649577]  [<ffffffff81664d2d>] __slab_alloc+0x393/0x560
[ 3774.649579]  [<ffffffffa01c1b30>] ? spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649583]  [<ffffffffa01c1b30>] ? spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649583]  [<ffffffffa01c1b30>] ? spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649585]  [<ffffffff81195230>] kmem_cache_alloc+0x1a0/0x200
[ 3774.649589]  [<ffffffffa01c1b30>] ? spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649594]  [<ffffffffa01c1b30>] spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649596]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649599]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649624]  [<ffffffffa03505c0>] ? zpl_inode_destroy+0x60/0x60 [zfs]
[ 3774.649687]  [<ffffffffa033266f>] zfs_inode_alloc+0x1f/0x40 [zfs]
[ 3774.649687]  [<ffffffffa03505da>] zpl_inode_alloc+0x1a/0x70 [zfs]
[ 3774.649687]  [<ffffffff811c7e16>] alloc_inode+0x26/0xa0
[ 3774.649687]  [<ffffffff811c9e83>] new_inode_pseudo+0x13/0x60
[ 3774.649687]  [<ffffffff811c9eed>] new_inode+0x1d/0x40
[ 3774.649710]  [<ffffffffa0332ac7>] zfs_znode_alloc+0x47/0x730 [zfs]
[ 3774.649770]  [<ffffffffa02c8f4e>] ? sa_build_index+0xbe/0x1b0 [zfs]
[ 3774.649770]  [<ffffffffa02c9775>] ? sa_build_layouts+0x6b5/0xc80 [zfs]
[ 3774.649770]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649794]  [<ffffffffa0333b5e>] zfs_mknode+0x93e/0xe90 [zfs]
[ 3774.649813]  [<ffffffffa032be5b>] zfs_create+0x5db/0x780 [zfs]
[ 3774.649840]  [<ffffffffa0350ba5>] zpl_xattr_set_dir.isra.9+0x245/0x2a0 [zfs]
[ 3774.649843]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649895]  [<ffffffffa0351140>] zpl_xattr_set+0xe0/0x3f0 [zfs]
[ 3774.649895]  [<ffffffffa03516a4>] __zpl_xattr_security_init+0x64/0xb0 [zfs]
[ 3774.649968]  [<ffffffffa0351640>] ? zpl_xattr_trusted_set+0xb0/0xb0 [zfs]
[ 3774.649972]  [<ffffffff812a737c>] security_inode_init_security+0xbc/0xf0
[ 3774.649977]  [<ffffffffa0352028>] zpl_xattr_security_init+0x18/0x20 [zfs]
[ 3774.650017]  [<ffffffffa0350134>] zpl_create+0x154/0x240 [zfs]
[ 3774.650018]  [<ffffffff811bde85>] vfs_create+0xb5/0x120
[ 3774.650018]  [<ffffffff811be874>] do_last+0x984/0xe40
[ 3774.650020]  [<ffffffff811baf55>] ? link_path_walk+0x255/0x880
[ 3774.650023]  [<ffffffff811bedf2>] path_openat+0xc2/0x680
[ 3774.650026]  [<ffffffff811bf653>] do_filp_open+0x43/0xa0
[ 3774.650030]  [<ffffffff811bf615>] ? do_filp_open+0x5/0xa0
[ 3774.650034]  [<ffffffff811ae7fc>] do_sys_open+0x13c/0x230
[ 3774.650037]  [<ffffffff811ae912>] SyS_open+0x22/0x30
[ 3774.650040]  [<ffffffff81675819>] system_call_fastpath+0x16/0x1b

http://buildbot.zfsonlinux.org/builders/fedora-18-x86_64-builder/builds/3066/steps/shell_17/logs/stdio

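`zfs_mknode()` grabbed an object hash mutex via `ZFS_OBJ_HOLD_ENTER()`, tried to allocate a znode with `zfs_znode_alloc()`, and entered direct reclaim. Reclaim evicted an inode through `zfs_zinactive()`, which tried to take the same hash mutex again via `ZFS_OBJ_HOLD_ENTER()`; the `VERIFY3` above caught the recursive acquisition. The fix is in #3332.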

ryao added a commit to ryao/zfs that referenced this issue Apr 22, 2015
…hash mutex

The deadlock shown in the issue description above (the `VERIFY3` panic in `zfs_zinactive()` and its stack trace) occurred on the buildbot.

`zfs_mknode()` grabbed an object hash mutex via ZFS_OBJ_HOLD_ENTER(), tried to
allocate a znode with zfs_znode_alloc(), and entered direct reclaim, which tried
to do ZFS_OBJ_HOLD_ENTER() again. We can fix this by making ZFS_OBJ_HOLD_ENTER()
and ZFS_OBJ_HOLD_EXIT() call spl_fstrans_mark() and spl_fstrans_unmark(),
respectively. We can allocate an array, protected by the z_hold_mtx, to hold
the cookies.

Closes openzfs#3331

Signed-off-by: Richard Yao <ryao@gentoo.org>
ryao added a commit to ryao/zfs that referenced this issue Apr 22, 2015
…hash mutex

The deadlock shown in the issue description above (the `VERIFY3` panic in `zfs_zinactive()` and its stack trace) occurred on the buildbot.

`zfs_mknode()` grabbed an object hash mutex via `ZFS_OBJ_HOLD_ENTER()`,
tried to allocate a znode with `zfs_znode_alloc()`, and entered direct
reclaim, which tried to take the same mutex again via
`ZFS_OBJ_HOLD_ENTER()`. We fix this by making `ZFS_OBJ_HOLD_ENTER()` and
`ZFS_OBJ_HOLD_EXIT()` call `spl_fstrans_mark()` and
`spl_fstrans_unmark()`, respectively. Each superblock gets an array to
hold the returned cookies, and each cookie is protected by the
corresponding `->z_hold_mtx`.

Closes openzfs#3331

Signed-off-by: Richard Yao <ryao@gentoo.org>
ryao added a commit to ryao/zfs that referenced this issue Apr 22, 2015
…hash mutex

The following deadlock occurred on the buildbot:

[ 3774.649030] VERIFY3(((*(volatile typeof((((&((zsb))->z_hold_mtx[(((z_id)) & (256 - 1))])))->m_owner) *)&((((&((zsb))->z_hold_mtx[(((z_id)) & (256 - 1))])))->m_owner))) != get_current()) failed (ffff880036362dc0 != ffff880036362dc0)
[ 3774.649407] PANIC at zfs_znode.c:1108:zfs_zinactive()
[ 3774.649415] Showing stack for process 32119
[ 3774.649425] CPU: 3 PID: 32119 Comm: filebench Tainted: PF          O 3.11.10-100.fc18.x86_64 #1
[ 3774.649428] Hardware name: Red Hat RHEV Hypervisor, BIOS 0.5.1 01/01/2007
[ 3774.649428]  ffffffffa03a3af8 ffff880047cf2bb8 ffffffff81666676 0000000000000007
[ 3774.649430]  ffffffffa03a3b73 ffff880047cf2bc8 ffffffffa01c73e4 ffff880047cf2d68
[ 3774.649435]  ffffffffa01c761d 0000000000000003 ffff88004b1accc0 0000000000000030
[ 3774.649447] Call Trace:
[ 3774.649457]  [<ffffffff81666676>] dump_stack+0x46/0x58
[ 3774.649465]  [<ffffffffa01c73e4>] spl_dumpstack+0x44/0x50 [spl]
[ 3774.649468]  [<ffffffffa01c761d>] spl_panic+0xbd/0x100 [spl]
[ 3774.649476]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649493]  [<ffffffffa03369d5>] zfs_zinactive+0x1f5/0x240 [zfs]
[ 3774.649538]  [<ffffffffa032fb9c>] zfs_inactive+0x7c/0x430 [zfs]
[ 3774.649546]  [<ffffffffa03506fe>] zpl_evict_inode+0x4e/0xa0 [zfs]
[ 3774.649546]  [<ffffffff811c8e12>] evict+0xa2/0x1a0
[ 3774.649546]  [<ffffffff811c8f4e>] dispose_list+0x3e/0x60
[ 3774.649546]  [<ffffffff811c9cd1>] prune_icache_sb+0x161/0x300
[ 3774.649546]  [<ffffffff811b2e35>] prune_super+0xe5/0x1b0
[ 3774.649546]  [<ffffffff81153771>] shrink_slab+0x151/0x2e0
[ 3774.649546]  [<ffffffff811a9809>] ? vmpressure+0x29/0x90
[ 3774.649546]  [<ffffffff811a97e5>] ? vmpressure+0x5/0x90
[ 3774.649546]  [<ffffffff81156979>] do_try_to_free_pages+0x3e9/0x5a0
[ 3774.649548]  [<ffffffff811527ff>] ? throttle_direct_reclaim.isra.45+0x8f/0x280
[ 3774.649552]  [<ffffffff81156e38>] try_to_free_pages+0xf8/0x180
[ 3774.649556]  [<ffffffff8114ae3a>] __alloc_pages_nodemask+0x6aa/0xae0
[ 3774.649562]  [<ffffffff81189fb8>] alloc_pages_current+0xb8/0x190
[ 3774.649565]  [<ffffffff81193e30>] new_slab+0x2d0/0x3a0
[ 3774.649577]  [<ffffffff81664d2d>] __slab_alloc+0x393/0x560
[ 3774.649579]  [<ffffffffa01c1b30>] ? spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649583]  [<ffffffffa01c1b30>] ? spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649583]  [<ffffffffa01c1b30>] ? spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649585]  [<ffffffff81195230>] kmem_cache_alloc+0x1a0/0x200
[ 3774.649589]  [<ffffffffa01c1b30>] ? spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649594]  [<ffffffffa01c1b30>] spl_kmem_cache_alloc+0xb0/0xee0 [spl]
[ 3774.649596]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649599]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649624]  [<ffffffffa03505c0>] ? zpl_inode_destroy+0x60/0x60 [zfs]
[ 3774.649687]  [<ffffffffa033266f>] zfs_inode_alloc+0x1f/0x40 [zfs]
[ 3774.649687]  [<ffffffffa03505da>] zpl_inode_alloc+0x1a/0x70 [zfs]
[ 3774.649687]  [<ffffffff811c7e16>] alloc_inode+0x26/0xa0
[ 3774.649687]  [<ffffffff811c9e83>] new_inode_pseudo+0x13/0x60
[ 3774.649687]  [<ffffffff811c9eed>] new_inode+0x1d/0x40
[ 3774.649710]  [<ffffffffa0332ac7>] zfs_znode_alloc+0x47/0x730 [zfs]
[ 3774.649770]  [<ffffffffa02c8f4e>] ? sa_build_index+0xbe/0x1b0 [zfs]
[ 3774.649770]  [<ffffffffa02c9775>] ? sa_build_layouts+0x6b5/0xc80 [zfs]
[ 3774.649770]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649794]  [<ffffffffa0333b5e>] zfs_mknode+0x93e/0xe90 [zfs]
[ 3774.649813]  [<ffffffffa032be5b>] zfs_create+0x5db/0x780 [zfs]
[ 3774.649840]  [<ffffffffa0350ba5>] zpl_xattr_set_dir.isra.9+0x245/0x2a0 [zfs]
[ 3774.649843]  [<ffffffff81675440>] ? ftrace_call+0x5/0x2f
[ 3774.649895]  [<ffffffffa0351140>] zpl_xattr_set+0xe0/0x3f0 [zfs]
[ 3774.649895]  [<ffffffffa03516a4>] __zpl_xattr_security_init+0x64/0xb0 [zfs]
[ 3774.649968]  [<ffffffffa0351640>] ? zpl_xattr_trusted_set+0xb0/0xb0 [zfs]
[ 3774.649972]  [<ffffffff812a737c>] security_inode_init_security+0xbc/0xf0
[ 3774.649977]  [<ffffffffa0352028>] zpl_xattr_security_init+0x18/0x20 [zfs]
[ 3774.650017]  [<ffffffffa0350134>] zpl_create+0x154/0x240 [zfs]
[ 3774.650018]  [<ffffffff811bde85>] vfs_create+0xb5/0x120
[ 3774.650018]  [<ffffffff811be874>] do_last+0x984/0xe40
[ 3774.650020]  [<ffffffff811baf55>] ? link_path_walk+0x255/0x880
[ 3774.650023]  [<ffffffff811bedf2>] path_openat+0xc2/0x680
[ 3774.650026]  [<ffffffff811bf653>] do_filp_open+0x43/0xa0
[ 3774.650030]  [<ffffffff811bf615>] ? do_filp_open+0x5/0xa0
[ 3774.650034]  [<ffffffff811ae7fc>] do_sys_open+0x13c/0x230
[ 3774.650037]  [<ffffffff811ae912>] SyS_open+0x22/0x30
[ 3774.650040]  [<ffffffff81675819>] system_call_fastpath+0x16/0x1b

`zfs_mknode()` grabbed an object hash mutex via `ZFS_OBJ_HOLD_ENTER()`,
tried to allocate a znode with `zfs_znode_alloc()`, and entered direct
reclaim, which tried to take the same mutex again via
`ZFS_OBJ_HOLD_ENTER()`. This is an edge case that the kmem-rework
missed; consequently, it is a regression from 79c76d5.

We fix this by making `ZFS_OBJ_HOLD_ENTER()` and `ZFS_OBJ_HOLD_EXIT()`
call `spl_fstrans_mark()` and `spl_fstrans_unmark()`, respectively.
Each superblock gets an array to hold the returned cookies, and each
cookie is protected by the corresponding `->z_hold_mtx`.

Closes openzfs#3331

Signed-off-by: Richard Yao <ryao@gentoo.org>
ryao added a commit to ryao/zfs that referenced this issue Apr 22, 2015
…hash mutex

@tuxoko
Contributor

tuxoko commented Apr 22, 2015

@dweeezil
You're not covering enough in zpl_create, zpl_mknod, zpl_mkdir, and zpl_symlink.
You need to at least cover zpl_xattr_security_init and zpl_init_acl.

@behlendorf
Contributor

This issue should be addressed by 7fad629, which was merged to master several days ago. It will also appear shortly in a 0.6.4.1 point release because it addresses this significant issue. It doesn't cover zpl_xattr_security_init or zpl_init_acl, but it does cover the common zpl_xattr_set function where this deadlock occurs.

@tuxoko
Contributor

tuxoko commented Apr 22, 2015

@behlendorf
I don't think covering zpl_xattr_set is enough.

@behlendorf
Contributor

@tuxoko which part? The bit about not covering enough in zpl_create, zpl_mknod, zpl_mkdir, zpl_symlink? It's true we're not covering everything in those functions, but we shouldn't need to because we're not holding any critical resources in those cases that I can see. Can you specifically call out a call path of concern? This stuff is subtle, so I can completely believe we've overlooked something.

@tuxoko
Contributor

tuxoko commented Apr 22, 2015

@behlendorf
Sorry, I was a bit confused as well.
zpl_init_acl calls zfs_mark_inode_dirty, and for whatever reason, zfs_mark_inode_dirty needs to be protected.
Edit: There might be other functions between them that need to be protected, but I'm not sure.

@behlendorf
Contributor

@tuxoko maybe I'm just being thick but I don't see the issue. zpl_init_acl()->zfs_mark_inode_dirty()->zpl_dirty_inode() might enter direct reclaim and block but it's not holding any critical resource we'd deadlock on. It's just got a read lock on the z_teardown_lock which is OK. That said, I'm not opposed to extending the PF_FSTRANS over zpl_xattr_security_init and zpl_init_acl. I could believe I'm missing something and it would protect us from accidentally introducing a similar issue here in the future.

@tuxoko
Contributor

tuxoko commented Apr 22, 2015

@behlendorf
It will recurse in dmu_tx_assign
#3225 (comment)

Edit: Seeing the stack, maybe we should also mark zpl_dirty_inode

@behlendorf
Contributor

@tuxoko Ahh, that call trace makes everything crystal clear. Yes, that would be bad. OK, zpl_dirty_inode() clearly needs to be covered in all cases. Let's extend PF_FSTRANS as you suggested in zpl_create, etc., and then also in zpl_dirty_inode(), because we could arrive there legitimately for other reasons.
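
Marking zpl_dirty_inode() unconditionally is safe even when it is reached from inside an already-marked zpl_create() only if marks nest. A minimal C model of that, assuming a cookie that saves the previous flag state; all names are hypothetical, and `dirty_inode_model()` merely stands in for the point where, per the discussion above, dmu_tx_assign() could recurse:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a per-task flag standing in for PF_FSTRANS. */
typedef struct task { bool fstrans; } task_t;
typedef bool cookie_t;      /* previous flag state */

static cookie_t mark(task_t *t)
{
	cookie_t c = t->fstrans;

	t->fstrans = true;
	return c;
}

static void unmark(task_t *t, cookie_t c)
{
	t->fstrans = c;         /* restore rather than clear */
}

/* Always marks, since it can be reached outside zpl_create() too.
 * Returns whether the flag was set where dmu_tx_assign() would run. */
static bool dirty_inode_model(task_t *t)
{
	cookie_t c = mark(t);
	bool covered = t->fstrans;

	unmark(t, c);
	return covered;
}

/* Marked region spanning security and ACL init; *after_dirty reports
 * whether the outer mark survived the nested mark/unmark pair. */
static bool create_model(task_t *t, bool *after_dirty)
{
	cookie_t c = mark(t);
	bool inner = dirty_inode_model(t);  /* e.g. via zpl_init_acl() */

	*after_dirty = t->fstrans;
	unmark(t, c);
	return inner;
}
```

`create_model()` returns true and leaves `after_dirty` set, showing that the inner unmark restored the outer mark instead of clearing it. If `unmark()` simply cleared the flag, the rest of the create path after the ACL step would run unprotected.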

@tuxoko
Contributor

tuxoko commented Apr 22, 2015

@behlendorf
Yeah, we'd better play it safe.
Also, sorry, I was a bit lazy and didn't dig up the trace in the first place.

@behlendorf
Contributor

@tuxoko Wanna make up for it by proposing a patch. ;) Also as an aside I hope to get a chance to review your refreshed tsd patches!

@ryao
Contributor Author

ryao commented Apr 22, 2015

We could probably tackle things more cleanly by targeting the locks themselves, like I did in #3332. #3225 mentions ->z_hold_mtx and zf_rwlock. Are those the only locks for which we have stack traces? Where can I find the stack traces for zf_rwlock?

kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Apr 22, 2015
…hash mutex

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 23, 2015
Additional testing has shown that the region covered by PF_FSTRANS
needs to be extended to cover the zpl_xattr_security_init() and
init_acl() functions.  The zpl_mark_dirty() function can also recurse
and therefore must always be protected.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3331
@behlendorf
Contributor

@tuxoko I've proposed #3336 to address the remaining issues identified here. Can you please review.

kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Apr 23, 2015
behlendorf added a commit that referenced this issue Apr 24, 2015
Additional testing has shown that the region covered by PF_FSTRANS
needs to be extended to cover the zpl_xattr_security_init() and
init_acl() functions.  The zpl_mark_dirty() function can also recurse
and therefore must always be protected.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #3331