panic after setting dnodesize=auto #8705

Open
octomike opened this issue May 3, 2019 · 11 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@octomike

octomike commented May 3, 2019

System information

Type Version/Name
Distribution Name Debian
Distribution Version 9.8
Linux Kernel 4.19.0-0.bpo.4-amd64
Architecture amd64
ZFS Version 0.7.12-1~bpo9+1
SPL Version 0.7.12-1~bpo9+1

Describe the problem you're observing

I use BeeGFS, and its metadata is stored in extended attributes, so I have had xattr=sa set since the beginning. After setting dnodesize=auto today, I received a panic during a benchmark.

Include any warning/errors/backtraces from the system logs

[1646235.322083] VERIFY(dnode_add_ref(dn, (void *)(uintptr_t)tx->tx_txg)) failed
[1646235.322107] PANIC at dnode.c:1634:dnode_setdirty()
[1646235.322118] Showing stack for process 7725
[1646235.322120] CPU: 2 PID: 7725 Comm: Worker7 Tainted: P           OE     4.19.0-0.bpo.4-amd64 #1 Debian 4.19.28-2~bpo9+1
[1646235.322121] Hardware name: Dell Inc. PowerEdge R740xd/0RR8YK, BIOS 1.4.9 06/29/2018
[1646235.322121] Call Trace:
[1646235.322128]  dump_stack+0x5c/0x7b
[1646235.322138]  spl_panic+0xc8/0x110 [spl]
[1646235.322142]  ? spl_kmem_cache_alloc+0x11c/0x7e0 [spl]
[1646235.322145]  ? remove_wait_queue+0x60/0x60
[1646235.322176]  ? dbuf_rele_and_unlock+0x275/0x4a0 [zfs]
[1646235.322178]  ? _cond_resched+0x16/0x40
[1646235.322179]  ? mutex_lock+0xe/0x30
[1646235.322193]  ? dmu_objset_userquota_get_ids+0x208/0x3b0 [zfs]
[1646235.322209]  dnode_setdirty+0xd4/0xe0 [zfs]
[1646235.322224]  dnode_allocate+0x181/0x1f0 [zfs]
[1646235.322237]  dmu_object_alloc_dnsize+0x318/0x3a0 [zfs]
[1646235.322240]  ? __kmalloc_node+0x1c9/0x290
[1646235.322260]  zfs_mknode+0x11b/0xd90 [zfs]
[1646235.322281]  ? txg_rele_to_quiesce+0x26/0x40 [zfs]
[1646235.322296]  ? dmu_tx_assign+0x31e/0x460 [zfs]
[1646235.322316]  zfs_create+0x640/0x8c0 [zfs]
[1646235.322336]  zpl_create+0xa4/0x160 [zfs]
[1646235.322339]  path_openat+0x12d9/0x13e0
[1646235.322341]  do_filp_open+0x99/0x110
[1646235.322345]  ? inet_recvmsg+0x5b/0xd0
[1646235.322348]  ? __check_object_size+0x161/0x1a0
[1646235.322349]  ? do_sys_open+0x12e/0x210
[1646235.322350]  do_sys_open+0x12e/0x210
[1646235.322353]  do_syscall_64+0x55/0x110
[1646235.322355]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1646235.322357] RIP: 0033:0x7fe83a01a85d
[1646235.322358] Code: bb 20 00 00 75 10 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e f6 ff ff 48 89 04 24 b8 02 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 67 f6 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[1646235.322359] RSP: 002b:00007fe815ef5db0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[1646235.322361] RAX: ffffffffffffffda RBX: 00007fe815ef7040 RCX: 00007fe83a01a85d
[1646235.322361] RDX: 00000000000001a4 RSI: 00000000000000c1 RDI: 00007fe7f800c150
[1646235.322362] RBP: 00007fe815ef6fc0 R08: 00007fe7f800ee00 R09: 0000000000000000
[1646235.322362] R10: 0000000000000000 R11: 0000000000000293 R12: 00007fe815ef5f40
[1646235.322363] R13: 00007fe815ef7760 R14: 00007fe7dc0037b0 R15: 00007fe815ef7760
[1646384.703867] INFO: task txg_quiesce:1737 blocked for more than 120 seconds.
[1646384.703923]       Tainted: P           OE     4.19.0-0.bpo.4-amd64 #1 Debian 4.19.28-2~bpo9+1
[1646384.703975] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1646384.704024] txg_quiesce     D    0  1737      2 0x80000000
[1646384.704029] Call Trace:
[1646384.704041]  ? __schedule+0x3f5/0x880
[1646384.704045]  schedule+0x32/0x80
[1646384.704061]  cv_wait_common+0x115/0x130 [spl]
[1646384.704069]  ? remove_wait_queue+0x60/0x60
[1646384.704152]  txg_quiesce_thread+0x2d0/0x3e0 [zfs]
[1646384.704230]  ? txg_wait_open+0xf0/0xf0 [zfs]
[1646384.704239]  thread_generic_wrapper+0x6f/0x80 [spl]
[1646384.704245]  kthread+0xf8/0x130
[1646384.704252]  ? __thread_exit+0x20/0x20 [spl]
[1646384.704256]  ? kthread_create_worker_on_cpu+0x70/0x70
[1646384.704259]  ret_from_fork+0x35/0x40
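
For anyone digging into this later: the failing assertion is the "dirty hold" that dnode_setdirty() takes on the dnode for the open txg. It only fires when dnode_add_ref() finds the hold count already at zero, i.e. the dnode is being torn down while the allocation path still holds a pointer to it. A paraphrased sketch of the relevant 0.7.x logic (simplified for illustration, not the exact source) looks roughly like this:

```c
/* Paraphrased sketch of OpenZFS 0.7.x dnode.c -- simplified, not the exact source. */

boolean_t
dnode_add_ref(dnode_t *dn, void *tag)
{
	mutex_enter(&dn->dn_mtx);
	if (refcount_is_zero(&dn->dn_holds)) {
		/* No holds left: the dnode may already be on its way out. */
		mutex_exit(&dn->dn_mtx);
		return (B_FALSE);
	}
	VERIFY3U(1, <=, refcount_add(&dn->dn_holds, tag));
	mutex_exit(&dn->dn_mtx);
	return (B_TRUE);
}

void
dnode_setdirty(dnode_t *dn, dmu_tx_t *tx)
{
	/* ... */

	/*
	 * Take a "dirty hold" tagged with the txg so the dnode cannot be
	 * evicted before the txg syncs.  If the hold count is already zero,
	 * this VERIFY trips, which is the panic reported above.
	 */
	VERIFY(dnode_add_ref(dn, (void *)(uintptr_t)tx->tx_txg));

	/* ... */
}
```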

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label May 6, 2019
@behlendorf
Contributor

Thanks for letting us know so we can look into it.

@adilger
Contributor

adilger commented Dec 20, 2019

We hit a similar problem with Lustre and ZFS 0.7.13 during a benchmark run:

kernel: VERIFY(dnode_add_ref(dn, (void *)(uintptr_t)tx->tx_txg)) failed
kernel: PANIC at dnode.c:1635:dnode_setdirty()
kernel: Showing stack for process 19341
kernel: CPU: 9 PID: 19341 Comm: mdt01_040 Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.27.2.el7.patched.x86_64 #1

@cjm14

cjm14 commented Feb 11, 2020

We are also seeing this with Lustre 2.12.3/ZFS 0.7.13 on one of our HPC clusters.
Centos 7, kernel 3.10.0-1062.4.1.el7.x86_64
ZFS version 0.7.13

We've seen 2 instances in 2 days (stack traces follow). Is there any additional information I can provide that would be useful if this happens again?

Following the PANIC, I/O to the affected zpool appears to hang and the zpool cannot be exported; however, zpool status shows the zpool as online. Is this expected behaviour? Following a manual failover, the zpool was imported and mounted without any problems.

February 6th, 2020
Feb 6 04:27:45 amds02a kernel: VERIFY(dnode_add_ref(dn, (void *)(uintptr_t)tx->tx_txg)) failed
Feb 6 04:27:45 amds02a kernel: PANIC at dnode.c:1635:dnode_setdirty()
Feb 6 04:27:45 amds02a kernel: Showing stack for process 44943
Feb 6 04:27:45 amds02a kernel: CPU: 36 PID: 44943 Comm: mdt01_026 Tainted: P OE ------------ 3.10.0-1062.4.1.el7.x86_64 #1
Feb 6 04:27:45 amds02a kernel: Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/13/2019
Feb 6 04:27:45 amds02a kernel: Call Trace:
Feb 6 04:27:45 amds02a kernel: [] dump_stack+0x19/0x1b
Feb 6 04:27:45 amds02a kernel: [] spl_dumpstack+0x44/0x50 [spl]
Feb 6 04:27:45 amds02a kernel: [] spl_panic+0xc9/0x110 [spl]
Feb 6 04:27:45 amds02a kernel: [] ? __mutex_unlock_slowpath+0x70/0x90
Feb 6 04:27:45 amds02a kernel: [] ? dbuf_rele_and_unlock+0x283/0x4c0 [zfs]
Feb 6 04:27:45 amds02a kernel: [] ? __cv_init+0x41/0x60 [spl]
Feb 6 04:27:45 amds02a kernel: [] ? dnode_cons+0x24a/0x260 [zfs]
Feb 6 04:27:45 amds02a kernel: [] ? spl_kmem_cache_alloc+0xbd/0x150 [spl]
Feb 6 04:27:45 amds02a kernel: [] ? mutex_lock+0x12/0x2f
Feb 6 04:27:45 amds02a kernel: [] ? dmu_objset_userquota_get_ids+0x23c/0x440 [zfs]
Feb 6 04:27:45 amds02a kernel: [] dnode_setdirty+0xe9/0xf0 [zfs]
Feb 6 04:27:45 amds02a kernel: [] dnode_allocate+0x18c/0x230 [zfs]
Feb 6 04:27:45 amds02a kernel: [] dmu_object_alloc_dnsize+0x34b/0x3e0 [zfs]
Feb 6 04:27:45 amds02a kernel: [] __osd_object_create+0x82/0x170 [osd_zfs]
Feb 6 04:27:45 amds02a kernel: [] osd_mkreg+0x7d/0x210 [osd_zfs]
Feb 6 04:27:45 amds02a kernel: [] osd_create+0x316/0xaf0 [osd_zfs]
Feb 6 04:27:45 amds02a kernel: [] lod_sub_create+0x1f5/0x480 [lod]
Feb 6 04:27:45 amds02a kernel: [] ? __kmalloc_node+0x5c/0x2b0
Feb 6 04:27:45 amds02a kernel: [] lod_create+0x69/0x340 [lod]
Feb 6 04:27:45 amds02a kernel: [] mdd_create_object_internal+0xb8/0x280 [mdd]
Feb 6 04:27:45 amds02a kernel: [] mdd_create_object+0x75/0x820 [mdd]
Feb 6 04:27:45 amds02a kernel: [] mdd_create+0xe31/0x14e0 [mdd]
Feb 6 04:27:45 amds02a kernel: [] mdt_reint_open+0x224f/0x3240 [mdt]
Feb 6 04:27:45 amds02a kernel: [] ? upcall_cache_get_entry+0x218/0x8b0 [obdclass]
Feb 6 04:27:45 amds02a kernel: [] mdt_reint_rec+0x83/0x210 [mdt]
Feb 6 04:27:45 amds02a kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Feb 6 04:27:45 amds02a kernel: [] ? mdt_intent_fixup_resent+0x36/0x220 [mdt]
Feb 6 04:27:45 amds02a kernel: [] mdt_intent_open+0x82/0x3a0 [mdt]
Feb 6 04:27:45 amds02a kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
Feb 6 04:27:45 amds02a kernel: [] mdt_intent_policy+0x435/0xd80 [mdt]
Feb 6 04:27:45 amds02a kernel: [] ? mdt_intent_fixup_resent+0x220/0x220 [mdt]
Feb 6 04:27:45 amds02a kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
Feb 6 04:27:45 amds02a kernel: [] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
Feb 6 04:27:45 amds02a kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
Feb 6 04:27:45 amds02a kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] ? wake_up_state+0x20/0x20
Feb 6 04:27:45 amds02a kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] ? __schedule+0x448/0x9c0
Feb 6 04:27:45 amds02a kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
Feb 6 04:27:45 amds02a kernel: [] kthread+0xd1/0xe0
Feb 6 04:27:45 amds02a kernel: [] ? insert_kthread_work+0x40/0x40
Feb 6 04:27:45 amds02a kernel: [] ret_from_fork_nospec_begin+0x7/0x21
Feb 6 04:27:45 amds02a kernel: [] ? insert_kthread_work+0x40/0x40

Feb 8th 2020

Feb 8 12:47:37 amds02a kernel: VERIFY(dnode_add_ref(dn, (void *)(uintptr_t)tx->tx_txg)) failed
Feb 8 12:47:37 amds02a kernel: PANIC at dnode.c:1635:dnode_setdirty()
Feb 8 12:47:37 amds02a kernel: Showing stack for process 5058
Feb 8 12:47:37 amds02a kernel: CPU: 17 PID: 5058 Comm: mdt01_102 Tainted: P OE ------------ 3.10.0-1062.4.1.el7.x86_64 #1
Feb 8 12:47:37 amds02a kernel: Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/13/2019
Feb 8 12:47:37 amds02a kernel: Call Trace:
Feb 8 12:47:37 amds02a kernel: [] dump_stack+0x19/0x1b
Feb 8 12:47:37 amds02a kernel: [] spl_dumpstack+0x44/0x50 [spl]
Feb 8 12:47:37 amds02a kernel: [] spl_panic+0xc9/0x110 [spl]
Feb 8 12:47:37 amds02a kernel: [] ? dbuf_rele_and_unlock+0x283/0x4c0 [zfs]
Feb 8 12:47:37 amds02a kernel: [] ? __cv_init+0x41/0x60 [spl]
Feb 8 12:47:37 amds02a kernel: [] ? dnode_cons+0x24a/0x260 [zfs]
Feb 8 12:47:37 amds02a kernel: [] ? spl_kmem_cache_alloc+0xbd/0x150 [spl]
Feb 8 12:47:37 amds02a kernel: [] ? mutex_lock+0x12/0x2f
Feb 8 12:47:37 amds02a kernel: [] ? dmu_objset_userquota_get_ids+0x23c/0x440 [zfs]
Feb 8 12:47:37 amds02a kernel: [] dnode_setdirty+0xe9/0xf0 [zfs]
Feb 8 12:47:37 amds02a kernel: [] dnode_allocate+0x18c/0x230 [zfs]
Feb 8 12:47:37 amds02a kernel: [] dmu_object_alloc_dnsize+0x34b/0x3e0 [zfs]
Feb 8 12:47:37 amds02a kernel: [] __osd_object_create+0x82/0x170 [osd_zfs]
Feb 8 12:47:37 amds02a kernel: [] osd_mkreg+0x7d/0x210 [osd_zfs]
Feb 8 12:47:37 amds02a kernel: [] osd_create+0x316/0xaf0 [osd_zfs]
Feb 8 12:47:37 amds02a kernel: [] lod_sub_create+0x1f5/0x480 [lod]
Feb 8 12:47:37 amds02a kernel: [] ? __kmalloc_node+0x5c/0x2b0
Feb 8 12:47:37 amds02a kernel: [] lod_create+0x69/0x340 [lod]
Feb 8 12:47:37 amds02a kernel: [] mdd_create_object_internal+0xb8/0x280 [mdd]
Feb 8 12:47:37 amds02a kernel: [] mdd_create_object+0x75/0x820 [mdd]
Feb 8 12:47:37 amds02a kernel: [] mdd_create+0xe31/0x14e0 [mdd]
Feb 8 12:47:37 amds02a kernel: [] mdt_reint_open+0x224f/0x3240 [mdt]
Feb 8 12:47:37 amds02a kernel: [] ? upcall_cache_get_entry+0x218/0x8b0 [obdclass]
Feb 8 12:47:37 amds02a kernel: [] mdt_reint_rec+0x83/0x210 [mdt]
Feb 8 12:47:37 amds02a kernel: [] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Feb 8 12:47:37 amds02a kernel: [] ? mdt_intent_fixup_resent+0x36/0x220 [mdt]
Feb 8 12:47:37 amds02a kernel: [] mdt_intent_open+0x82/0x3a0 [mdt]
Feb 8 12:47:37 amds02a kernel: [] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
Feb 8 12:47:37 amds02a kernel: [] mdt_intent_policy+0x435/0xd80 [mdt]
Feb 8 12:47:37 amds02a kernel: [] ? mdt_intent_fixup_resent+0x220/0x220 [mdt]
Feb 8 12:47:37 amds02a kernel: [] ldlm_lock_enqueue+0x356/0xa20 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs]
Feb 8 12:47:37 amds02a kernel: [] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
Feb 8 12:47:37 amds02a kernel: [] ldlm_handle_enqueue0+0xa56/0x15f0 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] tgt_enqueue+0x62/0x210 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] tgt_request_handle+0xaea/0x1580 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
Feb 8 12:47:37 amds02a kernel: [] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] ? wake_up_state+0x20/0x20
Feb 8 12:47:37 amds02a kernel: [] ptlrpc_main+0xb2c/0x1460 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] ? __schedule+0x448/0x9c0
Feb 8 12:47:37 amds02a kernel: [] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
Feb 8 12:47:37 amds02a kernel: [] kthread+0xd1/0xe0
Feb 8 12:47:37 amds02a kernel: [] ? insert_kthread_work+0x40/0x40
Feb 8 12:47:37 amds02a kernel: [] ret_from_fork_nospec_begin+0x7/0x21
Feb 8 12:47:37 amds02a kernel: [] ? insert_kthread_work+0x40/0x40

The server hosts 2 Lustre MDT zpools during normal operation, both with the same setup:

[root@amds02a ~]# zpool get all alicemdt00
NAME PROPERTY VALUE SOURCE
alicemdt00 size 4.36T -
alicemdt00 capacity 23% -
alicemdt00 altroot - default
alicemdt00 health ONLINE -
alicemdt00 guid 6272857209922850427 -
alicemdt00 version - default
alicemdt00 bootfs - default
alicemdt00 delegation on default
alicemdt00 autoreplace off default
alicemdt00 cachefile none local
alicemdt00 failmode wait default
alicemdt00 listsnapshots off default
alicemdt00 autoexpand off default
alicemdt00 dedupditto 0 default
alicemdt00 dedupratio 1.00x -
alicemdt00 free 3.32T -
alicemdt00 allocated 1.04T -
alicemdt00 readonly off -
alicemdt00 ashift 12 local
alicemdt00 comment - default
alicemdt00 expandsize - -
alicemdt00 freeing 0 -
alicemdt00 fragmentation 48% -
alicemdt00 leaked 0 -
alicemdt00 multihost on local
alicemdt00 feature@async_destroy enabled local
alicemdt00 feature@empty_bpobj active local
alicemdt00 feature@lz4_compress active local
alicemdt00 feature@multi_vdev_crash_dump enabled local
alicemdt00 feature@spacemap_histogram active local
alicemdt00 feature@enabled_txg active local
alicemdt00 feature@hole_birth active local
alicemdt00 feature@extensible_dataset active local
alicemdt00 feature@embedded_data active local
alicemdt00 feature@bookmarks enabled local
alicemdt00 feature@filesystem_limits enabled local
alicemdt00 feature@large_blocks enabled local
alicemdt00 feature@large_dnode active local
alicemdt00 feature@sha512 enabled local
alicemdt00 feature@skein enabled local
alicemdt00 feature@edonr enabled local
alicemdt00 feature@userobj_accounting active local
[root@amds02a ~]# zpool status alicemdt00
pool: alicemdt00
state: ONLINE
scan: resilvered 96K in 0h0m with 0 errors on Tue Dec 10 13:42:38 2019
config:

NAME                  STATE     READ WRITE CKSUM
alicemdt00            ONLINE       0     0     0
  mirror-0            ONLINE       0     0     0
    A12-amds02j1-001  ONLINE       0     0     0
    A12-amds02j1-002  ONLINE       0     0     0
  mirror-1            ONLINE       0     0     0
    A12-amds02j1-003  ONLINE       0     0     0
    A12-amds02j1-004  ONLINE       0     0     0
  mirror-2            ONLINE       0     0     0
    A12-amds02j1-005  ONLINE       0     0     0
    A12-amds02j1-006  ONLINE       0     0     0

errors: No known data errors

@agb32

agb32 commented Mar 20, 2020

We also see it on Lustre 2.12.2 servers (CentOS 7.6, 3.10.0-957.10.1.el7_lustre.x86_64, ZFS 0.7.13), as well as on 2.12.3 servers.

@cjm14

cjm14 commented Jun 1, 2020

We are still seeing this problem 1-2 times a week; dnodesize on the ZFS pool that panics is set to "auto". I'll try setting it to legacy.

@adilger
Contributor

adilger commented Sep 18, 2020

For those of you hitting this problem regularly, does setting dnodesize=1024 or dnodesize=legacy instead of dnodesize=auto avoid this problem?

@rfehren

rfehren commented Oct 17, 2020

We also ran into this bug on an MDS running 4.19.150, zfs 0.7.13 and Lustre 2.12.5.
If you look at the ZFS code, it turns out dnodesize=auto is equivalent to dnodesize=1024, so only dnodesize=legacy (meaning 512 bytes) could make a difference. The worrying thing is that, as you (Andreas) pointed out a while ago (https://www.opensfs.org/wp-content/uploads/2018/04/Dilger-LUG2018-Lustre_ZFS_Update_Intel.pdf), going to dnodesize=legacy will lead to a significant drop in Lustre seek performance and I/O efficiency. Any experiences with this?
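
To add some context on the auto/1024 equivalence: the dnodesize property is resolved by an objset property callback, and "auto" currently just selects a fixed two-slot (1024-byte) dnode. A paraphrased sketch of that callback (simplified from 0.7.x dmu_objset.c, not the exact source) is:

```c
/* Paraphrased sketch of the dnodesize property callback -- not the exact source. */
static void
dnodesize_changed_cb(void *arg, uint64_t newval)
{
	objset_t *os = arg;

	switch (newval) {
	case ZFS_DNSIZE_LEGACY:
		os->os_dnodesize = DNODE_MIN_SIZE;	/* legacy: 512-byte dnodes */
		break;
	case ZFS_DNSIZE_AUTO:
		/*
		 * "auto" picks a fixed size expected to work well for most
		 * workloads: two minimum-size slots, i.e. 1024 bytes, the
		 * same as dnodesize=1k.
		 */
		os->os_dnodesize = DNODE_MIN_SIZE * 2;
		break;
	default:
		os->os_dnodesize = newval;		/* 1k/2k/4k/8k/16k map directly */
		break;
	}
}
```

So switching from auto to 1k should not change behaviour at all; only legacy (512-byte dnodes) is actually different.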

@stale

stale bot commented Oct 17, 2021

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Oct 17, 2021
@eilerjc

eilerjc commented Nov 3, 2021

We just hit this yesterday with a stack dump like cjm14's Feb 6th one, going through __mutex_unlock_slowpath.
CentOS 7.6, 3.10.0-957.10.1.el7_lustre.x86_64, ZFS 0.7.13, Lustre 2.12.2. Posting to fight off the stale bot ;-)

@stale stale bot removed the Status: Stale No recent activity for issue label Nov 3, 2021
@agb32

agb32 commented Oct 31, 2022

We've just seen this on 2.12.6, CentOS 7.9, ZFS 0.7.13. Just in time for the stale bot!

@behlendorf
Contributor

After moving our Lustre 2.12 servers from zfs-0.7.13 to zfs-2.1.x we're no longer able to trigger this issue. Unfortunately, it's not entirely clear from skimming the commit logs exactly which commit resolved this bug. If you're able to update your ZFS version to the latest 2.1.6 release, I'd recommend it.
