System information
Describe the problem you're observing
All I/O to the pool stops, and any ZFS command such as zfs list hangs forever. The SAS backplane is clearly not the problem, since I can still send various SCSI commands to the drives using sg_persist.
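To illustrate the wedged state (a minimal sketch; /dev/sdX stands in for any pool member, and sg_inq is an extra sanity check not mentioned above):

# ZFS commands wedge in uninterruptible sleep:
zfs list                               # blocks forever

# The drives still answer raw SCSI commands (sg3_utils),
# so the SAS backplane/HBA path is healthy:
sg_persist --in --read-keys /dev/sdX
sg_inq /dev/sdX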
Describe how to reproduce the problem
Unclear; the system was under heavy load at the time. It serves mostly 4K I/O to VMware via iSCSI.
Like other reports of this issue, the system was under a heavy random-write workload: roughly 250k IOPS of 4K writes sustained for about 5 hours before the problem surfaced. The hang does not seem easy to reproduce on demand (see the fio sketch below), although this is not the first time I have seen this deadlock.
The pool is 56 SSDs in a RAIDZ2 configuration.
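I have no reliable reproducer, but a local fio job along these lines should approximate the workload (the zvol path is hypothetical, and iodepth/numjobs are guesses at what reaches ~250k IOPS, not values measured from the original setup):

# ~5 hours of 4K random writes against a zvol
fio --name=zvol-randwrite --filename=/dev/zvol/tank/vol1 \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=16 --time_based --runtime=18000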
Include any warning/errors/backtraces from the system logs
Aug 26 22:18:57 clg5-lab-san2-1 kernel: INFO: task zvol:23141 blocked for more than 120 seconds.
Aug 26 22:18:57 clg5-lab-san2-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 22:18:57 clg5-lab-san2-1 kernel: zvol D ffff99bd67071040 0 23141 2 0x00000000
Aug 26 22:18:57 clg5-lab-san2-1 kernel: Call Trace:
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] schedule+0x29/0x70
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] rwsem_down_read_failed+0x10d/0x1a0
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? mutex_lock+0x12/0x2f
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] call_rwsem_down_read_failed+0x18/0x30
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? spl_kmem_zalloc+0xd8/0x180 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] down_read+0x20/0x40
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dbuf_dirty+0x307/0x820 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dmu_buf_will_fill+0x22/0x30 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dmu_write_uio_dnode+0xe6/0x150 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] zvol_write+0x17c/0x5a0 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? __schedule+0x42a/0x860
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] taskq_thread+0x2ac/0x4f0 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? wake_up_state+0x20/0x20
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? taskq_thread_spawn+0x60/0x60 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] kthread+0xd1/0xe0
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? insert_kthread_work+0x40/0x40
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ret_from_fork_nospec_begin+0x7/0x21
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? insert_kthread_work+0x40/0x40
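If the hang recurs, something like the following should capture the stacks of every blocked thread, which would show what dbuf_dirty() is waiting on (assumes sysrq and /proc/PID/stack are available in the kernel; 23141 is the zvol task from the trace above):

# Dump stacks of all blocked (D-state) tasks to the kernel log:
echo w > /proc/sysrq-trigger

# Or read the stuck zvol thread's stack directly:
cat /proc/23141/stack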