All ZFS IO Hangs #9218

cisco-abrandel · 2019-08-26T23:09:03Z

System information

Type	Version/Name
Distribution Name	CentOS
Distribution Version	7.6
Linux Kernel	3.10.0-957.21.3.el7.x86_64
Architecture	x86_64
ZFS Version	0.7.13-1
SPL Version	0.7.13-1

Describe the problem you're observing

All IO to the pool stops, and any command such as zfs list hangs forever. The SAS backplane does not appear to be the problem, since I can still send various SCSI commands using sg_persist.

Describe how to reproduce the problem

Unclear. The system is used mostly for 4K IO from VMware via iSCSI and was under heavy load at the time.

Like other reports, this system was under a heavy random write workload: about 5 hours of 4K writes (250k IOPS) before the problem surfaced. It does not seem easy to reproduce, although this is not the first time I have seen this "deadlock".

Running 56 SSDs in RAIDZ2 configuration.

Include any warning/errors/backtraces from the system logs

Aug 26 22:18:57 clg5-lab-san2-1 kernel: INFO: task zvol:23141 blocked for more than 120 seconds.
Aug 26 22:18:57 clg5-lab-san2-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 22:18:57 clg5-lab-san2-1 kernel: zvol D ffff99bd67071040 0 23141 2 0x00000000
Aug 26 22:18:57 clg5-lab-san2-1 kernel: Call Trace:
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] schedule+0x29/0x70
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] rwsem_down_read_failed+0x10d/0x1a0
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? mutex_lock+0x12/0x2f
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] call_rwsem_down_read_failed+0x18/0x30
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? spl_kmem_zalloc+0xd8/0x180 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] down_read+0x20/0x40
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dbuf_dirty+0x307/0x820 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dmu_buf_will_fill+0x22/0x30 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dmu_write_uio_dnode+0xe6/0x150 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] zvol_write+0x17c/0x5a0 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? __schedule+0x42a/0x860
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] taskq_thread+0x2ac/0x4f0 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? wake_up_state+0x20/0x20
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? taskq_thread_spawn+0x60/0x60 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] kthread+0xd1/0xe0
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? insert_kthread_work+0x40/0x40
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ret_from_fork_nospec_begin+0x7/0x21
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? insert_kthread_work+0x40/0x40
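For comparing this hang against similar reports, it can help to reduce a hung-task message to the blocked task plus the ZFS/SPL frames it was in. The following is only a rough sketch, assuming log lines in the syslog format shown above; the names `parse_hung_tasks` and `zfs_frames` are hypothetical helpers, not part of any ZFS tooling.

```python
import re

# Matches the hung-task header, e.g.
# "INFO: task zvol:23141 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Matches one call-trace frame, e.g. "[] ? spl_kmem_zalloc+0xd8/0x180 [spl]".
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[[^\]]*\]\s+(?P<maybe>\? )?(?P<func>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def parse_hung_tasks(lines):
    """Group call-trace frames under the hung-task header that precedes them."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "task": header["name"],
                "pid": int(header["pid"]),
                "timeout_s": int(header["secs"]),
                "frames": [],  # (function, module or None, reliable?)
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(
                (frame["func"], frame["module"], frame["maybe"] is None)
            )
    return reports


def zfs_frames(report):
    """Return reliable frames that belong to the zfs or spl modules."""
    return [
        func
        for func, module, reliable in report["frames"]
        if reliable and module in ("zfs", "spl")
    ]


# A few lines copied from the trace above, used as sample input.
SAMPLE = [
    "Aug 26 22:18:57 clg5-lab-san2-1 kernel: INFO: task zvol:23141 blocked for more than 120 seconds.",
    "Aug 26 22:18:57 clg5-lab-san2-1 kernel: Call Trace:",
    "Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] schedule+0x29/0x70",
    "Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? spl_kmem_zalloc+0xd8/0x180 [spl]",
    "Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dbuf_dirty+0x307/0x820 [zfs]",
    "Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] zvol_write+0x17c/0x5a0 [zfs]",
    "Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] taskq_thread+0x2ac/0x4f0 [spl]",
]

if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["task"], report["pid"], zfs_frames(report))
```

Run against the full trace, the first reliable ZFS frame is dbuf_dirty, reached from zvol_write, which is the detail worth matching against other hang reports.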

GregorKopka · 2019-08-27T16:37:30Z

Possibly a variant of #9172 (heavy load resulting in hanging zvols)?

stale · 2020-08-26T16:42:34Z

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the "Status: Stale" (no recent activity for issue) label on Aug 26, 2020

stale bot closed this as completed Nov 24, 2020
