All ZFS IO Hangs #9218

Closed
cisco-abrandel opened this issue Aug 26, 2019 · 2 comments
Labels
Status: Stale No recent activity for issue

Comments


cisco-abrandel commented Aug 26, 2019

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | CentOS |
| Distribution Version | 7.6 |
| Linux Kernel | 3.10.0-957.21.3.el7.x86_64 |
| Architecture | x86_64 |
| ZFS Version | 0.7.13-1 |
| SPL Version | 0.7.13-1 |

Describe the problem you're observing

All IO to the pool stops, and any command such as zfs list hangs forever. The SAS backplane does not appear to be the problem, since I can still send various SCSI commands to the disks using sg_persist.
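
For anyone trying to confirm the same symptom from a script rather than an interactive shell, the check described above can be sketched roughly as follows. This is only an illustration, not something that was run on the affected system: the `probe` helper and its timeout are hypothetical, and it deliberately avoids `subprocess.run(..., timeout=...)`, which waits for the child again after killing it and so could block on a task stuck in uninterruptible sleep.

```python
import subprocess


def probe(cmd, timeout_s=10.0):
    """Classify a command as 'ok', 'failed', 'hung', or 'missing'.

    Hypothetical helper: it never waits for the child after a timeout,
    because a task blocked in the kernel (state D) cannot be reaped
    until the kernel unblocks it.
    """
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        return "missing"  # the binary is not installed on this host
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # best effort only; do not wait for it again
        return "hung"
    return "ok" if rc == 0 else "failed"
```

On a system in this state one would expect something like `probe(["zfs", "list"])` to report `hung`, while a direct SCSI query such as `probe(["sg_persist", "--in", "--read-keys", "/dev/sdX"])` still reports `ok`. The device path here is a placeholder, and the exact sg_persist invocation is an assumption, since the report does not say which commands were sent.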

Describe how to reproduce the problem

Unclear. The system serves mostly 4K IO from VMware via iSCSI and was under heavy load at the time.

As in other reports, the workload was heavy random writes: roughly 250k IOPS of 4K writes, sustained for about 5 hours before the problem surfaced. It does not seem easy to reproduce, although this is not the first time I have seen this "deadlock".

The pool runs on 56 SSDs in a RAIDZ2 configuration.

Include any warning/errors/backtraces from the system logs

Aug 26 22:18:57 clg5-lab-san2-1 kernel: INFO: task zvol:23141 blocked for more than 120 seconds.
Aug 26 22:18:57 clg5-lab-san2-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 26 22:18:57 clg5-lab-san2-1 kernel: zvol D ffff99bd67071040 0 23141 2 0x00000000
Aug 26 22:18:57 clg5-lab-san2-1 kernel: Call Trace:
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] schedule+0x29/0x70
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] rwsem_down_read_failed+0x10d/0x1a0
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? mutex_lock+0x12/0x2f
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] call_rwsem_down_read_failed+0x18/0x30
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? spl_kmem_zalloc+0xd8/0x180 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] down_read+0x20/0x40
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dbuf_dirty+0x307/0x820 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dmu_buf_will_fill+0x22/0x30 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] dmu_write_uio_dnode+0xe6/0x150 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] zvol_write+0x17c/0x5a0 [zfs]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? __schedule+0x42a/0x860
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] taskq_thread+0x2ac/0x4f0 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? wake_up_state+0x20/0x20
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? taskq_thread_spawn+0x60/0x60 [spl]
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] kthread+0xd1/0xe0
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? insert_kthread_work+0x40/0x40
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ret_from_fork_nospec_begin+0x7/0x21
Aug 26 22:18:57 clg5-lab-san2-1 kernel: [] ? insert_kthread_work+0x40/0x40
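
When comparing this trace against other hung-task reports, it can help to reduce each one to the blocked task and its reliable call chain. The following is a minimal sketch that assumes log lines shaped like the ones above. The regular expressions and the `parse_hung_tasks` name are my own and are not part of any ZFS or kernel tooling. Frames the kernel marks with `?` are skipped because they are not guaranteed to be part of the real call chain.

```python
import re

# "INFO: task zvol:23141 blocked for more than 120 seconds."
HEADER = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)
# "kernel: [<addr>] dbuf_dirty+0x307/0x820 [zfs]"; the address may be empty.
FRAME = re.compile(
    r"kernel: \[[^\]]*\] (?P<unreliable>\? )?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_hung_tasks(lines):
    """Return one dict per hung task, with its reliable stack frames."""
    tasks = []
    for line in lines:
        header = HEADER.search(line)
        if header:
            tasks.append({
                "name": header["name"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            })
            continue
        frame = FRAME.search(line)
        # Ignore frames seen before any header, and "?" (unreliable) frames.
        if frame and tasks and not frame["unreliable"]:
            module = frame["module"] or "kernel"
            tasks[-1]["frames"].append((frame["func"], module))
    return tasks
```

Applied to the trace above, this yields a single task (zvol, PID 23141) whose innermost zfs frame is dbuf_dirty, reached from zvol_write and blocked in down_read.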

@GregorKopka
Contributor

Possibly a variant of #9172 (heavy load resulting in hanging zvols)?


stale bot commented Aug 26, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 26, 2020
@stale stale bot closed this as completed Nov 24, 2020