ZFS infinite retry after vdev IO error #13362
You may find the zpool property Not to say it can't break in other ways, and it might not help you in particular, but there is a setting explicitly for not waiting forever hoping it gets better.
@rincebrain Thanks for the advice!
This one still seems to leave existing write requests in a blocked state? Is there any particular reason not to discard all write requests on an unrecoverable failure, to avoid the hang? In this case the kernel cannot even reboot properly, because it is waiting for the blocked IO to finish.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
not stale
I'm actually getting these errors quite often. I'm pretty sure there's a timing issue in the software I'm using, but basically what happens is this: with ZFS on multipathd on iSCSI, multipathd can somehow pull the device out from under ZFS before the pool is successfully exported (multipathd has the queue_if_no_path option set). Once the multipath device is gone, all zfs / zpool etc. commands hang, in every version I have tested so far (2.0.4+). The only solution so far has been a forced reset, which is quite annoying. Interestingly, other pools continue to work fine, but things hang when you want to disconnect them or add new ones. /proc/spl/kstat/zfs/ gives me a suspended state for the affected pool. txgs shows the following:
Edit: I do not know where the kernel module hangs, but all user applications hang at the ioctl to the /dev/zfs device.
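Since `zpool` itself hangs in this situation, one way to spot the affected pool is to read the kstat files directly, as described in the comment above. The following is a minimal sketch, assuming each pool has a directory under `/proc/spl/kstat/zfs/` containing a `state` file whose content is a single word such as `ONLINE` or `SUSPENDED`. That layout, and the helper name `suspended_pools`, are assumptions for illustration and may differ between OpenZFS versions.

```python
from pathlib import Path


def suspended_pools(kstat_root="/proc/spl/kstat/zfs"):
    """Return the names of pools whose kstat 'state' file reads SUSPENDED.

    Assumes a <kstat_root>/<pool>/state layout (an assumption, not verified
    against every OpenZFS release).
    """
    root = Path(kstat_root)
    if not root.is_dir():
        return []  # ZFS module not loaded, or a different kstat layout

    suspended = []
    for entry in sorted(root.iterdir()):
        state_file = entry / "state"
        # Non-pool kstats (plain files, or dirs without 'state') are skipped.
        if not entry.is_dir() or not state_file.is_file():
            continue
        try:
            state = state_file.read_text().strip()
        except OSError:
            continue  # unreadable entry; ignore rather than abort the scan
        if state.upper() == "SUSPENDED":
            suspended.append(entry.name)
    return suspended


if __name__ == "__main__":
    for name in suspended_pools():
        print(f"{name}: SUSPENDED")
```

This only reads files and never opens `/dev/zfs`, which is where the user-space tools were reported to hang.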
System information
Describe the problem you're observing
ZFS retries infinitely after a vdev IO error, which leaves all operations / processes on the damaged pool in the D state, including zpool.
As a result, it is impossible to stop the related process, unmount, export or stop the pool.
Affected systems have no other option but to be hard rebooted.
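To confirm which processes are stuck in the D (uninterruptible sleep) state, the per-process `stat` files under `/proc` can be scanned. This is a rough diagnostic sketch rather than anything ZFS-specific; the function names `parse_stat` and `uninterruptible_processes` are hypothetical helpers introduced here.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the first '(' and the last ')'.
    """
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in state 'D'."""
    stuck = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was malformed
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(f"{pid}\t{comm}")
```

A process that stays in this list across repeated runs is a likely candidate for being blocked on the suspended pool, although a brief D state is normal for any process waiting on disk IO.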
Describe how to reproduce the problem
Not sure how to simulate IO errors, but the pool is running ZFS on LUKS.
Include any warning/errors/backtraces from the system logs
Millions of lines of
I expected ZFS to give up after some attempts and return an IO error to the caller, resolving the IO deadlock. Instead it retries infinitely, causing all operations to freeze.