Deadlock or missed wakeup with heavy I/O, file deletion, and snapshot creation on Debian Linux/ppc64le #11527
Comments
I took an initial look at this, and it looks like the quiesce thread is actually in txg_quiesce (which is inlined in the particular module being run), and is currently in the "Quiesce the transaction group by waiting for everyone to txg_exit()" loop. This would suggest that some tx_commit somewhere was missed, or that there is a bug with the tc_count accounting. One thing that would help in verifying this assumption is the contents of
After spotting the stack for pid 1437, the culprit has been revealed.
We're calling iput from iput_async because we think the i_count is greater than 0, but if we race with another iput call, we end up in the evict logic, which can easily deadlock. The solution here is to do the decrement ourselves, unless it would make the count 0; in that case, we dispatch to a taskq and let it do the eviction. I will file a PR once I've tested the code.
Closing in favor of #11530; will reopen if it crops up in a different guise. Thanks again, @pcd1193182.
There is a race condition in zfs_zrele_async when we are checking if we would be the one to evict an inode. This can lead to a txg sync deadlock. Instead of calling into iput directly, we attempt to perform the atomic decrement ourselves, unless that would set the i_count value to zero. In that case, we dispatch a call to iput to run later, to prevent a deadlock from occurring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #11527
Closes #11530
I think I have correctly applied the fix from 43eaef6 to my local The pool's
Here's the kstack of every thread in the system parked in ZFS: zfs-hang.txt
Note that there is, indeed, a snapshot destroy operation in flight, though that may, as @pcd1193182 has suggested, just be increasing the frequency of whatever race is going on rather than actually being causal.
Nope, my fault. Had rebuilt the module and rebooted but had failed to update the initramfs. Sorry for the noise!
There is a race condition in zfs_zrele_async when we are checking if we would be the one to evict an inode. This can lead to a txg sync deadlock. Instead of calling into iput directly, we attempt to perform the atomic decrement ourselves, unless that would set the i_count value to zero. In that case, we dispatch a call to iput to run later, to prevent a deadlock from occurring.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#11527
Closes openzfs#11530
System information
There are two zpools in this machine: one is a single NVMe device and holds the root filesystem, while the other is five spinning-rust drives and an SSD `l2arc`. The spinning-rust drives are the pool of concern and are fully given over to `dmcrypt` and thence to ZFS; there is no `dmcrypt` intermediation of the `l2arc`.

Describe the problem you're observing

When doing a large, `ccache`-enabled build which generates lots of I/O to the spinning-rust zpool, `zfs-auto-snapshot`'s `zfs snapshot` commands, also targeting that pool, appear to be able to bring the system to a standstill. It seems that `txg_quiesce` goes to sleep and stays there (in `D` state); on what, I cannot say. `txg_sync` is asleep but in `S` state.

Please find attached a list of most threads involved in zfs or spl at the time of deadlock; for each, I have captured `/proc/$PID/status`, `/proc/$PID/cmdline | tr '\000' ' '`, and `/proc/$PID/stack`. I have removed all threads that appeared to be unrelated or merely idle task queues awaiting work. If there's additional information I could provide, please do not hesitate to ask; if I do not have it at present, I shall attempt to capture it next time the issue rears its ugly head.

Describe how to reproduce the problem

I'm unsure how to have anyone else reproduce the problem, but the above recipe happens often enough on this machine that I have had this occur somewhere in the vicinity of once per month on average, though we're at twice this week, so that's exciting! I've told `zfs-auto-snapshot` not to take hourly snapshots, in hopes that whatever's being tickled is tickled less often.

FWIW, this happened as well with `0.7.5` before I moved up to the `0.8` series.