Process 'zfs_unlinked_drain' asynchronously on mount #3814
We did something quick and dirty.
I see this problem on my system. The system does not merely appear to be hung; it is effectively hung. I can RO-import the pool in question without problems.
How do you know it is "this problem"? Ordinarily, the unlinked drain list should not be very long.
The kernel dumps a trace (on the console) that contains "zfs_unlinked_drain" (using 0.6.5 and 0.6.4), and the last thing I did before the system crashed was delete a lot of files (many hundreds of thousands). This, and the fact that I can import the pool read-only, makes me believe it is this problem.
If it is indeed this issue, the solution is to wait until it's done.
Agreed. One problem though: I have absolutely no indication that it's actually making progress, as the system appears to be hung. I can only switch consoles, maybe get …
Are you able to build from source? lundman's patch does work, though we removed it from openzfsonosx master because it would occasionally deadlock; it had been added to help with unlinked drain lists that we hadn't been draining at all, and it is no longer needed for that. Even though the full patch was removed, openzfsonosx master still prints status messages when the list is long, so you could just add those if all you need is to monitor progress:
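A minimal sketch of what such progress messages could look like, assuming the standard `zfs_unlinked_drain()` cursor loop; the 10,000-entry threshold, the message wording, and the use of `printf()` are illustrative assumptions, not the actual openzfsonosx code:

```c
/*
 * Sketch only: report progress while draining the unlinked set.
 * Threshold and wording are assumptions, not upstream code.
 */
zap_cursor_t zc;
zap_attribute_t zap;
uint64_t count = 0, done = 0;

(void) zap_count(zfsvfs->z_os, zfsvfs->z_unlinkedobj, &count);
if (count >= 10000)
	printf("ZFS: draining %llu unlinked entries\n",
	    (u_longlong_t)count);

for (zap_cursor_init(&zc, zfsvfs->z_os, zfsvfs->z_unlinkedobj);
    zap_cursor_retrieve(&zc, &zap) == 0;
    zap_cursor_advance(&zc)) {
	/* ... existing per-entry unlink processing ... */

	if (count >= 10000 && (++done % 10000) == 0)
		printf("ZFS: unlinked drain progress: %llu/%llu\n",
		    (u_longlong_t)done, (u_longlong_t)count);
}
zap_cursor_fini(&zc);
```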
Thanks! I'll patch it in later/tomorrow!
So, I patched zfs-0.6.5.3 with this: neingeist@2edff94
It stops there. It's definitely hung up in the unlinked drain on the dataset with the many thousands of deletes. Here's a call trace:
With my limited understanding of this, I think it's not the size of the list but maybe a deadlock?
@neingeist …
I've run into something similar, but caused by the ZAP structure growing to an enormous size and not shrinking when done. A 391-megabyte ZAP with 0 entries in it delays boot by around 10 minutes. I might try porting this patch.
A few issues came up looking at the OS X patch as is.
Once the unlinked-drain list has been very large (we've had users with multiple millions of entries), the ZAP object is still slow to process on each mount, even if it is (now) empty. When I talked to ahrens, the suggestion was that it might need some code to destroy the object (assuming the unlinked drain is empty and there were no errors) and recreate it, so it doesn't have to traverse all the disk buffers only to find there is nothing there. We have not yet had time to look at that. The async drain did help people use their filesystems immediately instead of having to wait, but we removed it because it treated the symptom, not the cause.
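A rough sketch of that destroy-and-recreate idea, assuming the unlinked set is tracked via `ZFS_UNLINKED_SET` in the master node as in upstream ZFS; the helper name is hypothetical, and locking and teardown ordering are deliberately elided:

```c
/*
 * Hypothetical helper (not the actual patch): after a clean,
 * complete drain, replace an empty-but-bloated unlinked-set ZAP
 * with a fresh one so future mounts don't traverse stale buffers.
 */
static void
zfs_unlinked_set_reset(zfsvfs_t *zfsvfs)
{
	objset_t *os = zfsvfs->z_os;
	uint64_t count, newobj;
	dmu_tx_t *tx;

	/* Only bother when the drain left the set truly empty. */
	if (zap_count(os, zfsvfs->z_unlinkedobj, &count) != 0 || count != 0)
		return;

	tx = dmu_tx_create(os);
	dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, B_FALSE, NULL);
	dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE, ZFS_UNLINKED_SET);
	if (dmu_tx_assign(tx, TXG_WAIT) != 0) {
		dmu_tx_abort(tx);
		return;
	}

	/* Destroy the bloated object and point the master node at a new one. */
	VERIFY0(zap_destroy(os, zfsvfs->z_unlinkedobj, tx));
	newobj = zap_create(os, DMU_OT_UNLINKED_SET, DMU_OT_NONE, 0, tx);
	VERIFY0(zap_update(os, MASTER_NODE_OBJ, ZFS_UNLINKED_SET,
	    sizeof (uint64_t), 1, &newobj, tx));
	zfsvfs->z_unlinkedobj = newobj;
	dmu_tx_commit(tx);
}
```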
A potentially nice way to tackle this issue would be to update the ZAP code so it's smart about reverting back to a microzap when a fatzap is no longer necessary. This is something we've wanted to do for quite some time, since it has the advantage of improving other common use cases. For example, when a large number of files are added to a directory and then removed, readdir performance in that directory will continue to suffer. I think it would be great if someone had time to tackle this optimization.
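As an illustration only of what "reverting to a microzap" amounts to, here's a naive sketch that copies a now-small ZAP into a fresh object (which starts out as a microzap) and swaps it in. The helper name and the 64-entry threshold are made up, and a real implementation would shrink the fatzap in place inside the ZAP layer:

```c
/*
 * Illustrative only: rebuild a small ZAP so it becomes a microzap
 * again. Assumes entries satisfy the microzap constraints (short
 * names, single 64-bit values). Not a proposed implementation.
 */
static void
zap_rebuild_if_small(objset_t *os, uint64_t *zapobj, dmu_tx_t *tx)
{
	zap_cursor_t zc;
	zap_attribute_t za;
	uint64_t count, newobj;

	if (zap_count(os, *zapobj, &count) != 0 || count > 64)
		return;		/* unreadable, or still big enough to keep */

	newobj = zap_create(os, DMU_OT_DIRECTORY_CONTENTS, DMU_OT_NONE,
	    0, tx);
	for (zap_cursor_init(&zc, os, *zapobj);
	    zap_cursor_retrieve(&zc, &za) == 0;
	    zap_cursor_advance(&zc)) {
		VERIFY0(zap_add(os, newobj, za.za_name, 8, 1,
		    &za.za_first_integer, tx));
	}
	zap_cursor_fini(&zc);
	VERIFY0(zap_destroy(os, *zapobj, tx));
	*zapobj = newobj;	/* caller must persist the new object number */
}
```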
It processes the ZAP asynchronously in the background, hence not blocking the mounting of the filesystem. Fundamentally, what was asked for has been provided. That is the most important bit of what we want, but there's a related pathological case where the thread may spend a lot of time and disk I/O accomplishing nothing, which is what behlendorf's most recent comment and my complaint above are about.
Historically ZFS has always processed the unlinked set synchronously during mount. When the unlinked set is very large this can lead to long delays mounting the filesystem and it may appear as if the system is hung. It would be desirable to handle this work asynchronously after the mount completes. There's no technical reason this can't be done and it would significantly improve availability in certain situations.
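A minimal sketch of the asynchronous approach, assuming the SPL taskq API; the wrapper names and the choice of `system_taskq` are illustrative assumptions, not a final design:

```c
/*
 * Sketch: kick the existing synchronous drain onto a taskq at mount
 * time so the mount path returns immediately. Teardown ordering
 * (waiting for or cancelling the task on unmount) is elided here.
 */
static void
zfs_unlinked_drain_task(void *arg)
{
	zfsvfs_t *zfsvfs = arg;

	zfs_unlinked_drain(zfsvfs);	/* existing synchronous drain */
}

void
zfs_unlinked_drain_async(zfsvfs_t *zfsvfs)
{
	(void) taskq_dispatch(system_taskq, zfs_unlinked_drain_task,
	    zfsvfs, TQ_SLEEP);
}
```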