Process 'zfs_unlinked_drain' asynchronously on mount #3814

Closed

behlendorf opened this issue Sep 22, 2015 · 16 comments

@behlendorf (Contributor)

Historically ZFS has always processed the unlinked set synchronously during mount. When the unlinked set is very large this can lead to long delays mounting the filesystem and it may appear as if the system is hung. It would be desirable to handle this work asynchronously after the mount completes. There's no technical reason this can't be done and it would significantly improve availability in certain situations.
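
For illustration, a minimal sketch of what handling the drain asynchronously could look like, using the SPL taskq API. The `zfs_unlinked_drain_async`/`zfs_unlinked_drain_task` names and the dedicated `"z_unlinked_drain"` taskq are assumptions for this sketch rather than an actual patch, it uses the `zfsvfs_t` naming (the 0.6.x Linux code called this `zfs_sb_t`), and unmount synchronization is left out:

```c
/* Illustrative only: dispatch the existing zfs_unlinked_drain() onto a
 * taskq at mount time instead of calling it synchronously. */
static void
zfs_unlinked_drain_task(void *arg)
{
	zfsvfs_t *zfsvfs = arg;

	zfs_unlinked_drain(zfsvfs);	/* the existing synchronous routine */
}

/* Called from the mount path in place of the direct call. */
static void
zfs_unlinked_drain_async(zfsvfs_t *zfsvfs)
{
	taskq_t *tq = taskq_create("z_unlinked_drain", 1, defclsyspri,
	    1, 1, TASKQ_DYNAMIC);

	/* A real patch would store tq in the zfsvfs and make unmount wait
	 * for (or cancel) the task before tearing the filesystem down. */
	(void) taskq_dispatch(tq, zfs_unlinked_drain_task, zfsvfs, TQ_SLEEP);
}
```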

@behlendorf added the Type: Feature label on Sep 22, 2015
@behlendorf added this to the 0.7.0 milestone on Sep 22, 2015
@lundman (Contributor) commented Sep 22, 2015

We did something quick and dirty:

openzfsonosx/zfs@11dd716

@neingeist

I see this problem on my system. The system doesn't just appear to be hung, it is effectively hung. I can import the pool in question read-only without problems.

@ilovezfs (Contributor)

How do you know it is "this problem"? Ordinarily, the unlinked drain list should not be very long.

@neingeist

The kernel dumps a trace (on the console) that contains "zfs_unlinked_drain" (with both 0.6.5 and 0.6.4), and the last thing I did before the system crashed was delete a lot of files (many hundreds of thousands). This, and the fact that I can import with -o readonly=on, makes me believe it is this issue.

@ilovezfs (Contributor)

If it is indeed this issue, the solution is to wait until it's done.

@neingeist

Agreed. One problem though: I have absolutely no indication that it's actually making progress, as the system appears to be hung. I can only switch consoles and maybe get dmesg output on one of the previously running shells (this seems to work better with 0.6.4).

@ilovezfs (Contributor)

Are you able to build from source? lundman's patch does work, though we removed it from openzfsonosx master because it would occasionally deadlock, once it was no longer needed to catch up on the unlinked drain lists we hadn't been draining at all. Even though the full patch was removed, openzfsonosx master still prints status messages when the list is long, so you could add just those if all you need is a way to monitor progress:
https://github.com/openzfsonosx/zfs/blob/master/module/zfs/zfs_dir.c#L537-L546
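
For reference, progress logging of that general shape inside zfs_unlinked_drain() could look roughly like the sketch below. This is not the exact openzfsonosx code; the 10,000-entry threshold and the message wording are made up here:

```c
/* Sketch only: progress messages around the existing drain loop. */
void
zfs_unlinked_drain(zfsvfs_t *zfsvfs)
{
	zap_cursor_t zc;
	zap_attribute_t za;
	uint64_t count = 0, processed = 0;

	/* Announce the drain if the delete queue is unusually long. */
	if (zap_count(zfsvfs->z_os, zfsvfs->z_unlinkedobj, &count) == 0 &&
	    count > 10000)
		printk(KERN_NOTICE "ZFS: unlinked drain started, "
		    "%llu entries\n", (u_longlong_t)count);

	for (zap_cursor_init(&zc, zfsvfs->z_os, zfsvfs->z_unlinkedobj);
	    zap_cursor_retrieve(&zc, &za) == 0;
	    zap_cursor_advance(&zc)) {
		/* ... existing per-entry zfs_zget()/zfs_rmnode() work ... */

		if ((++processed % 10000) == 0)
			printk(KERN_NOTICE "ZFS: unlinked drain progress "
			    "%llu/%llu\n", (u_longlong_t)processed,
			    (u_longlong_t)count);
	}
	zap_cursor_fini(&zc);
}
```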

@neingeist

Thanks! I'll patch it in later/tomorrow!

@neingeist

So, I patched zfs-0.6.5.3 with this: neingeist@2edff94

[ 1283.738635] ZFS: unlinked drain started.
[ 1283.963863] ZFS: unlinked drain completed (0).
[ 1284.125962] ZFS: unlinked drain started.

It stops there.

It's definitely hung up in the unlinked drain on the dataset with the many thousands of deletes. Here's a call trace:

[ 1289.332037] Call Trace:
[ 1289.332037]  <NMI>  [<ffffffff81046cbd>] ? warn_slowpath_common+0x78/0x8c
[ 1289.332037]  [<ffffffff81046d69>] ? warn_slowpath_fmt+0x45/0x4a
[ 1289.332037]  [<ffffffff810139dc>] ? sched_clock+0x5/0x8
[ 1289.332037]  [<ffffffff810906a4>] ? watchdog_overflow_callback+0x93/0x9e
[ 1289.332037]  [<ffffffff81090611>] ? touch_nmi_watchdog+0x60/0x60
[ 1289.332037]  [<ffffffff810b05ca>] ? __perf_event_overflow+0xf9/0x17b
[ 1289.332037]  [<ffffffff81063fbe>] ? local_clock+0x24/0x2b
[ 1289.332037]  [<ffffffff810aea04>] ? perf_event_update_userpage+0x9/0x9d
[ 1289.332037]  [<ffffffff81019ea7>] ? x86_perf_event_set_period+0x101/0x10b
[ 1289.332037]  [<ffffffff8101a293>] ? x86_pmu_handle_irq+0xb7/0xec
[ 1289.332037]  [<ffffffff8135015c>] ? nmi_handle.isra.0+0x41/0x64
[ 1289.332037]  [<ffffffff813501fc>] ? do_nmi+0x7d/0x258
[ 1289.332037]  [<ffffffff8134fa70>] ? nmi+0x20/0x30
[ 1289.332037]  [<ffffffff81070ff2>] ? do_raw_spin_lock+0x11/0x1b
[ 1289.332037]  <<EOE>>  [<ffffffff8134f0f5>] ? rwsem_down_failed_common+0x41/0x114
[ 1289.332037]  [<ffffffffa06a7529>] ? dmu_buf_hold_noread+0x96/0xbd [zfs]
[ 1289.332037]  [<ffffffff811b3f24>] ? call_rwsem_down_read_failed+0x14/0x30
[ 1289.332037]  [<ffffffff8134eb1a>] ? down_read+0x17/0x19
[ 1289.332037]  [<ffffffffa06f57f5>] ? zap_get_leaf_byblk+0x15b/0x1c7 [zfs]
[ 1289.332037]  [<ffffffffa06f5a70>] ? zap_deref_leaf+0x65/0x6e [zfs]
[ 1289.332037]  [<ffffffffa06f7563>] ? fzap_cursor_retrieve+0x8c/0x205 [zfs]
[ 1289.332037]  [<ffffffffa06f9ff6>] ? zap_cursor_retrieve+0xc1/0x1e4 [zfs]
[ 1289.332037]  [<ffffffffa06d6889>] ? sa_lookup+0x75/0x87 [zfs]
[ 1289.332037]  [<ffffffffa070214b>] ? zfs_purgedir+0x192/0x1dc [zfs]
[ 1289.332037]  [<ffffffffa0719c84>] ? zfs_inode_update+0x114/0x17a [zfs]
[ 1289.332037]  [<ffffffff8105f9ba>] ? bit_waitqueue+0x17/0x75
[ 1289.332037]  [<ffffffff8105fa52>] ? wake_up_bit+0xd/0x1e
[ 1289.332037]  [<ffffffff8110c92e>] ? unlock_new_inode+0x43/0x4c
[ 1289.332037]  [<ffffffffa071a17b>] ? zfs_znode_alloc+0x491/0x4bb [zfs]
[ 1289.332037]  [<ffffffff81036628>] ? should_resched+0x5/0x23
[ 1289.332037]  [<ffffffff8134deec>] ? _cond_resched+0x7/0x1c
[ 1289.332037]  [<ffffffff81036628>] ? should_resched+0x5/0x23
[ 1289.332037]  [<ffffffffa07021e1>] ? zfs_rmnode+0x4c/0x2b8 [zfs]
[ 1289.332037]  [<ffffffffa071c932>] ? zfs_zinactive+0xb3/0x139 [zfs]
[ 1289.332037]  [<ffffffffa07129b8>] ? zfs_inactive+0x139/0x18a [zfs]
[ 1289.332037]  [<ffffffffa0729aa7>] ? zpl_evict_inode+0x3e/0x4c [zfs]
[ 1289.332037]  [<ffffffff8110d068>] ? evict+0x96/0x148
[ 1289.332037]  [<ffffffffa0701779>] ? zfs_unlinked_drain+0x9d/0x11d [zfs]
[ 1289.332037]  [<ffffffff81070fc1>] ? arch_local_irq_save+0x11/0x17
[ 1289.332037]  [<ffffffff81070fc1>] ? arch_local_irq_save+0x11/0x17
[ 1289.332037]  [<ffffffff8134f209>] ? _raw_spin_lock_irqsave+0x9/0x25
[ 1289.332037]  [<ffffffff8105fd09>] ? finish_wait+0x35/0x60
[ 1289.332037]  [<ffffffff8134f247>] ? _raw_spin_unlock_irqrestore+0xe/0xf
[ 1289.332037]  [<ffffffffa05fd0af>] ? taskq_create+0x277/0x299 [spl]
[ 1289.332037]  [<ffffffff8105fc83>] ? add_wait_queue+0x3c/0x3c
[ 1289.332037]  [<ffffffffa0718935>] ? zfs_get_done+0x5a/0x5a [zfs]
[ 1289.332037]  [<ffffffffa0711355>] ? zfs_sb_setup+0xdc/0x150 [zfs]
[ 1289.332037]  [<ffffffffa0712111>] ? zfs_domount+0x245/0x2da [zfs]
[ 1289.332037]  [<ffffffff8134eae5>] ? down_write+0x9/0x27
[ 1289.332037]  [<ffffffff810c143b>] ? register_shrinker+0x18/0x46
[ 1289.332037]  [<ffffffffa072991d>] ? spl_fstrans_unmark.isra.0+0x14/0x14 [zfs]
[ 1289.332037]  [<ffffffffa072995a>] ? zpl_fill_super+0x3d/0x52 [zfs]
[ 1289.332037]  [<ffffffff810fc731>] ? mount_nodev+0x42/0x78
[ 1289.332037]  [<ffffffff810fd117>] ? mount_fs+0x61/0x146
[ 1289.332037]  [<ffffffff811101a6>] ? vfs_kern_mount+0x5f/0x99
[ 1289.332037]  [<ffffffff81110590>] ? do_kern_mount+0x49/0xd8
[ 1289.332037]  [<ffffffff81111c0b>] ? do_mount+0x660/0x6c6
[ 1289.332037]  [<ffffffff81111f19>] ? sys_mount+0x88/0xc3
[ 1289.332037]  [<ffffffff81354212>] ? system_call_fastpath+0x16/0x1b

With my limited understanding of this, I think it's not the size of the list but maybe a deadlock?

@tuxoko (Contributor) commented Nov 13, 2015

@neingeist see #4006, also from zfs_unlinked_drain. The problem seems to be some kind of memory corruption.

@DeHackEd (Contributor)

I've run into something similar, but caused by the ZAP structure growing to an enormous size and not shrinking when done. A 391-megabyte ZAP with 0 entries in it delays boot by around 10 minutes.

I might try porting this patch.

@DeHackEd (Contributor)

A few issues came up when looking at the OS X patch as-is.

@lundman (Contributor) commented Mar 25, 2017

Once the unlinked-drain list has been very large (we've had users with multiple millions of entries), the ZAP object is still slow to process on each mount, even if it is empty now. When I talked to ahrens, the idea was that it might need some code to destroy the object (assuming the unlinked set is empty and there were no errors) and recreate it, so mount doesn't need to traverse all the disk buffers only to find there is nothing there. We have not yet had time to look at that.

The async drain did help people use their FS immediately instead of having to wait, but we removed it because it treated the symptom, not the cause.
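
For anyone who wants to experiment with that suggestion, a rough sketch might look like the following. zfs_unlinked_set_reset() is a made-up name, it assumes the drain completed without errors, and a real patch would need more care with the on-disk update than shown:

```c
/* Sketch only: replace an empty but bloated unlinked-set ZAP with a
 * fresh object so future mounts don't walk all of its leaf blocks. */
static int
zfs_unlinked_set_reset(zfsvfs_t *zfsvfs)
{
	objset_t *os = zfsvfs->z_os;
	uint64_t count;
	dmu_tx_t *tx;
	int error;

	if ((error = zap_count(os, zfsvfs->z_unlinkedobj, &count)) != 0)
		return (error);
	if (count != 0)
		return (0);		/* only reset when truly empty */

	tx = dmu_tx_create(os);
	dmu_tx_hold_free(tx, zfsvfs->z_unlinkedobj, 0, DMU_OBJECT_END);
	dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE, ZFS_UNLINKED_SET);
	if ((error = dmu_tx_assign(tx, TXG_WAIT)) != 0) {
		dmu_tx_abort(tx);
		return (error);
	}

	/* Destroy the old object and point the master node at a new one. */
	VERIFY0(zap_destroy(os, zfsvfs->z_unlinkedobj, tx));
	zfsvfs->z_unlinkedobj = zap_create(os, DMU_OT_UNLINKED_SET,
	    DMU_OT_NONE, 0, tx);
	VERIFY0(zap_update(os, MASTER_NODE_OBJ, ZFS_UNLINKED_SET, 8, 1,
	    &zfsvfs->z_unlinkedobj, tx));
	dmu_tx_commit(tx);

	return (0);
}
```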

@behlendorf (Contributor, Author)

A potentially nice way to tackle this issue would be to update the ZAP code so it's smart about reverting back to a microzap when a fatzap is no longer necessary. This is something we've wanted to do for quite some time, since it has the advantage of improving other common use cases. For example, when a large number of files are added to a directory and then removed, readdir performance in that directory will continue to suffer. I think it would be great if someone had time to tackle this optimization.
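
As a very rough sketch of the check such a shrink path would need (illustrative only: upstream ZAP code currently only upgrades via mzap_upgrade() and has no downgrade path, and max_micro_entries is a hypothetical parameter derived from the microzap block size):

```c
/*
 * Hypothetical helper: decide whether a fatzap could be converted back
 * to a microzap after removals.  A microzap requires every entry to have
 * a short name and a single uint64_t value, and all entries to fit in a
 * single block; checking those per-entry constraints is omitted here.
 */
static boolean_t
zap_can_shrink_to_micro(objset_t *os, uint64_t zapobj,
    uint64_t max_micro_entries)
{
	uint64_t count;

	if (zap_count(os, zapobj, &count) != 0)
		return (B_FALSE);

	return (count <= max_micro_entries);
}
```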

@behlendorf modified the milestones: 0.8.0, 0.9.0 on Feb 9, 2018
@DeHackEd (Contributor)

It processes the ZAP asynchronously in the background, hence not blocking the mounting of the filesystem. Fundamentally what was asked for has been provided.

...which is the most important bit of what we want, but there's a related pathological case where the thread may spend a lot of time and disk I/O accomplishing nothing, which is what behlendorf's most recent comment and my complaint above are about.
