
Directory copy stalls #69

Closed

paragw opened this issue Oct 13, 2010 · 31 comments

@paragw

paragw commented Oct 13, 2010

Test Setup

Host machine: 4x Xeon, 10 GB RAM, 4 disks of which 2 are in software RAID1 (standard MD), running Ubuntu Maverick x86_64 (kernel 2.6.35.xx) with SPL+ZFS built from Git.

Virtual machine: KVM/QEMU, 1 GB RAM, 3 virtio-based disk images: one for the OS install (Ubuntu Maverick Server x86_64) and two others, vdb and vdc.

A standard striped ZFS pool named rpool (no RAID options specified), with 39 GB allocated from vdb and vdc; compression=on, dedup=on.

2 volumes created: rpool/one (20G) and rpool/two (10G), with ext3 file systems created on both.
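
(Roughly, the pool and volume setup amounts to the commands below; exact device paths and zvol naming are from memory and may differ:)

$ zpool create rpool /dev/vdb /dev/vdc    # plain stripe, no redundancy
$ zfs set compression=on rpool
$ zfs set dedup=on rpool
$ zfs create -V 20G rpool/one             # zvols appear as block devices
$ zfs create -V 10G rpool/two
$ mkfs.ext3 /dev/rpool/one                # or /dev/zvol/rpool/one, depending on the build
$ mkfs.ext3 /dev/rpool/two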

Trying to copy a 650 MB directory (OS install ISO contents) to either of the ext3 file systems, on rpool/one or rpool/two, never seems to finish. Various ZFS-related processes and kdmflush/kswapd0 et al. keep consuming lots of CPU while the copy makes no progress; low disk throughput and cp being starved of CPU are a couple of things I suspect may be the cause.

Also, while the copy is in progress no other sync operation can complete: running dpkg to install a package, for example, triggers the kernel hung-task detector several times, showing dpkg stuck in a sync call in wait_for_completion.
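
(The hung-task reports can be pulled out of the kernel log, e.g.:)

$ dmesg | grep -A 15 'blocked for more than 120 seconds'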

Is this sort of VM-based setup simply not good enough for ZFS, or am I doing something wrong in creating the pools, or am I using features that aren't fully tested yet (dedup comes to mind)? I remember trying something very similar on a real machine with similar results, though.

I would be glad to provide any additional information or do further testing.

@poige

poige commented Feb 4, 2011

Similar symptoms:

top - 17:41:31 up 44 min,  8 users,  load average: 13.03, 13.06, 11.24
Tasks: 257 total,   1 running, 256 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.1%sy,  0.0%ni, 65.9%id, 33.8%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   1464756k total,   963500k used,   501256k free,    71140k buffers
Swap:  1048572k total,        0k used,  1048572k free,   330304k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1590 poige     20   0 10912 1232  812 R    0  0.1   0:02.52 top
   20 root      20   0     0    0    0 D    0  0.0   0:00.04 sync_supers
  735 root      39  19     0    0    0 D    0  0.0   0:00.72 zvol/2
 1341 root      39  19     0    0    0 D    0  0.0   0:00.03 zio_write_intr/
 1342 root      39  19     0    0    0 D    0  0.0   0:00.01 zio_write_intr/
 1343 root      39  19     0    0    0 D    0  0.0   0:00.02 zio_write_intr/
 1344 root      39  19     0    0    0 D    0  0.0   0:00.02 zio_write_intr/
 1345 root      39  19     0    0    0 D    0  0.0   0:00.01 zio_write_intr/
 1346 root      39  19     0    0    0 D    0  0.0   0:00.01 zio_write_intr/
 1347 root      39  19     0    0    0 D    0  0.0   0:00.02 zio_write_intr/
 1348 root      39  19     0    0    0 D    0  0.0   0:00.03 zio_write_intr/
 1489 root       0 -20     0    0    0 D    0  0.0   0:00.00 txg_quiesce
 1490 root       0 -20     0    0    0 D    0  0.0   0:00.21 txg_sync
 1551 root      20   0 14124  628  516 D    0  0.0   0:03.58 cp

@poige

poige commented Feb 9, 2011

Yeah, I can also add that it happens only with SMP. It works OK when booted in uniprocessor mode.
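
(To compare without rebuilding anything, uniprocessor mode can be forced at boot; a sketch, not my exact procedure:)

$ # boot with "maxcpus=1" (or "nosmp") appended to the kernel command line
$ cat /proc/cmdline    # verify after reboot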

@ChrisAzfs

It appears the culprit is dedup. Copying a 466 MB directory from ext4 to ZFS took about 40 seconds with dedup off, and 2 minutes with dedup on. Enabling compression makes it faster (fewer blocks to checksum?). Sometimes compression=on/dedup=on was as fast as dedup=off; I don't know why. Copying a 3 GB directory with many files made the machine unusable.
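
(A sketch of the comparison; pool and path names here are placeholders, not my actual setup:)

$ zfs set dedup=off tank
$ time cp -a /src/dir /tank/fs     # ~40 s for the 466 MB directory
$ zfs set dedup=on tank
$ time cp -a /src/dir /tank/fs2    # ~2 min with dedup on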

For comparison, copying the 466 MB directory took about 20 seconds with zfs-fuse, so there is room for improvement.

Using 0.6.0-rc1 on Fedora 14 x86_64 (SMP).

@poige

poige commented Feb 20, 2011

ChrisAzfs, it seems you're missing the point. In my case and paragw's, the copy never completes; it stalls until a reboot.

@behlendorf
Contributor

Poige, can you try again using the 0.6.0-rc1 release? There are a couple of changes in the newer code which will probably help, plus you won't be restricted to using a zvol anymore. You can now use a normal ZFS filesystem for your testing if you prefer.

@poige

poige commented Feb 20, 2011

behlendorf, would kernel 2.6.32.28 be suitable for that?

@behlendorf
Contributor

You can use the 2.6.32.28 kernel with the newer 0.6.0-rc1 source, but the kernel itself doesn't explain the issue you've reported.

@poige

poige commented Feb 20, 2011

I asked just to be sure. Already compiled; going to test it now.

@poige

poige commented Feb 20, 2011

OK, on zpool list:
SPL: Loaded Solaris Porting Layer v0.6.0
zunicode: module license 'CDDL' taints kernel.
Disabling lock debugging due to kernel taint
BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3379
caller is txg_hold_open+0x1d/0x5a [zfs]
Pid: 3379, comm: modprobe Tainted: P 2.6.32.28 #1
Call Trace:
[] debug_smp_processor_id+0xbd/0xd8
[] txg_hold_open+0x1d/0x5a [zfs]
[] dmu_tx_assign+0x159/0x349 [zfs]
[] ? dsl_dataset_block_freeable+0x37/0x45 [zfs]
[] zvol_replay_write+0x61/0xab [zfs]
[] zil_replay_log_record+0x10c/0x162 [zfs]
[] zil_parse+0x3cb/0x4ba [zfs]
[] ? __switch_to+0xcb/0x1f2
[] ? zil_replay_log_record+0x0/0x162 [zfs]
[] ? zil_incr_blks+0x0/0xf [zfs]
[] zil_replay+0xb1/0xea [zfs]
[] __zvol_create_minor+0x2ef/0x413 [zfs]
[] zvol_create_minors_cb+0x26/0x2d [zfs]
[] dmu_objset_find_spa+0x332/0x348 [zfs]
[] ? zvol_create_minors_cb+0x0/0x2d [zfs]
[] dmu_objset_find_spa+0x131/0x348 [zfs]
[] ? zvol_create_minors_cb+0x0/0x2d [zfs]
[] zvol_create_minors+0x69/0xc0 [zfs]
[] ? blk_register_region+0x26/0x28
[] zvol_init+0x11e/0x125 [zfs]
[] ? spl__init+0x0/0x10 [zfs]
[] _init+0x1d/0xfe [zfs]
[] ? spl__init+0x0/0x10 [zfs]
[] spl__init+0xe/0x10 [zfs]
[] do_one_initcall+0x59/0x14b
[] sys_init_module+0xd0/0x227
[] system_call_fastpath+0x16/0x1b

@poige

poige commented Feb 20, 2011

And even a simple mount stalls:

 3740 poige     20   0 10864 1344  904 R    1  0.1   0:00.13 top
 3396 root      39  19     0    0    0 D    0  0.0   0:00.14 zvol/3
 3582 root      39  19     0    0    0 D    0  0.0   0:00.00 z_wr_int/0
 3583 root      39  19     0    0    0 D    0  0.0   0:00.02 z_wr_int/1
 3584 root      39  19     0    0    0 D    0  0.0   0:00.02 z_wr_int/2
 3585 root      39  19     0    0    0 D    0  0.0   0:00.01 z_wr_int/3
 3586 root      39  19     0    0    0 D    0  0.0   0:00.01 z_wr_int/4
 3587 root      39  19     0    0    0 D    0  0.0   0:00.02 z_wr_int/5
 3588 root      39  19     0    0    0 D    0  0.0   0:00.02 z_wr_int/6
 3589 root      39  19     0    0    0 D    0  0.0   0:00.02 z_wr_int/7
 3590 root      39  19     0    0    0 D    0  0.0   0:00.00 z_wr_int/8
 3591 root      39  19     0    0    0 D    0  0.0   0:00.03 z_wr_int/9
 3592 root      39  19     0    0    0 D    0  0.0   0:00.03 z_wr_int/10
 3593 root      39  19     0    0    0 D    0  0.0   0:00.02 z_wr_int/11
 3594 root      39  19     0    0    0 D    0  0.0   0:00.01 z_wr_int/12
 3595 root      39  19     0    0    0 D    0  0.0   0:00.01 z_wr_int/13
 3596 root      39  19     0    0    0 D    0  0.0   0:00.02 z_wr_int/14
 3597 root      39  19     0    0    0 D    0  0.0   0:00.02 z_wr_int/15
 3721 root       0 -20     0    0    0 D    0  0.0   0:00.00 txg_quiesce
 3722 root       0 -20     0    0    0 D    0  0.0   0:00.07 txg_sync
 3735 root      20   0  9888  880  724 D    0  0.1   0:00.13 mount

@poige

poige commented Feb 20, 2011

And again: mount worked OK when booted without SMP (uniprocessor mode).

@behlendorf
Contributor

OK, this is a known issue. The problem is that your kernel is built with CONFIG_PREEMPT; it's a low-latency kernel. See issue #83. This can and will be fixed, but it isn't yet.

https://github.com/behlendorf/zfs/issues/#issue/83
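
(A quick way to check, assuming your kernel exposes /proc/config.gz:)

$ zgrep 'CONFIG_PREEMPT=' /proc/config.gz    # or: grep 'CONFIG_PREEMPT=' /boot/config-$(uname -r)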

@poige

poige commented Feb 21, 2011

Damn! Just before you wrote that, I had recompiled the kernel with CONFIG_PREEMPT to give the new version a try. :-) OK, I'll revert it.

@poige

poige commented Feb 21, 2011

behlendorf, I gave it a try on 2.6.35.10: the copy stalled, alas…

$ zgrep CONFIG_PREEMPT /proc/config.gz 
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

@behlendorf
Contributor

Was there any debugging printed to 'dmesg'? That would be critical for determining what the system is stuck on. With CONFIG_PREEMPT disabled you certainly shouldn't be seeing this message anymore:

BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3379

@poige

poige commented Feb 21, 2011

Not at the moment of the stall. Also, it's a bit strange that, having only run sudo zpool list, I had 2 processes in state 'D':

1320 poige     20   0 10912 1224  812 R    1  0.1   0:00.06 top
1313 root       0 -20     0    0    0 D    0  0.0   0:00.00 txg_quiesce
1314 root       0 -20     0    0    0 D    0  0.0   0:00.00 txg_sync

@poige

poige commented Feb 21, 2011

vmstat 1 shows a burst in the number of running processes at the stall:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1      0  22760  49648 471020    0    0  9612     0 1321 1963  0  8 66 27
17  1      0  13472  50028 477532    0    0  8344  4011 1221 2218  0 10 65 25
 0  1      0  14008  50456 479920    0    0 11072  8550 1839 4049  1 13 61 25
 1  2      0  12344  50660 474048    0    0  8768 47804 2862 7540  0 26 45 29
 0  2      0  16516  50868 482004    0    0  4792  8012 1150 2453  0  6 33 60
 0  2      0  16516  50872 482088    0    0     4     0  275  234  0  1 33 66
 0  2      0 252396  50880 482084    0    0     0   548  310  238  0  3 33 64
 0  2      0 252396  50880 482088    0    0     0     0  271  237  0  0 34 66
 0  2      0 252396  50880 482088    0    0     0     0  278  238  0  1 33 66
 0  2      0 252396  50880 482088    0    0     0     0  299  249  0  0 33 66
 0  2      0 252396  50880 482088    0    0     0     0  252  243  0  1 33 66

@poige

poige commented Feb 21, 2011

The first stall-related dmesg is:

INFO: task arc_reclaim:986 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
arc_reclaim   D 0000000000000000     0   986      2 0x00000000
 ffff880057b87cd0 0000000000000046 ffff880000000000 ffff880057b87fd8
 ffff880058e16540 0000000000014d00 0000000000014d00 ffff880057b87fd8
 0000000000014d00 0000000000014d00 ffff880057b87fd8 0000000000014d00
Call Trace:
 [<ffffffff812d21a5>] __mutex_lock_slowpath+0x145/0x247
 [<ffffffff812d22b8>] mutex_lock+0x11/0x28
 [<ffffffffa03f758d>] arc_evict+0x60/0x4d0 [zfs]
 [<ffffffff8105a725>] ? finish_wait+0x61/0x69
 [<ffffffffa03671cd>] ? __cv_timedwait_common+0xd4/0xf4 [spl]
 [<ffffffffa03f7a64>] arc_adjust+0x67/0x170 [zfs]
 [<ffffffffa03f8d1a>] arc_reclaim_thread+0x107/0x182 [zfs]
 [<ffffffffa03f8c13>] ? arc_reclaim_thread+0x0/0x182 [zfs]
 [<ffffffffa0364105>] thread_generic_wrapper+0x6c/0x79 [spl]
 [<ffffffffa0364099>] ? thread_generic_wrapper+0x0/0x79 [spl]
 [<ffffffff8105a193>] kthread+0x7a/0x82
 [<ffffffff81009824>] kernel_thread_helper+0x4/0x10
 [<ffffffff8105a119>] ? kthread+0x0/0x82
 [<ffffffff81009820>] ? kernel_thread_helper+0x0/0x10

@behlendorf
Contributor

Thanks, that debugging is a good start. When I get a moment I'll try to reproduce this locally. Is your test setup the same as described in comment #1? Also, exactly what commands did you run to trigger this deadlock? Finally, I notice your VM has only 1 GiB of memory, which isn't much for ZFS; in theory it should be OK, just slow, but I try to leave 2 GiB of memory in all my VMs for ZFS testing.

@poige

poige commented Feb 21, 2011

You're welcome! It's x86_64 Linux in VirtualBox. Actually it has 1430 MiB, and, AFAIR, I varied that (no RAM starvation occurred), as well as changing the number of available CPUs. (With only one CPU everything works OK.) The trigger is a simple sudo cp -xa /usr/src /mounted_zfs_volume. The volume is 10G in size with Reiser3 on top. The pool is either a simple stripe or raidz, consisting of 3 disk devices of 8 GiB each.
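
(Spelled out, the reproduction is roughly the following; device names are placeholders:)

$ sudo zpool create tank raidz sdb sdc sdd    # or a plain stripe: zpool create tank sdb sdc sdd
$ sudo zfs create -V 10G tank/vol
$ sudo mkreiserfs /dev/zvol/tank/vol          # zvol path may be /dev/tank/vol on this version
$ sudo mount /dev/zvol/tank/vol /mounted_zfs_volume
$ sudo cp -xa /usr/src /mounted_zfs_volume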

@poige

poige commented Mar 11, 2011

Hi! Any changes worth re-trying? :)

@Rudd-O
Contributor

Rudd-O commented Mar 15, 2011

Ew, I have been using CONFIG_PREEMPT_VOLUNTARY for a while. Maybe that's why ZFS misbehaves so badly for me every day? I'm going to turn it off now.

@behlendorf
Contributor

Please let me know if that improves things. Leaving CONFIG_PREEMPT on could result in a deadlock in the slab reclaim, which can manifest as a spinning kswapd task.

@Rudd-O
Contributor

Rudd-O commented Mar 15, 2011

So far, no spinning kswapd task with the new kernel and CONFIG_PREEMPT disabled.

@behlendorf
Contributor

The kswapd spinning and deadlock issues should be resolved in the latest spl/zfs master source. Performance and memory usage must still be addressed, but those are different issues from the kswapd thrashing/deadlock described here. The following commits resolved the issue.

691f6ac, behlendorf/spl@cb255ae - Allows the kernel to perform direct reclaim regularly, which eases the pressure on kswapd (indirect reclaim).

d6bd8ea - Fixes a deadlock exposed by the increased direct reclaim.

behlendorf/spl@2092cf6 - Works around an upstream kernel vmalloc() deadlock, so the kernel patch is no longer required. However, I will work to get it included in the upstream kernel, since it is a real kernel bug.

I'm closing this issue. If you're still seeing a spinning kswapd with the latest code, please let me know and we'll reopen this issue or file a new one.
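
(For anyone re-testing before a tagged release, building the master branches is roughly the following; this assumes the usual autotools flow, and you may need ./autogen.sh first if configure is absent:)

$ git clone https://github.com/behlendorf/spl.git && cd spl
$ ./configure && make && sudo make install && cd ..
$ git clone https://github.com/behlendorf/zfs.git && cd zfs
$ ./configure && make && sudo make install    # configure may need --with-spl pointing at the spl tree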

@Rudd-O
Contributor

Rudd-O commented Mar 22, 2011

Will try now

@poige

poige commented Oct 2, 2011

Marked as Closed, but I can't confirm that: I managed to get another hang, though now I think it may be due to a really tight memory limit. Going to check this out soon.

@poige

poige commented Oct 2, 2011

Having increased memory to 2 GiB, I was able to copy /usr/src to the ZFS volume. During the copy I noticed an occasional, considerable burst in the number of running processes:

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
102  0      0  25112   1588 665884    0    0  2392 11177 1660 3694  0 35 46 19

quite strange…

@poige

poige commented Oct 3, 2011

And 4096 MiB was the lowest boundary for successfully copying /usr/src onto a dedup=on, compression=on ZFS. With less memory it stalled every time, and even panic-rebooted once.
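
(ARC memory use can be watched during the copy, assuming the arcstats kstat is available:)

$ grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats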

@behlendorf behlendorf reopened this Oct 18, 2011
@behlendorf
Contributor

That doesn't sound fixed; I'm reopening the issue.

@behlendorf
Contributor

This was likely caused by issue #287. That fix will likely be merged in the next day or two, so I'm closing this bug.

richardelling pushed a commit to richardelling/zfs that referenced this issue Oct 15, 2018
Signed-off-by: Jan Kryl <jan.kryl@cloudbyte.com>
ahrens pushed a commit to ahrens/zfs that referenced this issue Aug 28, 2019
Add zfs-headers package