Very high cpu load writing to a zvol #7631

@akschu

Description

System information:
Distro is customized slackware64-14.2
Kernel 4.14.49
zfs/spl 0.7.9-1
(2) E5-2690 v3 CPUs
HP P440ar raid controller (using ZFS for volume management/compression)

Also tried on (with the same results):
Distro is customized slackware64-14.2
Kernel 4.9.101
zfs/spl 0.7.9-1
(1) E3-1230 CPU
LSI 2008 in IT mode with 4 SAS disks.

The issue is that I get poor write performance to a zvol, and the zvol kernel threads burn a lot of CPU, causing very high load averages on the machine. At first I was seeing the issue in libvirt/qemu while doing a virtual machine block copy, but I reduced it down to this:

# dd if=/datastore/vm/dng-smokeping/dng-smokeping.raw of=/dev/zvol/datastore/vm/test bs=1M
51200+0 records in
51200+0 records out
53687091200 bytes (50.0GB) copied, 318.477527 seconds, 160.8MB/s

Speed isn't great, but the real issue is that the load average goes through the roof:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     15503 44.2  0.0  17660  2956 pts/1    D+   12:12   1:22 dd if /datastore/vm/dng-smokeping/dng-smokeping.raw of /dev/zvol/datastore/vm/test bs 1M
root     15506 18.6  0.0      0     0 ?        R<   12:12   0:33 [zvol]
root     15505 18.6  0.0      0     0 ?        D<   12:12   0:33 [zvol]
root     48390 17.2  0.0      0     0 ?        D<   12:00   2:42 [zvol]
root     48296 17.2  0.0      0     0 ?        R<   11:59   2:43 [zvol]
root     48290 17.2  0.0      0     0 ?        R<   11:59   2:43 [zvol]
root     48289 17.2  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48287 17.2  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48282 17.2  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48280 17.2  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48274 17.2  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48273 17.2  0.0      0     0 ?        R<   11:59   2:43 [zvol]
root     48271 17.2  0.0      0     0 ?        R<   11:59   2:43 [zvol]
root     48298 17.1  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48297 17.1  0.0      0     0 ?        D<   11:59   2:42 [zvol]
root     48295 17.1  0.0      0     0 ?        D<   11:59   2:42 [zvol]
root     48293 17.1  0.0      0     0 ?        R<   11:59   2:42 [zvol]
root     48292 17.1  0.0      0     0 ?        R<   11:59   2:43 [zvol]
root     48291 17.1  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48288 17.1  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48286 17.1  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48284 17.1  0.0      0     0 ?        D<   11:59   2:42 [zvol]
root     48283 17.1  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48281 17.1  0.0      0     0 ?        D<   11:59   2:42 [zvol]
root     48279 17.1  0.0      0     0 ?        R<   11:59   2:43 [zvol]
root     48278 17.1  0.0      0     0 ?        D<   11:59   2:42 [zvol]
root     48277 17.1  0.0      0     0 ?        R<   11:59   2:42 [zvol]
root     48276 17.1  0.0      0     0 ?        D<   11:59   2:42 [zvol]
root     48275 17.1  0.0      0     0 ?        R<   11:59   2:42 [zvol]
root     48272 17.1  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root     48270 17.1  0.0      0     0 ?        D<   11:59   2:43 [zvol]
root       800 13.9  0.0      0     0 ?        D<   11:12   8:47 [zvol]
root     47832 12.2  0.0      0     0 ?        R<   11:53   2:43 [zvol]
root      3798  0.0  0.0  16764  1200 pts/0    S+   12:15   0:00 egrep USER|zvol
root      1432  0.0  0.0      0     0 ?        S    11:13   0:00 [z_zvol]

# uptime
 12:15:47 up  1:03,  2 users,  load average: 44.88, 25.17, 19.15
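For anyone wanting to reproduce this without my source file, writing from a synthetic source should hit the same zvol write path. With compression on, /dev/zero would mostly compress away into holes, so /dev/urandom is probably the more honest source, even though it's slower. Something like (untested here):

# dd if=/dev/urandom of=/dev/zvol/datastore/vm/test bs=1M count=10240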

Now, if I go in the opposite direction (reading from the zvol), it's much faster and the load average isn't nearly as high:

# dd of=/datastore/vm/dng-smokeping/dng-smokeping.raw if=/dev/zvol/datastore/vm/test bs=1M
51200+0 records in
51200+0 records out
53687091200 bytes (50.0GB) copied, 94.473277 seconds, 542.0MB/s

Only a single zvol thread is busy, and the load average is normal:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     45782 67.0  0.0  17660  2928 pts/1    R+   12:19   1:00 dd of /datastore/vm/dng-smokeping/dng-smokeping.raw if /dev/zvol/datastore/vm/test bs 1M
root       800 14.6  0.0      0     0 ?        S<   11:12  10:02 [zvol]
root      1432  0.0  0.0      0     0 ?        S    11:13   0:00 [z_zvol]
root      1303  0.0  0.0  16764  1032 pts/0    S+   12:21   0:00 egrep USER|zvol

# uptime
 12:21:14 up  1:08,  2 users,  load average: 3.57, 16.60, 18.11
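As I understand it, those [zvol] kernel threads come from the zvol taskq, and on 0.7.x its thread count and dispatch mode are exposed as module parameters (the defaults should be 32 threads with asynchronous dispatch, which would line up with the ~30 busy [zvol] threads in the write case above):

# cat /sys/module/zfs/parameters/zvol_threads
# cat /sys/module/zfs/parameters/zvol_request_sync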

What is also interesting is that both sides of the copy (the raw file and the zvol) live in the same pool:

# zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
datastore                    299G  1.78T  15.1G  /datastore
datastore/vm                 284G  1.78T  18.0G  /datastore/vm
datastore/vm/test           51.6G  1.83T  1007M  -
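One thing I haven't ruled out is the volume block size: each 1M write gets split into volblocksize-sized records, and every record is checksummed and compressed individually, so a small volblocksize (8K is the default) means a lot of per-record CPU work spread across those taskq threads. Worth checking:

# zfs get volblocksize,compression datastore/vm/test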

So I'm not sure what to look at. As it stands, I can't really write to a zvol without killing the machine, so I'm using raw disk images on a mounted ZFS filesystem to avoid the double COW.
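If anyone wants to experiment, the dispatch mode can apparently be flipped at runtime. Forcing synchronous request handling takes the zvol taskq out of the picture, though it may just move the CPU cost into the writing process instead of removing it; this is a sketch, not something I've verified to help:

# echo 1 > /sys/module/zfs/parameters/zvol_request_sync

And to make it persist across module reloads:

# echo "options zfs zvol_request_sync=1" >> /etc/modprobe.d/zfs.conf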

Thanks!
