
FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information #4334

Closed
wants to merge 1 commit into master from mirror-locality

Conversation


@ryao ryao commented Feb 13, 2016

FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information.

The existing algorithm selects a preferred leaf vdev based on the offset of the
zio request modulo the number of members in the mirror. It assumes the devices
are of equal performance and that spreading the requests randomly over the
drives will be sufficient to saturate them. In practice this results in the
leaf vdevs being underutilized.

The new algorithm takes into account the following additional factors:

  • Load of the vdevs (number of outstanding I/O requests)
  • The locality of the last queued I/O vs. the new I/O request.

Within the locality calculation, additional knowledge about the underlying vdev
is considered, such as whether the device backing the vdev is rotating media.

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.
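
A minimal, self-contained sketch of the selection idea described above: among
the valid (readable) children, the one with the lowest score wins, where the
score is its count of outstanding I/Os plus a locality penalty. All names and
numbers below are illustrative stand-ins, not the actual vdev_mirror.c
identifiers; how the penalty itself might be derived from the tunables is
sketched after the sysctl list further down.

#include <limits.h>
#include <stdio.h>

/* Illustrative stand-in for a mirror child; not an actual ZFS structure. */
struct mchild {
    int pending;      /* outstanding I/Os already queued to this leaf */
    int seek_penalty; /* locality cost of servicing the new I/O here */
    int readable;     /* 0 if this leaf is not a valid candidate */
};

/* Pick the valid child with the lowest load + locality score, or -1 if none. */
static int
pick_child(const struct mchild *mc, int n)
{
    int best = -1;
    int best_score = INT_MAX;

    for (int i = 0; i < n; i++) {
        if (!mc[i].readable)
            continue;    /* only score the valid candidates */
        int score = mc[i].pending + mc[i].seek_penalty;
        if (score < best_score) {
            best_score = score;
            best = i;
        }
    }
    return (best);
}

int
main(void)
{
    /* A busy HDD facing a long seek vs. a lightly loaded SSD: the SSD wins. */
    struct mchild mirror[2] = {
        { .pending = 4, .seek_penalty = 5, .readable = 1 },  /* HDD */
        { .pending = 1, .seek_penalty = 1, .readable = 1 },  /* SSD */
    };

    printf("chosen child: %d\n", pick_child(mirror, 2));
    return (0);
}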

The following are results from a setup with a 3-way mirror of 2 x HDDs and
1 x SSD, from a basic test running multiple parallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s

In addition to the performance changes, the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdev loads are only calculated from the set of valid candidates.

The following additional sysctls were added to allow the administrator
to tune the behaviour of the load algorithm (see the sketch after this list):

  • vfs.zfs.vdev.mirror.rotating_inc
  • vfs.zfs.vdev.mirror.rotating_seek_inc
  • vfs.zfs.vdev.mirror.rotating_seek_offset
  • vfs.zfs.vdev.mirror.non_rotating_inc
  • vfs.zfs.vdev.mirror.non_rotating_seek_inc
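
A rough sketch of how these tunables might shape the locality increment used
in the selection sketch above: sequential I/O is cheapest, seeks on rotating
media cost the most, and short seeks (within rotating_seek_offset) cost less
than long ones. The variable names below simply mirror the sysctl names; the
default values and the half-penalty detail are illustrative assumptions, not
necessarily the shipped behaviour.

#include <stdint.h>
#include <stdio.h>

/* Stand-ins named after the sysctls above; the values are illustrative only. */
static int rotating_inc = 0;
static int rotating_seek_inc = 5;
static uint64_t rotating_seek_offset = 1ULL << 20;  /* 1 MiB */
static int non_rotating_inc = 0;
static int non_rotating_seek_inc = 1;

/*
 * Locality increment added to a child's outstanding-I/O count: sequential
 * I/O is cheapest; seeks on rotating media cost more, and (in this sketch)
 * seeks that stay within rotating_seek_offset cost half as much as long ones.
 */
static int
locality_inc(int nonrot, uint64_t last_offset, uint64_t offset)
{
    uint64_t seek = (offset > last_offset) ?
        offset - last_offset : last_offset - offset;

    if (nonrot)
        return (seek == 0 ? non_rotating_inc : non_rotating_seek_inc);
    if (seek == 0)
        return (rotating_inc);
    if (seek < rotating_seek_offset)
        return (rotating_seek_inc / 2);
    return (rotating_seek_inc);
}

int
main(void)
{
    /* The same 2 MiB jump is far more expensive on an HDD than on an SSD. */
    printf("HDD: +%d, SSD: +%d\n",
        locality_inc(0, 0, 2ULL << 20),
        locality_inc(1, 0, 2ULL << 20));
    return (0);
}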

These changes were based on work started by the zfsonlinux developers:
#1487

Reviewed by: gibbs, mav, will
MFC after: 2 weeks
Sponsored by: Multiplay

Porting notes:

  • The tunables were adjusted to have ZoL-style names.
  • The code was modified to use ZoL's vd_nonrot.
  • Fixes were done to make cstyle.pl happy.
  • Merge conflicts were handled manually.
  • freebsd/freebsd-src@e186f56 by my
    colleague Andriy Gapon has been included. It applied cleanly, but
    added a cstyle regression.
  • This replaces 556011d entirely.
  • vdev_mirror_shift from OpenSolaris was missing from our code, so it
    has been added.
  • A typo "IO'a" has been corrected to say "IO's".

Ported-by: Richard Yao <ryao@gentoo.org>


ryao commented Feb 13, 2016

Performance tests have not been done on this. If the buildbot does not identify any problems, someone should do some benchmarks.


ryao commented Feb 16, 2016

@kpande It is FreeBSD's version of it, which is considered superior. However, there is nothing left from #1487 when this patch is applied.

@behlendorf

I'm all for a better implementation but we'll need to get some performance numbers to verify that.


ryao commented Feb 17, 2016

@behlendorf I ported this after a user complained that this code is not being shared across platforms. I am leaving benchmarks to others willing to volunteer. I imagine this could be used:

https://gist.github.com/brendangregg/7270ff9698c70d9e7496

Whoever does benchmarks will just need to test 1-, 2-, 3- and 4-drive configurations like you did for the commit message of 556011d.


ryao commented Feb 24, 2016

@behlendorf @inkdot7 has done tests on this in #4363 (with prefetch enabled). They show improvements that appear to be consistent with the FreeBSD numbers.

@inkdot7 inkdot7 mentioned this pull request Feb 24, 2016
@behlendorf

@inkdot7 thank you for running those performance tests and posting the results. To summarize, they show a big performance win when mixing an HDD and an SSD, pretty much across the board. For devices of the same type there may be a small improvement of a few percent.

@ryao aside from the needed man page updates this looks good to me. If you can get that updated I'll get this merged; I definitely agree we should stay consistent with the improvements FreeBSD made here.

@ryao ryao force-pushed the mirror-locality branch 3 times, most recently from 8e18498 to 1bffa12 on February 24, 2016 at 21:34

ryao commented Feb 24, 2016

@behlendorf I have modified the commit to amend the man page, rebased on master and repushed.

@behlendorf

@ryao awesome, thanks.

@behlendorf

Performance results courtesy of testing done by @inkdot7.

Testing Parameters:

  • ZFS recordsizes:
    • 4k, 16k and 128k
  • fio variations:
    • --rw randread, randwrite, read, write
    • --engine sync and libaio
  • pool vdev variations:
    • SSD+HDD (the interesting case),
    • SSD+SSD, to check for an effect where there should be none,
    • SSD and HDD on their own (the storage devices used in the first case).

Notes:

  • For randread and randwrite the iops values are presented, and for read and write the bw (MB/s).
  • Original is unmodified zfsonlinux (v0.6.5.4).
  • Inactive is with the patches, but the module parameter
  • Each value is followed by the standard deviation for the three measurements.
  • Gain is the performance improvement (interpreted in the snippet after these notes). It should also be roughly 0 for all cases except SSD+HDD; for the SSD+HDD cases there are considerable advantages, however.
  • Between each set of four (write, read, randread, randwrite) fio runs, the SSD storage was trimmed. Still, it seems to be difficult to get stable measurements; the HDD actually seems to be the source of the most relative uncertainty.
  • Results are averages of 3 measurements of 60 s each.
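
A note on reading the Gain columns: the figures appear consistent with the
difference between the patched and original values expressed relative to their
average (for example, 2143 -> 3636 iops comes out at 52%). The exact formula
is not stated in the results, so the snippet below is only an inferred aid to
interpreting the columns.

#include <stdio.h>

/* Inferred reading of the Gain column: difference relative to the average. */
static double
gain_pct(double original, double patched)
{
    return (100.0 * (patched - original) / ((patched + original) / 2.0));
}

int
main(void)
{
    /* mir SSD+HDD 4k, randread sync: 2143 -> 3636 iops; the table shows 52%. */
    printf("%.0f%%\n", gain_pct(2143.0, 3636.0));
    return (0);
}
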
======================================================
Operation:  randread    sync iops 

                      Original    seekinc1    seekinc0  G-seekinc1  G-seekinc0
                    ----------  ----------  ----------  ----------  ----------
mir SSD+HDD    4k     2143  52    3636  17    3626  47   52% (  2)   51% (  2)
mir SSD+HDD   16k     2341  59    3802  11    3778  47   48% (  2)   47% (  2)
mir SSD+HDD  128k     1358   5    2485  10    2485   4   59% (  1)   59% (  1)
mir SSD+SSD    4k     4361  58    4443  10    4402  91    2% (  1)    1% (  2)
mir SSD+SSD   16k     4603 112    4663  21    4602   0    1% (  1)   -0% (  0)
mir SSD+SSD  128k     2673   1    2700  11    2706  23    1% (  0)    1% (  1)
       SSD     4k     3624  66     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD    16k     3780  38     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD   128k     2508   2     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD     4k       50   0     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD    16k       56   4     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD   128k       82   6     nan inf     nan inf  nan% (nan)  nan% (nan)

------------------------------------------------------
Operation:  randread    libaio iops

                      Original    seekinc1    seekinc0  G-seekinc1  G-seekinc0
                    ----------  ----------  ----------  ----------  ----------
mir SSD+HDD    4k     2075  50    3500   8    3467  69   51% (  1)   50% (  3)
mir SSD+HDD   16k     2337  27    3950  59    3925 107   51% (  2)   51% (  4)
mir SSD+HDD  128k     1322  10    2443  41    2440  47   60% (  2)   59% (  3)
mir SSD+SSD    4k     4228  60    4302   5    4252  15    2% (  1)    1% (  1)
mir SSD+SSD   16k     4577  42    4681  14    4642  22    2% (  1)    1% (  1)
mir SSD+SSD  128k     2629   9    2694  15    2656  17    2% (  1)    1% (  1)
       SSD     4k     3475  61     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD    16k     3822  81     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD   128k     2485   9     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD     4k       97  18     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD    16k       90   0     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD   128k       90   1     nan inf     nan inf  nan% (nan)  nan% (nan)

======================================================
Operation:  randwrite   sync iops

                      Original    seekinc1    seekinc0  G-seekinc1  G-seekinc0
                    ----------  ----------  ----------  ----------  ----------
mir SSD+HDD    4k     5984 746    6746 392    6797 100   12% (  8)   13% (  6)
mir SSD+HDD   16k     1950  59    2249   7    2257  23   14% (  4)   15% (  4)
mir SSD+HDD  128k      524  16     724   5     732   7   32% (  4)   33% (  4)
mir SSD+SSD    4k     9591 759    9458  45   10058 489   -1% (  2)    5% (  5)
mir SSD+SSD   16k     2510  32    2500   8    2485   1   -0% (  1)   -1% (  1)
mir SSD+SSD  128k      830   9     816  18     836  19   -2% (  2)    1% (  2)
       SSD     4k     9854 320     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD    16k     2279  39     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD   128k      749  19     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD     4k      108   8     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD    16k       53   2     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD   128k       69   3     nan inf     nan inf  nan% (nan)  nan% (nan)

------------------------------------------------------
Operation:  randwrite   libaio iops

                      Original    seekinc1    seekinc0  G-seekinc1  G-seekinc0
                    ----------  ----------  ----------  ----------  ----------
mir SSD+HDD    4k     5698 683    7455   4    6896 197   27% ( 19)   19% ( 20)
mir SSD+HDD   16k     1908  86    2328  64    2297  44   20% (  4)   19% (  4)
mir SSD+HDD  128k      531  26     726   1     720   5   31% (  4)   30% (  4)
mir SSD+SSD    4k     9819 244    9823 365    9417 320    0% (  4)   -4% (  4)
mir SSD+SSD   16k     2526  27    2500  42    2506   1   -1% (  2)   -1% (  1)
mir SSD+SSD  128k      807  12     802  21     829   8   -1% (  3)    3% (  1)
       SSD     4k     9950 278     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD    16k     2299  51     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD   128k      737  15     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD     4k      151   9     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD    16k       88   1     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD   128k       85   1     nan inf     nan inf  nan% (nan)  nan% (nan)

======================================================
Operation:  read        sync bw (MB/s)

                      Original    seekinc1    seekinc0  G-seekinc1  G-seekinc0
                    ----------  ----------  ----------  ----------  ----------
mir SSD+HDD    4k       73   5     170   0     172   1   80% (  5)   81% (  5)
mir SSD+HDD   16k      204  41     391   0     379   1   63% ( 15)   60% ( 16)
mir SSD+HDD  128k      406  53     380   1     380   0   -7% ( 20)   -7% ( 20)
mir SSD+SSD    4k      192   1     200   2     200   2    4% (  1)    4% (  1)
mir SSD+SSD   16k      500  14     510   5     524   1    2% (  1)    5% (  1)
mir SSD+SSD  128k      669  10     722  19     718   4    8% (  4)    7% (  3)
       SSD     4k      184   9     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD    16k      386   5     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD   128k      381   1     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD     4k       35   4     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD    16k       64   3     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD   128k      111   3     nan inf     nan inf  nan% (nan)  nan% (nan)

------------------------------------------------------
Operation:  read        libaio bw (MB/s)

                      Original    seekinc1    seekinc0  G-seekinc1  G-seekinc0
                    ----------  ----------  ----------  ----------  ----------
mir SSD+HDD    4k       56   4     131   2     129   0   80% (  5)   79% (  5)
mir SSD+HDD   16k       74   5     243   0     244   4  107% (  3)  107% (  4)
mir SSD+HDD  128k      156  10     348   0     350   3   76% (  5)   76% (  5)
mir SSD+SSD    4k      168   1     174   6     178   0    4% (  3)    6% (  0)
mir SSD+SSD   16k      368   5     420   1     422   6   13% (  0)   14% (  2)
mir SSD+SSD  128k      595  12     647   7     634   6    8% (  5)    6% (  5)
       SSD     4k      132   4     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD    16k      244   3     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD   128k      349   1     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD     4k       34   4     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD    16k       53   0     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD   128k       92   3     nan inf     nan inf  nan% (nan)  nan% (nan)

======================================================
Operation:  write       sync bw (MB/s)

                      Original    seekinc1    seekinc0  G-seekinc1  G-seekinc0
                    ----------  ----------  ----------  ----------  ----------
mir SSD+HDD    4k       44   3      59   1      57   0   29% (  2)   27% (  1)
mir SSD+HDD   16k       30   2      40   0      40   0   27% (  6)   27% (  6)
mir SSD+HDD  128k       54   3      66   0      66   0   20% (  7)   19% (  7)
mir SSD+SSD    4k       70   0      69   0      70   1   -0% (  1)    0% (  1)
mir SSD+SSD   16k       48   0      50   0      49   0    4% (  1)    3% (  0)
mir SSD+SSD  128k       80   0      79   0      77   2   -2% (  1)   -4% (  2)
       SSD     4k       71   0     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD    16k       40   0     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD   128k       66   1     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD     4k       30   2     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD    16k       26   3     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD   128k       32   9     nan inf     nan inf  nan% (nan)  nan% (nan)

------------------------------------------------------
Operation:  write       libaio bw (MB/s)

                      Original    seekinc1    seekinc0  G-seekinc1  G-seekinc0
                    ----------  ----------  ----------  ----------  ----------
mir SSD+HDD    4k       49   2      58   0      58   0   17% (  7)   16% (  7)
mir SSD+HDD   16k       28   1      36   0      36   1   25% (  5)   26% (  5)
mir SSD+HDD  128k       50   2      61   0      61   0   20% (  3)   20% (  3)
mir SSD+SSD    4k       70   0      70   0      70   0   -0% (  1)   -1% (  0)
mir SSD+SSD   16k       41   0      43   0      42   0    4% (  2)    2% (  1)
mir SSD+SSD  128k       76   1      74   0      74   1   -2% (  2)   -2% (  2)
       SSD     4k       69   2     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD    16k       36   0     nan inf     nan inf  nan% (nan)  nan% (nan)
       SSD   128k       61   1     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD     4k       34   5     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD    16k       24   1     nan inf     nan inf  nan% (nan)  nan% (nan)
       HDD   128k       24   5     nan inf     nan inf  nan% (nan)  nan% (nan)

lundman pushed a commit to openzfsonosx/zfs that referenced this pull request Feb 29, 2016
rottegift pushed a commit to rottegift/zfs that referenced this pull request Mar 1, 2016

jumbi77 commented Mar 16, 2016

Nice work!
Is there any chance to port that to OpenZFS?


ryao commented Mar 16, 2016

@jumbi77 It is possible, although that will likely wait until someone sits down to port patches from ZoL to illumos unless @mmatuska ports it from FreeBSD first. I don't think Illumos' block layer had the hooks to indicate when a drive is solid state when this was originally written for FreeBSD, but it does now.

@inkdot7 inkdot7 mentioned this pull request Jan 11, 2017
lundman pushed a commit to openzfsonosx/zfs that referenced this pull request Jan 23, 2017
lundman pushed a commit to openzfsonosx/zfs that referenced this pull request Jan 24, 2017

/*
 * We don't return INT_MAX if the device is resilvering i.e.
 * vdev_resilver_txg != 0 as when tested performance was slightly

This comment seemed obsolete. I don't see this function using vdev_resilver_txg anywhere. Maybe I missed something?

@behlendorf

My reading of the comment is that it explains why there isn't additional code here adding extra weight to devices which are currently resilvering. As the comment says, it wasn't worthwhile.

@behlendorf I see - thanks!
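
For anyone landing on the same question, a purely hypothetical illustration of
the extra weighting the comment refers to, which the discussion above says was
deliberately left out; none of the names below come from the ZFS source, and
the branch is shown only to make the point concrete.

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for a leaf vdev; not a ZFS structure. */
struct leaf {
    int load;              /* load score computed so far */
    uint64_t resilver_txg; /* non-zero while the leaf is resilvering */
};

/*
 * The branch below is the kind of extra weighting the quoted comment refers
 * to; per the discussion above it was left out of the real code because it
 * wasn't worthwhile.
 */
static int
weighted_load(const struct leaf *l, int resilvering_inc)
{
    if (l->resilver_txg != 0)
        return (l->load + resilvering_inc);
    return (l->load);
}

int
main(void)
{
    struct leaf l = { .load = 3, .resilver_txg = 42 };

    printf("weighted load: %d\n", weighted_load(&l, 10));
    return (0);
}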


djkazic commented Jun 22, 2018

Will this be merged? Looks straightforward to me, and it has tests.

@drescherjm

Wasn't it merged here: 9f50093?
