Allocate using vector of rotors. Proof-of-concept, not for production.
In a pool that consists of, e.g., a small but fast SSD-based mirror and
a large but high-latency HDD-based RAIDZn, it is useful to have the
metadata, as well as very small files, stored on the SSD.  This patch
handles that by selecting the storage based on the size of the
allocation.

This is done by using a vector of rotors, each associated with the
metaslab groups of one kind of storage.  If the preferred group is full,
attempts are made to fill slower groups.  Faster groups are not attempted;
the rationale is that an almost full filesystem should not spill large
data into the expensive SSD vdev, since that space would not be
reclaimable without deleting the files.  It is better to consider the
filesystem full when the large-size storage is full.
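The fallback rule can be modelled in a few lines of C. This is a simplified sketch, not the patch's code: it assumes each rotor can be summarised by a single free-space figure (the hypothetical `free_bytes` array) and ignores metaslab groups, aliquots and weighting. `pick_rotor()` and `NUM_ROTORS` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotor 0 = fast SSD, rotor 1 = slow HDD, as with METASLAB_CLASS_ROTORS 2. */
#define NUM_ROTORS	2

/*
 * Return the rotor that would serve an allocation of psize bytes, or -1 if
 * none can.  Only the preferred rotor and slower (higher-index) rotors are
 * tried; faster rotors are skipped on purpose so that large data never
 * spills onto the SSD.
 */
static int
pick_rotor(const uint64_t free_bytes[NUM_ROTORS], int preferred,
    uint64_t psize)
{
	assert(preferred >= 0 && preferred < NUM_ROTORS);

	for (int i = preferred; i < NUM_ROTORS; i++) {
		if (free_bytes[i] >= psize)
			return (i);
	}
	return (-1);	/* treat the pool as full */
}
```

With no room left on the SSD, a small allocation preferring rotor 0 lands on rotor 1; with a full HDD, a large allocation preferring rotor 1 fails instead of falling back to the SSD.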

One could also think of, e.g., 3-level storage:  mirrored SSD for
really small records, mirrored HDD for medium-size records and RAIDZn
HDD for the bulk of data.
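A hypothetical generalisation of the size threshold to three levels is sketched below. It assumes one ascending threshold per boundary between levels; `level_threshold`, `size_to_level()` and the values 8000 and 131072 are illustrative assumptions, not parameters of the patch, which only has the single `zfs_mixed_slowsize_threshold`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 3-level layout: level 0 = mirrored SSD, level 1 = mirrored
 * HDD, level 2 = RAIDZn HDD.  The threshold values are illustrative only.
 */
#define NUM_LEVELS	3

static const uint64_t level_threshold[NUM_LEVELS - 1] = {
	8000,		/* below this: SSD mirror */
	131072,		/* below this (and >= 8000): HDD mirror */
};

/* Return the storage level for an allocation of psize bytes. */
static int
size_to_level(uint64_t psize)
{
	int level = 0;

	/* Thresholds are ascending; the last one reached picks the level. */
	for (int i = 0; i < NUM_LEVELS - 1; i++) {
		assert(i == 0 || level_threshold[i - 1] < level_threshold[i]);
		if (psize >= level_threshold[i])
			level = i + 1;
	}
	return (level);
}
```

Each level would then get its own rotor, with the fallback still only moving towards slower levels.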

Some performance numbers:

Tested on three separate pools, each consisting of a 20 GB SSD
partition and a 100 GB HDD partition, carved from the same disks (the
HDD is 2 TB in total).  SSD raw reads: 350 MB/s, HDD raw reads: 132 MB/s.

The filesystems were filled to ~60 % with a random directory tree, each
directory having a random 0-6 subdirectories and 0-100 files, with a
maximum depth of 8.  File sizes were random in the range 0-400 kB.  The
fill script was run as 10 parallel instances, aborted at roughly the same
size.  The performance variations below are much larger than the
differences in how full the filesystems are.
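The shape of the tree can be sketched as a dry run that only counts what would be created. The fill script itself is not part of this commit, so this is an assumption-laden stand-in: uniform distributions via `rand()`, the top directory counted as depth 0, 1 kB taken as 1000 bytes, and no parallel instances or abort condition. `plan_dir()` and `struct tree_stats` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parameters as described above; uniform distributions are assumed. */
#define MAX_SUBDIRS	6	/* 0-6 subdirectories per directory */
#define MAX_FILES	100	/* 0-100 files per directory */
#define MAX_DEPTH	8	/* top directory is depth 0 */
#define MAX_FILE_KB	400	/* file sizes 0-400 kB */

struct tree_stats {
	uint64_t dirs;
	uint64_t files;
	uint64_t bytes;
	int max_depth_seen;
};

/* Uniform integer in [0, hi]; rand() modulo bias is ignored here. */
static int
rand_upto(int hi)
{
	return (rand() % (hi + 1));
}

/* Count one directory and its files, then recurse into subdirectories. */
static void
plan_dir(struct tree_stats *st, int depth)
{
	assert(depth <= MAX_DEPTH);
	st->dirs++;
	if (depth > st->max_depth_seen)
		st->max_depth_seen = depth;

	int nfiles = rand_upto(MAX_FILES);
	for (int f = 0; f < nfiles; f++)
		st->bytes += (uint64_t)rand_upto(MAX_FILE_KB) * 1000;
	st->files += nfiles;

	if (depth == MAX_DEPTH)
		return;
	int nsub = rand_upto(MAX_SUBDIRS);
	for (int d = 0; d < nsub; d++)
		plan_dir(st, depth + 1);
}
```

Calling `srand()` with a fixed seed and then `plan_dir(&st, 0)` on a zeroed `struct tree_stats` gives one sample tree; the real runs were cut off at ~60 % fill, so their totals are not expected to match such a full tree.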

The patch does not handle the 'inactive' case very well (it begins by
filling nonrotating storage).  Setting 0 is equivalent to the original
7a27ad0 commit.  Settings 8000 and 16000 are values of
zfs_mixed_slowsize_threshold, i.e. the size below which data is stored
using rotor[0] (nonrotating SSD) instead of rotor[1] (rotating HDD).
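The threshold's effect on the starting rotor can be modelled as below, following the logic the patch adds to `metaslab_alloc_dva()`. This is a standalone model rather than kernel code: the hypothetical `rotor_populated` array stands in for `mc->mc_rotorv[i] != NULL`, and `preferred_rotor()` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

#define METASLAB_CLASS_ROTORS	2	/* as in the patch */

/*
 * Starting rotor for an allocation of psize bytes.  rotor_populated[i]
 * models mc->mc_rotorv[i] != NULL.
 */
static int
preferred_rotor(uint64_t psize, uint64_t threshold,
    const int rotor_populated[METASLAB_CLASS_ROTORS])
{
	int nrot = 0;

	/* A threshold of 0 disables the size split: always start at rotor 0. */
	if (threshold != 0 && psize >= threshold) {
		nrot = 1;
		/* Never go beyond the highest rotor that has any members. */
		for (int i = METASLAB_CLASS_ROTORS - 1; i >= 0; i--) {
			if (rotor_populated[i]) {
				if (i < nrot)
					nrot = i;
				break;
			}
		}
	}
	assert(nrot >= 0 && nrot < METASLAB_CLASS_ROTORS);
	return (nrot);
}
```

With a threshold of 0 every allocation starts at rotor 0, i.e. nonrotating storage first, and a pool without rotating vdevs is clamped back to rotor 0 even for large allocations.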

               Setting 8000  Setting 16000  Setting 0
               ------------  -------------  ---------

Total # files  305666        304439         308962
Total size     75334 kB      75098 kB       75231 kB

As per 'zpool iostat -v':

Total alloc    71.8 G        71.6 G         71.7 G
SSD alloc      3.34 G        3.41 G         3.71 G
HDD alloc      68.5 G        68.2 G         68.0 G

Time for 'find' and 'zpool scrub' after a fresh 'zpool import':

find           5.6 s         5.5 s          42 s
scrub          560 s         560 s          1510 s

Time for serial 'find | xargs -P 1 md5sum' and
parallel 'find | xargs -P 4 -n 10 md5sum'
(only the first 10000 files in each case):

-P 1 md5sum    129 s         122 s          168 s
-P 4 md5sum    182 s         150 s          187 s
(size summed)  2443 MB       2499 MB        2423 MB
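The headline ratios can be recomputed from the table with a small C helper. It only reuses figures listed above; `struct run`, `speedup()` and `print_summary()` are illustrative names, and setting 0 is used as the baseline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Figures copied from the table above: times in seconds, sizes in GB. */
struct run {
	const char *setting;
	double find_s;
	double scrub_s;
	double ssd_alloc_g;
	double total_alloc_g;
};

static const struct run runs[] = {
	{ "8000",   5.6,  560.0, 3.34, 71.8 },
	{ "16000",  5.5,  560.0, 3.41, 71.6 },
	{ "0",     42.0, 1510.0, 3.71, 71.7 },
};

/* How many times faster the measured run is than the baseline. */
static double
speedup(double baseline_s, double measured_s)
{
	assert(measured_s > 0.0);
	return (baseline_s / measured_s);
}

/* Print SSD share and speedups relative to setting 0 (the last row). */
static void
print_summary(void)
{
	size_t n = sizeof (runs) / sizeof (runs[0]);
	const struct run *base = &runs[n - 1];

	for (size_t i = 0; i < n; i++) {
		const struct run *r = &runs[i];
		printf("setting %-5s  SSD share %.1f %%  find x%.1f  scrub x%.1f\n",
		    r->setting, 100.0 * r->ssd_alloc_g / r->total_alloc_g,
		    speedup(base->find_s, r->find_s),
		    speedup(base->scrub_s, r->scrub_s));
	}
}
```

Relative to setting 0, this gives 'find' about 7.5 to 7.6 times faster and 'scrub' about 2.7 times faster for settings 8000 and 16000, with under 5 % of the allocated space on the SSD.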

Conflicts:
	include/sys/metaslab_impl.h
inkdot7 committed Feb 25, 2016
1 parent 3185c50 commit 1555df6
Showing 2 changed files with 45 additions and 3 deletions.
include/sys/metaslab_impl.h: 2 changes, 1 addition & 1 deletion
@@ -67,7 +67,7 @@ extern "C" {
  * big and less expensive. Depending on the size of an allocation,
  * a vdev will be chosen.
  */
-#define METASLAB_CLASS_ROTORS 1
+#define METASLAB_CLASS_ROTORS 2
 
 struct metaslab_class {
 	spa_t *mc_spa;
module/zfs/metaslab.c: 46 changes, 44 additions & 2 deletions
@@ -110,6 +110,12 @@ int zfs_mg_noalloc_threshold = 0;
  */
 int zfs_mg_fragmentation_threshold = 85;
 
+/*
+ * Allocate from faster vdev in pool if below threshold, allocate
+ * from slower vdev in pool if above threshold.
+ */
+int zfs_mixed_slowsize_threshold = 0;
+
 /*
  * Allow metaslabs to keep their active state as long as their fragmentation
  * percentage is less than or equal to zfs_metaslab_fragmentation_threshold. An
@@ -546,6 +552,9 @@ metaslab_group_activate(metaslab_group_t *mg)
 
 	mg->mg_nrot = 0; /* TODO: when vector, decide which rotor to place in */
 
+	if (!mg->mg_vd->vdev_nonrot)
+		mg->mg_nrot = 1;
+
 	mg->mg_aliquot = metaslab_aliquot * MAX(1, mg->mg_vd->vdev_children);
 	metaslab_group_alloc_update(mg);
 
@@ -2244,6 +2253,24 @@ metaslab_alloc_dva(spa_t *spa, metaslab_class_t *mc, uint64_t psize,
 	 */
 	nrot = 0;
 
+	if (zfs_mixed_slowsize_threshold) {
+		if (psize >= zfs_mixed_slowsize_threshold) {
+			/*
+			 * TODO: do not go higher than we actually
+			 * have members.
+			 */
+			nrot = 1;
+
+			for (i = METASLAB_CLASS_ROTORS-1; i >= 0; i--) {
+				if (mc->mc_rotorv[i]) {
+					if (i < nrot)
+						nrot = i;
+					break;
+				}
+			}
+		}
+	}
+
 	/*
 	 * Start at the rotor and loop through all mgs until we find something.
 	 * Note that there's no locking on mc_rotor or mc_aliquot because
@@ -2303,12 +2330,24 @@ metaslab_alloc_dva(spa_t *spa, metaslab_class_t *mc, uint64_t psize,
 		mg = mc->mc_rotorv[nrot];
 	}
 
+	if (mg == NULL ||
+	    mg->mg_nrot < nrot) {
+		for (i = nrot; i < METASLAB_CLASS_ROTORS; i++) {
+			if (mc->mc_rotorv[i] != NULL)
+				mg = mc->mc_rotorv[i];
+		}
+	}
+
 	/*
 	 * If the hint put us into the wrong metaslab class, or into a
 	 * metaslab group that has been passivated, just follow the rotor.
 	 */
-	if (mg->mg_class != mc || mg->mg_activation_count <= 0)
-		mg = mc->mc_rotorv[nrot];
+	if (mg->mg_class != mc || mg->mg_activation_count <= 0) {
+		for (i = nrot; i < METASLAB_CLASS_ROTORS; i++) {
+			if (mc->mc_rotorv[i] != NULL)
+				mg = mc->mc_rotorv[i];
+		}
+	}
 
 top1:
 	rotor = mg;
@@ -2779,6 +2818,7 @@ module_param(metaslab_debug_unload, int, 0644);
 module_param(metaslab_preload_enabled, int, 0644);
 module_param(zfs_mg_noalloc_threshold, int, 0644);
 module_param(zfs_mg_fragmentation_threshold, int, 0644);
+module_param(zfs_mixed_slowsize_threshold, int, 0644);
 module_param(zfs_metaslab_fragmentation_threshold, int, 0644);
 module_param(metaslab_fragmentation_factor_enabled, int, 0644);
 module_param(metaslab_lba_weighting_enabled, int, 0644);
@@ -2797,6 +2837,8 @@ MODULE_PARM_DESC(zfs_mg_noalloc_threshold,
 	"percentage of free space for metaslab group to allow allocation");
 MODULE_PARM_DESC(zfs_mg_fragmentation_threshold,
 	"fragmentation for metaslab group to allow allocation");
+MODULE_PARM_DESC(zfs_mixed_slowsize_threshold,
+	"size threshold to choose slower (rotating) storage in mixed pool");
 
 MODULE_PARM_DESC(zfs_metaslab_fragmentation_threshold,
 	"fragmentation for metaslab to allow allocation");
