Disable LBA weighting on files and SSDs
LBA weighting makes sense on rotational media, where the outer tracks
have twice the bandwidth of the inner tracks. On nonrotational media
such as solid state disks it is detrimental: its only effect is to push
metaslabs into best-fit allocation behavior sooner, which hurts
performance. It also makes no sense on file-backed vdevs, where the
underlying filesystem can arrange the data however it wants.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3712
ryao authored and behlendorf committed Sep 1, 2015
1 parent cafbd2a commit fb40095
Showing 5 changed files with 17 additions and 2 deletions.
1 change: 1 addition & 0 deletions include/sys/vdev_impl.h
@@ -151,6 +151,7 @@ struct vdev {
vdev_stat_t vdev_stat; /* virtual device statistics */
boolean_t vdev_expanding; /* expand the vdev? */
boolean_t vdev_reopening; /* reopen in progress? */
boolean_t vdev_nonrot; /* true if solid state */
int vdev_open_error; /* error on last open */
kthread_t *vdev_open_thread; /* thread opening children */
uint64_t vdev_crtxg; /* txg when top-level was added */
2 changes: 1 addition & 1 deletion module/zfs/metaslab.c
@@ -1518,7 +1518,7 @@ metaslab_weight(metaslab_t *msp)
* In effect, this means that we'll select the metaslab with the most
* free bandwidth rather than simply the one with the most free space.
*/
- if (metaslab_lba_weighting_enabled) {
+ if (!vd->vdev_nonrot && metaslab_lba_weighting_enabled) {
weight = 2 * weight - (msp->ms_id * weight) / vd->vdev_ms_count;
ASSERT(weight >= space && weight <= 2 * space);
}
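To make the effect of the changed condition concrete, the following is a minimal sketch of the weighting step in isolation. The function name `lba_weight` and its parameter list are hypothetical and chosen only for illustration; only the formula and the guard are taken from the hunk above. It assumes `space` is the metaslab's free space and that `ms_id` counts up from the lowest LBAs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the weighting step in metaslab_weight().
 *   space:    free space in the metaslab (the base weight)
 *   ms_id:    index of the metaslab on its vdev (0 = lowest LBAs)
 *   ms_count: number of metaslabs on the vdev
 *   nonrot:   true when the vdev is non-rotational (SSD or file)
 *   enabled:  mirrors metaslab_lba_weighting_enabled
 */
static uint64_t
lba_weight(uint64_t space, uint64_t ms_id, uint64_t ms_count,
    bool nonrot, bool enabled)
{
	uint64_t weight = space;

	/* Only rotational vdevs get the outer-track bonus. */
	if (!nonrot && enabled) {
		weight = 2 * weight - (ms_id * weight) / ms_count;
		assert(weight >= space && weight <= 2 * space);
	}
	return (weight);
}
```

With `space = 1000` and four metaslabs, a rotational vdev gets weights 2000, 1750, 1500 and 1250 for `ms_id` 0 through 3, while a non-rotational vdev gets 1000 for every metaslab, so selection there depends on free space alone.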
10 changes: 9 additions & 1 deletion module/zfs/vdev.c
@@ -1108,6 +1108,7 @@ vdev_open_child(void *arg)
vd->vdev_open_thread = curthread;
vd->vdev_open_error = vdev_open(vd);
vd->vdev_open_thread = NULL;
vd->vdev_parent->vdev_nonrot &= vd->vdev_nonrot;
}

static boolean_t
@@ -1134,15 +1135,19 @@ vdev_open_children(vdev_t *vd)
int children = vd->vdev_children;
int c;

vd->vdev_nonrot = B_TRUE;

/*
* in order to handle pools on top of zvols, do the opens
* in a single thread so that the same thread holds the
* spa_namespace_lock
*/
if (vdev_uses_zvols(vd)) {
- for (c = 0; c < children; c++)
+ for (c = 0; c < children; c++) {
vd->vdev_child[c]->vdev_open_error =
vdev_open(vd->vdev_child[c]);
vd->vdev_nonrot &= vd->vdev_child[c]->vdev_nonrot;
}
return;
}
tq = taskq_create("vdev_open", children, minclsyspri,
Expand All @@ -1153,6 +1158,9 @@ vdev_open_children(vdev_t *vd)
TQ_SLEEP) != 0);

taskq_destroy(tq);

for (c = 0; c < children; c++)
vd->vdev_nonrot &= vd->vdev_child[c]->vdev_nonrot;
}

/*
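The propagation rule in the vdev.c hunks above (an interior vdev is non-rotational only if every child is) can be sketched on a toy tree. The `toy_vdev_t` type and `toy_vdev_update_nonrot()` are hypothetical simplifications, not ZFS structures or functions, and the single recursive walk is an assumption made for brevity; in the commit, the flag is recomputed in `vdev_open_children()` as the children are opened.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for vdev_t; not the real structure. */
typedef struct toy_vdev {
	bool nonrot;			/* analogous to vdev_nonrot */
	struct toy_vdev **child;	/* analogous to vdev_child */
	size_t children;		/* analogous to vdev_children */
} toy_vdev_t;

/*
 * Leaves keep the value set by their open routine (disk or file).
 * Interior vdevs start at true and AND in every child, so a single
 * rotational leaf makes the whole subtree count as rotational.
 */
static bool
toy_vdev_update_nonrot(toy_vdev_t *vd)
{
	if (vd->children == 0)
		return (vd->nonrot);

	vd->nonrot = true;
	for (size_t c = 0; c < vd->children; c++)
		vd->nonrot &= toy_vdev_update_nonrot(vd->child[c]);
	return (vd->nonrot);
}
```

Under this rule a mirror of one SSD and one hard disk is treated as rotational and keeps LBA weighting, while a mirror of two SSDs does not.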
3 changes: 3 additions & 0 deletions module/zfs/vdev_disk.c
@@ -301,6 +301,9 @@ vdev_disk_open(vdev_t *v, uint64_t *psize, uint64_t *max_psize,
/* Clear the nowritecache bit, causes vdev_reopen() to try again. */
v->vdev_nowritecache = B_FALSE;

/* Inform the ZIO pipeline that we are non-rotational */
v->vdev_nonrot = blk_queue_nonrot(bdev_get_queue(vd->vd_bdev));

/* Physical volume size in bytes */
*psize = bdev_capacity(vd->vd_bdev);

3 changes: 3 additions & 0 deletions module/zfs/vdev_file.c
@@ -57,6 +57,9 @@ vdev_file_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize,
vattr_t vattr;
int error;

/* Rotational optimizations only make sense on block devices */
vd->vdev_nonrot = B_TRUE;

/*
* We must have a pathname, and it must be absolute.
*/