This repository has been archived by the owner. It is now read-only.

7614 zfs device evacuation/removal #482

Closed
wants to merge 5 commits into from

Conversation

prashks commented Oct 24, 2017

Reviewed by: Alex Reece alex@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Reviewed by: John Kennedy john.kennedy@delphix.com
Reviewed by: Prakash Surya prakash.surya@delphix.com

This project allows top-level vdevs to be removed from the storage pool
with "zpool remove", reducing the total amount of storage in the pool.
This operation copies all allocated regions of the device to be removed
onto other devices, recording the mapping from old to new location.
After the removal is complete, read and free operations to the removed
(now "indirect") vdev must be remapped and performed at the new location
on disk. The indirect mapping table is kept in memory whenever the pool
is loaded, so there is minimal performance overhead when doing
operations on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries
become "obsolete" because they are no longer used by any block pointers
in the pool. An entry becomes obsolete when all the blocks that use it
are freed. An entry can also become obsolete when all the snapshots
that reference it are deleted, and the block pointers that reference it
have been "remapped" in all filesystems/zvols (and clones). Whenever an
indirect block is written, all the block pointers in it will be
"remapped" to their new (concrete) locations if possible. This process
can be accelerated by using the "zfs remap" command to proactively
rewrite all indirect blocks that reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the
data that is copied. This makes the process much faster, but if it were
used on raidz vdevs, it would be possible to copy the wrong data,
when we have the correct data on e.g. the other side of the raidz.
Therefore, raidz devices can not be removed.

mailinglists35 commented Oct 25, 2017

What is needed to have this running on ZoL? And is it feature complete, as in, can we users safely start testing it?

prashks (Author) commented Oct 27, 2017

@mailinglists35 Yes, this is feature complete and ready for users to start testing. The removal/evacuation for striped vdevs has been in production for years now at Delphix, and we have been testing mirrored top-level vdev removal for months now.
For ZoL, these code changes need to be ported to ZoL to start with. Did you have a more specific question?

prakashsurya force-pushed the prashks:7614 branch from 3682216 to ab76136 Oct 28, 2017
ahrens (Member) commented Oct 30, 2017

The relevant test failures are:

Tests with results other than PASS that are unexpected:
    FAIL removal/removal_with_add (expected PASS)
    FAIL removal/remove_mirror (expected PASS)
dweeezil commented Nov 20, 2017

FYI, I'm working on porting this to ZoL. The basic porting is complete, but there's quite a bit of interference due to the change that stops range tree functions from managing the caller's locks, and also due to ZoL's extra range tree functions. Another point of conflict is openzfs/zfs@0ea05c6 [EDIT: which was added to illumos after this] and a few gratuitous changes that crept into ZoL with the encryption support. I'll mention @mailinglists35 since you asked about it.

This bit of porting ought to give me enough familiarity with the feature to hopefully offer a reasonable review.

ahrens (Member) commented Nov 20, 2017

@dweeezil That's great, thank you Tim! Let me know if there's anything I can offer insight/advice on for the port.

ikozhukhov commented Nov 27, 2017

hey, what is status of this PR?

ikozhukhov left a comment

I have tested it by integrating it into DilOS, with some manual tests in different scenarios.

prashks (Author) commented Nov 27, 2017

Thanks for sharing your test results, @ikozhukhov.
I'm working on a couple of minor changes (a bug fix and the addition of a dry-run option that shows memory usage for the operation) and sorting out the ZFS tests; I hope to post an update this week and move this forward.

dweeezil commented Nov 27, 2017

I've gotten the port to ZoL to the point that it works but am now going to have to plow through a bunch of testing issues which may have applicability in OpenZFS/illumos as well.

@ahrens The first issue I've run across is the addition of the vdev_is_concrete() check to vdev_writeable(). This has the possibly unintended side effect of causing a zpool clear operation to generate a bunch of extra events, including kicking off a scrub: because the pseudo top-level vdev is not concrete, vdev_clear(), when called from the clear ioctl, enters the "When reopening in response to clear event..." path when it wouldn't have in the past. My question is whether this behavior is intentional or desired. It was detected by one of the test suite cases in ZoL which performs a bunch of "zpool clear" operations and then counts the resulting events.

dweeezil mentioned this pull request Nov 27, 2017
static int
dva_mapping_overlap_compare(const void *v_key, const void *v_array_elem)
{
const uint64_t const *key = v_key;

dweeezil Nov 27, 2017

Duplicate const.

dweeezil Dec 3, 2017

I moved the second const after the "*".

prashks (Author) Dec 3, 2017

Yes, you mean the second const before the "*"? I did the same as well.

{
const uint64_t const *key = v_key;
const vdev_indirect_mapping_entry_phys_t const *array_elem =
v_array_elem;

dweeezil Nov 27, 2017

Duplicate const.

dweeezil Dec 3, 2017

I moved the second const after the "*".

prashks (Author) Dec 3, 2017

same here, thanks.

dweeezil Dec 4, 2017

Yes, here, too.

ahrens (Member) commented Nov 27, 2017

@dweeezil I think that was a mistake. It doesn't make sense to "clear" an indirect (non-concrete) vdev. We should probably just return before the "If we're in the FAULTED state ..." check, if it's an indirect vdev.

dweeezil commented Nov 28, 2017

@ahrens Like this?

diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c
index 5185355..bfe7020 100644
--- a/module/zfs/vdev.c
+++ b/module/zfs/vdev.c
@@ -2921,6 +2921,12 @@ vdev_clear(spa_t *spa, vdev_t *vd)
                vdev_clear(spa, vd->vdev_child[c]);
 
        /*
+        * It makes no sense to "clear" an indirect vdev.
+        */
+       if (!vdev_is_concrete(vd))
+               return;
+
+       /*
         * If we're in the FAULTED state or have experienced failed I/O, then
         * clear the persistent state and attempt to reopen the device.  We
         * also mark the vdev config dirty, so that the new faulted state is

Also, it looks like we need something similar in vdev_free() to avoid panicking on ASSERT(vd->vdev_child == NULL);.

ahrens (Member) commented Nov 28, 2017

@dweeezil That diff makes sense to me. But I don't understand what you're saying about vdev_free(). Why is vdev_child not NULL at that point? I think that ASSERT is worthwhile since otherwise you'll leak the memory of vdev_child.

dweeezil commented Nov 28, 2017

@ahrens I looked into the vdev_free() situation a bit closer and, for some reason, I'm seeing a value of 0x10 in vdev_child for an "indirect" vdev. I need to investigate further as to how this is happening.

dweeezil commented Nov 29, 2017

@ahrens, I figured it out: In illumos, kmem_alloc(0, ...) returns NULL but in Linux, we get a magic pointer. I've done this little patch to vdev_compact_children() because apparently with the vdev removal code, it now depends on vdev_child being nulled out during compaction.

diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c
index 5185355..75554c0 100644
--- a/module/zfs/vdev.c
+++ b/module/zfs/vdev.c
@@ -291,17 +291,24 @@ vdev_compact_children(vdev_t *pvd)
 
        ASSERT(spa_config_held(pvd->vdev_spa, SCL_ALL, RW_WRITER) == SCL_ALL);
 
+       if (oldc == 0)
+               return;
+
        for (int c = newc = 0; c < oldc; c++)
                if (pvd->vdev_child[c])
                        newc++;
 
-       newchild = kmem_zalloc(newc * sizeof (vdev_t *), KM_SLEEP);
+       if (newc > 0) {
+               newchild = kmem_zalloc(newc * sizeof (vdev_t *), KM_SLEEP);
 
-       for (int c = newc = 0; c < oldc; c++) {
-               if ((cvd = pvd->vdev_child[c]) != NULL) {
-                       newchild[newc] = cvd;
-                       cvd->vdev_id = newc++;
+               for (int c = newc = 0; c < oldc; c++) {
+                       if ((cvd = pvd->vdev_child[c]) != NULL) {
+                               newchild[newc] = cvd;
+                               cvd->vdev_id = newc++;
+                       }
                }
+       } else {
+               newchild = NULL;
        }
 
        kmem_free(pvd->vdev_child, oldc * sizeof (vdev_t *));

Note: ZoL has always used kmem_zalloc() rather than kmem_alloc() here for probably no particularly good reason.

ahrens (Member) commented Nov 29, 2017

@dweeezil great find. I think that on illumos we're also trying to avoid kmem_alloc(0) and kmem_free(0), so this will be a good change to make on illumos too.

mailinglists35 commented Dec 1, 2017

If I understand correctly, this feature requires enabling a feature flag, thus irreversibly changing the on-disk format. This means that if, on a non-patched version, you make the mistake of adding an unwanted vdev, you need to upgrade the pool in order to use the feature.

Can the feature also be shipped separately, as a zdb feature for an exported pool, without needing to upgrade the pool, so that a production non-patched pool can benefit from the feature without having to touch the production ZFS code and the existing pool's on-disk format?

ahrens (Member) commented Dec 1, 2017

@mailinglists35

If I understand correctly, this feature requires enabling a feature flag, thus irreversibly changing the on-disk format.

Yes, the on-disk format changes (and the feature property changes to active) when zpool remove is used to remove a top-level device (which is only allowed if the feature flag is enabled).

Can the feature also be shipped separately, as a zdb feature for an exported pool, without needing to upgrade the pool, so that a production non-patched pool can benefit from the feature without having to touch the production ZFS code and the existing pool's on-disk format?

No, software that doesn't understand device removal has no hope of ever reading a pool with removed devices. This applies regardless of whether you were to use zdb, zhack, or the kernel to change the on-disk format.

/* Fail if its raidz */
if (vd->vdev_ops == &vdev_raidz_ops) {
return (spa_vdev_exit(spa, vd, txg, EINVAL));
}

dweeezil Dec 3, 2017

This (the "vdev_ops" test) doesn't work and causes the removal_with_add test to fail its "add a raidz" case, because when adding a raidz, "vd->vdev_ops" is a "root" vdev and its "vdev_child[0]" is the raidz vdev type.

prashks (Author) Dec 3, 2017

Yes, I have fixed this already and it passes the ZFS test; below is the change set that addresses this:

diff --git a/usr/src/uts/common/fs/zfs/spa.c b/usr/src/uts/common/fs/zfs/spa.c
index 3c0ef4c..6b35858 100644
--- a/usr/src/uts/common/fs/zfs/spa.c
+++ b/usr/src/uts/common/fs/zfs/spa.c
@@ -5527,29 +5527,31 @@ spa_vdev_add(spa_t *spa, nvlist_t *nvroot)
         * If we are in the middle of a device removal, we can only add
         * devices which match the existing devices in the pool.
         * If we are in the middle of a removal, or have some indirect
-        * vdevs, we can not add redundant toplevels.  This ensures that
+        * vdevs, we can not add raidz toplevels.  This ensures that
         * we do not rely on resilver, which does not properly handle
         * indirect vdevs.
         */
        if (spa->spa_vdev_removal != NULL ||
            spa->spa_removing_phys.sr_prev_indirect_vdev != -1) {
                for (int c = 0; c < vd->vdev_children; c++) {
+                       tvd = vd->vdev_child[c];
                        if (spa->spa_vdev_removal != NULL &&
-                           vd->vdev_child[c]->vdev_ashift !=
+                           tvd->vdev_ashift !=
                            spa->spa_vdev_removal->svr_vdev->vdev_ashift) {
                                return (spa_vdev_exit(spa, vd, txg, EINVAL));
                        }
-                       /* Fail if its raidz */
-                       if (vd->vdev_ops == &vdev_raidz_ops) {
+                       /* Fail if top level vdev is raidz */
+                       if (tvd->vdev_ops == &vdev_raidz_ops) {
                                return (spa_vdev_exit(spa, vd, txg, EINVAL));
                        }
                        /*
-                        * Need the mirror to be mirror of leaf vdevs only
+                        * Need the top level mirror to be
+                        * a mirror of leaf vdevs only
                         */
-                       if (vd->vdev_ops == &vdev_mirror_ops) {
+                       if (tvd->vdev_ops == &vdev_mirror_ops) {
                                for (uint64_t cid = 0;
-                                    cid < vd->vdev_children; cid++) {
-                                       vdev_t *cvd = vd->vdev_child[cid];
+                                    cid < tvd->vdev_children; cid++) {
+                                       vdev_t *cvd = tvd->vdev_child[cid];
                                        if (!cvd->vdev_ops->vdev_op_leaf) {
                                                return (spa_vdev_exit(spa, vd,
                                                        txg, EINVAL));

dweeezil Dec 4, 2017

Looks good and does properly fix the test case.

The following command removes the mirrored log device
.Sy mirror-2 .
.It Sy Example 14 No Removing a Mirrored top-level (Log or Data) Device
The following commands removes the mirrored log device

rlaager Dec 3, 2017

Small grammar error here. This should be "commands remove".

dweeezil added a commit to dweeezil/zfs that referenced this pull request Dec 4, 2017
prashks force-pushed the prashks:7614 branch from ab76136 to f684987 Dec 5, 2017
prashks (Author) commented Dec 5, 2017

Thanks for the reviews and testing - @dweeezil, @ikozhukhov, @rlaager

See the updated changes at:
f684987

Added changes for - /sbin/zpool should be able to estimate memory used by device removal
Bug fixes and code review comments
Added new zfs test - remove_mirror_sanity

mailinglists35 commented Dec 5, 2017

can the feature be shipped also separate as a zdb feature for an exported pool, without needing to upgrade the pool, so a production non-patched pool can benefit the feature without having to touch the production zfs code and existing pool on-disk format?

No, software that doesn't understand device removal has no hope of ever reading a pool with removed devices. This applies regardless of whether you were to use zdb, zhack, or the kernel to change the on-disk format.

What I meant in the question is: is it trivial to modify the zdb logic, based on this PR, to perform offline device evacuation without changing the on-disk format?

ahrens (Member) commented Dec 5, 2017

is it trivial to modify zdb logic based on this PR to perform offline device evacuation without changing the on-disk format?

No.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Dec 6, 2017
ikozhukhov left a comment

I have tested the update - no issues found.

ahrens (Member) commented Dec 20, 2017

@ofcaah Typically, after a ZFS change is integrated to OpenZFS/illumos, someone from the ZFSonLinux community will port it to Linux, and someone from the FreeBSD community will port it to FreeBSD. @dweeezil has started this work for Linux: openzfs/zfs#6900

Modified remove_mirror zfs-test and other code review comments
ahrens (Member) commented Dec 21, 2017

I believe these changes are final (pending the last test run), and ready to be integrated to illumos. @rlaager and @dweeezil, can we count you as code reviewers?

ahrens approved these changes Dec 21, 2017
rlaager commented Dec 21, 2017

@ahrens If we're using the Linux kernel contributor sign-off definitions, I'm okay with Acked-by, but I think Reviewed-by would overstate the amount of review I have given it (or even could give it, given my lack of kernel and ZFS-internals knowledge). However, if it is normal practice in OpenZFS to convert "looks good to me" to "Reviewed by", then I'm absolutely okay with that.

dweeezil commented Dec 27, 2017

@ahrens Yes, I'm on my way home from a vacation now and will give it a final review within the next day, once I catch up the ZoL PR.

ahrens (Member) commented Jan 2, 2018

@dweeezil did you have a chance to look at the final version?

ahrens force-pushed the prashks:7614 branch from fc9c74e to c9d83b1 Jan 2, 2018
dweeezil commented Jan 3, 2018

@ahrens I was out of town for a week and am getting caught up now. I'm working on reconciling the changes to my ZoL PR and should be able to give this a final review today.

dweeezil commented Jan 3, 2018

This is looking pretty good. I'm going to submit it to the ZoL buildbot now for additional verification.

dweeezil left a comment

I'm still working through this on ZoL.

}

if (!locked)
return (spa_vdev_exit(spa, NULL, txg, error));

dweeezil Jan 4, 2018

This early return can cause the events created above to not be posted. It should probably be changed back to "error = spa_vdev_exit(...".

prashks (Author) Jan 5, 2018

Thanks, I've fixed this.

dweeezil commented Jan 4, 2018

I've also added the patch to vdev_clear() mentioned in #482 (comment) to ZoL.

dweeezil left a comment

Changed event when removing a log device.

*txg = spa_vdev_config_enter(spa);

sysevent_t *ev = spa_event_create(spa, vd, NULL,
ESC_ZFS_VDEV_REMOVE_AUX);

dweeezil Jan 4, 2018

Previously, removing a log device would post a VDEV_REMOVE_DEV event but now it posts a VDEV_REMOVE_AUX. Is this intentional? If so, event consumers may need to be updated appropriately.

prashks (Author) Jan 5, 2018

Good catch, that should be VDEV_REMOVE_DEV and not AUX.

dweeezil left a comment

Renamed tunable possibly unrelated to device evacuation?

int zfs_resilver_min_time_ms = 3000; /* min millisecs to resilver per txg */
boolean_t zfs_no_scrub_io = B_FALSE; /* set to disable scrub i/o */
boolean_t zfs_no_scrub_prefetch = B_FALSE; /* set to disable scrub prefetch */
enum ddt_class zfs_scrub_ddt_class_max = DDT_CLASS_DUPLICATE;
int dsl_scan_delay_completion = B_FALSE; /* set to delay scan completion */
/* max number of blocks to free in a single TXG */
uint64_t zfs_free_max_blocks = UINT64_MAX;
uint64_t zfs_async_block_max_blocks = UINT64_MAX;

dweeezil Jan 5, 2018

Does this name change really belong in the device evacuation patch?

ahrens (Member) Jan 5, 2018

It is relevant because we've changed the meaning to cover not only async frees but also obsoleting of the indirect mapping - see dsl_scan_obsolete_block_cb(). The process is similar to background deletion of snapshots. Now, the name of the new variable might leave something to be desired...

prashks force-pushed the prashks:7614 branch from c9d83b1 to 9b12b22 Jan 8, 2018
prashks (Author) commented Jan 8, 2018

Updated my branch to address code review comments and some cstyle/copyright fixes.
Relevant diffs at:
prashks@8b37b1d

Skaronator commented on usr/src/cmd/zpool/zpool_main.c in 8b37b1d Jan 8, 2018

Shouldn't this be 2011, 2016, 2018 or 2011, 2016-2018?

ahrens (Member) commented Jan 9, 2018

@Skaronator The policy from the Delphix legal team is that this should be <year of first modification>, <year of last modification>. I don't know the reasoning behind that, but it seems several other companies do it this way too, and it isn't too hard to maintain.

Each copyright holder is free to choose the format of their copyright messages, so others can do it as you've described, if they wish.

Skaronator commented Jan 9, 2018

Ah, gotcha!

ivucica commented Jan 27, 2018

@prakashsurya To explicitly confirm: Is f539f1e closing #482 or #251?

Descriptions differ slightly with regards to support for removal of mirror devices...

Commit:

Note that when a device is removed, we do not verify the checksum of the
data that is copied. This makes the process much faster, but if it were
used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror. Therefore, mirror and raidz devices can
not be removed.

#482:

Note that when a device is removed, we do not verify the checksum of the
data that is copied. This makes the process much faster, but if it were
used on raidz vdevs, it would be possible to copy the wrong data,
when we have the correct data on e.g. the other side of the raidz.
Therefore, raidz devices can not be removed.

#251:

Note that when a device is removed, we do not verify the checksum of the
data that is copied. This makes the process much faster, but if it were
used on redundant vdevs (i.e. mirror or raidz vdevs), it would be
possible to copy the wrong data, when we have the correct data on e.g.
the other side of the mirror. Therefore, mirror and raidz devices can
not be removed.

Thanks :-)

prashks (Author) commented Jan 30, 2018

@ivucica - the final commit message for f539f1e ended up wrong, so this commit closes #482 and does add support for removal of top-level mirror vdevs, as described in the #482 commit message.
Sorry for the confusion.

ivucica commented Jan 30, 2018

Thank you - much appreciated!

jdrch commented Jun 11, 2019

I'm looking into using ZFS and am a bit confused by the wording of the pull request. Specifically, this part:

raidz devices cannot be removed.

This seems to contradict the 1st sentence:

allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool.

Unless the differentiating factor between the 2 statements is the use of "device" vs. "vdev." Assuming this is the case, let's say I have a pool, Pool A:

Pool A:
vdev1 = array of diska and diskb
vdev2 = array of diskc and diskd
vdev3 = array of diske and diskf
vdev4 = diskg

Per how I read the above, I'd be able to remove any of the vdevs and have Pool A's size be reduced accordingly, but I would be unable to remove any of the individual disks from the pool except vdev4. Is that correct?

Following from the above, if I wanted to remove an individual disk from vdev1, vdev2, or vdev3, I'd have to remove the vdev containing it and then remove it from that vdev, but doing so would "destroy" that vdev as far as ZFS is concerned. Is that correct?

Just want to ensure I'm understanding this.

ahrens (Member) commented Jun 11, 2019

@jdrch That sentence should read, "allows some top-level vdevs to be removed"

In your example:

  • I assume that by "array" you mean "mirror" (as in, zpool create poolname mirror diska diskb mirror diskc diskd ...)
  • It would be unusual to have both mirrors and plain (non-redundant) vdevs in the same pool (vdev4). What is the intended use case?
  • This feature allows you to remove any of the vdevs, reducing the pool capacity.
  • You can remove one side of a mirror, reducing the redundancy, with zpool detach (a feature which has been part of ZFS since the beginning). So "to remove an individual disk from vdev1, vdev2, or vdev3", you would use zpool detach.
jdrch commented Jun 11, 2019

I assume that by "array" you mean "mirror" (as in, zpool create poolname mirror diska diskb mirror diskc diskd ...)

I meant mirror or RAID, since I thought both were subsets/special cases of arrays. Perhaps I was wrong; if I was, apologies.

It would be unusual to have both mirrors and plain (non-redundant) vdevs in the same pool (vdev4). What is the intended use case?

Ideally I'd like to maximize pool flexibility in terms of being able to move drives in and out of it, which is why my pool example had different vdev types.

EDIT: After realizing that:

  1. zfs stripes data across top-level vdevs
  2. All top-level vdevs need to be of the same replication level, e.g. n-way mirror, or [RAIDZn + number of disks + capacity of each disk in each of the RAIDZns]

I finally realize this isn't a good idea.

This feature allows you to remove any of the vdevs, reducing the pool capacity.

"Any" as in RAIDZ vdevs, mirrored vdevs, and standalone disk vdevs, or "any" as in any of the vdevs in the example I gave?

EDIT: This Reddit reply thread combined with the top-level vdev removal OpenZFS roadmap clears it up. Condition 2 above needs to be satisfied for zpool remove to work. From the 2nd link:

Removal of top-level RAIDZ vdevs technically possible, but ONLY for pools of identical raidz vdevs - ie 4 6-disk RAIDZ2 vdevs, etc. You will not be able to remove a raidz vdev from a "mutt" pool.


You can remove one side of a mirror, reducing the redundancy, with zpool detach (a feature which has been part of ZFS since the beginning). So "to remove an individual disk from vdev1, vdev2, or vdev3", you would use zpool detach.

Thanks for the information!

EDIT 2: Most comprehensive explanation so far.
