Conversation
Nice news. 👍 In our pocket of the IRC-verse, we decided that the last sentence limits the use of this wonderful new feature to containing (or amending) those "Oh, crap!" moments when people mistake a […]. Are there plans to enable the feature for other VDEV types, perhaps in a later PR, eventually learning some lessons from this one?
I don't have any plans to do so, but I would welcome effort along those lines.
range_tree_verify() was the only range tree support function which locked rt_lock, whereas all the other functions required the lock to be taken by the caller. If the lock is taken in range_tree_verify(), it's not possible to atomically verify a set of related range trees (those which are likely protected by the same lock).

In the previous implementation, checking "related" trees would be done as follows:

```c
range_tree_verify(tree1, offset, size);
/* tree1's rt_lock is not taken here */
range_tree_verify(tree2, offset, size);
```

The new implementation requires:

```c
mutex_enter(tree1->rt_lock);
range_tree_verify(tree1, offset, size);
range_tree_verify(tree2, offset, size);
mutex_exit(tree1->rt_lock);
```

Currently, the only consumer of range_tree_verify() is metaslab_check_free(), which verifies a set of related range trees in a metaslab. The TRIM/DISCARD code adds an additional set of checks of the current and previous trimsets, both of which are represented as range trees. metaslab_check_free() has been updated to lock ms_lock once for each vdev's metaslab and, on debugging builds, to verify that each tree's rt_lock matches the metaslab's ms_lock to prove they're related.
@ahrens when do you expect to merge this PR into master and release a new version with this great feature?
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>

This project allows top-level vdevs to be removed from the storage pool with "zpool remove", reducing the total amount of storage in the pool. This operation copies all allocated regions of the device to be removed onto other devices, recording the mapping from old to new location. After the removal is complete, read and free operations to the removed (now "indirect") vdev must be remapped and performed at the new location on disk. The indirect mapping table is kept in memory whenever the pool is loaded, so there is minimal performance overhead when doing operations on the indirect vdev.

The size of the in-memory mapping table will be reduced when its entries become "obsolete" because they are no longer used by any block pointers in the pool. An entry becomes obsolete when all the blocks that use it are freed. An entry can also become obsolete when all the snapshots that reference it are deleted, and the block pointers that reference it have been "remapped" in all filesystems/zvols (and clones). Whenever an indirect block is written, all the block pointers in it will be "remapped" to their new (concrete) locations if possible. This process can be accelerated by using the "zfs remap" command to proactively rewrite all indirect blocks that reference indirect (removed) vdevs.

Note that when a device is removed, we do not verify the checksum of the data that is copied. This makes the process much faster, but if it were used on redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy the wrong data when we have the correct data on, e.g., the other side of the mirror. Therefore, mirror and raidz devices cannot be removed.
@galindro I've been working on getting this rebased onto master, which is now completed. I'm hoping to get it merged sometime in July.
Nice.
@gmelikov I was wondering if you'd be interested in porting this to Linux before we integrate it to illumos. It would be great to get additional feedback on this change, and we might attract more attention on ZoL.
@ahrens I'm afraid I won't have much time to do it quickly now, but I've already begun; sooner or later we'll port it.
@ahrens unfortunately I didn't have enough time this summer to port it due to unexpected load at work, I'm very sorry that I misled you.
@gmelikov no worries. We're also behind on our work upstreaming it to illumos, but we hope to have the final version out by OpenZFS DevSummit (Oct 24).
hey, when can we see it integrated? :)
@ikozhukhov: @prashks is working on adding support for removing mirror devices. He's almost done and we hope to have the review updated by the OpenZFS DevSummit later this month.
superseded by #482 |