
Defer new resilvers until the current one ends #7732

Merged
merged 1 commit into master from resilver_restart on Oct 19, 2018

Conversation

tcaputi
Contributor

@tcaputi tcaputi commented Jul 20, 2018

Currently, if a resilver is triggered for any reason while an
existing one is running, zfs will immediately restart the existing
resilver from the beginning to include the new drive. This causes
problems for system administrators when a drive fails while another
is already resilvering. In this case, the optimal thing to do to
reduce risk of data loss is to wait for the current resilver to end
before immediately replacing the second failed drive, which allows
the system to operate with two incomplete drives for the minimum
amount of time.

This patch introduces the resilver_defer feature that essentially
does this for the admin without forcing them to wait and monitor
the resilver manually. The change requires an on-disk feature
since we must mark drives that are part of a deferred resilver in
the vdev config to ensure that we do not assume they are done
resilvering when an existing resilver completes.

Signed-off-by: Tom Caputi tcaputi@datto.com

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@richardelling
Contributor

IMHO, this feature should check the progress of the current scan:

  • if the scan is > 50% complete, then defer
  • if the scan is < 50% complete, don't defer

because if you're almost finished with a scan, deferring makes good sense. OTOH, if you're just starting a scan, it will be faster to restart.

@@ -270,6 +270,7 @@ struct spa {
uint64_t spa_scan_pass_scrub_spent_paused; /* total paused */
uint64_t spa_scan_pass_exam; /* examined bytes per pass */
uint64_t spa_scan_pass_issued; /* issued bytes per pass */
boolean_t spa_resilver_defered; /* resilver has been defered */
Contributor

typo: "defered" should be "deferred"


This idea has merit.

But the other side is that if you are resilvering a 12 TB disk that is 9 TB full, you may not want to restart at any point, except perhaps very early on. This only gets worse as larger drives are released.

Two ways to handle this:

  1. Time based: if the existing resilver is going to finish soonish (a debatable time...), then continue.
  2. Size / percent based.

Another way to handle this is for the feature flag to have a companion variable that specifies the percent complete below which a restart is still allowed, perhaps defaulting to 0 (defer all new resilvers), with 100 allowing a second failure to cause a restart at any point.
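
For illustration only, here is a rough sketch of what such a percent-complete knob could look like. The tunable name, the helper function, and the use of scn_examined / scn_to_examine for the progress calculation are all assumptions made for the example; nothing like this is in the PR.

int zfs_resilver_restart_pct = 0;	/* 0 = always defer, 100 = always restart */

static boolean_t
resilver_restart_allowed(dsl_scan_t *scn)
{
	dsl_scan_phys_t *sp = &scn->scn_phys;
	uint64_t pct;

	/* progress of the running scan, as a percentage of bytes examined */
	pct = (sp->scn_to_examine == 0) ? 100 :
	    sp->scn_examined * 100 / sp->scn_to_examine;

	/* allow the restart only while we are still below the threshold */
	return (pct < zfs_resilver_restart_pct);
}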

Member

@ahrens ahrens left a comment

How do we ensure that we don't issue unnecessary resilver zio's, on vdevs that have DTL's but that are not part of the current resilver (i.e. they've been deferred)? I was expecting to see some change to dsl_scan_need_resilver(), its callees, or the DTL code.
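
One plausible shape for such a check, sketched here only to make the question concrete (the function name and call site are guesses, not code from this PR):

/*
 * Return B_TRUE if some leaf under vd still participates in the current
 * resilver (i.e. is not marked deferred). If nothing under vd does, the
 * scan can skip issuing resilver I/O to this vdev even though its DTL
 * is non-empty.
 */
static boolean_t
dsl_scan_vdev_in_current_resilver(vdev_t *vd)
{
	if (vd->vdev_ops->vdev_op_leaf)
		return (!vd->vdev_resilver_deferred);

	for (uint64_t c = 0; c < vd->vdev_children; c++) {
		if (dsl_scan_vdev_in_current_resilver(vd->vdev_child[c]))
			return (B_TRUE);
	}
	return (B_FALSE);
}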

@@ -270,6 +270,7 @@ struct spa {
uint64_t spa_scan_pass_scrub_spent_paused; /* total paused */
uint64_t spa_scan_pass_exam; /* examined bytes per pass */
uint64_t spa_scan_pass_issued; /* issued bytes per pass */
boolean_t spa_resilver_defered; /* resilver has been defered */
Member

Could you elaborate on the comment, e.g. we are in the middle of a resilver, and another resilver is needed once this one completes. This is set iff any vdev_resilver_deferred is set.

That said, I'm not entirely sure that spa_resilver_deferred is needed. Could we check if the feature is active instead?

Contributor Author

I actually spent a good bit of time thinking about this. The problem is that we don't always have a tx handy when we want to defer a resilver. This is the reason we increment the feature count in dsl_scan_sync().
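
For reference, the sync-context activation described here could be as small as the following sketch (the exact guard and the spelling of the spa field are assumptions):

/*
 * In dsl_scan_sync(): a tx is always available in sync context, so the
 * feature refcount can be bumped here once any vdev has been marked for
 * a deferred resilver.
 */
if (spa->spa_resilver_deferred &&
    !spa_feature_is_active(spa, SPA_FEATURE_RESILVER_DEFER))
	spa_feature_incr(spa, SPA_FEATURE_RESILVER_DEFER, tx);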

vdev_resilver_needed(spa->spa_root_vdev, NULL, NULL)) {
zfs_dbgmsg("starting defered resilver txg=%llu",
(longlong_t)tx->tx_txg);
spa_async_request(spa, SPA_ASYNC_RESILVER);
Member

We mark these vdevs no longer "resilver deferred" in this txg (in the config), but the resilver is started asynchronously. So I think we could "lose" the resilver if we crash before it starts. Is that OK because we'll re-check for resilver when we open the pool again? Assuming that's the case, let's leave a comment here explaining it.

Maybe another way to make this more clear would be to have dsl_scan_reset_deferred() not return anything, and only check vdev_resilver_needed() here. I think that would provide the same behavior. But that way it doesn't look like we're depending on resilver_deferred to tell us to do a resilver. I think that the resilver_deferred flag might be more accurately described as "vdev is not part of current resilver". Because all it needs to do is tell us to ignore this vdev's DTL for resilver purposes (not issue resilver i/os to it, and not clear the DTL when the resilver completes). (Incidentally, this is why a "participates in resilver" flag would be more natural, but I understand that we don't want to activate a new on-disk feature during all resilvers.)
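
Roughly, the restructuring being suggested (a sketch only, assuming, per the question above, that a crash before the async request is handled is recovered by the resilver check on the next import):

/*
 * Clear the deferred flags in the config for this txg, then let
 * vdev_resilver_needed() alone decide whether to kick off a new resilver.
 */
dsl_scan_reset_deferred(spa->spa_root_vdev, tx);
if (vdev_resilver_needed(spa->spa_root_vdev, NULL, NULL)) {
	zfs_dbgmsg("starting deferred resilver txg=%llu",
	    (longlong_t)tx->tx_txg);
	spa_async_request(spa, SPA_ASYNC_RESILVER);
}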

Contributor

I agree with Matt's comments. How about changing the name of dsl_scan_reset_deferred() to dsl_scan_clear_deferred() and change the variables resilver_available and should_resilver to resilver_needed.
I would also change your comment slightly:
/*
* Clear any deferred_resilver flags in the config. If there are drives that
* need resilvering, kick off an asynchronous request to start resilver.
* ...

Contributor

Looks good.

/*
* Check for scn_restart_txg before checking spa_load_state, so
* that we can restart an old-style scan while the pool is being
* imported (see dsl_scan_init).
*/
if (dsl_scan_restarting(scn, tx)) {
if (!spa->spa_resilver_defered && dsl_scan_restarting(scn, tx)) {
Member

I wonder if it could be better to never set scn_restart_txg, rather than setting it and then ignoring it here?

Contributor Author

We already don't set it. I originally added it to deal with the old-style resilver code that also wanted to set the restart txg. However, this code should never be run when features are active anyway, so it should be safe to put this line back to the way it was.

@ahrens ahrens requested a review from jgallag88 July 26, 2018 23:01
@scineram

With RAIDZ2 or triple mirrors, what if 1 drive is resilvering but 2 drives fail in another vdev simultaneously? Do you still finish the first resilver without redundancy?

@tcaputi
Contributor Author

tcaputi commented Jul 27, 2018

Sorry, everybody. I've been on vacation for the last week. I'll be able to address all comments on Monday. I am definitely aware that we need some kind of parameter that will determine at what point we should still allow existing resilvers to restart (so that you can resilver 2 drives at once, for instance).

@ahrens the biggest issue I have with how I wrote this PR (as you mentioned in one of your comments) is that I don't like the fact that I don't have a good way to atomically dirty the vdev config AND activate the feature refcount. Right now I'm getting around this by having the sync thread check if it needs to activate the feature every time, but that doesn't seem like the best way to do this. Do you know of something better I can do here? You are also correct that I omitted a check that would prevent unnecessary IOs from being issued to a deferred resilver vdev and I will correct that on Monday.

I am adding a WIP tag to this PR (which definitely should have been there when I posted it).

@tcaputi tcaputi added the Status: Work in Progress Not yet ready for general review label Jul 27, 2018
@ahrens
Member

ahrens commented Jul 27, 2018

@scineram Yes, in fact that is a primary use case for this. You really want that first failed drive to resilver ASAP to restore some of the redundancy. With the current logic, and a RAIDZ3 vdev, imagine the following scenario:

  1. one disk dies
  2. it takes 10 days to get 90% done resilvering it
  3. another disk dies - now we have 2 bad drives
  4. we restart the resilver, ignoring the progress already made
  5. it takes another 10 days to get 90% done
  6. then a third disk dies - now we have 3 bad drives
  7. again we restart the resilver

Now if we lose a drive during this resilver, there will be data loss. So if 4 drives fail within (3 * resilver time), we can lose data.

With the new logic, the same pattern of failures would cause:
  1. one disk dies
  2. it takes 10 days to get 90% done resilvering it
  3. another disk dies - now we have 2 bad drives
  4. we ignore that failure for now
  5. it takes 1 more day to complete resilvering the first drive
  6. the vdev now has 1 failed drive (the new one)
  7. we start a new resilver for the 2nd failed drive

This pattern can repeat indefinitely, since we always have at most 2 failed drives. So we can only lose data if 4 drives fail within (1 * resilver time).

So with RAIDZ3, this change makes us 1/3 as vulnerable to data loss.

@ahrens
Member

ahrens commented Jul 27, 2018

@tcaputi

we need some kind of parameter that will determine at what point we should still allow existing resilvers to restart (so that you can resilver 2 drives at once, for instance)

With the current logic, you can resilver 2 drives at once, if they both fail while a resilver is in progress. (The "deferred" resilver will resilver all drives that died during the previous resilver.) I'm not sure we would want a parameter to allow resilvers to restart, but if we do, it should be set very low (e.g. only if we are <10% complete when another drive fails would it be OK to lose the progress and restart the resilver).

@tcaputi
Contributor Author

tcaputi commented Jul 27, 2018

@ahrens Yeah. My concern was for the use case where 2 drives fail, the sysadmin replaces one and it starts resilvering, and then immediately replaces the other and it gets deferred because the initial resilver is still going.

@ahrens
Member

ahrens commented Jul 27, 2018

@tcaputi That makes sense.

@richardelling
Contributor

it gets more complicated when there is more than one top-level vdev, in which case the policy might change depending on the redundancy risk... somewhere around here I have models for this stuff...

@scineram

@ahrens, my question was more about the multiple vdev scenario. I have
mirror/raidz2-1: A (resilvering, 50%), B, C
mirror/raidz2-2: D, E, F.
If D and E fail, do I care that vdev-2 is on the edge and restart, or do I wait for vdev-1 to finish?

@tcaputi
Contributor Author

tcaputi commented Jul 30, 2018

The patch has been updated to address comments. I still don't have a mechanism yet for preventing deferrals when they are not desired.

@tcaputi tcaputi force-pushed the resilver_restart branch 2 times, most recently from c350841 to e6dd10a Compare August 28, 2018 18:40
@tcaputi tcaputi force-pushed the resilver_restart branch 2 times, most recently from 63bc054 to 4f96adf Compare August 31, 2018 00:41
@ahrens
Member

ahrens commented Sep 17, 2018

@scineram That is an interesting scenario. Let's compare the time until we are "safe" and can tolerate another (third) failure on vdev 2:

Current code:

  1. When A failed, we start a resilver
  2. When D failed, we restart the resilver from the beginning - now fixing A and D
  3. When E failed, we restart the resilver from the beginning - now fixing A, D, and E

Now wait for that resilver to complete before we can tolerate another failure. So we are at risk until (E failure time + 1 whole resilver).

Proposed change, timing 1:

  1. When A failed, we start a resilver
  2. When D failed, we do nothing, remembering to resilver D when the current resilver completes
  3. When E failed, we do nothing, remembering to also resilver E when the current resilver completes
  4. When A's resilver completes, we start resilvering D and E.

Now wait for that resilver to complete before we can tolerate another failure. So we are at risk until (A failure time + 2 whole resilvers), which is the same or later than (E failure time + 1 whole resilver).

Proposed change, timing 2:

  1. When A failed, we start a resilver
  2. When D failed, we do nothing, remembering to resilver D when the current resilver completes
  3. A's resilver completes. We start resilvering D
  4. When E failed, we do nothing, remembering to resilver E when the current resilver completes
  5. When D's resilver completes, we start resilvering E.

At this point we can tolerate another failure. So we are at risk until (D failure time + 1 whole resilver), which is less time than the current code.

So you're right to be concerned that in some cases, this change could extend the risk window. But in other cases it could reduce the risk window. I think it's worth considering restarting the resilver if the current resilver is not restoring the last level of redundancy of any vdev (e.g. we're fixing 1 drive in a RAIDZ2), and restarting the resilver would restore the last level of redundancy of some vdev (e.g. fix 2 drives in a RAIDZ2). But I think we can leave that work for a future PR.

Member

@ahrens ahrens left a comment

This looks good to me.

I think you'll also need to update the manpage sections for "zpool attach" and "zpool scrub".

@@ -2966,6 +3026,26 @@ dsl_scan_active(dsl_scan_t *scn)
return (used != 0);
}

static boolean_t
dsl_scan_check_deferred_vdevs(vdev_t *vd)
Member

This seems similar to dsl_scan_reset_deferred(), so a similar name would be dsl_scan_check_deferred() (or add _vdevs to both?)

You might also put the two functions next to each other in the file.

@@ -749,6 +749,11 @@ vdev_alloc(spa_t *spa, vdev_t **vdp, nvlist_t *nv, vdev_t *parent, uint_t id,
(void) nvlist_lookup_uint64(nv, ZPOOL_CONFIG_RESILVER_TXG,
&vd->vdev_resilver_txg);

vd->vdev_resilver_deferred = (nvlist_lookup_boolean(nv,
ZPOOL_CONFIG_RESILVER_DEFER) == 0);
Member

FYI, another way to do this would be ... = nvlist_exists(...);. Not sure if that's any more or less clean.
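
For context, nvlist_lookup_boolean() returns 0 exactly when the named flag is present, so the two spellings behave the same:

vd->vdev_resilver_deferred = (nvlist_lookup_boolean(nv,
    ZPOOL_CONFIG_RESILVER_DEFER) == 0);
/* ...is equivalent to... */
vd->vdev_resilver_deferred = nvlist_exists(nv, ZPOOL_CONFIG_RESILVER_DEFER);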

module/zfs/spa.c Outdated
if (dsl_scan_resilvering(spa_get_dsl(spa)) &&
spa_feature_is_enabled(spa, SPA_FEATURE_RESILVER_DEFER)) {
newvd->vdev_resilver_deferred = B_TRUE;
spa->spa_resilver_deferred = B_TRUE;
Member

can we keep this code next to its logical else, the dsl_resilver_restart() call?

@behlendorf behlendorf added Status: Revision Needed Changes are required for the PR to be accepted and removed Status: Work in Progress Not yet ready for general review labels Sep 25, 2018
@tcaputi tcaputi added Status: Code Review Needed Ready for review and testing and removed Status: Revision Needed Changes are required for the PR to be accepted labels Sep 26, 2018
@tcaputi
Contributor Author

tcaputi commented Sep 26, 2018

Apologies for letting this sit for a week. PR is rebased on master and @ahrens comments have been addressed.

@@ -988,6 +989,7 @@ typedef struct vdev_stat {
uint64_t vs_scan_processed; /* scan processed bytes */
uint64_t vs_fragmentation; /* device fragmentation */
uint64_t vs_checkpoint_space; /* checkpoint-consumed space */
uint64_t vs_resilver_deferred; /* resilver deferred? */
Contributor

nit: why the "?"? This shouldn't be ambiguous.

Contributor Author

I was following the convention of vs_scan_removing

Contributor

I see. It's a little thing, but would you mind removing it? My concern is that the question mark implies, to me at least, that vs_resilver_deferred can't be entirely trusted: being set may or may not imply a resilver is currently deferred.

zfeature_register(SPA_FEATURE_RESILVER_DEFER,
"com.datto:resilver_defer", "resilver_defer",
"Support for defering new resilvers when one is already running.",
ZFEATURE_FLAG_READONLY_COMPAT, NULL);
Contributor

How do you disable this? All new pools enable all features by default, and once enabled this feature cannot be disabled. I think this functionality needs an additional pool property to set the policy, how about resilver=immediate|defer|never.

immediate - the current behavior
defer - the new deferred behavior
never - do not automatically start resilvers (a reasonable option now that there's a sub-command)

Contributor Author

What do you mean that it can't be disabled once it is enabled? Can't you just disable it like any other feature?

Member

Keep in mind the distinction between disabled (not in use, not allowed to use), enabled (but inactive - not in use, but allowed to use), and active (in use, downgrade to older software not supported). There is no feature disabling in ZFS. However features can go back and forth between enabled and active.

All that said, I disagree with @behlendorf's suggestion of making this a pool property, because I think that the new behavior is appropriate in nearly all cases.

Contributor

There is no feature disabling in ZFS

This is specifically my concern. I agree the new behavior may be appropriate for the majority of cases, but there are operational reasons for preferring the existing behavior. For example, any existing infrastructure built on ZFS may depend on it.

This change provides no mechanism to opt-out except at pool creation time. By the time you discover this is an issue, it could be too late and it's non-reversible. That's not a good user experience.

If it were possible to run zfs set feature@resilver_defer=disabled <pool> when the feature is enabled (but not active) that would be fine. Adding a module option would also be reasonable. As long as there's a mechanism if you absolutely need it.

Member

@behlendorf I think you can accomplish what you want by using zpool create -o feature@resilver_defer=disabled

Contributor Author

I think @behlendorf was saying there is no way to disable a feature after the feature is enabled.

Contributor

@behlendorf behlendorf left a comment

Looks like the new test case might need a little longer to run:

http://build.zfsonlinux.org/builders/Ubuntu%2017.04%20x86_64%20Coverage%20%28TEST%29/builds/3710

@tcaputi
Contributor Author

tcaputi commented Oct 3, 2018

@behlendorf Timeout has been doubled and I also added a line to reset zfs_scan_vdev_limit, which should get things to move along a bit faster.

dsl_scan_reset_deferred(spa->spa_root_vdev, tx);
if (should_resilver) {
zfs_dbgmsg("starting deferred resilver txg=%llu",
(longlong_t)tx->tx_txg);
Member

This shouldn't be necessary - spa_history_log_internal() should also generate a dbgmsg (but feel free to double check this - code is in spa_history_log_sync()).

@tcaputi tcaputi added this to the 0.8.0 milestone Oct 10, 2018
Contributor

@jwk404 jwk404 left a comment

Could you add logic to the positive test that verifies the state of the feature as it moves from enabled to active and back?

Contributor

@mmaybee mmaybee left a comment

Generally looks good. Let me know if my suggestions make sense.

(ps->pss_func == POOL_SCAN_RESILVER) ?
"resilvering" : "repairing");
} else if (vs->vs_resilver_deferred) {
(void) printf(gettext(" (resilver deferred)"));
Contributor

I wonder if a better message here might be (awaiting resilver) or perhaps (needs resilvering)?

spa_feature_is_enabled(spa, SPA_FEATURE_RESILVER_DEFER))
vdev_set_deferred_resilver(spa, vd);
else
spa_async_request(spa, SPA_ASYNC_RESILVER);
Contributor

How about a macro/helper-function like: boolean_t spa_should_defer_resilver(spa)? This would check if the feature was set and then look to see if we are currently resilvering. This could be used in at least 4 places in the code.
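
A minimal sketch of the helper being suggested (the name follows the comment above; as the reply below notes, not every call site checks exactly the same conditions):

static boolean_t
spa_should_defer_resilver(spa_t *spa)
{
	/* defer only when the feature is enabled and a resilver is running */
	return (spa_feature_is_enabled(spa, SPA_FEATURE_RESILVER_DEFER) &&
	    dsl_scan_resilvering(spa_get_dsl(spa)));
}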

Contributor Author

Matt and I actually talked a bit about this offline beforehand. The checks are actually slightly different. 2 of them could use a common helper function, but I personally try to find 3 use-cases before abstracting something into a separate function.

Contributor

OK, fair enough, although I think my suggestion was a little more narrow than the original suggestion from Matt. I was suggesting just combining the two checks "dsl_scan_resilvering()" and "spa_feature_is_enabled()" into a single function call. I think you use that in more than two locations. This was more about clarity than code reduction.


Currently, if a resilver is triggered for any reason while an
existing one is running, zfs will immediately restart the existing
resilver from the beginning to include the new drive. This causes
problems for system administrators when a drive fails while another
is already resilvering. In this case, the optimal thing to do to
reduce risk of data loss is to wait for the current resilver to end
before immediately replacing the second failed drive, which allows
the system to operate with two incomplete drives for the minimum
amount of time.

This patch introduces the resilver_defer feature that essentially
does this for the admin without forcing them to wait and monitor
the resilver manually. The change requires an on-disk feature
since we must mark drives that are part of a deferred resilver in
the vdev config to ensure that we do not assume they are done
resilvering when an existing resilver completes.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
@tcaputi
Contributor Author

tcaputi commented Oct 18, 2018

@mmaybee your comments have been addressed and the patch has been rebased on master (it actually conflicted with the new ZFEATURE_TYPE_BOOLEAN changes, but git didn't notice).

@codecov

codecov bot commented Oct 18, 2018

Codecov Report

Merging #7732 into master will increase coverage by 0.12%.
The diff coverage is 97.93%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7732      +/-   ##
==========================================
+ Coverage   78.46%   78.58%   +0.12%     
==========================================
  Files         377      377              
  Lines      114404   114488      +84     
==========================================
+ Hits        89765    89974     +209     
+ Misses      24639    24514     -125
Flag Coverage Δ
#kernel 78.8% <98.59%> (+0.09%) ⬆️
#user 67.84% <93.81%> (+0.56%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d43194...09156cf. Read the comment docs.



@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 18, 2018
@behlendorf behlendorf merged commit 80a91e7 into openzfs:master Oct 19, 2018
ghfields pushed a commit to ghfields/zfs that referenced this pull request Oct 29, 2018
Currently, if a resilver is triggered for any reason while an
existing one is running, zfs will immediately restart the existing
resilver from the beginning to include the new drive. This causes
problems for system administrators when a drive fails while another
is already resilvering. In this case, the optimal thing to do to
reduce risk of data loss is to wait for the current resilver to end
before immediately replacing the second failed drive, which allows
the system to operate with two incomplete drives for the minimum
amount of time.

This patch introduces the resilver_defer feature that essentially
does this for the admin without forcing them to wait and monitor
the resilver manually. The change requires an on-disk feature
since we must mark drives that are part of a deferred resilver in
the vdev config to ensure that we do not assume they are done
resilvering when an existing resilver completes.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: @mmaybee 
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes openzfs#7732
behlendorf pushed a commit that referenced this pull request Nov 7, 2018
vdev_clear() can call vdev_set_deferred_resilver() with a
non-leaf vdev to setup a deferred resilver. However, this
function is currently written to only handle leaf vdevs.
This bug was introduced with deferred resilvers in 80a91e7.
This patch makes this function recursive so that it can find
appropriate vdevs to resilver and set vdev_resilver_deferred
on them.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #7732
Closes #8082
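
A sketch of the shape of that follow-up fix, based only on the description above (the exact per-leaf eligibility checks are simplified):

void
vdev_set_deferred_resilver(spa_t *spa, vdev_t *vd)
{
	/*
	 * Recurse first so that a non-leaf vdev passed in by vdev_clear()
	 * reaches its leaves.
	 */
	for (uint64_t i = 0; i < vd->vdev_children; i++)
		vdev_set_deferred_resilver(spa, vd->vdev_child[i]);

	/* only mark leaves that actually have missing data to repair */
	if (!vd->vdev_ops->vdev_op_leaf ||
	    range_tree_is_empty(vd->vdev_dtl[DTL_MISSING]))
		return;

	vd->vdev_resilver_deferred = B_TRUE;
	spa->spa_resilver_deferred = B_TRUE;
}
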
GregorKopka pushed a commit to GregorKopka/zfs that referenced this pull request Jan 7, 2019
Currently, if a resilver is triggered for any reason while an
existing one is running, zfs will immediately restart the existing
resilver from the beginning to include the new drive. This causes
problems for system administrators when a drive fails while another
is already resilvering. In this case, the optimal thing to do to
reduce risk of data loss is to wait for the current resilver to end
before immediately replacing the second failed drive, which allows
the system to operate with two incomplete drives for the minimum
amount of time.

This patch introduces the resilver_defer feature that essentially
does this for the admin without forcing them to wait and monitor
the resilver manually. The change requires an on-disk feature
since we must mark drives that are part of a deferred resilver in
the vdev config to ensure that we do not assume they are done
resilvering when an existing resilver completes.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: @mmaybee 
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes openzfs#7732
GregorKopka pushed a commit to GregorKopka/zfs that referenced this pull request Jan 7, 2019
vdev_clear() can call vdev_set_deferred_resilver() with a
non-leaf vdev to setup a deferred resilver. However, this
function is currently written to only handle leaf vdevs.
This bug was introduced with deferred resilvers in 80a91e7.
This patch makes this function recursive so that it can find
appropriate vdevs to resilver and set vdev_resilver_deferred
on them.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue openzfs#7732
Closes openzfs#8082