
Fix 'zpool add' handling of nested interior VDEVs #6996

Merged
merged 1 commit into openzfs:master from loli10K:issue-6678 on Dec 28, 2017

Conversation


@loli10K loli10K commented Dec 26, 2017

Description

When replacing a faulted device which was previously handled by a spare, multiple levels of nested interior VDEVs will be present in the pool configuration. This is safe and allowed, but get_replication() needs to handle this situation gracefully to let 'zpool add' add new devices to the pool; otherwise we hit the following failure:

root@linux:~# zpool status
  pool: testpool
 state: DEGRADED
  scan: resilvered 20.1M in 0 days 00:00:07 with 0 errors on Tue Dec 26 18:47:37 2017
config:

	NAME                         STATE     READ WRITE CKSUM
	testpool                     DEGRADED     0     0     0
	  mirror-0                   DEGRADED     0     0     0
	    /var/tmp/file-vdev1      ONLINE       0     0     0
	    spare-1                  DEGRADED     0     0     0
	      replacing-0            DEGRADED     0     0     0
	        /var/tmp/file-vdev2  UNAVAIL      0     0     0  cannot open
	        /var/tmp/file-vdev3  ONLINE       0     0     0
	      /var/tmp/file-spare1   ONLINE       0     0     0
	spares
	  /var/tmp/file-spare1       INUSE     currently in use

errors: No known data errors
root@linux:~# zpool add $POOLNAME spare $TMPDIR/file-spare2
nvlist_lookup_string(cnv, ZPOOL_CONFIG_PATH, &path) == 0
ASSERT at zpool_vdev.c:884:get_replication()
Aborted (core dumped)
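
The core of the fix is to make get_replication() descend through nested interior VDEVs instead of assuming that every child of a spare/replacing vdev is a leaf carrying a ZPOOL_CONFIG_PATH, which is the assumption the assertion above trips over. Below is a minimal C sketch of the idea only; it is not the actual patch, and the helper name find_leaf_vdev is made up for illustration:

#include <string.h>
#include <libnvpair.h>
#include <sys/fs/zfs.h>

/*
 * Sketch only: follow the first child of each nested interior vdev
 * (spare/replacing) down to a leaf, so the caller can look up
 * ZPOOL_CONFIG_PATH on a vdev that actually has one.
 */
static nvlist_t *
find_leaf_vdev(nvlist_t *nv)
{
	char *type;
	nvlist_t **child;
	uint_t children;

	while (nvlist_lookup_string(nv, ZPOOL_CONFIG_TYPE, &type) == 0 &&
	    (strcmp(type, VDEV_TYPE_SPARE) == 0 ||
	    strcmp(type, VDEV_TYPE_REPLACING) == 0) &&
	    nvlist_lookup_nvlist_array(nv, ZPOOL_CONFIG_CHILDREN,
	    &child, &children) == 0 && children > 0)
		nv = child[0];	/* descend into the nested interior vdev */

	return (nv);
}

With something along these lines, get_replication() can perform its ZPOOL_CONFIG_PATH lookup on the leaf it reaches rather than on the immediate child of the interior vdev, which is what aborts in the output above.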

Motivation and Context

Fix #6678

How Has This Been Tested?

Tested on a local builder; a test script was added to the ZTS. 'zpool add' is now able to handle the following situation:

root@linux:/usr/src/zfs# zpool status
  pool: testpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 20.2M in 0 days 00:00:00 with 0 errors on Tue Dec 26 18:41:38 2017
config:

	NAME                                                STATE     READ WRITE CKSUM
	testpool                                            DEGRADED     0     0     0
	  mirror-0                                          DEGRADED     0     0     0
	    /var/tmp/file-vdev1                             ONLINE       0     0     0
	    spare-1                                         DEGRADED   261     0     0
	      replacing-0                                   DEGRADED     0     0     0
	        /var/tmp/file-vdev2                         UNAVAIL      0     0     0  cannot open
	        spare-1                                     DEGRADED     0     0     0
	          replacing-0                               DEGRADED     0     0     0
	            /var/tmp/file-vdev3                     UNAVAIL      0     0     0  cannot open
	            spare-1                                 DEGRADED     0     0     0
	              replacing-0                           DEGRADED     0     0     0
	                /var/tmp/file-vdev4                 UNAVAIL      0     0     0  cannot open
	                spare-1                             DEGRADED     0     0     0
	                  replacing-0                       DEGRADED     0     0     0
	                    /var/tmp/file-vdev5             UNAVAIL      0     0     0  cannot open
	                    spare-1                         DEGRADED     0     0     0
	                      replacing-0                   DEGRADED     0     0     0
	                        /var/tmp/file-vdev6         UNAVAIL      0     0     0  cannot open
	                        spare-1                     DEGRADED     0     0     0
	                          /var/tmp/file-vdev7       UNAVAIL      0     0     0  cannot open
	                          /var/tmp/file-spare6      ONLINE       0     0     0
	                        spare-2                     DEGRADED     0     0     0
	                          replacing-0               DEGRADED     0     0     0
	                            /var/tmp/file-vdev8     UNAVAIL      0     0     0  cannot open
	                            spare-1                 DEGRADED     0     0     0
	                              /var/tmp/file-vdev9   UNAVAIL      0     0     0  cannot open
	                              /var/tmp/file-spare8  ONLINE       0     0     0
	                            /var/tmp/file-vdev10    ONLINE       0     0     0
	                          /var/tmp/file-spare7      ONLINE       0     0     0
	                      /var/tmp/file-spare5          ONLINE       0     0     0
	                  /var/tmp/file-spare4              ONLINE       0     0     0
	              /var/tmp/file-spare3                  ONLINE       0     0     0
	          /var/tmp/file-spare2                      ONLINE       0     0     0
	      /var/tmp/file-spare1                          ONLINE       0     0   261
	spares
	  /var/tmp/file-spare1                              INUSE     currently in use
	  /var/tmp/file-spare2                              INUSE     currently in use
	  /var/tmp/file-spare3                              INUSE     currently in use
	  /var/tmp/file-spare4                              INUSE     currently in use
	  /var/tmp/file-spare5                              INUSE     currently in use
	  /var/tmp/file-spare6                              INUSE     currently in use
	  /var/tmp/file-spare7                              INUSE     currently in use
	  /var/tmp/file-spare8                              INUSE     currently in use
	  /var/tmp/file-spare9                              AVAIL   

errors: No known data errors
root@linux:/usr/src/zfs# truncate -s 512m $TMPDIR/file-spare-new
root@linux:/usr/src/zfs# zpool add $POOLNAME spare $TMPDIR/file-spare-new
root@linux:/usr/src/zfs# echo $?
0
root@linux:/usr/src/zfs# 

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@loli10K loli10K force-pushed the issue-6678 branch 3 times, most recently from c5bb381 to c59d4f9 on December 27, 2017 18:36

@behlendorf behlendorf left a comment

LGTM, nice fix!

log_must wait_hotspare_state $TESTPOOL $SPARE_DEV1 "INUSE"
log_must check_state $TESTPOOL "" "DEGRADED"

# 2.2 Replace the faulted device: this create a replacing vdev inside a

s/create/creates

When replacing a faulted device which was previously handled by a spare
multiple levels of nested interior VDEVs will be present in the pool
configuration; the following example illustrates one of the possible
situations:

   NAME                          STATE     READ WRITE CKSUM
   testpool                      DEGRADED     0     0     0
     raidz1-0                    DEGRADED     0     0     0
       spare-0                   DEGRADED     0     0     0
         replacing-0             DEGRADED     0     0     0
           /var/tmp/fault-dev    UNAVAIL      0     0     0  cannot open
           /var/tmp/replace-dev  ONLINE       0     0     0
         /var/tmp/spare-dev1     ONLINE       0     0     0
       /var/tmp/safe-dev         ONLINE       0     0     0
   spares
     /var/tmp/spare-dev1         INUSE     currently in use

This is safe and allowed, but get_replication() needs to handle this
situation gracefully to let zpool add new devices to the pool.

Signed-off-by: loli10K <ezomori.nozomu@gmail.com>

codecov bot commented Dec 28, 2017

Codecov Report

Merging #6996 into master will decrease coverage by 0.18%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master   #6996      +/-   ##
=========================================
- Coverage   75.49%   75.3%   -0.19%     
=========================================
  Files         296     296              
  Lines       95500   95500              
=========================================
- Hits        72095   71915     -180     
- Misses      23405   23585     +180
Flag Coverage Δ
#kernel 74.47% <ø> (-0.39%) ⬇️
#user 67.46% <100%> (-0.41%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 823d48b...1e7caa4.

@behlendorf behlendorf merged commit 390d679 into openzfs:master Dec 28, 2017
@behlendorf behlendorf added this to To Do in 0.7.6 via automation Dec 28, 2017
@loli10K loli10K deleted the issue-6678 branch December 28, 2017 19:09
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jan 16, 2018
When replacing a faulted device which was previously handled by a spare
multiple levels of nested interior VDEVs will be present in the pool
configuration; the following example illustrates one of the possible
situations:

   NAME                          STATE     READ WRITE CKSUM
   testpool                      DEGRADED     0     0     0
     raidz1-0                    DEGRADED     0     0     0
       spare-0                   DEGRADED     0     0     0
         replacing-0             DEGRADED     0     0     0
           /var/tmp/fault-dev    UNAVAIL      0     0     0  cannot open
           /var/tmp/replace-dev  ONLINE       0     0     0
         /var/tmp/spare-dev1     ONLINE       0     0     0
       /var/tmp/safe-dev         ONLINE       0     0     0
   spares
     /var/tmp/spare-dev1         INUSE     currently in use

This is safe and allowed, but get_replication() needs to handle this
situation gracefully to let zpool add new devices to the pool.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes openzfs#6678 
Closes openzfs#6996
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jan 18, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jan 18, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jan 19, 2018
@behlendorf behlendorf moved this from To Do to In progress in 0.7.6 Jan 19, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jan 23, 2018
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Jan 29, 2018
tonyhutter pushed a commit that referenced this pull request Feb 6, 2018
@tonyhutter tonyhutter moved this from In progress to Done in 0.7.6 Feb 6, 2018
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Feb 13, 2018
Labels
None yet
Projects
No open projects
0.7.6 (Done)
Development

Successfully merging this pull request may close these issues.

adding spare fails during resilver
3 participants