Mistake adding log device as single-drive vdev seems unrecoverable #6907

Closed
tesujimath opened this issue Nov 29, 2017 · 20 comments
@tesujimath
Contributor

tesujimath commented Nov 29, 2017

I mistyped the zpool command to add a log device, so it got added as a single-disk vdev alongside all my raidz1 vdevs. Each of them is around 5TB, and the wannabe log device is an 8GB ZeusRAM drive (i.e. tiny).

I now can't remove the device, my zpool is in a really vulnerable state, and it has no log device.

This is a busy production fileserver, and since there is 56TB of data in this zpool, copying the data to another fileserver so I can destroy and re-create the zpool is very unattractive. I see #3371 would be a solution. Alternatively, perhaps I could install another OpenZFS implementation (such as Delphix), and use that to recover my zpool, then revert back to ZFS on Linux. Or use some development branch of ZFS on Linux for the recovery, if such a thing exists.

Any suggestions here on what would be possible will be gratefully received.

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 7.4
Linux Kernel 3.10.0-693.2.2.el7.x86_64
Architecture x86_64
ZFS Version 0.7.3-1
SPL Version 0.7.3-1

Describe the problem you're observing

I mistakenly added a single drive to a zpool as a vdev rather than a log device. Now I can't remove it.

Describe how to reproduce the problem

[Edit: See later comment for reproducible sequence]

I mistyped the command to add a log device to my pool. Instead of this:

# zpool add z102 log H35

I typed this:

# zpool add z102 H35

Trying to remove it gives me this error:

# zpool remove z102 H35
cannot remove H35: only inactive hot spares, cache, or log devices can be removed

Here's what my zpool looks like now:

# zpool status
  pool: z102
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 11h57m with 0 errors on Sun Nov  5 06:57:06 2017
config:

	NAME         STATE     READ WRITE CKSUM
	z102         ONLINE       0     0     0
	  raidz1-0   ONLINE       0     0     0
	    A1       ONLINE       0     0     0
	    A2       ONLINE       0     0     0
	    A3       ONLINE       0     0     0
	    A4       ONLINE       0     0     0
	    A5       ONLINE       0     0     0
	  raidz1-1   ONLINE       0     0     0
	    A6       ONLINE       0     0     0
	    A7       ONLINE       0     0     0
	    A8       ONLINE       0     0     0
	    A9       ONLINE       0     0     0
	    A10      ONLINE       0     0     0
	  raidz1-2   ONLINE       0     0     0
	    A11      ONLINE       0     0     0
	    A12      ONLINE       0     0     0
	    A13      ONLINE       0     0     0
	    A14      ONLINE       0     0     0
	    A15      ONLINE       0     0     0
	  raidz1-3   ONLINE       0     0     0
	    A16      ONLINE       0     0     0
	    A17      ONLINE       0     0     0
	    A18      ONLINE       0     0     0
	    A19      ONLINE       0     0     0
	    A20      ONLINE       0     0     0
	  raidz1-4   ONLINE       0     0     0
	    A21      ONLINE       0     0     0
	    A22      ONLINE       0     0     0
	    A23      ONLINE       0     0     0
	    A24      ONLINE       0     0     0
	    A25      ONLINE       0     0     0
	  raidz1-5   ONLINE       0     0     0
	    B1       ONLINE       0     0     0
	    B2       ONLINE       0     0     0
	    B3       ONLINE       0     0     0
	    B4       ONLINE       0     0     0
	    B5       ONLINE       0     0     0
	  raidz1-6   ONLINE       0     0     0
	    B6       ONLINE       0     0     0
	    B7       ONLINE       0     0     0
	    B8       ONLINE       0     0     0
	    B9       ONLINE       0     0     0
	    B10      ONLINE       0     0     0
	  raidz1-7   ONLINE       0     0     0
	    B11      ONLINE       0     0     0
	    B12      ONLINE       0     0     0
	    B13      ONLINE       0     0     0
	    B14      ONLINE       0     0     0
	    B15      ONLINE       0     0     0
	  raidz1-8   ONLINE       0     0     0
	    B16      ONLINE       0     0     0
	    B17      ONLINE       0     0     0
	    B18      ONLINE       0     0     0
	    B19      ONLINE       0     0     0
	    B20      ONLINE       0     0     0
	  raidz1-9   ONLINE       0     0     0
	    B21      ONLINE       0     0     0
	    B22      ONLINE       0     0     0
	    B23      ONLINE       0     0     0
	    B24      ONLINE       0     0     0
	    B25      ONLINE       0     0     0
	  H35        ONLINE       0     0     0
	  raidz1-11  ONLINE       0     0     0
	    C1       ONLINE       0     0     0
	    C2       ONLINE       0     0     0
	    C3       ONLINE       0     0     0
	    C4       ONLINE       0     0     0
	    C5       ONLINE       0     0     0
	  raidz1-12  ONLINE       0     0     0
	    C6       ONLINE       0     0     0
	    C7       ONLINE       0     0     0
	    C8       ONLINE       0     0     0
	    C9       ONLINE       0     0     0
	    C10      ONLINE       0     0     0
	  raidz1-13  ONLINE       0     0     0
	    C11      ONLINE       0     0     0
	    C12      ONLINE       0     0     0
	    C13      ONLINE       0     0     0
	    C14      ONLINE       0     0     0
	    C15      ONLINE       0     0     0
	  raidz1-14  ONLINE       0     0     0
	    C16      ONLINE       0     0     0
	    C17      ONLINE       0     0     0
	    C18      ONLINE       0     0     0
	    C19      ONLINE       0     0     0
	    C20      ONLINE       0     0     0
	  raidz1-15  ONLINE       0     0     0
	    C21      ONLINE       0     0     0
	    C22      ONLINE       0     0     0
	    C23      ONLINE       0     0     0
	    C24      ONLINE       0     0     0
	    C25      ONLINE       0     0     0

errors: No known data errors

# zfs list -d 0
NAME   USED  AVAIL  REFER  MOUNTPOINT
z102  56.0T  7.12T   257K  /export/z102
@tesujimath
Contributor Author

Well, since Solaris ZFS ensures that the zpool add command can't change the level of redundancy without the -f option, this could be seen as a severe misfeature of ZFS on Linux.
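For reference, the check is supposed to behave roughly like this (a hedged sketch; tank and sdX are placeholders, and the exact refusal message may differ):

# zpool add tank sdX      # expected to be refused: mismatched replication level
# zpool add -f tank sdX   # adding a non-redundant vdev should require an explicit -f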

@tesujimath
Contributor Author

If you're saying that is the documented behaviour of ZFS on Linux, then this is actually a bug, since the exact commands I showed above demonstrate a situation in which that check is not working.

@rincebrain
Contributor

@tesujimath So, presuming for the purpose of discussion that at some point you did run zpool add without -f and it added the device, could you please share what ZoL version was on the machine at the time?

(The contents of "zpool history | grep zpool" would probably also be useful, but also might have been absorbed in the morass of other commands often run on a pool in the intervening period.)

@tesujimath
Contributor Author

@kpande please believe me. I'm not making this up. Here's my recent history:

  979  zpool import
  980  zpool import z102
  986  zpool status
  989  zpool status
  990  zpool status | more
 1000  zpool status
 1005  zpool status
 1006  zpool remove z102 35000a7203008e6c0
 1007  zpool add z102 H35
 1008  zpool status
 1009  zpool remove z102 H35
 1010  zpool remove -f z102 H35
 1011  man zpool
 1012  zpool offline z102 H35
 1014  zpool status
 1019  man zpool
 1021  zpool remove z102 H36
 1022  zpool remove z102 H35
 1027  history | grep zpool

My goal was to rename the log device, which had been added under its raw device name, to its vdev_id alias.

@rincebrain The information you request is in the original issue comment. Or did I misunderstand you?

@rincebrain
Contributor

@tesujimath One of the unusual things here is that, for non-{log,cache} vdevs, zpool status normally orders them by when they were added to the pool. The output you're seeing therefore suggests you added 5 of the raidz vdevs after the fateful zpool add command, which is why I was asking for confirmation that you're running the same ZoL version now as when you added the device.

The fact that it's showing up this way...is fascinating.

@tesujimath
Contributor Author

tesujimath commented Nov 29, 2017

@rincebrain Yep, I upgraded earlier today from ZFS 0.6.5.9 to 0.7.3, and 0.7.3 is definitely what was running when I erroneously added H35 to my pool. I rebooted twice after the upgrade before doing this, so there's no chance of an old zfs kernel module hanging around. I'm using zfs-kmod, BTW.

Fascination is not my dominant emotion just now ...

@tesujimath
Contributor Author

@kpande Is it relevant that my zpool version hasn't been upgraded since 0.6.5.9, so the new zpool features in 0.7.3 have not been activated in the zpool?

@rincebrain
Contributor

@tesujimath Occupational hazard, my day job is primarily "huh, why did it do that", even when it's a five-alarm fire.

So, a few remarks, before I go on:

  • the conservative option would, of course, be recreating the pool, which is obviously highly inconvenient but guaranteed to work in a finite amount of time
  • even if you used another platform's OpenZFS implementation, I don't think any of the standard platforms have incorporated the device-evacuation changes yet, so this would be rather bleeding edge; and even after you did the zpool remove, you'd need to keep running that change, since versions of OpenZFS without it wouldn't be able to write to the pool (or probably even read it, because it changes where data is looked up)
  • a not-ideal stopgap to keep the pool from dying if the ZeusRAM loses power would be to mirror it against an 8G/16G/whatever flat file on the root filesystem, or anywhere other than a file on the pool itself (because then you'd have a bootstrapping problem of needing the pool imported and filesystems mounted in order to import the pool); see the sketch after this list
  • if you wanted, you could wait until evacuation gets merged, but even then you'd have to run the git master branch; 0.7.X is almost certainly not going to get anything that isn't a bugfix
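A minimal sketch of that flat-file mirror stopgap, assuming H35 is the stray vdev and the backing file lives outside the pool (the file must be at least as large as the ZeusRAM; the path and size here are only examples):

# truncate -s 8G /root/h35-mirror.img         # sparse file on the root filesystem, not on z102
# zpool attach z102 H35 /root/h35-mirror.img  # H35 becomes one half of a mirror
# zpool status z102                           # verify the new mirror vdev shows up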

The fact that you made the pool on 0.6.5.9 but did the remove and add after upgrading to 0.7.3 is useful for reproduction, but we won't know if it's relevant to why this happened until we get it reproducing somewhere else.

@tesujimath
Contributor Author

@rincebrain Thanks for those ideas. I am mulling it over.

Actually, the zpool was created several years ago, on ZFS 0.6.0-rc14 I think, and then zpool upgraded over the years. Current feature flags are these:

# zpool upgrade
This system supports ZFS pool feature flags.

All pools are formatted using feature flags.


Some supported features are not enabled on the following pools. Once a
feature is enabled the pool may become incompatible with software
that does not support the feature. See zpool-features(5) for details.

POOL  FEATURE
---------------
z102
      multi_vdev_crash_dump
      spacemap_histogram
      enabled_txg
      hole_birth
      extensible_dataset
      embedded_data
      bookmarks
      filesystem_limits
      large_blocks
      large_dnode
      sha512
      skein
      edonr
      userobj_accounting

@gmelikov
Member

This check lives in the cmd/zpool/zpool_vdev.c file, and ZoL's version has more checks than upstream OpenZFS, so that's the best place to dig in a case like this. If there is a problem, it may well affect all of OpenZFS.

(Screenshot of the relevant check in cmd/zpool/zpool_vdev.c.)
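If you want to dig into it yourself, the relevant error strings make the check easy to find in a ZoL source checkout (a hedged pointer, assuming the usual source layout):

$ grep -n "replication level" cmd/zpool/zpool_vdev.c   # locate the replication-level check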

@tesujimath
Contributor Author

@kpande I managed to reproduce the problem on a test server with a brand-new zpool. Here is the sequence that exhibits the nasty behaviour. Essentially, we create a zpool with a log device, extend the pool, then remove the log device and re-add it as a standalone vdev (without -f). This sequence reliably reproduced the problem for me.

# zpool create ztest raidz1 T31 T32 log T33
# zpool add ztest raidz1 T34 T35
# zpool status ztest
  pool: ztest
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	ztest       ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    T31     ONLINE       0     0     0
	    T32     ONLINE       0     0     0
	  raidz1-2  ONLINE       0     0     0
	    T34     ONLINE       0     0     0
	    T35     ONLINE       0     0     0
	logs
	  T33       ONLINE       0     0     0

errors: No known data errors

# zpool remove ztest T33
# zpool add ztest T33   # this is a terrible mistake, omitted to say log
# zpool status ztest
  pool: ztest
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	ztest       ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    T31     ONLINE       0     0     0
	    T32     ONLINE       0     0     0
	  T33       ONLINE       0     0     0
	  raidz1-2  ONLINE       0     0     0
	    T34     ONLINE       0     0     0
	    T35     ONLINE       0     0     0

errors: No known data errors

And just to be clear, here's my version info:

# modinfo zfs | grep -iw version
version:        0.7.3-1
# rpm -qi zfs
Name        : zfs
Version     : 0.7.3
Release     : 1.el7_4
Architecture: x86_64
Install Date: Wed 22 Nov 2017 02:01:52 PM NZDT
Group       : System Environment/Kernel
Size        : 1052134
License     : CDDL
Signature   : RSA/SHA256, Fri 20 Oct 2017 07:36:02 AM NZDT, Key ID a9d5a1c0f14ab620
Source RPM  : zfs-0.7.3-1.el7_4.src.rpm
Build Date  : Fri 20 Oct 2017 07:29:56 AM NZDT
Build Host  : fedora-24-repo
Relocations : (not relocatable)
URL         : http://zfsonlinux.org/
Summary     : Commands to control the kernel modules and libraries
Description :
This package contains the ZFS command line utilities.

@behlendorf
Contributor

For reference, the zpool_add_010_pos test case is designed to verify these safety checks. That test case is passing, but it doesn't include the zpool remove step, it operates on file vdevs, and it doesn't run your exact case. It would be interesting to extend the test case and see if the issue can be reproduced.

You can run it locally from the ZoL source tree by running:

$ ./scripts/zfs-tests.sh -t tests/zfs-tests/tests/functional/cli_root/zpool_add/zpool_add_010_pos

Here's an excerpt from the log:

20:42:45.77 SUCCESS: zpool create testpool1 mirror /testpool/vdev0 /testpool/vdev1
20:42:45.84 SUCCESS: zpool add testpool1 /testpool/vdev5 exited 1
20:42:46.24 SUCCESS: zpool add -f testpool1 /testpool/vdev5
20:42:46.42 SUCCESS: zpool destroy -f testpool1

I'm reopening this issue until this is understood.
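A rough sketch of how the missing scenario could be folded into that test case, using the suite's log_must/log_mustnot helpers (hedged: the pool and vdev variable names are illustrative, not the actual test code):

log_must zpool create $TESTPOOL1 raidz $vdev0 $vdev1
log_must zpool add $TESTPOOL1 log $vdev2
log_must zpool add $TESTPOOL1 raidz $vdev3 $vdev4
log_must zpool remove $TESTPOOL1 $vdev2
log_mustnot zpool add $TESTPOOL1 $vdev2   # must be refused without -f
log_must zpool destroy -f $TESTPOOL1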

@behlendorf behlendorf reopened this Nov 30, 2017
@rincebrain
Contributor

rincebrain commented Nov 30, 2017

I can confirm that doing the following reproduces this on 0.7.3, though I was confused to discover very strange failures when trying to run even the existing zfs-tests/.../zpool_add/ tests on my vanilla CentOS 7 VM.

zpool create testpool raidz1 file1 file2;
zpool add testpool log file3;
zpool add testpool raidz1 file4 file5;
zpool remove testpool file3;
zpool add testpool file3;

The additional vdev after the log appears necessary, as doing this without it complains as expected on 0.7.3.
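For anyone else trying this, here is a self-contained version using sparse file vdevs (the paths and sizes are arbitrary; file vdevs need absolute paths and at least 64M each):

# for f in 1 2 3 4 5; do truncate -s 256M /var/tmp/file$f; done
# zpool create testpool raidz1 /var/tmp/file1 /var/tmp/file2
# zpool add testpool log /var/tmp/file3
# zpool add testpool raidz1 /var/tmp/file4 /var/tmp/file5
# zpool remove testpool /var/tmp/file3
# zpool add testpool /var/tmp/file3   # on 0.7.3 this succeeds when it should be refused
# zpool destroy testpool              # clean up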

rincebrain added a commit to rincebrain/zfs that referenced this issue Nov 30, 2017
@rincebrain
Contributor

Who knew, mixing git master's zfs-tests with 0.7.3 doesn't work well. Shocking, I know.

I haven't opened a PR because I haven't done the linting and cleanup yet, but you can find a test in https://github.com/rincebrain/zfs/tree/6907_test

behlendorf added a commit to behlendorf/zfs that referenced this issue Nov 30, 2017
When the pool configuration contains a hole due to a previous device
removal ignore this top level vdev.  Failure to do so will result in
the current configuration being assessed to have a non-uniform
replication level and the expected warning will be disabled.

The zpool_add_010_pos test case was extended to cover this scenario.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#6907
@behlendorf
Contributor

Proposed fix in #6911 with a shamelessly stolen version of @rincebrain's test case.
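With the fix applied, re-running the tail end of the reproduction above should end differently (a hedged expectation, not captured output):

# zpool remove testpool /var/tmp/file3
# zpool add testpool /var/tmp/file3     # now expected to fail with a replication-level warning
# zpool add -f testpool /var/tmp/file3  # the explicit override remains available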

behlendorf added a commit to tonyhutter/zfs that referenced this issue Dec 1, 2017
tonyhutter pushed a commit that referenced this issue Dec 5, 2017
When the pool configuration contains a hole due to a previous device
removal ignore this top level vdev.  Failure to do so will result in
the current configuration being assessed to have a non-uniform
replication level and the expected warning will be disabled.

The zpool_add_010_pos test case was extended to cover this scenario.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6907
Closes #6911
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Dec 5, 2017
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Jan 29, 2018
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Feb 13, 2018