Unclear error message when attaching a vdev to a mirror with different ashift #11414

Closed
maxximino opened this issue Dec 29, 2020 · 7 comments

@maxximino
Contributor

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  Sid
Linux Kernel          5.9.0-5-amd64
Architecture          amd64
ZFS Version           2.0 (2.0.0-1~exp1)
SPL Version           2.0 (2.0.0-1~exp1)

Describe the problem you're observing

# zpool attach bootpool /dev/sdb4 /dev/sdc4
cannot attach /dev/sdc4 to /dev/sdb4: can only attach to mirrors and top-level disks

This happens with this pool:

# zpool status bootpool
  pool: bootpool
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: resilvered 4.10G in 00:04:35 with 0 errors on Wed Dec 30 00:25:51 2020
config:

        NAME        STATE     READ WRITE CKSUM
        bootpool    ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda1    ONLINE       0     0     0
            sdb4    ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors

In contrast, # zpool attach -o ashift=9 bootpool /dev/sdb4 /dev/sdc4 worked as intended.

It did not help during troubleshooting that ashift is being reported as zero:

# zpool get ashift bootpool
NAME      PROPERTY  VALUE   SOURCE
bootpool  ashift    0       default

I know I should rebuild the bootpool - I'm not very concerned about performance there. This bug report is about giving a clearer error message that points at the real cause of the problem.
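
For reference, a rough diagnostic sketch that would have made the mismatch visible up front (device names taken from the output above; the exact output of these tools may differ):

# zdb -l /dev/sdb4 | grep ashift
# lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdc
# blockdev --getss /dev/sdc4 ; blockdev --getpbsz /dev/sdc4

The first command reads the ashift recorded in the existing member's ZFS label; the other two report the logical and physical sector sizes of the candidate device. If the label says ashift: 9 while the candidate reports a 4096B physical sector, either an explicit -o ashift=9 or a rebuild of the pool at ashift=12 is needed.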

Describe how to reproduce the problem

Try to attach a device whose detected native block size differs from the configured block size of the mirror.
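
A minimal, untested reproduction sketch using loop devices (assumes a util-linux with losetup --sector-size, roughly 2.30 or newer; file, loop and pool names are arbitrary):

# truncate -s 1G /tmp/zfs512.img /tmp/zfs4k.img
# LOOP512=$(losetup -f --show --sector-size 512 /tmp/zfs512.img)
# LOOP4K=$(losetup -f --show --sector-size 4096 /tmp/zfs4k.img)
# zpool create -o ashift=9 testashift "$LOOP512"
# zpool attach testashift "$LOOP512" "$LOOP4K"

The attach is expected to fail with the misleading "can only attach to mirrors and top-level disks" message, while the same command with -o ashift=9 should succeed. Clean up with zpool destroy testashift and losetup -d on both loop devices afterwards.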

@maxximino maxximino added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels Dec 29, 2020
@lordmoocow

lordmoocow commented Jan 13, 2021

Just to add another scenario (and also, hopefully, to give this issue some Google presence): I have been hit by this as well, albeit with a different message, while using zpool replace.

cannot replace x with y: already in replacing/spare config; wait for completion or use 'zpool detach'

As you can see, it also bears no reference to what appears to be the actual issue. I only tried this solution because I had already tried everything else I could find, and it worked immediately. Perhaps unluckily, I had in fact already tried specifying the ashift option, but misguidedly with a value of 12, which was incorrect and yielded the same misleading error.


With this pool:

# zpool status
  pool: lmcr
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
	Expect reduced performance.
action: Replace affected devices with devices that support the
	configured block size, or migrate data to a properly configured
	pool.
  scan: resilvered 9K in 00:00:01 with 0 errors on Wed Jan 13 20:08:18 2021
config:

	NAME                                                  STATE     READ WRITE CKSUM
	lmcr                                                  ONLINE       0     0     0
	  raidz1-0                                            ONLINE       0     0     0
	    ata-Hitachi_HCS5C2020ALA632_ML0230FA0V2Z8D-part1  ONLINE       0     0     0
	    ata-Hitachi_HCS5C2020ALA632_ML4230FA0RYVRK-part1  ONLINE       0     0     0
	    ata-TOSHIBA_HDWQ140_Z9Q3K2X1FAYG                  ONLINE       0     0     0
	    ata-TOSHIBA_HDWE140_Y95GK2XLFBRG                  ONLINE       0     0     0  block size: 512B configured, 4096B native
	    ata-TOSHIBA_HDWE140_X9URK2SNFBRG                  ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors

Attempting to replace ata-Hitachi_HCS5C2020ALA632_ML4230FA0RYVRK-part1 with a new device:

# zpool replace lmcr /dev/disk/by-id/ata-Hitachi_HCS5C2020ALA632_ML4230FA0RYVRK-part1 /dev/disk/by-id/ata-TOSHIBA_HDWE140_76S2K3ADF58D 
cannot replace /dev/disk/by-id/ata-Hitachi_HCS5C2020ALA632_ML4230FA0RYVRK-part1 with /dev/disk/by-id/ata-TOSHIBA_HDWE140_76S2K3ADF58D: already in replacing/spare config; wait for completion or use 'zpool detach'

This is indeed misleading, and by guesswork and reading through open issues I have found that it is caused by the same issue you describe here.

I experienced the same information reported for ashift as you describe:

# zpool get ashift
NAME  PROPERTY  VALUE   SOURCE
lmcr  ashift    0       default

The following command was successful:

# zpool replace -o ashift=9 lmcr /dev/disk/by-id/ata-Hitachi_HCS5C2020ALA632_ML4230FA0RYVRK-part1 /dev/disk/by-id/ata-TOSHIBA_HDWE140_76S2K3ADF58D 
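
For what it's worth, a rough way to find out which ashift value to pass, rather than guessing 12 first: the per-vdev ashift is visible in the cached pool configuration even when the pool property reads 0 (pool name as above; the exact zdb output may vary):

# zdb -C lmcr | grep ashift

An ashift of 9 there would explain why -o ashift=9 succeeded while -o ashift=12 yielded the same misleading error.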

@user2-simon26333

@lordmoocow
To add to that, I had this issue too with the misleading error message when adding a 4096 block size device to a 512 block size raidz2 pool. The command that was ultimately successful for you also worked for me.

Wanted to call out that these issues with failing commands and misleading errors tie into #9013 and #2486. When the proper commands fail, it's very easy to try a "zpool add" of the replacement device instead, which irreversibly destroys the redundancy and structure of the pool and requires it to be rebuilt. I ran into this with a fairly recent ZFS version, well after issue #2486, so I suspect there are broader issues here too.

root@debian:~# modinfo zfs | grep version
version:        2.0.4-pve1
srcversion:     97660BA0CB525BEBFA887BB
vermagic:       5.4.114-1-pve SMP mod_unload modversions
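
As a hedged precaution against the zpool add footgun mentioned above: zpool add accepts -n for a dry run, which prints the layout the pool would end up with without changing anything, so a mistaken add of a lone replacement disk shows up as a new top-level vdev before any damage is done (pool and device names below are placeholders taken from the earlier comment):

# zpool add -n lmcr /dev/disk/by-id/ata-TOSHIBA_HDWE140_76S2K3ADF58D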

@NiklasGollenstede

Got the same error message (can only attach to mirrors and top-level disks) when I wanted to swap out the USB disk for a backup dump (zpool attach -f -s backup /dev/sdb /dev/sda).

Fix: zpool upgrade backup (I believe the pool was created with the latest ZFS version available on Ubuntu 20.04, 0.8....).

Enabled the following features on 'backup':
  redaction_bookmarks
  redacted_datasets
  bookmark_written
  log_spacemap
  livelist
  device_rebuild
  zstd_compress

After that, it worked!
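
For anyone hitting the attach -s variant of this, a quick way to check the feature state before reaching for zpool upgrade (pool name as in the command above; output format may differ between versions):

# zpool get feature@device_rebuild backup

If it reports disabled, zpool upgrade backup (or zpool set feature@device_rebuild=enabled backup) enables it; attach -s needs this feature for a sequential rebuild.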

@rincebrain
Contributor

Just ran into this in passing personally - namely, that zpool attach -s gives you this if you don't have device_rebuild enabled.

Shouldn't be that hard to check...

rincebrain added a commit to rincebrain/zfs that referenced this issue Oct 25, 2021
Currently, you get back "can only attach to mirrors and top-level disks"
unconditionally if zpool attach returns ENOTSUP, but that also happens
if, say, feature@device_rebuild=disabled and you tried attach -s.

So let's print an error for that case, lest people go down a rabbit hole
looking into what they did wrong.

Closes: openzfs#11414

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@romanshein

Same trouble with ashift=0:

zpool replace -o ashift=12 raids1 wwn-0x50026b774c00b73d-part4 /dev/sdd11
cannot replace wwn-0x50026b774c00b73d-part4 with /dev/sdd11: already in replacing/spare config; wait for completion or use 'zpool detach'

zpool get ashift
NAME    PROPERTY  VALUE  SOURCE
raids1  ashift    0      default
rpool   ashift    13     local
z6      ashift    12     local

Adding -o ashift=9 did the trick: zpool replace -o ashift=9 raids1 wwn-0x50026b774c00b73d-part4 /dev/sdd11

@rincebrain
Contributor

I would suspect that's less about the pool-level ashift "property" being 0 and more that the vdev property ashift was 9 while the disk you were putting in was 12, but yes, that should also be handled.

@inkdot7
Contributor

inkdot7 commented Apr 17, 2022

Just want to mention that I also had a run-in with the 'can only attach to mirrors and top-level disks' message, which was solved with '-o ashift=9' as in @romanshein's post above. I am running ZFS v2.0.3-9; feature@device_rebuild is already enabled.

It is kind of scary not to be able to attach devices, since that is exactly what one wants to do when something has failed. If nothing else, how about mentioning '-o ashift' in the 'can only attach ...' message, so that the user has a chance?
