ashift issue on linux - cannot add mirror disk or replace single-disk pool created with "default" (ashift=9) with 4K disk ashift=12 -- new device has a different optimal sector size #4740

Closed
kingneutron opened this issue Jun 7, 2016 · 7 comments · Fixed by #5763

Comments

@kingneutron

kingneutron commented Jun 7, 2016

Ref: #1328

--System info:
Ubuntu 14.04 LTS (64-bit)
Kernel: 4.2.0-30-generic #36~14.04.1-Ubuntu SMP
RAM: 12GB
Swap: 2GB, mostly unused due to system optimizations

--ZFS software versions:

$ dpkg -l | grep zfs
ii dkms 2.2.0.3-1.1ubuntu5.14.04.1+zfs10~trusty all Dynamic Kernel Module Support Framework
ii libzfs2 0.6.5.7-1~trusty amd64 Native OpenZFS filesystem library for Linux
ii mountall 2.53-zfs1 amd64 filesystem mounting tool
ii ubuntu-zfs 8~trusty amd64 Native ZFS filesystem metapackage for Ubuntu.
ii zfs-dkms 0.6.5.7-1~trusty amd64 Native OpenZFS filesystem kernel modules for Linux
ii zfs-doc 0.6.5.7-1~trusty amd64 Native OpenZFS filesystem documentation and examples.
ii zfsutils 0.6.5.7-1~trusty amd64 Native OpenZFS management utilities for Linux

INTENT: I have a single-disk pool called "bigvaiterazfs" with mountpoints for /home and a few others, and I want to add a mirror to it with the least amount of fuss:

$ df
Filesystem 1K-blocks Used Available Use% Mounted on
bigvaiterazfs 628019968 0 628019968 0% /bigvaiterazfs
bigvaiterazfs/bluraytemp 24641536 23980800 660736 98% /mnt/bluraytemp25
bigvaiterazfs/dv 643575424 15555456 628019968 3% /bigvaiterazfs/dv
bigvaiterazfs/dv/bigvai500 857146496 229126528 628019968 27% /mnt/bigvai500
bigvaiterazfs/dv/compr 659756544 31736576 628019968 5% /bigvaiterazfs/dv/compr
bigvaiterazfs/home 628036608 16640 628019968 1% /home
bigvaiterazfs/home/user 640859264 12839296 628019968 3% /home/user
bigvaiterazfs/home/squid 629390336 1370368 628019968 1% /home/squid
bigvaiterazfs/home/vmtmpdir 628019968 0 628019968 0% /home/vmtmpdir

TODO: attach the new WD 1TB Black as a mirror to the existing single-disk ZFS pool "bigvaiterazfs" to get redundancy and better I/O

smartctl -a /dev/sdd # Original disk

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-30-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Black
Device Model: WDC WD1002FAEX-00Z3A0
Firmware Version: 05.01D05
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical

zpool status

pool: bigvaiterazfs
state: ONLINE
scan: scrub repaired 0 in 2h48m with 0 errors on Wed May 11 13:54:52 2016
config:

    NAME                                         STATE     READ WRITE CKSUM
    bigvaiterazfs                                ONLINE       0     0     0
      ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585  ONLINE       0     0     0

errors: No known data errors

--Intended mirror disk: new WD 1TB Black

smartctl -a /dev/sdd # NEW disk

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-30-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: WDC WD1003FZEX-00MK2A0
Serial Number: WD-WCC3F7RZZCL7
LU WWN Device Id: 5 0014ee 20d54fbb4
Firmware Version: 01.01A01
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
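
(For reference, the logical/physical sector sizes can also be checked without smartctl; /dev/sdX below is a placeholder for the drive in question:)

blockdev --getss --getpbsz /dev/sdX            # logical sector size, then physical block size
cat /sys/block/sdX/queue/physical_block_size   # same info from sysfs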

--I put a GPT label on the new disk with gparted, and tried:

pool1=bigvaiterazfs; time zpool attach -o ashift=12 $pool1 \
  ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 \
  ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7

...and got this error:

cannot attach ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7 to ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585: new device has a different optimal sector size; use the option '-o ashift=N' to override the optimal size

zpool get all bigvaiterazfs|grep ashift

bigvaiterazfs ashift 0 default

FML :( bigvaiterazfs pool was not created with ashift=12!

--Ok fair enough, the single-disk pool was created in ~2014 and I have learned a lot about ZFS since then. Trying to think around the problem, maybe I can replace the existing disk with the new disk on the fly, labelclear the old disk, re-GPT it, and attach the old one as a mirror with ashift=12...

--I came across this on a google search, but it doesn't work:
http://www.sotechdesign.com.au/zfs-zpool-replace-returns-error-cannot-replace-devices-have-different-sector-alignment/

time zpool replace bigvaiterazfs \
  ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 \
  ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7 -o ashift=12 -f

...and got:
(error) cannot replace ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 with ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7: new device has a different optimal sector size; use the option '-o ashift=N' to override the optimal size

BUG: No matter what I try, I can't accomplish what the error message is recommending, even when I move the -o to before the "replace" part of the command.

...So, it looks like I have to create a new single-disk pool with the new drive, snapshot the existing pool, copy the existing data over with "zfs send | zfs receive", and recreate the mountpoints. On top of planning all that out, I have to do this from TTY1 with no GUI running, in a "screen" session, because /home is involved and I will need to destroy the original pool to re-use the disk.

--This is turning out to be a rather large PITA when it was supposed to be simple(r) with ZFS.

--So, I am filing a bug report / feature request and documenting what I am doing so maybe it will help others. Ideally, ZFS should be able to copy existing data over on the fly to the new ashift=12 disk.

REF for zfs send/receive: https://forums.freebsd.org/threads/37819/

NEW INTENT: create a 1-disk pool with new drive, copy data over, reuse old drive as new mirror disk

p1=bigvaiterazfs

p2=bigvaiterazfsNB

d1=ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585

d2=ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7

zpool create -o ashift=12 -o autoexpand=on -O atime=off $p2 $d2

Filesystem 1K-blocks Used Available Use% Mounted on
bigvaiterazfsNB 942669440 0 942669440 0% /bigvaiterazfsNB

zpool get ashift $p2

NAME PROPERTY VALUE SOURCE
bigvaiterazfsNB ashift 12 local

Prepare for data migration to new pool

zfs snapshot -r $p1@now

DONE - ONLY ONCE!

zpool set listsnaps=on $p1

zfs list -r $p1

NAME USED AVAIL REFER MOUNTPOINT
bigvaiterazfs 300G 599G 31K /bigvaiterazfs
bigvaiterazfs@now 0 - 31K -
bigvaiterazfs/bluraytemp 22.9G 645M 22.9G /mnt/bluraytemp25
bigvaiterazfs/bluraytemp@now 0 - 22.9G -
bigvaiterazfs/dv 264G 599G 14.8G /bigvaiterazfs/dv
bigvaiterazfs/dv@now 0 - 14.8G -
bigvaiterazfs/dv/bigvai500 219G 599G 219G /mnt/bigvai500
bigvaiterazfs/dv/bigvai500@now 0 - 219G -
bigvaiterazfs/dv/compr 30.3G 599G 30.3G /bigvaiterazfs/dv/compr
bigvaiterazfs/dv/compr@now 0 - 30.3G -
bigvaiterazfs/home 13.6G 599G 16.3M /home
bigvaiterazfs/home@now 0 - 16.3M -
bigvaiterazfs/home/user 12.2G 599G 12.2G /home/user
bigvaiterazfs/home/user@now 563K - 12.2G -
bigvaiterazfs/home/squid 1.31G 599G 1.31G /home/squid
bigvaiterazfs/home/squid@now 0 - 1.31G -
bigvaiterazfs/home/vmtmpdir 37K 599G 37K /home/vmtmpdir
bigvaiterazfs/home/vmtmpdir@now 0 - 37K -

DONE - migrate existing pool to new pool:

time zfs send -R $p1@now | zfs recv -dF $p2

cannot share 'bigvaiterazfsNB/dv': smb add share failed
cannot share 'bigvaiterazfsNB/dv/compr': smb add share failed
real 75m43.531s

Filesystem 1K-blocks Used Available Use% Mounted on
bigvaiterazfsNB 627546624 128 627546496 1% /bigvaiterazfsNB
bigvaiterazfsNB/dv 643107712 15561216 627546496 3% /bigvaiterazfsNB/dv
bigvaiterazfsNB/dv/compr 659511296 31964800 627546496 5% /bigvaiterazfsNB/dv/compr

--Now I need to take myself out of X windows, recreate the mountpoints (which can be gotten from zpool history, but fortunately I also document all my changes in a text file) and get the old disk out of the way. Will update this issue when I have everything in place, and hopefully nothing goes wrong.
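
--For the record, if the mountpoints do need recreating, digging the original settings out of pool history and re-applying them would presumably go something like this (fallback only; dataset names from the listings above):

zpool history bigvaiterazfs | grep -E 'mountpoint|quota|compression'   # recover the original create/set commands
# then re-apply on the new pool, e.g.:
zfs set mountpoint=/home bigvaiterazfsNB/home
zfs set mountpoint=/mnt/bigvai500 bigvaiterazfsNB/dv/bigvai500
zfs set mountpoint=/mnt/bluraytemp25 bigvaiterazfsNB/bluraytemp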

@kingneutron
Author

kingneutron commented Jun 7, 2016

--OK, so after unplugging the original disk and rebooting, things went a bit better than expected. Did not have to recreate the mountpoints. Again, this is Ubuntu 14.04-64

( I also forgot to mention that I backed up the whole original pool with tar to another compressed zfs pool before doing ANYTHING: )

time tar cpf - /bigvaiterazfs /mnt/bluraytemp25 /mnt/bigvai500 /home \
  | pv > /zredpool2/dvcompr/bkp-bigvaiterazfs--home--bigvai500--bluray--b4-add-mirror--20160607.tar1

325GB 2:21:32 [39.2MB/s]
real 141m32.419s

DbigvaiterazfsA=/dev/disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 #= sdd
DbigvaiterazfsB=/dev/disk/by-id/ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7 #= sdh

stop lightdm

umount /mnt/bluraytemp25; umount /mnt/bigvai500; umount /home; umount /home/*

umount: /home: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))

zpool export bigvaiterazfs ## also unplugged SATA power

( NOTE: this is what I tried, but you can SKIP this step )

zfs create -o mountpoint=/home -o atime=off bigvaiterazfsNB/home

(got error) cannot create 'bigvaiterazfsNB/home': dataset already exists

reboot

stop lightdm

Things went better than expected; I did not have to recreate the mountpoints with the original disk unplugged:

Filesystem 1K-blocks Used Available Use% Mounted on
bigvaiterazfsNB 627546624 128 627546496 1% /bigvaiterazfsNB
bigvaiterazfsNB/bluraytemp 24641536 23988224 653312 98% /mnt/bluraytemp25
bigvaiterazfsNB/dv 643107712 15561216 627546496 3% /bigvaiterazfsNB/dv
bigvaiterazfsNB/dv/bigvai500 856759040 229212544 627546496 27% /mnt/bigvai500
bigvaiterazfsNB/dv/compr 659511296 31964800 627546496 5% /bigvaiterazfsNB/dv/compr
bigvaiterazfsNB/home 627563392 16896 627546496 1% /home
bigvaiterazfsNB/home/user 640518912 12972416 627546496 3% /home/user
bigvaiterazfsNB/home/squid 628951040 1404544 627546496 1% /home/squid
bigvaiterazfsNB/home/vmtmpdir 627546624 128 627546496 1% /home/vmtmpdir

SKIP # zfs create -o mountpoint=/mnt/bigvai500 -o atime=off bigvaiterazfsNB/dv/bigvai500
SKIP # zfs create -o compression=off -o atime=off -o mountpoint=/mnt/bluraytemp25 -o quota=23.5G bigvaiterazfsNB/bluraytemp

start lightdm

--X came up OK, and overall response is improved because the original disk was attached to a 4-port SATA II PCI card.

--Now I can re-use the original disk as a mirror, more details to follow.
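
--Roughly what that will look like (untested until I actually do it; old-disk ID as above, ashift=12 to match the new pool):

d1=/dev/disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585   # original disk, plugged back in
zpool labelclear -f $d1                                        # wipe the old ZFS labels so it can be reused
# re-GPT it (gparted again), then attach it as the mirror half:
zpool attach -o ashift=12 bigvaiterazfsNB \
  ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7 $d1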

@DeHackEd
Contributor

DeHackEd commented Jun 8, 2016

You're actually supposed to use zpool replace -o ashift=9 ... to override detection on the new drive. Also note that while ashift looks like a pool property, it really isn't and the -o ashift=X notation is not really related to the equivalent operation of zpool set. That could be documented better.
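
For example, the replace that failed earlier would be (note the -o goes right after the subcommand, before the pool name):

zpool replace -o ashift=9 bigvaiterazfs \
  ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 \
  ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7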

But the warning is for your own good. The resilver will easily take 2x, maybe 10x as long as a scrub normally would and the pool will perform a bit worse during normal operation.

Use code tags to help make your output more readable.

@kingneutron
Author

--I understand what you're saying, but I don't believe this behavior is what the user expects. The original drive has 512-byte sectors; the replacement drive reports 512-byte logical / 4096-byte physical, so using ashift=9 gets worse performance out of the new drive.

--Desired behavior is for ZFS to replace the existing drive on the fly and use the more desirable ashift=12, since practically nobody is making 512 sector drives anymore and you want to future-proof the pool PLUS get better performance.

--Actually my resilvers don't take that long, I just do a few basic blockdev --setra tweaks and don't use the pool during resilver. This is the new replacement WD 1TB Black with a WD 1TB Blue mirror, standard COTS hardware; they are not even SAS drives or 10K RPM:
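
(For reference, the readahead tweaks are along these lines; the values are just what I happen to use, not gospel, and sdX/sdY stand in for the mirror members:)

blockdev --setra 8192 /dev/sdX   # bump readahead on each mirror member before the resilver
blockdev --setra 8192 /dev/sdY
blockdev --getra /dev/sdX        # verify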

zpool status

pool: bigvaiterazfsNB
state: ONLINE
scan: resilvered 301G in 0h36m with 0 errors on Wed Jun 8 11:52:10 2016
config:

    NAME                                            STATE     READ WRITE CKSUM
    bigvaiterazfsNB                                 ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7  ONLINE       0     0     0
        ata-WDC_WD10EZEX-00RKKA0_WD-WCC1S0347255    ONLINE       0     0     0

@DeHackEd
Contributor

DeHackEd commented Jun 9, 2016

Sorry, it doesn't matter what the user expects because that's not how it works. You can't mirror two disks but have them use different pool layout geometries. Furthermore ZFS is incapable of converting the geometry of an existing vdev due to the requirements of Block Pointer Rewrites.

@kingneutron
Author

Furthermore ZFS is incapable of converting the geometry of an existing vdev due to the requirements of Block Pointer Rewrites.

--Which has been "pending" for YEARS. Which is why I'm filing this bug report... I was trying to add a mirror, and failing that, it should at least be possible to replace the device with a higher ashift, since ZFS is copying the data over anyway.

--Getting bitten by "ashift" in 2016 is a big PITA when you've been strongly recommending ZFS on Linux to all your friends for the last 3 years. For anyone else who may run into this issue, I hope I've documented the process well enough to get past it, but the filesystem should be capable of doing what the user expects, given how it works in other areas.

--Ashift behavior should be uniform across commands. If the pool was created with ashift=12, all vdevs that are added to the pool should INHERIT this property unless overridden. Furthermore, ALL new pools should by now be created with ashift=12 as the default (with the exception of SSD-based disks) even if they are using 512 sector disks, to avoid this issue in the future.
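
--In other words, something like this is what I would expect to just work (hypothetical pool/disk names; this is the desired behavior, not what 0.6.5 actually does):

zpool create -o ashift=12 tank mirror diskA diskB
zpool add tank mirror diskC diskD   # new vdev should inherit ashift=12 from the pool
zpool attach tank diskA diskE       # ditto, unless overridden with -o ashift=N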

--We expect the filesystem to automagically do what it takes to make our lives easier; that is one of the main draws of ZFS. Waiting for this bug report to get an Assignee, thanks.

behlendorf pushed a commit that referenced this issue May 3, 2017
This commit allows higher ashift values (up to 16) in 'zpool create'.

The ashift value was previously limited to 13 (8K block) in b41c990
because the limited number of uberblocks we could fit in the
statically sized (128K) vdev label ring buffer could prevent the
ability to safely roll back a pool to recover it.

Since b02fe35 the largest uberblock size we support is 8K: this
allows us to store a minimum of 16 uberblocks in the vdev
label, even with higher ashift values.

Additionally, change the 'ashift' pool property behaviour: if set, it will
be used as the default hint value in subsequent vdev operations
('zpool add', 'attach' and 'replace'). A custom ashift value can still
be specified from the command line, if desired.

Finally, fix a bug in add-o_ashift.ksh caused by a missing variable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #2024 
Closes #4205 
Closes #4740 
Closes #5763
@DurvalMenezes

I'm getting exactly this error message:
Trying "zpool replace":

cannot replace REDACTED_OLD_DEV with REDACTED_NEW_DEV: new device has a
different optimal sector size; use the option '-o ashift=N' to override the
optimal size

Ditto, "zpool attach" (after detaching the device that was going to be replaced):

cannot attach REDACTED_NEW_DEV to REDACTED_BASE_DEV: new device has a
different optimal sector size; use the option '-o ashift=N' to override the
optimal size

In both cases, re-running the commands with the suggested "-o ashift=12" accomplishes nothing but getting the same messages all over again.

To add insult to injury, the aforementioned pool has ashift=12:

zpool get ashift REDACTED_POOL_NAME
NAME                                   PROPERTY  VALUE   SOURCE
REDACTED_POOL_NAME  ashift    12      local

REDACTED_NEW_DEV is listed by fdisk as having 512-byte physical sectors (as it should, being a LUKS device):

fdisk -l /dev/mapper/ZFS_ARCHIVE_003B2_TRY3 
[...]
Sector size (logical/physical): 512 bytes / 512 bytes

If the error is due to the device being 512 bytes, shouldn't "-o ashift=12" override it?

Please see more details here: http://list.zfsonlinux.org/pipermail/zfs-discuss/2017-November/029831.html

This is with ZFS/SPL 0.7.2 running on top of kernel 4.9.30 on amd64.

Wasn't this supposed to be resolved already? Or didn't the fix make it into 0.7.2?

I really need this working, I have a critical pool here without redundancy because of it... thanks in advance for any help in fixing and/or working around this.

@rlaager
Member

rlaager commented Nov 9, 2017

I can't help much, but... What does sudo zdb POOL show for ashift values?
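
Something along these lines (POOL = your pool name) should pick the per-vdev ashift values out of the config dump:

sudo zdb POOL | grep ashift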
