ashift issue on linux - cannot add mirror disk or replace single-disk pool created with "default" (ashift=9) with 4K disk ashift=12 -- new device has a different optimal sector size #4740
Comments
--OK, so after unplugging the original disk and rebooting, things went a bit better than expected. I did not have to recreate the mountpoints. Again, this is Ubuntu 14.04-64. (I also forgot to mention that I backed up the whole original pool with tar to another compressed ZFS pool before doing ANYTHING:)
time tar cpf - /bigvaiterazfs /mnt/bluraytemp25 /mnt/bigvai500 /home | pv > /zredpool2/dvcompr/bkp-bigvaiterazfs--home--bigvai500--bluray--b4-add-mirror--20160607.tar1
DbigvaiterazfsA=/dev/disk/by-id/ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 #= sdd
stop lightdm
umount /mnt/bluraytemp25; umount /mnt/bigvai500; umount /home; umount /home/*
umount: /home: device is busy
zpool export bigvaiterazfs ## also unplugged SATA power
(NOTE: this is what I tried, but you can SKIP this step)
zfs create -o mountpoint=/home -o atime=off bigvaiterazfsNB/home
(got error) cannot create 'bigvaiterazfsNB/home': dataset already exists
reboot
stop lightdm
Things went better than expected; I did not have to recreate the mountpoints with the original disk unplugged:
bigvaiterazfsNB 627546624 128 627546496 1% /bigvaiterazfsNB
SKIP # zfs create -o mountpoint=/mnt/bigvai500 -o atime=off bigvaiterazfsNB/dv/bigvai500
start lightdm
--X came up OK, and overall response is improved because the original disk was attached to a 4-port SATA II PCI card.
--Now I can re-use the original disk as a mirror; more details to follow.
You're actually supposed to use … But the warning is for your own good. The resilver will easily take 2x, maybe 10x as long as a scrub normally would, and the pool will perform a bit worse during normal operation.
--I understand what you're saying, but I don't believe this behavior is what the user expects. The original drive was 512-byte sector, the replacement drive is 512 reported/4096 actual, so using ashift=9 gets worse performance out of the new drive. --Desired behavior is for ZFS to replace the existing drive on the fly and use the more desirable ashift=12, since practically nobody is making 512-byte-sector drives anymore and you want to future-proof the pool PLUS get better performance. --Actually my resilvers don't take that long; I just do a few basic blockdev --setra tweaks and don't use the pool during the resilver. This is the new replacement WD 1TB Black with a WD 1TB Blue mirror, standard COTS hardware; they are not even SAS drives or 10K RPM:
Sorry, it doesn't matter what the user expects, because that's not how it works. You can't mirror two disks but have them use different pool layout geometries. Furthermore, ZFS is incapable of converting the geometry of an existing vdev due to the requirements of Block Pointer Rewrites.
--Which has been "pending" for YEARS. Which is why I'm filing this bug report... I was trying to add a mirror, and failing that, the device should be able to be replaced with a higher ashift since ZFS is copying the data over anyway. --Getting bitten by "ashift" in 2016 is a big PITA when you've been strongly recommending ZFS on Linux to all your friends for the last 3 years. For anyone else who may run into the issue, I hope I've documented the process well enough to get past it, but the filesystem should be capable of doing what the user expects, given how it works in other areas. --Ashift behavior should be uniform across commands. If the pool was created with ashift=12, all vdevs that are added to the pool should INHERIT this property unless overridden. Furthermore, ALL new pools should by now be created with ashift=12 as the default (with the exception of SSD-based disks), even if they are using 512-byte-sector disks, to avoid this issue in the future. --We expect the filesystem to automagically do what it takes to make our lives easier; that is one of the main draws of ZFS. Waiting for this bug report to get an Assignee, thanks.
This commit allows higher ashift values (up to 16) in 'zpool create'. The ashift value was previously limited to 13 (8K block) in b41c990 because the limited number of uberblocks we could fit in the statically sized (128K) vdev label ring buffer could prevent the ability to safely roll back a pool to recover it. Since b02fe35 the largest uberblock size we support is 8K: this allows us to store a minimum of 16 uberblocks in the vdev label, even with higher ashift values.
Additionally, change the 'ashift' pool property behaviour: if set, it will be used as the default hint value in subsequent vdev operations ('zpool add', 'attach' and 'replace'). A custom ashift value can still be specified from the command line, if desired.
Finally, fix a bug in add-o_ashift.ksh caused by a missing variable.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #2024
Closes #4205
Closes #4740
Closes #5763
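For context, a minimal sketch of how the behaviour added by this commit is intended to be used on releases that include it (the pool and device names below are placeholders, not from this issue):
zpool set ashift=12 tank                        # property becomes the default hint for later vdev operations
zpool add tank mirror disk-c disk-d             # new top-level vdev picks up ashift=12 from the pool property
zpool attach -o ashift=12 tank disk-a disk-b    # an explicit -o ashift on the command line still overrides the hint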
I'm getting exactly this error message:
Ditto, "zpool attach" (after detaching the device that was going to be replaced):
In both cases, re-running the commands with the suggested "-o ashift=12" accomplishes nothing but getting the same messages all over again. To add insult to injury, the aforementioned pool has ashift=12:
REDACTED_NEW_DEV is listed by fdisk as having 512-byte physical sectors (as it should, being a LUKS device):
If the error is due to the device being 512 bytes, shouldn't "-o ashift=12" override it? Please see more details here: http://list.zfsonlinux.org/pipermail/zfs-discuss/2017-November/029831.html This is with ZFS/SPL 0.7.2 running on top of kernel 4.9.30 on amd64. Wasn't this supposed to be resolved already? Or didn't the fix make it into 0.7.2? I really need this working; I have a critical pool here without redundancy because of it... thanks in advance for any help in fixing and/or working around this.
I can't help much, but... What does …
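Not part of the original exchange, but for anyone debugging the same symptom, these standard commands show what the kernel reports for the LUKS mapping's sector sizes and what ashift the pool's vdevs actually use (the /dev/mapper path and the pool name "tank" are placeholders for the redacted names):
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/mapper/REDACTED_NEW_DEV   # logical/physical sector size of the dm device
blockdev --getss --getpbsz /dev/mapper/REDACTED_NEW_DEV      # same values via the block layer
zpool get ashift tank                                        # pool-level property is only the hint (0 = default)
zdb -C tank | grep ashift                                    # ashift actually recorded per vdev in the config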
Ref: #1328
--System info:
Ubuntu 14.04-64 LTS
Kernel: 4.2.0-30-generic #36~14.04.1-Ubuntu SMP
RAM: 12GB
Swap: 2GB, mostly un-used due to system optimizations
--ZFS software versions:
$ dpkg -l|grep zfs
ii  dkms        2.2.0.3-1.1ubuntu5.14.04.1+zfs10~trusty  all    Dynamic Kernel Module Support Framework
ii  libzfs2     0.6.5.7-1~trusty                         amd64  Native OpenZFS filesystem library for Linux
ii  mountall    2.53-zfs1                                amd64  filesystem mounting tool
ii  ubuntu-zfs  8~trusty                                 amd64  Native ZFS filesystem metapackage for Ubuntu
ii  zfs-dkms    0.6.5.7-1~trusty                         amd64  Native OpenZFS filesystem kernel modules for Linux
ii  zfs-doc     0.6.5.7-1~trusty                         amd64  Native OpenZFS filesystem documentation and examples
ii  zfsutils    0.6.5.7-1~trusty                         amd64  Native OpenZFS management utilities for Linux
INTENT: I have a single-disk pool called "bigvaiterazfs" with mountpoints for /home and a few others, and I want to add a mirror to it with the least amount of fuss:
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
bigvaiterazfs 628019968 0 628019968 0% /bigvaiterazfs
bigvaiterazfs/bluraytemp 24641536 23980800 660736 98% /mnt/bluraytemp25
bigvaiterazfs/dv 643575424 15555456 628019968 3% /bigvaiterazfs/dv
bigvaiterazfs/dv/bigvai500 857146496 229126528 628019968 27% /mnt/bigvai500
bigvaiterazfs/dv/compr 659756544 31736576 628019968 5% /bigvaiterazfs/dv/compr
bigvaiterazfs/home 628036608 16640 628019968 1% /home
bigvaiterazfs/home/user 640859264 12839296 628019968 3% /home/user
bigvaiterazfs/home/squid 629390336 1370368 628019968 1% /home/squid
bigvaiterazfs/home/vmtmpdir 628019968 0 628019968 0% /home/vmtmpdir
TODO: attach new WD 1TB black as mirror to existing zfs 1-disk pool "bigvaiterazfs" to get redundancy and better I/O
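For reference, the straightforward form of the operation being attempted here is a single attach, using the same by-id device names that appear in the commands further down:
zpool attach bigvaiterazfs \
    ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 \
    ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7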
smartctl -a /dev/sdd # Original disk
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-30-generic](local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Black
Device Model: WDC WD1002FAEX-00Z3A0
Firmware Version: 05.01D05
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
zpool status
pool: bigvaiterazfs
state: ONLINE
scan: scrub repaired 0 in 2h48m with 0 errors on Wed May 11 13:54:52 2016
config:
errors: No known data errors
--Intended mirror disk: new WD 1TB Black
smartctl -a /dev/sdd # NEW disk
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.2.0-30-generic](local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WD1003FZEX-00MK2A0
Serial Number: WD-WCC3F7RZZCL7
LU WWN Device Id: 5 0014ee 20d54fbb4
Firmware Version: 01.01A01
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
--I put a GPT label on the new disk with gparted, and tried:
pool1=bigvaiterazfs; time zpool attach -o ashift=12 $pool1 \
ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 \
ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7
...and got error:
" cannot attach ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7 to
ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585: new device has a different
optimal sector size; use the option '-o ashift=N' to override the optimal
size "
zpool get all bigvaiterazfs|grep ashift
bigvaiterazfs ashift 0 default
FML :( bigvaiterazfs pool was not created with ashift=12!
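(Not in the original report:) the ashift pool property only records the creation-time hint, so the 0/default shown above does not reveal what the vdev is actually using; zdb reads that out of the pool configuration:
zdb -C bigvaiterazfs | grep ashift    # expected to show ashift: 9 for this 512-byte-sector vdev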
--Ok fair enough, the single-disk pool was created in ~2014 and I have learned a lot about ZFS since then. Trying to think around the problem, maybe I can replace the existing disk with the new disk on the fly, labelclear the old disk, re-GPT it, and attach the old one as a mirror with ashift=12...
--I came across this on a google search, but it doesn't work:
http://www.sotechdesign.com.au/zfs-zpool-replace-returns-error-cannot-replace-devices-have-different-sector-alignment/
time zpool replace bigvaiterazfs \
ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 \
ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7 -o ashift=12 -f
...and got:
(error) cannot replace ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 with ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7: new device has a different optimal sector size; use the option '-o ashift=N' to override the optimal size
BUG: No matter what I try, I can't accomplish what the error message is recommending, even when I move the -o to before the "replace" part of the command.
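For the record, zpool(8) places -o property=value immediately after the subcommand, so the documented form of the replace would be the one below. Note, however, that the attach attempt above already used that placement and was still refused on this 0.6.5.x release; per the maintainer's comments earlier in the thread, the override the error message has in mind is matching the existing vdev's geometry (ashift=9), not raising it to 12.
zpool replace -o ashift=12 bigvaiterazfs \
    ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585 \
    ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7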
...So, it looks like I have to create a new single-disk pool with the new drive, snapshot the existing pool, copy the existing data over with "zfs send | zfs receive", and recreate the mountpoints. On top of planning all that out, I have to do this from TTY1, with no GUI running, in a "screen" session, because /home is involved and I will need to destroy the original pool to re-use the disk.
--This is turning out to be a rather large PITA when it was supposed to be simple(r) with ZFS.
--So, I am filing a bug report / feature request and documenting what I am doing so maybe it will help others. Ideally, ZFS should be able to copy existing data over on the fly to the new ashift=12 disk.
REF for zfs send/receive: https://forums.freebsd.org/threads/37819/
NEW INTENT: create a 1-disk pool with new drive, copy data over, reuse old drive as new mirror disk
p1=bigvaiterazfs
p2=bigvaiterazfsNB
d1=ata-WDC_WD1002FAEX-00Z3A0_WD-WCATRC635585
d2=ata-WDC_WD1003FZEX-00MK2A0_WD-WCC3F7RZZCL7
zpool create -o ashift=12 -o autoexpand=on -O atime=off $p2 $d2
Filesystem 1K-blocks Used Available Use% Mounted on
bigvaiterazfsNB 942669440 0 942669440 0% /bigvaiterazfsNB
zpool get ashift $p2
NAME PROPERTY VALUE SOURCE
bigvaiterazfsNB ashift 12 local
Prepare for data migration to new pool
zfs snapshot -r $p1@now
DONE - ONLY ONCE!
zpool set listsnaps=on $p1
zfs list -r $p1
NAME USED AVAIL REFER MOUNTPOINT
bigvaiterazfs 300G 599G 31K /bigvaiterazfs
bigvaiterazfs@now 0 - 31K -
bigvaiterazfs/bluraytemp 22.9G 645M 22.9G /mnt/bluraytemp25
bigvaiterazfs/bluraytemp@now 0 - 22.9G -
bigvaiterazfs/dv 264G 599G 14.8G /bigvaiterazfs/dv
bigvaiterazfs/dv@now 0 - 14.8G -
bigvaiterazfs/dv/bigvai500 219G 599G 219G /mnt/bigvai500
bigvaiterazfs/dv/bigvai500@now 0 - 219G -
bigvaiterazfs/dv/compr 30.3G 599G 30.3G /bigvaiterazfs/dv/compr
bigvaiterazfs/dv/compr@now 0 - 30.3G -
bigvaiterazfs/home 13.6G 599G 16.3M /home
bigvaiterazfs/home@now 0 - 16.3M -
bigvaiterazfs/home/user 12.2G 599G 12.2G /home/user
bigvaiterazfs/home/user@now 563K - 12.2G -
bigvaiterazfs/home/squid 1.31G 599G 1.31G /home/squid
bigvaiterazfs/home/squid@now 0 - 1.31G -
bigvaiterazfs/home/vmtmpdir 37K 599G 37K /home/vmtmpdir
bigvaiterazfs/home/vmtmpdir@now 0 - 37K -
DONE - migrate existing pool to new pool:
time zfs send -R $p1@now | zfs recv -dF $p2
cannot share 'bigvaiterazfsNB/dv': smb add share failed
cannot share 'bigvaiterazfsNB/dv/compr': smb add share failed
real 75m43.531s
Filesystem 1K-blocks Used Available Use% Mounted on
bigvaiterazfsNB 627546624 128 627546496 1% /bigvaiterazfsNB
bigvaiterazfsNB/dv 643107712 15561216 627546496 3% /bigvaiterazfsNB/dv
bigvaiterazfsNB/dv/compr 659511296 31964800 627546496 5% /bigvaiterazfsNB/dv/compr
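An aside on the two "smb add share failed" lines above (my note, not the reporter's): zfs send -R replicates dataset properties, so a sharesmb setting carried over from the source pool is the likely trigger when Samba usershares are not configured for the new pool. If SMB sharing is not wanted there, it can simply be switched off:
zfs set sharesmb=off bigvaiterazfsNB/dv
zfs set sharesmb=off bigvaiterazfsNB/dv/compr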
--Now I need to take myself out of X windows, recreate the mountpoints (which can be gotten from zpool history, but fortunately I also document all my changes in a text file) and get the old disk out of the way. Will update this issue when I have everything in place, and hopefully nothing goes wrong.
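The remaining steps described above have not been run yet at this point in the thread; a rough sketch, using the shell variables defined earlier and assuming the old pool really is disposable once the migrated data has been verified (an outline, not a tested procedure):
zpool export $p1                               # or zpool destroy $p1 after verifying the copy
zpool labelclear -f /dev/disk/by-id/$d1        # wipe the old ZFS label (or its -part1, depending on how the label was written)
zpool attach -o ashift=12 $p2 $d2 $d1          # attach the old disk as a mirror of the new one
zpool status -v $p2                            # watch the resilver progress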