Draid and available space #13727
That is surprising, the space usage should be close to what's reported for a similar raidz config. Using the 2.1.5 release I wasn't able to reproduce this. Can you double check that all of the drives are the expected size?
With dRAID you're allowed to independently select the parity level, number of data drives, spares, and total children. You don't need to worry about the total number of groups; ZFS will calculate the optimal number to best utilize the capacity. If you're interested in the details, there's a nice comment in the source which describes the layout.
I checked
And on 16 TB files:
And on 18 TB files:
And information about the disks:
@behlendorf I can hold this hardware for a day if you have any thoughts on what to test; otherwise I'll unfortunately have to use raidz for now to start production usage.
@gleb-shchavlev I looked into this a bit, and the decrease in reported usable capacity is caused by the following:
With dRAID, variable stripe widths are not supported, which differs from RAIDZ. This means every RAID stripe will be padded out to the full stripe width if needed. For a 16d+2p configuration with 4k sectors that makes the minimum allocation size 16*4k=64K. If the pool is primarily storing large files (>1M) this overhead is minimal; however, if you'll be storing small files (<64k) it will be significant. This is the fundamental tradeoff which needed to be made in order to support sequential resilvering for dRAID, and why this feature can't be supported with RAIDZ. Which vdev configuration is right for you will depend on your expected workload. If you'd like to use dRAID for the faster rebuild times, then using either a narrower stripe width (say 8d+2p) or a smaller 512-byte sector size (ashift=9) will let you reduce the minimum allocation size, and with it the impact on reported available capacity.
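As a rough illustration of that arithmetic (a sketch; the values shown are just the 16d, ashift=12 example from above):

```sh
# Minimum allocation size = data width * sector size.
# 16 data drives with ashift=12 (4K sectors), as in the example above.
D=16
SECTOR=4096
echo "minimum allocation: $(( D * SECTOR / 1024 ))K"   # prints 64K
```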
Many thanks for the help! We will be storing large files (>1M); this is backup S3 storage, with MinIO on top of ZFS. I tried to create all the possible pools and evaluate the free space. Command:
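The command itself wasn't preserved in this copy of the thread; a sketch of the sort of loop used (pool and device names are hypothetical) might be:

```sh
# Try each candidate layout in turn and record the reported free space.
for layout in draid1:32d:36c:2s draid2:32d:36c:2s draid2:33d:36c:1s; do
    zpool create -f test "$layout" /dev/sd{a..z} /dev/sda{a..j} &&
        zfs list -o name,avail test &&
        zpool destroy test
done
```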
Results
draid1 with 34d cannot be created:
draid2 with 33d cannot be created:
Summary
Maximum free space:
Is it correct to create a draid with so many (32d) disks?
Which option is safer: draid2 with one spare or draid1 with two spares?
draid2:32d:36c:2s and draid2:33d:36c:1s give practically equal space; why?
How many disks can fail while keeping the pool running?
I want to thank you again for your help in understanding how to calculate free space for draid.
Generally I'd recommend against going larger than about 16 data disks. As you can see from your free space table going wider doesn't directly equate to more capacity, but it will absolutely reduce performance and slow down distributed rebuilds. I find a draid2:8d config strikes a pretty reasonable balance between usable capacity, performance, and rebuild speed.
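For instance (a hypothetical command; the pool name and device names are placeholders), a 36-disk pool built around that balance point could look like:

```sh
# 8 data + 2 parity per redundancy group, 36 children, 2 distributed spares.
zpool create -o ashift=12 tank draid2:8d:36c:2s /dev/sd{a..z} /dev/sda{a..j}
```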
With dRAID you can lose up to the number of parity devices all at the same time. It does not depend on the "d" option.
Definitely draid2 with a single spare. With this configuration you can lose any two devices, then after the pool has rebuilt to the distributed spare the pool will be resilient to another failure. Meaning you could lose up to 3 devices depending on exactly when they fail.
When comparing the reported available capacity, one thing worth keeping in mind is that it's an estimate based on some reasonable assumptions (an expected average block size). Specifically for 32d vs 33d, it's because ZFS assumes an average 128K block size when making the estimate: a 128K block is 32 4K data sectors, which exactly fills a 32-wide stripe, while on a 33d layout the same block leaves one data slot as padding, so the extra data drive buys no additional usable space.
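A quick check of that arithmetic (a sketch assuming 4K sectors and the 128K average block described below):

```sh
# Why draid2:32d:36c:2s and draid2:33d:36c:1s report nearly the same space:
# a 128K block is 32 data sectors, and both layouts reduce to 32
# data-drive equivalents once padding is accounted for.
awk 'BEGIN {
  p = 2; c = 36
  for (i = 1; i <= 2; i++) {
    d = (i == 1) ? 32 : 33
    s = (i == 1) ? 2  : 1
    rows = int((32 + d - 1) / d)        # rows needed for 32 data sectors
    eff  = 32 / (rows * d)              # data sectors / allocated data slots
    printf "%dd/%ds: %.1f data-drive equivalents\n",
           d, s, (c - s) * d / (d + p) * eff
  }
}'
```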
I'm sorry to open this can of worms again, but I am in much the same pickle as described here. Only when I go all the way down to something like raidz1 with 5 disks do I get 65.3TB, which is 16.3TB per disk and only a loss of 300GB. I would like to know if this major size difference is due to configuration, or if it is related to ZFS 2.1.4 (you mentioned at first that you were unable to replicate this on 2.1.5). If you have any recommendations for setting this up better, they are welcome. Our goal was to use draid with distributed spares because of the faster resilvering on these large disks... but if it comes at the price of an 11% capacity loss it may not seem so great after all... bear in mind that this loss is after we have already set aside 6 disks' worth of capacity...
It's a good question, and I can understand the concern. What probably needs to be better explained is that the reported available capacity is an estimate. For a dRAID configuration this estimate may be lower than you'd expect because, unlike raidz, dRAID must always write a full stripe using every data drive. This constraint is what makes a fully sequential rebuild to a distributed spare possible, but it does also mean some capacity is lost to padding.

Let's look at your draid2:9d:24c:2s config and where that 262T estimate comes from for a pool with 16.4TB drives and 4k sectors. We need to make some assumptions for the estimate, and ZFS assumes an average block size of 128K. To store these 128K uncompressed blocks, each one is effectively broken into 32 4K-sector pieces and then spread over all the disks, as in the layout reconstructed below, where A-K are drives, 1-32 are the data sectors, P1/P2 are parity sectors, and the 4 XX sectors are padding which was added. That means this block was 32 / 36 = ~89% space efficient. Extrapolating that out to the whole pool, ignoring spare and parity drives, works out to 16.4TB * (24 - 6 drives) * 0.89 = 262.4TB, which is what's reported as available.
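The layout diagram didn't survive in this copy of the thread, but the description above pins it down; here it is reconstructed, followed by the same capacity arithmetic in runnable form (a sketch using only the values stated above):

```sh
# One 128K block on draid2:9d, reconstructed from the description above.
# Rows hold 2 parity + 9 data slots; XX marks padding sectors:
#
#   A  B  C  D  E  F  G  H  I  J  K
#   P1 P2  1  2  3  4  5  6  7  8  9
#   P1 P2 10 11 12 13 14 15 16 17 18
#   P1 P2 19 20 21 22 23 24 25 26 27
#   P1 P2 28 29 30 31 32 XX XX XX XX
#
# Reproduce the ~262T estimate for draid2:9d:24c:2s with 16.4 TB drives:
awk 'BEGIN {
  d = 9; p = 2; c = 24; s = 2; disk = 16.4
  sectors = 128 * 1024 / 4096           # 32 data sectors per 128K block
  rows = int((sectors + d - 1) / d)     # ceil(32/9) = 4 rows
  eff  = sectors / (rows * d)           # 32/36 ~ 0.889 space efficiency
  printf "estimate ~= %.1f TB\n", disk * (c - s) * d / (d + p) * eff
}'
# prints "estimate ~= 262.4 TB"
```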
But if we run the same calculation and instead assume an average block size of 1M, the padding overhead mostly vanishes: a 1M block is 256 4K sectors, which packs into 29 rows of 9 with only 5 padding sectors, roughly 98% space efficient. It's also worth mentioning that it is partly because of this required padding that the man page recommends pairing a dRAID pool with a mirrored special allocation class vdev, so metadata and small blocks can be stored there instead.
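A sketch of what that combination might look like in practice (pool, dataset, and device names are hypothetical):

```sh
# Add a mirrored special allocation class vdev for metadata, then route
# small blocks to it and store large files as 1M records.
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1
zfs set special_small_blocks=64K tank
zfs set recordsize=1M tank
```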
Thank you very much for the quick response, and the very good explanation, which makes sense.
One follow-up question, and I am sorry if this is not directly related.
Support for 1M record sizes was added way back in 2015, but the default was left at 128K. As for send/receive, it won't increase the original block size, so in this case you'll want to use something like rsync. It sounds like you already have some scripts to determine how large to size the special devices. Another nice way to do this is with zdb, which can report a block size histogram for an existing pool.
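For example (dataset paths are hypothetical), rewriting existing data at the larger block size looks like:

```sh
# zfs send/recv preserves the original block sizes, so copy at the
# file level after raising recordsize on the destination dataset.
zfs set recordsize=1M tank/backups-new
rsync -aHAX /tank/backups-old/ /tank/backups-new/
```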
Yes I can confirm that zfs send was not able to utilize the larger record size on the destination pool... |
Closing. The available space is being reported correctly. That said, I completely agree it's not intuitive why the value may be lower than expected, and we should probably consider assuming a 1M block size instead for dRAID pools.
System information
Describe the problem you're observing
We're going to use the draid feature on a server with 36 disks.
First we created baseline pools with raidz1 and raidz2:
Two raidz1 pools with 17 disks in each:
And we have 523T disk space.
Two raidz2 pools with 17 disks in each:
And we have 457T disk space.
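The create commands were lost from this copy of the thread; a hypothetical reconstruction (reading "two raidz1 pools" as one pool with two 17-disk top-level vdevs, with placeholder device names):

```sh
# 2 x raidz1 baseline: two 17-disk raidz1 vdevs (34 of the 36 disks).
zpool create tank raidz1 /dev/sd{a..q} raidz1 /dev/sd{r..z} /dev/sda{a..h}
# 2 x raidz2 baseline, after destroying the pool above:
zpool create tank raidz2 /dev/sd{a..q} raidz2 /dev/sd{r..z} /dev/sda{a..h}
```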
We created some test draid pools with various parameters to understand how much space we could get.
First draid pool
2 x raidz1 analogue.
draid1:16d:36c:2s
(16 disks + 1 per parity) * 2 group + 2 spare = 36 disks
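A hypothetical reconstruction of the create command (device names are placeholders):

```sh
zpool create tank draid1:16d:36c:2s /dev/sd{a..z} /dev/sda{a..j}
```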
Looks as expected:
Same available space as with 2 x raidz1.
Second draid pool
draid2:15d:36c:2s
(15 disks + 2 per parity) * 2 group + 2 spare = 36 disks
Why only 349T space?
We expected the same space as with 2 x raidz2: 457T.
Third draid pool
draid2:16d:36c:2s
(16 disks + 2 per parity) * 2 group + 2 spare = 38 disks (!!!)
495T is OK, but why can a draid with such strange parameters be created at all?
And why do we have 495T of space?
We expected the same space as with 2 x raidz2: 457T.
More space is better, but where does the extra space come from?
What does it mean?
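Applying the padding model from the maintainer's replies above, a rough estimator (a sketch assuming 4K sectors, a 128K average block, and ~16.34 TB usable per drive, inferred from the 523T raidz1 baseline) reproduces all three draid numbers:

```sh
awk 'BEGIN {
  c = 36; disk = 16.34                  # children; usable TB per drive
  n = split("1:16:2 2:15:2 2:16:2", cfg, " ")
  for (i = 1; i <= n; i++) {
    split(cfg[i], f, ":"); p = f[1]; d = f[2]; s = f[3]
    rows = int((32 + d - 1) / d)        # rows per 128K block (32 sectors)
    eff  = 32 / (rows * d)              # data sectors / allocated slots
    printf "draid%d:%dd:%dc:%ds ~= %.0f T\n",
           p, d, c, s, disk * (c - s) * d / (d + p) * eff
  }
}'
# prints ~523T, ~349T, and ~494T: close to the observed 523T, 349T, 495T
```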
Describe how to reproduce the problem
Just create pools as described above.