
Space usage difference since upgrade #2497

Closed
ldonzis opened this issue Jul 15, 2014 · 17 comments
Labels
Type: Documentation (indicates a requested change to the documentation)
Milestone
0.7.0
Comments

@ldonzis

ldonzis commented Jul 15, 2014

We have two systems that are kept synchronized through zfs send / zfs receive. Normally, the disk space usage on the two is identical, or at least extremely close.

Since installing 0.6.3 on the secondary server (it was previously running 0.6.2), destroying the pool, and doing a complete re-sync with "zfs send -R", the space used is now quite different: roughly 10% higher (about 400GB out of 4TB).

I did notice that we had set "xattr=sa" on the source filesystems and that the property did not get replicated through the send/recv. But even after starting over with xattr=sa set on the destination filesystems, it made no difference.

Is there any reason that space usage on filesystems using 0.6.3 would be higher than 0.6.2?

As an aside, when blocks are sent and received, are they transferred in the native compressed format, or are they uncompressed, transferred, and then re-compressed on the destination? Just curious, thanks.

@DeHackEd
Contributor

zfs send/receive does not replicate any compression options directly. It is necessary to specify the desired compression on the receiving side, since the data is re-compressed as it is written to disk there. This can be done implicitly by property replication with zfs send -p (or -R).
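For example, a minimal sketch (pool and dataset names are hypothetical):

    # Without -p/-R the stream carries no properties, so set compression on
    # the receiving side first:
    zfs set compression=lz4 tank/backup
    zfs send tank/data@snap | zfs receive tank/backup/data

    # With -R (which implies -p) the source's locally-set properties,
    # compression included, travel in the stream:
    zfs send -R tank/data@snap | zfs receive -F tank/backup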

To your original question, there are a few possibilities. Compression is certainly one; dedup as well, if applicable. zfs list and zpool list also report space differently - most notably, zpool list counts parity as allocated space, while zfs list gives a more traditional usable-space view.
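A quick way to see the two views side by side (pool name illustrative):

    zpool list tank   # SIZE/ALLOC/FREE are raw space, raidz parity included
    zfs list tank     # USED/AVAIL are usable space, after parity overhead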

If you've changed compression algorithms over time, the older pool may have held data compressed with a mix of algorithms, whereas now everything has been re-compressed with a single algorithm, resulting in different space usage.

These are just a few possibilities.

@ldonzis
Author

ldonzis commented Jul 15, 2014

Thanks for the clarity. The compression algorithm is the same on both (lz4), and we are not using deduplication. Compression was set the same way when each pool was created (-O compress=lz4), but I think it's forced to match anyway because of zfs send -R.

Interestingly, -R does not appear to propagate "xattr=sa", if that's of any relevance to space usage, but this time we set xattr=sa on the destination before starting the replication, to make sure all attributes are identical.

The outputs of "zfs get all" and "zpool get all" are fundamentally the same, although 0.6.3 apparently adds a few new properties that don't show up in 0.6.2: logicalused, logicalreferenced, acltype, context, fscontext, defcontext, rootcontext, relatime.

I would understand minor differences, but in this case, it's amounting to hundreds of gigabytes of additional storage used (out of a few terabytes).

@FransUrbo
Contributor

'-R' "includes '-p'...

@ldonzis
Author

ldonzis commented Jul 15, 2014

Sorry, I'm not sure what you mean by that; what's your point?

@FransUrbo
Contributor

zfs send/receive does not replicate any compression options directly.

Using '-R' (which includes '-p', i.e. include properties in the stream), it does in this case.

Why it differs by ~10%, I don't know. Copies?

@behlendorf behlendorf added this to the 0.7.0 milestone Jul 17, 2014
@ldonzis
Author

ldonzis commented Jul 21, 2014

There are no copies; it's an exact replica. The same script was used under 0.6.2, and the space used was essentially identical. The command being used is essentially like this:

zfs send -v -R -I ds/ds1@replication-snapshot ds/ds1@replication-snapshot | rsh 10.0.0.x zfs receive -F ds/ds1

Theoretically, all properties of the source filesystem should get replicated because of -R. As I mentioned previously, the xattr property is not getting propagated, but that's a separate matter and doesn't account for the difference in space used.
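For what it's worth, one way to check what actually arrived on the destination (a sketch, using the dataset from the command above):

    # The SOURCE column shows where each value came from: 'received' means it
    # came in the stream, 'local' that it was set by hand on this side.
    zfs get -o property,value,source compression,xattr ds/ds1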

@ldonzis
Author

ldonzis commented Nov 8, 2014

FYI, this continues to be a problem. The receiving filesystem is down to less than 25GB free, while the sending filesystem has over 630GB of free space. Or, put another way, the origin filesystem is using 4.49T while the destination is using 4.94T, still a difference of around 10%. I can't think of a reason other than compression somehow being less effective in the newer code, which seems almost impossible, but all I can say is that before upgrading to 0.6.3, the two systems stayed nearly identical in space used.

I was thinking of destroying the destination and re-sync'ing everything just to be sure, since we're going to run out of space in a few days if we don't do that. But before I try that, is there anything we could look at that would be useful to help track this down?

@behlendorf
Contributor

@ldonzis Is the pool geometry the same for both pools? For example, are they both built from mirrors, or are they both raidz? Depending on the exact geometry, space is consumed slightly differently.

@ldonzis
Author

ldonzis commented Nov 8, 2014

Yeah, they're identical (even same host adapters & drives, not that it matters). Each system has eight drives, configured as two raidz1 vdevs of four drives each. (This was due to my belief from reading various tips that this would perform better than one vdev using raidz2.)

They've been that way for quite a long time, but I first noticed the change after upgrading the backup/receiving side to 0.6.3. The primary/sending side is still running 0.6.2.

Here's the primary (sending) side:

root@ss1:# zpool list ds
NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
ds    6.94T  5.92T  1.02T  85%  1.00x  ONLINE  -
root@ss1:# zpool status ds
  pool: ds
 state: ONLINE
  scan: scrub repaired 0 in 3h46m with 0 errors on Mon Nov 3 04:46:30 2014
config:

    NAME                                               STATE     READ WRITE CKSUM
    ds                                                 ONLINE       0     0     0
      raidz1-0                                         ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800099  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAF547607  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800105  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800102  ONLINE       0     0     0
      raidz1-1                                         ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAF547609  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800090  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800091  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800092  ONLINE       0     0     0

and here's the backup/receiving system:

root@ss2:# zpool list ds
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
ds    6.94T  6.68T  267G  96%  1.00x  ONLINE  -
root@ss2:# zpool status ds
  pool: ds
 state: ONLINE
  scan: scrub repaired 0 in 5h9m with 0 errors on Sun Nov 2 05:09:46 2014
config:

    NAME                                               STATE     READ WRITE CKSUM
    ds                                                 ONLINE       0     0     0
      raidz1-0                                         ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800074  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800075  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800080  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800085  ONLINE       0     0     0
      raidz1-1                                         ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800073  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800077  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800082  ONLINE       0     0     0
        ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800083  ONLINE       0     0     0

@maci0
Contributor

maci0 commented Nov 8, 2014

Is ashift the same on both?
IIRC there were some changes to the default values introduced in 0.6.3.

@ldonzis
Author

ldonzis commented Nov 8, 2014

Good question, but they are the same (both zero). All of the pool properties are the same except guid and the space used.
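For reference, that was the pool-level property, which (as I understand it) reads 0 when it was never set explicitly:

    zpool get ashift ds   # 0 here means auto-detected, not the per-vdev value in use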

@behlendorf
Contributor

Good thought. Use 'zdb -l <device>' to get the ashift for each vdev. I'm also going to assume you don't have any snapshots lying around.
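For example, against one of the member disks (path illustrative; for whole-disk vdevs the label lives on the first partition):

    zdb -l /dev/disk/by-id/ata-SAMSUNG_MZ7WD960HAGP-00003_S186NYAD800099-part1 | grep ashift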

@ldonzis
Author

ldonzis commented Nov 9, 2014

Aha! It's "9" on the sending side and "13" on the receiving side. Oh, and there are snapshots... a little under 800 of them.

I never set the ashift, and the drives have 512-byte sectors. How did it get set to 13? Or, I guess the better question is: how do I set it to 9?

Sure seems like you're on to something!

@maci0
Contributor

maci0 commented Nov 9, 2014

As I said, IIRC there was a change in the default behaviour in 0.6.3: see #967.

@ldonzis
Author

ldonzis commented Nov 9, 2014

Ah, got it. Actually, it looks like there is a table of drives in the source code that, in this case, causes ashift to default to 13. I presume this is believed to provide better performance? I ran a quick bonnie++ with ashift=9 and ashift=13, and the difference appears to be within the margin of error of the test, perhaps because with compression it's largely CPU-bound anyway. In any event, for the moment we can't tolerate the ~10% space loss, so I'll rebuild the pool with ashift=9, along the lines of the sketch below.
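Roughly this (device names are placeholders):

    # ashift is fixed per vdev at creation time, so the only way to change it
    # is to recreate the pool:
    zpool create -o ashift=9 ds \
        raidz1 <disk1> <disk2> <disk3> <disk4> \
        raidz1 <disk5> <disk6> <disk7> <disk8>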

I'm sorry to waste everyone's time with this. It never occurred to me to look at ashift with zdb instead of zpool.

@ldonzis
Author

ldonzis commented Nov 9, 2014

Just to add a little closure/validation, I re-created the pool, forcing ashift=9, and re-sync'ed, and now the space usage is the same on both sides.

Thanks very much for your help, and I apologize for the red herring.

@behlendorf
Contributor

@ldonzis I'm glad you got to the root cause here.
