Compressed receive with different ashift can result in incorrect PSIZE on disk #8462

pcd1193182 · 2019-02-27T21:22:10Z

System information

N/A, problem found from code inspection, verified on Illumos

Describe the problem you're observing

When performing a compressed send, the PSIZE of the block is used as the compressed_size of the record. When the receiving system receives these records, they do a raw write, taking the compressed data and writing it directly to disk. When the sending pool has disks with ashift=9 and the receiving system has disks with ashift=12, this result in a bug.

For normal writes, zio_write_compress rounds the PSIZE of the blkptr up to the ashift of the smallest ashift device. For raw writes, this step is bypassed. Instead, the zio's io_size is used directly as the psize. The zio's io_size comes from the arc buf header's size via dbuf_sync_leaf -> dbuf_write -> arc_write -> zio_write. The arc buf header's size is set to be the compressed_size of the record in receive_read_record . Since this size is from a system with a different set of ashifts, this can result in a situation where a block is stored using 4kb on disk, but the psize is only 512 bytes.

So far the only negative effect of this issue I've found is that the dataset's bookkeeping is messed up, resulting in incorrect lrefer and compression ratio statistics. There may be other issues lurking, however.

Describe how to reproduce the problem

Create a pool on a system with ashift=9 disks, and a pool on a system with ashift=12 disks (or set spa_min_ashift when creating the second pool).
Create a filesystem with compression enabled on the source pool.
Populate that filesystem with compressible data.
Snapshot that filesystem
Send that filesystem with -c, and receive it on the target pool.
Verify that the asize is incorrect using zdb -vvvvv to examine the block pointers.

Include any warning/errors/backtraces from the system logs

N/A

The text was updated successfully, but these errors were encountered:

tcaputi · 2019-02-27T21:29:15Z

I don't know if I'l have time to work on this in the near future. Im going to unassign it from me for the moment.

ahrens · 2019-02-27T21:37:07Z

No worries, I accidentally assigned it to you, thinking that I was requesting a code review.

stale · 2020-08-24T21:53:39Z

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale · 2021-08-25T03:30:00Z

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

pcd1193182 · 2021-08-26T21:06:39Z

It looks like this issue may have been partially modified by the lightweight write changes. The code that directly manipulated the arc buf hdr in dmu_recv was removed, except in the case of spill blocks. There is no longer an arc_buf_hdr for the lightweight writes; instead, they stay as lightweight write structures until dbuf_sync_lightweight, when they get zio_writed. Because that doesn't correct the asize issue, it appears that the problem still exists. However, we can resolve it in one of two locations. Either in dbuf_sync_lightweight, by modifying the psize passed into zio_write, or in receive_read_record by modifying the DRR_WRITE case to bump up the size of the allocated abd to a multiple of asize, zero it before use if it had to be increased, and then do the receive_read_payload_and_next_header of the original psize.

For the DRR_SPILL case, only the second approach will work, so that is probably the approach to take.

ahrens · 2021-08-26T21:27:41Z

I think those would work with compressed streams, and then you could add assertions in the zio pipeline that the buffer/psize provided is a multiple of 1<<ashift. It might be cleaner if the zio pipeline accepted (raw) writes that are not sector-aligned and rounded them up itself, as it does for normal writes.

However, I'm not sure that either of these approaches will work in practice with encrypted streams, because the PSIZE (of L0-blocks) is verified by the MAC, so I think it can not change across an encrypted send. See zio_crypt_bp_zero_nonportable_blkprop().

pcd1193182 · 2021-08-26T23:11:00Z

After some offline discussions, we decided that the plan we're going to attempt moving forwards is to fix the PSIZE for future cross-ashift compressed-but-not-encrypted receives. This can be done pretty simply in the receive logic or the zio pipeline. Fixing existing incorrect blocks is much more challenging; rewriting the blocks to use the correct PSIZE would be slow, fixing just the ds_ metadata would also be somewhat slow, and no solution we've come up with would allow us to fix the PSIZE of received encrypted blocks.

We also considered modifying the ds_dataset_block_* functions to do the math based on the corrected PSIZE, but most ways that could be implemented continue to leave out cases or result in situations where the ds_* statistics can become negative, which is significantly worse than the current failure mode.

…E on disk (openzfs#8462) Signed-off-by: Paul Dagnelie <pcd@delphix.com>

…E on disk We round up the psize to the nearest multiple of the asize or to the lsize, whichever is smaller. Once that's done, we allocate a new buffer of the appropriate size, zero the tail, and copy the data into it. This adds a small performance cost to these kinds of writes, but fixes the bookkeeping problems. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Matthew Ahrens <matthew.ahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes openzfs#12522 Closes openzfs#8462

ahrens assigned tcaputi Feb 27, 2019

pcd1193182 added the Type: Defect Incorrect behavior (e.g. crash, hang) label Feb 27, 2019

ahrens unassigned tcaputi Feb 27, 2019

ahrens mentioned this issue Mar 3, 2019

Reported compressratio is incorrect #7639

Open

behlendorf added the Component: Send/Recv "zfs send/recv" feature label Apr 4, 2019

stale bot added the Status: Stale No recent activity for issue label Aug 24, 2020

ahrens added Status: Understood The root cause of the issue is known and removed Status: Stale No recent activity for issue labels Aug 25, 2020

stale bot added the Status: Stale No recent activity for issue label Aug 25, 2021

stale bot removed the Status: Stale No recent activity for issue label Aug 26, 2021

pcd1193182 added a commit to pcd1193182/zfs that referenced this issue Aug 30, 2021

Compressed receive with different ashift can result in incorrect PSIZ…

418f382

…E on disk (openzfs#8462) Signed-off-by: Paul Dagnelie <pcd@delphix.com>

pcd1193182 added a commit to pcd1193182/zfs that referenced this issue Aug 30, 2021

Compressed receive with different ashift can result in incorrect PSIZ…

85e8a53

…E on disk (openzfs#8462) Signed-off-by: Paul Dagnelie <pcd@delphix.com>

pcd1193182 mentioned this issue Aug 30, 2021

Compressed receive with different ashift can result in incorrect PSIZE on disk (#8462) #12522

Merged

13 tasks

behlendorf closed this as completed in c634320 Sep 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compressed receive with different ashift can result in incorrect PSIZE on disk #8462

Compressed receive with different ashift can result in incorrect PSIZE on disk #8462

pcd1193182 commented Feb 27, 2019

tcaputi commented Feb 27, 2019

ahrens commented Feb 27, 2019

stale bot commented Aug 24, 2020

stale bot commented Aug 25, 2021

pcd1193182 commented Aug 26, 2021

ahrens commented Aug 26, 2021 •

edited

pcd1193182 commented Aug 26, 2021

Compressed receive with different ashift can result in incorrect PSIZE on disk #8462

Compressed receive with different ashift can result in incorrect PSIZE on disk #8462

Comments

pcd1193182 commented Feb 27, 2019

System information

Describe the problem you're observing

Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs

tcaputi commented Feb 27, 2019

ahrens commented Feb 27, 2019

stale bot commented Aug 24, 2020

stale bot commented Aug 25, 2021

pcd1193182 commented Aug 26, 2021

ahrens commented Aug 26, 2021 • edited

pcd1193182 commented Aug 26, 2021

ahrens commented Aug 26, 2021 •

edited