
File incorrectly zeroed when receiving incremental stream that toggles -L #10383

Merged
behlendorf merged 2 commits into openzfs:master from the send-L branch on Jun 9, 2020

Conversation

ahrens
Member

@ahrens ahrens commented May 28, 2020

Motivation and Context

By increasing the recordsize property above the default of 128KB, a
filesystem may have "large" blocks. By default, a send stream of such a
filesystem does not contain large WRITE records; instead, it decreases
objects' block sizes to 128KB and splits the large blocks into 128KB
blocks, allowing the large-block filesystem to be received by a system
that does not support the large_blocks feature. A send stream
generated by zfs send -L (or --large-block) preserves the large
block size on the receiving system by using large WRITE records.
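
To make the default-vs--L distinction concrete, here is a minimal shell sketch, assuming a pool named tank, hypothetical dataset names, and default mountpoints:

    # Dataset with "large" (1MB) blocks.
    zfs create -o recordsize=1M tank/large
    dd if=/dev/urandom of=/tank/large/file bs=1M count=4
    zfs snapshot tank/large@snap1

    # Default send: the receiver gets the data split into 128KB blocks.
    zfs send tank/large@snap1 | zfs receive tank/copy_split

    # With -L (--large-block): the 1MB block size is preserved on the receiver.
    zfs send -L tank/large@snap1 | zfs receive tank/copy_large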

When receiving an incremental send stream for a filesystem with large
blocks, if the send stream's -L flag was toggled, a bug is encountered
in which the file's contents are incorrectly zeroed out. The contents
of any blocks that were not modified by this send stream will be lost.
"Toggled" means that the previous send used -L, but this incremental
does not use -L (-L to no-L); or that the previous send did not use
-L, but this incremental does use -L (no-L to -L).
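
Continuing the hypothetical names above with a second snapshot, snap2, the two toggle orders written out as commands are:

    # "-L to no-L": the previous send used -L, the incremental does not.
    zfs send -L tank/large@snap1        | zfs receive tank/dst
    zfs send -i @snap1 tank/large@snap2 | zfs receive tank/dst

    # "no-L to -L": the previous send did not use -L, the incremental does.
    zfs send tank/large@snap1              | zfs receive tank/dst2
    zfs send -L -i @snap1 tank/large@snap2 | zfs receive tank/dst2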

Closes #6224

Description

This commit addresses the problem with several changes to the semantics
of zfs send/receive:

  1. "-L to no-L" incrementals are rejected. If the previous send used
    -L, but this incremental does not use -L, the zfs receive will
    fail with this error message:
    incremental send stream requires -L (--large-block), to match
    previous receive.
  1. "no-L to -L" incrementals are handled correctly, preserving the
    smaller (128KB) block size of any already-received files that used large
    blocks on the sending system but were split by zfs send without the
    -L flag.

  2. A new send stream format flag is added, SWITCH_TO_LARGE_BLOCKS.
    This feature indicates that we can correctly handle "no-L to -L"
    incrementals. This flag is currently not set on any send streams. In
    the future, we intend for incremental send streams of snapshots that
    have large blocks to use -L by default, and these streams will also
    have the SWITCH_TO_LARGE_BLOCKS feature set. This ensures that streams
    from the default use of zfs send won't encounter the bug mentioned
    above, because they can't be received by software with the bug.
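
As a sketch of the new behavior (same hypothetical names as above), the "-L to no-L" sequence now fails at receive time instead of silently zeroing file contents:

    zfs send -L tank/large@snap1 | zfs receive tank/dst
    zfs send -i @snap1 tank/large@snap2 | zfs receive tank/dst
    # fails with:
    #   incremental send stream requires -L (--large-block), to match
    #   previous receive.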

Implementation notes:

To facilitate accessing the ZPL's generation number,
zfs_space_delta_cb() has been renamed to zpl_get_file_info() and
restructured to fill in a struct with ZPL-specific info including owner
and generation.

In the "no-L to -L" case, if this is a compressed send stream (from
zfs send -cL), large WRITE records that are being written to small
(128KB) blocksize files need to be decompressed so that they can be
written split up into multiple blocks. The zio pipeline will recompress
each smaller block individually.
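
For example (hypothetical names as above), a compressed large-block incremental received on top of an earlier non--L receive exercises this path:

    zfs send tank/large@snap1 | zfs receive tank/dst3
    # Compressed large WRITE records aimed at the already-split (128KB) files
    # are decompressed and written as multiple 128KB blocks, which the zio
    # pipeline recompresses individually.
    zfs send -cL -i @snap1 tank/large@snap2 | zfs receive tank/dst3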

A new test case, send-L_toggle, is added, which tests the "no-L to -L"
case and verifies that we get an error for the "-L to no-L" case.

How Has This Been Tested?

New test case added.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@ahrens ahrens added Component: Send/Recv "zfs send/recv" feature Status: Code Review Needed Ready for review and testing Type: Defect Incorrect behavior (e.g. crash, hang) labels May 28, 2020
@ahrens ahrens requested a review from pcd1193182 May 28, 2020 03:41
@codecov-commenter

codecov-commenter commented May 28, 2020

Codecov Report

Merging #10383 into master will increase coverage by 0.22%.
The diff coverage is 84.93%.


@@            Coverage Diff             @@
##           master   #10383      +/-   ##
==========================================
+ Coverage   79.25%   79.47%   +0.22%     
==========================================
  Files         391      391              
  Lines      123633   123706      +73     
==========================================
+ Hits        97980    98316     +336     
+ Misses      25653    25390     -263     
Flag      Coverage Δ
#kernel   79.97% <88.32%> (+0.03%) ⬆️
#user     65.68% <5.73%> (+1.49%) ⬆️

Impacted Files                     Coverage Δ
cmd/zhack/zhack.c                  54.12% <0.00%> (ø)
include/sys/dmu.h                  100.00% <ø> (ø)
module/zfs/zfs_quota.c             86.36% <82.60%> (-0.22%) ⬇️
module/zfs/dmu_recv.c              76.07% <83.00%> (+0.18%) ⬆️
lib/libzfs/libzfs_sendrecv.c       76.51% <100.00%> (-0.06%) ⬇️
module/os/linux/zfs/zfs_vfsops.c   78.46% <100.00%> (ø)
module/zfs/dmu_objset.c            91.62% <100.00%> (+<0.01%) ⬆️
cmd/zvol_id/zvol_id_main.c         76.31% <0.00%> (-5.27%) ⬇️
module/zfs/vdev_raidz.c            89.32% <0.00%> (-3.93%) ⬇️
module/zfs/bpobj.c                 86.86% <0.00%> (-3.76%) ⬇️
... and 57 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b2f3709...9641a7e.

@behlendorf behlendorf self-requested a review May 30, 2020 04:25
@ahrens
Member Author

ahrens commented Jun 3, 2020

@tcaputi Could you take a look at this as well?

@@ -191,6 +191,13 @@ struct objset {
int os_upgrade_status;
};

typedef struct zfs_file_info {
Contributor

I'm not saying it's wrong necessarily, but it feels odd to me to have this in dmu_objset.h. It doesn't feel like it's objset-specific.

Member Author

Yeah, a lot of this functionality works across multiple layers, so it wasn't clear where to put some of the code/declarations. Maybe dmu.h would be better for this struct. It's used there anyway.

Comment on lines 1441 to 1563
 * necessary, because with raw receives, the generation is
 * encrypted. We also want to minimize dependence on the
 * ZPL, so that other types of datasets can also be received
 * (e.g. ZVOLs, although note that ZVOLS currently do not
 * reallocate their objects or change their structure).
 * Therefore, we check a number of different cases where we
 * know it is safe to discard the object's contents, before
 * using the ZPL's generation number to make the above
 * distinction.
 */
if (drro->drr_blksz != doi.doi_data_block_size) {
        if (rwa->raw) {
                /*
                 * RAW streams always have large blocks, so
                 * we are sure that the data is not needed
                 * due to changing --large-block to be on.
                 * Which is fortunate since the bonus buffer
                 * (which contains the ZPL generation) is
                 * encrypted, and the key might not be
                 * loaded.
                 */
                do_free_range = B_TRUE;
        } else if (rwa->full) {
                /*
                 * This is a full send stream, so it always
                 * replaces what we have. Even if the
                 * generation numbers happen to match, this
                 * can not actually be the same logical file.
                 * This is relevant when receiving a full
                 * send as a clone.
                 */
                do_free_range = B_TRUE;
        } else if (drro->drr_type !=
            DMU_OT_PLAIN_FILE_CONTENTS ||
            doi.doi_type != DMU_OT_PLAIN_FILE_CONTENTS) {
                /*
                 * PLAIN_FILE_CONTENTS are the only type of
                 * objects that have ever been stored with
                 * large blocks, so we don't need the special
                 * logic below. ZAP blocks can shrink (when
                 * there's only one block), so we don't want
                 * to hit the error below about block size
                 * only increasing.
                 */
                do_free_range = B_TRUE;
        } else if (doi.doi_max_offset <=
            doi.doi_data_block_size) {
                /*
                 * There is only one block. We can free it,
                 * because its contents will be replaced by a
                 * WRITE record. This can not be the no-L ->
                 * -L case, because the no-L case would have
                 * resulted in multiple blocks. If we
                 * supported -L -> no-L, it would not be safe
                 * to free the file's contents. Fortunately,
                 * that is not allowed (see
                 * recv_check_large_blocks()).
                 */
                do_free_range = B_TRUE;
        } else {
                boolean_t is_same_gen;
                err = receive_object_is_same_generation(rwa->os,
                    drro->drr_object, doi.doi_bonus_type,
                    drro->drr_bonustype, data, &is_same_gen);
                if (err != 0)
                        return (SET_ERROR(EINVAL));

                if (is_same_gen) {
                        /*
                         * This is the same logical file, and
                         * the block size must be increasing.
                         * It could only decrease if
                         * --large-block was changed to be
                         * off, which is checked in
                         * recv_check_large_blocks().
                         */
                        if (drro->drr_blksz <=
                            doi.doi_data_block_size)
                                return (SET_ERROR(EINVAL));
                        /*
                         * We keep the existing blocksize and
                         * contents.
                         */
                        new_blksz =
                            doi.doi_data_block_size;
                } else {
                        do_free_range = B_TRUE;
                }
        }
}

/* nblkptr can only decrease if the object was reallocated */
if (nblkptr < doi.doi_nblkptr)
        do_free_range = B_TRUE;

/* number of slots can only change on reallocation */
if (dn_slots != doi.doi_dnodesize >> DNODE_SHIFT)
        do_free_range = B_TRUE;

/*
Contributor

Should we maybe refactor some of this into a separate function? receive_object is getting pretty large.

Member Author

For sure. I was able to refactor some of the code for handling an existing object out into its own function. Take a look and let me know what you think.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jun 8, 2020
@behlendorf behlendorf merged commit 7bcb7f0 into openzfs:master Jun 9, 2020
BrainSlayer pushed a commit to BrainSlayer/zfs that referenced this pull request Jun 10, 2020
lundman referenced this pull request in openzfsonosx/openzfs Jun 12, 2020
as-com pushed a commit to as-com/zfs that referenced this pull request Jun 20, 2020
as-com pushed a commit to as-com/zfs that referenced this pull request Jun 20, 2020
@ahrens ahrens deleted the send-L branch June 27, 2020 03:58
@scineram

This should be in 0.8. @tonyhutter

jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021