Incremental zfs send does not transmit hole #4050

Closed
Ringdingcoder opened this issue Nov 26, 2015 · 22 comments

Comments
@Ringdingcoder

Apparently, there are various related bug reports floating around, but AFAICT they all concern the @hole_birth feature. I have a case that seems very similar, but without having ever touched @hole_birth. The symptom is that on the receiving clone, some holes get filled with semi-random data.

Details on the mailing list:
http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023899.html
and
http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023909.html (self-reply to previous mail)

I've also tested this with the patch from openzfs/openzfs#37 applied, but this does not make the least bit of a difference.

@fling-
Contributor

fling- commented Nov 30, 2015

@Ringdingcoder Isn't this corruption reproducible with 4.1.13?

@Ringdingcoder
Author

Yes, it is. Given that it happens with Illumos as well, I wouldn’t have expected anything else. It seems like this bug has been there "forever" (though probably only since pool version 5000).

@bprotopopov
Contributor

@Ringdingcoder, do you have compression enabled on your dataset and the hole_birth feature activated on your pool? If so, then I think I know what is going on.

This is likely the same issue as described in issue #4023 where it is reproduced on zvols, but I can also reproduce it on files.

The problem is as follows. When a large hole (L1 and up) is created in a ZFS file and then partially filled with data, the remaining (non-filled) portions of that hole are assigned birth epoch zero. This breaks an assumption made in the zfs send implementation: that all holes created after the hole_birth feature is activated have a non-zero birth time. Based on this assumption, zfs send does not transmit these holes, so the source of your zfs send will have zero ranges in the non-filled portions, whereas the target of your zfs receive will still have old data (from the previous snapshot) in those ranges.

If you'd like to play with this, here is a set of commands to reproduce the issue:

zfs create -o recordsize=4k tpool/test_fs
zfs set compression=on tpool/test_fs
truncate -s 1G /tpool/test_fs/large_file
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=$((3*128)) seek=$((1*128)) oflag=direct
zfs snapshot tpool/test_fs@s1

This creates a large sparse file with the 0.5M-2M range filled with random data. Let's create a hole in the middle (by truncating the file and then writing past its new end) and partly fill it with data:

truncate -s $((2*128*4*1024)) /tpool/test_fs/large_file
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=128 seek=$((3*128)) conv=notrunc
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=10 seek=$((2*128)) conv=notrunc
zfs snapshot tpool/test_fs@s2

Now let's zfs send/recv this to another dataset in the pool, and compare the files:

zfs send tpool/test_fs@s1 | zfs recv tpool/test_fs_copy
zfs send -i tpool/test_fs@s1 tpool/test_fs@s2 | zfs recv tpool/test_fs_copy

Now, the files are of the same size, but have different contents:

ls -l /tpool/test_fs/large_file /tpool/test_fs_copy/large_file
-rw-r--r--. 1 root root 2097152 Dec  3 13:30 /tpool/test_fs_copy/large_file
-rw-r--r--. 1 root root 2097152 Dec  3 13:30 /tpool/test_fs/large_file
cmp /tpool/test_fs/large_file /tpool/test_fs_copy/large_file
/tpool/test_fs/large_file /tpool/test_fs_copy/large_file differ: byte 1089537, line 2288

So, what's different? Let's see if the zero ranges in the partially filled L1 hole might be an issue:

dd if=/tpool/test_fs/large_file bs=4k count=1 skip=$((2*128+11)) | hexdump
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00015878 s, 25.8 MB/s
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000
dd if=/tpool/test_fs_copy/large_file bs=4k count=1 skip=$((2*128+11)) | hexdump
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000150659 s, 27.2 MB/s
0000000 87f4 8572 e0fd 4c0e 7228 8c5e 4b25 a26f
0000010 c343 f719 9558 301e 4368 68fb d1a9 8b5e
0000020 b6b8 fe32 4606 eeca 2864 dd7d f10c 6ebd
0000030 d586 b766 ef36 b2dc cbb8 6797 e776 f54d
0000040 10af 24d1 7291 9a1b e2ca 15f3 01de c0ba
0000050 03ab 037f 4651 d748 dacf 2a3b 1852 346f
0000060 9785 1319 d9e0 3bb8 3471 206d 479b 1466
0000070 6e4a 2ee3 20aa a0d4 f969 d0b9 e965 7eeb
0000080 031b f763 8439 73c8 599b 651e 670c d65a
0000090 bf43 3d63 8a0c 3690 a5fd cbe3 3c7b 589b
...

So, the source has zeros but the destination still has data from snapshot s1.

@Ringdingcoder
Author

No, I don't have hole_birth activated and never have: http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023910.html. Compression is enabled.

@bprotopopov
Contributor

Hm, the fact that the file is sparse suggests either that it was truncated and then written past the end, as I have done above, or that writes of zero blocks were converted to holes because of compression. But without the hole_birth feature enabled, all the holes are transmitted, so there must be something else going on.

You can use

zdb -ddddd pool/fs N

to see where the holes are, where N is the object (dnode) number (run zdb -ddddd pool/fs without a number to see which object your file is).
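
For example, a rough sequence looks like this (using the tpool/test_fs names from my repro above; the object number 8 is only an example, substitute whatever number zdb reports for your file):

sync                        # flush dirty data so zdb sees the on-disk state
zdb -dddd tpool/test_fs     # list objects; note the object number whose "path" is /large_file
zdb -ddddd tpool/test_fs 8  # dump the block pointers for that object; gaps in the listed L0
                            # offsets (or entries printed as holes) are the holes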

P.S. Do you run 'sync' before backing things up? Could this be a 'kernel page cache holding on to some data' issue?

@Ringdingcoder
Author

All the info is in the mailing list links. There I have posted zdb output which clearly shows the holes. The file has been written in full, from beginning to end in one go. ZFS creates the holes by itself.

http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023909.html

bprotopopov referenced this issue in ahrens/illumos Dec 3, 2015
6370 ZFS send fails to transmit some holes
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>

In certain circumstances, "zfs send -i" (incremental send) can produce a
stream which will result in incorrect sparse file contents on the
target.

The problem manifests as regions of the received file that should be
sparse (and read as zero-filled) actually contain data from a file that
was deleted (and which happened to share this file's object ID).

Note: this can happen only with filesystems (not zvols, because they do
not free (or reuse) object IDs).

Note: This can happen only if, since the incremental source (FromSnap),
a file was deleted and then another file was created, and the new file
is sparse (i.e. has areas that were never written to and should be
implicitly zero-filled).

We suspect that this was introduced by 4370 (applies only if hole_birth
feature is enabled), and made worse by 5243 (applies if hole_birth
feature is disabled, and we never send any holes).

The bug is caused by the hole birth feature. When an object is deleted
and replaced, all the holes in the object have birth time zero. However,
zfs send cannot tell that the holes are new since the file was replaced,
so it doesn't send them in an incremental. As a result, you can end up
with invalid data when you receive incremental send streams. As a
short-term fix, we can always send holes with birth time 0 (unless it's
a zvol or a dataset where we can guarantee that no objects have been
reused).
@stevenburgess

We were seeing similar things to those described in the linked email on systems running 0.6.5 with pools that had feature@hole_birth disabled. Rolling back to 0.6.4 fixed the incremental sendfile creation for new data, but (somewhat obviously) cannot fix historical data that was sent wrong.

So basically: if for any reason you are running 0.6.5 with pools that have feature@hole_birth disabled, please check your receiving dataset to ensure that it matches your sending dataset (in our case, an md5sum of the sparse files revealed different checksums).
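
A quick spot check can be as simple as comparing checksums of the same file in the same snapshot on both sides; the dataset, snapshot, file, and host names below are placeholders, not anything from this thread:

md5sum /tank/data/.zfs/snapshot/daily-2015-12-16/sparse_file
ssh backuphost md5sum /backup/data/.zfs/snapshot/daily-2015-12-16/sparse_file

If the two sums differ, the receiving side has picked up stale data in ranges that should be holes.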

@kernelOfTruth
Contributor

Referencing:

ahrens/illumos@ca3f86b 6370 ZFS send fails to transmit some holes

https://www.illumos.org/issues/6370 ZFS send fails to transmit some holes

@stevenburgess

Those do seem related, but only if the description is incorrect. The above links claim

"This can happen only if, since the incremental source (FromSnap), a file was deleted and then another file was created, and the new file is sparse (i.e. has areas that were never written to and should be implicitly zero-filled)."

However, in our case (and I believe in the case in this ticket) the sparse files were not deleted. As best we can tell, at some point the incremental send code was altered in a way that any file with holes would generate an incorrect incremental sendfile, resulting in dataset mismatches. We have some evidence that this happens only on systems running up-to-date ZFS code, but with pools that have feature@hole_birth disabled.

This might also explain why both @bprotopopov and @Ringdingcoder applied the patch and did not see any improvement.

@Ringdingcoder
Author

The claim might well be correct. I can confidently say that some files have been deleted on my file system; it does not matter whether they were sparse or not. The only other ingredient is a new sparse file being created that somehow shares some metadata structures (?) or inode numbers (?) with the deleted files. Unfortunately, it does not take much for a file to become sparse. A few blocks of zeroes are enough for that to happen, and they specifically do not have to be created in an unusual way.
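
To illustrate how little it takes, here is a sketch (dataset and file names are made up): writing nothing but zeroes into a compressed dataset already produces a sparse file, because ZFS stores all-zero blocks as holes when compression is enabled.

zfs create -o compression=on tpool/demo
dd if=/dev/zero of=/tpool/demo/zeros bs=128k count=8    # 1 MiB of zeroes
sync
ls -l /tpool/demo/zeros    # logical size: 1 MiB
du -h /tpool/demo/zeros    # allocated size: essentially nothing -- the zero blocks became holes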

@bprotopopov
Contributor

How did you guys manage to create a pool with this feature disabled? I tried this on Linux at pool create time, and it did not work (an error was returned). I assume that is the only way to do it, because once the feature is enabled/activated, you cannot disable it either; there appear to be assumptions in the code that once it is activated, it stays that way.

Boris.



@Ringdingcoder
Author

I created the pool several years ago. I'm not even sure on which OS. Probably OpenIndiana.

@stevenburgess

Us as well. We have many pools that existed prior to hole_birth and went un-upgraded, and also some ancient pools that were created with

zpool create -o version=28

because some people still believe that they can switch over to zfs-fuse with no ill consequence. I know I have seen some people on the mailing list and GH issues make similar claims, so I was specifically trying to let them know: if they updated the code but did not upgrade the pool, they should check the integrity of their sparse files.
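
For completeness, a throwaway legacy pool for testing can be created roughly like this (the backing file and pool name are placeholders):

truncate -s 1G /var/tmp/legacy.img
zpool create -o version=28 legacypool /var/tmp/legacy.img
zpool get version legacypool    # reports 28; feature flags such as hole_birth stay unavailable unless the pool is upgraded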

@Ringdingcoder
Author

@stevenburgess I just downgraded to 0.6.4.2 in order to test your assertion; however, this does not make any difference for me. I still get the same corruption. And as I already mentioned earlier, OpenIndiana 151a8, which I tend to keep on a USB stick and which is a good deal older than ZoL 0.6.4, behaves exactly the same.

@bprotopopov
Contributor

So, is there a simple recipe you can describe to reproduce this issue?

I would be interested to understand the mechanics of the failure.

Boris.



@Ringdingcoder
Author

No, I just have a state where I can rerun zfs send and zfs receive and check the resulting md5sum. I don't know how to manufacture that state in the first place.
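
Concretely, the re-check amounts to something like this (dataset, snapshot, and file names are placeholders for the real ones on my pool):

zfs destroy -r tank/copy                                 # start from a clean target
zfs send tank/fs@s1 | zfs recv tank/copy
zfs send -i tank/fs@s1 tank/fs@s2 | zfs recv tank/copy
md5sum /tank/fs/.zfs/snapshot/s2/sparse_file /tank/copy/sparse_file    # the sums differ when the bug hits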

@bprotopopov
Contributor

Stefan, if you'd like, I can work with you to get more debug info.

This will involve locating a file that was updated incorrectly by send/recv and running zdb on that filesystem in the snapshots and in the resulting filesystem.


@Ringdingcoder
Author

I have already posted most of this: http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023909.html

@Ringdingcoder
Author

@stevenburgess No, you were right. I made a mistake when testing, and I can confirm that the corruption does not happen with 0.6.4.2.

@stevenburgess

Good to hear! I was a little worried when I read that you were having this problem even on much older systems.

@ryao
Contributor

ryao commented Jan 11, 2016

@behlendorf @fling- pointed this issue out to me. Any issue where the term corruption is mentioned is something we should prioritize. We should add it to the milestone for the next release.

@stevenburgess

@ryao Agreed that it's important. I brought it up in IRC and elsewhere because I feel like ZoL users are particularly prone to having new code with old pools (I know some people do it on purpose for zfs-fuse compatibility).

behlendorf pushed a commit that referenced this issue Mar 15, 2016
6370 ZFS send fails to transmit some holes
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Stefan Ring <stefanrin@gmail.com>
Reviewed by: Steven Burgess <sburgess@datto.com>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/6370
  illumos/illumos-gate@286ef71

In certain circumstances, "zfs send -i" (incremental send) can produce
a stream which will result in incorrect sparse file contents on the
target.

The problem manifests as regions of the received file that should be
sparse (and read as zero-filled) actually contain data from a file that
was deleted (and which happened to share this file's object ID).

Note: this can happen only with filesystems (not zvols, because they do
not free (and thus can not reuse) object IDs).

Note: This can happen only if, since the incremental source (FromSnap),
a file was deleted and then another file was created, and the new file
is sparse (i.e. has areas that were never written to and should be
implicitly zero-filled).

We suspect that this was introduced by 4370 (applies only if hole_birth
feature is enabled), and made worse by 5243 (applies if hole_birth
feature is disabled, and we never send any holes).

The bug is caused by the hole birth feature. When an object is deleted
and replaced, all the holes in the object have birth time zero. However,
zfs send cannot tell that the holes are new since the file was replaced,
so it doesn't send them in an incremental. As a result, you can end up
with invalid data when you receive incremental send streams. As a
short-term fix, we can always send holes with birth time 0 (unless it's
a zvol or a dataset where we can guarantee that no objects have been
reused).

Ported-by: Steven Burgess <sburgess@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4369
Closes #4050
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Aug 15, 2016
Adds a module option which disables the hole_birth optimization
which has been responsible for several recent bugs, including
issue openzfs#4050.

Original-patch: https://gist.github.com/pcd1193182/2c0cd47211f3aee623958b4698836c48
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#4833
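
On builds that carry this patch, the option can be toggled like any other ZFS module parameter; a sketch, assuming the parameter name ignore_hole_birth that later ZoL documentation uses for this tunable (double-check the name against your build):

echo 1 > /sys/module/zfs/parameters/ignore_hole_birth                 # disable the hole_birth optimization at runtime
echo "options zfs ignore_hole_birth=1" >> /etc/modprobe.d/zfs.conf    # persist across module loads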
nedbass pushed a commit to nedbass/zfs that referenced this issue Aug 26, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 3, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 5, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 5, 2016
tuxoko pushed a commit to tuxoko/zfs that referenced this issue Sep 8, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 9, 2016
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 19, 2016
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 29, 2016