Incremental zfs send does not transmit hole #4050

Closed
Ringdingcoder opened this issue Nov 26, 2015 · 22 comments

Comments
@Ringdingcoder

Apparently, there are various related bug reports floating around, but AFAICT they all concern the @hole_birth feature. I have a case that seems very similar, but without having ever touched @hole_birth. The symptom is that on the receiving clone, some holes get filled with semi-random data.

Details on the mailing list:
http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023899.html
and
http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023909.html (self-reply to previous mail)

I've also tested this with the patch from openzfs/openzfs#37 applied, but this does not make the least bit of a difference.

@fling-
Contributor

fling- commented Nov 30, 2015

@Ringdingcoder Isn't this corruption reproducible with 4.1.13?

@Ringdingcoder
Author

Yes, it is. Given that it happens with Illumos as well, I wouldn’t have expected anything else. It seems like this bug has been there "forever" (though probably only since pool version 5000).

@bprotopopov
Contributor

@Ringdingcoder, do you have compression enabled on your dataset and the hole_birth feature activated on your pool? If so, then I think I know what is going on.

This is likely the same issue as described in issue #4023 where it is reproduced on zvols, but I can also reproduce it on files.

The problem is as follows. When a large hole (L1 and up) is created in a ZFS file and then partially filled with data, the remaining (non-filled) portions of that hole are assigned birth epoch zero. This breaks an assumption made in the zfs send implementation: that all holes created after the hole_birth feature is activated have a non-zero birth time. Based on this assumption, zfs send does not transmit these holes, so the source of your zfs send will have zero ranges in the non-filled portions, whereas the target of your zfs receive will still have old data (from the previous snapshot) in those ranges.

If you'd like to play with this, here is a set of commands to reproduce the issue:

zfs create -o recordsize=4k tpool/test_fs
zfs set compression=on tpool/test_fs
truncate -s 1G /tpool/test_fs/large_file
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=$((3*128)) seek=$((1*128)) oflag=direct
zfs snapshot tpool/test_fs@s1

This creates a large sparse file with the 0.5M-2M range filled with random data. Let's create a hole in the middle (by truncating the file and then writing past its new end) and partly fill it with data:

truncate -s $((2*128*4*1024)) /tpool/test_fs/large_file
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=128 seek=$((3*128)) conv=notrunc
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=10 seek=$((2*128)) conv=notrunc
zfs snapshot tpool/test_fs@s2

Now let's zfs send/recv this to another dataset in the pool, and compare the files:

zfs send tpool/test_fs@s1 | zfs recv tpool/test_fs_copy
zfs send -i tpool/test_fs@s1 tpool/test_fs@s2 | zfs recv tpool/test_fs_copy

Now, the files are of the same size, but have different contents:

ls -l /tpool/test_fs/large_file /tpool/test_fs_copy/large_file
-rw-r--r--. 1 root root 2097152 Dec  3 13:30 /tpool/test_fs_copy/large_file
-rw-r--r--. 1 root root 2097152 Dec  3 13:30 /tpool/test_fs/large_file
cmp /tpool/test_fs/large_file /tpool/test_fs_copy/large_file
/tpool/test_fs/large_file /tpool/test_fs_copy/large_file differ: byte 1089537, line 2288

So, what's different? Let's see if the zero ranges in the partially filled L1 hole might be an issue:

dd if=/tpool/test_fs/large_file bs=4k count=1 skip=$((2*128+11)) | hexdump
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00015878 s, 25.8 MB/s
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000
dd if=/tpool/test_fs_copy/large_file bs=4k count=1 skip=$((2*128+11)) | hexdump
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000150659 s, 27.2 MB/s
0000000 87f4 8572 e0fd 4c0e 7228 8c5e 4b25 a26f
0000010 c343 f719 9558 301e 4368 68fb d1a9 8b5e
0000020 b6b8 fe32 4606 eeca 2864 dd7d f10c 6ebd
0000030 d586 b766 ef36 b2dc cbb8 6797 e776 f54d
0000040 10af 24d1 7291 9a1b e2ca 15f3 01de c0ba
0000050 03ab 037f 4651 d748 dacf 2a3b 1852 346f
0000060 9785 1319 d9e0 3bb8 3471 206d 479b 1466
0000070 6e4a 2ee3 20aa a0d4 f969 d0b9 e965 7eeb
0000080 031b f763 8439 73c8 599b 651e 670c d65a
0000090 bf43 3d63 8a0c 3690 a5fd cbe3 3c7b 589b
...

So, the source has zeros but the destination still has data from snapshot s1.

@Ringdingcoder
Author

No, I don't have hole_birth activated and never have: http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023910.html. Compression is enabled.

@bprotopopov
Contributor

Hm, the fact that the file is sparse suggests either that it was truncated and then written past the end, as I have done above, or that writes of zero blocks were converted to holes because of compression. But without the hole_birth feature enabled, all the holes are transmitted, so there must be something else going on.

You can use

zdb -ddddd pool/fs N

to see where the holes are, where N is the object (dnode) number (run zdb -ddddd pool/fs without a number to see which object your file is).
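
For example, a rough sequence looks like this (using the tpool/test_fs names from my repro above; the object number 8 is only an example, substitute whatever number zdb reports for your file):

sync                        # flush dirty data so zdb sees the on-disk state
zdb -dddd tpool/test_fs     # list objects; note the object number whose "path" is /large_file
zdb -ddddd tpool/test_fs 8  # dump the block pointers for that object; gaps in the listed L0
                            # offsets (or entries printed as holes) are the holes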

P.S. Do you run 'sync' before backing things up? Could this be a 'kernel page cache holding on to some data' issue?

@Ringdingcoder
Author

All the info is in the mailing list links. There I have posted zdb output which clearly shows the holes. The file has been written in full, from beginning to end in one go. ZFS creates the holes by itself.

http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023909.html

bprotopopov referenced this issue in ahrens/illumos Dec 3, 2015
6370 ZFS send fails to transmit some holes
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>

In certain circumstances, "zfs send -i" (incremental send) can produce a
stream which will result in incorrect sparse file contents on the
target.

The problem manifests as regions of the received file that should be
sparse (and read as zero-filled) actually contain data from a file that
was deleted (and which happened to share this file's object ID).

Note: this can happen only with filesystems (not zvols, because they do
not free (or reuse) object IDs).

Note: This can happen only if, since the incremental source (FromSnap),
a file was deleted and then another file was created, and the new file
is sparse (i.e. has areas that were never written to and should be
implicitly zero-filled).

We suspect that this was introduced by 4370 (applies only if hole_birth
feature is enabled), and made worse by 5243 (applies if hole_birth
feature is disabled, and we never send any holes).

The bug is caused by the hole birth feature. When an object is deleted
and replaced, all the holes in the object have birth time zero. However,
zfs send cannot tell that the holes are new since the file was replaced,
so it doesn't send them in an incremental. As a result, you can end up
with invalid data when you receive incremental send streams. As a
short-term fix, we can always send holes with birth time 0 (unless it's
a zvol or a dataset where we can guarantee that no objects have been
reused).
@stevenburgess

We were seeing similar things to those described in the linked email on systems running 0.6.5 with pools that had feature@hole_birth disabled. Rolling back to 0.6.4 fixed the incremental sendfile creation for new data, but (somewhat obviously) cannot fix historical data that was sent wrong.

So basically: if for any reason you are running 0.6.5 with pools that have feature@hole_birth disabled, please check your receiving dataset to ensure that it matches your sending dataset (in our case, an md5sum of the sparse files revealed different checksums).
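
A quick spot check can be as simple as comparing checksums of the same file in the same snapshot on both sides; the dataset, snapshot, file, and host names below are placeholders, not anything from this thread:

md5sum /tank/data/.zfs/snapshot/daily-2015-12-16/sparse_file
ssh backuphost md5sum /backup/data/.zfs/snapshot/daily-2015-12-16/sparse_file

If the two sums differ, the receiving side has picked up stale data in ranges that should be holes.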

@kernelOfTruth
Contributor

Referencing:

ahrens/illumos@ca3f86b 6370 ZFS send fails to transmit some holes

https://www.illumos.org/issues/6370 ZFS send fails to transmit some holes

@stevenburgess

Those do seem related, but only if the description is incorrect. The above links claim

"This can happen only if, since the incremental source (FromSnap), a file was deleted and then another file was created, and the new file is sparse (i.e. has areas that were never written to and should be implicitly zero-filled)."

However, in our case (and I believe in the case in this ticket) the sparse files were not deleted. As best we can tell, at some point the incremental send code was altered in a way that any file with holes would generate an incorrect incremental sendfile, resulting in dataset mismatches. We have some evidence that this happens only on systems running up-to-date ZFS code, but with pools that have feature@hole_birth disabled.

This might also explain why both @bprotopopov and @Ringdingcoder applied the patch and did not see any improvement.

@Ringdingcoder
Author

The claim might well be correct. I can confidently say that some files have been deleted on my file system; it does not matter whether they were sparse or not. The only other ingredient is a new sparse file being created that somehow shares some metadata structures (?) or inode numbers (?) with the deleted files. Unfortunately, it does not take much for a file to become sparse. A few blocks of zeroes are enough for that to happen, and they specifically do not have to be created in an unusual way.
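
To illustrate how little it takes, here is a sketch (dataset and file names are made up): writing nothing but zeroes into a compressed dataset already produces a sparse file, because ZFS stores all-zero blocks as holes when compression is enabled.

zfs create -o compression=on tpool/demo
dd if=/dev/zero of=/tpool/demo/zeros bs=128k count=8    # 1 MiB of zeroes
sync
ls -l /tpool/demo/zeros    # logical size: 1 MiB
du -h /tpool/demo/zeros    # allocated size: essentially nothing -- the zero blocks became holes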

@bprotopopov
Contributor

How did you guys manage to create a pool with this feature disabled? I tried this on Linux at pool create time, and it did not work (an error was returned). I assume that is the only way to do it, because once the feature is enabled/activated, you cannot disable it either; there appear to be assumptions in the code that once it is activated, it stays that way.

Boris.



@Ringdingcoder
Author

I created the pool several years ago. I'm not even sure on which OS. Probably OpenIndiana.

@stevenburgess

Us as well. We have many pools that existed prior to hole_birth and went un-upgraded, and also some ancient pools that were created with

zpool create -o version=28

because some people still believe that they can switch over to zfs-fuse with no ill consequence. I know I have seen some people on the mailing list and GH issues make similar claims, so I was specifically trying to let them know: if they updated the code but did not upgrade the pool, they should check the integrity of their sparse files.
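
For completeness, a throwaway legacy pool for testing can be created roughly like this (the backing file and pool name are placeholders):

truncate -s 1G /var/tmp/legacy.img
zpool create -o version=28 legacypool /var/tmp/legacy.img
zpool get version legacypool    # reports 28; feature flags such as hole_birth stay unavailable unless the pool is upgraded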

@Ringdingcoder
Author

@stevenburgess I just downgraded to 0.6.4.2 in order to test your assertion; however, this does not make any difference for me. I still get the same corruption. And as I already mentioned earlier, OpenIndiana 151a8, which I tend to keep on a USB stick and which is a good deal older than ZoL 0.6.4, behaves exactly the same.

@bprotopopov
Contributor

So, is there a simple recipe you can describe to reproduce this issue?

I would be interested to understand the mechanics of the failure.

Boris.



@Ringdingcoder
Author

No, I just have a state where I can rerun zfs send and zfs receive and check the resulting md5sum. I don't know how to manufacture that state in the first place.
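
Concretely, the re-check amounts to something like this (dataset, snapshot, and file names are placeholders for the real ones on my pool):

zfs destroy -r tank/copy                                 # start from a clean target
zfs send tank/fs@s1 | zfs recv tank/copy
zfs send -i tank/fs@s1 tank/fs@s2 | zfs recv tank/copy
md5sum /tank/fs/.zfs/snapshot/s2/sparse_file /tank/copy/sparse_file    # the sums differ when the bug hits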

@bprotopopov
Contributor

Stefan, if you'd like, I can work with you to get more debug info.

This will involve locating a file that was updated incorrectly by send/recv and running zdb on that filesystem in the snapshots and in the resulting filesystem.


@Ringdingcoder
Author

I have already posted most of this: http://list.zfsonlinux.org/pipermail/zfs-discuss/2015-November/023909.html

@Ringdingcoder
Author

@stevenburgess No, you were right. I made a mistake when testing, and I can confirm that the corruption does not happen with 0.6.4.2.

@stevenburgess

Good to hear! I was a little worried when I read that you were having this problem even on much older systems.

@ryao
Contributor

ryao commented Jan 11, 2016

@behlendorf @fling- pointed this issue out to me. Any issue where the term corruption is mentioned is something we should prioritize. We should add it to the milestone for the next release.

@stevenburgess

@ryao Agreed that it's important. I brought it up in IRC and elsewhere because I feel like ZoL users are particularly prone to having new code with old pools (I know some people do it on purpose for zfs-fuse compatibility).

behlendorf pushed a commit that referenced this issue Mar 15, 2016
6370 ZFS send fails to transmit some holes
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Stefan Ring <stefanrin@gmail.com>
Reviewed by: Steven Burgess <sburgess@datto.com>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/6370
  illumos/illumos-gate@286ef71

In certain circumstances, "zfs send -i" (incremental send) can produce
a stream which will result in incorrect sparse file contents on the
target.

The problem manifests as regions of the received file that should be
sparse (and read as zero-filled) actually contain data from a file that
was deleted (and which happened to share this file's object ID).

Note: this can happen only with filesystems (not zvols, because they do
not free (and thus can not reuse) object IDs).

Note: This can happen only if, since the incremental source (FromSnap),
a file was deleted and then another file was created, and the new file
is sparse (i.e. has areas that were never written to and should be
implicitly zero-filled).

We suspect that this was introduced by 4370 (applies only if hole_birth
feature is enabled), and made worse by 5243 (applies if hole_birth
feature is disabled, and we never send any holes).

The bug is caused by the hole birth feature. When an object is deleted
and replaced, all the holes in the object have birth time zero. However,
zfs send cannot tell that the holes are new since the file was replaced,
so it doesn't send them in an incremental. As a result, you can end up
with invalid data when you receive incremental send streams. As a
short-term fix, we can always send holes with birth time 0 (unless it's
a zvol or a dataset where we can guarantee that no objects have been
reused).

Ported-by: Steven Burgess <sburgess@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4369
Closes #4050
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Aug 15, 2016
Adds a module option which disables the hole_birth optimization
which has been responsible for several recent bugs, including
issue openzfs#4050.

Original-patch: https://gist.github.com/pcd1193182/2c0cd47211f3aee623958b4698836c48
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#4833
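
On builds that carry this patch, the option can be toggled like any other ZFS module parameter; a sketch, assuming the parameter name ignore_hole_birth that later ZoL documentation uses for this tunable (double-check the name against your build):

echo 1 > /sys/module/zfs/parameters/ignore_hole_birth                 # disable the hole_birth optimization at runtime
echo "options zfs ignore_hole_birth=1" >> /etc/modprobe.d/zfs.conf    # persist across module loads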
nedbass pushed a commit to nedbass/zfs that referenced this issue Aug 26, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 3, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 5, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 5, 2016
tuxoko pushed a commit to tuxoko/zfs that referenced this issue Sep 8, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 9, 2016
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 19, 2016
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 29, 2016