Unable to run btrfs send after deduplication #50

AlexFreeland · 2015-03-30T21:24:36Z

Hi there,
After running "duperemove -drv" on a btrfs filesystem and taking a readonly snapshot I am no longer able to back it up off disk using btrfs send. The following error is thrown: "ERROR: send ioctl failed with -5: Input/output error"

Checking dmesg shows a corresponding "BTRFS error (device sdc): did not find backref in send_root. inode=708396, offset=131072, disk_byte=379846254592 found extent=379846254592"

The send command still works for the previous snapshot (prior to deduplication) and btrfs scrub shows no errors.

I have tested under Ubuntu 14.04.1 and 14.10 using btrfs-tools versions 3.12 and 3.14.1, respectively, and the issue persists. I am running duperemove v0.09.1.

Is this a known issue? Is there a workaround? Cheers

markfasheh · 2015-03-31T23:05:49Z

Hi, to be honest this sounds like some sort of file system corruption. Have you been able to run btrfsck against the disk in question? What does it report?

AlexFreeland · 2015-03-31T23:20:20Z

btrfs check --repair /dev/sdc
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
enabling repair mode
Checking filesystem on /dev/sdc
UUID: 9453d881-041f-431f-a810-0ac9105e93db
cache and super generation don't match, space cache will be invalidated
found 347049165171 bytes used err is 0
total csum bytes: 741816336
total tree bytes: 5194465280
total fs tree bytes: 4176560128
total extent tree bytes: 220987392
btree space waste bytes: 881550719
file data blocks allocated: 1657671786496
referenced 973693386752
Btrfs v3.14.1

I am still able to successfully run send on a snapshot taken immediately prior to running duperemove so the issue certainly seems to be related to the deduplication process. I've tested this several times with the same result.

FWIW if I do a btrfs inspect-internal inode-resolve -v on the reported inode and stat the afflicted file I see that the modify and change times have been updated as reported in #51

Not sure if this is related to the issue I'm having or not but trying to give you as much information to work with as possible.
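For anyone following along, the inode number can be pulled out of the dmesg line and passed to inode-resolve. This is only a minimal sketch, assuming the kernel message keeps the exact wording quoted above; `backref_inode` is a hypothetical helper name (not part of btrfs-progs) and `/path/to/snapshot` is a placeholder.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the inode
# number from each "did not find backref in send_root" error line.
backref_inode() {
    sed -n 's/.*did not find backref in send_root\. inode=\([0-9][0-9]*\),.*/\1/p'
}

# Usage (needs root; the path should be the snapshot whose send failed,
# since inode numbers are per subvolume):
#   dmesg | backref_inode | sort -u | while read -r ino; do
#       btrfs inspect-internal inode-resolve -v "$ino" /path/to/snapshot
#   done
```

Lines that do not contain the error produce no output, so the helper can be run over the whole kernel log.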

markfasheh · 2015-03-31T23:34:34Z

The time change means that we did indeed run on the files so that helps, thanks :)

So for some reason I didn't realize that this was happening to you right after a dedupe run. That is definitely not expected behavior though I'm not sure it's the duperemove userspace causing it. Give me some time (it's nearing the end of my day) to try to reproduce this locally on my machine.

In the meantime, what kernel are you using? The output of 'uname -a' would tell me this. Also if you were to cut and paste the exact commands you use to reproduce this it would speed things up on my end.

AlexFreeland · 2015-03-31T23:58:43Z

On the Ubuntu 14.04.1 machine:
Linux TestbedA 3.13.0-39-generic #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
On the 14.10 machine:
Linux TestbedB 3.16.0-33-generic #44-Ubuntu SMP Thu Mar 12 12:19:35 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

To reproduce the issue on a newly created btrfs volume (mounted at /drive/2) I run the following commands:

cd /drive/2
mkdir .backup
btrfs receive -f /drive/1/originalFilesystem .backup
cp -aR .backup/originalFilesystem/* ./
btrfs subvolume snapshot -r ./ .backup/pre
duperemove -drv $(ls -1)
btrfs subvolume snapshot -r ./ .backup/post

At this point running btrfs send on .backup/pre will succeed however running it on .backup/post will fail with the error posted above.
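The pre/post comparison described above could be scripted roughly as follows. This is a sketch, not a definitive test: `send_ok` and `compare_snapshots` are hypothetical helper names, and the `BTRFS` variable is an assumption added only so the binary can be swapped out. The stream is discarded to /dev/null because only the exit status matters here.

```shell
# Path to the btrfs binary; overridable (assumption, for testing only).
BTRFS=${BTRFS:-btrfs}

# Return success if "btrfs send" of the given read-only snapshot completes.
send_ok() {
    "$BTRFS" send "$1" > /dev/null 2>&1
}

# Print one result line per snapshot passed as an argument.
compare_snapshots() {
    for snap in "$@"; do
        if send_ok "$snap"; then
            echo "$snap: send OK"
        else
            echo "$snap: send FAILED"
        fi
    done
}

# Usage, from /drive/2 after the steps above (needs root):
#   compare_snapshots .backup/pre .backup/post
```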

mac-linux-free · 2015-04-06T07:24:37Z

I also have the problem... please tell me how to fix the send/receive.

mac-linux-free · 2015-04-06T20:47:27Z

OK, I fixed it by doing a full balance... but this can't be right, can it?

AlexFreeland · 2015-04-08T03:28:41Z

I can also confirm that performing a balance allows me to use send as normal. Hopefully this can provide some clue as to what's occurring... :)
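The workaround reported here (send fails, a full balance is run, send works again) might be wrapped up as below. It is a sketch of what the reporters describe, not a fix for the underlying bug, and it assumes a full balance is acceptable on the filesystem in question; it can take a very long time on large volumes. `send_with_balance_retry` and the `BTRFS` variable are hypothetical names.

```shell
# Path to the btrfs binary; overridable (assumption, for testing only).
BTRFS=${BTRFS:-btrfs}

# Try to send a snapshot; on failure run a full balance on the mount point
# and try once more. $1 = read-only snapshot, $2 = mount point.
send_with_balance_retry() {
    snap=$1
    mnt=$2
    if "$BTRFS" send "$snap" > /dev/null 2>&1; then
        echo "send OK"
        return 0
    fi
    echo "send failed, running full balance on $mnt" >&2
    "$BTRFS" balance start "$mnt" || return 1
    "$BTRFS" send "$snap" > /dev/null 2>&1 && echo "send OK after balance"
}

# Usage (needs root):
#   send_with_balance_retry /drive/2/.backup/post /drive/2
```

The stream is discarded here; a real backup would pipe the final send into `btrfs receive` instead.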

Ferroin · 2015-04-08T11:41:11Z

Personally, this sounds like a problem with the in-kernel implementation of BTRFS_IOC_FILE_EXTENT_SAME. Has anyone reported this on linux-btrfs@vger.kernel.org?

markfasheh · 2015-04-09T21:04:18Z

I'm also pretty sure it's a kernel bug, there's nothing userspace should be able to do to fail something like this. The extent same ioctl uses the clone code beneath some safety checks.

Clone is also used to do reflinks, so we could exercise (mostly) the same path just by making 'cp --reflink=always' copies of a file and then trying to btrfs send that subvolume. If you can do that and tell me whether btrfs send still breaks, that would help narrow it down a bit.

Re the btrfs list, feel free to send them a bug report. You might want to CC me if you do that so I can help out with it.

AlexFreeland · 2015-04-09T23:20:05Z

Hi Mark,
I performed the following test:

cd /drive/2
mkdir .backup
btrfs receive -f /drive/1/originalFilesystem .backup
cp -aR --reflink=always .backup/originalFilesystem/* ./
btrfs subvolume snapshot -r ./ .backup/reflink

Running btrfs send on .backup/reflink completes without error. Hope this helps narrow things down.

mac-linux-free · 2015-04-13T18:37:36Z

New info: btrfs balance only works if no other snapshots are present. That's bad... how could I migrate snapshots in btrfs?

mac-linux-free · 2015-05-02T19:27:36Z

New info: btrfs send and receive now work with btrfs-progs 4.0 on both sides :) even after an (un)successful dedup.

shyblower · 2015-05-05T08:15:05Z

@mac-linux-free: unfortunately I cannot confirm this! Running on Linux kernel 4.0.1 using btrfs-progs 4.0, I still get the error when trying to send a snapshot of a volume after having run duperemove on that volume.

mac-linux-free · 2015-05-05T08:16:47Z

ok ... we are waiting for 4.1 :)

mac-linux-free · 2015-06-26T08:36:05Z

It seems that kernel 4.1 and btrfs-progs 4.1 lead to a successful deduplication... but you have to balance your source btrfs pool first.

shyblower · 2015-06-26T08:40:01Z

I've just balanced my 6TB backup target... it took weeks :/

mac-linux-free · 2015-06-26T08:43:27Z

...hoping I don't have to do it on my 110TB source :\

markfasheh · 2015-07-08T04:18:29Z

This sounds like it was fixed upstream - and to my knowledge the fix wasn't directly related to dedupe (let me know if otherwise). Going to close for now.

mac-linux-free · 2015-07-19T07:52:02Z

Sorry to open this again. This is still not fixed with kernel 4.1.2 and btrfs-progs 4.1. After an unsuccessful dedup:

Kernel processed data (excludes target files): 32.0G
Comparison of extent info shows a net change in shared extents of: 0.0

the send / receive does not work:

BTRFS error (device vdb): did not find backref in send_root. inode=17588, offset=1703936, disk_byte=20751888384 found extent=20751888384

I fixed send/receive again with a full balance. But how to dedup?

doudou · 2015-08-07T18:38:11Z

Same here. Ran a dedup and got the send_root bug. The problem is that balance currently randomly crashes my machine (with a hard lock). I am deleting the offending inodes one by one ...

markfasheh · 2015-08-10T18:07:44Z

Ooops ok I'm going to keep this one open, with the same comment I gave in issue#87:

"Regarding the btrfs error you're seeing during send, there isn't anything that duperemove is doing directly which would cause this behavior. My guess is that send is broken (again) or that the clone code (used in the kernel for dedupe) corrupted something on disk. Have you asked about this on the btrfs list?"

I'll try to reproduce this week (on vacation) and take it to the list if I can't figure out immediately what's causing it.

markfasheh · 2015-08-28T22:13:01Z

Ok I tried this a few times on 4.2-rc7 and was not able to reproduce the issue. Here's an example of what I was doing (I tried a few combinations of subvolumes and file trees)

fstest1:/ # uname -r
4.2.0-rc7+
fstest1:/ # cp -a /boot /btrfs/boot
fstest1:/ # cp -a /boot /btrfs/boot.1
fstest1:/ # duperemove -rdh /btrfs/boot /btrfs/boot.1 | tail -n 5
[0x13fe800] Dedupe 1 extents (id: 53d6821b) with target: (0.0, 57.8M), "/btrfs/boot/vmlinux-4.2.0-rc4+.gz"
[0x13fe8a0] Try to dedupe extents with id 7100870a
[0x13fe8a0] Dedupe 1 extents (id: 7100870a) with target: (0.0, 57.9M), "/btrfs/boot/vmlinux-4.2.0-rc6+.gz.old"
Kernel processed data (excludes target files): 529.6M
Comparison of extent info shows a net change in shared extents of: 388.4M
fstest1:/ # btrfs su sn -r /btrfs/ /btrfs/rosnap
Create a readonly snapshot of '/btrfs/' in '/btrfs/rosnap'
fstest1:/ # btrfs send /btrfs/rosnap/ > /dev/null
At subvol /btrfs/rosnap/
fstest1:/ # dmesg | tail -n 5
[360097.205463] BTRFS: device fsid 2ef606d8-6f69-460a-8f2e-ebdf477c2aba devid 1 transid 3 /dev/vdb1
[360100.424094] BTRFS info (device vdb1): disk space caching is enabled
[360100.424099] BTRFS: has skinny extents
[360100.424101] BTRFS: flagging fs with big metadata feature
[360100.429689] BTRFS: creating UUID tree

Does something like this reproduce it for you all or did I miss a step?

mac-linux-free · 2015-09-03T08:55:20Z

Sorry to say it is still not working.

Linux fbo-fs-02 4.2.0-1.el7.elrepo.x86_64 #1 SMP Sun Aug 30 21:25:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

#dmesg
[18047.639439] BTRFS error (device vdc): did not find backref in send_root. inode=146553, offset=12058624, disk_byte=178733858816 found extent=178733858816

This error occurs on the fileserver which I had deduped before.

doudou · 2015-09-03T13:44:05Z

"This error occurs on the fileserver which I had deduped before."

Could it be that the error is due to the previous dedup?

By the way, it seems that btrfs-progs 4.1.2 can recover from these errors. I had one filesystem with those, and running a btrfs check --repair removed them

mac-linux-free · 2015-09-03T14:20:05Z

No I did not dedup until the Kernel 4.2 was finally released. And I do a send/receive backup to a different server nightly. After the dedup my scripted backup failed. I also know how to recover the error just with btrfs balance online.

markfasheh · 2015-09-03T17:27:22Z

I tried my test on 4.1 and didn't hit anything. Can anyone here give me a test case which reproduces this from a fresh file system? I doubt I'll be able to find this without a reproducer :(

mac-linux-free · 2015-09-05T08:40:53Z

I do not understand this. It is happening on one server only. I had 2 new installations last week and these errors did not occur on the new servers. Perhaps I should try the btrfs check --repair option instead of only balancing.

How could I debug this?

markfasheh · 2015-09-05T16:32:00Z

I'd start with btrfsck if you haven't already.

mac-linux-free · 2015-09-07T06:32:06Z

One more piece of information: while duperemove is running (in the background), you should avoid running btrfs send/receive. I also found that OOM errors sometimes occurred on big filesystems.

Both of these things lead to the error above.

mac-linux-free · 2015-10-14T05:00:32Z

The problem still exists on kernel 4.2.3 and btrfs-progs 4.2.2... I have many snapshots and run duperemove with the -x switch. Is this the right way to do it, or do I have to run duperemove on the whole pool?

mac-linux-free · 2015-10-14T14:29:08Z

Perhaps I should dedup at the pool level and not at the mounted subvol? (/mnt/btrfs/files instead of /mnt/files) ... I'm testing.

ddiss · 2016-02-11T15:09:47Z

I raised a duplicate bug for this issue at:
https://bugzilla.suse.com/show_bug.cgi?id=966259

Filipe mentioned that this was fixed in the 4.3 kernel via:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d6589101b67a55107652050dfbf414403a93e351
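Since the fix is tied to the 4.3 kernel, a quick check of the running kernel against that version may save some guesswork. This is a sketch that assumes GNU `sort -V` is available; `kernel_at_least` is a hypothetical helper name, and distribution kernels may carry backported fixes (or lack them), so a passing check is a hint rather than a guarantee.

```shell
# Return success if the kernel release in $1 (e.g. the output of "uname -r")
# is at least the version in $2. Anything after the first "-" in the
# release string (such as "-33-generic") is ignored.
kernel_at_least() {
    have=${1%%-*}
    want=$2
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Usage:
#   if kernel_at_least "$(uname -r)" 4.3; then
#       echo "kernel should include the send fix"
#   else
#       echo "kernel predates 4.3; consider upgrading before deduplicating"
#   fi
```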

markfasheh · 2016-07-26T04:04:53Z

Closing as this was fixed in the upstream kernel, thanks to Filipe Manana.

samcv · 2018-03-25T19:58:39Z

This does not seem to be fixed. At least I just encountered it. I ran duperemove and now send causes OOM which then causes a kernel panic after it kills every single program and cannot kill any more. Running rebalance and hopefully that will get things working again.

ddiss · 2018-03-25T21:20:57Z

@samcv which kernel version are you using? I'd suggest raising the issue on the linux-btrfs vger.kernel.org mailing list if you're hitting it with a recent kernel.

samcv · 2018-03-25T21:22:12Z

@ddiss I am using 4.15.10

I just did a full rebalance and now send/receive works fine.

markfasheh closed this as completed Jul 8, 2015

markfasheh reopened this Aug 10, 2015

markfasheh closed this as completed Aug 10, 2015

markfasheh added the bug label Aug 10, 2015

markfasheh reopened this Aug 10, 2015

This was referenced Nov 17, 2015

Truncate error during btrfs send AmesCornish/buttersink#15

Closed

buttersink uses wrong sizes when finding best sync path AmesCornish/buttersink#14

Closed

markfasheh closed this as completed Jul 26, 2016

SeeSpotRun mentioned this issue Sep 22, 2018

question: does deduplication on read-only snapshots break incremental send/receive sahib/rmlint#308

Closed
