btrfs-sxbackup can cause btrfs deadlock #17

bhelm · 2015-06-10T16:57:15Z

this problem is more related to btrfs than to btrfs-sxbackup. I'm still creating this issue to warn other users about it and because btrfs-sxbackup could avoid this problem.

when btrfs-sxbackup starts while another instance on the same target is running, btrfs-sxbackup does not recognize this and tries to delete the temp directories on both sides. this causes a deadlock in linux kernel versions < 4.0.3 (see https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.0.3 ).

this deadlock brings the .sxbackup directory to an unusable state; only a reboot will solve this.

btrfs-sxbackup could warn and refuse to run a backup job that appears to be already running, because most users do not want to run two instances of the same job at the same time.

lkraav · 2015-06-10T16:58:51Z

Thanks. I have been wondering what happens when a remote transfer job takes more than an hour and the next job begins.

masc3d · 2015-06-10T18:26:07Z

thanks for the heads up. the expected result in this case is:

INFO preparing environment
ERROR Command '['bash', '-c', 'if [ -d "/.sxbackup/.temp"* ]; then btrfs sub del "/.sxbackup/.temp"*; fi']' returned non-zero exit status 1
ERROR: cannot delete '/.sxbackup/.temp.675e3049b71349b2aa8dc19cac6b349b' - Operation not permitted

A temporary subvolume could exist for various other reasons as well, e.g. unclean termination due to kill, reset, power outage etc. Refusing to run essentially implies delegating the cleanup to the user in all of those cases, which is (imho) a bad idea. Best to use a kernel version which has the fix.

bhelm · 2015-06-11T06:55:26Z

my suggestion is to let btrfs-sxbackup check for itself whether it is already running, because it should not depend on (the currently buggy) btrfs to do so. btrfs-sxbackup should not run twice. even if running it two times makes it fail at a lower level, that is imho not the clean way to do it. The "Operation not permitted" message is not exactly a "Btrfs-Sxbackup is already running".

also please note that linux 4.0.3+ is rarely seen on production systems. debian does not even offer a prebuilt package for amd64. this bug hit me three times on different systems with linux 3.17 - 4.0.2, causing silent backup failures and reboots.

lkraav · 2015-06-11T08:07:07Z

@bhelm has a strong argument.

masc3d · 2015-06-11T08:21:36Z

Multiple instances of btrfs-sxbackup are allowed, since you could run multiple jobs at the same time.
I cannot replicate this on 4.0.1 (at all). I see delays, but no deadlocks. How do you replicate it?

The issue is already resolved at the kernel level; I will not implement a workaround due to update woes of specific distros. But you could look into https://ma.ttias.be/prevent-cronjobs-from-overlapping-in-linux/ if you have to.

bhelm · 2015-07-08T13:18:03Z

> Multiple instances of btrfs-sxbackup are allowed, since you could run multiple jobs at the same time.

that is right, but that is not the problem I'm talking about. the problem sometimes occurs when you run multiple instances of the SAME job. this can cause deadlocks on older kernels, and btrfs-sxbackup does not support this use case either.

btrfs itself should have no problems if you create multiple snapshots of the same volume and transfer them simultaneously, but btrfs-sxbackup deletes the first partially transferred snapshot before it starts another transfer (without noticing that the transfer that belongs to that snapshot is still active).

this could be avoided by checking for already running btrfs-sxbackup instances on a per-job basis and refusing to start another transfer in that case. I'm now using the linux flock utility to achieve this. having this functionality native in btrfs-sxbackup would make job creation a bit easier and would prevent new users from running the same job twice by accident - and from running into that btrfs deadlock bug on common stable kernels, which requires a reboot to fix.

masc3d · 2015-08-06T19:35:27Z

ok. if you provide a good implementation using fcntl I will pull from you, but I don't have time to implement it for now.
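For reference, a per-job lock of the kind discussed above might look roughly like the following. This is only a minimal sketch, not code from btrfs-sxbackup: the names `job_lock` and `JobAlreadyRunning` and the lock file location are hypothetical, and it uses `fcntl.flock` from the Python standard library (the same kind of advisory lock the `flock` utility takes).

```python
import fcntl
import os
from contextlib import contextmanager


class JobAlreadyRunning(RuntimeError):
    """Raised when another process already holds the lock for this job (hypothetical name)."""


@contextmanager
def job_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    lock_path is an assumed per-job lock file, e.g. one file per backup job.
    """
    # Open (or create) the lock file; the lock belongs to this open file description.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB: fail immediately instead of waiting for the other instance.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise JobAlreadyRunning("job is already running: " + lock_path)
        yield
    finally:
        # Closing the descriptor releases the lock if it was acquired.
        os.close(fd)
```

A job would wrap its whole run, including the cleanup of temporary subvolumes, in `with job_lock(path): ...`, where `path` is a lock file chosen per job. Because the kernel drops the lock when the process exits, a killed or crashed run leaves no stale lock behind, so cleanup of leftover temporary subvolumes on the next start would still work as before.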

masc3d closed this as completed Jun 10, 2015
