btrfs-sxbackup can cause btrfs deadlock #17
Comments
Thanks. I have been wondering what happens when a remote transfer job takes more than an hour and the next job begins.
Thanks for the heads up. The expected result in this case is:
A temporary subvolume can exist for various reasons, e.g. unclean termination due to kill, reset, or power outage. Refusing to run would essentially delegate the cleanup to the user in all of those other cases, which is (imho) a bad idea. Best to use a kernel version which has the fix.
My suggestion is to let btrfs-sxbackup check for itself whether it is already running, because it should not depend on (the currently buggy) btrfs to do so. btrfs-sxbackup should not run twice. Even if running it twice makes it fail at a lower level, that is imho not the clean way to handle it. The "Operation not permitted" message is not exactly a "Btrfs-Sxbackup is already running". Also please note that Linux 4.0.3+ is rarely seen on production systems. Debian does not even offer a prebuilt package for amd64. This bug hit me three times on different systems with Linux 3.17 - 4.0.2, causing silent backup failures and reboots.
@bhelm has a strong argument.
Multiple instances of btrfs-sxbackup are allowed, since you could run multiple jobs at the same time. The issue is already resolved at the kernel level, and I will not implement a workaround because of the update woes of specific distros. But you could look into https://ma.ttias.be/prevent-cronjobs-from-overlapping-in-linux/ if you have to.
That is right, but that is not the problem I'm talking about. The problem sometimes occurs when you run multiple instances of the SAME job. This can cause deadlocks on older kernels, and btrfs-sxbackup does not support this use case either. btrfs itself should have no problems if you create multiple snapshots of the same volume and transfer them simultaneously, but btrfs-sxbackup deletes the first, partially transferred snapshot before it starts another transfer (without noticing that the transfer belonging to that snapshot is still active). This could be avoided by checking for already running btrfs-sxbackup instances on a per-job basis and refusing to start another transfer in that case. I'm now using the Linux flock utility to achieve this. Having this functionality natively in btrfs-sxbackup would make job creation a bit easier and would prevent new users from running the same job twice by accident - and from running into that btrfs deadlock bug, which requires a reboot to fix, on common stable kernels.
OK. If you provide a good implementation using fcntl I will pull from you, but I don't have time to implement it for now.
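For anyone who wants to pick this up, here is a minimal sketch of what a per-job lock based on Python's `fcntl` module could look like. It is not taken from btrfs-sxbackup: the names `job_lock` and `JobAlreadyRunning`, the lock directory, and the lock file naming are all made up for illustration, and how a job would be identified inside btrfs-sxbackup is an assumption.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


class JobAlreadyRunning(RuntimeError):
    """Raised when the lock for the same job is already held (hypothetical name)."""


@contextmanager
def job_lock(job_name, lock_dir=None):
    """Hold an exclusive, non-blocking lock for one job while the body runs.

    Assumptions: job_name identifies the job uniquely, and lock_dir is any
    directory writable by every instance (defaults to the system temp dir).
    """
    lock_dir = lock_dir or tempfile.gettempdir()
    # Keep the job name usable as a single file name component.
    safe_name = job_name.replace(os.sep, "_")
    path = os.path.join(lock_dir, "sxbackup-%s.lock" % safe_name)

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            raise JobAlreadyRunning("job %r appears to be running already" % job_name)
        try:
            yield path
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A job would then wrap its whole run in `with job_lock("some-job"):` and exit with a clear message on `JobAlreadyRunning`. Different job names use different lock files, so separate jobs can still run in parallel. Because the kernel drops an flock lock when the process exits, a kill or power outage should not leave a stale lock behind that the user has to clean up.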
This problem is more related to btrfs than to btrfs-sxbackup. I'm still creating this issue to warn other users about it and because btrfs-sxbackup could avoid this problem.
When btrfs-sxbackup starts while another instance is running on the same target, btrfs-sxbackup does not recognize this and tries to delete the temp directories on both sides. This causes a deadlock in Linux kernel versions < 4.0.3 (see https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.0.3 ).
This deadlock brings the .sxbackup directory into an unusable state; only a reboot will solve this.
btrfs-sxbackup could warn and refuse to run a backup job that appears to be already running, because most users do not want to run two instances of the same job at the same time.
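To check whether a machine is below the 4.0.3 threshold mentioned above, something like the following sketch could be used. The function names and the `FIXED_IN` constant are hypothetical, and comparing the version number alone is only a rough indicator: a distribution kernel may carry a backported fix under an older version number.

```python
import platform
import re

# Kernel release that contains the fix, per the ChangeLog linked above.
FIXED_IN = (4, 0, 3)


def parse_kernel_release(release):
    """Turn a release string such as '3.16.0-4-amd64' into (3, 16, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    # A missing patch level (e.g. '4.1') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def kernel_has_fix(release=None):
    """Return True if the given (or the running) kernel is at least FIXED_IN."""
    if release is None:
        release = platform.release()
    return parse_kernel_release(release) >= FIXED_IN


if __name__ == "__main__":
    if not kernel_has_fix():
        print("warning: kernel %s is older than 4.0.3; "
              "avoid overlapping runs of the same job" % platform.release())
```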