Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] restoring a subdirectory doesn't play well with include/exclude filters #463

Closed
lpancescu opened this issue Sep 11, 2020 · 4 comments · Fixed by #694
Closed

[BUG] restoring a subdirectory doesn't play well with include/exclude filters #463

lpancescu opened this issue Sep 11, 2020 · 4 comments · Fixed by #694
Assignees

Comments

@lpancescu
Copy link

lpancescu commented Sep 11, 2020

Bug summary

rdiff-backup crashes due to a Python OSError exception while trying to restore a subdirectory if I specify include/exclude filters, and the files I'm trying to restore are completely gone from the destination directory.

Version, Python, Operating System

rdiff-backup 2.0.5, Python 3.8.5, Fedora 32

Call rdiff-backup -v9 and paste the output between the backquotes below, repeat for each environment impacted:

2020-09-11 22:59:21.708502 +0200  <CLIENT-7215>  Fatal Error: No arguments given
See the rdiff-backup manual page for more information.
2020-09-11 22:59:21.703167 +0200  <CLIENT-7215>  Using rdiff-backup version 2.0.5
2020-09-11 22:59:21.703273 +0200  <CLIENT-7215>  	with cpython /usr/bin/python3 version 3.8.5
2020-09-11 22:59:21.708363 +0200  <CLIENT-7215>  	on Linux-5.8.6-201.fc32.x86_64-x86_64-with-glibc2.2.5, fs encoding utf-8

rdiff-backup call

I have a fully reproducible test case. First, let's create a few files and directories to play with:

$ mkdir -p backup original/dir{1,2}
$ cd original
$ echo file1 | tee dir1/file1.txt > dir2/file1.txt
$ echo file2 | tee dir1/file2.txt > dir2/file2.txt
$ echo base > base.txt
$ cd -

Create our first backup in the empty backup directory, then overwrite a file with garbage and :

$ rdiff-backup original backup
$ echo garbage > original/dir1/file2.txt

We can now try to selectively restore dir1:

$ rdiff-backup -r now --force --include original/dir1/file2.txt --exclude '**' backup/dir1 original/dir1
Exception '[Errno 39] Directory not empty: b'original/dir1'' raised of class '<class 'OSError'>':
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/Main.py", line 395, in error_check_Main
    Main(arglist)
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/Main.py", line 417, in Main
    take_action(rps)
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/Main.py", line 373, in take_action
    Restore(rps[0], rps[1], 1)
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/Main.py", line 720, in Restore
    restore.Restore(
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/restore.py", line 42, in Restore
    TargetS.patch(target, diff_iter)
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/restore.py", line 337, in patch
    ITR.Finish()
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/rorpiter.py", line 277, in Finish
    to_be_finished.end_process()
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/restore.py", line 762, in end_process
    self.base_rp.rmdir()
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/rpath.py", line 1222, in rmdir
    self.conn.os.rmdir(self.path)

Traceback (most recent call last):
  File "/usr/bin/rdiff-backup", line 32, in <module>
    rdiff_backup.Main.error_check_Main(sys.argv[1:])
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/Main.py", line 395, in error_check_Main
    Main(arglist)
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/Main.py", line 417, in Main
    take_action(rps)
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/Main.py", line 373, in take_action
    Restore(rps[0], rps[1], 1)
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/Main.py", line 720, in Restore
    restore.Restore(
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/restore.py", line 42, in Restore
    TargetS.patch(target, diff_iter)
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/restore.py", line 337, in patch
    ITR.Finish()
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/rorpiter.py", line 277, in Finish
    to_be_finished.end_process()
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/restore.py", line 762, in end_process
    self.base_rp.rmdir()
  File "/usr/lib64/python3.8/site-packages/rdiff_backup/rpath.py", line 1222, in rmdir
    self.conn.os.rmdir(self.path)
OSError: [Errno 39] Directory not empty: b'original/dir1'

What happened and what did you expect?

I expected rdiff-backup to restore the dir1/file1.txt instead of crashing. Even worse, dir1/file2.txt disappeared completely after the crash, but it can fortunately still be recovered from the backup directory:

$ find original/ -type f
original/dir2/file2.txt
original/dir2/file1.txt
original/dir1/file1.txt
$ ls backup/dir1
file1.txt  file2.txt

More information

Here's the -v9 output:

rdiff-backup.log

Possible workarounds:

  • restore the entire directory, losing changes to the other files: rdiff-backup -r now --force backup/dir1 original/dir1
  • use include/exclude filters without subdirectories: rdiff-backup -r now --force --include 'original/dir1/file2.txt' --exclude '**' backup original
  • use rsync, which offers its own filters (this only works for the latest backed up version, not older ones): rsync -a --include 'file2.txt' --exclude '**' backup/dir1/ original/dir1

The test case might seem contrived, since rdiff-backup backup/dir1/file1.txt original/dir1/file1.txt is easier, but I discovered this using rdiff-backup with a globbing filelist to restore several file in the /root directory - it was a bit scary to see it tried to remove the /root directory, running with root privileges.

I've been using rdiff-backup since at least 2009, many thanks for reviving the project!

@ericzolf
Copy link
Member

Yeah, I'm not exactly happy about how exclude/include is implemented currently, because either I'm too stupid or it's too complex to oversee, so either I need to become smarter or the code needs to be simplified, so that I can fix properly such issues (it's not the first of the same kind), but it'll take a bit of time until I'll come to this.

Thanks for the reproducer, it's always very helpful, and a good basis for a new test case.

@ericzolf ericzolf added the bug label Sep 26, 2020
@ericzolf
Copy link
Member

ericzolf commented Apr 1, 2022

OK, looking into it, it's due to some sub-directory effect. The first restore command does not fail whereas the 2nd one does fail (and neither does the 3rd option fail).

mkdir -p from/sub
date > from/sub/a
date > from/sub/b
rdiff-backup backup from bak
date > from/sub/b  # not necessary to reproduce
rdiff-backup --force restore --include from/sub/b --exclude '**' bak from
# vs.
rdiff-backup --force restore --include from/sub/b --exclude '**' bak/sub from/sub
# vs.
rdiff-backup --force restore --include from/sub/b --exclude '**' bak/sub/b from/sub/b

Note that replacing b with a doesn't show any difference. Without include/exclude recovery works in all cases.

@ericzolf
Copy link
Member

ericzolf commented Apr 9, 2022

I'm still not completely through with my analysis but basically the filtering is applied while recovering to both the source and the target, and the source filtering goes wrong because it mixes file index relative to the base directory of the repo (e.g. (sub, a)) with target directory relative to given path i.e. from/sub, which gives from/sub/sub/a, failing obviously selection, which is interpreted later on as a removed file in the target directory.

The removal of sub/b succeeds, but the removal of sub fails because sub/a still exists, but this is just a consequence not the actual issue. The target filtering works properly and makes sure that sub/a is ignored.

I need now to find a way to make the selection "more relative" without breaking other use cases...

@ericzolf ericzolf self-assigned this Apr 9, 2022
@ericzolf
Copy link
Member

I've tried to fix this issue but it creates issues throughout the whole code, and would introduce the need for some behavior changes which I can't justify right now (see unfinished branch ericzolf-fix-restore-sub-cludes-463). I let it sink for now, but I might just close this issue with a new "known issues" document, with a documented workaround.

ericzolf added a commit that referenced this issue Apr 17, 2022
DEV: get rid of unused RepoShadow.update_quoting function
DEV: introduce properly reference path, index, inc and type (partially renamed from restore_....)
FIX: restoring from sub-path while selecting is now forbidden to avoid data loss, workaround documented in FAQ, closes #463
ericzolf added a commit that referenced this issue Apr 17, 2022
DEV: get rid of unused RepoShadow.update_quoting function
DEV: introduce properly reference path, index, inc and type (partially renamed from restore_....)
FIX: restoring from sub-path while selecting is now forbidden to avoid data loss, workaround documented in FAQ, closes #463
ericzolf added a commit that referenced this issue Apr 21, 2022
DEV: get rid of unused RepoShadow.update_quoting function
DEV: introduce properly reference path, index, inc and type (partially renamed from restore_....)
FIX: restoring from sub-path while selecting is now forbidden to avoid data loss, workaround documented in FAQ, closes #463
ericzolf added a commit that referenced this issue Apr 29, 2022
DEV: get rid of unused RepoShadow.update_quoting function
DEV: introduce properly reference path, index, inc and type (partially renamed from restore_....)
FIX: restoring from sub-path while selecting is now forbidden to avoid data loss, workaround documented in FAQ, closes #463
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants