
[Improvement] Feature to check for files [on an external disk] which are *not* present somewhere on the [backup] disk #162

Open
Wikinaut opened this issue Aug 5, 2019 · 3 comments

Comments

@Wikinaut

Wikinaut commented Aug 5, 2019

I wish to have a feature which makes intelligent use of the checksums/hashes of the huge "backup" drive X, so that when I connect a smaller drive Z to my computer, I can quickly list all those files which are

  • present on drive Z ; and/but
  • not present on drive X

This is a "one-way" check. I don't want the huge list of all differences. I only want to know which files from Z have, for one reason or another, not been copied (or later moved) to drive X, into any directory there. So basically it's a checksum/hash problem.

@Wikinaut
Author

Wikinaut commented Sep 13, 2019

Hello, can we talk about such a new feature? If you wish, I can explain again why rsync is not a solution.

It's something like https://askubuntu.com/a/767988

fdupes is an excellent program to find the duplicate files but it does not list the non-duplicate files, which is what you are looking for. However, we can list the files that are not in the fdupes output using a combination of find and grep.

@pixelb
Owner

pixelb commented Sep 13, 2019

OK, an rsync solution would work if the structure in the dest were similar to that in the source, i.e. something like:

    rsync -rl --dry-run --out-format="%f" --checksum Z/ X/

So I presume the structure of your source Z differs from that in dest X,
i.e. you want to list files not backed up, no matter where they are in Z,
so that you can copy them to the appropriate location in X, etc.

So you want the equivalent of the following, but with more efficient handling of unique file sizes, etc.:

    $ SRC=Z/; DST=X/
    $ find "$SRC" "$DST" -type f | xargs md5sum |
      sed "\|  $DST|p" |   # duplicate DST lines so their checksums are never unique
      sort | uniq -w32 -u | cut -d' ' -f3

One could avoid the overhead of scanning and checksumming $DST if it was not updated between fslint dedupe runs. In that case fslint could write an index of size,checksum,name which could be used directly in the process above.
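The index idea can be sketched as two small helpers. These names (`build_index`, `list_unbacked`) are illustrative, not fslint commands, and a full size,checksum,name index is simplified here to plain `md5sum` output:

```shell
# One-off pass over the backup disk: record "md5sum  path" per file.
build_index() {   # $1 = backup root   $2 = index file
    find "$1" -type f -print0 | xargs -0 md5sum > "$2"
}

# Print every file under the source root whose checksum does not
# appear anywhere in the index, i.e. content not yet backed up.
list_unbacked() { # $1 = source root   $2 = index file
    find "$1" -type f -print0 | xargs -0 md5sum |
      awk 'NR==FNR { seen[$1] = 1; next }            # pass 1: load index hashes
           !($1 in seen) { sub(/^[^ ]+  /, ""); print }' "$2" -
}
```

Once the index exists, checking a freshly connected drive Z only costs one scan of Z, regardless of how large the backup disk is.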

@Wikinaut
Author

Wikinaut commented Sep 13, 2019

Yes, the structure is different, or may be different, so we have to "search" for the file hash.

I also found this proposal for "fdupes" adrianlopezroche/fdupes#19

It would be good to save the hash/parse/analyze information of a specific fdupes run, in order to later compare this "virtual" file tree with a real file tree.


Currently I run the suggested sequence from https://askubuntu.com/a/767988 (see above) to list the files which are unique to backup/ (Z in my example), i.e. which are in backup/ but not in documents/. [My use case is the other way round: to look for files which are not yet anywhere in the "backup".]

    fdupes -r backup/ documents/ > dup.txt
    find backup/ -type f | grep -Fxvf dup.txt
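For the flipped use case (files on Z not yet anywhere in the backup), the same exclusion step can be pointed the other way. A self-contained sketch with illustrative paths, where the fdupes output is hand-built so the filter step can be shown on its own:

```shell
# Z holds one file already present in backup/ and one that is not.
tmp=$(mktemp -d)
mkdir -p "$tmp/Z" "$tmp/backup"
printf 'same' > "$tmp/Z/old.txt"
printf 'same' > "$tmp/backup/copy-of-old.txt"
printf 'new'  > "$tmp/Z/fresh.txt"

# Stand-in for: fdupes -r "$tmp/backup" "$tmp/Z" > dup.txt
# (fdupes itself would report this duplicate pair).
printf '%s\n%s\n' "$tmp/backup/copy-of-old.txt" "$tmp/Z/old.txt" > "$tmp/dup.txt"

# Exact-line exclusion: files on Z for which fdupes found no duplicate.
not_backed_up=$(find "$tmp/Z" -type f | grep -Fxvf "$tmp/dup.txt")
echo "$not_backed_up"
```

One caveat of this approach: a file duplicated within Z itself but absent from the backup would still be excluded, since fdupes reports intra-Z duplicate groups too.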
