ADD deduplicate Dropbox files based on checksum ignoring name #1674
Comments
While there may be some use cases where it makes sense for rclone to have that option, why not pull this idea out as its own feature? I feel that it could be done as its own tool, or by modifying an existing tool to monitor your Dropbox folder.
There are already such tools out there, but none can access Dropbox on their own.
This would be nice implemented either as part of the existing dedupe command or as a separate tool.
Some random thoughts:
If you are thinking about scripting it then I'd use "rclone md5sum remote:path | sort | uniq -c | sort -n" as a starting point, which will do nearly all the work for you.
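For a Dropbox remote MD5 isn't available, so the same idea would use rclone hashsum with the remote's native hash type; either way, counting only the hash column groups files regardless of their names. A rough sketch, not a tested command (remote:path is a placeholder):

```
# Sketch: list hash + path, keep only the hash column, count how often each
# hash occurs, and sort so the most-duplicated hashes end up last.
rclone md5sum remote:path | awk '{print $1}' | sort | uniq -c | sort -n
```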
Wouldn't checking hashes for everything be slow? Most dupe finders first compare sizes, and then only if identical, compare checksums.
It's unlikely. Comparing file size first is mostly a performance gain, since that information is usually available in a file system's index, i.e. quick to access, whereas computing the MD5 requires reading the whole file on most filesystems, i.e. rather slow. Cloud remotes like Dropbox already return the hash as metadata, so that cost doesn't apply here.
Will look into this, as I am also interested in adding some fslint-like functions.
Does anyone have a solution?
You could use something like this as a start, which will print all files with the same content hash.
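A sketch of what such a command could look like (not the exact snippet; the hash type is named dropbox in recent rclone versions and DropboxHash in older ones, and GNU uniq is assumed for the -w/-D options):

```
# Sketch: list every file with its 64-hex-character Dropbox content hash,
# sort so identical hashes are adjacent, then print every line whose hash
# column repeats, i.e. every file that shares its content with another file.
rclone hashsum dropbox remote: | sort | uniq -w 64 -D
```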
Is there an equivalent for Google Drive?
This will do it for Google Drive.
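Google Drive stores MD5 checksums, so the same pattern works with rclone md5sum (again a sketch; gdrive: is a placeholder remote and GNU uniq is assumed):

```
# Sketch: MD5 is 32 hex characters, so compare only that prefix and print
# every file whose hash appears more than once.
rclone md5sum gdrive: | sort | uniq -w 32 -D
```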
Wow, I didn't expect to have such a list of duplicated files... I thought rclone dedupe wouldn't let this happen... So now how do I get rid of the duplicates?
It is not an easy problem... But I'd look through the results and work out if there are whole duplicated directories that I could get rid of.
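One way to confirm a suspected pair of duplicated directories (a sketch; the paths are placeholders) is to let rclone compare them by size and hash without downloading anything:

```
# Sketch: report files that differ, are missing, or match between the two
# trees; a clean result means one copy is redundant.
rclone check remote:photos remote:photos-copy
```

If the check comes back clean, the redundant copy could then be removed, e.g. with rclone purge.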
I think the documentation should be clearer about that then, because most people assume deduplicating means deleting duplicated files regardless of whether the filenames are the same or not, i.e. by checking checksums.
Bumping this. Would be really nice to have.
FIXME needs tests
FIXME needs docs
Fixes #1674
I had an idea about this... It turned out to be very easy to add another flag to dedupe. Anyone like to give it a go? Any thoughts on the user interface? Here is a beta: v1.54.0-beta.4819.52f25b1b0.fix-1674-dedupe-by-hash on branch fix-1674-dedupe-by-hash (uploaded in 15-30 mins).
How would this work against a gdrive remote? (i.e. would gdrive report the hash, or would I need to download anything to get it?) My current use case is backing up from Dropbox to gdrive.
It will work against any remote which supplies hashes, which includes both Google Drive and Dropbox.
@ncw I'm trying out the --by-hash beta. Which dedupe modes can I use with it?
You should be able to use any of the dedupe modes
Longest or shortest name is missing from that list though!
I've merged the --by-hash flag to master now, which means it will be in the latest beta in 15-30 mins and released in v1.54.
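A sketch of the flag in use (dropbox: is a placeholder; --dry-run keeps it from deleting anything while experimenting):

```
# Sketch: group duplicates by hash rather than by name, keep the most
# recently modified file in each group, and only report what would go.
rclone dedupe --by-hash --dedupe-mode newest --dry-run dropbox:
```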
Great feature, thanks. It would be even better if there was a similar --by-size flag. By itself, --by-size would work similarly to --by-hash: consider files to be duplicates if they have the same size. But much more useful would be using the two together: --by-hash --by-size, meaning consider files duplicates if they have the same size and the same hash.
Use case:
I use Dropbox (because I'm masochistic). Dropbox likes to append (1), (2), etc. to filenames when merging two folders with identical files. I can't just delete everything with (1) etc. in the name, because it might not be a duplicate anymore.
I would like to rclone dedupe those folders by using the checksums of each file, ignoring names, but keeping the shortest name (so it would prefer `file.mp4` over `file (1).mp4`).