ddup: Detect duplicate files.

This simple utility reads a list of filenames from stdin and checks for duplicate files (i.e. files with the same content).

The default output is a list of "ln -f" commands, but this can be changed by passing the '-f' option with a new format string.

This program does not write any changes to disk.

I wrote this simple tool out of frustration with how slow and complex other similar utilities are.

A simple way to run this program is something like:

find . -type f | ddup > merge_files.sh

Then you can review and run "merge_files.sh", or use it as a starting point for other scripts.

How this works

The algorithm is:

Read file names from standard input.
Group files by size, skipping empty files and files that share the same inode.
When more than one file has the same size, calculate the MD5 of their first bytes and build a hash table indexed by MD5.
When more than one file has the same MD5, do a full comparison to check whether they really have the same content.
For every file marked as a duplicate of another, write an "ln -f" command to standard output.
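
The cheap-to-expensive ordering of these checks can be illustrated for a single pair of files. This is only a minimal sketch and is not taken from ddup's sources: the function name `is_duplicate` and the 4096-byte buffer are assumptions made for illustration, and the MD5 pre-filter is left out because it would need an external hashing library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of ddup.
 * Returns 1 if the two paths are distinct files with identical content,
 * 0 if they are not duplicates (or need no merging), -1 on I/O error. */
int is_duplicate(const char *path_a, const char *path_b)
{
    struct stat sa, sb;
    assert(path_a != NULL && path_b != NULL);

    if (stat(path_a, &sa) != 0 || stat(path_b, &sb) != 0)
        return -1;

    /* Cheap checks first: empty files and size mismatches are skipped. */
    if (sa.st_size == 0 || sa.st_size != sb.st_size)
        return 0;

    /* Same device and inode: the two names already refer to one file. */
    if (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino)
        return 0;

    /* The "MD5 of the first bytes" filter would sit here; it is omitted
     * in this sketch, so we go straight to the full comparison. */
    FILE *fa = fopen(path_a, "rb");
    if (fa == NULL)
        return -1;
    FILE *fb = fopen(path_b, "rb");
    if (fb == NULL) {
        fclose(fa);
        return -1;
    }

    char buf_a[4096], buf_b[4096];
    int result = 1;
    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);
        if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
            result = 0;   /* first differing block ends the comparison */
            break;
        }
        if (na == 0)
            break;        /* both files reached end-of-file together */
    }
    if (ferror(fa) || ferror(fb))
        result = -1;
    fclose(fa);
    fclose(fb);
    return result;
}
```

In this sketch, a pair for which `is_duplicate` returns 1 corresponds to the case where an "ln -f" command would be written to standard output.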

There are two simple scripts:

rmlinked: recursively removes files with more than one hard link. Useful after applying the output of "ddup", to remove the duplicates from the directory you don't want to keep.
recrmdir: recursively removes all empty directories. Useful after running "rmlinked".
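
The reason "rmlinked" pairs well with the "ln -f" output is that every merged duplicate ends up with a link count above one. The check itself can be sketched as follows; the function name `is_multiply_linked` is hypothetical, and this is an illustration of the test on a single path, not the contents of the actual script.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of ddup or rmlinked.
 * Returns 1 if path is a regular file with more than one hard link,
 * 0 otherwise (including when the path cannot be examined). */
int is_multiply_linked(const char *path)
{
    struct stat st;
    assert(path != NULL);

    if (stat(path, &st) != 0)
        return 0;

    /* st_nlink counts the directory entries that point at this inode. */
    return S_ISREG(st.st_mode) && st.st_nlink > 1;
}
```

A file that has just been replaced by a hard link to its duplicate would make this return 1 for both names, which is why removing one of them leaves the content intact under the other.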
