A program to find duplicated files

README.md
fdup.py

README.md

Description

fdup.py is a simple and fast program that finds duplicate files.

Why?

Because it is amazingly fast: much faster than fdupes, which is written in C, and much more readable than fslint/findup.

Python is not the limiting factor; disk speed is. A sane algorithm for finding and sorting out potential duplicate files therefore matters much more than the language used. In the end it is all about the algorithm and disk performance. Don't worry: fstat calls, disk I/O, and hashing are nearly as fast in Python as in C.
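To illustrate why the algorithm matters more than the language, here is a minimal sketch of one common approach: group files by size first (a cheap stat call), then hash only the files that share a size with at least one other file. This is an assumption about what a "sane algorithm" could look like, not the actual code in fdup.py; the names `file_digest` and `find_duplicates` and the choice of SHA-1 are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return a list of groups (lists of paths) with identical content.

    Hypothetical helper for illustration, not fdup.py's implementation.
    """
    # Pass 1: group by size. This needs only a stat call per file.
    by_size = defaultdict(list)
    for path in paths:
        try:
            by_size[os.stat(path).st_size].append(path)
        except OSError:
            continue  # unreadable or vanished file: skip it

    # Pass 2: hash only the files whose size is shared with another file.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot be a duplicate; never read it
        by_hash = defaultdict(list)
        for path in same_size:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Fed with paths from standard input, for example `find_duplicates(line.rstrip("\n") for line in sys.stdin)`, this reads only the files that could possibly be duplicates, which is where most of the disk time is saved.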

Usage

$ find $PWD -type f | ./fdup.py

or to exclude the time find needs:

$ find $PWD -type f > files.txt
$ ./fdup.py < files.txt

Results

The test directory is my $HOME, which contained 62022 files. Of these, 18680 are duplicates (empty files, duplicates from svn and git repos).

Program   user     system   cpu (%)   total
fdup      3.38s    6.10s    5         3:01.89
fslint    18.04s   9.20s    12        3:41.20
fdupes    62.35s   15.46s   20        6:16.49
duff      22.59s   4.42s    6         7:18.13
dupseek   18.33s   6.55s    4         8:30.35
ftwin     15.94s   7.50s    3         9:57.91
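
The cpu column is consistent with (user + system) / total, truncated to a whole percent. The sketch below checks that, assuming the timings came from a shell `time` builtin and that total is wall-clock time in M:SS.ss form. The helper names `to_seconds` and `cpu_percent` are hypothetical.

```python
def to_seconds(total):
    """Convert an 'M:SS.ss' wall-clock string to seconds."""
    minutes, seconds = total.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_percent(user, system, total):
    """CPU utilisation as a truncated whole percent of wall-clock time."""
    return int(100 * (user + system) / to_seconds(total))


# (user seconds, system seconds, total) copied from the results table
timings = {
    "fdup": (3.38, 6.10, "3:01.89"),
    "fslint": (18.04, 9.20, "3:41.20"),
    "fdupes": (62.35, 15.46, "6:16.49"),
    "duff": (22.59, 4.42, "7:18.13"),
    "dupseek": (18.33, 6.55, "8:30.35"),
    "ftwin": (15.94, 7.50, "9:57.91"),
}

for program, (user, system, total) in timings.items():
    print(program, cpu_percent(user, system, total), to_seconds(total))
```

The low percentages mean every program spent most of its wall-clock time waiting rather than computing, which fits the claim that disk speed is the limiting factor.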