FastDup - http://github.com/rburchell/fastdup

The current maintainer is:
 Robin Burchell <robin+git@viroteck.net>

However, most (if not all) of the blood, sweat, and tears in FastDup, and all of the credit, belong to:
 John Brooks <special@dereferenced.net>

-- INTRODUCTION --

FastDup is a tool to find copies of the same file within directory tree(s), designed
for maximum speed and efficiency. Where many similar tools rely on checksums or
hashes of files, or on simple comparisons, FastDup uses a number of optimizations
to reduce the number of actual comparisons necessary, and as a result can scan
large sets of data extremely quickly compared to alternatives.

-- INSTALLATION --

make
make install

-- NOTES --

FastDup currently works only on OS X and modern Linux platforms, and needs GNU make to build.

Please report build problems (on non-BSD systems), bugs, or ideas at:

            http://github.com/rburchell/fastdup/issues

FastDup *may* exhibit high memory usage on data sets with large numbers of
potentially identical files.

-- TECHNICAL NOTES --

The method of comparing files falls into two steps: scanning and comparison.
Scanning is a simple(ish) walk of the directory trees that finds files and indexes
them by file size. Files that share a size with at least one other file are selected
for detailed comparison. Special care is taken to properly handle symbolic links and
other strange situations.

Comparison is, at its core, a simple byte-by-byte comparison of files in blocks, but
there are many tricks and optimizations to reduce the work from the O(n^2) pairwise
comparisons that a naive approach needs for n files of the same size.
The methods used are too numerous to list here (see compare.cpp), and are always
open to improvement. This method is used in favor of checksums or hashes because it
makes non-matching files less expensive: a hash has to read every file in full, while
a direct comparison usually eliminates a non-matching file almost immediately. The
most time-consuming comparisons are those on files that are identical, because the
entire files must be compared.
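
As a rough illustration of block-wise comparison with early exit, here is a minimal
C++ sketch. It does not reproduce the optimizations in compare.cpp; the functions
same_content and group_identical, and the 64 KiB block size, are assumptions made
for this example only.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: true when both files have identical content.
// Reads both files block by block and stops at the first differing block.
bool same_content(const std::string& path_a, const std::string& path_b,
                  std::size_t block_size = 64 * 1024) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) return false;  // unreadable files are treated as non-matching
    std::vector<char> buf_a(block_size), buf_b(block_size);
    for (;;) {
        a.read(buf_a.data(), static_cast<std::streamsize>(block_size));
        b.read(buf_b.data(), static_cast<std::streamsize>(block_size));
        if (a.bad() || b.bad()) return false;  // I/O error
        const std::streamsize got_a = a.gcount();
        const std::streamsize got_b = b.gcount();
        if (got_a != got_b) return false;  // one file ended early
        if (got_a == 0) return true;       // both files ended together
        if (!std::equal(buf_a.begin(), buf_a.begin() + got_a, buf_b.begin()))
            return false;                  // early exit on first mismatch
    }
}

// Split same-sized candidates into sets of identical files. Each file is
// compared against one representative per set instead of against every
// other file, which cuts the number of comparisons when duplicates exist.
std::vector<std::vector<std::string>> group_identical(
        const std::vector<std::string>& candidates) {
    std::vector<std::vector<std::string>> groups;
    for (const std::string& path : candidates) {
        bool placed = false;
        for (std::vector<std::string>& group : groups) {
            if (same_content(group.front(), path)) {
                group.push_back(path);
                placed = true;
                break;
            }
        }
        if (!placed) groups.push_back({path});
    }
    return groups;
}
```

In this sketch, files that differ within the first block cost a single read each,
while identical files are read to the end, which matches the cost profile
described above.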

Many changes are planned to the method for reading from files and to general
memory usage.
