Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
FastDup is a tool to find copies of the same file within directory tree(s)
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Type||Name||Latest commit message||Commit time|
|Failed to load latest commit information.|
FastDup - http://github.com/rburchell/fastdup The current maintainer is: Robin Burchell <firstname.lastname@example.org> However, most(/all) of the blood, sweat, and tears in FastDup - and all of the credit belongs to: John Brooks <email@example.com> -- INTRODUCTION -- FastDup is a tool to find copies of the same file within directory tree(s), designed for maximum speed and efficiency unlike most similar tools. Where many similar tools rely on checksums or hashes of files, or simple comparisons, fastdup uses a number of cleverly optimized tricks to reduce the number of actual comparisons necessary, and as a result can scan large sets of data extremely quickly compared to alternatives. -- INSTALLATION -- make make install -- NOTES -- FastDup will currently only work on OS X and modern Linux platforms with GNU make. Please report build problems (on non-BSD systems), bugs, or ideas at: http://github.com/rburchell/fastdup/issues FastDup *may* exhibit high memory usage on data sets with large numbers of potentially identical files. -- TECHNICAL NOTES -- The method of comparing files falls in two steps; scanning and comparison. Scanning is a simple(ish) walk of directory trees searching out files and indexing them by their filesize. Multiple files with the same size are selected for detailed comparison. Special efforts are taken to properly handle symbolic links and other strange situations. Comparison is, at the core, simple byte-by-byte comparison of files in blocks, but there are many tricks and optimizations to reduce this from O(n^n) comparisons. The exact and numerous methods used are too extensive to list here (see compare.cpp), and are always open to improvement. This method is used in favor of checksums or hashes to make non-matching files less expensive; these are usually eliminated almost immediately. The most time consuming comparisons are those on files that are identical, because the entire files must be compared. There are many planned changes to the method for reading from files and general memory usage.