dupdetect

Detect duplicate directories and files.

When told to compare files by size only (-s), dupdetect is fast: for example, about 1.3 seconds to compare a 100 GB directory containing 13k files and 2k directories.

usage

dupdetect: help
  -d DIR  --directory=DIR  top directory
  -v      --verbose        be verbose
  -f      --files          also display duplicate files
  -s      --size-only      compare files by size only
  -h      --help           display help
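
The help text above has the same layout that usageInfo from base's System.Console.GetOpt produces, so the sketch below assumes the flags are parsed that way. It is a minimal illustration, not the project's actual Main.hs. The Flag type, its constructors, parseArgs and usageHeader are hypothetical names.

```haskell
import System.Console.GetOpt
import System.Environment (getArgs)

-- Hypothetical flag type; the real Main.hs may use different names.
data Flag
  = Directory FilePath -- -d DIR / --directory=DIR
  | Verbose            -- -v / --verbose
  | Files              -- -f / --files
  | SizeOnly           -- -s / --size-only
  | Help               -- -h / --help
  deriving (Eq, Show)

options :: [OptDescr Flag]
options =
  [ Option ['d'] ["directory"] (ReqArg Directory "DIR") "top directory"
  , Option ['v'] ["verbose"]   (NoArg Verbose)          "be verbose"
  , Option ['f'] ["files"]     (NoArg Files)            "also display duplicate files"
  , Option ['s'] ["size-only"] (NoArg SizeOnly)         "compare files by size only"
  , Option ['h'] ["help"]      (NoArg Help)             "display help"
  ]

usageHeader :: String
usageHeader = "dupdetect: help"

-- Parse argv; on unknown options or stray arguments, return the error
-- messages followed by the usage text.
parseArgs :: [String] -> Either String [Flag]
parseArgs argv =
  case getOpt Permute options argv of
    (flags, [], []) -> Right flags
    (_, extra, errs) ->
      Left (concat errs ++ stray extra ++ usageInfo usageHeader options)
  where
    stray [] = ""
    stray xs = "unexpected arguments: " ++ unwords xs ++ "\n"

main :: IO ()
main = do
  argv <- getArgs
  case parseArgs argv of
    Left msg -> putStr msg
    Right flags
      | Help `elem` flags -> putStr (usageInfo usageHeader options)
      | otherwise -> print flags
```

For example, `dupdetect -d /data -s` would parse to `[Directory "/data",SizeOnly]` under this sketch.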

todo

lazy hashing

Automatically compare by size first, and compute a file's hash only when it is needed (that is, only when another file has the same size), exploiting laziness.
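A minimal sketch of the idea, assuming each file is described by a record whose hash field is an ordinary lazy value. FileInfo, groupsOf and duplicates are hypothetical names, and the String hash stands in for a real content digest. In a real program the hash could be produced by an IO action deferred with unsafeInterleaveIO (System.IO.Unsafe), so a file's contents are read only when its hash is demanded; that wiring is left out here.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical file record. Fields are lazy, so `hash` is only computed
-- if something actually forces it.
data FileInfo = FileInfo
  { path :: FilePath
  , size :: Integer
  , hash :: String -- placeholder for a real content digest
  }

-- Sort and group by a key, keeping only groups with more than one member.
-- Note: sortOn evaluates the key of every element it is given.
groupsOf :: Ord k => (a -> k) -> [a] -> [[a]]
groupsOf key = filter ((> 1) . length) . groupBy ((==) `on` key) . sortOn key

-- Group by size first (cheap); hashes are forced only inside size
-- collisions, so a file with a unique size is never hashed.
duplicates :: [FileInfo] -> [[FilePath]]
duplicates files =
  [ map path grp
  | sameSize <- groupsOf size files
  , grp <- groupsOf hash sameSize
  ]

main :: IO ()
main = print (duplicates sample)
  where
    sample =
      [ FileInfo "a.txt" 10 "h1"
      , FileInfo "b.txt" 10 "h1"
      , FileInfo "c.txt" 10 "h2"
      , FileInfo "d.txt" 99 (error "never forced: size 99 is unique")
      ]
```

This prints [["a.txt","b.txt"]]. The error thunk for d.txt is never evaluated, which is exactly the laziness the todo item wants to exploit.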

fuzzy matching

Detect when directories "mostly" match. For example, one directory has a few extra files, or both have the same number of files but some of the sizes differ slightly.
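One possible scoring scheme, sketched under the assumption that a directory is summarized as a list of (file name, size) pairs with unique names. Files are matched by name, sizes count as equal within a relative tolerance, and the score is taken against the larger listing so that extra files lower it. Listing, closeEnough, similarity, mostlyMatch and the threshold values are all hypothetical; this is one option among many, not the project's design.

```haskell
-- Hypothetical summary of one directory: (file name, size in bytes).
type Listing = [(FilePath, Integer)]

-- True if two sizes differ by at most `tol` (e.g. 0.01 = 1%) of the larger.
closeEnough :: Double -> Integer -> Integer -> Bool
closeEnough tol a b =
  fromIntegral (abs (a - b)) <= tol * fromIntegral (max a b)

-- Fraction of files, relative to the larger listing, that appear by name in
-- both listings with sizes within the tolerance.
similarity :: Double -> Listing -> Listing -> Double
similarity tol xs ys
  | null xs && null ys = 1
  | otherwise = fromIntegral matched / fromIntegral (max (length xs) (length ys))
  where
    matched =
      length [ () | (name, s) <- xs, Just t <- [lookup name ys], closeEnough tol s t ]

-- Two directories "mostly match" when the score reaches the threshold.
mostlyMatch :: Double -> Double -> Listing -> Listing -> Bool
mostlyMatch tol threshold xs ys = similarity tol xs ys >= threshold

main :: IO ()
main = do
  print (similarity 0.01 dirA dirB)
  print (mostlyMatch 0.01 0.7 dirA dirB)
  where
    dirA, dirB :: Listing
    dirA = [("x.jpg", 1000), ("y.jpg", 2000), ("z.jpg", 3000)]
    dirB = [("x.jpg", 1000), ("y.jpg", 2010), ("z.jpg", 3000), ("extra.jpg", 500)]
```

Here dirB has one extra file and one slightly larger file, so the score is 3/4 = 0.75 and the pair counts as a near match at a 0.7 threshold.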

speed

Hashing is very, very slow.

Experiment with compiler options.

output

Make it prettier, more configurable, and more machine-parseable.

big directories

Stack space may be exhausted when operating on large directories. Experiment with the -K runtime (RTS) option, which sets the maximum stack size.
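The source does not say where the stack is consumed. A common cause in Haskell is a lazy accumulator that builds a long chain of thunks during a large traversal, so the sketch below assumes dupdetect sums sizes or counts with such a fold. totalSizeLazy, totalSizeStrict, Stats, addFile and summarize are hypothetical names.

```haskell
import Data.List (foldl')

-- Lazy left fold: builds one (+) thunk per element and only evaluates the
-- chain at the end, which can take stack proportional to the list length.
-- (With -O, GHC's strictness analysis often fixes simple cases like this,
-- but it is not guaranteed.)
totalSizeLazy :: [Integer] -> Integer
totalSizeLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step.
totalSizeStrict :: [Integer] -> Integer
totalSizeStrict = foldl' (+) 0

-- foldl' only forces to weak head normal form, so a lazy tuple accumulator
-- would still build thunks inside it; strict fields (!) avoid that.
data Stats = Stats !Int !Integer -- file count, total bytes
  deriving (Eq, Show)

addFile :: Stats -> Integer -> Stats
addFile (Stats n total) sz = Stats (n + 1) (total + sz)

summarize :: [Integer] -> Stats
summarize = foldl' addFile (Stats 0 0)

main :: IO ()
main = do
  let sizes = [1 .. 1000000] :: [Integer]
  print (totalSizeStrict sizes)
  print (summarize sizes)
```

Making accumulators strict addresses the cause; raising the stack limit with -K only moves the point at which a thunk chain becomes too long.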

author

Mike Erickson mike.erickson@gmail.com
