A duplicate file finder. Like pydupes, but faster and less featured.

erikreed/rdupes

rdupes

rdupes is a fast duplicate file finder.

A Rust port of pydupes.

Usage/install

# Scan the given paths and write duplicates as null-delimited source-target pairs:
cargo run -- /path1 /path2 --output-path dupes.txt

# Sanity-check hardlinking all matches (echo prints the ln commands without running them):
xargs -0 -n2 echo ln --force --verbose < dupes.txt
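Once the echoed commands look right, the same invocation with `echo` removed applies the hardlinks. A minimal end-to-end sketch (the `rdupes_demo` directory and its files are made up for illustration; a real run would use the dupes.txt written by `--output-path`):

```shell
# Stage two identical files and a dupes.txt pair like the one rdupes writes:
mkdir -p rdupes_demo
echo "same contents" > rdupes_demo/original.txt
cp rdupes_demo/original.txt rdupes_demo/copy.txt
printf 'rdupes_demo/original.txt\0rdupes_demo/copy.txt\0' > rdupes_demo/dupes.txt

# Sanity check first (prints the ln commands without running them):
xargs -0 -n2 echo ln --force --verbose < rdupes_demo/dupes.txt

# Then drop `echo` to actually hardlink each dupe to its source:
xargs -0 -n2 ln --force --verbose < rdupes_demo/dupes.txt
```

Because `ln --force` replaces each dupe with a hard link to its source, both names then share one inode and the duplicate's storage is reclaimed.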

Help

cargo run -- --help
A rust port of pydupes. Super fast.

Usage: ./rdupes [OPTIONS] [PATHS]...

Arguments:
  [PATHS]...  Paths to traverse

Options:
  -o, --output-path <OUTPUT_PATH>
          Save null delimited source-dupe pairs to the output path. Use '-' for stdout
  -s, --checkpoint-save-path <CHECKPOINT_SAVE_PATH>
          Save a traversal checkpoint at the path if provided. Use '-' for stdout
  -l, --checkpoint-load-path <CHECKPOINT_LOAD_PATH>
          Load a traversal checkpoint at the path if provided
  -m, --min-file-size <MIN_FILE_SIZE>
          Filter for files of at least the minimum size [default: 4096]
  -r, --read-concurrency <READ_CONCURRENCY>
          I/O concurrency to use for reads. For SSDs, a higher value like 128 is reasonable, while HDDs should be very low, probably 1 if files are very large on average (multi-GB) [default: 4]
  -d, --disable-mmap
          Disable memory-mapped I/O for full file hashing. This helps memory benchmarking but may reduce hash speed for files already in the OS cache
  -v, --verbose
          Enable verbose/debug logging
  -h, --help
          Print help
  -V, --version
          Print version
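The options above refer to full-file hashing, and rdupes' pipeline isn't spelled out here, but duplicate finders in the pydupes family typically narrow candidates by file size before hashing any contents, since files of different sizes cannot match. A minimal Rust sketch of that two-stage idea (the function name, use of std's `DefaultHasher`, and the demo paths are illustrative assumptions, not rdupes' actual API):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Group the given files into sets of content-identical duplicates.
fn find_duplicate_groups(paths: &[PathBuf]) -> Vec<Vec<PathBuf>> {
    // Stage 1: bucket by file size -- different sizes can never match.
    let mut by_size: HashMap<u64, Vec<&PathBuf>> = HashMap::new();
    for p in paths {
        if let Ok(md) = fs::metadata(p) {
            by_size.entry(md.len()).or_default().push(p);
        }
    }
    // Stage 2: within same-size buckets, bucket by a hash of the contents.
    let mut groups = Vec::new();
    for candidates in by_size.into_values().filter(|v| v.len() > 1) {
        let mut by_hash: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for p in candidates {
            if let Ok(bytes) = fs::read(p) {
                let mut h = DefaultHasher::new();
                bytes.hash(&mut h);
                by_hash.entry(h.finish()).or_default().push(p.clone());
            }
        }
        groups.extend(by_hash.into_values().filter(|g| g.len() > 1));
    }
    groups
}

fn main() {
    let dir = std::env::temp_dir().join("rdupes_sketch");
    fs::create_dir_all(&dir).unwrap();
    let a = dir.join("a.txt");
    let b = dir.join("b.txt");
    let c = dir.join("c.txt");
    fs::write(&a, "same contents").unwrap();
    fs::write(&b, "same contents").unwrap();
    fs::write(&c, "different one").unwrap();
    let groups = find_duplicate_groups(&[a, b, c]);
    println!("{} duplicate group(s) found", groups.len());
}
```

Real tools add refinements this sketch omits: an intermediate pass that hashes only the first and last blocks of same-size files (so large non-duplicates are rarely read in full), a stronger hash function, and concurrent reads, which is what the `--read-concurrency` and `--disable-mmap` options tune.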
