Python utility which will recursively scan one or more given directories for duplicate files.
Files are considered duplicate based on their identical binary representation:
- Files are scanned and grouped by file size to quickly rule out non-duplicates.
- Grouped files then have SHA-1 hashes calculated - those that match are duplicates.
Files to consider can optionally be filtered based on:
- One or more glob filespecs.
- Minimum file size.
usage: identix.py [-h] [--include [INCLUDE [INCLUDE ...]]]
[--min-size MIN_SIZE] [--progress]
[--report-file REPORT_FILE]
[--report-file-format {text,JSON}]
scandir [scandir ...]
Recursively scan one or more directories for duplicate files.
positional arguments:
scandir source directory/directories for scanning
optional arguments:
-h, --help show this help message and exit
--include [INCLUDE [INCLUDE ...]]
glob filespec(s) to include in scan, if omitted all
files are considered
--min-size MIN_SIZE minimum filesize considered
--progress show progress during file diffing
--report-file REPORT_FILE
send duplicate report to file, rather than console
--report-file-format {text,JSON}
format of duplicate report file
Notes:
-
The
--include
argument evaluates filename only, so expects globs such as*.jpg
orimage*.png
. -
Omitting
--report-file
output file argument will display results directly on the console -
Option
--report-file-format
enables--report-file
asJSON
- format example:[ { "sha-1": "xxxxx", "size": 12345, "fileList": ["/path/to/file","/path/to/another/file"] }, { "sha-1": "yyyyy", "size": 6789, "fileList": ["/path/to/yet/another/file","/one/more/file"] }, ]
Scan for duplicates greater than or equal to 2048
bytes in the directories of /dupe/path/one
and /dupe/path/two
:
$ ./identix.py \
--min-size 2048 \
-- /dupe/path/one /dupe/path/two
Find duplicates that match file globs of *.jpg
and *.png
in /my/images
, write results to /path/to/report.txt
and display processing progress to console:
$ ./identix.py \
--include "*.jpg" "*.png" \
--progress \
--report-file /path/to/report.txt \
-- /my/images