FAQ
You can use system predicates and queries to limit your search.
For instance, to run it on at most 1000 archived files that are under 50 MB, tagged character:jacob, and imported within the last hour:
--query="system:filesize < 50MB" --query="system:limit 1000" --query="system:archive" --query="character:jacob" --query="system:import time < 1 hour"
Each query further narrows the set of files to process. By default, all video and animated files are processed.
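If you launch the deduplicator from a script, the queries above can be passed as a list of arguments. This is a minimal sketch that assumes the program is started as python3 -m hydrusvideodeduplicator; adjust the base command to match how you run it, and add your usual connection options (API key, URL), which are omitted here. build_command is a hypothetical helper, not part of the program.

```python
import shlex


def build_command(queries, base=("python3", "-m", "hydrusvideodeduplicator")):
    """Return an argv list with one --query=... flag per Hydrus predicate.

    Hypothetical helper: the base command is an assumption about how
    the program is launched.
    """
    # One --query flag per predicate; each one narrows the file set further.
    return list(base) + [f"--query={q}" for q in queries]


queries = [
    "system:filesize < 50MB",
    "system:limit 1000",
    "system:archive",
    "character:jacob",
    "system:import time < 1 hour",
]
argv = build_command(queries)

# Print a copy-pasteable shell command (shlex.join quotes spaces and < signs).
print(shlex.join(argv))
```

Passing the list directly, for example to subprocess.run(argv, check=True), avoids shell quoting problems with the spaces and comparison signs in the predicates; shlex.join is only used here to show the equivalent shell command.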
I highly recommend running the program in batches. You won't lose progress if you stop the program, but consider how much you really want to sort through in duplicates processing. For performance reasons, I also HIGHLY recommend not hashing long videos if most of your library consists of small files. You can either use --skip-hashing or use a query that limits when files were imported:
--query="system:import time > 1 day"
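One way to batch a large library is to split it into file-size windows and run the program once per window, starting with the smallest files. The sketch below only generates the query lists; the window edges are arbitrary example values, and it assumes Hydrus accepts both the "<" and ">" forms of the system:filesize predicate. size_batches is a hypothetical helper, not part of the program.

```python
def size_batches(edges_mb):
    """Split a library into file-size windows, one query list per batch.

    Hypothetical helper. edges_mb=[10, 50] gives three batches:
    under 10 MB, between 10 and 50 MB, and over 50 MB.
    """
    if not edges_mb:
        raise ValueError("need at least one size edge")
    batches = []
    lower = None
    for upper in edges_mb:
        batch = []
        if lower is not None:
            batch.append(f"system:filesize > {lower}MB")
        batch.append(f"system:filesize < {upper}MB")
        batches.append(batch)
        lower = upper
    # Final open-ended batch for the largest files (e.g. long videos).
    batches.append([f"system:filesize > {lower}MB"])
    return batches


for batch in size_batches([10, 50]):
    # Each list becomes one run: one --query flag per entry.
    print(" ".join(f'--query="{q}"' for q in batch))
```

Because strict < and > comparisons are used, a file whose size falls exactly on an edge is not matched by any batch; shift an edge slightly if that matters to you. The last batch holds the largest files, so you can postpone or skip it.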
You can also stop the program at any time with CTRL+C.
All video and animated files (GIF, APNG) are supported.
If a video that is not corrupted fails to perceptually hash, please create an issue on GitHub.
If a bad file crashes the whole program, also create an issue. Skipping files is fine, but crashing is not.
This is expected, and it only affects files that were already searched BEFORE you set the new threshold: those files are not searched again.
On your next run, use --clear-search-cache so that all videos are compared to each other from the start.
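To see why an already-searched file is not re-checked with a new threshold, here is a deliberately simplified toy model. It is NOT the program's actual code or data structures: hashes are stand-in integers, "distance" is their absolute difference, and the searched set plays the role of the search cache. ToySearch and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToySearch:
    """Hypothetical, simplified model of an incremental duplicate search."""

    hashes: dict  # file id -> stand-in hash (an int here)
    searched: set = field(default_factory=set)  # stands in for the search cache
    pairs: set = field(default_factory=set)  # potential duplicate pairs found

    def run(self, threshold):
        for fid, h in self.hashes.items():
            if fid in self.searched:
                continue  # already searched: the new threshold never reaches it
            for other, oh in self.hashes.items():
                if other != fid and abs(h - oh) <= threshold:
                    self.pairs.add(frozenset((fid, other)))
            self.searched.add(fid)  # remember this file as searched

    def clear_search_cache(self):
        # Analogous to --clear-search-cache: every file gets searched again.
        self.searched.clear()


s = ToySearch({"a": 0, "b": 5, "c": 20})
s.run(threshold=1)   # strict threshold: no pairs, but all files are now "searched"
s.run(threshold=10)  # looser threshold has no effect: everything is cached
s.clear_search_cache()
s.run(threshold=10)  # now a and b (distance 5) are reported
print(s.pairs)
```

In this model, only clearing the cache lets the looser threshold apply to old files, which mirrors the advice above.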
Files are retrieved from Hydrus in order of increasing file size. This ordering also affects searching, because the database is stored in the same order.
If this is a problem for you and you think it should be changed, please create an issue and explain why.
The perceptual hasher should produce very few false positives, but if you want to start over, you can reset your potential duplicates in Hydrus from the duplicates processing page.
After that, you also have to reset your search cache with --clear-search-cache.