
I have a big library. How do I test this on just a few files?

You can use system predicates and queries to limit your search.

For instance, if you want to run it on 1000 archived files that are larger than 10 MB, are tagged character:jacob, and were imported within the last hour:

--query="system:filesize > 10MB" --query="system:limit 1000" --query="system:archive" --query="character:jacob" --query="system:import time < 1 hour"

Each query predicate further narrows the set of files you process. By default, all video and animated files are processed.

I highly recommend running the program in batches. You won't lose progress if you stop the program, but consider how many potential duplicates you really want to sort through in Hydrus's duplicates processing. For performance reasons, I also HIGHLY recommend not hashing long videos if most of your library is made up of small files.
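
For example, a first batched pass over only smaller files might look like this (a sketch; the python3 -m hydrusvideodeduplicator entry point is an assumption here, so adjust it to however you normally invoke the program):

python3 -m hydrusvideodeduplicator --query="system:filesize < 50MB" --query="system:limit 1000"

You can then raise the size bound on later runs to work through the rest of the library.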


I want to search for duplicates without hashing new video files

You can either use --skip-hashing or use a query that limits results by import time:

  • --query="system:import time > 1 day"
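
For example (a sketch, again assuming the python3 -m hydrusvideodeduplicator entry point; your exact invocation may differ):

python3 -m hydrusvideodeduplicator --skip-hashing

The query above instead restricts the search to files imported more than a day ago, so newly imported files are never hashed.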

Also, you can stop the program at any time with CTRL+C.


What kind of files does it support?

All video and animated files (GIFs, APNG) are supported.

If you find a video that is not corrupted but fails to perceptually hash, create an issue on GitHub.

If a bad file crashes the whole program, also create an issue. Skipping bad files is fine, but crashing is not.


I changed my threshold but it didn't find any new duplicates

This is expected. The threshold only applies to comparisons made after you change it; files that were already compared under the old threshold are stored in the search cache and will not be compared again.

On your next run, add --clear-search-cache and the program will compare all videos against each other from the start.
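
A sketch of such a run (same assumed entry point as above):

python3 -m hydrusvideodeduplicator --clear-search-cache

With the cache cleared, the next search compares every video against every other video using your current threshold.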


Why does everything slow down the longer it runs?

Files are retrieved from Hydrus in order of increasing file size, so each file takes longer to download and hash than the last. Searching slows down for the same reason, because the search database is ordered the same way.

If this is an issue for you and you think it should be changed, please create an issue and explain why.

I set the threshold too low and now I have too many potential duplicates

While the perceptual hasher should produce very few false positives, you can reset your potential duplicates from the duplicates processing page in Hydrus.

Then, you also have to reset your search cache with --clear-search-cache.
