Change image hash compare algorithm and add multithreading #762

Merged

qarmin merged 3 commits into master from change_hash_compare_algo on Jul 2, 2022

Conversation

qarmin
Owner

@qarmin qarmin commented Jun 26, 2022

Closes #679
Closes #761
Helps #512

With the new algorithm, performance at every similarity setting should be very close to that of the Fast Compare option, which became unnecessary and was removed.
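To make "similarity" concrete, here is a minimal sketch. It assumes, as a simplification, that each image is reduced to a 64-bit perceptual hash and that the similarity setting is the maximum number of bits (Hamming distance) in which two hashes may differ. The `ImageHash` alias and both functions are hypothetical illustrations, not Czkawka's code.

```rust
/// Hypothetical: a perceptual image hash packed into 64 bits.
/// The real hash size and type used by the app may differ.
type ImageHash = u64;

/// Number of differing bits between two hashes (Hamming distance).
fn hamming_distance(a: ImageHash, b: ImageHash) -> u32 {
    (a ^ b).count_ones()
}

/// Two images count as similar when their hashes differ in at most
/// `similarity` bits; a threshold of 0 demands identical hashes.
fn is_similar(a: ImageHash, b: ImageHash, similarity: u32) -> bool {
    hamming_distance(a, b) <= similarity
}

fn main() {
    let a: ImageHash = 0b1011_0000;
    let b: ImageHash = 0b1001_0001; // differs from `a` in 2 bits
    println!("distance = {}", hamming_distance(a, b)); // distance = 2
    println!("similar at 2: {}", is_similar(a, b, 2)); // true
    println!("similar at 1: {}", is_similar(a, b, 1)); // false
}
```

A higher similarity value therefore accepts more pairs, which is why scan cost and result quality both depend on it.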

Results may be slightly worse than with the previous algorithm, but the scan speedup, especially at higher similarity values, should be clearly visible.

Because comparison is now multi-threaded, checking should be much faster with larger numbers of files, since most operations run in memory on the CPU, without any disk access (similarity 0 is a special case in which comparing images should be almost instant).
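A minimal sketch of why this parallelizes well: once the hashes are in memory, every comparison is pure CPU work, so rows of the comparison matrix can be split across threads with `std::thread::scope`, and similarity 0 can skip pairwise comparison entirely by grouping identical hashes in a `HashMap`. This is a hypothetical brute-force illustration under the same 64-bit-hash assumption as above; the PR's actual algorithm, data structures, and thread pool are not shown here and may differ.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical 64-bit perceptual hash; the real hash type may differ.
type ImageHash = u64;

/// Returns index pairs (i, j), i < j, whose hashes differ in at most
/// `similarity` bits. Works purely on in-memory hashes (no disk I/O).
fn find_similar_pairs(hashes: &[ImageHash], similarity: u32) -> Vec<(usize, usize)> {
    if similarity == 0 {
        // Special case: only identical hashes match, so grouping by hash
        // value in a HashMap replaces all pairwise comparisons.
        let mut groups: HashMap<ImageHash, Vec<usize>> = HashMap::new();
        for (i, &h) in hashes.iter().enumerate() {
            groups.entry(h).or_default().push(i);
        }
        let mut pairs = Vec::new();
        for indices in groups.values() {
            for (k, &i) in indices.iter().enumerate() {
                for &j in &indices[k + 1..] {
                    pairs.push((i, j));
                }
            }
        }
        pairs.sort_unstable();
        return pairs;
    }

    let threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let mut pairs: Vec<(usize, usize)> = thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                // Thread `t` handles rows t, t + threads, t + 2 * threads, ...
                s.spawn(move || {
                    let mut local = Vec::new();
                    let mut i = t;
                    while i < hashes.len() {
                        for j in i + 1..hashes.len() {
                            if (hashes[i] ^ hashes[j]).count_ones() <= similarity {
                                local.push((i, j));
                            }
                        }
                        i += threads;
                    }
                    local
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });
    pairs.sort_unstable();
    pairs
}

fn main() {
    let hashes: Vec<ImageHash> = vec![0b0000, 0b0001, 0b1111, 0b0000];
    // Similarity 0: only the two identical hashes match.
    println!("{:?}", find_similar_pairs(&hashes, 0)); // [(0, 3)]
    // Similarity 1: hashes differing in at most one bit also match.
    println!("{:?}", find_similar_pairs(&hashes, 1)); // [(0, 1), (0, 3), (1, 3)]
}
```

Interleaving rows is one simple way to balance work, since early rows contain more comparisons than late ones; a real implementation would more likely use a thread pool and an index structure instead of all-pairs comparison.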

Because the Similarity enum was removed, the image cache file is no longer backward compatible. The JSON config can still be exported, edited manually, and imported back into the app (see the docs for more info).

The only problem I see is that I can't really test performance, because with the ~8K images I'm able to scan

In my testing on 20K images, the new algorithm (4/8 processor) is much faster than the old normal scan, but slightly slower than Fast Compare:

Similarity   New Normal   Old Fast Compare   Old Normal
20           9            6                  106
2            2            2                  2
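Reading the figures as scan times in the same (unstated) unit, the similarity-20 row gives these ratios:

```latex
\frac{t_{\text{old normal}}}{t_{\text{new normal}}} = \frac{106}{9} \approx 11.8,
\qquad
\frac{t_{\text{new normal}}}{t_{\text{old fast}}} = \frac{9}{6} = 1.5
```

So at similarity 20 the new algorithm is roughly 12 times faster than the old normal scan and takes 1.5 times as long as the old Fast Compare; at similarity 2 all three variants tie, so every ratio there is 1.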

@qarmin qarmin added the enhancement New feature or request label Jun 26, 2022
@qarmin qarmin marked this pull request as draft June 26, 2022 19:28
@chchia

chchia commented Jun 27, 2022

Due removing Similarity enum, image cache file is not longer backward compatible

That means I will have to rehash a total of 1.8 million images again, which would be really bad. Could you keep the option to use the old method? Rehashing 1.8 million files is too much for me, since I store my files in online storage...

@qarmin
Owner Author

qarmin commented Jun 28, 2022

From the app's perspective, using an integer instead of an enum is only a cosmetic change (an enum with a single field doesn't make much sense), but maybe it is not worth a breaking change.
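As an illustration of this point, here is a hypothetical single-variant enum standing in for the removed `Similarity` type (its real definition is not shown in this thread). Converting to and from `u32` is lossless, so inside the app the change is cosmetic, but anything persisted in the enum's shape no longer matches. For example, with serde's default externally tagged representation, `Similar(20)` serializes to JSON as `{"Similar":20}`, while a plain `u32` serializes as `20`.

```rust
/// Hypothetical stand-in for the removed `Similarity` enum:
/// a single variant wrapping the similarity threshold.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Similarity {
    Similar(u32),
}

impl From<Similarity> for u32 {
    fn from(s: Similarity) -> u32 {
        match s {
            Similarity::Similar(v) => v,
        }
    }
}

impl From<u32> for Similarity {
    fn from(v: u32) -> Self {
        Similarity::Similar(v)
    }
}

fn main() {
    let old = Similarity::Similar(20);
    let new: u32 = old.into();
    // In memory both forms carry exactly the same information...
    assert_eq!(Similarity::from(new), old);
    // ...but their textual forms differ, so data stored in the old
    // shape would not be read back as the new one without migration.
    println!("{:?} vs {:?}", old, new); // prints "Similar(20) vs 20"
}
```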

@chchia

chchia commented Jun 30, 2022

I was thinking maybe you could consider comparison within folders.

For example, a user may have hundreds of folders, with the pictures in each folder already categorized. I might shoot 30 photos of the same scene with very minor differences, and all those folders are placed under a single parent folder.

So if there were an option, say "Compare images within the same folder only", I believe we would get these significant benefits:

  1. Shorter comparison time, as we don't have to compare each file against all folders, only against the images within the same folder.

  2. I could set the Similarity as high as 20, and pictures of the same scene would be more likely to be grouped together. When comparing all photos, mistakes are more likely.
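The "same folder only" idea was not implemented in this PR. A hypothetical sketch of it groups images by parent directory and compares hashes only within each group, under the same 64-bit Hamming-distance assumption as above; `similar_within_folders` and `comparison_counts` are illustrative names, not app functions. The second function shows where the speedup would come from: comparing all pairs costs n(n-1)/2 comparisons, while per-folder comparison only sums k(k-1)/2 over the folder sizes k.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical 64-bit perceptual hash; the real hash type may differ.
type ImageHash = u64;

/// Hypothetical "compare images within the same folder only" mode:
/// images are grouped by parent folder and compared only inside a group.
fn similar_within_folders(
    images: &[(PathBuf, ImageHash)],
    similarity: u32,
) -> Vec<(PathBuf, PathBuf)> {
    let mut by_folder: HashMap<&Path, Vec<&(PathBuf, ImageHash)>> = HashMap::new();
    for entry in images {
        // `parent()` is None only for a root or empty path; those share one group.
        let folder = entry.0.parent().unwrap_or(Path::new(""));
        by_folder.entry(folder).or_default().push(entry);
    }

    let mut pairs = Vec::new();
    for group in by_folder.values() {
        for (k, a) in group.iter().enumerate() {
            for b in &group[k + 1..] {
                // Same Hamming-distance threshold as a whole-collection scan.
                if (a.1 ^ b.1).count_ones() <= similarity {
                    pairs.push((a.0.clone(), b.0.clone()));
                }
            }
        }
    }
    pairs.sort();
    pairs
}

/// Hash comparisons needed: all pairs across the collection versus
/// pairs within each folder, given the number of images per folder.
fn comparison_counts(folder_sizes: &[u64]) -> (u64, u64) {
    let n: u64 = folder_sizes.iter().sum();
    let all_pairs = n * n.saturating_sub(1) / 2;
    let within_folders: u64 = folder_sizes.iter().map(|&k| k * k.saturating_sub(1) / 2).sum();
    (all_pairs, within_folders)
}

fn main() {
    let images: Vec<(PathBuf, ImageHash)> = vec![
        (PathBuf::from("trip/a.jpg"), 0b0000),
        (PathBuf::from("trip/b.jpg"), 0b0001),
        (PathBuf::from("home/c.jpg"), 0b0000),
    ];
    // c.jpg has the same hash as a.jpg but sits in another folder,
    // so only the trip/a.jpg and trip/b.jpg pair is reported.
    println!("{:?}", similar_within_folders(&images, 1));
    // Illustrative numbers: 1,000 folders of 30 shots each.
    println!("{:?}", comparison_counts(&[30u64; 1000])); // (449985000, 435000)
}
```

With those illustrative numbers, a full scan needs 449,985,000 comparisons while a per-folder scan needs 435,000. The trade-off is that duplicates spread across different folders would no longer be found.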

@qarmin qarmin marked this pull request as ready for review July 2, 2022 19:03
@qarmin qarmin merged commit d1c66fd into master Jul 2, 2022
@qarmin qarmin deleted the change_hash_compare_algo branch July 2, 2022 19:31
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

is Fast Compare no longer working?
Low CPU utilization during image hash comparison
2 participants