Images from reference folders being compared to each other #686

Closed
gotr3k opened this issue Apr 20, 2022 · 3 comments · Fixed by #826
Labels
enhancement New feature or request PR welcome The given topic has already been analyzed and you can safely create a PR implementing this functiona

Comments

@gotr3k

gotr3k commented Apr 20, 2022

I might be getting something wrong here, but why are images from reference folders being compared to each other? As I understand it, when a reference folder is selected, all images in non-reference folders are compared with one another and with the images in the reference folders. In practice, however, images inside the reference folder are also compared with each other, for no apparent reason. If the point is to delete files from a specific location, that can already be achieved with custom select, which makes the reference option unnecessary and just slows things down. In my specific case there is a main dataset with ~500k images, and if I want to check a folder with 100 images for possible duplicates before adding it to the main dataset, the program compares images within the reference folder, making the whole process take far longer than is actually needed. Again, I might be wrong about this and may have just picked the wrong options, but if not, it really should be fixed.
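To put rough numbers on the 500k + 100 case, here is a minimal sketch that counts comparisons under a naive model where every pair of images is compared exactly once. That brute-force model is an assumption made for illustration (the program's real search strategy may differ), and `naive_pair_count` is a hypothetical helper, not part of the project.

```python
from math import comb


def naive_pair_count(n_reference: int, n_other: int) -> dict:
    """Count pairwise comparisons under a brute-force model (an assumption)."""
    total = n_reference + n_other
    # Every image compared with every other image, reference or not.
    all_pairs = comb(total, 2)
    # Pairs where both images live in the reference folder.
    reference_only_pairs = comb(n_reference, 2)
    # Pairs that involve at least one non-reference image.
    needed_pairs = all_pairs - reference_only_pairs
    return {
        "all_pairs": all_pairs,
        "reference_only_pairs": reference_only_pairs,
        "needed_pairs": needed_pairs,
    }


counts = naive_pair_count(n_reference=500_000, n_other=100)
for label, value in counts.items():
    print(f"{label}: {value:,}")
```

Under this model, roughly 50 million pairs involve at least one of the 100 new images, while about 125 billion pairs lie entirely inside the reference dataset.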

@qarmin
Owner

qarmin commented Apr 22, 2022

This is not a bug, but a sub-optimal algorithm.

I used this solution because it was simple: it only required adding ~40 lines of code at the end of the functions.

For duplicate mode I don't think the algorithm needs to be modified, since hashes are cached and comparing them is quite cheap.
In the similar images tool, comparing a larger number of files may be slow, so in this situation an early check for reference folders could be implemented.
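As a rough illustration of what an "early check" could look like, the sketch below skips reference-to-reference pairs before any distance is computed. It is not the project's code: it assumes images are represented by integer hashes compared by Hamming distance in a brute-force loop, and `hamming`, `similar_pairs` and the sample data are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def similar_pairs(items, max_distance, skip_reference_pairs=True):
    """Return (pairs, computed) for items given as (name, hash, is_reference).

    `computed` counts how many distance calculations were actually performed.
    """
    pairs = []
    computed = 0
    for i in range(len(items)):
        name_a, hash_a, ref_a = items[i]
        for j in range(i + 1, len(items)):
            name_b, hash_b, ref_b = items[j]
            # Early check: two reference images never need to be compared.
            if skip_reference_pairs and ref_a and ref_b:
                continue
            computed += 1
            if hamming(hash_a, hash_b) <= max_distance:
                pairs.append((name_a, name_b))
    return pairs, computed


# Hypothetical sample data: two reference images and two new ones.
items = [
    ("ref1", 0b11110000, True),
    ("ref2", 0b11110001, True),
    ("new1", 0b11110011, False),
    ("new2", 0b00001111, False),
]

early_pairs, early_computed = similar_pairs(items, max_distance=2)
all_pairs, all_computed = similar_pairs(items, max_distance=2, skip_reference_pairs=False)
print(early_pairs, early_computed)
print(all_pairs, all_computed)
```

With the early check, the reference-only pair is never evaluated, so the saving grows with the square of the reference folder size, whereas filtering results at the end still pays for every comparison.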

@timohuovinen

Do you mean that each added folder is being treated as one? I think this is the way I prefer it to work.

@gotr3k
Author

gotr3k commented Apr 28, 2022

Just my opinion, but if, for example, 3 folders are added [a b c] and [a] is marked as reference, then files within folder [a] should only be compared with the ones in folders [b] and [c], not with each other. Files in [b] and [c] are all compared individually (each file is compared to every other file in [a], [b] and [c]). This is the same system that's implemented in https://github.com/arsenetar/dupeguru/. There are many use cases for this, one of which I already mentioned: if I have 10 images and want to see whether there are any similar images in a folder containing 10000, the program should just compare these 10 images with the 10000, and not additionally compare every image in the 10000 set with each other. That is not only unnecessary but also greatly increases processing time. The reference folder feature as it is implemented now is not really needed, because you can already select a custom path with custom select.
All of the above concerns the similar images option; similar files, where hashes are compared, have the same problem, but the processing time is much smaller, so it's not that bad.
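The [a b c] rule described above can be sketched as a small pair generator. This is only an illustration of the proposed behaviour, not code from this project or from dupeguru; `pairs_to_compare` and the file names are made up for the example.

```python
from itertools import combinations


def pairs_to_compare(files_by_folder, reference_folders):
    """Yield every file pair except those where both files are in reference folders."""
    tagged = [
        (folder, name)
        for folder, names in files_by_folder.items()
        for name in names
    ]
    for (folder_a, name_a), (folder_b, name_b) in combinations(tagged, 2):
        # Files inside reference folders are never compared with each other.
        if folder_a in reference_folders and folder_b in reference_folders:
            continue
        yield (f"{folder_a}/{name_a}", f"{folder_b}/{name_b}")


# Hypothetical layout: [a] is the reference folder, [b] and [c] are not.
files = {"a": ["1.png", "2.png"], "b": ["3.png"], "c": ["4.png"]}
pairs = list(pairs_to_compare(files, reference_folders={"a"}))
for pair in pairs:
    print(pair)
```

Here the two files in [a] are each compared with the files in [b] and [c], the files in [b] and [c] are compared with each other, and the pair inside [a] is skipped.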

@qarmin qarmin added the PR welcome The given topic has already been analyzed and you can safely create a PR implementing this functiona label May 4, 2022
@qarmin qarmin added the enhancement New feature or request label May 29, 2022