feature request: chunking of source folder #22

Closed
ALCarter2 opened this issue Jun 21, 2022 · 1 comment
Labels
feature : new New feature for difPy. status : likely Feature will likely be implemented in difPy.

Comments

@ALCarter2

Thank you for your library! Just a heads-up: I edited one of your previous versions, adding a parameter that allows the src folder to be split into n chunks for processing. Scenario: I have image folders that contain over 50000 images in sequential time order.

For me, an image file is most likely to be a duplicate of other image files added around the same time. Comparing each image against the entire 50000+ took an enormous amount of time. So I made it possible to split the folder into chunks of 5000 (for example) and evaluate it in sections. It also allowed me to restart from a given position if I had to stop the evaluation for some reason. I added a little more to make it more robust (for example, chunk n+1 also includes some number of files from the previous chunk, so that there is some degree of overlap). Anyway, this worked out well for me, and if you are still adding to this library, I found it to be very useful.
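A minimal sketch of the chunking idea described above, for illustration only. It is not the modified difPy code, and `list_by_mtime`, `chunk_with_overlap`, `chunk_size`, `overlap`, and `start_chunk` are hypothetical names, not part of difPy's API. It assumes the files are ordered by modification time and that each chunk would then be passed to the duplicate search on its own.

```python
from pathlib import Path
from typing import Iterator, List, Sequence, Tuple


def list_by_mtime(folder: str) -> List[Path]:
    """Return the files in `folder`, oldest first (by modification time).

    Assumption: modification time is a good proxy for when an image was added.
    """
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime)


def chunk_with_overlap(
    items: Sequence,
    chunk_size: int,
    overlap: int = 0,
    start_chunk: int = 0,
) -> Iterator[Tuple[int, list]]:
    """Yield (chunk_index, chunk) pairs over `items`.

    Every chunk after the first also contains the last `overlap` items of
    the previous chunk, so duplicates that straddle a boundary can still be
    compared. `start_chunk` skips earlier chunks, which allows resuming an
    interrupted run (hypothetical parameter, chosen for this sketch).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if overlap < 0:
        raise ValueError("overlap must not be negative")

    for index, start in enumerate(range(0, len(items), chunk_size)):
        if index < start_chunk:
            continue  # already processed in an earlier run
        lo = max(0, start - overlap)  # reach back into the previous chunk
        yield index, list(items[lo:start + chunk_size])


if __name__ == "__main__":
    # Stand-in file names; in practice this would be list_by_mtime("some/folder").
    names = [f"img_{i:02d}.jpg" for i in range(10)]
    for index, chunk in chunk_with_overlap(names, chunk_size=4, overlap=1):
        print(index, chunk)
```

With n files, chunk size c, and overlap o, each image is compared against at most c + o others instead of all n. That is where the time savings come from, at the cost of missing duplicates that sit further apart than one chunk.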

The route I took is not going to be as robust as going through EVERY image each time but in my personal tests, the performance was close enough and the time savings were significant! Cheers,

@elisemercury elisemercury added the feature : new New feature for difPy. label Jun 26, 2022
@elisemercury elisemercury self-assigned this Jun 26, 2022
@elisemercury
Owner

Dear @ALCarter2,
Thanks a lot for your input and idea! Indeed, I agree that this feature can be very helpful and might significantly increase difPy's performance. Feel free to open a pull request with your version and I will be happy to review it.
Again, thanks!
All the best,
Elise

@elisemercury elisemercury added the status : likely Feature will likely be implemented in difPy. label Mar 25, 2023