
de-dupe images from multiple runs of a classifier on same searchtag. #4

Open
mariochampion opened this issue Dec 21, 2017 · 0 comments


because you can run different classifiers on the same unsorted_{searchtag} images to see how they perform, you can end up with duplicate images across your sorted_{timestamp} folders when you run a classifier multiple times, at different times, on the SAME searchtag images.

for example: download 100 images tagged robotart and classify them. then download 100 more and run classify again. it will look in the same root unsorted_robotart dir and classify all 200 images (into a different sorted_{timestamp} dir, as the time will have changed). BUT if you then just run a retrain with 'harvest' enabled, it'll take ALL the high-confidence images from the as-yet-unharvested sorted_* dirs, and you get dupes in your training_photos dir.

potential solution: when running a classifier, look through any unharvested basetag/sorted_{timestamp} dirs for files with the same image name BEGINNING. since image names get a score appended, the exact image name likely won't exist (ie, robotart_2_1234.jpg becomes robotart_2_1234_875.jpg under one classifier, for an 87.5% score, and robotart_2_1234_825.jpg under another). so there is certainly a way to check the filename start for dupes, just haven't done it.
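
a rough sketch of that filename-start check, not the project's actual code. it assumes every file under a sorted_* dir ends in `_<score digits>` before the extension (as in the robotart example above), and the names `split_scored_name` and `find_duplicates` are made up for illustration. it scans ALL sorted_* dirs under a basetag dir; how to tell which ones are unharvested is not covered here.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <original stem>_<score digits><ext>,
# e.g. robotart_2_1234_875.jpg for an 87.5% score.
SCORED_NAME = re.compile(r"^(?P<base>.+)_(?P<score>\d+)$")


def split_scored_name(filename):
    """Return (original_name, score) or None if there is no score suffix."""
    p = Path(filename)
    match = SCORED_NAME.match(p.stem)
    if match is None:
        return None
    return match.group("base") + p.suffix, int(match.group("score"))


def find_duplicates(basetag_dir):
    """Map each original image name to its scored copies across sorted_* dirs.

    Only names that appear more than once are returned.
    """
    copies = defaultdict(list)
    for sorted_dir in sorted(Path(basetag_dir).glob("sorted_*")):
        # rglob also covers per-label subfolders, if the sorted dir has any.
        for path in sorted(sorted_dir.rglob("*")):
            if not path.is_file():
                continue
            parsed = split_scored_name(path.name)
            if parsed is None:
                continue
            original_name, score = parsed
            copies[original_name].append((score, path))
    return {name: found for name, found in copies.items() if len(found) > 1}
```

a harvest step could call `find_duplicates("robotart")` first and keep only one copy per original name (for example the highest-scoring one). note that a file with no score suffix but a numeric tail, like robotart_2_1234.jpg, would be misread as base robotart_2 with score 1234, so this should only be pointed at dirs where every file is already scored.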

current (temp) workaround: manually delete dupes by looking at filenames for a MAX value of a previous classifier run, and delete the overlap BEFORE a harvest run.
