Train- / Validation-Split of Imagenette #47
Comments
Yes 🙂 Imagenette was created from Imagenet to provide a challenging and interesting research task.
Yes, I am aware that Imagenette is built from Imagenet. As described at the top, the Top-1 accuracy seems rather high, which is why I thought there might be data leakage happening.
It's just 10 easily discernible classes, which might also be a factor here. Imagenet has 1000 classes, so top-1 accuracy is hard to compare across the two. BTW, if you would like to try something fun and experiment, maybe train with the CNN part of the model frozen and only fine-tune the new classification head 🙂 My guess is that you might get an even better result 😉
Alright, so from your answer I assume there is no data leakage inflating my accuracy 🙂 Thanks for the tip - that is exactly what I did, I only trained the last classification layer 😉 So yes, great results 😁
I am honestly not sure if there is an overlap between the val set of Imagenette and the train set of Imagenet - I am thinking there might be 🙂 But in the larger scheme of things, I think any such leakage would be dwarfed by the fact that we are comparing top-1 accuracy on 10 classes against top-1 on 1000 classes, which is an even more powerful effect. Either way - what you did sounds like a fun experiment! 🙂 I did something similar some time ago and wrote about it here, though I am not sure how applicable it is to the current situation.
Interesting article!
@radekosmulski thanks for the great work! I am writing in this conversation because I've also noticed something strange about how the dataset is organized. Can I be sure there is no overlap between training and testing data? As evidence, you can look at imagenette2-320\train\n03417042\ILSVRC2012_val_00036233.JPEG. Thanks,
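A check like the one above can be automated: ImageNet validation images carry filenames of the form `ILSVRC2012_val_*.JPEG`, so scanning the Imagenette train split for that pattern reveals any images taken from the ImageNet validation set. A minimal sketch (the `find_val_images_in_train` helper name is hypothetical):

```python
# Sketch: list files in the Imagenette train split whose names mark them
# as originating from the ImageNet *validation* set.
from pathlib import Path

def find_val_images_in_train(root: str) -> list[str]:
    """Return paths under <root>/train whose filename matches the
    ImageNet validation naming scheme (ILSVRC2012_val_*.JPEG)."""
    train_dir = Path(root) / "train"
    return sorted(str(p) for p in train_dir.rglob("ILSVRC2012_val_*.JPEG"))
```

Running this over an extracted `imagenette2-320` directory would list every such image, e.g. the `ILSVRC2012_val_00036233.JPEG` file mentioned above.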
I am using Imagenette to fine-tune an Imagenet pre-trained VGG-16 from the PyTorch model zoo.
Is the validation set of Imagenette built from the validation/test set of Imagenet? Or are there some Imagenet training examples in the Imagenette validation set?
Because after fine-tuning the pre-trained VGG for 1 epoch (on Imagenette), I reach a Top-1 Accuracy of 98.4% on the validation-set. Am I dealing with some data leakage here?
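For reference, top-1 accuracy as reported above simply counts how often the highest-scoring class matches the label. A minimal sketch of that metric (the `top1_accuracy` helper name is hypothetical):

```python
# Sketch: top-1 accuracy over a batch of model outputs.
import torch

def top1_accuracy(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class equals the target."""
    preds = logits.argmax(dim=1)
    return (preds == targets).float().mean().item()
```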