Train- / Validation-Split of Imagenette #47

Closed
weberdavid opened this issue Apr 7, 2021 · 7 comments

@weberdavid

I am using Imagenette to fine-tune an Imagenet-pretrained VGG-16 from the PyTorch model zoo.
Is the validation set of Imagenette built from the validation/test set of Imagenet? Or are there some Imagenet training examples in the Imagenette validation set?

After fine-tuning the pretrained VGG for one epoch on Imagenette, I reach a top-1 accuracy of 98.4% on the validation set. Am I dealing with data leakage here?
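
For context, my setup looks roughly like this (a minimal sketch; the dataset path, batch size, and learning rate are placeholders, not my exact values):

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Standard Imagenet preprocessing, since the weights are Imagenet-pretrained
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Imagenette ships as an ImageFolder-style directory tree
train_ds = datasets.ImageFolder("imagenette2-320/train", transform=preprocess)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=64, shuffle=True)

# Pretrained VGG-16 from the model zoo; swap the 1000-way head
# for the 10 Imagenette classes
model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 10)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# One epoch of fine-tuning
model.train()
for x, y in train_dl:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```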

@radekosmulski
Contributor

Yes 🙂 Imagenette was created from Imagenet to provide a challenging and interesting research task.

@weberdavid
Author

Yes, I am aware that Imagenette is built from Imagenet.
But could it be that Imagenet training data ended up in the Imagenette validation set?

As described at the top, the top-1 accuracy seems rather high, which is why I thought there might be data leakage happening.

@radekosmulski
Contributor

It's just 10 easily discernible classes; that might also be a factor here. Imagenet has 1000 classes, so top-1 accuracy is hard to compare across the two.

BTW, if you would like to try something fun, experiment with keeping the CNN part of the model frozen and fine-tuning only the new classification head 🙂

My guess is that you might get an even better result 😉
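
In code, the freezing boils down to something like this (a rough sketch, assuming a torchvision VGG-16 with a replaced 10-class head like yours; the optimizer settings are placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 10)  # new 10-class head

# Freeze the convolutional feature extractor (the "CNN part")
for param in model.features.parameters():
    param.requires_grad = False

# Hand the optimizer only the parameters that remain trainable
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9,
)
```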

@weberdavid
Author

Alright, from your answer I take it there is no data leakage inflating my accuracy 🙂

Thanks for the tip - that is exactly what I did: I trained only the last classification layer 😉 So yes, great results 😁

@radekosmulski
Contributor

radekosmulski commented Apr 7, 2021

I am honestly not sure whether there is an overlap between the Imagenette val set and the Imagenet train set - I am thinking there might be 🙂

But in the larger scheme of things, I think any such leakage would be overshadowed by the fact that we are comparing top-1 accuracy on 10 classes with top-1 accuracy on 1000 classes - that is likely the more powerful effect here.

Either way - what you did sounds like a fun experiment! 🙂 I did something similar some time ago and wrote about it here; I'm not sure, though, how applicable it is to the current situation.

@weberdavid
Author

Interesting article!
Yes, quite fun - I will be using this fine-tuned model for pruning and then connecting that with explainability for my master's thesis.

@stsavian

@radekosmulski thanks for the great work!

I am writing in this conversation because I've also noticed something strange about how the dataset is organized.
There are Imagenet validation images in the training data! Is this the intended behavior? What is the reasoning behind it?

Can I be sure there is no overlap between training and testing data?

As proof, see imagenette2-320\train\n03417042\ILSVRC2012_val_00036233.JPEG - you can list all such files with the snippet below.
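
A quick sketch to enumerate them (assuming the imagenette2-320 layout; Imagenet validation files are recognizable by their ILSVRC2012_val_ filename prefix):

```python
from pathlib import Path

# Any file named like an ILSVRC2012 validation image that sits under
# train/ is a candidate overlap with the Imagenet validation set
root = Path("imagenette2-320")
leaked = sorted((root / "train").rglob("ILSVRC2012_val_*.JPEG"))
for path in leaked:
    print(path)
print(f"{len(leaked)} Imagenet validation files found in the training split")
```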

thanks,
Stefano
