Train- / Validation-Split of Imagenette #47

Closed
weberdavid opened this issue Apr 7, 2021 · 7 comments

@weberdavid

I am using Imagenette to fine-tune an Imagenet-pretrained VGG-16 from the PyTorch model zoo.
Is the validation set of Imagenette built from the validation/test set of Imagenet? Or are there some Imagenet training examples in the Imagenette validation set?

After fine-tuning the pretrained VGG for one epoch on Imagenette, I reach a top-1 accuracy of 98.4% on the validation set. Am I dealing with data leakage here?
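
For context, my setup looks roughly like this (a minimal sketch; the dataset path, batch size, and learning rate are placeholders, not my exact values):

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Standard Imagenet preprocessing, since the weights are Imagenet-pretrained
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Imagenette ships as an ImageFolder-style directory tree
train_ds = datasets.ImageFolder("imagenette2-320/train", transform=preprocess)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=64, shuffle=True)

# Pretrained VGG-16 from the model zoo; swap the 1000-way head
# for the 10 Imagenette classes
model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 10)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# One epoch of fine-tuning
model.train()
for x, y in train_dl:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```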

@radekosmulski
Contributor

Yes 🙂 Imagenette was created from Imagenet to provide a challenging and interesting research task.

@weberdavid
Author

Yes, I am aware that Imagenette is built from Imagenet.
But could it be that Imagenet training data ended up in the Imagenette validation set?

As described at the top, the top-1 accuracy seems rather high, which is why I thought there might be data leakage happening.

@radekosmulski
Contributor

It's just 10 easily discernible classes; that might also be a factor here. Imagenet has 1000 classes, so top-1 accuracy is hard to compare across the two.

BTW, if you would like to try something fun, experiment with keeping the CNN part of the model frozen and fine-tuning only the new classification head 🙂

My guess is that you might get an even better result 😉
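
In code, the freezing boils down to something like this (a rough sketch, assuming a torchvision VGG-16 with a replaced 10-class head like yours; the optimizer settings are placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 10)  # new 10-class head

# Freeze the convolutional feature extractor (the "CNN part")
for param in model.features.parameters():
    param.requires_grad = False

# Hand the optimizer only the parameters that remain trainable
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9,
)
```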

@weberdavid
Author

Alright, from your answer I take it there is no data leakage inflating my accuracy 🙂

Thanks for the tip - that is exactly what I did: I trained only the last classification layer 😉 So yes, great results 😁

@radekosmulski
Contributor

radekosmulski commented Apr 7, 2021

I am honestly not sure whether there is an overlap between the Imagenette val set and the Imagenet train set - I am thinking there might be 🙂

But in the larger scheme of things, I think any such leakage would be overshadowed by the fact that we are comparing top-1 accuracy on 10 classes with top-1 accuracy on 1000 classes - that is likely the more powerful effect here.

Either way - what you did sounds like a fun experiment! 🙂 I did something similar some time ago and wrote about it here; I'm not sure, though, how applicable it is to the current situation.

@weberdavid
Author

Interesting article!
Yes, quite fun - I will be using this fine-tuned model for pruning and then connecting that with explainability for my master's thesis.

@stsavian

@radekosmulski thanks for the great work!

I am writing in this conversation because I've also noticed something strange about how the dataset is organized.
There are Imagenet validation images in the training data! Is this the intended behavior? What is the reasoning behind it?

Can I be sure there is no overlap between training and testing data?

As proof, see imagenette2-320\train\n03417042\ILSVRC2012_val_00036233.JPEG - you can list all such files with the snippet below.
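
A quick sketch to enumerate them (assuming the imagenette2-320 layout; Imagenet validation files are recognizable by their ILSVRC2012_val_ filename prefix):

```python
from pathlib import Path

# Any file named like an ILSVRC2012 validation image that sits under
# train/ is a candidate overlap with the Imagenet validation set
root = Path("imagenette2-320")
leaked = sorted((root / "train").rglob("ILSVRC2012_val_*.JPEG"))
for path in leaked:
    print(path)
print(f"{len(leaked)} Imagenet validation files found in the training split")
```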

thanks,
Stefano
