About ImageNet-21K train and eval label #237

GuoxiaWang · 2022-09-20T05:52:48Z

Hi, you mentioned in issue 28 that the ImageNet-21K data is cleaned, and I did not find any relevant cleaning information on the Internet. Can you provide a cleaned train.txt and val.txt with corresponding image paths and labels?

> Hi, it was a typo, ImageNet-21k dataset we use has approximately 12.4M. Note, it is slightly lower than 14M, because ImageNet-21k has duplicate images, which we merge together.

The label files can be organized as follows, with each line containing one <image path, label> pair:

images/n15093298/n15093298_5194.JPEG 21839

If you have your own format, providing the files in that format is fine too.
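For illustration, a label file in the format above could be produced from an extracted directory tree roughly as follows. This is only a sketch: the helper name `write_label_file`, the `images/<folder>/<file>` layout, and the choice of assigning label indices by sorted folder name are assumptions, not something confirmed in this thread.

```python
import os


def write_label_file(root, out_path):
    """Write '<relative image path> <label index>' lines for every file under root.

    Assumption: each sub-folder of `root` is one class, and label indices
    are assigned by sorted folder name. The real index mapping may differ.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    # Use the last path component of root (e.g. "images") as the path prefix.
    prefix = os.path.basename(os.path.normpath(root))
    count = 0
    with open(out_path, "w") as f:
        for name in classes:
            folder = os.path.join(root, name)
            for fname in sorted(os.listdir(folder)):
                if os.path.isfile(os.path.join(folder, fname)):
                    f.write(f"{prefix}/{name}/{fname} {class_to_idx[name]}\n")
                    count += 1
    return count
```

Calling `write_label_file("images", "train.txt")` would then write one line per image and return the number of lines written.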

By the way, is each image single-label or multi-label during ImageNet-21K pre-training?

thx

GuoxiaWang · 2022-09-27T06:02:57Z

@andsteing Can you help me?

andsteing · 2022-09-27T11:45:11Z

Images are single-label.

The dataset that we used for the pre-training is not currently in public TFDS, but maybe @lucasb-eyer can share the deduplication IDs that you requested.

GuoxiaWang · 2022-09-27T12:01:38Z

@andsteing thank you for your reply.

@lucasb-eyer Can you share the deduplication IDs with me? I don't need the original image files, I just need train.txt and val.txt.

lucasb-eyer · 2022-09-27T14:56:13Z

Hi, it's not "cleaned". The (exact) same image may appear under multiple folders (labels), and thus we only count it as one image with N labels. That's why the mentioned image count is smaller: it is the "unique image count".

I don't have this lying around in a file, and even if I had, I could not simply share a file publicly like this without approval, which I don't have the bandwidth to get right now, sorry. But this should be really simple for you to get from the file/folder structure in the tar file: basically, throw everything into a defaultdict(list).
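A minimal sketch of the defaultdict(list) idea, assuming the tar file has already been extracted to one folder per label. The thread does not say how exact duplicates are detected; hashing the file bytes with SHA-256 is an assumption here, and the names `file_digest` and `group_duplicates` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(root):
    """Map content digest -> list of (folder name, relative path).

    Assumption: byte-identical files count as the "(exact) same image".
    """
    groups = defaultdict(list)
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue
        for fname in sorted(os.listdir(folder_path)):
            path = os.path.join(folder_path, fname)
            if os.path.isfile(path):
                groups[file_digest(path)].append((folder, f"{folder}/{fname}"))
    return groups
```

With `groups = group_duplicates("images")`, `len(groups)` would be the unique image count, and any entry whose list has more than one element is an image that appears under several folders (labels). How to pick a single label for such an image is not specified here.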

GuoxiaWang · 2022-09-27T15:12:24Z

> and thus we only count it as one image with N labels

@lucasb-eyer So how do I select a label for an image that appears in multiple folders (labels)?

GuoxiaWang · 2022-09-28T01:35:28Z

I mean, how do I select a label from multiple labels for the same image? Is it random or is there any way to choose?

GuoxiaWang · 2022-10-18T11:22:32Z

@andsteing @lucasb-eyer @akolesnikoff Can you help me? I want to reproduce the accuracy of the ViT-L model pre-trained on ImageNet-21K, as mentioned in #62 (comment).

GuoxiaWang · 2022-11-21T05:46:25Z

I have reproduced the accuracy of ViT-L/16 pre-trained on ImageNet-21K and fine-tuned on ImageNet-1K with the above suggestions and guidance. See: ImageNet21K data preparation for pre-training ViT-L

Thank you @lucasb-eyer

GuoxiaWang closed this as completed Nov 23, 2022
