About ImageNet-21K train and eval label #237
Comments
@andsteing Can you help me?
Images are single-label. The dataset that we used for the pre-training is not currently in public TFDS, but maybe @lucasb-eyer can share the deduplication IDs that you requested.
@andsteing Thank you for your reply. @lucasb-eyer Can you share the deduplication IDs with me? I don't need the original image files, I just need train.txt and val.txt.
Hi, it's not "cleaned". The (exact) same image may appear under multiple folders (labels), and we count it only once, as one image with N labels. That's why the mentioned image count is smaller: it is the "unique image count". I don't have this lying around in a file, and even if I had, I could not simply share such a file publicly without approval, which I don't have the bandwidth to get right now, sorry. But this should be really simple for you to get from the file/folder structure in the tar file. Basically, throw everything into a defaultdict(list).
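A minimal sketch of the defaultdict(list) idea above, not the maintainers' actual script. It assumes the tar file has already been extracted to a directory with one sub-folder per label, and it assumes that "the exact same image" can be detected by hashing the file contents; the thread does not say which key was used. The function name `group_labels_by_image` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_labels_by_image(root):
    """Map each unique image (by content hash) to the folder names that contain it.

    Assumption: layout is root/<label_folder>/<image_file>, and byte-identical
    files count as the same image.
    """
    labels_by_image = defaultdict(list)
    for path in sorted(Path(root).glob("*/*")):
        if not path.is_file():
            continue
        # Key on the file contents so identical images in different folders collide.
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        label = path.parent.name
        if label not in labels_by_image[digest]:
            labels_by_image[digest].append(label)
    return dict(labels_by_image)
```

Under these assumptions, `len(group_labels_by_image(root))` gives the unique image count, and any entry whose list has more than one element is an image that appears under several labels.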
@lucasb-eyer So how do I select a label for an image that appears in multiple folders (labels)?
I mean, how do I select one label from the multiple labels of the same image? Is it random, or is there a rule for choosing?
@andsteing @lucasb-eyer @akolesnikoff Can you help me? I want to reproduce the accuracy of the ViT-L model pre-trained on ImageNet-21K, as mentioned in #62 (comment).
Following the suggestions and guidance above, I have reproduced the accuracy of ViT-L/16 pre-trained on ImageNet-21K and fine-tuned on ImageNet-1K. See: ImageNet21K data preparation for pre-training ViT-L. Thank you, @lucasb-eyer.
Hi, you mentioned in issue 28 that the ImageNet-21K data is cleaned, but I could not find any information about this cleaning on the Internet. Can you provide the cleaned train.txt and val.txt with the corresponding image paths and labels?
Label files can be organized into the following structure, each line containing <image path, label> pairs:
If you have your own format, it can also be provided according to your format.
By the way, is each image single-label or multi-label when ImageNet-21K is pretrained?
thx