About the ImageNet zero-shot performance with the released models #24
We're planning to release a code example for replicating the paper's zero-shot ImageNet results that can be used as a reference for this. Apologies for not including sufficient details in the paper!
Hi Alec, thanks for sharing this information. Looking forward to seeing the updates!
Just added a code example here. You can also open it in Colab. Hope this helps!
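For context, here is a minimal sketch of the zero-shot classification that notebook performs, using the released CLIP API. The class names below are placeholders; the notebook uses the full 1000 ImageNet class names.

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder class names for illustration; the notebook uses all
# 1000 ImageNet class names.
class_names = ["goldfish", "tabby cat", "golden retriever"]

# Encode one prompt per class into a unit-norm text embedding.
with torch.no_grad():
    tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    text_features = model.encode_text(tokens)
    text_features /= text_features.norm(dim=-1, keepdim=True)

def predict(image):
    """Return the predicted class index for one preprocessed image tensor."""
    with torch.no_grad():
        feats = model.encode_image(image.unsqueeze(0).to(device))
        feats /= feats.norm(dim=-1, keepdim=True)
        # Cosine similarity against every class prompt; argmax is the label.
        return (feats @ text_features.T).argmax(dim=-1).item()
```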
Hi @jongwook, this is fantastic! I will follow your Colab to reproduce your numbers!
Hey, just a quick clarifying question: the notebook you released achieves a zero-shot accuracy of 55.73%, but Table 17 of the paper (last page) reports that ViT-B/32 can achieve 63.2% accuracy. Do you know how I can recreate this performance?
Sorry, it's confusing: the notebook evaluates on ImageNetV2, since it has a publicly available evaluation set. When evaluated on ImageNet, the numbers should be within noise (± a few tenths of a percent) of the results reported in Table 17. This is also explained in the notebook.
Update: I changed ImageNetV2 to the ImageNet validation set and verified that the performance matches the numbers reported in Table 17. Thanks!
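As an illustration of that swap, here is one way the official validation split could be loaded with torchvision instead of ImageNetV2. The path is a placeholder, and the ImageNet data must be obtained separately.

```python
import clip
import torchvision.datasets as datasets
from torch.utils.data import DataLoader

# `preprocess` is the image transform returned alongside the model.
model, preprocess = clip.load("ViT-B/32")

# To reproduce Table 17, evaluate on the official ImageNet validation split
# (the path is a placeholder; the data is not downloadable via torchvision).
dataset = datasets.ImageNet("/path/to/imagenet", split="val", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, num_workers=4)
```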
Hello, I followed this code https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb to use the template ensemble for zero-shot prediction on ImageNet. The reproduced result is 57.81%, which is far from the 76.2% accuracy reported in Table 1 of the paper. Am I missing something? Update on 19 Jul:
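For reference, the ensembling in that notebook averages unit-norm text embeddings over many prompt templates per class before classifying. A condensed sketch of the idea follows; the template list here is abbreviated, and the helper name is mine, not the notebook's.

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Abbreviated template list for illustration; the notebook ensembles
# 80 such templates.
templates = [
    "a photo of a {}.",
    "a bad photo of a {}.",
    "a photo of the large {}.",
]

def zeroshot_weights(class_names):
    """Average unit-norm text embeddings over all templates per class."""
    weights = []
    with torch.no_grad():
        for name in class_names:
            tokens = clip.tokenize([t.format(name) for t in templates]).to(device)
            emb = model.encode_text(tokens)
            emb /= emb.norm(dim=-1, keepdim=True)
            mean_emb = emb.mean(dim=0)
            weights.append(mean_emb / mean_emb.norm())
    # Shape [embed_dim, num_classes]; image_features @ weights gives logits.
    return torch.stack(weights, dim=1)
```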
Hi CLIP authors,
Really great work! Thank you very much for releasing the code!
Recently I have been trying to evaluate the two released models (RN50 and ViT-B/32) on the ImageNet validation set. What I get with prompt engineering, without ensembling, is shown below:
ResNet-50: top-1 55.09, top-5 83.59
ViT-B/32: top-1 59.06, top-5 85.59
Not sure whether these numbers match those on your side. As a reference for our trial and error, could you report the validation accuracies for these two models?
Thanks,
Jianwei
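For anyone reproducing top-1/top-5 numbers like these, here is a minimal, hypothetical sketch of the accuracy bookkeeping. The helper name is mine; the logits are image-text similarity scores as in the sketches above.

```python
import torch

def topk_accuracy(logits, labels, ks=(1, 5)):
    """Count correct top-k predictions for a batch.

    logits: [batch, num_classes] similarity scores; labels: [batch].
    Returns a dict mapping each k to the number of correct predictions.
    """
    # Indices of the k highest-scoring classes per example, sorted descending.
    _, pred = logits.topk(max(ks), dim=1)           # [batch, max_k]
    correct = pred.eq(labels.unsqueeze(1))          # [batch, max_k] booleans
    return {k: correct[:, :k].any(dim=1).sum().item() for k in ks}
```

Accumulating these counts over the validation loader and dividing by the dataset size would give percentages in the form reported above.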