have you tried different CLIP models? #53
Comments
Have you tried it? I have the same question.
I tried ViT-L/14. You just have to change the model name in the inference code and the feature-extractor code. Adding it as an argument choice makes this easy.
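The "add an argument choice" suggestion could be sketched roughly as below. This is a hypothetical snippet, not ClipCap's actual CLI: the model names and embedding widths are taken from OpenAI's CLIP repository, and `CLIP_EMBED_DIMS` / `build_parser` are names invented here for illustration. The key point is that the prefix-mapping network's input size must match the chosen encoder's embedding width.

```python
import argparse

# Image-embedding widths of some CLIP encoders (per the OpenAI CLIP repo).
# The mapping network that projects CLIP features into GPT-2 prefix tokens
# must be sized to the selected encoder's output dimension.
CLIP_EMBED_DIMS = {
    "ViT-B/32": 512,   # default encoder in this repo
    "ViT-B/16": 512,
    "ViT-L/14": 768,
    "RN50": 1024,
}

def build_parser():
    parser = argparse.ArgumentParser()
    # Restricting --clip_model to known encoders catches typos early.
    parser.add_argument("--clip_model", default="ViT-B/32",
                        choices=sorted(CLIP_EMBED_DIMS))
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args(["--clip_model", "ViT-L/14"])
    prefix_dim = CLIP_EMBED_DIMS[args.clip_model]
    # In the real feature extractor you would then do something like:
    #   model, preprocess = clip.load(args.clip_model, device=device)
    # and build the mapping network with input size `prefix_dim`.
    print(prefix_dim)  # → 768
```

Both the feature-extraction script and the inference script would need to read the same flag, so the extracted features and the trained mapping network stay consistent.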
Hi there, would you mind sharing your ViT-L/14 model checkpoints?
I would be very interested in having access to the checkpoints too! 😊
Hi @rmokady,
Thank you for your nice work, I learned a lot from it. Since the default CLIP model you are using seems to be the ViT-B32 version, I am wondering if you have tried other visual features e.g. from ViT-L or the resnet models? I can't find it mentioned in the paper. I'm trying to train a similar model at the moment and assume the features extracted from bigger vision encoders would contain more information.
Best, David