have you tried different CLIP models? #53

Open
dhansmair opened this issue Jul 24, 2022 · 4 comments
@dhansmair

Hi @rmokady,
Thank you for your nice work, I learned a lot from it. Since the default CLIP model you are using seems to be the ViT-B/32 version, I am wondering whether you have tried other visual features, e.g. from ViT-L or the ResNet models? I can't find this mentioned in the paper. I'm currently trying to train a similar model and assume that features extracted from bigger vision encoders would contain more information.

Best, David

@eeyrw

eeyrw commented Sep 27, 2022

Have you tried it? I have the same question.

@ret7020

ret7020 commented Feb 24, 2023

I tried ViT-L/14. You just have to change it in the inference code and the feature-extractor code, for example in parse_coco.py:

parser.add_argument('--clip_model_type', default="ViT-L/14", choices=('RN50', 'RN101', 'RN50x4', 'ViT-B/32', 'ViT-L/14'))

Just add the argument choice. In the training code you also need to change prefix_dim; it is 768 for ViT-L/14.
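In case it helps, here is a minimal sketch (using the standard OpenAI clip package, not code from this repository) to double-check the feature dimension before setting prefix_dim:

```python
import clip
import torch

# Load the larger CLIP backbone on CPU just to inspect its output size.
model, preprocess = clip.load("ViT-L/14", device="cpu")

with torch.no_grad():
    # Dummy image at the model's input resolution (224x224 for ViT-L/14).
    dummy = torch.zeros(1, 3, 224, 224)
    features = model.encode_image(dummy)

print(features.shape[-1])  # 768 for ViT-L/14, 512 for ViT-B/32
```

Whatever this prints is the value prefix_dim should be set to for training.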

@eliphatfs

> I tried ViT-L/14. You just have to change it in the inference code and the feature-extractor code, for example in parse_coco.py:
>
> parser.add_argument('--clip_model_type', default="ViT-L/14", choices=('RN50', 'RN101', 'RN50x4', 'ViT-B/32', 'ViT-L/14'))
>
> Just add the argument choice. In the training code you also need to change prefix_dim; it is 768 for ViT-L/14.

Hi there, would you mind sharing your ViT-L/14 model checkpoints?
Thanks.

@alexisthual

> > I tried ViT-L/14. You just have to change it in the inference code and the feature-extractor code, for example in parse_coco.py:
> >
> > parser.add_argument('--clip_model_type', default="ViT-L/14", choices=('RN50', 'RN101', 'RN50x4', 'ViT-B/32', 'ViT-L/14'))
> >
> > Just add the argument choice. In the training code you also need to change prefix_dim; it is 768 for ViT-L/14.
>
> Hi there, would you mind sharing your ViT-L/14 model checkpoints? Thanks.

I would be very interested in having access to the checkpoints too! 😊
