
prefix size for Vit/ResNet #45

Closed
scfrank opened this issue Jun 27, 2022 · 1 comment

scfrank commented Jun 27, 2022

Hi - I'm having trouble with a prefix_size mismatch in my own finetuned model, which uses CLIP features from ViT-B/32. I'm training only the transformer mapping network (no GPT-2 finetuning), using the commands given in the README.

My understanding is that CLIP with ViT-B/32 produces 512-dimensional features (prefix_size = 512) while the RN50x4 ResNet encoder produces 640-dimensional features (prefix_size = 640), see e.g. the CLIP paper (Radford et al.), Appendix F, and also this line in train.py:

prefix_dim = 640 if args.is_rn else 512
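
(To double-check those dimensions empirically - this is just an illustrative sketch using the clip package directly, not code from the repo:)

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
for name in ("ViT-B/32", "RN50x4"):
    clip_model, _ = clip.load(name, device=device, jit=False)
    # encode a dummy image at the encoder's native resolution and inspect the feature size
    dummy = torch.zeros(1, 3, clip_model.visual.input_resolution,
                        clip_model.visual.input_resolution, device=device)
    with torch.no_grad():
        feat = clip_model.encode_image(dummy)
    print(name, feat.shape[-1])  # expect 512 for ViT-B/32 and 640 for RN50x4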

However, in my prediction script, which I've essentially copied from the transformer inference notebook (https://github.com/rmokady/CLIP_prefix_caption/blob/main/notebooks/transformer_inference.ipynb), the prefix size is set to 640:

model = ClipCaptionPrefix(prefix_length, clip_length=40, prefix_size=640,
                          num_layers=8, mapping_type='transformer')

This worked with your pretrained COCO model, but not with my finetuned model, where I get a dimensionality mismatch between 512 and 640. Can you help me out here? Should I be using prefix_size = 640 or 512 for training/inference?

Thank you!
Stella

PS: FYI, there are a few typos in the commands in the README.md, where num_layres should be num_layers.

scfrank commented Jun 28, 2022

I solved the problem: I had missed the CLIP model loading step at inference time.

In case someone else runs into this problem: the transformer_inference notebook loads the RN50x4 ResNet encoder:

clip_model, preprocess = clip.load("RN50x4", device=device, jit=False)

Changing this to ViT-B/32, and changing the prefix size from 640 to 512 (ClipCaptionPrefix(prefix_size=512)), makes everything work correctly:

clip_model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
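
For reference, a minimal sketch of keeping the encoder and the caption model consistent (assuming the same torch/clip imports, prefix_length, device, and ClipCaptionPrefix definition as in the notebook; model_path is a placeholder for your finetuned checkpoint):

clip_model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
prefix_size = 512  # use 640 instead if you load "RN50x4"; matches prefix_dim in train.py
# clip_model.visual.output_dim should also report 512/640 if you prefer not to hard-code it
model = ClipCaptionPrefix(prefix_length, clip_length=40, prefix_size=prefix_size,
                          num_layers=8, mapping_type='transformer')
model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))  # model_path: placeholder
model = model.eval().to(device)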
