Very nice work! One question: according to the paper, ClipCap is pre-trained on COCO captions using a frozen RegionCLIP. However, there may be a domain gap between COCO images and the datasets used in the paper, especially the stylized images. Does this gap affect the pre-trained ClipCap? Besides, it would be best to provide the complete configuration and the related code for training ClipCap with RegionCLIP.
> There may be a domain gap between COCO images and the datasets used in the paper, especially the stylized images.
This is true. There is a domain gap between COCO and the stylized images, so, as shown in the paper, the model initially does not produce meaningful captions on the stylized images. Our goal is to resolve this by making the vision encoder robust: we further train RegionCLIP with the proposed approach so that it produces robust embeddings for an image and its stylized version, such that an arbitrary image-captioning model (in our case ClipCap) can produce meaningful captions.
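To make the intuition concrete, here is a minimal sketch of the robustness idea, not the exact loss or pipeline from the paper. It assumes a `vision_encoder` callable (e.g. the RegionCLIP image encoder) that returns one pooled embedding per image and batches of paired original/stylized images; the actual training objective in the repo may differ.

```python
import torch
import torch.nn.functional as F

def embedding_consistency_loss(vision_encoder, images, stylized_images):
    """Pull the embedding of an image and of its stylized version together,
    so a frozen downstream captioner sees a domain-robust representation."""
    emb_orig = F.normalize(vision_encoder(images), dim=-1)             # (B, D)
    emb_style = F.normalize(vision_encoder(stylized_images), dim=-1)   # (B, D)
    # Maximize cosine similarity between the two views of the same image.
    return (1.0 - (emb_orig * emb_style).sum(dim=-1)).mean()
```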
> It would be best to provide the complete configuration and the related code for training ClipCap with RegionCLIP.
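As a rough illustration of what such a setup could look like (this is not the configuration or code used in the paper), here is a ClipCap-style pre-training sketch with PyTorch and HuggingFace Transformers: a small mapping network turns a frozen, precomputed RegionCLIP image embedding into a GPT-2 prefix, and a captioning loss is applied over the caption tokens. The two-layer MLP mapper, `embed_dim=512`, `prefix_len=10`, and the learning rate are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class PrefixMapper(nn.Module):
    """ClipCap-style MLP mapping one image embedding to a GPT-2 prefix."""
    def __init__(self, embed_dim, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, gpt_dim * prefix_len),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_len, gpt_dim * prefix_len),
        )

    def forward(self, image_embedding):               # (B, embed_dim)
        return self.mlp(image_embedding).view(-1, self.prefix_len, self.gpt_dim)

def caption_loss(mapper, gpt2, tokenizer, image_embeddings, captions):
    """Cross-entropy over caption tokens, conditioned on the mapped prefix.
    `image_embeddings` come from the frozen RegionCLIP encoder (precomputed)."""
    tokens = tokenizer(captions, return_tensors="pt", padding=True).input_ids
    prefix = mapper(image_embeddings)                 # (B, P, gpt_dim)
    token_emb = gpt2.transformer.wte(tokens)          # (B, T, gpt_dim)
    out = gpt2(inputs_embeds=torch.cat([prefix, token_emb], dim=1))
    # Positions P-1 .. P+T-2 predict caption tokens 0 .. T-1.
    logits = out.logits[:, prefix.shape[1] - 1 : -1]
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.shape[-1]),
        tokens.reshape(-1),
        ignore_index=tokenizer.pad_token_id,
    )

# Usage sketch: only the mapper is optimized; GPT-2 and RegionCLIP stay frozen
# (ClipCap also has a variant that fine-tunes GPT-2).
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token             # GPT-2 has no pad token by default
mapper = PrefixMapper(embed_dim=512)                  # placeholder; use RegionCLIP's embedding dim
optimizer = torch.optim.AdamW(mapper.parameters(), lr=2e-5)
```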