Possible to finetune? #18

Closed
afiaka87 opened this issue Sep 14, 2021 · 15 comments

Comments

@afiaka87

Is it possible to fine-tune from the existing OpenAI checkpoints rather than train from scratch with this codebase?

@mitchellnw
Contributor

Do you mean fine-tune on something like ImageNet or fine-tune on more image-caption pairs?

@afiaka87
Author

> Do you mean fine-tune on something like ImageNet or fine-tune on more image-caption pairs?

The latter: image-text pairs, not classes.

@mitchellnw
Contributor

I think you can do this with the --openai_pretrained flag, but I don't know how well tested it is. Let us know if it works!
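
For intuition, here's a minimal standalone sketch of the same recipe (fine-tuning from the OpenAI checkpoint on image-text pairs with the usual contrastive loss), using OpenAI's `clip` package directly rather than this repo's training script. `loader` is a hypothetical DataLoader yielding preprocessed images and `clip.tokenize`'d captions; hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package, which this repo builds on

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
model.float()  # the CUDA checkpoint loads in fp16; fp32 is safer for fine-tuning

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6, weight_decay=0.1)

# `loader` is a hypothetical DataLoader of (preprocessed image batch,
# tokenized caption batch) pairs from your image-text dataset.
for images, texts in loader:
    logits_per_image, logits_per_text = model(images.to(device), texts.to(device))
    labels = torch.arange(len(images), device=device)  # i-th image matches i-th text
    loss = (F.cross_entropy(logits_per_image, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```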

@mitchellnw
Contributor

FWIW, the former (fine-tuning on ImageNet) will soon be available in the upcoming code release for this paper: https://arxiv.org/abs/2109.01903

@afiaka87
Author

afiaka87 commented Sep 21, 2021

@mitchellnw Exciting stuff; can't wait for the release!

I did find the --openai_pretrained parameter and had a few issues with it. I believe they were specific to the dataset I'm working with, though. I'll be sure to push any fixes I find upstream, although I'm a bit distracted with DALLE-pytorch at the moment.

Do you have experience with fine-tuning at lower batch sizes? My understanding is that CLIP needs rather large batch sizes to work effectively. I'm able to fit a batch size of 108 on my RTX 2070 Super. Is there a concern about "forgetting" too much when fine-tuning at lower batch sizes?

@mitchellnw
Contributor

Okay, great! We haven't seen low batch sizes be a concern, and we'd expect them to matter even less for fine-tuning, but we don't know for sure.

@MyLtYkRiTiK

@mitchellnw
Hello! Why do you use the jit version of the model for fine-tuning with --openai_pretrained? Is it a bug or a feature?

@mitchellnw
Contributor

We set jit=False when using the pretrained version, as I don't think you can fine-tune a jit=True model.
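
A small sketch of the difference (following the reasoning above; the jit=True path is the inference-oriented traced model):

```python
import torch
import clip

# jit=True returns a traced torch.jit.ScriptModule, intended for inference;
# per the comment above, it isn't suitable for fine-tuning.
scripted_model, _ = clip.load("ViT-B/32", jit=True)
print(isinstance(scripted_model, torch.jit.ScriptModule))  # True

# jit=False returns a regular torch.nn.Module whose parameters can be
# handed to an optimizer as usual.
model, _ = clip.load("ViT-B/32", jit=False)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)
```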

@mitchellnw
Contributor

mitchellnw commented Nov 3, 2021

As an update, you can now fine-tune CLIP on supervised learning tasks via this repo: https://github.com/mlfoundations/wise-ft

@iremonur

iremonur commented Feb 9, 2022

Hello, I would like to fine-tune CLIP on my own specific dataset (approx. 50k image-text pairs). I used the provided ViT-B/32 checkpoint as the initial model, but the accuracy starts at 1% and after 32 epochs reaches only around 30%. (I tried various weight decay and LR combinations; the best was weight decay=0.001 and LR=5e-4.) Have you tried fine-tuning CLIP on a small specific dataset, and if so, how was the performance? @afiaka87

@gabrielilharco
Collaborator

Hi @iremonur, a few questions: 1) When you say accuracy, what does this refer to? Are you doing image-to-text or text-to-image retrieval, using all samples from your dataset? 2) Do you expect performance on your dataset to be high? Would a human get reasonable performance on this task? I'm asking because the dataset could have a lot of similar images/captions, which might make it hard to get good retrieval performance.

@iremonur

iremonur commented Feb 10, 2022

I reported the training accuracy on my own dataset during fine-tuning, i.e., the fraction of correct matches with respect to the ground truth, computed over all samples in my dataset.
Actually, the samples in my dataset are quite similar to each other; I intended to fine-tune the model on this challenging dataset in the first place because the official network parameters do not perform well on it.
I agree that it is difficult for the model to perform well on such similar samples, but at the end of the first epoch the accuracy reaches only 1% (extremely low), which is surprising to me since I initialize from the pre-trained parameters.
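
To make the metric concrete, here is a sketch of the kind of in-batch matching accuracy described above, assuming the OpenAI `clip` model interface; `batch_match_accuracy` is a hypothetical helper, not something from the repo:

```python
import torch

@torch.no_grad()
def batch_match_accuracy(model, images, texts, device="cuda"):
    """In-batch matching accuracy (hypothetical helper): for each image,
    check whether its own caption gets the highest similarity score among
    all captions in the batch. Chance level is 1 / batch_size."""
    logits_per_image, _ = model(images.to(device), texts.to(device))
    predictions = logits_per_image.argmax(dim=-1)
    labels = torch.arange(len(images), device=device)
    return (predictions == labels).float().mean().item()
```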

@gabrielilharco
Collaborator

Hi @iremonur, thanks for the clarification. As a sanity check, it would be good to measure the pre-trained model's accuracy without any fine-tuning. If it's significantly higher than 1%, there's probably something wrong with loading the checkpoint, or some hyperparameter might be destabilizing fine-tuning (e.g., the learning rate is too high).
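
That sanity check could look like the following, reusing the hypothetical `batch_match_accuracy` helper from the sketch above (`loader` is again a hypothetical DataLoader over your dataset):

```python
import clip

# Score the stock checkpoint before any fine-tuning; if this is well
# above 1%, suspect checkpoint loading or hyperparameters instead.
model, preprocess = clip.load("ViT-B/32", device="cuda", jit=False)
accuracies = [batch_match_accuracy(model, images, texts)
              for images, texts in loader]
print(f"pre-fine-tuning accuracy: {sum(accuracies) / len(accuracies):.1%}")
```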

@iremonur

iremonur commented Feb 11, 2022

Thank you for your response @gabrielilharco, I'll check that out. Also, when I reduce the batch size (e.g., to 8), the model reaches much higher accuracy (around 50%), but I worry this may hurt generalization, since the model only learns to discriminate among samples within smaller groups. Have you seen reducing the batch size cause poor generalization? This is similar to the question in #18 (comment) asked by @afiaka87.

@gabrielilharco
Collaborator

I'm not sure about generalization, but note that the batch size does affect training accuracy: smaller batch sizes mean there are fewer options to choose from, so the task of finding the correct match for an image or text is easier. The raw training accuracy numbers might therefore not be representative of the model's capabilities.
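
A tiny illustration of this point, using the batch sizes mentioned in this thread: even a model guessing at random scores 1/N with batch size N, so shrinking the batch inflates the metric.

```python
# Chance-level in-batch matching accuracy for a few batch sizes.
for n in (8, 32, 108):
    print(f"batch size {n:3d}: chance accuracy = {1 / n:.1%}")
# batch size   8: chance accuracy = 12.5%
# batch size  32: chance accuracy = 3.1%
# batch size 108: chance accuracy = 0.9%
```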

rom1504 pushed a commit that referenced this issue Nov 23, 2022