
Conceptual Captions Training #23

Open · goel-shashank opened this issue Feb 1, 2022 · 9 comments
@goel-shashank commented Feb 1, 2022

I have trained the model (both the MLP and GPT-2 variants) on the CC3M dataset, but the loss doesn't seem to decrease much (it stays around 3.0). What loss can I expect from a good model? How many epochs should I run it for? Also, is any specific hyperparameter tuning required for Conceptual Captions? I have a model trained for 5 epochs, but it generates a similar caption for every image. I tried fitting a batch of 512 image-caption pairs and everything works out, so I don't think there is a logical issue in the pipeline. Please let me know.
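
For readers hitting the same plateau, the single-batch check described above looks roughly like this in PyTorch (a minimal sketch; the `model(tokens, prefix, mask)` call and the logit slicing only loosely mirror this repo's train.py, so treat the exact names as assumptions):

```python
import torch
import torch.nn.functional as F

def overfit_one_batch(model, tokens, mask, prefix, prefix_length,
                      steps=300, lr=2e-5, device="cuda"):
    # Sanity check: repeatedly fit one fixed batch. If the pipeline
    # (data, model, loss) is wired correctly, the loss should fall
    # well below the ~3.0 plateau, ideally close to zero.
    model.train().to(device)
    tokens, mask, prefix = tokens.to(device), mask.to(device), prefix.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(steps):
        optimizer.zero_grad()
        outputs = model(tokens, prefix, mask)
        # Keep only caption positions: drop the prefix logits and shift
        # by one so the logit at position i predicts caption token i.
        logits = outputs.logits[:, prefix_length - 1:-1]
        loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]),
                               tokens.flatten(), ignore_index=0)
        loss.backward()
        optimizer.step()
        if step % 50 == 0:
            print(f"step {step}: loss = {loss.item():.3f}")
```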

@rmokady (Owner) commented Feb 1, 2022

Hi @goel-shashank,
Are you using our default parameters?

Did you try both the fine-tuned GPT-2 and the frozen GPT-2 variants?

@goel-shashank (Author) commented

Hi @rmokady,
I tried the default parameters. Do you have the training logs from your run? One thing I'm certainly doing differently: the prefixes are generated from a separate CLIP model (an RN50 with 20% ImageNet zero-shot accuracy) that I trained on CC3M myself, rather than OpenAI's pretrained weights. I don't think this should be causing these issues, though.
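
For context, swapping in a custom CLIP checkpoint for prefix extraction would look roughly like this (a sketch assuming an OpenAI-CLIP-style `encode_image` interface; the L2 normalization is a common choice, not something the repo prescribes):

```python
import torch

@torch.no_grad()
def extract_prefixes(clip_model, preprocess, images, device="cuda"):
    # Encode a list of PIL images into CLIP embeddings that serve as
    # the prefixes for the captioning model.
    clip_model.eval().to(device)
    batch = torch.stack([preprocess(img) for img in images]).to(device)
    prefix = clip_model.encode_image(batch).float()
    # L2-normalize the embeddings; with a custom CLIP whose embedding
    # scale differs from OpenAI's, this keeps prefixes comparable.
    prefix = prefix / prefix.norm(dim=-1, keepdim=True)
    return prefix.cpu()
```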

@rmokady (Owner) commented Feb 3, 2022

For COCO, where we train both the prefix and GPT-2, the loss got down to 1.47.
Unfortunately, the logs for the Conceptual Captions run were left on an old server, and I cannot access them anymore.
Note that 5 epochs over 3M images is a lot of training when using the standard CLIP.

Anyway, outputting the same sentence for every prefix usually means there is a bug somewhere.
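
One quick way to test for the failure mode described here is to compare generations for real prefixes against a random prefix (an illustrative sketch; `generate_caption` stands for a mapping-network-plus-GPT-2 decoding helper and is a hypothetical name):

```python
import torch

def prefix_sensitivity_check(model, real_prefixes, generate_caption):
    # If a random prefix yields the same caption as real image prefixes,
    # the language model is likely ignoring the prefix entirely.
    captions = [generate_caption(model, p.unsqueeze(0)) for p in real_prefixes]
    random_prefix = torch.randn_like(real_prefixes[0]).unsqueeze(0)
    print(f"{len(set(captions))} unique captions from {len(captions)} real prefixes")
    # A healthy model produces many unique captions, and the random
    # prefix yields something clearly different (often degenerate text).
    print("random-prefix caption:", generate_caption(model, random_prefix))
```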

@goel-shashank (Author) commented

As I mentioned, I was able to fit a batch of 512 image-caption pairs and everything worked out, so I don't think there is a logical issue in the pipeline. Still, I will double-check everything. Closing this issue! Please let me know if you find anything useful!

@rmokady (Owner) commented Feb 19, 2022

Hi @goel-shashank,
I found some logs for Conceptual Captions. This is with the ResNet CLIP:
[Screenshot: training logs for the ResNet CLIP run]

rmokady reopened this issue Feb 19, 2022
@rmokady (Owner) commented Feb 19, 2022

This is with the ViT CLIP:
[Screenshot: training logs for the ViT CLIP run]

@ycchanau commented

I have the same problem with my own dataset. It keeps generating similar captions...

@surisdi commented Jul 12, 2022

Hi, I have the same problem with Conceptual Captions + the frozen model. Do you have loss values for that scenario? All the inputs end up converging to the same prefix. Thanks!

I followed the README and ran:

```
python parse_conceptual.py --clip_model_type ViT-B/32 --data_root /path/to/conceptual_captions --num_threads 100
```

and then:

```
python train.py --only_prefix --data /path/to/conceptual_captions/conceptual_clip_ViT-B_32_train.pkl --out_dir /path/to/output_dir --mapping_type transformer --num_layers 8 --prefix_length 40 --prefix_length_clip 40
```
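
To confirm the collapse described above, one can measure how similar the mapping network's outputs are across different inputs (an illustrative sketch; `mapper` and `clip_embeds` are assumed names for the transformer mapping network and a batch of CLIP image embeddings):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def prefix_collapse_score(mapper, clip_embeds):
    # Map CLIP embeddings to prefixes and flatten each prefix sequence
    # into a single vector per input image.
    prefixes = mapper(clip_embeds).flatten(1)
    prefixes = F.normalize(prefixes, dim=-1)
    # Mean off-diagonal pairwise cosine similarity: a value near 1.0
    # means all inputs are mapped to (nearly) the same prefix.
    sim = prefixes @ prefixes.t()
    mask = ~torch.eye(len(sim), dtype=torch.bool, device=sim.device)
    return sim[mask].mean().item()
```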

@mmderakhshani commented

@surisdi Did you manage to reproduce the results?
