Conceptual Captions Training #23
Hi @goel-shashank, Did you try both the GPT-2 fine-tuning and the frozen GPT-2?
Hi @rmokady,
For COCO, where we train both the prefix and GPT-2, the loss got down to about 1.47. In any case, outputting the same sentence for every prefix usually means there is a bug somewhere.
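One quick way to check for that kind of bug is to measure whether the mapped prefixes actually differ between images. The sketch below is a minimal, self-contained check, assuming prefix tensors of shape `(batch, prefix_length, embed_dim)` as produced by the mapping network; the function name and shapes are illustrative, not part of the repo:

```python
import torch

def prefix_collapse_score(prefixes: torch.Tensor) -> float:
    """Mean pairwise cosine similarity across a batch of flattened
    prefix embeddings. Values near 1.0 indicate the mapper is
    producing (almost) the same prefix for every image."""
    flat = prefixes.reshape(prefixes.shape[0], -1)
    flat = torch.nn.functional.normalize(flat, dim=-1)
    sim = flat @ flat.T
    n = flat.shape[0]
    off_diag = sim[~torch.eye(n, dtype=torch.bool)]
    return off_diag.mean().item()

torch.manual_seed(0)
# Healthy case: distinct prefixes per image -> similarity near 0.
healthy = torch.randn(8, 40, 768)          # (batch, prefix_length, embed_dim)
# Collapsed case: identical prefix repeated -> similarity near 1.
collapsed = healthy[:1].repeat(8, 1, 1)

print(prefix_collapse_score(healthy))
print(prefix_collapse_score(collapsed))
```

Running this on a batch of real prefixes from a trained mapper should give a score well below 1.0; a score near 1.0 points at the collapse the comments describe.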
As I mentioned, I was able to fit a batch of 512 image-caption pairs and everything works out, so I don't think there is any logical issue with the pipeline. Still, I will check everything once more. Closing this issue! Please let me know if you find anything useful!
Hi @goel-shashank,
I have the same problem with my own dataset. It keeps generating similar captions... |
Hi, I have the same problem for Conceptual Captions with the frozen model. Do you have loss values for that scenario? All the inputs end up converging to the same prefix. Thanks! I followed the README and ran `python parse_conceptual.py --clip_model_type ViT-B/32 --data_root /path/to/conceptual_captions --num_threads 100` and then `python train.py --only_prefix --data /path/to/conceptual_captions/conceptual_clip_ViT-B_32_train.pkl --out_dir /path/to/output_dir --mapping_type transformer --num_layers 8 --prefix_length 40 --prefix_length_clip 40`
@surisdi Did you manage to reproduce the results? |
I have trained the model (both the MLP mapper and GPT-2) on the CC3M dataset, but the loss doesn't decrease much (it stays around 3.0). What loss can I expect from a good model? How many epochs should I train for? Also, is any specific hyperparameter tuning required for CC? I have a model trained for 5 epochs, but it generates a similar caption for every image. I tried fitting a batch of 512 image-caption pairs and everything works out, so I don't think there is any logical issue with the pipeline. Please let me know.
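For reference, the single-batch overfitting check mentioned in this thread can be reproduced in isolation: if the training loop is wired correctly, the loss on one fixed batch should fall close to zero. This is a toy sketch with an illustrative stand-in model, not the repo's actual pipeline; all names and shapes here are assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
clip_dim, vocab = 512, 100

# Toy stand-in for the prefix-mapping + LM head: a small MLP classifier.
model = nn.Sequential(nn.Linear(clip_dim, 256), nn.Tanh(), nn.Linear(256, vocab))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(32, clip_dim)        # one fixed "CLIP embedding" batch
y = torch.randint(0, vocab, (32,))   # one fixed token target per sample

# Repeatedly train on the same batch; the model should memorize it.
for step in range(800):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

print(loss.item())  # should end up close to 0 if the loop is correct
```

If the loss plateaus (e.g. around 3.0, as reported above) even on one fixed batch, that points at a wiring bug (wrong targets, detached gradients, or a frozen parameter group) rather than a data or hyperparameter problem.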