You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
anusham1990
changed the title
Image captioning model with its own language model for alignment scoring (removing dependency of transformer language model)
Image captioning model with its own language model for alignment scoring
Jan 21, 2021
We are using transformer language model (ex. distilbart) to do alignment scoring between given image and masked image captions, assuming an external model is a good surrogate for the original captioning model's language head. By using the captioning model's own language head, we could eliminate this assumption and remove the dependency. (ex. refer to text2text notebook examples).
Some more context:
Transformer Language Model 'distilbart' and tokenizer are being used here to tokenize the image caption. This makes the image to text scenario similar to a multi-class problem. 'distilbart' is used to do alignment scoring between the original image caption and masked image captions being generated i.e. how does the probability of getting the original image caption change when the context of a masked image caption is given? (a.k.a. we are teacher forcing 'distilbart' to always produce the original image caption for the masked images and getting change in logits for each tokenized word in the caption as part of the process).
Note: We are using 'distilbart' here because during experimentation process we found it to give the most meaningful explanations for images. We have compared with other language models such as 'openaigpt' and 'distilgpt2'. Please feel free to explore with other language models of your choice and compare the results.
No description provided.
The text was updated successfully, but these errors were encountered: