Image captioning model with its own language model for alignment scoring #1739

anusham1990 · 2021-01-13T21:50:28Z

No description provided.

anusham1990 · 2021-01-21T22:27:43Z

We currently use a transformer language model (e.g. distilbart) to do alignment scoring between a given image and masked image captions, which assumes that an external model is a good surrogate for the original captioning model's language head. Using the captioning model's own language head instead would eliminate this assumption and remove the dependency on the external model (for reference, see the text2text notebook examples).

Some more context:

The transformer language model 'distilbart' and its tokenizer are used here to tokenize the image caption, which makes the image-to-text scenario similar to a multi-class problem. 'distilbart' is used for alignment scoring between the original image caption and the masked image captions being generated, i.e. how does the probability of getting the original image caption change when the context of a masked image caption is given? In other words, we teacher-force 'distilbart' to always produce the original image caption for the masked images and, as part of the process, obtain the change in logits for each tokenized word in the caption.
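To make the scoring step concrete, here is a minimal sketch of teacher-forced alignment scoring in plain Python. It is an illustration under stated assumptions, not shap's implementation: the function names (`log_softmax`, `teacher_forced_scores`, `alignment_shift`) are hypothetical, the per-step logits are toy numbers standing in for the output of the scoring model (distilbart today, or the captioning model's own language head as proposed), and it compares log-probabilities of the original caption tokens rather than raw logits.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def teacher_forced_scores(step_logits, target_ids):
    """Log-probability of each original caption token.

    step_logits[t] is assumed to be the scoring model's logits at step t when
    the decoder is fed the original caption tokens before t (teacher forcing),
    so the model is never allowed to drift away from the original caption.
    """
    return [log_softmax(logits)[tok] for logits, tok in zip(step_logits, target_ids)]


def alignment_shift(full_logits, masked_logits, target_ids):
    """Per-token change in log-probability when the context is masked."""
    full = teacher_forced_scores(full_logits, target_ids)
    masked = teacher_forced_scores(masked_logits, target_ids)
    return [m - f for f, m in zip(full, masked)]


# Toy example: a 3-token vocabulary and a 2-token original caption [0, 1].
full_context = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]    # logits with the unmasked context
masked_context = [[0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]  # logits after masking part of the input
shift = alignment_shift(full_context, masked_context, target_ids=[0, 1])
```

In this toy case only the first token's score drops under masking, so the masked region would be credited with supporting that token. Swapping the external model for the captioning model's own language head would only change where the per-step logits come from.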

Note: We are using 'distilbart' here because, during experimentation, we found that it gave the most meaningful explanations for images. We compared it with other language models such as 'openaigpt' and 'distilgpt2'. Please feel free to explore other language models of your choice and compare the results.

detrin · 2023-08-27T09:06:14Z

@anusham1990 So what is your proposed use of shap for this case?

anusham1990 mentioned this issue Jan 21, 2021

Comparative study of image explainers #1764

Open

anusham1990 changed the title ~~Image captioning model with its own language model for alignment scoring (removing dependency of transformer language model)~~ Image captioning model with its own language model for alignment scoring Jan 21, 2021
