
Image captioning model with its own language model for alignment scoring #1739

Open
anusham1990 opened this issue Jan 13, 2021 · 2 comments

Comments

@anusham1990
Collaborator

No description provided.

@anusham1990 anusham1990 changed the title Image captioning model with its own language model for alignment scoring (removing dependency of transformer language model) Image captioning model with its own language model for alignment scoring Jan 21, 2021
@anusham1990
Collaborator Author

We currently use an external transformer language model (e.g. distilbart) for alignment scoring between a given image and the captions of masked images. This assumes that the external model is a good surrogate for the original captioning model's language head. Using the captioning model's own language head instead would eliminate this assumption and remove the dependency (for reference, see the text2text notebook examples).

Some more context:

The transformer language model 'distilbart' and its tokenizer are used here to tokenize the image caption, which makes the image-to-text scenario similar to a multi-class problem. 'distilbart' then does the alignment scoring between the original image caption and the captions generated for masked images, i.e. how does the probability of getting the original image caption change when a masked image's caption is given as context? In other words, we teacher-force 'distilbart' to always produce the original image caption for each masked image and record the change in logits for each token of the caption.
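To make the scoring step concrete, here is a minimal sketch in plain Python. It is not the shap implementation: the function names (`teacher_forced_scores`, `score_change`) are hypothetical, the logits are toy numbers rather than 'distilbart' outputs, and it scores with log-probabilities where the description above says "change in logits". It assumes the scoring model has already been run with the original caption forced as decoder input, once per context, giving one row of vocabulary logits per caption token.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def teacher_forced_scores(
    step_logits: Sequence[Sequence[float]], target_ids: Sequence[int]
) -> List[float]:
    """Log-probability of each target token under teacher forcing.

    step_logits[t] is assumed to be the logits produced at step t while the
    target tokens 0..t-1 were fed to the decoder (hypothetical input format).
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need exactly one row of logits per target token")
    return [log_softmax(row)[tok] for row, tok in zip(step_logits, target_ids)]


def score_change(
    reference_logits: Sequence[Sequence[float]],
    masked_logits: Sequence[Sequence[float]],
    target_ids: Sequence[int],
) -> List[float]:
    """Per-token change in score when the context comes from a masked image."""
    ref = teacher_forced_scores(reference_logits, target_ids)
    masked = teacher_forced_scores(masked_logits, target_ids)
    return [m - r for m, r in zip(masked, ref)]


# Toy example: vocabulary of 3 tokens, original caption tokenized as [2, 0].
target_ids = [2, 0]
reference_logits = [[0.0, 0.0, 2.0], [3.0, 0.0, 0.0]]  # context: unmasked image caption
masked_logits = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]     # context: masked image caption
delta = score_change(reference_logits, masked_logits, target_ids)
print(delta)  # first token becomes less likely, second token is unchanged
```

Under the proposal in this issue, the two logit matrices would come from the captioning model's own language head conditioned on the (masked) image, rather than from an external model conditioned on a caption string.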

Note: We use 'distilbart' here because, during experimentation, we found that it gave the most meaningful explanations for images. We compared it with other language models such as 'openaigpt' and 'distilgpt2'. Feel free to try other language models of your choice and compare the results.

@detrin

detrin commented Aug 27, 2023

@anusham1990 So what is your proposed use of shap for this case?

Labels
None yet
Projects
New Image Explainers
Awaiting triage
Development

No branches or pull requests

2 participants