This is the official GitHub repository for our paper CommVQA: Situating Visual Question Answering in Communicative Contexts (arXiv, 2024). We provide the code and data necessary to replicate our results. If you experience any issues, please email nanditan (at) cs.stanford.edu.
CommVQA, the VQA dataset introduced in our paper, consists of images, descriptions, contexts, questions, and a set of answers.
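As a rough illustration of how these five components fit together, a single record could be modeled as in the sketch below. The class name, field names, and placeholder values are assumptions made for illustration only and are not the released schema; the actual files and keys are those provided in CommVQA_dataset/.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommVQAExample:
    """Hypothetical record layout; names are illustrative, not the released schema."""

    image_path: str    # location of the image file (assumed to be a path)
    description: str   # textual description of the image
    context: str       # communicative context in which the image appears
    question: str      # question asked about the image
    answers: List[str] = field(default_factory=list)  # set of answers to the question


# Placeholder values only; these are not taken from the dataset.
example = CommVQAExample(
    image_path="images/placeholder.jpg",
    description="A placeholder image description.",
    context="placeholder context",
    question="A placeholder question about the image?",
    answers=["first placeholder answer", "second placeholder answer"],
)
```

Keeping the answers as a list reflects that each question comes with a set of answers rather than a single one.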
For details on downloading CommVQA, see CommVQA_dataset/.
To reproduce the model experiments in our paper, see models/.
If you find this repository or the paper useful in your research, please cite our paper:
@unpublished{naik2024commvqa,
  author = {Naik, Nandita Shankar and Potts, Christopher and Kreiss, Elisa},
  note = {arXiv:2402.15002},
  title = {{CommVQA}: Situating Visual Question Answering in Communicative Contexts},
  url = {https://arxiv.org/abs/2402.15002},
  year = {2024}
}