Leveraging CLIP for Visual Question Answering

UCLA CS 269 Seminar 5 S22 Course Project (Group 11)

Name	UID
Rakesh Bal	605526216
Sri Keerthi Bolli	505525868

This repository is a PyTorch and Python implementation of the course project for UCLA CS 269 Sem 5 (Human Centred Artificial Intelligence) S22 Course.

We use different exisitng works for our project and we recommend installing them from their respective repositories.

Usage

The scripts are provided in this repository for running our experiments are changes we made to various files in the existing projects mentioned above. For example for MCAN-VQA, we recommend cloining the original repo git clone https://github.com/MILVLG/mcan-vqa.git and then replacing the original files with the files provided in the repository under mcan-vqa folder. Similarly, LSeg we provide the changes we made to the lseg_demo.ipynb in the original repository. For multilingual vqa, the scipt provided converts the multilingual files provided in the dataset (here) to the corresponding jsons to be used by MCAN model.

References

@inProceedings{yu2019mcan,
  author = {Yu, Zhou and Yu, Jun and Cui, Yuhao and Tao, Dacheng and Tian, Qi},
  title = {Deep Modular Co-Attention Networks for Visual Question Answering},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages = {6281--6290},
  year = {2019}
}

@inproceedings{
li2022languagedriven,
title={Language-driven Semantic Segmentation},
author={Boyi Li and Kilian Q Weinberger and Serge Belongie and Vladlen Koltun and Rene Ranftl},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=RriDjddCLN}
}

@inProceedings{yu2019mcan,
  author = {Yu, Zhou and Yu, Jun and Cui, Yuhao and Tao, Dacheng and Tian, Qi},
  title = {Deep Modular Co-Attention Networks for Visual Question Answering},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages = {6281--6290},
  year = {2019}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
dataset_processing_scripts		dataset_processing_scripts
generate_clip_features		generate_clip_features
mcan-vqa/core		mcan-vqa/core
multilingual-vqa		multilingual-vqa
README.md		README.md
lseg_demo.ipynb		lseg_demo.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dataset_processing_scripts

dataset_processing_scripts

generate_clip_features

generate_clip_features

mcan-vqa/core

mcan-vqa/core

multilingual-vqa

multilingual-vqa

README.md

README.md

lseg_demo.ipynb

lseg_demo.ipynb

Repository files navigation

Leveraging CLIP for Visual Question Answering

UCLA CS 269 Seminar 5 S22 Course Project (Group 11)

Usage

References

About

Releases

Packages

Languages

rakeshbal99/CS269_CLIP_VQA

Folders and files

Latest commit

History

Repository files navigation

Leveraging CLIP for Visual Question Answering

UCLA CS 269 Seminar 5 S22 Course Project (Group 11)

Usage

References

About

Resources

Stars

Watchers

Forks

Languages