This is an implementation of visual question answering models from the papers *VQA: Visual Question Answering* and *Exploring Models and Data for Image Question Answering*.
- Trained on the VQA version 1 and COCO-QA datasets.
- All models were trained with a batch size of 128 for 10 epochs.
- For the VQA dataset, only questions whose answer is among the 1000 most frequent answers are considered; these cover 86% of the data (see the filtering sketch below).
- Used GloVe pre-trained word vectors (glove.6B, 300-d) with frozen parameters during training (see the embedding sketch below).
- Evaluation on the VQA dataset is done using the official VQA Evaluation tools.
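The answer filtering is easy to express in plain Python. A minimal sketch, assuming the training answers are available as a list of strings; the function and variable names here are illustrative, not the repository's actual code:

```python
from collections import Counter

def filter_top_answers(questions, answers, k=1000):
    """Keep only question/answer pairs whose answer is among the k most frequent."""
    top_answers = {ans for ans, _ in Counter(answers).most_common(k)}
    kept = [(q, a) for q, a in zip(questions, answers) if a in top_answers]
    coverage = len(kept) / len(answers)  # fraction of data kept; ~0.86 for k=1000 on VQA v1
    return kept, top_answers, coverage
```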
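Freezing the GloVe vectors amounts to building an embedding layer whose weights are excluded from gradient updates. A minimal sketch, assuming a tf.keras pipeline and a `word_index` mapping from vocabulary words to integer ids (both are assumptions; the repository may be organized differently):

```python
import numpy as np
from tensorflow.keras.layers import Embedding
from tensorflow.keras.initializers import Constant

def glove_embedding_layer(word_index, glove_path="glove.6B.300d.txt", dim=300):
    """Build an Embedding layer initialized with GloVe vectors and frozen
    (trainable=False) so the word vectors are not updated during training."""
    matrix = np.zeros((len(word_index) + 1, dim))  # row 0 reserved for padding
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = parts[0]
            if word in word_index:
                matrix[word_index[word]] = np.asarray(parts[1:], dtype="float32")
    return Embedding(input_dim=len(word_index) + 1,
                     output_dim=dim,
                     embeddings_initializer=Constant(matrix),
                     trainable=False)
```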
| | Training Questions | Validation Questions |
|---|---|---|
| VQA 1 Dataset | 215264 | 104765 |
| COCO-QA Dataset | 78736 | 38948 |
| | BOW | VIS BLSTM | LSTM QI | LSTM QI 2 |
|---|---|---|---|---|
| VQA 1 | 54.35% | 58.85% | 60.73% | 61.52% |
| COCO-QA | 49.28% | 54.65% | 55.67% | 55.21% |
Training is done with train.py or train_batches.py. train_batches.py splits large data into chunks and saves them to disk; it takes an extra argument `--chunk_size` (a sketch of the chunking appears after the argument list below).
```
python train_batches.py --model_name <model_name> --dataset <dataset>
```
- `-m --model_name`: name of the model [vis_blstm, lstm_qi, lstm_qi_2]
- `-d --dataset`: name of the dataset [VQA_1, COCO-QA]
- `-cs --chunk_size`: size of the chunks the data is split into and saved to disk as; default 20480
- `-ep --epochs`: number of epochs; default 10
- `-bz --batch_size`: batch size; default 128
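A minimal sketch of the chunking step, assuming the preprocessed features and labels are NumPy arrays (the file naming and npz layout are assumptions, not necessarily what train_batches.py does):

```python
import numpy as np

def save_in_chunks(features, labels, chunk_size=20480, prefix="chunk"):
    """Split large training arrays into fixed-size chunks saved to disk,
    so each chunk can be loaded and trained on independently."""
    n_chunks = int(np.ceil(len(features) / chunk_size))
    for i in range(n_chunks):
        start, end = i * chunk_size, (i + 1) * chunk_size
        np.savez(f"{prefix}_{i}.npz",
                 features=features[start:end],
                 labels=labels[start:end])
    return n_chunks
```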
To predict an answer for a given image and question (an example invocation follows the argument list):

```
python predict_answer.py --image_path <image_path> --question <question>
```
- `-i --image_path`: path of the image
- `-q --question`: the question
- `-m --model_name`: default lstm_qi_2
- `-d --dataset`: default VQA_1
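For example (the image path and question below are purely illustrative):

```
python predict_answer.py --image_path examples/dog.jpg --question "What animal is this?"
```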
To evaluate a trained model on the VQA 1 dataset:

```
python Evaluation/evaluate_VQA_1.py --model_name <model_name>
```
- `-m --model_name`: name of the model [bow, vis_blstm, lstm_qi, lstm_qi_2]
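The metric reported by the VQA evaluation counts an answer as correct in proportion to annotator agreement: Acc(ans) = min(#humans that said ans / 3, 1), so an answer given by at least 3 of the 10 annotators scores 100%. A minimal sketch of that per-question computation (the official evaluation code additionally normalizes answers, e.g. lowercasing and stripping articles, and averages over subsets of the ten annotators):

```python
def vqa_accuracy(predicted, human_answers):
    """Per-question VQA accuracy: min(#matching human answers / 3, 1)."""
    matches = sum(1 for ans in human_answers if ans == predicted)
    return min(matches / 3.0, 1.0)
```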