mayank26saxena/visual-question-answering

Visual Question Answering

Built four different neural network models for visual question answering using TensorFlow 2.0. The models were trained on MS COCO images together with the VQA 2.0 dataset.

YouTube Demo

URL: https://www.youtube.com/watch?v=5wNP7VoB4tM

Dataset

We used the VQA v2 dataset to train the models.

Models

We experimented with four different models:

  • Model 1: Append Image as Word
  • Model 2: Prepend Image as Word
  • Model 3: Question through LSTM with image
  • Model 4: Attention Based Model
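
As a rough illustration of how Models 1 and 2 differ, the sketch below treats the image as one extra "word" that is either appended to or prepended to the question's word embeddings. This is a minimal sketch, not the repository's implementation: it uses plain NumPy rather than TensorFlow to stay short, and every name and size in it (`image_as_word`, `build_sequence`, `IMAGE_FEATURE_DIM`, `EMBED_DIM`, and so on) is a hypothetical choice made for illustration, with random values standing in for trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the repository).
IMAGE_FEATURE_DIM = 512   # length of a CNN feature vector for one image
EMBED_DIM = 64            # length of each word embedding
VOCAB_SIZE = 1000
QUESTION_LEN = 8

# Random parameters standing in for trained weights.
word_embeddings = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))
image_projection = rng.normal(size=(IMAGE_FEATURE_DIM, EMBED_DIM))


def image_as_word(image_features):
    """Project an image feature vector into the word-embedding space."""
    return image_features @ image_projection  # shape: (EMBED_DIM,)


def build_sequence(question_ids, image_features, position):
    """Return the embedded question with the image inserted as one extra token."""
    words = word_embeddings[question_ids]                        # (QUESTION_LEN, EMBED_DIM)
    image_token = image_as_word(image_features)[np.newaxis, :]   # (1, EMBED_DIM)
    if position == "append":     # Model 1 style: image after the last word
        return np.concatenate([words, image_token], axis=0)
    if position == "prepend":    # Model 2 style: image before the first word
        return np.concatenate([image_token, words], axis=0)
    raise ValueError("position must be 'append' or 'prepend'")


question_ids = rng.integers(0, VOCAB_SIZE, size=QUESTION_LEN)
image_features = rng.normal(size=IMAGE_FEATURE_DIM)

appended = build_sequence(question_ids, image_features, "append")
prepended = build_sequence(question_ids, image_features, "prepend")
print(appended.shape, prepended.shape)
```

In both cases the resulting sequence has one more time step than the question, so a sequence model such as an LSTM could consume it directly; the only difference is whether the image token is seen last or first.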

Accuracy

  • Each of the above models was trained on 30K examples, starting with 30 epochs.
Model     Train Accuracy   Train Loss   Test Accuracy   Test Loss
Model 1   19.47 %          8.10 %       19.43 %         8.09 %
Model 2   19.40 %          8.11 %       19.43 %         8.09 %
Model 3   18.31 %          8.11 %       18.35 %         8.11 %
Model 4   22.49 %          4.07 %       24.57 %         4.09 %

Sample Predictions

[Image: sample predictions]
