Sample Predictions

Visual Question Answering

We built four neural network models for visual question answering using TensorFlow 2.0. The models were trained on MS COCO images together with the VQA 2.0 dataset.

YouTube Demo

URL: https://www.youtube.com/watch?v=5wNP7VoB4tM

Dataset

The models were trained on the VQA v2 dataset.

Models

We implemented and compared four models:

Model 1: Append Image as Word
Model 2: Prepend Image as Word
Model 3: Question through LSTM with image
Model 4: Attention Based Model
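
The list above does not spell out the architectures, so the following is only a minimal sketch of one common reading of "image as word" (Models 1 and 2): the image feature vector is projected into the word-embedding space and treated as one extra token placed after or before the question tokens. The function names, dimensions, and random values are hypothetical and are not taken from the repository's src folder.

```python
import random


def project(vec, weights):
    """Linear projection; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def image_as_word(image_features, word_embeddings, weights, position="prepend"):
    """Insert the projected image vector into the question's embedding sequence."""
    image_token = project(image_features, weights)
    if position == "prepend":
        return [image_token] + word_embeddings
    if position == "append":
        return word_embeddings + [image_token]
    raise ValueError(f"unknown position: {position!r}")


# Hypothetical sizes, chosen only to keep the example small.
rng = random.Random(0)
image_dim, embed_dim, question_len = 6, 4, 3
weights = [[rng.uniform(-1, 1) for _ in range(image_dim)] for _ in range(embed_dim)]
image_features = [rng.uniform(0, 1) for _ in range(image_dim)]
word_embeddings = [
    [rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(question_len)
]

appended = image_as_word(image_features, word_embeddings, weights, "append")    # Model 1 style
prepended = image_as_word(image_features, word_embeddings, weights, "prepend")  # Model 2 style
print(len(appended), len(prepended), len(prepended[0]))  # 4 4 4
```

In a full model, the resulting sequence would typically be fed to a recurrent layer such as an LSTM, followed by a classifier over candidate answers. That part is omitted here because the source does not describe it.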

Accuracy

All four models were trained on 30K examples for an initial 30 epochs.

Model	Train Accuracy	Train Loss	Test Accuracy	Test Loss
Model 1	19.47 %	8.10 %	19.43 %	8.09 %
Model 2	19.40 %	8.11 %	19.43 %	8.09 %
Model 3	18.31 %	8.11 %	18.35 %	8.11 %
Model 4	22.49 %	4.07 %	24.57 %	4.09 %
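
The source does not state how the accuracy figures were computed. As a rough sketch, assuming plain exact-match (top-1) accuracy over predicted answer strings, the calculation could look like this. The function name and the sample answers are hypothetical.

```python
def top1_accuracy(predicted, expected):
    """Percentage of questions whose predicted answer equals the reference answer."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not predicted:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(predicted)


# Made-up answers for illustration only; these are not results from the models.
predicted = ["yes", "2", "red", "no"]
expected = ["yes", "3", "red", "yes"]
print(f"{top1_accuracy(predicted, expected):.2f} %")  # 50.00 %
```

The official VQA evaluation scores each answer against several human answers rather than a single reference, so figures computed this way are not necessarily comparable to leaderboard numbers.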

mayank26saxena/visual-question-answering