In case you haven't, please execute the following cell **once per Workspace Session** to install all the necessary requirements.

In [None]:
!bash run_me_first_on_floyd.sh

# Visual Question Answering: Part I

### Baseline Approach: A Bag of Words Model

This notebook is simply an execution of the code to build VQA model using a basic `Neural Network (Multilayer Perceptron) + Bag of Words`, I would highly encourage you to read the [full post here](https://sominwadhwa.github.io/blog/2018/01/01/de/)

<p align="center">
  <img src="https://github.com/sominwadhwa/sominwadhwa.github.io/blob/master/assets/vqa/5.jpg?raw=true"/>
</p>

**Let's get all the necessary library imports**

In [2]:
import sys, warnings
warnings.filterwarnings("ignore")
from random import shuffle, sample
import pickle as pk
import gc

import numpy as np
import pandas as pd
import scipy.io
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
from keras.utils import np_utils, generic_utils
from progressbar import Bar, ETA, Percentage, ProgressBar    
from keras.models import model_from_json
from sklearn.preprocessing import LabelEncoder
import spacy
#from spacy.en import English

from src.utils import *
from src.features import *

Using TensorFlow backend.


## Preprocessed Data

The open-source VQA dataset contains multiple open-ended questions about various images. All my experiments were performed with v1 of the dataset (though I've processed v2 of the dataset as well), which contains:

- 82,783 training images from COCO (common objects in context) dataset.
- 215,407 question-answer pairs for training images.
- 40,504 validation images to perform own testing.
- 121,512 question-answer pairs for validation images.

In [7]:
training_questions = open("preprocessed/v1/ques_train.txt","rb").read().decode('utf8').splitlines()
answers_train      = open("preprocessed/v1/answer_train.txt","rb").read().decode('utf8').splitlines()
images_train       = open("preprocessed/v1/images_coco_id.txt","rb").read().decode('utf8').splitlines()
img_ids            = open('preprocessed/v1/coco_vgg_IDMap.txt').read().splitlines()
vgg_path           = "/floyd/input/vqa_data/coco/vgg_feats.mat"

Let's look at a couple of questions along with their answers. The first entry you see here is the **COCO Image ID** through with the image can be found at [http://cocodataset.org/#explore](http://cocodataset.org/#explore) by simply entering the image ID in the **search** column. 

In [4]:
sample(list(zip(images_train, training_questions, answers_train)), 5)

[('74523', 'What word is printed on the road?', 'stop'),
 ('378471', 'Is this man talking on his cell phone?', 'yes'),
 ('120655', 'What is covering the ground?', 'snow'),
 ('64557', 'Is the light on?', 'yes'),
 ('463434', 'Are there cars on the bridge?', 'no')]

In [5]:
nlp = spacy.load("en")
print ("Loaded WordVec")

Loaded WordVec


Load image features - `4096` sized vectors extracted from the last layer of a VGG network trained on the COCO Dataset.

In [8]:
%time vgg_features = scipy.io.loadmat(vgg_path)
img_features = vgg_features['feats']
id_map = dict()
print ("Loaded VGG Weights")

CPU times: user 10.2 s, sys: 3.07 s, total: 13.3 s
Wall time: 13.8 s
Loaded VGG Weights


In [9]:
gc.collect()

131

In [7]:
upper_lim = 1000 #Number of most frequently occurring answers in COCOVQA (Coverting >85% of the total data)
training_questions, answers_train, images_train = freq_answers(training_questions, answers_train, images_train, upper_lim)
print (len(training_questions), len(answers_train),len(images_train))

215407 215407 215407


In [8]:
lbl = LabelEncoder()
lbl.fit(answers_train)
nb_classes = len(list(lbl.classes_))
pk.dump(lbl, open('preprocessed/v1/label_encoder_mlp.sav','wb'))

### Defining the Network Architecture

In [10]:
num_hidden_units  = 1024
num_hidden_layers = 3
batch_size        = 256
dropout           = 0.5
activation        = 'tanh'
img_dim           = 4096
word2vec_dim      = 300

`num_epochs`: Set to the number of epochs you'd wish to run the network for.

`log_interval`: This parameter sets the epoch interval after which a copy of the model weights will be saved.

In [13]:
num_epochs = 200
log_interval = 25

In [10]:
for ids in img_ids:
    id_split = ids.split()
    id_map[id_split[0]] = int(id_split[1])

In [15]:
model = Sequential()
model.add(Dense(num_hidden_units, input_dim=word2vec_dim+img_dim, kernel_initializer='uniform'))
model.add(Dropout(dropout))
for i in range(num_hidden_layers):
    model.add(Dense(num_hidden_units, kernel_initializer='uniform'))
    model.add(Activation(activation))
    model.add(Dropout(dropout))
model.add(Dense(nb_classes, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
#tensorboard = TensorBoard(log_dir='/output/Graph', histogram_freq=0, write_graph=True, write_images=True)
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 1024)              4502528   
_________________________________________________________________
dropout_1 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
activation_1 (Activation)    (None, 1024)              0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
activation_2 (Activation)    (None, 1024)              0         
__________

In [16]:
model_dump = model.to_json()
open('baseline_mlp'  + '.json', 'w').write(model_dump)

3290

Since I've already performed these experiments once, so it'd be a nice idea to leverage the model I already created so here I've loaded the weights saved after the 99th epoch during my training experiment, and simply retrain those!

### You may **skip** this step if you wish to build your model from scratch!

**And we're good to go!**

In [None]:
for k in range(num_epochs):
    index_shuffle = list(range(len(training_questions)))
    shuffle(index_shuffle)
    training_questions = [training_questions[i] for i in index_shuffle]
    answers_train = [answers_train[i] for i in index_shuffle]
    images_train = [images_train[i] for i in index_shuffle]
    progbar = generic_utils.Progbar(len(training_questions))
    for ques_batch, ans_batch, im_batch in zip(grouped(training_questions, batch_size, 
                                                       fillvalue=training_questions[-1]), 
                                               grouped(answers_train, batch_size, 
                                                       fillvalue=answers_train[-1]), 
                                               grouped(images_train, batch_size, fillvalue=images_train[-1])):
        X_ques_batch = get_questions_sum(ques_batch, nlp)
        X_img_batch = get_images_matrix(im_batch, id_map, img_features)
        X_batch = np.hstack((X_ques_batch, X_img_batch))
        Y_batch = get_answers_sum(ans_batch, lbl)
        #loss = model.train_on_batch(X_batch, Y_batch,callbacks= [tensorboard])
        loss = model.train_on_batch(X_batch, Y_batch)
        progbar.add(batch_size, values=[('train loss', loss)])

    if k%log_interval == 0:
        model.save_weights("weights/MLP" + "_epoch_{:02d}.hdf5".format(k))
model.save_weights("weights/MLP" + "_epoch_{:02d}.hdf5".format(k))

 33280/215407 [===>..........................] - ETA: 1:18 - train loss: 2.9500

# Let's evaluate our model!

We're going to evalute our model on the validation set provided by the **VQA Dataset** which I've already preprocessed much like our training datasets. Refer to [VQA Evaluation](http://visualqa.org/evaluation.html).

While I have evaluated my pre-trained models over here, you might like to change the paths in order to evaluate your own models. This can be easily done in the following way -

1. Add `model_from_json(open('lstm_structure.json').read())` (instead of loading model structure from my dataset, use the one you just created.
2. Modify -> `model.load_weights("weights/<weights_file>")` (instead of loading weights from my pretrained models, use your own set stored under `weights` directory.

By default, we're going to load the weights & the model created at the beginning of your training loop (for testing purposes).

In [11]:
model = model_from_json(open('baseline_mlp.json').read())
# In case you wish to evaluate the model you just trained, uncomment the following line of code & comment out the subsequent one -
#model.load_weights('weights/MLP_epoch_25.hdf5')
model.load_weights('/floyd/input/vqa_data/weights/MLP_epoch_25.hdf5')
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

print ("Model Loaded with Weights")
model.summary()

Instructions for updating:
keep_dims is deprecated, use keepdims instead
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Model Loaded with Weights
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 1024)              4502528   
_________________________________________________________________
dropout_1 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 1024)              1049600   
_________________________________________________________________
activation_1 (Activation)    (None, 1024)              0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 

**Loading the validation preprocessed data**

In [12]:
val_imgs = open('preprocessed/v1/val_images_coco_id.txt','rb').read().decode('utf-8').splitlines()
val_ques = open('preprocessed/v1/ques_val.txt','rb').read().decode('utf-8').splitlines()
val_ans  = open('preprocessed/v1/answer_val.txt','rb').read().decode('utf-8').splitlines()

**Replace location of LabelEncoder to your own, otherwise this may affect accuracy.** To do so, simply change the `file_path` to `preprocessed/v1/label_encoder_<type>.sav`. 

In [13]:
label_encoder = pk.load(open('preprocessed/v1/label_encoder_mlp.sav','rb'))

In [14]:
y_pred = []
batch_size = 128 

#print ("Word2Vec Loaded!")

widgets = ['Evaluating ', Percentage(), ' ', Bar(marker='#',left='[',right=']'), ' ', ETA()]
pbar = ProgressBar(widgets=widgets)
#i=1

In [16]:
for qu_batch,an_batch,im_batch in pbar(zip(grouped(val_ques, batch_size, fillvalue=val_ques[0]), grouped(val_ans, batch_size, fillvalue=val_ans[0]), grouped(val_imgs, batch_size, fillvalue=val_imgs[0]))):
    X_q_batch = get_questions_matrix(qu_batch, nlp)
    X_i_batch = get_images_matrix(im_batch, id_map, img_features)
    X_batch = np.hstack((X_q_batch, X_i_batch))
    y_predict = model.predict_classes(X_batch, verbose=0)
    y_pred.extend(label_encoder.inverse_transform(y_predict))
    #print (i,"/",len(val_ques))
    #i+=1
    #print(label_encoder.inverse_transform(y_predict))


Evaluating N/A% [#                                             ] Time:  0:03:26


In [19]:
correct_val = 0.0
total = 0
f1 = open('res.txt','w')

for pred, truth, ques, img in zip(y_pred, val_ans, val_ques, val_imgs):
    t_count = 0
    for _truth in truth.split(';'):
        if pred == truth:
            t_count += 1 
    if t_count >=2:
        correct_val +=1
    else:
        correct_val += float(t_count)/3

    total +=1

    try:
        f1.write(str(ques))
        f1.write('\n')
        f1.write(str(img))
        f1.write('\n')
        f1.write(str(pred))
        f1.write('\n')
        f1.write(str(truth))
        f1.write('\n')
        f1.write('\n')
    except:
        pass

print ("Accuracy: ", round((correct_val/total)*100,2))
#f1.write('Final Accuracy is ' + str(round(correct_val/total),2)*100)
f1.close()

Accuracy:  48.21


## Want to try out the VQA Model?

**Execute to `src/test.py` with any of the sample images provided with a question of your choice!**

There you go, all set to participate in the next VQA Challenge!

If you do, however, would like to try out these models on your own custom images do checkout **`src/test.py`** with an image and a characterstic question.