## Visual Question Answering

This notebook trains a Visual Question Answering (VQA) model.
To view the TensorBoard logs, run `tensorboard --logdir=boards/1`.

### Changes that still need to be made
1. Replace the basic architecture with an attention architecture.
2. Set up the TensorBoard visualizations properly so they can go into the final report.
3. Model design.
    1. Change the final outputs. The question embedding is currently multiplied with the answer embedding; this has to be changed.
    2. The fc7 outputs could be used directly.
    3. The image could be passed as the first token. This should not make much difference.
    4. Add activations somewhere in the network.
    5. Change the random sampling used to get the validation data.
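
Item 1 mentions moving to an attention architecture. Purely as an illustration, the sketch below shows one common form of soft attention in NumPy: a query vector scores each image region by dot product, the scores pass through a softmax, and the regions are averaged with those weights. The shapes (64 regions of 512 features, a 512-dimensional query) are assumptions made for the example, and `soft_attention` is a hypothetical helper that is not part of this notebook's `helpers` package.

```python
import numpy as np


def soft_attention(regions, query):
    """Weighted average of image regions, weighted by similarity to the query.

    regions: (R, D) array, one feature vector per image region (assumed layout).
    query:   (D,) array, for example an encoded question (assumed).
    Returns (context, weights) with shapes (D,) and (R,).
    """
    scores = regions @ query                  # (R,) dot-product similarity
    scores = scores - scores.max()            # stabilise the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ regions               # (D,) attention-weighted average
    return context, weights


rng = np.random.RandomState(0)
regions = rng.randn(64, 512) * 0.01           # e.g. an 8x8 grid of 512-d features
query = rng.randn(512) * 0.01
context, weights = soft_attention(regions, query)
print(context.shape, weights.shape, weights.sum())
```

The attention weights always sum to one, so `context` stays on the same scale as a single region vector regardless of how many regions there are.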

In [1]:
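import string
import time

import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import tensorflow as tf
from sklearn.model_selection import train_test_split

# The helper modules are star-imported; between them they are expected to provide
# Config, loadData, load_glove, load_placeholders, encodeImage, encodeText and fullyConnected.
from helpers.config import *
from helpers.preprocessing import *
from helpers.utils_v2 import *

%load_ext autoreload
%autoreload 2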


In [2]:
"""
Resetting default tensorflow computational Graph
"""
tf.reset_default_graph()


In [12]:
cfg = Config(batch_size=28, num_epochs=25)
print("Batch_Size: ", cfg.batch_size)
print("num_epochs: ", cfg.num_epochs)
print("Weights file is : ", cfg.weights_path)
print("Config data path is: ", cfg.data_path)
print("Glove vectors path is: ", cfg.glove_path)


Batch_Size:  28
num_epochs:  25
Weights file is :  ../weights/vgg16_weights.npz
Config data path is:  ../data/dataset_v7w_telling.json
Glove vectors path is:  ../data/glove.6B.50d.txt


### Load the data required for the Question Answering

In [13]:
samples = loadData(cfg.data_path.split('../')[1])
train_samples, val_samples = train_test_split(samples, test_size=0.2)
print("Total number of training samples are: ", len(train_samples))
print("Validation examples number: ", len(val_samples))


Total number of training samples are:  111894
Validation examples number:  27974


### Loading the GloVe vectors

In [14]:
## Loading glove vectors here. 
W2VEC = load_glove(cfg)


../data/glove.6B.50d.txt
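
The body of `load_glove` lives in the helpers and is not shown here. GloVe text files store one token per line followed by its vector components, separated by spaces, so a loader could look roughly like the sketch below. `load_glove_sketch` is a hypothetical stand-in written for illustration, and the two-word file it reads is made up.

```python
import os
import tempfile

import numpy as np


def load_glove_sketch(path):
    """Parses a GloVe-style text file into a {word: vector} dictionary.

    Hypothetical stand-in for load_glove: each line is assumed to hold a word
    followed by its space-separated float components.
    """
    word_to_vec = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            word_to_vec[parts[0]] = np.array([float(x) for x in parts[1:]],
                                             dtype=np.float32)
    return word_to_vec


# Tiny made-up file with 3-dimensional vectors, only to exercise the parser.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("cat 0.1 0.2 0.3\n")
    tmp.write("dog -0.5 0.0 0.25\n")
    tmp_path = tmp.name

toy_vectors = load_glove_sketch(tmp_path)
os.remove(tmp_path)
print(sorted(toy_vectors), toy_vectors["dog"].shape)
```

A plain dictionary keeps the lookup in `vectorize` a single `.get` call, which also makes out-of-vocabulary words easy to detect.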


### Functions available in the utils.py

1. Encoding the image
2. Encoding the text
3. Loading the Weights of the pretrained model
4. Loading placeholders
5. Variables class

## Generator
1. Loads the image, question and answer, and feeds them to the network for training.
2. Yields each batch as a tuple (images, questions, answers, option1, option2, option3, labels), where every array has N samples along its first axis.

In [15]:
W2VEC_LEN = 50

def vectorize(words_sequence, max_words=15, clean=False):
    'Takes a sentence and returns the corresponding (max_words, W2VEC_LEN) array of GloVe vectors'

    if clean:
        # Use the cleaned text for the rest of the function.
        words_sequence = _dataCleaning(words_sequence)

    # str.translate needs a translation table; this one deletes all punctuation.
    remove_punctuation = str.maketrans('', '', string.punctuation)
    words = words_sequence.lower().translate(remove_punctuation).strip().split()
    # ignoring words beyond max_words
    words = words[:max_words]

    # Rows that are never filled stay zero and act as padding.
    words2vec = np.zeros((max_words, W2VEC_LEN))

    for i, w in enumerate(words):
        word2vec = W2VEC.get(w)

        if word2vec is None:
            # Out-of-vocabulary words get a random vector.
            word2vec = np.random.rand(W2VEC_LEN)

        words2vec[i] = word2vec.reshape(W2VEC_LEN)

    return words2vec

def generator(train_samples, batch_size=32):
    
    """
    Yields training batches indefinitely. For every sample in a batch it:

    1. Reads the image and resizes it to 448x448.
    2. Converts the question to a sequence of word vectors.
    3. Converts the answer and the three options to sequences of word vectors.

    The questions may need additional preprocessing before vectorization.
    The shape of questions, answers and options1 to options3 is (N, T, D):

        N - number of samples
        T - time steps in the RNN
        D - dimension of the word vector

    Yields: 1. Images batch,
            2. Questions batch,
            3. Answers batch,
            4. option1 batch,
            5. option2 batch,
            6. option3 batch,
            7. Labels batch of shape (N, 4), with the correct answer at index 0
    """
    
    num_samples = len(train_samples)
    
    while 1:
        
        # sklearn.utils.shuffle returns a shuffled copy, so the result must be kept.
        train_samples = sklearn.utils.shuffle(train_samples)
        
        path_to_images = "images/"
        
        for offset in range(0, num_samples, batch_size):
            
            batch_samples = train_samples[offset:offset+batch_size]
            
            train_images = []
            questions = []
            answers = []
            options1 = []
            options2 = []
            options3 = []
            
            for batch_sample in batch_samples:
                
                image_path = batch_sample[0]
                question   = batch_sample[1]
                answer     = batch_sample[2]
                choice1    = batch_sample[3]
                choice2    = batch_sample[4]
                choice3    = batch_sample[5]
                
                image1 = cv2.imread(path_to_images + image_path)
                image1 = cv2.resize(image1, (448, 448))
                train_images.append(image1)
                
                questions.append(vectorize(question, max_words = cfg.question_max_words))
                
                answers.append(vectorize(answer, max_words = cfg.answer_max_words))
                options1.append(vectorize(choice1, max_words = cfg.answer_max_words))
                options2.append(vectorize(choice2, max_words = cfg.answer_max_words))
                options3.append(vectorize(choice3, max_words = cfg.answer_max_words))
                
            
            train_images = np.array(train_images)
            questions = np.array(questions)
            answers = np.array(answers)
            options1 = np.array(options1)
            options2 = np.array(options2)
            options3 = np.array(options3)
            
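            # The correct answer always occupies slot 0 of the four candidates.
            # The last batch can be smaller than batch_size, so size the labels from the batch itself.
            labels = np.zeros([len(batch_samples), 4])
            labels[:, 0] = 1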
            yield train_images, questions, answers, options1, options2, options3, labels
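
The generator can only stack its sentences into an (N, T, D) array because every sentence is padded or truncated to a fixed length first. The sketch below isolates that step with a toy two-word embedding table. `pad_or_truncate`, `toy_table` and the 4-dimensional vectors are assumptions made for the example; unlike `vectorize`, unknown words are left as zero rows here to keep the output deterministic.

```python
import numpy as np

EMBED_DIM = 4  # toy dimension; the notebook uses 50
toy_table = {"what": np.ones(EMBED_DIM), "color": np.full(EMBED_DIM, 2.0)}


def pad_or_truncate(sentence, table, max_words):
    """Returns a (max_words, EMBED_DIM) array: known words looked up, the rest zero."""
    out = np.zeros((max_words, EMBED_DIM))
    for i, word in enumerate(sentence.lower().split()[:max_words]):
        # Unknown words stay as zero rows in this sketch.
        out[i] = table.get(word, np.zeros(EMBED_DIM))
    return out


short = pad_or_truncate("What color", toy_table, max_words=5)
long = pad_or_truncate("what color what color what color", toy_table, max_words=5)

# Both sentences now have the same shape, so they stack into (N, T, D).
batch = np.array([short, long])
print(short.shape, long.shape, batch.shape)
```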


In [16]:
def compute_loss(logits, labels_placeholder):
    """
    Expects the raw, pre-softmax scores as logits.
    Returns the softmax cross-entropy summed over the batch.
    """
    final_loss = tf.reduce_sum(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels_placeholder, dim=-1))
    return final_loss
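
To get a feel for the values this loss produces, the NumPy sketch below recomputes softmax cross-entropy by hand for four candidates with the correct answer in slot 0. `softmax_cross_entropy` is a hypothetical re-implementation for illustration and the logits are made up. A model with no preference between the four candidates scores ln(4) per sample, which is a useful reference point when reading the training log.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Per-sample cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


labels = np.zeros((2, 4))
labels[:, 0] = 1                         # correct answer in slot 0, as in the generator

uniform = np.zeros((2, 4))               # no preference between candidates
confident = np.array([[10.0, 0.0, 0.0, 0.0],
                      [10.0, 0.0, 0.0, 0.0]])

uniform_loss = softmax_cross_entropy(uniform, labels).sum()      # 2 * ln(4)
confident_loss = softmax_cross_entropy(confident, labels).sum()  # close to 0
print(uniform_loss, confident_loss)
```

Because `compute_loss` sums rather than averages, the chance-level value for a batch of 28 is 28 * ln(4), roughly 38.8.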


### Building the computational Graph

In [17]:
tf.reset_default_graph()

In [18]:
## 1
inputIm_placeholder, question_placeholder, answer_placeholder, \
option1_placeholder, option2_placeholder, option3_placeholder, labels_placeholder = load_placeholders(cfg)

## 2. 
encode_image = encodeImage(cfg)

## 3. 
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print("Weights path is: ", cfg.weights_path)

## 4. 
with sess.as_default():
    encode_image.load_weights(cfg.weights_path, sess, is_Train=True)


Weights path is:  ../weights/vgg16_weights.npz
The weights are trainable
weight_file is:  weights/vgg16_weights.npz


In [19]:
"""
1. Loading placeholders for the computational graph.
2. Creating the objects that encode the image and the text.
3. Creating the default session in TensorFlow.
4. As the CNN model is pre-trained, its weights are loaded into the encoder object within the default session.
5. The computational graph is built for the convolution part.
6. Encoding the question using the computational graph.
"""

## 5. 
final_conv_layer = encode_image.forward_pass(inputIm_placeholder)
print("(Before) final_conv_layer shape: ", final_conv_layer.get_shape())

## Need to flatten the image here. 
final_conv_layer = tf.contrib.layers.flatten(final_conv_layer)
print("(After) final_conv_layer shape: ", final_conv_layer.get_shape())

fully_connected_object = fullyConnected(cfg)
output_fully_connected = fully_connected_object.forward_pass(final_conv_layer)
print("output_fully_connected shape is: ", output_fully_connected.get_shape())

# init_state = tf.Variable(tf.zeros([cfg.batch_size, cfg.state_size], dtype = tf.float32))

## 6. 
with tf.variable_scope("question", reuse=tf.AUTO_REUSE):
    """
    Reuse permission is given to all the variables within this module. 
    """
    encode_text = encodeText(cfg)
    output_fw_q, final_state_fw_q = encode_text.encode(question_placeholder, encoder_input=output_fully_connected)

with tf.variable_scope("answers", reuse=tf.AUTO_REUSE):
    """
    Reuse Permission is given to the answer as well.
    """
    encode_answer = encodeText(cfg)
    output_fw_a, final_state_fw_a = encode_answer.encode(answer_placeholder, final_state_fw_q)
    output_fw_opt1, final_state_fw_opt1 = encode_answer.encode(option1_placeholder, final_state_fw_q)
    output_fw_opt2, final_state_fw_opt2 = encode_answer.encode(option2_placeholder, final_state_fw_q)
    output_fw_opt3, final_state_fw_opt3 = encode_answer.encode(option3_placeholder, final_state_fw_q)

"""
Score each candidate by the dot product of the question state and the candidate state,
then pass the four scores to the loss function.
"""
pro_value1 = tf.reduce_sum(tf.multiply(final_state_fw_q, final_state_fw_a), axis=1)
pro_value2 = tf.reduce_sum(tf.multiply(final_state_fw_q, final_state_fw_opt1), axis=1)
pro_value3 = tf.reduce_sum(tf.multiply(final_state_fw_q, final_state_fw_opt2), axis=1)
pro_value4 = tf.reduce_sum(tf.multiply(final_state_fw_q, final_state_fw_opt3), axis=1)

print("pro_value1 shape is: ", pro_value1.get_shape())

pro_value = tf.stack([pro_value1, pro_value2, pro_value3, pro_value4], axis=1)
print("pro_value shape is: ", pro_value.get_shape())
loss = compute_loss(pro_value, labels_placeholder)

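# Index of the highest-scoring candidate for each sample; index 0 is the correct answer.
accuracy_operation = tf.argmax(pro_value, axis=1)
print("Accuracy_operatin shape is: ", accuracy_operation.get_shape())
# A non-zero index is a wrong prediction, so this is the number of correct predictions in the batch.
non_zero_count = tf.count_nonzero(accuracy_operation)
accuracy = cfg.batch_size - non_zero_count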

train_step = tf.train.AdamOptimizer(learning_rate=3e-4).minimize(loss)


(Before) final_conv_layer shape:  (?, 8, 8, 512)
(After) final_conv_layer shape:  (?, 32768)
output_fully_connected shape is:  (?, 512)
pro_value1 shape is:  (?,)
pro_value shape is:  (?, 4)
Accuracy_operatin shape is:  (?,)
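
The scoring and accuracy logic of this cell can be checked on paper with a few hand-picked vectors. The NumPy sketch below mirrors the same steps (dot product per candidate, stack to four columns, argmax, count the zeros). All arrays are made-up toy values, and the variable names are chosen for the example only.

```python
import numpy as np

# Three samples with a 2-dimensional "state" each (toy values).
q      = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])    # question states
answer = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]])    # correct answers
opt1   = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
opt2   = np.zeros((3, 2))
opt3   = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0]])

# Row-wise dot product between the question and each candidate.
scores = np.stack([np.sum(q * c, axis=1) for c in (answer, opt1, opt2, opt3)],
                  axis=1)                                   # shape (3, 4)

predicted = np.argmax(scores, axis=1)      # index of the best candidate per sample
num_correct = len(q) - np.count_nonzero(predicted)
print(scores)
print(predicted, num_correct)
```

The third sample is deliberately constructed so that the first option outscores the correct answer, which is why only two of the three predictions count as correct.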


In [20]:
## Train the model, reporting loss and validation accuracy every 50 iterations.

saver = tf.train.Saver()
savefile = "models/model1.ckpt"


def make_feed_dict(batch):
    """Maps one generator batch onto the graph placeholders."""
    images, questions, answers, o1, o2, o3, labels = batch
    return {inputIm_placeholder: images,
            question_placeholder: questions,
            answer_placeholder: answers,
            option1_placeholder: o1,
            option2_placeholder: o2,
            option3_placeholder: o3,
            labels_placeholder: labels}


with sess.as_default():
    writer = tf.summary.FileWriter("boards/1")
    writer.add_graph(sess.graph)

    # Note: this runs the initializer of every global variable. If load_weights
    # assigned the pretrained values in this session, they would be reset here.
    sess.run(tf.global_variables_initializer())

    for i in range(cfg.num_epochs):

        print("Epoch Number: ", i)
        batch_generator = generator(train_samples, cfg.batch_size)
        total_iterations = int(len(train_samples) / cfg.batch_size)

        for j in range(total_iterations):
            start_time = time.time()
            train_feed = make_feed_dict(next(batch_generator))

            sess.run(train_step, feed_dict=train_feed)

            if (j % 50 == 0):
                # Loss on the training batch that was just used for the update.
                loss_value = sess.run(loss, feed_dict=train_feed)

                end_time = time.time()

                # Accuracy on the first batch of a fresh validation generator.
                v_batch_generator = generator(val_samples, cfg.batch_size)
                val_feed = make_feed_dict(next(v_batch_generator))

                accuracy_value = sess.run(accuracy, feed_dict=val_feed)
                print("Iter: ", j, ' Total iter: ', total_iterations, " Loss value is: ", loss_value, " Time taken: ",
                      end_time - start_time, " Accuracy value is : ", accuracy_value / float(cfg.batch_size))

    saver.save(sess, savefile)


ResourceExhaustedError: OOM when allocating tensor of shape [25088,4096] and type float
	 [[Node: zeros_13 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [25088,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Caused by op 'zeros_13', defined at:
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.5/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelapp.py", line 478, in start
    self.io_loop.start()
  File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/usr/local/lib/python3.5/dist-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 281, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 232, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 397, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-18-40cd511e1761>", line 13, in <module>
    encode_image.load_weights(cfg.weights_path, sess, is_Train=True)
  File "/home/svh2811/VisualQA/helpers/utils_v2.py", line 116, in load_weights
    keys, weights = load_weights(self.cfg.weights_path, sess, is_Train)
  File "/home/svh2811/VisualQA/helpers/utils_v2.py", line 71, in load_weights
    'fc6_W': tf.Variable(tf.zeros([25088, 4096], dtype=tf.float32), name='fc6_W', trainable=is_Train),
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1512, in zeros
    output = constant(zero, shape=shape, dtype=dtype, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 218, in constant
    name=name).outputs[0]
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3076, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1561, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [25088,4096] and type float
	 [[Node: zeros_13 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [25088,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
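
The traceback reports that the allocation for `fc6_W`, a float32 tensor of shape [25088, 4096], failed on the GPU. As a rough back-of-the-envelope check, the sketch below computes how much memory a dense tensor of that shape needs. `tensor_mib` is a hypothetical helper, and the log alone does not show how much GPU memory was already in use, so this only sizes the one tensor rather than explaining the whole failure.

```python
BYTES_PER_FLOAT32 = 4


def tensor_mib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size of a dense tensor in MiB."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / (1024 ** 2)


fc6_w_mib = tensor_mib([25088, 4096])
print("fc6_W alone: %.0f MiB" % fc6_w_mib)

# Adam keeps two extra slots (first and second moment) for each variable it
# updates, so a trained fc6_W would need about three times as much.
print("fc6_W plus Adam slots: %.0f MiB" % (3 * fc6_w_mib))
```

At 392 MiB for this single weight matrix, building the VGG variables more than once in the same process, or sharing the GPU with another running kernel, can plausibly exhaust the available memory.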
