# Relational Networks Implementation

Implementation of Santoro, Raposo et al., [A simple neural network module for relational reasoning](https://arxiv.org/pdf/1706.01427.pdf).

I will start with the Visual QA dataset [CLEVR](https://cs.stanford.edu/people/jcjohns/clevr/). The dataset consists of the following: 

- training set: 70,000 images and 699,989 questions
- validation set: 15,000 images and 149,991 questions
- test set: 15,000 images and 14,988 questions
- scene graph annotations for training and validation images, including ground-truth locations, attributes, and relationships for objects

The network used to perform relational reasoning consists of:
- 4-layer convolutional network to process the images
    - each layer: 24 kernels of size 3 x 3, a stride of 2, ReLU activation, and batch normalization
- 32-unit word embedding layer feeding the LSTM
- 128-unit LSTM to process questions
- Relational network, composed of 2 multilayer perceptrons with ReLU activations:
   - MLP #1 (g$_\theta$): 4 dense layers with 256 units each
   - MLP #2 (f$_\theta$): 3 dense layers with 256, 256, and 29 units respectively
- Dropout before the final layer
- Final linear layer that produces logits for a softmax over the answer vocabulary
    - cross-entropy loss, optimized with Adam at a learning rate of 2.5e-4
    - mini-batches of size 64
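
As a minimal sketch of the output stage described above, the snippet below computes a softmax over a vector of logits and the cross-entropy against the correct answer index. It uses plain Python rather than Keras, and the three-class logits are made-up illustrative values, not model outputs.

```python
import math

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    # negative log-probability assigned to the correct answer
    return -math.log(probs[target_index])

logits = [2.0, 0.5, -1.0]   # hypothetical logits for a 3-answer vocabulary
probs = softmax(logits)
loss = cross_entropy(probs, target_index=0)
print(probs, loss)
```

In the actual model the vocabulary has 29 answers, so the logits vector would have 29 entries.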
   
   
Try next:
- 1-D conv for text processing

## Import Libraries

In [1]:
import numpy as np
import tensorflow as tf

import keras
import keras.backend as K
from keras.layers import LSTM,GRU,Conv2D,SeparableConv2D,Embedding,Dense,Input,BatchNormalization, \
                         Reshape,Flatten,Dropout,Lambda,RepeatVector,Concatenate,Add,TimeDistributed
from keras.callbacks import ModelCheckpoint, Callback, EarlyStopping, ReduceLROnPlateau, TensorBoard
from keras.optimizers import Adam
from keras.models import Model, Sequential
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

from PIL import Image
import os
import pandas as pd
import json
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

%matplotlib inline

Using TensorFlow backend.


In [None]:
K.set_image_data_format('channels_last') #replaces the deprecated K.set_image_dim_ordering('tf')
K.clear_session()

## Load data

In [None]:
!ls ../../Data/CLEVR/CLEVR_v1.0/scenes/

In [None]:
cwd = os.getcwd()
cwd


In [None]:
data_dir = '/home/odenigborig/Data/CLEVR/CLEVR_v1.0/'
images = os.path.join(data_dir,'images')
questions = os.path.join(data_dir,'questions')
scenes = os.path.join(data_dir,'scenes')

train_images_dir = os.path.join(images,'train')
valid_images_dir = os.path.join(images,'val')
test_images_dir = os.path.join(images,'test')


In [None]:
#train_qs = json.load(open(os.path.join(questions,'CLEVR_train_questions.json')))
#valid_qs = json.load(open(os.path.join(questions,'CLEVR_val_questions.json')))
#test_qs = json.load(open(os.path.join(questions,'CLEVR_test_questions.json')))


In [None]:
#train_qs.keys()

In [None]:
#train_qs['questions'][1]

In [None]:
#train_qs['questions'][0].keys()

In [None]:
#index = 890
#print(train_qs['questions'][index]['question'].encode('utf-8'))
#print(train_qs['questions'][index]['answer'].encode('utf-8'))
#img_name = train_qs['questions'][index]['image_filename']
#print('image name: ' + img_name)

#img = mpimg.imread(os.path.join(train_images_dir,img_name))
#this_img= img[:,:,:3]
#plt.imshow(this_img)
#plt.axis('off')

#plt.figure()
#plt.imshow(img)
#plt.axis('off')


In [None]:
#create lists of questions, filenames, and other pertinent details
#train_questions = [value['question'].encode('utf-8') for counter, value in enumerate(train_qs['questions'])]
#train_answers = [value['answer'].encode('utf-8') for counter, value in enumerate(train_qs['questions'])]
#train_img_fnames = [value['image_filename'].encode('utf-8') for counter,value in enumerate(train_qs['questions'])]

#valid_questions = [value['question'].encode('utf-8') for counter, value in enumerate(valid_qs['questions'])]
#valid_answers = [value['answer'].encode('utf-8') for counter, value in enumerate(valid_qs['questions'])]
#valid_img_fnames = [value['image_filename'].encode('utf-8') for counter,value in enumerate(valid_qs['questions'])]

#test_questions = [value['question'].encode('utf-8') for counter, value in enumerate(test_qs['questions'])]
#test_img_fnames = [value['image_filename'].encode('utf-8') for counter,value in enumerate(test_qs['questions'])]

In [None]:
#train_q_len = [len(value) for counter,value in enumerate(train_questions)]

In [None]:
#max(train_q_len)

In [None]:
#np.unique(train_answers)

In [None]:
len(np.unique(train_questions))

In [None]:
len(np.unique(train_answers))

## Modules

### CNN module
- 4-layer convolutional network to process the images
    - each layer: 24 kernels of size 3 x 3, a stride of 2, ReLU activation, and batch normalization
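
A quick way to see what the four stride-2 layers do to a 128 x 128 input is to track the spatial size layer by layer. This is a sketch that assumes `padding='same'`, for which the output size is the input size divided by the stride, rounded up; `same_conv_output_size` is a helper name introduced here for illustration.

```python
import math

def same_conv_output_size(size, stride=2):
    # with 'same' padding the spatial output size is ceil(input / stride)
    return int(math.ceil(size / float(stride)))

size = 128
sizes = [size]
for _ in range(4):          # four stride-2 convolution layers
    size = same_conv_output_size(size)
    sizes.append(size)

print(sizes)                # [128, 64, 32, 16, 8]
num_objects = size * size   # each cell of the final feature map is one "object"
print(num_objects)          # 64
```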

In [None]:
#cnn module
cnn_in = Input(shape=(128,128,3),name='img_in')

conv = Conv2D(filters=24,kernel_size=(3,3),strides=2,activation='relu',name='conv_1',padding='same')(cnn_in)
conv = BatchNormalization(axis=-1,name='bn_1')(conv) #channels-last data: normalize over the channel axis

for i in range(3):
    conv = Conv2D(filters=24,kernel_size=(3,3),strides=2,activation='relu',name='conv_'+str(i+2),padding='same')(conv)
    conv = BatchNormalization(axis=-1,name='bn_'+str(i+2))(conv)

cnn_module = Model(cnn_in,conv,name='cnn_module')
cnn_module.summary()

### RNN module
- 32 unit word embedding layer 
- 128 unit LSTM to process questions

In [None]:
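#RNN module
max_sequence_len = 213 #max(train_q_len)+10; train_q_len counts characters, so this over-estimates the token count
vocab_size = 1000 #matches Tokenizer(num_words=1000) used in the text preprocessing

#question input shape = (batch size, time steps); the embedding adds the feature dimension
question_input = Input(shape=(max_sequence_len,),name='question_in')
print(K.int_shape(question_input))
#input_dim is the vocabulary size (largest token index + 1), not the sequence length
embed = Embedding(input_dim=vocab_size,output_dim=32,name='embedding')(question_input)
print(embed.shape)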
lstm = LSTM(units=128,name='lstm',use_bias=False)(embed)
print(lstm.shape)

rnn_module = Model(question_input,lstm,name='rnn_module')
rnn_module.summary()

### Relational network (RN) module 
Relational network, composed of 2 multilayer perceptrons with ReLU activations:
- MLP #1 (g$_\theta$): 4 dense layers with 256 units each
- MLP #2 (f$_\theta$): 3 dense layers 256, 256, 29 units respectively
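
In equation form, and keeping this notebook's notation (the paper writes the outer function as $f_\phi$), the two MLPs combine as follows, where $o_i$ and $o_j$ are objects taken from the CNN feature map, $q$ is the question embedding produced by the LSTM, and $a$ is the network's answer output:

$$
a = f_\theta\left(\sum_{i,j} g_\theta(o_i, o_j, q)\right)
$$

The sum runs over all ordered pairs $(i, j)$, so the result does not depend on the order in which the objects are listed.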

The first MLP (g$_\theta$) acts on **pairs** of objects. The output of the cnn module is the object set *O*, and the object pairs are obtained from *O*. Based on the cnn module above, the final 8 x 8 feature map yields 64 objects, and therefore 64 x 64 = 4096 ordered object pairs.
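
The pairing step can be illustrated on a toy example in plain Python. This is only a sketch: `pair_with_question` and `toy_g` are hypothetical stand-ins (the real g$_\theta$ is a 4-layer MLP), and the object and question vectors are made-up values.

```python
def pair_with_question(objects, question):
    # every ordered pair (o_i, o_j), including i == j, concatenated with the question vector
    return [o_i + o_j + question for o_i in objects for o_j in objects]

def toy_g(pair):
    # hypothetical stand-in for g_theta: reduces a pair to a single number
    return sum(pair)

objects = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]   # 3 toy objects with 2 features each
question = [0.5]                                  # toy question embedding

pairs = pair_with_question(objects, question)
print(len(pairs), len(pairs[0]))   # 9 pairs, each of length 2 + 2 + 1 = 5

relation_sum = sum(toy_g(p) for p in pairs)       # the aggregate that f_theta would receive
print(relation_sum)
```

With 64 objects the same construction produces 64 x 64 = 4096 pairs, which is why applying g$_\theta$ pair by pair in a Python loop is slow.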

#### Create object-object pair and object-object-question pair

In [None]:
def sub2ind(array_shape, rows, cols):
    return rows*array_shape[1] + cols

def pair_objects(x):
    in_shapes = K.int_shape(x)

    #x is the objects matrix of shape (batch, m, n):
    # m objects, each described by n features
    num_objects = in_shapes[1]

    #concatenate every ordered pair (i,j) of objects along the feature axis
    pairs = [] 
    for i in range(num_objects):
        for j in range(num_objects):
            pairs.append(K.concatenate([x[:,i,:],x[:,j,:]]))

    output = K.stack(pairs,axis=1)

    return output

def sum_objects(x):
    output = K.sum(x,axis=1)
    return output  

In [None]:
conv_shape = K.int_shape(conv)
conv_objects = Reshape((conv_shape[1]**2,conv_shape[3]))(conv)
object_pairs = Lambda(pair_objects,name='object_pair')(conv_objects)
print('object-object pair shape: {}'.format(K.int_shape(object_pairs)))

num_objects = (conv_shape[1]**2)
question_embed = RepeatVector(num_objects**2,name='repeat_q_embed')(lstm)
print('question embeddings shape: {}'.format(K.int_shape(question_embed)))

object_question_pairs = Concatenate(axis=-1,name='object_question_pair')([object_pairs,question_embed])
print('object pair-question shape: {}'.format(K.int_shape(object_question_pairs)))

#### Relational network module


In [None]:
# Alternative approach, using TimeDistributed Layer

shape_in = K.int_shape(object_question_pairs)[1:]

#sequential model 
g_mlp_layers = Sequential()
#TimeDistributed passes one object-question pair at a time, so the input is a single feature vector
g_mlp_layers.add(Dense(units=256,activation='relu',name='g_theta_1',input_shape=(shape_in[1],)))
g_mlp_layers.add(Dense(units=256,activation='relu',name='g_theta_2'))
g_mlp_layers.add(Dense(units=256,activation='relu',name='g_theta_3'))
g_mlp_layers.add(Dense(units=256,activation='relu',name='g_theta_4'))

#apply g_theta MLP to each object-question pair (i.e. row)
g_MLP_obj_q_pairs = TimeDistributed(g_mlp_layers,name='g_theta')(object_question_pairs)
print(g_MLP_obj_q_pairs.shape)

#apply element-wise sum
g_MLP_sum = Lambda(sum_objects,name='g_theta_sum')(g_MLP_obj_q_pairs)
print(g_MLP_sum.shape)

In [None]:
#mlp 2
f_MLP = Dense(units=256,activation='relu',name='f_theta_1')(g_MLP_sum)
f_MLP = Dense(units=256,activation='relu',name='f_theta_2')(f_MLP)
f_MLP = Dropout(rate=0.5,name='dropout')(f_MLP)
rn_out = Dense(units=29,activation='softmax',name='output')(f_MLP)
print(rn_out.shape)

In [None]:
#create the g_theta layers once; reusing the same layer objects is what shares the weights
g_theta_layers = [Dense(units=256,activation='relu',name='g_theta_'+str(k+1)) for k in range(4)]

def g_MLP(x):
    '''
    x: object question relations
    '''
    mlp_out = x
    for layer in g_theta_layers:
        mlp_out = layer(mlp_out)
    
    return mlp_out

def g_MLP_layer(x):
    '''
    apply the g_theta MLP to each object-question pair separately, followed by elementwise sum
    the parameters are shared because every pair passes through the same layer objects.
    
    x: object question relations
    '''
    
    transformed_pairs = []

    for p in range(K.int_shape(x)[1]):
        #slice inside a Lambda layer so the result remains a Keras tensor
        pair = Lambda(lambda t,p=p: t[:,p,:])(x)
        transformed_pairs.append(g_MLP(pair))

    #element-wise sum    
    output = Add(name='sum_g_theta')(transformed_pairs)

    return output

In [None]:
##apply the g_theta MLP to each object-question pair separately, this is very slow
#g_MLP_obj_q_pairs = []

#for p in range(K.int_shape(object_question_pairs)[1]):
#    g_MLP_obj_q_pairs.append(g_MLP(object_question_pairs[:,p,:]))

##element-wise sum    
#g_MLP_sum = Add(name='sum_g_theta')(g_MLP_obj_q_pairs)

#print(g_MLP_sum.shape)

#g_MLP = Lambda(g_MLP_layer,name='g_MLP')(object_question_pairs)
#print(g_MLP.shape)

In [None]:
#object_question_pair_input = Input(shape=(K.int_shape(object_question_pairs)[1:]),name='object_question_in')
#mlp 1
#mlp_1 = Dense(units=256,activation='relu',name='g_theta_1')(object_question_pairs)
#mlp_1 = Dense(units=256,activation='relu',name='g_theta_2')(mlp_1)
#mlp_1 = Dense(units=256,activation='relu',name='g_theta_3')(mlp_1)
#mlp_1 = Dense(units=256,activation='relu',name='g_theta_4')(mlp_1)
#print(K.int_shape(mlp_1))

#element-wise sum 
#mlp_1_sum = Lambda(sum_objects,name='sum_objects')(mlp_1)
#print(K.int_shape(mlp_1_sum))

#mlp 2
#mlp_2 = Dense(units=256,activation='relu',name='f_theta_1')(mlp_1_sum)
#mlp_2 = Dense(units=256,activation='relu',name='f_theta_2')(mlp_2)
#mlp_2 = Dropout(rate=0.5)(mlp_2)
#rel_out = Dense(units=29,activation='softmax',name='output')(mlp_2)
#mlp_2 = Dense(units=29,activation='relu',name='f_theta_3')(mlp_2)
#print(K.int_shape(mlp_2))

#rel_out = Dense(units=1,activation='softmax',name='output')(mlp_2)
#print(K.int_shape(rel_out))

#rel_module = Model(object_question_pair_input,rel_out,name='rel_module')
#rel_module.summary()

### Combine modules: vqa network

In [None]:
vqa_model  = Model([cnn_in,question_input],rn_out,name='vqa_model')
vqa_model.summary()

### Preprocess text into sequences
- Tokenize: convert text into integers
- Pad sequences to the same length
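
The two steps above can be sketched without Keras. This is a simplified illustration, not the Keras `Tokenizer` itself: it splits on whitespace only, and `fit_vocab` and `to_padded_sequence` are names introduced here. As in the cells below, index 0 is reserved for padding and padding is added at the end.

```python
from collections import Counter

def fit_vocab(texts):
    # count lower-cased whitespace tokens; index 0 is reserved for padding
    counts = Counter(word for t in texts for word in t.lower().split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

def to_padded_sequence(text, vocab, maxlen):
    # unknown words are dropped; zeros pad (or the tail is truncated) at the end
    seq = [vocab[w] for w in text.lower().split() if w in vocab][:maxlen]
    return seq + [0] * (maxlen - len(seq))

train_texts = ["is the cube red", "is the sphere large"]   # made-up questions
vocab = fit_vocab(train_texts)
print(to_padded_sequence("is the sphere red", vocab, maxlen=6))
```

The vocabulary is fitted once on the training questions and then reused for any other text, so the same word always maps to the same integer.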

In [None]:
#tokenizer = Tokenizer(num_words=1000)
#tokenizer.fit_on_texts(train_questions)

In [None]:
#print(len(tokenizer.word_counts))
#word_counts = tokenizer.word_counts

In [None]:
#(tokenizer.word_index)

In [None]:
#train_questions_sequences = tokenizer.texts_to_sequences(train_questions)

In [None]:
#index = 1100
#print(train_questions[index])
#print(train_questions_sequences[index])

In [None]:
#pad sequences with zeros at the end
#train_sequences = pad_sequences(train_questions_sequences,maxlen=max_sequence_len,padding='post',truncating='post')

In [None]:
#index = np.random.randint(len(train_questions))
#print('Question:')
#print(train_questions[index])
#print('')
#print('Sequence:')
#print(train_questions_sequences[index])
#print('')
#print('Sequence-Padded:')
#print(train_sequences[index])

In [None]:
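#def process_text(text,num_words=1000,maxlen=None,tokenizer=None,return_tokenizer=False):
#    #fit a new tokenizer only when none is passed in, so validation and test text
#    #can reuse the vocabulary learned on the training set
#    if tokenizer is None:
#        tokenizer = Tokenizer(num_words=num_words)
#        tokenizer.fit_on_texts(text)
    
#    sequences = tokenizer.texts_to_sequences(text)
    
#    sequences = pad_sequences(sequences,maxlen=maxlen,padding='post',truncating='post')

#    if return_tokenizer:    
#        return sequences, tokenizer
#    else:
#        return sequences

In [None]:
#train_answers_num, answer_tokenizer = process_text(train_answers,return_tokenizer=True)

#reuse the training tokenizers so word and answer indices match across splits
#valid_sequences = process_text(valid_questions,maxlen=max_sequence_len,tokenizer=tokenizer)
#valid_answers_num = process_text(valid_answers,tokenizer=answer_tokenizer)

#test_sequences = process_text(test_questions,maxlen=max_sequence_len,tokenizer=tokenizer)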

In [None]:
#print('Train answers: {}'.format(np.unique(train_answers_num)))
#print('')
#print('Valid: {}'.format(np.unique(valid_sequences)))
#print('')
#print('Valid qs:{}'.format(np.unique(valid_answers_num)))
#print('')

#print('Test: {}'.format(np.unique(test_sequences)))

In [None]:
#save sequences 
#clear memory
del train_qs,valid_qs,test_qs,train_questions,valid_questions,test_questions

#load_train = np.load('training.npz')
#load_valid = np.load('valid.npz')
#load_test = np.load('testing.npz')

#train_answers_num = to_categorical(train_answers_num,num_classes=29)
#valid_answers_num = to_categorical(valid_answers_num,num_classes=29)

#np.savez_compressed('train_data',question_sequences=train_sequences,
#                    answers=train_answers_num,image_file=train_img_fnames)
#np.savez_compressed('valid_data',question_sequences=valid_sequences,
#                    answers=valid_answers_num,image_file=valid_img_fnames)
#np.savez_compressed('test_data',question_sequences=test_sequences,
#                    image_file=test_img_fnames)

### Process images
The first idea was to load every image into a list and save that. This might take a long time and potentially run out of memory. Let's see `\_('_')_/`

Yeah, that's not going to work. Instead, create a data generator for the images.

- preprocessing:
    - downsample to 128 x 128
    - remove 4th channel (alpha channel) i.e. convert to RGB
    - rescale (divide by 255)

In [None]:
img_size = (128,128) 
img_file = train_qs['questions'][index]['image_filename'].encode('utf-8')
this_img = Image.open(os.path.join(train_images_dir,img_file))
this_img = np.asarray(this_img.convert('RGB').resize(img_size,Image.LANCZOS))/255. #convert to RGB, resize, and numpy array (LANCZOS is the current name of the ANTIALIAS filter)
plt.imshow(this_img)
plt.axis('off')

In [None]:
def load_img(path,new_size=(128,128)):
    '''
    Load image, convert to RGB, & resize
    '''
    img = Image.open(path)
    img = img.convert('RGB').resize(new_size,Image.LANCZOS) #LANCZOS is the current name of the ANTIALIAS filter
    
    return img

def pad_img(img_in,new_size=(136,136)):
    '''
    Apply zero padding to PIL image object

    img_in: PIL image object
    new_size: (width,height) to be consistent with PIL Image objects
    '''
    
    w,h = img_in.size #returns (width,height) tuple
    
    #new_size is (width,height): index 0 pairs with the width, index 1 with the height
    w_pad = np.abs(new_size[0] - w) // 2
    h_pad = np.abs(new_size[1] - h) // 2
    
    new_img = Image.new('RGB',new_size)
    
    #4 element tuple defining the left, upper, right, and lower pixel coordinate
    coordinates = (w_pad, h_pad, w_pad+w, h_pad+h)
    new_img.paste(img_in,coordinates)
    
    return new_img
    
def random_crop(img_in,crop_size=(128,128),seed=None):
    '''
    Randomly crop PIL image object
    Assumes 'channels last' data format. 
    
    img_in: PIL image object
    crop_size: (width,height) to be consistent with PIL Image objects
    seed  : random seed
    '''
    
    if seed is not None:
        np.random.seed(seed)

    w,h = img_in.size
    
    #largest valid top-left offset that keeps the crop inside the image
    w_range = max(w - crop_size[0], 0)
    h_range = max(h - crop_size[1], 0)
    
    #randint's upper bound is exclusive, hence the +1
    w_offset = np.random.randint(w_range + 1)
    h_offset = np.random.randint(h_range + 1)
    
    w_start,w_end = w_offset, w_offset + crop_size[0]
    h_start,h_end = h_offset, h_offset + crop_size[1]

    #4 element tuple defining the left, upper, right, and lower pixel coordinate
    coordinates = (w_start,h_start,w_end,h_end)
    
    img_out = img_in.crop(coordinates)
    
    return img_out
    
def random_rotate_img(img_in,rotation_range=(-2.86,2.86)):
    '''
    Apply random rotation from -0.05 to 0.05 radians (-2.86 to 2.86 degrees)
    
    img_in: PIL image object
    rotation_range: start and end angle (in degrees) for rotation
    '''
    
    #uniform sample in [rotation_range[0], rotation_range[1])
    angle = (rotation_range[1] - rotation_range[0])*np.random.rand() + rotation_range[0]
    img_out = img_in.rotate(angle,Image.BILINEAR)
    
    return img_out


def pil_img_to_array(img_in,rescale=True):
    '''
    Convert PIL image to numpy array
    '''
    img_out = np.asarray(img_in)
    
    if rescale:
        img_out = img_out / 255.

    return img_out

def process_images(img, augment=True):
    if augment:
        img = pad_img(img)
        img = random_crop(img)
        img = random_rotate_img(img)
    
    img_out = pil_img_to_array(img)
    
    return img_out
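
The geometry behind the padding, cropping, and rotation steps can be checked without loading an image. The sketch below uses only the standard library; `centered_paste_box`, `random_crop_box`, and `random_angle_degrees` are helper names introduced here for illustration and are not part of the notebook's pipeline.

```python
import math
import random

def centered_paste_box(img_size, canvas_size):
    # (left, upper, right, lower) box that centres an image on a larger canvas
    w, h = img_size
    left = (canvas_size[0] - w) // 2
    upper = (canvas_size[1] - h) // 2
    return (left, upper, left + w, upper + h)

def random_crop_box(img_size, crop_size, rng):
    # choose a top-left corner so the crop stays fully inside the image
    left = rng.randint(0, img_size[0] - crop_size[0])    # randint bounds are inclusive
    upper = rng.randint(0, img_size[1] - crop_size[1])
    return (left, upper, left + crop_size[0], upper + crop_size[1])

def random_angle_degrees(max_radians, rng):
    # uniform angle in [-max_radians, max_radians], converted to degrees
    return math.degrees(rng.uniform(-max_radians, max_radians))

rng = random.Random(0)   # seeded only to make the sketch repeatable
print(centered_paste_box((128, 128), (136, 136)))   # (4, 4, 132, 132)
print(random_crop_box((136, 136), (128, 128), rng))
print(random_angle_degrees(0.05, rng))
```

Padding to 136 x 136 leaves a 4-pixel border on each side, so a 128 x 128 crop can start anywhere from 0 to 8 pixels in along each axis.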

In [None]:
#check image transformation functions
index = np.random.randint(len(train_img_fnames))
img = load_img(os.path.join(train_images_dir,train_img_fnames[index]))
print(img.size)
plt.figure()
plt.imshow(pil_img_to_array(img))
plt.title('image: '+ train_img_fnames[index])
plt.axis('off')

img = pad_img(img)
print(img.size)
plt.figure()
plt.imshow(pil_img_to_array(img))
plt.title('padding: '+ train_img_fnames[index])
plt.axis('off')

img = random_crop(img)
print(img.size)
plt.figure()
plt.imshow(pil_img_to_array(img))
plt.title('random_crop: '+ train_img_fnames[index])
plt.axis('off')

img = random_rotate_img(img)
print(img.size)
plt.figure()
plt.imshow(pil_img_to_array(img))
plt.title('random_rotate: '+ train_img_fnames[index])
plt.axis('off')

In [None]:
img = load_img(os.path.join(train_images_dir,train_img_fnames[index]))
img = process_images(img)
print(img.shape)
plt.imshow(img)
plt.axis('off')

## Create Data Generator Class

In [None]:
#generator class
class VQDataGenerator(object):
    '''
    Generate data for model i.e. batch of questions and images
    '''
    
    def __init__(self,n=100,dim_x_img=128,dim_y_img=128,img_dir=None,sequence_len=213,num_classes=29,batch_size=32,shuffle=True,augment=True):
        '''
        Initialize class
        '''
        self.n = n
        self.dim_x_img = dim_x_img
        self.dim_y_img = dim_y_img
        self.img_dir = img_dir
        self.sequence_len = sequence_len
        self.num_classes = num_classes
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.augment = augment
        
    def __get_data_order(self):
        '''
        get indices i.e. order of samples
        '''
        indexes = np.arange(self.n)
        if self.shuffle:
            np.random.shuffle(indexes)
        return indexes
            
    def __data_generation(self,batch_inds,data_dict):
        '''
        Generate batches of data of batch_size:   
        '''       
        
        batch_imgs = np.empty((self.batch_size,self.dim_y_img,self.dim_x_img,3))
        batch_qtns = np.empty((self.batch_size,self.sequence_len),dtype=int)
        batch_anrs = np.empty((self.batch_size,self.num_classes),dtype=int)
        
        for i,val in enumerate(batch_inds):
            this_img_file = os.path.join(self.img_dir,data_dict['image_file'][val])
            batch_imgs[i,:,:,:] = process_images(load_img(this_img_file),self.augment)
            batch_qtns[i,:] = data_dict['question_sequences'][val,:]
            batch_anrs[i,:] = data_dict['answers'][val]
            
        return batch_imgs,batch_qtns,batch_anrs
        
    def generate(self,data_dict):
        '''
        Generate batches of samples
        '''        
        while True:
            #Generate order of dataset
            indexes = self.__get_data_order()
            
            #Generate batches
            imax = len(indexes) // self.batch_size
            for i in range(imax):
                temp_ids = [k for k in indexes[i*self.batch_size:(i+1)*self.batch_size]]
                #generate data
                batch_images,batch_questions,batch_answers = self.__data_generation(temp_ids,data_dict)
                
                X = [batch_images,batch_questions]
                y = batch_answers
                
                yield (X,y)
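
The index bookkeeping inside `generate` can be looked at in isolation. This is a simplified sketch of a single pass over the data; `batch_indices` is a name introduced here, and it uses the standard `random` module instead of NumPy.

```python
import random

def batch_indices(n, batch_size, shuffle=True, rng=None):
    # one pass over the data: yields lists of sample indices, dropping the remainder
    order = list(range(n))
    if shuffle:
        (rng or random).shuffle(order)
    for i in range(n // batch_size):
        yield order[i * batch_size:(i + 1) * batch_size]

batches = list(batch_indices(n=10, batch_size=4, rng=random.Random(0)))
print(len(batches))   # 2 full batches; the last 2 samples are skipped this pass
```

Because the order is reshuffled on every pass, the samples left out of one epoch will generally differ from those left out of the next.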

In [None]:
#load data files
train_data = np.load('train_data.npz')
valid_data = np.load('valid_data.npz')
test_data = np.load('test_data.npz')

In [None]:
n_train_samples = len(train_data['image_file'])
n_valid_samples = len(valid_data['image_file'])
n_test_samples = len(test_data['image_file'])

In [None]:
params_train = {'n':n_train_samples,
                'dim_x_img': 128,
                'dim_y_img': 128,
                'img_dir': train_images_dir,
                'sequence_len': 213,
                'num_classes': 29,
                'batch_size': 32,
                'shuffle': True,
                'augment': True}

params_valid = {'n': n_valid_samples,
                'dim_x_img': 128,
                'dim_y_img': 128,
                'img_dir': valid_images_dir,
                'sequence_len': 213,
                'num_classes': 29,
                'batch_size': 32,
                'shuffle': True,
                'augment': False}

train_generator = VQDataGenerator(**params_train).generate(train_data)
valid_generator = VQDataGenerator(**params_valid).generate(valid_data)

In [None]:
#call next iteration i.e. batch
gen_flow = next(valid_generator) #next() works in both Python 2 and 3
batch_imgs,batch_questions = gen_flow[0][0], gen_flow[0][1]
batch_answers = gen_flow[1]

f,axarrs = plt.subplots(2,2,figsize=(20,8))
axs = axarrs.ravel()

for a in range(len(axs)):
    axs[a].imshow(batch_imgs[a])
    axs[a].set_title(str(a+1))
    axs[a].axis('off')
    print(str(a+1))
    print(batch_questions[a,:])
    print(batch_answers[a,:])
    print('')

print(batch_imgs.shape)
print(batch_answers.shape)
print(batch_questions.shape)

In [None]:
gen_flow = next(train_generator) #next() works in both Python 2 and 3
batch_imgs,batch_questions = gen_flow[0][0], gen_flow[0][1]
batch_answers = gen_flow[1]

f,axarrs = plt.subplots(2,2,figsize=(20,8))
axs = axarrs.ravel()

for a in range(len(axs)):
    axs[a].imshow(batch_imgs[a])
    axs[a].set_title(str(a+1))
    axs[a].axis('off')
    print(str(a+1))
    print(batch_questions[a,:])
    print(batch_answers[a,:])
    print('')

print(batch_imgs.shape)
print(batch_answers.shape)
print(batch_questions.shape)

In [None]:
del batch_imgs, batch_questions, batch_answers, gen_flow, train_generator, valid_generator

## Fit model

In [None]:
checkpt_dir = '/home/odenigborig/Github/relational_reasoning/models/'
checkpt_file = checkpt_dir+'vqa_model.h5'
log_dir = '/home/odenigborig/Github/relational_reasoning/log_dir/'

In [None]:
batch_size = 32 #the paper used a batch size of 64
epochs = int(1e5) #1.4e6 #in the paper they went to a 1.4 million!!
patience = epochs
lr = 2.5e-4/float(64/float(batch_size)) #scale by the batch-size ratio: the paper used a batch size of 64, so with 32 the learning rate is divided by 2


callbacks_list = [ModelCheckpoint(filepath=checkpt_file,monitor='val_loss',save_best_only=True),
                  ReduceLROnPlateau(monitor='val_loss',factor=0.1,patience=patience)]
#                 EarlyStopping(monitor='acc',patience=patience),
#                 TensorBoard(log_dir=log_dir,batch_size=batch_size,histogram_freq=20,embeddings_freq=20)]

#authors used cross entropy loss function 
vqa_model  = Model([cnn_in,question_input],rn_out,name='vqa_model')
vqa_model.compile(Adam(lr=lr),loss='categorical_crossentropy',metrics=['acc'])
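
The learning-rate scaling and the number of steps per epoch are simple arithmetic, shown here as a standalone check. The sketch assumes linear scaling with batch size, as in the cell above, and takes the 699,989 training questions from the dataset summary at the top; `scaled_lr` is a name introduced here.

```python
paper_batch_size = 64      # batch size reported in the paper
paper_lr = 2.5e-4          # learning rate reported in the paper

def scaled_lr(batch_size):
    # linear scaling: the learning rate shrinks in proportion to the batch size
    return paper_lr * batch_size / float(paper_batch_size)

print(scaled_lr(32))               # 0.000125

n_train_questions = 699989         # training questions listed in the dataset summary
steps_per_epoch = n_train_questions // 32
print(steps_per_epoch)             # 21874
```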


In [None]:
params_train = {'n':n_train_samples,
                'dim_x_img': 128,
                'dim_y_img': 128,
                'img_dir': train_images_dir,
                'sequence_len': max_sequence_len,
                'num_classes': 29,
                'batch_size': batch_size,
                'shuffle': True,
                'augment': True}

params_valid = {'n': n_valid_samples,
                'dim_x_img': 128,
                'dim_y_img': 128,
                'img_dir': valid_images_dir,
                'sequence_len': max_sequence_len,
                'num_classes': 29,
                'batch_size': batch_size,
                'shuffle': True,
                'augment': False}

train_generator = VQDataGenerator(**params_train).generate(train_data)
valid_generator = VQDataGenerator(**params_valid).generate(valid_data)

In [None]:
history = vqa_model.fit_generator(train_generator,epochs=epochs,verbose=1,
                                  callbacks=callbacks_list,
                                  steps_per_epoch=n_train_samples//batch_size,
                                  validation_data=valid_generator,
                                  validation_steps=n_valid_samples//batch_size)
                                  #use_multiprocessing=True,workers=2)

In [None]:
def plot_loss(history_model):
    f,(ax1,ax2)=plt.subplots(2,1,sharex=True)
    ax1.plot(history_model.history['val_loss'],label='valid')
    ax1.plot(history_model.history['loss'],label='train')
    ax2.plot(history_model.history['val_acc'],label='valid')
    ax2.plot(history_model.history['acc'],label='train')

    ax1.legend()
    ax2.legend()
    plt.xlabel('Epochs')
    ax1.set_ylabel('Loss')
    ax2.set_ylabel('Accuracy')

def print_score(model,X,questions,y):
    #X: batch of images, questions: padded question sequences, y: one-hot answers
    score = model.evaluate([X, questions], y, verbose=1)
    print('')
    print('Hold out score:', score[0])
    print('Hold out accuracy(%):', 100*score[1])


In [None]:
plot_loss(history)

## To Do:

<s>1. image preprocessing/data augmentation:
    - preprocessing:
        - downsample to 128 x 128
        - remove 4th channel (might be blank?)
    - augmentations:
        - pad to 136 x 136
        - random cropping back to 128 x 128
        - random rotation from -0.05 to 0.05 radians (-2.86 to 2.86 degrees)
2. text preprocessing:
    - tokenize (convert to integers)
    - make the same length
3. write data generator to apply image augmentations and pair questions with images
    - makes more sense to start with questions, and find corresponding image. </s>