# Encoder - finetuned version

In this notebook, I fine-tune the VGG16 convolutional network on the Flickr8k dataset.

## Connecting the notebook to Google Drive
First, we connect this notebook to Google Drive, which makes it easier to read and write files. If the files were stored locally, they would have to be uploaded again every time a new session started.

The PyDrive library handles authentication and the connection to your Google Drive.

In [None]:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [None]:
# Mount Google Drive in the Colab file system.
# Note: this import rebinds the name `drive`, replacing the PyDrive client created above.
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#Go to the directory containing the dataset.
%cd "/content/drive/My Drive/fine_tune_myself"

/content/drive/My Drive/fine_tune_myself


## Helper Functions

In this section, we go through the functions used for preparing the data and model.

The Flickr8k dataset ships with text files that list the image identifiers of the training and test images, one identifier per line. We use these files to look up the image files automatically and to build datasets that refer to the actual image data.

Also, as explained in the documentation, finetuning consists of the following process:

1. Modify the VGG16 architecture with your own classification layer and initialize its weights with one epoch of training. In our case, I used a reduced vocabulary of 300 words as the set of labels. In the get_targets.py file, the captions are used to mark which of these words occur for each image: every word from the reduced vocabulary that appears in any of an image's 5 captions is marked 1, and the rest are marked 0.

2. We then train the last layers of the modified VGG16 over several sessions. The model returned by one session is the starting point for the next. Each session also returns a history object, which records the metrics of that training cycle.
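
The labelling scheme in step 1 can be sketched as follows. This is a minimal illustration, not the contents of `get_targets.py`: the function name `build_target`, the whitespace tokenization, and the three-word vocabulary are assumptions made for the example (the notebook uses a 300-word vocabulary).

```python
def build_target(captions, vocab):
    """Return a multi-hot list: 1 if the vocab word occurs in any caption, else 0."""
    # Collect every lowercase token that appears in any of the captions.
    words = set()
    for caption in captions:
        words.update(caption.lower().split())
    # One entry per vocabulary word, in vocabulary order.
    return [1 if word in words else 0 for word in vocab]


# Hypothetical toy data, for illustration only.
vocab = ["dog", "beach", "child"]
captions = ["A dog runs on the beach", "Brown dog running"]
target = build_target(captions, vocab)
print(target)
```

Each image then gets one such vector, which is what `load_targets` is expected to return per image identifier.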




In [None]:
# -*- coding: utf-8 -*-
"""
Created on Sat Feb 23 21:38:45 2019

@author: ghrit
"""
from keras.utils import plot_model
from keras.optimizers import SGD
from os import listdir
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.layers import Dense,Flatten,Dropout
from keras.models import Sequential, Model
from pickle import load
from keras.layers import Input


# Read a file and return its content as a string.
def load_doc(filename):
    # open the file as read only; the with-block closes it automatically
    with open(filename, 'r') as file:
        return file.read()
 
# load a pre-defined list of photo identifiers
def load_set(filename):
    doc = load_doc(filename) #open file with identifiers.
    dataset = list()
    # process line by line
    for line in doc.split('\n'):
        # skip empty lines
        if len(line) < 1:
            continue
        # get the image identifier
        identifier = line.split('.')[0]  #remove the 'jpg' extension
        dataset.append(identifier)
    return list(set(dataset))


# The targets were created by me for use as classification labels
# To know more see get_targets.py
def load_targets(fn,dset):
    # unpickle the python dict containing the id:target mapping
    with open(fn, 'rb') as f:
        targets = load(f)
    # dset is the collection of image identifiers.
    targets = {k: targets[k] for k in dset} # only keep targets of the train or test dataset.
    # returns a dict with mapping of image id to target vector
    return targets

# Generator function used to feed VGG16 one example at a time. It is consumed by fit_generator when training the model.
def tr_genr(directory, targets,dset):
    # Run loop indefinitely
    while 1:
        # loop through all the images in the training or testing dataset
        for name in dset:
            # Add the extension to complete the file name
            filename = directory + '/' + name + ".jpg"
            # Use keras load_img function to get image in PIL format
            image = load_img(filename, target_size=(224, 224))
            # convert from PIL to a numpy array of pixels
            image = img_to_array(image)
            # Reshape numpy array in the format required by VGG16
            image= image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
            image= preprocess_input(image)
            # get the target vector for the image and reshape as required by keras layer
            y = np.array(targets[name]).reshape(1,300)
            x = np.array(image)
            
            # Generate an example (pair of features, labels) for the model to train on.
            yield x, y

# This function is used to initialize a modified VGG16 model for finetuning.
# We train it for one epoch to initialize the parameters of the customized layer.
# This is not the final training and is run only once.
def get_initialized_vgg(gen, old_model):
    
    # Skip the final layer, which is based on the imagenet classification labels,
    # and take the output of the layer before it. Indexing with [-2] avoids
    # relying on layers.pop(), which does not rewire the model graph.
    x = old_model.layers[-2].output
    
    # Replace with layer based on classification labels for our dataset captions (length 300)
    final = Dense(300, activation='sigmoid')(x)
    
    #Define model inputs and outputs
    model = Model(inputs= old_model.input, outputs= final)
    
    # save the updated model architecture
    plot_model(model, to_file='vgg2model.png',show_shapes=True)
    
    # For this initialization step, train only the layer we just added.
    # Freeze all layers of the pre-trained model so their weights are left untouched.
    for layer in old_model.layers:
        layer.trainable = False

    # Define model loss and optimizer
    model.compile(optimizer='rmsprop', loss='binary_crossentropy')
    
    # Train the model using the generator defined before. Steps per epoch are equal to number of example images
    model.fit_generator(gen,steps_per_epoch=6000)
    return model

# This function trains and evaluates the CNN model.
def train_vgg(gen,te_gen,model):
    # set the first 21 layers to non-trainable (weights will not be updated);
    # the layers from index 21 onward are trained
    for layer in model.layers[:21]:
        layer.trainable = False
    for layer in model.layers[21:]:
        layer.trainable = True
    
    # we use stochastic gradient descent as the optimizer.
    sgd = SGD(lr=1e-3, decay=1e-6, momentum=0.9)
    
    # we use binary crossentropy because each image can have multiple labels.
    model.compile(optimizer=sgd, loss='binary_crossentropy',  
                  metrics=['binary_accuracy','categorical_accuracy'])
    
    # We keep track of the metrics using the history callback provided by keras.
    history = model.fit_generator(gen,steps_per_epoch=6000, epochs=5,
                                  validation_data=te_gen,validation_steps=1000)
    return model  ,history  



Using TensorFlow backend.
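
To see which layers the index 21 used in `train_vgg` separates, the layer order can be written out. The sketch below assumes the standard Keras VGG16 layer names with `include_top=True`, and assumes the new 300-unit output layer gets the default name `dense_1`. Printing `[layer.name for layer in model.layers]` on the real model is the authoritative check.

```python
# Layer names of the modified model, in order (assumed; verify against model.layers).
layer_names = ["input_1"]
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    layer_names += ["block{}_conv{}".format(block, i) for i in range(1, n_convs + 1)]
    layer_names.append("block{}_pool".format(block))
layer_names += ["flatten", "fc1", "fc2", "dense_1"]

SPLIT = 21  # same index as in train_vgg
frozen = layer_names[:SPLIT]
trainable = layer_names[SPLIT:]

print(len(layer_names), "layers in total")
print("last frozen layer:", frozen[-1])
print("trainable layers:", trainable)
```

Under these assumptions, the convolutional blocks and `fc1` stay frozen, and only `fc2` and the new output layer are updated.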


## Put it all together

The images are stored in my drive, so we specify that directory. We also specify the input shape required by the VGG16 model. Then we load the VGG16 model provided by Keras with ImageNet-trained weights. An untrained VGG16 could also be used, but the point of fine-tuning is to take advantage of the existing model by extending it with your own dataset. Training VGG16 from scratch would require a huge dataset for good results, and we do not have the required resources.


We also prepare the training and testing generators with the appropriate images and targets. This code is commented out because it only needs to be executed once.

In [None]:
directory='/content/drive/My Drive/Multi-label-Inception-net/image/images'
input_tensor = Input(shape=(224,224,3))
vggmodel = VGG16(weights='imagenet', include_top=True,input_tensor=input_tensor)
plot_model(vggmodel,to_file='vgg_orig.png',show_shapes=True)
'''
train = load_set('Flickr_8k.trainImages.txt')
tr_targets=load_targets("targets300.pkl",train[:])
tr_gen=tr_genr(directory,tr_targets,train[:])

test =load_set('Flickr_8k.testImages.txt')
te_targets=load_targets("targets300.pkl",test)
te_gen=tr_genr('/content/drive/My Drive/Multi-label-Inception-net/testim',te_targets,test)'''

Instructions for updating:
Colocations handled automatically by placer.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5



### Initializing VGG16

We create a VGG16 CNN model whose new layer's weights are initialized by training for one epoch on our dataset. The initial loss is **0.3159**.

In [None]:
model = get_initialized_vgg(tr_gen, vggmodel)


Epoch 1/1


### Training Encoder

Now we are finally at the point of fine-tuning the VGG16 net on our data. We train the updated model for 5 epochs at a time. Evaluation is handled by the model as well, using the test generator we provide. The model is passed from one training session to the next, and each session returns a history object that we use to inspect the training metrics.

We save the model after every training session to keep a backup.
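
The train-then-save cycle can also be written as a loop. This is only a sketch: `run_sessions` and its parameters are hypothetical names, `train_fn` stands in for a call such as `lambda m: train_vgg(tr_gen, te_gen, m)`, and the file name pattern mirrors the names used in this notebook.

```python
def run_sessions(model, train_fn, n_sessions=3, name_pattern="ftv3004096_{}.h5"):
    """Train for several sessions, saving a backup after each one."""
    histories = []
    for session in range(1, n_sessions + 1):
        # train_fn takes the current model and returns (model, history).
        model, history = train_fn(model)
        # Keep a backup of the model reached in this session.
        model.save(name_pattern.format(session))
        histories.append(history)
    return model, histories
```

With the helpers defined earlier, a call might look like `model, histories = run_sessions(model, lambda m: train_vgg(tr_gen, te_gen, m))`. Running the sessions as separate cells, as done here, has the advantage that the metrics can be inspected before deciding whether to continue.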

In [None]:
model,hist= train_vgg(tr_gen,te_gen,model)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


As we can see, after 5 epochs the loss has come down to **0.0623**. Training binary and categorical accuracy have increased to **0.98 and 0.45**. On the validation set, they are **0.96 and 0.38**. This is pretty good.
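
The gap between the two accuracy figures follows from how they are computed. The sketch below is a plain-Python approximation of the two metrics for a single example, assuming a 0.5 threshold for binary accuracy and an argmax comparison for categorical accuracy. The helper names and the toy vectors are hypothetical.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of labels whose thresholded prediction matches the target."""
    hits = [int(p > threshold) == t for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


def categorical_accuracy(y_true, y_pred):
    """1.0 if the largest target and the largest prediction share an index, else 0.0."""
    def argmax(values):
        return max(range(len(values)), key=lambda i: values[i])
    return float(argmax(y_true) == argmax(y_pred))


# Toy example: 10 labels, two of them positive, and low scores everywhere.
y_true = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [0.1, 0.2, 0.3, 0.1, 0.4, 0.1, 0.1, 0.2, 0.1, 0.1]

print(binary_accuracy(y_true, y_pred))
print(categorical_accuracy(y_true, y_pred))
```

With sparse multi-hot targets most labels are 0, so binary accuracy can be high even when few positive labels are predicted, while categorical accuracy compares only a single index per image.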

In [None]:
model.save("ftv3004096_1.h5")

In [None]:
import keras
print(keras.__version__)

2.2.4


In [None]:
model,hist= train_vgg(tr_gen,te_gen,model)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


As we can see, after 10 epochs the loss has come down to **0.0388**. Training accuracy for binary and categorical has increased to **0.986**. On the validation set, the figure is **0.97**. Categorical accuracy has dropped slightly.

In [None]:
model.save("ftv3004096_2.h5")

In [None]:
model,hist= train_vgg(tr_gen,te_gen,model)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


As we can see, after 15 epochs the loss has come down to **0.025**. Training binary accuracy has increased to **0.994**. The other accuracy metrics have remained more or less the same.

In [None]:
model.save("ftv3004096_3.h5")

In [None]:
model,hist= train_vgg(tr_gen,te_gen,model)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


I stop after 3 training sessions to avoid overtraining, as test accuracy is no longer improving.
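
The stopping decision can be made explicit with a small check on the validation scores collected so far. This is a sketch under assumptions: `has_stalled`, `min_delta`, and the example scores are hypothetical, and in practice the scores would be read from each session's `hist.history` dictionary.

```python
def has_stalled(val_scores, min_delta=0.005):
    """True if the latest validation score did not beat the best earlier score by min_delta."""
    if len(val_scores) < 2:
        return False  # nothing to compare against yet
    best_before = max(val_scores[:-1])
    return val_scores[-1] < best_before + min_delta


# Hypothetical validation accuracies, one per training session.
print(has_stalled([0.90, 0.95]))
print(has_stalled([0.90, 0.95, 0.951]))
```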

## Conclusion

We have fine-tuned the default VGG16 on the Flickr8k dataset, which seems to improve its performance. The real test is whether the overall image captioning system improves, which will tell us if this exercise has benefited the system.