## The Directory Structure
```
Project
|-- datasets
|   |-- dev_set
|   |-- test
|   |-- test_set
|   |-- train
|   `-- train_set
|-- model
|-- pretrained-model
|-- submissions
|-- datalab.py
|-- dataset_clusterer.py
|-- make_file.py
|-- model.py
|-- vgg16.py
|-- predict.py
|-- test.py
`-- train.py
```

**The full code, split into separate scripts, can be found [here](https://github.com/piyush2896/Transfer-Learning-Vgg-16)**

## Preprocessing and Batches Creation
Unfortunately, we cannot feed the data directly into the model: the dataset is too large to hold in memory, and the images come in different sizes.

To handle this, every image is resized to (224, 224, 3) and the images are grouped into batches. Run [dataset_clusterer.py](https://github.com/piyush2896/Transfer-Learning-Vgg-16/blob/master/dataset_clusterer.py) to do this.
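As a rough illustration of this step, here is a minimal NumPy sketch (not the repository's dataset_clusterer.py) that resizes arbitrarily sized H x W x 3 images to 224 x 224 x 3 and groups them into fixed-size batches. The helper names `resize_nearest` and `make_batches`, the nearest-neighbour resize (used only to avoid an image-library dependency), and the one-hot label encoding are all assumptions.

```python
import numpy as np


def resize_nearest(img, size=(224, 224)):
    """Nearest-neighbour resize of an H x W x C array to size[0] x size[1] x C.

    Hypothetical helper; a real pipeline would more likely use an image
    library's resize with proper interpolation.
    """
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]  # source row for each output row
    cols = np.arange(size[1]) * w // size[1]  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def make_batches(images, labels, batch_size):
    """Yield (X, Y) batches: X is (B, 224, 224, 3) float32, Y is (B, 2) one-hot.

    Assumption: labels are integer class indices (0 or 1); the last batch may
    be smaller than batch_size.
    """
    for start in range(0, len(images), batch_size):
        chunk = images[start:start + batch_size]
        X = np.stack([resize_nearest(im) for im in chunk]).astype(np.float32)
        Y = np.eye(2, dtype=np.float32)[labels[start:start + batch_size]]
        yield X, Y
```

dataset_clusterer.py saves its batches so that datalab.py can load them later; this sketch only yields them, which is enough to show the shapes involved.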

## VGG-16 Pretrained
I used a pretrained [VGG-16](http://arxiv.org/abs/1409.1556.pdf) model and fine-tuned its last layer. The checkpoint file used in [vgg16.py](https://github.com/piyush2896/Transfer-Learning-Vgg-16/blob/master/vgg16.py) can be downloaded from [here](http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz).
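To make "fine-tuning the last layer" concrete, here is a toy NumPy sketch of the idea, not the repository's TensorFlow code: a fixed random matrix `W_frozen` stands in for the pretrained VGG-16 layers and is only used in the forward pass, while a new softmax output layer (`W`, `b`) is trained by plain gradient descent on softmax cross-entropy. The function name, the single ReLU "frozen" layer, and the hyperparameters are all hypothetical.

```python
import numpy as np


def train_last_layer(X, Y, W_frozen, n_steps=200, lr=0.05, seed=0):
    """Train only a softmax output layer on top of a frozen ReLU layer.

    W_frozen plays the role of the pretrained layers: it is never updated.
    Only W and b (the new last layer) receive gradient updates.
    """
    rng = np.random.default_rng(seed)
    feats = np.maximum(X @ W_frozen, 0.0)  # frozen features, computed once
    n_classes = Y.shape[1]
    W = rng.normal(scale=0.01, size=(feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    losses = []
    for _ in range(n_steps):
        logits = feats @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))
        grad_logits = (probs - Y) / len(X)  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad_logits     # update only the last layer
        b -= lr * grad_logits.sum(axis=0)
    return W, b, losses
```

Because the frozen features never change, they can be computed once up front; only the small output layer has to be learned, which is why fine-tuning just the last layer is cheap.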

## Generators as Pipeline
I created a pipeline to feed data into the model. This ensures that the complete dataset is never loaded into memory at once. There are two generators:
1. DataLabTrain - used for train and dev set
2. DataLabTest - used for test set

They simply load the batches made by dataset_clusterer.py and make them available using a generator method. Code can be found in [datalab.py](https://github.com/piyush2896/Transfer-Learning-Vgg-16/blob/master/datalab.py).
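The generator idea can be sketched as follows. This is a hypothetical stand-in for `DataLabTrain`, not the code in datalab.py: the class name `BatchFileLoader`, the `samples_seen` counter, and the assumption that batch `i` is stored as `X_<i>.npy` and `Y_<i>.npy` in one folder are all illustrative. The real file layout written by dataset_clusterer.py may differ.

```python
import glob
import os

import numpy as np


class BatchFileLoader:
    """Hypothetical loader that streams pre-made batches from disk.

    Assumes each batch i was saved as X_<i>.npy and Y_<i>.npy in `folder`.
    """

    def __init__(self, folder):
        self.folder = folder
        self.n_batches = len(glob.glob(os.path.join(folder, 'X_*.npy')))
        self.samples_seen = 0  # number of samples yielded so far

    def generator(self):
        for i in range(self.n_batches):
            # Each batch is read from disk only when the consumer asks for it.
            X = np.load(os.path.join(self.folder, 'X_{}.npy'.format(i)))
            Y = np.load(os.path.join(self.folder, 'Y_{}.npy'.format(i)))
            self.samples_seen += len(X)
            yield X, Y
```

Since `generator()` returns a fresh generator each call and a generator can only be consumed once, a training loop needs to call it again for every epoch.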

## Training time
I trained for just 1 epoch and got a dev set loss of about 0.04, and a test set loss of 0.08426 on Kaggle. The code of train.py is shown below:

```python
import tensorflow as tf  # TensorFlow 1.x API (placeholders, sessions)
from vgg16 import vgg16
from datalab import DataLabTrain


def train(n_iters):
    # Build VGG-16 with only the last layer fine-tuned and a 2-class output.
    model, params = vgg16(fine_tune_last=True, n_classes=2)
    X = model['input']
    Z = model['out']  # logits, indexed below as (batch, 1, 1, n_classes)
    Y = tf.placeholder(dtype=tf.float32, shape=[None, 2])  # one-hot labels
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z[:, 0, 0, :], labels=Y))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
    saver = tf.train.Saver()

    # The `with` block closes the session on exit, so no explicit sess.close() is needed.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(n_iters):
            # Fresh generators every epoch, since a generator can only be consumed once.
            dl = DataLabTrain('./datasets/train_set/')
            train_gen = dl.generator()
            dev_gen = DataLabTrain('./datasets/dev_set/').generator()

            for X_train, Y_train in train_gen:
                print('Samples seen: {}'.format(dl.cur_index), end='\r')
                sess.run(train_step, feed_dict={X: X_train, Y: Y_train})
            print()

            # Average the per-batch loss over the dev set.
            l = 0
            count = 0
            for X_dev, Y_dev in dev_gen:
                count += 1
                l += sess.run(loss, feed_dict={X: X_dev, Y: Y_dev})

            print('Epoch: {}\tLoss: {}'.format(i, l / count))
            saver.save(sess, './model/vgg16-dog-vs-cat.ckpt')
            print("Model Saved")


train(n_iters=1)
```
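The loss in train.py is the mean softmax cross-entropy between the logits and one-hot labels. The NumPy sketch below spells out that quantity; the helper name `softmax_cross_entropy` is hypothetical, and the 4-D toy logits mimic the `(batch, 1, 1, n_classes)` shape implied by the `Z[:, 0, 0, :]` indexing in train.py.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Per-example softmax cross-entropy: -sum(labels * log_softmax(logits))."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_probs).sum(axis=-1)


# Toy logits shaped like the model output; slicing [:, 0, 0, :] gives (batch, n_classes).
Z = np.zeros((3, 1, 1, 2))                      # every class equally likely
Y = np.array([[1., 0.], [0., 1.], [0., 1.]])    # one-hot labels
loss = softmax_cross_entropy(Z[:, 0, 0, :], Y).mean()  # analogous to tf.reduce_mean(...)
```

With equal logits every example costs log 2 (about 0.693), so a dev loss of around 0.04 means the model assigns high probability to the correct class. Note that train.py averages per-batch means (`l/count`), so a smaller final batch gets the same weight as a full one; with many batches this differs only slightly from a per-example average.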

## Prediction Time
The training script saves the model in the "model" folder, from which it can be restored and used for prediction. The code of predict.py is shown below:

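```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
from vgg16 import vgg16
from datalab import DataLabTest
from make_file import make_sub

N_TEST_IMAGES = 12500  # number of images in the Kaggle test set


def predict(model_path, batch_size):
    model, params = vgg16(fine_tune_last=True, n_classes=2)
    X = model['input']
    Y_hat = tf.nn.softmax(model['out'])  # class probabilities

    saver = tf.train.Saver()

    dl_test = DataLabTest('./datasets/test_set/')
    test_gen = dl_test.generator()

    Y = []
    with tf.Session() as sess:
        saver.restore(sess, model_path)
        # Ceiling division: avoids requesting an extra (nonexistent) batch
        # when N_TEST_IMAGES is an exact multiple of batch_size.
        n_batches = (N_TEST_IMAGES + batch_size - 1) // batch_size
        for i in range(n_batches):
            y = sess.run(Y_hat, feed_dict={X: next(test_gen)})
            Y.append(y[:, 0, 0, 1])  # keep the probability of class 1
            print('Complete: {}%'.format(round(len(Y) / dl_test.max_len * 100, 2)), end='\r')
    Y = np.concatenate(Y)

    print()
    print('Total Predictions: {}'.format(Y.shape))
    return Y


Y = predict('./model/vgg16-dog-vs-cat.ckpt', 16)
np.save('out.npy', Y)
make_sub('sub_1.csv')
```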
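Two details of the prediction loop are easy to check in isolation: turning per-batch logits into one flat array of class-1 probabilities, and the number of batches needed to cover 12,500 test images, which is a ceiling division (`12500 // batch_size + 1` agrees with it only when 12,500 is not a multiple of the batch size). The NumPy helpers below are hypothetical and assume the same 4-D output shape that predict.py indexes with `y[:, 0, 0, 1]`.

```python
import numpy as np


def class1_probabilities(logit_batches):
    """Softmax each (B, 1, 1, 2) logit batch and concatenate the class-1 column."""
    probs = []
    for z in logit_batches:
        e = np.exp(z - z.max(axis=-1, keepdims=True))  # stable softmax over last axis
        p = e / e.sum(axis=-1, keepdims=True)
        probs.append(p[:, 0, 0, 1])
    return np.concatenate(probs)


def n_batches(n_samples, batch_size):
    """Number of batches needed to cover n_samples (ceiling division)."""
    return (n_samples + batch_size - 1) // batch_size
```

For example, `n_batches(12500, 16)` is 782, the same count the original `12500//batch_size+1` gives for a batch size of 16.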

[make_file.py](https://github.com/piyush2896/Transfer-Learning-Vgg-16/blob/master/make_file.py) is a helper script used to create submission file.
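The contents of make_file.py are not reproduced here. As a rough idea of what such a helper might do, here is a hypothetical sketch that reads the `out.npy` file saved by predict.py and writes one row per test image. It assumes the `id,label` layout of the competition's sample submission and that prediction order matches image ids 1, 2, 3, and so on; both assumptions should be checked against the actual test-set file ordering.

```python
import csv

import numpy as np


def make_submission_sketch(csv_path, npy_path='out.npy', first_id=1):
    """Hypothetical stand-in for make_sub: write an 'id,label' CSV from saved predictions.

    Assumes the i-th prediction belongs to test image id first_id + i.
    """
    probs = np.load(npy_path)
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'label'])
        for image_id, p in enumerate(probs, start=first_id):
            writer.writerow([image_id, float(p)])
```

The ordering assumption matters: if the test batches were built from an unsorted directory listing, the image ids would need to be carried along with the batches rather than reconstructed by position.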