## 1. Import Libraries
* pickle - Used for importing created features
* numpy - Used for working with arrays
* TensorFlow - For creating deep neural network graphs and later processing them
* Keras ImageDataGenerator - Used for randomly changing input data for more robust learning

In [1]:
import pickle
import numpy as np
import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator

Using TensorFlow backend.


## 2. Import the data
The data is already pre-processed, and we use only a mixture of the horizontal and vertical images. This gave me better classification results than using only the horizontal or only the vertical images. The performance is comparable (but not better :/) if we combine the horizontal, vertical and summed horizontal and vertical images, so leaving out the summed images keeps our network smaller.
* Provide the paths of the pickle files
* Load and convert the data into numpy arrays/matrices
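
The load step can be sketched with `with` blocks, which close each file once it has been read. This is a minimal illustration, not the notebook's actual data: the temporary file, the helper name `load_pickle`, and the tiny 2x2 "image" are assumptions made for the example; only the dictionary keys `xtrain` and `ytrain` come from the cell below.

```python
import os
import pickle
import tempfile

def load_pickle(path):
    """Load one pickled object and close the file afterwards."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Hypothetical stand-in for train.pickle: one 2x2 "image" and a one-hot label
dummy = {"xtrain": [[[0.1, 0.2], [0.3, 0.4]]], "ytrain": [[1, 0]]}

tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, "train.pickle")
with open(tmp_path, "wb") as handle:
    pickle.dump(dummy, handle)

loaded = load_pickle(tmp_path)
print(len(loaded["xtrain"]), loaded["ytrain"][0])
```

In the notebook itself, the loaded lists are then converted with `np.array(...)` as shown in the next cell.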

In [2]:
# Load the pickled dictionaries; `with` closes each file after reading
with open("train.pickle","rb") as pickle_train:
    trainX = pickle.load(pickle_train)
with open("valid.pickle","rb") as pickle_valid:
    validX = pickle.load(pickle_valid)

X_train = np.array(trainX['xtrain'],dtype=np.float32)
X_valid = np.array(validX['xvalid'],dtype=np.float32)

y_train = np.array(trainX['ytrain'],dtype=np.int32)
y_valid = np.array(validX['yvalid'],dtype=np.int32)

datagen = ImageDataGenerator(
    rotation_range=20,  # randomly rotate images in the range (degrees, 0 to 180)
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False)  # do not flip images vertically
    
datagen.fit(X_train.reshape([-1,75,75,1]))

## 3. Global Variables
Create some global variables/parameters used by our network.
* image_size - size of images, here 75 x 75
* n_class - number of classes (iceberg or ship); the labels are one-hot encoded
* batch_size - size of the batch supplied to train our network. I use 32 because it creates medium-sized tensors; with a bigger batch size (62, 128 etc.) the tensors get large and become problematic to train on my laptop.
* epochs - number of times the network sees the whole dataset
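
As an illustration of what `n_class = 2` means for the labels, here is a minimal one-hot encoding sketch. The helper `one_hot` and the mapping of index 0 to ship and index 1 to iceberg are assumptions made for the example; the notebook loads labels that are already encoded.

```python
def one_hot(label, n_class=2):
    """Return a list of length n_class with a 1 at position `label`."""
    if not 0 <= label < n_class:
        raise ValueError("label must be in [0, n_class)")
    return [1 if i == label else 0 for i in range(n_class)]

# Assumed mapping for illustration only: 0 = ship, 1 = iceberg
labels = [0, 1, 1]
encoded = [one_hot(label) for label in labels]
print(encoded)
```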

In [3]:
image_size = 75
n_class = 2
batch_size = 32
epochs = 500

## 4. Design the Graph
----------------------------
### 4.1 Methods
Some helpful functions which will be repeatedly used while creating the graphs:
* weight_variable: initializes weights using Xavier initialization, as it provides better starting weights than other initialization techniques and leads to faster convergence.
    * inputs: the shape of the tensor and a name for the variable
* bias_variable: creates a bias variable, initialized with a small constant (0.1).
    * inputs: the size (matching the layer's weights) and a name for the bias
* conv2d: creates a 2D convolutional layer with strides [1,1,1,1] and 'SAME' padding
    * inputs: the tensor (original input or output of the previous layer) and the weights of the layer
* max_pool: max pooling layer with a 3 x 3 window, which helps reduce the spatial dimensions.
    * inputs: the tensor (original input or output of the previous layer) and the strides
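
To make the Xavier (Glorot) initialization concrete, the sketch below computes the bound of the uniform distribution, sqrt(6 / (fan_in + fan_out)), for a kernel of shape [height, width, in_channels, out_channels]. The helper names are hypothetical, and the sketch assumes the uniform variant of the initializer; it illustrates the formula rather than reproducing TensorFlow's implementation.

```python
import math
import random

def xavier_limit(shape):
    """Uniform Xavier bound for a kernel shaped [h, w, in_ch, out_ch] or [in, out]."""
    receptive_field = 1
    for dim in shape[:-2]:
        receptive_field *= dim
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    return math.sqrt(6.0 / (fan_in + fan_out))

def xavier_sample(shape, rng):
    """Draw a single weight from U(-limit, limit)."""
    limit = xavier_limit(shape)
    return rng.uniform(-limit, limit)

limit_conv1 = xavier_limit([5, 5, 1, 32])   # fan_in = 25, fan_out = 800
limit_fc = xavier_limit([2048, 704])        # fan_in = 2048, fan_out = 704
w = xavier_sample([5, 5, 1, 32], random.Random(0))
print(limit_conv1, limit_fc)
```

Layers with more incoming and outgoing connections get a smaller bound, which keeps the scale of the activations roughly stable from layer to layer.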

In [4]:
def weight_variable(shape,nm):
  initial = tf.contrib.layers.xavier_initializer()
  return tf.get_variable(nm,shape=shape,initializer=initial)

def bias_variable(shape,nm):
  initial = tf.constant(0.1, shape=shape)
  # attach the name to the variable itself so it is visible in the graph
  return tf.Variable(initial, name=nm)

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool(x,stride):
  return tf.nn.max_pool(x, ksize=[1, 3, 3, 1],
                        strides=stride, padding='SAME')

### 4.2 Input Placeholders
Create placeholders for passing the inputs and labels into the graph at runtime. 'None' creates a dynamic dimension whose size is determined at runtime by the data that is fed in. So, while training with a batch size of 32, x will be a tensor of shape [32,75,75,1] and y_ a tensor of shape [32,2].

In [5]:
x = tf.placeholder(tf.float32, shape=[None, image_size, image_size, 1])
y_ = tf.placeholder(tf.float32, shape=[None, n_class])

### 4.3 Hidden Layer 1 
In the first layer, a 5x5 window with 32 filters is convolved over the input image. This step gives an output of shape [32,75,75,32], which is then passed through a ReLU activation function. After that we perform max pooling with a 3x3 window and a stride of 3, which gives [32,25,25,32]. h_pool1 is the output of the first convolutional layer and will be passed to another convolutional layer.
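
These shapes follow from how 'SAME' padding works: the spatial output size is ceil(input_size / stride), independent of the window size. A minimal sketch that traces the spatial size through this layer (the helper name `same_out` is hypothetical):

```python
import math

def same_out(size, stride):
    """Spatial output size of a conv or pool op with 'SAME' padding."""
    return math.ceil(size / stride)

size = 75                      # input image is 75 x 75
conv1 = same_out(size, 1)      # 5x5 convolution, stride 1
pool1 = same_out(conv1, 3)     # 3x3 max pool, stride 3
next_pool = same_out(pool1, 3) # a further stride-3 pool on a 25 x 25 map
print(conv1, pool1, next_pool)
```

The same rule explains why a 25 x 25 feature map shrinks to 9 x 9 after another stride-3 pool, since ceil(25 / 3) = 9.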

In [6]:
W_conv1 = weight_variable([5, 5, 1, 32],"W1")
b_conv1 = bias_variable([32],"b1")

h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
h_pool1 = max_pool(h_conv1,[1,3,3,1])

### 4.4 Hidden Layer 2 
Again a 5x5 window is convolved, this time over the output of the first layer (h_pool1). This step gives an output of shape [32,25,25,64], which is then passed through a ReLU activation function. After that we perform max pooling with a 3x3 window and a stride of 3, which gives [32,9,9,64]. h_pool2 is passed into the inception layer.
> Experiment: replacing this layer with another inception module (see the commented-out cell below)

In [7]:
W_conv2 = weight_variable([5,5,32,64],'W2')
b_conv2 = bias_variable([64],'b2')

h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2)
h_pool2 = max_pool(h_conv2,[1,3,3,1])

In [8]:
# W_iconv1_1x1_1 = weight_variable([1,1,32,16],'W_i1c1')
# b_iconv1_1x1_1 = bias_variable([16],'b_i1c1')

# W_iconv1_1x1_2 = weight_variable([1,1,32,8],'W_i1c2')
# b_iconv1_1x1_2 = bias_variable([8],'b_i1c2')

# W_iconv1_1x1_3 = weight_variable([1,1,32,8],'W_i1c3')
# b_iconv1_1x1_3 = bias_variable([8],'b_i1c3')

# W_iconv1_3x3 = weight_variable([3,3,8,16],'W_i1c4')
# b_iconv1_3x3 = bias_variable([16],'b_i1c4')

# W_iconv1_5x5 = weight_variable([5,5,8,16],'W_i1c5')
# b_iconv1_5x5 = bias_variable([16],'b_i1c5')

# W_iconv1_1x1_4 = weight_variable([1,1,32,16],'W_i1c6')
# b_iconv1_1x1_4 = bias_variable([16],'b_i1c6')

# h_conv1_1x1_1 = conv2d(h_pool1,W_iconv1_1x1_1) + b_iconv1_1x1_1
# h_conv1_1x1_2 = tf.nn.relu(conv2d(h_pool1,W_iconv1_1x1_2)+b_iconv1_1x1_2)
# h_conv1_1x1_3 = tf.nn.relu(conv2d(h_pool1,W_iconv1_1x1_3)+b_iconv1_1x1_3)
# h_conv1_3x3 = conv2d(h_conv1_1x1_2,W_iconv1_3x3)+b_iconv1_3x3
# h_conv1_5x5 = conv2d(h_conv1_1x1_3,W_iconv1_5x5)+b_iconv1_5x5
# h_maxpool1 = max_pool(h_pool1,[1,1,1,1])
# h_conv1_1x1_4 = conv2d(h_maxpool1,W_iconv1_1x1_4)+ b_iconv1_1x1_4
 
# h_pool_incep = max_pool(tf.concat([h_conv1_1x1_1,h_conv1_3x3,h_conv1_5x5,h_conv1_1x1_4],3),[1,3,3,1])
# inception = tf.nn.relu(h_pool_incep)

### 4.5 Hidden Layer 3
Let's add a layer modeled on the Inception module used by Google. The module consists of four 1x1 convolution layers, a 3x3 and a 5x5 convolution layer. Three of the 1x1 layers take their input from the previous convolutional layer (h_pool2). The 3x3 and 5x5 layers take their input from two of those 1x1 layers. The remaining 1x1 layer gets its input from a max pool; in the code below this is a stride-3 max pool of h_pool1, which has the same 9x9 spatial size as h_pool2. Finally, the outputs of the four branches are concatenated along the channel axis and passed through a ReLU (the max pool after the concatenation is currently commented out). The output of this inception module is flattened and passed to the next layer.
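
A quick check of the module's output size, using the filter counts from the cell below: each of the four branches ends with 64 channels, and concatenating them along the channel axis gives 256. With a 9x9 spatial size, the flattened vector has 9*9*256 entries. The dictionary and its branch labels are only illustrative.

```python
# Output channels of each branch, taken from the weight shapes in the cell below
branch_channels = {
    "1x1": 64,
    "1x1 -> 3x3": 64,
    "1x1 -> 5x5": 64,
    "max pool -> 1x1": 64,
}

spatial = 9  # feature maps are 9 x 9 at this point
concat_channels = sum(branch_channels.values())
flat_size = spatial * spatial * concat_channels
print(concat_channels, flat_size)
```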

In [9]:
W_iconv_1x1_1 = weight_variable([1,1,64,64],'W_ic1')
b_iconv_1x1_1 = bias_variable([64],'b_ic1')

W_iconv_1x1_2 = weight_variable([1,1,64,32],'W_ic2')
b_iconv_1x1_2 = bias_variable([32],'b_ic2')

W_iconv_1x1_3 = weight_variable([1,1,64,32],'W_ic3')
b_iconv_1x1_3 = bias_variable([32],'b_ic3')

W_iconv_3x3 = weight_variable([3,3,32,64],'W_ic4')
b_iconv_3x3 = bias_variable([64],'b_ic4')

W_iconv_5x5 = weight_variable([5,5,32,64],'W_ic5')
b_iconv_5x5 = bias_variable([64],'b_ic5')

W_iconv_1x1_4 = weight_variable([1,1,32,64],'W_ic6')
b_iconv_1x1_4 = bias_variable([64],'b_ic6')

h_conv1_1x1_1 = conv2d(h_pool2,W_iconv_1x1_1) + b_iconv_1x1_1
h_conv1_1x1_2 = tf.nn.relu(conv2d(h_pool2,W_iconv_1x1_2)+b_iconv_1x1_2)
h_conv1_1x1_3 = tf.nn.relu(conv2d(h_pool2,W_iconv_1x1_3)+b_iconv_1x1_3)
h_conv1_3x3 = conv2d(h_conv1_1x1_2,W_iconv_3x3)+b_iconv_3x3
h_conv1_5x5 = conv2d(h_conv1_1x1_3,W_iconv_5x5)+b_iconv_5x5
h_maxpool1 = max_pool(h_pool1,[1,3,3,1])
h_conv1_1x1_4 = conv2d(h_maxpool1,W_iconv_1x1_4)+ b_iconv_1x1_4
 
# concatenate all the feature maps along the channel axis and pass them through relu
# alternative with an extra max pool after the concatenation:
# h_pool_incep = max_pool(tf.concat([h_conv1_1x1_1,h_conv1_3x3,h_conv1_5x5,h_conv1_1x1_4],3),[1,3,3,1])
h_pool_incep = tf.concat([h_conv1_1x1_1,h_conv1_3x3,h_conv1_5x5,h_conv1_1x1_4],3)
inception = tf.nn.relu(h_pool_incep)

h_fc_flat = tf.reshape(inception,[-1,9*9*256])

In [10]:
# W_i2conv_1x1_1 = weight_variable([1,1,64,64],'W_i2c1')
# b_i2conv_1x1_1 = bias_variable([64],'b_i2c1')

# W_i2conv_1x1_2 = weight_variable([1,1,64,32],'W_i2c2')
# b_i2conv_1x1_2 = bias_variable([32],'b_i2c2')

# W_i2conv_1x1_3 = weight_variable([1,1,64,32],'W_i2c3')
# b_i2conv_1x1_3 = bias_variable([32],'b_i2c3')

# W_i2conv_3x3 = weight_variable([3,3,32,64],'W_i2c4')
# b_i2conv_3x3 = bias_variable([64],'b_i2c4')

# W_i2conv_5x5 = weight_variable([5,5,32,64],'W_i2c5')
# b_i2conv_5x5 = bias_variable([64],'b_i2c5')

# W_i2conv_1x1_4 = weight_variable([1,1,64,64],'W_i2c6')
# b_i2conv_1x1_4 = bias_variable([64],'b_i2c6')

# h_i2conv1_1x1_1 = conv2d(inception,W_i2conv_1x1_1) + b_i2conv_1x1_1
# h_i2conv1_1x1_2 = tf.nn.relu(conv2d(inception,W_i2conv_1x1_2)+b_i2conv_1x1_2)
# h_i2conv1_1x1_3 = tf.nn.relu(conv2d(inception,W_i2conv_1x1_3)+b_i2conv_1x1_3)
# h_i2conv1_3x3 = conv2d(h_i2conv1_1x1_2,W_i2conv_3x3)+b_i2conv_3x3
# h_i2conv1_5x5 = conv2d(h_i2conv1_1x1_3,W_i2conv_5x5)+b_i2conv_5x5
# h_i2maxpool1 = max_pool(inception,[1,1,1,1])
# h_i2conv1_1x1_4 = conv2d(h_i2maxpool1,W_i2conv_1x1_4)+ b_i2conv_1x1_4
 
# #concatenate all the feature maps and hit them with a relu
# h_i2pool_incep = max_pool(tf.concat([h_i2conv1_1x1_1,h_i2conv1_3x3,h_i2conv1_5x5,h_i2conv1_1x1_4],3),[1,3,3,1])
# inception2 = tf.nn.relu(h_i2pool_incep)

# h_fc_flat = tf.reshape(inception2,[-1,3*3*256])

### 4.6 Hidden Layer 4
After the convolutional and inception layers, we add fully connected layers. The output of this first fully connected layer has shape [32,2048].

In [11]:
W_fc1 = weight_variable([9*9*256, 2048],"W3")
b_fc1 = bias_variable([2048],"b3")

h_fc1 = tf.nn.relu(tf.matmul(h_fc_flat, W_fc1) + b_fc1)

### 4.7 Hidden Layer 5
Another fully connected layer, this time with dropout, which is added to make the layer's learning more robust. The output has shape [32,704].
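
For intuition, `tf.nn.dropout` in TensorFlow 1.x keeps each unit with probability `keep_prob` and scales the kept units by `1 / keep_prob`, so the expected activation stays the same. A pure-Python sketch of that behaviour (the function `dropout` below is a stand-in written for this example, not the TensorFlow op):

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob; zero the rest."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(42)
activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, 0.5, rng)  # training: roughly half the units are zeroed
eval_out = dropout(activations, 1.0, rng)   # evaluation: nothing is dropped
print(train_out, eval_out)
```

This is why the training loop feeds 0.5 during training and 1.0 when measuring validation accuracy.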

In [12]:
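# Despite its name, this value is fed to tf.nn.dropout as keep_prob:
# the probability of KEEPING a unit (0.5 while training, 1.0 while evaluating)
drop_prob = tf.placeholder(tf.float32)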

W_fc2 = weight_variable([2048,704],'W4')
b_fc2 = bias_variable([704],'b4')

h_fc2 = tf.nn.relu(tf.matmul(h_fc1,W_fc2) + b_fc2)
h_fc2_drop = tf.nn.dropout(h_fc2,drop_prob)

### 4.8 Output Layer
Final Layer which will provide the predictions.

In [13]:
W_fc3 = weight_variable([704,n_class],'W5')
b_fc3 = bias_variable([n_class],'b5')

y_out = tf.matmul(h_fc2_drop,W_fc3) + b_fc3

### 4.9 Loss, Optimization and Accuracy
* Softmax is applied to the model output y_out (the logits), and the cross entropy between the result and the true class y_ is calculated. Then the mean of the cross entropy over the batch is taken.
* Among the optimizers available for gradient descent, I use Adam, which combines momentum with per-parameter adaptive learning rates. Among the learning rates I tried, values around 1e-4 gave the best results; the cell below uses 5e-4.
* Accuracy is calculated to quantify the performance of our model.
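
The loss and accuracy computations can be sketched in plain Python to show what the TensorFlow ops calculate. The helper functions and the two-example batch are hypothetical and written only for this illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, logits):
    """Cross entropy between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))

def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical batch of two examples
batch_logits = [[0.0, 0.0], [2.0, -1.0]]
batch_labels = [[1, 0], [0, 1]]

losses = [cross_entropy(y, z) for y, z in zip(batch_labels, batch_logits)]
mean_loss = sum(losses) / len(losses)
hits = [argmax(z) == argmax(y) for y, z in zip(batch_labels, batch_logits)]
batch_accuracy = sum(hits) / len(hits)
print(mean_loss, batch_accuracy)
```

Equal logits give a loss of ln 2 for a two-class problem, which is a handy reference value when reading the loss curve of an untrained network.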

In [14]:
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_,logits=y_out))

train_optimizer = tf.train.AdamOptimizer(learning_rate=5e-4).minimize(cross_entropy)

correct_pred = tf.equal(tf.argmax(y_out, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

## 5. Visualization
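After designing the NN graph, we set up TensorFlow summaries of the loss and accuracy so that we can see in TensorBoard how our network trains and performs over time. A saver is also created for storing the trained model.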

In [15]:
tf.summary.scalar(name='Loss', tensor=cross_entropy)
tf.summary.scalar(name='Accuracy',tensor=accuracy)
graph_summary = tf.summary.merge_all()
tf_saver = tf.train.Saver()

## 6. Training the NN Graph
Initialize the weights and biases, and create the writer for saving summaries to disk. Generate augmented training batches using the Keras image data generator and pass them in for training; validation runs on the full validation set after each epoch.
Let's train our network and find out how it performs.
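
The number of optimizer steps per epoch follows from integer division of the training-set size by the batch size; any leftover samples are not requested from the generator in that epoch. A small sketch with a hypothetical training-set size (the real size depends on the contents of `train.pickle`):

```python
def batches_per_epoch(n_samples, batch_size):
    """Full batches per epoch and the number of samples left over."""
    n_batches = n_samples // batch_size
    leftover = n_samples - n_batches * batch_size
    return n_batches, leftover

n_train = 1000  # hypothetical; not the notebook's actual training-set size
n_batches, leftover = batches_per_epoch(n_train, 32)
logged = [b for b in range(n_batches) if b % 10 == 0]  # batches that print and log
print(n_batches, leftover, logged)
```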

In [16]:
global_step = 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(logdir='./Iceberg/CNNInception', graph=sess.graph)
    for e in range(epochs):
        n_batches = int(X_train.shape[0] / batch_size)
        print("Epoch:{} ".format(e))
        gen_train = datagen.flow(X_train, y_train,
                                           batch_size=batch_size)
        for batch in range(n_batches):
            batch_data = gen_train.next()
            sess.run(train_optimizer,feed_dict={x: batch_data[0], 
                                                y_: batch_data[1], 
                                                drop_prob: 0.5})
            if(batch % 10 == 0):
                # Evaluate accuracy, loss and summaries in one pass with dropout
                # disabled (keep probability 1.0); no training op is run here
                train_acc, batch_loss, summary = sess.run([accuracy, cross_entropy, graph_summary],
                                               feed_dict={x: batch_data[0], 
                                                            y_: batch_data[1], 
                                                            drop_prob: 1.0})
                writer.add_summary(summary,global_step=global_step)
                print(" - Batch no: {}, train accuracy: {}".format(batch,train_acc))
            global_step += 1
        valid_accuracy = accuracy.eval(feed_dict={x: X_valid, y_: y_valid, drop_prob: 1.0})
        print("-Validation accuracy after epoc: {} ".format(valid_accuracy))
    # Save your model
    tf_saver.save(sess, save_path='./Iceberg/CNNInception/SavedModel')
    # Flush pending summaries to disk and close the writer
    writer.close()

Epoch:0 
 - Batch no: 0, train accuracy: 0.5
 - Batch no: 10, train accuracy: 0.6875
 - Batch no: 20, train accuracy: 0.5
-Validation accuracy after epoc: 0.6359102129936218 
Epoch:1 
 - Batch no: 0, train accuracy: 0.59375
 - Batch no: 10, train accuracy: 0.625
 - Batch no: 20, train accuracy: 0.59375
-Validation accuracy after epoc: 0.6234413385391235 
Epoch:2 
 - Batch no: 0, train accuracy: 0.59375
 - Batch no: 10, train accuracy: 0.71875
 - Batch no: 20, train accuracy: 0.65625
-Validation accuracy after epoc: 0.5112219452857971 
Epoch:3 
 - Batch no: 0, train accuracy: 0.65625
 - Batch no: 10, train accuracy: 0.625
 - Batch no: 20, train accuracy: 0.625
-Validation accuracy after epoc: 0.6234414577484131 
Epoch:4 
 - Batch no: 0, train accuracy: 0.6875
 - Batch no: 10, train accuracy: 0.5625
 - Batch no: 20, train accuracy: 0.6875
-Validation accuracy after epoc: 0.6384040117263794 
Epoch:5 
 - Batch no: 0, train accuracy: 0.6875
 - Batch no: 10, train accuracy: 0.59375
 - Batch 

KeyboardInterrupt: 