### Deep Learning for Image Recognition

In this session, we build several neural nets of increasing complexity on the same dataset and compare them.

In [None]:
from keras.datasets import mnist
from keras.utils import np_utils
import numpy as np

In [None]:
(train_x, train_y), (test_x, test_y) = mnist.load_data()

print("Training set has length {0} and consists of images of size {1} by {2}".format(*train_x.shape))
print("Testing set has length {0} and consists of images of size {1} by {2}".format(*test_x.shape))

### Our data

The dataset we will work on is MNIST, which consists of 28 by 28 grayscale images of handwritten digits. The task at hand is to classify each image as one of the digits 0 to 9. These are a few samples from the dataset.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

samples = train_x[np.random.choice(range(len(train_x)), size=10, replace=False), :, :]

In [None]:
figure = plt.figure(figsize=(8,4))
for i, sample in enumerate(samples):
    ax = figure.add_subplot(2, 5, i+1)
    ax.imshow(sample, cmap='gray')

### Simple feed forward neural net

Our first approach will be to build a simple feed forward net, with no hidden layers. This is equivalent to multinomial logistic regression.
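
To make the equivalence with multinomial logistic regression concrete, here is a minimal NumPy sketch of what a single softmax layer computes. The random weights and the fake input batch are assumptions for illustration only; the Keras model below learns its weights from the data.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

num_pixels, num_classes = 28 * 28, 10
W = rng.normal(scale=0.01, size=(num_pixels, num_classes))  # one weight column per digit
b = np.zeros(num_classes)                                   # one bias per digit

x = rng.random((4, num_pixels))   # 4 fake flattened images with values in [0, 1)
probs = softmax(x @ W + b)        # linear scores followed by softmax, nothing else

print(probs.shape)                # (4, 10)
print(probs.sum(axis=1))          # every row is a probability distribution
print(W.size + b.size)            # 7850 trainable parameters
```

The parameter count, 784 * 10 weights plus 10 biases, should match what `summary()` reports for the model defined below.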

We have to flatten our image data into a one-dimensional vector before it can be fed into a simple feed forward neural net. The input also has to be normalized before it is fed into the net; here we divide each element of the vector by the maximum pixel value (255).

In [None]:
num_train_samples, width, height = train_x.shape
num_test_samples = test_x.shape[0]

flat_train_x = np.reshape(train_x, (num_train_samples, width * height))
flat_test_x  = np.reshape(test_x, (num_test_samples, width * height))

dummied_train_y = np_utils.to_categorical(train_y)
dummied_test_y = np_utils.to_categorical(test_y)

print("Modified training set has length {0} and consists of vectors of size {1}".format(*flat_train_x.shape))
print("Modified testing set has length {0} and consists of vectors of size {1}".format(*flat_test_x.shape))
print("Modified training labels are of size {}".format(dummied_train_y.shape[1]))
print("Modified testing labels are of size {}".format(dummied_test_y.shape[1]))
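
The `to_categorical` call above turns each integer label into a one-hot vector, which is the target format `categorical_crossentropy` expects. A minimal NumPy sketch of the same idea follows; the helper name `one_hot` is hypothetical and is not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1.0 per row."""
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([3, 0, 9])
encoded = one_hot(labels, num_classes=10)

print(encoded.shape)            # (3, 10)
print(encoded.argmax(axis=1))   # [3 0 9]
```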

In [None]:
# normalize your input vectors
# we do 0-1 normalization by dividing by 255
flat_train_x = flat_train_x / 255.0
flat_test_x = flat_test_x / 255.0

In [None]:
from keras.layers import Dense
from keras.models import Sequential

In [None]:
lenet_1 = Sequential()
lenet_1.add(Dense(units=10,
                  name="output",
                  activation="softmax",
                  input_shape=(width * height,)))
lenet_1.summary()

In [None]:
import os
from keras import backend as K
from lib.default_utils import default_callbacks

K.set_learning_phase(True)  # important if you have modules like dropout in your model

lenet_1.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
logpath, dfcb = default_callbacks(lenet_1, prefix='lenet-1', batch_size=32)

# start training, use tensorboard to show overfitting with more iterations
lenet_1.fit(x=flat_train_x,
            y=dummied_train_y,
            batch_size=32,
            epochs=10,
            verbose=True,
            callbacks=dfcb,
            validation_split=0.2,
            shuffle=True)

# save final weights after completion of training
lenet_1.save_weights(os.path.join(logpath, "model_weights.h5"))

[TensorBoard](http://localhost:9001)

In [None]:
# evaluate on test_dataset
_, test_accuracy = lenet_1.evaluate(flat_test_x, dummied_test_y)
print("Model accuracy on test dataset is {:.3f}%".format(test_accuracy * 100))

### Neural net with one hidden layer

We increase the model's capacity by adding one hidden layer to the network. We will compare the performance of this network with the previous model using TensorBoard.
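
As a rough sketch of what the extra layer changes, the forward pass now has two steps: a ReLU hidden layer followed by the softmax output. The random weights and fake inputs below are assumptions for illustration; the layer sizes mirror the model defined in the next cell.

```python
import numpy as np

def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
num_pixels, num_hidden, num_classes = 28 * 28, 16, 10

W1 = rng.normal(scale=0.01, size=(num_pixels, num_hidden))
b1 = np.zeros(num_hidden)
W2 = rng.normal(scale=0.01, size=(num_hidden, num_classes))
b2 = np.zeros(num_classes)

x = rng.random((4, num_pixels))       # 4 fake flattened images
hidden = relu(x @ W1 + b1)            # hidden representation, 16 values per image
probs = softmax(hidden @ W2 + b2)     # class probabilities

num_params = W1.size + b1.size + W2.size + b2.size
print(hidden.shape, probs.shape)      # (4, 16) (4, 10)
print(num_params)                     # 12730
```

The hidden layer raises the parameter count from 7,850 to 12,730, and the ReLU makes the decision boundary non-linear in the pixels.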

In [None]:
lenet_2 = Sequential()
lenet_2.add(Dense(units=16,
                  name="hidden_1",
                  activation="relu",
                  input_shape=(width * height,)))

lenet_2.add(Dense(units=10,
                  name="output",
                  activation="softmax"))
lenet_2.summary()

In [None]:
## build lenet-2
lenet_2.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
logpath, dfcb = default_callbacks(lenet_2, prefix='lenet-2', batch_size=32)

# start training, use tensorboard to show overfitting with more iterations
lenet_2.fit(x=flat_train_x,
            y=dummied_train_y,
            batch_size=32,
            epochs=20,
            verbose=True,
            callbacks=dfcb,
            validation_split=0.2,
            shuffle=True)

# save final weights after completion of training
lenet_2.save_weights(os.path.join(logpath, "model_weights.h5"))

[TensorBoard](http://localhost:9001/)

In [None]:
# evaluate on test_dataset
_, test_accuracy = lenet_2.evaluate(flat_test_x, dummied_test_y)
print("Model accuracy on test dataset is {:.3f}%".format(test_accuracy * 100))

### Simple convolutional net

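Fully connected neural nets don't scale well to images, because every unit needs its own weight for every pixel. Images have spatial structure: nearby pixels are related, and the same pattern can appear anywhere in the image. Convolutional neural nets exploit this structure to perform image-related tasks.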

For a detailed introduction to convolutional layers, see the [CS231n notes on convolutional networks](https://cs231n.github.io/convolutional-networks/).
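
To illustrate how a convolutional layer exploits image structure, here is a minimal NumPy sketch of a single-channel, stride-1 convolution without padding (the `padding='valid'` case). The helper `conv2d_valid`, the toy image, and the hand-written edge kernel are assumptions for illustration; a real layer learns its kernels and applies many of them at once.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding.

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    img_h, img_w = image.shape
    k_h, k_w = kernel.shape
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k_h, j:j + k_w] * kernel)
    return out

# Toy image: left half dark, right half bright, so there is one vertical edge.
image = np.zeros((28, 28))
image[:, 14:] = 1.0

# A 3x3 kernel that responds to dark-to-bright transitions from left to right.
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, edge_kernel)
print(response.shape)        # (26, 26)
print(response[0, 10:16])    # [0. 0. 3. 3. 0. 0.]
```

The same nine weights are reused at every position, so the kernel detects the edge wherever it occurs, and the 28 by 28 input shrinks to 26 by 26 because no padding is applied.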

In [None]:
# lenet 3
# 2 convolutional layers - 3x3 and 5x5 patches
from keras.layers import Conv2D, Flatten

lenet_3 = Sequential()
lenet_3.add(Conv2D(filters=32,
                  name="conv_1",
                  kernel_size=(3,3),
                  activation="relu",
                  padding='valid',
                  input_shape=(width, height, 1)))

lenet_3.add(Conv2D(filters=12,
                  kernel_size=(5,5),
                  padding='valid',
                  name="conv_2",
                  activation="relu"))

lenet_3.add(Flatten())
lenet_3.add(Dense(units=10,
                  name='output',
                  activation='softmax'))
lenet_3.summary()
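
The shapes and parameter counts printed by `summary()` can be checked by hand. The following sketch redoes the arithmetic for the layer sizes used above, assuming stride 1 and `padding='valid'`; the helper names are hypothetical.

```python
def valid_conv_output(size, kernel):
    """Output width/height of a stride-1 convolution with padding='valid'."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

size = 28
size = valid_conv_output(size, 3)         # after conv_1: 26
conv_1_params = conv_params(3, 1, 32)     # 320
size = valid_conv_output(size, 5)         # after conv_2: 22
conv_2_params = conv_params(5, 32, 12)    # 9612

flat_features = size * size * 12          # 5808 inputs to the output layer
dense_params = flat_features * 10 + 10    # 58090

total = conv_1_params + conv_2_params + dense_params
print(size, flat_features, total)         # 22 5808 68022
```

Most of the parameters sit in the final dense layer, because the flattened feature maps are still large.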

In [None]:
train_x = train_x / 255.0
test_x = test_x / 255.0

train_x = np.expand_dims(train_x, axis=-1)
test_x = np.expand_dims(test_x, axis=-1)

In [None]:
## build lenet-3
lenet_3.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
logpath, dfcb = default_callbacks(lenet_3, prefix='conv-lenet-3', batch_size=32)
    
# start training, use tensorboard to show overfitting with more iterations
lenet_3.fit(x=train_x, 
            y=dummied_train_y, 
            batch_size=32, 
            epochs=10, 
            verbose=True, 
            callbacks=dfcb, 
            validation_split=0.2,
            shuffle=True)

# save final weights after completion of training
lenet_3.save_weights(os.path.join(logpath, "model_weights.h5"))

[TensorBoard](http://localhost:9001)

In [None]:
# evaluate on test_dataset
_, test_accuracy = lenet_3.evaluate(test_x, dummied_test_y)
print("Model accuracy on test dataset is {:.3f}%".format(test_accuracy * 100))

### Miscellaneous stuff

 - Overfitting
 
 Neural networks are notoriously easy to overfit, so make sure your dataset is big enough for the model you are building. Prefer a large held-out validation set: cross-validation can be time-consuming, since every fold means retraining the network.
 
 - Normalization
 
 Always normalize your data before you feed it into the model. Without normalization, gradient descent can be difficult to tune and slow to converge.

 - Optimizers
 
 There is a large variety of optimizers out there, such as SGD with momentum, Adam, and RMSprop. We recommend sticking to SGD if you want good generalization; see [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://arxiv.org/abs/1705.08292).
 
 - Pooling in convolutional nets
 
 Feature pooling reduces the spatial size of the feature maps as you go deeper into the network, which cuts the computation and the number of parameters in later layers.
 
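 - Class imbalance
 
 Make sure your classes are balanced. With heavily imbalanced classes, accuracy can look high even when the model mostly predicts the majority class.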
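
As an illustration of the pooling item in the list above, here is a minimal NumPy sketch of non-overlapping 2 by 2 max pooling on a single feature map. The helper `max_pool_2x2` is hypothetical; in Keras the corresponding layer is `MaxPooling2D(pool_size=(2, 2))`.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling on a (height, width) array with even sides."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)  # split into 2x2 tiles
    return blocks.max(axis=(1, 3))                      # keep the largest value per tile

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(feature_map)

print(pooled)
# [[ 5.  7.]
#  [13. 15.]]
```

Each pooling step halves the height and width, so the layers that follow see a quarter as many values per feature map.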

### [Exercise] Add dropout to the model

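Dropout is a way to reduce overfitting. During training, each unit's output is set to zero with a fixed probability (0.2 in the model below), so the network cannot rely on any single unit; at test time, all units are kept. See [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf).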
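
A minimal NumPy sketch of the idea, assuming the common "inverted dropout" formulation in which the surviving activations are rescaled during training so that nothing has to change at test time. The function name and the toy activations are hypothetical and are not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    if not training or rate == 0.0:
        return activations                      # test time: pass everything through
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob       # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
eval_out = dropout(activations, rate=0.2, training=False, rng=rng)

print((train_out == 0).mean())                  # roughly 0.2 of the units are dropped
print(train_out.mean())                         # stays close to 1.0 thanks to the rescaling
print(np.array_equal(eval_out, activations))    # True
```

Because the mask is only applied during training, the model must be evaluated with dropout switched off; otherwise the test accuracy is measured on a randomly thinned network.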

In [None]:
from keras.layers import Dropout

lenet_4 = Sequential()
lenet_4.add(Conv2D(filters=32,
                  name="conv_1",
                  kernel_size=(3,3),
                  activation="relu",
                  padding='valid',
                  input_shape=(width, height, 1)))

lenet_4.add(Dropout(0.2))

lenet_4.add(Conv2D(filters=12,
                  kernel_size=(5,5),
                  padding='valid',
                  name="conv_2",
                  activation="relu"))

lenet_4.add(Dropout(0.2))

lenet_4.add(Flatten())
lenet_4.add(Dense(units=10,
                  name='output',
                  activation='softmax'))
lenet_4.summary()

In [None]:
lenet_4.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
logpath, dfcb = default_callbacks(lenet_4, prefix='conv-lenet-4-dropout', batch_size=32)
    
# start training, use tensorboard to show overfitting with more iterations
lenet_4.fit(x=train_x, 
            y=dummied_train_y, 
            batch_size=32, 
            epochs=10, 
            verbose=True, 
            callbacks=dfcb, 
            validation_split=0.2,
            shuffle=True)

# save final weights after completion of training
lenet_4.save_weights(os.path.join(logpath, "model_weights.h5"))

In [None]:
# evaluate on test_dataset
K.set_learning_phase(False)
_, test_accuracy = lenet_4.evaluate(test_x, dummied_test_y)
print("Model accuracy on test dataset is {:.3f}%".format(test_accuracy * 100))