## Keras Digit Demonstration

In this exercise, we will use Keras to classify handwritten digits, first with a fully connected network and then with a CNN.

### Imports

In [0]:

import numpy as np
import time
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import LabelBinarizer, minmax_scale

plt.rcParams['figure.figsize'] = (15.0, 3.0)

### Install libraries we will need to visualize our DNN

In [0]:
!pip install pydot
!apt-get install -yq graphviz

### Keras Libraries and Modules

In [0]:
import keras
from keras.datasets import mnist
from keras.models import Sequential, Model, load_model
from keras.layers import Dense, Dropout, Flatten, Input, Lambda, Concatenate
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras.callbacks import LearningRateScheduler
from keras import backend as K
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

### Import dataset from Keras

We create a combined dataset with test and training rows. We will use features of `model.fit` to create a validation dataset later in the exercise.

Notice that, once we add a trailing axis with `np.newaxis`, the digits are 4D tensors. The first dimension is the "row" or sample image index. The next two dimensions are the spatial dimensions, x/y. The last dimension is for color channels. In this case, we have one color channel since the images are grayscale.

In [0]:
(x_train_, y_train_), (x_test_, y_test_) = mnist.load_data()
X = np.vstack([x_train_,x_test_])[...,np.newaxis]
y_ = np.hstack([y_train_,y_test_])
label_coder = LabelBinarizer()

label_coder.fit(y_)

Y = label_coder.transform(y_)

y_test = label_coder.transform(y_test_)
y_train = label_coder.transform(y_train_)
x_test = x_test_[...,np.newaxis]
x_train = x_train_[...,np.newaxis]

X.shape, Y.shape
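
To make the tensor layout described above concrete, here is a minimal sketch that uses a small synthetic array instead of MNIST. The names `images`, `labels`, and `one_hot` are made up for illustration, and the one-hot step is a hand-rolled stand-in for what `LabelBinarizer` produces when it has seen all ten digit classes.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 grayscale "images" of 28x28 pixels.
images = np.zeros((5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3, 1])

# Append a trailing channel axis:
# (samples, rows, cols) -> (samples, rows, cols, channels=1)
images_4d = images[..., np.newaxis]
print(images_4d.shape)

# One-hot encode the labels over the ten digit classes (0-9).
# Each row has a single 1 in the column of its class.
classes = np.arange(10)
one_hot = (labels[:, np.newaxis] == classes).astype(int)
print(one_hot.shape)
print(one_hot[0])
```

Each image keeps its 28x28 spatial layout, and each label becomes a length-10 row that can be compared directly with a 10-unit `softmax` output.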

### Plot a random row

In [0]:
ind = np.random.randint(X.shape[0])  # pick a random sample index
plt.imshow(X[ind,...,0])

### Check the histogram of labels

In [0]:
plt.hist(y_);

# Simple Fully Connected Model

We start with a three-layer fully-connected ANN. 

We have a categorical classification problem, which means that our network should assign each image to exactly one of the ten classes (0-9). Because of this, we use `softmax` as the output activation function: it maps the output layer's values to non-negative numbers that sum to 1, so they can be read as a probability distribution over the classes.
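
As a rough illustration of what `softmax` does, the following sketch computes it by hand in plain Python for ten made-up output values. The `softmax` helper and the `logits` list are illustrative only and are not part of the Keras model.

```python
import math

def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Ten made-up raw scores, one per digit class.
logits = [2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 3.0, -2.0, 1.5, 0.2]
probs = softmax(logits)

print(sum(probs))                 # sums to 1 up to floating-point rounding
print(probs.index(max(probs)))    # index of the most likely class
```

The class with the largest raw score also gets the largest probability, so taking the index of the maximum gives the predicted digit.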

In [0]:
input_ = Input((28,28,1))

x = input_
x = Flatten()(x)
x = Dense(100,activation='relu')(x)
x = Dense(100,activation='relu')(x)


x = Dense(10, activation='softmax')(x)



model = Model(input_, x)

model.summary()

### Check the model weights

The model is initialized with random weights and can be executed before training.  Let's use Keras to examine the model weights.

#### Model layers as a list

In [0]:
model.layers

#### Model weights in the last layer

Each `Dense` layer has an N×M weight matrix (N inputs, M outputs) and M bias values.

In [0]:
model.layers[-1].weights
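
The N×M-plus-M rule makes it easy to count parameters by hand and compare against `model.summary()`. The sketch below does this for the three `Dense` layers defined above; the helper `dense_param_count` is a name invented for this example.

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights (n_inputs x n_outputs) plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

# Flattened 28x28x1 image -> 100 -> 100 -> 10 output classes.
layer_sizes = [28 * 28 * 1, 100, 100, 10]

counts = [
    dense_param_count(n, m)
    for n, m in zip(layer_sizes[:-1], layer_sizes[1:])
]
print(counts)       # [78500, 10100, 1010]
print(sum(counts))  # 89610
```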

#### Use `K.eval` to view weight values

In [0]:
K.eval(model.layers[-1].weights[0])

#### Plot weights

In [0]:
plt.imshow(K.eval(model.layers[-1].weights[0]),aspect='auto')

## We can also run the untrained model on the first two inputs

In [0]:
plt.imshow(X[0,...,0])
plt.show()
plt.imshow(X[1,...,0])
activations = model.predict(X[:2,...])

activations.shape

In [0]:

plt.plot(activations.T,'s')

## Train It!

In [0]:
model.compile(loss='categorical_crossentropy', 
              optimizer=keras.optimizers.Adam(lr=.001),
              metrics=['accuracy'])
model.fit(x_train,y_train, 
          batch_size=400, 
          validation_data=(x_test,y_test),
          epochs=10)

### Check Trained model activations

In [0]:
activations = model.predict(x_test, verbose=1)
plt.imshow(x_test[0,...,0])
plt.show()
plt.plot(activations[0,:],'s')
plt.show()
plt.imshow(x_test[1,...,0])
plt.show()
plt.plot(activations[1,:],'s')

In [0]:
from sklearn.metrics import confusion_matrix
import itertools
def plot_confusion_matrix(y_test, y_pred, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.rcParams['figure.figsize'] = (10.0, 10.0)
    cm = confusion_matrix(y_test, y_pred,)
    np.set_printoptions(precision=2)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    # print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    
    plt.rcParams['figure.figsize'] = (15.0, 3.0)


plot_confusion_matrix(y_test_, label_coder.inverse_transform(activations), label_coder.classes_);
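
To show what the `normalize=True` option computes, here is a small plain-Python sketch on a three-class toy problem. It assumes the predicted label is simply the index of the largest activation; the helper names and the toy data are invented for this example.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def confusion_counts(true_labels, predicted_labels, n_classes):
    """cm[t][p] counts samples with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        cm[t][p] += 1
    return cm

def normalize_rows(cm):
    """Divide each row by its sum, so each row shows per-class fractions."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in cm]

# Toy softmax outputs for four samples and three classes.
toy_activations = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]
toy_true = [0, 1, 0, 2]

toy_pred = [argmax(row) for row in toy_activations]
cm = confusion_counts(toy_true, toy_pred, n_classes=3)
cm_normalized = normalize_rows(cm)

print(toy_pred)        # [0, 1, 1, 2]
print(cm)              # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
print(cm_normalized)   # [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

After row normalization, the diagonal entries are the fraction of each true class that was classified correctly.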

# CNN Model

65% accuracy is impressive for such a simple model, but it isn't really usable. To reach the next level of improvement, we need to use convolutional layers.

In [0]:
input_ = Input((28,28,1))

x = Conv2D(32, (3,3), activation='relu')(input_)
x = Conv2D(64, (3,3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2,2))(x)

x = Flatten()(x)
x = BatchNormalization()(x)
x = Dense(100,activation='relu')(x)

x = Dense(10, activation='softmax')(x)

cnn_model = Model(input_, x)
cnn_original_weights = cnn_model.get_weights()
cnn_model.summary()
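
The shapes and parameter counts reported by `cnn_model.summary()` can be checked by hand. The sketch below assumes the Keras defaults for `Conv2D` (no padding, stride 1) and that `BatchNormalization` keeps four values per feature (scale, offset, moving mean, moving variance). The helper names and dictionary keys are invented for this example and are not the layer names Keras assigns.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def conv_param_count(kernel, in_channels, filters):
    """One kernel x kernel x in_channels filter plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

side = 28
side = conv_output_size(side, 3)   # first 3x3 convolution: 28 -> 26
side = conv_output_size(side, 3)   # second 3x3 convolution: 26 -> 24
side = side // 2                   # 2x2 max pooling: 24 -> 12
flat = side * side * 64            # flattened features: 12 * 12 * 64

params = {
    "conv_1": conv_param_count(3, 1, 32),
    "conv_2": conv_param_count(3, 32, 64),
    "batch_norm": 4 * flat,
    "dense_1": flat * 100 + 100,
    "dense_out": 100 * 10 + 10,
}

print(flat)                   # 9216
print(params)
print(sum(params.values()))   # 978390
```

Almost all of the parameters sit in the first `Dense` layer, because it connects every one of the 9216 flattened features to each of its 100 units.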

### Examine the convolution kernel values

Just as we did with the fully connected weight matrix, we can extract and visualize the weights of the convolution kernels.

In [0]:
kerns = K.eval(cnn_model.layers[1].weights[0])
kerns.shape

In [0]:
plt.rcParams['figure.figsize'] = (15.0, 15.0)
nrow = 8; ncol = 4;
fig, axs = plt.subplots(nrows=nrow, ncols=ncol)

k = 0
for ax in axs.reshape(-1): 
    ax.imshow(kerns[:,:,0,k])
    k += 1

### Plot Layer Flow

We can also plot the layer flow as an SVG.

In [0]:
SVG(model_to_dot(model, show_shapes=True, show_layer_names=True).create(prog='dot', format='svg'))

## Fit the CNN Model

In [0]:
cnn_model.compile(loss='categorical_crossentropy', 
                  optimizer=keras.optimizers.Adam(lr=.001),
                  metrics=['accuracy'])
cnn_model.fit(x_train,y_train, 
              batch_size=100, 
              validation_data=(x_test,y_test),
              epochs=4)

### Predict Model Output


In [0]:
y_hat = cnn_model.predict(x_test,batch_size=200, verbose=1,)

### Plot Model Performance

In [0]:
plot_confusion_matrix(y_test_, label_coder.inverse_transform(y_hat), label_coder.classes_);

# Callbacks

Keras has a convenient mechanism, called "callbacks", for taking actions at set points during training, such as between epochs. See https://keras.io/callbacks/ for details.

### Learning rate schedule
We are going to define a `LearningRateScheduler` callback, which will adjust the learning rate as we progress through epochs.

In [0]:
def lrs(epoch):
    print(f'epoch = {epoch}')
    lr = 0.001**(1+epoch/10)
    print(f'lr = {lr}')
    return lr

calls = [LearningRateScheduler(lrs)]
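
To see what this schedule does before training with it, the sketch below tabulates the learning rate for the first few epochs. `lrs_quiet` is a hypothetical copy of `lrs` with the `print` calls removed, so that it can be evaluated on its own.

```python
def lrs_quiet(epoch):
    """Same formula as lrs above, without the print statements."""
    return 0.001 ** (1 + epoch / 10)

# Learning rate for the four epochs used in this exercise.
schedule = [lrs_quiet(epoch) for epoch in range(4)]
for epoch, lr in enumerate(schedule):
    print(epoch, lr)

# Each epoch multiplies the rate by 0.001 ** 0.1, roughly 0.5.
print(0.001 ** 0.1)
```

The rate starts at 0.001 and roughly halves every epoch, so later epochs take smaller steps.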

## Now retrain with this callback

In [0]:
cnn_model.set_weights(cnn_original_weights) # reset model
cnn_model.compile(loss='categorical_crossentropy', 
                  optimizer=keras.optimizers.Adam(lr=.001),
                  metrics=['accuracy'])
cnn_model.fit(x_train,y_train, 
              batch_size=400, 
              validation_data=(x_test,y_test),
              epochs=4,
              callbacks=calls)

## We can also use a TensorBoard callback

To set this up on Colab, I followed the instructions at https://www.dlology.com/blog/quick-guide-to-run-tensorboard-in-google-colab/

These instructions start TensorBoard on this machine and open a tunnel to the TensorBoard port so that we can reach it from an external address.

In [0]:
LOG_DIR = './log2'
! rm -rf $LOG_DIR

get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)
# Install
! npm install -g localtunnel

# Tunnel port 6006 (TensorBoard assumed running)
get_ipython().system_raw('lt --port 6006 >> url.txt 2>&1 &')

# Get url
! cat url.txt

In [0]:
from keras.callbacks import TensorBoard
cnn_model.set_weights(cnn_original_weights) # reset model

cnn_model.compile(loss='categorical_crossentropy', 
                  optimizer=keras.optimizers.Adam(lr=.001),
                  metrics=['accuracy'])


tbCallBack = TensorBoard(log_dir='./log2/cnn_model/', histogram_freq=1,
                         write_graph=True,
                         write_grads=True,
                         batch_size=400,
                         write_images=True)

cnn_model.fit(x_train,y_train, 
              batch_size=100, 
              validation_data=(x_test,y_test),
              epochs=4,
              callbacks=[tbCallBack, LearningRateScheduler(lrs)])