<a href="https://colab.research.google.com/github/eswens13/deep_learning/blob/master/keras/mnist_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MNIST Classifier

Exploring an introductory deep learning problem to solidify some of the concepts I've learned thusfar.

## Prepare Environment and Data

We need to pull in the MNIST dataset and set up the environment so that we can checkpoint the model.  I have decided to use a utility called ngrok that acts as a tunnel between my local machine and the Colab server so that I can run TensorBoard and examine the training process.  This first cell spits out a URL where the TensorBoard visualizations can be viewed.

Note that the samples in the MNIST dataset are grayscale so we have to artificially append a channel dimension to them.

In [1]:
from google.colab import drive
import os

drive.mount('/content/drive')
LINUX_BASE_PATH = '/content/drive/My\ Drive/deep_learning'
PYTHON_BASE_PATH = '/content/drive/My Drive/deep_learning'
LINUX_MNIST_DIR = os.path.join(LINUX_BASE_PATH, 'mnist')
PYTHON_MNIST_DIR = os.path.join(PYTHON_BASE_PATH, 'mnist')
os.system('mkdir -p ' + LINUX_MNIST_DIR)

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


0

In [3]:
from keras.datasets import mnist
from keras.utils import to_categorical

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# One-hot encode the labels.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Need to give a "channels" dimension even though the images are grayscale.
X_train = X_train[:,:,:,None]
X_test = X_test[:,:,:,None]

print("Data Shapes:\n\tX_train: {}\n\ty_train: {}\n\tX_test: {}\n\ty_test{}"\
      .format(X_train.shape, y_train.shape, X_test.shape, y_test.shape))

import os

# Get a tool called ngrok to use as a tunnel between my local machine and the
# Google Colab server. This will allow us to use TensorBoard to visualize
# helpful metrics of the network.
#
# Tutorial:
#   https://www.dlology.com/blog/quick-guide-to-run-tensorboard-in-google-colab/
#
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

# Now the ngrok exectuable is extracted to the current directory. Check to make
# sure there is a log directory for Keras to use.
cwd = os.getcwd()
LOG_DIR = os.path.join(cwd, 'log')
print("Log Dir: {}".format(LOG_DIR))
if not os.path.exists(LOG_DIR):
  os.system('mkdir -p {}'.format(LOG_DIR))

# Run tensorboard in the background.
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)

# Tell ngrok (in the background) to tunnel TensorBoard port 6006 to the outside
# world.
get_ipython().system_raw('./ngrok http 6006 &')

# Get the URL that I can use to hook into TensorBoard from my local machine.
!curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

Data Shapes:
	X_train: (60000, 28, 28, 1)
	y_train: (60000, 10)
	X_test: (10000, 28, 28, 1)
	y_test(10000, 10)
--2019-05-31 04:59:17--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
Resolving bin.equinox.io (bin.equinox.io)... 34.206.36.121, 52.200.123.104, 3.214.163.243, ...
Connecting to bin.equinox.io (bin.equinox.io)|34.206.36.121|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16648024 (16M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.zip.1’


2019-05-31 04:59:18 (20.2 MB/s) - ‘ngrok-stable-linux-amd64.zip.1’ saved [16648024/16648024]

Archive:  ngrok-stable-linux-amd64.zip
replace ngrok? [y]es, [n]o, [A]ll, [N]one, [r]ename: n
Log Dir: /content/log
https://b145d8c2.ngrok.io


## Network Architecture

This next cell defines the network architecture.

In [0]:
from keras.models import Sequential
from keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
from keras.optimizers import Adam
from keras.regularizers import l2

import numpy as np

# Build the Keras model
def get_keras_model():
  
  model = Sequential();
  num_filters = 32
  kernel_size = [3,3]
  stride_size = [1,1]
  pad_str = 'same'
  format_str = 'channels_last'
  act_str = 'relu'
  
  model.add(Conv2D(num_filters, \
                   kernel_size, \
                   strides=stride_size, \
                   padding=pad_str, \
                   data_format=format_str, \
                   use_bias=True, \
                   activation=act_str))

  num_filters = 64
  model.add(Conv2D(num_filters, \
                   kernel_size, \
                   strides=stride_size, \
                   padding=pad_str, \
                   data_format=format_str, \
                   use_bias=True, \
                   activation=act_str))
  
  num_filters = 64
  model.add(Conv2D(num_filters, \
                   kernel_size, \
                   strides=stride_size, \
                   padding=pad_str, \
                   data_format=format_str, \
                   use_bias=True, \
                   activation=act_str))
  
  model.add(Flatten(data_format=format_str))
  
  model.add(Dense(512, activation=act_str, use_bias=True))
  model.add(Dense(256, activation=act_str, use_bias=True))
  model.add(Dense(128, activation=act_str, use_bias=True))
  
  # Output Layer
  model.add(Dense(10, activation=act_str, use_bias=True))
  
  # Compile the model.
  adam = Adam(lr=5e-4, decay=1e-7)
  model.compile(loss='categorical_crossentropy', \
                optimizer=adam, \
                metrics=['accuracy'])
  
  return model

## Train the Model

In [13]:
# Create a Keras callback so that it outputs to TensorBoard rather than this
# console.
import keras.backend as K
from keras.callbacks import TensorBoard
from keras.models import load_model
from datetime import datetime

K.clear_session()
tbCallback = TensorBoard(log_dir=LOG_DIR, \
                         histogram_freq=0, \
                         write_graph=False, \
                         write_grads=True, \
                         batch_size=100, \
                         write_images=False)



# Run training on the model.
total_epochs = 10
curr_epochs = 0
model = None
full_model_path = ""

while curr_epochs < total_epochs:
  # Handle the checkpoint and first iteration cases.
  if model == None:
    if full_model_path == "":
      # This means we're starting from scratch
      model = get_keras_model()
    else:
      model = load_model(full_model_path)
  
  
  model.fit(X_train, y_train, \
            epochs=10, batch_size=1000, verbose=2, callbacks=[tbCallback], \
            validation_data=(X_test, y_test))
  
  now = datetime.now()
  curr_date_time = now.strftime("%m%d%Y_%H%M%S")
  model_file = "{}.h5".format(curr_date_time)
  full_model_path = os.path.join(PYTHON_MNIST_DIR, model_file)

  model.save(full_model_path)
  del model
  model = None
  curr_epochs += 10

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
 - 9s - loss: 3.0534 - acc: 0.4062 - val_loss: 2.4959 - val_acc: 0.6900
Epoch 2/10
 - 9s - loss: 1.8255 - acc: 0.7309 - val_loss: 0.7322 - val_acc: 0.8961
Epoch 3/10
 - 9s - loss: 1.0793 - acc: 0.6281 - val_loss: 0.7918 - val_acc: 0.8276
Epoch 4/10
 - 9s - loss: 0.5773 - acc: 0.9053 - val_loss: 0.4311 - val_acc: 0.9330
Epoch 5/10
 - 9s - loss: 0.4027 - acc: 0.9357 - val_loss: 0.3566 - val_acc: 0.9491
Epoch 6/10
 - 9s - loss: 0.3651 - acc: 0.9405 - val_loss: 0.3398 - val_acc: 0.9512
Epoch 7/10
 - 9s - loss: 0.8048 - acc: 0.7450 - val_loss: 0.5728 - val_acc: 0.9035
Epoch 8/10
 - 9s - loss: 0.5593 - acc: 0.8834 - val_loss: 0.3842 - val_acc: 0.9365
Epoch 9/10
 - 9s - loss: 0.5231 - acc: 0.8524 - val_loss: 0.3348 - val_acc: 0.9458
Epoch 10/10
 - 9s - loss: 0.4303 - acc: 0.8895 - val_loss: 0.2850 - val_acc: 0.9460


In [6]:
model_to_eval = load_model(full_model_path)
scores = model_to_eval.evaluate(X_test, y_test)
for i in range(len(model_to_eval.metrics_names)):
  print("{}: {}".format(model_to_eval.metrics_names[i], scores[i]))

loss: 0.788991606760025
acc: 0.8739


## What Did I Learn?

Doubled the size of my dense layers (three of them before output).  Increased batch size and learning rate.

Took out max pooling after the two convolutional layers and that made a HUGE difference!  (Now ~93% accuracy.)

####Current Best (05/30/19):  Consistently >93% from Epoch 3

Conv 1: \[3,3\] kernel, 32 feature maps
Conv 2: \[3,3\] kernel, 64 feature maps
Conv 3: \[3,3\] kernel, 64 feature maps
Flatten
Dense:  512
Dense:  256
Dense:  128
Output
Adam Optimizer
Learning Rate:  5e-4
Decay:  1e-7

###Archive

####(05/30/19) 96% Accuracy, Epoch 9

Conv 1: \[3,3\] kernel, 32 feature maps
Conv 2: \[3,3\] kernel, 64 feature maps
Conv 3: \[3,3\] kernel, 64 feature maps
Flatten
Dense:  512
Dense:  256
Dense:  128
Output
Adam Optimizer
Learning Rate:  1e-4