# Module 6: MNIST handwritten digits

In this lab, we are going to learn to recognize hand-written digits from the MNIST data set.

Before we get started, you need to do a couple of things:
1. Ensure you are in the TensorFlow CPU container image. If not, go to your first tab and use the Quit button, then re-access the environment and select the TensorFlow CPU container image.
2. Open a Terminal and install the codetiming library: `pip install codetiming`

MNIST is to image-analysis neural networks what the Iris data set is to classical machine learning: the standard first benchmark.
 * http://yann.lecun.com/exdb/mnist/

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

import os, sys
import itertools, functools
import numpy as np
import pandas as pd
import tensorflow as tf
from dsa_automation import tf_limit, tf_keras_reset
sess_config = tf_limit(tf, 4, glbs=globals())
tf.logging.set_verbosity(tf.logging.ERROR)
tf_keras_reset(tf, sess_config)

from sklearn.preprocessing import scale, LabelBinarizer
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Random seed for numpy
np.random.seed(18937)


In [2]:
print(tf.__version__)

1.13.1


# Revisiting Keras

Keras is a high-level abstraction over the TensorFlow API that makes models easier to construct.
It began as a general Python library for model construction that supports TensorFlow and some other underlying libraries.
  * https://keras.io/

It has since been wrapped into TensorFlow as `tf.keras`.

In the cell below, we construct a [Convolutional Neural Network](http://deeplearning.net/tutorial/lenet.html) that has the following structure:
  * Convolution with 32 kernels of 5x5 pixels each, using the [Rectified Linear Unit](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) activation
  * Max Pooling with 2x2 kernel: Find the strongest response in each 2x2 neuron area of a generated feature map (from the convolution)
     * Good Pooling Page: http://ufldl.stanford.edu/tutorial/supervised/Pooling/
  * Convolve with 64 5x5 kernels, then Max Pooling again
  * Stretch all the feature maps out into a single vector
  * A feed forward, fully connected layer -- think just dense vector -- of 1024 neurons
  * A 10-class output layer using softmax, which takes the 10 neuron outputs (the logits) and normalizes them to sum to 1.0
    * https://en.wikipedia.org/wiki/Softmax_function
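
Two of the operations in the list above, 2x2 max pooling and softmax, can be sketched in plain NumPy. This is only an illustration of the idea, not how Keras implements these layers; the helper names `max_pool_2x2` and `softmax` and the toy inputs are made up for this example.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block.

    Assumes a 2-D feature map whose height and width are both even.
    """
    h, w = feature_map.shape
    # Split rows and columns into pairs, then take the max inside each pair-block
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def softmax(logits):
    """Exponentiate and normalize so the outputs are positive and sum to 1.0."""
    shifted = logits - np.max(logits)  # avoids overflow; does not change the result
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical 4x4 feature map
toy_map = np.array([[1, 3, 2, 0],
                    [4, 2, 1, 1],
                    [0, 0, 5, 6],
                    [1, 2, 7, 8]], dtype=float)

pooled = max_pool_2x2(toy_map)              # shape (2, 2): one value per 2x2 block
probs = softmax(np.array([2.0, 1.0, 0.1]))  # three hypothetical logits

print(pooled)
print(probs, probs.sum())
```

Pooling halves each spatial dimension, and the softmax output can be read as one probability per class.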
  

In [3]:
layers = [   
    ############################
    # Input Prep
    ############################

    # we define the input as 784 image pixels
    tf.keras.Input(shape=(784,)),
    
    # We un-flatten the square image (28x28 pixels) from the vector of values
    tf.keras.layers.Reshape((28, 28, 1)),
    
    
    ############################
    # Convolutional layers, aka
    #   Feature Extraction Phase
    ############################
    
    tf.keras.layers.Conv2D(32, (5, 5), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D((2,2), strides=(2,2), padding='SAME'),
    
    
    tf.keras.layers.Conv2D(64, (5, 5), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D((2,2), strides=(2,2), padding='SAME'),
        
    ############################
    # Fully connected network
    #  aka, Classification Phase
    ############################

    ######
    # STOP : Figure out why this size is 7x7x64 = 3136 features, 
    #        the math is right above, if you did readings
    ######
    tf.keras.layers.Reshape((7*7*64,)),
    
    # A layer of 1024 neurons
    tf.keras.layers.Dense(1024, activation='relu'),
    tf.keras.layers.Dropout(rate = 0.5),  # this is the drop-out concept, see readings for how this aids network generalizations
    
    # The final classification layer, where softmax normalizes
    # the 10 outputs so that they sum to 1.0
    tf.keras.layers.Dense(10, activation='softmax'),
]

# This walks through the layers, composing them into a chain of nested function calls
y_pred = functools.reduce(lambda f1, f2: f2(f1), layers)

# Define the model as input, and the function chain that produces the output
model = tf.keras.models.Model(inputs = [layers[0]], outputs = [y_pred])

# Compile the model
model.compile(optimizer=tf.train.AdamOptimizer(),
    loss='categorical_crossentropy',
    metrics=['categorical_accuracy'])
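
The `functools.reduce` call in the cell above can look cryptic. Here is a minimal sketch of the same pattern, with plain Python functions standing in for the Keras layers. The `steps` list and its lambdas are hypothetical and exist only to show how the chain unfolds.

```python
import functools

# Hypothetical stand-ins: the first item plays the role of the input tensor,
# every later item is a callable applied to the previous result.
steps = [
    3,
    lambda x: x + 1,
    lambda x: x * 10,
    lambda x: x - 2,
]

# reduce feeds the running value into the next callable, left to right
chained = functools.reduce(lambda value, fn: fn(value), steps)

# The same computation written out as nested calls
by_hand = steps[3](steps[2](steps[1](steps[0])))

print(chained, by_hand)
```

In the model cell, the running value is a Keras tensor, and each layer object is called on the output of the layer before it.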



In [4]:
# Let's look at this beast!
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 784)               0         
_________________________________________________________________
reshape (Reshape)            (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d (Conv2D)              (None, 28, 28, 32)        832       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 64)        51264     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64)          0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 3136)              0         
__________

#### Note the trainable parameters!
Over 3 million trainable parameters, which means we need lots of data, computation, and time to learn all these parameters.
Neural networks like this are often far more complex, and so learn far more complex decision surfaces, than other machine learning models you have worked with.
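
The parameter count can be checked by hand. The sketch below assumes the layer settings from the model cell (5x5 kernels, `padding='SAME'`, the default convolution stride of 1, 2x2 pooling with stride 2) and the usual counting rule of one weight per input plus one bias per output. The helper names are made up for this example.

```python
import math

def same_padding_size(size, stride):
    """Output height/width with padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has one weight per kernel pixel per input channel, plus a bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """Each unit has one weight per input, plus a bias."""
    return (inputs + 1) * units

side = 28
side = same_padding_size(side, 1)  # first convolution keeps 28
side = same_padding_size(side, 2)  # first pooling gives 14
side = same_padding_size(side, 1)  # second convolution keeps 14
side = same_padding_size(side, 2)  # second pooling gives 7
flat_features = side * side * 64

counts = {
    "conv2d": conv2d_params(5, 5, 1, 32),
    "conv2d_1": conv2d_params(5, 5, 32, 64),
    "dense (1024 units)": dense_params(flat_features, 1024),
    "dense (10 units)": dense_params(1024, 10),
}
total = sum(counts.values())

for name, count in counts.items():
    print("{:<20} {:>10,}".format(name, count))
print("{:<20} {:>10,}".format("total", total))
```

The two convolution counts match the `Param #` column in the summary, and almost all of the parameters sit in the first fully connected layer.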


----

## Load dataset

In [5]:
from tensorflow.examples.tutorials import mnist
# Notice we are converting the labels to 1-hots
dataset = mnist.input_data.read_data_sets('/dsa/data/all_datasets/MNIST_data', one_hot=True)

Extracting /dsa/data/all_datasets/MNIST_data/train-images-idx3-ubyte.gz
Extracting /dsa/data/all_datasets/MNIST_data/train-labels-idx1-ubyte.gz
Extracting /dsa/data/all_datasets/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting /dsa/data/all_datasets/MNIST_data/t10k-labels-idx1-ubyte.gz


If you are not sure what just happened with the `*.gz` files, 
go back and review the link about the data set.
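
The `one_hot=True` flag turns each digit label into a vector of ten values with a single 1. A minimal NumPy sketch of that encoding follows; the `digits` array is a made-up example, not taken from the data set.

```python
import numpy as np

# Hypothetical digit labels
digits = np.array([5, 0, 4, 1])

# Row i of the 10x10 identity matrix is the one-hot vector for class i
one_hot = np.eye(10)[digits]

# argmax along each row recovers the original digit
recovered = one_hot.argmax(axis=1)

print(one_hot)
print(recovered)
```

This layout matches the 10-neuron softmax output, which is why the model is compiled with `categorical_crossentropy`.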

----

## Train the model

This should look similar to what you have seen in other courses, such as Applied Machine Learning,
just with somewhat different syntax for accessing the data and labels
because we are using TensorFlow instead of scikit-learn.

In [6]:

from codetiming import Timer
t = Timer(name="class")
t.start()

model.fit(x=[dataset.train.images], # Notice, we have dataset.train, below we will use dataset.test
          y=[dataset.train.labels], 
          batch_size=50, 
          epochs=3,
          validation_data=(dataset.validation.images, dataset.validation.labels), 
          shuffle=True, 
          verbose=1)

learn_time = t.stop()


Train on 55000 samples, validate on 5000 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3
Elapsed time: 314.3766 seconds


In [7]:
print("{:.2f} s used to train the CNN".format(learn_time))

314.38 s used to train the CNN


You will notice that training takes significantly longer than anything we have run before.
Welcome to deep learning, and for reference... we are not even *deep* yet!
This model has only two convolutional layers.
Contemporary models may have over 100 convolutional layers!
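
To put the training time in perspective, here is a rough back-of-the-envelope calculation using the numbers reported above. The elapsed time also covers the validation pass after each epoch, so the per-step figure is an approximation.

```python
# Values taken from the training cell and its output above
train_samples = 55000
batch_size = 50
epochs = 3
elapsed_seconds = 314.3766

steps_per_epoch = train_samples // batch_size  # 1100 weight updates per epoch
total_steps = steps_per_epoch * epochs         # 3300 weight updates overall
seconds_per_step = elapsed_seconds / total_steps

print("{} steps, about {:.3f} s per step".format(total_steps, seconds_per_step))
```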

---
## Evaluate the model

In [8]:
print('Evaluation')
print('loss: %.4f  accuracy: %.4f' % tuple(
    model.evaluate(x=[dataset.test.images], # Note we are using the blind dataset.test
                   y=[dataset.test.labels], 
                   batch_size=50, verbose=2)
    )
)

Evaluation
 - 6s - loss: 0.0303 - categorical_accuracy: 0.9910
loss: 0.0303  accuracy: 0.9910
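
An accuracy of 0.9910 on the 10,000 test images means roughly 90 digits were misclassified. To see which classes get confused, compare the argmax of the softmax outputs with the argmax of the one-hot labels. The sketch below uses small made-up arrays; with the real model, the probabilities would presumably come from something like `model.predict(dataset.test.images)`.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])

# Hypothetical one-hot labels for the same 4 samples
one_hot_labels = np.array([[1, 0, 0],
                           [0, 1, 0],
                           [0, 0, 1],
                           [0, 1, 0]])

predicted = probs.argmax(axis=1)        # most probable class per sample
actual = one_hot_labels.argmax(axis=1)  # true class per sample

accuracy = (predicted == actual).mean()

# Confusion matrix: rows are true classes, columns are predicted classes
n_classes = probs.shape[1]
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (actual, predicted), 1)

print(accuracy)
print(confusion)
```

Off-diagonal entries in the confusion matrix show which true class was mistaken for which predicted class.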


# Save your notebook