<a href="https://colab.research.google.com/github/jfogarty/machine-learning-intro-workshop/blob/master/external/cnn_basic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TensorFlow-only CNN on MNIST with Minibatch Evaluation

- From Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.
    - Author: Sebastian Raschka
    - GitHub Repository: https://github.com/rasbt/deeplearning-models

Updated by [John Fogarty](https://github.com/jfogarty) for Python 3.6, [Base2 MLI](https://github.com/base2solutions/mli), and standalone [Colab](https://colab.research.google.com) evaluation.

# Model Zoo -- Convolutional Neural Network

**Usage NOTE!** Use `Shift+Enter` to step through this notebook, executing the code as you go.

In [1]:
class Context:
    VERBOSE=False    # True for extensive logging during execution.
    QUIET=False      # True for minimal logging during execution.
    WARNINGS=False   # True to enable display of annoying but rarely useful messages.

In [18]:
#@title Import Directives
import os
import timeit
import time
import numpy as np
import tensorflow as tf
from datetime import timedelta

# Suppress Tensorflow log spew.
if not Context.WARNINGS:
    tf.logging.set_verbosity(tf.logging.ERROR)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

try:
    device_name = os.environ['COLAB_TPU_ADDR']
    TPU_ADDRESS = 'grpc://' + device_name
    print(f'Running with TPU acceleration at {TPU_ADDRESS}')
except KeyError:
    GPU_NAME = tf.test.gpu_device_name()
    if GPU_NAME.startswith('/device:GPU'):
        print(f"Running with GPU acceleration at {GPU_NAME}")
    else:
        print("Running on normal CPU without GPU acceleration.")
        
def elapsed_time(func, *args, msg=''):
    ''' Display the elapsed time of the function.
        Return the function value.
    '''
    stime = time.time()
    result = func(*args)
    etime = time.time() - stime
    # Report the wall-clock duration of the call.
    print(f"{msg}Elapsed test time: {timedelta(seconds=etime)}")
    return result

Running on normal CPU without GPU acceleration.


### Low-level Implementation

In [3]:
from functools import reduce
from tensorflow.examples.tutorials.mnist import input_data

##########################
### DATASET
##########################

mnist = input_data.read_data_sets("./", one_hot=True)

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


In [4]:
##########################
### SETTINGS
##########################

# Hyperparameters
learning_rate = 0.1
dropout_keep_proba = 0.5
epochs = 3
batch_size = 32

# Architecture
input_size = 784
image_width, image_height = 28, 28
n_classes = 10

# Other
print_interval = 500
random_seed = 123

In [5]:
##########################
### WRAPPER FUNCTIONS
##########################

def conv2d(input_tensor, output_channels,
           kernel_size=(5, 5), strides=(1, 1, 1, 1),
           padding='SAME', activation=None, seed=None,
           name='conv2d'):

    with tf.name_scope(name):
        input_channels = input_tensor.get_shape().as_list()[-1]
        weights_shape = (kernel_size[0], kernel_size[1],
                         input_channels, output_channels)

        weights = tf.Variable(tf.truncated_normal(shape=weights_shape,
                                                  mean=0.0,
                                                  stddev=0.01,
                                                  dtype=tf.float32,
                                                  seed=seed),
                              name='weights')
        biases = tf.Variable(tf.zeros(shape=(output_channels,)), name='biases')
        conv = tf.nn.conv2d(input=input_tensor,
                            filter=weights,
                            strides=strides,
                            padding=padding)

        # Add the per-channel bias, then apply the optional nonlinearity.
        act = conv + biases
        if activation is not None:
            act = activation(act)
        return act


def fully_connected(input_tensor, output_nodes,
                    activation=None, seed=None,
                    name='fully_connected'):

    with tf.name_scope(name):
        input_nodes = input_tensor.get_shape().as_list()[1]
        weights = tf.Variable(tf.truncated_normal(shape=(input_nodes,
                                                         output_nodes),
                                                  mean=0.0,
                                                  stddev=0.01,
                                                  dtype=tf.float32,
                                                  seed=seed),
                              name='weights')
        biases = tf.Variable(tf.zeros(shape=[output_nodes]), name='biases')

        act = tf.matmul(input_tensor, weights) + biases
        if activation is not None:
            act = activation(act)
        return act

In [6]:
##########################
### GRAPH DEFINITION
##########################

g = tf.Graph()
with g.as_default():
    
    tf.set_random_seed(random_seed)

    # Input data
    tf_x = tf.placeholder(tf.float32, [None, input_size, 1], name='inputs')
    tf_y = tf.placeholder(tf.float32, [None, n_classes], name='targets')
    
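    # keep_proba is fed by the training loop, but no dropout layer in this
    # graph consumes it, so dropout_keep_proba currently has no effect.
    keep_proba = tf.placeholder(tf.float32, shape=None, name='keep_proba')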

    # Convolutional Neural Network:
    # 2 convolutional layers with maxpool and ReLU activation
    input_layer = tf.reshape(tf_x, shape=[-1, image_width, image_height, 1])
    
    conv1 = conv2d(input_tensor=input_layer,
                   output_channels=8,
                   kernel_size=(3, 3),
                   strides=(1, 1, 1, 1),
                   activation=tf.nn.relu,
                   name='conv1')
                              
    pool1 = tf.nn.max_pool(conv1,
                           ksize=(1, 2, 2, 1), 
                           strides=(1, 2, 2, 1),
                           padding='SAME',
                           name='maxpool1')
    
    conv2 = conv2d(input_tensor=pool1,
                   output_channels=16,
                   kernel_size=(3, 3),
                   strides=(1, 1, 1, 1),
                   activation=tf.nn.relu,
                   name='conv2')
    
    pool2 = tf.nn.max_pool(conv2,
                           ksize=(1, 2, 2, 1), 
                           strides=(1, 2, 2, 1),
                           padding='SAME',
                           name='maxpool2')
    
    dims = pool2.get_shape().as_list()[1:]
    dims = reduce(lambda x, y: x * y, dims, 1)
    flat = tf.reshape(pool2, shape=(-1, dims))
    
    out_layer = fully_connected(flat, n_classes, activation=None, 
                                name='logits')

    # Loss and optimizer
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=out_layer, labels=tf_y)
    cost = tf.reduce_mean(loss, name='cost')
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    train = optimizer.minimize(cost, name='train')

    # Prediction
    correct_prediction = tf.equal(tf.argmax(tf_y, 1),
                                  tf.argmax(out_layer, 1),
                                  name='correct_prediction')
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, 
                                      tf.float32), 
                              name='accuracy')
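
The layer shapes in the graph above can be traced by hand. The sketch below assumes TensorFlow's documented rule for `padding='SAME'`, under which each spatial dimension becomes `ceil(input / stride)`. The helper name `same_out` is hypothetical and not part of the notebook.

```python
import math

def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

side = 28                      # MNIST images are 28x28
side = same_out(side, 1)       # conv1, stride 1 -> 28
side = same_out(side, 2)       # maxpool1, stride 2 -> 14
side = same_out(side, 1)       # conv2, stride 1 -> 14
side = same_out(side, 2)       # maxpool2, stride 2 -> 7

channels = 16                  # output_channels of conv2
flat_dims = side * side * channels
print(side, flat_dims)         # prints: 7 784
```

So `flat` should carry 784 features per image, which happens to equal `input_size` only by coincidence (7 x 7 x 16 = 28 x 28).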

In [25]:
##########################
### TRAINING & EVALUATION
##########################

def train(g, mnist):
    with tf.Session(graph=g) as sess:
        sess.run(tf.global_variables_initializer())

        np.random.seed(random_seed) # random seed for mnist iterator
        for epoch in range(1, epochs + 1):
            avg_cost = 0.
            total_batch = mnist.train.num_examples // batch_size
           
            stime = time.time()
            for i in range(total_batch):
                batch_x, batch_y = mnist.train.next_batch(batch_size)
                batch_x = batch_x[:, :, None] # add "missing" color channel

                _, c = sess.run(['train', 'cost:0'], 
                                feed_dict={'inputs:0': batch_x,
                                           'targets:0': batch_y,
                                           'keep_proba:0': dropout_keep_proba})
                avg_cost += c
                if not i % print_interval:
                    print("--- Minibatch: %5d | Cost: %.3f" % (i + 1, c))
                    
            etime = time.time() - stime
            print(f"- Batch elapsed test time: {timedelta(seconds=etime)}")
            
            train_acc = sess.run('accuracy:0', 
                                 feed_dict={'inputs:0': mnist.train.images[:, :, None],
                                            'targets:0': mnist.train.labels,
                                            'keep_proba:0': 1.0})
            valid_acc = sess.run('accuracy:0', 
                                 feed_dict={'inputs:0': mnist.validation.images[:, :, None],
                                            'targets:0': mnist.validation.labels,
                                            'keep_proba:0': 1.0})

            print("- Epoch: %03d | AvgCost: %.3f" % (epoch, avg_cost / (i + 1)), end="")
            print(" | Train/Valid ACC: %.3f/%.3f" % (train_acc, valid_acc))

        test_acc = sess.run('accuracy:0', 
                            feed_dict={'inputs:0': mnist.test.images[:, :, None],
                                       'targets:0': mnist.test.labels,
                                       'keep_proba:0': 1.0})
    return test_acc
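
The `train` function above computes each accuracy figure by feeding a full data split in a single `sess.run` call, which may not fit in memory for larger datasets. A minimal sketch of evaluating in minibatches instead is shown below in plain NumPy so that it runs without a session. The names `batched_accuracy` and `predict_fn` are hypothetical; `predict_fn` stands in for whatever returns logits for a batch of images.

```python
import numpy as np

def batched_accuracy(predict_fn, images, labels, batch_size=32):
    """Accuracy over a dataset, computed one minibatch at a time.

    predict_fn: callable mapping a batch of images to logits (hypothetical).
    labels: one-hot array with one row per image.
    """
    n_correct = 0
    for start in range(0, len(images), batch_size):
        stop = start + batch_size
        logits = predict_fn(images[start:stop])
        # Count matches rather than averaging per-batch accuracies, so a
        # smaller final batch is weighted correctly.
        n_correct += np.sum(np.argmax(logits, axis=1) ==
                            np.argmax(labels[start:stop], axis=1))
    return n_correct / len(images)

# Toy check with a stand-in "model" that returns its input as logits.
rng = np.random.RandomState(123)
toy_logits = rng.rand(70, 10)
toy_labels = np.eye(10)[rng.randint(0, 10, size=70)]
full = np.mean(np.argmax(toy_logits, 1) == np.argmax(toy_labels, 1))
assert np.isclose(batched_accuracy(lambda x: x, toy_logits, toy_labels), full)
```

In this notebook, `predict_fn` could plausibly wrap a `sess.run` call that fetches `out_layer` with the batch fed to `'inputs:0'`; that wiring is an assumption and is not shown in the original.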

In [26]:
stime = time.time()

test_acc = train(g, mnist)        

etime = time.time() - stime
print(f"- Total training time: {timedelta(seconds=etime)}")
print(f'- Test Accuracy: {test_acc:.3f}')

--- Minibatch:     1 | Cost: 2.303
--- Minibatch:   501 | Cost: 0.203
--- Minibatch:  1001 | Cost: 0.212
--- Minibatch:  1501 | Cost: 0.036
- Batch elapsed test time: 0:00:18.667635
- Epoch: 001 | AvgCost: 0.588 | Train/Valid ACC: 0.970/0.969
--- Minibatch:     1 | Cost: 0.030
--- Minibatch:   501 | Cost: 0.025
--- Minibatch:  1001 | Cost: 0.032
--- Minibatch:  1501 | Cost: 0.080
- Batch elapsed test time: 0:00:19.126648
- Epoch: 002 | AvgCost: 0.098 | Train/Valid ACC: 0.977/0.975
--- Minibatch:     1 | Cost: 0.089
--- Minibatch:   501 | Cost: 0.057
--- Minibatch:  1001 | Cost: 0.038
--- Minibatch:  1501 | Cost: 0.121
- Batch elapsed test time: 0:00:19.213511
- Epoch: 003 | AvgCost: 0.078 | Train/Valid ACC: 0.981/0.980
- Total training time: 0:01:22.681443
- Test Accuracy: 0.982
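
The first logged cost of 2.303 is what one would expect before any training. With weights drawn at `stddev=0.01` and zero biases, the logits start near zero, the softmax is close to uniform over the 10 classes, and the cross-entropy is approximately ln(10). A small NumPy check of that arithmetic follows; the function name `softmax_cross_entropy` is illustrative and is not TensorFlow's API.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot):
    """Per-example cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(onehot * log_probs).sum(axis=1)

logits = np.zeros((1, 10))          # untrained network: all logits ~ 0
onehot = np.eye(10)[[3]]            # any class gives the same loss here
loss = softmax_cross_entropy(logits, onehot)[0]
print(f"{loss:.3f}")                # prints: 2.303
```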


### Results

With 3 epochs (roughly 1,700 minibatches of 32 images per epoch, of which every 500th is logged), this model achieves about **98% test accuracy** on MNIST. This is a **very** respectable result for such a small network.

- The [best performance so far](https://benchmarks.ai/mnist) is well beyond human, with an error rate of 0.21%, substantially better than the 1.8% error rate (1 - 0.982) obtained here. Rates that low arguably reflect overfitting to the MNIST test set itself.
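
The minibatch count per epoch and the test error rate can be derived from the settings and the logged output. A short check follows, assuming the 55,000-image training split that `read_data_sets` produces by default (the split size is not printed in this notebook):

```python
n_train = 55000        # assumed default training split size
batch_size = 32
batches_per_epoch = n_train // batch_size   # same integer division as in train()

test_acc = 0.982       # from the logged output
test_error_pct = (1 - test_acc) * 100

print(batches_per_epoch)            # prints: 1718
print(f"{test_error_pct:.1f}%")     # prints: 1.8%
```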

### End of notebook