## MNIST Classification - DNN with ReLU activation

## Agenda:
1. Deep Neural Networks
2. MNIST data
	- Collect data with one_hot
	- Get train and test data
3. Define Parameters
	- Logging path
	- Starting Learning Rate
	- Features and classes
	- Batch size and Training epochs
	- Neurons in hidden layers
4. Build the Graph
	- Define placeholders and Variables for all layers
	- Prediction
	- Loss
	- Optimizer
	- Model Accuracy
	- Logging
5. Execute the Graph
	- Create Session
	- Run session on training set
	- Run session on Test set
	- Log the loss and accuracy for train and test
	- Save the model
	- Close the session

#### Neural networks
Neural networks are a biologically inspired programming paradigm that enables a computer to learn from observational data. They loosely mimic the human brain.
An artificial neuron is called a perceptron, and a set of layered, fully connected perceptrons is called a **Neural Network**. In the network, the first column of perceptrons, which we'll call the first layer of neurons, makes very simple decisions by weighing the input evidence. Neurons in the second layer make decisions by weighing up the results from the first layer. In this way, a neuron in the second layer can make a decision at a more complex and more abstract level than the perceptrons in the first layer, and the neurons in the third layer can make even more complex decisions. A many-layer network of neurons can therefore engage in sophisticated decision making. We apply an artificial neural network with four hidden layers to train and test on the MNIST data set.
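
As a minimal sketch of the "weighing the input evidence" idea, a single perceptron can be written in plain Python. The weights, bias, and inputs here are made-up illustrative values (chosen so the perceptron behaves like a logical AND); they are not taken from the MNIST model.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0

# Hypothetical weights and bias: the output is 1 only when both inputs are 1
and_weights = [1.0, 1.0]
and_bias = -1.5

# Evaluate the perceptron on all four combinations of binary inputs
outputs = [perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

A layer is many such units reading the same inputs, and a network stacks layers so that each one weighs the decisions of the layer before it.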


In [1]:
import tensorflow as tf

In [2]:
tf.reset_default_graph()

In [3]:
#Set random seed for reproducible results
tf.set_random_seed(42)

## MNIST data
The MNIST database of handwritten digits, available from http://yann.lecun.com/exdb/mnist/, has a training set of 60,000 examples and a test set of 10,000 examples. The `read_data_sets` loader used in this notebook holds out 5,000 of the training examples for validation, which leaves 55,000 for training. MNIST is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image of 28x28.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

Four files are available on this database:    
train-images-idx3-ubyte.gz:  training set images  
train-labels-idx1-ubyte.gz:  training set labels   
t10k-images-idx3-ubyte.gz:   test set images    
t10k-labels-idx1-ubyte.gz:   test set labels 

Each 28x28 handwritten digit image is flattened into a vector of size 784.
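
The flattening step can be illustrated with NumPy. This is only a sketch: the "image" is a made-up array whose pixel values are the numbers 0 to 783, and row-by-row order is assumed.

```python
import numpy as np

# A made-up 28x28 "image" whose pixel values are simply 0..783
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# Flatten row by row into a single vector of 784 features
flat = image.reshape(-1)

# With row-major order, the pixel at (row, col) lands at index row * 28 + col
row, col = 3, 5
print(flat.shape, flat[row * 28 + col] == image[row, col])
```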

### Collect Data

In [4]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data\train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py fr

### Get Training and Test Data

In [5]:
trainX = mnist.train.images
trainY = mnist.train.labels

In [6]:
testX = mnist.test.images
testY = mnist.test.labels
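
Because the data was loaded with `one_hot=True`, each label is a vector of length 10 with a single 1 at the position of the digit. A rough NumPy sketch of that encoding follows; the `one_hot` helper and the example digits are hypothetical and are not part of the TensorFlow loader.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows (hypothetical helper)."""
    encoded = np.zeros((len(labels), n_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

digits = np.array([3, 0, 9])            # made-up example labels
encoded = one_hot(digits, n_classes=10)
print(encoded[0])

# argmax recovers the original digit from each one-hot row
decoded = encoded.argmax(axis=1)
print(decoded)
```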

Let's define some parameters

In [22]:
#Directory to save logs and graph, change it as needed
logs_path='/tmp/mnist/relu'

#Learning rate
starting_learning_rate = 0.003

#Number of input features - MNIST has 784 features
n_features = trainX.shape[1]

#Number of possible output classes - 10 for MNIST
n_classes = trainY.shape[1]

#Model name for storage
model_name = 'mnist_dnn_relu_decay.ckpt'

#How many examples to feed for training at one time
batch_size = 100

#How many passes to make over the full training set
training_epochs = 60

#Define number of neurons in 4 hidden layers
K = 200
L = 100
M = 60
N = 30

Parameters for Learning rate decay

In [8]:
#Decay steps 
decay_steps = 10000 

#Decay rate - should be less than 1
decay_rate = 0.96

# Build the Graph

Input placeholders

In [9]:
with tf.name_scope('input'):
    
    # None -> the batch can hold any number of examples, each with n_features values
    x = tf.placeholder(tf.float32, shape=[None, n_features], name="x-input") 
    
    # one-hot targets over the n_classes output classes
    y_ = tf.placeholder(tf.float32, shape=[None, n_classes], name="y-input")

Layer 1

In [10]:
with tf.name_scope('layer_1'):
    W1 = tf.Variable(tf.truncated_normal([n_features, K], stddev=0.1))
    b1 = tf.Variable(tf.zeros([K]))
    Y1 = tf.nn.relu(tf.add(tf.matmul(x,W1),b1))

Layer 2

In [11]:
with tf.name_scope('layer_2'):
    W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
    b2 = tf.Variable(tf.zeros([L]))
    Y2 = tf.nn.relu(tf.add(tf.matmul(Y1,W2),b2))

Layer 3

In [12]:
with tf.name_scope('layer_3'):
    W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
    b3 = tf.Variable(tf.zeros([M]))
    Y3 = tf.nn.relu(tf.add(tf.matmul(Y2,W3),b3))

Layer 4

In [13]:
with tf.name_scope('layer_4'):
    W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
    b4 = tf.Variable(tf.zeros([N]))
    Y4 = tf.nn.relu(tf.add(tf.matmul(Y3,W4),b4))
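
To see how the four hidden layers chain together, here is a NumPy sketch of the same forward pass. It uses the layer widths from the parameters above (784, 200, 100, 60, 30), but the weights come from a plain normal distribution rather than TensorFlow's truncated normal, and the input batch is random made-up data.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    """Element-wise ReLU: negative values become 0."""
    return np.maximum(z, 0.0)

# 784 input features followed by K, L, M, N hidden units
sizes = [784, 200, 100, 60, 30]

# Small random weights and zero biases (assumption: plain normal, not truncated)
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

# A made-up batch of 5 flattened "images"
activations = rng.random((5, 784))
shapes = []
for W, b in zip(weights, biases):
    # Each layer computes relu(previous_output @ W + b)
    activations = relu(activations @ W + b)
    shapes.append(activations.shape)

print(shapes)
```

Each weight matrix has shape (inputs, outputs), so the batch dimension is carried through unchanged while the feature dimension shrinks layer by layer.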

Output Layer   

In [14]:
with tf.name_scope("Output"):
    W5 = tf.Variable(tf.truncated_normal([N,n_classes], stddev=0.1))
    b5 = tf.Variable(tf.zeros([n_classes]))
    
    #Raw logits, kept separate so the loss can use the more stable
    #cross-entropy op that takes logits directly
    Ylogits = tf.matmul(Y4,W5) + b5
    
    #Actual prediction: softmax probabilities from the same logits
    y = tf.nn.softmax(Ylogits)
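
Softmax turns each row of logits into probabilities that sum to 1. The NumPy sketch below illustrates the idea and is not TensorFlow's implementation; the logits are made-up values, and the second row is deliberately extreme to show why the row maximum is subtracted first.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up logits: an ordinary row and a row of very large equal values
logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax(logits)
print(probs)
```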

Loss

In [15]:
with tf.name_scope('Loss'):
    
    #More stable version of cross_entropy: takes raw logits, not softmax output
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=Ylogits, 
                                                               labels=y_)
    #Mean over the batch, scaled by 100
    cross_entropy = tf.reduce_mean(cross_entropy)*100
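
One way to see why computing the loss from logits is more stable: the log of the softmax can be formed directly with the log-sum-exp trick, so a probability that underflows to 0 never reaches `log`. The NumPy sketch below shows that idea under this assumption; it is not TensorFlow's actual implementation, and the logits and labels are made up.

```python
import numpy as np

def cross_entropy_from_logits(logits, one_hot_labels):
    """Per-example cross-entropy computed via a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_softmax).sum(axis=1)

# Second example is badly wrong: the true class has a far smaller logit
logits = np.array([[2.0, 0.0],
                   [0.0, 1000.0]])
labels = np.array([[1.0, 0.0],
                   [1.0, 0.0]])

losses = cross_entropy_from_logits(logits, labels)
print(losses)
```

For the second example, a softmax-then-log computation would evaluate `log(0)`, whereas this form returns a large but finite loss.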

GradientDescent Optimizer

In [16]:
with tf.name_scope('train'):
    
    #Capture current step
    current_step = tf.Variable(0, trainable=False)
    
    #Exponential decay rate
    #Formula for decay = 
    #  starting_learning_rate * decay_rate^(current_step/decay_steps)
    learn_rate = tf.train.exponential_decay(starting_learning_rate, 
                                            current_step,
                                            decay_steps, 
                                            decay_rate)    
    
    #GradientDescent optimizer with variable rate
    #Current_step gets incremented by Optimizer step
    train_op = tf.train.GradientDescentOptimizer(learn_rate).minimize(
        cross_entropy, global_step=current_step)
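
The decay formula in the comment above can be checked in plain Python. This sketch assumes the smooth (non-staircase) form of the formula and a total of 550 batches per epoch (55,000 training examples at 100 per batch) over 60 epochs; the function name simply mirrors the TensorFlow one and is not the library call.

```python
def exponential_decay(starting_rate, step, decay_steps, decay_rate):
    """starting_rate * decay_rate ^ (step / decay_steps), without staircase rounding."""
    return starting_rate * decay_rate ** (step / decay_steps)

starting_learning_rate = 0.003
decay_steps = 10000
decay_rate = 0.96

# Assumed total number of optimizer steps: 550 batches per epoch for 60 epochs
total_steps = 550 * 60

for step in (0, decay_steps, total_steps):
    rate = exponential_decay(starting_learning_rate, step, decay_steps, decay_rate)
    print(step, rate)
```

With these settings the learning rate falls only gently, from 0.003 to roughly 0.0026 by the end of training.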

Model Accuracy

In [17]:
with tf.name_scope('Accuracy'):
    
    #Get the number with highest probability
    prediction = tf.argmax(y,1,name="Predict")
    
    #Compare prediction with actual
    correct_prediction = tf.equal(prediction, tf.argmax(y_,1))
    
    #Calculate mean accuracy across all data examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32),
                              name="accuracy")
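
The accuracy computation above amounts to comparing the argmax of the predictions with the argmax of the one-hot targets and averaging the matches. A small NumPy sketch with made-up probabilities for 4 examples and 3 classes:

```python
import numpy as np

# Made-up predicted probabilities (one row per example)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])

# One-hot targets: the true classes are 0, 1, 2, 1
targets = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1],
                    [0, 1, 0]])

predicted = probs.argmax(axis=1)        # class with the highest probability
actual = targets.argmax(axis=1)         # position of the 1 in each target row
accuracy = (predicted == actual).astype(np.float32).mean()
print(predicted, actual, accuracy)
```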

Loss and Accuracy Logging

In [18]:
# Training loss and accuracy
training_loss = tf.summary.scalar("training_loss", cross_entropy)
training_accuracy = tf.summary.scalar("training_accuracy", accuracy)

#Test loss and accuracy
test_loss = tf.summary.scalar("test_loss", cross_entropy)
test_accuracy = tf.summary.scalar("test_accuracy", accuracy)

# create log writer object
writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())

# Execute the Graph

In [23]:
#Start Graph execution
with tf.Session() as sess:
    
    # variables need to be initialized before we can use them
    sess.run(tf.global_variables_initializer())
   
    # perform training cycles
    for epoch in range(training_epochs):
        
        # number of batches in one epoch
        batch_count = trainX.shape[0] // batch_size
        
        for i in range(batch_count):
            batch_x  = trainX[i*batch_size:i*batch_size+batch_size]
            batch_y  = trainY[i*batch_size:i*batch_size+batch_size]

            # perform the operations we defined earlier on batch
            _,acc,loss = sess.run([train_op, 
                                   training_accuracy,training_loss], 
                                  feed_dict={x: batch_x, y_: batch_y})
            
            #log training accuracy and loss
            writer.add_summary(acc, epoch * batch_count + i)
            writer.add_summary(loss, epoch * batch_count + i)    
                       
        #Test loss and accuracy
        #Note that we feed the test data here
        acc,loss = sess.run([test_accuracy,test_loss],
                                   feed_dict={x: testX, y_: testY})
        
        #i still holds the index of the last batch of this epoch
        writer.add_summary(acc, epoch * batch_count + i)
        writer.add_summary(loss, epoch * batch_count + i)
        
        if epoch % 5 == 0: 
            print ("Epoch: ", epoch)
            print ("Test Accuracy: ", accuracy.eval(feed_dict={x: testX, 
                                                               y_: testY}))               
    
    
    #Create a Saver to write the trained variables to a checkpoint
    saver = tf.train.Saver()
    saver.save(sess, logs_path + '/' + model_name)

Epoch:  0
Test Accuracy:  0.9428
Epoch:  5
Test Accuracy:  0.974
Epoch:  10
Test Accuracy:  0.9753
Epoch:  15
Test Accuracy:  0.9746
Epoch:  20
Test Accuracy:  0.9792
Epoch:  25
Test Accuracy:  0.9789
Epoch:  30
Test Accuracy:  0.9795
Epoch:  35
Test Accuracy:  0.9796
Epoch:  40
Test Accuracy:  0.9798
Epoch:  45
Test Accuracy:  0.9793
Epoch:  50
Test Accuracy:  0.9793
Epoch:  55
Test Accuracy:  0.9793
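
The batching arithmetic in the training loop can be sketched in plain Python. The `batch_slices` helper is hypothetical, and the figure of 55,000 training examples is the one quoted for this data set; like the loop above, the sketch drops any leftover examples that do not fill a whole batch.

```python
def batch_slices(n_examples, batch_size):
    """Yield (start, stop) index pairs for each full batch, in order."""
    batch_count = n_examples // batch_size
    for i in range(batch_count):
        start = i * batch_size
        yield start, start + batch_size

slices = list(batch_slices(n_examples=55000, batch_size=100))
print(len(slices), slices[0], slices[-1])

# Step index used when logging the last batch of the third epoch (epoch = 2)
epoch, i = 2, len(slices) - 1
global_step = epoch * len(slices) + i
print(global_step)
```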


**With a test accuracy of 97.93%, the ANN outperformed the logistic regression classifier and the other machine learning models.**

In [24]:
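#The with block above already closes the session on exit, so this explicit call is only a safeguard
sess.close()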