# Neural Networks with MNIST

## Data Ingestion

In [545]:
# seed value for random number generators to obtain reproducible results
RANDOM_SEED = 1

# import base packages 
import numpy as np
import pandas as pd
import matplotlib

In [546]:
#import relevant machine learning packages
import tensorflow as tf
import keras 

In [547]:
#other packages
import time

In [548]:
#set working directory
import os
os.chdir('C:\\Users\\R\\Desktop\\MSDS 422\\Assignment 6')

In [549]:
#import dataset from Tensorflow
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


### Note:
Models 1, 2 and 3 will analyze how increasing the number of nodes (10, 20, 50) impacts a 2 layer model

## Model 1 (2 layers,  10 nodes per layer, learning rate = .5, epochs = 10, batch size = 100)

In [550]:
tf.reset_default_graph()

# Set hyperparameters
learning_rate = 0.5
n_epochs = 10
batch_size = 100

image_size = 784
num_classes = 10

n_inputs = image_size
n_hidden1 = 10
n_hidden2 = 10
n_outputs = num_classes

X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = 'X')
y = tf.placeholder(tf.float32, shape = (None), name = 'Y')

In [551]:
#Specify classifying method
with tf.name_scope("dnn"): 
    hidden1 = tf.layers.dense(X, n_hidden1, name = "hidden1", activation = tf.nn.relu) 
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name = "hidden2", activation = tf.nn.relu) 
    logits = tf.layers.dense(hidden2, n_outputs, name = "outputs")

In [552]:
#Specify method to compute loss
with tf.name_scope("loss"): 
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = logits) 
    loss = tf.reduce_mean(xentropy, name = "loss")

In [553]:
#Specify training method
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate) 
    training_op = optimizer.minimize(loss)

In [554]:
#Specify evaluation method
labels = tf.argmax(y, -1)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, labels, 1) 
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    loss = tf.reduce_mean(xentropy, name = "loss")

In [555]:
init = tf.global_variables_initializer() 

In [556]:
%%time
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for iteration in range( mnist.train.num_examples // batch_size): 
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict ={ X: X_batch, y : y_batch}) 
            acc_train = accuracy.eval(feed_dict ={ X: X_batch, y : y_batch}) 
            acc_val = accuracy.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels})
            loss_train = loss.eval(feed_dict = { X: X_batch, y : y_batch})
            loss_val = loss.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            print("Epoch:", epoch, "---Training accuracy:---", acc_train, "Training Loss:",loss_train,"---Validation accuracy:---", acc_val, "Validation Loss:", loss_val)  


Epoch: 0 ---Training accuracy:--- 0.1 Training Loss: 2.2626557 ---Validation accuracy:--- 0.078 Validation Loss: 2.2781837
Epoch: 0 ---Training accuracy:--- 0.16 Training Loss: 2.2032392 ---Validation accuracy:--- 0.0966 Validation Loss: 2.2477267
Epoch: 0 ---Training accuracy:--- 0.11 Training Loss: 2.210988 ---Validation accuracy:--- 0.1084 Validation Loss: 2.239252
Epoch: 0 ---Training accuracy:--- 0.15 Training Loss: 2.0570135 ---Validation accuracy:--- 0.0976 Validation Loss: 2.3596864
Epoch: 0 ---Training accuracy:--- 0.1 Training Loss: 2.267749 ---Validation accuracy:--- 0.1242 Validation Loss: 2.2703063
Epoch: 0 ---Training accuracy:--- 0.29 Training Loss: 2.1849194 ---Validation accuracy:--- 0.1502 Validation Loss: 2.223752
Epoch: 0 ---Training accuracy:--- 0.23 Training Loss: 2.1740236 ---Validation accuracy:--- 0.207 Validation Loss: 2.1903005
Epoch: 0 ---Training accuracy:--- 0.24 Training Loss: 2.1103039 ---Validation accuracy:--- 0.1972 Validation Loss: 2.1535132
Epoch: 0

In [557]:
#Calculate averages
print("Average Training Accuracy:", np.mean(acc_train))
print("Average Training Loss:", np.mean(loss_train))
print("Average Validation Accuracy:", np.mean(acc_val))
print("Average Validation Loss:", np.mean(loss_val))  

Average Training Accuracy: 0.95
Average Training Loss: 0.16784315
Average Validation Accuracy: 0.9366
Average Validation Loss: 0.23860742


## Model 2 (2 layers, 20 nodes per layer, learning rate = .5, epochs = 10, batch size = 100)

In [690]:
tf.reset_default_graph()

# Set hyperparameters
learning_rate = 0.5
n_epochs = 10
batch_size = 100

image_size = 784
num_classes = 10

n_inputs = image_size
n_hidden1 = 20
n_hidden2 = 20
n_outputs = num_classes

X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = 'X')
y = tf.placeholder(tf.float32, shape = (None), name = 'Y')

In [691]:
#Specify classifying method
with tf.name_scope("dnn"): 
    hidden1 = tf.layers.dense(X, n_hidden1, name = "hidden1", activation = tf.nn.relu) 
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name = "hidden2", activation = tf.nn.relu) 
    logits = tf.layers.dense(hidden2, n_outputs, name = "outputs")

In [692]:
#Specify method to compute loss
with tf.name_scope("loss"): 
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = logits) 
    loss = tf.reduce_mean(xentropy, name = "loss")

In [693]:
#Specify training method
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate) 
    training_op = optimizer.minimize(loss)

In [694]:
#Specify evaluation method
labels = tf.argmax(y, -1)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, labels, 1) 
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) 

In [695]:
init = tf.global_variables_initializer() 

In [696]:
%%time
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for iteration in range( mnist.train.num_examples // batch_size): 
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict ={ X: X_batch, y : y_batch}) 
            acc_train = accuracy.eval(feed_dict ={ X: X_batch, y : y_batch}) 
            acc_val = accuracy.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            loss_train = loss.eval(feed_dict = { X: X_batch, y : y_batch})
            loss_val = loss.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            print("Epoch:", epoch, "---Training accuracy:---", acc_train, "Training Loss:",loss_train,"---Validation accuracy:---", acc_val, "Validation Loss:", loss_val)
            

Epoch: 0 ---Training accuracy:--- 0.14 Training Loss: 2.224299 ---Validation accuracy:--- 0.114 Validation Loss: 2.277481
Epoch: 0 ---Training accuracy:--- 0.23 Training Loss: 2.1850417 ---Validation accuracy:--- 0.164 Validation Loss: 2.2291486
Epoch: 0 ---Training accuracy:--- 0.23 Training Loss: 2.1468143 ---Validation accuracy:--- 0.1736 Validation Loss: 2.1708136
Epoch: 0 ---Training accuracy:--- 0.31 Training Loss: 2.037787 ---Validation accuracy:--- 0.2794 Validation Loss: 2.0798
Epoch: 0 ---Training accuracy:--- 0.5 Training Loss: 1.8895535 ---Validation accuracy:--- 0.378 Validation Loss: 1.9703861
Epoch: 0 ---Training accuracy:--- 0.49 Training Loss: 1.7285556 ---Validation accuracy:--- 0.3674 Validation Loss: 1.9137049
Epoch: 0 ---Training accuracy:--- 0.25 Training Loss: 1.9468378 ---Validation accuracy:--- 0.2028 Validation Loss: 2.0504744
Epoch: 0 ---Training accuracy:--- 0.29 Training Loss: 2.3574073 ---Validation accuracy:--- 0.2166 Validation Loss: 2.5413537
Epoch: 0 -

In [697]:
#Calculate averages
print("Average Training Accuracy:", np.mean(acc_train))
print("Average Training Loss:", np.mean(loss_train))
print("Average Validation Accuracy:", np.mean(acc_val))
print("Average Validation Loss:", np.mean(loss_val))      

Average Training Accuracy: 1.0
Average Training Loss: 0.017909564
Average Validation Accuracy: 0.9546
Average Validation Loss: 0.15524781


## Model 3 (2 layers, 50 nodes per layer, learning rate = .5, epochs = 10, batch size = 100)

In [646]:
tf.reset_default_graph()

# Set hyperparameters
learning_rate = 0.5
n_epochs = 10
batch_size = 100

image_size = 784
num_classes = 10

n_inputs = image_size
n_hidden1 = 20
n_hidden2 = 20
n_outputs = num_classes

X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = 'X')
y = tf.placeholder(tf.float32, shape = (None), name = 'Y')

In [647]:
#Specify classifying method
with tf.name_scope("dnn"): 
    hidden1 = tf.layers.dense(X, n_hidden1, name = "hidden1", activation = tf.nn.relu) 
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name = "hidden2", activation = tf.nn.relu) 
    logits = tf.layers.dense(hidden2, n_outputs, name = "outputs")

In [648]:
#Specify method to compute loss
with tf.name_scope("loss"): 
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = logits) 
    loss = tf.reduce_mean(xentropy, name = "loss")

In [649]:
#Specify training method
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate) 
    training_op = optimizer.minimize(loss)

In [650]:
#Specify evaluation method
labels = tf.argmax(y, -1)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, labels, 1) 
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) 

In [652]:
init = tf.global_variables_initializer() 

In [653]:
%%time
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for iteration in range( mnist.train.num_examples // batch_size): 
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict ={ X: X_batch, y : y_batch}) 
            acc_train = accuracy.eval(feed_dict ={ X: X_batch, y : y_batch}) 
            acc_val = accuracy.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            loss_train = loss.eval(feed_dict = { X: X_batch, y : y_batch})
            loss_val = loss.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            print("Epoch:", epoch, "---Training accuracy:---", acc_train, "Training Loss:",loss_train,"---Validation accuracy:---", acc_val, "Validation Loss:", loss_val)

Epoch: 0 ---Training accuracy:--- 0.19 Training Loss: 2.2265763 ---Validation accuracy:--- 0.1224 Validation Loss: 2.2618487
Epoch: 0 ---Training accuracy:--- 0.22 Training Loss: 2.0797074 ---Validation accuracy:--- 0.1144 Validation Loss: 2.1995797
Epoch: 0 ---Training accuracy:--- 0.2 Training Loss: 2.0754874 ---Validation accuracy:--- 0.1176 Validation Loss: 2.217519
Epoch: 0 ---Training accuracy:--- 0.3 Training Loss: 2.0705547 ---Validation accuracy:--- 0.2168 Validation Loss: 2.178334
Epoch: 0 ---Training accuracy:--- 0.38 Training Loss: 1.9379627 ---Validation accuracy:--- 0.2852 Validation Loss: 2.0200233
Epoch: 0 ---Training accuracy:--- 0.46 Training Loss: 1.8623488 ---Validation accuracy:--- 0.3998 Validation Loss: 1.9511642
Epoch: 0 ---Training accuracy:--- 0.38 Training Loss: 1.8690786 ---Validation accuracy:--- 0.3454 Validation Loss: 1.880064
Epoch: 0 ---Training accuracy:--- 0.36 Training Loss: 1.8266654 ---Validation accuracy:--- 0.3348 Validation Loss: 1.8460598
Epoch

In [654]:
#Calculate averages
print("Average Training Accuracy:", np.mean(acc_train))
print("Average Training Loss:", np.mean(loss_train))
print("Average Validation Accuracy:", np.mean(acc_val))
print("Average Validation Loss:", np.mean(loss_val))   

Average Training Accuracy: 0.95
Average Training Loss: 0.12107233
Average Validation Accuracy: 0.9456
Average Validation Loss: 0.17633778


### Note:
It is apparent that Model 2 (20 nodes for each layer) results in the best performance (both time and accuracy) of the three models. This same analysis will be applied to 5 layer models (Models 4, 5 and 6).

## Model 4 (5 layers, 10 nodes per layer, learning rate = .5, epochs = 10, batch size = 100)

In [566]:
tf.reset_default_graph()

# Set hyperparameters
learning_rate = 0.5
n_epochs = 10
batch_size = 100

image_size = 784
num_classes = 10

n_inputs = image_size
n_hidden1 = 10
n_hidden2 = 10
n_hidden3 = 10
n_hidden4 = 10
n_hidden5 = 10
n_outputs = num_classes

X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = 'X')
y = tf.placeholder(tf.float32, shape = (None), name = 'Y')

In [567]:
#Specify classifying method
with tf.name_scope("dnn"): 
    hidden1 = tf.layers.dense(X, n_hidden1, name = "hidden1", activation = tf.nn.relu) 
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name = "hidden2", activation = tf.nn.relu) 
    hidden3 = tf.layers.dense(hidden2, n_hidden3, name = "hidden3", activation = tf.nn.relu) 
    hidden4 = tf.layers.dense(hidden3, n_hidden4, name = "hidden4", activation = tf.nn.relu) 
    hidden5 = tf.layers.dense(hidden4, n_hidden5, name = "hidden5", activation = tf.nn.relu) 
    logits = tf.layers.dense(hidden5, n_outputs, name = "outputs")

In [568]:
#Specify method to compute loss
with tf.name_scope("loss"): 
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = logits) 
    loss = tf.reduce_mean(xentropy, name = "loss")

In [569]:
#Specify training method
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate) 
    training_op = optimizer.minimize(loss)

In [570]:
#Specify evaluation method
labels = tf.argmax(y, -1)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, labels, 1) 
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) 

In [571]:
init = tf.global_variables_initializer() 

In [572]:
%%time
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for iteration in range( mnist.train.num_examples // batch_size): 
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict ={ X: X_batch, y : y_batch}) 
            acc_train = accuracy.eval(feed_dict ={ X: X_batch, y : y_batch}) 
            acc_val = accuracy.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            loss_train = loss.eval(feed_dict = { X: X_batch, y : y_batch})
            loss_val = loss.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            print("Epoch:", epoch, "---Training accuracy:---", acc_train, "Training Loss:",loss_train,"---Validation accuracy:---", acc_val, "Validation Loss:", loss_val) 

Epoch: 0 ---Training accuracy:--- 0.14 Training Loss: 2.291224 ---Validation accuracy:--- 0.096 Validation Loss: 2.2998035
Epoch: 0 ---Training accuracy:--- 0.07 Training Loss: 2.280573 ---Validation accuracy:--- 0.0942 Validation Loss: 2.2940798
Epoch: 0 ---Training accuracy:--- 0.14 Training Loss: 2.270166 ---Validation accuracy:--- 0.1168 Validation Loss: 2.2889724
Epoch: 0 ---Training accuracy:--- 0.16 Training Loss: 2.2850127 ---Validation accuracy:--- 0.135 Validation Loss: 2.2902858
Epoch: 0 ---Training accuracy:--- 0.09 Training Loss: 2.2702405 ---Validation accuracy:--- 0.107 Validation Loss: 2.283296
Epoch: 0 ---Training accuracy:--- 0.16 Training Loss: 2.2632182 ---Validation accuracy:--- 0.1374 Validation Loss: 2.2747679
Epoch: 0 ---Training accuracy:--- 0.11 Training Loss: 2.2334428 ---Validation accuracy:--- 0.0938 Validation Loss: 2.2934442
Epoch: 0 ---Training accuracy:--- 0.26 Training Loss: 2.2692146 ---Validation accuracy:--- 0.1908 Validation Loss: 2.2811277
Epoch: 

In [615]:
#Calculate averages
print("Average Training Accuracy:", np.mean(acc_train))
print("Average Training Loss:", np.mean(loss_train))
print("Average Validation Accuracy:", np.mean(acc_val))
print("Average Validation Loss:", np.mean(loss_val))  

Average Training Accuracy: 0.82
Average Training Loss: 0.4776445
Average Validation Accuracy: 0.756
Average Validation Loss: 0.4732594


## Model 5 (5 layers, 20 nodes per layer, learning rate = .5, epochs = 10, batch size = 100)

In [658]:
tf.reset_default_graph()

# Set hyperparameters
learning_rate = 0.5
n_epochs = 10
batch_size = 100

image_size = 784
num_classes = 10

n_inputs = image_size
n_hidden1 = 20
n_hidden2 = 20
n_hidden3 = 20
n_hidden4 = 20
n_hidden5 = 20
n_outputs = num_classes

X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = 'X')
y = tf.placeholder(tf.float32, shape = (None), name = 'Y')

In [659]:
#Specify classifying method
with tf.name_scope("dnn"): 
    hidden1 = tf.layers.dense(X, n_hidden1, name = "hidden1", activation = tf.nn.relu) 
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name = "hidden2", activation = tf.nn.relu) 
    hidden3 = tf.layers.dense(hidden2, n_hidden3, name = "hidden3", activation = tf.nn.relu) 
    hidden4 = tf.layers.dense(hidden3, n_hidden4, name = "hidden4", activation = tf.nn.relu) 
    hidden5 = tf.layers.dense(hidden4, n_hidden5, name = "hidden5", activation = tf.nn.relu) 
    logits = tf.layers.dense(hidden5, n_outputs, name = "outputs")

In [660]:
#Specify method to compute loss
with tf.name_scope("loss"): 
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = logits) 
    loss = tf.reduce_mean(xentropy, name = "loss")

In [661]:
#Specify training method
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate) 
    training_op = optimizer.minimize(loss)

In [662]:
#Specify evaluation method
labels = tf.argmax(y, -1)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, labels, 1) 
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) 

In [663]:
init = tf.global_variables_initializer() 

In [664]:
%%time
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for iteration in range( mnist.train.num_examples // batch_size): 
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict ={ X: X_batch, y : y_batch}) 
            acc_train = accuracy.eval(feed_dict ={ X: X_batch, y : y_batch}) 
            acc_val = accuracy.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            loss_train = loss.eval(feed_dict = { X: X_batch, y : y_batch})
            loss_val = loss.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            print("Epoch:", epoch, "---Training accuracy:---", acc_train, "Training Loss:",loss_train,"---Validation accuracy:---", acc_val, "Validation Loss:", loss_val)

Epoch: 0 ---Training accuracy:--- 0.14 Training Loss: 2.2785103 ---Validation accuracy:--- 0.0916 Validation Loss: 2.2913618
Epoch: 0 ---Training accuracy:--- 0.17 Training Loss: 2.2680907 ---Validation accuracy:--- 0.1404 Validation Loss: 2.2794037
Epoch: 0 ---Training accuracy:--- 0.12 Training Loss: 2.235782 ---Validation accuracy:--- 0.13 Validation Loss: 2.2681599
Epoch: 0 ---Training accuracy:--- 0.22 Training Loss: 2.230075 ---Validation accuracy:--- 0.1756 Validation Loss: 2.2581968
Epoch: 0 ---Training accuracy:--- 0.12 Training Loss: 2.2616296 ---Validation accuracy:--- 0.1146 Validation Loss: 2.2523844
Epoch: 0 ---Training accuracy:--- 0.21 Training Loss: 2.21822 ---Validation accuracy:--- 0.2038 Validation Loss: 2.2311654
Epoch: 0 ---Training accuracy:--- 0.25 Training Loss: 2.1881368 ---Validation accuracy:--- 0.1694 Validation Loss: 2.2275846
Epoch: 0 ---Training accuracy:--- 0.33 Training Loss: 2.1264346 ---Validation accuracy:--- 0.2708 Validation Loss: 2.1812437
Epoch:

In [665]:
#Calculate averages
print("Average Training Accuracy:", np.mean(acc_train))
print("Average Training Loss:", np.mean(loss_train))
print("Average Validation Accuracy:", np.mean(acc_val))
print("Average Validation Loss:", np.mean(loss_val))  

Average Training Accuracy: 0.98
Average Training Loss: 0.057237938
Average Validation Accuracy: 0.949
Average Validation Loss: 0.18635759


## Model 6 (5 hidden layers, 50 nodes per layer, learning rate = .5, epochs = 10, batch size = 100)

In [667]:
tf.reset_default_graph()

# Set hyperparameters
learning_rate = 0.5
n_epochs = 10
batch_size = 100

image_size = 784
num_classes = 10

n_inputs = image_size
n_hidden1 = 50
n_hidden2 = 50
n_hidden3 = 50
n_hidden4 = 50
n_hidden5 = 50
n_outputs = num_classes

X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = 'X')
y = tf.placeholder(tf.float32, shape = (None), name = 'Y')

In [668]:
#Specify classifying method
with tf.name_scope("dnn"): 
    hidden1 = tf.layers.dense(X, n_hidden1, name = "hidden1", activation = tf.nn.relu) 
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name = "hidden2", activation = tf.nn.relu) 
    hidden3 = tf.layers.dense(hidden2, n_hidden3, name = "hidden3", activation = tf.nn.relu) 
    hidden4 = tf.layers.dense(hidden3, n_hidden4, name = "hidden4", activation = tf.nn.relu) 
    hidden5 = tf.layers.dense(hidden4, n_hidden5, name = "hidden5", activation = tf.nn.relu) 
    logits = tf.layers.dense(hidden5, n_outputs, name = "outputs")

In [669]:
#Specify method to compute loss
with tf.name_scope("loss"): 
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = logits) 
    loss = tf.reduce_mean(xentropy, name = "loss")

In [670]:
#Specify training method
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate) 
    training_op = optimizer.minimize(loss)

In [671]:
#Specify evaluation method
labels = tf.argmax(y, -1)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, labels, 1) 
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) 

In [672]:
init = tf.global_variables_initializer() 

In [673]:
%%time
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for iteration in range( mnist.train.num_examples // batch_size): 
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict ={ X: X_batch, y : y_batch}) 
            acc_train = accuracy.eval(feed_dict ={ X: X_batch, y : y_batch}) 
            acc_val = accuracy.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            loss_train = loss.eval(feed_dict = { X: X_batch, y : y_batch})
            loss_val = loss.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            print("Epoch:", epoch, "---Training accuracy:---", acc_train, "Training Loss:",loss_train,"---Validation accuracy:---", acc_val, "Validation Loss:", loss_val)

Epoch: 0 ---Training accuracy:--- 0.22 Training Loss: 2.2736356 ---Validation accuracy:--- 0.1202 Validation Loss: 2.290353
Epoch: 0 ---Training accuracy:--- 0.21 Training Loss: 2.248796 ---Validation accuracy:--- 0.1406 Validation Loss: 2.2772996
Epoch: 0 ---Training accuracy:--- 0.37 Training Loss: 2.2276382 ---Validation accuracy:--- 0.2382 Validation Loss: 2.258215
Epoch: 0 ---Training accuracy:--- 0.28 Training Loss: 2.2037609 ---Validation accuracy:--- 0.2392 Validation Loss: 2.2423542
Epoch: 0 ---Training accuracy:--- 0.25 Training Loss: 2.1656106 ---Validation accuracy:--- 0.2148 Validation Loss: 2.2025418
Epoch: 0 ---Training accuracy:--- 0.34 Training Loss: 2.1594949 ---Validation accuracy:--- 0.2846 Validation Loss: 2.1878815
Epoch: 0 ---Training accuracy:--- 0.3 Training Loss: 2.1404524 ---Validation accuracy:--- 0.2772 Validation Loss: 2.1293032
Epoch: 0 ---Training accuracy:--- 0.32 Training Loss: 2.0483098 ---Validation accuracy:--- 0.3608 Validation Loss: 2.0354571
Epoc

In [674]:
#Calculate averages
print("Average Training Accuracy:", np.mean(acc_train))
print("Average Training Loss:", np.mean(loss_train))
print("Average Validation Accuracy:", np.mean(acc_val))
print("Average Validation Loss:", np.mean(loss_val)) 

Average Training Accuracy: 0.98
Average Training Loss: 0.078473985
Average Validation Accuracy: 0.9714
Average Validation Loss: 0.10947518


### Note:
It appears that adding more layers does not necessarily improve accuracy or time for this data set. In fact, the 5 layer 10 node model was the worst performer (avg training accuracy: 82%, avg validation accuracy: 75.6%) compared to all 2 layer and 5 layer models. In both 2 and 5 layer models, increasing the number of nodes generally increased performance. The best performing model, however, was Model 2 (2 layer 20 node) which averaged 100% training accuracy and 95.6% validation accuracy. 

In the next model, I will build off of Model 2 and adjust other hyperparameters including number of epochs, batch size and learning rate.

## Model 7 (2 layers, 20 nodes, 20 epochs, batch size = 200, learning rate = .01)

In [624]:
tf.reset_default_graph()

# Set hyperparameters
learning_rate = 0.01
n_epochs = 20
batch_size = 200

image_size = 784
num_classes = 10

n_inputs = image_size
n_hidden1 = 10
n_hidden2 = 10
n_outputs = num_classes

#Create placeholder variables
X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = 'X')
y = tf.placeholder(tf.float32, shape = (None), name = 'Y')

In [625]:
#Specify classifying method
with tf.name_scope("dnn"): 
    hidden1 = tf.layers.dense(X, n_hidden1, name = "hidden1", activation = tf.nn.relu) 
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name = "hidden2", activation = tf.nn.relu) 
    logits = tf.layers.dense(hidden2, n_outputs, name = "outputs")

In [626]:
#Specify method to compute loss
with tf.name_scope("loss"): 
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = logits) 
    loss = tf.reduce_mean(xentropy, name = "loss")

In [627]:
#Specify training method
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate) 
    training_op = optimizer.minimize(loss)

In [628]:
#Specify evaluation method
labels = tf.argmax(y, -1)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, labels, 1) 
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) 

In [629]:
init = tf.global_variables_initializer() 

In [630]:
%%time
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for iteration in range( mnist.train.num_examples // batch_size): 
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict ={ X: X_batch, y : y_batch}) 
            acc_train = accuracy.eval(feed_dict ={ X: X_batch, y : y_batch}) 
            acc_val = accuracy.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            loss_train = loss.eval(feed_dict = { X: X_batch, y : y_batch})
            loss_val = loss.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            print("Epoch:", epoch, "---Training accuracy:---", acc_train, "Training Loss:",loss_train,"---Validation accuracy:---", acc_val, "Validation Loss:", loss_val) 

Epoch: 0 ---Training accuracy:--- 0.11 Training Loss: 2.3163033 ---Validation accuracy:--- 0.0822 Validation Loss: 2.3313577
Epoch: 0 ---Training accuracy:--- 0.065 Training Loss: 2.3221838 ---Validation accuracy:--- 0.085 Validation Loss: 2.3293192
Epoch: 0 ---Training accuracy:--- 0.08 Training Loss: 2.344547 ---Validation accuracy:--- 0.0862 Validation Loss: 2.326922
Epoch: 0 ---Training accuracy:--- 0.115 Training Loss: 2.3015022 ---Validation accuracy:--- 0.0888 Validation Loss: 2.3251207
Epoch: 0 ---Training accuracy:--- 0.08 Training Loss: 2.3274288 ---Validation accuracy:--- 0.0914 Validation Loss: 2.3230243
Epoch: 0 ---Training accuracy:--- 0.06 Training Loss: 2.3316236 ---Validation accuracy:--- 0.093 Validation Loss: 2.320863
Epoch: 0 ---Training accuracy:--- 0.09 Training Loss: 2.2992556 ---Validation accuracy:--- 0.0944 Validation Loss: 2.3192058
Epoch: 0 ---Training accuracy:--- 0.09 Training Loss: 2.3066673 ---Validation accuracy:--- 0.097 Validation Loss: 2.3172815
Epoc

In [631]:
#Calculate averages
print("Average Training Accuracy:", np.mean(acc_train))
print("Average Training Loss:", np.mean(loss_train))
print("Average Validation Accuracy:", np.mean(acc_val))
print("Average Validation Loss:", np.mean(loss_val))  

Average Training Accuracy: 0.93
Average Training Loss: 0.37265828
Average Validation Accuracy: 0.9182
Average Validation Loss: 0.29666436


### Note:
It seems that increasing the number of epochs and batch size while decreasing the learning rate does not increase model performance. 10 epochs, batch size = 100 and learning rate = .5 seem to be suitable parameters for the model. The one parameter that has shown a positive impact on model performance is adjusting the number of nodes per layer. 

In the next model (Model 8) I will use these parameters and look at how significantly increasing and staggering the amount of nodes per layer impacts model performance. The first layer will have 3x more nodes than the second layer (300 and 100 nodes).


## Model 8 (2 layers, 1 layer = 300 nodes, 2 layer = 100 nodes, epoch=10, batch size = 100, learning rate = .5)

In [698]:
tf.reset_default_graph()

# Set hyperparameters
learning_rate = 0.5
n_epochs = 10
batch_size = 100

image_size = 784
num_classes = 10

n_inputs = image_size
n_hidden1 = 300
n_hidden2 = 100
n_outputs = num_classes

#Create placeholder variables
X = tf.placeholder(tf.float32, shape = (None, n_inputs), name = 'X')
y = tf.placeholder(tf.float32, shape = (None), name = 'Y')

In [699]:
#Specify classifying method
with tf.name_scope("dnn"): 
    hidden1 = tf.layers.dense(X, n_hidden1, name = "hidden1", activation = tf.nn.relu) 
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name = "hidden2", activation = tf.nn.relu) 
    logits = tf.layers.dense(hidden2, n_outputs, name = "outputs")

In [700]:
#Specify method to compute loss
with tf.name_scope("loss"): 
    xentropy = tf.nn.softmax_cross_entropy_with_logits(labels = y, logits = logits) 
    loss = tf.reduce_mean(xentropy, name = "loss")

In [701]:
#Specify training method
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate) 
    training_op = optimizer.minimize(loss)

In [702]:
#Specify evaluation method
labels = tf.argmax(y, -1)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, labels, 1) 
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) 

In [703]:
init = tf.global_variables_initializer() 

In [704]:
%%time
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs): 
        for iteration in range( mnist.train.num_examples // batch_size): 
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict ={ X: X_batch, y : y_batch}) 
            acc_train = accuracy.eval(feed_dict ={ X: X_batch, y : y_batch}) 
            acc_val = accuracy.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            loss_train = loss.eval(feed_dict = { X: X_batch, y : y_batch})
            loss_val = loss.eval(feed_dict ={ X: mnist.validation.images, y: mnist.validation.labels}) 
            print("Epoch:", epoch, "---Training accuracy:---", acc_train, "Training Loss:",loss_train,"---Validation accuracy:---", acc_val, "Validation Loss:", loss_val)

Epoch: 0 ---Training accuracy:--- 0.54 Training Loss: 1.9371701 ---Validation accuracy:--- 0.2572 Validation Loss: 2.1115193
Epoch: 0 ---Training accuracy:--- 0.71 Training Loss: 1.6735977 ---Validation accuracy:--- 0.4934 Validation Loss: 1.9213179
Epoch: 0 ---Training accuracy:--- 0.52 Training Loss: 1.5299875 ---Validation accuracy:--- 0.3962 Validation Loss: 1.7890176
Epoch: 0 ---Training accuracy:--- 0.22 Training Loss: 2.0698125 ---Validation accuracy:--- 0.1372 Validation Loss: 2.3025618
Epoch: 0 ---Training accuracy:--- 0.29 Training Loss: 2.220332 ---Validation accuracy:--- 0.195 Validation Loss: 2.494863
Epoch: 0 ---Training accuracy:--- 0.38 Training Loss: 2.1099882 ---Validation accuracy:--- 0.3008 Validation Loss: 2.232044
Epoch: 0 ---Training accuracy:--- 0.52 Training Loss: 1.8572631 ---Validation accuracy:--- 0.4484 Validation Loss: 1.8880098
Epoch: 0 ---Training accuracy:--- 0.52 Training Loss: 1.6035874 ---Validation accuracy:--- 0.4904 Validation Loss: 1.6918489
Epoc

In [705]:
#Calculate averages
training_average = "Average Training Accuracy:", np.mean(acc_train)
print(training_average)
training_loss_average ="Average Training Loss:", np.mean(loss_train)
print(training_loss_average)
validation_average ="Average Validation Accuracy:", np.mean(acc_val)
print(validation_average)
validation_loss_average = "Average Validation Loss:", np.mean(loss_val)
print(training_loss_average)


('Average Training Accuracy:', 1.0)
('Average Training Loss:', 0.0009903791)
('Average Validation Accuracy:', 0.9808)
('Average Training Loss:', 0.0009903791)


## Conclustion:

Of the seven models tested, Model 2 was the best overall performer in terms of accuracy, loss minimization and computation time. Model 2 had the following hyperparameters: 2 hidden layers, 20 nodes per layer, learning rate = .5, epochs = 10 and batch size = 100. Model 2 achieved an average of 100% training accuracy and 95.6% validation accuracy in 1 min 48 seconds. Model 8  also resulted in similar prediction results. Model 8 had the same parameters as Model 2 but had 300 nodes in the first layer and 100 in the second layer. It achieved an average training accuracy score of 100% and an average validation score of 98%. The main difference is Model 7 took 7 min 30 seconds of computation time compared to 1 min 48 seconds. Therefore, Model 2 was deemed the optimal model.

Model 4 was by far the worst performer of the models with average training accuracy of 82% and 75.6% average validation accuracy in 1 min 48 seconds. Model 4 had the following hyperparameters: 5 hidden layers, 10 nodes per layer, learning rate = .5, epochs = 10, batch size = 100)

A conclusion from these results is that in general, increasing the number of nodes per layer has a positive impact on accuracy. However, as nodes per layer increase so does computation time. Another conclusion is that increasing the number of layers does not necessarily lead to more accurate results. This may be a result of the data not being very complex, causing the algorithm to overfit the training data and not generalize to new data.

Using Model 2 as a reference, increasing other hyperparameters like batch size and number of epochs while adjusting learning rate had mostly negative effects on model performance.