# Assignment 4: Benchmarking Fashion-MNIST with Deep Neural Nets

Kathryn Young
<br>
kmy9ca
<br>
04/21/2019

### CS 4501 Machine Learning - Department of Computer Science - University of Virginia
"The original MNIST dataset contains a lot of handwritten digits. Members of the AI/ML/Data Science community love this dataset and use it as a benchmark to validate their algorithms. In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others." - **Zalando Research, Github Repo.**"

Fashion-MNIST is a dataset from the Zalando's article. Each example is a 28x28 grayscale image, associated with a label from 10 classes. They intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms.

![Here's an example how the data looks (each class takes three-rows):](https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png)

In this assignment, you will attempt to benchmark the Fashion-MNIST using Neural Networks. You must use it to train some neural networks on TensorFlow and predict the final output of 10 classes. For deliverables, you must write code in Python and submit this Jupyter Notebook file (.ipynb) to earn a total of 100 pts. You will gain points depending on how you perform in the following sections.


In [0]:
# You might want to use the following packages
import numpy as np
import os
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR) #reduce annoying warning messages
from functools import partial

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)


---
## 1. PRE-PROCESSING THE DATA (10 pts)

You can load the Fashion MNIST directly from Tensorflow. **Partition of the dataset** so that you will have 50,000 examples for training, 10,000 examples for validation, and 10,000 examples for testing. Also, make sure that you platten out each of examples so that it contains only a 1-D feature vector.

Write some code to output the dimensionalities of each partition (train, validation, and test sets).



In [0]:
# Your code goes here for this section.
from sklearn.preprocessing import StandardScaler

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data();

In [0]:
# is this flattened???!

X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)

X_train = StandardScaler().fit_transform(X_train)
X_test = StandardScaler().fit_transform(X_test)

X_valid, X_train = X_train[:10000], X_train[10000:]
y_valid, y_train = y_train[:10000], y_train[10000:]

In [4]:
# Output dimensionalities... ??? 

print("X Validation set: ", X_valid.shape)
print("Y Validation set: ", y_valid.shape)
print("X Training set: ", X_train.shape)
print("Y Training set: ", y_train.shape)
print("X Testing set: ", X_test.shape)
print("Y Testing set: ", y_test.shape)

X Validation set:  (10000, 784)
Y Validation set:  (10000,)
X Training set:  (50000, 784)
Y Training set:  (50000,)
X Testing set:  (10000, 784)
Y Testing set:  (10000,)


- - -
## 2. CONSTRUCTION PHASE (30 pts)

In this section, define at least three neural networks with different structures. Make sure that the input layer has the right number of inputs. The best structure often is found through a process of trial and error experimentation:
- You may start with a fully connected network structure with two hidden layers.
- You may try a few settings of the number of nodes in each layer.
- You may try a few activation functions to see if they affect the performance.

**Important Implementation Note:** For the purpose of learning Tensorflow, you must use low-level TensorFlow API to construct the network. Usage of high-level tools (ie. Keras) is not permited. 

In [0]:
# Your code goes here
reset_graph()

# Set some configuration here
n_inputs = 28*28  # Fashion-MNIST
learning_rate = 0.01
n_outputs = 10

# Construct placeholder for the input layer. nO data goes here until it is run. When you run, you feed trainnig data in.
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

In [0]:
def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name="kernel")
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        Z = tf.matmul(X, W) + b
        if activation is not None:
            return activation(Z)
        else:
            return Z

In [0]:
n_inputs = 28*28  # MNIST
n_hidden1 = 515
n_hidden2 = 350
n_hidden3 = 225
n_outputs = 10

# change number of layers, number of nodes in layer, activation function

In [0]:
with tf.name_scope("dnn1"):
    hidden1 = neuron_layer(X, n_hidden1, name="hidden1",
                           activation=tf.nn.relu)
    hidden2 = neuron_layer(hidden1, n_hidden2, name="hidden2",
                           activation=tf.nn.relu)
    hidden3 = neuron_layer(hidden2, n_hidden3, name="hidden3",
                           activation=tf.nn.relu)
    logits = neuron_layer(hidden3, n_outputs, name="outputs")

In [0]:
n2_inputs = 28*28  # MNIST
n2_hidden1 = 500
n2_hidden2 = 200
n2_outputs = 10

with tf.name_scope("dnn2"):
  #implementation of the second net here - changed number of nodes in hidden layers
  
  dnn2_hidden1 = neuron_layer(X, n2_hidden1, name="n2_hidden1",
                           activation=tf.nn.relu)
  dnn2_hidden2 = neuron_layer(dnn2_hidden1, n2_hidden2, name="n2_hidden2",
                           activation=tf.nn.relu)
  dnn2_logits = neuron_layer(dnn2_hidden2, n2_outputs, name="n2_outputs")

In [0]:
n3_inputs = 28*28  # MNIST
n3_hidden1 = 500
n3_hidden2 = 200
n3_hidden3 = 50
n3_outputs = 10

with tf.name_scope("dnn3"):
  #implementation of the third net here
  dnn3_hidden1 = tf.layers.dense(X, n3_hidden1, name="n3_hidden1",
                           activation=tf.nn.relu)
  dnn3_hidden2 = tf.layers.dense(dnn3_hidden1, n3_hidden2, name="n3_hidden2",
                           activation=tf.nn.relu)
  dnn3_hidden3 = tf.layers.dense(dnn3_hidden2, n3_hidden3, name="n3_hidden3",
                           activation=tf.nn.relu)
  dnn3_logits = tf.layers.dense(dnn3_hidden3, n3_outputs, name="n3_outputs")

In [0]:
with tf.name_scope("loss"):
#implementation of the loss function net here
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

In [0]:
with tf.name_scope("loss2"):
#implementation of the loss function net here
    xentropy2 = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=dnn2_logits)
    loss2 = tf.reduce_mean(xentropy2, name="loss2")

In [0]:
with tf.name_scope("loss3"):
#implementation of the loss function net here
    xentropy3 = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=dnn3_logits)
    loss3 = tf.reduce_mean(xentropy3, name="loss3")

In [0]:
learning_rate = 0.01

with tf.name_scope("train"):
  #implementation of the training optimizer here
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

In [0]:
learning_rate2 = 0.01

with tf.name_scope("train2"):
  #implementation of the training optimizer here
    optimizer2 = tf.train.GradientDescentOptimizer(learning_rate2)
    training_op2 = optimizer2.minimize(loss2)

In [0]:
learning_rate3 = 0.01

with tf.name_scope("train3"):
  #implementation of the training optimizer here
    optimizer3 = tf.train.GradientDescentOptimizer(learning_rate3)
    training_op3 = optimizer3.minimize(loss3)

In [0]:
with tf.name_scope("eval"):
  #implementation of the evaluation procedure here
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

In [0]:
with tf.name_scope("eval2"):
  #implementation of the evaluation procedure here
    correct2 = tf.nn.in_top_k(dnn2_logits, y, 1)
    accuracy2 = tf.reduce_mean(tf.cast(correct2, tf.float32))

In [0]:
with tf.name_scope("eval3"):
  #implementation of the evaluation procedure here
    correct3 = tf.nn.in_top_k(dnn3_logits, y, 1)
    accuracy3 = tf.reduce_mean(tf.cast(correct3, tf.float32))

- - -
## 3. EXECUTION PHASE (30 pts)

After you construct the three models of neural networks, you can compute the performance measure as the class accuracy. You will need to define the number of epochs and size of the training batch. You also might need to reset the graph each time your try a different model. To save time and avoid retraining, you should save the trained model and load it from disk to evaluate a test set. Pick the best model and answer the following:
- Which model yields the best performance measure for your dataset? Provide a reason why it yields the best performance.
- Why did you pick this many hidden layers?
- Provide some justifiable reasons for selecting the number of neurons per hidden layers. 
- Which activation functions did you use?

In the next session you will get a chance to finetune it further .



In [0]:
# Your code goes here
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 15
batch_size = 20

# shuffle_batch() shuffle the examples in a batch before training
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch


In [21]:
with tf.Session() as sess:
  init.run()
  for epoch in range(n_epochs):
    # implementation of the training ops here
    # implementation of the validation accuracy here
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
    acc_val = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
    print(epoch, "Batch accuracy:", acc_batch, "Val accuracy:", acc_val)
    
  save_path = saver.save(sess, "./my_dnn_model.ckpt")

0 Batch accuracy: 0.9 Val accuracy: 0.8599
1 Batch accuracy: 0.9 Val accuracy: 0.8657
2 Batch accuracy: 1.0 Val accuracy: 0.8764
3 Batch accuracy: 1.0 Val accuracy: 0.8814
4 Batch accuracy: 1.0 Val accuracy: 0.8795
5 Batch accuracy: 1.0 Val accuracy: 0.8755
6 Batch accuracy: 0.95 Val accuracy: 0.8876
7 Batch accuracy: 1.0 Val accuracy: 0.8824
8 Batch accuracy: 1.0 Val accuracy: 0.8784
9 Batch accuracy: 1.0 Val accuracy: 0.8877
10 Batch accuracy: 1.0 Val accuracy: 0.8837
11 Batch accuracy: 1.0 Val accuracy: 0.8897
12 Batch accuracy: 1.0 Val accuracy: 0.8856
13 Batch accuracy: 1.0 Val accuracy: 0.8891
14 Batch accuracy: 1.0 Val accuracy: 0.8873


In [22]:
# eval2 !?????
# this is for DNN2
n_epochs2 = 15
batch_size2 = 20

with tf.Session() as sess2:
  init.run()
  for epoch in range(n_epochs2):
    # implementation of the training ops here
    # implementation of the validation accuracy here
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size2):
        sess2.run(training_op2, feed_dict={X: X_batch, y: y_batch})
    acc_batch2 = accuracy2.eval(feed_dict={X: X_batch, y: y_batch})
    acc_val2 = accuracy2.eval(feed_dict={X: X_valid, y: y_valid})
    print(epoch, "Batch accuracy:", acc_batch2, "Val accuracy:", acc_val2)
    
  save_path = saver.save(sess2, "./my_dnn_model2.ckpt")

0 Batch accuracy: 0.9 Val accuracy: 0.8632
1 Batch accuracy: 0.9 Val accuracy: 0.8743
2 Batch accuracy: 0.95 Val accuracy: 0.8694
3 Batch accuracy: 1.0 Val accuracy: 0.8804
4 Batch accuracy: 0.95 Val accuracy: 0.8831
5 Batch accuracy: 1.0 Val accuracy: 0.8823
6 Batch accuracy: 1.0 Val accuracy: 0.8864
7 Batch accuracy: 0.95 Val accuracy: 0.8852
8 Batch accuracy: 1.0 Val accuracy: 0.8815
9 Batch accuracy: 1.0 Val accuracy: 0.888
10 Batch accuracy: 1.0 Val accuracy: 0.8829
11 Batch accuracy: 1.0 Val accuracy: 0.8836
12 Batch accuracy: 1.0 Val accuracy: 0.8862
13 Batch accuracy: 1.0 Val accuracy: 0.8878
14 Batch accuracy: 1.0 Val accuracy: 0.8883


In [23]:
# this is for DNN3
n_epochs3 = 15
batch_size3 = 20

with tf.Session() as sess3:
  init.run()
  for epoch in range(n_epochs3):
    # implementation of the training ops here
    # implementation of the validation accuracy here
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size3):
        sess3.run(training_op3, feed_dict={X: X_batch, y: y_batch})
    acc_batch3 = accuracy3.eval(feed_dict={X: X_batch, y: y_batch})
    acc_val3 = accuracy3.eval(feed_dict={X: X_valid, y: y_valid})
    print(epoch, "Batch accuracy:", acc_batch3, "Val accuracy:", acc_val3)
    
  save_path = saver.save(sess3, "./my_dnn_model3.ckpt")

0 Batch accuracy: 0.95 Val accuracy: 0.8615
1 Batch accuracy: 0.9 Val accuracy: 0.8735
2 Batch accuracy: 0.95 Val accuracy: 0.88
3 Batch accuracy: 0.95 Val accuracy: 0.8857
4 Batch accuracy: 0.9 Val accuracy: 0.8811
5 Batch accuracy: 1.0 Val accuracy: 0.8845
6 Batch accuracy: 1.0 Val accuracy: 0.8859
7 Batch accuracy: 0.95 Val accuracy: 0.8869
8 Batch accuracy: 1.0 Val accuracy: 0.881
9 Batch accuracy: 0.95 Val accuracy: 0.8852
10 Batch accuracy: 1.0 Val accuracy: 0.8914
11 Batch accuracy: 1.0 Val accuracy: 0.892
12 Batch accuracy: 1.0 Val accuracy: 0.8891
13 Batch accuracy: 1.0 Val accuracy: 0.884
14 Batch accuracy: 1.0 Val accuracy: 0.8896


In [24]:
with tf.Session() as sess:
    saver.restore(sess, "./my_dnn_model.ckpt")
    # implementation of the test set evaluation here
    X_new_scaled = X_test[:20]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)
    accuracy_val = accuracy.eval(feed_dict={X: X_test, y: y_test})
    

print("Predicted classes:", y_pred)
print("Actual classes:   ", y_test[:20])
print("Accuracy DNN 1 = ", accuracy_val)

Predicted classes: [9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 2 8 0]
Actual classes:    [9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0]
Accuracy DNN 1 =  0.8803


In [25]:
with tf.Session() as sess2:
    saver.restore(sess2, "./my_dnn_model2.ckpt")
    # implementation of the test set evaluation here
    X_new_scaled2 = X_test[:20]
    Z2 = dnn2_logits.eval(feed_dict={X: X_new_scaled2})
    y_pred2 = np.argmax(Z2, axis=1)
    accuracy_val2 = accuracy2.eval(feed_dict={X: X_test, y: y_test})
    

print("Predicted classes:", y_pred2)
print("Actual classes:   ", y_test[:20])
print("Accuracy DNN 2 = ", accuracy_val2)

Predicted classes: [9 2 1 1 0 1 4 6 5 7 4 5 5 3 4 1 2 2 8 0]
Actual classes:    [9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0]
Accuracy DNN 2 =  0.8784


In [26]:
with tf.Session() as sess3:
    saver.restore(sess3, "./my_dnn_model3.ckpt")
    # implementation of the test set evaluation here
    X_new_scaled3 = X_test[:20]
    Z3 = dnn3_logits.eval(feed_dict={X: X_new_scaled3})
    y_pred3 = np.argmax(Z3, axis=1)
    accuracy_val3 = accuracy3.eval(feed_dict={X: X_test, y: y_test})
    
    

print("Predicted classes:", y_pred3)
print("Actual classes:   ", y_test[:20])
print("Accuracy DNN 3 = ", accuracy_val3)

Predicted classes: [9 2 1 1 6 1 4 6 5 7 4 5 5 3 4 1 2 2 8 0]
Actual classes:    [9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0]
Accuracy DNN 3 =  0.8842


In [27]:
# print out the final accuracy here
print("Accuracies: ")
print("Model 1: ", accuracy_val)
print("Model 2: ", accuracy_val2)
print("Model 3: ", accuracy_val3)

Accuracies: 
Model 1:  0.8803
Model 2:  0.8784
Model 3:  0.8842


Model 3 yields the best performance measure for this dataset by a small margin. Model 1 has three hidden layer, each having ~66% the amount of nodes from the previous layer, Model 2 has two hidden layers, and Model 3 which has three layers as well with smaller numbers of nodes at each level. I picked three hidden layers because it made sense that one hidden layer would attempt to pick up lines, the next layer would attempt to put those lines together into shapes (i.e. squares, circles, rectangles, etc.), and the third layer would put those shapes together to form more complicated shapes.

<br>

For Model 1, I selected a high number of neurons 515 for the first layer because it was about 2/3 of the neurons from the first layer and would hopefully be the best number to allow the network to pick up on lines the best. I funneled those 515 neurons into 350 other categories of shapes around (~2/3 of 515), and then I figured there would be multiple variations of a single item (therefore placing the number of neurons in the last level at 225, or ~2/3 of 350). The number of output neurons is 10 because there are 10 classes to be classified.

<br>

In Model 3, I decided to change the number of nodes a bit so that they funneled into the outputs more steeply. I started the first layer at 500 nodes, then moved to 200, then moved to 50 before ending the output layer at 10. I figured this would give a different look on accuracy than taking the standard ~2/3 of the previous layer.

<br>

I tried relu, elu, sigmoid, tanh, sigmoid and leaky_relu activation functions, and relu ended up coming out with the best results, which is why Model 1, Model 2, and Model 3 all use relu.

- - -
## 4. FINETUNING THE NETWORK (25 pts)

The best performance on the Fashion MNIST of a non-neural-net classifier is the Support Vector Classifier {"C":10,"kernel":"poly"} with 0.897 accuracy. In this section, you will see how close you can get to that accuracy, or (better yet) beat it! You will be able to see the performance of other ML methods below:
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com

Use the best model from the previous section and see if you can improve it further. To improve the performance of your model, You must make some modifications based upon the practical guidelines discuss in class. Here are a few decisions about the recommended network configurations you have to make:
1. Initialization: Use He Initialization for your model
2. Activation: Add ELU as the activation function throughout your hidden layers
3. Normalization: Incorporate the batch normalization at every layer
4. Regularization: Configure the dropout policy at 50% rate
5. Optimization: Change Gradient Descent into Adam Optimization
6. Your choice: make any other changes in 1-5 you deem necessary

Keep in mind that the execution phase is essentially the same, so you can just run it from the above. See how much you gain in classification accuracy. Provide some justifications for the gain in performance. 






In [0]:
# this is where you try to get high accuracy.
# 1. initialization - what activation each first neuron starts at.
# 2. change activation functions - Leaky ReLU might be best option
# 3. Your choice can be #neurons, hidden layers, learning rate, batch sizes, epochs.

n_inputs = 28*28  # MNIST
n_hidden1 = 500
n_hidden2 = 200
n_hidden3 = 50
n_outputs = 10

he_init = tf.variance_scaling_initializer()

training = tf.placeholder_with_default(False, shape=(), name='training')

dropout_rate = 0.5  # == 1 - keep_prob
X_drop = tf.layers.dropout(X, dropout_rate, training=training)

with tf.name_scope("dnnBenchmark"):
  # implementation of the new benchmarking DNN here
    hidden1 = tf.layers.dense(X_drop, n_hidden1, activation=tf.nn.relu, kernel_initializer=he_init, name="hidden1")
    
    hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, kernel_initializer=he_init, name="hidden2")
    
    hidden3 = tf.layers.dense(hidden2, n_hidden1, activation=tf.nn.relu, kernel_initializer=he_init, name="hidden3")
    
    logits = tf.layers.dense(hidden3, n_outputs, name="outputs")
    
    

with tf.name_scope("loss"):
#implementation of the loss function net here
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
    
learning_rate = 0.01


with tf.name_scope("train"):
  #implementation of the training optimizer here
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
  #implementation of the evaluation procedure here
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

In [29]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 15
batch_size = 20

with tf.Session() as sess:
  init.run()
  for epoch in range(n_epochs):
    # implementation of the training ops here
    # implementation of the validation accuracy here
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
    acc_val = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
    print(epoch, "Batch accuracy:", acc_batch, "Val accuracy:", acc_val)
    
  save_path = saver.save(sess, "./my_dnn_model.ckpt")

with tf.Session() as sess:
    saver.restore(sess, "./my_dnn_model.ckpt")
    # implementation of the test set evaluation here
    X_new_scaled = X_test[:20]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)
    accuracy_val = accuracy.eval(feed_dict={X: X_test, y: y_test})
    

print("Predicted classes:", y_pred)
print("Actual classes:   ", y_test[:20])
print("Accuracy DNN 3 = ", accuracy_val)

0 Batch accuracy: 1.0 Val accuracy: 0.8585
1 Batch accuracy: 1.0 Val accuracy: 0.8732
2 Batch accuracy: 0.95 Val accuracy: 0.8835
3 Batch accuracy: 0.9 Val accuracy: 0.8888
4 Batch accuracy: 1.0 Val accuracy: 0.8883
5 Batch accuracy: 0.95 Val accuracy: 0.89
6 Batch accuracy: 1.0 Val accuracy: 0.893
7 Batch accuracy: 0.95 Val accuracy: 0.8936
8 Batch accuracy: 0.95 Val accuracy: 0.8904
9 Batch accuracy: 1.0 Val accuracy: 0.8925
10 Batch accuracy: 1.0 Val accuracy: 0.8895
11 Batch accuracy: 1.0 Val accuracy: 0.8922
12 Batch accuracy: 1.0 Val accuracy: 0.8878
13 Batch accuracy: 1.0 Val accuracy: 0.8974
14 Batch accuracy: 1.0 Val accuracy: 0.89
Predicted classes: [9 2 1 1 6 1 4 6 5 7 4 5 8 3 4 1 2 2 8 0]
Actual classes:    [9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0]
Accuracy DNN 3 =  0.8817


There was a minor increase in accuracy after the optimizations I ran. The accuracy for my final model is typically around mid 88%. The slight increase in accuracy was due to modifications made such as adding He initialization. This was the modification that made the most noticable positive difference, although interestingly on occaison the original model accuracy is a little better.

<br> 

I experimented with changing relu activation to elu activation, implementing batch normalization at every layer, putting in a dropout policy at each layer, and changing Gradient Descent into Adam Optimization. Many of these modifications led to either no apparent change in accuracy or decrease in accuracy, so they were not included in the final model.

- - -
## 5. OUTLOOK (5 pts)

Plan for the outlook of your system: This may lead to the direction of your future project:
- Did your neural network outperform other "traditional ML technique? Why/why not?
- Does your model work well? If not, which model should be further investigated?
- Do you satisfy with your system? What do you think needed to improve?



My neural network outperformed the majority of the other traditional ML techniques but it did not quite outperform the best of the traditional ML techniques. My model missed the mark on this by a fraction of a percent. After making all of the various optimizations, the accuracies weren’t getting any better when added on at a certain point. My model works pretty well. To have that high of accuracy compared to other models on the same data means that it is worth keeping and maybe exploring further to a greater extent. I am satisfied with my system because it is intuitive to me and I tried everything suggested in order to get better results — including things and excluding things as necessary. However, I think there is probably an even bigger world of things I could examine to improve accuracy.

- - - 
### NEED HELP?

In case you get stuck in any step in the process, you may find some useful information from:

 * Consult my lectures and/or the textbook
 * Talk to the TA, they are available and there to help you during OH
 * Come talk to me or email me <nn4pj@virginia.edu> with subject starting "CS4501 Assignment 4:...".
 * More on the Fashion-MNIST to be found here: https://hanxiao.github.io/2018/09/28/Fashion-MNIST-Year-In-Review/

Best of luck and have fun!