# Assignment 4: Benchmarking Fashion-MNIST with Deep Neural Nets

### CS 4501 Machine Learning - Department of Computer Science - University of Virginia
"The original MNIST dataset contains a lot of handwritten digits. Members of the AI/ML/Data Science community love this dataset and use it as a benchmark to validate their algorithms. In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others." - **Zalando Research, Github Repo.**"

Fashion-MNIST is a dataset of Zalando's article images. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms.

![Here's an example of how the data looks (each class takes three rows):](https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png)

In this assignment, you will benchmark Fashion-MNIST using neural networks. You must train several neural networks in TensorFlow to predict which of the 10 classes each image belongs to. For deliverables, you must write code in Python and submit this Jupyter Notebook file (.ipynb) to earn a total of 100 pts. You will gain points depending on how you perform in the following sections.


In [1]:
# You might want to use the following packages
import numpy as np
import os
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR) #reduce annoying warning messages
from functools import partial

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)


---
## 1. PRE-PROCESSING THE DATA (10 pts)

You can load Fashion-MNIST directly from TensorFlow. **Partition the dataset** so that you have 50,000 examples for training, 10,000 examples for validation, and 10,000 examples for testing. Also, make sure that you flatten each example so that it contains only a 1-D feature vector.

Write some code to output the dimensionalities of each partition (train, validation, and test sets).



In [2]:
# Your code goes here for this section.
fmnist = tf.keras.datasets.fashion_mnist.load_data()

(X_train, y_train), (X_test, y_test) = fmnist
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:10000], X_train[10000:]
y_valid, y_train = y_train[:10000], y_train[10000:]
print(len(X_train))
print(len(X_test))
print(len(X_valid))

50000
10000
10000
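
The cell above prints only the number of examples in each partition. Below is a minimal sketch of the same split that reports full shapes. It runs on a scaled-down synthetic array (60 blank images instead of 60,000, an assumption made to keep it light), and `split_and_flatten` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def split_and_flatten(images, labels, n_valid):
    """Flatten images to 1-D vectors, scale to [0, 1], and hold out the first n_valid rows."""
    flat = images.astype(np.float32).reshape(len(images), -1) / 255.0
    labels = labels.astype(np.int32)
    return (flat[n_valid:], labels[n_valid:]), (flat[:n_valid], labels[:n_valid])

# Synthetic stand-in data (assumption): 60 blank 28x28 images
images = np.zeros((60, 28, 28), dtype=np.uint8)
labels = np.zeros(60, dtype=np.uint8)

(X_tr, y_tr), (X_va, y_va) = split_and_flatten(images, labels, n_valid=10)
print("train:", X_tr.shape, y_tr.shape)
print("valid:", X_va.shape, y_va.shape)
```

With the real 60,000-image training set and `n_valid=10000`, the same call would be expected to give a 50,000 x 784 training matrix and a 10,000 x 784 validation matrix.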


- - -
## 2. CONSTRUCTION PHASE (30 pts)

In this section, define at least three neural networks with different structures. Make sure that the input layer has the right number of inputs. The best structure often is found through a process of trial and error experimentation:
- You may start with a fully connected network structure with two hidden layers.
- You may try a few settings of the number of nodes in each layer.
- You may try a few activation functions to see if they affect the performance.

**Important Implementation Note:** For the purpose of learning TensorFlow, you must use the low-level TensorFlow API to construct the network. Usage of high-level tools (i.e., Keras) is not permitted. 
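
As a rough picture of what each fully connected layer computes, here is a minimal NumPy sketch (not TensorFlow code, and not a required part of the solution). The names `dense_forward`, `X_demo`, `W1`, and `b1` are hypothetical, and the random weights are only placeholders:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Same idea as the notebook's leaky_relu: max(alpha * z, z)
    return np.maximum(alpha * z, z)

def dense_forward(X, W, b, activation=None):
    # A fully connected layer: affine transform followed by an optional activation
    z = X @ W + b
    return z if activation is None else activation(z)

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(4, 28 * 28))           # 4 fake flattened images
W1 = rng.normal(scale=0.01, size=(28 * 28, 300)) # 784 inputs -> 300 hidden units
b1 = np.zeros(300)

hidden = dense_forward(X_demo, W1, b1, activation=leaky_relu)
print(hidden.shape)
```

The first weight matrix must have 784 rows so that the input layer matches the flattened 28x28 images.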

In [3]:
# Your code goes here
reset_graph()

def leaky_relu(z, name=None):
    return tf.maximum(0.01 * z, z, name=name)

# Set some configuration here
n_inputs = 28*28  # Fashion-MNIST
learning_rate = 0.01
n_outputs = 10

# Construct placeholder for the input layer
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

"Beginning here"

'Beginning here'

In [4]:
with tf.name_scope("dnn1"):
    #implementation of the first net here
    n_hidden1 = 300
    n_hidden2 = 100
    hidden1 = tf.layers.dense(X, n_hidden1, activation=leaky_relu, name="hidden1")
    hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=leaky_relu, name="hidden2")
    dnn1_logits = tf.layers.dense(hidden2, n_outputs, name="outputs")

In [5]:
with tf.name_scope("dnn2"):
    #implementation of the second net here
    n_hidden1 = 300
    n_hidden2 = 100
    n_hidden3 = 50
    hidden1 = tf.layers.dense(X, n_hidden1, activation=leaky_relu, name="dnn2_hidden1")
    hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.elu, name="dnn2_hidden2")
    hidden3 = tf.layers.dense(hidden2, n_hidden3, activation=tf.nn.elu, name="dnn2_hidden3")
    dnn2_logits = tf.layers.dense(hidden3, n_outputs, name="dnn2_outputs")

In [6]:
with tf.name_scope("dnn3"):
    #implementation of the third net here
    n_hidden1 = 150
    n_hidden2 = 50
    hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.elu, name="dnn3_hidden1")
    hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.elu, name="dnn3_hidden2")
    dnn3_logits = tf.layers.dense(hidden2, n_outputs, name="dnn3_outputs")

In [7]:
with tf.name_scope("loss"):
    #implementation of the loss function net here
    dnn1_xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=dnn1_logits)
    dnn2_xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=dnn2_logits)
    dnn3_xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=dnn3_logits)
    dnn1_loss = tf.reduce_mean(dnn1_xentropy, name="dnn1_loss")
    dnn2_loss = tf.reduce_mean(dnn2_xentropy, name="dnn2_loss")
    dnn3_loss = tf.reduce_mean(dnn3_xentropy, name="dnn3_loss")
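
To make the loss concrete, here is a small NumPy sketch of a sparse softmax cross-entropy, meaning integer class labels and raw logits as inputs. It illustrates the formula only and is not TensorFlow's implementation; the function name and the toy inputs are assumptions:

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each example
    return -log_probs[np.arange(len(labels)), labels]

# Toy input: all-zero logits mean a uniform prediction over 10 classes
toy_logits = np.zeros((2, 10))
toy_labels = np.array([3, 7])
losses = sparse_softmax_cross_entropy(toy_logits, toy_labels)
print(losses, losses.mean())
```

For a uniform prediction over 10 classes each per-example loss equals ln(10), which is a useful reference point when reading the first training losses.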

In [8]:
with tf.name_scope("train"):
    #implementation of the training optimizer here
    dnn1_optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    dnn1_training_op = dnn1_optimizer.minimize(dnn1_loss)
    
    dnn2_optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    dnn2_training_op = dnn2_optimizer.minimize(dnn2_loss)
    
    dnn3_optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    dnn3_training_op = dnn3_optimizer.minimize(dnn3_loss)

In [9]:
with tf.name_scope("eval"):
    #implementation of the evaluation procedure here
    dnn1_correct = tf.nn.in_top_k(dnn1_logits, y, 1)
    dnn1_accuracy = tf.reduce_mean(tf.cast(dnn1_correct, tf.float32))
    
    dnn2_correct = tf.nn.in_top_k(dnn2_logits, y, 1)
    dnn2_accuracy = tf.reduce_mean(tf.cast(dnn2_correct, tf.float32))
    
    dnn3_correct = tf.nn.in_top_k(dnn3_logits, y, 1)
    dnn3_accuracy = tf.reduce_mean(tf.cast(dnn3_correct, tf.float32))
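
The accuracy above counts an example as correct when the true label has the highest logit. A minimal NumPy sketch of the same idea, ignoring how ties between equal logits are handled (`top1_accuracy` and the toy values are hypothetical):

```python
import numpy as np

def top1_accuracy(logits, labels):
    # An example is correct when the arg-max class equals its label
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))

toy_logits = np.array([
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.7],
])
toy_labels = np.array([1, 0, 2, 1])
acc = top1_accuracy(toy_logits, toy_labels)
print(acc)  # 0.75: the last example is predicted as class 0 but labeled 1
```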

- - -
## 3. EXECUTION PHASE (30 pts)

After you construct the three neural network models, you can compute the performance measure as the classification accuracy. You will need to define the number of epochs and the size of the training batch. You also might need to reset the graph each time you try a different model. To save time and avoid retraining, you should save the trained model and load it from disk to evaluate a test set. Pick the best model and answer the following:
- Which model yields the best performance measure for your dataset? Provide a reason why it yields the best performance.
- Why did you pick this many hidden layers?
- Provide some justifiable reasons for selecting the number of neurons per hidden layer. 
- Which activation functions did you use?

In the next section you will get a chance to fine-tune it further.



In [10]:
# Your code goes here
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 20
batch_size = 100

# shuffle_batch() shuffles the training examples and yields them in mini-batches
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
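
A small usage sketch of `shuffle_batch` on toy arrays (the toy data and seed are assumptions). It shows that every example is visited once per epoch, and that `np.array_split` can make some batches slightly larger than `batch_size` when the dataset size is not a multiple of it:

```python
import numpy as np

def shuffle_batch(X, y, batch_size):
    # Same generator as in the cell above
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch

np.random.seed(42)
X_toy = np.arange(10).reshape(10, 1)
y_toy = np.arange(10)

batches = list(shuffle_batch(X_toy, y_toy, batch_size=3))
sizes = [len(y_batch) for _, y_batch in batches]
seen = sorted(int(label) for _, y_batch in batches for label in y_batch)
print(sizes)  # [4, 3, 3]: 10 examples split into 10 // 3 = 3 batches
print(seen)
```

With 50,000 training examples and `batch_size = 100` the split is exact, so every batch has 100 examples.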


In [11]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(dnn1_training_op, feed_dict={X: X_batch, y: y_batch})
        if epoch % 5 == 0:
            acc_batch = dnn1_accuracy.eval(feed_dict={X: X_batch, y: y_batch})
            acc_valid = dnn1_accuracy.eval(feed_dict={X: X_valid, y: y_valid})
            print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

    save_path = saver.save(sess, "./my_dnn1_model.ckpt")

0 Batch accuracy: 0.75 Validation accuracy: 0.7766
5 Batch accuracy: 0.89 Validation accuracy: 0.8396
10 Batch accuracy: 0.82 Validation accuracy: 0.8576
15 Batch accuracy: 0.87 Validation accuracy: 0.8659


In [12]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(dnn2_training_op, feed_dict={X: X_batch, y: y_batch})
        if epoch % 5 == 0:
            acc_batch = dnn2_accuracy.eval(feed_dict={X: X_batch, y: y_batch})
            acc_valid = dnn2_accuracy.eval(feed_dict={X: X_valid, y: y_valid})
            print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

    save_path = saver.save(sess, "./my_dnn2_model.ckpt")

0 Batch accuracy: 0.82 Validation accuracy: 0.7872
5 Batch accuracy: 0.84 Validation accuracy: 0.8536
10 Batch accuracy: 0.95 Validation accuracy: 0.8644
15 Batch accuracy: 0.88 Validation accuracy: 0.8686


In [13]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(dnn3_training_op, feed_dict={X: X_batch, y: y_batch})
        if epoch % 5 == 0:
            acc_batch = dnn3_accuracy.eval(feed_dict={X: X_batch, y: y_batch})
            acc_valid = dnn3_accuracy.eval(feed_dict={X: X_valid, y: y_valid})
            print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

    save_path = saver.save(sess, "./my_dnn3_model.ckpt")

0 Batch accuracy: 0.78 Validation accuracy: 0.7824
5 Batch accuracy: 0.78 Validation accuracy: 0.8266
10 Batch accuracy: 0.84 Validation accuracy: 0.8509
15 Batch accuracy: 0.86 Validation accuracy: 0.8577


In [14]:
with tf.Session() as sess:
    saver.restore(sess, "./my_dnn1_model.ckpt")
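    # Note: this loop trains dnn1 for n_epochs more epochs on top of the restored
    # checkpoint and reports test accuracy after each one; it is not a single
    # evaluation of the saved model.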
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(dnn1_training_op, feed_dict={X: X_batch, y: y_batch})
        accuracy_val = dnn1_accuracy.eval(feed_dict={X: X_test, y: y_test})
        print(epoch, "Testing accuracy:", accuracy_val)

    save_path = saver.save(sess, "./my_dnn1_model_final.ckpt") 

0 Testing accuracy: 0.8491
1 Testing accuracy: 0.8584
2 Testing accuracy: 0.8498
3 Testing accuracy: 0.8549
4 Testing accuracy: 0.8532
5 Testing accuracy: 0.8624
6 Testing accuracy: 0.8631
7 Testing accuracy: 0.8649
8 Testing accuracy: 0.8624
9 Testing accuracy: 0.8642
10 Testing accuracy: 0.8645
11 Testing accuracy: 0.865
12 Testing accuracy: 0.8652
13 Testing accuracy: 0.864
14 Testing accuracy: 0.8642
15 Testing accuracy: 0.8648
16 Testing accuracy: 0.8653
17 Testing accuracy: 0.8695
18 Testing accuracy: 0.863
19 Testing accuracy: 0.8714


In [15]:
with tf.Session() as sess:
    saver.restore(sess, "./my_dnn2_model.ckpt")
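    # Note: this loop trains dnn2 for n_epochs more epochs on top of the restored
    # checkpoint and reports test accuracy after each one; it is not a single
    # evaluation of the saved model.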
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(dnn2_training_op, feed_dict={X: X_batch, y: y_batch})
        accuracy_val = dnn2_accuracy.eval(feed_dict={X: X_test, y: y_test})
        print(epoch, "Testing accuracy:", accuracy_val)

    save_path = saver.save(sess, "./my_dnn2_model_final.ckpt") 

0 Testing accuracy: 0.8591
1 Testing accuracy: 0.8652
2 Testing accuracy: 0.8653
3 Testing accuracy: 0.8659
4 Testing accuracy: 0.8679
5 Testing accuracy: 0.8686
6 Testing accuracy: 0.8687
7 Testing accuracy: 0.8639
8 Testing accuracy: 0.8627
9 Testing accuracy: 0.8645
10 Testing accuracy: 0.8663
11 Testing accuracy: 0.8708
12 Testing accuracy: 0.8686
13 Testing accuracy: 0.8716
14 Testing accuracy: 0.8693
15 Testing accuracy: 0.8725
16 Testing accuracy: 0.8758
17 Testing accuracy: 0.8747
18 Testing accuracy: 0.8733
19 Testing accuracy: 0.8768


In [16]:
with tf.Session() as sess:
    saver.restore(sess, "./my_dnn3_model.ckpt")
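    # Note: this loop trains dnn3 for n_epochs more epochs on top of the restored
    # checkpoint and reports test accuracy after each one; it is not a single
    # evaluation of the saved model.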
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(dnn3_training_op, feed_dict={X: X_batch, y: y_batch})
        accuracy_val = dnn3_accuracy.eval(feed_dict={X: X_test, y: y_test})
        print(epoch, "Testing accuracy:", accuracy_val)

    save_path = saver.save(sess, "./my_dnn3_model_final.ckpt") 

0 Testing accuracy: 0.8493
1 Testing accuracy: 0.851
2 Testing accuracy: 0.8469
3 Testing accuracy: 0.853
4 Testing accuracy: 0.8521
5 Testing accuracy: 0.8546
6 Testing accuracy: 0.8502
7 Testing accuracy: 0.8492
8 Testing accuracy: 0.852
9 Testing accuracy: 0.8536
10 Testing accuracy: 0.8556
11 Testing accuracy: 0.856
12 Testing accuracy: 0.8545
13 Testing accuracy: 0.8529
14 Testing accuracy: 0.8563
15 Testing accuracy: 0.8599
16 Testing accuracy: 0.86
17 Testing accuracy: 0.8592
18 Testing accuracy: 0.8616
19 Testing accuracy: 0.8621
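
One way to pick the best model without looking at the test set is to compare validation accuracies. The sketch below uses the last validation accuracies printed during training above (epoch 15 of each run); the dictionary and variable names are hypothetical:

```python
# Validation accuracy at epoch 15, copied from the training logs above
valid_acc_epoch15 = {"dnn1": 0.8659, "dnn2": 0.8686, "dnn3": 0.8577}

# Choose the model with the highest validation accuracy
best_model = max(valid_acc_epoch15, key=valid_acc_epoch15.get)
print(best_model, valid_acc_epoch15[best_model])  # dnn2 0.8686
```

Selecting on validation accuracy keeps the test set as a final check rather than a tuning signal.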


In [17]:
# print out the final accuracy here

- - -
## 4. FINETUNING THE NETWORK (25 pts)

The best-performing non-neural-net classifier on Fashion-MNIST is the Support Vector Classifier {"C":10,"kernel":"poly"}, with 0.897 accuracy. In this section, you will see how close you can get to that accuracy, or (better yet) beat it! You can see the performance of other ML methods below:
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com

Use the best model from the previous section and see if you can improve it further. To improve the performance of your model, you must make some modifications based on the practical guidelines discussed in class. Here are a few decisions about the recommended network configurations you have to make:
1. Initialization: Use He Initialization for your model
2. Activation: Add ELU as the activation function throughout your hidden layers
3. Normalization: Incorporate the batch normalization at every layer
4. Regularization: Configure the dropout policy at 50% rate
5. Optimization: Change Gradient Descent into Adam Optimization
6. Your choice: make any other changes in 1-5 you deem necessary
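
For item 1, a minimal NumPy sketch of He initialization: weights are drawn with variance 2 / fan_in. A plain normal distribution is used here as a simplifying assumption (library initializers may use a truncated normal instead), and `he_normal` is a hypothetical helper name:

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    # He initialization: zero-mean weights with standard deviation sqrt(2 / fan_in)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(42)
W = he_normal(28 * 28, 300, rng)  # first hidden layer: 784 inputs, 300 units
print(W.shape, W.std(), np.sqrt(2.0 / (28 * 28)))
```

The empirical standard deviation should land close to sqrt(2 / 784), roughly 0.05.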

Keep in mind that the execution phase is essentially the same, so you can just run it from the above. See how much you gain in classification accuracy. Provide some justifications for the gain in performance. 
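
For item 4, here is a minimal NumPy sketch of dropout at a 50% rate, written in the common "inverted" form that rescales the surviving activations during training so nothing changes at test time. The function name and toy input are assumptions:

```python
import numpy as np

def dropout(activations, rate, training, rng):
    # At inference time dropout is a no-op
    if not training:
        return activations
    keep_prob = 1.0 - rate
    # Keep each unit with probability keep_prob, then rescale the survivors
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
a = np.ones((1000, 50))
dropped = dropout(a, rate=0.5, training=True, rng=rng)
unchanged = dropout(a, rate=0.5, training=False, rng=rng)
print(np.unique(dropped), dropped.mean(), unchanged.mean())
```

With a rate of 0.5, about half of the units become 0 and the rest become 2, so the mean activation stays near 1.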






In [11]:
reset_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")
training = tf.placeholder_with_default(False, shape=(), name='training')
batch_norm_momentum = 0.9

with tf.name_scope("dnnBenchmark"):
    # implementation of the new benchmarking DNN here
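    # Note: with default arguments this initializer uses scale=1.0 and mode="fan_in";
    # He initialization corresponds to scale=2.0.
    he_init = tf.variance_scaling_initializer()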

    my_batch_norm_layer = partial(
            tf.layers.batch_normalization,
            training=training,
            momentum=batch_norm_momentum)

    my_dense_layer = partial(
            tf.layers.dense,
            kernel_initializer=he_init)
    
    n_hidden1 = 300
    n_hidden2 = 100
    n_hidden3 = 50
    
    hidden1 = my_dense_layer(X, n_hidden1, name="dnnBenchmark_hidden1")
    bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))
    hidden2 = my_dense_layer(bn1, n_hidden2, name="dnnBenchmark_hidden2")
    bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))
    hidden3 = my_dense_layer(bn2, n_hidden3, name="dnnBenchmark_hidden3")
    bn3 = tf.nn.elu(my_batch_norm_layer(hidden3))
    logits_before_bn = tf.layers.dense(bn3, n_outputs, name="dnnBenchmark_outputs")
    dnnBenchmark_logits = my_batch_norm_layer(logits_before_bn)
    
with tf.name_scope("loss"):
    dnnBenchmark_xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=dnnBenchmark_logits)
    dnnBenchmark_loss = tf.reduce_mean(dnnBenchmark_xentropy, name="dnnBenchmark_loss")
    
with tf.name_scope("train"):
    dnnBenchmark_optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    dnnBenchmark_training_op = dnnBenchmark_optimizer.minimize(dnnBenchmark_loss)
    
with tf.name_scope("eval"):
    dnnBenchmark_correct = tf.nn.in_top_k(dnnBenchmark_logits, y, 1)
    dnnBenchmark_accuracy = tf.reduce_mean(tf.cast(dnnBenchmark_correct, tf.float32))
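
To illustrate what the `training` flag changes, here is a minimal NumPy sketch of batch normalization. In training mode it normalizes with the statistics of the current batch and updates moving averages; in inference mode it uses the moving averages instead. The function name, the `eps` value, and the toy data are assumptions; `momentum=0.9` mirrors `batch_norm_momentum` above:

```python
import numpy as np

def batch_norm(z, gamma, beta, moving_mean, moving_var, training,
               momentum=0.9, eps=1e-3):
    if training:
        # Normalize with batch statistics and update the moving averages
        mean, var = z.mean(axis=0), z.var(axis=0)
        moving_mean = momentum * moving_mean + (1 - momentum) * mean
        moving_var = momentum * moving_var + (1 - momentum) * var
    else:
        # Normalize with the statistics accumulated during training
        mean, var = moving_mean, moving_var
    out = gamma * (z - mean) / np.sqrt(var + eps) + beta
    return out, moving_mean, moving_var

rng = np.random.default_rng(42)
z = rng.normal(5.0, 3.0, size=(100, 4))
gamma, beta = np.ones(4), np.zeros(4)

out_train, mm, mv = batch_norm(z, gamma, beta, np.zeros(4), np.ones(4), training=True)
out_infer, _, _ = batch_norm(z, gamma, beta, mm, mv, training=False)
print(out_train.mean(axis=0), out_train.std(axis=0))
print(out_infer.mean(axis=0))
```

After a single update the moving averages are still far from the batch statistics, so the inference-mode output is not yet centered; this is why the moving averages need to be updated throughout training.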

In [14]:
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run([dnnBenchmark_training_op, extra_update_ops],
                     feed_dict={training: True, X: X_batch, y: y_batch})
        accuracy_val = dnnBenchmark_accuracy.eval(feed_dict={X: X_valid, y: y_valid})
        print(epoch, "Validation accuracy:", accuracy_val)

    save_path = saver.save(sess, "./my_dnnBenchmark_model.ckpt")

0 Validation accuracy: 0.8341
1 Validation accuracy: 0.8527
2 Validation accuracy: 0.8594
3 Validation accuracy: 0.8614
4 Validation accuracy: 0.871
5 Validation accuracy: 0.8743
6 Validation accuracy: 0.8766
7 Validation accuracy: 0.8789
8 Validation accuracy: 0.8763
9 Validation accuracy: 0.8789
10 Validation accuracy: 0.8868
11 Validation accuracy: 0.8866
12 Validation accuracy: 0.8871
13 Validation accuracy: 0.8887
14 Validation accuracy: 0.8793
15 Validation accuracy: 0.8858
16 Validation accuracy: 0.879
17 Validation accuracy: 0.8853
18 Validation accuracy: 0.8825
19 Validation accuracy: 0.8857


In [15]:
with tf.Session() as sess:
    saver.restore(sess, "./my_dnnBenchmark_model.ckpt")
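    # Note: this loop keeps training on top of the restored checkpoint and reports
    # test accuracy after each epoch. Unlike the training cell above, it feeds
    # neither training: True nor extra_update_ops, so batch normalization runs in
    # inference mode during these updates.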
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(dnnBenchmark_training_op, feed_dict={X: X_batch, y: y_batch})
        accuracy_val = dnnBenchmark_accuracy.eval(feed_dict={X: X_test, y: y_test})
        print(epoch, "Testing accuracy:", accuracy_val)

    save_path = saver.save(sess, "./my_dnnBenchmark_model_final.ckpt") 

0 Testing accuracy: 0.8753
1 Testing accuracy: 0.8702
2 Testing accuracy: 0.8801
3 Testing accuracy: 0.8772
4 Testing accuracy: 0.8839
5 Testing accuracy: 0.8797
6 Testing accuracy: 0.8818
7 Testing accuracy: 0.8695
8 Testing accuracy: 0.8816
9 Testing accuracy: 0.8822
10 Testing accuracy: 0.8881
11 Testing accuracy: 0.8839
12 Testing accuracy: 0.8721
13 Testing accuracy: 0.8819
14 Testing accuracy: 0.8834
15 Testing accuracy: 0.8794
16 Testing accuracy: 0.8817
17 Testing accuracy: 0.8797
18 Testing accuracy: 0.8895
19 Testing accuracy: 0.8837
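
To put the runs side by side, the sketch below compares the final (epoch 19) test accuracies printed above against the 0.897 SVC baseline quoted in this section. The numbers are copied from the logs; the variable names are hypothetical:

```python
svc_baseline = 0.897  # SVC {"C":10,"kernel":"poly"} accuracy quoted above

# Test accuracy after the last epoch of each run, copied from the logs above
final_test_acc = {
    "dnn1": 0.8714,
    "dnn2": 0.8768,
    "dnn3": 0.8621,
    "dnnBenchmark": 0.8837,
}

# Gap between the baseline and each network (positive means below the baseline)
gaps = {name: round(svc_baseline - acc, 4) for name, acc in final_test_acc.items()}
closest = min(gaps, key=gaps.get)

for name, gap in gaps.items():
    print(f"{name}: {gap:.4f} below the SVC baseline")
print("closest:", closest)
```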


- - -
## 5. OUTLOOK (5 pts)

Plan for the outlook of your system. This may lead to the direction of your future project:
- Did your neural network outperform other "traditional" ML techniques? Why/why not?
- Does your model work well? If not, which model should be further investigated?
- Are you satisfied with your system? What do you think needs to be improved?



## Did your neural network outperform other "traditional" ML techniques? Why/why not?

#### It did not, but it got very close to the best, peaking at a test accuracy of about 0.89 (0.8895 in the run above). I think this is very good, especially given how much less training time it needed. Whereas training the traditional models took on the order of hours, this network trained in about twenty seconds and reached nearly as good an accuracy score. I believe that with more time invested I could get it past 0.897, because DNNs are inherently better at these classification tasks: the complexity and intricacy of their structure allow them to thrive in image classification.

## Does your model work well? If not, which model should be further investigated?

#### Yes, it works well. My second model, with three hidden layers, worked best initially. I then added He initialization, batch normalization, ELU activations, and Adam optimization, which raised the test accuracy by roughly one percentage point. The difference was not staggering, since the initial model already performed decently. Still, it moved much closer to the 0.897 reached by the best non-neural-network model.

## Are you satisfied with your system? What do you think needs to be improved?

#### Yes, I am very satisfied. It was a rewarding experience relative to the time I put in to configure it. I think it can easily be improved upon. However, for the amount of work I did, I think my model performs much better than I could have ever dreamed of. Putting only about four hours of work into building an SVM to classify these images would not have gotten me anywhere near this success rate.

- - - 
### NEED HELP?

In case you get stuck in any step in the process, you may find some useful information from:

 * Consult my lectures and/or the textbook
 * Talk to the TA, who is available to help during office hours
 * Come talk to me or email me <nn4pj@virginia.edu> with subject starting "CS4501 Assignment 4:...".
 * More on the Fashion-MNIST to be found here: https://hanxiao.github.io/2018/09/28/Fashion-MNIST-Year-In-Review/

Best of luck and have fun!