# Assignment 4: Benchmarking Fashion-MNIST with Deep Neural Nets

### CS 4501 Machine Learning - Department of Computer Science - University of Virginia
"The original MNIST dataset contains a lot of handwritten digits. Members of the AI/ML/Data Science community love this dataset and use it as a benchmark to validate their algorithms. In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others." - **Zalando Research, Github Repo.**"

Fashion-MNIST is a dataset of Zalando's article images. Each example is a 28x28 grayscale image associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms.

![Here's an example of how the data looks (each class takes three rows):](https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png)

In this assignment, you will benchmark Fashion-MNIST using neural networks: you must train several neural networks in TensorFlow to predict which of the 10 classes each image belongs to. For deliverables, you must write code in Python and submit this Jupyter Notebook file (.ipynb) to earn a total of 100 pts. You will gain points depending on how you perform in the following sections.


In [0]:
# You might want to use the following packages
import numpy as np
import os
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR) #reduce annoying warning messages
from functools import partial

# to make this notebook's output stable across runs
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)


---
## 1. PRE-PROCESSING THE DATA (10 pts)

You can load Fashion-MNIST directly from TensorFlow. **Partition the dataset** so that you have 50,000 examples for training, 10,000 for validation, and 10,000 for testing. Also, make sure that you flatten each example into a 1-D feature vector (28 × 28 = 784 features).

Write some code to output the dimensionalities of each partition (train, validation, and test sets).



In [2]:
# Your code goes here for this section.
# Download (and cache) Fashion-MNIST; the next cell loads and partitions it.
fmnist = tf.keras.datasets.fashion_mnist.load_data()


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [0]:
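(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
# Flatten each 28x28 image into a 784-dim vector and scale pixel values to [0, 1]
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
# Integer class labels (0-9), as expected by the sparse cross-entropy loss
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
# Hold out the first 10,000 training examples for validation (leaves 50,000 for training)
X_valid, X_train = X_train[:10000], X_train[10000:]
y_valid, y_train = y_train[:10000], y_train[10000:]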

In [4]:
print("X_train shape:" + str(X_train.shape))
print("y_train shape:" + str(y_train.shape))
print("X_valid shape:" + str(X_valid.shape))
print("y_valid shape:" + str(y_valid.shape))
print("X_test shape:" + str(X_test.shape))
print("y_test shape:" + str(y_test.shape))

X_train shape:(50000, 784)
y_train shape:(50000,)
X_valid shape:(10000, 784)
y_valid shape:(10000,)
X_test shape:(10000, 784)
y_test shape:(10000,)


- - -
## 2. CONSTRUCTION PHASE (30 pts)

In this section, define at least three neural networks with different structures. Make sure that the input layer has the right number of inputs. The best structure is often found through trial-and-error experimentation:
- You may start with a fully connected network structure with two hidden layers.
- You may try a few settings of the number of nodes in each layer.
- You may try a few activation functions to see if they affect the performance.
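
The two activation functions compared in this notebook can be written out directly. A minimal NumPy sketch, assuming TensorFlow's defaults: `tf.nn.leaky_relu` uses a negative slope of `alpha=0.2`, and `tf.nn.elu` corresponds to α = 1. The helper names are hypothetical, for illustration only:

```python
import numpy as np

# Hypothetical helpers for illustration (not the TensorFlow implementations)
def leaky_relu(z, alpha=0.2):
    # Identity for positive inputs, small slope `alpha` for negative inputs
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # Identity for positive inputs; smooth curve alpha * (exp(z) - 1) for negative
    # inputs, which saturates at -alpha. np.minimum keeps exp from overflowing.
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print("leaky ReLU:", leaky_relu(z))
print("ELU:       ", elu(z))
```

Unlike leaky ReLU, ELU is smooth at 0 and its negative outputs are bounded below by -α; which of the two works better on Fashion-MNIST is an empirical question.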

**Important Implementation Note:** For the purpose of learning TensorFlow, you must use the low-level TensorFlow API to construct the network. Usage of high-level tools (i.e., Keras) is not permitted.

Not sure how "different" the three neural networks need to be; the three below differ in layer sizes and activation functions.

In [0]:
# Your code goes here
reset_graph()

# Set some configuration here
n_inputs = 28*28  # Fashion-MNIST
learning_rate = 0.01
n_outputs = 10

# Construct placeholder for the input layer
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

In [0]:
n_hidden1 = 300
n_hidden2 = 300
with tf.name_scope("dnn1"):
  #implementation of the first net here
  hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.leaky_relu, name="hidden1")
  hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.leaky_relu, name="hidden2")
  logits = tf.layers.dense(hidden2, n_outputs, name="outputs")
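
`tf.layers.dense` computes `activation(inputs @ kernel + bias)`. To make the first network's structure concrete, here is a NumPy sketch of its forward pass on a random batch. It assumes Glorot-uniform weights and zero biases (our understanding of the TF 1.x `tf.layers.dense` defaults); the weights are random, not trained, and the helper names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical helpers for illustration
def glorot_uniform(fan_in, fan_out):
    # Uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out)).astype(np.float32)

def leaky_relu(z, alpha=0.2):
    return np.where(z > 0, z, alpha * z)

def dense(inputs, W, b, activation=None):
    # Affine transform followed by an optional activation
    z = inputs @ W + b
    return z if activation is None else activation(z)

layer_sizes = [28 * 28, 300, 300, 10]  # dnn1: input, hidden1, hidden2, outputs
params = [(glorot_uniform(m, n), np.zeros(n, dtype=np.float32))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

X_batch = rng.random((50, 28 * 28), dtype=np.float32)  # stand-in for a mini-batch
h = X_batch
for W, b in params[:-1]:
    h = dense(h, W, b, leaky_relu)   # hidden layers with leaky ReLU
logits = dense(h, *params[-1])       # output layer: no activation (raw logits)
print(logits.shape)  # (50, 10)
```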

In [0]:
n2_hidden1 = 300
n2_hidden2 = 100
with tf.name_scope("dnn2"):
  #implementation of the second net here
  hidden2_1 = tf.layers.dense(X, n2_hidden1, activation=tf.nn.leaky_relu, name="hidden2_1")
  hidden2_2 = tf.layers.dense(hidden2_1, n2_hidden2, activation=tf.nn.leaky_relu, name="hidden2_2")
  logits = tf.layers.dense(hidden2_2, n_outputs, name="outputs2")

In [0]:
n3_hidden1 = 300
n3_hidden2 = 100
with tf.name_scope("dnn3"):
  #implementation of the third net here
  hidden3_1 = tf.layers.dense(X, n3_hidden1, activation=tf.nn.elu, name="hidden3_1")
  hidden3_2 = tf.layers.dense(hidden3_1, n3_hidden2, activation=tf.nn.elu, name="hidden3_2")
  logits = tf.layers.dense(hidden3_2, n_outputs, name="outputs3")

In [0]:
with tf.name_scope("loss"):
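  # implementation of the loss function here
  # Note: `logits` refers to the last network defined above (dnn3); in the execution
  # phase, each network is rebuilt and trained in its own graph.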
  xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
  loss = tf.reduce_mean(xentropy, name="loss")
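
`tf.nn.sparse_softmax_cross_entropy_with_logits` takes integer class labels and unnormalized logits, applies a softmax, and returns the negative log-probability of the true class for each example; `tf.reduce_mean` then averages these into the loss. A NumPy sketch of the same computation, for illustration only (the function name is hypothetical):

```python
import numpy as np

# Hypothetical helper for illustration
def sparse_softmax_cross_entropy(labels, logits):
    # Subtracting the row-wise max does not change the softmax but avoids overflow in exp
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log p_k = z_k - log(sum_j exp(z_j))
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of each example's true class
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])
xentropy = sparse_softmax_cross_entropy(labels, logits)
loss = xentropy.mean()
print(xentropy, loss)
```

With all-equal logits over 3 classes, the second example's loss is log 3 ≈ 1.0986: the model is maximally uncertain.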

In [0]:
learning_rate = 0.01

with tf.name_scope("train"):
  #implementation of the training optimizer here
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  training_op = optimizer.minimize(loss)

In [0]:
with tf.name_scope("eval"):
  #implementation of the evaluation procedure here
  correct = tf.nn.in_top_k(logits, y, 1)
  accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
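
With `k=1`, `tf.nn.in_top_k(logits, y, 1)` marks an example as correct when its true class has the highest logit, so the accuracy is the fraction of examples where `argmax(logits) == y`. One caveat: on exact ties, `in_top_k` counts all tied classes as being in the top k, whereas `argmax` picks a single one. A NumPy sketch of the argmax version (hypothetical helper name):

```python
import numpy as np

# Hypothetical helper for illustration
def top1_accuracy(logits, labels):
    # An example counts as correct when its highest-scoring class equals the true label
    correct = np.argmax(logits, axis=1) == labels
    # Cast booleans to floats and average, like tf.reduce_mean(tf.cast(correct, tf.float32))
    return correct.astype(np.float32).mean()

logits = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.3, 0.7]])
labels = np.array([1, 0, 0])
print(top1_accuracy(logits, labels))
```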

- - -
## 3. EXECUTION PHASE (30 pts)



After you construct the three neural network models, you can measure their performance as the classification accuracy. You will need to define the number of epochs and the size of the training batch. You might also need to reset the graph each time you try a different model. To save time and avoid retraining, save the trained model and load it from disk to evaluate it on the test set. Pick the best model and answer the following:
- Which model yields the best performance measure for your dataset? Provide a reason why it yields the best performance.
    - The first model, with 300 neurons in each of its two hidden layers and leaky ReLU activations, at a test accuracy of 87.56%. It has the most capacity of the three (its second hidden layer has 300 neurons rather than 100), although its margin over the second model (87.50%) is small enough that it may be within run-to-run variation.
- Why did you pick this many hidden layers?
    - I used two hidden layers, the starting structure suggested above, and chose 300 neurons per layer because I think this is large enough for the network to learn efficiently. Smaller layers tend to yield more errors, but might be a good choice if training time is constrained.
- Provide some justifiable reasons for selecting the number of neurons per hidden layer.
    - There must be enough neurons for the network to learn the task. My criterion was that the chosen number should give an accuracy of at least 85%.
- Which activation functions did you use?
    - I used leaky ReLU (models 1 and 2) and ELU (model 3).
    - Comparing models 2 and 3, which share the same structure and hyperparameters, ELU gave a lower test accuracy (86.89%) than leaky ReLU (87.50%).

In the next section you will get a chance to fine-tune it further.
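
One way to compare the three candidate structures is by their number of trainable parameters. A small sketch, counting only the dense layers defined above (weights plus biases), with layer sizes copied from the three networks; the helper name is hypothetical:

```python
# Hypothetical helper for illustration
def count_dense_params(layer_sizes):
    # Each dense layer has an (n_in x n_out) weight matrix plus n_out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

architectures = {
    "dnn1 (300-300, leaky ReLU)": [784, 300, 300, 10],
    "dnn2 (300-100, leaky ReLU)": [784, 300, 100, 10],
    "dnn3 (300-100, ELU)": [784, 300, 100, 10],
}
for name, sizes in architectures.items():
    print(f"{name}: {count_dense_params(sizes):,} parameters")
```

This gives 328,810 parameters for dnn1 and 266,610 for dnn2 and dnn3; in all three, the first hidden layer alone accounts for 235,500 (784 × 300 + 300).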



In [0]:
# Your code goes here

# shuffle_batch() shuffles the whole training set (once per call, i.e. once per epoch) and yields it in mini-batches
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch
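
A quick sanity check of `shuffle_batch()` on toy arrays: every example appears exactly once per epoch, and inputs stay aligned with their labels. Because `np.array_split` distributes the remainder, some batches are one example larger than `batch_size` when the dataset size is not a multiple of it (not an issue here, since 50,000 is divisible by 50). The function is copied so the snippet runs on its own; the toy arrays are hypothetical:

```python
import numpy as np

def shuffle_batch(X, y, batch_size):
    # Same logic as the cell above: one random permutation per call, split into mini-batches
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        yield X[batch_idx], y[batch_idx]

X_toy = np.arange(10).reshape(10, 1)  # toy "features": each row stores its own index
y_toy = np.arange(10)                 # toy labels equal to that index
batches = list(shuffle_batch(X_toy, y_toy, batch_size=3))
print([len(y_batch) for _, y_batch in batches])  # [4, 3, 3]
```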


In [13]:
# dnn1 
reset_graph()

# Set some configuration here
n_inputs = 28*28  # Fashion-MNIST
learning_rate = 0.01
n_outputs = 10

# Construct placeholder for the input layer
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")


n_hidden1 = 300
n_hidden2 = 300
with tf.name_scope("dnn1"):
  #implementation of the first net here
  hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.leaky_relu, name="hidden1")
  hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.leaky_relu, name="hidden2")
  logits = tf.layers.dense(hidden2, n_outputs, name="outputs")
  
with tf.name_scope("loss"):
  #implementation of the loss function net here
  xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
  loss = tf.reduce_mean(xentropy, name="loss")

learning_rate = 0.01

with tf.name_scope("train"):
  #implementation of the training optimizer here
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  training_op = optimizer.minimize(loss)  

with tf.name_scope("eval"):
  #implementation of the evaluation procedure here
  correct = tf.nn.in_top_k(logits, y, 1)
  accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 40
batch_size = 50
  
with tf.Session() as sess:
  init.run()
  for epoch in range(n_epochs):
    # implementation of the training ops here
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
      sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    if (epoch % 5) == 0:
      acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
      # implementation of the validation accuracy here
      acc_valid = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
      print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)
          
  save_path = saver.save(sess, "./my_dnn_model.ckpt")

0 Batch accuracy: 0.9 Validation accuracy: 0.8076
5 Batch accuracy: 0.92 Validation accuracy: 0.8526
10 Batch accuracy: 0.88 Validation accuracy: 0.8692
15 Batch accuracy: 0.9 Validation accuracy: 0.8753
20 Batch accuracy: 0.98 Validation accuracy: 0.8771
25 Batch accuracy: 0.86 Validation accuracy: 0.8809
30 Batch accuracy: 0.98 Validation accuracy: 0.8878
35 Batch accuracy: 0.88 Validation accuracy: 0.8832


In [14]:
with tf.Session() as sess:
  
    saver.restore(sess, "./my_dnn_model.ckpt")
    # implementation of the test set evaluation here
    acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
    print("Final test accuracy: {:.2f}%".format(acc_test * 100))

Final test accuracy: 87.56%


In [15]:
# dnn2 
reset_graph()

# Set some configuration here
n_inputs = 28*28  # Fashion-MNIST
learning_rate = 0.01
n_outputs = 10

# Construct placeholder for the input layer
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")


n2_hidden1 = 300
n2_hidden2 = 100
with tf.name_scope("dnn2"):
  #implementation of the second net here
  hidden2_1 = tf.layers.dense(X, n2_hidden1, activation=tf.nn.leaky_relu, name="hidden2_1")
  hidden2_2 = tf.layers.dense(hidden2_1, n2_hidden2, activation=tf.nn.leaky_relu, name="hidden2_2")
  logits = tf.layers.dense(hidden2_2, n_outputs, name="outputs2")
  
with tf.name_scope("loss"):
  #implementation of the loss function net here
  xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
  loss = tf.reduce_mean(xentropy, name="loss")

learning_rate = 0.01

with tf.name_scope("train"):
  #implementation of the training optimizer here
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  training_op = optimizer.minimize(loss)  

with tf.name_scope("eval"):
  #implementation of the evaluation procedure here
  correct = tf.nn.in_top_k(logits, y, 1)
  accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 40
batch_size = 50
  
with tf.Session() as sess:
  init.run()
  for epoch in range(n_epochs):
    # implementation of the training ops here
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
      sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    if (epoch % 5) == 0:
      acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
      # implementation of the validation accuracy here
      acc_valid = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
      print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)
          
  save_path = saver.save(sess, "./my_dnn2_model.ckpt")

0 Batch accuracy: 0.86 Validation accuracy: 0.8089
5 Batch accuracy: 0.92 Validation accuracy: 0.8534
10 Batch accuracy: 0.86 Validation accuracy: 0.868
15 Batch accuracy: 0.88 Validation accuracy: 0.8746
20 Batch accuracy: 0.98 Validation accuracy: 0.8737
25 Batch accuracy: 0.86 Validation accuracy: 0.8826
30 Batch accuracy: 0.96 Validation accuracy: 0.886
35 Batch accuracy: 0.9 Validation accuracy: 0.8822


In [16]:
with tf.Session() as sess:
  
    saver.restore(sess, "./my_dnn2_model.ckpt")
    # implementation of the test set evaluation here
    acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
    print("Final test accuracy: {:.2f}%".format(acc_test * 100))

Final test accuracy: 87.50%


In [17]:
# dnn3 
reset_graph()

# Set some configuration here
n_inputs = 28*28  # Fashion-MNIST
learning_rate = 0.01
n_outputs = 10

# Construct placeholder for the input layer
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")


n3_hidden1 = 300
n3_hidden2 = 100
with tf.name_scope("dnn3"):
  #implementation of the third net here
  hidden3_1 = tf.layers.dense(X, n3_hidden1, activation=tf.nn.elu, name="hidden3_1")
  hidden3_2 = tf.layers.dense(hidden3_1, n3_hidden2, activation=tf.nn.elu, name="hidden3_2")
  logits = tf.layers.dense(hidden3_2, n_outputs, name="outputs3")

with tf.name_scope("loss"):
  #implementation of the loss function net here
  xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
  loss = tf.reduce_mean(xentropy, name="loss")

learning_rate = 0.01

with tf.name_scope("train"):
  #implementation of the training optimizer here
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  training_op = optimizer.minimize(loss)  

with tf.name_scope("eval"):
  #implementation of the evaluation procedure here
  correct = tf.nn.in_top_k(logits, y, 1)
  accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 40
batch_size = 50
  
with tf.Session() as sess:
  init.run()
  for epoch in range(n_epochs):
    # implementation of the training ops here
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
      sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    if (epoch % 5) == 0:
      acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
      # implementation of the validation accuracy here
      acc_valid = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
      print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)
          
  save_path = saver.save(sess, "./my_dnn3_model.ckpt")

0 Batch accuracy: 0.84 Validation accuracy: 0.8126
5 Batch accuracy: 0.92 Validation accuracy: 0.8516
10 Batch accuracy: 0.86 Validation accuracy: 0.8645
15 Batch accuracy: 0.86 Validation accuracy: 0.8709
20 Batch accuracy: 0.94 Validation accuracy: 0.8697
25 Batch accuracy: 0.88 Validation accuracy: 0.8761
30 Batch accuracy: 0.96 Validation accuracy: 0.8788
35 Batch accuracy: 0.82 Validation accuracy: 0.8785


In [18]:
with tf.Session() as sess:
  
    saver.restore(sess, "./my_dnn3_model.ckpt")
    # implementation of the test set evaluation here
    acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
    print("Final test accuracy: {:.2f}%".format(acc_test * 100))

Final test accuracy: 86.89%


- - -
## 4. FINETUNING THE NETWORK (25 pts)

The best performance on Fashion-MNIST by a non-neural-net classifier is that of the Support Vector Classifier {"C":10,"kernel":"poly"}, with 0.897 accuracy. In this section, you will see how close you can get to that accuracy, or (better yet) beat it! You can see the performance of other ML methods at:
http://fashion-mnist.s3-website.eu-central-1.amazonaws.com

Use the best model from the previous section and see if you can improve it further. To improve the performance of your model, you must make some modifications based on the practical guidelines discussed in class. Here are a few decisions about the recommended network configurations you have to make:
1. Initialization: Use He Initialization for your model
2. Activation: Add ELU as the activation function throughout your hidden layers
3. Normalization: Incorporate the batch normalization at every layer
4. Regularization: Configure the dropout policy at 50% rate
5. Optimization: Change Gradient Descent into Adam Optimization
6. Your choice: make any other changes in 1-5 you deem necessary
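
Item 4 refers to dropout. In TensorFlow 1.x this is typically `tf.layers.dropout(inputs, rate=0.5, training=training)`, which during training zeroes each unit with probability `rate` and scales the surviving units by 1 / (1 - rate) ("inverted dropout"), and passes inputs through unchanged at inference. A NumPy sketch of that behavior (hypothetical helper name):

```python
import numpy as np

# Hypothetical helper for illustration
def dropout(x, rate, training, rng):
    if not training:
        # At inference time dropout does nothing
        return x
    # Keep each unit with probability 1 - rate...
    keep_mask = rng.random(x.shape) >= rate
    # ...and scale kept units so the expected activation is unchanged
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.5, training=True, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0.0))
print("mean activation:", dropped.mean())  # close to 1.0 thanks to the rescaling
```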

Keep in mind that the execution phase is essentially the same, so you can just run it from the above. See how much you gain in classification accuracy. Provide some justifications for the gain in performance. 






In [0]:
# Your code goes here
reset_graph()

# Set some configuration here
n_inputs = 28*28  # Fashion-MNIST
learning_rate = 0.002
n_outputs = 10
batch_norm_momentum = 0.9

# Construct placeholder for the input layer
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")
training = tf.placeholder_with_default(False, shape=(), name='training')
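
In TF 1.x, `tf.layers.batch_normalization` normalizes with the current batch's statistics only when its `training` argument is true, and it updates its moving mean and variance only when the ops it adds to `tf.GraphKeys.UPDATE_OPS` are run (for example via `tf.control_dependencies` around `optimizer.minimize`). Otherwise it normalizes with the moving statistics, which start at mean 0 and variance 1. A simplified NumPy sketch of this behavior (γ = 1 and β = 0 held fixed, biased batch variance; the class name is hypothetical), with `momentum` and `epsilon` playing the roles of the same-named arguments:

```python
import numpy as np

# Hypothetical class for illustration (not TensorFlow's implementation)
class BatchNorm1D:
    """Simplified batch norm over axis 0, with gamma = 1 and beta = 0 held fixed."""

    def __init__(self, n_features, momentum=0.9, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        # Moving statistics start at mean 0 and variance 1
        self.moving_mean = np.zeros(n_features)
        self.moving_var = np.ones(n_features)

    def __call__(self, x, training=False):
        if training:
            # Training: normalize with the current batch's statistics...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ...and update the moving averages (the analogue of running UPDATE_OPS)
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference: normalize with the moving averages accumulated during training
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.epsilon)

rng = np.random.default_rng(0)
bn = BatchNorm1D(n_features=3, momentum=0.9)
for _ in range(200):
    batch = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
    out = bn(batch, training=True)
print("moving mean:", bn.moving_mean)      # close to 5
print("moving variance:", bn.moving_var)   # close to 4
```

A freshly created layer in inference mode barely changes its input (it only divides by √(1 + ε)), which is how a TF 1.x batch-norm layer behaves if `training` is never set to `True`.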


In [0]:
# Batch norm, 4 hidden layers (ReLU) + output layer, no dropout
n_hidden1 = 300
n_hidden2 = 100
n_hidden3 = 50
n_hidden4 = 50
    
with tf.name_scope("dnnBenchmark"):
  # implementation of the new benchmarking DNN here
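    # Note: with its default scale=1.0 this initializer uses variance 1/fan_in (LeCun-style);
    # He initialization corresponds to scale=2.0.
    he_init = tf.variance_scaling_initializer()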

    my_batch_norm_layer = partial(
            tf.layers.batch_normalization,
            training=training,
            momentum=batch_norm_momentum)

    my_dense_layer = partial(
            tf.layers.dense,
            kernel_initializer=he_init)

    hidden1 = my_dense_layer(X, n_hidden1, name="hidden1")
    bn1 = tf.nn.relu(my_batch_norm_layer(hidden1))
    hidden2 = my_dense_layer(bn1, n_hidden2, name="hidden2")    
    bn2 = tf.nn.relu(my_batch_norm_layer(hidden2))
    hidden3 = my_dense_layer(bn2, n_hidden3, name="hidden3")    
    bn3 = tf.nn.relu(my_batch_norm_layer(hidden3))
    hidden4 = my_dense_layer(bn3, n_hidden4, name="hidden4")    
    bn4 = tf.nn.relu(my_batch_norm_layer(hidden4))
    logits_before_bn = my_dense_layer(bn4, n_outputs, name="outputs")
    logits = my_batch_norm_layer(logits_before_bn)
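
`tf.variance_scaling_initializer` draws weights with variance `scale / fan_in` in its default `fan_in` mode. He initialization corresponds to `scale=2.0`, while the default `scale=1.0` gives LeCun-style scaling. A NumPy sketch comparing the two (simplified: it samples from a plain normal, whereas TensorFlow's default draws from a truncated normal with an adjusted standard deviation; the helper name is hypothetical):

```python
import numpy as np

# Hypothetical helper for illustration
def variance_scaling_normal(fan_in, fan_out, scale, rng):
    # "fan_in" mode: weight variance = scale / fan_in
    std = np.sqrt(scale / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(42)
W_lecun = variance_scaling_normal(784, 300, scale=1.0, rng=rng)  # scale of the default initializer
W_he = variance_scaling_normal(784, 300, scale=2.0, rng=rng)     # He initialization

# With ReLU roughly half the pre-activations are zeroed, so He's factor of 2 keeps the
# second moment of the activations from shrinking from layer to layer
x = rng.normal(size=(1000, 784))
h_lecun = np.maximum(x @ W_lecun, 0.0)
h_he = np.maximum(x @ W_he, 0.0)
print("mean squared activation, LeCun:", np.mean(h_lecun ** 2))  # about 0.5
print("mean squared activation, He:   ", np.mean(h_he ** 2))     # about 1.0
```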

In [0]:
with tf.name_scope("loss"):
  #implementation of the loss function net here
  xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
  loss = tf.reduce_mean(xentropy, name="loss")

In [0]:
with tf.name_scope("train"):
  #implementation of the training optimizer here
  optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
  training_op = optimizer.minimize(loss)
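
`tf.train.AdamOptimizer` combines momentum (a moving average of gradients) with per-parameter step sizes (a moving average of squared gradients). A NumPy sketch of the update as written in the Adam paper, using the same default hyperparameters as `tf.train.AdamOptimizer` (`beta1=0.9`, `beta2=0.999`, `epsilon=1e-8`), applied to a toy one-parameter problem. TensorFlow folds the bias correction into the learning rate, so its use of `epsilon` differs slightly; the helper name is hypothetical:

```python
import numpy as np

# Hypothetical helper for illustration
def adam_minimize(grad_fn, w0, learning_rate=0.01, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, n_steps=3000):
    w = float(w0)
    m = 0.0  # moving average of gradients (first moment)
    v = 0.0  # moving average of squared gradients (second moment)
    for t in range(1, n_steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Bias correction: m and v start at 0, so early estimates are scaled up
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

Early on, each step has magnitude close to `learning_rate` regardless of the gradient's scale, which is one reason Adam is less sensitive to the choice of learning rate than plain gradient descent.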

In [0]:
with tf.name_scope("eval"):
  #implementation of the evaluation procedure here
  correct = tf.nn.in_top_k(logits, y, 1)
  accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

In [0]:
# Your code goes here
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 100
batch_size = 50

# shuffle_batch() shuffles the whole training set (once per call, i.e. once per epoch) and yields it in mini-batches
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch


In [25]:
with tf.Session() as sess:
  init.run()
  for epoch in range(n_epochs):
    # implementation of the training ops here
    for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
      sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    if (epoch % 5) == 0:
      acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
      # implementation of the validation accuracy here
      acc_valid = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
      print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)
          
  save_path = saver.save(sess, "./my_dnnBenchmark_model.ckpt")

0 Batch accuracy: 0.94 Validation accuracy: 0.8499
5 Batch accuracy: 0.96 Validation accuracy: 0.8767
10 Batch accuracy: 0.9 Validation accuracy: 0.8837
15 Batch accuracy: 0.92 Validation accuracy: 0.8828
20 Batch accuracy: 1.0 Validation accuracy: 0.8956
25 Batch accuracy: 0.88 Validation accuracy: 0.8913
30 Batch accuracy: 0.98 Validation accuracy: 0.8959
35 Batch accuracy: 0.9 Validation accuracy: 0.8966
40 Batch accuracy: 0.96 Validation accuracy: 0.893
45 Batch accuracy: 0.98 Validation accuracy: 0.8983
50 Batch accuracy: 0.96 Validation accuracy: 0.8959
55 Batch accuracy: 1.0 Validation accuracy: 0.895
60 Batch accuracy: 0.94 Validation accuracy: 0.894
65 Batch accuracy: 0.94 Validation accuracy: 0.8962
70 Batch accuracy: 1.0 Validation accuracy: 0.8975
75 Batch accuracy: 0.98 Validation accuracy: 0.8915
80 Batch accuracy: 1.0 Validation accuracy: 0.8912
85 Batch accuracy: 0.98 Validation accuracy: 0.8969
90 Batch accuracy: 0.98 Validation accuracy: 0.8956
95 Batch accuracy: 1.0 

In [26]:
with tf.Session() as sess:
  
    saver.restore(sess, "./my_dnnBenchmark_model.ckpt")
    # implementation of the test set evaluation here
    acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
    print("Final test accuracy: {:.2f}%".format(acc_test * 100))

Final test accuracy: 89.04%


- - -
## 5. OUTLOOK (5 pts)

Plan the outlook for your system; this may point to the direction of your future project:
- Did your neural network outperform other "traditional" ML techniques? Why/why not?
    - No. My model's accuracy (89.04% on the test set; 0.8905 on average) does not surpass the kernel SVM's 0.897.
- Does your model work well? If not, which model should be further investigated?
    - Although its accuracy does not reach that of the SVM, it is good enough in practice. Next, I plan to investigate a convolutional neural network (CNN).
- Are you satisfied with your system? What do you think needs to improve?
    - The accuracy is good, but I still want to improve it. Since the dataset consists of images, a CNN may work better. A more complex architecture, such as convolutional layers followed by fully connected layers, may further increase accuracy.



- - - 
### NEED HELP?

In case you get stuck in any step in the process, you may find some useful information from:

 * Consult my lectures and/or the textbook
 * Talk to the TA, they are available and there to help you during OH
 * Come talk to me or email me <nn4pj@virginia.edu> with subject starting "CS4501 Assignment 4:...".
 * More on the Fashion-MNIST to be found here: https://hanxiao.github.io/2018/09/28/Fashion-MNIST-Year-In-Review/

Best of luck and have fun!