<a href="https://colab.research.google.com/github/kroush/Thinkful-Capstones/blob/main/Fashion_MNIST_Deep_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fashion MNIST
Basics of Deep Learning and Artificial Neural Networks

## Data download and preprocessing

In [1]:
from tensorflow.keras.models import Sequential
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers

from IPython.display import clear_output
clear_output()

In [2]:
# load dataset
(X_train, Y_train), (X_test, Y_test) = fashion_mnist.load_data()

# loaded data summary
print(f'Train: X= {X_train.shape}, Y={Y_train.shape}')
print(f'Test: X= {X_test.shape}, Y={Y_test.shape}')

# reshape, 2D -> 1D, each pixel is a feature, 
# normalize as max value is 255

input_dim = 784  # 28*28
output_dim = nb_classes = 10

X_train = X_train.reshape(60000, input_dim)
X_test = X_test.reshape(10000, input_dim)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# processed data summary
print(f'Train: X= {X_train.shape}, Y={Y_train.shape}')
print(f'Test: X= {X_test.shape}, Y={Y_test.shape}')

# one-hot encode categories
Y_train = to_categorical(Y_train, nb_classes)
Y_test = to_categorical(Y_test, nb_classes)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
Train: X= (60000, 28, 28), Y=(60000,)
Test: X= (10000, 28, 28), Y=(10000,)
Train: X= (60000, 784), Y=(60000,)
Test: X= (10000, 784), Y=(10000,)


Here we can see that there are 60,000 examples in the training set and 10,000 in the test set. Each is a 28x28 pixel grayscale image, which the reshape step flattens into a vector of 784 features.
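The preprocessing steps can be checked on a tiny synthetic batch without downloading anything. The sketch below is an illustration only: it uses NumPy rather than Keras, and `fake_images`, `fake_labels`, and `one_hot` are hypothetical names, with the `np.eye` lookup standing in for `to_categorical`.

```python
import numpy as np

# Hypothetical stand-in for the real data: 4 random 28x28 grayscale images
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([0, 3, 9, 3])

# Flatten each 28x28 image into a 784-long feature vector
flat = fake_images.reshape(len(fake_images), -1).astype("float32")

# Scale pixel values from [0, 255] to [0, 1]
flat /= 255

# One-hot encode the labels (hand-rolled stand-in for to_categorical)
num_classes = 10
one_hot = np.eye(num_classes, dtype="float32")[fake_labels]

print(flat.shape)     # (4, 784)
print(one_hot.shape)  # (4, 10)
```

Each row of `one_hot` contains a single 1 at the position of the class label, which is the target format that `categorical_crossentropy` expects.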

## ANN Model Optimization

In [3]:
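def ann_model(layers, neurons, act, opt, loss, bs):
  """Train a dense network; store [loss, accuracy] results as function attributes.

  `layers` counts the output layer, so layers - 1 hidden layers are built.
  """
  model = Sequential()
  # hidden layers; only the first layer added needs the input shape
  for i in range(layers - 1):
    if i == 0:
      model.add(Dense(neurons, activation=act, input_shape=(input_dim,)))
    else:
      model.add(Dense(neurons, activation=act))
  # output layer: one softmax unit per class
  model.add(Dense(nb_classes, activation='softmax'))
  model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

  model.fit(X_train, Y_train, batch_size=bs, epochs=20, verbose=0)

  # evaluate() returns [loss, accuracy], so index 1 is the accuracy
  ann_model.test_score = model.evaluate(X_test, Y_test, verbose=0)
  ann_model.train_score = model.evaluate(X_train, Y_train, verbose=0)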

In [4]:
# test function to validate performance

ann_model(3, 64, 'relu', 'sgd', 'categorical_crossentropy', 8)
print(ann_model.test_score[1], ann_model.train_score[1])

0.8810999989509583 0.9178500175476074


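We're only going to vary the number of layers, the number of neurons per hidden layer, the activation function of every layer except the final one, and the batch size. The optimizer (SGD), the loss (categorical cross-entropy), and the number of epochs (20) stay fixed.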
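As a quick check on the size of this search, the grid can be enumerated with `itertools.product`. This is a minimal sketch, not the notebook's own loop: the four lists mirror the ones used in the search, and `grid` is a hypothetical name. No models are trained here.

```python
from itertools import product

layer_list = [3, 4, 5]
neuron_list = [128, 64, 16, 8]
activation_list = ['sigmoid', 'tanh', 'relu']
batch_list = [8, 32, 128]

# Every (layers, neurons, activation, batch_size) combination, in the same
# order that four nested for-loops over these lists would visit them
grid = list(product(layer_list, neuron_list, activation_list, batch_list))

print(len(grid))  # 108
print(grid[0])    # (3, 128, 'sigmoid', 8)
```

Because the order matches nested loops, a position in `grid` corresponds to the row index a results table would carry if rows are appended in loop order.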

In [None]:
layer_list = [3,4,5]
neuron_list = [128,64,16,8]
activation_list = ['sigmoid', 'tanh', 'relu']
batch_list = [8,32,128]
results = []
count = 0

for w in layer_list:
    for x in neuron_list:
        for y in activation_list:
            for z in batch_list:
                count+=1
                print(f'{count}/{len(layer_list)*len(neuron_list)*len(activation_list)*len(batch_list)}')
                
                ann_model(w, x, y, 'sgd', 'categorical_crossentropy', z)
                test_score = ann_model.test_score[1]
                train_score = ann_model.train_score[1]
                results.append((w,x,y,z,test_score,train_score))
                clear_output()                

In [None]:
results_df = pd.DataFrame(results, columns=('layers', 'neurons', 'activation_fxn', 'batch_size', 'test_score', 'train_score'))
results_df.sort_values(by='test_score', ascending=False, inplace=True)
results_df.head(50)

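|    | layers | neurons | activation_fxn | batch_size | test_score | train_score |
|----|--------|---------|----------------|------------|------------|-------------|
| 3  | 3      | 128     | tanh           | 8          | 0.8867     | 0.925317    |
| 75 | 5      | 128     | tanh           | 8          | 0.8847     | 0.929017    |
| 79 | 5      | 128     | relu           | 32         | 0.8844     | 0.916367    |
| 39 | 4      | 128     | tanh           | 8          | 0.884      | 0.9249      |
| 84 | 5      | 64      | tanh           | 8          | 0.8825     | 0.922417    |
| 78 | 5      | 128     | relu           | 8          | 0.8801     | 0.921633    |
| 87 | 5      | 64      | relu           | 8          | 0.8799     | 0.921517    |
| 51 | 4      | 64      | relu           | 8          | 0.8795     | 0.917567    |
| 43 | 4      | 128     | relu           | 32         | 0.8793     | 0.91055     |
| 76 | 5      | 128     | tanh           | 32         | 0.879      | 0.909033    |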


In [None]:
ann_model(3, 128, 'tanh', 'sgd', 'categorical_crossentropy', 8)

In [7]:
print(f'Best Model\n Test Score: {ann_model.test_score[1]}\n Training Score:{ann_model.train_score[1]}')

Best Model
 Test Score: 0.8848999738693237
 Training Score:0.9212499856948853


Here, I've displayed the top 50 models out of 108. In general, the 'tanh' and 'relu' activations and the larger layer widths (128 and 64 neurons) gave the best results. The other hyperparameters seem well mixed in the top 50 results. The training score is generally about 0.03 to 0.04 higher than the test score, a modest gap, so there are quite a few good candidates here to keep fine-tuning without severe overfitting. To continue optimization efforts, I'd likely start with the highest-ranked 'relu' model (5 layers, 128 neurons, batch size 32). With a mini-batch size of 32, I have fewer concerns about overfitting than with the models trained at a batch size of 8. The model with the best test score in the search (3 layers, 128 neurons, 'tanh', batch size 8) is retrained and shown above; on that rerun it scored 0.8849 on the test set and 0.9212 on the training set.
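The gap between training and test accuracy can be computed directly from the rows of the results table. The sketch below is illustrative: `rows` and `gaps` are hypothetical names, the scores are copied from the ten rows displayed in this notebook, and plain Python is used instead of pandas to keep it self-contained.

```python
# (layers, neurons, activation, batch_size, test_score, train_score),
# copied from the ten rows displayed in the results table
rows = [
    (3, 128, 'tanh', 8, 0.8867, 0.925317),
    (5, 128, 'tanh', 8, 0.8847, 0.929017),
    (5, 128, 'relu', 32, 0.8844, 0.916367),
    (4, 128, 'tanh', 8, 0.8840, 0.924900),
    (5, 64, 'tanh', 8, 0.8825, 0.922417),
    (5, 128, 'relu', 8, 0.8801, 0.921633),
    (5, 64, 'relu', 8, 0.8799, 0.921517),
    (4, 64, 'relu', 8, 0.8795, 0.917567),
    (4, 128, 'relu', 32, 0.8793, 0.910550),
    (5, 128, 'tanh', 32, 0.8790, 0.909033),
]

# Train-minus-test accuracy: a rough indicator of overfitting
gaps = [(train - test, layers, neurons, act, bs)
        for layers, neurons, act, bs, test, train in rows]

# Smallest gap first
for gap, layers, neurons, act, bs in sorted(gaps):
    print(f'{layers} layers, {neurons} neurons, {act}, batch {bs}: gap = {gap:.3f}')
```

Among these ten rows, the three models trained with a batch size of 32 have the smallest gaps, which is consistent with preferring one of them as the starting point for further tuning.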