# Analyzing Fashion MNIST Data

## About the Data
Fashion-MNIST (https://github.com/zalandoresearch/fashion-mnist) is a relatively new dataset of Zalando's article images. It contains 60,000 training samples and 10,000 test samples. Each sample is a 28x28 grayscale image associated with one of ten clothing labels. The images look like this:

![FashonMNIST](https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png)

The ten clothing labels are:
0. T-Shirt/top
1. Trouser
2. Pullover
3. Dress
4. Coat
5. Sandal
6. Shirt
7. Sneaker
8. Bag
9. Ankle Boot

(In the dataset, the labels are zero-indexed, so T-Shirt/top is labeled `0` and Ankle boot is labeled `9`.)


#### Why Fashion MNIST instead of the MNIST dataset?
This might sound a bit strange, because most data scientists reach for the original MNIST dataset of handwritten digits. However, I went down this route for one key reason: MNIST is too simple. The original MNIST dataset is relatively easy to predict nowadays, given how far machine learning and neural network models have advanced.

The team that created Fashion-MNIST ran a benchmark study to back this up. Even though they have a stake in the result, I don't have a reason to doubt their experiment. They built many machine learning models, fed both the MNIST dataset and the Fashion-MNIST dataset into those models, and compared accuracy (http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/). Almost every model scored lower on Fashion-MNIST than on MNIST, which supports the theory that MNIST has become too easy to predict.

To build on that rationale, there's also a script on GitHub Gist where someone was able to distinguish MNIST digits based on a single pixel, which (I'd hope) many machine learning models would quickly pick up on (https://gist.github.com/dgrtwo/aaef94ecc6a60cd50322c0054cc04478).

Ultimately, I didn't want a dataset which would tell me 'stick to a machine learning model' again. I wanted a dataset that was just complex enough that it's at least conceivable I'd need a neural network model. I also wanted a dataset that was large enough for me to definitively conclude whether I was overtraining my model.


----
# The Goal
I wanted to determine which model could best classify the Fashion-MNIST dataset. I have a head start here because the Zalando Research team has already tested many machine learning models on this dataset, so all I had to do was try various neural network approaches.

----
# Machine Learning Models
As specified earlier, the Zalando Research Team already tested many machine learning models on this dataset and the top 8 models were:

![MachineLearning](Jupyter/MachineLearning.jpg)

My takeaways from this were:
- SVC is generally the best performer, but it takes a long time to train: at least 1 hour.
- GradientBoost isn't worth the runtime pains.
- RandomForest has a lot of promise with a small training time and reasonable accuracy, but it still peaks out at 0.879 accuracy. I suspect I can do better.

----
# Technical Setup

For this project, I used
- Anaconda 5.0.0 which uses Python 3.6.3
  - TensorFlow 1.1.0
  - Keras 2.0.8
  - Theano 0.9.0 (Do not assume I'm using Theano unless otherwise specified)
- iMac running macOS High Sierra with a:
  - 3.8GHz quad‑core Intel Core i5
  - [When a GPU was required] EVGA GeForce GTX 1050 2 GBs

`Keras` is a high-level Python package which lets me build neural networks and use either `TensorFlow`, `Theano`, or Microsoft's `CNTK` as the computation engine. It gives me the opportunity to test my model against all those computation engines without rewriting my model for each one.

I began this project using Theano 0.9.0, and unfortunately, midway through, it was announced that Theano would be deprecated. At that point, I switched to TensorFlow. I've redone most of my studies in TensorFlow, but certain tests will remain on Theano. I will note this when it happens.

----
# Non-Model Specific Code
I first created a function to handle the **input arguments** I might pass into my script.

In [11]:
from argparse import ArgumentParser
import os
import numpy as np

import models
import params
import plot

np.set_printoptions(precision=2)

def parse_args(inargs=None):
    """ Parses input arguments """
    parser = ArgumentParser("./loader.py")
    standard_path = os.path.dirname(os.path.realpath(__file__))

    iargs = parser.add_argument_group('Input Files/Data')
    iargs.add_argument('--csv_file',
                       default=os.path.join(standard_path, 'data.csv'),
                       help='Path to CSV File')
    iargs.add_argument('--model', default='cnn',
                       help='Select: cnn (default), rnn, neural')

    oargs = parser.add_argument_group('Output Files/Data')
    oargs.add_argument('--out',
                       default=os.path.join(standard_path, 'Run'),
                       help='Path to save output files')

    if not inargs:
        args = parser.parse_args()
    else:
        args = parser.parse_args(inargs)
    return args




I also had to create a function to **re-shape my data** to the appropriate shape. I'm using the `channels_first` setting in Keras, which means the color-channel dimension comes before the pixel dimensions (the sample count is always the first dimension). I therefore reshape the array to four dimensions: (Quantity of Pictures, Quantity of Colors (just 1 since this is grayscale), Pixels Width, Pixels Height).

In [12]:
def flatten_data(args, x_train, x_test, y_train, y_test):
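    """ Scales pixels to [0, 1], reshapes images to 4D (except for the RNN),
        and one-hot encodes the labels
    """
    # Pixel values arrive as 0-255 integers
    x_train = x_train.astype('float32') / 255
    x_test = x_test.astype('float32') / 255

    # channels_first layout: (samples, channels, 28, 28)
    if args.model != 'rnn':
        x_train = x_train.reshape(x_train.shape[0], 1, 28, 28)
        x_test = x_test.reshape(x_test.shape[0], 1, 28, 28)

    # One-hot encodes labels. 28 columns matches the Dense(28) output layer,
    # although Fashion-MNIST only has 10 classes
    y_train = np_utils.to_categorical(y_train, 28)
    y_test = np_utils.to_categorical(y_test, 28)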
    return x_train, y_train, x_test, y_test
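
To make the shapes concrete, here is a small NumPy-only sketch of the same three steps (scaling, reshaping, one-hot encoding) on random data. The random images stand in for the real dataset, and `one_hot` is a hypothetical helper that mimics what `np_utils.to_categorical` does; it uses ten columns because there are ten clothing labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical stand-in for keras.utils.np_utils.to_categorical."""
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Fake batch of 5 grayscale images with 0-255 pixel values (not real data)
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(5, 28, 28)).astype('uint8')
labels = np.array([0, 9, 3, 3, 7])

scaled = images.astype('float32') / 255                 # values now in [0, 1]
reshaped = scaled.reshape(scaled.shape[0], 1, 28, 28)   # channel axis added
encoded = one_hot(labels, 10)                           # one row per sample

print(reshaped.shape)
print(encoded.shape)
```

Each row of `encoded` contains a single 1 in the column of its label, which is the format `categorical_crossentropy` expects.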

I intend to run my script so that it loops over a variety of options for a specific parameter. Because of that, I'd like it to save:
* The Confusion Matrix for each option it tests
* A plot comparing all of the options it tests

I created two **plotting functions** to do that.

_Note: When the code is run in the Jupyter Notebook, it will NOT use the functions below; it'll use plot.py in the directory where this Notebook resides. That file is replicated below for completeness._

In [13]:
import itertools
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

def conf_matrix(y_test, y_test_predict, classes, title='Confusion Matrix',
                out=None):
    # Converts both output arrays into just one column based on the class
    y_test_predict_class = y_test_predict.argmax(1)
    y_test_class = y_test.argmax(1)

    # Creates confusion matrix
    cm_data = confusion_matrix(y_test_class, y_test_predict_class)
    np.set_printoptions(precision=2)

    # Plots Confusion Matrix
    plt.figure()
    plt.imshow(cm_data, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title(title)
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=90)
    plt.xlabel('Predicted Label')
    plt.yticks(tick_marks, classes)
    plt.ylabel('True Label')

    # Plots data on chart
    thresh = cm_data.max() / 2.
    for i, j in itertools.product(range(cm_data.shape[0]), range(cm_data.shape[1])):
        plt.text(j, i, format(cm_data[i, j], 'd'),
                 horizontalalignment="center",
                 color="white" if cm_data[i, j] > thresh else "black")

    plt.tight_layout()

    # Saves or Shows Plot
    if out:
        plt.savefig(out)
    else:
        plt.show()


def dict_trends(data, xlabel='Variable', out=None):
    """ Plots a dictionary's worth of trends """
    data_df = pd.DataFrame.from_dict(data, orient='index')
    ax = data_df.plot()

    # Sets Axes
    ax.set_xlabel(xlabel)
    ax.set_ylabel('Score')
    ax.set_title('Modifying {}'.format(xlabel))

    # Saves or Shows Plot
    if out:
        plt.savefig(out)
    else:
        plt.show()  

As my script loops between different parameters, I want it to always have a default value for each parameter so that if I don't specify anything, it'll use the proper default setting. To do that, I created a **Parameters Configuration File** as params.py, which is effectively a dictionary.

_Note: Like plot.py, when this function is called, it'll use the version in params.py rather than the version in Jupyter Notebook. params.py is replicated below for completeness._

In [14]:
def standard():
    params = {}

    # Build Parameters
    params['conv_filters'] = 32
    params['nb_pool'] = 2
    params['nb_conv'] = 2
    params['optimizer'] = 'nadam'
    params['loss'] = 'categorical_crossentropy'

    # Fit Parameters
    params['epoch'] = 8
    params['dropout'] = 0.1
    params['batch_size'] = 128

    # Dense Activation
    params['dense_1'] = 120
    params['activate_1'] = 'relu'
    return params

As my script tests different models, I'd like to run test data through each of them to see if the predictions seem accurate. Because the fitting step is identical across all models, I put a single shared fit function, `fit_model`, in models.py alongside the model-building functions, one of which is shown below.

_Note: Like all other things, as I use this function, it'll use models.py rather than the version seen below_

In [16]:
def basic_neural(model_params, shape):
    """ Builds basic neural network model """
    from keras.layers import Dense, Flatten, InputLayer
    from keras.layers.normalization import BatchNormalization
    from keras.models import Sequential

    model = Sequential()

    model.add(InputLayer(input_shape=(shape[1], shape[2], shape[3])))
    model.add(BatchNormalization())

    model.add(Flatten())

    model.add(Dense(model_params['dense_1'], activation=model_params['activate_1']))
    model.add(Dense(28, activation='softmax'))

    model.compile(loss=model_params['loss'],
                  optimizer=model_params['optimizer'],
                  metrics=['accuracy'])

    print(model.summary())
    return model
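
`fit_model` itself is not reproduced in this notebook. Judging only from how `main()` calls it (it receives the model, the parameter dictionary, and the train/test arrays, and returns predictions plus a dictionary with `'loss'` and `'acc'` keys), a minimal sketch could look like the following. This is an assumption about its shape, not the actual contents of models.py; it relies on the model being compiled with accuracy as its only metric so that `evaluate()` returns `[loss, accuracy]`.

```python
def fit_model(model, model_params, x_train, y_train, x_test, y_test):
    """Sketch of a shared fit step: train, score on test data, and predict.

    Assumes `model` is a compiled Keras-style model whose only metric is
    accuracy, so that evaluate() returns [loss, accuracy].
    """
    # Trains using the batch size and epoch count from the parameter dictionary
    model.fit(x_train, y_train,
              batch_size=model_params['batch_size'],
              epochs=model_params['epoch'],
              verbose=1)

    # Scores the trained model on data it has not seen
    loss, acc = model.evaluate(x_test, y_test, verbose=0)

    # Per-class probabilities, used later for the confusion matrix
    y_pred = model.predict(x_test)

    metrics = {'loss': loss, 'acc': acc}
    return y_pred, metrics
```

Keeping this step in one function means every model in the comparison is trained and scored in exactly the same way.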

Finally, I created a **Main Function** which connects all of the aforementioned functions as well as the model functions to come. It runs each model three times for each option of the variable being tested, averages the results from those runs, and stores them in a dictionary for comparisons later.

_Note: This function shows the final state of the Main Function after all expansion. I'll talk about specific additions to this function in the Neural Network sections below._

In [15]:
from keras.datasets import fashion_mnist
from keras.utils import np_utils
import pandas as pd
import os

def main(args):
    # Loads the Fashion-MNIST dataset bundled with Keras
    (x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
    x_train, y_train, x_test, y_test = flatten_data(args, x_train, x_test, y_train, y_test)

    # Creates output directory
    if not os.path.isdir(args.out):
        os.makedirs(args.out)

    # Creates range to loop filter between
    change = 'activation'
    range = ['softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'selu',
             'elu', 'linear']
    history_dict = {x: {'loss': 0.0, 'acc': 0.0} for x in range}

    # Runs Model
    for new in range:
        history_dict[new] = {'loss': [], 'acc': []}
        for loop in [1, 2, 3]:
            print('Creating Model with the {} {}'.format(new, change))
            model_params = params.standard()
            model_params['activate_1'] = new

            if args.model == 'rnn':
                model = models.basic_rnn(model_params, x_train.shape)
            elif args.model == 'neural':
                model = models.basic_neural(model_params, x_train.shape)
            else:
                model = models.double_cnn(model_params, x_train.shape)

            y_pred, metrics = models.fit_model(model, model_params, 
                                               x_train, y_train, x_test, y_test)

            # Adds Data to Trends
            history_dict[new]['loss'].append(metrics['loss'])
            history_dict[new]['acc'].append(metrics['acc'])

        # Calculates Average
        history_dict[new]['loss'] = np.mean(history_dict[new]['loss'])
        history_dict[new]['acc'] = np.mean(history_dict[new]['acc'])

        # Plots Confusion Matrix
        classes = {0: 'T-Shirt/top',
                   1: 'Trouser',
                   2: 'Pullover',
                   3: 'Dress',
                   4: 'Coat',
                   5: 'Sandal',
                   6: 'Shirt',
                   7: 'Sneaker',
                   8: 'Bag',
                   9: 'Ankle boot'}
        class_values = list(classes.values())
        title = "{} (Loss {} & Acc {})".format(new, metrics['loss'], metrics['acc'])
        conf_png = '{}/{}_{}.png'.format(args.out, new, change)
        plot.conf_matrix(y_test, y_pred, class_values, out=conf_png, title=title)

    # Plots Accuracy & Loss Trends
    trends_png = '{}/{}.png'.format(args.out, change)
    plot.dict_trends(history_dict, xlabel=change, out=trends_png)

    return x_train, y_train, x_test, y_test, y_pred


if __name__ == "__main__":
    ARGS = parse_args()
    x_train, y_train, x_test, y_test, y_pred = main(ARGS)


----
# About Neural Networks

### Neural Network Principles
The primary Neural Network Layer in Keras is the **Dense** layer. In the simplest sense, this layer takes an input, performs a calculation on it (typically a matrix-vector multiplication plus a bias), and outputs the data in a different dimensionality. The result of that calculation is then passed through an **Activation** Function, which is what lets the network model non-linear relationships.

It's worth noting that when I first get my data, it's technically in four dimensions as: (Quantity of Image Samples, Colorscale, Width Pixels, and Height Pixels). In this case:
- Colorscale is always 1 because this is a grayscale image
- Each image is 28x28 pixels.

I cannot immediately feed these images into the `Dense` Layer with that dimensionality. So I first need to send it through a **Flatten** layer which flattens it into a two dimension array as: (Quantity of Image Samples, Width Pixels * Height Pixels).
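
As a rough illustration of what `Flatten` followed by `Dense` does to the array shapes, here is a NumPy-only sketch. The random inputs and weights are placeholders (a real layer learns its weights), 120 units mirrors the `dense_1` default in params.py, and `relu` is used only as an example activation.

```python
import numpy as np

rng = np.random.RandomState(0)

# Four fake images in (samples, channels, 28, 28) layout
batch = rng.rand(4, 1, 28, 28).astype('float32')

# Flatten: keep the sample axis, merge everything else into one axis
flat = batch.reshape(batch.shape[0], -1)           # shape (4, 784)

# Dense: weights map 784 inputs to 120 units, then an activation is applied
weights = (rng.randn(784, 120) * 0.01).astype('float32')
bias = np.zeros(120, dtype='float32')
pre_activation = flat.dot(weights) + bias          # shape (4, 120)
output = np.maximum(pre_activation, 0.0)           # relu as the example

print(flat.shape, output.shape)
```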


----
# Building a Neural Network
### Layers I'll Use
I wanted to begin by creating a basic neural network with only two Neural Layers. 
- The first layer will be a `Dense` layer, and I will cycle between various activation functions to find the ideal one. (This, of course, will happen after a `Flatten` layer.) I will use the `nadam` optimizer initially, although I'll probably test this in a second experiment.
- The second layer will be a `Dense` layer and this will stay under the `Softmax` Activation Function.

Softmax exponentiates each output and normalizes the results, which assigns a probability to each possible option so that all options add up to 1. Because of this, Softmax is regarded as one of the best 'final' functions to classify results.
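
A minimal NumPy sketch of Softmax, to show how raw scores become probabilities. The input scores are made up, and subtracting the row maximum is a common numerical-stability trick rather than anything specific to this project.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: exponentiate, then normalize so each row sums to 1."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1],    # first option clearly preferred
                   [0.0, 0.0, 0.0]])   # no preference at all
probs = softmax(scores)
print(probs)
```

The second row comes out as three equal probabilities, while the first row puts most of its weight on the first option.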

Speaking of classification, there has been some research in foregoing this final fit function, and rather, sending this data to another machine learning model and having that do the classification instead. It's plausible that given the success of Random Forests earlier, that Random Forests would do a better job at classifying the data produced by the Neural Network, than the Neural Network itself.



### First Experiment: Testing Activation Functions in a Neural Network
This is the code for my first Neural Network model. It only has two Dense Layers and we will loop between these **activation functions**:
* `softplus`
* `softsign`
* `relu`
* `tanh`
* `sigmoid`
* `hard_sigmoid`
* `selu`
* `elu`
* `linear`

![Activation](Jupyter/Activation.tiff)
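
For reference, the nine activation functions can be written in a few lines of NumPy. This is a sketch of their usual textbook definitions, not Keras's own code; in particular, the `hard_sigmoid` slope of 0.2 and offset of 0.5 and the two SELU constants are the commonly cited values and should be treated as assumptions about what the backend used.

```python
import numpy as np

# Commonly cited SELU constants (assumed to match the backend)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def elu(x, alpha=1.0):
    """Identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

ACTIVATIONS = {
    'softplus': lambda x: np.log1p(np.exp(x)),
    'softsign': lambda x: x / (1 + np.abs(x)),
    'relu': lambda x: np.maximum(x, 0),
    'tanh': np.tanh,
    'sigmoid': lambda x: 1 / (1 + np.exp(-x)),
    'hard_sigmoid': lambda x: np.clip(0.2 * x + 0.5, 0, 1),
    'selu': lambda x: SELU_SCALE * elu(x, SELU_ALPHA),
    'elu': elu,
    'linear': lambda x: x,
}

x = np.array([-2.0, 0.0, 2.0])
for name, fn in ACTIVATIONS.items():
    print(name, np.round(fn(x), 3))
```

Note that `linear` leaves its input unchanged, so a network using it in the hidden layer cannot model non-linear relationships.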

For now, I'm using the `nadam` optimizer & `categorical_crossentropy` loss function. More about the loss function later.

In [17]:
def basic_neural(model_params, shape):
    """ Builds basic neural network model """
    from keras.layers import Dense, Flatten, InputLayer
    from keras.layers.normalization import BatchNormalization
    from keras.models import Sequential

    model = Sequential()

    model.add(InputLayer(input_shape=(shape[1], shape[2], shape[3])))
    model.add(BatchNormalization())

    model.add(Flatten())

    model.add(Dense(model_params['dense_1'], activation=model_params['activate_1']))
    model.add(Dense(28, activation='softmax'))

    model.compile(loss=model_params['loss'],
                  optimizer=model_params['optimizer'],
                  metrics=['accuracy'])

    print(model.summary())
    return model

In [18]:
def main(args):
    # Loads the Fashion-MNIST dataset bundled with Keras
    (x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
    x_train, y_train, x_test, y_test = flatten_data(args, x_train, x_test, y_train, y_test)

    # Creates output directory
    if not os.path.isdir(args.out):
        os.makedirs(args.out)

    # Creates range to loop filter between
    change = 'activation'
    range = ['softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'selu',
             'elu', 'linear']
    history_dict = {x: {'loss': 0.0, 'acc': 0.0} for x in range}

    # Runs Model
    for new in range:
        history_dict[new] = {'loss': [], 'acc': []}
        for loop in [1, 2, 3]:
            print('Creating Model with the {} {}'.format(new, change))
            model_params = params.standard()
            model_params['activate_1'] = new

            model = models.basic_neural(model_params, x_train.shape)

            y_pred, metrics = models.fit_model(model, model_params, 
                                               x_train, y_train, x_test, y_test)

            # Adds Data to Trends
            history_dict[new]['loss'].append(metrics['loss'])
            history_dict[new]['acc'].append(metrics['acc'])

        # Calculates Average
        history_dict[new]['loss'] = np.mean(history_dict[new]['loss'])
        history_dict[new]['acc'] = np.mean(history_dict[new]['acc'])

        # Plots Confusion Matrix
        classes = {0: 'T-Shirt/top',
                   1: 'Trouser',
                   2: 'Pullover',
                   3: 'Dress',
                   4: 'Coat',
                   5: 'Sandal',
                   6: 'Shirt',
                   7: 'Sneaker',
                   8: 'Bag',
                   9: 'Ankle boot'}
        class_values = list(classes.values())
        title = "{} (Loss {} & Acc {})".format(new, metrics['loss'], metrics['acc'])
        conf_png = '{}/{}_{}.png'.format(args.out, new, change)
        plot.conf_matrix(y_test, y_pred, class_values, out=conf_png, title=title)

    # Plots Accuracy & Loss Trends
    trends_png = '{}/{}.png'.format(args.out, change)
    plot.dict_trends(history_dict, xlabel=change, out=trends_png)

    return x_train, y_train, x_test, y_test, y_pred

main(ARGS)

Below are the results from the test. We are considering two metrics here: `Accuracy` and `Loss`.

* `Accuracy`... speaks for itself. The higher the accuracy, the better.
* `Loss`, from a high-level point of view, measures how far the predicted probabilities are from the true labels, so it penalizes predictions that are wrong or unsure. The lower the loss, the better. A loss that rises on the test data can be a sign that the model is over-training.

I'm using the `categorical_crossentropy` method to compute loss. This method, unlike many of the other options, is designed for classification problems where each sample belongs to exactly one of several classes, such as this problem.
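
To show how the two metrics can disagree, here is a small NumPy sketch of categorical cross-entropy next to accuracy. The probabilities are invented for illustration, and the `eps` clipping is only there to avoid taking the log of zero.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(true * log(predicted))."""
    y_prob = np.clip(y_prob, eps, 1.0)
    per_sample = -np.sum(y_true * np.log(y_prob), axis=1)
    return per_sample.mean()

def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    return np.mean(y_true.argmax(axis=1) == y_prob.argmax(axis=1))

y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype='float32')
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
hesitant = np.array([[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]])

for name, probs in [('confident', confident), ('hesitant', hesitant)]:
    print(name, accuracy(y_true, probs), categorical_crossentropy(y_true, probs))
```

Both sets of predictions pick the right class every time, so their accuracy is identical, but the hesitant one has the higher loss. That is why two activation functions can rank differently on the two metrics.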

![Neural_Activation](TensorData/Neural_Activation/Activation.png)

All of the activation functions, except for Linear, appeared to perform well. While Sigmoid had the lowest loss, Relu & Softplus had the highest accuracy.

I chose Sigmoid because its accuracy was comparable to the other activation functions, but its loss was notably lower. The confusion matrix for this is attached below.

![Neural_Sigmoid](TensorData/Neural_Activation/sigmoid_activation.png)

I chose to stick to the **Sigmoid** Activation Function for the Basic Neural Network. For reference, the Sigmoid Function typically looks like:

![Wikipedia_Sigmoid](https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/320px-Logistic-curve.svg.png)

### Second Experiment: Testing Optimizers
I now wanted to test the **optimizers** used to train the Neural Network across these options:

- `RMSprop`
- `Adagrad`
- `Adadelta`
- `Adam`
- `Adamax`
- `Nadam`

To save on space, I won't replicate the code I used to run it, but it basically just involved me swapping out the 'range' and 'change' variables from main(). Below are the results _(disregard how the plot says 'epoch' rather than 'optimizer'. That was a formatting bug which did not affect the results)_:
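
The pattern of swapping a parameter name and a list of options can be sketched as a small helper. The names `sweep` and `build_and_score` are hypothetical and do not exist in the project; in this notebook's setting, `build_and_score` would wrap the model-building call and `models.fit_model`, while here a dummy scorer stands in so the sketch runs on its own.

```python
import numpy as np

def sweep(change, values, base_params, build_and_score, repeats=3):
    """Runs `build_and_score` `repeats` times per value and averages the metrics."""
    history = {}
    for new in values:
        losses, accs = [], []
        for _ in range(repeats):
            run_params = dict(base_params)   # copy so the defaults stay untouched
            run_params[change] = new
            metrics = build_and_score(run_params)
            losses.append(metrics['loss'])
            accs.append(metrics['acc'])
        history[new] = {'loss': float(np.mean(losses)),
                        'acc': float(np.mean(accs))}
    return history

def dummy_scorer(run_params):
    """Placeholder that returns made-up metrics instead of training a model."""
    return {'loss': len(run_params['optimizer']) / 10.0, 'acc': 0.5}

defaults = {'optimizer': 'nadam', 'epoch': 8}
results = sweep('optimizer', ['RMSprop', 'Adam', 'Nadam'], defaults, dummy_scorer)
print(results)
```

The returned dictionary has the same layout as `history_dict` in `main()`, so it could be passed to `plot.dict_trends` in the same way.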

![Neural_Optimizer](TensorData/Neural_Optimizer/optimizer.png)

The differences between each optimizer are marginal, but **Adam** optimizer had both the lowest loss & highest accuracy. Its Confusion Matrix is attached below:

![Neural_Adam](TensorData/Neural_Optimizer/Adam_optimizer.png)

### Experiment Three (and the last for Neural Networks): Epochs
An **Epoch** is a single pass of the full training data through the neural network model during the fitting process. So far, I have been using a default value of 8. More Epochs usually increase accuracy, but they run the risk of over-training (which shows up as increasing loss) and of longer runtime, and both are bad.

Here are the runtime results for the epochs I tested for each pass of the model _(technically I run each model through an epoch setting three times, so I divide the elapsed run time by three for these results)_:
* 1 Epoch: 1 Second
* 4 Epochs: 4 Seconds
* 8 Epochs: 8 Seconds
* 12 Epochs: 12 Seconds
* 16 Epochs: 17 Seconds
* 20 Epochs: 21 Seconds
* 24 Epochs: 25 Seconds

And below are the actual metrics:
![Neural_Epoch](TensorData/Neural_Epoch/epoch.png)

We can see that at around 12-16 Epochs, we hit the highest accuracy before the loss begins increasing. The confusion matrix for **16 Epochs** is attached below, although any value between 12 and 16 seems to be optimal.
![Neural_16Epoch](TensorData/Neural_Epoch/16_epoch.png)

### Neural Network Conclusions
I ultimately got 0.328 Loss & 0.882 Accuracy using a two-layer neural network trained with the Adam Optimizer, where the first layer used the Sigmoid Activation Function and the second layer used the Softmax Activation Function. This holds at both 12 and 16 Epochs, which take 12 and 17 seconds to run, respectively.

Recall that for the Machine Learning Models:
* SVC had the highest accuracy at 0.897 but needed 1:12 hours to run.
* Random Forests had a decent accuracy at 0.879 but needed 8 minutes to run.

Our model fell right in the middle with 0.882 accuracy, but only needed 12-17 seconds to run. This gives us the opportunity to add more Neural Network layers which would most likely increase accuracy.

Another way we could increase accuracy is by investigating other types of Neural Networks (or rather, Neural Network Layers) and involving them in our mix.

----
# About Convolutional Neural Networks

### Convolutional Neural Network Principles
_(All GIFs in this section are obtained from https://hackernoon.com/visualizing-parts-of-convolutional-neural-networks-using-keras-and-cats-5cc01b214e59)_

A convolutional neural network does not eliminate the layers of a regular neural network; rather, it augments them with its own layers. The CNN is primarily driven by the **Convolutional Layer**, which is effectively another way to simplify the data: it summarizes each small region of the image with a single value.

![ConvolutionalLayer](https://cdn-images-1.medium.com/max/1600/1*ZCjPUFrB6eHPRi4eyP6aaA.gif)

As seen in the image above,
* The sliding yellow window is the _Kernel_
* The _Stride_ of the kernel refers to how many 'pixels' it moves in each step
* Each kernel position applies a _Filter_. A filter is a set of weights (denoted in red text), and the weights change to accommodate what the CNN is learning. We multiply each weight by the value originally in that square and sum the products.
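
The sliding-window arithmetic can be sketched in plain NumPy for one channel and one filter. This is a teaching sketch with a fixed, made-up kernel and no padding; a real convolutional layer learns its weights and handles many filters at once, and (like most deep learning libraries) this does not flip the kernel.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Unpadded 2D convolution of one channel with one filter."""
    k_rows, k_cols = kernel.shape
    out_rows = (image.shape[0] - k_rows) // stride + 1
    out_cols = (image.shape[1] - k_cols) // stride + 1
    output = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            r, c = i * stride, j * stride
            window = image[r:r + k_rows, c:c + k_cols]
            output[i, j] = np.sum(window * kernel)  # multiply by weights, then sum
    return output

image = np.arange(16, dtype='float64').reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

print(conv2d_single(image, kernel, stride=1))   # 3x3 convolved feature
print(conv2d_single(image, kernel, stride=2))   # 2x2 convolved feature
```

A larger stride skips positions, which is why the second result is smaller than the first.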

This produces a _convolved feature_. There are further types of layers we can apply at this point to reduce the size of this convolved feature. Two of them are **Max Pooling** and **Average Pooling**, in which we create a kernel on the convolved feature and move it across separate, non-overlapping regions, selecting either the single highest value or the average of all the values within that kernel.

![Pooling](https://cdn-images-1.medium.com/max/800/1*Feiexqhmvh9xMGVVJweXhg.gif)
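
Both pooling variants can be sketched with a NumPy reshape. The 4x4 input is made up, and the window size of 2 mirrors the `nb_pool` default in params.py; this sketch assumes non-overlapping windows and simply trims any rows or columns that do not fill a whole window.

```python
import numpy as np

def pool2d(feature, size=2, mode='max'):
    """Non-overlapping pooling: the window jumps by its own size each step."""
    rows = feature.shape[0] // size
    cols = feature.shape[1] // size
    # Trims ragged edges, then splits into (rows, size, cols, size) blocks
    trimmed = feature[:rows * size, :cols * size]
    blocks = trimmed.reshape(rows, size, cols, size)
    if mode == 'max':
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

feature = np.array([[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [9., 10., 13., 14.],
                    [11., 12., 15., 16.]])

print(pool2d(feature, size=2, mode='max'))
print(pool2d(feature, size=2, mode='average'))
```

Either way, a 4x4 convolved feature shrinks to 2x2, which reduces the amount of data the later layers have to process.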

We can also create **Dropout** Layers which will temporarily turn off certain outputs while training the model, to help reduce the risk that we're overfitting the model.
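
A minimal sketch of what a Dropout layer does during training, written in NumPy. It uses the common "inverted dropout" formulation, where the surviving outputs are scaled up so their expected total stays the same; the rate of 0.1 mirrors the `dropout` default in params.py, and the all-ones input is a placeholder.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations               # nothing is dropped at prediction time
    keep_mask = rng.rand(*activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.RandomState(0)
activations = np.ones((2, 1000))
dropped = dropout(activations, rate=0.1, rng=rng)

print((dropped == 0).mean())   # roughly the dropout rate
print(dropped.mean())          # stays close to the original mean of 1
```

Because a different random subset is switched off on every pass, the network cannot rely too heavily on any single output.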

----
# Building a CNN: Part 1

### Four Parts
To reiterate, the four permutations of layers I have initially are:

1. Convolutional Layer
2. Convolutional Layer + Max Pooling
3. Convolutional Layer + Average Pooling
4. Convolutional Layer + (whichever pooling wins) + Dropout

I first wanted to begin with Convolutional Layer just to figure out my convolutional parameters & epochs before proceeding.

Recall that in the Neural Network test, we determined that 12 epochs with a Dense Layer set to these values performed the best.
* `Adam` Optimizer
* `Sigmoid` Activation Function

I'm going to carry these neural network settings into my CNN.

### Components for the Convolutional Layer
The convolutional layer has these parameters:
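* Number of Filters: the number of feature maps (output channels) the layer produces
* Kernel size: a tuple containing the height and width of the kernel. In Keras, the stride is a separate `strides` argument, which the code below leaves at its default of 1.
* And something I'll keep static is the _padding_, which controls what happens when the kernel reaches the edge of the image. With `'same'`, the image is padded so that the output keeps the input's width and height.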

In [None]:
def basic_cnn(model_params, shape):
    """ Builds basic Convolutional neural network model """
    from keras.layers import Dense, Dropout, Flatten, InputLayer, MaxPooling2D
    from keras.layers.normalization import BatchNormalization
    from keras.layers.convolutional import Conv2D
    from keras.models import Sequential

    model = Sequential()

    model.add(InputLayer(input_shape=(shape[1], shape[2], shape[3])))
    model.add(BatchNormalization())

    model.add(Conv2D(model_params['conv_filters'],
                     (model_params['nb_pool'], model_params['nb_conv']),
                     padding='same'))
    model.add(Flatten())

    model.add(Dense(model_params['dense_1'], activation=model_params['activate_1']))
    model.add(Dense(28, activation='softmax'))

    model.compile(loss=model_params['loss'],
                  optimizer=model_params['optimizer'],
                  metrics=['accuracy'])

    print(model.summary())
    return model

def main(args):
    # Loads the Fashion-MNIST dataset bundled with Keras
    (x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
    x_train, y_train, x_test, y_test = flatten_data(args, x_train, x_test, y_train, y_test)

    # Creates output directory
    if not os.path.isdir(args.out):
        os.makedirs(args.out)

    # Creates range to loop filter between
    change = 'conv_filters'
    range = [4, 14, 24, 32]
    history_dict = {x: {'loss': 0.0, 'acc': 0.0} for x in range}

    # Runs Model
    for new in range:
        history_dict[new] = {'loss': [], 'acc': []}
        for loop in [1, 2, 3]:
            print('Creating Model with the {} {}'.format(new, change))
            model_params = params.standard()
            model_params['conv_filters'] = new

            model = models.basic_cnn(model_params, x_train.shape)

            y_pred, metrics = models.fit_model(model, model_params, 
                                               x_train, y_train, x_test, y_test)

            # Adds Data to Trends
            history_dict[new]['loss'].append(metrics['loss'])
            history_dict[new]['acc'].append(metrics['acc'])

        # Calculates Average
        history_dict[new]['loss'] = np.mean(history_dict[new]['loss'])
        history_dict[new]['acc'] = np.mean(history_dict[new]['acc'])

        # Plots Confusion Matrix
        classes = {0: 'T-Shirt/top',
                   1: 'Trouser',
                   2: 'Pullover',
                   3: 'Dress',
                   4: 'Coat',
                   5: 'Sandal',
                   6: 'Shirt',
                   7: 'Sneaker',
                   8: 'Bag',
                   9: 'Ankle boot'}
        class_values = list(classes.values())
        title = "{} (Loss {} & Acc {})".format(new, metrics['loss'], metrics['acc'])
        conf_png = '{}/{}_{}.png'.format(args.out, new, change)
        plot.conf_matrix(y_test, y_pred, class_values, out=conf_png, title=title)

    # Plots Accuracy & Loss Trends
    trends_png = '{}/{}.png'.format(args.out, change)
    plot.dict_trends(history_dict, xlabel=change, out=trends_png)

    return x_train, y_train, x_test, y_test, y_pred

main(ARGS)

### Experiment 1: Testing Number of Filters (Output Dimensionality)
I wanted to modify the dimensionality of my output filters first, just to get a general idea of where in the world my output filter quantity should be.

Let's first keep in mind that each image is a 28 x 28 array. If I have a kernel that is 2x2, sliding at a stride of 2 pixels, this would give me a theoretical ideal output of 14x14.
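
The 14x14 figure follows from the standard output-size formula for a sliding window. Here is a small sketch of that arithmetic; `conv_output_size` is a hypothetical helper, not part of Keras. Note that `basic_cnn` above uses `padding='same'` and does not set a stride, so its convolved feature stays 28x28.

```python
import math

def conv_output_size(n, kernel, stride=1, padding='valid'):
    """Output width (or height) for an input of width n."""
    if padding == 'same':
        # Input is padded so that only the stride shrinks the output
        return math.ceil(n / stride)
    # 'valid': the kernel must fit entirely inside the input
    return (n - kernel) // stride + 1

print(conv_output_size(28, 2, stride=2))                   # → 14
print(conv_output_size(28, 2, stride=1, padding='same'))   # → 28
```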

I chose to initially loop between these values of filters, with special emphasis around '14' filters: [4, 8, 12, 14, 16, 20, 24, 28, 32]. The results are attached:

![CNN_Filters](TensorData/CNN_FilterTake2/conv_filters.png)

Using 14 filters gave me the best results and its confusion matrix is attached below:
![CNN_14Filters](TensorData/CNN_FilterTake2/14_conv_filters.png)

Something interesting was that when I tested 4 filters, I had runtimes of 48 seconds, but anything greater than that gave me a 50% runtime reduction. In hindsight, this isn't that surprising since there is less data reduction needed at the higher values, but I didn't expect the runtime to drop this dramatically and then plateau even as I added more filters.

Recall that the basic neural network had an Accuracy of 0.8821 & Loss of 0.3279. By adding the convolutional layer, albeit in a very basic state, we were able to improve those two values slightly.

### Experiment 2: Testing Kernel Sizes
The next parameter to modify was the kernel size which would slide through the image. Because this is a 28x28 image, I wanted to test sizes which presumably would make mathematical sense for such an image. As a result, I tested with [1, 2, 3, 4, 5, 6, 7]. 

![CNN_Kernel](TensorData/CNN_KernelSize/kernel_size.png)

_While the kernel size of 7 isn't plotted above, it was tested, and its results were worse than kernel size 6._

The Kernel Size of 2 had the greatest accuracy with lowest loss as noted below:
![CNN_2Kernel](TensorData/CNN_KernelSize/2_kernel_size.png)

Runtime grew linearly with kernel size: each epoch took as many seconds as the kernel size. For example, kernel size 1 had a runtime of 12 Epochs x 1 Second = 12 Seconds, and kernel size 2 had a runtime of 12 Epochs x 2 Seconds = 24 Seconds.