# Introduction
The MNIST dataset is regarded as the "Hello World!" of a data scientist's journey into machine learning, especially when using neural networks for classification problems. It is an extremely well pre-processed and labeled dataset of approximately 70 000 handwritten digits ranging from 0-9. Credit for the dataset goes to Yann LeCun (director of AI research at Facebook), Corinna Cortes and Christopher Burges (yann.lecun.com).



We will use the Adam optimiser for backpropagation and tweak the learning rate as a hyper-parameter. The loss will be sparse categorical cross-entropy (`sparse_categorical_crossentropy`), which takes the integer digit labels directly, so the targets do not need to be 1-hot encoded.



# Problem
This is a visual problem, so we can intuitively assess the accuracy of the model in addition to quantifying it through scores and probabilities. The images were originally stored as 28x28 pixel images in grayscale.

# Goal
The objective is to create a neural network model that accurately predicts which digit is represented in each image. The technique used is supervised machine learning, since we have labels associated with the data. We will try different configurations of the neural net to get the accuracy as high as possible.

# Method
## Input data
The images were originally stored as 28x28 pixel data. This is mathematically equivalent to a 28x28 matrix with values ranging from 0-255 (the numeric values corresponding to different shades of gray from black to white) and is how we expect to load the data. A likely approach to handle this will be to “flatten” the matrix into a vector of size 1x784 (the input layer shape of the neural net). Each input neuron would therefore represent the intensity of the grey for a specific pixel in the image. 
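As a minimal sketch of the flattening step, the plain-Python snippet below uses a tiny 2x3 grid in place of a real 28x28 image. The helper name `flatten_image` and the pixel values are made up for illustration and are not part of the notebook, which relies on a Keras `Flatten` layer instead.

```python
def flatten_image(image):
    """Flatten a 2-D grid of pixel values, row by row, into one flat list."""
    return [pixel for row in image for pixel in row]

# a tiny 2x3 "image" standing in for a 28x28 one (values are made up)
tiny_image = [[0, 128, 255],
              [64, 32, 16]]
flat = flatten_image(tiny_image)
print(flat)

# a blank 28x28 image flattens to 28 * 28 = 784 values, one per input neuron
blank = [[0] * 28 for _ in range(28)]
print(len(flatten_image(blank)))
```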

## Forward propagation in hidden layers
Each neuron in the following layer computes a weighted sum of its inputs (a dot product) plus a bias. The result is then transformed non-linearly by an activation function before being propagated forward into the next layer. This is repeated for each of the n hidden layers.
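The forward pass through one fully connected layer can be sketched in plain Python as follows. This is an illustration only: the names `relu` and `dense_forward` and all weights, biases and inputs are hypothetical, and the notebook itself delegates this computation to `tf.keras.layers.Dense`.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clamps negatives to zero."""
    return max(0.0, z)

def dense_forward(inputs, weights, biases, activation):
    """Forward pass of one dense layer; weights[j] holds the weights feeding neuron j."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # linear part: dot product of the inputs with this neuron's weights, plus its bias
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        # non-linear part: the activation function
        outputs.append(activation(z))
    return outputs

inputs = [1.0, 2.0, 3.0]            # made-up input values
weights = [[0.5, -1.0, 0.5],        # weights into neuron 0
           [-1.0, 0.0, 0.0]]        # weights into neuron 1
biases = [0.5, 0.5]
hidden = dense_forward(inputs, weights, biases, relu)
print(hidden)
```

Stacking n hidden layers amounts to feeding `hidden` back into `dense_forward` with the next layer's weights and biases.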

## Output data
The final activation function will be a Softmax function, so the output is a vector of size 1x10 holding one probability per digit. The predicted digit is the class with the highest probability, which amounts to 1-hot encoding the largest entry of that vector.
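To make the role of Softmax concrete, here is a small sketch in plain Python. It turns ten raw output values into probabilities that sum to 1 and picks the most likely digit. The `logits` values are invented for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# ten made-up raw outputs, one per digit 0-9
logits = [1.0, 3.0, 0.5, 0.0, -1.0, 2.0, 0.0, 0.0, 0.0, 0.0]
probabilities = softmax(logits)

# the predicted digit is the index of the largest probability
predicted_digit = probabilities.index(max(probabilities))
print(predicted_digit, round(sum(probabilities), 6))
```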

## Back propagation
The Adaptive Moment Estimation (Adam) optimiser will be used with different learning rates. The loss will be determined using sparse categorical cross-entropy, since the targets are integer labels (0-9) rather than 1-hot encoded vectors.
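The difference between the sparse and the 1-hot form of categorical cross-entropy is only in how the target is supplied. The sketch below, in plain Python with made-up probabilities for a three-class problem, computes the loss for a single example both ways. The function names mirror the Keras loss names but are local illustrations, not the library implementations.

```python
import math

def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one example when the target is an integer class index."""
    return -math.log(probabilities[label])

def categorical_crossentropy(one_hot, probabilities):
    """Loss for one example when the target is a 1-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities) if t > 0)

probabilities = [0.1, 0.7, 0.2]   # made-up model output for 3 classes
label = 1                         # integer target
one_hot = [0, 1, 0]               # the same target, 1-hot encoded

sparse_loss = sparse_categorical_crossentropy(label, probabilities)
dense_loss = categorical_crossentropy(one_hot, probabilities)
print(sparse_loss, dense_loss)
```

Both calls give the same value; the sparse form simply skips building the 1-hot vector.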

## Steps
1. Import packages
2. Helper functions
3. Load the data
4. Additional preprocessing
    1. Determine validation size
    2. Standardise or scale the data
    3. Shuffle the data
    4. Split into training, validation and test sets
5. Outline the model(s)
6. Train the model(s)
7. Select the best model
8. Test the model

### Import packages

In [2]:
#table and array packages
import pandas as pd
import numpy as np

# neural net package
import tensorflow as tf

# dataset
import tensorflow_datasets as tfds

### Helper Functions

In [3]:
# a helper function to cast a value into tf.int64
def cast_to_tf_integer(x):
    # tf.cast accepts Python numbers and tensors alike and leaves a tensor
    # that is already tf.int64 unchanged, so no type check is needed
    return tf.cast(x, tf.int64)

# a helper function to cast a value into tf.float32
def cast_to_tf_float(x):
    # same idea as above, targeting tf.float32
    return tf.cast(x, tf.float32)

# a scaling function: maps pixel intensities from [0, 255] to [0, 1]
## scaled = image / 255
def percent_scale(x, label):
    # cast to a tf.float32 so the division is done in floating point
    x = cast_to_tf_float(x)
    # scale by dividing by the max possible pixel value
    x /= 255.
    return x, label

### Load the data

In [4]:
# Load the data for supervised learning into a variable and extract the data information. 
raw_data, data_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

print(f"#####################\nDATA INFO\n#####################{data_info}\n#####################")

#####################
DATA INFO
#####################tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='C:\\Users\\pjjvn\\tensorflow_datasets\\mnist\\3.0.1',
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={

### Additional Preprocessing
#### Determine Validation size

In [5]:
# extract the training and test sets
train_set, test_set = raw_data['train'], raw_data['test']

# create a validation set from the train data since it is sufficiently large
# use 10% of the train set size
validation_size = 0.1 * data_info.splits['train'].num_examples

# ensure the sizes of the train, validation and test sets are tf.int64
train_size = cast_to_tf_integer(data_info.splits['train'].num_examples)
validation_size = cast_to_tf_integer(validation_size)
test_size = cast_to_tf_integer(data_info.splits['test'].num_examples)

#### Standardise or scale the data

In [6]:
# scale using a custom function
scaled_train_set = train_set.map(percent_scale)
scaled_test_set = test_set.map(percent_scale)

#### Shuffle the train data

In [7]:
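# buffer size
BUFFER_SIZE = 10000

# shuffle once and keep that order on every pass: the validation and training
# sets are carved out of this dataset with take/skip below, and reshuffling on
# each iteration (the default) would let validation examples drift into training
scaled_shuffled_train_set = scaled_train_set.shuffle(
    buffer_size=BUFFER_SIZE,
    seed=None,
    reshuffle_each_iteration=False)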

#### Split into training, validation and testing sets

In [35]:
scaled_shuffled_train_set

<ShuffleDataset shapes: ((28, 28, 1), ()), types: (tf.float32, tf.int64)>

In [8]:
# extract the validation set
validation_set = scaled_shuffled_train_set.take(validation_size)

# exclude the validation set from the training set
training_set = scaled_shuffled_train_set.skip(validation_size)

# set the new training size and ensure it is a tf.int64
training_size = len(training_set)
training_size = cast_to_tf_integer(training_size)

# set the testing_set
testing_set = scaled_test_set
testing_size = len(testing_set)

print(f"#####################\nDATA SIZES\n#####################\nTraining Size: {training_size}\nValidation Size: {validation_size}\nTesting Size: {testing_size}\n#####################")


#####################
DATA SIZES
#####################
Training Size: 54000
Validation Size: 6000
Testing Size: 10000
#####################
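A split made with `take` and `skip` stays disjoint only if the shuffled order is identical on every pass over the data. The sketch below simulates this in plain Python rather than with `tf.data`, assuming that a reshuffling dataset draws a fresh order on each pass (the TensorFlow default for `reshuffle_each_iteration`). The 20-example dataset, the seed and the helper `training_pass` are all hypothetical.

```python
import random

examples = list(range(20))   # stand-in for the 60 000 training examples
validation_size = 2          # 10% of the examples
rng = random.Random(0)

# shuffle once; the first entries play the role of take(validation_size)
fixed_order = examples[:]
rng.shuffle(fixed_order)
validation = set(fixed_order[:validation_size])

def training_pass(reshuffle):
    """Return the examples that skip(validation_size) yields on one pass."""
    order = fixed_order[:]
    if reshuffle:
        rng.shuffle(order)   # a fresh order on every pass
    return set(order[validation_size:])

# count validation examples that show up in each of five training passes
leaks_fixed = [len(validation & training_pass(reshuffle=False)) for _ in range(5)]
leaks_reshuffled = [len(validation & training_pass(reshuffle=True)) for _ in range(5)]
print("fixed order:", leaks_fixed)
print("reshuffled: ", leaks_reshuffled)
```

With a fixed order the overlap is always zero. With reshuffling, validation examples can appear in the training passes, which makes the validation score look better than it should.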


In [34]:
training_set

<SkipDataset shapes: ((28, 28, 1), ()), types: (tf.float32, tf.int64)>

#### Batch the data

In [9]:
# set the batch size of the training set
BATCH_SIZE = 100

# batch the training set
batched_training_set = training_set.batch(BATCH_SIZE)

# the model expects the validation in batch form. We need the entire set per
# batch so the batch length should be the length of the set to get 1 batch
batched_validation_set = validation_set.batch(validation_size)

# the model expects the test_set in batch form. We need the entire set per
# batch so the batch length should be the length of the set to get 1 batch
batched_testing_set = testing_set.batch(testing_size)
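Batching simply groups consecutive examples. As a rough sketch in plain Python (the generator `batches` is a hypothetical stand-in for `Dataset.batch`), a training set of 54000 examples with a batch size of 100 yields 540 batches per epoch, which matches the `540/540` step counter in the training log.

```python
import math

def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

training_size = 54000
BATCH_SIZE = 100
steps_per_epoch = math.ceil(training_size / BATCH_SIZE)
print(steps_per_epoch)

# a tiny example: the last batch is smaller when the sizes do not divide evenly
small_batches = list(batches(list(range(7)), 3))
print(small_batches)
```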

#### Separate inputs and targets of the validation set

In [10]:
# iter() creates an iterator over the dataset without loading all the data up front
# next() fetches the next item from that iterator; here that is the single validation batch
validation_inputs, validation_targets = next(iter(batched_validation_set))
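The `next(iter(...))` idiom works on any Python iterable, not just TensorFlow datasets. A small sketch with a plain list of made-up (inputs, targets) pairs:

```python
# two made-up batches, each an (inputs, targets) pair
example_batches = [("inputs_0", "targets_0"), ("inputs_1", "targets_1")]

# iter() builds an iterator; next() pulls one item from it
iterator = iter(example_batches)
first_inputs, first_targets = next(iterator)
second_inputs, second_targets = next(iterator)
print(first_inputs, first_targets)

# once the iterator is exhausted, next() raises StopIteration
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
```

Because the validation set was batched into a single batch, one call to `next` is enough to retrieve all of its inputs and targets.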

### Outline the model(s)
#### Input layer
The input is a 28x28x1 tensor (rank 3) that is flattened into a vector of 784 values.

In [12]:
# define the length
input_length = 784

#### Hidden layers
We can use d hidden layers as the depth and w neurons for the width of each layer. 

In [13]:
# lists of hidden layer widths and depths; each (width, depth) pair defines one model, giving 2 models here
hidden_layer_widths = [50,200]
hidden_layer_depths = [2]

#### Output layer
The output is a 10x1 vector of class probabilities, one per digit.

In [14]:
# define the length
output_length = 10

#### Optimizer
The optimizer will be an Adam optimizer with an adjustable learning rate.

In [15]:
# define the optimizer
LEARNING_RATE = 0.001
adam_optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
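For intuition about what the optimizer does with the learning rate, the sketch below implements the textbook Adam update rule for a single scalar parameter in plain Python. It is an illustration under assumptions: the function `adam_step`, the toy objective f(theta) = theta^2 and the starting value are hypothetical, and the Keras implementation may differ in details such as how epsilon is applied.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step number."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of the gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of the squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# toy problem: minimise f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 1.0, 0.0, 0.0
theta_after_first_step = None
for t in range(1, 101):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
    if t == 1:
        theta_after_first_step = theta
print(theta_after_first_step, theta)
```

On the first step the parameter moves by almost exactly the learning rate, regardless of the gradient's magnitude, which is why the learning rate is a natural hyper-parameter to tune.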

#### Max Epochs

In [16]:
MAX_EPOCHS = 100

#### Early Stop
Early stopping will be used to prevent overfitting.

In [17]:
# define the early stopping
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    min_delta=0,
    patience=1,
    verbose=0,
    mode='min',
    restore_best_weights=True)
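The logic behind early stopping on the validation loss can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras source: the function `early_stopping_epoch` is hypothetical and the validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience=1, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # improvement: remember this epoch and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            # no improvement: stop once patience is used up
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

val_losses = [0.30, 0.20, 0.15, 0.16, 0.14]   # made-up values
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=1)
print(stop_epoch, best_epoch)

# a larger patience tolerates the temporary rise at epoch 4
patient_stop, patient_best = early_stopping_epoch(val_losses, patience=2)
print(patient_stop, patient_best)
```

With `patience=1`, training stops at the first epoch whose validation loss fails to improve. With `restore_best_weights=True`, the weights from the best epoch are the ones kept.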

#### Model constructs

In [29]:


# Create the model constructs
def construct(output_length, optimizer='adam', mode='sequential', flatten_input=True, list_of_widths=(50,), list_of_depths=(2,)):
    # maps a "<width>x<depth>" label to its compiled model
    models = {}
    # if mode is sequential then
    if mode.lower() == 'sequential':
        # for each width
        for w in list_of_widths:
            # for each depth
            for d in list_of_depths:
                # create a label for the dictionary
                label = f"{w}x{d}"
                print(f"Constructing a {label} model")
                # instantiate the model object
                if flatten_input:
                    model = tf.keras.Sequential([
                        # input layer: flatten the 28x28x1 tensor into a vector
                        # using TensorFlow's built-in Flatten layer
                        tf.keras.layers.Flatten(input_shape=(28, 28, 1))])
                else:
                    model = tf.keras.Sequential()
                # add d hidden layers of width w
                for _ in range(d):
                    model.add(tf.keras.layers.Dense(w, activation='relu'))
                # add the output layer
                model.add(tf.keras.layers.Dense(output_length, activation='softmax'))
                # choose the optimizer and loss functions
                # note: an optimizer instance passed in is shared by every model built in this call
                model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
                # save in the dictionary
                models[label] = model
    return models

model_dict = construct(output_length=output_length, optimizer=adam_optimizer, mode='sequential', flatten_input=True, list_of_widths=hidden_layer_widths, list_of_depths=hidden_layer_depths)


Constructing a 50x2 model
Constructing a 200x2 model
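To get a feel for how much larger the 200x2 model is than the 50x2 model, the number of trainable parameters of such a fully connected network can be worked out by hand. The sketch below does the arithmetic in plain Python; `dense_parameter_count` is a hypothetical helper, and it assumes every layer is a dense layer with one bias per neuron, as in the models constructed above.

```python
def dense_parameter_count(input_length, width, depth, output_length):
    """Count weights and biases of a network with `depth` hidden layers of `width` neurons."""
    total = 0
    previous = input_length
    for _ in range(depth):
        total += previous * width + width   # weights plus one bias per neuron
        previous = width
    total += previous * output_length + output_length   # output layer
    return total

small_model = dense_parameter_count(784, 50, 2, 10)
large_model = dense_parameter_count(784, 200, 2, 10)
print(f"50x2:  {small_model} parameters")
print(f"200x2: {large_model} parameters")
```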


### Fit the models - training

In [30]:
print(batched_training_set)
print(validation_set)
print(batched_testing_set)

<BatchDataset shapes: ((None, 28, 28, 1), (None,)), types: (tf.float32, tf.int64)>
<TakeDataset shapes: ((28, 28, 1), ()), types: (tf.float32, tf.int64)>
<BatchDataset shapes: ((None, 28, 28, 1), (None,)), types: (tf.float32, tf.int64)>


In [31]:
model_dict

{'50x2': <tensorflow.python.keras.engine.sequential.Sequential at 0x26d726ab6d0>,
 '200x2': <tensorflow.python.keras.engine.sequential.Sequential at 0x26d726b49a0>}

In [32]:
# for each model in the dictionary
for label in model_dict:
    # fit the model
    print(f"#####################\n{label} model\n#####################")
    model_dict[label].fit(
        batched_training_set,
        epochs=MAX_EPOCHS,
        callbacks=[early_stop],
        validation_data=(validation_inputs,validation_targets),
        verbose=2)
    print("\n")
    

#####################
50x2 model
#####################
Epoch 1/100
540/540 - 2s - loss: 0.2808 - accuracy: 0.9163 - val_loss: 0.1683 - val_accuracy: 0.9485
Epoch 2/100
540/540 - 1s - loss: 0.1447 - accuracy: 0.9565 - val_loss: 0.1342 - val_accuracy: 0.9597
Epoch 3/100
540/540 - 1s - loss: 0.1159 - accuracy: 0.9655 - val_loss: 0.1054 - val_accuracy: 0.9668
Epoch 4/100
540/540 - 1s - loss: 0.0953 - accuracy: 0.9713 - val_loss: 0.0956 - val_accuracy: 0.9712
Epoch 5/100
540/540 - 1s - loss: 0.0838 - accuracy: 0.9747 - val_loss: 0.0889 - val_accuracy: 0.9748
Epoch 6/100
540/540 - 1s - loss: 0.0737 - accuracy: 0.9774 - val_loss: 0.0745 - val_accuracy: 0.9793
Epoch 7/100
540/540 - 1s - loss: 0.0632 - accuracy: 0.9803 - val_loss: 0.0720 - val_accuracy: 0.9783
Epoch 8/100
540/540 - 1s - loss: 0.0566 - accuracy: 0.9831 - val_loss: 0.0632 - val_accuracy: 0.9788
Epoch 9/100
540/540 - 1s - loss: 0.0517 - accuracy: 0.9841 - val_loss: 0.0623 - val_accuracy: 0.9817
Epoch 10/100
540/540 - 1s - loss: 0.

### Test the models

In [33]:
import os

# the trained weights are saved in the current working directory
working_directory = os.getcwd()

# for each model in the dictionary
for label in model_dict:
    # extract the model
    model = model_dict[label]
    # test the model
    print(f"#####################\n{label} model\n#####################")
    test_loss, test_accuracy = model.evaluate(batched_testing_set)
    print(f"Test Loss: {test_loss}\nTest Accuracy: {test_accuracy}")
    # save the trained weights (not the full model)
    print(working_directory)
    model.save_weights(os.path.join(working_directory, f'{label}_model_trained'))

#####################
50x2 model
#####################
Test Loss: 0.09716800600290298
Test Accuracy: 0.973800003528595
C:\Users\pjjvn\Google Drive\Programming\MNIST_analysis
#####################
200x2 model
#####################
Test Loss: 0.06730654090642929
Test Accuracy: 0.9796000123023987
C:\Users\pjjvn\Google Drive\Programming\MNIST_analysis
