# Assignment 1 

This notebook contains the code to train and evaluate a deep learning classifier on the MNIST dataset. It uses the Keras framework, the `build_deep_nn` function from `a1.py`, and `keras_tuner` to search for the optimal hyperparameters of the neural network.

In [1]:
import tensorflow as tf
tf.random.set_seed(42)

import random
random.seed(42)

import numpy as np
np.random.seed(42)

tf.config.experimental.list_physical_devices()

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

In [2]:
import keras
keras.__version__

'2.10.0'

## Data Loading and Preprocessing

First we import the MNIST dataset from `keras.datasets` and load it with `mnist.load_data()`. MNIST is already split into a training set and a test set, so `load_data()` returns four NumPy arrays: the training images and labels, and the test images and labels. We then scale the pixel values from the integer range [0, 255] to floats in [0, 1].

In [3]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.astype('float32') / 255  # scale training pixels from [0, 255] to [0, 1]

test_images = test_images.astype('float32') / 255  # scale test pixels from [0, 255] to [0, 1]

train_images.shape  # (60000, 28, 28); a notebook cell only displays its last expression
test_images.shape


(10000, 28, 28)

The `train_images` and `train_labels` arrays form the training set, whereas `test_images` and `test_labels` form the test set.
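The scaling step above can be checked without downloading MNIST. This is a minimal NumPy sketch, assuming a small random stand-in array (`fake_images`, a hypothetical name) with the same `uint8` dtype and 28 × 28 image shape that `mnist.load_data()` returns:

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 grayscale images with uint8 pixels in [0, 255]
rng = np.random.default_rng(42)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Same preprocessing as the notebook: cast to float32, then divide by 255
scaled = fake_images.astype('float32') / 255

print(scaled.dtype, scaled.shape)  # float32 (3, 28, 28)
print(scaled.min(), scaled.max())  # both lie within [0, 1]
```

Casting to `float32` before dividing keeps the result in single precision, which is the dtype Keras layers use by default.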

---

## Model Building 

With the training and test data loaded, we can now construct the neural network. Its configuration follows from the shape of the MNIST data and from the hyperparameter ranges described below.


From the output above, each image is 28 × 28, so the number of input rows and columns is `28`. MNIST images are grayscale, so the number of channels is `1`. For `hidden_sizes`, the tuner searches from a minimum of `32` to a maximum of `512` neurons in steps of `32`.
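A quick check of the arithmetic in this paragraph in plain Python. The names are illustrative, and the list of hidden sizes assumes that `hp.Int` treats `max_value` as inclusive, as the Keras Tuner documentation describes:

```python
# Size of the flattened input seen by the first Dense layer (28 x 28 x 1 image)
input_rows, input_cols, channels = 28, 28, 1
flat_size = input_rows * input_cols * channels
print(flat_size)  # 784

# Values hp.Int('hidden_size', min_value=32, max_value=512, step=32) can take:
# every multiple of 32 from 32 up to and including 512
hidden_size_choices = list(range(32, 512 + 1, 32))
print(len(hidden_size_choices), hidden_size_choices[0], hidden_size_choices[-1])  # 16 32 512
```

The value 784 matches the `(None, 784)` output shape of the `Flatten` layer in the model summaries later in the notebook.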

Each trial uses between 1 and 3 hidden layers, all with the same number of neurons, as set by ``hp.Int('num_hidden', min_value=1, max_value=3)``.

For the dropout rate, the tuner searches from `0` to `0.5` in steps of `0.1`, and the same sampled rate is passed for every hidden layer.
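To make the dropout rate concrete, the sketch below implements inverted dropout in NumPy. Keras's `Dropout` layer documents the same behaviour (zero each input with probability `rate` during training and scale the survivors by 1 / (1 - rate)), but the function `inverted_dropout` is a hypothetical illustration, not Keras's implementation:

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero each unit with probability `rate` and rescale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x  # at inference time dropout is the identity
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

# The six dropout rates in the search space: 0.0, 0.1, ..., 0.5
dropout_choices = [round(0.1 * i, 1) for i in range(6)]

rng = np.random.default_rng(0)
activations = np.ones(100_000)
dropped = inverted_dropout(activations, rate=0.1, rng=rng)

# Roughly 10% of the units are zeroed, while the mean activation stays close to 1
print((dropped == 0).mean(), dropped.mean())
```

The rescaling keeps the expected activation unchanged, so the same weights can be used at test time without dropout.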

I have also used the ``sparse_categorical_crossentropy`` loss function so that the model can be trained on the integer labels directly, without one-hot encoding them first.
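The sketch below shows why no one-hot encoding is needed: sparse categorical cross-entropy on integer labels gives the same value as categorical cross-entropy on one-hot labels. The probabilities are made-up numbers for 3 samples and 4 classes, not outputs of the notebook's model:

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over 4 classes (each row sums to 1)
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.05, 0.05, 0.1, 0.8]])
int_labels = np.array([0, 1, 3])  # integer labels, like MNIST's train_labels

# Sparse version: index the predicted probability of the true class directly
sparse_ce = -np.log(probs[np.arange(len(int_labels)), int_labels]).mean()

# Dense version: one-hot encode the labels first, then sum over the classes
one_hot = np.eye(probs.shape[1])[int_labels]
dense_ce = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_ce, dense_ce)  # the two values agree
```

The sparse form simply skips building the one-hot matrix, which saves memory when there are many classes.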

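I chose the ``adam`` optimizer because it combines the ideas behind Momentum and RMSProp: it keeps a running average of past gradients (like Momentum) and scales each parameter's step by a running average of squared gradients (like RMSProp), so the effective learning rate adapts per parameter. In practice this often lets it reach a good solution in fewer epochs than plain stochastic gradient descent.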
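The following is a minimal NumPy sketch of the textbook Adam update (Kingma and Ba) applied to the one-dimensional function f(x) = (x - 3)^2. The function `adam_minimize` and the learning rate of 0.1 are illustrative choices, not Keras's internal implementation; Keras's `Adam` defaults to `learning_rate=0.001`:

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Minimize a 1-D function with the textbook Adam update, given its gradient."""
    x, m, v = float(x0), 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # Momentum-style running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # RMSProp-style running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)  # step scaled per parameter
    return x

# f(x) = (x - 3)^2 has gradient 2(x - 3) and its minimum at x = 3
x_final = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_final)
```

Dividing by the square root of `v_hat` makes the first steps roughly `lr` in size regardless of how large the raw gradient is, which is the adaptive behaviour described above.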

In [4]:
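from keras_tuner import RandomSearch
import a1

## This code was adapted from three sources: https://www.tensorflow.org/tutorials/keras/keras_tuner, https://www.analyticsvidhya.com/blog/2021/08/hyperparameter-tuning-of-neural-networks-using-keras-tuner/ and https://chat.openai.com/

def build_model(hp):
    # Build the Sequential model with build_deep_nn from a1.py, passing in order:
    # input rows, input columns, channels, number of hidden layers, hidden layer sizes,
    # dropout rates, number of output classes and output activation
    model = a1.build_deep_nn(
        28,
        28,
        1,
        hp.Int('num_hidden', min_value=1, max_value=3),
        # One sampled size and one sampled rate, repeated for each of up to 3 hidden layers
        (hp.Int('hidden_size', min_value=32, max_value=512, step=32),) * 3,
        (hp.Float('dropout_rate', min_value=0.0, max_value=0.5, step=0.1),) * 3,
        10,
        'softmax'
    )

    # Adam optimizer; sparse categorical cross-entropy accepts the integer labels directly
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Train for 5 epochs every time a model is built (this also happens in each tuner
    # trial, before tuner.search runs its own fit) and print the returned History object
    print(model.fit(train_images, train_labels, epochs=5, validation_split=0.1))

    # Report loss and accuracy on the test set
    evaluation = model.evaluate(test_images, test_labels, return_dict=True)
    print("\nEvaluation Metrics:")
    for metric, value in evaluation.items():
        print(f"{metric}: {value}")

    print("\nModel Summary:")
    model.summary()  # summary() prints the table itself and returns None
    return model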



Now that the model-building function is defined, we can use it to tune the hyperparameters and answer two questions:
1. What are the hyperparameters of the optimal model?
2. What are the accuracy results of the optimal model on the test set?

## Hyperparameter Tuning

I will now use `RandomSearch` from `keras_tuner` to tune the hyperparameters of the model above. It searches the hyperparameter space by sampling combinations at random and keeps track of the best-performing configuration.

The `tuner.search` method performs the following steps for each trial:
1.  Builds a model by calling `build_model` with the sampled hyperparameters.
2.  Trains the model on the training data for the specified number of epochs.
3.  Evaluates the model on the validation split (the last 10% of the training data, from `validation_split=0.1`).
4.  Records the trial's score for the objective and updates the best configuration found so far.
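The four steps above can be sketched in pure Python as a sample, evaluate, keep-best loop over the same search space. This is not how Keras Tuner is implemented internally, and `toy_score` is a hypothetical stand-in for actually training a network and reading its accuracy:

```python
import itertools
import random

# The search space used by build_model, with its values listed explicitly
num_hidden_choices = [1, 2, 3]
hidden_size_choices = list(range(32, 513, 32))
dropout_choices = [round(0.1 * i, 1) for i in range(6)]
search_space = list(itertools.product(num_hidden_choices, hidden_size_choices, dropout_choices))

def toy_score(num_hidden, hidden_size, dropout_rate):
    """Hypothetical stand-in for 'train the model and report its accuracy'."""
    return hidden_size / 512 - 0.1 * dropout_rate - 0.05 * (num_hidden - 1)

def random_search(space, score_fn, max_trials, seed):
    rng = random.Random(seed)
    best_config, best_score = None, float('-inf')
    tried = []
    for config in rng.sample(space, max_trials):  # 1. sample distinct configurations
        score = score_fn(*config)                 # 2-3. "train" and evaluate
        tried.append(config)
        if score > best_score:                    # 4. keep the best result so far
            best_config, best_score = config, score
    return best_config, best_score, tried

best_config, best_score, tried = random_search(search_space, toy_score, max_trials=5, seed=42)
print(len(search_space))  # 288
print(best_config, best_score)
```

With 3 × 16 × 6 = 288 possible combinations, `max_trials=5` examines fewer than 2% of them, so random search trades coverage for speed.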

In [5]:
# This tuner was adapted from https://www.tensorflow.org/tutorials/keras/keras_tuner

tuner = RandomSearch(
    # Function that builds the model for each trial
    build_model,

    # Metric to maximize; 'accuracy' is the training accuracy reported by fit
    objective='accuracy',

    # Maximum number of hyperparameter combinations to try
    max_trials=5,

    # Seed for the random sampling of hyperparameters
    seed=42
)

tuner.search(train_images, train_labels, epochs=5, validation_split=0.1)


Trial 5 Complete [00h 00m 37s]
accuracy: 0.9322592616081238

Best accuracy So Far: 0.9944815039634705
Total elapsed time: 00h 03m 23s


## Model Training and Evaluation

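In this section, I retrieve the best hyperparameters found by Keras Tuner, build a model with them, train it, and evaluate it on the test set. Because `build_model` itself trains for 5 epochs and prints test metrics, the output below shows those metrics; the following `model.fit` call then continues training for 10 more epochs before the final evaluation.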

In [6]:
# Get the best hyperparameters found by the search
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

# Build a model with the best hyperparameters (build_model also trains it for 5 epochs),
# then continue training it for 10 more epochs
model = build_model(best_hps)
model.fit(train_images, train_labels, epochs=10, validation_split=0.1)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)

# Read back the individual hyperparameter values
num_hidden = best_hps.get('num_hidden')
hidden_size = best_hps.get('hidden_size')
dropout_rate = best_hps.get('dropout_rate')


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
<keras.callbacks.History object at 0x00000224E0B77160>

Evaluation Metrics:
loss: 0.0683172270655632
accuracy: 0.9793999791145325

Model Summary:
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_1 (Flatten)         (None, 784)               0         
                                                                 
 dense_4 (Dense)             (None, 480)               376800    
                                                                 
 dropout_3 (Dropout)         (None, 480)               0         
                                                                 
 dense_5 (Dense)             (None, 10)                4810      
                                                                 
Total params: 381,610
Trainable params: 381,610
Non-trainable params: 0
__________________________________________________
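The parameter counts in the summary above can be reproduced by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit, while `Flatten` and `Dropout` have no parameters. A small sketch (the helper `dense_params` is a hypothetical name):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 480)  # dense_4: 784 inputs -> 480 units
output = dense_params(480, 10)       # dense_5: 480 inputs -> 10 classes
print(hidden, output, hidden + output)  # 376800 4810 381610
```

Almost all of the parameters sit in the first hidden layer, because every one of the 784 input pixels connects to each of the 480 hidden units.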

In [7]:
print("Test accuracy:", test_acc)
print("\nBest hyperparameters: ")

print("num_hidden: ", num_hidden) 
print("hidden_size: ", hidden_size) 
print("dropout_rate: ", dropout_rate)

Test accuracy: 0.9817000031471252

Best hyperparameters: 
num_hidden:  1
hidden_size:  480
dropout_rate:  0.1


The output above lists the best hyperparameters and the test accuracy of the model trained in the previous cell. The cell below builds a fresh model with the same hyperparameters; its test accuracy appears under Evaluation Metrics.

In [8]:
best_model = build_model(best_hps)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
<keras.callbacks.History object at 0x00000224E9E18730>

Evaluation Metrics:
loss: 0.06653013825416565
accuracy: 0.9818000197410583

Model Summary:
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_2 (Flatten)         (None, 784)               0         
                                                                 
 dense_6 (Dense)             (None, 480)               376800    
                                                                 
 dropout_4 (Dropout)         (None, 480)               0         
                                                                 
 dense_7 (Dense)             (None, 10)                4810      
                                                                 
Total params: 381,610
Trainable params: 381,610
Non-trainable params: 0
_________________________________________________

## Results and Discussion

### Hyperparameters of the optimal model are:

- Number of hidden layers: 1

- Size of the hidden layers: 480

- Dropout rate of the final hidden layer: 0.1

### Accuracy results:

- Test accuracy of the freshly built model (5 epochs of training): 0.9818 $\approx$ 98.2% (the model trained for 5 + 10 epochs reached 0.9817)
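Since the test set holds 10,000 images (the `(10000, 28, 28)` shape printed earlier), the accuracy can also be read as a count of correctly classified digits. A short sketch of that conversion, using the accuracy value reported by `model.evaluate` in the final cell:

```python
test_set_size = 10_000                # number of MNIST test images (test_images.shape[0])
test_accuracy = 0.9818000197410583    # accuracy reported by model.evaluate above
correct = round(test_accuracy * test_set_size)
print(correct, test_set_size - correct)  # 9818 182
```

Rounding recovers the exact count, since the small float32 error in the reported accuracy is far below one image.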

## Use of AI generators in this assignment

I acknowledge the use of ChatGPT in the drafting and proofreading of this assignment.

I have used ChatGPT in various parts of this assignment to ask about keras_tuner and neural network algorithms, and to find example code to learn from when creating this assignment. On top of the lecture and workshop content, I have furthered my understanding of best practice by visiting websites including https://www.analyticsvidhya.com/blog/, https://www.tensorflow.org/tutorials/keras/keras_tuner, and https://stackoverflow.com/. I have used these websites, and ChatGPT for proofreading, purely as sources of inspiration and learning.

I used general prompts such as `How do I build a deep neural network` when writing the deep neural network in a1.py, and used autogenerated text to complete sentences in this notebook. For code analysis, I asked questions such as `how could I improve my model` to understand the flaws in my code, with complete explanations of the code and its reasoning. I have also used ChatGPT to navigate the variable and parameter names of the libraries.