# IT5005 Artificial Intelligence Term Assignment

Fill your name and student number below:


| Student Number: | Name:                   |
|:----------------|:------------------------|
| A0002533Y       | Lee Ming Xuan           |



## 1. Introduction

In this assignment we hope to achieve the following:

    1. An understanding of the practical limitations of using dense networks in complex tasks
    2. Hands-on experience in building a deep learning neural network to solve a relatively complex task.
    
As this lab is more challenging than the previous labs, please work in teams of two persons. Please use the respective categories in the LumiNUS Forum under the "Labs" Heading to find a partner within your own group.

Each step may take a long time to run. You and your partner may want to work out how to do things simultaneously, but please do not miss out on any learning opportunities.


## 2. Submission Instructions


### 2.1 SUBMISSION INSTRUCTIONS

Please rename this Jupyter notebook to your student ID (e.g. A1234567Y.ipynb), complete it and submit to Canvas by 12 pm, Sunday 23 April 2023.

The folder will close shortly after 12 pm on 23 April, after which you will no longer be able to submit your assignment and you will get 0.


## 3. Creating a Dense Network for CIFAR-10

We will now begin building a neural network for the CIFAR-10 dataset. The CIFAR-10 dataset consists of 50,000 32x32x3 (32x32 pixels, RGB channels) training images and 10,000 testing images (also 32x32x3), divided into the following 10 categories:

    1. Airplane
    2. Automobile
    3. Bird
    4. Cat
    5. Deer
    6. Dog
    7. Frog
    8. Horse
    9. Ship
    10. Truck
    
In the first two parts of this lab we will create a classifier for the CIFAR-10 dataset.

### 3.1 Loading the Dataset

We begin by creating a dense neural network for CIFAR-10. The code below shows how to load the CIFAR-10 dataset:


In [1]:
```python
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
    test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    ret_train_y = to_categorical(train_y, 10)
    ret_test_y = to_categorical(test_y, 10)

    return (train_x, ret_train_y), (test_x, ret_test_y)


(train_x, train_y), (test_x, test_y) = load_cifar10()
```

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


----

#### Question 1

Explain what the following two  statements do, and where the number "3072" came from (2 MARKS):

```
  train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
  test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
```

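**ANSWER: The two statements flatten every image in the training and test sets. `cifar10.load_data()` returns arrays of shape (number of images, 32, 32, 3), so each image is a 3D array of 32 x 32 pixels with 3 RGB channels. Reshaping to (number of images, 3072) unrolls each image into a 1D vector of 32 x 32 x 3 = 3072 values, which is the input format the dense layers expect.**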
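
A quick sanity check of the shapes, as a minimal NumPy sketch. A small random array stands in for the real CIFAR-10 images (an assumption, so the example runs without downloading the dataset):

```python
import numpy as np

# Stand-in for a batch of 4 CIFAR-10 images: (batch, height, width, channels)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# 32 * 32 * 3 = 3072 values per image
n_features = 32 * 32 * 3
flat = images.reshape(images.shape[0], n_features)

print(images.shape)  # (4, 32, 32, 3)
print(flat.shape)    # (4, 3072)

# reshape keeps the pixel values and only changes how they are indexed:
# each row is the image unrolled in row-major (C) order, with the three
# RGB values of a pixel stored next to each other.
assert np.array_equal(flat[0], images[0].ravel())
assert np.array_equal(flat[0, :3], images[0, 0, 0])  # RGB of the top-left pixel
```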

*FOR GRADER: _______ / 2*

### 3.2 Building the MLP Classifier

In the code box below, create a new fully connected (dense) multilayer perceptron classifier for the CIFAR-10 dataset. To begin with, create a network with one hidden layer of 1024 neurons, using the SGD optimizer. You should output the training and validation accuracy at every epoch, and train for 50 epochs:


In [None]:
""" 
Write your code to build an MLP with one hidden layer of 1024 neurons,
with an SGD optimizer. Train for 50 epochs, and output the training and
validation accuracy at each epoch.
"""

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Create the neural network
nn = Sequential()
nn.add(Dense(1024, input_shape = (3072, ), activation = 'relu'))
nn.add(Dense(10, activation = 'softmax'))

# Create our optimizer
sgd = SGD(learning_rate = 0.1)

# Selection loss function and metrics for accuracy
nn.compile(loss='categorical_crossentropy', optimizer=sgd, 
          metrics = 'accuracy')

# Runn neural network
nn.fit(train_x, train_y, shuffle = True, epochs = 50, 
      validation_data = (test_x, test_y))

```
Epoch 1/50
...
Epoch 50/50

<keras.callbacks.History at 0x7f168eeb53d0>
```

#### Question 2

Complete the following table on the design choices for your MLP 
(3 MARKS):

| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            | SGD         | Specified in question |
| # of hidden layers   | 1           | Specified in question |
| # of hidden neurons  | 1024        | Specified in question |
| Hid layer activation | ReLU        | Common hidden-layer activation function that introduces non-linearity |
| # of output neurons  | 10          | There are 10 possible categories, and each one-hot label in the y data is a vector of size 10 |
| Output activation    | softmax     | Standard for multiclass classification; outputs a probability distribution over the 10 classes |
| lr                   | 0.1         | Starting value for the first run (note: this is 10x the Keras SGD default of 0.01, so it is on the high side rather than low) |
| momentum             | None        | Momentum left out for the first run |
| decay                | None        | Decay gradually lowers the learning rate during training; left out for the first run since it was unclear whether it would be needed within 50 epochs |
| loss                 | categorical cross entropy | Standard loss for multiclass classification with one-hot labels |
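
To make the softmax and categorical cross-entropy rows concrete, here is a small NumPy sketch of what the output layer and the loss compute for a single image. The logits are made-up numbers for illustration only; Keras computes the same quantities internally, with its own numerical safeguards.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # y_true is one-hot, so only the probability of the correct class contributes
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_prob))

# Hypothetical raw outputs (logits) of the 10 output neurons for one image
logits = np.array([2.0, 0.5, 0.1, 1.2, -0.3, 0.0, 0.4, -1.0, 0.8, 0.2])
probs = softmax(logits)

# One-hot label for class index 0 (airplane), as produced by to_categorical
y_true = np.zeros(10)
y_true[0] = 1.0

print(np.isclose(probs.sum(), 1.0))          # True: a probability distribution
print(int(np.argmax(probs)))                 # 0: the predicted class
loss = categorical_crossentropy(y_true, probs)
print(np.isclose(loss, -np.log(probs[0])))   # True: -log(p of the true class)
```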

*For TA: ___ / 3* <br>
*Code:  ____/ 5* <br>
**TOTAL: ____ / 8** <br>

#### Question 3:

What was your final training accuracy? Validation accuracy? Is there overfitting / underfitting? Explain your answer (5 MARKS)

**The final training accuracy was 0.6791 and the final validation accuracy was 0.4976. There is likely some overfitting: training accuracy was still slowly improving over the last few epochs, while validation accuracy plateaued just below 0.5, leaving a gap of about 18 percentage points between the two.**
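
One way to back this kind of judgement with numbers is to read the per-epoch metrics from the History object that `nn.fit` returns: its `.history` attribute is a dict of lists, with keys `'accuracy'` and `'val_accuracy'` when the model is compiled with an accuracy metric. The helper below is a hypothetical sketch, and its 0.1 gap threshold is an arbitrary assumption rather than a standard value.

```python
def summarize_fit(history, gap_threshold=0.1):
    """Summarise a Keras-style history dict (e.g. History.history).

    history: dict with per-epoch lists under 'accuracy' and 'val_accuracy'.
    gap_threshold: hypothetical cut-off for flagging likely overfitting.
    """
    train_acc = history["accuracy"]
    val_acc = history["val_accuracy"]

    # 1-based epoch with the highest validation accuracy
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1
    final_gap = train_acc[-1] - val_acc[-1]

    return {
        "final_train_acc": train_acc[-1],
        "final_val_acc": val_acc[-1],
        "gap": final_gap,
        "best_val_epoch": best_epoch,
        "best_val_acc": val_acc[best_epoch - 1],
        "likely_overfitting": final_gap > gap_threshold,
    }

# Toy curves (made up, not the results above): training keeps improving,
# validation peaks at epoch 3 and then drifts down.
toy = {"accuracy": [0.40, 0.50, 0.58, 0.64, 0.70],
       "val_accuracy": [0.38, 0.45, 0.48, 0.47, 0.46]}
print(summarize_fit(toy)["best_val_epoch"])      # 3
print(summarize_fit(toy)["likely_overfitting"])  # True
```

With the real model, the usage would be `history = nn.fit(...)` followed by `summarize_fit(history.history)`.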

*FOR GRADER: ______ / 5*

### 3.3 Experimenting with the MLP

Cut and paste your code from Section 3.2 to the box below (you may need to rename your MLP). Experiment with the number of hidden layers, the number of neurons in each hidden layer, the optimization algorithm, etc. See [Keras Optimizers](https://keras.io/optimizers) for the types of optimizers and their parameters. **Train for 100 epochs.**


In [5]:
"""
Cut and paste your code from Section 3.2 below, then modify it to get
much better results than what you had earlier. E.g. increase the number of
nodes in the hidden layer, increase the number of hidden layers,
change the optimizer, etc. 

Train for 100 epochs.

"""
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Create the neural network
nn = Sequential()
nn.add(Dense(512, input_shape = (3072, ), activation = 'relu'))
nn.add(Dense(256, activation = 'relu'))
nn.add(Dense(128, activation = 'relu'))
nn.add(Dense(10, activation = 'softmax'))

# Create our optimizer
optimizer = Adam()

# Selection loss function and metrics for accuracy
nn.compile(loss='categorical_crossentropy', optimizer=optimizer, 
          metrics = 'accuracy')

# Runn neural network
nn.fit(train_x, train_y, shuffle = True, epochs = 100, 
      validation_data = (test_x, test_y))

```
Epoch 1/100
...
Epoch 78

<keras.callbacks.History at 0x7f61637bb1f0>
```

----

#### Question 4:

Complete the following table with your final design (you may add more rows for the # neurons (layer1) etc. to detail how many neurons you have in each hidden layer). Likewise you may replace the lr, momentum etc rows with parameters more appropriate to the optimizer that you have chosen. (3 MARKS)


| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            | Adam        | Adapts the step size of each weight using running averages of its recent gradients and squared gradients |
| # of hidden layers   | 3           | Stacked layers let each layer build on the features filtered out by the previous one before the output layer selects the category |
| # neurons(layer1)    | 512         | Fewer neurons than the 1024 used in Section 3.2, to reduce the complexity of the model and hopefully reduce overfitting |
| Hid layer1 activation| relu        | Common activation function for hidden layers |
| # neurons(layer2)    | 256         | A second hidden layer adds capacity to form more complex (non-linear) decision boundaries between the 10 classes |
| Hid layer2 activation| relu        | Common activation function for hidden layers |
| # neurons(layer3)    | 128         | A third hidden layer adds further capacity, narrowing towards the 10 outputs |
| Hid layer3 activation| relu        | Common activation function for hidden layers |
| # of output neurons  | 10          | 10 categories of images |
| Output activation    | softmax     | For multiclass classification problems |
| lr                   | 0.001 (Keras default) | Adam adapts the effective step size of each weight, so the default was kept |
| beta_1 (momentum-like) | 0.9 (Keras default) | Adam keeps its own momentum-like running average of gradients, so no separate momentum was set |
| decay                | none        | Not used; Adam's adaptive step sizes were relied on instead |
| loss                 | categorical cross entropy | For multiclass classification problems |
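
The lr and momentum rows are easier to reason about with the Adam update written out. Below is a minimal NumPy sketch of one Adam step following the standard update rule; the defaults `lr=0.001`, `beta_1=0.9`, `beta_2=0.999` and `eps=1e-7` mirror the Keras Adam defaults, although Keras's own implementation differs in minor numerical details. It shows that Adam still has a base learning rate, but scales each weight's step by that weight's gradient history, so the first step is roughly `lr` in size regardless of how large the gradient is.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update for parameters theta. t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad        # running mean of gradients (momentum-like)
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)               # bias correction for the zero initialisation
    v_hat = v / (1 - beta_2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Two parameters whose gradients differ in size by a factor of 1000
theta = np.zeros(2)
grad = np.array([0.001, 1.0])
m = np.zeros(2)
v = np.zeros(2)
theta, m, v = adam_step(theta, grad, m, v, t=1)
print(theta)  # both entries close to -0.001: the step is ~lr whatever the gradient scale
```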

*FOR GRADER: _____ / 3* <br>
*CODE: ______ / 5* <br>

***TOTAL: ______ / 8***

#### Question 5

What were the final training and validation accuracies that you obtained after 150 epochs? Is there considerable improvement over Section 3.2? Are there still signs of underfitting or overfitting? Explain your answer (5 MARKS)

**The final training and validation accuracies were 0.7404 and 0.4711 respectively. There is no considerable improvement over Section 3.2: validation accuracy is still low, plateauing at around 0.49-0.50 and ending below the 0.4976 obtained in Section 3.2. The higher training accuracy mainly reflects overfitting rather than better generalisation: validation accuracy stagnated from around the 20th epoch and slowly declined after the 35th epoch, while training accuracy kept rising.**

*FOR GRADER: ______ / 5*

#### Question 6

Write a short reflection on the practical difficulties of using a dense MLP to classify images in the CIFAR-10 dataset. (3 MARKS)

**The accuracy of the model remained low (validation accuracy around 0.5) despite adding more hidden layers. Reducing the complexity of the model further might help to reduce overfitting, but a dense MLP also has a more basic limitation: flattening an image into 3072 values discards its spatial structure, so every pixel position is treated as an unrelated input. Techniques such as convolution could help here. Convolutional layers learn, as part of training, to extract local features before classification (e.g. detecting a wheel, which suggests an automobile, truck or airplane), and the classifier can then work with these features instead of raw pixels.**

**Overfitting also seems hard to avoid just by changing the size and number of layers. With millions of weights (over 3 million in the Section 3.2 MLP), the model tends to memorise the training images rather than learn general features from them, so it does not predict well on unseen data.**
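
The scale of the problem shows up in the parameter counts. A `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, while a `Conv2D` layer with `k x k` kernels has `k * k * c_in * filters` weights plus `filters` biases, independent of the image size. The pure-Python sketch below applies these formulas to the MLPs from Sections 3.2 and 3.3 and to a 5x5, 32-filter convolution on an RGB input (the same first layer as the CNN in Section 5).

```python
def dense_params(n_in, n_out):
    # weights + biases
    return n_in * n_out + n_out

def conv2d_params(kernel, c_in, filters):
    # each filter has kernel x kernel x c_in weights and one bias
    return kernel * kernel * c_in * filters + filters

def mlp_params(layer_sizes):
    # layer_sizes lists the input size, each hidden layer size and the output size
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

mlp_32 = mlp_params([3072, 1024, 10])            # Section 3.2 MLP
mlp_33 = mlp_params([3072, 512, 256, 128, 10])   # Section 3.3 MLP
first_conv = conv2d_params(5, 3, 32)             # Conv2D(32, (5, 5)) on 32x32x3 input

print(mlp_32)                    # 3157002
print(mlp_33)                    # 1738890
print(first_conv)                # 2432
print(dense_params(3072, 1024))  # 3146752: the first hidden layer alone
```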

*FOR GRADER: _______ /3*

----

## 4. Creating a CNN for the MNIST Data Set

In this section we will now create a convolutional neural network (CNN) to classify images in the MNIST dataset that we used in the previous lab. Let's go through each part to see how to do this.

### 4.1 Loading the MNIST Dataset

As always we will load the MNIST dataset, scale the inputs to between 0 and 1, and convert the Y labels to one-hot vectors. However unlike before we will not flatten the 28x28 image to a 784 element vector, since CNNs can inherently handle 2D data.

In [None]:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

def load_mnist():
    (train_x, train_y), (test_x, test_y) = mnist.load_data()
    # Keep the 2D image layout and add a single (grayscale) channel axis
    train_x = train_x.reshape(train_x.shape[0], 28, 28, 1)
    test_x = test_x.reshape(test_x.shape[0], 28, 28, 1)

    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')

    train_x /= 255.0
    test_x /= 255.0

    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)

    return (train_x, train_y), (test_x, test_y)
```
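
The reshape to (N, 28, 28, 1) adds an explicit channel axis, because Keras's `Conv2D` expects inputs shaped (batch, height, width, channels) under its default channels-last data format. A small NumPy sketch, with a dummy zero array standing in for the real MNIST images:

```python
import numpy as np

# Dummy stand-in for the images from mnist.load_data(): (num_images, 28, 28), uint8
x = np.zeros((5, 28, 28), dtype=np.uint8)

x4d = x.reshape(x.shape[0], 28, 28, 1)   # what load_mnist does
x4d_alt = np.expand_dims(x, axis=-1)     # an equivalent way to add the channel axis

print(x4d.shape)                         # (5, 28, 28, 1)
print(np.array_equal(x4d, x4d_alt))      # True
```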

### 4.2 Building the CNN

We will now build the CNN. Unlike before we will create a function to produce the CNN. We will also look at how to save and load Keras models using "checkpoints", particularly "ModelCheckpoint" that saves the model each epoch.

Let's begin by creating the model. We call os.path.exists to see if a model file exists, and call "load_model" if it does. Otherwise we create a new model.



In [None]:
```python
# load_model restores a model previously saved with model.save or ModelCheckpoint.
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import os

MODEL_NAME = 'mnist-cnn.hd5'

def buildmodel(model_name):
    # Resume from a saved model if one exists, otherwise build a new CNN
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(5,5),
                         activation='relu',
                         input_shape=(28, 28, 1), padding='same')) # Question 7

        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))
        model.add(Flatten()) # Question 9
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.1))
        model.add(Dense(10, activation='softmax'))

    return model
```



----

#### Question 7

The first layer in our CNN is a 2D convolution kernel, shown here:

```
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7
```

Why is the input_shape set to (28, 28, 1)? What does this mean? What does "padding = 'same'" mean? (4 MARKS)

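**input_shape=(28, 28, 1) tells Keras the shape of a single input sample, excluding the batch dimension: a 28x28-pixel image with 1 channel, since MNIST images are grayscale (a colour image would have 3 channels). This is why load_mnist reshapes the data to (number of images, 28, 28, 1). padding='same' zero-pads the border of the input so that the output keeps the same height and width as the input (28x28). With 32 filters, the output of this layer is therefore 28x28x32. Without padding ('valid', the Keras default), a 5x5 kernel would shrink each side by 4, giving 24x24x32.**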
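
The effect of padding on the output size can be checked with a naive single-channel convolution in NumPy. This is an illustrative sketch only (one filter, stride 1, explicit loops). Like Keras, it computes a cross-correlation without flipping the kernel, and it emulates 'same' by zero-padding `(k - 1) // 2` pixels on each side, which matches what 'same' does for an odd kernel size with stride 1.

```python
import numpy as np

def conv2d_single(image, kernel, padding="valid"):
    """Naive 2D cross-correlation of one channel with one square kernel (stride 1)."""
    k = kernel.shape[0]
    if padding == "same":
        p = (k - 1) // 2
        image = np.pad(image, p)  # zero padding on all four sides
    h, w = image.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

img = np.random.default_rng(0).random((28, 28))
kernel = np.ones((5, 5)) / 25.0  # a simple 5x5 averaging filter

print(conv2d_single(img, kernel, "valid").shape)  # (24, 24)
print(conv2d_single(img, kernel, "same").shape)   # (28, 28)
```

With 32 such filters, stacking the 32 output maps gives the 28x28x32 output described above.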

*FOR GRADER: ______ / 4*

#### Question 8

The second layer is the MaxPooling2D layer shown below:

```
        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
```

What other types of pooling layers are available? What does 'strides = 2' mean? (3 MARKS)

**Other types of pooling include average pooling (Keras `AveragePooling2D`), which outputs the mean of each region instead of the maximum used by max pooling, and global pooling (`GlobalMaxPooling2D`, `GlobalAveragePooling2D`), which reduces each whole feature map to a single value. The stride is the number of positions the pooling window moves at each step, so strides=2 means the window moves 2 positions to the right (and 2 positions down when moving to the next row). Since the pool size is also 2x2, the regions do not overlap, and the height and width of the output are halved.**
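
A small NumPy sketch of the two most common options on a single 4x4 channel, behaving like `MaxPooling2D` and `AveragePooling2D` with `pool_size=(2, 2)`, `strides=2` and the default 'valid' padding. The `pool2d` helper is hypothetical and written only for illustration.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a single 2D channel with a size x size window ('max' or 'avg')."""
    h_out = (x.shape[0] - size) // stride + 1
    w_out = (x.shape[1] - size) // stride + 1
    out = np.empty((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]], dtype=float)

print(pool2d(x, mode="max").tolist())  # [[4.0, 2.0], [2.0, 8.0]]
print(pool2d(x, mode="avg").tolist())  # [[2.5, 1.0], [1.25, 6.5]]
```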

*FOR GRADER: _____ / 3*


#### Question 9

What does the "Flatten" layer here do? Why is it needed?

```
        model.add(Flatten()) # Question 9
```
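**Flatten reshapes its multi-dimensional input into a 1D vector per sample without changing any values. In this model the input to Flatten is the 1x1x64 output of the last MaxPooling2D layer, so it produces a 64-element vector. It is needed so that the following Dense layers are connected to every value in the feature maps and the model outputs one 10-element prediction per image; without it, a Keras Dense layer would only act along the last (channel) axis of the feature maps.**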
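
The 64-element size can be verified by tracing the feature-map shape through the layers of `buildmodel`. This pure-Python sketch uses the stride-1 convolution rule ('same' keeps the size, 'valid' shrinks it by `k - 1`) and the 'valid' pooling rule `(n - size) // stride + 1`; the `layers` list is a hand-written description of the model, not something read from Keras.

```python
def conv_out(n, k, padding):
    # stride-1 convolution: 'same' keeps the size, 'valid' shrinks it by k - 1
    return n if padding == "same" else n - k + 1

def pool_out(n, size=2, stride=2):
    # 'valid' pooling (the Keras default)
    return (n - size) // stride + 1

# Layers of buildmodel() before Flatten: (kind, filters, kernel or pool size, padding)
layers = [
    ("conv", 32, 5, "same"),
    ("pool", None, 2, None),
    ("conv", 64, 5, "valid"),
    ("conv", 128, 5, "valid"),
    ("conv", 64, 5, "valid"),
    ("pool", None, 2, None),
]

h, w, c = 28, 28, 1
for kind, filters, size, padding in layers:
    if kind == "conv":
        h, w, c = conv_out(h, size, padding), conv_out(w, size, padding), filters
    else:
        h, w = pool_out(h, size), pool_out(w, size)
    print(kind, (h, w, c))

print("Flatten output length:", h * w * c)  # 64
```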

*FOR GRADER: ____ / 2*




----

### 4.3 Training the CNN

Let's now train the CNN. In this example we introduce the idea of a "callback", which is a routine that Keras calls at the end of each epoch. Specifically we look at two callbacks:

    1. ModelCheckpoint: When called, Keras saves the model to the specified filename.
    
    2. EarlyStopping: When called, Keras checks if it should stop the training prematurely.
    

Let's look at the code to see how training is done, and how callbacks are used.

In [None]:
```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    # learning_rate replaces the deprecated lr argument
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7),
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)  # save the model after every epoch
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
              validation_data=(test_x, test_y), shuffle=True,
              epochs=epochs,
              callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f" % (acc, loss))
```

Notice that there is nothing very unusual going on: we compile the model with our loss function and optimizer, call fit, and finally call evaluate to look at the final accuracy on the test set. The only unusual thing is the "callbacks" parameter in the fit function call:

```
    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs, 
    callbacks=[savemodel, stopmodel])
```

----

#### Question 10.

What do the min_delta and patience parameters do in the EarlyStopping callback, as shown below? (2 MARKS)

```
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10
```
**min_delta is the minimum change in the monitored metric (val_loss by default) that counts as an improvement; a change smaller than min_delta (here 0.001) is treated as no improvement. patience is the number of consecutive epochs without improvement after which training is stopped. In this case, training stops once there have been 10 epochs in a row without val_loss improving by at least 0.001.**
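
The interaction between min_delta and patience can be simulated in plain Python. This is a simplified sketch of the logic for a metric that should decrease (such as the default monitor, val_loss), not Keras's actual implementation; for example, it ignores options such as baseline and restore_best_weights. The loss values are made up for illustration.

```python
def early_stop_epoch(values, min_delta=0.001, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    values: monitored metric per epoch (lower is better, e.g. val_loss).
    """
    best = float("inf")
    wait = 0
    for epoch, value in enumerate(values, start=1):
        if value < best - min_delta:   # improved by more than min_delta
            best = value
            wait = 0
        else:                          # no (sufficient) improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up val_loss curve: improves for 3 epochs, then changes by less than 0.001
losses = [0.50, 0.40, 0.35] + [0.3495] * 12
print(early_stop_epoch(losses, min_delta=0.001, patience=10))  # 13
```

The drop from 0.35 to 0.3495 is smaller than min_delta, so epochs 4 to 13 all count as "no improvement" and training stops after the 10th of them.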

---

### 4.4 Putting it together.

Now let's run the code and see how it goes (Note: To save time we are training for only 5 epochs; we should train much longer to get much better results):

In [None]:
```python
(train_x, train_y), (test_x, test_y) = load_mnist()
model = buildmodel(MODEL_NAME)
train(model, train_x, train_y, 5, test_x, test_y, MODEL_NAME)
```
    

```
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Starting training.
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Done. Now evaluating.
Test accuracy: 0.99, loss: 0.03
```


----

#### Question 11.

Compare the relative advantages and disadvantages of CNN vs. the Dense MLP that you built in sections 3.2 and 3.3. What makes CNNs better (or worse)? (3 MARKS)

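**The CNN's convolution and pooling layers reduce the number of inputs reaching the dense layer (in the MNIST model, Flatten outputs only 64 values), and hence the complexity of the dense part of the model. The way the input is reduced is itself learned: the convolution kernels are trained by gradient descent on the dataset, so they pick out the features that are useful for classification, and the same kernel is applied at every position in the image. However, convolution is designed to extract local, higher-level features. If the dataset has no strong higher-level features for the model to learn from, a CNN might not be more accurate than a dense MLP.**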

*FOR TA: ______ / 3*

## 5. Making a CNN for the CIFAR-10 Dataset

Now comes the fun part: Using the example above for creating a CNN for the MNIST dataset, now create a CNN in the box below for the CIFAR-10 dataset. At the end of each epoch save the model to a file called "cifar.hd5" (note: the .hd5 is added automatically for you).

---

#### Question 12.

Summarize your design in the table below (the actual coding cell comes after this):

| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
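| Optimizer            | SGD (learning_rate=0.01, momentum=0.9) | Same optimizer as the MNIST example in Section 4.3, with momentum raised from 0.7 to 0.9 |
| Input shape          | 32x32x3     | Same shape as the raw images (32x32 pixels, 3 RGB channels); not flattened |
| First layer          | Conv2D: 32 filters, 5x5, ReLU, padding='same' | Extract simple low-level features from the image |
| Second layer         | MaxPooling2D (2x2, stride 2) | Pool to halve the height and width of the feature maps |
| Third layer          | Conv2D: 64 filters, 5x5, ReLU | Combine the simple features from the first convolution layer into more complex ones and further condense the data |
| Fourth layer         | MaxPooling2D (2x2, stride 2) | Pool again to halve the feature maps |
| Fifth layer          | Flatten     | Convert the feature maps to a 1D vector before the dense layer |
| Dense layer          | 1024, ReLU  | To be comparable to the non-CNN model trained in Section 3.2 |
| Dropout              | 0.1         | Same as the MNIST example; randomly drops 10% of the dense-layer outputs during training to reduce overfitting |
| Output layer         | Dense(10), softmax | One output per CIFAR-10 class |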


*FOR TA:*
*Table: ________ / 3* <br>
*Code: _________/ 7* <br>
**TOTAL: _______ / 10** <br>

---

***TOTAL: _______ / 55***

In [None]:
"""
Write your code for your CNN for the CIFAR-10 dataset here. 

Note: train_x, train_y, test_x, test_y were changed when we called 
load_mnist in the previous section. You will now need to call load_cifar10
again.

"""
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

#load the cifar10 data
(train_x, train_y), (test_x, test_y) = cifar10.load_data()
train_x = train_x.astype('float32')
test_x = test_x.astype('float32')
train_x /= 255.0
test_x /= 255.0
train_y = to_categorical(train_y,10)
test_y = to_categorical(test_y, 10)

#code for the CNN model
model_name = 'cifar.hd5'

if os.path.exists(model_name):
    model = load_model(model_name)                                                                                             
else:
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(5,5), activation='relu', input_shape=(32, 32, 3), padding='same')) 
    model.add(MaxPooling2D(pool_size=(2,2), strides=2)) 
    model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2), strides=2))
    model.add(Flatten()) 
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(10, activation='softmax'))

model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), 
                  loss='categorical_crossentropy', metrics=['accuracy'])
#Create a place for saving the checkpoints
savemodel = ModelCheckpoint(model_name)

#Run the model
model.fit(x=train_x, y=train_y, batch_size=32, validation_data=(test_x, test_y), shuffle=True, epochs=5, callbacks=[savemodel])

#output the final results
loss, acc = model.evaluate(x=test_x, y=test_y)
print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy: 0.67, loss: 1.65
```
