# CS3237 Lab 3 Introduction to Deep Learning


| Student Number: | Name:                   |
|:----------------|:------------------------|
| A0242607J     | Mitchell Kok Ming En |
| A0196650X     | Jordan Yoong Jia En  |



## 0. Special Note

Due to changes in Keras and Tensorflow, the code provided in this lab may break with the most recent versions of Tensorflow or Keras. In such event, replace all "keras" with "tensorflow.keras". E.g. change "from keras.layers import Dense" to "from tensorflow.keras.layers import Dense".
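
One way to apply this note without editing every import by hand is to try the `tensorflow.keras` path first and fall back to the standalone `keras` package. The following is only a minimal sketch: the helper name `import_keras_layers` is hypothetical and is not part of either library.

```python
import importlib

def import_keras_layers():
    """Return the Keras layers module, preferring tensorflow.keras.

    Hypothetical helper for illustration only. Returns None if neither
    package can be imported in the current environment.
    """
    for module_name in ("tensorflow.keras.layers", "keras.layers"):
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue
    return None

layers = import_keras_layers()
if layers is not None:
    Dense = layers.Dense  # use Dense exactly as in the rest of the lab
```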


## 1. Introduction

We will achieve the following objectives in this lab:

    1. An understanding of the practical limitations of using dense networks in complex tasks
    2. Hands-on experience in building a deep learning neural network to solve a relatively complex task.
    
As this lab is more challenging than the previous labs, please work in teams of two persons. Please use the respective categories in the LumiNUS Forum under the "Labs" Heading to find a partner within your own group.

Each step may take a long time to run. You and your partner may want to work out how to do things simultaneously, but please do not miss out on any learning opportunities.


## 2. Submission Instructions

Please work together as a team of 2 to complete this lab. You will need to submit ONE copy of this notebook per team, but please fill in the names of both team members above. This lab is worth 55 marks.

**DO NOT SUBMIT MORE THAN ONE COPY OF THIS LAB!**

### 2.1 SUBMISSION INSTRUCTIONS

Please submit this completed Jupyter Notebook (cs3237lab2.ipynb) to the Files -> Labs -> Lab Submissions -> Lab2 Submissions-> Group Bxx folder by 1.00 PM on the following dates:

 1. Group B1: Friday 10 September, 1 pm.
 2. Group B2: Monday 13 September, 1 pm.
 3. Group B3: Sunday 12 September, 1 pm.
 
### 2.2 LATE SUBMISSION POLICY

If you submit between 1 pm and 1.15 pm, a 5 mark penalty will be levied. Submission is strictly not allowed once the folder closes, and you will receive 0 for the lab. NO EXCEPTIONS WILL BE MADE.


## 3. Creating a Dense Network for CIFAR-10

We will now begin building a neural network for the CIFAR-10 dataset. The CIFAR-10 dataset consists of 50,000 32x32x3 (32x32 pixels, RGB channels) training images and 10,000 testing images (also 32x32x3), divided into the following 10 categories:

    1. Airplane
    2. Automobile
    3. Bird
    4. Cat
    5. Deer
    6. Dog
    7. Frog
    8. Horse
    9. Ship
    10. Truck
    
In the first two parts of this lab we will create a classifier for the CIFAR-10 dataset.

### 3.1 Loading the Dataset

We begin by creating a dense neural network for CIFAR-10. The code below shows how we load the CIFAR-10 dataset:


In [10]:
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
    test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    ret_train_y = to_categorical(train_y,10)
    ret_test_y = to_categorical(test_y, 10)
    
    return (train_x, ret_train_y), (test_x, ret_test_y)

(train_x, train_y), (test_x, test_y) = load_cifar10()

----

#### Question 1

Explain what the following two statements do, and where the number "3072" came from (2 MARKS):

```
  train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
  test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
```

***ANSWER: ".shape[0]" returns the size of the first axis, which is the number of images in each set. ".reshape" then converts each set from shape (n, 32, 32, 3) to shape (n, 3072), where n is the number of images, so that every image becomes one flat row that a dense layer can accept. 3072 is the number of values in each image (32 * 32 * 3 = 3072, accounting for the three RGB channels). This is done for both train_x and test_x.***

*FOR GRADER: _______ / 2*
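
The arithmetic behind "3072" can be checked without loading the dataset. The sketch below uses plain Python lists and an all-zero dummy image in place of a real CIFAR-10 sample; `flatten_image` is an illustrative helper, not a Keras or NumPy function.

```python
# Dimensions of one CIFAR-10 image: height, width, colour channels.
HEIGHT, WIDTH, CHANNELS = 32, 32, 3

def flatten_image(image):
    """Flatten a nested [row][col][channel] list into one flat list."""
    return [value for row in image for pixel in row for value in pixel]

# A dummy all-zero image standing in for a real CIFAR-10 sample.
dummy_image = [[[0] * CHANNELS for _ in range(WIDTH)] for _ in range(HEIGHT)]
flat = flatten_image(dummy_image)

features_per_image = HEIGHT * WIDTH * CHANNELS
print(features_per_image)  # 3072
print(len(flat))           # 3072
```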

### 3.2 Building the MLP Classifier

In the code box below, create a new fully connected (dense) multilayer perceptron classifier for the CIFAR-10 dataset. To begin with, create a network with one hidden layer of 1024 neurons, using the SGD optimizer. You should output the training and validation accuracy at every epoch, and train for 50 epochs:


In [2]:
""" 
Write your code to build an MLP with one hidden layer of 1024 neurons,
with an SGD optimizer. Train for 50 epochs, and output the training and
validation accuracy at each epoch.
"""

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model = Sequential()

# First hidden layer
model.add(Dense(1024, input_shape = (3072, ), activation = 'relu'))

# Output with softmax activation
model.add(Dense(10, activation = 'softmax'))

sgd  = SGD(learning_rate = 0.01)
# metrics is passed as a list, which is the form accepted across Keras versions
model.compile(loss = 'categorical_crossentropy', optimizer = sgd,
             metrics = ['accuracy'])

model.fit(x = train_x, y = train_y, shuffle = True, epochs = 50, validation_data = (test_x, test_y))


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x7fb1b8e562e0>

#### Question 2

Complete the following table on the design choices for your MLP 
(3 MARKS):

| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            | SGD         | Specified in question |
| # of hidden layers   | 1           | Specified in question |
| # of hidden neurons  | 1024        | Specified in question |
| Hid layer activation | ReLU        | Cheap to compute and less prone to vanishing gradients than sigmoid/tanh |
| # of output neurons  | 10          | Equal to number of the categories |
| Output activation    | Softmax     | Suitable for classification, for use with categorical cross entropy loss function |
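| lr                   | 0.01        | Set explicitly in the code; same as the Keras default for SGD |
| momentum             | 0 (default) | Not set in the code; Keras default |
| decay                | 0 (default) | Not set in the code; Keras default |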
| loss                 | Categorical Cross Entropy | For multiclass classification |

*For TA: ___ / 3* <br>
*Code:  ____/ 5* <br>
**TOTAL: ____ / 8** <br>
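
To make the pairing of a softmax output with categorical cross-entropy loss concrete, here is a small pure-Python sketch. The three-class scores are made up for illustration, and the functions are simplified stand-ins rather than the Keras implementations.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, probabilities) if t > 0)

logits = [2.0, 1.0, 0.1]   # made-up scores for a 3-class toy example
target = [1, 0, 0]         # one-hot label: the true class is index 0
probs = softmax(logits)
loss = categorical_crossentropy(target, probs)

print(probs)
print(loss)
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the true class, so it shrinks toward zero as that probability approaches 1.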

#### Question 3:

What was your final training accuracy? Validation accuracy? Is there overfitting / underfitting? Explain your answer (5 MARKS)

***Training accuracy = 0.7260. Validation Accuracy = 0.5199. There appears to be overfitting as the training accuracy is significantly higher than the validation accuracy.***

*FOR GRADER: ______ / 5*
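
As context for the overfitting discussion, the number of trainable parameters in the Section 3.2 network can be worked out by hand. The sketch below assumes the architecture stated there (3072 inputs, one hidden layer of 1024 neurons, 10 outputs); `dense_params` is an illustrative helper. A large parameter count relative to the 50,000 training images is one plausible contributor to the gap between training and validation accuracy, though not necessarily the only one.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(3072, 1024)   # flattened image -> hidden layer
output = dense_params(1024, 10)     # hidden layer -> 10 classes
total = hidden + output

print(hidden)  # 3146752
print(output)  # 10250
print(total)   # 3157002
```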

### 3.3 Experimenting with the MLP

Cut and paste your code from Section 3.2 to the box below (you may need to rename your MLP). Experiment with the number of hidden layers, the number of neurons in each hidden layer, the optimization algorithm, etc. See [Keras Optimizers](https://keras.io/optimizers) for the types of optimizers and their parameters. **Train for 100 epochs.**


In [None]:
"""
Cut and paste your code from Section 3.2 below, then modify it to get
much better results than what you had earlier. E.g. increase the number of
nodes in the hidden layer, increase the number of hidden layers,
change the optimizer, etc. 

Train for 100 epochs.

"""
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import Adam

model2 = Sequential()

# First hidden layer
model2.add(Dense(2048, input_shape = (3072, ), activation = 'relu'))
# model2.add(Dropout(0.3))

# model2.add(Dense(512, activation = 'relu'))
# model2.add(Dropout(0.3))

# Output with softmax activation
model2.add(Dense(10, activation = 'softmax'))

adam = Adam(learning_rate=0.01)
sgd  = SGD(learning_rate = 0.01, decay = 1e-6, momentum = 0.5)
model2.compile(loss = 'categorical_crossentropy', optimizer = sgd,
             metrics = ['accuracy'])

# Section 3.3 asks for 100 epochs
model2.fit(x = train_x, y = train_y, shuffle = True, epochs = 100, validation_data = (test_x, test_y))


----

#### Question 4:

Complete the following table with your final design (you may add more rows for the # neurons (layer1) etc. to detail how many neurons you have in each hidden layer). Likewise you may replace the lr, momentum etc rows with parameters more appropriate to the optimizer that you have chosen. (3 MARKS)


| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            |             |                       |
| # of hidden layers   |             |                       |
| # neurons(layer1)    |             |                       |
| Hid layer1 activation|             |                       |
| # neurons(layer2)    |             |                       |
| Hid layer2 activation|             |                       |
| # of output neurons  |             |                       |
| Output activation    |             |                       |
| lr                   |             |                       |
| momentum             |             |                       |
| decay                |             |                       |
| loss                 |             |                       |

*FOR GRADER: _____ / 3* <br>
*CODE: ______ / 5* <br>

***TOTAL: ______ / 8***

#### Question 5

What is the final training and validation accuracy that you obtained after 100 epochs? Is there considerable improvement over Section 3.2? Are there still signs of underfitting or overfitting? Explain your answer (5 MARKS)

***Write your answers here***

*FOR GRADER: ______ / 5*

#### Question 6

Write a short reflection on the practical difficulties of using a dense MLP to classify images in the CIFAR-10 dataset. (3 MARKS)

***Write your answers here***

*FOR GRADER: _______ /3*

----

## 4. Creating a CNN for the MNIST Data Set

In this section we will now create a convolutional neural network (CNN) to classify images in the MNIST dataset that we used in the previous lab. Let's go through each part to see how to do this.

### 4.1 Loading the MNIST Dataset

As always, we will load the MNIST dataset, scale the inputs to between 0 and 1, and convert the Y labels to one-hot vectors. However, unlike before, we will not flatten the 28x28 image into a 784-element vector, since CNNs can handle 2D data directly.

In [4]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

def load_mnist():
    (train_x, train_y),(test_x, test_y) = mnist.load_data()
    train_x = train_x.reshape(train_x.shape[0], 28, 28, 1)
    test_x = test_x.reshape(test_x.shape[0], 28, 28, 1)

    train_x=train_x.astype('float32')
    test_x = test_x.astype('float32')
    
    train_x /= 255.0
    test_x /= 255.0
        
    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)
        
    return (train_x, train_y), (test_x, test_y) 
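
The two preprocessing steps used here, scaling pixels to the range 0 to 1 and one-hot encoding the labels, can be illustrated in plain Python. The helpers `one_hot` and `scale_pixel` are illustrative stand-ins, not the Keras `to_categorical` implementation.

```python
def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def scale_pixel(value):
    """Map an 8-bit pixel intensity (0..255) to the range 0..1."""
    return value / 255.0

encoded = one_hot(3, 10)   # digit "3" out of 10 classes
print(encoded)
print(scale_pixel(0), scale_pixel(255))  # 0.0 1.0
```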

### 4.2 Building the CNN

We will now build the CNN. Unlike before we will create a function to produce the CNN. We will also look at how to save and load Keras models using "checkpoints", particularly "ModelCheckpoint" that saves the model each epoch.

Let's begin by creating the model. We call os.path.exists to see if a model file exists, and call "load_model" if it does. Otherwise we create a new model.



In [5]:
# load_model loads a model from a hd5 file.
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import os

MODEL_NAME = 'mnist-cnn.hd5'

def buildmodel(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7

        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))
        model.add(Flatten()) # Question 9
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.1))
        model.add(Dense(10, activation='softmax'))

    return model



----

#### Question 7

The first layer in our CNN is a 2D convolution kernel, shown here:

```
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7
```

Why is the input_shape set to (28, 28, 1)? What does this mean? What does "padding = 'same'" mean? (4 MARKS)

***"input_shape" tells the model that each input sample (an image from the MNIST dataset) has the dimensions 28 * 28 * 1: 28 pixels high, 28 pixels wide, and a single greyscale channel with no colour data. "padding='same'" zero-pads the borders of the input so that, with the default stride of 1, the layer output has the same height and width as its input (28 * 28). Without padding, a 5 * 5 kernel would shrink the output to 24 * 24.***

*FOR GRADER: ______ / 4*
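
The effect of the padding mode on the output size can be computed directly. The sketch below uses the usual output-size rules for 'same' and 'valid' padding along one spatial dimension; `conv_output_size` is an illustrative helper, not a Keras function.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # Input is padded so that only the stride reduces the size.
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

print(conv_output_size(28, 5, padding="same"))   # 28
print(conv_output_size(28, 5, padding="valid"))  # 24
```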

#### Question 8

The second layer is the MaxPooling2D layer shown below:

```
        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
```

What other types of pooling layers are available? What does 'strides = 2' mean? (3 MARKS)

***Available Layers: MaxPooling1D layer, MaxPooling2D layer, MaxPooling3D layer, AveragePooling1D layer, AveragePooling2D layer, AveragePooling3D layer, GlobalMaxPooling1D layer, GlobalMaxPooling2D layer,  GlobalMaxPooling3D layer, GlobalAveragePooling1D layer, GlobalAveragePooling2D layer, GlobalAveragePooling3D layer. "strides=2" informs the model to move the pooling window by 2 units in each dimension for each pooling step.***

*FOR GRADER: _____ / 3*
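
To show what a 2 * 2 pooling window with a stride of 2 does, here is a small pure-Python sketch of max pooling on a made-up 4 * 4 feature map. It is a simplified single-channel illustration, not the Keras implementation.

```python
def max_pool_2d(matrix, pool=2, stride=2):
    """Max pooling over a 2-D list, using only windows that fit entirely."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - pool + 1, stride):
        pooled_row = []
        for c in range(0, cols - pool + 1, stride):
            window = [matrix[r + i][c + j]
                      for i in range(pool) for j in range(pool)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
print(max_pool_2d(feature_map))  # [[6, 4], [7, 9]]
```

Because the stride equals the window size, the windows do not overlap and the height and width are halved.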


#### Question 9

What does the "Flatten" layer here do? Why is it needed? (2 MARKS)

```
        model.add(Flatten()) # Question 9
```

***The "Flatten" layer reshapes each sample's multi-dimensional feature map (height * width * channels) into a 1-dimensional vector. This is needed because the dense layer that follows the "Flatten" layer expects a 1-dimensional vector per sample.***

*FOR GRADER: ____ / 2*
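
To see what Flatten receives in this particular network, the feature-map size can be traced layer by layer. The sketch assumes the output-size rule for unpadded ('valid') layers, which is the Keras default when no padding is specified; `valid_size` is a made-up helper name for this illustration.

```python
def valid_size(n, window, stride=1):
    """Output size along one dimension with no padding ('valid')."""
    return (n - window) // stride + 1

size = 28                       # input is 28x28x1; the first Conv2D uses
                                # padding='same', so it stays 28x28
size = valid_size(size, 2, 2)   # MaxPooling2D(2x2, strides=2) -> 14x14
size = valid_size(size, 5)      # Conv2D(64, 5x5)              -> 10x10
size = valid_size(size, 5)      # Conv2D(128, 5x5)             -> 6x6
size = valid_size(size, 5)      # Conv2D(64, 5x5)              -> 2x2
size = valid_size(size, 2, 2)   # MaxPooling2D(2x2, strides=2) -> 1x1
channels = 64                   # filters in the last Conv2D layer

flattened_length = size * size * channels
print(flattened_length)  # 64
```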




----

### 4.3 Training the CNN

Let's now train the CNN. In this example we introduce the idea of a "callback", which is a routine that Keras calls at the end of each epoch. Specifically we look at two callbacks:

    1. ModelCheckpoint: When called, Keras saves the model to the specified filename.
    
    2. EarlyStopping: When called, Keras checks if it should stop the training prematurely.
    

Let's look at the code to see how training is done, and how callbacks are used.

In [6]:
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    # learning_rate replaces the deprecated lr argument
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7), 
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs, 
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

Notice that there isn't very much that is unusual going on: we compile the model with our loss function and optimizer, then call fit, and finally call evaluate to look at the final accuracy on the test set. The only unusual thing is the "callbacks" parameter in the fit function call:

```
    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs, 
    callbacks=[savemodel, stopmodel])
```

----

#### Question 10.

What do the min_delta and patience parameters do in the EarlyStopping callback, as shown below? (2 MARKS)

```
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10
```
***"min_delta" sets the minimum change in the monitored quantity required to qualify as an improvement - any change smaller than min_delta will not register as an improvement. "patience" sets the number of epochs with no improvement after which training will be stopped.***
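
The interaction between min_delta and patience can be simulated in a few lines. This is a simplified model of the stopping rule applied to a list of made-up validation losses (by default the callback monitors `val_loss`); it is not the Keras source code, and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, min_delta=0.001, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified rule: a loss counts as an improvement only if it is lower
    than the best loss so far by more than min_delta.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: improvement stalls after epoch 3.
losses = [0.50, 0.40, 0.35, 0.3495, 0.3499, 0.3497]
print(early_stopping_epoch(losses, min_delta=0.001, patience=3))   # 6
print(early_stopping_epoch(losses, min_delta=0.001, patience=10))  # None
```

In this toy run the losses at epochs 4 to 6 are lower than 0.35, but by less than min_delta, so they do not reset the patience counter.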

---

### 4.4 Putting it together.

Now let's run the code and see how it goes (Note: To save time we are training for only 5 epochs; we should train much longer to get much better results):

In [7]:
    (train_x, train_y),(test_x, test_y) = load_mnist()
    model = buildmodel(MODEL_NAME)
    train(model, train_x, train_y, 5, test_x, test_y, MODEL_NAME)
    

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz




Starting training.
Epoch 1/5


2021-09-19 01:17:31.646743: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: mnist-cnn.hd5/assets
Epoch 2/5
INFO:tensorflow:Assets written to: mnist-cnn.hd5/assets
Epoch 3/5
INFO:tensorflow:Assets written to: mnist-cnn.hd5/assets
Epoch 4/5
INFO:tensorflow:Assets written to: mnist-cnn.hd5/assets
Epoch 5/5
INFO:tensorflow:Assets written to: mnist-cnn.hd5/assets
Done. Now evaluating.
Test accuracy: 0.99, loss: 0.04


----

#### Question 11.

Compare the relative advantages and disadvantages of CNN vs. the Dense MLP that you built in sections 3.2 and 3.3. What makes CNNs better (or worse)? (3 MARKS)

***CNNs appear to be more effective at image classification - the CNN achieves higher accuracy with fewer epochs compared to the Dense MLP. This is because images have very high dimensionality, and convolutional layers reuse the same small kernel at every image position, which reduces the number of parameters and takes advantage of the spatial structure of the image.***

*FOR TA: ______ / 3*
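
The parameter savings from weight sharing can be quantified with a short calculation. The sketch compares the first convolutional layer of the MNIST CNN above with a hypothetical Dense(1024) layer applied to the flattened 28 * 28 image; the dense layer is an assumption chosen for comparison and does not appear in Section 4.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

conv_first = conv2d_params(5, 5, 1, 32)     # Conv2D(32, 5x5) on a 1-channel image
dense_first = dense_params(28 * 28, 1024)   # hypothetical Dense(1024) on 784 inputs

print(conv_first)   # 832
print(dense_first)  # 803840
```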

## 5. Making a CNN for the CIFAR-10 Dataset

Now comes the fun part: Using the example above for creating a CNN for the MNIST dataset, now create a CNN in the box below for the CIFAR-10 dataset. At the end of each epoch save the model to a file called "cifar.hd5" (note: the .hd5 is added automatically for you).

---

#### Question 12.

Summarize your design in the table below (the actual coding cell comes after this):

| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            | optimizer=SGD(learning_rate=0.01, momentum=0.7)|                       |
| Input shape          | input_shape=(32, 32, 3)| Match the image size for CIFAR-10 (32 * 32 * 3 for RGB)  |
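| First layer          | model.add(Conv2D(32, kernel_size=(5,5), activation='relu', input_shape=(32, 32, 3), padding='same')), followed by model.add(MaxPooling2D(pool_size=(2,2), strides=2)) | 5*5 convolution with 'same' padding, then pooling with pool size 2*2 and stride 2 |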
| 2nd, 3rd, 4th layer         | model.add(Conv2D(128, kernel_size=(5,5), activation='relu')) | Convolution layer with kernel window 5*5 to balance accuracy and computational cost|
| Add more layers      |            - |                      - |
| if needed            |            - |                      - |
| Dense layer 1         |Dense(1024, activation='relu')| Hidden layer to help classification |
| Dense layer 2         |model.add(Dense(10, activation='softmax'))| Output layer for final classification |


*FOR TA:*
*Table: ________ / 3* <br>
*Code: _________/ 7* <br>
**TOTAL: _______ / 10** <br>

---

***TOTAL: _______ / 55***

In [27]:
"""
Write your code for your CNN for the CIFAR-10 dataset here. 

Note: train_x, train_y, test_x, test_y were changed when we called 
load_mnist in the previous section. You will now need to call load_cifar10
again.

"""

MODEL_NAME = 'cifar.hd5'

def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 32,32,3)
    test_x = test_x.reshape(test_x.shape[0], 32,32,3)
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    ret_train_y = to_categorical(train_y,10)
    ret_test_y = to_categorical(test_y, 10)
    
    return (train_x, ret_train_y), (test_x, ret_test_y)


def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7), 
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs, 
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

    
def buildmodel(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(32, 32, 3), padding='same')) # Question 7

        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))
        model.add(Flatten()) # Question 9
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.1))
        model.add(Dense(10, activation='softmax'))

    return model

(train_x, train_y), (test_x, test_y) = load_cifar10()
model = buildmodel(MODEL_NAME)
train(model, train_x, train_y, 5, test_x, test_y, MODEL_NAME)

Starting training.
Epoch 1/5
INFO:tensorflow:Assets written to: cifar.hd5/assets
Epoch 2/5
INFO:tensorflow:Assets written to: cifar.hd5/assets
Epoch 3/5
INFO:tensorflow:Assets written to: cifar.hd5/assets
Epoch 4/5
INFO:tensorflow:Assets written to: cifar.hd5/assets
Epoch 5/5
INFO:tensorflow:Assets written to: cifar.hd5/assets
Done. Now evaluating.
Test accuracy: 0.71, loss: 0.83
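
For comparison with the dense network of Section 3.2, the size of this CIFAR-10 CNN can be estimated by hand. The sketch below follows the layers in `buildmodel` above and assumes the Keras default of 'valid' padding wherever padding is not specified; the helper functions are illustrative and not part of Keras.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights of all filters plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return (n_inputs + 1) * n_units

def valid_size(n, window, stride=1):
    """Output size along one dimension with no padding."""
    return (n - window) // stride + 1

size, channels, cnn_total = 32, 3, 0

# Conv2D(32, 5x5, padding='same'): spatial size stays 32x32.
cnn_total += conv2d_params(5, channels, 32)
channels = 32
size = valid_size(size, 2, 2)            # MaxPooling2D -> 16x16

# Three Conv2D(128, 5x5) layers with 'valid' padding: 12x12, 8x8, 4x4.
for filters in (128, 128, 128):
    cnn_total += conv2d_params(5, channels, filters)
    channels = filters
    size = valid_size(size, 5)

size = valid_size(size, 2, 2)            # MaxPooling2D -> 2x2
flat = size * size * channels            # length of the Flatten output

cnn_total += dense_params(flat, 1024) + dense_params(1024, 10)
mlp_total = dense_params(3072, 1024) + dense_params(1024, 10)  # Section 3.2 MLP

print(flat)       # 512
print(cnn_total)  # 1459978
print(mlp_total)  # 3157002
```

Under these assumptions the CNN has fewer than half the parameters of the one-hidden-layer MLP, yet it reached a test accuracy of 0.71 after 5 epochs, compared with the MLP's validation accuracy of 0.5199 after 50 epochs.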
