<a href="https://colab.research.google.com/github/jdk455/NUS-Lab/blob/main/SWS3009Lab3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SWS3009 Lab 3 Introduction to Deep Learning


| Members      |
|:-------------|
| CAI Jiajun   |
| MA Jiaolin   |
| XIE Jinxiang |

This lab should be done by both Deep Learning members of the team. Please ensure that you fill in the names of <b>both</b> team members in the spaces above. Answer <b>all</b> your questions on <b>this Python Notebook.</b>

## Submission Instructions

Please submit this Python notebook to Canvas on the deadline provided.

Marks will be awarded as follows:

**0 marks**: No/empty/Non-English submission

**1 mark** : Poor submission

**2 marks**: Acceptable submission

**3 marks**: Good submission


## 1. Introduction

We will achieve the following objectives in this lab:

    1. An understanding of the practical limitations of using dense networks in complex tasks
    2. Hands-on experience in building a deep learning neural network to solve a relatively complex task.
    

Each step may take a long time to run. You and your partner may want to work out how to do things simultaneously, but please do not miss out on any learning opportunities.


## 2. Submission Instructions

Please submit your answer book to Canvas by the deadline.

## 3. Creating a Dense Network for CIFAR-10

We will now begin building a neural network for the CIFAR-10 dataset. The CIFAR-10 dataset consists of 50,000 32x32x3 (32x32 pixels, RGB channels) training images and 10,000 testing images (also 32x32x3), divided into the following 10 categories:

    1. Airplane
    2. Automobile
    3. Bird
    4. Cat
    5. Deer
    6. Dog
    7. Frog
    8. Horse
    9. Ship
    10. Truck
    
In the first two parts of this lab we will create a classifier for the CIFAR-10 dataset.

### 3.1 Loading the Dataset

We begin by creating a dense neural network for CIFAR-10. The code below shows how we load the CIFAR-10 dataset:


In [1]:
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
    test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    ret_train_y = to_categorical(train_y,10)
    ret_test_y = to_categorical(test_y, 10)

    return (train_x, ret_train_y), (test_x, ret_test_y)


(train_x, train_y), (test_x, test_y) = load_cifar10()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


----

#### Question 1

Explain what the following two statements do, and where the number "3072" came from:

```
  train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
  test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
```

**Please put your answers in the attached answer books**

Answer:

The two statements reshape train_x and test_x into 2D arrays in which each row is one sample and each column is one feature.

cifar10.load_data() returns the images as a four-dimensional array of shape (number of images, 32, 32, 3): each image is 32x32 pixels with three color channels (RGB). The number "3072" is the total number of values in one image:

32 (height) x 32 (width) x 3 (RGB channels) = 3072.

The reshape is needed because the Dense layers of an MLP expect each sample as a flat vector. After reshaping, each row of train_x and test_x is one image represented by 3072 feature values, which matches the input size of the MLP classifier.
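
As a quick check of the arithmetic, the sketch below applies the same reshape to a small stand-in array. The array `fake_images` is a hypothetical placeholder (zeros rather than real CIFAR-10 data), used only so that the example runs without downloading the dataset.

```python
import numpy as np

# Hypothetical stand-in for CIFAR-10: 5 blank images, 32x32 pixels, 3 channels.
fake_images = np.zeros((5, 32, 32, 3), dtype="uint8")

# Same reshape as in load_cifar10(): keep the sample axis, flatten the rest.
flat = fake_images.reshape(fake_images.shape[0], 3072)

print(fake_images.shape)  # (5, 32, 32, 3)
print(flat.shape)         # (5, 3072)
print(32 * 32 * 3)        # 3072
```

Passing any number other than 3072 as the second argument would raise a ValueError, because the total number of elements must stay the same.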


### 3.2 Building the MLP Classifier

In the code box below, create a new fully connected (dense) multilayer perceptron classifier for the CIFAR-10 dataset. To begin with, create a network with one hidden layer of 1024 neurons, using the SGD optimizer. You should output the training and validation accuracy at every epoch, and train for 50 epochs:


In [2]:
"""
Write your code to build an MLP with one hidden layer of 1024 neurons,
with an SGD optimizer. Train for 50 epochs, and output the training and
validation accuracy at each epoch.
"""
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Convert labels to one-hot encoded vectors
num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

# Define MLP architecture
model = Sequential()
model.add(Dense(1024, activation='relu', input_shape=(32*32*3,)))
model.add(Dense(num_classes, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])

# Train the model
epochs = 50
batch_size = 128
history = model.fit(x_train.reshape(-1, 32*32*3), y_train, validation_data=(x_test.reshape(-1, 32*32*3), y_test),
                    epochs=epochs, batch_size=batch_size, verbose=1)

# Print training and validation accuracy at each epoch
for epoch in range(epochs):
    print("Epoch", epoch+1)
    print("Training accuracy:", history.history['accuracy'][epoch])
    print("Validation accuracy:", history.history['val_accuracy'][epoch])



Epoch 1
Training accuracy: 0.3098999857902527
Validation accuracy: 0.35740000009536743
Epoch 2
Training accuracy: 0.3730199933052063
Validation accuracy: 0.3840000033378601
Epoch 3
Training accuracy: 0.3975600004196167
Validation accuracy: 0.3815999925136566
Epoch 4
Training accuracy: 0.4129599928855896
Validation accuracy: 0.4146000146865845
Epoch 5
Training accuracy: 0.42660000920295715
Validation accura

#### Question 2

Complete the following table on the design choices for your MLP:

| Hyperparameter       | What I used   | Why?                                                                             |
|:---------------------|:--------------|:---------------------------------------------------------------------------------|
| Optimizer            | SGD           | Few hyperparameters to tune                                                      |
| # of hidden layers   | 1             | Specified by the lab instructions                                                |
| # of hidden neurons  | 1024          | Specified by the lab instructions                                                |
| Hid layer activation | relu          | Time-efficient                                                                   |
| # of output neurons  | 10            | 10 classes to classify                                                           |
| Output activation    | softmax       | Makes the outputs sum to 1, so each output neuron represents a class probability |
| lr                   | 0.01          | Default of SGD                                                                   |
| momentum             | 0.0           | Default of SGD (the code calls SGD() with no arguments)                          |
| decay                | 0 (none)      | Default of SGD (the code calls SGD() with no arguments)                          |
| loss                 | cross-entropy | Standard loss for classification                                                 |
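
To illustrate the softmax and cross-entropy rows, here is a minimal pure-Python sketch. The helper names `softmax` and `categorical_cross_entropy` and the example scores are assumptions made for illustration; Keras computes the same quantities internally with its own implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

logits = [2.0, 1.0, 0.1]   # hypothetical raw outputs for 3 classes
label = [1, 0, 0]          # the true class is class 0
probs = softmax(logits)

print(probs)
print(sum(probs))
print(categorical_cross_entropy(label, probs))
```

The loss is small when the probability assigned to the true class is close to 1 and grows as that probability shrinks.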


#### Question 3:

What was your final training accuracy? Validation accuracy? Is there overfitting / underfitting? Explain your answer:

**The final training accuracy is 0.596 and the final validation accuracy is 0.51. The model is underfitting, since even the training accuracy is low.**


### 3.3 Experimenting with the MLP

Cut and paste your code from Section 3.2 to the box below (you may need to rename your MLP). Experiment with the number of hidden layers, the number of neurons in each hidden layer, the optimization algorithm, etc. See [Keras Optimizers](https://keras.io/optimizers) for the types of optimizers and their parameters. **Train for 100 epochs.**


In [3]:
"""
Cut and paste your code from Section 3.2 below, then modify it to get
much better results than what you had earlier. E.g. increase the number of
nodes in the hidden layer, increase the number of hidden layers,
change the optimizer, etc.

Train for 100 epochs.

"""

import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD, Adam

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Convert labels to one-hot encoded vectors
num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

# Define MLP architecture
model = Sequential()
model.add(Dense(1024, activation='relu', input_shape=(32*32*3,)))
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

# Compile the model
optimizer = Adam(learning_rate=0.001)  # Experiment with different optimizers and parameters
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

# Train the model
epochs = 100  # Increase the number of epochs
batch_size = 128
history = model.fit(x_train.reshape(-1, 32*32*3), y_train, validation_data=(x_test.reshape(-1, 32*32*3), y_test),
                    epochs=epochs, batch_size=batch_size, verbose=1)

# Print training and validation accuracy at each epoch
for epoch in range(epochs):
    print("Epoch", epoch+1)
    print("Training accuracy:", history.history['accuracy'][epoch])
    print("Validation accuracy:", history.history['val_accuracy'][epoch])



----

#### Question 4:

Complete the following table with your final design (you may add more rows for the # neurons (layer1) etc. to detail how many neurons you have in each hidden layer). Likewise you may replace the lr, momentum etc rows with parameters more appropriate to the optimizer that you have chosen.


| Hyperparameter        | What I used   | Why?                                                                             |
|:----------------------|:--------------|:---------------------------------------------------------------------------------|
| Optimizer             | Adam          | Robust to hyperparameter choices                                                 |
| # of hidden layers    | 3             |                                                                                  |
| # neurons (layer1)    | 1024          |                                                                                  |
| Hid layer1 activation | relu          | Time-efficient                                                                   |
| # neurons (layer2)    | 512           |                                                                                  |
| Hid layer2 activation | relu          | Time-efficient                                                                   |
| # neurons (layer3)    | 256           |                                                                                  |
| Hid layer3 activation | relu          | Time-efficient                                                                   |
| # of output neurons   | 10            | 10 classes to classify                                                           |
| Output activation     | softmax       | Makes the outputs sum to 1, so each output neuron represents a class probability |
| lr                    | 0.001         | Default of Adam                                                                  |
| beta_1                | 0.9           | Default of Adam                                                                  |
| beta_2                | 0.999         | Default of Adam                                                                  |
| loss                  | cross-entropy | Standard loss for classification                                                 |



#### Question 5

What are the final training and validation accuracies that you obtained after 100 epochs? Is there considerable improvement over Section 3.2? Are there still signs of underfitting or overfitting? Explain your answer.

**The training accuracy is about 0.94 and the validation accuracy is about 0.49. Training accuracy improved considerably over Section 3.2, which only reached around 0.6, but validation accuracy did not improve (0.49 versus 0.51). The model is overfitting: the gap between training and validation accuracy is large, and from around epoch 70 the validation accuracy declines.**
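
One way to make the "validation accuracy starts declining" observation concrete is to locate the epoch with the best validation accuracy. The sketch below uses made-up values in `val_acc` for illustration only; they are not the values from this run. In the notebook the list would come from `history.history['val_accuracy']`. The helper name `best_epoch` is hypothetical.

```python
def best_epoch(val_accuracy):
    """Return (1-based epoch, value) of the highest validation accuracy."""
    best_index = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best_index + 1, val_accuracy[best_index]

# Hypothetical values for illustration only.
val_acc = [0.40, 0.46, 0.50, 0.52, 0.51, 0.49, 0.48]

epoch, value = best_epoch(val_acc)
print(epoch, value)  # 4 0.52
```

If the best epoch lies well before the last one while training accuracy keeps rising, the later epochs mostly memorize the training set.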


#### Question 6

Write a short reflection on the practical difficulties of using a dense MLP to classify images in the CIFAR-10 dataset.



**Dimensionality**: The 32x32 color images have a total of 3,072 input features (32 x 32 x 3) when flattened to be fed into a dense MLP. This high dimensionality leads to a large number of parameters in the model, which can result in a slow training process and high computational requirements.

**Loss of spatial information**: Dense MLPs treat input features independently and do not take into account the spatial relationships between pixels in an image. By flattening the image input, we lose important spatial and structural information that can be crucial for recognizing objects and patterns in images.

**Overfitting**: Due to the high dimensionality and large number of parameters, dense MLPs are prone to overfitting, especially when the dataset is relatively small like CIFAR-10. Overfitting occurs when the model learns to memorize the training set instead of generalizing from the underlying patterns, resulting in poor performance on unseen data.

**Limited translation invariance**: Dense MLPs do not have built-in translation invariance. A pattern learned at one position in the image is not automatically recognized at another position, because each input position has its own weights. This limitation makes it difficult for MLPs to classify images in the CIFAR-10 dataset, where objects can appear in various positions.

**Difficulty in handling varying scales**: Dense MLPs struggle to handle objects at varying scales within the image. Since the model does not have any mechanism to adapt its receptive field, it is not able to recognize objects at different scales effectively.
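
The dimensionality point can be made concrete by counting trainable parameters. The sketch below is a plain-Python calculation (the helper name `dense_params` is hypothetical) for the layer sizes used in Sections 3.2 and 3.3: each dense layer has one weight per input-output pair plus one bias per output.

```python
def dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per output
    return total

mlp_3_2 = dense_params([3072, 1024, 10])            # network from Section 3.2
mlp_3_3 = dense_params([3072, 1024, 512, 256, 10])  # network from Section 3.3

print(mlp_3_2)  # 3157002
print(mlp_3_3)  # 3805450
```

In both networks the first layer alone accounts for more than 3.1 million parameters, simply because every one of the 3072 inputs is connected to every one of the 1024 neurons.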



----

## 4. Creating a CNN for the MNIST Data Set

In this section we will now create a convolutional neural network (CNN) to classify images in the MNIST dataset that we used in the previous lab. Let's go through each part to see how to do this.

### 4.1 Loading the MNIST Dataset

As always we will load the MNIST dataset, scale the inputs to between 0 and 1, and convert the Y labels to one-hot vectors. However unlike before we will not flatten the 28x28 image to a 784 element vector, since CNNs can inherently handle 2D data.

In [4]:
from keras.datasets import mnist
from keras.utils import to_categorical

def load_mnist():
    (train_x, train_y),(test_x, test_y) = mnist.load_data()
    train_x = train_x.reshape(train_x.shape[0], 28, 28, 1)
    test_x = test_x.reshape(test_x.shape[0], 28, 28, 1)

    train_x=train_x.astype('float32')
    test_x = test_x.astype('float32')

    train_x /= 255.0
    test_x /= 255.0

    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)

    return (train_x, train_y), (test_x, test_y)

### 4.2 Building the CNN

We will now build the CNN. Unlike before we will create a function to produce the CNN. We will also look at how to save and load Keras models using "checkpoints", particularly "ModelCheckpoint" that saves the model each epoch.

Let's begin by creating the model. We call os.path.exists to see if a model file exists, and call "load_model" if it does. Otherwise we create a new model.



In [5]:
# load_model loads a model from a hd5 file.
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import os

MODEL_NAME = 'mnist-cnn.hd5'

def buildmodel(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7

        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))
        model.add(Flatten()) # Question 9
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.1))
        model.add(Dense(10, activation='softmax'))

    return model



----

#### Question 7

The first layer in our CNN is a 2D convolution kernel, shown here:

```
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7
```

Why is the input_shape set to (28, 28, 1)? What does this mean? What does "padding = 'same'" mean?

**The input_shape is set to (28, 28, 1) to specify the dimensions of the input images that the CNN expects. It means that the CNN expects grayscale images with a resolution of 28 pixels by 28 pixels and a single color channel.**


**The padding='same' parameter means that during the convolution operation, padding is applied to the input image in such a way that the output feature map has the same spatial dimensions as the input. Padding ensures that the information at the edges of the image is retained and helps avoid the loss of spatial information.**
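
The effect of the padding mode on the output size can be sketched with the usual size formulas. The helper `conv_output_size` is a hypothetical name; it computes the size along one spatial dimension and is not part of Keras.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(n / stride)
    # "valid": no padding, the kernel must fit entirely inside the input.
    return (n - kernel) // stride + 1

print(conv_output_size(28, 5, padding="same"))   # 28
print(conv_output_size(28, 5, padding="valid"))  # 24
```

With a 5x5 kernel and no padding, a 28x28 image would shrink to 24x24; with padding='same' it stays 28x28.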


#### Question 8

The second layer is the MaxPooling2D layer shown below:

```
        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
```

What other types of pooling layers are available? What does 'strides = 2' mean?

**Other types of pooling layers available in TensorFlow's Keras API include:**

- AveragePooling2D: Computes the average value of each non-overlapping patch in the input feature map.

- GlobalAveragePooling2D: Computes the average value across the entire spatial dimensions of the input feature map. It reduces each feature map to a single value.

- GlobalMaxPooling2D: Computes the maximum value across the entire spatial dimensions of the input feature map. It reduces each feature map to a single value.

**strides=2 means that the pooling window moves 2 pixels at a time, horizontally and vertically. With a 2x2 pool size the windows do not overlap, so the width and height of the feature map are halved.**
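
A small pure-Python sketch shows what max pooling with different strides does to a 4x4 feature map. The function `max_pool_2d` and the values in `feature_map` are hypothetical and only mimic the behavior of MaxPooling2D without padding.

```python
def max_pool_2d(grid, pool=2, stride=2):
    """Max pooling over a 2D list of numbers (no padding)."""
    rows = (len(grid) - pool) // stride + 1
    cols = (len(grid[0]) - pool) // stride + 1
    return [
        [
            max(grid[r * stride + i][c * stride + j]
                for i in range(pool) for j in range(pool))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]

print(max_pool_2d(feature_map))            # [[6, 2], [7, 9]]
print(max_pool_2d(feature_map, stride=1))  # [[6, 6, 2], [6, 9, 9], [7, 9, 9]]
```

With stride 2 the 4x4 map is reduced to 2x2; with stride 1 the windows overlap and the output is 3x3.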


#### Question 9

What does the "Flatten" layer here do? Why is it needed?

```
        model.add(Flatten()) # Question 9
```

**The "Flatten" layer in a neural network reshapes the output from the previous layers into a 1-dimensional array.**

**It is needed because fully connected (Dense) layers require a 1-dimensional input for each sample, while the convolution and pooling layers output a 3-dimensional feature map (height x width x channels). "Flatten" has no trainable parameters; it only rearranges the values, so every extracted feature is passed on to the Dense layers for classification.**
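
To see how many values "Flatten" hands to the Dense layer, the sketch below traces the spatial size through the layers of buildmodel from Section 4.2. It assumes the Keras default padding='valid' for the Conv2D layers that do not specify padding; the helper names are hypothetical.

```python
def same(n):
    """'same' padding with stride 1 keeps the size."""
    return n

def valid(n, kernel):
    """'valid' padding with stride 1 shrinks the size by kernel - 1."""
    return n - kernel + 1

def pool(n, size=2, stride=2):
    """Pooling without padding."""
    return (n - size) // stride + 1

n = 28
n = same(n)      # Conv2D(32, 5x5, padding='same')  -> 28
n = pool(n)      # MaxPooling2D(2x2, strides=2)     -> 14
n = valid(n, 5)  # Conv2D(64, 5x5)                  -> 10
n = valid(n, 5)  # Conv2D(128, 5x5)                 -> 6
n = valid(n, 5)  # Conv2D(64, 5x5)                  -> 2
n = pool(n)      # MaxPooling2D(2x2, strides=2)     -> 1

channels = 64                  # filters in the last Conv2D layer
flattened = n * n * channels   # length of the vector produced by Flatten
print(n, flattened)  # 1 64
```

Under these assumptions the feature map entering "Flatten" is 1x1x64, so the Dense(1024) layer receives a vector of 64 values per image.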



----

### 4.3 Training the CNN

Let's now train the CNN. In this example we introduce the idea of a "callback", which is a routine that Keras calls at the end of each epoch. Specifically we look at two callbacks:

    1. ModelCheckpoint: When called, Keras saves the model to the specified filename.
    
    2. EarlyStopping: When called, Keras checks if it should stop the training prematurely.
    

Let's look at the code to see how training is done, and how callbacks are used.

In [6]:
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping, ModelCheckpoint

def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7),  # 'lr' is the deprecated name
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs,
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

Notice that there isn't very much that is unusual going on: we compile the model with our loss function and optimizer, then call fit, and finally call evaluate to look at the final accuracy for the test set. The only unusual thing is the "callbacks" parameter in the fit function call:

```
    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs,
    callbacks=[savemodel, stopmodel])
```

----

#### Question 10.

What do the min_delta and patience parameters do in the EarlyStopping callback, as shown below? (2 MARKS)

```
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10
```

---

**min_delta**: This parameter represents the minimum change in the monitored metric that qualifies as an improvement. If the absolute change in the metric between the current epoch and the best recorded value is less than min_delta, the current epoch is considered as a non-improvement. The default value is 0, meaning that any improvement, no matter how small, will reset the patience counter.

**patience**: This parameter is an integer that determines the number of consecutive non-improvement epochs allowed before stopping the training process. In other words, if the monitored metric does not improve for the specified number of consecutive epochs (as defined by min_delta), the training will be stopped early. The default value is 0, meaning the training will be stopped as soon as the metric stops improving.
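
The interaction of the two parameters can be illustrated with a simplified pure-Python sketch. It is not the Keras implementation; it only mimics the rule described above for a metric where lower is better (such as validation loss). The function name `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(val_losses, min_delta=0.0, patience=0):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # consecutive epochs without sufficient improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # improvement of more than min_delta
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: a plateau around 0.8, then a late drop.
losses = [1.0, 0.8, 0.7995, 0.7993, 0.7992, 0.6]

print(stopping_epoch(losses, min_delta=0.001, patience=3))  # 5
print(stopping_epoch(losses, min_delta=0.0, patience=3))    # None
```

With min_delta=0.001 the tiny gains on the plateau do not count as improvements, so training stops at epoch 5 and never sees the late drop; a larger patience would allow it to continue.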


### 4.4 Putting it together.

Now let's run the code and see how it goes (Note: To save time we are training for only 5 epochs; we should train much longer to get much better results):

In [7]:
    (train_x, train_y),(test_x, test_y) = load_mnist()
    model = buildmodel(MODEL_NAME)
    train(model, train_x, train_y, 5, test_x, test_y, MODEL_NAME)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz

Starting training.
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Done. Now evaluating.
Test accuracy: 0.99, loss: 0.03


----

#### Question 11.

Compare the relative advantages and disadvantages of the CNN vs. the dense MLP that you built in Sections 3.2 and 3.3. What makes CNNs better (or worse)?

CNN (Convolutional Neural Network)

---Advantages:

**Preservation of spatial information**: CNNs use convolutional layers to scan and learn local patterns in the input images, preserving the spatial relationships between pixels. This ability to capture spatial information is crucial for image classification tasks, as it helps the model recognize patterns and objects in images more effectively.

**Parameter efficiency**: CNNs use shared weights in their convolutional layers, which significantly reduces the number of parameters in the model compared to a dense MLP. This reduction in parameters leads to lower memory requirements and faster training times.

**Translation invariance**: Convolutional layers apply the same filters at every position of the image, and pooling layers make the result tolerant to small shifts. This means that the model can recognize the same object or pattern even if it appears in a different part of the image. This property is particularly important for image classification tasks, where objects can appear in various positions.

**Handling varying scales**: CNNs can be designed with multiple layers and receptive fields of different sizes, enabling them to recognize objects and patterns at varying scales within the image.

---Disadvantages:

**Complexity**: CNNs are more complex than dense MLPs, both in terms of their architecture and the understanding required to effectively design and tune them.
    
**Computational resources**: CNNs can be computationally expensive, particularly for large models and high-resolution images. This can make training and inference slower and require more powerful hardware.

Dense MLP (Multi-Layer Perceptron)

---Advantages:

**Simplicity**: Dense MLPs are simpler in their architecture compared to CNNs, making them easier to understand and implement. This can be an advantage for less complex tasks or for educational purposes.
    
**General-purpose**: Dense MLPs can be used for a wide range of tasks, including image classification. However, their performance on image classification tasks is generally worse than CNNs due to their inability to capture spatial information.

---Disadvantages:

**Loss of spatial information**: Dense MLPs do not explicitly consider the spatial relationships between pixels in an image, as they treat each input feature independently. This lack of spatial information can lead to poorer performance in image classification tasks.

**Parameter inefficiency**: Dense MLPs have a larger number of parameters compared to CNNs, as each neuron in a layer is connected to every neuron in the previous layer. This can lead to increased memory requirements and longer training times.

In summary, CNNs are generally better suited for image classification tasks because they can capture spatial information, have built-in translation invariance, and are more parameter-efficient than dense MLPs. However, CNNs are more complex and can require more computational resources. On the other hand, dense MLPs are simpler and more general-purpose but are less effective for image classification due to their inability to capture spatial information and their parameter inefficiency.
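
The parameter-efficiency argument can be checked with a short calculation. The sketch below compares the first convolutional layer of the MNIST CNN (32 filters of size 5x5 on a 1-channel input) with a hypothetical dense layer that maps the same 28x28x1 input to an output of the same size (28x28x32). The helper names are assumptions for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights are shared across positions: one kernel per filter plus a bias."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_layer_params(n_in, n_out):
    """Every input is connected to every output, plus one bias per output."""
    return n_in * n_out + n_out

conv = conv2d_params(5, 5, 1, 32)
dense = dense_layer_params(28 * 28 * 1, 28 * 28 * 32)

print(conv)   # 832
print(dense)  # 19694080
```

The convolutional layer needs 832 parameters regardless of the image size, while the equivalent dense layer would need almost 20 million.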

## 5. Making a CNN for the CIFAR-10 Dataset

Now comes the fun part: Using the example above for creating a CNN for the MNIST dataset, now create a CNN in the box below for the CIFAR-10 dataset. At the end of each epoch save the model to a file called "cifar.hd5" (note: the .hd5 is added automatically for you).

---

#### Question 12.

Summarize your design in the table below (the actual coding cell comes after this):

| Hyperparameter | What I used                            | Why?                                                             |
|:---------------|:---------------------------------------|:-----------------------------------------------------------------|
| Optimizer      | Adam                                   | More effective; robust to hyperparameter choices                 |
| Input shape    | (32,32,3)                              | Shape of the RGB images                                          |
| First layer    | Conv2D, 32 filters, kernel size (3,3)  | Extract features                                                 |
| Second layer   | Conv2D, 32 filters, kernel size (3,3)  | Extract features                                                 |
| Third layer    | Conv2D, 64 filters, kernel size (3,3)  | Combine low-level features into more complex, higher-level ones  |
| Fourth layer   | Conv2D, 64 filters, kernel size (3,3)  | Combine low-level features into more complex, higher-level ones  |
| Fifth layer    | Conv2D, 128 filters, kernel size (3,3) | Combine low-level features into more complex, higher-level ones  |
| Sixth layer    | Conv2D, 128 filters, kernel size (3,3) | Combine low-level features into more complex, higher-level ones  |
| Dense layer    | 128                                    |                                                                  |
| Dense layer    | 10                                     | 10 classes to classify                                           |
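
As a rough check of this design, the sketch below traces the feature-map size through the three convolution blocks of the coding cell that follows. It assumes what that cell specifies: 'same'-padded 3x3 convolutions, which keep the spatial size, and 2x2 max pooling with the default stride, which halves it. The variable names are hypothetical.

```python
size = 32  # CIFAR-10 images are 32x32

for filters in (32, 64, 128):
    # Two 'same'-padded 3x3 convolutions keep the size unchanged;
    # the 2x2 max pooling at the end of each block halves it.
    size = size // 2

flattened = size * size * 128            # values passed from Flatten to Dense(128)
dense_weights = flattened * 128 + 128    # weights plus biases of Dense(128)

print(size, flattened, dense_weights)  # 4 2048 262272
```

The first dense layer therefore sees 2048 features per image instead of the 3072 raw pixel values used by the MLP in Section 3.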




In [9]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Load the CIFAR-10 dataset
(train_x, train_y), (test_x, test_y) = cifar10.load_data()

# Normalize the data
train_x = train_x.astype('float32') / 255.0
test_x = test_x.astype('float32') / 255.0

# One-hot encode the labels
train_y = tf.keras.utils.to_categorical(train_y, num_classes=10)
test_y = tf.keras.utils.to_categorical(test_y, num_classes=10)

# Create a CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    BatchNormalization(),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.3),

    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.5),

    Conv2D(128, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.5),

    Flatten(),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# Define early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=10, verbose=1, restore_best_weights=True)

# Define model checkpoint
model_checkpoint = ModelCheckpoint('cifar.h5', monitor='val_loss', verbose=1, save_best_only=True, mode='min')

# Train the model
history = model.fit(train_x, train_y, batch_size=64, epochs=100, validation_split=0.2, callbacks=[early_stopping, model_checkpoint])

# Evaluate the model
test_loss, test_accuracy = model.evaluate(test_x, test_y, verbose=1)
print("Test Loss: {:.4f}, Test Accuracy: {:.2f}%".format(test_loss, test_accuracy * 100))

Epoch 1/100
Epoch 1: val_loss improved from inf to 1.30976, saving model to cifar.h5
Epoch 2/100
Epoch 2: val_loss improved from 1.30976 to 1.09879, saving model to cifar.h5
Epoch 3/100
Epoch 3: val_loss improved from 1.09879 to 1.06799, saving model to cifar.h5
Epoch 4/100
Epoch 4: val_loss improved from 1.06799 to 0.97033, saving model to cifar.h5
Epoch 5/100
Epoch 5: val_loss improved from 0.97033 to 0.75287, saving model to cifar.h5
Epoch 6/100
Epoch 6: val_loss did not improve from 0.75287
Epoch 7/100
Epoch 7: val_loss improved from 0.75287 to 0.73661, saving model to cifar.h5
Epoch 8/100
Epoch 8: val_loss improved from 0.73661 to 0.73308, saving model to cifar.h5
Epoch 9/100
Epoch 9: val_loss improved from 0.73308 to 0.60953, saving model to cifar.h5
Epoch 10/100
Epoch 10: val_loss did not improve from 0.60953
Epoch 11/100
Epoch 11: val_loss did not improve from 0.60953
Epoch 12/100
Epoch 12: val_loss improved from 0.60953 to 0.60577, saving model to cifar.h5
Epoch 13/100
Epoch 1