# Session 4: Solution

```{contents}

```

## 1. MNIST Dataset

### Prepare the Dataset

```python
import numpy as np
import matplotlib.pyplot as plt
# Keras helper that downloads and loads the MNIST dataset
from tensorflow.keras.datasets import mnist
```

**Download data**

```python
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print('Shape of x_train:', x_train.shape)
print('Shape of y_train:', y_train.shape)
print('-' * 10)
print('Shape of x_test:', x_test.shape)
print('Shape of y_test:', y_test.shape)
print('-' * 10)
print('Labels:', np.unique(y_train))
```

```
Shape of x_train: (60000, 28, 28)
Shape of y_train: (60000,)
----------
Shape of x_test: (10000, 28, 28)
Shape of y_test: (10000,)
----------
Labels: [0 1 2 3 4 5 6 7 8 9]
```


### Data Preprocessing

Explore the data in `x_train`:

```python
print('x_train data type:', x_train.dtype)
print('Min of x_train:', x_train.min())
print('Max of x_train:', x_train.max())
```

```
x_train data type: uint8
Min of x_train: 0
Max of x_train: 255
```


Image data is usually stored as `uint8`, so each pixel value is an integer in the range `[0, 255]`.

Image data can also be stored as `float`, with each pixel value in the range `[0, 1]`.

The most common image preprocessing step is to convert the data to `float` and scale the pixel values from `[0, 255]` to `[0, 1]`:

```python
# Dividing by 255.0 converts the uint8 array to float and scales each pixel into [0, 1]
images = images / 255.0
```

This preprocessing step is a special case of **min-max scaling**:
$$
X_\text{scaled} = \frac{X - \min(X)}{\max(X) - \min(X)}
$$

Since the minimum pixel value is 0 and the maximum is 255, the formula reduces to:
$$
X_\text{scaled} = \frac{X}{255.0}
$$
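To check that the general formula and the `/ 255.0` shortcut agree, here is a small NumPy sketch. `min_max_scale` is a hypothetical helper written for illustration, and the toy image is chosen so that its pixels span the full `0..255` range, which is exactly the condition under which the two formulas coincide.

```python
import numpy as np

def min_max_scale(x):
    """Min-max scaling: map the smallest value to 0 and the largest to 1 (hypothetical helper)."""
    x = x.astype(np.float64)
    return (x - x.min()) / (x.max() - x.min())

# A toy 2x3 "image" of uint8 pixels whose values span the full range 0..255
img = np.array([[0, 51, 102], [153, 204, 255]], dtype=np.uint8)

scaled_general = min_max_scale(img)  # general min-max formula
scaled_simple = img / 255.0          # shortcut used in this notebook (uint8 -> float64)

print(scaled_simple)
print(np.allclose(scaled_general, scaled_simple))
```

The two agree only because min = 0 and max = 255. For MNIST, those are the dataset-wide values printed above; an individual image whose pixels do not reach 0 or 255 would be scaled differently by a per-image min-max.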


```python
x_train = x_train / 255.0
x_test = x_test / 255.0
```

```python
# one-hot encoding
from tensorflow.keras.utils import to_categorical

y_train_encode = to_categorical(y_train, num_classes=10)
y_test_encode = to_categorical(y_test, num_classes=10)
```
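One-hot encoding turns each integer label into a row with a single 1 at the label's index. The NumPy sketch below, using a hypothetical `one_hot` helper, builds the same kind of matrix that `to_categorical` is expected to return for integer labels, and shows that `argmax` recovers the original labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row (hypothetical helper)."""
    # Row k of the identity matrix is exactly the one-hot vector for class k
    return np.eye(num_classes, dtype=np.float32)[labels]

y = np.array([5, 0, 4, 1])
encoded = one_hot(y, num_classes=10)
print(encoded)

# argmax over each row gives back the integer labels
print(encoded.argmax(axis=1))
```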

### Build and train the model

**Note 1**

- A fully connected (Dense) neural network expects each sample to be a 1-D feature vector, so the whole dataset must be a 2-dimensional tensor with **shape=(m, n)**
- Each sample in our data, however, is an image of shape (28, 28), so the dataset is a 3-dimensional tensor with **shape=(m, 28, 28)**
- Therefore, we need to **flatten** the dataset.

  ![flatten](https://www.w3resource.com/w3r_images/numpy-manipulation-ndarray-flatten-function-image-1.png)

- Flattening discards the 2-D spatial structure of each image (which pixels are neighbors of which), and with it part of the image's meaning. Later we will learn another architecture that handles image data better, without flattening.
- To flatten inside the model, we use the `Flatten` layer available in TensorFlow:
  ```python
  from tensorflow.keras.models import Sequential
  from tensorflow.keras.layers import Input, Flatten

  model = Sequential()
  # Declare the input shape first: each sample is a 28x28 image
  model.add(Input(shape=(28, 28)))
  # Flatten each (28, 28) image into a 784-dimensional vector
  model.add(Flatten())
  # ... MLP (Dense) layers go here
  ```
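The `Flatten` layer should be equivalent to a row-major reshape that keeps the batch axis. The NumPy sketch below uses a hypothetical stand-in for `x_train` to show the `(m, 28, 28) -> (m, 784)` reshape. It also shows where a given pixel ends up: pixels that were vertical neighbors in the image end up 28 positions apart in the flat vector, which is the loss of spatial structure mentioned above.

```python
import numpy as np

# Hypothetical stand-in for x_train: 3 images of 28x28 pixels
images = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# Flatten every image into a 784-long row: (m, 28, 28) -> (m, 784)
flat = images.reshape(images.shape[0], -1)
print(flat.shape)

# Row-major order: pixel (i, j) of an image lands at column i * 28 + j,
# so pixel (i + 1, j), its neighbor directly below, is 28 columns away
i, j = 5, 17
print(flat[0, i * 28 + j] == images[0, i, j])
```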

**Note 2**

**Sparse Categorical Crossentropy vs. Categorical Crossentropy**

- With the loss function `sparse_categorical_crossentropy`, we don't need **one-hot encoding**: labels stay as integer class indices, e.g. `[0, 1, 1, 2, ...]`.
- With the loss function `categorical_crossentropy`, we need to **one-hot encode** the labels.
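The two losses compute the same quantity and differ only in the label format they accept. The sketch below re-implements both in NumPy for illustration: these are hypothetical helpers, not the Keras functions, which additionally guard against `log(0)`. With one-hot labels, the sum over classes keeps only the log-probability of the true class, which is exactly what the sparse version looks up by index.

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs):
    """Mean over samples of -sum_k y_k * log(p_k); labels are one-hot rows (illustrative)."""
    return float(np.mean(-np.sum(y_onehot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(y_int, probs):
    """Same loss with integer labels: pick the probability of the true class (illustrative)."""
    return float(np.mean(-np.log(probs[np.arange(len(y_int)), y_int])))

# Softmax outputs for 3 samples over 3 classes (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y_int = np.array([0, 1, 2])   # integer labels (sparse format)
y_onehot = np.eye(3)[y_int]   # the same labels, one-hot encoded

print(categorical_crossentropy(y_onehot, probs))
print(sparse_categorical_crossentropy(y_int, probs))
```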

There are two ways to approach this problem:
1. Flatten the data and use an MLP network to solve the problem directly.
2. Flatten the data, use PCA to extract features, and feed them into the MLP network. In this approach, the PCA fitted on the training set must also be applied to the test set before the test set goes through the model to `predict` / `evaluate`.

Try both approaches in turn, and remember to compare the quality of these new models with the model from lesson 2.


**1. Flatten the data and use an MLP network to solve the problem directly**

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Flatten, Activation
from tensorflow.random import set_seed
from tensorflow.keras.backend import clear_session

clear_session()
set_seed(42)
np.random.seed(42)

# YOUR SOLUTION
model = Sequential()

# input layer
model.add(Input(shape=(x_train.shape[1:])))
model.add(Flatten())

# mlp
model.add(Dense(32, activation='relu', name='layer_1'))
model.add(Dense(64, activation='relu', name='layer_2'))
model.add(Dense(128, activation='relu', name='layer_3'))
model.add(Dense(64, activation='relu', name='layer_4'))
model.add(Dense(32, activation='relu', name='layer_5'))
model.add(Dense(10, activation='softmax', name='output_layer'))
model.summary()
```

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 layer_1 (Dense)             (None, 32)                25120     
                                                                 
 layer_2 (Dense)             (None, 64)                2112      
                                                                 
 layer_3 (Dense)             (None, 128)               8320      
                                                                 
 layer_4 (Dense)             (None, 64)                8256      
                                                                 
 layer_5 (Dense)             (None, 32)                2080      
                                                                 
 output_layer (Dense)        (None, 10)                3
```
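The `Param #` column follows from a simple rule: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, while `Flatten` has no parameters. The sketch below, with hypothetical helper names, recomputes the per-layer counts for this architecture from that rule.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def mlp_params(n_inputs, units=(32, 64, 128, 64, 32, 10)):
    """Per-layer parameter counts for the Dense stack used in this notebook (hypothetical helper)."""
    counts = []
    for n_out in units:
        counts.append(dense_params(n_inputs, n_out))
        n_inputs = n_out  # each layer's output size is the next layer's input size
    return counts

flat_counts = mlp_params(28 * 28)  # input: a flattened 28x28 image
print(flat_counts)       # [25120, 2112, 8320, 8256, 2080, 330]
print(sum(flat_counts))  # 46218
```

Only the first layer depends on the input size, so calling `mlp_params(331)` (the number of PCA components used later in this notebook) gives a total of 31,722 parameters.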

```python
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```

```python
history = model.fit(x_train, y_train_encode, epochs=20, verbose=1)
```

```
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
```


```python
model.evaluate(x_test, y_test_encode)
```

```
[0.12124622613191605, 0.9704999923706055]
```
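`evaluate` returns the loss first, followed by the metrics passed to `compile`, so the two numbers above are the test loss and the test accuracy. Accuracy here is the fraction of images whose highest-probability class matches the true label. The NumPy sketch below shows that computation on hypothetical softmax outputs; `accuracy_from_probs` is an illustrative helper, not a Keras function.

```python
import numpy as np

def accuracy_from_probs(probs, y_int):
    """Fraction of samples whose highest-probability class equals the true label (illustrative)."""
    return float(np.mean(np.argmax(probs, axis=1) == y_int))

# Hypothetical softmax outputs for 4 test images over 3 classes
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1]])
y_true = np.array([0, 1, 2, 1])  # the last prediction (class 0) is wrong

print(accuracy_from_probs(probs, y_true))  # 0.75
```

In this notebook, the same idea applied to the trained model would be something like `np.mean(model.predict(x_test).argmax(axis=1) == y_test)`.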

**2. Flatten the data, use PCA to extract features, and feed them into the MLP network**

In this approach, the PCA fitted on the training set must also transform the test set before the test set goes through the model to predict / evaluate.

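```python
from sklearn.decomposition import PCA

# Flatten each 28x28 image into a 784-dimensional vector
x_train_flatten = x_train.reshape(x_train.shape[0], x_train.shape[1] * x_train.shape[2])
x_test_flatten = x_test.reshape(x_test.shape[0], x_test.shape[1] * x_test.shape[2])

# Keep the smallest number of components that explains 99% of the variance.
# Fit PCA on the training set only, then apply the same projection to the test set.
pca = PCA(0.99)
x_train_pca = pca.fit_transform(x_train_flatten)
x_test_pca = pca.transform(x_test_flatten)
print(pca.n_components_)
```

```
331
```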
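Passing a float such as `0.99` to `PCA` asks it to keep enough components to explain that fraction of the variance. The NumPy sketch below shows roughly how such a number can be chosen from the singular values of the centered data. It runs on synthetic data, `n_components_for_variance` is a hypothetical helper, and scikit-learn's exact tie-breaking may differ.

```python
import numpy as np

def n_components_for_variance(X, threshold):
    """Smallest k whose top-k principal components explain >= threshold of the variance (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                  # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    ratio = s**2 / np.sum(s**2)              # explained variance ratio per component
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the threshold, converted to a count
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return k, cumulative

rng = np.random.default_rng(0)
# Synthetic data: 3 strong latent directions mixed into 10 dimensions, plus tiny noise
latent = rng.normal(size=(500, 3)) * np.array([10.0, 5.0, 2.0])
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

k, cumulative = n_components_for_variance(X, 0.99)
print(k, cumulative[:k].round(4))
```

For the fitted model in this notebook, `np.cumsum(pca.explained_variance_ratio_)` gives the corresponding cumulative curve for MNIST.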


```python
clear_session()
set_seed(42)
np.random.seed(42)

model_pca = Sequential()
# input layer
model_pca.add(Input(shape=(x_train_pca.shape[1:])))
# mlp
model_pca.add(Dense(32, activation='relu', name='layer_1'))
model_pca.add(Dense(64, activation='relu', name='layer_2'))
model_pca.add(Dense(128, activation='relu', name='layer_3'))
model_pca.add(Dense(64, activation='relu', name='layer_4'))
model_pca.add(Dense(32, activation='relu', name='layer_5'))
model_pca.add(Dense(10, activation='softmax', name='output_layer'))
model_pca.summary()
```

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer_1 (Dense)             (None, 32)                10624     
                                                                 
 layer_2 (Dense)             (None, 64)                2112      
                                                                 
 layer_3 (Dense)             (None, 128)               8320      
                                                                 
 layer_4 (Dense)             (None, 64)                8256      
                                                                 
 layer_5 (Dense)             (None, 32)                2080      
                                                                 
 output_layer (Dense)        (None, 10)                330       
                                                                 
Total params: 31,722
Trainable params: 31,722
Non-traina
```

```python
model_pca.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
history_2 = model_pca.fit(x_train_pca, y_train_encode, epochs=20, verbose=1)
```

```
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
```


```python
model_pca.evaluate(x_test_pca, y_test_encode)
```

```
[0.19406358897686005, 0.9595000147819519]
```

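Both MLP models (with and without PCA) achieve higher accuracy and lower loss than the lesson 2 model, on both the training set and the test set. On the test set, the MLP trained on the raw flattened pixels (loss 0.121, accuracy 97.05%) does slightly better than the MLP trained on PCA features (loss 0.194, accuracy 95.95%).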