# Session 4: Assignment

```{contents}

```

## 1. MNIST Dataset

### Prepare the Dataset

In [None]:
import numpy as np
import matplotlib.pyplot as plt
# Function to download dataset
from tensorflow.keras.datasets import mnist

**Download data**

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print('Shape of x_train:',x_train.shape)
print('Shape of y_train:',y_train.shape)
print('-'*10)
print('Shape of x_test:',x_test.shape)
print('Shape of y_test:',y_test.shape)
print('-'*10)
print('Labels:',np.unique(y_train))

### Data Preprocessing

Explore the data in `x_train`

In [None]:
print('x_train data type:', x_train.dtype)
print('Min of x_train:', x_train.min())
print('Max of x_train:', x_train.max())

Image data usually has the data type `uint8`, with each pixel value in the range `[0, 255]`.

Image data can also be stored as `float`, with each pixel value in the range `[0, 1]`.

The most common preprocessing step for image data is to convert the data type to `float` and scale the pixel values from `[0, 255]` to `[0, 1]`:

  ```
  # Dividing by 255.0 converts the images to float and scales the pixel values to [0, 1]
  images = images / 255.0
  ```

The preprocessing step above is a case of **min-max scaling** (Min Max Scaler):
$$
X_\text{scaled} = \frac{X - \min(X)}{\max(X) - \min(X)}
$$

Since the minimum pixel value is 0 and the maximum value is 255, the formula above reduces to:
$$
X_\text{scaled} = \frac{X}{255.0}
$$
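
As a quick check of the reduction above, the following sketch compares the general min-max formula with the `/ 255.0` shortcut. The array `X` is a made-up example, not part of MNIST, and it deliberately contains both 0 and 255.

```python
import numpy as np

# Hypothetical 2x3 "image" whose uint8 pixels span the full range 0..255
X = np.array([[0, 51, 102], [153, 204, 255]], dtype=np.uint8)

# General min-max scaling; cast to float first to avoid uint8 arithmetic
X_float = X.astype(np.float64)
X_minmax = (X_float - X_float.min()) / (X_float.max() - X_float.min())

# Shortcut used in this notebook
X_div = X / 255.0

print(X_div.dtype)                   # float64
print(np.allclose(X_minmax, X_div))  # True
```

The two results agree only because this array really contains the values 0 and 255. For an array whose minimum is above 0 or whose maximum is below 255, dividing by 255 still maps the pixels into `[0, 1]` but is no longer identical to min-max scaling of that array.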


In [None]:
x_train = x_train / 255.0
x_test = x_test / 255.0

In [None]:
# one-hot encoding
from tensorflow.keras.utils import to_categorical

y_train_encode = to_categorical(y_train, num_classes=10)
y_test_encode = to_categorical(y_test, num_classes=10)
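
To see what one-hot encoding produces, here is a small NumPy sketch on made-up labels. It only illustrates the idea and is not a description of how `to_categorical` is implemented; `labels` and `num_classes` are hypothetical values chosen for the example.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])  # hypothetical integer labels
num_classes = 3                  # hypothetical number of classes

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot)

# argmax along each row recovers the original integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```

Each row contains a single 1 at the position of its class, so taking `argmax` on a row gives back the integer label.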

### Build and train the model

**Note 1**

- A deep fully connected neural network only accepts input in the form of a 2-dimensional tensor, i.e. with **shape=(m, n)**
- However, each sample in our data is an image of shape (28, 28), so the dataset is a 3-dimensional tensor with **shape=(m, 28, 28)**
- Therefore, we need to **Flatten** the dataset.

  ![flatten](https://www.w3resource.com/w3r_images/numpy-manipulation-ndarray-flatten-function-image-1.png)

- **Flatten** discards the 2-D spatial structure of each image (which pixels are neighbours), and with it part of the image's semantics. Later we will learn another architecture that processes image data better (without needing **Flatten**)
- To implement **Flatten**, we use the **Flatten** layer available in TensorFlow
  ```
  from tensorflow.keras.models import Sequential
  from tensorflow.keras.layers import Input, Flatten

  model = Sequential()
  # The first layer declares the shape of one input sample
  model.add(Input(shape=(..)))
  model.add(Flatten())
  # ... mlp here
  ```
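
The effect of flattening on the array shape can be sketched with plain NumPy. The array `batch` below is a made-up stand-in for a few images, and the reshape is only a conceptual illustration of what flattening does to each sample.

```python
import numpy as np

# Hypothetical batch of 5 blank 28x28 images standing in for x_train
batch = np.zeros((5, 28, 28), dtype=np.float32)
batch[0, 1, 2] = 1.0  # mark the pixel at row 1, column 2 of the first image

# Keep the sample axis and merge the two image axes into one
flat = batch.reshape(len(batch), -1)
print(flat.shape)           # (5, 784)

# Row-major order: pixel (r, c) lands at index r * 28 + c
print(flat[0, 1 * 28 + 2])  # 1.0
```

After flattening, two pixels that were vertical neighbours in the image sit 28 positions apart in the vector, which is why the network no longer sees the image's 2-D layout.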

**Note 2**

**Sparse Categorical Crossentropy vs. Categorical Crossentropy**

- With the loss function `sparse_categorical_crossentropy`, we don't need to perform **One Hot Encoding**: the labels stay as integers, e.g. `[0,1,1,2, ... ]`.
- With the loss function `categorical_crossentropy`, we need to perform **One Hot Encoding** on the labels first.
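
The two losses compute the same quantity and differ only in the label format they expect. The NumPy sketch below checks this on made-up predictions. `probs` and `labels` are hypothetical, and the formulas are the textbook cross-entropy definitions rather than Keras internals.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 4 classes (each row sums to 1)
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
labels = np.array([0, 1, 3])   # integer labels
one_hot = np.eye(4)[labels]    # the same labels, one-hot encoded

# Categorical form: -sum over classes of one_hot * log(p)
cce = -(one_hot * np.log(probs)).sum(axis=1)

# Sparse form: pick the log-probability of the true class directly
scce = -np.log(probs[np.arange(len(labels)), labels])

print(np.allclose(cce, scce))  # True
```

Because the one-hot vector is zero everywhere except at the true class, the sum in the categorical form collapses to the single term that the sparse form looks up by index.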

There are two ways to approach this problem:
1. Flatten the data and use an MLP network to solve the problem directly
2. Flatten the data, use PCA to extract features, and pass them to the MLP network. Note that with this approach you must also transform the test set with the PCA fitted on the training set before passing it to the model for `predict / evaluate`

Please try both approaches in turn, and remember to compare the quality of each new model with the old one from lesson 2.
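
For the second approach, the key point is that the projection is learned from the training data only and then reused unchanged on the test data. The sketch below shows that pattern with a hand-rolled PCA based on NumPy's SVD, applied to random stand-in data. The arrays `train` and `test` and the value of `n_components` are assumptions made for illustration. In practice a library class such as scikit-learn's `PCA` (with `fit_transform` on the training set and `transform` on the test set) is the more usual choice.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.random((100, 784))  # hypothetical flattened training images
test = rng.random((20, 784))    # hypothetical flattened test images
n_components = 16               # hypothetical number of features to keep

# Fit: the mean and principal directions come from the training set only
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:n_components]  # each row is one principal direction

# Transform: apply the same mean and directions to both sets
train_pca = (train - mean) @ components.T
test_pca = (test - mean) @ components.T
print(train_pca.shape, test_pca.shape)  # (100, 16) (20, 16)
```

Fitting a separate PCA on the test set would give a different set of axes, so the features seen at evaluation time would no longer match the ones the model was trained on.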


**1. Flatten the data and use an MLP network to solve the problem directly**

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Flatten, Activation
from tensorflow.random import set_seed
from tensorflow.keras.backend import clear_session

clear_session()
set_seed(42)
np.random.seed(42)

# YOUR SOLUTION


# input layer


# mlp


In [None]:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

In [None]:
history = model.fit(x_train, y_train_encode, epochs=20, verbose=1)

In [None]:
model.evaluate(x_test, y_test_encode)

**2. Flatten the data, use PCA to extract features, and pass them to the MLP network**

Note that with this approach you must also transform the test set with the PCA fitted on the training set before passing it to the model to predict / evaluate.

In [None]:
# YOUR SOLUTION