# Session 2: Assignment

```{contents}

```

## Hand-written digit Recognition with PCA and Softmax

We will practice using Softmax Regression combined with PCA to classify handwritten digits.

### Prepare the dataset

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print('Shape of x_train:',x_train.shape)
print('Shape of y_train:',y_train.shape)
print('-'*10)
print('Shape of x_test:',x_test.shape)
print('Shape of y_test:',y_test.shape)
print('-'*10)
print('Labels:', np.unique(y_train))

The output above shows that the training set consists of 60,000 images, each a 28x28 grayscale image. (For RGB images the shape would be `[n_sample, height, width, 3]`, where 3 is the number of color channels.)

The test set consists of 10,000 images.

The dataset has 10 labels, numbered from 0 to 9.

### Visualize images in the dataset

In [None]:
n_rows = 10
n_cols = 5
fig, axs = plt.subplots(n_rows, n_cols, figsize=(7, 15))
for row in range(n_rows):
  for col in range(n_cols):
    random_index = np.random.choice(np.where(y_train == row)[0])
    axs[row][col].grid(False)
    axs[row][col].axis('off')
    axs[row][col].imshow(x_train[random_index], cmap='gray')

### Perform PCA to reduce the dimension of dataset

Most `sklearn` algorithms only accept 2-dimensional data of shape `(n_sample, n_feature)`. Since each sample in our dataset is an image, we need to flatten the data before applying PCA: `(n_sample, 28, 28)` → `(n_sample, 28 * 28)`.

In [None]:
x_train_flatten = x_train.reshape(x_train.shape[0], x_train.shape[1] * x_train.shape[2])
x_test_flatten = x_test.reshape(x_test.shape[0], x_test.shape[1] * x_test.shape[2])
print('x_train shape after flatten', x_train_flatten.shape)
print('x_test shape after flatten', x_test_flatten.shape)

#### TODO 1

Apply PCA to the flattened dataset to extract features so that 99% of the information (variance) is retained. Print the number of principal components used.

Remember to apply `StandardScaler` to normalize the data before performing PCA (in Session 1 the dataset was already scaled with `/255`, so we skipped this step).
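To make "retaining 99% of the information" concrete, here is a minimal from-scratch sketch in NumPy. It runs on a small synthetic array rather than MNIST, and the names `X`, `latent`, and `mixing` are hypothetical. It standardizes the features, computes the explained-variance ratio of each principal component from the singular values, and picks the smallest number of components whose cumulative ratio reaches 99%. It is meant only to illustrate the idea, not to replace the `sklearn` solution expected in this TODO.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened images: 200 samples, 20 correlated features
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 20))

# Standardize each feature to zero mean and unit variance
# (assumes no feature is constant, otherwise the division would fail)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The squared singular values are proportional to the variance
# captured along each principal axis
_, singular_values, _ = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative ratio reaches 99%
n_components = int(np.searchsorted(cumulative, 0.99) + 1)
print("components kept:", n_components)
print("variance retained:", cumulative[n_components - 1])
```

In `sklearn`, passing a float between 0 and 1 as `n_components` to `PCA` selects the number of components by explained variance in a similar way, and the fitted object exposes the chosen count as `n_components_`.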

In [None]:
# YOUR SOLUTION

#### Optional 1
You can refer to Assignment 1 to visualize the results of applying PCA

In [None]:
# YOUR SOLUTION

### One Hot Encoding

![ohe](https://i.imgur.com/mtimFxh.png)

One Hot Encoding also has a probabilistic interpretation:
- In row 1, there is a 1 in the Red column and a 0 in the Yellow and Green columns. This means the probability that the sample has the Red label is 100%, and the probability of each other label is 0%.
- Recall that in multi-class classification, we use the Softmax function to turn regression scores into probabilities. Example:

Red | Yellow | Green
--- | --- | ---
0.8 | 0.1 | 0.1

- Thanks to One Hot Encoding, we can use the Cross Entropy formula to measure the error between the probabilities predicted by the model and the actual probabilities.
$$
\text{Cross Entropy} = -\sum_{i} y_i \ln(\hat{y}_i)
$$

  - Here $y_i$ is the ground-truth probability of class $i$ and $\hat{y}_i$ is the probability of class $i$ predicted by the model.
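As a small worked example of the two ideas above, the sketch below applies Softmax to a set of raw scores and then computes the Cross Entropy for the Red / Yellow / Green probabilities from the table. The raw `scores` values are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

# Hypothetical raw scores for the classes (Red, Yellow, Green)
scores = np.array([2.0, 0.5, 0.1])
probs = softmax(scores)
print("softmax probabilities:", probs.round(3))

# One-hot ground truth for "Red" and the predicted probabilities from the table
y_true = np.array([1.0, 0.0, 0.0])
y_pred = np.array([0.8, 0.1, 0.1])
cross_entropy = -np.sum(y_true * np.log(y_pred))
print("cross entropy:", cross_entropy)  # -ln(0.8), roughly 0.223
```

Because the ground truth is one-hot, only the term for the true class survives, so the loss reduces to $-\ln(0.8)$. The closer the predicted probability of the true class is to 1, the closer the loss is to 0.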

How to apply one-hot encoding to ``y_train`` and ``y_test``:
- Import module
  ```
  from tensorflow.keras.utils import to_categorical
  ```
- Call method
  ```
  y_train_encode = to_categorical(y_train, num_classes=...)
  ```
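To see what the encoding produces, here is a minimal NumPy sketch on a few hypothetical labels. It indexes the rows of an identity matrix, which is one common way to build one-hot vectors by hand. It is only an illustration of the output format, not a description of how `to_categorical` is implemented.

```python
import numpy as np

# Hypothetical integer labels for 4 samples and 3 classes
labels = np.array([0, 2, 1, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)

# Taking argmax along each row recovers the original labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```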

In [None]:
# YOUR SOLUTION

### Train the model

#### TODO 2

Follow these steps:
- Build and train the Softmax Regression model
- Evaluate the performance of the model with the method `model.evaluate()` on the test set `(x_test_pca, y_test_encode)`
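For intuition about what a Softmax Regression model learns, here is a minimal from-scratch sketch in NumPy that trains with plain gradient descent on a hypothetical toy dataset of three 2-D Gaussian blobs. The data, `learning_rate`, and number of iterations are illustrative assumptions. The assignment itself expects a Keras model that supports `model.evaluate()`, so treat this only as a picture of the underlying computation.

```python
import numpy as np

def softmax(scores):
    # Subtract the row-wise max for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_classes, n_features = 3, 2

# Three well-separated Gaussian blobs (hypothetical toy data)
centers = np.array([[0.0, 4.0], [-4.0, -2.0], [4.0, -2.0]])
labels = rng.integers(0, n_classes, size=300)
X = centers[labels] + rng.normal(size=(300, n_features))
Y = np.eye(n_classes)[labels]  # one-hot targets

# Model parameters: one weight column and one bias per class
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
learning_rate = 0.1

for _ in range(200):
    probs = softmax(X @ W + b)
    # Gradient of the mean cross entropy with respect to the scores
    grad_scores = (probs - Y) / len(X)
    W -= learning_rate * (X.T @ grad_scores)
    b -= learning_rate * grad_scores.sum(axis=0)

# Evaluate on the training data
probs = softmax(X @ W + b)
loss = -np.mean(np.sum(Y * np.log(probs), axis=1))
accuracy = np.mean(probs.argmax(axis=1) == labels)
print(f"loss={loss:.3f}, accuracy={accuracy:.3f}")
```

In Keras, the same model is usually expressed as a single dense layer with a softmax activation trained with a categorical cross-entropy loss, which is why the labels need to be one-hot encoded first.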

In [None]:
# YOUR SOLUTION

### Test the predicted results of the model on the Test Set

First, we need to use the model to predict the label for all the images in the Test Set

In [None]:
y_test_pred_prob = model.predict(x_test_pca)

Since `y_test_pred_prob` contains one probability vector per image, we use the argmax function to convert each vector into a label (i.e. a number from 0 through 9).

In [None]:
y_test_pred = np.argmax(y_test_pred_prob, axis=1)
y_test_pred

In [None]:
np.random.seed(0)

fig, axs = plt.subplots(10, 10, figsize=(20,25))
for row in range(10):
  for col in range(10):
    random_index = np.random.choice(np.where(y_test_pred == row)[0])
    axs[row][col].grid(False)
    axs[row][col].axis('off')
    axs[row][col].imshow(x_test[random_index].reshape(28,28), cmap='gray')
    ax_name = 'True: {}\nPredict: {}'.format(y_test[random_index], y_test_pred[random_index])
    axs[row][col].set_title(ax_name)
plt.show()

### Save & Load sklearn model

We use the `pickle` library to save `sklearn` models.

In [None]:
import pickle

with open("/content/scaler.pkl", "wb") as f:
  pickle.dump(scaler, f)

with open("/content/pca.pkl", "wb") as f:
  pickle.dump(pca, f)

After running the code above, two files appear in the Colab file browser: `scaler.pkl` and `pca.pkl`, which store the `StandardScaler` and `PCA` models.

Next, we delete the variables `scaler` and `pca`, then use `pickle` to reload the two saved models.

In [None]:
del scaler
del pca

print(scaler, pca)  # Raises NameError if the variables were deleted successfully

In [None]:
import pickle

with open("/content/scaler.pkl", "rb") as f:
  scaler = pickle.load(f)

with open("/content/pca.pkl", "rb") as f:
  pca = pickle.load(f)

print(scaler)
print(pca)

### Save & Load Tensorflow model

After running the cell below, a folder named `mnist_model` appears in the Colab file browser. This folder contains everything related to the model we just trained.

In [None]:
model.save('/content/mnist_model')

To try loading the saved model, we first delete the variable `model`.

In [None]:
del model
print(model)  # Raises NameError if the variable was deleted successfully

Load the model and evaluate it on the test set.

In [None]:
from tensorflow.keras.models import load_model
model = load_model('/content/mnist_model')
model.evaluate(x_test_pca, y_test_encode)