**D3APL: Aplicações em Ciência de Dados** <br/>
IFSP Campinas

Prof. Dr. Samuel Martins (Samuka) <br/><br/>

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

# Training our first CNN for Multiclass Image Classification

## 1. Set up

#### 1.1 TensorFlow + Keras

In [None]:
import tensorflow as tf

In [None]:
tf.__version__

**GPU available?**

In [None]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

#### 1.2 Fixing the seed for reproducibility (optional)
This is an attempt at reproducibility in Keras. See more at:
- https://stackoverflow.com/a/59076062
- https://machinelearningmastery.com/reproducible-results-neural-networks-keras/

In [None]:
import os
import tensorflow as tf
import numpy as np
import random

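def reset_random_seeds(seed=42):
    # only affects hash randomization of subprocesses; set it before starting Python to affect this one
    os.environ['PYTHONHASHSEED'] = str(seed)
    tf.random.set_seed(seed)  # TensorFlow global seed
    np.random.seed(seed)      # NumPy global seed
    random.seed(seed)         # Python's built-in `random` module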

    
# fix all the seeds before building and training any model
reset_random_seeds()
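Recent TensorFlow releases also provide `tf.keras.utils.set_random_seed`, which seeds Python's `random`, NumPy and TensorFlow in a single call. A sketch of this alternative (`set_all_seeds` is a hypothetical name), checking that re-seeding reproduces the same draws from all three generators:

```python
import random

import numpy as np
import tensorflow as tf


def set_all_seeds(seed=42):
    # Hypothetical helper: seeds Python's `random`, NumPy and TensorFlow in one call.
    tf.keras.utils.set_random_seed(seed)
    # Optional, slower: ask TensorFlow for deterministic op implementations.
    # tf.config.experimental.enable_op_determinism()


# Same seed -> same sequence of draws from the three generators.
set_all_seeds(42)
first = (random.random(), float(np.random.rand()), float(tf.random.uniform(())))
set_all_seeds(42)
second = (random.random(), float(np.random.rand()), float(tf.random.uniform(())))
print(first == second)
```

Even with every seed fixed, some GPU operations may still be non-deterministic; `tf.config.experimental.enable_op_determinism()` (commented out above) requests deterministic implementations at a speed cost.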

#### 1.3 Other imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt

#### 1.4 CIFAR-10
https://keras.io/api/datasets/cifar10/ <br/>
https://en.wikipedia.org/wiki/CIFAR-10

In [None]:
from tensorflow.keras.datasets import cifar10

In [None]:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

The dataset is already split into a training set and a test set.

In [None]:
# the dataset is downloaded to (and cached in) "~/.keras/datasets"
import os
os.listdir(os.path.expanduser('~/.keras/datasets'))

In [None]:
print(f'X_train.shape: {X_train.shape}')
print(f'y_train.shape: {y_train.shape}\n')

print(f'X_test.shape: {X_test.shape}')
print(f'y_test.shape: {y_test.shape}')

In [None]:
# flatten labels from shape (n, 1) to (n,)
y_train = y_train.ravel()
y_test = y_test.ravel()

In [None]:
print(f'y_train.shape: {y_train.shape}')
print(f'y_test.shape: {y_test.shape}')

In [None]:
print(f'Number of Classes: {np.unique(y_train).shape[0]}')
print(f'Classes: {np.unique(y_train)}')

In [None]:
class_names = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
class_names

In [None]:
y_train.shape

In [None]:
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, label in zip(axes, X_train, y_train):
    ax.set_axis_off()
    ax.imshow(image)
    ax.set_title(f'Image: {class_names[label]}')

In [None]:
# 8 bits per channel (RGB): values in [0, 255]
print(f'Min. value of X_train: {X_train.min()}')
print(f'Max. value of X_train: {X_train.max()}\n')

print(f'Min. value of X_test: {X_test.min()}')
print(f'Max. value of X_test: {X_test.max()}')

#### 1.5 (Simple) Feature scaling
Since we are going to train the neural network using _Gradient Descent_, we must scale the **input features**. For simplicity, we’ll scale the pixel intensities down to the _0–1_ range by dividing them by **255.0** (8-bit RGB images: each channel ranges from 0 to 255):

In [None]:
X_train = X_train / 255.0
X_test = X_test / 255.0
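A side note on memory: dividing a `uint8` array by the Python float `255.0` produces a `float64` array, i.e. 8 bytes per value (about 1.2 GB for the 50,000 × 32 × 32 × 3 training values), while Keras layers compute in `float32` by default. A NumPy sketch, on a small fake batch, of casting to `float32` first to halve the memory:

```python
import numpy as np

# A small fake batch with the same per-image shape and dtype as CIFAR-10.
batch = np.random.randint(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

scaled64 = batch / 255.0                     # uint8 / Python float -> float64 (8 bytes per value)
scaled32 = batch.astype(np.float32) / 255.0  # cast first -> stays float32 (4 bytes per value)

print(scaled64.dtype, scaled64.nbytes)
print(scaled32.dtype, scaled32.nbytes)
```

In the notebook, the equivalent would be `X_train = X_train.astype('float32') / 255.0` (and likewise for `X_test`).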

In [None]:
# rescaled pixel intensities: values in [0, 1]
print(f'Min. value of X_train: {X_train.min()}')
print(f'Max. value of X_train: {X_train.max()}\n')

print(f'Min. value of X_test: {X_test.min()}')
print(f'Max. value of X_test: {X_test.max()}')

## 2. Building and Training a CNN via Keras

### 2.1 Defining the Network Architecture
Proposed architecture for Multiclass Classification:
- INPUT [32x32x3]
- CONV [32, 4x4x3, 'valid']
- RELU
- MAX_POOL [2x2, stride=(1,1)]
- CONV [32, 4x4x32, 'valid']
- RELU
- MAX_POOL [2x2, stride=(1,1)]
- FLATTEN
- FC [256]
- RELU
- FC [10, 'softmax']


- optimizer: SGD with `learning_rate=0.01`
- kernel_initializer: "glorot_uniform"
- bias_initializer: "zeros"
- **No regularization**
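As a sanity check of the architecture above, the output sizes and the number of trainable parameters can be worked out by hand. A small Python sketch, assuming 'valid' (no padding) convolutions and poolings, so that each output side is `(n - k) // s + 1` for input side `n`, window `k` and stride `s`; the numbers should agree with `model.summary()` if the model is built exactly as listed:

```python
def out_size(n, k, s=1):
    """Output side length of a 'valid' (no padding) convolution or pooling window."""
    return (n - k) // s + 1


def conv_params(k, c_in, n_filters):
    """k*k*c_in weights per filter, plus one bias per filter."""
    return k * k * c_in * n_filters + n_filters


side = 32
side = out_size(side, 4)      # CONV 4x4 valid     -> 29
side = out_size(side, 2, 1)   # MAX_POOL 2x2, s=1  -> 28
side = out_size(side, 4)      # CONV 4x4 valid     -> 25
side = out_size(side, 2, 1)   # MAX_POOL 2x2, s=1  -> 24
flat = side * side * 32       # FLATTEN            -> 18432

params = (conv_params(4, 3, 32)     # first CONV:  1,568
          + conv_params(4, 32, 32)  # second CONV: 16,416
          + flat * 256 + 256        # FC [256]:    4,718,848
          + 256 * 10 + 10)          # FC [10]:     2,570
print(side, flat, params)           # 24 18432 4739402
```

Note that the single FC [256] layer holds more than 99% of the parameters (4,718,848 out of 4,739,402), because the stride-1 poolings barely shrink the feature maps before flattening.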

#### **Defining the Network's Architecture**

In [None]:
print(f'X_train.shape: {X_train.shape}')
print(f'X_test.shape: {X_test.shape}')
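The cell that actually builds `model` is not shown here, although `model.summary()` below and Section 4 (`build_first_cnn()`) depend on it. A minimal sketch of such a builder, assuming it simply stacks the layers listed in Section 2.1; each RELU is folded into the preceding Conv/Dense layer through `activation="relu"`, which is equivalent to a separate activation layer:

```python
import tensorflow as tf


def build_first_cnn():
    """Sketch of the architecture listed in Section 2.1."""
    init = dict(kernel_initializer="glorot_uniform", bias_initializer="zeros")
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, kernel_size=(4, 4), padding="valid", activation="relu", **init),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1)),
        tf.keras.layers.Conv2D(32, kernel_size=(4, 4), padding="valid", activation="relu", **init),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu", **init),
        tf.keras.layers.Dense(10, activation="softmax", **init),  # one probability per class
    ])


model = build_first_cnn()
```

Each call to `build_first_cnn()` creates new layers with freshly initialized weights, which is what Strategy 1 in Section 4 relies on.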

In [None]:
model.summary()

### Visualizing the Architecture

#### Native Keras Function: `plot_model`
https://www.tensorflow.org/api_docs/python/tf/keras/utils/plot_model

Requirements:
- `graphviz` software for graph visualization
  + `sudo apt-get install graphviz`
  + `pip install graphviz`
- `pip install pydot`

In [None]:
from tensorflow.keras.utils import plot_model
# vertical
plot_model(model, show_shapes=True, show_layer_activations=True)

#### visualkeras
https://github.com/paulgavrikov/visualkeras <br/>
https://analyticsindiamag.com/how-to-visualize-deep-learning-models-using-visualkeras/#:~:text=Visualkeras%20is%20a%20python%20package,style%20architecture%20of%20neural%20networks.

Better suited for deep neural networks, especially CNNs.

Requirements:
- `pip install visualkeras`

In [None]:
import visualkeras

In [None]:
visualkeras.layered_view(model, legend=True, scale_z=1).show() # display using your system viewer

### 2.2 Compiling: Defining the Loss Function, Optimizer, and Metrics

#### **Loss:**

- `'binary_crossentropy'`: _binary classification_
    + E.g.: One or more binary labels, _sigmoid_ as activation function.
- `'categorical_crossentropy'`: _multiclass classification_, classes as **one-hot vectors**
    + E.g.: [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.] to represent class 3 out of 10 classes
- `'sparse_categorical_crossentropy'`: _multiclass classification_, classes as **sparse labels**:
    + E.g.: 0, 1, 2, ..., 9 (one integer per sample, as in `y_train`)
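The two multiclass losses compute the same quantity and differ only in the label format they expect. A NumPy sketch with made-up softmax outputs for 3 samples and 4 classes, showing that the categorical loss on one-hot labels equals the sparse loss on integer labels (which is why the integer `y_train` can be used directly with `'sparse_categorical_crossentropy'`):

```python
import numpy as np

# Made-up softmax outputs for 3 samples over 4 classes (each row sums to 1).
proba = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.20, 0.50, 0.20, 0.10],
                  [0.05, 0.05, 0.10, 0.80]])

sparse_labels = np.array([0, 1, 3])   # integer labels, like y_train
one_hot = np.eye(4)[sparse_labels]    # the same labels as one-hot vectors

# categorical cross-entropy: -sum(y_true * log(y_pred)) per sample, averaged over samples
cce = np.mean(-np.sum(one_hot * np.log(proba), axis=1))
# sparse categorical cross-entropy: -log(probability assigned to the true class), averaged
scce = np.mean(-np.log(proba[np.arange(len(proba)), sparse_labels]))

print(cce, scce)
```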


In [None]:
# https://keras.io/api/optimizers/sgd/
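The cell above only links to the SGD documentation; the actual compile step is not shown. A minimal sketch, assuming the settings listed in Section 2.1 (SGD with `learning_rate=0.01`) and the same `compile()` arguments used later in Section 4; `compile_model` is a hypothetical helper name:

```python
import tensorflow as tf


def compile_model(model, learning_rate=0.01):
    """Hypothetical helper: SGD + sparse categorical cross-entropy + accuracy."""
    opt = tf.keras.optimizers.SGD(learning_rate=learning_rate)
    model.compile(loss="sparse_categorical_crossentropy",  # integer labels 0..9
                  optimizer=opt,
                  metrics=["accuracy"])
    return model
```

In the notebook this would be called as `model = compile_model(model)` before training.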


### 2.3 Training

If a GPU is available, we can monitor its usage with [_gpustat_](https://github.com/wookayin/gpustat).

In a terminal, run: `gpustat -cpi`
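The `fit()` call that produces `history` (used in the plots of this section) is not shown. A sketch assuming the arguments used later in Section 4 (`batch_size=32`, `validation_split=0.2`) and the 30 epochs implied by `plt.xticks(range(30))`; `train_model` is a hypothetical wrapper:

```python
import tensorflow as tf


def train_model(model: tf.keras.Model, X_train, y_train, epochs=30, callbacks=None):
    """Hypothetical wrapper around the fit() call used in this notebook."""
    return model.fit(
        X_train, y_train,
        epochs=epochs,
        batch_size=32,
        # hold out the LAST 20% of X_train/y_train (taken before shuffling) for validation
        validation_split=0.2,
        callbacks=callbacks,
    )
```

In the notebook: `history = train_model(model, X_train, y_train, epochs=30)`. The returned `History` object stores one value per epoch for `loss`, `accuracy`, `val_loss` and `val_accuracy` in `history.history`.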


#### **Visualizing the training history**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

history_df = pd.DataFrame(history.history)
history_df

In [None]:
history_df[['loss', 'val_loss']].plot(figsize=(8, 5))
plt.grid(True)
plt.xticks(range(30))
plt.xlabel('Epochs')
plt.ylabel('Score')

history_df[['accuracy', 'val_accuracy']].plot(figsize=(8, 5))
plt.grid(True)
plt.xticks(range(30))
plt.xlabel('Epochs')
plt.ylabel('Score')

These plots give us some insight into the _training process_.  <br/>
Our _training_ and _validation loss/accuracy_ start to **diverge** significantly past ***epoch XXXX***, which implies that our network is modeling the _training data_ **too closely** and ***overfitting***. <br/>
This can be attributed to a possibly _too large learning rate_ and the fact that we **aren’t** using any method to **combat overfitting**.

Some strategies to _remedy_ this issue include:
- changing the _learning rate_
- using some regularization technique:
    - L1
    - L2
    - _Dropout_
    - **Early Stopping**
- obtaining _more data_
- applying _data augmentation_

We'll see more details about these strategies soon!

#### **Effects of Learning Rates**
<img src='./figs/effects_of_learning_rates.png' width=500/>

Source: Rosebrock, Adrian. Deep learning for computer vision with python: starter bundle. PyImageSearch, 2017.

## 3. Evaluating and Predicting New Samples by using our Overfitted Model

#### **Evaluation**
https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#evaluate

#### **Prediction**
https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#predict

#### **Class Prediction**
https://stackoverflow.com/a/69503180/7069696
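`predict()` returns one row of 10 softmax probabilities per image, and the predicted class is the index of the largest one. A NumPy sketch with two made-up probability rows:

```python
import numpy as np

class_names = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]

# Made-up softmax outputs for 2 images (each row has 10 probabilities summing to 1).
y_proba = np.array([
    [0.01, 0.02, 0.05, 0.60, 0.05, 0.20, 0.03, 0.02, 0.01, 0.01],
    [0.70, 0.05, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.15, 0.03],
])

y_pred = np.argmax(y_proba, axis=1)  # column index of the largest probability per row
predicted_names = [class_names[i] for i in y_pred]
print(y_pred, predicted_names)       # [3 0] ['cat', 'airplane']
```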

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_test_pred, target_names=[name for name in class_names]))

We got a **poor accuracy** of XXXX.

## 4. Model Regularization by Early Stopping

A very different way to **regularize** _iterative learning algorithms_, such as _Gradient Descent_, is **to stop training** as soon as the _validation error_ reaches a **minimum**. This is called **early stopping**.


<img src='./figs/early_stopping.png' width=500>

Source: https://towardsdatascience.com/a-practical-introduction-to-early-stopping-in-machine-learning-550ac88bc8fd

As the _epochs_ go by, the algorithm learns, and its **prediction error** on the _training set_ goes down, along with its **prediction error** on the _validation set_. <br/>
After a while though, the _validation error_ stops **decreasing** and starts to **go back up**. <br/>
This indicates that the model has _started to **overfit the training data**_.

With **early stopping** we just ***stop*** _training_ as soon as the **validation error** reaches the _minimum_ (or after a _number of epochs_ **with no improvement**).
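The patience rule described above can be sketched in plain Python. This is an illustration of the idea (similar in spirit to Keras's `EarlyStopping`, not its actual code), run on made-up validation losses that improve until epoch 4 and then go back up:

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a patience-based rule on validation losses."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:  # improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:          # no improvement for `patience` epochs in a row
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1  # rule never triggered: all epochs were run


# Made-up validation losses: minimum at epoch 4, then the error goes back up.
val_losses = [1.60, 1.35, 1.20, 1.12, 1.08, 1.10, 1.15, 1.21, 1.30]
print(early_stopping_epoch(val_losses, patience=3))  # (4, 7)
```

Training stops at epoch 7, but the weights worth keeping are those from epoch 4; this is why restoring the best weights matters.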

### ***Back to our model***

In [None]:
history_df[['loss', 'val_loss']].plot(figsize=(8, 5))
plt.grid(True)
plt.xlabel('Epochs')
plt.ylabel('Score')

Apparently, our **early stop point** is ***epoch XXX***. <br/>
So, we should keep the _trained model_ (_learned weights and biases_) from **this epoch**. However, our _final trained model_ is the one obtained after all 30 epochs.
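Rather than reading the early stop point off the plot, it can be taken directly from `history_df`: `idxmin()` returns the row label of the smallest `val_loss`, and since the rows are numbered from 0, that label is the epoch index. A sketch with a made-up stand-in for `pd.DataFrame(history.history)`:

```python
import pandas as pd

# Made-up stand-in for pd.DataFrame(history.history) after 7 epochs.
history_df = pd.DataFrame({
    "loss":     [1.70, 1.35, 1.15, 0.98, 0.82, 0.66, 0.51],
    "val_loss": [1.45, 1.25, 1.14, 1.08, 1.10, 1.17, 1.29],
})

best_epoch = int(history_df["val_loss"].idxmin())  # row label of the minimum = epoch index (from 0)
n_epochs_to_retrain = best_epoch + 1               # the "X + 1 epochs" of Strategy 1
print(best_epoch, n_epochs_to_retrain)             # 3 4
```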

### **Strategy 1**
One possible strategy to get there is to **retrain** our model ***from scratch***, but now with _just_ ***X + 1 epochs*** (note: the epoch indices in the plots start at 0). <br/>
We need to _reset_ our _model weights_ and _biases_ before calling another `fit()`, otherwise Keras will keep training from the _learned weights and biases_ from the first `fit()` with 30 epochs.

In [None]:
# creating a new model with new initial weights and biases
model = build_first_cnn()

opt = tf.keras.optimizers.SGD(learning_rate=0.05)  # just for testing
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=6, batch_size=32, validation_split=0.2)

In [None]:
history_df = pd.DataFrame(history.history)

history_df[['loss', 'val_loss']].plot(figsize=(8, 5))
plt.grid(True)
plt.xticks(range(6))
plt.xlabel('Epochs')
plt.ylabel('Score')

history_df[['accuracy', 'val_accuracy']].plot(figsize=(8, 5))
plt.grid(True)
plt.xticks(range(6))
plt.xlabel('Epochs')
plt.ylabel('Score')

### **Strategy 2 - Keras Callbacks**
The `fit()` method accepts a **callbacks argument** that lets you specify a list of objects that Keras will _call_ at the _start_ and _end_ of _training_, at the _start_ and _end_ of _each epoch_, and even _before_ and _after processing each batch_.

One of these callbacks is `EarlyStopping`: https://keras.io/api/callbacks/early_stopping/
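Besides the built-in callbacks, a custom one can be written by subclassing `tf.keras.callbacks.Callback` and overriding the hooks of interest, such as `on_epoch_end(epoch, logs)`. A minimal sketch of a hypothetical callback that records the validation loss after every epoch (`logs` only contains `val_loss` when validation data is used, e.g. via `validation_split`):

```python
import tensorflow as tf


class ValLossLogger(tf.keras.callbacks.Callback):
    """Hypothetical callback: records val_loss at the end of every epoch."""

    def __init__(self):
        super().__init__()
        self.val_losses = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.val_losses.append(logs.get("val_loss"))
```

It would be passed to training as `model.fit(..., validation_split=0.2, callbacks=[ValLossLogger()])`.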

In [None]:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)

We've just created an **early stopping callback** that will **interrupt** _training_ when it _measures_ **no progress** (no improvement) for 3 consecutive epochs (parameter `patience`) on the loss of the **validation set** (default parameter `monitor="val_loss"`).

We can now **retrain** our model from _scratch_ again, with **a large number of epochs**, since _training_ will **stop _automatically_** when _there is no more progress_. The `EarlyStopping` callback will keep track of the **best weights** and _restore them_ for you at the _end of training_ (parameter `restore_best_weights=True`).

In [None]:
# creating a new model with new initial weights and biases
model = build_first_cnn()

opt = tf.keras.optimizers.SGD(learning_rate=0.05)  # just for testing
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=30, batch_size=32, validation_split=0.2, callbacks=[early_stopping_cb])

In [None]:
history_df = pd.DataFrame(history.history)

history_df[['loss', 'val_loss']].plot(figsize=(8, 5))
plt.grid(True)
plt.xlabel('Epochs')
plt.ylabel('Score')

history_df[['accuracy', 'val_accuracy']].plot(figsize=(8, 5))
plt.grid(True)
plt.xlabel('Epochs')
plt.ylabel('Score')

##### **More about Keras callbacks**
https://keras.io/api/callbacks/

## 5. Evaluating and Predicting New Samples with the Regularized model

#### **Evaluation**
https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#evaluate

In [None]:
model.evaluate(X_test, y_test)

#### **Prediction**
https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#predict

In [None]:
y_test_proba = model.predict(X_test)
y_test_proba

#### **Class Prediction**
https://stackoverflow.com/a/69503180/7069696

In [None]:
y_test_pred = np.argmax(y_test_proba, axis=1)
y_test_pred

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_test_pred, target_names=class_names))

The resulting _accuracy_ on the **testing set** remains **poor**, but it is slightly better than that of the overfitted model.
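`classification_report` summarizes precision, recall and F1 per class; a confusion matrix complements it by showing which classes are mistaken for which. A sketch with made-up labels for 3 classes (in the notebook, `y_test` and `y_test_pred` would be used instead):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels for 6 samples and 3 classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

acc = accuracy_score(y_true, y_pred)   # 4 correct out of 6
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
print(acc)
print(cm)
```

For CIFAR-10, `confusion_matrix(y_test, y_test_pred)` gives a 10 × 10 matrix whose largest off-diagonal entries point to the most frequent confusions.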

# Exercise

Repeat the experiments considering different:
- values of `learning_rate` for SGD
- optimizers (e.g., 'nadam')
- kernel regularizers (e.g., 'l2')