**D3APL: Aplicações em Ciência de Dados** <br/>
IFSP Campinas

Prof. Dr. Samuel Martins (Samuka) <br/><br/>

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

# Animal Dataset - v3
We will evaluate some **multiclass classification** CNNs to predict the classes of the **Animal Dataset**: https://www.kaggle.com/datasets/alessiocorrado99/animals10


Target goals:
- Allocate GPU memory on demand
- Evaluate VGG16 by transfer learning

## 1. Set up

### 1.1 TensorFlow

In [None]:
import tensorflow as tf

In [None]:
tf.__version__

**GPU available?**

In [None]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

### 1.2 Allocating memory on demand
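By default, `TensorFlow` maps nearly all of the _GPU memory_ visible to the process and keeps it for the **lifetime of the process**, not the lifetime of the **session object**, so the memory can stay reserved much longer than the objects that used it. That is why the memory still appears in use after your code has finished, for as long as the process (e.g., the notebook kernel) is alive. <br/>
Instead, we can tell `TensorFlow` to allocate **memory on demand** (memory growth).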

Sources: <br/>
https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth

https://python.tutorialink.com/cuda-error-out-of-memory-python-process-utilizes-all-gpu-memory/ <br/>
https://blog.fearcat.in/a?ID=00950-b4887eea-22e7-4853-b4de-fe746a9e56e6 <br/>
https://stackoverflow.com/a/45553529

In [None]:
gpus = tf.config.list_physical_devices('GPU')

if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

### 1.3 Fixing the seed for reproducibility (optional)
This is an attempt at reproducibility in Keras. See more at:
- https://stackoverflow.com/a/59076062
- https://machinelearningmastery.com/reproducible-results-neural-networks-keras/

In [None]:
import os
import tensorflow as tf
import numpy as np
import random

def reset_random_seeds(seed=42):
    os.environ['PYTHONHASHSEED'] = str(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)


# fix all seeds before any random operation runs
reset_random_seeds()

### 1.4 Dataset
**Animal Dataset**: https://www.kaggle.com/datasets/alessiocorrado99/animals10

In [None]:
import pandas as pd
import numpy as np
import os

In [None]:
# train
dataset_df_train = pd.read_csv('../datasets/animals-dataset/preprocessed/train.csv')

# validation
dataset_df_validation = pd.read_csv('../datasets/animals-dataset/preprocessed/validation.csv')

# test
dataset_df_test = pd.read_csv('../datasets/animals-dataset/preprocessed/test.csv')

In [None]:
dataset_df_train

In [None]:
dataset_df_validation

In [None]:
dataset_df_test

## 2. Building and Training a CNN via Keras

### 2.1 Defining the Network Architecture - VGG16
Original paper: https://arxiv.org/pdf/1409.1556.pdf <br/>
Tutorial: https://medium.com/@mygreatlearning/everything-you-need-to-know-about-vgg16-7315defb5918 <br/>
Tutorial: https://towardsdatascience.com/step-by-step-vgg16-implementation-in-keras-for-beginners-a833c686ae6c


<img src='figs/vgg16.png' />

<img src='figs/vgg16_architecture.jpeg' />


- The 16 in VGG16 refers to its 16 layers with weights. VGG16 has thirteen convolutional layers, five max pooling layers, and three dense layers, which add up to 21 layers, but only sixteen of them (the convolutional and dense ones) have learnable parameters.
- VGG16 takes a 224x224 input tensor with 3 (RGB) channels.
- The most distinctive aspect of VGG16 is that, instead of tuning a large number of hyperparameters, its authors consistently used 3x3 convolution filters with stride 1 and "same" padding, and 2x2 max pooling with stride 2.
- The convolution and max pooling layers are arranged consistently throughout the whole architecture.
- The layers in block Conv-1 have 64 filters, Conv-2 has 128 filters, Conv-3 has 256 filters, and Conv-4 and Conv-5 have 512 filters each.
- Three Fully-Connected (FC) layers follow the stack of convolutional layers: the first two have 4096 channels each, and the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the softmax layer.
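
The layer counts listed above can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python (no Keras needed); the names `VGG16_CFG` and `count_vgg16` are made up for this illustration. It counts layers and learnable parameters of the original 224x224x3 architecture, where each 3x3 convolution has `3 * 3 * in_channels * out_channels` weights plus one bias per filter.

```python
# VGG16 layout: integers are the number of filters of a 3x3 conv layer,
# 'M' marks a 2x2 max pooling layer with stride 2.
VGG16_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']


def count_vgg16(cfg=VGG16_CFG, in_channels=3, side=224, fc_units=(4096, 4096, 1000)):
    """Return (n_conv, n_pool, n_fc, n_params) for a VGG-style network."""
    n_conv = n_pool = n_params = 0
    for item in cfg:
        if item == 'M':
            n_pool += 1
            side //= 2  # pooling halves the spatial size and has no weights
        else:
            n_params += 3 * 3 * in_channels * item + item  # kernel weights + biases
            n_conv += 1
            in_channels = item

    n_inputs = side * side * in_channels  # flattened feature map fed to the first FC layer
    for units in fc_units:
        n_params += n_inputs * units + units
        n_inputs = units

    return n_conv, n_pool, len(fc_units), n_params


n_conv, n_pool, n_fc, n_params = count_vgg16()
print(f'{n_conv} conv + {n_pool} pooling + {n_fc} FC = {n_conv + n_pool + n_fc} layers')
print(f'{n_conv + n_fc} layers with weights, {n_params:,} parameters')
```

Only the convolutional and FC layers contribute parameters, which is where the "16" comes from. Note that most parameters sit in the first FC layer, which is one reason to replace the original classifier when adapting the network.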

We will adapt the original VGG16 architecture for **10 classes** instead of 1000.

**PS**: Although the original _input image dimensions_ are 224x224x3, we will use **100x100x3** images instead due to the **lack of memory** to store our data.
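
To see what the smaller input means for the convolutional part, here is a small sketch (the function name `vgg16_feature_map_side` is hypothetical). It assumes what the architecture description states: 3x3 convolutions with "same" padding keep the spatial size, and each of the five 2x2, stride-2 max pooling layers halves it, rounding down.

```python
def vgg16_feature_map_side(side, n_pools=5):
    """Spatial side length after all VGG16 pooling layers."""
    for _ in range(n_pools):
        side //= 2  # 2x2 max pooling with stride 2, rounding down
    return side


side_original = vgg16_feature_map_side(224)
side_adapted = vgg16_feature_map_side(100)

# The last convolutional block has 512 filters.
print(f'224x224 input -> {side_original}x{side_original}x512 feature map')
print(f'100x100 input -> {side_adapted}x{side_adapted}x512 feature map '
      f'({side_adapted * side_adapted * 512} values when flattened)')
```

Because the convolutional layers do not depend on a fixed input size, their pretrained weights can be reused with 100x100 images; only the fully-connected classifier must match the new feature map size.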

#### **Transfer Learning**
<img src='./figs/transfer_learning.png' />

https://www.researchgate.net/figure/The-architecture-of-our-transfer-learning-model_fig4_342400905

https://towardsdatascience.com/transfer-learning-with-vgg16-and-keras-50ea161580b4

<br/>

**Getting the VGG16 with trained weights as our base model**

In [None]:
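# https://keras.io/api/applications/vgg/
# https://towardsdatascience.com/transfer-learning-with-vgg16-and-keras-50ea161580b4
from tensorflow.keras.applications import VGG16

# assumption: ImageNet weights, no original FC classifier (include_top=False),
# and our adapted 100x100x3 input shape
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(100, 100, 3))

# freeze the base model weights ==> these weights won't be updated during training
# i.e., the weights of all layers from the base model are not updated
base_model.trainable = False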

In [None]:
base_model.summary()

Note that **there are no trainable parameters**: all the base model weights are frozen.

<br/>

**Plugging a Fully-connected network classifier into the base model**
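
The cell that builds `model` is not shown here, so the following is only a minimal sketch of how a classifier could be plugged into a frozen base model. The helper name `build_transfer_model`, the use of `Flatten`, and the 256-unit hidden layer are assumptions for illustration, not necessarily the architecture used in the course. The demo uses random weights (`weights=None`) only to avoid a download.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_transfer_model(base_model, n_classes=10, n_hidden=256):
    """Stack a small fully-connected classifier on top of a frozen base model.

    n_hidden=256 is an illustrative assumption.
    """
    base_model.trainable = False  # keep the base model weights fixed
    return models.Sequential([
        base_model,
        layers.Flatten(),
        layers.Dense(n_hidden, activation='relu'),
        layers.Dense(n_classes, activation='softmax'),  # one probability per class
    ])


# Self-contained demo: a VGG16 base with random weights and 100x100x3 inputs.
demo_base = VGG16(weights=None, include_top=False, input_shape=(100, 100, 3))
demo_model = build_transfer_model(demo_base)

# One dummy image in, 10 class probabilities out.
demo_out = demo_model(np.zeros((1, 100, 100, 3), dtype='float32'))
print(demo_out.shape)
```

With the objects of this notebook, the equivalent call would be `model = build_transfer_model(base_model)`, so that only the two `Dense` layers are updated during training.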

In [None]:
model.summary()

Although our **model** has _a lot of parameters_, only a **smaller** number of them are **trainable**: those of the fully-connected classifier plugged into the frozen base model.

In [None]:
from tensorflow.keras.optimizers import Adam
opt = Adam(learning_rate=0.001)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
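
The loss `sparse_categorical_crossentropy` expects integer class labels (as produced by a label encoder) rather than one-hot vectors. As a rough NumPy sketch of the idea, the loss of each sample is the negative log of the probability the model assigns to the true class. The function name and the toy values below are made up for illustration, and details of the Keras implementation (such as clipping probabilities away from 0) are left out.

```python
import numpy as np


def sparse_cce(y_true, y_proba):
    """Mean negative log-probability of the true class (integer labels)."""
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba, dtype=np.float64)
    # pick, for each row, the probability predicted for its true class
    picked = y_proba[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))


# Toy example with 3 classes and 2 samples.
y_true = [0, 2]
y_proba = [[0.7, 0.2, 0.1],
           [0.1, 0.1, 0.8]]

loss = sparse_cce(y_true, y_proba)
print(f'loss = {loss:.4f}')
```

The same value would be obtained with one-hot labels and `categorical_crossentropy`; the sparse variant simply avoids building the one-hot matrix.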

### 2.2 Preprocessing

- **Image Resizing**
    + Since the **input layer's shape** and the **images' shape** ***are different***, we need to **resize** the images to the **input layer's shape**.
    + Let's use the function `cv2.resize()` for that: https://learnopencv.com/image-resizing-with-opencv/#resize-by-wdith-height
- **Intensity (feature) Scaling**
    + The Animals dataset contains 24-bit color images, i.e., each image has three channels, and each channel is an 8-bit grayscale image (values from 0 to 255)
    + We will simply rescale the values to [0, 1] by dividing them by 255.
- **Label Encoder**
    + Encode the string classes into class integers from 0 to n_classes-1
    + https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
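
As a small illustration of the last two steps, here is a NumPy-only sketch on fake data. The function names, the random images, and the class names are hypothetical; resizing is skipped, and `np.unique` stands in for `LabelEncoder`, which also assigns integers following the sorted order of the class names.

```python
import numpy as np


def scale_intensities(images):
    """Rescale 8-bit intensities from [0, 255] to [0, 1]."""
    return images.astype(np.float32) / 255.0


def encode_labels(labels):
    """Map string labels to integers 0..n_classes-1 (sorted class order)."""
    classes, encoded = np.unique(labels, return_inverse=True)
    return classes, encoded


rng = np.random.default_rng(42)
fake_images = rng.integers(0, 256, size=(4, 100, 100, 3), dtype=np.uint8)
fake_labels = np.array(['dog', 'cat', 'dog', 'horse'])  # hypothetical class names

X = scale_intensities(fake_images)
classes, y = encode_labels(fake_labels)

print(X.dtype, X.min(), X.max())
print(classes, y)
```

Keeping the fitted encoder (here, the `classes` array) is what later allows predicted integers to be mapped back to class names.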

However, the _preprocessed data_ **may not fit into our memory**!!! <br/>
So, we need to deal with that first!
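
A back-of-the-envelope estimate shows why the image size matters. The sketch below assumes the preprocessed images are stored as 32-bit floats and uses a hypothetical number of images (`n_images = 20_000`, not the actual dataset size).

```python
def dataset_size_gib(n_images, height, width, channels=3, bytes_per_value=4):
    """Memory needed to hold all preprocessed images, in GiB (float32 by default)."""
    return n_images * height * width * channels * bytes_per_value / 2**30


n_images = 20_000  # hypothetical number of images, for illustration only

size_224 = dataset_size_gib(n_images, 224, 224)
size_100 = dataset_size_gib(n_images, 100, 100)

print(f'224x224x3: {size_224:.1f} GiB')
print(f'100x100x3: {size_100:.1f} GiB')
print(f'reduction factor: {size_224 / size_100:.2f}x')
```

Whatever the number of images, shrinking them from 224x224 to 100x100 reduces the memory footprint by a factor of about 5.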

In [None]:
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
label_encoder.fit(dataset_df_train['class'])

In [None]:
from animals_utils import preprocess_animals_dataset

X_train, y_train = preprocess_animals_dataset(dataset_df_train, label_encoder, new_dims=(100, 100))

In [None]:
X_val, y_val = preprocess_animals_dataset(dataset_df_validation, label_encoder, new_dims=(100, 100), verbose=1000)

In [None]:
X_test, y_test = preprocess_animals_dataset(dataset_df_test, label_encoder, new_dims=(100, 100))

In [None]:
print(f'X_train.shape: {X_train.shape}')
print(f'y_train.shape: {y_train.shape}\n')

print(f'X_val.shape: {X_val.shape}')
print(f'y_val.shape: {y_val.shape}\n')

print(f'X_test.shape: {X_test.shape}')
print(f'y_test.shape: {y_test.shape}')

### 2.3 Training with Early Stopping

If a GPU is available, we can monitor its usage with [_gpustat_](https://github.com/wookayin/gpustat).

In a terminal, run: `gpustat -cpi`


In [None]:
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

In [None]:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_val, y_val), callbacks=[early_stopping_cb])
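
To make the role of `patience` concrete, here is a simplified pure-Python sketch of the early stopping rule, assuming the callback's defaults (it monitors `val_loss`, and any decrease counts as an improvement). The function name and the loss values are made up for illustration; this is not the Keras implementation itself.

```python
def simulate_early_stopping(val_losses, patience=10):
    """Return (last_epoch_run, best_epoch) for a sequence of validation losses."""
    best_loss = float('inf')
    best_epoch = -1
    wait = 0  # epochs since the last improvement

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop training here

    return len(val_losses) - 1, best_epoch  # ran all epochs


# Toy validation losses: the best value is reached at epoch 2 (0-based).
val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.9, 0.95]
stopped_at, best_epoch = simulate_early_stopping(val_losses, patience=3)
print(f'stopped at epoch {stopped_at}, best epoch was {best_epoch}')
```

With `restore_best_weights=True`, the model ends up with the weights from the best epoch rather than those from the epoch where training stopped.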

#### **Visualizing the training history**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

history_df = pd.DataFrame(history.history)

In [None]:
history_df[['loss', 'val_loss']].plot(figsize=(8, 5))
plt.grid(True)
# plt.xticks(range(100))
plt.xlabel('Epochs')
plt.ylabel('Loss')

history_df[['accuracy', 'val_accuracy']].plot(figsize=(8, 5))
plt.grid(True)
# plt.xticks(range(100))
plt.xlabel('Epochs')
plt.ylabel('Accuracy')

## 3. Evaluating and Predicting New Samples by using our Overfitted Model

#### **Evaluation**
https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#evaluate

In [None]:
model.evaluate(X_test, y_test)

#### **Prediction**
https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#predict

In [None]:
y_test_proba = model.predict(X_test)
y_test_proba

#### **Class Prediction**
https://stackoverflow.com/a/69503180/7069696

In [None]:
y_test_pred = np.argmax(y_test_proba, axis=1)
y_test_pred
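
The step from probabilities to classes can be illustrated on a toy example. In the sketch below, the probability matrix and the class names are made up; each row holds the predicted probabilities of one sample, and `np.argmax(..., axis=1)` picks the index of the most probable class per row.

```python
import numpy as np

class_names_demo = np.array(['cat', 'dog', 'horse'])  # hypothetical class names

# One row per sample, one column per class.
proba_demo = np.array([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.3, 0.5]])

pred_demo = np.argmax(proba_demo, axis=1)   # integer class per sample
pred_names_demo = class_names_demo[pred_demo]  # back to class names

print(pred_demo)
print(pred_names_demo)
```

In this notebook, the same mapping back to names can be done with `label_encoder.inverse_transform(y_test_pred)`.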

In [None]:
from sklearn.metrics import classification_report

class_names = label_encoder.classes_

print(classification_report(y_test, y_test_pred, target_names=list(class_names)))

We got the **best accuracy** so far.

# Exercise

Repeat the experiments considering different networks:
- MobileNetV2: https://www.tensorflow.org/tutorials/images/transfer_learning
- VGG19: https://keras.io/api/applications/vgg/
- DenseNet: https://keras.io/api/applications/densenet/