# Model Training and Fine-tuning

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
from glob import glob
import pandas as pd
import numpy as np
import os
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['CUDA_VISIBLE_DEVICES'] = "0"


REPO_ROOT = "/home/mdorosan/2023/cv-toolkit"
META_PATH = os.path.join(REPO_ROOT, "metadata/kvasir-capsule.csv")
DATA_ROOT = os.path.join(REPO_ROOT, "datasets/kvasir-capsule")

# update with experiment name
EXP_PATH = os.path.join(
    REPO_ROOT,
    "tutorials/tensorflow_notebooks/classification/sample")
os.makedirs(EXP_PATH, exist_ok=True)

sys.path.append(REPO_ROOT)

This notebook demonstrates the training and fine-tuning of a custom image classifier that uses a pre-trained CNN base with custom dense layers, implemented in `classification._model.py`. The following scripts are also used:

* `classification._paths.py` to initialize the paths to pre-trained model weights (e.g., ImageNet or others)
* `classification._config.py` to set hyperparameter configurations
* `classification._utils.py` for other functions used in the tutorial notebooks


**Note:** This tutorial demonstrates epoch-wise logging and parameter scheduling (e.g., a learning rate schedule) using high-level `tensorflow` objects. Cases that call for step-wise behavior can be handled by customizing the callbacks to use `on_batch_begin` and `on_batch_end` instead of `on_epoch_begin` and `on_epoch_end`.
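
As a rough illustration of the step-wise variant described in the note, the sketch below evaluates an exponential decay per batch rather than per epoch. The names `stepwise_lr` and `make_stepwise_scheduler` are hypothetical and not part of `cvtoolkit`. The default of 237 steps per epoch is taken from the training log shown later in this notebook. The callback uses `on_train_batch_begin`, the name `tf.keras` gives to the training batch hook, and it assumes the optimizer exposes `learning_rate` as an assignable variable, which may not hold for every optimizer or Keras version.

```python
def stepwise_lr(global_step, base_lr=1e-3, decay_rate=0.9, steps_per_epoch=237):
    """Exponential decay evaluated per batch instead of per epoch."""
    # A fractional epoch lets the rate decay smoothly within an epoch.
    return base_lr * decay_rate ** (global_step / steps_per_epoch)


def make_stepwise_scheduler(steps_per_epoch, **schedule_kwargs):
    """Build a Keras callback that sets the learning rate before every batch."""
    import tensorflow as tf  # imported lazily; only needed when training

    class StepwiseLRScheduler(tf.keras.callbacks.Callback):
        def on_epoch_begin(self, epoch, logs=None):
            # Remember the epoch so the global step can be computed per batch.
            self._epoch = epoch

        def on_train_batch_begin(self, batch, logs=None):
            step = self._epoch * steps_per_epoch + batch
            lr = stepwise_lr(
                step, steps_per_epoch=steps_per_epoch, **schedule_kwargs)
            # Assumption: `learning_rate` is a variable that supports `assign`.
            self.model.optimizer.learning_rate.assign(lr)

    return StepwiseLRScheduler()


# Preview the schedule without training: start, mid-epoch, end of first epoch.
preview = [stepwise_lr(step) for step in (0, 118, 237)]
```

If this approach fits the use case, `make_stepwise_scheduler(steps_per_epoch=len(train_dataset))` could be added to the callback list in place of the epoch-wise `LearningRateScheduler`.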

In [3]:
from cvtoolkit import explore
from cvtoolkit.tensorflow.classification._model import CustomClassifier
import cvtoolkit.tensorflow.classification._config as CONFIG
import cvtoolkit.tensorflow.classification._paths as PATHS
import cvtoolkit.tensorflow.classification._utils as UTILS

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import (
    LearningRateScheduler, CSVLogger, ModelCheckpoint, ReduceLROnPlateau,
)
from tensorflow.keras import losses, metrics, optimizers

from tensorflow.config import list_physical_devices

# check GPUs
print("Devices Available: ", list_physical_devices())

2023-10-16 05:51:24.785275: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-16 05:51:24.839188: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-16 05:51:26.743533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9803 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:3d:00.0, compute capability: 7.5


Devices Available:  [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


The pipeline in this tutorial can be configured for a binary, multi-class, or multi-label classification task; the configuration below is set up for the binary case. The following table addresses a common source of confusion: which loss function to assign given the task and the structure of the label array. *For a light reference on this subject, see this [link](https://wandb.ai/ayush-thakur/dl-question-bank/reports/A-Guide-to-Multi-Label-Classification-on-Keras--VmlldzoyMDgyMDU#:~:text=For%20a%20multi%2Dclass%20classification,the%20multi%2Dlabel%20classification%20setting).*

| Task | Label structure | Last layer (logits) activation | Units in head | Loss function |
|:---|:---|:---:|:---:|:---|
| Binary | (batch_size, 1) | sigmoid | 1 | BinaryCrossentropy, BinaryFocalCrossentropy, etc. |
| Multiclass (n classes) | (batch_size, 1) label-encoded array | softmax | n | SparseCategoricalCrossentropy |
| Multiclass (n classes) | (batch_size, n) one-hot array | softmax | n | CategoricalCrossentropy, CategoricalFocalCrossentropy, etc. |
| Multilabel (n classes) | (batch_size, n) array | sigmoid | n | BinaryCrossentropy |
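
To make the rows of the table concrete, the following NumPy sketch builds a toy label array for each task and evaluates the matching loss by hand. It is only an illustration of the label shapes and activations. The helper functions are simplified re-implementations written for this example, not the Keras loss classes, and the labels and logits are made up. The label-encoded multiclass array is kept flat, with shape `(batch_size,)`, for easier indexing.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n = 4, 3
eps = 1e-7  # keeps log() away from zero


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def binary_crossentropy(y_true, p):
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


def categorical_crossentropy(y_onehot, p):
    p = np.clip(p, eps, 1.0)
    return float(np.mean(-np.sum(y_onehot * np.log(p), axis=-1)))


def sparse_categorical_crossentropy(y_int, p):
    p = np.clip(p, eps, 1.0)
    return float(np.mean(-np.log(p[np.arange(len(y_int)), y_int])))


logits = rng.normal(size=(batch_size, n))

# Binary: labels of shape (batch_size, 1), a single sigmoid unit.
y_binary = np.array([[0.0], [1.0], [1.0], [0.0]])
loss_binary = binary_crossentropy(y_binary, sigmoid(logits[:, :1]))

# Multiclass: label-encoded or one-hot labels, softmax over n units.
y_sparse = np.array([0, 2, 1, 2])
y_onehot = np.eye(n)[y_sparse]
probs = softmax(logits)
loss_sparse = sparse_categorical_crossentropy(y_sparse, probs)
loss_onehot = categorical_crossentropy(y_onehot, probs)

# Multilabel: multi-hot labels of shape (batch_size, n), n sigmoid units.
y_multilabel = np.array(
    [[1, 0, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
loss_multilabel = binary_crossentropy(y_multilabel, sigmoid(logits))
```

The two multiclass losses agree because they differ only in how the same labels are encoded. The multilabel case reuses binary cross-entropy because each of the n outputs is treated as an independent yes/no decision.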


In [4]:
# tunable configs in the training pipeline
BASE_MODEL = "ResNet50"
DATAGEN_CONFIG = {
    'horizontal_flip': True,
    'vertical_flip': True,
    'brightness_range': (0.4, 1.2),
    'channel_shift_range': 150.0,
}

FLOW_CONFIG = {
    'x_col': "image_path",
    'validate_filenames': False,
    'seed': 42,
    'target_size': (224, 224),
    'color_mode': 'rgb',
    'class_mode': 'binary',  # binary, categorical, sparse
    'interpolation': 'bilinear',
    'batch_size': 128,
}

TRAIN_EPOCHS = 10
TRAIN_OPTIMIZER = optimizers.legacy.Adam(learning_rate=0.001)
FIT_CONFIG = {
    'shuffle': True,
    'verbose': 1,
}

COMPILE_CONFIG = {
    'loss': losses.BinaryCrossentropy(),
    'metrics': list(CONFIG.BINARY_METRICS.values()),
    # 'metrics': list(CONFIG.MULTILABEL_METRICS.values()),
    # 'metrics': list(CONFIG.MULTICLASS_METRICS.values()),
}

BASE_CONFIG = {
    "include_top": False,
    "input_shape": (*FLOW_CONFIG['target_size'], 3),
    "pooling": "max",
}

CALLBACKS = [
    # LearningRateScheduler(lambda epoch: 1e-3 * 0.9 ** epoch),
    # EXP_PATH is already an absolute path under REPO_ROOT
    CSVLogger(os.path.join(EXP_PATH, 'trainlog.csv'), append=False),
    ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.2,
        patience=5,
        verbose=1,
        mode='min',
        min_delta=0.0001,
        cooldown=0,
        min_lr=1e-8,
    ),
    # add model checkpoint
]

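FT_CALLBACKS = [
    # `fit` restarts the epoch index at 0 for the fine-tuning run, so the
    # decay is applied to the fine-tuning epoch directly; this schedule
    # overrides the initial learning rate set in FT_OPTIMIZER
    LearningRateScheduler(lambda epoch: 1e-4 * 0.2 ** epoch),
    CSVLogger(os.path.join(EXP_PATH, 'log.csv'), append=False),
    # add model checkpoint
]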
FT_EPOCHS = 2
FT_OPTIMIZER = optimizers.legacy.Adam(learning_rate=1e-6)
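
Schedule functions such as the ones passed to `LearningRateScheduler` can be previewed before training, which is a cheap way to catch non-positive or exploding rates. The sketch below is a minimal example. `exponential_decay` and `preview_schedule` are hypothetical helper names, and the base rate and decay factor mirror the commented-out schedule in `CALLBACKS`. Keras passes a 0-based epoch index to the schedule, and each `fit` call starts again at 0 unless `initial_epoch` is set.

```python
def exponential_decay(epoch, base_lr=1e-3, rate=0.9):
    # `**` binds tighter than `*`, and both bind tighter than `-`,
    # so any epoch offset must be parenthesized inside the exponent.
    return base_lr * rate ** epoch


def preview_schedule(schedule, n_epochs):
    """Evaluate a schedule for every epoch index that `fit` will pass in."""
    return [schedule(epoch) for epoch in range(n_epochs)]


lrs = preview_schedule(exponential_decay, n_epochs=10)

# A learning rate must stay positive for every epoch of the run.
assert all(lr > 0 for lr in lrs)
```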

In [5]:
# load from directory
from sklearn.model_selection import GroupShuffleSplit

paths = glob(os.path.join(DATA_ROOT, '*', '*'))

rows = []
for path in paths:
    img_meta = UTILS.path_parser(path)
    rows.append(img_meta)

metadata = pd.DataFrame(rows)
TARGET_KEY = "target"  # used to stratify and get y
GROUP_KEY = "case_id"  # used for grouped splits


# add some filtering here if necessary
use_classes = ["Normal clean mucosa", "Reduced mucosal view"]
metadata = metadata.loc[metadata[TARGET_KEY].isin(use_classes)]


# from notebook 2

X, y = metadata.drop(columns=[TARGET_KEY]), metadata[TARGET_KEY]
groups = metadata[GROUP_KEY]
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

for i, (train_index, val_index) in enumerate(gss.split(X, y, groups)):
    TRAIN = metadata.iloc[train_index]
    VAL = metadata.iloc[val_index]

CLASSES = y.unique().tolist()
print("Classes for this task are: ", CLASSES)

Classes for this task are:  ['Normal clean mucosa', 'Reduced mucosal view']
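
The reason for splitting on `GROUP_KEY` is to keep all frames from one case in a single subset, so that validation metrics are not inflated by near-duplicate images. The sketch below is a simplified NumPy stand-in for a grouped split, not scikit-learn's `GroupShuffleSplit` itself. It shows how such a split can be checked for leakage. The function name and the toy `case_ids` are made up for illustration.

```python
import numpy as np


def grouped_split(groups, test_size=0.2, seed=0):
    """Split row indices so that each group lands entirely in one subset."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique_groups)
    # As in a grouped shuffle split, the fraction applies to groups, not rows.
    n_val = max(1, int(np.ceil(test_size * len(unique_groups))))
    val_mask = np.isin(groups, unique_groups[:n_val])
    return np.flatnonzero(~val_mask), np.flatnonzero(val_mask)


# Made-up case IDs: several frames per case.
case_ids = np.array(["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"])
train_idx, val_idx = grouped_split(case_ids)

# Leakage check: no case ID may appear on both sides of the split.
overlap = set(case_ids[train_idx]) & set(case_ids[val_idx])
assert not overlap
```

The same overlap check can be applied to the `TRAIN` and `VAL` frames in this notebook by comparing the sets of their `GROUP_KEY` values.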


In [6]:
# init preprocessing function
preprocessing_function = CONFIG.BASE_PREPROCESSOR[BASE_MODEL]

# initialize data generators
train_datagen = ImageDataGenerator(
    **DATAGEN_CONFIG,
    preprocessing_function=preprocessing_function,
)

CLASS_WEIGHTS = UTILS.get_class_weights(
    class_weight='balanced',
    classes=y.unique(),
    y=y,
)

test_datagen = ImageDataGenerator(
    preprocessing_function=preprocessing_function,
)

train_dataset = train_datagen.flow_from_dataframe(
    dataframe=TRAIN,
    directory=DATA_ROOT,
    y_col=TARGET_KEY,
    classes=CLASSES,
    **FLOW_CONFIG,
)

val_dataset = test_datagen.flow_from_dataframe(
    dataframe=VAL,
    directory=DATA_ROOT,
    y_col=TARGET_KEY,
    classes=CLASSES,
    **FLOW_CONFIG,
)

Found 30318 non-validated image filenames belonging to 2 classes.
Found 6926 non-validated image filenames belonging to 2 classes.
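
The arguments given to `UTILS.get_class_weights` resemble those of scikit-learn's `compute_class_weight`, so the sketch below assumes the `'balanced'` heuristic, `n_samples / (n_classes * count_per_class)`. This is an assumption about the helper, not a description of its implementation. The function name and the 9:1 toy labels are made up. Keras expects `class_weight` as a dictionary that maps integer class indices to weights.

```python
import numpy as np


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency ('balanced' heuristic)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    # Plain Python types keep the dictionary easy to log and serialize.
    return {int(c): float(w) for c, w in zip(classes, weights)}


labels = np.array([0] * 90 + [1] * 10)  # made-up 9:1 imbalance
weights = balanced_class_weights(labels)
```

With these toy labels the minority class receives nine times the weight of the majority class, so both classes contribute equally to the total weighted loss.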


In [7]:
for batch, label in train_dataset:
    print(batch.shape)
    print(label.shape)
    print(label[0])
    break

(128, 224, 224, 3)
(128,)
0.0


In [8]:
model = CustomClassifier(
    base=BASE_MODEL,
    **BASE_CONFIG,
)
# model.build((None, *BASE_CONFIG["input_shape"]))
# model.summary()

## Initial Training

In [None]:
# model.base.trainable = False
model.compile(optimizer=TRAIN_OPTIMIZER, **COMPILE_CONFIG)
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=TRAIN_EPOCHS,
    class_weight=CLASS_WEIGHTS,
    callbacks=CALLBACKS,
    **FIT_CONFIG,
)

2023-10-16 05:51:31.559423: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8900


Epoch 1/10
 39/237 [===>..........................] - ETA: 3:37 - loss: 1.8253 - AUC_PR: 0.3372 - AUC_ROC: 0.8594

## Fine-tuning

In [None]:
PCT_FT_LAYERS = 0.10
NUM_TRAINABLE, _ = UTILS.inspect_trainable_layers(
    model.base, return_counts=True)
N = int(np.ceil(NUM_TRAINABLE * PCT_FT_LAYERS))

In [None]:
# set ALL layers to trainable
model.base.trainable = True

# leave last N layers as trainable
for layer in model.base.layers[:-N]:
    if layer.get_weights():
        layer.trainable = False
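
The freezing logic can be reasoned about without a model. The sketch below reproduces the arithmetic for `N` and the slice that selects the unfrozen tail, using hypothetical layer names and counts. It also guards one edge case: when `N` is 0, the slice `layers[:-0]` is empty, so the loop above would freeze nothing rather than everything. If `NUM_TRAINABLE` counts only layers that hold weights (an assumption about `UTILS.inspect_trainable_layers`), the tail may also include weightless layers such as activations.

```python
import math


def unfrozen_tail(layer_names, num_trainable, pct):
    """Return the names of the last N layers, which stay trainable."""
    n = math.ceil(num_trainable * pct)
    if n == 0:
        # Guard: `layer_names[-0:]` would return the whole list.
        return []
    return layer_names[-n:]


layers = [f"layer_{i}" for i in range(20)]  # hypothetical layer names

# ceil(15 * 0.10) = 2, so only the last two layers remain trainable.
tail = unfrozen_tail(layers, num_trainable=15, pct=0.10)

# With no trainable layers counted, nothing should be unfrozen.
empty_tail = unfrozen_tail(layers, num_trainable=0, pct=0.10)
```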

In [None]:
model.compile(optimizer=FT_OPTIMIZER, **COMPILE_CONFIG)
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=FT_EPOCHS,
    class_weight=CLASS_WEIGHTS,
    callbacks=FT_CALLBACKS,  # updated init LR and LR schedule
    **FIT_CONFIG,
)

## Final note

Because classification problems often involve imbalanced data, a loss function designed for imbalance may be necessary. Some proposed solutions [1-3] use differentiable precision, recall, and F-beta losses to directly target performance on the minority classes of interest. Others [4] include calibration of the decision threshold within the training and backpropagation step. This tutorial shows the simple case of weighting the loss function according to a user-defined `CLASS_WEIGHTS` lookup to address the imbalance.
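
For intuition about the differentiable metrics mentioned above, here is a minimal NumPy sketch of a soft F-beta loss. It replaces the hard counts of true positives, false positives, and false negatives with sums over predicted probabilities. It is written in the spirit of the soft-F1 idea in the blog references and is not the exact formulation of any of the cited papers. The function name is hypothetical, and an actual training loss would use tensor operations so that gradients can flow.

```python
import numpy as np


def soft_fbeta_loss(y_true, y_prob, beta=1.0, eps=1e-7):
    """Return 1 - soft F-beta, computed from probabilities, not hard labels."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    tp = np.sum(y_true * y_prob)          # soft true positives
    fp = np.sum((1 - y_true) * y_prob)    # soft false positives
    fn = np.sum(y_true * (1 - y_prob))    # soft false negatives
    b2 = beta ** 2
    fbeta = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps)
    return float(1.0 - fbeta)


y_true = np.array([1, 0, 1, 0])
loss_perfect = soft_fbeta_loss(y_true, y_true)       # close to 0
loss_inverted = soft_fbeta_loss(y_true, 1 - y_true)  # equal to 1
loss_unsure = soft_fbeta_loss(y_true, np.full(4, 0.5))
```

Choosing `beta` greater than 1 weights recall more heavily than precision, which is one way to emphasize a minority positive class.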

### References

1.  Fränti P, Mariescu-Istodor R. Soft precision and recall. Pattern Recognition Letters. 2023 Mar 1;167:115–21. 
2.  Yacouby R, Axman D. Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. In: Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems [Internet]. Online: Association for Computational Linguistics; 2020 [cited 2023 Oct 16]. p. 79–91. Available from: https://aclanthology.org/2020.eval4nlp-1.9
3.  Lee N, Yang H, Yoo H. A surrogate loss function for optimization of $F_\beta$ score in binary classification with imbalanced data [Internet]. arXiv; 2021 [cited 2023 Oct 16]. Available from: http://arxiv.org/abs/2104.01459
4.  Cal-Net: Jointly Learning Classification and Calibration On Imbalanced Binary Classification Tasks [Internet]. IEEE Conference Publication, IEEE Xplore. [cited 2023 Oct 16]. Available from: https://ieeexplore-ieee-org.libproxy1.nus.edu.sg/abstract/document/9534411

### Blog references
* Maiza A. Multi-Label Image Classification in TensorFlow 2.0 [Internet]. Medium. 2019 [cited 2023 Oct 16]. Available from: https://towardsdatascience.com/multi-label-image-classification-in-tensorflow-2-0-7d4cf8a4bc72
* Maiza A. The Unknown Benefits of using a Soft-F1 Loss in Classification Systems [Internet]. Medium. 2020 [cited 2023 Oct 16]. Available from: https://towardsdatascience.com/the-unknown-benefits-of-using-a-soft-f1-loss-in-classification-systems-753902c0105d


## End.