---
# Technology Selection

What algorithms to use and what technologies are available.

## Nature of the problem

* Multi-label Binary Classification 

Each label is a binary classification task, a problem for which multiple algorithms have been developed and applied in real life, e.g. spam filters.

* Naive Bayes - [Naive Bayes and Text Classification](https://arxiv.org/abs/1410.5329)
* CNN - [Convolutional Neural Networks for Sentence Classification](https://arxiv.org/pdf/1408.5882.pdf)
* DNN Language Model - [Attention Is All You Need](https://arxiv.org/abs/1706.03762)

## Assessment

Skipped due to the time constraint.

## Decision

**Transformer Deep Neural Network Architecture**, using transfer learning (fine-tuning) on a pre-trained language model.

1. State-of-the-art algorithms that are being actively researched.
2. Pre-trained models for text classification, e.g. text sentiment analysis, are available.
3. Other well-explored algorithms have already been well tested, as published on Kaggle.










---
# Implementation



## ML Model for Fine Tuning

### Framework
* Google TensorFlow 2.x 
* Keras for training the model
* Hugging Face Transformers library

### Data allocation
Utilize [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to:
1. shuffle the training data
2. allocate the ratio R of the data for validation. R=0.2
3. train the model on the remaining (1-R) ratio of the data

Apply the trained model on testing data for evaluation.
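
The allocation steps above can be sketched in plain Python to make the mechanics explicit. `split_train_validation` is a hypothetical helper written for illustration, not a scikit-learn function, and the sample rows are made up; `train_test_split` with `test_size=0.2` does a comparable shuffle-then-split in one call.

```python
import random


def split_train_validation(rows, validation_ratio=0.2, seed=42):
    """Shuffle rows and hold out validation_ratio of them for validation.

    Hypothetical helper for illustration only.
    """
    assert 0.0 < validation_ratio < 1.0
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # step 1: shuffle
    num_validation = int(round(len(shuffled) * validation_ratio))  # step 2: ratio R
    validation = shuffled[:num_validation]
    train = shuffled[num_validation:]      # step 3: the remaining (1 - R) for training
    return train, validation


# Made-up (text, label) rows for the demonstration.
rows = [(f"comment {i}", i % 2) for i in range(10)]
train, validation = split_train_validation(rows, validation_ratio=0.2)
print(len(train), len(validation))
```

The fixed `seed` is an assumption made here so that repeated runs yield the same split.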

### Hyperparameter search
* Learning rate: 5e-5, 5e-4, and 5e-3 as the candidate start values
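
A minimal sketch of the search, assuming each candidate is scored by the validation loss of a separate training run. `search_learning_rate` and `fake_evaluate` are hypothetical names, and the loss values are invented for the demonstration, not measured results.

```python
def search_learning_rate(candidates, evaluate):
    """Return (best_lr, best_loss) for the candidate with the lowest validation loss.

    `evaluate` is a hypothetical callable that trains a fresh model with the
    given learning rate and returns its validation loss.
    """
    results = {lr: evaluate(lr) for lr in candidates}
    best_lr = min(results, key=results.get)
    return best_lr, results[best_lr]


def fake_evaluate(lr):
    # Stand-in for a real training run; the numbers are made up for the demo.
    return {5e-5: 0.12, 5e-4: 0.31, 5e-3: 0.69}[lr]


best_lr, best_loss = search_learning_rate((5e-5, 5e-4, 5e-3), fake_evaluate)
print(best_lr, best_loss)
```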

### Epoch
Number of times N to go through the entire training data set. N=10 due to the time constraint.

### Early stopping
Utilize Keras [EarlyStopping](https://keras.io/api/callbacks/early_stopping/) to stop the training when no improvement is achieved for N consecutive epochs. N=5.
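
The patience rule can be illustrated with a simplified simulation in plain Python. This is a sketch of the idea rather than the Keras implementation: `early_stopping_epoch` is a hypothetical function, it ignores options such as `min_delta`, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 0-based epoch at which training would stop, or None.

    The wait counter resets every time the best validation loss improves.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value is reached at epoch 1.
losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.30]
print(early_stopping_epoch(losses, patience=5))
```

With N=5 the run stops before the last value is ever seen, which is the trade-off of a small patience.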

### Reduce learning rate at no improvement
Utilize Keras [ReduceLROnPlateau](https://keras.io/api/callbacks/reduce_lr_on_plateau/) to reduce the learning rate when no improvement is achieved for N consecutive epochs. N=3.
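
The effect on the learning rate can be traced with a simplified simulation. This sketch is not the Keras implementation: `learning_rate_schedule` is a hypothetical function, it omits the cooldown and minimum learning rate options, and the loss values are made up. The factor 0.2 matches the value used in the callback code of this notebook.

```python
def learning_rate_schedule(val_losses, initial_lr=5e-5, factor=0.2, patience=3):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` once `patience` epochs in a row
    show no improvement, and the wait counter then starts over.
    """
    best = float("inf")
    wait = 0
    lr = initial_lr
    schedule = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor
                wait = 0
        schedule.append(lr)
    return schedule


# Made-up validation losses with a three-epoch plateau after epoch 1.
plateau_losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.39]
print(learning_rate_schedule(plateau_losses, initial_lr=5e-5, factor=0.2, patience=3))
```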


### Keras Callbacks

Utilize [Keras Callbacks API](https://keras.io/api/callbacks/) to apply Early Stopping, Reduce Learning Rate, and TensorBoard during the model training.



In [43]:
import sys
import numpy as np
import sklearn.metrics  # the callback below calls sklearn.metrics.* by its full name
import tensorflow as tf
from sklearn.metrics import auc, roc_auc_score


class ROCCallback(tf.keras.callbacks.Callback):
    """Take actions on the model training based on ROC/AUC
    """

    def __init__(
            self,
            validation_data,
            validation_label,
            output_path,
            output_format='h5',
            criterion=1,
            reduce_lr_patience=2,
            reduce_lr_factor=0.2,
            early_stop_patience=sys.maxsize,
            verbose=True
    ):
        """
        Args:
            validation_data: data to generate prediction to calculate ROC/AUC
            validation_label: label to calculate ROC/AUC
            output_path: path to save the model upon improvement
            output_format: model save format 'tf' or 'h5'
            criterion: how to decide an improvement (see _has_improved).
                1: both AUC and val_loss improved, 2: AUC improved, 3: val_loss improved
            reduce_lr_patience: number of consecutive no-improvement upon which to reduce LR.
            reduce_lr_factor: new learning rate = reduce_lr_factor * old learning rate
            early_stop_patience: total number of no-improvements upon which to stop the training
            verbose: [True|False] to output extra information
        """
        super().__init__()
        assert criterion in (1, 2, 3)
        assert 0.0 < reduce_lr_factor < 1.0
        assert 0 < reduce_lr_patience
        assert 0 < early_stop_patience

        # --------------------------------------------------------------------------------
        # ROC/AUC calculation data
        # TODO:
        #    When recreating the data e.g. in the on_epoch_end in tf.keras.utils.Sequence
        #    then need to update the x, y accordingly here.
        # --------------------------------------------------------------------------------
        self.x = validation_data
        self.y = validation_label

        # --------------------------------------------------------------------------------
        # Training control parameters
        # --------------------------------------------------------------------------------
        self.criterion = criterion
        self.reduce_lr_patience = reduce_lr_patience
        self.reduce_lr_factor = reduce_lr_factor
        self.early_stop_patience = early_stop_patience
        self.output_path = output_path
        self.output_format = output_format
        self.verbose = verbose

        # --------------------------------------------------------------------------------
        # Statistics
        # --------------------------------------------------------------------------------
        self.min_val_loss = np.inf
        self.max_roc_auc = -1
        self.max_pr_auc = -1
        self.max_f1 = -1
        self.best_epoch = -1
        self.successive_no_improvement = 0
        self.total_no_improvement = 0

    def on_train_begin(self, logs={}):
        """Reset the statistics.
        The class instance can be re-used throughout multiple training runs
        """
        self.successive_no_improvement = 0
        self.total_no_improvement = 0

        # --------------------------------------------------------------------------------
        # DO NOT reset the best metric values that the model has achieved.
        # If training is restarted on the same model, improvements need to be measured
        # against the last best metrics of the model, not the initial values e.g. -1 or np.inf.
        #
        # TODO:
        #    Save the best model metrics when saving the model as Keras config file.
        #    Reload the best metrics of the model when loading the model itself.
        #
        #    If the saved best model is re-loaded, the best metric values that the
        #    model achieved need to be re-loaded as well. Otherwise the first epoch
        #    result, even if the metrics are worse than the best metrics achieved,
        #    will become the best results and the best model will be overwritten with
        #    the inferior model.
        # --------------------------------------------------------------------------------
        # self.max_roc_auc = -1
        # self.min_val_loss = np.inf
        # self.best_epoch = -1

    def on_train_end(self, logs={}):
        pass

    def on_epoch_begin(self, epoch, logs={}):
        pass

    def _reduce_learning_rate(self):
        old_lr = tf.keras.backend.get_value(self.model.optimizer.lr)
        new_lr = old_lr * self.reduce_lr_factor
        tf.keras.backend.set_value(self.model.optimizer.lr, new_lr)
        self.successive_no_improvement = 0
        if self.verbose:
            print(f"Reducing learning rate to {new_lr}.")

    def _stop_early(self):
        if self.verbose:
            print(
                "Early stopping: no improvement [%s] times. best epoch [%s] AUC [%.5f] val_loss [%.5f]" %
                (self.total_no_improvement, self.best_epoch + 1, self.max_roc_auc, self.min_val_loss)
            )
        self.model.stop_training = True
        self.total_no_improvement = 0
        self.successive_no_improvement = 0

    def _handle_improvement(self, epoch, roc_auc, roc_auc_prev, val_loss, val_loss_prev):
        if self.verbose:
            print(
                "Model improved auc [%.5f > %.5f] val_loss [%.5f < %.5f]. Saving to %s" %
                (roc_auc, roc_auc_prev, val_loss, val_loss_prev, self.output_path)
            )

        # --------------------------------------------------------------------------------
        # Update statistics
        # --------------------------------------------------------------------------------
        self.best_epoch = epoch
        self.successive_no_improvement = 0

        # --------------------------------------------------------------------------------
        # Save the model upon improvement
        # --------------------------------------------------------------------------------
        self.model.save_weights(
            self.output_path, overwrite=True, save_format=self.output_format
        )

        # --------------------------------------------------------------------------------
        # Stop when no further AUC improvement is possible (AUC has reached 1.0).
        # self.max_roc_auc already includes the AUC of the current epoch.
        # --------------------------------------------------------------------------------
        if self.max_roc_auc > (1.0 - 1e-10):
            self._stop_early()
            if self.verbose:
                print("Stopped as no AUC improvement can be made beyond 1.0")

    def _handle_no_improvement(self, epoch, roc_auc, roc_auc_prev, val_loss, val_loss_prev):
        if self.verbose:
            if roc_auc <= roc_auc_prev:
                print("AUC [%.5f] did not improve from [%.5f]." % (roc_auc, roc_auc_prev))
            if val_loss >= val_loss_prev:
                print("val_loss [%.5f] did not improve from [%.5f]." % (val_loss, val_loss_prev))

        # --------------------------------------------------------------------------------
        # Reduce LR
        # --------------------------------------------------------------------------------
        self.successive_no_improvement += 1
        if self.successive_no_improvement >= self.reduce_lr_patience:
            self._reduce_learning_rate()

        # --------------------------------------------------------------------------------
        # Early Stop
        # --------------------------------------------------------------------------------
        self.total_no_improvement += 1
        if self.total_no_improvement >= self.early_stop_patience:
            self._stop_early()

    def _has_improved(self, roc_auc, roc_auc_prev, val_loss, val_loss_prev):
        """Decide if an improvement has been achieved
        Criteria:
            1: Both AUC and val_loss improved
            2: AUC improved
            3: val_loss improved
        """
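        if self.criterion == 1:
            return (roc_auc > roc_auc_prev) and (val_loss < val_loss_prev)
        if self.criterion == 2:
            return roc_auc > roc_auc_prev
        if self.criterion == 3:
            return val_loss < val_loss_prev
        # Fail loudly instead of silently returning None for an unknown criterion.
        raise ValueError(f"Unknown criterion [{self.criterion}]. Expected 1, 2, or 3.")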

    def on_epoch_end(self, epoch, logs={}):
        """Verify the performance improvement metrics and make decisions on:
        - Reduce learning rate if count of no consecutive improvement >= reduce_lr_patience
        - Early stopping if total count of no improvement >= early_stop_patience

        TODO:
            If validation data is recreated e.g. at on_epoch_end tf.keras.utils.Sequence,
            NEED to update the self.x, self.y accordingly.
        """
        # [print(f"{k}:{v}") for k, v in logs.items()]
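        predictions = self.model.predict(self.x)
        val_loss = logs.get('val_loss')
        roc_auc = sklearn.metrics.roc_auc_score(self.y, predictions)
        # f1_score needs hard class labels, so threshold the scores at 0.5.
        # ASSUMPTION: the model outputs probabilities (e.g. a sigmoid output), not logits.
        f1 = sklearn.metrics.f1_score(
            y_true=self.y, y_pred=(predictions > 0.5).astype(int), average='binary'
        )
        precision, recall, thresholds = sklearn.metrics.precision_recall_curve(
            self.y, predictions
        )
        pr_auc = sklearn.metrics.auc(recall, precision)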

        val_loss_prev = self.min_val_loss
        roc_auc_prev = self.max_roc_auc
        pr_auc_prev = self.max_pr_auc
        f1_prev = self.max_f1
        self.min_val_loss = np.minimum(val_loss, self.min_val_loss)
        self.max_roc_auc = np.maximum(roc_auc, self.max_roc_auc)
        self.max_pr_auc = np.maximum(pr_auc, self.max_pr_auc)
        self.max_f1 = np.maximum(f1, self.max_f1)

        if self._has_improved(roc_auc, roc_auc_prev, val_loss, val_loss_prev):
            self._handle_improvement(epoch, roc_auc, roc_auc_prev, val_loss, val_loss_prev)
        else:
            self._handle_no_improvement(epoch, roc_auc, roc_auc_prev, val_loss, val_loss_prev)

    def on_batch_begin(self, batch, logs={}):
        pass

    def on_batch_end(self, batch, logs={}):
        pass


class SavePretrainedCallback(tf.keras.callbacks.Callback):
    """
    This is only for directly working on the Huggingface models.

    Hugging Face models have a save_pretrained() method that saves both
    the weights and the necessary metadata to allow them to be loaded as
    a pretrained model in future. This is a simple Keras callback that
    saves the model with this method after each epoch.

    """

    def __init__(self, output_dir, monitor_metric='val_loss', monitor_mode='min', verbose=True):
        assert monitor_mode in ['min', 'max']
        super().__init__()
        
        self.output_dir = output_dir
        self.monitor_metric = monitor_metric
        self.monitor_mode = monitor_mode
        self.best_metric_value = np.inf if monitor_mode == 'min' else -1
        self.verbose = verbose

        self.lowest_val_loss = np.inf
        self.best_epoch = -1

    def on_epoch_end(self, epoch, logs={}):
        """
        Save only the best model
        - https://stackoverflow.com/a/68042600/4281353
        - https://www.tensorflow.org/guide/keras/custom_callback

        TODO:
        The save_pretrained() method exists in the HuggingFace model only.
        Need to implement logic to save a plain Keras model as well.
        """
        assert self.monitor_metric in logs, \
            f"monitor metric {self.monitor_metric} not in valid metrics {logs.keys()}"

        metric_value = logs.get(self.monitor_metric)
        previous_best = self.best_metric_value
        if self.monitor_mode == 'min':
            self.best_metric_value = np.minimum(metric_value, self.best_metric_value)

        elif self.monitor_mode == 'max':
            self.best_metric_value = np.maximum(metric_value, self.best_metric_value)
            
        if previous_best != self.best_metric_value:
            if self.verbose:
                print(
                    "Model %s improved from [%.5f] to [%.5f]" % 
                    (self.monitor_metric, previous_best, self.best_metric_value)
                )
                print(f"Saving to {self.output_dir}")
            self.best_epoch = epoch
            self.model.save_pretrained(self.output_dir)


class TensorBoardCallback(tf.keras.callbacks.TensorBoard):
    """TensorBoard visualization of the model training
    See https://keras.io/api/callbacks/tensorboard/
    """

    def __init__(self, output_directory):
        super().__init__(
            log_dir=output_directory,
            write_graph=True,
            write_images=True,
            histogram_freq=1,  # log histogram visualizations every 1 epoch
            embeddings_freq=1,  # log embedding visualizations every 1 epoch
            update_freq="epoch",  # every epoch
        )


class EarlyStoppingCallback(tf.keras.callbacks.EarlyStopping):
    """Stop training when no progress on the metric to monitor
    https://keras.io/api/callbacks/early_stopping/
    https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/

    Using val_loss to monitor.
    https://datascience.stackexchange.com/a/49594/68313
    Prefer the loss to the accuracy. Why? The loss quantifies how certain
    the model is about a prediction. The accuracy merely accounts for
    the number of correct predictions. Similarly, any metric using hard
    predictions rather than probabilities has the same problem.
    """

    def __init__(self, patience=3, monitor='val_loss', mode='auto'):
        assert patience > 0
        super().__init__(
            monitor=monitor,
            mode=mode,
            verbose=1,
            patience=patience,
            restore_best_weights=True
        )

    def on_epoch_end(self, epoch, logs={}):
        assert self.monitor in logs, \
            f"monitor metric {self.monitor} not in valid metrics {logs.keys()}"
        super().on_epoch_end(epoch, logs)


class ModelCheckpointCallback(tf.keras.callbacks.ModelCheckpoint):
    """Check point to save the model
    See https://keras.io/api/callbacks/model_checkpoint/

    NOTE:
        Saving the entire model did not work with the native HuggingFace model,
        failing with the error:
        NotImplementedError: Saving the model to HDF5 format requires the model
        to be a Functional model or a Sequential model.
        It does not work for subclassed models, because such models are defined
        via the body of a Python method, which isn't safely serializable.

        Neither tf.keras.models.save_model nor model.save() worked either,
        causing out-of-index errors or load_model() failures. Hence use
        save_weights_only=True.
    """

    def __init__(self, path_to_file, monitor='val_loss', mode='auto'):
        """
        Args:
            path_to_file: path to the model file to save at check points
        """
        super().__init__(
            filepath=path_to_file,
            monitor=monitor,
            mode=mode,
            save_best_only=True,
            save_weights_only=True,  # Cannot save entire model.
            save_freq="epoch",
            verbose=1
        )

    def on_epoch_end(self, epoch, logs={}):
        assert self.monitor in logs, \
            f"monitor metric {self.monitor} not in valid metrics {logs.keys()}"
        super().on_epoch_end(epoch, logs)


class ReduceLRCallback(tf.keras.callbacks.ReduceLROnPlateau):
    """Reduce learning rate when a metric has stopped improving.
    See https://keras.io/api/callbacks/reduce_lr_on_plateau/
    """

    def __init__(self, patience=3, monitor='val_loss', mode='auto'):
        assert patience > 0
        super().__init__(
            monitor=monitor,
            mode=mode,
            factor=0.2,
            patience=patience,
            verbose=1
        )

    def on_epoch_end(self, epoch, logs={}):
        assert self.monitor in logs, \
            f"monitor metric {self.monitor} not in valid metrics {logs.keys()}"
        super().on_epoch_end(epoch, logs)


### Fine Tuning Runner

The Runner class implements fine-tuning based on the [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) pretrained model. Each classification category, e.g. ```toxic```, has a dedicated Runner class instance. The ***Distilled*** BERT model is used so that the training can run on limited resources.
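
Because each category gets its own Runner, the multi-label data has to be split into one binary label vector per category before training. A minimal sketch, assuming each row carries a 0/1 flag per category: `labels_per_category` is a hypothetical helper, the rows are made up, and only ```toxic``` is a category name taken from the text above (```insult``` is a placeholder).

```python
def labels_per_category(rows, categories):
    """Split multi-label rows into one binary label list per category.

    Each row is a dict holding the comment text and a 0/1 flag per category.
    """
    return {
        category: [int(row[category]) for row in rows]
        for category in categories
    }


# Hypothetical rows for illustration only.
rows = [
    {"text": "first comment", "toxic": 0, "insult": 0},
    {"text": "second comment", "toxic": 1, "insult": 0},
    {"text": "third comment", "toxic": 1, "insult": 1},
]
labels = labels_per_category(rows, categories=("toxic", "insult"))
print(labels["toxic"])
print(labels["insult"])
```

Each list in `labels` would then serve as the training target for the Runner instance of that category.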


In [None]:
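import os                 # used for os.path and os.access in Runner
from pathlib import Path  # used to create the output directories

import tensorflow as tf
from tensorflow.keras.layers import (
    Dense
)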


class Runner:
    """Fine tuning implementation class
    TODO:
        Need to refactor as the implementation is messy.
        - Encapsulate common logic and process in the base class.
        - Implement specifics in a subclass (custom/Keras or huggingface)
        - Separate common functions in a library and make them re-usable.
        - Make function state-less. No reference to state/memory.
        - Eliminate magic numbers e.g. 512 for max BERT sequence length.
    
    See:
    - https://www.tensorflow.org/guide/keras/train_and_evaluate
    - https://stackoverflow.com/questions/68172891/
    - https://stackoverflow.com/a/68172992/4281353

    The TF/Keras model has the base model, e.g. distilbert for DistilBERT, which is
    from the base model TFDistilBertModel.
    https://huggingface.co/transformers/model_doc/distilbert.html#tfdistilbertmodel

    TFDistilBertForSequenceClassification has classification layers added on top
    of TFDistilBertModel, hence not required to add fine-tuning layers by users.
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    distilbert (TFDistilBertMain multiple                  66362880  
    _________________________________________________________________
    pre_classifier (Dense)       multiple                  590592    
    _________________________________________________________________
    classifier (Dense)           multiple                  1538      
    _________________________________________________________________
    dropout_59 (Dropout)         multiple                  0         
    =================================================================
    """
    
    # ================================================================================
    # Class
    # ================================================================================
    USE_HF_TRAINER = False
    USE_CUSTOM_MODEL = False
    TOKENIZER_LOWER_CASE = True

    # ================================================================================
    # Instance
    # ================================================================================
    # --------------------------------------------------------------------------------
    # Instance properties
    # --------------------------------------------------------------------------------
    @property
    def category(self):
        """Category of the text comment classification, e.g. toxic"""
        return self._category

    @property
    def num_labels(self):
        """Number of labels to classify"""
        assert self._num_labels > 0
        return self._num_labels

    @property
    def tokenizer(self):
        """BERT tokenizer. The tokenizer must match the pretrained model"""
        return self._tokenizer

    @property
    def max_sequence_length(self):
        """Maximum number of tokens that the BERT tokenizer can accept. Max 512
        """
        assert 128 <= self._max_sequence_length <= 512
        return self._max_sequence_length

    @property
    def X(self):
        """Training TensorFlow DataSet"""
        return self._X

    @property
    def V(self):
        """Validation TensorFlow DataSet"""
        return self._V

    @property
    def model_base_class(self):
        """HuggingFace base class of the pretrained model class"""
        return self._model_base_class

    @property
    def model_class(self):
        """HuggingFace pretrained model class"""
        return self._model_class

    @property
    def model_name(self):
        """HuggingFace pretrained model name"""
        return self._model_name

    @property
    def model_base_name(self):
        """HuggingFace pretrained base model name"""
        return self._model_base_name

    @property
    def model(self):
        """TensorFlow/Keras Model instance"""
        return self._model

    @property
    def freeze_pretrained_base_model(self):
        """Boolean to freeze the base model"""
        return self._freeze_pretrained_base_model

    @property
    def batch_size(self):
        """Mini batch size during the training"""
        assert self._batch_size > 0
        return self._batch_size

    @property
    def learning_rate(self):
        """Training learning rate"""
        return self._learning_rate

    @property
    def l2(self):
        """Regularizer decay rate"""
        return self._l2

    @property
    def reduce_lr_patience(self):
        """Training patience for reducing learning rate"""
        return self._reduce_lr_patience

    @property
    def reduce_lr_factor(self):
        """Factor to reduce the learning rate"""
        return self._reduce_lr_factor

    @property
    def early_stop_patience(self):
        """Training patience for early stopping"""
        return self._early_stop_patience

    @property
    def num_epochs(self):
        """Number of maximum epochs to run for the training"""
        return self._num_epochs

    @property
    def output_directory(self):
        """Parent directory to manage training artefacts"""
        return self._output_directory

    @property
    def output_format(self):
        """Model output format 'h5' or 'tf'"""
        return self._output_format
    
    @property
    def output_path(self):
        """file path to save the model"""
        return self.model_directory + os.path.sep + 'model.h5'

    @property
    def model_directory(self):
        """Directory to save the trained models"""
        return self._model_directory

    @property
    def log_directory(self):
        """Directory to save logs, e.g. TensorBoard logs"""
        return self._log_directory

    @property
    def model_metric_names(self):
        """Model metrics
        The attribute model.metrics_names gives labels for the scalar metrics
        to be returned from model.evaluate().
        """
        return self.model.metrics_names

    @property
    def history(self):
        """The history object returned from model.fit(). 
        The object holds a record of the loss and metric during training
        """
        assert self._history is not None
        return self._history

    @property
    def trainer(self):
        """HuggingFace trainer instance
        HuggingFace offers an optimized Trainer because PyTorch does not have
        the training loop as Keras/Model has. It is available for TensorFlow
        as well, hence to be able to hold the instance in case using it.
        """
        return self._trainer

    # --------------------------------------------------------------------------------
    # Instance initialization
    # --------------------------------------------------------------------------------
    def _build_output_directories(self, output_directory):
        # Parent directory
        Path(self.output_directory).mkdir(parents=True, exist_ok=True)
        
        # Model directory
        self._model_directory = "{parent}/model_C{category}_B{size}_L{length}".format(
            parent=self.output_directory,
            category=self.category,
            size=self.batch_size,
            length=self.max_sequence_length
        )
        Path(self.model_directory).mkdir(parents=True, exist_ok=True)

        # Log directory
        self._log_directory = "{parent}/log_C{category}_B{size}_L{length}".format(
            parent=self.output_directory,
            category=self.category,
            size=self.batch_size,
            length=self.max_sequence_length
        )
        Path(self.log_directory).mkdir(parents=True, exist_ok=True)
        
    def _build_huggingface_model_callbacks(self):
        return [
            SavePretrainedCallback(
                monitor_metric=self._monitor_metric,
                monitor_mode=self._monitor_mode,
                output_dir=self.model_directory, 
                verbose=True
            ),
            ReduceLRCallback(patience=self.reduce_lr_patience),
            EarlyStoppingCallback(patience=self.early_stop_patience),
            # TensorBoardCallback(self.log_directory),
        ]
        
    def _build_huggingface_model_auc_callbacks(self, validation_data, validation_label):
        raise NotImplementedError()
        
    def _build_huggingface_model(self):
        """Build model based on TFDistilBertForSequenceClassification which has
        classification heads added on top of the base BERT model.
        """
        # --------------------------------------------------------------------------------
        # Base model
        # --------------------------------------------------------------------------------
        config_file = self.model_directory + os.path.sep + "config.json"
        if os.path.isfile(config_file) and os.access(config_file, os.R_OK):
            # Load the saved model
            print(f"\nloading the saved huggingface model from {self.model_directory}...\n")
            self._pretrained_model = self.model_class.from_pretrained(
                self.model_directory,
                num_labels=self.num_labels
            )
        else:
            # Download the model from Huggingface
            self._pretrained_model = self.model_class.from_pretrained(
                self.model_name,
                num_labels=self.num_labels,            
            )

        # Freeze base model if required
        if self.freeze_pretrained_base_model:
            for _layer in self._pretrained_model.layers:
                if _layer.name == self.model_base_name:
                    _layer.trainable = False

        self._model = self._pretrained_model

        # --------------------------------------------------------------------------------
        # Loss layer
        # --------------------------------------------------------------------------------
        if self.num_labels == 1:    # Binary classification
            loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
        else:                       # Categorical classification
            loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
            
        # --------------------------------------------------------------------------------
        # Build the model
        #     from_logits in SparseCategoricalCrossentropy(from_logits=[True|False])
        #     True  when the input is logits not  normalized by softmax.
        #     False when the input is probability normalized by softmax
        # --------------------------------------------------------------------------------
        optimizer = tf.keras.optimizers.Adam(learning_rate=self.learning_rate)
        self.model.compile(
            optimizer=optimizer, 
            # loss=self.model.compute_loss,
            loss=loss_fn,
            metrics=self._metrics 
        )

    def _build_custom_model_acc_callbacks(self):
        """Callbacks for accuracy"""
        return [
            EarlyStoppingCallback(
                patience=self.early_stop_patience, 
                monitor=self._monitor_metric, 
                mode=self._monitor_mode
            ),
            ReduceLRCallback(
                patience=self.reduce_lr_patience, 
                monitor=self._monitor_metric, 
                mode=self._monitor_mode
            ),
            ModelCheckpointCallback(
                self.output_path, 
                monitor=self._monitor_metric, 
                mode=self._monitor_mode
            ),
            # TensorBoardCallback(self.log_directory),
        ]

    def _build_custom_model_auc_callbacks(self, validation_data, validation_label):
        """Callbacks for ROC AUC"""
        return [
            ROCCallback(
                validation_data=dict(self.tokenize(validation_data)), 
                validation_label=validation_label,
                output_path=self.output_path,
                reduce_lr_patience = self.reduce_lr_patience,
                reduce_lr_factor = self.reduce_lr_factor,
                early_stop_patience=self.early_stop_patience, 
                verbose=True
            ),
            # TensorBoardCallback(self.log_directory),
        ]
        
    def _build_custom_model(self, validation_data, validation_label):
        # --------------------------------------------------------------------------------
        # Input layer (token indices and attention masks)
        # --------------------------------------------------------------------------------
        input_ids = tf.keras.layers.Input(shape=(self.max_sequence_length,), dtype=tf.int32, name='input_ids')
        attention_mask = tf.keras.layers.Input((self.max_sequence_length,), dtype=tf.int32, name='attention_mask')

        # --------------------------------------------------------------------------------
        # Base layer
        # --------------------------------------------------------------------------------
        # TFBaseModelOutput.last_hidden_state has shape (batch_size, max_sequence_length, 768)
        # Each sequence has [CLS]...[SEP] structure of shape (max_sequence_length, 768)
        # Extract [CLS] embeddings of shape (batch_size, 768) as last_hidden_state[:, 0, :]
        # --------------------------------------------------------------------------------
        base = self.model_base_class.from_pretrained(
            self.model_name,
        )
        # Freeze the base model weights.
        if self.freeze_pretrained_base_model:
            for layer in base.layers:
                layer.trainable = False

        base.summary()
        output = base([input_ids, attention_mask]).last_hidden_state[:, 0, :]

        if USE_CLASSIFICATION_LAYER:
            # -------------------------------------------------------------------------------
            # Classification layer 01
            # --------------------------------------------------------------------------------
            output = tf.keras.layers.Dropout(
                rate=0.20,
                name="01_dropout",
            )(output)

            output = tf.keras.layers.Dense(
                units=NUM_BASE_MODEL_OUTPUT,
                kernel_initializer='glorot_uniform',
                activation=None,
                name="01_dense_relu_no_regularizer",
            )(output)
            output = tf.keras.layers.BatchNormalization(
                name="01_bn"
            )(output)
            output = tf.keras.layers.Activation(
                "relu",
                name="01_relu"
            )(output)

            # --------------------------------------------------------------------------------
            # Classification layer 02
            # --------------------------------------------------------------------------------
            output = tf.keras.layers.Dense(
                units=NUM_BASE_MODEL_OUTPUT,
                kernel_initializer='glorot_uniform',
                activation=None,
                name="02_dense_relu_no_regularizer",
            )(output)
            output = tf.keras.layers.BatchNormalization(
                name="02_bn"
            )(output)
            output = tf.keras.layers.Activation(
                "relu",
                name="02_relu"
            )(output)
        
        # --------------------------------------------------------------------------------
        # TODO:
        #    Need to verify the effect of regularizers. 
        #
        #    [bias regularizer]
        #    It looks like bias_regularizer shifts the ROC threshold towards 0.5. 
        #    Without it, the threshold of the ROC with BinaryCrossEntropy loss was approx 0.02.
        #    With    it, the threshold of the ROC with BinaryCrossEntropy loss was approx 0.6.
        # --------------------------------------------------------------------------------
        activation = "sigmoid" if self.num_labels == 1 else "softmax"
        output = tf.keras.layers.Dense(
            units=self.num_labels,
            kernel_initializer='glorot_uniform',
            # https://huggingface.co/transformers/v4.3.3/main_classes/optimizer_schedules.html#adamweightdecay-tensorflow
            # kernel_regularizer=tf.keras.regularizers.l2(l2=self.l2),
            # bias_regularizer=tf.keras.regularizers.l2(l2=self.l2),
            # activity_regularizer=tf.keras.regularizers.l2(l2=self.l2/10.0),
            activation=activation,
            name=activation
        )(output)        

        # --------------------------------------------------------------------------------
        # Loss layer
        # --------------------------------------------------------------------------------
        if self.num_labels == 1:    # Binary classification
            loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=False)
        else:                       # Categorical classification
            loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

        # --------------------------------------------------------------------------------
        # Build model
        # --------------------------------------------------------------------------------
        # TODO: Replace TIMESTAMP with instance variable
        name = f"{TIMESTAMP}_{self.model_name.upper()}"
        self._model = tf.keras.models.Model(inputs=[input_ids, attention_mask], outputs=output, name=name)
        self.model.compile(
            # https://huggingface.co/transformers/v4.3.3/main_classes/optimizer_schedules.html#adamweightdecay-tensorflow
            # optimizer=tf.keras.optimizers.Adam(learning_rate=self.learning_rate),
            optimizer=transformers.AdamWeightDecay(learning_rate=self.learning_rate),
            loss=loss_fn,
            metrics=self._metrics
        )
        
        # --------------------------------------------------------------------------------
        # Load model parameters if the saved weight file exists
        # --------------------------------------------------------------------------------
        path_to_h5 = self.model_directory + os.path.sep + "model.h5"
        if os.path.isfile(path_to_h5) and os.access(path_to_h5, os.R_OK):
            print(f"\nloading the saved model parameters from {path_to_h5}...\n")
            self.model.load_weights(path_to_h5)

    def _build_huggingface_model_monitor_metrics(self, validation_data, validation_label):
        """
        Callback Monitor Configurations: (metric_name, monitor_metric, monitor_mode)
        """
        # --------------------------------------------------------------------------------
        # Model Metrics
        # --------------------------------------------------------------------------------
        self._metrics=[
            "accuracy",
        ]
        if self._metric_name in ["auc", "recall", "precision"]:
            assert self.num_labels == 1, "AUC/Recall/Precision apparently work only with binary classification"
            self._monitor_metric = f"val_{self._metric_name}"
            self._monitor_mode = 'max'
            self._metrics=[
                "accuracy",
                tf.keras.metrics.AUC(
                    name="auc",
                    curve="PR",   # 'ROC' or 'PR'
                    multi_label=True if self.num_labels > 1 else False,
                    num_labels=self.num_labels,
                    from_logits=False
                ), 
                tf.keras.metrics.Recall(
                    name="recall",
                    class_id=1 if self.num_labels > 1 else None
                ), 
                tf.keras.metrics.Precision(
                    name="precision",
                    class_id=1 if self.num_labels > 1 else None
                )
            ]
            self._callbacks = self._build_huggingface_model_auc_callbacks(
                validation_data, 
                validation_label
            )
            
        elif self._metric_name == "loss":
            self._monitor_metric = f"val_{self._metric_name}"
            self._monitor_mode='min'
            self._callbacks = self._build_huggingface_model_callbacks()
            
        elif self._metric_name == "accuracy":
            self._monitor_metric = f"val_{self._metric_name}"
            self._monitor_mode='max'
            self._callbacks = self._build_huggingface_model_callbacks()
            
        elif self._metric_name == "sca":   # Sparse Categorical Accuracy
            self._monitor_metric = f"val_{self._metric_name}"
            self._monitor_mode='max'
            self._metrics=[
                "accuracy",
                tf.keras.metrics.SparseCategoricalAccuracy(
                    name='sca'
                )
            ]
            self._callbacks = self._build_huggingface_model_callbacks()

        else:
            raise RuntimeError(f"Unknown monitor metric: {self._metric_name}")
            
    def _build_custom_model_monitor_metrics(self, validation_data, validation_label):
        """
        Callback Monitor Configurations: (metric_name, monitor_metric, monitor_mode)
        """
        # --------------------------------------------------------------------------------
        # Model Metrics
        # --------------------------------------------------------------------------------
        self._metrics=[
            "accuracy",
        ]
        if self._metric_name in ["auc", "recall", "precision"]:
            assert self.num_labels == 1, "AUC/Recall/Precision apparently work only with binary classification"
            self._monitor_metric = f"val_{self._metric_name}"
            self._monitor_mode = 'max'
            self._metrics=[
                "accuracy",
                tf.keras.metrics.AUC(
                    name="auc",
                    multi_label=True if self.num_labels > 1 else False,
                    num_labels=self.num_labels,
                    from_logits=False
                ), 
                tf.keras.metrics.Recall(
                    name="recall",
                    class_id=1 if self.num_labels > 1 else None
                ), 
                tf.keras.metrics.Precision(
                    name="precision",
                    class_id=1 if self.num_labels > 1 else None
                )
            ]
            self._callbacks = self._build_custom_model_auc_callbacks(
                validation_data, 
                validation_label
            )
            
        elif self._metric_name == "loss":
            self._monitor_metric = f"val_{self._metric_name}"
            self._monitor_mode='min'
            self._callbacks = self._build_custom_model_acc_callbacks()
            
        elif self._metric_name == "accuracy":
            self._monitor_metric = f"val_{self._metric_name}"
            self._monitor_mode='max'
            self._callbacks = self._build_custom_model_acc_callbacks()
            
        elif self._metric_name == "sca":   # Sparse Categorical Accuracy
            self._monitor_metric = f"val_{self._metric_name}"
            self._monitor_mode='max'
            self._metrics=[
                "accuracy",
                tf.keras.metrics.SparseCategoricalAccuracy(
                    name='sca'
                )
            ]
            self._callbacks = self._build_custom_model_acc_callbacks()

        else:
            raise RuntimeError(f"Unknown monitor metric: {self._metric_name}")

    def _build_model(self, validation_data, validation_label):
        if self.USE_CUSTOM_MODEL:
            self._build_custom_model_monitor_metrics(validation_data, validation_label)

            self._build_custom_model(validation_data, validation_label)
            self._validate_model = self._validate_custom_model
            self._train_fn = self._train_custom_model
            self._save_fn = self._save_custom_model
            self._load_fn = self._load_custom_model
            
            # --------------------------------------------------------------------------------
            # Prediction function to return probabilities for the target label.
            # sigmoid output when num_labels == 1
            # Probability for label==1 from softmax when num_labels > 1
            # --------------------------------------------------------------------------------
            if self.num_labels == 1:
                self._predict_fn = self._predict_custom_model_binary
            elif self.num_labels > 1:
                self._predict_fn = self._predict_custom_model_categorical
            else:
                assert False, "Invalid num_labels"
            
        else:
            self._build_huggingface_model_monitor_metrics(validation_data, validation_label)

            self._build_huggingface_model()
            self._validate_model = self._validate_huggingface_model
            self._train_fn = self._train_huggingface_model
            self._save_fn = self._save_huggingface_model
            self._load_fn = self._load_huggingface_model
            
            if self.num_labels == 1:
                self._predict_fn = self._predict_huggingface_model_binary
            elif self.num_labels > 1:
                self._predict_fn = self._predict_huggingface_model_categorical
            else:
                assert False, "Invalid num_labels"

    def _validate_huggingface_model(self):
        """Validate the huggingface model
        """
        # The number of classes in the output must match the num_labels
        test_sentences = [
            "i am a cat who has no name.",
            "to be or not to be."
        ]
        test_tokens = self.tokenize(test_sentences, padding='max_length')
        # len(test_tokens) counts the tokenizer output keys, not the sentences
        TEST_BATCH_SIZE = len(test_sentences)
        
        # Huggingface model output is based on TFSequenceClassifierOutput
        # which has 'loss' and 'logits' keys.
        test_model_output = self.model(test_tokens)['logits']
        assert test_model_output.shape == (TEST_BATCH_SIZE, self.num_labels), \
            "test_model_output type[%s] data [%s]" % \
            (type(test_model_output), test_model_output)

        # predict returns probabilities for the target class/label only 
        # in a np array of shape (batch_size, 1). The probability value
        # is between 0 and 1.
        test_predictions = self.predict(test_sentences)
        assert test_predictions.shape == (TEST_BATCH_SIZE, 1), \
            "test_predictions shape[%s] data [%s]" % \
            (test_predictions.shape, test_predictions)
        assert np.all(0 < test_predictions) and np.all(test_predictions < 1)
    
    def _validate_custom_model(self):
        """Validate the custom model
        """
        test_sentences = [
            "i am a cat who has no name.",
            "to be or not to be."
        ]
        test_tokens = self.tokenize(test_sentences, padding='max_length')
        # len(test_tokens) counts the tokenizer output keys, not the sentences
        TEST_BATCH_SIZE = len(test_sentences)

        # Model generates predictions for all the classes/labels,
        # including the binary classification where num_labels == 1
        test_model_output = self.model(test_tokens)
        assert test_model_output.shape == (TEST_BATCH_SIZE, self.num_labels), \
            "test_model_output type[%s] data [%s]" % \
            (type(test_model_output), test_model_output)

        # predict returns probabilities for the target class/label only 
        # in a np array of shape (batch_size, 1). The probability value
        # is between 0 and 1.
        test_predictions = self.predict(test_sentences)
        assert test_predictions.shape == (TEST_BATCH_SIZE, 1), \
            "test_predictions shape[%s] data [%s]" % \
            (test_predictions.shape, test_predictions)
        assert np.all(0 < test_predictions) and np.all(test_predictions < 1)

    def _build_dataset(self, training_data, training_label, validation_data, validation_label):
        # TODO: 
        #    Do not generate data here but provide a utility to generate X, V
        if self.num_labels == 1:
            assert np.all(np.isin(training_label, [0,1]))
            assert np.all(np.isin(validation_label, [0,1]))
        else:
            assert np.all(np.isin(training_label, np.arange(self.num_labels)))
            assert np.all(np.isin(validation_label, np.arange(self.num_labels)))

        self._X = tf.data.Dataset.from_tensor_slices((
            dict(self.tokenize(training_data)),
            training_label
        ))
        self._V = tf.data.Dataset.from_tensor_slices((
            dict(self.tokenize(validation_data)),
            validation_label
        ))
    
    def __init__(
            self,
            category,
            training_data,
            training_label,
            validation_data,
            validation_label,
            model_base_class=TFDistilBertModel,
            model_class=TFDistilBertForSequenceClassification,
            tokenizer_class=DistilBertTokenizerFast,
            tokenizer_lower=True,
            model_name='distilbert-base-uncased',
            model_base_name='distilbert',
            num_labels=2,
            max_sequence_length=256,
            freeze_pretrained_base_model=False,
            batch_size=32,
            learning_rate=2e-5,
            l2=1e-4,
            metric_name="accuracy",
            early_stop_patience=5,
            reduce_lr_patience=1,
            reduce_lr_factor=0.2,
            num_epochs=20,
            output_directory="./output",
            output_format='h5',
    ):
        """
        NOTE:
            https://arxiv.org/abs/2006.04884 indicated that over-fitting is not an issue and 
            recommends longer iterations with small learning rate and early stop on val_accuracy
            - learning_rate=2e-5
            - monitor_metric='val_accuracy'
            - monitor_mode='max'

        Args:
            category: 
            training_data: 
            training_label:
            validation_data:
            validation_label:
            model_base_class: Base class of the pre-trained model
            model_class: Huggingface pre-trained model class
            tokenizer_class: Huggingface tokenizer class
            tokenizer_lower: flag to lower-case the text when tokenizing
            model_name: Huggingface model name
            model_base_name: Pre-trained base model name
            num_labels: Number of labels
            max_sequence_length: maximum number of tokens for the tokenizer
            freeze_pretrained_base_model: flag to freeze pretrained model base layer
            batch_size:
            learning_rate:
            l2: L2 regularizer decay rate
            metric_name: metric for the model
            early_stop_patience:
            reduce_lr_patience:
            reduce_lr_factor:
            num_epochs:
            output_directory: Directory to save the outputs
            output_format: Model save format ['h5' | 'tf']
        """
        self._category = category
        self._trainer = None

        # --------------------------------------------------------------------------------
        # Model to use
        # --------------------------------------------------------------------------------
        self._model_name = model_name
        self._model_base_name = model_base_name
        self._model_class = model_class
        self._model_base_class = model_base_class
        self._tokenizer = tokenizer_class.from_pretrained(
            model_name, 
            do_lower_case=self.TOKENIZER_LOWER_CASE
        )
        
        # --------------------------------------------------------------------------------
        # Model training configurations
        # --------------------------------------------------------------------------------
        assert 128 <= max_sequence_length <= 512, "Current max sequence length for BERT is 512"
        self._max_sequence_length = max_sequence_length

        assert num_labels > 0
        self._num_labels = num_labels

        assert isinstance(freeze_pretrained_base_model, bool)
        self._freeze_pretrained_base_model = freeze_pretrained_base_model

        assert (0.0 < learning_rate) and (0 <= l2 < 1.0)
        self._learning_rate = learning_rate
        self._l2 = l2
        self._model = None

        assert num_epochs > 0
        self._num_epochs = num_epochs

        assert batch_size > 0
        self._batch_size = batch_size

        assert early_stop_patience > 0
        self._metric_name = metric_name
        self._early_stop_patience = early_stop_patience
        self._reduce_lr_patience = reduce_lr_patience
        self._reduce_lr_factor = reduce_lr_factor

        # model.fit() result holder
        self._history = None  

        # --------------------------------------------------------------------------------
        # Model output
        # --------------------------------------------------------------------------------
        self._output_directory = output_directory
        self._output_format = output_format
        self._build_output_directories(output_directory)
        
        # --------------------------------------------------------------------------------
        # Data
        # --------------------------------------------------------------------------------
        self._build_dataset(training_data, training_label, validation_data, validation_label)
        del training_data, training_label
        
        # --------------------------------------------------------------------------------
        # Model
        # --------------------------------------------------------------------------------
        self._build_model(validation_data, validation_label)
        del validation_data, validation_label
        
        # --------------------------------------------------------------------------------
        # Validations
        # --------------------------------------------------------------------------------
        self._validate_model()
        self.model.summary()
            
    # --------------------------------------------------------------------------------
    # Instance methods
    # --------------------------------------------------------------------------------
    def tokenize(self, sentences, truncation=True, padding='longest'):
        """Tokenize using the Huggingface tokenizer
        Args: 
            sentences: String or list of string to tokenize
            padding: Padding method ['do_not_pad'|'longest'|'max_length']
        """
        return self.tokenizer(
            sentences,
            truncation=truncation,
            padding=padding,
            max_length=self.max_sequence_length,
            return_tensors="tf"
        )

    def decode(self, tokens):
        sentences = []
        if isinstance(tokens, list) or tf.is_tensor(tokens):
            for sequence in tokens:
                sentences.append(self.tokenizer.decode(sequence))
        elif 'input_ids' in tokens:
            for sequence in tokens['input_ids']:
                sentences.append(self.tokenizer.decode(sequence))
        return sentences

    def _hf_train(self):
        """Train the model using HuggingFace Trainer"""
        self._training_args = TFTrainingArguments(
            output_dir='./results',             # output directory
            num_train_epochs=3,                 # total number of training epochs
            per_device_train_batch_size=self.batch_size,     # batch size per device during training
            per_device_eval_batch_size=self.batch_size,      # batch size for evaluation
            warmup_steps=500,                   # number of warmup steps for learning rate scheduler
            weight_decay=0.01,                  # strength of weight decay
            logging_dir='./logs',               # directory for storing logs
            logging_steps=10,
        )

        # with self._training_args.strategy.scope():
        #     self._model = TFDistilBertForSequenceClassification.from_pretrained(self.model_name)

        self._trainer = TFTrainer(
            model=self.model,
            args=self._training_args,   # training arguments
            train_dataset=self.X,       # training dataset
            eval_dataset=self.V         # evaluation dataset
        )
        self.trainer.train()

    def _train_huggingface_model(self):
        """Train the model using Keras
        """
        # --------------------------------------------------------------------------------
        # Train the model
        # --------------------------------------------------------------------------------
        self._history = self.model.fit(
            self.X.shuffle(1000).batch(self.batch_size).prefetch(tf.data.experimental.AUTOTUNE),
            epochs=self.num_epochs,
            # Batching is done on the Dataset; Keras expects batch_size to be left unset for Dataset inputs
            validation_data=self.V.shuffle(1000).batch(self.batch_size).prefetch(tf.data.experimental.AUTOTUNE),
            callbacks=self._callbacks
        )

    def _train_custom_model(self):
        """Train the model using Keras
        """
        # --------------------------------------------------------------------------------
        # Train the model
        # --------------------------------------------------------------------------------
        self._history = self.model.fit(
            self.X.shuffle(1000).batch(self.batch_size).prefetch(tf.data.experimental.AUTOTUNE),
            epochs=self.num_epochs,
            # Batching is done on the Dataset; Keras expects batch_size to be left unset for Dataset inputs
            validation_data=self.V.shuffle(1000).batch(self.batch_size).prefetch(tf.data.experimental.AUTOTUNE),
            callbacks=self._callbacks
        )        
        
    def train(self):
        """Run the model training"""
        self._train_fn()

    def evaluate(self, data, label):
        """Evaluate the model on the given data and label.
        https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate
        The attribute model.metrics_names gives labels for the scalar metrics
        to be returned from model.evaluate().

        Args:
            data: data to run the prediction
            label: label for the data
        Returns: 
            scalar loss if the model has a single output and no metrics, OR 
            list of scalars (if the model has multiple outputs and/or metrics). 
        """
        if self.num_labels == 1:
            assert np.all(np.isin(label, [0,1]))
        else:
            assert np.all(np.isin(label, np.arange(self.num_labels)))

        test_dataset = tf.data.Dataset.from_tensor_slices((
            dict(self.tokenize(data)),
            label
        ))
        evaluation = self.model.evaluate(
            test_dataset.shuffle(1000).batch(self.batch_size).prefetch(tf.data.experimental.AUTOTUNE)
        )
        return evaluation

    def _predict_custom_model_binary(self, data):
        """Calculate the binary classification predictions
        Args:
            data: sentences to tokenize of type List[str]
        Returns: Probabilities as numpy array of shape (batch_size, 1)
        """
        tokens = dict(self.tokenize(data, padding='max_length'))
        probabilities = self.model.predict(tokens)
        assert isinstance(probabilities, np.ndarray)
        return probabilities

    def _predict_custom_model_categorical(self, data):
        """Calculate the categorical classification predictions
        Args:
            data: sentences to tokenize of type List[str]
        Returns: Probabilities for label 1 as numpy array of shape (batch_size, 1)
        """
        tokens = dict(self.tokenize(data, padding='max_length'))
        probabilities = self.model.predict(tokens)
        assert isinstance(probabilities, np.ndarray) and probabilities.ndim == 2
        return probabilities[:, 1:2]

    def _predict_huggingface_model_binary(self, data):
        """Calculate the binary classification predictions
        Args:
            data: sentences to tokenize of type List[str]
        Returns: Probabilities as numpy array of shape (batch_size, 1)
        """
        tokens = dict(self.tokenize(data))
        logits = self.model.predict(tokens)["logits"]
        # The Huggingface model returns raw logits; apply sigmoid to get probabilities
        return tf.nn.sigmoid(logits).numpy()
    
    def _predict_huggingface_model_categorical(self, data):
        """Calculate the categorical classification predictions
        Args:
            data: sentences to tokenize of type List[str]
        Returns: Probabilities for label 1 as numpy array of shape (batch_size, 1)
        """
        tokens = dict(self.tokenize(data))
        logits = self.model.predict(tokens)["logits"]
        # [:, 1:2] -> TensorFlow Tensor indices to select column 1 for all rows
        return tf.nn.softmax(logits)[:, 1:2].numpy()

    def predict(self, sentences):
        """Generate prediction (probabilities) for the target label
        Args:
            sentences: text sentences of type str or List[str]
        Return:
            normalized probabilities in numpy array via sigmoid or softmax 
        """
        result = self._predict_fn(sentences)
        assert isinstance(result, np.ndarray) and result.shape[-1] == 1, \
            f"Expected np.ndarray but {type(result)} {result}"
        return result
                
    def _save_huggingface_model(self, path_to_dir):
        """Save Keras model in huggingface format
        """
        self.model.save_pretrained(path_to_dir)

    def _save_custom_model(self, path_to_dir):
        """Save Keras model in "tf" format for explicit save.
        Use h5 for the model auto-saved during the training to avoid overwriting 
        the best model saved during the training.
        """
        self.model.save_weights(
            self.output_path, overwrite=True, save_format=self.output_format
        )

    def save(self, path_to_dir):
        """Save the model. The HuggingFace model is saved as:
        - config.json 
        - tf_model.h5  

        TODO:
            Save the best model metrics when saving the model as Keras config file.
            Reload the best metrics of the model when loading the model itself.

            If the saved best model is re-loaded, the best metric values that the
            model achieved need to be re-loaded as well. Otherwise the first epoch
            result, even if the metrics are worse than the best metrics achieved,
            will become the best results and the best model will be overwritten with
            the inferior model.

        Args:
            path_to_dir: directory path to save the model artefacts
        """
        # path_to_dir is mandatory to avoid overwriting the best model saved
        # during the training
        Path(path_to_dir).mkdir(parents=True, exist_ok=True)
        self._save_fn(path_to_dir)

    def _load_huggingface_model(self, path_to_dir):
        self._model = self.model_class.from_pretrained(path_to_dir)

    def _load_custom_model(self, path_to_dir):
        path_to_file = path_to_dir + os.path.sep + 'model.h5'
        self.model.load_weights(path_to_file)
        
    def load(self, path_to_dir):
        """Load the model from the saved artefacts.
        TODO:
            Reload the best metrics of the model when loading the model itself.
            If the saved best model is re-loaded, the best metric values that the
            model achieved need to be re-loaded as well. Otherwise the first epoch
            result, even if the metrics are worse than the best metrics achieved,
            will become the best results and the best model will be overwritten with
            the inferior model.

        Args:
            path_to_dir: Directory path from where to load config.json and .h5.
        """
        if os.path.isdir(path_to_dir) and os.access(path_to_dir, os.R_OK):
            self._load_fn(path_to_dir)
        else:
            raise RuntimeError(f"{path_to_dir} does not exist or is not readable")

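As a rough illustration of what the `_predict_*` functions above are meant to return, the sketch below turns raw logits into the probability of the target label in a `(batch_size, 1)` layout: sigmoid when there is a single output unit, and column 1 of the softmax otherwise. It uses plain Python lists instead of TensorFlow tensors, and the helper names (`sigmoid`, `softmax`, `target_probabilities`) are hypothetical, not part of the class.

```python
import math


def sigmoid(logit):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_probabilities(batch_logits):
    """Return P(label == 1) per row as a list of single-element rows."""
    result = []
    for row in batch_logits:
        if len(row) == 1:
            # Binary head (num_labels == 1): one logit per sentence
            result.append([sigmoid(row[0])])
        else:
            # Categorical head (num_labels > 1): take column 1 of the softmax
            result.append([softmax(row)[1]])
    return result


binary = target_probabilities([[0.0], [2.0]])
categorical = target_probabilities([[0.0, 0.0], [1.0, 3.0]])
print(binary)
print(categorical)
```

With two classes, the softmax probability of label 1 equals the sigmoid of the logit difference, which is why both heads can be compared on the same 0 to 1 scale.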
### Utilities
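The `balance` utility in this section oversamples by drawing row indices with replacement. The sketch below shows only the size arithmetic on a toy label list, assuming the same two parameters; it uses the standard library instead of pandas and NumPy, and `oversample_indices` and its `seed` argument are hypothetical names for illustration.

```python
import random
from math import ceil


def oversample_indices(labels, positive_negative_ratio=1.0,
                       negative_replication_factor=1.0, seed=0):
    """Draw row indices so positives are ratio times as many as negatives."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]

    # Target volumes: negatives first, then positives relative to them
    negative_size = ceil(len(negatives) * negative_replication_factor)
    positive_size = ceil(negative_size * positive_negative_ratio)

    # Sample with replacement, so scarce positives are replicated
    chosen = (rng.choices(positives, k=positive_size)
              + rng.choices(negatives, k=negative_size))
    rng.shuffle(chosen)
    return chosen


labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]   # 2 positives, 8 negatives
indices = oversample_indices(labels)
num_pos = sum(labels[i] == 1 for i in indices)
num_neg = len(indices) - num_pos
print(num_pos, num_neg)
```

Because the positives are replicated rather than augmented, the model sees the same positive sentences many times, so the validation split should be taken before balancing.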

In [45]:
from math import ceil


def balance(
    df, 
    data_col_name,
    label_col_name,
    retain_columns,
    positive_negative_ratio=1.0,
    negative_replication_factor=1.0
):
    """Balance the data volumes of positives (label==1) and negatives (label==0).
    Negatives have more data than positives, which skews the training data.
    Replicate positives (sampling with replacement) so that they have
    positive_negative_ratio times as much data as the negatives.

    This is a naive approach. Ideally, use proper data augmentation.

    Args:
        df: Pandas dataframe 
        data_col_name: Column name for the data
        label_col_name: Column name for the label
        retain_columns: Columns to retain in the dataframe to return
        positive_negative_ratio: ratio of the positive volume to the negative volume
        negative_replication_factor: adjust the negative volume to negative_replication_factor * negative_size
    Returns: 
        Pandas dataframe with the retain_columns.
    """
    assert 0.0 < positive_negative_ratio <= 10.0
    assert 0.0 < negative_replication_factor <= 10.0

    positive_indices = df.index[df[label_col_name]==1].tolist()
    negative_indices = df.index[df[label_col_name]==0].tolist()
    assert not bool(set(positive_indices) & set(negative_indices))

    # Adjust the volume of negatives
    negative_size = ceil(len(negative_indices) * negative_replication_factor)

    # Positives to have positive_negative_ratio times more than negatives
    positive_size = ceil(negative_size * positive_negative_ratio)

    # Random shuffle and select positives
    target_positive_indices = np.random.choice(
        a=positive_indices,
        size=positive_size,
        replace=True
    ).tolist()
    # Random shuffle and select negatives
    target_negative_indices = np.random.choice(
        a=negative_indices, 
        size=negative_size,
        replace=True
    ).tolist()
    assert len(target_positive_indices) >= len(target_negative_indices) * positive_negative_ratio
    
    # Further shuffle the indices
    indices = np.random.choice(
        a=target_positive_indices + target_negative_indices,
        size=negative_size+positive_size,
        replace=False
    )

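    # Extract the retained columns. The indices are labels taken from df.index,
    # so select the rows by label (loc) rather than by position (iloc).
    data = df.loc[indices][
        df.columns[df.columns.isin(retain_columns)]
    ]
    return data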


def generate_runner(
    train,
    category,
    max_sequence_length,
    positive_negative_ratio,
    negative_replication_factor,
    freeze_pretrained_base_model,
    num_labels,
    batch_size,
    num_epochs,
    learning_rate,
    l2,
    metric_name,
    early_stop_patience,
    reduce_lr_patience,
    output_directory,
):
    """Wrapper to create the Runner instance for the given category.

    Args:
        train: Pandas dataframe containing the entire training data
        category: unhealthy comment category, e.g. 'toxic'
        max_sequence_length: max token length to accept
        positive_negative_ratio: positive volume as a multiple of the negative volume after balancing
        negative_replication_factor: scale the negative volume to negative_replication_factor * (number of negatives)
        freeze_pretrained_base_model: flag to freeze the base model
        num_labels: number of classes to classify
        batch_size: training batch size
        num_epochs: number of passes over the training data
        learning_rate: initial learning rate
        l2: regularizer weight decay
        metric_name: metric for the callbacks to monitor
        early_stop_patience: epochs without improvement before training stops
        reduce_lr_patience: epochs without improvement before the learning rate is reduced
        output_directory: directory for the run outputs
    Returns:
        Runner instance
    """
    print("\n--------------------------------------------------------------------------------")
    print(f"Build runner for [{category}]")
    print("--------------------------------------------------------------------------------")

    balanced = balance(
        df=train, 
        data_col_name='comment_text', 
        label_col_name=category,
        retain_columns=['id', 'comment_text', category],
        positive_negative_ratio=positive_negative_ratio,
        negative_replication_factor=negative_replication_factor
    )
    data = balanced['comment_text'].tolist()
    label = balanced[category].tolist()
    del balanced

    # --------------------------------------------------------------------------------
    # Split data into training and validation
    # --------------------------------------------------------------------------------
    train_data, validation_data, train_label, validation_label = train_test_split(
        data,
        label,
        test_size=.2,
        shuffle=True
    )
    del data, label

    # --------------------------------------------------------------------------------
    # Instantiate the model trainer
    # --------------------------------------------------------------------------------
    runner = Runner(
        category=category,
        training_data=train_data,
        training_label=train_label,
        validation_data=validation_data,
        validation_label=validation_label,
        max_sequence_length=max_sequence_length,
        freeze_pretrained_base_model=freeze_pretrained_base_model,
        num_labels=num_labels,
        batch_size=batch_size,
        num_epochs=num_epochs,
        learning_rate=learning_rate,
        l2=l2,
        metric_name=metric_name,
        early_stop_patience=early_stop_patience,
        reduce_lr_patience=reduce_lr_patience,
        output_directory=output_directory
    )
    return runner


def generate_category_runner(category):
    def f(train):
        return generate_runner(
            category=category,
            train=train,
            positive_negative_ratio=5.0,
            negative_replication_factor=0.2,
            freeze_pretrained_base_model=FREEZE_BASE_MODEL,
            num_labels=NUM_LABELS,
            batch_size=BATCH_SIZE,
            max_sequence_length=MAX_SEQUENCE_LENGTH,
            num_epochs=NUM_EPOCHS,
            learning_rate=LEARNING_RATE,
            l2=L2,
            metric_name=METRIC_NAME,
            early_stop_patience=EARLY_STOP_PATIENCE,
            reduce_lr_patience=REDUCE_LR_PATIENCE,
            output_directory=RESULT_DIRECTORY
        )
    return f


def evaluate(runner, test, category):
    """
    Evaluate the model of the runner on the test data.
    Args:
        runner: Runner instance
        test: Pandas dataframe holding the entire test data
        category: label column in test to evaluate against, e.g. 'toxic'
    """
    print("\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
    print(f"Model evaluation on [{runner.category}]")
    print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
    test_data = test['comment_text'].tolist()
    test_label = test[category].tolist()
    evaluation = runner.evaluate(test_data, test_label)

    print(f"Evaluation: {runner.model_metric_names}:{evaluation}")
    del test_data, test_label
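
The sizing arithmetic in `balance` can be checked in isolation. The sketch below is a standard-library stand-in, not the notebook's function: `oversample_indices` is a hypothetical helper that follows the same steps (scale the negatives, derive the positive count, sample with replacement, shuffle) on a plain list of labels. The 50/1000 class split is made up for illustration; the ratio 5.0 and factor 0.2 are the values that `generate_category_runner` passes.

```python
import random
from math import ceil


def oversample_indices(labels, positive_negative_ratio, negative_replication_factor, seed=0):
    """Hypothetical helper: return shuffled positions after naive rebalancing."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]

    # Scale the negatives first, then size the positives relative to them.
    negative_size = ceil(len(negatives) * negative_replication_factor)
    positive_size = ceil(negative_size * positive_negative_ratio)

    # Sample with replacement: positives are replicated, negatives are thinned.
    chosen = rng.choices(positives, k=positive_size) + rng.choices(negatives, k=negative_size)
    rng.shuffle(chosen)
    return chosen


labels = [1] * 50 + [0] * 1000          # made-up skewed data set
chosen = oversample_indices(labels, positive_negative_ratio=5.0, negative_replication_factor=0.2)
num_pos = sum(labels[i] for i in chosen)
num_neg = len(chosen) - num_pos
print(num_pos, num_neg)  # 1000 200
```

With these settings each of the 50 positives appears about 20 times on average, so the model sees many exact duplicates. That is the cost of the naive approach noted in the `balance` docstring.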

## Execution

In [None]:
if CLEANING_FOR_ANALYSIS and (not CLEANING_FOR_TRAINING):
    # Data has been cleaned, but training needs the non-cleaned data
    train, test = load_raw_data(TEST_MODE)
    print(f"Data records for training [{train['id'].count()}]")

# Drop the test rows whose labels are -1. Checking ['toxic'] >= 0 is sufficient.
test = test[test['toxic'] >= 0]
gc.collect()

train.head(3)
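
The filter in the cell above keeps only the test rows that carry usable labels. Here is a minimal sketch on a toy dataframe, assuming (as the comment in the cell implies) that an unlabeled row holds -1 in every category column, so checking `toxic` alone is enough. The frame, its values, and the `insult` column are made up for illustration.

```python
import pandas as pd

# Hypothetical toy test set; -1 marks a row without labels.
toy_test = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "comment_text": ["fine", "unlabeled", "rude", "unlabeled"],
    "toxic": [0, -1, 1, -1],
    "insult": [0, -1, 1, -1],
})

# Same boolean-mask filter as in the notebook cell.
labeled = toy_test[toy_test["toxic"] >= 0]

# Under the stated assumption, no -1 survives in any category column.
all_labeled = bool((labeled[["toxic", "insult"]] >= 0).all().all())
print(labeled["id"].tolist(), all_labeled)  # ['a', 'c'] True
```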

In [1]:
# HuggingFace
MAX_SEQUENCE_LENGTH = 256   # Max token length to accept. 512 takes 1 hour/epoch on Google Colab
NUM_BASE_MODEL_OUTPUT = 768
USE_CLASSIFICATION_LAYER = False

# Model training
FREEZE_BASE_MODEL = False
NUM_EPOCHS = 10
BATCH_SIZE = 32
NUM_LABELS = 2
LEARNING_RATE = 2e-5  # Must be small to avoid catastrophic forgetting
L2 = 1e-4
METRIC_NAME = 'loss'
REDUCE_LR_PATIENCE = 1
EARLY_STOP_PATIENCE = 3

print("""
TIMESTAMP = {}
CLEANING_FOR_TRAINING = {}
MAX_SEQUENCE_LENGTH = {}
FREEZE_BASE_MODEL = {}
NUM_LABELS = {}
NUM_EPOCHS = {}
BATCH_SIZE = {}
LEARNING_RATE = {}
L2 = {}
METRIC_NAME = {}
REDUCE_LR_PATIENCE = {}
EARLY_STOP_PATIENCE = {}
RESULT_DIRECTORY = {}
""".format(
    TIMESTAMP,
    CLEANING_FOR_TRAINING,
    MAX_SEQUENCE_LENGTH,
    FREEZE_BASE_MODEL,
    NUM_LABELS,
    NUM_EPOCHS,
    BATCH_SIZE,
    LEARNING_RATE,
    L2,
    METRIC_NAME,
    REDUCE_LR_PATIENCE,
    EARLY_STOP_PATIENCE,
    RESULT_DIRECTORY,
))


MAX_SEQUENCE_LENGTH = 256
FREEZE_BASE_MODEL = False
NUM_EPOCHS = 10
BATCH_SIZE = 32
LEARNING_RATE = 5e-05
REDUCE_LR_PATIENCE = 3
EARLY_STOP_PATIENCE = 5
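
The two patience settings interact: a reduce-LR patience shorter than the early-stop patience lets the learning rate drop before training is abandoned. The sketch below is a simplified pure-Python model of that logic, not the Keras callback implementation (it ignores `min_delta` and cooldown, among other things). `simulate_patience`, the loss sequence, and the reduction factor of 0.1 are assumptions for illustration. It uses the values assigned in the configuration cell (`REDUCE_LR_PATIENCE = 1`, `EARLY_STOP_PATIENCE = 3`, `LEARNING_RATE = 2e-5`).

```python
def simulate_patience(losses, early_stop_patience, reduce_lr_patience,
                      learning_rate, factor=0.1):
    """Hypothetical helper: walk per-epoch losses, return (epochs_run, final_lr)."""
    best = float("inf")
    stop_wait = 0  # epochs since the last improvement (early stopping)
    lr_wait = 0    # epochs since the last improvement or LR reduction
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            # Improvement: remember it and reset both counters.
            best = loss
            stop_wait = 0
            lr_wait = 0
            continue
        stop_wait += 1
        lr_wait += 1
        if lr_wait >= reduce_lr_patience:
            learning_rate *= factor
            lr_wait = 0
        if stop_wait >= early_stop_patience:
            return epoch, learning_rate
    return len(losses), learning_rate


# Made-up validation losses: improvement stalls after epoch 2.
losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.30]
epochs_run, final_lr = simulate_patience(
    losses, early_stop_patience=3, reduce_lr_patience=1, learning_rate=2e-5
)
print(epochs_run)  # 5
```

In this run the learning rate is cut three times (after epochs 3, 4 and 5) and training stops at epoch 5, so the improvement at epoch 6 is never reached. A larger early-stop patience gives the reduced learning rate more time to take effect.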

