# Custom Model Huggingface Framework

## Description

- This notebook explores how to define a custom model and train it with the Hugging Face `transformers` framework.
- Useful sources:
    - [Huggingface Models](https://huggingface.co/docs/transformers/en/models)
    - [Medium Blog - A Guide to Craft Your Own Custom Hugging Face Model](https://medium.com/@edandwe/a-guide-to-craft-your-own-custom-hugging-face-model-ba9cd555a646)
- Fundamentally, a few pieces must be defined before a custom model can be used:
    - The `AutoModel` class is a convenient way to load an architecture without knowing the exact model class name, which matters because many models are available. It automatically selects the correct model class based on the configuration file.
    - `AutoConfig`
    - The base workflow is as follows: 
      
       ```mermaid
        flowchart LR
            A[checkpoint or local folder] --> B[config file]
            A --> D[model file]
            B --> C[config class]
            C --> E[model config]
            C --> F[model class]
            E --> G[model]
            F --> G
            G --> H[pretrained model]
            D --> H
       ```

    - More detail in this video [Instantiate a Transformers model (PyTorch)](https://www.youtube.com/watch?v=AhChOFRegn4&t=101s)
    - The `PretrainedConfig` class contains all the information needed to build a model. Two rules apply:
        - 1. A custom configuration must subclass `PretrainedConfig`. This ensures the custom model has all the functionality of a Transformers model, such as `from_pretrained()`, `save_pretrained()`, and `push_to_hub()`.
        - 2. The custom configuration's `__init__` must accept any `kwargs` and pass them to the superclass `__init__`.
        - > It is useful to check the validity of some of the parameters. In the Hugging Face ResNet example this note comes from, a check ensures `block_type` and `stem_type` belong to one of the predefined values.
        - > Add model_type to the configuration class to enable AutoClass support.
    - `PreTrainedModel` class: inheriting from `PreTrainedModel` and initializing the superclass with the configuration extends Transformers' functionality, such as saving and loading, to the custom model.
        - > Add `config_class` to the model class to enable AutoClass support.
        - NOTE: 
            - A model can return any output format. Returning a dictionary (like `ResnetModelForImageClassification`) with losses when labels are available makes the custom model compatible with [Trainer](https://huggingface.co/docs/transformers/v4.52.1/en/main_classes/trainer#transformers.Trainer).
            - For other output formats, you’ll need your own training loop or a different library for training.
    - The `AutoClass` API is a shortcut for automatically loading the correct architecture for a given model. It is convenient to enable this for users loading your custom model.
        - Make sure you have the `model_type` attribute (must be different from existing model types) in the configuration class and `config_class` attribute in the model class. Use the [register()](https://huggingface.co/docs/transformers/v4.52.1/en/model_doc/auto#transformers.AutoConfig.register) method to add the custom configuration and model to the [AutoClass](https://huggingface.co/docs/transformers/en/models#model-classes) API.
        - > The first argument to `AutoConfig.register()` must match the `model_type` attribute in the custom configuration class, and the first argument to `AutoModel.register()` must match the `config_class` of the custom model class.

## 0. Import Library

In [1]:
from transformers import Trainer, TrainerCallback, TrainingArguments
import evaluate
import torch
import numpy as np

## 1. Custom Model Initialization

### Configuration

#### LSTM

In [2]:
from typing import Optional
from transformers.configuration_utils import PretrainedConfig
from transformers.utils import logging


logger = logging.get_logger(__name__)

class LSTMConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a [`LSTMModel`]. It is used to instantiate an
    LSTM model according to the specified arguments, defining the model architecture. Instantiating a configuration
    with the defaults will yield a similar configuration to that of a basic LSTM architecture.

    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
    documentation from [`PretrainedConfig`] for more information.

    Args:
        input_size (`int`, *optional*, defaults to 34):
            The dimension of the input features. For pose estimation, this would be the number of keypoints * 2 (x,y coordinates).
        hidden_size (`int`, *optional*, defaults to 128):
            The dimension of the hidden states.
        num_layers (`int`, *optional*, defaults to 1):
            Number of recurrent layers.
        num_labels (`int`, *optional*, defaults to 5):
            The number of labels for classification.
        dropout (`float`, *optional*, defaults to 0.0):
            The dropout probability for all fully connected layers in the model.
        bidirectional (`bool`, *optional*, defaults to `False`):
            If `True`, becomes a bidirectional LSTM.
        batch_first (`bool`, *optional*, defaults to `True`):
            If `True`, then the input and output tensors are provided as (batch, seq, feature).
        proj_size (`int`, *optional*, defaults to 0):
            If > 0, will use LSTM with projections of corresponding size.
        window_size (`int`, *optional*, defaults to 16):
            The size of the sliding window for sequential data.
        learning_rate (`float`, *optional*, defaults to 0.001):
            The learning rate for training.
        initializer_range (`float`, *optional*, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layer_norm_eps (`float`, *optional*, defaults to 1e-5):
            The epsilon used by the layer normalization layers.
        use_layer_norm (`bool`, *optional*, defaults to `False`):
            Whether to use layer normalization after the LSTM.
        use_projection (`bool`, *optional*, defaults to `True`):
            Whether to use a projection layer after LSTM.

    Example:

    ```python
    >>> # `LSTMConfig` and `LSTMModel` are defined in this notebook, not shipped with transformers

    >>> # Initializing a LSTM configuration
    >>> configuration = LSTMConfig()

    >>> # Initializing a model from the configuration
    >>> model = LSTMModel(configuration)

    >>> # Accessing the model configuration
    >>> configuration = model.config
    ```"""

    model_type = "lstm"

    def __init__(
        self,
        input_size=34,
        hidden_size=128,
        num_layers=1,
        num_labels=5,
        dropout=0.0,
        bidirectional=False,
        batch_first=True,
        proj_size=0,
        window_size=16,
        learning_rate=0.001,
        initializer_range=0.02,
        layer_norm_eps=1e-5,
        use_layer_norm=False,
        use_projection=True,
        **kwargs,
    ):
        super().__init__(**kwargs)

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_labels = num_labels
        self.dropout = dropout
        self.bidirectional = bidirectional
        self.batch_first = batch_first
        self.proj_size = proj_size
        self.window_size = window_size
        self.learning_rate = learning_rate
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.use_layer_norm = use_layer_norm
        self.use_projection = use_projection

### Modelling

#### LSTM

In [3]:
from typing import Optional, Tuple, Union

import torch
import torch.nn as nn
from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss

from transformers.modeling_outputs import (
    BaseModelOutput,
    SequenceClassifierOutput,
)
from transformers.modeling_utils import PreTrainedModel
from transformers.utils import add_start_docstrings, add_start_docstrings_to_model_forward, logging

logger = logging.get_logger(__name__)

_CONFIG_FOR_DOC = "LSTMConfig"

LSTM_PRETRAINED_MODEL_ARCHIVE_LIST = [
    # Add pretrained model identifiers here
]


class LSTMPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = LSTMConfig
    base_model_prefix = "lstm"
    supports_gradient_checkpointing = False
    _no_split_modules = ["LSTMLayer"]

    def _init_weights(self, module):
        """Initialize the weights"""
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.LSTM):
            for name, param in module.named_parameters():
                if 'weight' in name:
                    nn.init.xavier_uniform_(param)
                elif 'bias' in name:
                    nn.init.zeros_(param)
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)


class LSTMLayer(nn.Module):
    """LSTM layer with optional layer normalization and dropout."""

    def __init__(self, config):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=config.input_size if hasattr(config, 'input_size') else config.hidden_size,
            hidden_size=config.hidden_size,
            num_layers=config.num_layers,
            batch_first=config.batch_first,
            dropout=config.dropout if config.num_layers > 1 else 0,
            bidirectional=config.bidirectional,
            proj_size=config.proj_size if config.proj_size > 0 else 0,
        )
        
        self.use_layer_norm = config.use_layer_norm
        if self.use_layer_norm:
            norm_size = config.hidden_size * (2 if config.bidirectional else 1)
            self.layer_norm = nn.LayerNorm(norm_size, eps=config.layer_norm_eps)
        
        self.dropout = nn.Dropout(config.dropout)

    def forward(
        self,
        input_tensor,
        hidden_states=None,
        cell_states=None,
    ):
        lstm_output, (hidden_states, cell_states) = self.lstm(
            input_tensor, 
            (hidden_states, cell_states) if hidden_states is not None else None
        )
        
        if self.use_layer_norm:
            lstm_output = self.layer_norm(lstm_output)
        
        lstm_output = self.dropout(lstm_output)
        
        return lstm_output, (hidden_states, cell_states)


LSTM_START_DOCSTRING = r"""
    This model is a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. Use it
    as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and
    behavior.

    Parameters:
        config ([`LSTMConfig`]): Model configuration class with all the parameters of the model.
            Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""

LSTM_INPUTS_DOCSTRING = r"""
    Args:
        input_ids (`torch.FloatTensor` of shape `(batch_size, sequence_length, input_size)`):
            Input sequence tensor.
        attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*):
            Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

        output_hidden_states (`bool`, *optional*):
            Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
            more detail.
        return_dict (`bool`, *optional*):
            Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
"""


@add_start_docstrings(
    "The bare LSTM Model outputting raw hidden-states without any specific head on top.",
    LSTM_START_DOCSTRING,
)
class LSTMModel(LSTMPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.config = config

        self.lstm_layer = LSTMLayer(config)
        
        # Initialize weights and apply final processing
        self.post_init()

    @add_start_docstrings_to_model_forward(LSTM_INPUTS_DOCSTRING)
    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor], BaseModelOutput]:
        r"""
        Returns:

        Example:

        ```python
        >>> # `LSTMConfig` and `LSTMModel` are defined in this notebook, not shipped with transformers
        >>> import torch

        >>> # Initializing a LSTM configuration
        >>> configuration = LSTMConfig()

        >>> # Initializing a model from the configuration
        >>> model = LSTMModel(configuration)

        >>> # Random input tensor (batch_size=2, sequence_length=32, input_size=34)
        >>> inputs = torch.randn(2, 32, 34)
        >>> outputs = model(inputs)
        >>> last_hidden_states = outputs.last_hidden_state
        ```"""
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is None:
            raise ValueError("You have to specify input_ids")

        # Apply attention mask if provided
        if attention_mask is not None:
            # Expand attention mask
            attention_mask = attention_mask.unsqueeze(-1).expand_as(input_ids)
            input_ids = input_ids * attention_mask

        # Pass through LSTM
        sequence_output, (final_hidden, final_cell) = self.lstm_layer(input_ids)

        # Get the last hidden state
        if self.config.bidirectional:
            # Concatenate forward and backward hidden states
            last_hidden_state = torch.cat([final_hidden[-2], final_hidden[-1]], dim=-1)
        else:
            last_hidden_state = final_hidden[-1]

        if not return_dict:
            return (sequence_output, last_hidden_state)

        return BaseModelOutput(
            last_hidden_state=sequence_output,
            hidden_states=(last_hidden_state,) if output_hidden_states else None,
        )


@add_start_docstrings(
    """LSTM Model with a classification head on top (a linear layer on top of the hidden-states output).""",
    LSTM_START_DOCSTRING,
)
class LSTMForSequenceClassification(LSTMPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.config = config

        self.lstm = LSTMModel(config)
        
        # Classification head
        classifier_input_size = config.hidden_size * (2 if config.bidirectional else 1)
        
        if config.use_projection:
            self.pre_classifier = nn.Linear(classifier_input_size, config.hidden_size)
            self.classifier = nn.Linear(config.hidden_size, config.num_labels)
            self.dropout = nn.Dropout(config.dropout)
        else:
            self.classifier = nn.Linear(classifier_input_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()

    @add_start_docstrings_to_model_forward(LSTM_INPUTS_DOCSTRING)
    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
        r"""
        labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
            Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
            config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
            `config.num_labels > 1` a classification loss is computed (Cross-Entropy).

        Returns:

        Example:

        ```python
        >>> # `LSTMConfig` and `LSTMForSequenceClassification` are defined in this notebook, not shipped with transformers
        >>> import torch

        >>> # Number of labels for classification
        >>> num_labels = 6

        >>> # Initializing a LSTM configuration
        >>> configuration = LSTMConfig(num_labels=num_labels)

        >>> # Initializing a model from the configuration
        >>> model = LSTMForSequenceClassification(configuration)

        >>> # Random input tensor (batch_size=2, sequence_length=32, input_size=34)
        >>> inputs = torch.randn(2, 32, 34)
        >>> labels = torch.tensor([1, 3])

        >>> outputs = model(inputs, labels=labels)
        >>> loss = outputs.loss
        >>> logits = outputs.logits
        ```"""
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.lstm(
            input_ids,
            attention_mask=attention_mask,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        # Get the last hidden state from LSTM
        if return_dict:
            # For sequence classification, we typically use the last hidden state
            sequence_output = outputs.last_hidden_state
            # hidden_states is None unless output_hidden_states is enabled
            if outputs.hidden_states is not None:
                pooled_output = outputs.hidden_states[0]  # Final hidden state
            else:
                # If hidden states not returned, use the last timestep
                pooled_output = sequence_output[:, -1, :]
        else:
            sequence_output = outputs[0]
            pooled_output = outputs[1]

        # Apply projection if configured
        if self.config.use_projection:
            pooled_output = self.pre_classifier(pooled_output)
            pooled_output = torch.relu(pooled_output)
            pooled_output = self.dropout(pooled_output)

        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.config.problem_type is None:
                if self.num_labels == 1:
                    self.config.problem_type = "regression"
                elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
                    self.config.problem_type = "single_label_classification"
                else:
                    self.config.problem_type = "multi_label_classification"

            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                if self.num_labels == 1:
                    loss = loss_fct(logits.squeeze(), labels.squeeze())
                else:
                    loss = loss_fct(logits, labels)
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            elif self.config.problem_type == "multi_label_classification":
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(logits, labels)

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=None,  # LSTM doesn't have attention weights
        )


# Add to AutoModel registry
def register_lstm_auto_model():
    """Register the LSTM model with AutoModel."""
    try:
        from transformers import AutoConfig, AutoModel, AutoModelForSequenceClassification
        
        AutoConfig.register("lstm", LSTMConfig)
        AutoModel.register(LSTMConfig, LSTMModel)
        AutoModelForSequenceClassification.register(LSTMConfig, LSTMForSequenceClassification)
    except ImportError:
        logger.warning("Could not register LSTM model with AutoModel. Please ensure transformers is installed.")


# Register on module import
register_lstm_auto_model()

__all__ = [
    "LSTMConfig",
    "LSTMPreTrainedModel", 
    "LSTMModel",
    "LSTMForSequenceClassification",
]

## 2. Model Development

### 2.1. Create Dataset Object

In [56]:
import seedir as sd

sd.seedir(r'./data/video_hand_focused_data/hand_landmark_flatten', style='emoji', itemlimit=(None, 2))
print(np.load(r'./data/video_hand_focused_data/hand_landmark_flatten/HC/Record_20250402151124_w005_landmarks.npy').shape)

📁 hand_landmark_flatten/
├─📄 dataset_summary.json
├─📁 HC/
│ ├─📄 Record_20250402151124_w005_landmarks.json
│ └─📄 Record_20250402151124_w005_landmarks.npy
├─📁 HH/
│ ├─📄 Record_20250402151124_w007_landmarks.json
│ └─📄 Record_20250402151124_w007_landmarks.npy
├─📁 IH/
│ ├─📄 Record_20250402151124_w001_landmarks.json
│ └─📄 Record_20250402151124_w001_landmarks.npy
├─📁 OH/
│ ├─📄 Record_20250402151124_w000_landmarks.json
│ └─📄 Record_20250402151124_w000_landmarks.npy
└─📁 SF/
  ├─📄 Record_20250402151124_w003_landmarks.json
  └─📄 Record_20250402151124_w003_landmarks.npy
(16, 42)


In [None]:
from datasets import Dataset
import os
import glob

landmark_dir = r'data/video_hand_focused_data/hand_landmark_flatten'
val_ratio = 0.25

# Create lists of file paths and labels
filepaths, labels = [], []
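# Folder names to scan. The integer values are not used for training:
# class_encode_column() below assigns ids from the sorted label names.
class_mapping = {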
    'OH': 0,  # Open Hand
    'IH': 1,  # Intrinsic Plus
    'SF': 2,  # Straight Fist
    'HH': 3,  # Hook Hand
    'HC': 4   # Hand Close
}

# Scan through each class directory
for class_name in class_mapping.keys():
    class_dir = os.path.join(landmark_dir, class_name)
    if not os.path.exists(class_dir):
        print(f"Warning: Class directory not found: {class_dir}")
        continue
        
    # Get all .npy files (excluding metadata jsons)
    for npy_path in glob.glob(os.path.join(class_dir, '*_landmarks.npy')):
        filepaths.append(npy_path)
        labels.append(class_name)

unique_labels = sorted(set(labels))

# Create dataset with file paths
dataset = Dataset.from_dict({
    "landmark_path": filepaths,
    "label": labels
})

# Encode the label column
dataset = dataset.class_encode_column("label")

# Create a function to load landmarks when needed
def load_landmarks(example):
    # Load the landmark data
    landmarks = np.load(example["landmark_path"])
    # Flatten if needed (should already be (16, 42) if flatten_keypoints=True was used)
    if landmarks.shape == (16, 21, 2):
        landmarks = landmarks.reshape(16, -1)
    example["input_ids"] = landmarks.tolist()
    
    return example

# Split the dataset
split = dataset.train_test_split(
    test_size=val_ratio, 
    shuffle=True, 
    seed=42, 
    stratify_by_column="label"
)

train_ds = split["train"]
val_ds = split["test"]

print(f"\nTrain set: {len(train_ds)} samples")
print(f"Validation set: {len(val_ds)} samples")

# After loading the landmarks, remove the metadata field before training
train_ds = train_ds.map(load_landmarks, remove_columns=["landmark_path"])
val_ds = val_ds.map(load_landmarks, remove_columns=["landmark_path"])

# Rename 'label' to 'labels' for compatibility with Trainer
train_ds = train_ds.rename_column("label", "labels")
val_ds = val_ds.rename_column("label", "labels")

# Build label mappings
label2id = {lab: idx for idx, lab in enumerate(unique_labels)}
id2label = {idx: lab for lab, idx in label2id.items()}



Train set: 45 samples
Validation set: 15 samples
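
One detail worth checking here is which integer each class ends up with. The sketch below assumes `class_encode_column` assigns ids by sorting the distinct label strings, which is also how `label2id` is built in the cell above. Under that assumption the ids differ from the values written in `class_mapping`. Only the five folder names are taken from the notebook.

```python
# Values written by hand in class_mapping (not used by the encoding step).
class_mapping = {"OH": 0, "IH": 1, "SF": 2, "HH": 3, "HC": 4}

# Assumption: ids follow the alphabetical order of the class names,
# mirroring label2id = {lab: idx for idx, lab in enumerate(sorted(set(labels)))}.
label2id = {name: idx for idx, name in enumerate(sorted(class_mapping))}
id2label = {idx: name for name, idx in label2id.items()}

print(label2id)
# Classes whose encoded id differs from the hand-written value.
mismatched = sorted(name for name in class_mapping if class_mapping[name] != label2id[name])
print(mismatched)
```

When predictions are decoded back into class names, for example to label a confusion matrix, `id2label` is the mapping to use rather than `class_mapping`.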



### 2.2. Model Initialization

In [47]:
from torchinfo import summary

INPUT_SIZE = 42
HIDDEN_SIZE = 128
NUM_LAYERS = 2
NUM_LABELS = 5
DROPOUT = 0.2
BIDIRECTIONAL = True
BATCH_FIRST = True
WINDOW_SIZE = 16
USE_PROJECTION = True
USE_LAYER_NORM = True
OUTPUT_HIDDEN_STATES = True

config = LSTMConfig(
    input_size=INPUT_SIZE,
    hidden_size=HIDDEN_SIZE,
    num_layers=NUM_LAYERS,
    num_labels=NUM_LABELS,
    dropout=DROPOUT,
    bidirectional=BIDIRECTIONAL,
    batch_first=BATCH_FIRST,
    window_size=WINDOW_SIZE,
    use_projection=USE_PROJECTION,
    use_layer_norm=USE_LAYER_NORM,
    output_hidden_states=OUTPUT_HIDDEN_STATES
)

model = LSTMForSequenceClassification(config)

print("\nModel Architecture Summary:")

input_shape = (32, 16, 42)  # batch_size=32, window_size=16, features=42
model_summary = summary(
    model,
    input_size=input_shape,
    col_names=["input_size", "output_size", "num_params", "trainable"],
    col_width=20,
    row_settings=["var_names"],
    verbose=0
)

print(model_summary)


Model Architecture Summary:
Layer (type (var_name))                                           Input Shape          Output Shape         Param #              Trainable
LSTMForSequenceClassification (LSTMForSequenceClassification)     [32, 16, 42]         [32, 256]            --                   True
├─LSTMModel (lstm)                                                [32, 16, 42]         [32, 256]            --                   True
│    └─LSTMLayer (lstm_layer)                                     [32, 16, 42]         [32, 16, 256]        --                   True
│    │    └─LSTM (lstm)                                           [32, 16, 42]         [32, 16, 256]        571,392              True
│    │    └─LayerNorm (layer_norm)                                [32, 16, 256]        [32, 16, 256]        512                  True
│    │    └─Dropout (dropout)                                     [32, 16, 256]        [32, 16, 256]        --                   --
├─Linear (pre_classifier)     
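
The 571,392 parameters reported for the `LSTM` row can be verified by hand. The sketch assumes PyTorch's standard `nn.LSTM` layout without projections: each direction of each layer holds input-to-hidden weights of shape `(4 * hidden, layer_input)`, hidden-to-hidden weights of shape `(4 * hidden, hidden)`, and two bias vectors of length `4 * hidden`. `lstm_param_count` is a helper written for this check, not a library function.

```python
def lstm_param_count(input_size, hidden_size, num_layers, bidirectional):
    """Count nn.LSTM parameters, assuming proj_size=0 and bias=True."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first consume the concatenated outputs of all directions.
        layer_input = input_size if layer == 0 else hidden_size * directions
        # 4 gates x (input weights + recurrent weights + two biases)
        per_direction = 4 * hidden_size * (layer_input + hidden_size + 2)
        total += directions * per_direction
    return total


lstm_params = lstm_param_count(input_size=42, hidden_size=128, num_layers=2, bidirectional=True)
layer_norm_params = 2 * (128 * 2)  # weight and bias over the 256 bidirectional features
print(lstm_params, layer_norm_params)
```

With the settings from this cell, the first layer contributes 176,128 parameters and the second 395,264, which adds up to the figure in the summary. The `LayerNorm` row (512) is a weight and a bias over 256 features.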

### 2.3. Model Training

#### Setup `Trainer`, `Callback`, and `Evaluate`

In [None]:
from datetime import datetime

# Setup paths
experiment_date = datetime.now().strftime("%Y%m%d")
model_name = f"hand-pose-lstm-h{HIDDEN_SIZE}-l{NUM_LAYERS}"
output_dir = f"experiments/hand_pose/{experiment_date}/{model_name}"

os.makedirs(output_dir, exist_ok=True)

training_args = TrainingArguments(
        output_dir=output_dir,
        remove_unused_columns=False,
        eval_strategy="epoch",
        save_strategy="best",
        save_total_limit=3,
        learning_rate=5e-4,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        num_train_epochs=100,
        warmup_ratio=0.1,
        logging_steps=10,
        load_best_model_at_end=True,
        metric_for_best_model="accuracy",
        greater_is_better=True,
        push_to_hub=False,
)


def compute_metrics(eval_pred):
    """Compute evaluation metrics"""
    # Load metrics
    accuracy = evaluate.load("accuracy")
    precision = evaluate.load("precision")
    recall = evaluate.load("recall")
    f1 = evaluate.load("f1")
    confusion = evaluate.load("confusion_matrix")
    
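    logits, labels = eval_pred
    # With output_hidden_states=True the model also returns hidden states, so
    # predictions arrive as a tuple (logits, hidden_states); keep only the logits.
    if isinstance(logits, tuple):
        logits = logits[0]
    preds = np.argmax(logits, axis=-1)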
    
    # Compute metrics
    acc = accuracy.compute(predictions=preds, references=labels)["accuracy"]
    prec = precision.compute(predictions=preds, references=labels, average="macro")["precision"]
    rec = recall.compute(predictions=preds, references=labels, average="macro")["recall"]
    f1sc = f1.compute(predictions=preds, references=labels, average="macro")["f1"]
    
    # Confusion matrix
    cm = confusion.compute(predictions=preds, references=labels)["confusion_matrix"]
    cm_list = cm.tolist()
    
    return {
        "accuracy": acc,
        "precision": prec,
        "recall": rec,
        "f1": f1sc,
        "confusion_matrix": cm_list,
    }  

class CustomTrainer(Trainer):
    """Custom trainer that tracks batch-level metrics"""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.epoch_losses = []
        self.epoch_preds = []
        self.epoch_labels = []
    
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        """Override to store batch-level loss & predictions"""
        labels = inputs.get("labels", None)
        outputs = model(**inputs)
        loss = outputs.loss
        logits = outputs.logits
        
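        # Record training batches only; Trainer also calls compute_loss during evaluation
        if labels is not None and model.training: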
            # Store the loss
            self.epoch_losses.append(loss.item())
            
            # Store predictions + labels
            preds = logits.argmax(dim=-1).detach().cpu().numpy()
            labs = labels.detach().cpu().numpy()
            self.epoch_preds.extend(preds.tolist())
            self.epoch_labels.extend(labs.tolist())
        
        return (loss, outputs) if return_outputs else loss


class MetricsCallback(TrainerCallback):
    """Callback to track and store training metrics"""
    def __init__(self, trainer):
        super().__init__()
        self.trainer = trainer
        self.train_losses = []
        self.train_accuracies = []
        self.eval_losses = []
        self.eval_accuracies = []
        self.eval_confusion_matrices = []
        self.eval_f1_scores = []
        self.eval_precisions = []
        self.eval_recalls = []
    
    def on_epoch_end(self, args, state, control, **kwargs):
        """Compute and store training metrics at epoch end"""
        t = self.trainer
        if t.epoch_losses:
            avg_loss = float(np.mean(t.epoch_losses))
            acc = np.mean(np.array(t.epoch_preds) == np.array(t.epoch_labels))
            
            self.train_losses.append(avg_loss)
            self.train_accuracies.append(acc)
            
            # Clear for next epoch
            t.epoch_losses.clear()
            t.epoch_preds.clear()
            t.epoch_labels.clear()
    
    def on_evaluate(self, args, state, control, metrics, **kwargs):
        """Store validation metrics"""
        self.eval_losses.append(metrics.get("eval_loss", 0))
        self.eval_accuracies.append(metrics.get("eval_accuracy", 0))
        self.eval_confusion_matrices.append(metrics.get("eval_confusion_matrix", []))
        self.eval_f1_scores.append(metrics.get("eval_f1", 0))
        self.eval_precisions.append(metrics.get("eval_precision", 0))
        self.eval_recalls.append(metrics.get("eval_recall", 0))

trainer = CustomTrainer(
        model=model,
        args=training_args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        compute_metrics=compute_metrics,
    )

# Add metrics callback
metrics_cb = MetricsCallback(trainer)
trainer.add_callback(metrics_cb)

#### Training

In [49]:
train_result = trainer.train()

Epoch,Training Loss,Validation Loss


ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
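
A likely cause of this `ValueError`: with `output_hidden_states=True`, the model output contains `hidden_states` next to `logits`, so `Trainer` hands `compute_metrics` a tuple of arrays instead of a single logits array, and `np.argmax` cannot stack arrays of different shapes. The sketch below reproduces that situation with random NumPy arrays (shapes chosen to mirror 15 validation samples, 5 classes, and a 256-dimensional hidden state) and shows a small unwrapping helper. `unwrap_logits` is a hypothetical name, and treating the first tuple element as the logits is an assumption based on the field order of `SequenceClassifierOutput`.

```python
import numpy as np


def unwrap_logits(predictions):
    """Return the logits array from predictions that may be a (nested) tuple."""
    while isinstance(predictions, tuple):
        predictions = predictions[0]
    return np.asarray(predictions)


rng = np.random.default_rng(0)
logits = rng.normal(size=(15, 5))           # one row of class scores per sample
final_hidden = rng.normal(size=(15, 256))   # extra output returned alongside logits
predictions = (logits, (final_hidden,))     # what compute_metrics would receive

try:
    np.argmax(predictions, axis=-1)         # tries to build one array from ragged parts
except (ValueError, TypeError):
    print("argmax on the tuple fails")

preds = np.argmax(unwrap_logits(predictions), axis=-1)
print(preds.shape)
```

An alternative, not tried in this notebook, is to exclude the extra output from evaluation, for instance with `trainer.train(ignore_keys_for_eval=["hidden_states"])`.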