# LLAMA3 Fine-tuning for text classification using QLORA

This notebook is an implementation of:
https://arxiv.org/abs/2305.14314

This notebook is derived from:
https://github.com/adidror005/youtube-videos/blob/main/LLAMA_3_Fine_Tuning_for_Sequence_Classification_Actual_Video.ipynb

I adapted the problem and dataset from:
https://www.kaggle.com/code/jhoward/getting-started-with-nlp-for-absolute-beginners

### Requirements:
* A GPU with enough memory, 12GB VRAM or more.  Nvidia/Tesla T4 or better would work

### Installs
* Use a recent version of transformers
* You must restart the runtime after installing, because the accelerate package used by the Hugging Face Trainer requires it.

### Google Colab
For your convenience open this notebook on google colab with this link
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jkyamog/ml-experiments/blob/main/fine-tuning-qlora/LLAMA_3_Fine_Tuning_for_Sequence_Classification.ipynb)

Add the following secrets to Colab's secrets:
* huggingface - hugging face token to access models
* kaggle_user - kaggle username from Setting > Account > API
* kaggle_key - kaggle key from Setting > Account > API

In [1]:
# Install Pytorch
#%pip install "torch==2.2.2" tensorboard

# Install Hugging Face libraries
#%pip install  --upgrade "transformers==4.40.0" "datasets==2.18.0" "accelerate==0.29.3" "evaluate==0.4.1" "bitsandbytes==0.43.1" "huggingface_hub==0.22.2" "trl==0.8.6" "peft==0.10.0"


In [2]:
%pip list | grep "transformers\\|datasets\\|accelerate\\|evaluate\\|bitsandbytes\\|huggingface-hub\\|trl\\|peft"

accelerate                1.5.2
bitsandbytes              0.45.3
datasets                  3.4.1
evaluate                  0.4.3
huggingface-hub           0.29.3
peft                      0.14.0
transformers              4.49.0
trl                       0.15.2
Note: you may need to restart the kernel to use updated packages.


### Big Picture Overview of Parameter Efficient Fine Tuning Methods like LoRA and QLoRA Fine Tuning for Sequence Classification

**The Essence of Fine-tuning**
- LLMs are pre-trained on vast amounts of data for broad language understanding.
- Fine-tuning is crucial for specializing in specific domains or tasks, involving adjustments with smaller, relevant datasets.

**Model Fine-tuning with PEFT: Exploring LoRA and QLoRA**
- Traditional fine-tuning is resource-intensive; PEFT (Parameter Efficient Fine-tuning) makes the process faster and less demanding.
- Focus on two PEFT methods: LoRA and QLoRA.

**The Power of PEFT**
- PEFT modifies only a subset of the LLM's parameters, enhancing speed and reducing memory demands, making it suitable for less powerful devices.

**LoRA: Efficiency through Adapters**
- **Low-Rank Adaptation (LoRA):** Injects small trainable adapters into the pre-trained model.
- **Equation:** For a weight matrix $W$, LoRA approximates $W = W_0 + BA$, where $W_0$ is the original weight matrix, and $BA$ represents the low-rank modification through trainable matrices $B$ and $A$.
- Adapters learn task nuances while keeping the majority of the LLM unchanged, minimizing overhead.
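
To make the saving concrete, the sketch below counts trainable parameters for one weight matrix under full fine-tuning versus LoRA. The matrix shape and the rank `r` are assumed values chosen for illustration; they are not taken from this notebook's configuration.

```python
# Illustrative LoRA parameter count for a single weight matrix.
# d, k and r are assumed values, not settings from this notebook.
d, k = 4096, 4096   # shape of a hypothetical frozen weight matrix W_0 (d x k)
r = 16              # hypothetical LoRA rank

full_params = d * k            # parameters updated by full fine-tuning of W_0
lora_params = d * r + r * k    # parameters in B (d x r) plus A (r x k)

print(f"full fine-tuning: {full_params:,} trainable parameters")
print(f"LoRA (r={r}):     {lora_params:,} trainable parameters")
print(f"LoRA / full:      {lora_params / full_params:.2%}")
```

Because only $B$ and $A$ receive gradients, the optimizer state shrinks by the same ratio.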

**QLoRA: Compression and Speed**
- **Quantized LoRA (QLoRA):** Extends LoRA by quantizing the frozen base model's weights to 4 bits, further reducing the memory needed for fine-tuning.
- **Innovations in QLoRA:**
  1. **4-bit Quantization:** Uses a 4-bit data type, NormalFloat (NF4), designed for normally distributed weights, drastically reducing memory usage.
  2. **Low-Rank Adapters:** Fine-tuned with 16-bit precision to effectively capture task-specific nuances.
  3. **Double Quantization:** Quantizes the quantization constants themselves from 32-bit to 8-bit, saving additional memory without degrading performance, according to the paper.
  4. **Paged Optimizers:** Page optimizer states between GPU and CPU memory to avoid out-of-memory spikes during training.
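
As a rough illustration of the memory arithmetic, the sketch below estimates the storage for the frozen base weights and the per-parameter overhead of the quantization constants. The parameter count is an assumed round number for an 8B model, and the block sizes (64 weights per block, 256 blocks per second-level block) are the ones described in the QLoRA paper.

```python
# Back-of-the-envelope memory estimate for the frozen base weights.
n_params = 8e9  # assumed round parameter count for an 8B model

fp16_gb = n_params * 16 / 8 / 1e9  # 16 bits per weight
nf4_gb = n_params * 4 / 8 / 1e9    # 4 bits per weight

# Quantization constants: one 32-bit constant per block of 64 weights.
const_bits = 32 / 64
# Double quantization: 8-bit constants, plus one 32-bit constant per 256 blocks.
dq_const_bits = 8 / 64 + 32 / (64 * 256)

print(f"fp16 weights: {fp16_gb:.1f} GB, NF4 weights: {nf4_gb:.1f} GB")
print(f"constant overhead: {const_bits:.3f} -> {dq_const_bits:.3f} bits per parameter")
```

These figures cover only the stored weights; activations, adapters, and optimizer state add to the total.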

**Why PEFT Matters**
- **Rapid Learning:** Speeds up model adaptation.
- **Smaller Footprint:** Eases deployment with reduced model size.
- **Edge-Friendly:** Fits better on devices with limited resources, enhancing accessibility.

**Conclusion**
- PEFT methods like LoRA and QLoRA revolutionize LLM fine-tuning by focusing on efficiency, facilitating faster adaptability, smaller models, and broader device compatibility.




### Fine-tuning for Text Classification:


#### 1. Text Generation with Classification Label as part of text
- **Approach**: Train the model to generate text that naturally appends the classification label at the end.
- **Input**: "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
- **Output**: "Lorem ipsum dolor sit amet, consectetur adipiscing elit 0.25"
- **Use Case**: This method is useful for classifying text


#### 2. Sequence Classification Head
- **Approach**: Add a sequence classification head (linear layer) on top of the Llama transformer. This setup is similar to GPT-2 and classifies the sequence based on the last relevant token.
    - **Token Positioning**:
        - **With pad_token_id**: The model identifies and ignores padding tokens, using the last non-padding token for classification.
        - **Without pad_token_id**: It defaults to the last token in each sequence.
        - **inputs_embeds**: If embeddings are directly passed (without input_ids), the model cannot identify padding tokens and takes the last embedding in each sequence as the input for classification.
- **Input**: Specific sentences (e.g., "Lorem ipsum dolor sit amet, consectetur adipiscing elit").
- **Output**: Direct classification (e.g., "0.25", "0.50").
- **Training Objective**: Minimize cross-entropy loss between the predicted and the actual class labels.
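
The token-positioning rule can be sketched in plain Python. This is a minimal illustration of picking the rightmost non-padding position, not the Transformers implementation; the helper name and the `PAD` value are hypothetical.

```python
def last_non_pad_index(input_ids, pad_token_id):
    """Return the index of the rightmost token that is not padding.

    Falls back to index 0 when every token is padding.
    """
    best = 0
    for position, token_id in enumerate(input_ids):
        if token_id != pad_token_id:
            best = position
    return best

PAD = 0  # hypothetical pad token id, for illustration only
right_padded = [11, 12, 13, PAD, PAD]
left_padded = [PAD, PAD, 11, 12, 13]

print(last_non_pad_index(right_padded, PAD))
print(last_non_pad_index(left_padded, PAD))
```

Taking the rightmost non-padding position works for both left- and right-padded batches, which is why the head does not need to know the padding side.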

https://huggingface.co/docs/transformers/main/en/model_doc/llama

### PEFT Configs
* bitsandbytes config for quantization
* LoRA config for the adapters

### Going to use the Hugging Face Transformers Trainer class: main components
* Hugging Face dataset (for train + eval)
* Data collator
* Compute Metrics
* Class weights, since we use a custom trainer with a custom weighted loss
* TrainingArguments: number of epochs, learning rate, weight decay, etc.



### Log in to the Hugging Face Hub with your Llama token so we can access the Llama 3 8B pre-trained model

In [3]:
#device_map="DDP" # for DDP and running with `accelerate launch test_sft.py`
device_map='auto' # for PP and running with `python test_sft.py`

if device_map == "DDP":
    # PartialState comes from accelerate; it is only needed on this branch
    from accelerate import PartialState
    device_string = PartialState().process_index
    device_map={'':device_string}

In [4]:
#from google.colab import userdata
#hugginface_token = userdata.get('huggingface')
#!huggingface-cli login --token $hugginface_token

###### Imports

In [5]:
import os
import random
import functools
import csv
import pandas as pd
import numpy as np
import torch
import torch.nn.functional as F
import evaluate

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, confusion_matrix, classification_report, balanced_accuracy_score, accuracy_score

from scipy.stats import pearsonr
from datasets import Dataset, DatasetDict
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding
)


The easiest way to download Kaggle datasets is to use the Kaggle API. You can install this using pip by running this in a notebook cell:

!pip install kaggle
You need an API key to use the Kaggle API; to get one, click on your profile picture on the Kaggle website, and choose My Account, then click Create New API Token. This will save a file called kaggle.json to your PC. You need to copy this key on your GPU server. To do so, open the file you downloaded, copy the contents, and paste them in the following cell (e.g., creds = '{"username":"xxx","key":"xxx"}'):


In [6]:
#creds = '{"username":"' + userdata.get('kaggle_user') + '","key":"' + userdata.get('kaggle_key') + '"}'

Then execute this cell (this only needs to be run once):

### Convert from Pandas DataFrame to Hugging Face Dataset
* Also let's shuffle the training set.
* We put the components train,val,test into a DatasetDict so we can access them later with HF trainer.
* Later we will add a tokenized dataset


In [7]:
from pathlib import Path
path = Path('sequence_classification')

In [8]:
df_train = pd.read_excel(path/'train.xlsx', index_col = 0)
df_val = pd.read_excel(path/'val.xlsx', index_col = 0)

In [9]:
df_val['score_ascat'] = df_val['score_ascat'].astype('category')

In [10]:
category_map = {code: category for code, category in enumerate(df_val['score_ascat'].cat.categories)}
category_map

{0: 0.0, 1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0}

In [11]:
dataset_train = Dataset.from_pandas(df_train.drop(['score_ascat', 'score'],axis=1).reset_index(drop=True))
dataset_val = Dataset.from_pandas(df_val.drop(['score_ascat', 'score'],axis=1).reset_index(drop=True))

In [12]:
vectors_train = np.load('sequence_classification/vector_train.npy', allow_pickle=True)

In [13]:
vectors_val = np.load('sequence_classification/vector_val.npy', allow_pickle=True)

In [14]:
class AddVector:
    def __init__(self, vectors):
        self.n = 0
        self.vectors = vectors
    def __call__(self, example):
        self.n += 1
        return { 'vector': self.vectors[self.n-1] }

In [15]:
add_vector = AddVector(vectors_train)
dataset_train = dataset_train.map(add_vector)
add_vector = AddVector(vectors_val)
dataset_val = dataset_val.map(add_vector)


In [16]:
# Combine them into a single DatasetDict
dataset = DatasetDict({
    'train': dataset_train,
    'val': dataset_val,
})
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'anchor', 'target', 'context', 'input', 'score_category', 'vector'],
        num_rows: 29178
    })
    val: Dataset({
        features: ['id', 'anchor', 'target', 'context', 'input', 'score_category', 'vector'],
        num_rows: 7295
    })
})

* Since our classes are not balanced let's calculate class weights based on inverse value counts
* Convert to pytorch tensor since we will need it

In [17]:
df_train.score_category.value_counts(normalize=True)

score_category
2    0.337994
1    0.315854
0    0.204538
3    0.109500
4    0.032113
Name: proportion, dtype: float64

In [18]:
class_weights=(1/df_train.score_category.value_counts(normalize=True).sort_index()).tolist()
class_weights=torch.tensor(class_weights)
class_weights=class_weights/class_weights.sum()
class_weights


tensor([0.0953, 0.0617, 0.0577, 0.1781, 0.6072])

## Load LLama model with 4 bit quantization as specified in bits and bytes and prepare model for peft training

### Model Name

#### Load model
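* The original notebook loads the model with `AutoModelForSequenceClassification`; this version trains a small custom head, `LlamaForSequenceClassification3`, on the precomputed `vector` features loaded earlier
* `num_labels` is the number of classes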


In [19]:
model_name = "meta-llama/Meta-Llama-3-8B"

In [20]:
from typing import Callable, List, Optional, Tuple, Union
from transformers.cache_utils import Cache, DynamicCache, StaticCache
from transformers.modeling_outputs import (
    SequenceClassifierOutput,
)
from torch import nn
from transformers.loss.loss_utils import ForSequenceClassificationLoss
from transformers.configuration_utils import PretrainedConfig

class LlamaForSequenceClassification3(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden_size = 4096
        self.config = PretrainedConfig()
        self.config.num_labels = 5
        self.config.problem_type = "multi_label_classification"
        self.num_labels = self.config.num_labels
        self.score = nn.Linear(self.hidden_size, self.num_labels, bias=False) 
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        #non_pad_mask: Optional[torch.LongTensor] = None,
        vector: Optional[torch.FloatTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        return_dict: Optional[bool] = True,
    ) -> Union[Tuple, SequenceClassifierOutput]:

        #print(input_ids)
        #print(input_ids.shape)
        hidden_states = vector
        logits = self.score(hidden_states)
        #print(input_ids != self.config.pad_token_id)
        non_pad_mask = (input_ids != self.config.pad_token_id).to(logits.device, torch.int32)

        batch_size = non_pad_mask.shape[0]

        # To handle both left- and right- padding, we take the rightmost token that is not equal to pad_token_id
        token_indices = torch.arange(non_pad_mask.shape[-1], device=logits.device)
        last_non_pad_token = (token_indices * non_pad_mask).argmax(-1)

        pooled_logits = logits[torch.arange(batch_size, device=logits.device), last_non_pad_token]
        
        loss = None
        #if labels is not None:
        #    loss = self.loss_function(logits=logits, labels=labels, pooled_logits=pooled_logits, config=self.config)
            #loss = self.loss_function(labels=labels, pooled_logits=pooled_logits, config=self.config)
            
        if not return_dict:
            output = (pooled_logits,)
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=pooled_logits,
            #hidden_states=hidden_states,
        )

    @property
    def loss_function(self):
        '''
        if hasattr(self, "_loss_function"):
            return self._loss_function

        loss_type = getattr(self, "loss_type", None)

        if loss_type is None or loss_type not in LOSS_MAPPING:
            logger.warning_once(
                f"`loss_type={loss_type}` was set in the config but it is unrecognised."
                f"Using the default loss: `ForCausalLMLoss`."
            )
            loss_type = "ForCausalLM"
        return LOSS_MAPPING[loss_type]
        '''
        return ForSequenceClassificationLoss

        
        

In [21]:
#model = LlamaModel.from_pretrained(
#model = LlamaModel2.from_pretrained(
model = LlamaForSequenceClassification3(
    #model_name,
    #quantization_config=quantization_config,
    #num_labels=len(category_map),
    #device_map=device_map,
)

model

LlamaForSequenceClassification3(
  (score): Linear(in_features=4096, out_features=5, bias=False)
)

In [22]:
model.config

PretrainedConfig {
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4"
  },
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4
  },
  "problem_type": "multi_label_classification",
  "transformers_version": "4.49.0"
}

* `get_peft_model` wraps the base model with a PEFT configuration, preparing it for training with a PEFT method such as LoRA

### Load the tokenizer

#### Since the Llama 3 tokenizer doesn't define a pad token
* Set the pad_token_id to eos_token_id
* Set pad_token to eos_token

In [23]:
tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)

tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.pad_token = tokenizer.eos_token

#### Update some model configs
* Must set `use_cache = False` as below or it crashes, in my experience

In [24]:
model.config.pad_token_id = tokenizer.pad_token_id
model.config.use_cache = False
model.config.pretraining_tp = 1

# Trainer Components
* model
* tokenizer
* training arguments
* train dataset
* eval dataset
* Data Collator
* Compute Metrics
* class_weights: We subclass `Trainer` and define a custom loss so that we can use a class-weighted loss.

#### Create LLAMA tokenized dataset which will house our train/val parts during the training process but after applying tokenization

In [25]:
MAX_LEN = 512
col_to_delete = ['id', 'anchor', 'context', 'target']

def llama_preprocessing_function(examples):
    return tokenizer(examples['input'], truncation=True, max_length=MAX_LEN)

tokenized_datasets = dataset.map(llama_preprocessing_function, batched=True, remove_columns=col_to_delete)
tokenized_datasets = tokenized_datasets.rename_column("score_category", "label")
tokenized_datasets.set_format("torch")


In [26]:
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['input', 'label', 'vector', 'input_ids', 'attention_mask'],
        num_rows: 29178
    })
    val: Dataset({
        features: ['input', 'label', 'vector', 'input_ids', 'attention_mask'],
        num_rows: 7295
    })
})

## Data Collator
A **data collator** prepares batches of data for training or inference in machine learning, ensuring uniform formatting and adherence to model input requirements. This is especially crucial for variable-sized inputs like text sequences.

### Functions of Data Collator

1. **Padding:** Uniformly pads sequences to the length of the longest sequence using a special token, allowing simultaneous batch processing.
2. **Batching:** Groups individual data points into batches for efficient processing.
3. **Handling Special Tokens:** Adds necessary special tokens to sequences.
4. **Converting to Tensor:** Transforms data into tensors, the required format for machine learning frameworks.

### `DataCollatorWithPadding`

The `DataCollatorWithPadding` specifically manages padding, using a tokenizer to ensure that all sequences are padded to the same length for consistent model input.

- **Syntax:** `collate_fn = DataCollatorWithPadding(tokenizer=tokenizer)`
- **Purpose:** Automatically pads text data to the longest sequence in a batch, crucial for models like BERT or GPT.
- **Tokenizer:** Uses the provided `tokenizer` for sequence processing, respecting model-specific vocabulary and formatting rules.

This collator is commonly used with libraries like Hugging Face's Transformers, facilitating data preprocessing for various NLP models.
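
The padding and batching steps can be sketched in plain Python. This is a simplified right-padding illustration under assumed inputs, not the `DataCollatorWithPadding` implementation; `pad_batch` is a hypothetical helper name.

```python
def pad_batch(sequences, pad_token_id):
    """Right-pad token id lists to the longest sequence and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical token ids, for illustration only
batch = pad_batch([[5, 6, 7], [8, 9]], pad_token_id=0)
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch rather than to a global maximum keeps short batches small, which matters when `max_length` is much larger than the typical input.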


In [27]:
tokenized_datasets['val']

Dataset({
    features: ['input', 'label', 'vector', 'input_ids', 'attention_mask'],
    num_rows: 7295
})

In [28]:
#collate_fn = DataCollatorWithPadding(tokenizer=tokenizer)

In [29]:
from collections.abc import Mapping

def pad_tensor(vec, pad, dim, dtype=None):
    """
    args:
        vec - tensor to pad
        pad - the size to pad to
        dim - dimension to pad

    return:
        a new tensor padded to 'pad' in dimension 'dim'
    """
    pad_size = list(vec.shape)
    pad_size[dim] = pad - vec.size(dim)
    if dtype is None:
        return torch.cat([vec, torch.zeros(*pad_size)], dim=dim)
    else:
        return torch.cat([vec, torch.zeros(*pad_size, dtype=dtype)], dim=dim)
    
class PadCollate:
    """
    a variant of collate_fn that pads according to the longest sequence in
    a batch of sequences
    """

    def __init__(self, dim=0):
        """
        args:
            dim - the dimension to be padded (dimension of time in sequences)
        """
        self.dim = dim

    def pad_collate(self, features):
        """
        args:
            features - list of per-example dicts (e.g. 'input_ids', 'label', 'vector')

        return:
            batch - dict of batched tensors; sequence fields are padded to the
                    longest 'input_ids' in the batch
        """
        import torch

        if not isinstance(features[0], Mapping):
            features = [vars(f) for f in features]

        first = features[0]
        batch = {}
        
        max_len = max(map(lambda f: f['input_ids'].shape[self.dim], features))
        dim = self.dim

        # Special handling for labels.
        # Ensure that tensor is created with the correct type
        # (it should be automatically the case, but let's make sure of it.)
        '''
        '''
        if "label" in first and first["label"] is not None:
            label = first["label"].item() if isinstance(first["label"], torch.Tensor) else first["label"]
            dtype = torch.long if isinstance(label, int) else torch.float
            batch["labels"] = torch.tensor([f["label"] for f in features], dtype=dtype)
        elif "label_ids" in first and first["label_ids"] is not None:
            if isinstance(first["label_ids"], torch.Tensor):
                batch["labels"] = torch.stack([f["label_ids"] for f in features])
            else:
                dtype = torch.long if isinstance(first["label_ids"][0], int) else torch.float
                batch["labels"] = torch.tensor([f["label_ids"] for f in features], dtype=dtype)
                
        if "input_ids" in first and first["input_ids"] is not None:
            dtype = first['input_ids'].dtype
            #print([pad_tensor(f["input_ids"],max_len,dim) for f in features])
            #print(torch.Tensor([pad_tensor(f["input_ids"],max_len,dim).numpy() for f in features],dtype=dtype))
            #batch["input_ids"] = [np.pad(f["input_ids"].numpy(),(0,max_len-len(f['input_ids'])),'constant', constant_values=0) for f in features]
            batch["input_ids"] = torch.from_numpy(np.array([ \
               np.pad(f["input_ids"].numpy(),(0,max_len-len(f['input_ids'])),'constant', constant_values=model.config.pad_token_id) \
               for f in features],dtype='long'))
            #batch['input_ids'] = torch.tensor([pad_tensor(f['input_ids'], max_len, dim, dtype=torch.int64) for f in features])

        # Handling of all other possible keys.
        # Again, we will use the first element to figure out which key/values are not None for this model.
        for k, v in first.items():
            if k not in ("label", "label_ids", "input_ids") and v is not None and not isinstance(v, str):
                if isinstance(v, torch.Tensor):
                    batch[k] = torch.stack([pad_tensor(f[k], max_len, dim) for f in features])
                elif isinstance(v, np.ndarray):
                    batch[k] = torch.from_numpy(np.stack([pad_tensor(f[k], max_len, dim) for f in features]))
                else:
                    batch[k] = torch.tensor([pad_tensor(f[k], max_len, dim) for f in features])
            
        return batch

    def __call__(self, features):
        return self.pad_collate(features)

In [30]:
collate_fn = PadCollate()

# Define which metrics to compute for evaluation
* During training we track the Pearson correlation between the predicted class and the label; balanced accuracy and accuracy are reported after training.

In [31]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred

    #print('len =', len(predictions))
    #print('predictions[0].shape =', predictions[0].shape)
    #print('predictions[0] =', predictions[0])
    #print('predictions[1].shape =', predictions[1].shape)
    #print('predictions[1] =', predictions[1])
    #print('labels =', labels)
    try:
        # it's a classification task, take the argmax
        predictions_processed = np.argmax(predictions, axis=1)

        # Calculate Pearson correlation
        pearson, _ = pearsonr(predictions_processed, labels)

        return {'pearson': pearson}
    except Exception as e:
        print(f"Error in compute_metrics: {e}")
        return {'pearson': None}
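
One point worth checking is that computing Pearson on class indices, as `compute_metrics` does, gives the same value as computing it on the decoded scores, because `category_map` is a linear map from index to score. The sketch below verifies this with a hand-written Pearson function; the prediction and label lists are made-up values for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

category_map = {0: 0.0, 1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0}  # as defined in this notebook
pred_idx = [0, 2, 1, 4, 3, 2]   # hypothetical predicted class indices
label_idx = [0, 1, 1, 4, 2, 3]  # hypothetical true class indices

r_idx = pearson(pred_idx, label_idx)
r_score = pearson([category_map[i] for i in pred_idx],
                  [category_map[i] for i in label_idx])
print(r_idx, r_score)
```

Pearson correlation is invariant to positive linear rescaling of either variable, so the metric logged during training is directly comparable to one computed on the 0 to 1 scores.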

### Define custom trainer with class weights
* We will have a custom loss function that deals with the class weights and have class weights as additional argument in constructor

In [32]:
class CustomTrainer(Trainer):
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Ensure class_weights is a float tensor on the training device.
        # torch.as_tensor avoids the copy warning when a tensor is passed in.
        if class_weights is not None:
            self.class_weights = torch.as_tensor(class_weights, dtype=torch.float32).to(self.args.device)
        else:
            self.class_weights = None

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Extract labels and convert them to long type for cross_entropy
        labels = inputs.pop("labels").long()

        # Forward pass
        outputs = model(**inputs)

        # Extract logits assuming they are directly outputted by the model
        logits = outputs.get('logits')

        # Compute custom loss with class weights for imbalanced data handling
        if self.class_weights is not None:
            loss = F.cross_entropy(logits, labels, weight=self.class_weights)
        else:
            loss = F.cross_entropy(logits, labels)

        return (loss, outputs) if return_outputs else loss
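
To show what the class weights do to the loss, here is a plain-Python sketch of weighted cross-entropy. It follows the weighted-mean reduction that, to my understanding, `F.cross_entropy` applies when `weight` is given with the default `reduction='mean'`; the logits, labels, and weights are made-up values.

```python
import math

def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted mean of per-example negative log-likelihoods."""
    total_loss, total_weight = 0.0, 0.0
    for row, label in zip(logits, labels):
        # log-sum-exp with max subtraction for numerical stability
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        nll = log_sum_exp - row[label]
        total_loss += class_weights[label] * nll
        total_weight += class_weights[label]
    return total_loss / total_weight

# Hypothetical two-class batch: the first example is confidently correct,
# the second (the rarer class 1) is uncertain.
logits = [[2.0, 0.0], [0.0, 0.0]]
labels = [0, 1]

unweighted = weighted_cross_entropy(logits, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, labels, [1.0, 3.0])
print(unweighted, weighted)
```

Up-weighting class 1 raises the loss here because the uncertain example belongs to that class, which is the intended effect for under-represented labels.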


# define training args

In [33]:
training_args = TrainingArguments(
    output_dir = 'sequence_classification',
    learning_rate = 1e-4,
    #lr_scheduler_type = 'cosine',
    #lr_scheduler_type = 'polynomial',
    #lr_scheduler_kwargs = { 'power': 2, 'lr_end': 1e-5 },
    per_device_train_batch_size = 8,
    per_device_eval_batch_size = 8,
    num_train_epochs = 2,
    weight_decay = 0.01,
    evaluation_strategy = 'epoch',  # newer transformers releases rename this to eval_strategy
    save_strategy = 'epoch',
    load_best_model_at_end = True
)



#### Define custom trainer

In [34]:
trainer = CustomTrainer(
    model = model,
    args = training_args,
    train_dataset = tokenized_datasets['train'],
    eval_dataset = tokenized_datasets['val'],
    tokenizer = tokenizer,
    data_collator = collate_fn,
    compute_metrics = compute_metrics,
    class_weights=class_weights,
)



* https://huggingface.co/docs/transformers/en/training

### Run trainer!

In [35]:
train_result = trainer.train()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: sdfsdfrr to https://api.wandb.ai. Use `wandb login --relogin` to force relogin




| Epoch | Training Loss | Validation Loss | Pearson |
|---|---|---|---|
| 1 | 1.4466 | 1.157807 | 0.517958 |
| 2 | 1.1213 | 1.10727 | 0.554428 |


#### Let's check the results
* I wrapped the prediction step in a function as a convenient way to add the predictions to the DataFrame

In [36]:
from tqdm.auto import tqdm, trange
def make_predictions(model, df, dataset):
  from torch.utils.data import DataLoader

  # Convert summaries to a list
  sentences = df.input.tolist()

  # Define the batch size
  batch_size = 32  # You can adjust this based on your system's memory capacity
  dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, collate_fn=PadCollate())

  # Initialize an empty list to store the model outputs
  all_outputs = []

  # Process the sentences in batches
  for inputs in tqdm(dataloader):
      # Get the batch of sentences
      #batch_sentences = sentences[i:i + batch_size]

      # Tokenize the batch
      #inputs = tokenizer(batch_sentences, return_tensors="pt", padding=True, truncation=True, max_length=512)
      

      # Move tensors to the device where the model lives (GPU or CPU)
      device = next(model.parameters()).device
      inputs = {k: v.to(device) for k, v in inputs.items()}

      # Perform inference and store the logits
      with torch.no_grad():
          outputs = model(**inputs)
          all_outputs.append(outputs['logits'])

  final_outputs = torch.cat(all_outputs, dim=0)
  df['predictions']=final_outputs.argmax(axis=1).cpu().numpy()
  df['predictions']=df['predictions'].apply(lambda l:category_map[l])
  # Return the DataFrame so callers can assign the result (it is also modified in place)
  return df


### Analyze performance

In [39]:
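def get_performance_metrics(df_test):
  # round() collapses the five score levels into two classes:
  # 0.0, 0.25 and 0.5 become 0 (halves round to even), 0.75 and 1.0 become 1
  y_test = df_test.score.round()
  y_pred = df_test.predictions.round()
  print(f"comparing test {y_test} and pred {y_pred}")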

  print("Confusion Matrix:")
  print(confusion_matrix(y_test, y_pred))

  print("\nClassification Report:")
  print(classification_report(y_test, y_pred))

  print("Balanced Accuracy Score:", balanced_accuracy_score(y_test, y_pred))
  print("Accuracy Score:", accuracy_score(y_test, y_pred))

In [40]:
outputs = make_predictions(model,df_val, tokenized_datasets['val'])

get_performance_metrics(df_val)
df_val


comparing test 33511    0.0
18670    0.0
18049    0.0
31660    1.0
15573    0.0
        ... 
5040     0.0
33907    1.0
9090     0.0
25999    0.0
22135    0.0
Name: score, Length: 7295, dtype: float64 and pred 33511    1.0
18670    1.0
18049    0.0
31660    0.0
15573    0.0
        ... 
5040     0.0
33907    1.0
9090     0.0
25999    0.0
22135    0.0
Name: predictions, Length: 7295, dtype: float64
Confusion Matrix:
[[5117 1127]
 [ 363  688]]

Classification Report:
              precision    recall  f1-score   support

         0.0       0.93      0.82      0.87      6244
         1.0       0.38      0.65      0.48      1051

    accuracy                           0.80      7295
   macro avg       0.66      0.74      0.68      7295
weighted avg       0.85      0.80      0.82      7295

Balanced Accuracy Score: 0.7370606895845511
Accuracy Score: 0.7957505140507197


| Unnamed: 0 | id | anchor | target | context | score | input | score_ascat | score_category | predictions |
|---|---|---|---|---|---|---|---|---|---|
| 33511 | ed1c4e525eb105fe | transmit alarm | display indicator | G08 | 0.00 | TEXT1: G08; TEXT2: display indicator; ANC1: tr... | 0.00 | 0 | 0.75 |
| 18670 | 5386316f318f5221 | locking formation | retaining element | B60 | 0.25 | TEXT1: B60; TEXT2: retaining element; ANC1: lo... | 0.25 | 1 | 0.75 |
| 18049 | 1544ca6753fcbddd | lateral power | transducer | H01 | 0.25 | TEXT1: H01; TEXT2: transducer; ANC1: lateral p... | 0.25 | 1 | 0.25 |
| 31660 | f9d8979b94cec923 | spreader body | spreader | A01 | 0.75 | TEXT1: A01; TEXT2: spreader; ANC1: spreader body | 0.75 | 3 | 0.00 |
| 15573 | e151ca5ea5cc0f08 | high gradient magnetic separators | magnetic filtration | B03 | 0.50 | TEXT1: B03; TEXT2: magnetic filtration; ANC1: ... | 0.50 | 2 | 0.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5040 | f297c5e94dd07e6e | cervical support | gel pack | A47 | 0.25 | TEXT1: A47; TEXT2: gel pack; ANC1: cervical su... | 0.25 | 1 | 0.00 |
| 33907 | 06779da2bf614d00 | trommel screen | trommel screen | B03 | 1.00 | TEXT1: B03; TEXT2: trommel screen; ANC1: tromm... | 1.00 | 4 | 1.00 |
| 9090 | ed6245a94e7e5a77 | different conductivity | conductive | H03 | 0.50 | TEXT1: H03; TEXT2: conductive; ANC1: different... | 0.50 | 2 | 0.00 |
| 25999 | f93d71c0e9af4923 | prolog | sliding window | H03 | 0.00 | TEXT1: H03; TEXT2: sliding window; ANC1: prolog | 0.00 | 0 | 0.00 |
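
The report above is binary because `round()` collapses the five score levels into two classes. The sketch below shows that mapping with Python's built-in `round`, which rounds halves to even as NumPy does, and then recomputes the two summary scores from the confusion matrix printed above.

```python
# The five score levels collapse to two classes under rounding.
scores = [0.0, 0.25, 0.5, 0.75, 1.0]
binarized = [round(s) for s in scores]
print(binarized)

# Confusion matrix copied from the output above (rows: true class, columns: predicted class).
cm = [[5117, 1127],
      [363, 688]]

# Balanced accuracy is the mean of the per-class recalls.
recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
balanced_accuracy = sum(recalls) / len(recalls)

# Plain accuracy is the diagonal over the total.
accuracy = (cm[0][0] + cm[1][1]) / sum(sum(row) for row in cm)

print(balanced_accuracy, accuracy)
```

Note that a score of 0.5 lands in class 0, so this report measures how well the model separates the two highest similarity levels from the rest rather than five-way accuracy.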


### Saving the model trainer state and model adapters

In [41]:
metrics = train_result.metrics
max_train_samples = len(dataset_train)
metrics["train_samples"] = min(max_train_samples, len(dataset_train))
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()

***** train metrics *****
  epoch                    =        2.0
  total_flos               =        0GF
  train_loss               =     1.2303
  train_runtime            = 0:01:22.93
  train_samples            =      29178
  train_samples_per_second =    703.624
  train_steps_per_second   =     21.993


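#### Saving the adapter model
* In the original QLoRA setup this step saves only the adapters, not the entire model. Here the model is just the linear `score` head, so `save_model` stores only that head's weights.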

In [63]:
trainer.save_model("saved_model")

wandb: 🚀 View run sequence_classification at: https://wandb.ai/sdfsdfrr/huggingface/runs/d0zscpin
wandb: Find logs at: wandb/run-20250317_171149-d0zscpin/logs


### Save to google drive
Before disconnecting and deleting the Colab runtime, make sure you save your model for future inference use.

In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
!cp -r sequence_classification /content/drive/MyDrive/Colab

In [None]:
!cp -r saved_model /content/drive/MyDrive/Colab