# Project: Apply Lightweight Fine-Tuning to a Foundation Model

This project uses the Emotion dataset (https://huggingface.co/datasets/dair-ai/emotion).

The dataset contains:
- A `text` column with the statements to be classified as one of six emotions
- A `label` column with the emotion encoded as sadness (0), joy (1), love (2), anger (3), fear (4), or surprise (5)
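
The label encoding described above can be written as a pair of lookup tables. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `decode` are illustrative and are not part of the dataset's API.

```python
# Integer label ids and emotion names, as listed in the dataset description.
ID2LABEL = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

# Inverse mapping, derived from ID2LABEL so the two tables cannot drift apart.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(label_ids):
    """Translate a sequence of integer label ids into emotion names."""
    return [ID2LABEL[i] for i in label_ids]


print(decode([0, 1, 5]))
```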

First, the DistilBERT foundation model is trained with all base-model parameters frozen, so that only the classification head is updated, and its performance is evaluated. Next, the model is trained with a parameter-efficient fine-tuning (PEFT) approach, LoRA, and evaluated again. Finally, the performance of the model without fine-tuning of the base parameters is compared with that of the PEFT model.

### Importing Libraries and Loading the dataset

In [1]:
# Importing necessary libraries
import numpy as np
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, DataCollatorWithPadding
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, AutoPeftModelForSequenceClassification

In [2]:
# Load the split configuration of the Emotion dataset
dataset = load_dataset("emotion", "split")

# View the dataset characteristics
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 16000
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})


### Preprocessing the dataset

In [3]:
# Load a pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)

# Inspect the tokenized dataset
print(tokenized_dataset["train"][0])



{'text': 'i didnt feel humiliated', 'label': 0, 'input_ids': [101, 1045, 2134, 2102, 2514, 26608, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
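
The tokenizer is called with truncation but without padding, so tokenized examples have different lengths. The `DataCollatorWithPadding` passed to the `Trainer` later pads each batch to the length of its longest example. The pure-Python sketch below illustrates that idea only and is not the library's implementation. `pad_batch` is a hypothetical helper, the second example is made up for illustration, and the pad token id is assumed to be 0, which matches the `padding_idx=0` of the DistilBERT embedding layer.

```python
def pad_batch(batch, pad_token_id=0):
    """Pad every example in the batch to the length of the longest one."""
    max_len = max(len(example["input_ids"]) for example in batch)
    padded = {"input_ids": [], "attention_mask": []}
    for example in batch:
        n_pad = max_len - len(example["input_ids"])
        # Pad token ids on the right and mark the padded positions with 0 in the mask.
        padded["input_ids"].append(example["input_ids"] + [pad_token_id] * n_pad)
        padded["attention_mask"].append(example["attention_mask"] + [0] * n_pad)
    return padded


batch = [
    # First training example, as printed above.
    {"input_ids": [101, 1045, 2134, 2102, 2514, 26608, 102], "attention_mask": [1] * 7},
    # Shorter, made-up example used only to show the padding.
    {"input_ids": [101, 1045, 2514, 102], "attention_mask": [1] * 4},
]

padded = pad_batch(batch)
print(padded["input_ids"][1])
print(padded["attention_mask"][1])
```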


### Load and set up the model without finetuning the base model parameters

In [4]:
# Load the pre-trained model for sequence classification
model_nofinetuning = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=6,
    id2label={0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"},
    label2id={"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5},
)

# Freeze all the parameters of the base model; the classification head stays trainable
for param in model_nofinetuning.base_model.parameters():
    param.requires_grad = False

# Inspect the final classification layer
model_nofinetuning.classifier

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Linear(in_features=768, out_features=6, bias=True)
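
With the base model frozen, only the classification head is left to train. The arithmetic below estimates how many parameters that is. It is a sketch under one assumption: `pre_classifier` is taken to be a 768-to-768 linear layer with bias, as in the standard DistilBERT classification head, since its shape is not shown in this notebook. `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features, bias=True):
    """Number of parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Shape taken from the printed classifier: Linear(in_features=768, out_features=6).
classifier_params = linear_params(768, 6)

# Assumption: pre_classifier is Linear(768, 768); its shape is not printed above.
pre_classifier_params = linear_params(768, 768)

trainable_head_params = classifier_params + pre_classifier_params
print(classifier_params, pre_classifier_params, trainable_head_params)
```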

In [5]:
print(model_nofinetuning)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

### Train the Model

In [6]:
# Accuracy metric to be supplied to the Trainer
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}
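
To make the metric concrete, here is a worked example on four made-up rows of logits. It mirrors the argmax-then-compare logic of `compute_metrics` in plain Python so it can run without NumPy. `accuracy_from_logits` is a hypothetical name and the numbers are illustrative only.

```python
def accuracy_from_logits(logits, labels):
    """Pick the highest-scoring class per row and compare it with the true label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Four made-up examples with six class scores each.
toy_logits = [
    [2.0, 0.1, -1.0, 0.3, 0.0, -0.5],  # argmax 0
    [0.2, 1.5, 0.4, -0.3, 0.1, 0.0],   # argmax 1
    [0.1, 0.2, 0.9, 0.8, -0.2, 0.0],   # argmax 2, but the true label is 3
    [-1.0, 0.0, 0.1, 0.2, 0.3, 1.1],   # argmax 5
]
toy_labels = [0, 1, 3, 5]

toy_result = accuracy_from_logits(toy_logits, toy_labels)
print(toy_result)
```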

In [7]:
# Create the Trainer
trainer_nofinetuning = Trainer(
    model=model_nofinetuning,
    args=TrainingArguments(
        output_dir="./data/emotions",
        learning_rate=2e-5,
        per_device_train_batch_size=64,
        per_device_eval_batch_size=64,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=3,
        weight_decay=0.01,
        load_best_model_at_end=True,
        logging_dir='./data/emotions/logs_no_finetuning',
        logging_steps=100,
        ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# Train the model
trainer_nofinetuning.train()



| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 1.5682 | 1.539479 | 0.4485 |
| 2 | 1.5159 | 1.507341 | 0.485 |
| 3 | 1.5013 | 1.497092 | 0.4875 |


TrainOutput(global_step=750, training_loss=1.539314208984375, metrics={'train_runtime': 53.9177, 'train_samples_per_second': 890.246, 'train_steps_per_second': 13.91, 'total_flos': 703485182771712.0, 'train_loss': 1.539314208984375, 'epoch': 3.0})
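
The `global_step=750` in the training output can be checked from the dataset size and the training arguments. The sketch assumes a single device and no gradient accumulation, neither of which is stated in the notebook.

```python
import math

train_rows = 16000   # size of the train split
batch_size = 64      # per_device_train_batch_size
num_epochs = 3       # num_train_epochs

# Assumption: one device and no gradient accumulation, so one batch is one optimizer step.
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)
```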

### Evaluate the Model without Finetuning and visualize its performance

In [8]:
trainer_nofinetuning.evaluate()

{'eval_loss': 1.4970916509628296,
 'eval_accuracy': 0.4875,
 'eval_runtime': 1.4719,
 'eval_samples_per_second': 1358.783,
 'eval_steps_per_second': 21.741,
 'epoch': 3.0}

In [9]:
# Make a dataframe with the predictions and the text and the labels

results_nofinetuning = trainer_nofinetuning.predict(tokenized_dataset["test"])

# Get the predicted labels
predicted_labels_nofinetuning = np.argmax(results_nofinetuning.predictions, axis=1)

# Get the text of the examples
texts = [item["text"] for item in tokenized_dataset["test"]]

# Create a DataFrame
df = pd.DataFrame({
    "Text": texts,
    "True Label": results_nofinetuning.label_ids,
    "Predicted Label": predicted_labels_nofinetuning
})

# Map label indices to actual emotions
label_map = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}
df["True Label"] = df["True Label"].map(label_map)
df["Predicted Label"] = df["Predicted Label"].map(label_map)

df.head(10)

| | Text | True Label | Predicted Label |
|---|---|---|---|
| 0 | im feeling rather rotten so im not very ambiti... | sadness | joy |
| 1 | im updating my blog because i feel shitty | sadness | sadness |
| 2 | i never make her separate from me because i do... | sadness | sadness |
| 3 | i left with my bouquet of red and yellow tulip... | joy | joy |
| 4 | i was feeling a little vain when i did this one | sadness | joy |
| 5 | i cant walk into a shop anywhere where i do no... | fear | sadness |
| 6 | i felt anger when at the end of a telephone call | anger | sadness |
| 7 | i explain why i clung to a relationship with a... | joy | joy |
| 8 | i like to have the same breathless feeling as ... | joy | joy |
| 9 | i jest i feel grumpy tired and pre menstrual w... | anger | joy |
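
A quick tally of the ten rows shown above gives a feel for how this model behaves. The sketch only counts the displayed rows, so it is not representative of the full test set. The variable names are illustrative.

```python
from collections import Counter

# (true label, predicted label) pairs copied from the ten displayed rows.
rows = [
    ("sadness", "joy"),
    ("sadness", "sadness"),
    ("sadness", "sadness"),
    ("joy", "joy"),
    ("sadness", "joy"),
    ("fear", "sadness"),
    ("anger", "sadness"),
    ("joy", "joy"),
    ("joy", "joy"),
    ("anger", "joy"),
]

n_correct = sum(true == pred for true, pred in rows)
predicted_counts = Counter(pred for _, pred in rows)

print(n_correct, "of", len(rows), "correct")
print(dict(predicted_counts))
```

In these ten rows the model predicts only the two classes `joy` and `sadness`.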


### Prepare for Parameter Efficient Fine Tuning of the Model

In [10]:
# Load the pre-trained model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=6,
    id2label={0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"},
    label2id={"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5},
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
# Apply LoRA for parameter-efficient fine-tuning
config = LoraConfig(
    r=4,               # Rank of the adaptation matrix
    lora_alpha=32,     # Alpha scaling factor
    target_modules=["q_lin", "k_lin", "v_lin", "out_lin"],  # Adjust target modules for DistilBERT
    lora_dropout=0.1   # Dropout rate
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

trainable params: 147,456 || all params: 67,105,542 || trainable%: 0.2197
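
The trainable-parameter count printed above can be reproduced by hand. LoRA adds two low-rank matrices to each targeted linear layer, one of shape `r x in_features` and one of shape `out_features x r`. The sketch below uses the rank and target modules from the config, and the layer count and layer sizes from the model printout; the variable names are illustrative.

```python
r = 4                       # LoRA rank from the config
in_features = 768           # input size of q_lin, k_lin, v_lin, out_lin
out_features = 768          # output size of the same layers
modules_per_layer = 4       # q_lin, k_lin, v_lin, out_lin
num_layers = 6              # "(0-5): 6 x TransformerBlock" in the model printout

# Each adapted layer gets A (r x in_features) and B (out_features x r).
params_per_module = r * in_features + out_features * r
lora_params = params_per_module * modules_per_layer * num_layers

all_params = 67_105_542     # total reported by print_trainable_parameters()
trainable_pct = 100 * lora_params / all_params

print(lora_params, round(trainable_pct, 4))
```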


In [12]:
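# Unfreeze every parameter whose name contains a target module name
# (this matches the base weights of those modules as well as the LoRA matrices)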
for name, param in model.named_parameters():
    if any(target in name for target in config.target_modules):
        param.requires_grad = True
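
Because the loop above matches on substrings of parameter names, it is worth checking which names it selects. The sketch below applies the same test to a handful of parameter names. The names are illustrative assumptions: exact names depend on the PEFT version, and the real ones can be listed with `model.named_parameters()`.

```python
target_modules = ["q_lin", "k_lin", "v_lin", "out_lin"]

# Illustrative parameter names (assumed; the real names depend on the PEFT version).
example_names = [
    "distilbert.transformer.layer.0.attention.q_lin.base_layer.weight",
    "distilbert.transformer.layer.0.attention.q_lin.lora_A.default.weight",
    "distilbert.transformer.layer.0.ffn.lin1.weight",
    "pre_classifier.weight",
    "classifier.weight",
]

# Same substring test as the unfreezing loop.
unfrozen = [
    name for name in example_names
    if any(target in name for target in target_modules)
]

for name in unfrozen:
    print(name)
```

Under these assumed names, the test selects the original `q_lin` weight in addition to the LoRA matrix, and leaves both classification head layers untouched. Any base weights unfrozen this way are not included in the 147,456 trainable parameters reported earlier.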

In [13]:
labels = tokenized_dataset["train"].features["label"].names

### Train the model with PEFT

In [None]:
# Create the Trainer for training (with PEFT)
trainer_with_peft = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/emotions_with_peft",
        learning_rate=2e-5,
        label_names=['labels'],
        per_device_train_batch_size=64,
        per_device_eval_batch_size=64,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=3,
        weight_decay=0.01,
        load_best_model_at_end=True,
        logging_dir='./data/emotions_with_peft/logs_with_peft',
        logging_steps=100,
        ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# Train the model with PEFT
trainer_with_peft.train()

### Evaluate the PEFT model

In [15]:
trainer_with_peft.evaluate()

{'eval_loss': 0.5748061537742615,
 'eval_accuracy': 0.865,
 'eval_runtime': 1.7542,
 'eval_samples_per_second': 1140.112,
 'eval_steps_per_second': 18.242,
 'epoch': 3.0}

### Save the Trained PEFT Model

In [16]:
# Save the model
trainer_with_peft.save_model("./savedpeftmodel")

# Save the tokenizer
tokenizer.save_pretrained("./savedpeftmodel")



('./savedpeftmodel/tokenizer_config.json',
 './savedpeftmodel/special_tokens_map.json',
 './savedpeftmodel/vocab.txt',
 './savedpeftmodel/added_tokens.json',
 './savedpeftmodel/tokenizer.json')

### Load the saved PEFT model

In [17]:
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("./savedpeftmodel")

# Load the PEFT model
model_peft = AutoPeftModelForSequenceClassification.from_pretrained("./savedpeftmodel")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Evaluate the saved and loaded PEFT Model

In [18]:
# Create the Trainer for evaluation
trainer_eval = Trainer(
    model=model_peft,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

# Evaluate the model on the validation set
results = trainer_eval.evaluate(tokenized_dataset["validation"])
print(results)



{'eval_loss': 0.44630372524261475, 'eval_accuracy': 0.275, 'eval_runtime': 13.5025, 'eval_samples_per_second': 148.12, 'eval_steps_per_second': 18.515}
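
One way to reason about the gap between this result and the accuracy before saving: a PEFT adapter checkpoint generally stores the adapter weights, plus any modules listed in `modules_to_save`, and the base model is loaded fresh on reload. The toy simulation below illustrates that with made-up numbers. It is not PEFT's code, and `save_adapter`, `reload`, and the parameter names are hypothetical.

```python
def save_adapter(state, modules_to_save=()):
    """Keep LoRA matrices and explicitly listed modules, as an adapter checkpoint would."""
    return {
        name: value for name, value in state.items()
        if "lora_" in name or any(name.startswith(m + ".") for m in modules_to_save)
    }


def reload(fresh_base, adapter):
    """Start from freshly loaded base weights and overlay the saved adapter weights."""
    restored = dict(fresh_base)
    restored.update(adapter)
    return restored


# Made-up values for the weights in memory at the end of training.
end_of_training = {
    "attention.q_lin.base_layer.weight": 0.9,  # base weight changed by manual unfreezing
    "attention.q_lin.lora_A.weight": 0.5,      # LoRA matrix
    "classifier.weight": 0.7,                  # randomly initialized head used in training
}
# Made-up values for a freshly loaded base model with a newly initialized head.
fresh_base = {
    "attention.q_lin.base_layer.weight": 0.1,
    "attention.q_lin.lora_A.weight": 0.0,
    "classifier.weight": -0.3,
}

restored = reload(fresh_base, save_adapter(end_of_training))
lost = sorted(k for k in end_of_training if restored[k] != end_of_training[k])
print(lost)

# Listing the head in modules_to_save keeps it in the checkpoint.
restored_with_head = reload(
    fresh_base, save_adapter(end_of_training, modules_to_save=("classifier",))
)
print(restored_with_head["classifier.weight"])
```

In this toy setting the LoRA matrix survives the round trip, while the head and the modified base weight do not.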


### Conclusion

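We can see that in this case:
- The model without finetuning of the base parameters reached a validation accuracy of 48.75%.
- The model trained with PEFT reached a validation accuracy of 86.5%.
- However, once the trained PEFT model was saved and reloaded, its validation accuracy dropped drastically to 27.5%. The reload step printed a warning that the `classifier` and `pre_classifier` weights were newly initialized, which suggests that the saved adapter did not include the classification head (the `LoraConfig` sets neither `task_type` nor `modules_to_save`) rather than a bug in the Hugging Face `AutoPeftModelForSequenceClassification` class. The base attention weights that were unfrozen manually before training are probably not part of the saved adapter either.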
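
A possible remedy for the drop after reloading, assuming the cause is that the classification head is missing from the saved adapter, is to declare the task type and list the head layers in `modules_to_save` so that PEFT trains and stores them with the adapter. The config below is a sketch that has not been run against this notebook. It would likely also need the manual unfreezing loop to be removed, so that the base attention weights stay frozen and identical to the checkpoint that is loaded later.

```python
from peft import LoraConfig, TaskType

# Same LoRA hyperparameters as above, plus the task type and the head layers to save.
config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification
    r=4,
    lora_alpha=32,
    target_modules=["q_lin", "k_lin", "v_lin", "out_lin"],
    lora_dropout=0.1,
    # Assumption: these are the head layer names, taken from the initialization warning.
    modules_to_save=["pre_classifier", "classifier"],
)
```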