# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: Low-Rank Adaptation (LoRA)
* Model: bhadresh-savani/distilbert-bert-base-uncased-emotion
* Evaluation approach: trainer.train() & trainer.evaluate()
* Fine-tuning dataset: Dair-ai/emotion dataset

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [24]:
! pip install datasets --upgrade
! pip install --upgrade pyarrow
! pip install accelerate -U
! pip install transformers --upgrade
! pip install --upgrade peft



In [2]:
from datasets import load_dataset
dataset = load_dataset("dair-ai/emotion", trust_remote_code=True)

splits = ["train", "test"]
print(dataset["train"])
print(dataset["train"][:10])
dataset

Dataset({
    features: ['text', 'label'],
    num_rows: 16000
})
{'text': ['i didnt feel humiliated', 'i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake', 'im grabbing a minute to post i feel greedy wrong', 'i am ever feeling nostalgic about the fireplace i will know that it is still on the property', 'i am feeling grouchy', 'ive been feeling a little burdened lately wasnt sure why that was', 'ive been taking or milligrams or times recommended amount and ive fallen asleep a lot faster but i also feel like so funny', 'i feel as confused about life as a teenager or as jaded as a year old man', 'i have been with petronas for years i feel that petronas has performed well and made a huge profit', 'i feel romantic too'], 'label': [0, 0, 3, 2, 3, 0, 5, 4, 1, 2]}


DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 16000
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2000
    })
})

In [3]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bhadresh-savani/distilbert-base-uncased-emotion",
    num_labels=6,
    id2label={0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"},
    label2id={"sadness":0, "joy":1, "love":2, "anger":3, "fear":4, "surprise":5},
)

for param in model.base_model.parameters():
    param.requires_grad = True

model.save_pretrained("initial-model")
print(model.classifier)
print(model)

Linear(in_features=768, out_features=6, bias=True)
DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
   

In [4]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bhadresh-savani/distilbert-base-uncased-emotion")

tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"], truncation=True), batched=True
    )
print(tokenized_dataset["train"])
print(tokenized_dataset["test"])
tokenized_dataset["train"][0]["input_ids"][:5]

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 16000
})
Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 2000
})


[101, 1045, 2134, 2102, 2514]

In [5]:
tokenized_dataset["train"][:3]

{'text': ['i didnt feel humiliated',
  'i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake',
  'im grabbing a minute to post i feel greedy wrong'],
 'label': [0, 0, 3],
 'input_ids': [[101, 1045, 2134, 2102, 2514, 26608, 102],
  [101,
   1045,
   2064,
   2175,
   2013,
   3110,
   2061,
   20625,
   2000,
   2061,
   9636,
   17772,
   2074,
   2013,
   2108,
   2105,
   2619,
   2040,
   14977,
   1998,
   2003,
   8300,
   102],
  [101, 10047, 9775, 1037, 3371, 2000, 2695, 1045, 2514, 20505, 3308, 102]],
 'attention_mask': [[1, 1, 1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}

In [6]:
import numpy as np
import torch
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predicted_classes = np.argmax(predictions, axis=1)
    true_classes = labels
    accuracy = (predicted_classes == true_classes).mean()
    
    # Print class distributions
    print("Predicted Class Distribution:", np.bincount(predicted_classes))
    print("True Class Distribution:", np.bincount(true_classes))
    
    return {"accuracy": accuracy}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./project1/data/sentiment_classifier",
        learning_rate=2e-5,
        use_cpu=False,
        per_device_train_batch_size=64,
        per_device_eval_batch_size=64,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=1,
        weight_decay=0.001,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()
eval_results_standard=trainer.evaluate()
print(eval_results_standard)

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.192258,0.9255


Predicted Class Distribution: [584 694 161 272 224  65]
True Class Distribution: [581 695 159 275 224  66]


Checkpoint destination directory ./project1/data/sentiment_classifier/checkpoint-250 already exists and is non-empty. Saving will proceed but saved results may be invalid.


Predicted Class Distribution: [584 694 161 272 224  65]
True Class Distribution: [581 695 159 275 224  66]
{'eval_loss': 0.1922580748796463, 'eval_accuracy': 0.9255, 'eval_runtime': 6.8444, 'eval_samples_per_second': 292.21, 'eval_steps_per_second': 4.675, 'epoch': 1.0}


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [7]:
from peft import LoraConfig, TaskType
loraConfig=LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=32,
    lora_dropout=0.2,
    bias="none",
    target_modules=['q_lin','k_lin', 'v_lin', 'lin1', 'lin2', 'out_lin' 'pre_classifier']
)

initial_model=model.from_pretrained("initial-model")

In [8]:
from peft import get_peft_model #, merge_and_unload

lora_model = get_peft_model(initial_model,loraConfig)
lora_model.print_trainable_parameters()
#Reduce the memory footprint by merging adapter weights with the model
lora_model = lora_model.merge_and_unload(initial_model,loraConfig)
loraConfig.target_modules

trainable params: 1,185,030 || all params: 68,143,116 || trainable%: 1.7390311297182242


Unloading and merging model: 100%|██████████| 132/132 [00:00<00:00, 359.13it/s]


{'k_lin', 'lin1', 'lin2', 'out_linpre_classifier', 'q_lin', 'v_lin'}

In [9]:
trainer_forPEFT = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./project1/data/sentiment_classifier_wpeft",
        learning_rate=2e-5,
        use_cpu=False,
        per_device_train_batch_size=64,
        per_device_eval_batch_size=64,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=1,
        weight_decay=0.001,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"].rename_column("label","labels"),
    eval_dataset=tokenized_dataset["test"].rename_column("label","labels"),
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer_forPEFT.train()

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.179532,0.9265


Predicted Class Distribution: [581 691 167 272 231  58]
True Class Distribution: [581 695 159 275 224  66]


TrainOutput(global_step=250, training_loss=0.050500446319580075, metrics={'train_runtime': 66.0881, 'train_samples_per_second': 242.101, 'train_steps_per_second': 3.783, 'total_flos': 234152829066240.0, 'train_loss': 0.050500446319580075, 'epoch': 1.0})

In [10]:
eval_results_wLoRA=trainer_forPEFT.evaluate()
print(eval_results_wLoRA)

Predicted Class Distribution: [581 691 167 272 231  58]
True Class Distribution: [581 695 159 275 224  66]
{'eval_loss': 0.17953208088874817, 'eval_accuracy': 0.9265, 'eval_runtime': 6.7367, 'eval_samples_per_second': 296.879, 'eval_steps_per_second': 4.75, 'epoch': 1.0}


In [11]:
lora_model.add_adapter(loraConfig, adapter_name="adapter_1")
lora_model.enable_adapters()
trainer_forPEFT.save_model("lora-model")



## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [13]:
from peft import AutoPeftModelForSequenceClassification
from peft import PeftModel, PeftConfig

lora_model_ft = AutoPeftModelForSequenceClassification.from_pretrained("lora-model")
#config = PeftConfig.from_pretrained("lora-model")

tokenized_dataset_validation = {}

tokenized_dataset_validation["validation"] = dataset["validation"].map(
    lambda x: tokenizer(x["text"], truncation=True), batched=True
    )

print(tokenized_dataset_validation)



{'validation': Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 2000
})}


In [13]:
trainer_forPEFT_ft = Trainer(
    model=lora_model_ft,
    args=TrainingArguments(
        output_dir="./project1/data1/sentiment_classifier_wpeft_ft",
        #I have tried learning_rate between 1e-2 and 2e-4
        learning_rate=2e-5,
        use_cpu=False,
        #I have tried per device size between 8 and 128. Yet, beyond 64 my device runs out of memory.
        per_device_train_batch_size=64,
        per_device_eval_batch_size=64,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        #I have tried to run 1-5 epochs
        num_train_epochs=1,
        #I have tried weight_decay between 0.1 and 0.001
        weight_decay=0.001,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"].rename_column("label","labels"),
    eval_dataset=tokenized_dataset["test"].rename_column("label","labels"),
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer_forPEFT_ft.train()

dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.179183,0.926


Predicted Class Distribution: [580 692 166 273 230  59]
True Class Distribution: [581 695 159 275 224  66]


TrainOutput(global_step=250, training_loss=0.05044174575805664, metrics={'train_runtime': 101.0742, 'train_samples_per_second': 158.3, 'train_steps_per_second': 2.473, 'total_flos': 240587257559040.0, 'train_loss': 0.05044174575805664, 'epoch': 1.0})

In [14]:
import pandas as pd

predictions_wPEFT = trainer_forPEFT.predict(tokenized_dataset_validation["validation"])
predictions_wPEFT_ft = trainer_forPEFT_ft.predict(tokenized_dataset_validation["validation"])

Predicted Class Distribution: [557 702 181 266 215  79]
True Class Distribution: [550 704 178 275 212  81]


Predicted Class Distribution: [558 701 181 266 213  81]
True Class Distribution: [550 704 178 275 212  81]


In [15]:
print(predictions_wPEFT)
print(predictions_wPEFT_ft)

PredictionOutput(predictions=array([[ 7.3266683, -1.3848588, -1.6489316, -1.6579152, -2.133351 ,
        -2.2893136],
       [ 7.3092737, -1.6362753, -2.1889608, -1.5475152, -1.6482623,
        -2.2317388],
       [-2.4774506,  4.2734942,  4.0861044, -2.8272586, -3.1093655,
        -2.6277878],
       ...,
       [-1.99391  ,  7.0916996, -1.5699017, -2.1599686, -2.144558 ,
        -2.0108826],
       [-2.6794627,  5.182468 ,  3.3520355, -2.794747 , -3.0530515,
        -2.843311 ],
       [-2.0570905,  7.070473 , -1.5376037, -2.5843813, -2.4198775,
        -1.3191832]], dtype=float32), label_ids=array([0, 0, 2, ..., 1, 1, 1]), metrics={'test_loss': 0.1562727987766266, 'test_accuracy': 0.9335, 'test_runtime': 8.2835, 'test_samples_per_second': 241.444, 'test_steps_per_second': 3.863})
PredictionOutput(predictions=array([[ 7.3414755, -1.5093819, -1.6869037, -1.5502182, -2.162981 ,
        -2.2826679],
       [ 7.326073 , -1.7544694, -2.225881 , -1.440836 , -1.6786766,
        -2.2328193],

In [16]:
print("Accuracy with Standart Pretrained Model : ", predictions_wPEFT.metrics["test_accuracy"], " + Test Runtime : ", predictions_wPEFT.metrics["test_runtime"])

Accuracy with Standart Pretrained Model :  0.9335  + Test Runtime :  8.2835


In [17]:
print("Accuracy with Fine-Tuned Model : ", predictions_wPEFT_ft.metrics["test_accuracy"], " + Test Runtime : ", predictions_wPEFT_ft.metrics["test_runtime"])

Accuracy with Fine-Tuned Model :  0.9345  + Test Runtime :  8.0846


In [18]:
import pandas as pd

df = pd.DataFrame(
    {
        "text": [item["text"] for item in tokenized_dataset_validation["validation"]],
        "predictions wPEFT": predictions_wPEFT.predictions.argmax(axis=1),
        "predictions wPEFT ft": predictions_wPEFT_ft.predictions.argmax(axis=1),
        "labels": predictions_wPEFT.label_ids,
    }
)

# Filter the DataFrame based on the condition
filtered_df = df[df["predictions wPEFT"] != df["predictions wPEFT ft"]]

# Show all the cells
pd.set_option("display.max_colwidth", None)
filtered_df[:20]

Unnamed: 0,text,predictions wPEFT,predictions wPEFT ft,labels
258,talking to a very good friend who had just had a very bad experience which was changing his whole way of looking at life etc,1,3,0
732,i feel less weird about soliciting guys for them because well i am a guy i guess and i dont feel bad about exploiting them maybe,4,5,5
1069,i will choose not to focus on him instead focusing on how i feel i will try not to focus on him and instead of being agitated by him i will choose to let the negative feeling go,3,4,4
1713,i just feel gassed and low energy,4,0,0
1940,i was feeling pretty strange like dinosaur soldier after i read them because in a weird sort of adult or perhaps college aged way my brain was analyzing the books,4,5,4


In [19]:
count_success = df[(df["predictions wPEFT"] != df["predictions wPEFT ft"]) & (df["predictions wPEFT"] == df["labels"])]["predictions wPEFT"].value_counts()
count_success_PEFT = df[(df["predictions wPEFT"] != df["predictions wPEFT ft"]) & (df["predictions wPEFT ft"] == df["labels"])]["predictions wPEFT ft"].value_counts()

In [20]:
print(count_success)
print(count_success_PEFT)

predictions wPEFT
4    1
Name: count, dtype: int64
predictions wPEFT ft
5    1
4    1
0    1
Name: count, dtype: int64
