# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: **LoRA** - as suggested in the instructions
* Model: **GPT-2** - as suggested in the instructions
* Evaluation approach: **Huggingface Trainer: evaluate** - as suggested in the instructions
* Fine-tuning dataset: **dair-ai/emotion** - Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise. 

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
!pip install scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.5/13.5 MB[0m [31m65.3 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
Collecting joblib>=1.2.0
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m301.8/301.8 kB[0m [31m36.1 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.4.2 scikit-learn-1.6.0 threadpoolctl-3.5.0


In [24]:
from datasets import load_dataset
import random
import numpy as np
import pandas as pd
import gc
import os
from sklearn.metrics import accuracy_score, classification_report

import torch

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, 
                          TrainingArguments, AutoModelForSequenceClassification
                         )
from peft import (get_peft_config, get_peft_model, LoraConfig, 
                  TaskType, PeftModel, PeftConfig, 
                  AutoPeftModelForSequenceClassification
                 )


In [3]:
splits = ["train", "test"]
dataset = {split: ds for split, ds in zip(splits, load_dataset("dair-ai/emotion", split=splits))}

dataset["train"] = dataset["train"].shuffle(seed=42).select(range(1500))
dataset["test"] = dataset["test"].shuffle(seed=42).select(range(100))
    
# View the dataset characteristics
print(dataset["train"], dataset["test"])

Downloading readme:   0%|          | 0.00/9.05k [00:00<?, ?B/s]

Downloading data: 100%|██████████| 1.03M/1.03M [00:00<00:00, 4.06MB/s]
Downloading data: 100%|██████████| 127k/127k [00:00<00:00, 1.20MB/s]
Downloading data: 100%|██████████| 129k/129k [00:00<00:00, 1.18MB/s]


Generating train split:   0%|          | 0/16000 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/2000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/2000 [00:00<?, ? examples/s]

Dataset({
    features: ['text', 'label'],
    num_rows: 1500
}) Dataset({
    features: ['text', 'label'],
    num_rows: 100
})


In [4]:
from collections import Counter

Counter(dataset["test"]["label"])

Counter({0: 30, 1: 38, 2: 7, 3: 13, 4: 11, 5: 1})

In [5]:
#Loading the gpt2 model from pre-trained
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Set the padding token to be the same as the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

def tokenize_dataset(dataset, splits, tokenizer):
    """Tokenizes the dataset for the specified splits.

    Args:
        dataset: The dataset containing the splits.
        splits: A list of split names (e.g., ['train', 'test']).
        tokenizer: The tokenizer to use for tokenization.

    Returns:
        A dictionary containing tokenized datasets for each split.
    """
    tokenized_ds = {}
    for split in splits:
        tokenized_ds[split] = dataset[split].map(
            lambda x: tokenizer(x['text'], padding='max_length', truncation=True, return_tensors='pt'),
            batched=True
        )
    return tokenized_ds

tokenized_dataset = tokenize_dataset(dataset, splits, tokenizer)

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Map:   0%|          | 0/1500 [00:00<?, ? examples/s]

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

In [7]:
num_labels = 6
id2label = {
            0: "sadness", 
            1: "joy",
            2: "love",
            3: "anger",
            4: "fear",
            5: "surprise"
             }

label2id = {v: k for k, v in id2label.items()}

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=num_labels,
    id2label=id2label, 
    label2id=label2id,
)

#freeze
for param in model.base_model.parameters():
    param.requires_grad = False
    
# set the pad token of the model's configuration
model.config.pad_token_id = model.config.eos_token_id

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
model

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=6, bias=False)
)

In [9]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    
    accuracy = accuracy_score(labels, predictions)
    
    return {"accuracy": accuracy}


trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/emotion_analysis",
        learning_rate=2e-3,
        # Reduce the batch size if you don't have enough memory
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        num_train_epochs=3,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,2.2105,1.582065,0.56
2,1.5103,1.226986,0.56
3,1.2182,1.237575,0.6


Checkpoint destination directory ./data/emotion_analysis/checkpoint-750 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/emotion_analysis/checkpoint-1500 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/emotion_analysis/checkpoint-2250 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=2250, training_loss=1.610480740017361, metrics={'train_runtime': 464.2065, 'train_samples_per_second': 9.694, 'train_steps_per_second': 4.847, 'total_flos': 2351755689984000.0, 'train_loss': 1.610480740017361, 'epoch': 3.0})

### We then just evaluate the results

In [10]:
trainer.evaluate()

{'eval_loss': 1.226986289024353,
 'eval_accuracy': 0.56,
 'eval_runtime': 8.7776,
 'eval_samples_per_second': 11.393,
 'eval_steps_per_second': 5.696,
 'epoch': 3.0}

In [11]:

df = pd.DataFrame(tokenized_dataset["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces
df["text"] = df["text"].str.replace("<br />", " ")

# Add the model predictions to the dataframe
predictions = trainer.predict(tokenized_dataset["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

df.head(2)

Unnamed: 0,text,label,predicted_label
0,i was feeling really troubled and down over wh...,0,4
1,i feel so thrilled to have three such distingu...,1,1


In [14]:
df["predicted_label"].value_counts()

predicted_label
1    50
0    21
3    15
4    11
2     3
Name: count, dtype: int64

In [15]:
df[df["label"] != df["predicted_label"]].head(5)

Unnamed: 0,text,label,predicted_label
0,i was feeling really troubled and down over wh...,0,4
5,im feeling and if ive liked being pregnant,2,1
8,i don t have the feeling of divine vibrations,1,4
9,i vented my feelings towards the pathetic excu...,0,3
11,i get the feeling that this could be dangerous,3,4


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

### PEFT Model - creating setup

In [17]:
# Access the encoder layers (if applicable)
for name, layer in model.named_modules():
    print(name, layer)

 GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): ModulesToSaveWrapper(
    (original_module): Linear(
      in_features=768, out_features=6, bias=False
      (lora_dropout): ModuleDict(
        (default): D

In [18]:

#Need to setup the configuration using LoraCongif
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS, 
    inference_mode=False, 
    r=8, 
    target_modules=["c_attn"],
    lora_alpha=32, 
    lora_dropout=0.03
)

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",    
    num_labels=num_labels,
    id2label=id2label, 
    label2id=label2id,
)

# creating a PEFT model
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

peft_model.config.pad_token_id = peft_model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 304,128 || all params: 124,743,936 || trainable%: 0.2438018309763771


### We apply similar logic as above. Functions were already created for accuracy

In [20]:
gc.collect()

os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:126'  # Adjust the size as needed
#torch.cuda.empty_cache()


trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(
        output_dir="./data/emotion_analysis_perf",
        learning_rate=2e-5,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=1,
        weight_decay=0.02,
        load_best_model_at_end=True,
        fp16=True
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,5.6413,2.450142,0.25


TrainOutput(global_step=750, training_loss=4.518879069010417, metrics={'train_runtime': 218.6191, 'train_samples_per_second': 6.861, 'train_steps_per_second': 3.431, 'total_flos': 786678939648000.0, 'train_loss': 4.518879069010417, 'epoch': 1.0})

### Evaluation of PEFT model

In [21]:
trainer.evaluate()

{'eval_loss': 2.450141668319702,
 'eval_accuracy': 0.25,
 'eval_runtime': 5.4425,
 'eval_samples_per_second': 18.374,
 'eval_steps_per_second': 9.187,
 'epoch': 1.0}

In [22]:
items_for_manual_review = tokenized_dataset["test"].select(
    [34, 57, 99, 25, 44, 89]
)

results = trainer.predict(items_for_manual_review)
df = pd.DataFrame(
    {
        "sentiment": [item["label"] for item in items_for_manual_review],
        "predictions": results.predictions.argmax(axis=1),
        "labels": results.label_ids,
    }
)
# Show all the cell
pd.set_option("display.max_colwidth", None)
df

Unnamed: 0,sentiment,predictions,labels
0,3,0,3
1,4,0,4
2,0,0,0
3,3,0,3
4,1,3,1
5,0,0,0


### Finally saving the PEFT model

In [23]:
#Saving the trained model
peft_model.save_pretrained("./peft_model")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

### Load model

In [29]:
peft_saved = "./peft_model"
config = PeftConfig.from_pretrained(peft_model_id)

In [30]:
model_peft_inf = AutoPeftModelForSequenceClassification.from_pretrained(
        peft_saved,
        num_labels=num_labels,
        id2label=id2label, 
        label2id=label2id)

model_peft_inf

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): GPT2ForSequenceClassification(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): Linear(
                in_features=768, out_features=2304, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.03, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding

In [33]:
print(config)

LoraConfig(peft_type='LORA', auto_mapping=None, base_model_name_or_path='gpt2', revision=None, task_type='SEQ_CLS', inference_mode=True, r=8, target_modules=['c_attn'], lora_alpha=32, lora_dropout=0.03, fan_in_fan_out=True, bias='none', modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None)


In [38]:
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

inference_dataset = tokenized_dataset["test"].select(
    [random.randint(0, 100) for _ in range(10)]
)

In [40]:
print(inference_dataset)

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 10
})


In [53]:
for x in inference_dataset:
    print(x["text"])

i just notice what i am doing that is ruining my happy moment because this feelingof discontent is my resistance to receiving love in the genuine way its being delivered
i am off on wednesday to a postgraduate open day but there will be plenty to write about the rest of the week i feel sure
i feel very glad that finland s well known visual artist vesa kivinen had called me to work with him
i get the feeling that this could be dangerous
i feel special excitement and happiness
is hand started fondling his aching cock through the fabric of his boxers and he instinctively arched his back to feel more of the delicious sensation
i sit here writing this i feel unhappy inside
i tune out the rest of the world and focus on the rhythm of the needles and the softness of the yarn and for that time i feel my most peaceful
i get the feeling that this could be dangerous
i don t care if any of you read this but this is just what i feel when i m around you guys i feel hated


In [71]:
def inference(inference_dataset, model_peft_inf):
    
    #Go through each sample of inference
    tokenized_inference_dataset = inference_dataset.map(lambda x: tokenizer(x['text'], padding='max_length', truncation=True, return_tensors='pt'))
    tokenizer.pad_token = tokenizer.eos_token
   
    predicted_labels = []
    predicted_class_indexes = []
    for x in inference_dataset:
 
        # set to evaluation mode
        model_peft_inf.eval()
        
        # Perform inference
        with torch.no_grad():  # Disable gradient calculation
            outputs = model_peft_inf(**inputs)

        # Get the logits (raw predictions)
        logits = outputs.logits
        
        # Apply softmax to get probabilities
        probabilities = torch.nn.functional.softmax(logits, dim=-1)

        # Get the predicted class (index of the maximum probability)
        predicted_class_index = torch.argmax(probabilities, dim=-1).item()

        # Map the predicted class index to the corresponding label
        predicted_label = id2label[predicted_class_index]
        
        #Append
        predicted_labels.append(predicted_label)
        predicted_class_indexes.append(predicted_class_index)
    
    df_inf = pd.DataFrame(
    {
        "text": [x["text"] for x in inference_dataset],
        "predicted_label": predicted_labels,
        "predicted_label_num": predicted_class_indexes,
        "actual_labels": [x["label"] for x in inference_dataset],
    }
)
# Show all the cell

    return df_inf

In [72]:
df_inf = inference(inference_dataset=inference_dataset, 
          model_peft_inf=model_peft_inf)

In [73]:
df_inf.head()

Unnamed: 0,text,predicted_label,predicted_label_num,actual_labels
0,i just notice what i am doing that is ruining my happy moment because this feelingof discontent is my resistance to receiving love in the genuine way its being delivered,anger,3,0
1,i am off on wednesday to a postgraduate open day but there will be plenty to write about the rest of the week i feel sure,anger,3,1
2,i feel very glad that finland s well known visual artist vesa kivinen had called me to work with him,anger,3,1
3,i get the feeling that this could be dangerous,anger,3,3
4,i feel special excitement and happiness,anger,3,1


### Evaluation

In [79]:
true_labels = list(df_inf['actual_labels'].values)
predicted_labels = list(df_inf['predicted_label_num'].values)

print(classification_report(true_labels, predicted_labels))

              precision    recall  f1-score   support

           0       0.00      0.00      0.00         3
           1       0.00      0.00      0.00         5
           3       0.20      1.00      0.33         2

    accuracy                           0.20        10
   macro avg       0.07      0.33      0.11        10
weighted avg       0.04      0.20      0.07        10



  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
