# Lightweight Fine-Tuning Project

The choices made for this project are:

* PEFT technique: LoRA, with the task type set to `SEQ_CLS` (sequence classification)
* Model: `roberta-base` with a sequence classification head
* Evaluation approach: accuracy on a held-out test subset
* Fine-tuning dataset: `dair-ai/emotion` from Hugging Face, which has six emotion classes: sadness, joy, love, anger, fear, surprise
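
To make the class ids concrete, here is a minimal sketch of an id-to-name mapping. It assumes the integer labels follow the order in which the classes are listed above (0 = sadness through 5 = surprise); the names `EMOTIONS`, `id2label`, and `label2id` are illustrative and do not appear in the notebook.

```python
# Assumed class order: it mirrors the list above. Verify it against
# emotion_train.features["label"].names before relying on it.
EMOTIONS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Map integer class ids to names and back.
id2label = dict(enumerate(EMOTIONS))
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[0], label2id["surprise"])
```

Dictionaries like these can optionally be passed as `id2label` and `label2id` when loading the model, so that predictions are reported by name rather than by index.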

In [1]:
# install required libraries
!pip install peft






## Loading and Evaluating a Foundation Model


In [1]:
# import required libraries
import torch
from datasets import load_dataset



In [2]:
# load only a subset of the train, validation and test splits because compute is limited
emotion_train = load_dataset("dair-ai/emotion", split="train[:2000]")
emotion_validation = load_dataset("dair-ai/emotion", split="validation[:100]")
emotion_test = load_dataset("dair-ai/emotion", split="test[:100]")

In [3]:
# check train dataset
emotion_train

Dataset({
    features: ['text', 'label'],
    num_rows: 2000
})

In [4]:
# the foundation model for evaluation
model_name = "roberta-base"

In [5]:
# define the model and the tokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels = 6)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
# define a function that tokenizes a batch of examples (used below for the train and validation subsets)
def preprocess_function(examples):
    return tokenizer(examples["text"], padding = "max_length", truncation = True, return_tensors="pt")

In [7]:
# get the model output on the test dataset
inputs = tokenizer(emotion_test["text"], padding = "max_length", truncation = True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax(dim = 1)

In [8]:
# accuracy of foundation model on test dataset without fine-tuning
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(emotion_test['label'], predicted_class_id)
print(f"accuracy without any finetuning on the test dataset {accuracy * 100} %")

accuracy without any finetuning on the test dataset 17.0 %
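
For context, a six-class classifier that guesses uniformly at random would score about 16.7%, so 17% is essentially chance level, which is what one would expect from a newly initialized classification head. A quick check, with illustrative variable names and assuming roughly balanced classes:

```python
num_labels = 6            # six emotion classes
baseline_accuracy = 0.17  # accuracy reported above for the untuned model

# Expected accuracy of uniform random guessing over the six classes.
chance_accuracy = 1 / num_labels

print(f"chance level: {chance_accuracy * 100:.1f} %")
print(f"baseline minus chance: {(baseline_accuracy - chance_accuracy) * 100:.1f} points")
```

If the classes are imbalanced, a classifier that always predicts the most frequent class could score above this figure, so the majority-class rate is also worth checking.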


## Performing Parameter-Efficient Fine-Tuning

In [9]:
# tokenize the train and validation dataset
tokenized_emotion_train = emotion_train.map(preprocess_function, batched = True)
tokenized_emotion_validation = emotion_validation.map(preprocess_function, batched = True)

Map: 100%|██████████| 2000/2000 [00:00<00:00, 3779.22 examples/s]
Map: 100%|██████████| 100/100 [00:00<00:00, 2429.33 examples/s]


In [10]:
# remove the text list as it is not required for training
tokenized_emotion_train = tokenized_emotion_train.remove_columns(["text"])
tokenized_emotion_validation = tokenized_emotion_validation.remove_columns(["text"])

In [11]:
# the model's forward pass takes the targets as "labels", so rename the "label" column accordingly
train_lora = tokenized_emotion_train.rename_column('label', 'labels')
valid_lora = tokenized_emotion_validation.rename_column('label', 'labels')

In [12]:
# load the accuracy metric from the evaluate library and wrap it in a function the Trainer can call
import evaluate
import numpy as np

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)




In [13]:
# create a PEFT config with appropriate hyperparameters for the model
from peft import LoraConfig, get_peft_model, TaskType

config = LoraConfig(task_type = TaskType.SEQ_CLS, 
                    inference_mode = False, 
                    r = 8, 
                    lora_alpha = 16, 
                    lora_dropout = 0.01)

In [14]:
# using the PEFT config and foundation model, create a PEFT model.
lora_model = get_peft_model(model, config)

In [15]:
# check the trainable parameters
lora_model.print_trainable_parameters()

trainable params: 890,118 || all params: 125,540,364 || trainable%: 0.7090293286070128
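
The trainable-parameter count above can be reproduced by hand. The sketch below assumes that PEFT's default LoRA targets for RoBERTa are the attention `query` and `value` projections in each of the 12 encoder layers, and that the `SEQ_CLS` task type also keeps the classification head trainable. Neither assumption is stated in the notebook, but under them the arithmetic matches the reported 890,118.

```python
hidden = 768      # roberta-base hidden size
layers = 12       # roberta-base encoder layers
r = 8             # LoRA rank from the config above
num_labels = 6    # emotion classes

# Each adapted linear layer gains two low-rank matrices: A (r x hidden) and B (hidden x r).
lora_per_module = r * hidden + hidden * r
# Assumption: the query and value projections are adapted in every layer.
lora_params = layers * 2 * lora_per_module

# Classification head: dense (hidden x hidden, plus bias) and out_proj (hidden x num_labels, plus bias).
head_params = (hidden * hidden + hidden) + (hidden * num_labels + num_labels)

trainable = lora_params + head_params
print(f"LoRA: {lora_params:,}  head: {head_params:,}  total: {trainable:,}")
print(f"trainable%: {100 * trainable / 125_540_364:.4f}")
```

Most of the trainable parameters come from the classification head rather than the LoRA matrices, which is why the trainable fraction is about 0.7% rather than closer to 0.2%.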


In [16]:
# define the training arguments
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="my_awesome_model_2_roberta",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True
)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_lora,
    eval_dataset=valid_lora,
    tokenizer=tokenizer
    # compute_metrics=compute_metrics,  (left disabled in this run, so only eval_loss is reported at each evaluation)
)

In [17]:
# run the training loop
train_results = trainer.train()

{'loss': 1.6473, 'learning_rate': 1.3333333333333333e-05, 'epoch': 1.0}
{'eval_loss': 1.5461130142211914, 'eval_runtime': 68.3691, 'eval_samples_per_second': 1.463, 'eval_steps_per_second': 0.366, 'epoch': 1.0}
{'loss': 1.5755, 'learning_rate': 6.666666666666667e-06, 'epoch': 2.0}
{'eval_loss': 1.4854538440704346, 'eval_runtime': 68.2852, 'eval_samples_per_second': 1.464, 'eval_steps_per_second': 0.366, 'epoch': 2.0}
{'loss': 1.4835, 'learning_rate': 0.0, 'epoch': 3.0}
{'eval_loss': 1.37740159034729, 'eval_runtime': 72.7088, 'eval_samples_per_second': 1.375, 'eval_steps_per_second': 0.344, 'epoch': 3.0}
{'train_runtime': 7195.5313, 'train_samples_per_second': 0.834, 'train_steps_per_second': 0.208, 'train_loss': 1.5687763671875, 'epoch': 3.0}
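
The step count and the logged learning rates follow directly from the training arguments. The sketch below assumes a single device, no gradient accumulation, and a linear decay schedule with no warmup (the helper `linear_lr` is illustrative, not a library function). Under these assumptions it gives 1,500 optimizer steps in total and reproduces the learning rates logged at the end of each epoch.

```python
train_examples = 2000   # size of the training subset
batch_size = 4          # per_device_train_batch_size
epochs = 3              # num_train_epochs
base_lr = 2e-5          # learning_rate

steps_per_epoch = train_examples // batch_size
total_steps = steps_per_epoch * epochs

def linear_lr(step):
    """Linearly decay from base_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    return base_lr * (1 - step / total_steps)

for epoch in range(1, epochs + 1):
    step = epoch * steps_per_epoch
    print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.4e}")
```

Because the learning rate has already decayed to zero by the final step, training for more epochs would need a fresh schedule rather than a continuation of this one.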





In [18]:
# save the trained model and the tokenizer
tokenizer.save_pretrained("roberta-base-lora-token")
lora_model.save_pretrained("roberta-base-lora")

## Performing Inference with a PEFT Model

In [20]:
# load the fine-tuned PEFT model for inference and get its predictions on the test dataset
from peft import AutoPeftModelForSequenceClassification
from transformers import AutoTokenizer

inference_model = AutoPeftModelForSequenceClassification.from_pretrained("roberta-base-lora", num_labels = 6)
tokenizer = AutoTokenizer.from_pretrained("roberta-base-lora-token")

inputs = tokenizer(emotion_test["text"], padding = "max_length", truncation = True, return_tensors="pt")

with torch.no_grad():
    logits = inference_model(**inputs).logits

predicted_class_id_lora = logits.argmax(dim = 1)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [21]:
# check the accuracy of the fine-tuned model on the test dataset
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(emotion_test['label'], predicted_class_id_lora)
print(f"accuracy with finetuning on the test dataset {accuracy * 100} %")

accuracy with finetuning on the test dataset 54.0 %
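
Because the test subset has only 100 examples, both accuracy figures carry noticeable sampling uncertainty. As a rough sketch, the binomial standard error below treats each prediction as an independent trial. This is an added illustration, not part of the original evaluation, and `accuracy_std_error` is a hypothetical helper.

```python
import math

n = 100  # size of the test subset

def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy estimated from n independent examples."""
    return math.sqrt(acc * (1 - acc) / n)

for name, acc in [("without fine-tuning", 0.17), ("with LoRA fine-tuning", 0.54)]:
    se = accuracy_std_error(acc, n)
    print(f"{name}: {acc * 100:.0f} % +/- {se * 100:.1f} points (1 standard error)")
```

The gap between the two figures is several standard errors wide, so the improvement is unlikely to be noise, though evaluating on the full test split would give a tighter estimate.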


In [2]:
# accuracy increased from 17% to 54% using only a 2,000-example training subset; it can likely be improved further by
# hyperparameter tuning and by training on more of the dataset