# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

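* PEFT technique: LoRA (Low-Rank Adaptation of Large Language Models)
* Model: openai-community/gpt2
* Evaluation approach: accuracy on the test split, computed with the Hugging Face `Trainer` and the `evaluate` library
* Fine-tuning dataset: tweet_eval (sentiment subset), a three-class Twitter sentiment dataset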
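
As a brief sketch of what the chosen PEFT technique changes, the standard LoRA formulation can be written as below. The symbols ($W_0$, $A$, $B$, $d$, $k$) are introduced here for illustration and do not appear in the notebook; $r$ and $\alpha$ correspond to the `r` and `lora_alpha` settings used in the LoRA configuration later on.

```latex
h = W_0 x + \frac{\alpha}{r} B A x, \qquad
W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

The pre-trained weight $W_0$ stays frozen and only $A$ and $B$ are trained, so each adapted matrix contributes $r(d + k)$ trainable parameters instead of $dk$. With `r=4` and `lora_alpha=16`, the scaling factor $\alpha / r$ is 4.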

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.


I will be using openai-community/gpt2 with a newly initialized three-class sequence-classification head, evaluated on the sentiment subset of tweet_eval (negative, neutral, positive).

In [3]:
!pip install scikit-learn
!pip install evaluate


Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Collecting joblib>=1.2.0
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.4.2 scikit-learn-1.6.1 threadpoolctl-3.6.0
Defaulting to user installation because normal site-packages is not writeable
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)

In [4]:
import evaluate
import numpy as np
import pandas as pd
from datasets import Dataset, load_dataset
from sklearn.model_selection import train_test_split
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments

In [5]:
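model_name = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# GPT-2 ships without a padding token, so reuse the end-of-sequence token.
tokenizer.pad_token = tokenizer.eos_token

# Three sentiment classes; the classification head ('score') is newly initialized.
base_model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=3,
    pad_token_id=tokenizer.pad_token_id,
)
dataset = load_dataset("tweet_eval", "sentiment")

def preprocess_function(examples):
    # Pad or truncate every tweet to a fixed length of 512 tokens.
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512)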


tokenized_dataset = dataset.map(preprocess_function, batched=True)


# Extract the number of classes and their names
num_labels = dataset['train'].features['label'].num_classes
class_names = dataset["train"].features["label"].names
print(f"number of labels: {num_labels}")
print(f"the labels: {class_names}")

# df = pd.DataFrame(dataset['train'])
# df.head(10)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Generating train split: 45615 examples
Generating test split: 12284 examples
Generating validation split: 2000 examples

number of labels: 3
the labels: ['negative', 'neutral', 'positive']
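
The preprocessing function above pads every tweet to 512 tokens, although tweets are typically much shorter. The sketch below illustrates, with made-up lengths, how much padding a fixed length adds compared with padding each batch to its own longest sequence. The helper `padded_token_count` and the `toy_lengths` values are hypothetical and are not part of the notebook.

```python
def padded_token_count(lengths, batch_size, fixed_length=None):
    """Total tokens processed when sequences are padded batch by batch."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        # Fixed padding uses one global length; dynamic padding uses the batch maximum.
        target = fixed_length if fixed_length is not None else max(batch)
        total += target * len(batch)
    return total

# Hypothetical token counts for eight short tweets (not taken from tweet_eval).
toy_lengths = [12, 30, 25, 18, 40, 22, 35, 28]

fixed = padded_token_count(toy_lengths, batch_size=4, fixed_length=512)
dynamic = padded_token_count(toy_lengths, batch_size=4)
print(f"fixed padding: {fixed} tokens, dynamic padding: {dynamic} tokens")
```

In the Hugging Face stack, `DataCollatorWithPadding` provides this kind of per-batch padding. Whether it would noticeably shorten the runtimes reported in this notebook has not been measured here.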


In [6]:
id2label = {i: label for i, label in enumerate(class_names)}

In [7]:
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

training_args = TrainingArguments(
    output_dir="./results-base",
    per_device_eval_batch_size=8,
    do_train=False,
    do_eval=True,
)

trainer = Trainer(
    model=base_model,
    args=training_args,
    eval_dataset=tokenized_dataset["test"],  
    compute_metrics=compute_metrics,
)
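
To make the metric concrete, here is a minimal pure-Python sketch of what `compute_metrics` calculates: take the highest-scoring class per example, then report the fraction that matches the labels. The logits and labels are toy values chosen for illustration, and the helper names are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy_from_logits(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy scores for four examples over three classes (negative, neutral, positive).
toy_logits = [
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 0.9, 0.4],
    [1.2, 1.1, 0.0],
]
toy_labels = [0, 2, 2, 0]

toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)
```

Three of the four toy predictions match their labels, so the sketch reports 0.75.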



In [8]:
base_model_metrics = trainer.evaluate()
print("Base Model Metrics:", base_model_metrics)

Base Model Metrics: {'eval_loss': 4.517601013183594, 'eval_accuracy': 0.19545750569846956, 'eval_runtime': 424.0337, 'eval_samples_per_second': 28.969, 'eval_steps_per_second': 3.622}
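
The base accuracy of roughly 0.195 comes from a classification head that was randomly initialized, as the warning printed at load time indicates. Two simple reference points help in reading such a number: uniform guessing over three classes, and always predicting the most frequent class. The sketch below computes both for a hypothetical label list. The labels are made up and do not reflect the actual tweet_eval class distribution.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical labels for ten examples (0=negative, 1=neutral, 2=positive).
toy_labels = [0, 1, 1, 2, 1, 0, 2, 1, 1, 2]

uniform_guess = 1 / 3  # expected accuracy of uniform random guessing over 3 classes
majority = majority_class_accuracy(toy_labels)
print(f"uniform guess: {uniform_guess:.3f}, majority class: {majority:.3f}")
```

Running the same helper on the real test labels would show where the untrained model sits relative to these trivial baselines.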


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [9]:
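from peft import LoraConfig, TaskType, get_peft_model

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification
    inference_mode=False,        # adapters are trainable
    r=4,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling numerator; the update is scaled by lora_alpha / r
    lora_dropout=0.1,            # dropout applied on the LoRA branch
)

# Wrap the base model; for TaskType.SEQ_CLS this returns a PeftModelForSequenceClassification.
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()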



trainable params: 152,064 || all params: 124,591,872 || trainable%: 0.12204969518396834
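
The trainable-parameter count above can be roughly sanity-checked by hand. The sketch below assumes (this is not stated in the notebook) that GPT-2 small has 12 blocks with hidden size 768 and that the LoRA adapters attach to the fused query/key/value projection (768 to 2304) in each block. The helper `lora_param_count` is hypothetical.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank

# Assumed architecture values (not taken from the notebook output).
n_layers, hidden, qkv_out, rank = 12, 768, 3 * 768, 4

adapter_params = n_layers * lora_param_count(hidden, qkv_out, rank)
head_params = hidden * 3  # weights of a bias-free three-class linear head

reported_trainable = 152_064    # from print_trainable_parameters() above
reported_total = 124_591_872

remainder = reported_trainable - adapter_params
print(f"adapter parameters: {adapter_params}")
print(f"remainder: {remainder} (head size: {head_params})")
print(f"trainable share: {100 * reported_trainable / reported_total:.3f}%")
```

Under these assumptions the adapters account for 147,456 of the 152,064 reported trainable parameters. The remainder of 4,608 is a multiple of the head size, which is consistent with the classification head being kept trainable, although exactly how PEFT counts it has not been verified here.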


In [10]:
training_args = TrainingArguments(
    output_dir="./results-peft",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",
    num_train_epochs=3,
    save_strategy="epoch",
    logging_dir="./logs",
    learning_rate=2e-4,
    save_total_limit=1,
    load_best_model_at_end=True,
)

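trainer = Trainer(
    model=model,
    args=training_args,
    # Train on a shuffled 2,000-example subset to keep the run short.
    train_dataset=tokenized_dataset["train"].shuffle(seed=42).select(range(2000)),
    # Evaluate after each epoch on the first 500 test examples.
    eval_dataset=tokenized_dataset["test"].select(range(500)),
    compute_metrics=compute_metrics,
)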

trainer.train()


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 1.124599        | 0.422    |
| 2     | 1.034500      | 0.978649        | 0.51     |
| 3     | 1.034500      | 0.898095        | 0.554    |


TrainOutput(global_step=750, training_loss=0.9541781819661458, metrics={'train_runtime': 552.0189, 'train_samples_per_second': 10.869, 'train_steps_per_second': 1.359, 'total_flos': 1570555035648000.0, 'train_loss': 0.9541781819661458, 'epoch': 3.0})

In [11]:
# Save the LoRA adapter weights and the tokenizer to the same directory.
model.save_pretrained("peftmodel-gpt2-sentiment")
tokenizer.save_pretrained("peftmodel-gpt2-sentiment")


('peftmodel-gpt2-sentiment/tokenizer_config.json',
 'peftmodel-gpt2-sentiment/special_tokens_map.json',
 'peftmodel-gpt2-sentiment/vocab.json',
 'peftmodel-gpt2-sentiment/merges.txt',
 'peftmodel-gpt2-sentiment/added_tokens.json',
 'peftmodel-gpt2-sentiment/tokenizer.json')

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [18]:
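from peft import AutoPeftModelForSequenceClassification

# Passing the three-entry id2label mapping gives the reloaded model a three-class head.
inference_model = AutoPeftModelForSequenceClassification.from_pretrained("peftmodel-gpt2-sentiment", id2label=id2label)
# GPT-2 has no padding token by default; reuse the end-of-sequence token, as before.
inference_model.config.pad_token_id = inference_model.config.eos_token_id
tokenizer = AutoTokenizer.from_pretrained("peftmodel-gpt2-sentiment")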


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [19]:
trainer = Trainer(
    model=inference_model,
    args=training_args,
    # Same 500-example test subset that was used for per-epoch evaluation during training.
    eval_dataset=tokenized_dataset["test"].select(range(500)),
    compute_metrics=compute_metrics,
)

peft_model_metrics = trainer.evaluate()
print("PEFT Model Metrics:", peft_model_metrics)


PEFT Model Metrics: {'eval_loss': 0.8980945944786072, 'eval_accuracy': 0.554, 'eval_runtime': 18.7539, 'eval_samples_per_second': 26.661, 'eval_steps_per_second': 3.359}


In [20]:
print("Base Model Accuracy:", base_model_metrics['eval_accuracy'])
print("PEFT Model Accuracy:", peft_model_metrics['eval_accuracy'])


Base Model Accuracy: 0.19545750569846956
PEFT Model Accuracy: 0.554
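
The two accuracies can be put side by side with a little arithmetic. Note that the base model was evaluated on the full test split (12,284 examples), while the PEFT model was evaluated on the first 500 test examples, so the comparison below is indicative, not strictly like-for-like. The numbers are copied from the outputs above.

```python
base_accuracy = 0.19545750569846956  # reported on the full test split (12,284 examples)
peft_accuracy = 0.554                # reported on the first 500 test examples

absolute_gain = peft_accuracy - base_accuracy
relative_gain = peft_accuracy / base_accuracy

# Number of correct predictions implied by each reported accuracy.
base_correct = round(base_accuracy * 12284)
peft_correct = round(peft_accuracy * 500)

print(f"Absolute gain: {absolute_gain * 100:.1f} percentage points")
print(f"Relative gain: {relative_gain:.2f}x")
print(f"Correct predictions: base {base_correct}/12284, PEFT {peft_correct}/500")
```

For a stricter comparison, one option would be to re-evaluate the base model on the same 500-example subset, or to evaluate the PEFT model on the full test split.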
