<a href="https://colab.research.google.com/github/rithwikgarapati/Cognizant-Peft-Fine-tuning/blob/main/LightweightFineTuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lightweight Fine-Tuning Project

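The choices made for this project:

* PEFT technique: LoRA (Low-Rank Adaptation)
* Model: GPT-2 (`gpt2`), loaded with `AutoModelForSequenceClassification` and a 5-label classification head
* Evaluation approach: accuracy on a held-out test subset, computed with the Hugging Face `Trainer`
* Fine-tuning dataset: Yelp review dataset (`Yelp/yelp_review_full`), 1 to 5 star ratings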

## Loading and Evaluating a Foundation Model

The cells below load the pre-trained GPT-2 model, a matching tokenizer, and the Yelp dataset, and evaluate the model before any fine-tuning.

In [1]:
!pip install datasets
from datasets import load_dataset

# Load the dataset.
# load_dataset returns a DatasetDict with a "train" split (650,000 reviews)
# and a "test" split (50,000 reviews). Each example has a "text" field and
# a "label" field from 0 to 4, corresponding to 1 to 5 stars.
dataset = load_dataset("Yelp/yelp_review_full")

# Shuffle each split with a fixed seed and keep small subsets so training stays fast.
train_ds = dataset["train"].shuffle(seed=42).select(range(1000))
test_ds = dataset["test"].shuffle(seed=42).select(range(500))



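Conceptually, `shuffle(seed=42).select(range(n))` permutes a split with a fixed seed and keeps the first `n` rows, so every run trains and evaluates on the same reviews. The sketch below illustrates that idea in plain Python on a toy list. `shuffled_subset` and `toy_split` are hypothetical names, and because the sketch uses Python's `random` module rather than the `datasets` library's own shuffling, it will not pick the same rows for the same seed.

```python
import random


def shuffled_subset(records, seed, n):
    """Return the first n records of a permutation fixed by seed (hypothetical helper)."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # private seeded generator: same seed, same order
    return [records[i] for i in indices[:n]]


# Toy stand-in for a Yelp split: (label, text) pairs with labels 0-4.
toy_split = [(i % 5, f"review {i}") for i in range(100)]

train_subset = shuffled_subset(toy_split, seed=42, n=10)
print(len(train_subset))                                          # 10
print(train_subset == shuffled_subset(toy_split, seed=42, n=10))  # True
```

Fixing the seed is what makes the before/after comparison fair: the baseline and the fine-tuned model are scored on exactly the same 500 test reviews.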

In [2]:
# Load the GPT-2 tokenizer and tokenize the review text.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# GPT-2 has no padding token, so reuse the end-of-sequence token for padding.
tokenizer.pad_token = tokenizer.eos_token

# Pre-process the datasets: tokenize them before training.
tokenized_ds = {}

# map(..., batched=True) applies the function to batches of examples at once.
# truncation=True cuts reviews longer than GPT-2's 1,024-token context.
tokenized_ds["train"] = train_ds.map(
    lambda x: tokenizer(x["text"], truncation=True), batched=True
)

tokenized_ds["test"] = test_ds.map(
    lambda x: tokenizer(x["text"], truncation=True), batched=True
)

print(tokenized_ds)


{'train': Dataset({
    features: ['label', 'text', 'input_ids', 'attention_mask'],
    num_rows: 1000
}), 'test': Dataset({
    features: ['label', 'text', 'input_ids', 'attention_mask'],
    num_rows: 500
})}
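The tokenized reviews have different lengths, and `DataCollatorWithPadding` pads each batch to its longest sequence at training and evaluation time. GPT-2's sequence-classification head then reads the hidden state at the last non-padding token. As far as we know, `transformers` locates that position by searching `input_ids` for `pad_token_id`, which is why the pad token has to be set on both the tokenizer and the model config. The plain-Python sketch below shows both steps, assuming right padding (the GPT-2 tokenizer's default) and made-up token ids; `pad_batch` and `last_token_index` are hypothetical helpers, not library functions.

```python
PAD_ID = 50256  # GPT-2's end-of-sequence token id, reused here as the pad token


def pad_batch(batch_ids, pad_id=PAD_ID):
    """Right-pad lists of token ids to the longest one and build attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def last_token_index(row, pad_id=PAD_ID):
    """Index of the last non-pad token in a right-padded row of input ids."""
    # Position just before the first pad token, or the final position if there is none.
    return row.index(pad_id) - 1 if pad_id in row else len(row) - 1


# Two made-up token sequences of different lengths.
batch = pad_batch([[464, 2057, 373, 1049], [40, 5465]])
print(batch["input_ids"])       # [[464, 2057, 373, 1049], [40, 5465, 50256, 50256]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 0, 0]]
print([last_token_index(row) for row in batch["input_ids"]])  # [3, 1]
```

Padding per batch rather than to a fixed length keeps short reviews cheap, which matters with a 1,024-token truncation limit.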


In [6]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

# Baseline: GPT-2 with a new, randomly initialized 5-class classification head.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=5)
model.config.pad_token_id = tokenizer.pad_token_id

# Pad each batch dynamically to the length of its longest sequence.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)


# Compute metrics function: accuracy of the highest-scoring class.
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

# Define the training arguments (this cell only evaluates, so the
# optimization settings are unused here).
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Evaluate the model before any fine-tuning
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Evaluation Results: {'eval_loss': 6.223862171173096, 'eval_accuracy': 0.224, 'eval_runtime': 11.1634, 'eval_samples_per_second': 44.789, 'eval_steps_per_second': 11.197}
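The `score.weight` warning above means the classification head starts from random weights, so the baseline accuracy of 0.224 should be read against chance. `yelp_review_full` contains the same number of reviews for each star rating, so guessing among five classes scores about 0.20 on a reasonably balanced sample. The sketch below redoes the `compute_metrics` calculation in plain Python on made-up logits, to show exactly what the reported accuracy measures; `accuracy_from_logits` is a hypothetical helper.

```python
def accuracy_from_logits(logits, labels):
    """Plain-Python version of compute_metrics: argmax per row, then mean of matches."""
    # max() returns the first index on ties, like np.argmax.
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Made-up logits for three reviews over five classes (labels 0-4 = 1-5 stars).
logits = [
    [2.0, 0.1, 0.0, -1.0, -2.0],  # predicted 0, true 0
    [0.0, 0.2, 1.5, 0.3, 0.1],    # predicted 2, true 3
    [0.1, 0.0, 0.0, 0.2, 3.0],    # predicted 4, true 4
]
labels = [0, 3, 4]
print(accuracy_from_logits(logits, labels))  # 2 of 3 correct

# Expected accuracy of uniform random guessing over five balanced classes.
NUM_CLASSES = 5
chance_accuracy = 1 / NUM_CLASSES
print(chance_accuracy)  # 0.2
```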


## Performing Parameter-Efficient Fine-Tuning

The cells below wrap the loaded model in LoRA adapters, train it, and save the PEFT model weights.

In [8]:
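!pip install peft
# Configure LoRA for sequence classification.
from peft import LoraConfig

config = LoraConfig(
    r=8,                  # rank of the low-rank update matrices
    lora_alpha=32,        # the update is scaled by lora_alpha / r
    lora_dropout=0.05,    # dropout on the LoRA input during training
    bias="none",          # keep all bias terms frozen
    task_type="SEQ_CLS",  # also keeps the classification head trainable
)
# No target_modules are given, so PEFT uses its default modules for GPT-2.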



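LoRA keeps each targeted pretrained weight matrix $W_0$ frozen and learns a low-rank update $BA$ alongside it, scaled by $\alpha/r$ (`lora_alpha / r`). Since the config sets no `target_modules`, PEFT falls back to its default for GPT-2, which we believe is the fused attention projection `c_attn` (768 inputs, 2,304 outputs); the trainable-parameter count printed by `print_trainable_parameters()` is consistent with that. A worked sketch with this notebook's settings ($r = 8$, $\alpha = 32$):

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}
$$

$$
\frac{\alpha}{r} = \frac{32}{8} = 4, \qquad r\,(k + d) = 8\,(768 + 2304) = 24{,}576 \ \text{trainable values per layer, versus}\ d\,k = 768 \cdot 2304 = 1{,}769{,}472.
$$

PEFT initializes $B$ to zeros by default, so $BA = 0$ and the wrapped model starts out identical to the base model; `lora_dropout` is applied to $x$ on the LoRA path only.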

In [9]:
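from peft import get_peft_model

# Reload a fresh copy of GPT-2, separate from the baseline model above.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=5)
model.config.pad_token_id = model.config.eos_token_id

# Wrap the model with LoRA adapters; only the adapters and the score head are trainable.
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()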

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 298,752 || all params: 124,742,400 || trainable%: 0.2395
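The printed counts can be reproduced by hand from GPT-2 small's published shapes (vocabulary 50,257, 1,024 positions, width 768, 12 layers). The sketch below assumes that LoRA is applied only to `c_attn`, and that for `SEQ_CLS` PEFT keeps the original `score` layer alongside a trainable copy, so the 768 x 5 head appears once in the trainable count but twice in the total. Both are our assumptions, chosen because they match the printed numbers; `gpt2_base_params` is a hypothetical helper.

```python
# GPT-2 small hyperparameters (from the public gpt2 config).
VOCAB, N_POS, D, N_LAYER = 50257, 1024, 768, 12
R, NUM_LABELS = 8, 5  # LoRA rank and number of star classes used in this notebook


def gpt2_base_params():
    """Parameter count of GPT2Model: embeddings, 12 blocks, final LayerNorm."""
    ln = 2 * D                                    # LayerNorm weight + bias
    attn = (D * 3 * D + 3 * D) + (D * D + D)      # c_attn + attention c_proj
    mlp = (D * 4 * D + 4 * D) + (4 * D * D + D)   # mlp c_fc + mlp c_proj
    block = 2 * ln + attn + mlp
    return VOCAB * D + N_POS * D + N_LAYER * block + ln


lora = N_LAYER * (R * D + 3 * D * R)  # A: r x 768 and B: 2304 x r, on c_attn only
head = D * NUM_LABELS                 # score layer (no bias)
trainable = lora + head
# Assumption: the original score layer and its trainable copy are both counted.
total = gpt2_base_params() + lora + 2 * head

print(f"trainable params: {trainable:,} || all params: {total:,} || "
      f"trainable%: {100 * trainable / total:.4f}")
```

Under these assumptions the script prints the same line as `print_trainable_parameters()`, and it shows where the budget goes: 294,912 LoRA values plus a 3,840-value classification head.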




In [10]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)


# Compute metrics function
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./peft/results",
    learning_rate=2e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Start training
trainer.train()

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)



| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 1.181783        | 0.446    |


Evaluation Results: {'eval_loss': 1.1817834377288818, 'eval_accuracy': 0.446, 'eval_runtime': 11.318, 'eval_samples_per_second': 44.177, 'eval_steps_per_second': 11.044, 'epoch': 1.0}
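The `No log` entry in the training-loss column follows from the step arithmetic: with 1,000 training reviews, a batch size of 4, and one epoch, the run takes 250 optimizer steps, while `TrainingArguments` logs the training loss every 500 steps by default. The sketch assumes a single device and no gradient accumulation. Passing a smaller `logging_steps` (for example `logging_steps=50`) to `TrainingArguments` should make the loss appear.

```python
import math

TRAIN_EXAMPLES = 1000  # size of the train_ds subset
BATCH_SIZE = 4         # per_device_train_batch_size, assuming one device
GRAD_ACCUM = 1         # gradient_accumulation_steps default (not set in the notebook)
EPOCHS = 1
LOGGING_STEPS = 500    # default logging interval of TrainingArguments

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / (BATCH_SIZE * GRAD_ACCUM))
total_steps = steps_per_epoch * EPOCHS
print(total_steps)                   # 250
print(total_steps >= LOGGING_STEPS)  # False: the loss is never logged, hence "No log"
```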


In [11]:
# Save the LoRA adapter weights (including the trained score head) to ./gpt-lora
lora_model.save_pretrained("gpt-lora")


## Performing Inference with a PEFT Model

The cells below load the saved PEFT model weights, run inference with them, and compare the results with the baseline evaluated before fine-tuning.

In [14]:
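from peft import AutoPeftModelForSequenceClassification

# Load the base GPT-2 model and apply the saved LoRA adapter from ./gpt-lora.
# The "newly initialized" warning below comes from loading the base model;
# the trained score head saved with the adapter then replaces those weights.
lora_model = AutoPeftModelForSequenceClassification.from_pretrained("gpt-lora", num_labels=5)
# The base model config is reloaded from "gpt2", so set the padding token id again.
lora_model.config.pad_token_id = tokenizer.pad_token_id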


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [15]:
import torch

# Move the model to the GPU if one is available and switch to inference mode.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
lora_model.to(device)
lora_model.eval()


def predict(sentence: str) -> int:
    """Return the predicted class index (0-4, i.e. 1-5 stars) for one review."""
    # Prepare the input text, truncating to GPT-2's 1,024-token context
    inputs = tokenizer(sentence, truncation=True, return_tensors="pt").to(device)

    # Get predictions without tracking gradients
    with torch.no_grad():
        logits = lora_model(**inputs).logits

    probabilities = torch.nn.functional.softmax(logits, dim=-1)
    return probabilities.argmax(dim=-1).item()

# Example usage
sentence = "This is absolutely terrible place to come and dine. I hated every second of my experience here."
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'")
print(f"Predicted label: {predicted_label}")

Sentence: 'This is absolutely terrible place to come and dine. I hated every second of my experience here.'
Predicted label: 0
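`predict` returns a class index, not a star rating: in `yelp_review_full`, label 0 is a 1-star review and label 4 a 5-star review, so the `0` above means the model rated the sentence as 1 star. Because softmax is monotonic, the argmax of the probabilities is the same class as the argmax of the raw logits; the probabilities are only needed if you want a confidence score. A plain-Python sketch of that post-processing, with made-up logits and hypothetical helper names:

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_to_stars(label):
    """yelp_review_full labels 0-4 correspond to 1-5 star ratings."""
    return label + 1


logits = [3.1, 1.2, -0.4, -1.5, -2.0]  # made-up scores for a very negative review
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print(pred, label_to_stars(pred))  # 0 1
print(f"confidence: {probs[pred]:.2f}")
```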


In [None]:
# Compare predictions with the true labels for the first five test reviews.
for i in range(5):
    sentence = test_ds[i]["text"]
    predicted_label = predict(sentence)
    print(f"Sentence: '{sentence}'")
    print(f"Predicted label: {predicted_label}")
    print(f"True label: {test_ds[i]['label']}")

Sentence: 'Kabuto is your run-of-the-mill Japanese Steakhouse. Different stations with chefs slinging shrimp tails around the communal dining areas like it's a lunchtime magic show. Always a plethora of laughs and gags going around the group. \n\nThis place is great for lunch. $9 and 30 minutes and you're out the door. Uhhh...If I'm craving a salad with ginger dressing, which I always am, (you do too. admit it) fried rice, steak, shrimp and white sauce (DUDE) then Kabuto is king of lunch options in my book. Always super clean and full of kindhearted staff. The parking lot is super difficult to get in and out of though. 51 traffic at lunch is a beast. Good luck getting stuck behind someone trying to cut across traffic at 12pm on a weekday. It's murder. This place would greatly benefit from another exit/entrance or a stoplight. Here's hoping....\n\nYou can't really shake a stick at balanced lunch when you can have soup or a salad, veggies, fried rice and a choice of a protein for under $

In [16]:
"""
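Overall statistics (500-review test subset):

Metric                Before fine-tuning        After LoRA fine-tuning
-----------------------------------------------------------------------
1. Evaluation loss    6.2239                    1.1818
2. Accuracy           0.224                     0.446


Conclusion:
1. LoRA fine-tuning raised accuracy from 0.224 to 0.446, an absolute gain of
   22.2 percentage points, while training only about 0.24% of the parameters.
   The baseline was close to the 0.20 expected from guessing among five classes.
2. For classifying Yelp reviews on a 1-5 star scale, the PEFT model was clearly
   helpful, even after a single epoch on 1,000 training reviews.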
"""



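The size of the improvement can be stated in two ways, and it helps to keep them apart: in absolute terms accuracy rose by 22.2 percentage points, which relative to the baseline is roughly a doubling. A short check using the two accuracies reported by `trainer.evaluate()` (0.224 before and 0.446 after fine-tuning):

```python
before_acc, after_acc = 0.224, 0.446  # eval_accuracy before and after LoRA fine-tuning

absolute_gain_pp = (after_acc - before_acc) * 100  # difference in percentage points
relative_gain = after_acc / before_acc             # ratio of the two accuracies

print(f"{absolute_gain_pp:.1f} percentage points")  # 22.2 percentage points
print(f"{relative_gain:.2f}x")                      # 1.99x
```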