<a href="https://colab.research.google.com/github/sujoy1968/SBGenAIUdacity/blob/main/PEFT_SB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Lightweight Fine-Tuning Project**
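This project applies parameter-efficient fine-tuning (PEFT) to a small language model for sentiment classification of movie reviews.

* PEFT technique: LoRA (low-rank adaptation) with `r=16`, `lora_alpha=32`, and `lora_dropout=0.1`
* Model: GPT-2 (`gpt2`), loaded with a two-label sequence classification head
* Evaluation approach: loss and accuracy on the test split via the Hugging Face `Trainer.evaluate()`, before and after fine-tuning
* Fine-tuning dataset: `rotten_tomatoes` (8,530 train, 1,066 validation, and 1,066 test examples)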


## Loading and Evaluating a Foundation Model
The cells below load the pre-trained GPT-2 model from Hugging Face, together with its tokenizer and the dataset, and evaluate the model's performance prior to fine-tuning.

## Loading Dependencies

In [None]:
!pip install -q transformers
!pip install -q peft
!pip install -q evaluate
!pip install -U datasets

## Download the dataset from Hugging Face
Download the `rotten_tomatoes` dataset from Hugging Face. This dataset has the following splits:

1. train
2. validation
3. test




In [None]:
from datasets import load_dataset

dataset = load_dataset("rotten_tomatoes")


In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})

In [None]:
dataset["train"]

Dataset({
    features: ['text', 'label'],
    num_rows: 8530
})

## Pre-process datasets
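Convert the text in every split to tokens. `tokenize_function` is applied in batches with `Dataset.map`, padding or truncating each example to 128 tokens. GPT-2 has no padding token of its own, so the end-of-sequence token is reused for padding.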

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "gpt2"
)

tokenizer.pad_token = tokenizer.eos_token
tokenizer.deprecation_warnings["Asking-to-pad-a-fast-tokenizer"] = True

def tokenize_function(examples):
    return tokenizer(examples["text"], padding='max_length', truncation=True, max_length=128)

splits = ["train", "validation",  "test"]

tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        tokenize_function, batched=True
    )
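
To make the effect of `padding='max_length'` and `truncation=True` concrete, here is a minimal pure-Python sketch of the same idea applied to a list of token IDs. The helper `pad_or_truncate` is hypothetical (it is not part of `transformers`); the pad ID 50256 is GPT-2's end-of-sequence token, which the cell above reuses for padding.

```python
# Hypothetical helper illustrating max_length padding and truncation;
# the real work is done inside the Hugging Face tokenizer.
GPT2_EOS_ID = 50256  # GPT-2's end-of-sequence token ID, reused here as the pad ID


def pad_or_truncate(token_ids, max_length, pad_id=GPT2_EOS_ID):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                    # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)                   # padding needed for short sequences
    input_ids = kept + [pad_id] * n_pad              # pad on the right
    attention_mask = [1] * len(kept) + [0] * n_pad   # 0 marks padding positions
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([11, 22, 33], max_length=5)
long_ids, long_mask = pad_or_truncate(list(range(10)), max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padded positions, which matters here because the pad token and the end-of-sequence token share one ID.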



## Download the pre-trained model from Hugging Face
Download the GPT-2 model from Hugging Face. We will use this model for sentiment analysis of movie reviews from Rotten Tomatoes.

In [None]:
from transformers import AutoModelForSequenceClassification


pt_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

pt_model.config.pad_token_id = pt_model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
print(pt_model)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)


## Evaluate GPT-2 using HuggingFace Trainer API

In [None]:
import torch
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


training_args =  TrainingArguments(
    output_dir="./results",
    per_device_eval_batch_size=8,
    do_eval=True,
    logging_dir="./logs",
)

trainer = Trainer(
    model=pt_model,
    args=training_args,
    eval_dataset = tokenized_dataset["test"],
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

In [None]:
# Evaluate the model
eval_results = trainer.evaluate()

print(f"Pre-trained GPT-2 evaluation results: {eval_results}")

Pre-trained GPT-2 evaluation results: {'eval_loss': 2.308474540710449, 'eval_accuracy': 0.50093808630394, 'eval_runtime': 11.509, 'eval_samples_per_second': 92.623, 'eval_steps_per_second': 11.643}
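
The `compute_metrics` function reduces each row of logits to a predicted class with `argmax` and compares it with the label. A small NumPy sketch on made-up logits (the arrays below are illustrative, not model output) shows the calculation, along with the loss a classifier would get by always predicting 50/50, which is ln 2 for two classes.

```python
import math

import numpy as np

# Illustrative logits for four examples and two classes (not real model output)
toy_logits = np.array([[2.0, 0.5], [0.1, 1.2], [0.3, 0.2], [1.5, 1.6]])
toy_labels = np.array([0, 1, 1, 1])

toy_predictions = np.argmax(toy_logits, axis=1)   # predicted class per row
toy_accuracy = float((toy_predictions == toy_labels).mean())

# Cross-entropy of a uniform guess over two classes
uniform_loss = -math.log(0.5)

print(toy_predictions, toy_accuracy, round(uniform_loss, 3))
```

Against that reference, the pre-trained model's accuracy of about 0.50 is what guessing would give on a balanced two-class task, and its loss of 2.31 is well above ln 2 (about 0.693). Both are consistent with the newly initialized `score` head reported in the loading warning.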


## Performing Parameter-Efficient Fine-Tuning
The cells below create a PEFT model from the loaded GPT-2 model, run a training loop, and save the PEFT model weights.

### Configure LoRA for the GPT-2 model

In [None]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
)
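
As a rough illustration of what these settings control, the following NumPy sketch applies a LoRA-style update to a single weight matrix: the frozen weight `W` is left untouched and a low-rank product `B @ A`, scaled by `lora_alpha / r`, is added on top. The shapes and random values are assumptions chosen for the example, and dropout is omitted; this is a sketch of the idea, not PEFT's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 768        # illustrative layer size (GPT-2's hidden width)
r, lora_alpha = 16, 32        # same rank and scaling factor as the LoraConfig above
scaling = lora_alpha / r

W = rng.normal(size=(d_out, d_in))        # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init (assumed)
B = np.zeros((d_out, r))                  # trainable, zero init

x = rng.normal(size=(d_in,))

base_out = W @ x
lora_out = W @ x + scaling * (B @ (A @ x))   # base output plus scaled low-rank update

# With B initialized to zero, the adapted layer starts out identical to the base layer
print(np.allclose(base_out, lora_out))
```

Only `A` and `B` receive gradients during training, which is why the number of trainable parameters stays small relative to the full model.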

In [None]:
lora_model = get_peft_model(pt_model, lora_config)

# check the number of trainable parameters

lora_model.print_trainable_parameters()

trainable params: 589,824 || all params: 125,031,168 || trainable%: 0.4717
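
The trainable count can be reproduced by hand under the assumption that LoRA was attached only to each block's fused attention projection `c_attn` (768 inputs, 3 × 768 = 2,304 outputs), which appears to be PEFT's default target for GPT-2 when `target_modules` is not set. Each adapted layer then adds one `r × 768` matrix and one `2304 × r` matrix.

```python
# Assumed shapes: GPT-2 small has 12 blocks, hidden size 768,
# and a fused query/key/value projection c_attn mapping 768 -> 2304.
n_layers = 12
hidden = 768
c_attn_out = 3 * hidden
r = 16

lora_a = r * hidden          # A: maps 768 inputs down to rank r
lora_b = c_attn_out * r      # B: maps rank r up to 2304 outputs
trainable = n_layers * (lora_a + lora_b)

all_params = 125_031_168     # total reported by print_trainable_parameters()
print(f"{trainable:,} trainable ({100 * trainable / all_params:.4f}%)")

score_head = hidden * 2      # weights in the 768 -> 2 classification head
print(f"{score_head:,} weights in the score head")
```

The result matches the 589,824 reported above. It also suggests that the 1,536 weights of the `score` head are not among the trainable parameters; if the head should be trained and saved together with the adapter, setting `task_type="SEQ_CLS"` in `LoraConfig` is the usual way to request that in PEFT.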


## Create Training Loop

In [None]:
lora_training_args =  TrainingArguments(
    output_dir="./results_lora",
    learning_rate = 5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    save_strategy="epoch",
    num_train_epochs=3,
    logging_dir="./logs_lora",
    logging_steps=10,
)

# Rename "label" to "labels", the argument name the model's forward pass
# expects, in the train, validation, and test sets
tokenized_dataset_lora = dataset.map(
    tokenize_function,
    batched=True,
    # remove_columns=["text"],
)

#tokenized_dataset_lora.rename_column('label', 'labels')
train_lora = tokenized_dataset_lora['train'].rename_column('label', 'labels')
eval_lora = tokenized_dataset_lora['validation'].rename_column('label', 'labels')
test_lora = tokenized_dataset_lora['test'].rename_column('label', 'labels')

train_lora



Dataset({
    features: ['text', 'labels', 'input_ids', 'attention_mask'],
    num_rows: 8530
})
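
With these arguments, the length of the run can be estimated ahead of time. The sketch below assumes a single device and no gradient accumulation (neither is stated in the notebook), so one optimizer step consumes one batch of 8 examples.

```python
import math

train_examples = 8530        # size of the train split
batch_size = 8               # per_device_train_batch_size
num_epochs = 3               # num_train_epochs
logging_steps = 10

steps_per_epoch = math.ceil(train_examples / batch_size)  # the last batch is partial
total_steps = steps_per_epoch * num_epochs
log_events = total_steps // logging_steps                 # training-loss log entries

print(steps_per_epoch, total_steps, log_events)
```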

In [None]:
# Initialize the Trainer
lora_trainer = Trainer(
    model=lora_model,
    args=lora_training_args,
    train_dataset = train_lora,
    eval_dataset = eval_lora,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# Train the model

lora_trainer.train()

# Save the LoRA Model
lora_model.save_pretrained("gpt2-lora")

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.5388        | No log          |
| 2     | 0.3793        | No log          |
| 3     | 0.3652        | No log          |




## Performing Inference with a PEFT Model
The cells below load the saved PEFT model weights, evaluate the fine-tuned model on the test split, and compare the results with those from before fine-tuning.

### Load saved LoRA model

In [None]:
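from peft import AutoPeftModelForSequenceClassification

# Load the GPT-2 base model with a classification head and attach the saved LoRA adapter
lora_model = AutoPeftModelForSequenceClassification.from_pretrained("gpt2-lora")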

lora_model.config.pad_token_id = lora_model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Evaluate LoRA model with Rotten Tomatoes dataset

In [None]:
training_args =  TrainingArguments(
    output_dir="./results_lora",
    per_device_eval_batch_size=8,
    do_eval=True,
    logging_dir="./logs_lora",
)


trainer = Trainer(
    model=lora_model,
    args=training_args,
    eval_dataset = test_lora,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# Evaluate LoRA model

lora_eval_results=trainer.evaluate()

print(f"LoRA fine-tuned GPT-2 evaluation results: {lora_eval_results}")

LoRA fine-tuned GPT-2 evaluation results: {'eval_loss': 0.46442681550979614, 'eval_accuracy': 0.7889305816135085, 'eval_runtime': 8.8566, 'eval_samples_per_second': 120.362, 'eval_steps_per_second': 15.13}


### Compare the evaluation results of the LoRA fine-tuned model with the original GPT-2

In [None]:
def compare_results(original_results, finetuned_results):
    print("\nComparison of Original and LoRA Fine-Tuned Model Performance:")
    for key in original_results.keys():
        if key in finetuned_results:
            print(f"{key}:")
            print(f"  Original: {original_results[key]}")
            print(f"  LoRA Fine-Tuned: {finetuned_results[key]}")

compare_results(eval_results, lora_eval_results)


Comparison of Original and LoRA Fine-Tuned Model Performance:
eval_loss:
  Original: 2.308474540710449
  LoRA Fine-Tuned: 0.46442681550979614
eval_accuracy:
  Original: 0.50093808630394
  LoRA Fine-Tuned: 0.7889305816135085
eval_runtime:
  Original: 11.509
  LoRA Fine-Tuned: 8.8566
eval_samples_per_second:
  Original: 92.623
  LoRA Fine-Tuned: 120.362
eval_steps_per_second:
  Original: 11.643
  LoRA Fine-Tuned: 15.13
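
To summarize the comparison as a single number per metric, a small helper can report the change from the original to the fine-tuned model. `metric_deltas` is a hypothetical helper, and the dictionaries below copy the loss and accuracy values printed above.

```python
def metric_deltas(original, finetuned, keys=("eval_loss", "eval_accuracy")):
    """Return finetuned minus original for each selected metric."""
    return {key: finetuned[key] - original[key] for key in keys}


# Values copied from the evaluation output above
original = {"eval_loss": 2.308474540710449, "eval_accuracy": 0.50093808630394}
finetuned = {"eval_loss": 0.46442681550979614, "eval_accuracy": 0.7889305816135085}

deltas = metric_deltas(original, finetuned)
for key, change in deltas.items():
    print(f"{key}: {change:+.4f}")
```

Accuracy rises by about 28.8 percentage points (from 50.1% to 78.9%) and the evaluation loss drops by about 1.84. The runtime and throughput figures depend on the hardware and session, so they are left out of the summary.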
