# Lightweight Fine-Tuning Project

In this cell, describe your choices for each of the following

* PEFT technique: LoRA, chosen for its efficiency: it adds trainable low-rank matrices to the attention layers while keeping the base model frozen.
* Model: GPT-2, chosen because its pretrained language knowledge adapts well to text classification tasks such as spam detection.
* Evaluation approach: The Hugging Face Trainer is used because it provides an optimized loop for fine-tuning transformer models, with built-in support for distributed training, mixed precision, and evaluation. Accuracy on a held-out test split is the reported metric.
* Fine-tuning dataset: ucirvine/sms_spam is a well-labeled dataset of SMS messages marked as spam or not spam. The classes are imbalanced (most messages are not spam), so accuracy should be read against the majority-class rate.
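
To make the LoRA choice above concrete, here is a minimal sketch of a low-rank update written with plain Python lists. The toy dimensions, the `matmul` helper, and the values are illustrative assumptions, not the PEFT library's implementation; the `alpha / r` scaling follows the usual LoRA formulation.

```python
# Illustrative sketch of a LoRA-style update (not the PEFT implementation).
# The frozen weight W stays fixed; only the small matrices A and B would be trained.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, W, A, B, alpha, r):
    """Return x @ W + (alpha / r) * (x @ A @ B) for row-vector inputs x."""
    base = matmul(x, W)                   # frozen path
    update = matmul(matmul(x, A), B)      # low-rank trainable path
    scale = alpha / r
    return [
        [b + scale * u for b, u in zip(base_row, update_row)]
        for base_row, update_row in zip(base, update)
    ]

d_in, d_out, r, alpha = 6, 6, 2, 8        # hypothetical toy sizes
W = [[0.1 * (i + j) for j in range(d_out)] for i in range(d_in)]
A = [[0.01 * (i - j) for j in range(r)] for i in range(d_in)]   # d_in x r
B = [[0.0] * d_out for _ in range(r)]     # r x d_out, zero-initialised

x = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
y = lora_forward(x, W, A, B, alpha, r)

full_params = d_in * d_out                # parameters in the frozen matrix
lora_params = r * (d_in + d_out)          # parameters in A and B together
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the number of trainable values grows with `r * (d_in + d_out)` rather than `d_in * d_out`.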

## Loading and Evaluating a Foundation Model

In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [5]:
import os

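# Debugging: make CUDA kernel launches synchronous so errors surface at the failing call
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
# Debugging: request CUDA device-side assertions (DSA); this is mainly a build-time
# option for PyTorch, so setting it here may have no effect on a prebuilt install
os.environ['TORCH_USE_CUDA_DSA'] = "1"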

In [6]:
from transformers import AutoTokenizer

# Load the tokenizer that matches the pretrained model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2 has no padding token by default, so reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["sms"], padding=True, truncation=True)
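
As a rough illustration of what `padding=True` does once the end-of-sequence token is reused for padding, the sketch below right-pads a batch to its longest sequence and builds the matching attention masks. The `pad_batch` helper and the short token ids are hypothetical; 50256 is GPT-2's end-of-sequence id.

```python
PAD_ID = 50256  # GPT-2's end-of-sequence id, reused here as the padding id

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad each sequence to the longest length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that the model should ignore
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 102, 103], [201]])  # hypothetical token ids
```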


In [7]:
from datasets import load_dataset

# The sms_spam dataset only has a train split, so we use the train_test_split method to split it into train and test
dataset = load_dataset("sms_spam", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

In [8]:
# Tokenize train and test sets
train_dataset = dataset["train"].map(tokenize, batched=True)
test_dataset = dataset["test"].map(tokenize, batched=True)

In [7]:
from transformers import AutoModelForSequenceClassification

# Load pre-defined model with custom output labels
foundation_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
# Make sure the model configuration uses the padding token that was set on the tokenizer
foundation_model.config.pad_token_id = tokenizer.pad_token_id

print(foundation_model)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)
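
The `score` head in the printout classifies from a single position: GPT-2's sequence-classification model uses the hidden state of the last non-padding token, which is why `pad_token_id` has to be set in the config. The sketch below shows the idea of locating that position; the helper name and token ids are hypothetical, and the library's actual implementation differs in detail.

```python
def last_non_pad_index(input_ids, pad_id):
    """Return the index of the last token that is not padding, or -1 if all are padding."""
    for idx in range(len(input_ids) - 1, -1, -1):
        if input_ids[idx] != pad_id:
            return idx
    return -1

PAD_ID = 50256  # GPT-2's end-of-sequence id, reused as padding in this notebook
padded_row = [201, 202, PAD_ID, PAD_ID]           # hypothetical right-padded input
pooled_position = last_non_pad_index(padded_row, PAD_ID)
```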


In [8]:
import torch

# Get predictions from the foundation model (before fine-tuning)

# Select the device and move the model once, outside the loop
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
foundation_model.to(device)
foundation_model.eval()  # disable dropout for evaluation

predictions = []
labels = []
for example in test_dataset:
    # Tokenize the input text
    inputs = tokenizer(example["sms"], return_tensors="pt").to(device)

    # Get raw predictions
    with torch.no_grad():
        logits = foundation_model(**inputs).logits

    # Softmax is monotonic, so the argmax of the logits already gives the predicted class
    predicted_class_id = logits.argmax(dim=-1).item()

    predictions.append(predicted_class_id)
    labels.append(example["label"])

In [9]:
from sklearn.metrics import accuracy_score

def compute_metrics(labels, preds):
    acc = accuracy_score(labels, preds)
    return {"accuracy": acc}

# Compute evaluation metrics
evaluation_metrics = compute_metrics(labels, predictions)
print(evaluation_metrics)

{'accuracy': 0.8834080717488789}
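
Because most messages in this dataset are not spam, an accuracy figure is easier to interpret next to a majority-class baseline and a per-class recall. The following is a small sketch with hypothetical toy labels; the helper names are assumptions, not part of the notebook.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def recall(labels, preds, positive=1):
    """Fraction of positive examples that were predicted positive."""
    positives = [p for l, p in zip(labels, preds) if l == positive]
    return sum(p == positive for p in positives) / len(positives) if positives else 0.0

toy_labels = [0] * 9 + [1]   # hypothetical: 90% "not spam", 10% "spam"
toy_preds = [0] * 10         # a model that never predicts "spam"

toy_baseline = majority_baseline_accuracy(toy_labels)
toy_spam_recall = recall(toy_labels, toy_preds)
```

Calling the same helpers on the notebook's `labels` and `predictions` lists would show how far the reported accuracy sits from the majority-class rate.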


## Performing Parameter-Efficient Fine-Tuning

In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [4]:
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification
from transformers import AutoModelForSequenceClassification

# PEFT model configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=4,
    lora_alpha=16,
    lora_dropout=0.1
)

# Load again the pre-defined foundation model with custom output labels
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
model.config.pad_token_id = model.config.eos_token_id

peft_model = PeftModelForSequenceClassification(model, peft_config)

# print_trainable_parameters() prints the summary itself and returns None
peft_model.print_trainable_parameters()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 148,992 || all params: 124,590,336 || trainable%: 0.1196
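
The trainable-parameter count above can be reproduced by hand. The sketch assumes that LoRA adapts only the `c_attn` projection in each of the 12 blocks and that the `score` classification head is also trained; both are assumptions about the library's defaults, though the arithmetic matches the printed figure.

```python
n_layers = 12           # GPT2Block count from the printed model
d_model = 768           # c_attn input width (nx)
d_attn_out = 2304       # c_attn output width (nf)
r = 4                   # LoRA rank from LoraConfig
num_labels = 2

lora_per_layer = r * (d_model + d_attn_out)   # A: 768 x 4 plus B: 4 x 2304
lora_total = n_layers * lora_per_layer
score_head = d_model * num_labels             # Linear(768, 2, bias=False)
trainable = lora_total + score_head

all_params = 124_590_336                      # from the printed summary
trainable_pct = round(100 * trainable / all_params, 4)
print(trainable, trainable_pct)
```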




In [9]:
import numpy as np
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

In [10]:
import torch

# Clears unused GPU memory
torch.cuda.empty_cache()
# Resets peak memory tracking for profiling
torch.cuda.reset_peak_memory_stats()

In [11]:
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
import numpy as np

peft_training_args = TrainingArguments(
    output_dir="./results/peft_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs/peft_model',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

peft_trainer.train()


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.9781 | 0.762144 | 0.869058 |


TrainOutput(global_step=140, training_loss=0.950489262172154, metrics={'train_runtime': 287.0679, 'train_samples_per_second': 15.533, 'train_steps_per_second': 0.488, 'total_flos': 588140786128896.0, 'train_loss': 0.950489262172154, 'epoch': 1.0})
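
The step count in the output follows from the batch size, and `warmup_ratio` decides how many of those steps ramp the learning rate up. Below is a sketch of that arithmetic, assuming a single device and a linear warmup followed by linear decay; the training-set size is inferred from the reported throughput and runtime, not stated in the notebook.

```python
import math

train_examples = 4459    # inferred: 15.533 samples/s * 287.0679 s, rounded
batch_size = 32          # per_device_train_batch_size, one device assumed
epochs = 1
warmup_ratio = 0.1
base_lr = 2e-5

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_schedule_lr(step):
    """Linear warmup to base_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(total_steps, warmup_steps, linear_schedule_lr(warmup_steps))
```

Under these assumptions the total matches `global_step=140`, and only about a tenth of the run happens before the learning rate reaches its peak.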

In [12]:
# Evaluate the model
evaluation_results = peft_trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Evaluation Results: {'eval_loss': 0.7621443271636963, 'eval_accuracy': 0.8690582959641255, 'eval_runtime': 18.2539, 'eval_samples_per_second': 61.083, 'eval_steps_per_second': 1.917, 'epoch': 1.0}
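
To compare this result with the accuracy measured before fine-tuning, the two printed figures can be put side by side. This is a small sketch using only numbers from the outputs in this notebook; the test-set size is inferred from the evaluation throughput and runtime.

```python
baseline_accuracy = 0.8834080717488789   # foundation model, printed before fine-tuning
peft_accuracy = 0.8690582959641255       # eval_accuracy after one LoRA epoch
test_examples = 1115                     # inferred: 61.083 samples/s * 18.2539 s, rounded

baseline_correct = round(baseline_accuracy * test_examples)
peft_correct = round(peft_accuracy * test_examples)
delta = peft_accuracy - baseline_accuracy

print(
    f"baseline: {baseline_correct}/{test_examples}, "
    f"PEFT: {peft_correct}/{test_examples}, delta: {delta:+.4f}"
)
```

With these figures the PEFT model after one epoch scores slightly below the untuned baseline on this split.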


In [13]:
# save the model locally
peft_model.save_pretrained('model/peft_model')

## Performing Inference with a PEFT Model

In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [14]:
from peft import AutoPeftModelForSequenceClassification

# Load the model from the local machine
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "model/peft_model",
    num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
inference_model.config.pad_token_id = inference_model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [15]:
import torch

def predict(prompt: str) -> str:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")    
    inference_model.to(device)

    # Prepare the input text
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Get predictions
    with torch.no_grad():
        outputs = inference_model(**inputs)
        logits = outputs.logits        

    probabilities = torch.nn.functional.softmax(logits, dim=1)    
    predicted_class_id = probabilities.argmax().item()    
    # Use the label mapping stored in the model config (0: "not spam", 1: "spam")
    predicted_label = inference_model.config.id2label[predicted_class_id]

    return predicted_label

In [19]:

# Non spam inference example
prompt = "Yup next stop."
predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted label: {predicted_label}")

Prompt: 'Yup next stop.'
Predicted label: not spam


In [17]:
# Spam inference example
prompt = "SMS. ac Sptv: The New Jersey Devils and the Detroit Red Wings play Ice Hockey. Correct or Incorrect? End? Reply END SPTV"
predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted label: {predicted_label}")

Prompt: 'SMS. ac Sptv: The New Jersey Devils and the Detroit Red Wings play Ice Hockey. Correct or Incorrect? End? Reply END SPTV'
Predicted label: not spam
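
The decoding step inside `predict` can be illustrated without a model: convert logits to probabilities, take the largest, and map the class id through the same `id2label` dictionary that was passed when the model was loaded. The logits below are hypothetical and the helper names are assumptions.

```python
import math

ID2LABEL = {0: "not spam", 1: "spam"}   # same mapping passed to from_pretrained

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, id2label=ID2LABEL):
    """Return the predicted label and its probability."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return id2label[class_id], probs[class_id]

label, confidence = decode([0.2, 1.7])   # hypothetical logits
```

Keeping a single source for the mapping avoids the class ids and label names drifting apart between training and inference.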
