# Lightweight Fine-Tuning Project

In this cell, describe your choices for each of the following

* PEFT technique: 
  - **LoRA (Low-Rank Adaptation)**: Efficient fine-tuning method that reduces memory and computation requirements by adding lightweight, trainable matrices to the model's weights.

* Model: 
  - **BERT (bert-base-uncased)**: A foundational transformer model, suitable for text classification tasks and compatible with my GPU (laptop 4070).

* Evaluation approach: 
  - **Accuracy and F1-Score**: Evaluate performance using these metrics. Accuracy provides an overall measure of correctness, while the F1-score balances precision and recall, especially useful for imbalanced datasets.
  - **Cross-validation**: Perform k-fold cross-validation to ensure robust evaluation and reduce overfitting risks.

* Fine-tuning dataset: 
  - **IMDb Movie Reviews Dataset**: A benchmark dataset for sentiment analysis. It consists of 50,000 movie reviews categorized into "positive" and "negative" sentiment labels, suitable for sequence classification tasks.


## Loading and Evaluating a Foundation Model

In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftModel
from torch.utils.data import DataLoader
from torch.nn.functional import softmax
from sklearn.metrics import accuracy_score
from transformers import AdamW, get_linear_schedule_with_warmup
from tqdm import tqdm
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [2]:
# Define model and tokenizer
MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Here I use the AutoModelForSequenceClassification method to get a model with a classification head (2 labels).
# This classification head is used on the [CLS] Token, which represents the entire input sequence.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)  # Binary classification
model.to(device)

# Load IMDb Dataset
dataset = load_dataset("imdb")
test_dataset = dataset['test']

# Preprocess the data
def preprocess_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512)

encoded_test = test_dataset.map(preprocess_function, batched=True)
encoded_test.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

# Create DataLoader for evaluation
test_dataloader = DataLoader(encoded_test, batch_size=16)

# Define evaluation function
def evaluate(model, dataloader):
    model.eval()
    total, correct = 0, 0
    predictions, true_labels = [], []

    with torch.no_grad():
        for batch in tqdm(dataloader, desc="Evaluating"):
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["label"].to(device)  # Ensure the key is 'labels'

            outputs = model(input_ids, attention_mask=attention_mask)
            logits = outputs.logits
            # No need for softmax here because the logits have the same relative values
            preds = torch.argmax(logits, dim=1)
            # Alternatively, if you want to use softmax:
            # preds = torch.argmax(F.softmax(logits, dim=1), dim=1)

            predictions.extend(preds.cpu().tolist())
            true_labels.extend(labels.cpu().tolist())
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    accuracy = correct / total
    return accuracy, predictions, true_labels


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
# Evaluate the model on the test dataset
model.to(device)

print("Evaluating the pre-trained model on IMDb (large) test set...")
accuracy, predictions, true_labels = evaluate(model, test_dataloader)
print(f"Accuracy before fine-tuning: {accuracy:.4f}")

Evaluating the pre-trained model on IMDb test set...


Evaluating: 100%|███████████████████████████████████████████████████████████████████| 1563/1563 [04:39<00:00,  5.58it/s]

Accuracy before fine-tuning: 0.4469





We can see that the default accuracy is not very high. Only less 50%. This needs fine-tuning!

## Performing Parameter-Efficient Fine-Tuning

In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [None]:
# Load and preprocess the dataset
encoded_dataset = dataset.map(preprocess_function, batched=True)
encoded_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

orig_size_train = len(encoded_dataset["train"])
orig_size_test = len(encoded_dataset["test"])

print(f"Original size train: {orig_size_train}")
print(f"Original size test: {orig_size_test}")

subset_train_size = 5000 # Use a small subset for quicker training
subset_test_size = 2500 # we only take n test records otherwise training would take forever

# Split the dataset for training and evaluation
train_dataset = encoded_dataset["train"].shuffle(seed=42).select(range(subset_train_size))
eval_dataset = encoded_dataset["test"].select(range(subset_test_size))

# Configure LoRA for PEFT
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,  # Rank of the low-rank adapters
    lora_alpha=32,
    target_modules=["query", "value"],  # Apply LoRA to attention layers
    lora_dropout=0.1,
    bias="none"
)

# Wrap the base model with PEFT using LoRA
peft_model = get_peft_model(model, lora_config)
print("PEFT model initialized.")


Original size train: 25000
Original size test: 25000
PEFT model initialized.


In [4]:
peft_model.to(device)

# Create DataLoaders for training and validation
train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_dataloader = DataLoader(eval_dataset, batch_size=16)

# Define optimizer
optimizer = AdamW(peft_model.parameters(), lr=5e-5, weight_decay=0.01)

num_epochs = 3

# Instantiate scheduler
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0.06 * (len(train_dataloader) * num_epochs),
    num_training_steps=(len(train_dataloader) * num_epochs),
)

# Define loss function
# Defining a loss function is not needed here because the peft model returns a loss when it gets the labels

# Training loop (I do this manually instead of the Trainer because here I have fine control over what is printed out and what is done)
for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    
    # Training phase
    peft_model.train()
    total_train_loss = 0
    for batch in tqdm(train_dataloader, desc="Training"):
        optimizer.zero_grad()
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["label"].to(device)
        
        # Forward pass
        outputs = peft_model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss  # Model outputs the loss when labels are provided
        
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        
        # Update the learning rate
        lr_scheduler.step()

        total_train_loss += loss.item()
    
    avg_train_loss = total_train_loss / len(train_dataloader)
    print(f"Training loss: {avg_train_loss:.4f}")
    
    # Validation phase
    peft_model.eval()
    total_val_loss = 0
    preds = []
    true_labels = []
    with torch.no_grad():
        for batch in tqdm(val_dataloader, desc="Validation"):
            input_ids = batch["input_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            labels = batch["label"].to(device)
            
            # Forward pass
            outputs = peft_model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            logits = outputs.logits
            
            total_val_loss += loss.item()
            
            # Get predictions
            predictions = torch.argmax(logits, dim=-1) # fim=-1 last dimension of tensor
            preds.extend(predictions.cpu().numpy())
            true_labels.extend(labels.cpu().numpy())
    
    avg_val_loss = total_val_loss / len(val_dataloader)
    val_accuracy = accuracy_score(true_labels, preds)
    print(f"Validation loss: {avg_val_loss:.4f}, Validation accuracy: {val_accuracy:.4f}")




Epoch 1/3


Training:   0%|                                                                                 | 0/313 [00:00<?, ?it/s]

Training: 100%|███████████████████████████████████████████████████████████████████████| 313/313 [02:41<00:00,  1.93it/s]


Training loss: 0.6006


Validation: 100%|█████████████████████████████████████████████████████████████████████| 157/157 [00:31<00:00,  5.00it/s]


Validation loss: 0.3882, Validation accuracy: 0.8344
Epoch 2/3


Training: 100%|███████████████████████████████████████████████████████████████████████| 313/313 [02:42<00:00,  1.93it/s]


Training loss: 0.2945


Validation: 100%|█████████████████████████████████████████████████████████████████████| 157/157 [00:29<00:00,  5.24it/s]


Validation loss: 0.2714, Validation accuracy: 0.8836
Epoch 3/3


Training: 100%|███████████████████████████████████████████████████████████████████████| 313/313 [02:37<00:00,  1.99it/s]


Training loss: 0.2600


Validation: 100%|█████████████████████████████████████████████████████████████████████| 157/157 [00:30<00:00,  5.14it/s]

Validation loss: 0.2766, Validation accuracy: 0.8804





In [5]:
# Save the PEFT model
peft_model.save_pretrained("./peft_lora_model")
print("PEFT model weights saved.")

PEFT model weights saved.


## Performing Inference with a PEFT Model

In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [10]:
# Load the base model and the PEFT model with saved weights
base_model_name = "bert-base-uncased"
base_model = AutoModelForSequenceClassification.from_pretrained(base_model_name, num_labels=2)
peft_model = PeftModel.from_pretrained(base_model, "./peft_lora_model")
peft_model.to(device)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
# accuracy, predictions, true_labels
# Evaluate the model
print("Evaluating the saved PEFT model...")
accuracy, _, _ = evaluate(peft_model, test_dataloader)

print(f"Validation Accuracy: {val_accuracy:.4f}")

Evaluating the saved PEFT model...


Evaluating: 100%|███████████████████████████████████████████████████████████████████| 1563/1563 [05:01<00:00,  5.18it/s]

Validation Accuracy: 0.8804



