# Lightweight Fine-Tuning Project

The choices made for this project are:

* PEFT technique: LoRA (Low Rank Adaptation)
* Model: distilbert-base-uncased (for sequence classification)
* Evaluation approach: Accuracy using the Hugging Face Trainer
* Fine-tuning dataset: imdb (from the Hugging Face Datasets library)

### Project: Movie review sentiment analysis.

### Project Overview
#### Project Introduction:
This project demonstrates the process of lightweight fine-tuning using a pre-trained model for sentiment analysis on movie reviews.

#### Project Summary:
In this project, we will:
* Load a pre-trained model and evaluate its performance.
* Perform parameter-efficient fine-tuning using the pre-trained model.
* Perform inference using the fine-tuned model and compare its performance to the original model.

#### Key Concepts:
* Sentiment detection requires understanding context and subtle cues.
* Using PEFT allows us to fine-tune the model efficiently without needing extensive computational resources.

## Loading and Evaluating a Foundation Model

In the cells below, we load the chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
! pip install scikit-learn peft transformers datasets

Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.4 MB)
Collecting joblib>=1.2.0
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.4.2 scikit-learn-1.5.1 threadpoolctl-3.5.0


## Import dependencies

In [13]:
import random
import torch
import logging
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, Trainer, TrainingArguments, EvalPrediction
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, AutoPeftModelForSequenceClassification

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


## Loading and Preparing the Dataset
In this section, we load the IMDb dataset and prepare it for fine-tuning. We use only a subset of the data to ensure quick experimentation and validation.
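The loading cell shuffles each split before selecting its first rows. The sketch below illustrates why that order matters, under the assumption that a split may be stored sorted by label. It uses a made-up label list and Python's `random` module rather than the `datasets` shuffling implementation, so the exact counts will not match the real subset.

```python
import random
from collections import Counter


def head_label_counts(labels, k, seed=None):
    """Count labels among the first k items, optionally shuffling first."""
    items = list(labels)
    if seed is not None:
        random.Random(seed).shuffle(items)
    return Counter(items[:k])


# Hypothetical split: 12,500 negative (0) rows followed by 12,500 positive (1) rows.
ordered_labels = [0] * 12_500 + [1] * 12_500

unshuffled = head_label_counts(ordered_labels, k=1000)
shuffled = head_label_counts(ordered_labels, k=1000, seed=42)

print("without shuffle:", dict(unshuffled))
print("with shuffle:   ", dict(shuffled))
```

Without shuffling, the first 1,000 rows of a label-sorted split contain a single class, which would make both training and accuracy figures meaningless.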


In [3]:
# Load the IMDb dataset and use a smaller subset for testing
imdb_dataset = load_dataset("imdb")

# Split the dataset
train_dataset = imdb_dataset["train"].shuffle(seed=42).select(range(1000))
eval_dataset = imdb_dataset["test"].shuffle(seed=42).select(range(200))


## Loading the Pre-trained Model
Here, we load the pre-trained DistilBERT model (`distilbert-base-uncased`) and its tokenizer. We also configure the model to use the tokenizer's padding token.

In [15]:
# Load the pre-trained model and tokenizer
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Tokenizing the Dataset
The dataset is tokenized to convert the text into the format required by the DistilBERT model. Each review is padded or truncated to a fixed length of 128 tokens.
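As a rough illustration of what fixed-length padding and truncation produce, here is a simplified pure-Python sketch. The helper `pad_and_truncate` and the token ids are made up for this example; the real tokenizer does more (for instance, it keeps the closing special token when truncating), so treat this as a mental model rather than the library's behaviour.

```python
def pad_and_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), each exactly max_length long."""
    ids = list(token_ids)[:max_length]      # truncate sequences that are too long
    mask = [1] * len(ids)                   # 1 marks a real token
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


# A short sequence gets padded on the right.
short_ids, short_mask = pad_and_truncate([101, 2023, 3185, 102], max_length=6)

# A long sequence gets cut to max_length.
long_ids, long_mask = pad_and_truncate(list(range(1, 11)), max_length=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore padded positions, which is why the notebook keeps both `input_ids` and `attention_mask` columns.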


In [16]:
# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)

logger.info("Tokenizing dataset...")
train_dataset = train_dataset.map(tokenize_function, batched=True)
eval_dataset = eval_dataset.map(tokenize_function, batched=True)

# Set the format for PyTorch
train_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
eval_dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])

INFO:__main__:Tokenizing dataset...



## Evaluating the Pre-trained Model
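We evaluate the pre-trained DistilBERT model on the tokenized IMDb evaluation subset, using accuracy as the primary metric. Because the classification head is newly initialized (see the warning printed when the model was loaded), performance close to chance is expected at this stage.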


In [17]:
# Define metrics function
def compute_metrics(p: EvalPrediction):
    preds = p.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    acc = accuracy_score(p.label_ids, preds)
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

# Evaluate the pretrained model
logger.info("Evaluating pretrained model...")
trainer = Trainer(
    model=model,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)
pretrained_eval_results = trainer.evaluate()
logger.info(f"Pretrained model evaluation results: {pretrained_eval_results}")


INFO:__main__:Evaluating pretrained model...


INFO:__main__:Pretrained model evaluation results: {'eval_loss': 0.6939542889595032, 'eval_accuracy': 0.46, 'eval_f1': 0.4593517406962785, 'eval_precision': 0.4620289855072464, 'eval_recall': 0.46, 'eval_runtime': 44.744, 'eval_samples_per_second': 4.47, 'eval_steps_per_second': 0.559}
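One way to read the baseline numbers: a binary classifier that assigns equal probability to both classes has a cross-entropy loss of ln 2. The short check below compares that value with the reported `eval_loss`; the helper name is ours, not part of any library.

```python
import math


def uniform_cross_entropy(num_classes):
    """Cross-entropy of a model that gives every class the same probability."""
    return -math.log(1.0 / num_classes)


chance_loss = uniform_cross_entropy(2)
reported_loss = 0.6939542889595032  # eval_loss copied from the log above

print(f"chance-level loss: {chance_loss:.4f}")
print(f"gap to reported:   {reported_loss - chance_loss:.4f}")
```

The reported loss sits almost exactly at the chance level, which is consistent with an untrained classification head and with the accuracy of 0.46 on a two-class task.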


## Creating a PEFT Model
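We create a parameter-efficient fine-tuning (PEFT) model using the LoRA (Low-Rank Adaptation) technique, which freezes the pre-trained weights and trains small low-rank matrices attached to selected layers.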
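To make the idea concrete, the following toy sketch computes a LoRA-adapted linear layer as `W x + (lora_alpha / r) * B (A x)`, ignoring dropout and bias. The matrices are tiny made-up values; only the scaling factor of 2 matches this notebook's configuration (`lora_alpha=32`, `r=16`).

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """Frozen projection W x plus the scaled low-rank update B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy sizes: d_in = d_out = 3, rank r = 2.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen weight
A = [[0.5, -0.5, 0.0], [0.0, 0.5, 0.5]]                  # r x d_in
B_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]            # d_out x r, zero at start
B_trained = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]         # hypothetical values after training
x = [1.0, 2.0, 3.0]

y_init = lora_forward(W, A, B_zero, x, lora_alpha=4, r=2)
y_trained = lora_forward(W, A, B_trained, x, lora_alpha=4, r=2)

print(y_init)     # identical to W x because B is zero
print(y_trained)
```

Because `B` starts at zero, the adapted model initially behaves exactly like the frozen base model, and training only has to learn the small `A` and `B` matrices.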


In [20]:
logger.info("Creating PEFT model...")
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_lin", "k_lin", "v_lin"],
    lora_dropout=0.1,
    bias="none",
    task_type="SEQ_CLS"
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

INFO:__main__:Creating PEFT model...


trainable params: 1,626,628 || all params: 67,989,508 || trainable%: 2.3924691439155583
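The reported trainable-parameter count can be broken down with some arithmetic. The sketch assumes the standard `distilbert-base-uncased` shape (6 transformer layers, hidden size 768) and a classification head made of `pre_classifier` (768 to 768) and `classifier` (768 to 2), each with a bias.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


hidden, n_layers, r = 768, 6, 16
targets_per_layer = 3  # q_lin, k_lin, v_lin

adapter_params = n_layers * targets_per_layer * lora_param_count(hidden, hidden, r)
head_params = (hidden * hidden + hidden) + (hidden * 2 + 2)  # pre_classifier + classifier

reported_trainable = 1_626_628  # from print_trainable_parameters() above
remainder = reported_trainable - adapter_params

print(f"LoRA adapters:       {adapter_params:,}")
print(f"classification head: {head_params:,}")
print(f"remainder:           {remainder:,}")
```

The adapters account for 442,368 parameters, and the remainder is exactly twice the size of the classification head. One plausible reading, which we have not verified against the installed PEFT version, is that the head is counted both in its original form and as the trainable copy PEFT keeps for sequence classification.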


## Training the PEFT Model
We fine-tune the PEFT model using the tokenized IMDb dataset. The Hugging Face Trainer is used to handle the training process, including batching and optimization.


In [21]:
# Set up training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    push_to_hub=False,
    logging_dir='./logs',
    logging_steps=100,
)

# Initialize Trainer for PEFT model
peft_trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

# Train the PEFT model
logger.info("Starting training...")
peft_trainer.train()

# Save the PEFT model
logger.info("Saving the model...")
peft_model.save_pretrained("./peft_model")

INFO:__main__:Starting training...


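| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|-------|---------------|-----------------|----------|----|-----------|--------|
| 1 | No log | 0.683581 | 0.61 | 0.551108 | 0.693588 | 0.61 |
| 2 | 0.684900 | 0.676295 | 0.635 | 0.585047 | 0.725757 | 0.635 |
| 3 | 0.684900 | 0.67366 | 0.665 | 0.645356 | 0.696763 | 0.665 |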
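The "No log" entry and the repeated training loss follow from the step arithmetic. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept; the helper name is ours.

```python
import math


def training_schedule(num_examples, batch_size, num_epochs, logging_steps):
    """Return (steps per epoch, total steps, epoch containing the first log)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    first_log_epoch = math.ceil(logging_steps / steps_per_epoch)
    return steps_per_epoch, total_steps, first_log_epoch


steps_per_epoch, total_steps, first_log_epoch = training_schedule(
    num_examples=1000, batch_size=16, num_epochs=3, logging_steps=100
)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"first log epoch: {first_log_epoch}")
```

With 63 steps per epoch, the first training-loss log at step 100 falls in epoch 2, and step 200 is never reached in 189 total steps, so epochs 2 and 3 both display the same logged value. A smaller `logging_steps` would give a loss reading for every epoch.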


INFO:__main__:Saving the model...


## Loading the Saved PEFT Model
We load the saved PEFT model to perform inference and further evaluation.


In [22]:
fine_tuned_model = AutoPeftModelForSequenceClassification.from_pretrained("./peft_model")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Evaluating the Fine-Tuned Model
We evaluate the fine-tuned model using the same metrics and dataset to compare its performance to the original model.


In [23]:
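logger.info("Evaluating the fine-tuned model...")
# Use the Trainer that wraps the PEFT model; `trainer` was created for the pre-fine-tuning baseline
eval_results = peft_trainer.evaluate()
logger.info(f"Evaluation results: {eval_results}")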

INFO:__main__:Evaluating the fine-tuned model...
INFO:__main__:Evaluation results: {'eval_loss': 0.6736596822738647, 'eval_accuracy': 0.665, 'eval_f1': 0.6453562929490093, 'eval_precision': 0.6967629315877295, 'eval_recall': 0.665, 'eval_runtime': 48.5853, 'eval_samples_per_second': 4.116, 'eval_steps_per_second': 0.515}
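With only 200 evaluation examples, it helps to put a rough uncertainty on the two accuracy figures. The sketch uses the binomial standard error `sqrt(p * (1 - p) / n)` as an approximation; it is a sanity check, not a formal significance test.

```python
import math


def accuracy_standard_error(accuracy, n):
    """Binomial standard error of an accuracy measured on n examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


n_eval = 200
before, after = 0.46, 0.665  # eval_accuracy before and after fine-tuning

gain = after - before
se_before = accuracy_standard_error(before, n_eval)
se_after = accuracy_standard_error(after, n_eval)

print(f"accuracy gain:   {gain:.3f}")
print(f"std. error (before): {se_before:.3f}")
print(f"std. error (after):  {se_after:.3f}")
```

The gain of about 0.205 is several times larger than either standard error (roughly 0.03 to 0.04), so the improvement is unlikely to be sampling noise, although a larger evaluation set would give a tighter estimate.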


## Generating and Reviewing Predictions
To validate the model's performance, we generate predictions for a few samples from the test dataset and manually compare them with the true labels.
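When reviewing individual predictions, it can also help to look at how confident the model is. A minimal sketch, assuming two-class logits like those returned by the model: softmax turns them into probabilities. The logit values here are hypothetical, not taken from the run.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [0.10, -0.05]  # hypothetical [negative, positive] logits
probs = softmax(example_logits)

print(f"P(negative) = {probs[0]:.3f}, P(positive) = {probs[1]:.3f}")
```

Given that the evaluation loss is still close to the chance level, probabilities near 0.5 would not be surprising, and a prediction made at that confidence deserves less weight in a manual review.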


In [29]:
# Generate and review predictions
logger.info("Generating predictions for sample data...")
label_meanings = {0: "Negative", 1: "Positive"}
sample_indices = random.sample(range(len(eval_dataset)), 5)
samples = [eval_dataset[i] for i in sample_indices]

# Switch to evaluation mode so dropout is disabled during inference
fine_tuned_model.eval()

for idx, sample in enumerate(samples):
    # Add a batch dimension to the single example
    inputs = {
        'input_ids': sample['input_ids'].unsqueeze(0),
        'attention_mask': sample['attention_mask'].unsqueeze(0)
    }

    with torch.no_grad():
        outputs = fine_tuned_model(**inputs)

    logits = outputs.logits
    predicted_label = logits.argmax(-1).item()
    true_label = sample['label'].item()

    # Decode the input_ids to recover the (lowercased, truncated) text
    original_text = tokenizer.decode(sample['input_ids'], skip_special_tokens=True)

    logger.info(f"Sample {idx + 1}:")
    logger.info(f"Text: {original_text[:200]}...")
    logger.info(f"True Label: {true_label}, Predicted Label: {predicted_label}\n")

    # Log the human-readable sentiment labels
    logger.info(f"True Sentiment: {label_meanings[true_label]}")
    logger.info(f"Predicted Sentiment: {label_meanings[predicted_label]}\n")

INFO:__main__:Generating predictions for sample data...
INFO:__main__:Sample 1:
INFO:__main__:Text: sex, drugs, racism and of course you abc's. what more could you want in a kid's show! < br / > < br / > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ...
INFO:__main__:True Label: 1, Predicted Label: 0

INFO:__main__:True Sentiment: Positive
INFO:__main__:Predicted Sentiment: Negative

INFO:__main__:Sample 2:
INFO:__main__:Text: jessica bohl plays daphne, the sexually precocious suburban teenager struggling with the hell of high school. daphne's neighbor is buddy ( richard brundage ), a depressed middle - aged man still angry...
INFO:__main__:True Label: 1, Predicted Label: 1

INFO:__main__:True Sentiment: Positive
INFO:__main__:Predicted Sentiment: Positive

INFO:__main__:Sample 3:
INFO:__main__:Text: this movie was absolutely one of the worst movies i have ever seen. the plot could have been made to work, had the movie been written bett