# **Fine-Tuning RoBERTa for Sentiment Analysis with PEFT on IMDB Dataset**




### **Summary**
* PEFT technique: LoRA
* Model: FacebookAI/roberta-base
* Evaluation approach: F1-score
* Fine-tuning dataset: stanfordnlp/imdb

### **Intro**
This model is a fine-tuned version of the `FacebookAI/roberta-base` model that classifies the sentiment of movie reviews into one of two categories: negative (label 0) or positive (label 1).



## Install dependencies


In [1]:
!pip install datasets
!pip install transformers datasets evaluate accelerate peft

Collecting datasets
  Downloading datasets-2.20.0-py3-none-any.whl.metadata (19 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting requests>=2.32.2 (from datasets)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.5.0,>=2023.1.0 (from fsspec[http]<=2024.5.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.5.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-2.20.0-py3-none-any.whl (547 kB)

## Dataset Preprocessing
**Load the pretrained tokenizer, then load and preprocess the dataset**

In [2]:
import torch
from transformers import RobertaModel, RobertaTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, DataCollatorWithPadding
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

peft_model_name = 'roberta-base-peft'
modified_base = 'roberta-base-modified'
base_model = 'roberta-base'

dataset = load_dataset('stanfordnlp/imdb')
tokenizer = RobertaTokenizer.from_pretrained(base_model)

def preprocess(examples):
    tokenized = tokenizer(examples['text'], truncation=True, padding=True)
    return tokenized

tokenized_dataset = dataset.map(preprocess, batched=True,  remove_columns=["text"])

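# Select subsets of the train, unsupervised, and test splits for training, testing, and evaluation.
# Caveats: these slices take the first rows without shuffling, so if a split is ordered by
# label the subset holds a single class (calling .shuffle(seed=...) before .select(...)
# avoids that), and the 'unsupervised' split carries no sentiment labels.
train_dataset = tokenized_dataset['train'].select(range(900))
test_dataset = tokenized_dataset['unsupervised'].select(range(100))
eval_dataset = tokenized_dataset['test'].select(range(100))

# Extract the number of classes and their names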
num_labels = dataset['train'].features['label'].num_classes
class_names = dataset["train"].features["label"].names
print(f"number of labels: {num_labels}")
print(f"the labels: {class_names}")

# Create an id2label mapping
# Will need this for the classifier.
id2label = {i: label for i, label in enumerate(class_names)}

data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="pt")

number of labels: 2
the labels: ['neg', 'pos']
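
The subsets above take the first rows of each split. If a split is stored ordered by label, such a slice contains only one class, which would explain the perfect scores reported later in this notebook. The following is a minimal pure-Python sketch of that effect, using made-up labels rather than the real dataset. With `datasets`, calling `.shuffle(seed=...)` before `.select(...)` is the analogous step.

```python
import random
from collections import Counter

# Hypothetical label-ordered split: all negatives first, then all positives.
labels = [0] * 12_500 + [1] * 12_500

# Taking the first rows of a label-ordered split yields one class only.
head_subset = labels[:900]

# Shuffling the indices first (fixed seed for reproducibility) mixes the classes.
indices = list(range(len(labels)))
random.Random(42).shuffle(indices)
shuffled_subset = [labels[i] for i in indices[:900]]

print("first 900 rows:", Counter(head_subset))
print("shuffled 900  :", Counter(shuffled_subset))
```

Checking the label counts of each subset before training is a cheap way to catch this.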


**Evaluate the pretrained model**

In [3]:
import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    accuracy = (predictions == labels).mean()
    f1 = f1_score(labels, predictions, average='weighted')  # Use 'weighted' for multi-class
    return {"accuracy": accuracy, "f1_score": f1}

# Define the model
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=num_labels)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir=".",
        learning_rate=2e-3,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        num_train_epochs=0,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics,
)

# Evaluate the model
trainer.evaluate()



Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'eval_loss': 0.6286383271217346,
 'eval_accuracy': 1.0,
 'eval_f1_score': 1.0,
 'eval_runtime': 239.0639,
 'eval_samples_per_second': 0.418,
 'eval_steps_per_second': 0.418}
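
A perfect score from a classification head that was just randomly initialized is a signal to inspect the evaluation labels. The sketch below reimplements accuracy and support-weighted F1 in plain Python (an illustration, not the scikit-learn code) and applies them to hypothetical labels. When every label belongs to one class, a model that always predicts that class scores 1.0 on both metrics.

```python
from collections import Counter

def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)

def weighted_f1(labels, predictions):
    """Mean of per-class F1, weighted by each class's count in `labels`."""
    support = Counter(labels)
    total = 0.0
    for cls, count in support.items():
        tp = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, predictions) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, predictions) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom if denom else 0.0) * count
    return total / len(labels)

# Hypothetical data: a model that always predicts class 0.
always_neg = [0] * 100
single_class_labels = [0] * 100
balanced_labels = [0] * 50 + [1] * 50

print(accuracy(single_class_labels, always_neg), weighted_f1(single_class_labels, always_neg))
print(accuracy(balanced_labels, always_neg), weighted_f1(balanced_labels, always_neg))
```

On the balanced labels the same constant predictor drops to 0.5 accuracy, so a balanced evaluation set separates a trained model from a trivial one.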

In [4]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"Trainable params: {trainable_params} || All params: {all_param} || Trainable%: {100 * trainable_params / all_param:.2f}%"
    )

In [25]:
def freeze_parameters(model, layer_names=None):
    """
    Freeze parameters in the model. With layer_names=None, every parameter is frozen.
    Otherwise, parameters whose name contains any entry of layer_names are frozen and
    all remaining parameters are made trainable.
    """
    if layer_names is None:
        # Freeze all parameters
        for param in model.parameters():
            param.requires_grad = False
    else:
        # Freeze matching layers and unfreeze everything else
        for name, param in model.named_parameters():
            if any(layer_name in name for layer_name in layer_names):
                param.requires_grad = False
            else:
                param.requires_grad = True
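
Because the `else` branch sets `requires_grad = True` on every parameter that does not match, passing `layer_names` both freezes the matching parameters and unfreezes all others. The sketch below mirrors that rule on plain strings so the effect can be checked without loading a model. The helper `plan_requires_grad` and the parameter names are illustrative and not part of the notebook.

```python
def plan_requires_grad(param_names, layer_names=None):
    """Return {name: requires_grad} using the same rule as freeze_parameters."""
    if layer_names is None:
        # No filter given: everything is frozen.
        return {name: False for name in param_names}
    # Matching names are frozen; every other name becomes trainable.
    return {
        name: not any(layer in name for layer in layer_names)
        for name in param_names
    }

# Illustrative parameter names in the style of a LoRA-wrapped RoBERTa classifier.
names = [
    "roberta.encoder.layer.0.attention.self.query.weight",
    "roberta.encoder.layer.0.attention.self.query.lora_A.weight",
    "classifier.out_proj.weight",
]

plan = plan_requires_grad(names, layer_names=["classifier"])
for name, trainable in plan.items():
    print(f"{name}: requires_grad={trainable}")
```

With `layer_names=["classifier"]`, the classifier entry is frozen while the base attention weight and the LoRA weight are both trainable.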


## Training
**Create a PEFT model and train it**
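
As background for the configuration in the next cell, here is a toy sketch of the LoRA idea: the pretrained weight `W` stays frozen, and a low-rank product `B A`, scaled by `alpha / r`, is added to its output. The matrices and sizes are made up for illustration, and this is plain Python rather than the PEFT implementation.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy sizes: hidden dimension 4, rank 2.
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # frozen weight
A = [[1, 0, 0, 0], [0, 1, 0, 0]]                              # r x d
B_zero = [[0, 0], [0, 0], [0, 0], [0, 0]]                     # d x r, zero at initialization
B_trained = [[1, 0], [0, 1], [0, 0], [0, 0]]                  # hypothetical trained values
x = [1, 2, 3, 4]

print(lora_forward(W, A, B_zero, x, alpha=4, r=2))     # equals W x while B is zero
print(lora_forward(W, A, B_trained, x, alpha=4, r=2))  # base output plus scaled update
```

Starting `B` at zero means the adapted layer initially reproduces the pretrained layer, so training begins from the base model's behavior.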

In [8]:
import os
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model

# Define the path where you want to create the directory
path = '/content/drive/MyDrive/model_results'

# Create the directory
os.makedirs(path, exist_ok=True)

base_model = 'roberta-base'
num_labels = 2  # Adjust according to your task

model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=num_labels)

config = LoraConfig(
    task_type="SEQ_CLS",
    inference_mode=False,
    r=8,                 # Dimension of the low-rank approximation
    lora_alpha=16,       # Scaling factor for the LoRA layers
    lora_dropout=0.1,    # Dropout rate for the LoRA layers
    target_modules=[f"roberta.encoder.layer.{i}.attention.self.query" for i in range(12)] +
                    [f"roberta.encoder.layer.{i}.attention.self.key" for i in range(12)] +
                    [f"roberta.encoder.layer.{i}.attention.self.value" for i in range(12)]
)

# Apply PEFT method (LoRA) to the model
peft_model = get_peft_model(model, config)
print_trainable_parameters(peft_model)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Trainable params: 1034498 || All params: 125681668 || Trainable%: 0.82%
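
The trainable-parameter count above can be reproduced by hand. The sketch assumes the roberta-base hidden size of 768, LoRA matrices on query, key, and value in all 12 layers as configured, and that PEFT keeps the classification head trainable for a `SEQ_CLS` task. The matching total is consistent with that last assumption.

```python
hidden = 768            # roberta-base hidden size
num_layers = 12
rank = 8                # r in LoraConfig
num_labels = 2
targets_per_layer = 3   # query, key, value

# Each adapted module gets A (r x hidden) and B (hidden x r).
lora_per_module = rank * hidden + hidden * rank
lora_total = lora_per_module * targets_per_layer * num_layers

# Classification head: dense (hidden -> hidden) and out_proj (hidden -> num_labels), with biases.
head_dense = hidden * hidden + hidden
head_out = hidden * num_labels + num_labels
head_total = head_dense + head_out

trainable = lora_total + head_total
all_params = 125_681_668  # "All params" from the cell output above
percent = f"{100 * trainable / all_params:.2f}"

print(f"LoRA: {lora_total}, head: {head_total}, trainable: {trainable} ({percent}%)")
```

Under these assumptions, more than half of the trainable parameters belong to the classification head and not to the LoRA matrices.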


In [None]:
# Define training arguments
training_args = TrainingArguments(
    output_dir=path,  # use the directory created above, not the literal string "path"
    learning_rate=2e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    evaluation_strategy="epoch",
    save_strategy="epoch",
)

# Define data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length")

# Define Trainer
def get_trainer(model):
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        data_collator=data_collator
    )
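
`padding="max_length"` pads every example to a fixed length, whereas the collator's default pads each batch only to its longest example. A rough pure-Python sketch of the difference in padded token count follows, using made-up sequence lengths and assuming a maximum length of 512. The helper `padded_tokens` is hypothetical.

```python
def padded_tokens(lengths, batch_size, pad_to=None):
    """Total tokens after padding; pad_to=None pads each batch to its longest item."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        target = pad_to if pad_to is not None else max(batch)
        total += target * len(batch)
    return total

# Hypothetical tokenized review lengths.
lengths = [120, 300, 90, 512, 200, 150, 80, 260]

fixed = padded_tokens(lengths, batch_size=4, pad_to=512)  # always pad to 512
dynamic = padded_tokens(lengths, batch_size=4)            # pad to the longest in each batch

print(f"max_length padding: {fixed} tokens, dynamic padding: {dynamic} tokens")
```

Fewer padded tokens per batch generally means less compute per step, which matters most on CPU or small GPUs.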

In [9]:


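# Caution: with layer_names=['classifier'], freeze_parameters freezes the classifier
# head and marks every other parameter as trainable, including the base RoBERTa
# weights that get_peft_model had frozen. It does not freeze all but the classifier.
freeze_parameters(peft_model, layer_names=['classifier'])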

# Create Trainer for the PEFT model
peft_lora_finetuning_trainer = get_trainer(peft_model)

# Train the model
peft_lora_finetuning_trainer.train()



Epoch,Training Loss,Validation Loss
1,No log,0.0


TrainOutput(global_step=113, training_loss=0.008103689788717084, metrics={'train_runtime': 4849.903, 'train_samples_per_second': 0.186, 'train_steps_per_second': 0.023, 'total_flos': 239660129894400.0, 'train_loss': 0.008103689788717084, 'epoch': 1.0})
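
The figures in `TrainOutput` follow from the training setup. The sketch below checks them, assuming a single device and no gradient accumulation.

```python
import math

num_examples = 900       # size of train_dataset
batch_size = 8           # per_device_train_batch_size
epochs = 1               # num_train_epochs
train_runtime = 4849.903 # seconds, from the output above

# The last batch is partial, so the step count rounds up.
steps_per_epoch = math.ceil(num_examples / batch_size)
global_step = steps_per_epoch * epochs

samples_per_second = round(num_examples * epochs / train_runtime, 3)
steps_per_second = round(global_step / train_runtime, 3)

print(global_step, samples_per_second, steps_per_second)
```

At roughly 0.19 samples per second, one epoch over 900 examples takes about 81 minutes.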

**Save the PEFT model**

In [13]:
# The notebook was run in Google Colab because the Udacity Workspace environment's kernel kept crashing,
# even with a reduced dataset, causing extended training times for a small model.
import os
from google.colab import drive

# Define the path in Google Drive for saving the tokenizer
save_directory = '/content/drive/My Drive/roberta-base-modified'

# Create the directory if it does not already exist
os.makedirs(save_directory, exist_ok=True)

# Save the tokenizer to the specified directory
tokenizer.save_pretrained(save_directory)

# Define the path in Google Drive for saving the fine-tuned model
save_directory = '/content/drive/My Drive/roberta-base-peft'

# Create the directory if it does not already exist
os.makedirs(save_directory, exist_ok=True)

# Save the fine-tuned model to the specified directory
peft_model.save_pretrained(save_directory)


In [15]:
tokenizer.save_pretrained(modified_base)
peft_model.save_pretrained(peft_model_name)

## Performing Inference with a PEFT Model


In [16]:
from peft import AutoPeftModelForSequenceClassification
from transformers import AutoTokenizer

# Load the saved PEFT model from the specified directory
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(peft_model_name, id2label=id2label)

# Load the tokenizer used for preprocessing the input text
tokenizer = AutoTokenizer.from_pretrained(modified_base)

def classify(text):
    # Tokenize the input text, handling truncation and padding, and prepare it for the model
    inputs = tokenizer(text, truncation=True, padding=True, return_tensors="pt")

    # Perform inference with the loaded PEFT model
    output = inference_model(**inputs)

    # Get the predicted class by finding the index of the maximum logit value
    prediction = output.logits.argmax(dim=-1).item()

    # Print the predicted class, corresponding label, and the input text
    print(f'\n Class: {prediction}, Label: {id2label[prediction]}, Text: {text}')


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [17]:
classify( "I would put this at the top of my list of films in the category of unwatchable trash! There are films that are bad, but the worst kind are the ones that are unwatchable but you are suppose to like them because they are supposed to be good for you! ")
classify( "Its not the cast. A finer group of actors, you could not find. Its not the setting. The director is in love with New York City, and by the end of the film, so are we all! Woody Allen could not improve upon what Bogdonovich has done here. If you are going to fall in love, or find love, Manhattan is the place to go. No, the problem with the movie is the script. ")


 Class: 0, Label: neg, Text: I would put this at the top of my list of films in the category of unwatchable trash! There are films that are bad, but the worst kind are the ones that are unwatchable but you are suppose to like them because they are supposed to be good for you! 

 Class: 0, Label: neg, Text: Its not the cast. A finer group of actors, you could not find. Its not the setting. The director is in love with New York City, and by the end of the film, so are we all! Woody Allen could not improve upon what Bogdonovich has done here. If you are going to fall in love, or find love, Manhattan is the place to go. No, the problem with the movie is the script. 
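
`classify` reports only the arg-max class. Applying a softmax to the logits gives a probability per class, which helps judge how confident a prediction is. The sketch below uses hypothetical logits in plain Python. With the model, the same idea would be applied to `output.logits`.

```python
import math

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

id2label = {0: "neg", 1: "pos"}
logits = [2.0, -1.0]  # hypothetical model output for one review

probs = softmax(logits)
prediction = max(range(len(probs)), key=probs.__getitem__)

print(f"Label: {id2label[prediction]}, probability: {probs[prediction]:.3f}")
```

A prediction whose probability sits near 0.5 deserves less trust than one near 1.0, even though both produce the same label.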


In [18]:
from torch.utils.data import DataLoader
import evaluate
from tqdm import tqdm

# Load the accuracy metric for evaluation
metric = evaluate.load('accuracy')

def evaluate_model(inference_model, dataset):
    # Create a DataLoader for the evaluation dataset
    eval_dataloader = DataLoader(
        dataset.rename_column("label", "labels"),  # Rename 'label' column to 'labels' for consistency
        batch_size=8,  # Batch size for evaluation
        collate_fn=data_collator  # Function to collate data samples into batches
    )

    # Determine the device to use (GPU if available, otherwise CPU)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Move the model to the appropriate device
    inference_model.to(device)

    # Set the model to evaluation mode
    inference_model.eval()

    # Iterate over the evaluation dataset
    for step, batch in enumerate(tqdm(eval_dataloader)):
        # Move each tensor in the batch to the device
        batch = {k: v.to(device) for k, v in batch.items()}

        with torch.no_grad():  # Disable gradient calculation for evaluation
            outputs = inference_model(**batch)

        # Get predictions from the model output
        predictions = outputs.logits.argmax(dim=-1)

        # Get references (true labels) from the batch
        references = batch["labels"]

        # Add predictions and references to the metric for evaluation
        metric.add_batch(
            predictions=predictions,
            references=references,
        )

    # Compute the final evaluation metric
    eval_metric = metric.compute()

    print(eval_metric)


## Evaluate Models
**Evaluate the fine-tuned PEFT model**

In [22]:
def compute_metrics(p):
    predictions = np.argmax(p.predictions, axis=1)
    return {
        'accuracy': (predictions == p.label_ids).mean(),
        'f1': f1_score(p.label_ids, predictions, average='weighted')
    }

# Define Trainer
def get_trainer(model):
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics  # Include compute_metrics function here
    )

In [24]:
# Create Trainer for the PEFT model
peft_lora_finetuning_trainer = get_trainer(peft_model)

# Evaluate the model
eval_results = peft_lora_finetuning_trainer.evaluate()

# Print the evaluation results
print("Evaluation Results:")
print(eval_results)

Evaluation Results:
{'eval_loss': 0.0, 'eval_accuracy': 1.0, 'eval_f1': 1.0, 'eval_runtime': 201.62, 'eval_samples_per_second': 0.496, 'eval_steps_per_second': 0.064}
