# Lightweight Fine-Tuning Project with BERT

This notebook walks through:
* Evaluating the pretrained BERT model
* Fine-tuning BERT on custom data (IMDB) with different parameters
* Comparing the accuracy of the original model with the accuracy after fine-tuning

These tools and techniques are used throughout the document:

* LLM Model: **google-bert/bert-base-cased**
* PEFT technique: **LoRA**
* Evaluation approach: **Using accuracy metric**
* Fine-tuning dataset: **stanfordnlp/imdb**

## Loading and Evaluating a Foundation Model

In this section, a pretrained Hugging Face model is loaded and its performance is evaluated prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [None]:
# You might need to restart the kernel after this command to avoid errors when calling AutoModelForSequenceClassification
!pip install -r requirements.txt -q

## Prepare the Foundation Model

### Load a pretrained HF model

In [None]:
from transformers import AutoTokenizer
model_id="google-bert/bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_id)

### Load and preprocess a dataset

In [None]:
from datasets import load_dataset
dataset = load_dataset("stanfordnlp/imdb")

In [None]:
# IMDB contains train, test and unsupervised splits with 25,000, 25,000 and 50,000 samples respectively.
dataset

In [None]:
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_train_datasets = dataset["train"].map(tokenize_function, batched=True)
tokenized_test_datasets = dataset["test"].map(tokenize_function, batched=True)

In [None]:
train_dataset_size = 6000
eval_dataset_size = 3000

In [None]:
small_train_dataset = tokenized_train_datasets.shuffle(seed=42).select(range(train_dataset_size))
small_eval_dataset = tokenized_test_datasets.shuffle(seed=42).select(range(eval_dataset_size))

In [None]:
print(small_eval_dataset, small_train_dataset)

### Evaluate the pretrained model

In [None]:
#Create a map between expected ids and labels
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}

In [None]:
from transformers import AutoModelForSequenceClassification
import torch 

model = AutoModelForSequenceClassification.from_pretrained(
    model_id, 
    num_labels=2,
    id2label=id2label,
    label2id=label2id
)

In [None]:
def count_params(model, is_human: bool = False):
    params: int = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return f"{params / 1e6:.2f}M" if is_human else params

print(model)
print("Total # of params for the model {}: {}".format(model_id, count_params(model, is_human=True)))

In [None]:
import random


# Pick a random valid index into the evaluation subset.
# randrange excludes the upper bound; randint(0, n) could return n and go out of range.
x = random.randrange(eval_dataset_size)

print("text: {},\nlabel:{} = {}".format(
    small_eval_dataset["text"][x],
    small_eval_dataset["label"][x],
    id2label[small_eval_dataset["label"][x]])
)

In [None]:
#Use accuracy metric
#Function inspired from https://huggingface.co/learn/nlp-course/en/chapter3/3#evaluation
import numpy as np

import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
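
To make the metric concrete, here is a minimal sketch of what `compute_metrics` boils down to, written in plain Python instead of NumPy and `evaluate`. The logits and labels are made-up toy values, and the helper names `argmax` and `accuracy` are illustrative only.

```python
def argmax(row):
    """Return the index of the largest score in a list."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical logits for four reviews: [NEGATIVE score, POSITIVE score]
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 0, 0]

toy_accuracy = accuracy(toy_logits, toy_labels)
print(toy_accuracy)  # 3 of 4 predictions match, so 0.75
```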


In [None]:
from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        "evaluate_foundational_model",
        eval_strategy="epoch",
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16
    ),
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
    processing_class=tokenizer,
)


In [None]:
%%time
# Measure the performance of the foundation model before any fine-tuning
trainer_foundation_eval=trainer.evaluate(eval_dataset=small_eval_dataset)
trainer_foundation_eval

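###### **Without any fine-tuning, the model "google-bert/bert-base-cased" has an _accuracy_ of _0.4963_.**
###### **That is about chance level for a balanced two-class task, which is expected because the classification head is newly initialized. We will improve it substantially with fine-tuning in the next steps.**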

### Saving the foundation model to local directory

In [None]:
# Save the foundational model to the local directory "foundational_model/" 
trainer.save_model("foundational_model/")

## Performing Parameter-Efficient Fine-Tuning

Create two PEFT models to test two different lora_config values and compare the results. Save the PEFT model weights after each training run.

### PEFT model (same foundation model for the two PEFT configurations)

In [None]:
peft_model_id = model_id 
model = AutoModelForSequenceClassification.from_pretrained(
    peft_model_id,
    num_labels=2,
    id2label=id2label,
    label2id=label2id
)

### Create PEFT model #1

In [None]:
# Create a dictionary with two sets of values for two training runs and a performance comparison
peft_values= {
    "values1": {
        "r": 16,
        "lora_alpha": 16,
        "lora_dropout": 0.1,
        "bias": "none"
    },
    "values2": {
        "r": 64,
        "lora_alpha": 128,
        "lora_dropout": 0.01,
        "bias": "none"
    }
}
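
One way to read the two configurations: with the default LoRA settings, the low-rank update is multiplied by `lora_alpha / r` before it is added to the frozen weights. The short sketch below computes that factor for both sets of values; the `configs` dictionary simply repeats the `r` and `lora_alpha` entries from `peft_values`, and `lora_scaling` is an illustrative helper, not a PEFT function.

```python
# Repeats the r and lora_alpha entries from peft_values
configs = {
    "values1": {"r": 16, "lora_alpha": 16},
    "values2": {"r": 64, "lora_alpha": 128},
}


def lora_scaling(r, lora_alpha):
    """Scaling applied to the low-rank update under the default LoRA formulation."""
    return lora_alpha / r


scalings = {name: lora_scaling(**cfg) for name, cfg in configs.items()}
for name, scale in scalings.items():
    print(f"{name}: scaling = {scale}")
```

So the second configuration not only uses a higher rank but also weights its update twice as strongly as the first.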

In [None]:
from peft import LoraConfig, TaskType

lora_config1 = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sentiment analysis is sequence classification, not token classification
    r=peft_values["values1"]["r"],
    lora_alpha=peft_values["values1"]["lora_alpha"],
    lora_dropout=peft_values["values1"]["lora_dropout"],
    bias=peft_values["values1"]["bias"],
    target_modules=["query", "value"]
)

In [None]:
from peft import get_peft_model

lora_model1 = get_peft_model(model, lora_config1)
lora_model1.print_trainable_parameters()

**Here we can see the advantage of PEFT fine-tuning over training the whole model: only 0.543% of the roughly 109 million parameters in BERT are trainable.**
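
The percentage can be sanity-checked with a little arithmetic. The sketch below assumes the standard BERT-base shape (hidden size 768, 12 encoder layers) and counts only the LoRA matrices added to the `query` and `value` projections; `lora_param_count` is an illustrative helper. The figure printed by `print_trainable_parameters()` may be slightly higher, since PEFT typically also keeps the classification head trainable.

```python
def lora_param_count(hidden_size, num_layers, targets_per_layer, r):
    """Parameters added by LoRA when every target is a hidden x hidden projection."""
    # Each adapted projection gains two matrices: A (r x hidden) and B (hidden x r)
    per_projection = r * hidden_size + hidden_size * r
    return num_layers * targets_per_layer * per_projection


# Assumed BERT-base shape
HIDDEN_SIZE, NUM_LAYERS = 768, 12
TARGETS_PER_LAYER = 2  # "query" and "value"

counts = {
    r: lora_param_count(HIDDEN_SIZE, NUM_LAYERS, TARGETS_PER_LAYER, r)
    for r in (16, 64)
}
for r, n in counts.items():
    share = 100 * n / 109e6  # 109 million is the rounded total quoted in the text
    print(f"r={r}: {n:,} LoRA parameters (~{share:.2f}% of the model)")
```

Quadrupling the rank from 16 to 64 quadruples the number of adapter parameters, which is the cost side of the second configuration.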

### Train the PEFT model #1

In [None]:
training_args_peft1 = TrainingArguments(
    "trainer_peft1_output",
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16)

In [None]:
%%time
trainer1 = Trainer(
    model=lora_model1,
    args=training_args_peft1,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer1.train()

In [None]:
%%time
trainer1_eval=trainer1.evaluate()
trainer1_eval

##### **After fine-tuning model #1 ("google-bert/bert-base-cased" with LoRA), the _accuracy_ is now _0.882_, much better than the performance of the original foundation model.**

### Save the PEFT model #1

In [None]:
lora_model1.save_pretrained("trainer_peft_1")

In [None]:
!ls -ltra trainer_peft_1/

### Create PEFT model #2

In [None]:
from peft import LoraConfig, TaskType

lora_config2 = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sentiment analysis is sequence classification, not token classification
    r=peft_values["values2"]["r"],
    lora_alpha=peft_values["values2"]["lora_alpha"],
    lora_dropout=peft_values["values2"]["lora_dropout"],
    bias=peft_values["values2"]["bias"],
    target_modules=["query", "value"]
)

In [None]:
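from peft import get_peft_model

# Reload the base model: get_peft_model already injected LoRA #1 into `model` in place
base_model2 = AutoModelForSequenceClassification.from_pretrained(
    peft_model_id, num_labels=2, id2label=id2label, label2id=label2id)
lora_model2 = get_peft_model(base_model2, lora_config2)
lora_model2.print_trainable_parameters()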

### Train PEFT model #2

In [None]:
training_args_peft2 = TrainingArguments(
    "trainer_peft2_output",
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16
)

In [None]:
%%time
trainer2 = Trainer(
    model=lora_model2,
    args=training_args_peft2,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics
)
trainer2.train()

In [None]:
%%time
trainer2_eval=trainer2.evaluate()
trainer2_eval

**After fine-tuning model #2 ("google-bert/bert-base-cased" with LoRA), the _accuracy_ is now _0.899_, much better than the original foundation model and slightly better than PEFT model #1.**
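
For a side-by-side view, the short sketch below tabulates the three accuracy values reported in this notebook and the gain of each PEFT run over the baseline. The numbers are copied from the text above, not recomputed, and will differ from run to run.

```python
# Accuracy values as reported in this notebook
reported_accuracy = {
    "foundation": 0.4963,
    "peft_1 (r=16)": 0.882,
    "peft_2 (r=64)": 0.899,
}

baseline = reported_accuracy["foundation"]
gains = {name: acc - baseline for name, acc in reported_accuracy.items()}

for name, acc in reported_accuracy.items():
    print(f"{name:<15} accuracy={acc:.4f}  gain over baseline={gains[name]:+.4f}")
```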

### Save the PEFT model #2

In [None]:
lora_model2.save_pretrained("trainer_peft_2")

In [None]:
!ls -ltra trainer_peft_2/

## Performing Inference with a PEFT Model

Load the PEFT model weights that achieved the best accuracy and evaluate the performance of the trained PEFT model.

## Perform Inference Using the Fine-Tuned Model

### Load the saved PEFT model

We load the better of the two PEFT models we created: "trainer_peft_2"

In [None]:
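from peft import PeftModel

# "trainer_peft_2" holds only the adapter weights, so load the base model and attach the adapter
base_model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2, id2label=id2label, label2id=label2id)
saved_model = PeftModel.from_pretrained(base_model, "trainer_peft_2")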


### Evaluate the fine-tuned model

In [None]:
# %%time

# randrange excludes the upper bound, so x is always a valid index
x = random.randrange(eval_dataset_size)

text_to_classify=small_eval_dataset["text"][x]

print("Text from eval_dataset: {} \n\nlabel from eval_dataset:{}".format(
    small_eval_dataset["text"][x], 
    id2label[small_eval_dataset["label"][x]]
))


def classify(text):
    #Tokenize the text and return a PyTorch tensor
    inputs = tokenizer(text, truncation=True, padding=True, return_tensors="pt")

    # Pass the tokenized text to the model and get logits
    with torch.no_grad():
        outputs = saved_model(**inputs)
    # Get the predicted class and map it to its label name
    predictions = torch.argmax(outputs.logits, dim=-1)
    print(f"\nPredicted class: {id2label[predictions.item()]} \n")


classify(text_to_classify)

**In this run, the inference classified the text as POSITIVE, which matches the label in the dataset.**
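
The `classify` function reports only the winning class. If a confidence value is also useful, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on a made-up pair of logits; `softmax` and `example_logits` are illustrative and are not produced by the model above.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
example_logits = [-1.2, 2.3]  # hypothetical [NEGATIVE, POSITIVE] scores

probs = softmax(example_logits)
predicted = max(range(len(probs)), key=lambda i: probs[i])
print(f"{id2label[predicted]} with probability {probs[predicted]:.3f}")
```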