# Lightweight Fine-Tuning Project

Our goal is to compare traditional full fine-tuning of a BERT model (actually DistilBERT) on a sentiment analysis task with parameter-efficient fine-tuning (PEFT) of the same model on the same task, using the Hugging Face Transformers library. We evaluate both approaches with accuracy, and we use the IMDB movie review dataset to train and evaluate the model. The IMDB dataset contains movie reviews that are labeled as either positive or negative.

* PEFT technique : LoRA
* Model : DistilBERT (distilbert-base-uncased)
* Evaluation approach : Accuracy
* Fine-tuning dataset : IMDB dataset

In [1]:
# Install and update some packages first

### then restart the kernel to use updated packages ###

!pip install -q --upgrade datasets transformers[torch] peft 

## Loading and Evaluating a Foundation Model

In the cells below, we load the pre-trained DistilBERT model from Hugging Face, together with an appropriate tokenizer and dataset, then fully fine-tune and evaluate it as the baseline for the PEFT comparison.

In [1]:
# Import the datasets and transformers packages from Hugging Face
from datasets import load_dataset_builder, load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification, 
                          DataCollatorWithPadding, Trainer, TrainingArguments)
import os
import torch
import numpy as np
import pandas as pd

print("current directory is :", os.getcwd())

# Attempt GPU; if not, stay on CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

current directory is : /workspace
cuda:0


In [2]:
# Load the train and test splits of the imdb dataset
splits = ["train", "test"]

dataset={split:ds for split, ds in zip(splits, load_dataset("imdb",split=splits))}

In [3]:
# check data
dataset

{'train': Dataset({
     features: ['text', 'label'],
     num_rows: 25000
 }),
 'test': Dataset({
     features: ['text', 'label'],
     num_rows: 25000
 })}

In [4]:
# Use a small shuffled subset (500 examples per split) to reduce training time
ds={split:dataset[split].shuffle(seed=42).select(range(500)) for split in splits}
ds

{'train': Dataset({
     features: ['text', 'label'],
     num_rows: 500
 }),
 'test': Dataset({
     features: ['text', 'label'],
     num_rows: 500
 })}

In [5]:
# Show the first few training reviews
ds['train']["text"][:3]

['There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier\'s plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it\'s the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all...',
 'This movie is a great. The plot is very true to the book which is a classic written by Mark Twain. The movie starts of with a scene where Hank sings a song with a bunch of kids called "when you stub your toe on the moon" It reminds m

In [6]:
label={0: "NEGATIVE", 1: "POSITIVE"}
# Show the first few training labels (1=pos, 0=neg)
ds['train']["label"][:3]

[1, 1, 0]

In [7]:
# Show the first few test reviews
ds['test']["text"][:3]

["<br /><br />When I unsuspectedly rented A Thousand Acres, I thought I was in for an entertaining King Lear story and of course Michelle Pfeiffer was in it, so what could go wrong?<br /><br />Very quickly, however, I realized that this story was about A Thousand Other Things besides just Acres. I started crying and couldn't stop until long after the movie ended. Thank you Jane, Laura and Jocelyn, for bringing us such a wonderfully subtle and compassionate movie! Thank you cast, for being involved and portraying the characters with such depth and gentleness!<br /><br />I recognized the Angry sister; the Runaway sister and the sister in Denial. I recognized the Abusive Husband and why he was there and then the Father, oh oh the Father... all superbly played. I also recognized myself and this movie was an eye-opener, a relief, a chance to face my OWN truth and finally doing something about it. I truly hope A Thousand Acres has had the same effect on some others out there.<br /><br />Sinc

In [8]:
# Show the first few test labels (1=pos, 0=neg)
ds['test']["label"][:3]

[1, 1, 0]

### Tokenize text

Models cannot process raw text, so the text must first be converted into numbers (integers). Tokenization does this by splitting text into smaller units called tokens, which for BERT-style models are whole words or sub-word pieces. Each token is then mapped to an integer id from the tokenizer's vocabulary.
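To make the idea concrete, here is a minimal sketch of greedy longest-match sub-word tokenization, the general scheme that WordPiece-style tokenizers follow. The vocabulary `TOY_VOCAB` and the helpers `toy_tokenize` and `toy_encode` are hypothetical and exist only for illustration; the real DistilBERT tokenizer has a far larger vocabulary and additional normalization steps.

```python
# Hypothetical toy vocabulary mapping tokens to integer ids.
# "##" marks a piece that continues a word, as in WordPiece-style vocabularies.
TOY_VOCAB = {"[UNK]": 0, "the": 1, "movie": 2, "was": 3, "great": 4,
             "un": 5, "##watch": 6, "##able": 7}


def toy_tokenize(text):
    """Split on whitespace, then greedily match the longest known piece."""
    tokens = []
    for word in text.lower().split():
        start, pieces = 0, []
        while start < len(word):
            end, piece = len(word), None
            # Shrink the candidate from the right until it is in the vocabulary.
            while end > start:
                candidate = word[start:end]
                if start > 0:
                    candidate = "##" + candidate
                if candidate in TOY_VOCAB:
                    piece = candidate
                    break
                end -= 1
            if piece is None:
                # Nothing matched: the whole word becomes the unknown token.
                pieces = ["[UNK]"]
                break
            pieces.append(piece)
            start = end
        tokens.extend(pieces)
    return tokens


def toy_encode(text):
    """Convert text to integer ids via the toy vocabulary."""
    return [TOY_VOCAB[token] for token in toy_tokenize(text)]


print(toy_tokenize("The movie was unwatchable"))
print(toy_encode("The movie was unwatchable"))
```

A word that is missing from the vocabulary is split into known pieces where possible, which is why sub-word tokenizers rarely need the unknown token.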

In [9]:
# Load the tokenizer for Bert
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

In [10]:
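def tokenization(example):
    """
    Tokenize a batch of reviews, truncating to the model's maximum length.

    Padding is left to DataCollatorWithPadding, which pads each training
    batch to its own longest sequence instead of padding the whole dataset.
    """
    return tokenizer(example["text"], truncation=True)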

In [11]:
tokenize_ds={split:ds[split].map(tokenization, batched=True) for split in splits}

In [12]:
# check
tokenize_ds

{'train': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 500
 }),
 'test': Dataset({
     features: ['text', 'label', 'input_ids', 'attention_mask'],
     num_rows: 500
 })}

### Train Bert model

Now it's time to train our model. We'll use the Trainer class from the 🤗 Transformers library to do this. The Trainer class provides a high-level API that abstracts away a lot of the training loop.

First we'll define a function to compute our accuracy metric, then we'll create the Trainer.

Let's take this opportunity to learn about the DataCollator. According to the Hugging Face documentation:

Data collators are objects that will form a batch by using a list of dataset elements as input. These elements are of the same type as the elements of train_dataset or eval_dataset.

To be able to build batches, data collators may apply some processing (like padding).
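The padding step described above can be sketched in plain Python. The helper `toy_pad_batch` is hypothetical and the token ids are made up; the real DataCollatorWithPadding delegates to the tokenizer and returns tensors, so treat this only as an illustration of the idea.

```python
def toy_pad_batch(examples, pad_token_id=0):
    """Pad every example's input_ids to the longest sequence in this batch."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": []}
    for ex in examples:
        n_real = len(ex["input_ids"])
        n_pad = max_len - n_real
        batch["input_ids"].append(ex["input_ids"] + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        batch["attention_mask"].append([1] * n_real + [0] * n_pad)
    return batch


# Two made-up tokenized reviews of different lengths.
batch = toy_pad_batch([
    {"input_ids": [101, 2023, 102]},
    {"input_ids": [101, 2023, 3185, 2001, 102]},
])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch rather than per dataset keeps short reviews from being stretched to the length of the longest review in the whole split.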

In [33]:
bert_model=AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased",
        num_labels=2,
        id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
        label2id={"NEGATIVE": 0, "POSITIVE": 1},
       
)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [34]:
training_args=TrainingArguments(
    output_dir="./result_non_PEFT/sentiment_analysis",
    learning_rate=1e-4,
    per_device_train_batch_size=16, 
    per_device_eval_batch_size=16,   
    num_train_epochs=3,
    weight_decay=0.001,  
    logging_strategy="epoch",
    eval_strategy="epoch",         
    save_strategy="epoch",         
    load_best_model_at_end=True
)

In [35]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}
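As a quick sanity check of the metric, the same computation can be written in plain Python and run on a few made-up logits. The helper `toy_accuracy` and the values below are hypothetical and are not model output.

```python
def toy_accuracy(logits, labels):
    """Pick the highest-scoring class per row and compare with the labels."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Four made-up examples: columns are the scores for classes 0 and 1.
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 0, 0]
print(toy_accuracy(toy_logits, toy_labels))
```

Three of the four predicted classes match their labels here, so the reported accuracy is 0.75.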

In [36]:
trainer = Trainer(
    model=bert_model,
    args=training_args,
    train_dataset=tokenize_ds["train"],
    eval_dataset=tokenize_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer, padding=True, return_tensors="pt"),
    compute_metrics=compute_metrics
    )

In [37]:
# run trainer to train bert model
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.6464 | 0.522028 | 0.766 |
| 2 | 0.2916 | 0.382 | 0.86 |
| 3 | 0.1163 | 0.463799 | 0.852 |


TrainOutput(global_step=96, training_loss=0.3514350006977717, metrics={'train_runtime': 113.9992, 'train_samples_per_second': 13.158, 'train_steps_per_second': 0.842, 'total_flos': 198701097984000.0, 'train_loss': 0.3514350006977717, 'epoch': 3.0})
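The `global_step=96` in the output above follows from the dataset size and the training arguments. A small sketch of the arithmetic, assuming a single device and no gradient accumulation (the defaults, as far as this notebook shows):

```python
import math

train_examples = 500   # size of the training subset selected earlier
batch_size = 16        # per_device_train_batch_size
epochs = 3             # num_train_epochs

# The last batch of each epoch is smaller, so round up.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)
```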

### Evaluate the model

Evaluating the model is as simple as calling the evaluate method on the trainer object. This will run the model on the test set and compute the metrics we specified in the compute_metrics function.

In [None]:
# Evaluate the model
trainer.evaluate()

In [None]:
pd.set_option("display.max_colwidth", None)

df = pd.DataFrame(tokenize_ds["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces
df["text"] = df["text"].str.replace("<br />", " ")

# Display the first few rows of the raw data
df.head()

In [None]:
# Add the model predictions to the dataframe
predictions = trainer.predict(tokenize_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

df.head()

## Performing Parameter-Efficient Fine-Tuning

In the cells below, we create a PEFT model from the pre-trained base model, run a training loop, and save the PEFT model weights.

In [None]:
from peft import LoraConfig, TaskType, get_peft_model

Lora_config = LoraConfig(task_type=TaskType.SEQ_CLS,
                         target_modules=["q_lin","k_lin","v_lin"],
                         inference_mode=False, 
                         r=8, lora_alpha=32, 
                         lora_dropout=0.1
                        )

In [None]:
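# Wrap a fresh copy of the base model: get_peft_model() injects the LoRA layers in place,
# so reusing bert_model would start from (and alter) the fully fine-tuned model.
peft_base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"}, label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# get_peft_model() takes the base model and the LoraConfig and returns a PeftModel.
peft_bert_model = get_peft_model(peft_base_model, Lora_config)

peft_bert_model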


In [None]:
peft_bert_model.print_trainable_parameters()

### Train peft model

Each PEFT method is defined by a PeftConfig class that stores all the important parameters for building a PeftModel. For example, to train with LoRA, create a LoraConfig and specify the following parameters:

- task_type: the task to train for (sequence classification in this case)
- target_modules: the names of the modules that receive LoRA adapters (q_lin, k_lin and v_lin in this case)
- inference_mode: whether you’re using the model for inference or not
- r: the dimension of the low-rank matrices
- lora_alpha: the scaling factor for the low-rank matrices
- lora_dropout: the dropout probability of the LoRA layers
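To see why LoRA is lightweight, the number of parameters it adds can be estimated by hand. The sketch below assumes the usual DistilBERT-base shape (hidden size 768, 6 transformer layers) and uses the hypothetical helper `lora_param_count`. It counts only the low-rank matrices; the figure reported by print_trainable_parameters() is expected to be larger, since for sequence classification the classification head is normally trained as well.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


# Assumed DistilBERT-base shape; r and lora_alpha come from the LoraConfig above.
hidden, n_layers = 768, 6
r, lora_alpha = 8, 32
targets_per_layer = 3  # q_lin, k_lin, v_lin

per_module = lora_param_count(hidden, hidden, r)
lora_total = per_module * targets_per_layer * n_layers
frozen_per_module = hidden * hidden + hidden  # weight + bias of one target layer
scaling = lora_alpha / r  # factor applied to the low-rank update

print(per_module, lora_total, frozen_per_module, scaling)
```

Under these assumptions each adapted layer gains 12,288 trainable parameters next to roughly 590 thousand frozen ones, and the low-rank update is scaled by lora_alpha / r = 4.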

In [None]:
training_args_peft=TrainingArguments(
    output_dir="./result_PEFT/sentiment_analysis",
    learning_rate=1e-4,
    per_device_train_batch_size=16, 
    per_device_eval_batch_size=16,   
    num_train_epochs=10,
    weight_decay=0.01,
    eval_strategy="epoch",         
    save_strategy="epoch",         
    load_best_model_at_end=True
)

In [None]:
peft_trainer = Trainer(
    model=peft_bert_model,
    args=training_args_peft,
    train_dataset=tokenize_ds["train"],
    eval_dataset=tokenize_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer, padding=True, return_tensors="pt"),
    compute_metrics=compute_metrics
    )

In [None]:
# Run the trainer to train the PEFT model
peft_trainer.train()

In [None]:
# Evaluate the peft model
peft_trainer.evaluate()

In [None]:
pd.set_option("display.max_colwidth", None)

dfs = pd.DataFrame(tokenize_ds["test"])
dfs = dfs[["text", "label"]]

# Replace <br /> tags in the text with spaces
dfs["text"] = dfs["text"].str.replace("<br />", " ")

dfs.head()

In [None]:
# Add the model predictions to the dataframe
predictions = peft_trainer.predict(tokenize_ds["test"])
dfs["predicted_label"] = np.argmax(predictions[0], axis=1)

dfs.head()

## Performing Inference with a PEFT Model

In the cells below, we save the PEFT adapter weights and the fully fine-tuned model, reload each from disk, and run both on the same example sentence to compare their predictions.

In [None]:
# save the PEFT model.
peft_bert_model.save_pretrained("lora_bert_model")

# save bert model
bert_model.save_pretrained("Bert_Model")

In [None]:
from peft import AutoPeftModelForSequenceClassification
from transformers import AutoTokenizer
import torch

model = AutoPeftModelForSequenceClassification.from_pretrained("lora_bert_model")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

model = model.to(device)
model.eval()

input_text = "I love Udacity deep learning course"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

predictions= model(**inputs).logits.cpu().detach().numpy()

print(f'The sentiment analysis of `{input_text}` is:', label[np.argmax(predictions[0])])

In [None]:
BertModel = AutoModelForSequenceClassification.from_pretrained("Bert_Model")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

Model = BertModel.to(device)
Model.eval()

input_text = "I love Udacity deep learning course"
inputs = tokenizer(input_text, return_tensors="pt").to(device)

predictions= Model(**inputs).logits.cpu().detach().numpy()

print(f'The sentiment analysis of `{input_text}` is:', label[np.argmax(predictions[0])])
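The cells above report only the winning label. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below is plain Python with made-up logits; `softmax`, `describe`, and `LABELS` are hypothetical names, and `LABELS` simply mirrors the `label` mapping used in the notebook.

```python
import math

LABELS = {0: "NEGATIVE", 1: "POSITIVE"}  # mirrors `label` in the notebook


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return the predicted label name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


toy_logits = [-1.2, 2.3]  # made-up values, not real model output
name, confidence = describe(toy_logits)
print(name, round(confidence, 3))
```

With real model output, the list passed to `describe` would be `predictions[0].tolist()` from either inference cell.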