# Lightweight Fine-Tuning Project


#### PEFT technique: 
LoRA is the PEFT technique used. It is supported for a wide range of architectures in the Hugging Face `peft` library (including DistilBERT) and is easy to fine-tune. Most importantly, only a small set of low-rank adapter weights is trained and saved on top of the frozen foundation model, which keeps the saved checkpoint small.
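To make the space saving concrete, here is a back-of-the-envelope parameter count. It is a sketch that assumes the DistilBERT shapes visible in the model printout later in this notebook (6 transformer layers, 768-dimensional `q_lin` projections with bias) and the LoRA setting used here (`r=16` on `q_lin` only). The helper names are hypothetical.

```python
def lora_param_count(in_features, out_features, r, n_layers):
    """LoRA adds A (r x in_features) and B (out_features x r) to each adapted layer."""
    return n_layers * r * (in_features + out_features)


def dense_param_count(in_features, out_features, n_layers, bias=True):
    """Parameters of the frozen nn.Linear layers that LoRA adapts."""
    return n_layers * (in_features * out_features + (out_features if bias else 0))


# Assumed shapes: DistilBERT q_lin is 768 -> 768 with bias, repeated in 6 layers
lora_params = lora_param_count(768, 768, r=16, n_layers=6)
dense_params = dense_param_count(768, 768, n_layers=6)
print(lora_params, dense_params, f"{lora_params / dense_params:.1%}")  # 147456 3543552 4.2%
```

Under these assumptions the LoRA matrices hold about 4% as many parameters as the `q_lin` layers they adapt. The count reported later by `print_trainable_parameters()` is larger, because it also includes non-LoRA parameters such as the classification head.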
#### Model: 
I have decided to use the `distilbert-base-uncased` foundation model because it is small and well suited to text classification, which is the goal here.

#### Evaluation approach:
A simple `compute_metrics` function that reports accuracy is passed to the `Trainer` and used for evaluation.
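The metric itself is plain accuracy over the argmax of the logits. The following dependency-free sketch mirrors what `compute_metrics` computes with NumPy; the function name and the toy logits are hypothetical.

```python
def accuracy_from_logits(logits, labels):
    """Predict the argmax class of each logit row and return the fraction matching labels."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical logits for 4 reviews (columns: NEGATIVE, POSITIVE)
logits = [[2.1, -0.3], [0.2, 1.7], [1.0, 0.9], [-0.5, 0.4]]
labels = [0, 1, 1, 1]
print(accuracy_from_logits(logits, labels))  # 0.75 (the third review is misclassified)
```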

#### Fine-tuning dataset: 
The dataset was preprocessed by tokenization, and the same tokenized dataset was used to train both the LoRA model and the non-LoRA model.

## Loading and Evaluating a Foundation Model

In [1]:
#! pip install -U datasets

In [2]:
# Import the datasets and transformers packages
import torch
from datasets import load_dataset, DatasetDict
from transformers import (set_seed)

torch.cuda.empty_cache()

set_seed(42)

splits = ["train", "test"]
dataset = {split: ds for split, ds in zip(splits, load_dataset("imdb", split=splits))}

for split in splits:
    dataset[split] = dataset[split].shuffle(seed=42).select(range(1000))

# Split the 1000 shuffled test reviews: 2% (20 reviews) for inference (test)
# and 98% (980 reviews) for evaluation (validation)
val_test_ds = dataset["test"].train_test_split(test_size=0.02)
dataset = DatasetDict(
    train=dataset["train"],
    test=val_test_ds["test"],
    validate=val_test_ds["train"]
)

print(dataset)



DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 1000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 20
    })
    validate: Dataset({
        features: ['text', 'label'],
        num_rows: 980
    })
})
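The 20/980 split sizes follow from `test_size=0.02` on the 1000 shuffled test reviews. A small sketch, assuming that a fractional `test_size` is rounded up for the test split and the remainder goes to the other split (the helper name is hypothetical):

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test split up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(1000, 0.02))  # (980, 20): 980 rows become "validate", 20 become "test"
```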


In [3]:
from transformers import AutoTokenizer
feature_field = "text"
model_name = "distilbert-base-uncased"
batch_size = 4
num_epochs = 1

tokenizer = AutoTokenizer.from_pretrained(model_name)
# DistilBERT's tokenizer already defines a [PAD] token, so no extra padding setup is needed

def tokenizer_fn(x):
    # Pad or truncate every review to the model's maximum length (512 tokens)
    return tokenizer(x[feature_field], padding='max_length', truncation=True, return_tensors="pt")
    

In [4]:
tokenized_dataset = {}
for split in ["train", "validate"]:
    tokenized_dataset[split] = dataset[split].map(tokenizer_fn, batched=True)
    
    
tokenized_train_dataset=tokenized_dataset["train"]
tokenized_eval_dataset=tokenized_dataset["validate"]
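Because `tokenizer_fn` uses `padding='max_length'` and `truncation=True`, every review becomes a fixed-length sequence of 512 token ids plus an attention mask that marks real tokens with 1 and padding with 0. The pure-Python sketch below illustrates only that padding and truncation logic. It assumes the `[CLS]`=101, `[SEP]`=102 and `[PAD]`=0 ids of the uncased BERT vocabulary, uses hypothetical word ids, and skips WordPiece tokenization entirely.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # assumed ids from the uncased BERT vocabulary


def encode_fixed_length(word_ids, max_length):
    """Add [CLS]/[SEP], truncate to max_length, then pad with [PAD] up to max_length."""
    body = word_ids[: max_length - 2]  # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    padding = max_length - len(input_ids)
    attention_mask = [1] * len(input_ids) + [0] * padding
    return input_ids + [PAD_ID] * padding, attention_mask


# Hypothetical word ids; a real review would use max_length=512
short_ids, short_mask = encode_fixed_length([2023, 3185], max_length=6)
print(short_ids, short_mask)  # [101, 2023, 3185, 102, 0, 0] [1, 1, 1, 1, 0, 0]

long_ids, long_mask = encode_fixed_length(list(range(1000, 1010)), max_length=6)
print(long_ids, long_mask)    # [101, 1000, 1001, 1002, 1003, 102] [1, 1, 1, 1, 1, 1]
```

Since every sequence already has the same length after this step, `DataCollatorWithPadding` has nothing left to pad in the training cells.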


In [5]:
import torch
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
id2label={0: "NEGATIVE", 1: "POSITIVE"}  # For converting predictions to strings
label2id={"NEGATIVE": 0, "POSITIVE": 1}
torch.cuda.empty_cache()

In [6]:
from transformers import AutoModelForSequenceClassification

# Create foundation model
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    id2label=id2label,
    label2id=label2id
)
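# Freeze the DistilBERT encoder so that only the classification head
# (pre_classifier + classifier) is trained in this baseline
for param in model.base_model.parameters():
    param.requires_grad = False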
model.to(device)
model.config.pad_token_id = tokenizer.pad_token_id

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
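With the encoder frozen, the only trainable parameters in this baseline are the two freshly initialized head layers named in the warning above. A quick count, assuming `pre_classifier` is a 768-to-768 linear layer and `classifier` is 768-to-2 (both with bias), as in `DistilBertForSequenceClassification`:

```python
def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of an nn.Linear-style layer."""
    return in_features * out_features + (out_features if bias else 0)


hidden_size, num_labels = 768, 2  # assumed DistilBERT hidden size and the 2 sentiment labels
pre_classifier = linear_params(hidden_size, hidden_size)
classifier = linear_params(hidden_size, num_labels)
print(pre_classifier, classifier, pre_classifier + classifier)  # 590592 1538 592130
```

That is about 0.59M trainable parameters, well under 1% of the model's roughly 67M parameters.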


In [7]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

def compute_metrics(eval_pred):
    """Accuracy: fraction of validation reviews whose argmax logit matches the label."""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

In [8]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

training_args=TrainingArguments(
        output_dir=f"./data/{model_name}_trained",
        learning_rate=2e-3,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=num_epochs,
        weight_decay=0.01,
        evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
        save_strategy="epoch",
        load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_eval_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.40558 | 0.821429 |


TrainOutput(global_step=250, training_loss=0.55659375, metrics={'train_runtime': 39.0389, 'train_samples_per_second': 25.615, 'train_steps_per_second': 6.404, 'total_flos': 132467398656000.0, 'train_loss': 0.55659375, 'epoch': 1.0})
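The `global_step=250` in the output follows from the data size and batch size. A sketch, assuming a single device and no gradient accumulation (the helper name is hypothetical), that also reproduces the throughput figures from `train_runtime`:

```python
import math


def optimizer_steps(num_examples, batch_size, num_epochs):
    """One optimizer step per batch; a final partial batch still counts as a step."""
    return math.ceil(num_examples / batch_size) * num_epochs


steps = optimizer_steps(num_examples=1000, batch_size=4, num_epochs=1)
train_runtime = 39.0389  # seconds, from the TrainOutput above
samples_per_second = 1000 / train_runtime
steps_per_second = steps / train_runtime
print(steps, round(samples_per_second, 3), round(steps_per_second, 3))
```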

In [9]:
trainer.evaluate()

{'eval_loss': 0.405579537153244,
 'eval_accuracy': 0.8214285714285714,
 'eval_runtime': 16.8084,
 'eval_samples_per_second': 58.304,
 'eval_steps_per_second': 14.576,
 'epoch': 1.0}

## Performing Parameter-Efficient Fine-Tuning

In [10]:

from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForSequenceClassification

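peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification; PEFT also keeps the classification head trainable
    inference_mode=False,        # the adapter is being trained
    r=16,                        # rank of the low-rank update matrices
    lora_alpha=32,               # the update is scaled by lora_alpha / r = 2
    lora_dropout=0.05,
    target_modules=["q_lin"],    # adapt only the attention query projections
)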
_model = AutoModelForSequenceClassification.from_pretrained(model_name,
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)

_model.to(device)
_model.config.pad_token_id = tokenizer.pad_token_id
lora_model = get_peft_model(_model, peft_config)
lora_model.print_trainable_parameters()

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 1,331,716 || all params: 67,694,596 || trainable%: 1.967241225577297
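Conceptually, LoRA leaves the pretrained weight `W` of each targeted layer (here `q_lin`) frozen and adds a trainable low-rank update `B A`, scaled by `lora_alpha / r` (32 / 16 = 2 with this config). The toy sketch below shows that computation in pure Python for a single 2×2 layer with rank 1; the matrices are made up and the bias term is omitted. PEFT initializes `B` to zeros by default, so training starts from exactly the base model's behaviour.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """LoRA-adapted linear layer without bias: W x + (lora_alpha / r) * B (A x).

    W (out x in) is frozen; only A (r x in) and B (out x r) are trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy 2x2 layer with rank r=1; lora_alpha / r = 2, the same scale as lora_alpha=32, r=16
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]
x = [3.0, 4.0]

B_init = [[0.0], [0.0]]  # B starts at zero, so the adapted layer equals the frozen one
print(lora_forward(W, A, B_init, x, lora_alpha=2, r=1))     # [3.0, 4.0]

B_trained = [[1.0], [0.0]]  # hypothetical values after some training
print(lora_forward(W, A, B_trained, x, lora_alpha=2, r=1))  # [10.0, 4.0]
```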


In [11]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


training_args = TrainingArguments(
        output_dir=f"./data/{model_name}_peft_finetuned",
        learning_rate=2e-3,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        num_train_epochs=num_epochs,
        weight_decay=0.01,
        evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
        save_strategy="epoch",
        load_best_model_at_end=True,
    )

trainer2 = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_eval_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer2.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.391402 | 0.85102 |


TrainOutput(global_step=250, training_loss=0.48859368896484373, metrics={'train_runtime': 54.4246, 'train_samples_per_second': 18.374, 'train_steps_per_second': 4.594, 'total_flos': 134739406848000.0, 'train_loss': 0.48859368896484373, 'epoch': 1.0})

In [12]:
trainer2.save_model(f"{model_name}-peft-lora")  # or lora_model.save_pretrained(f"{model_name}-peft-lora")

In [13]:
trainer2.evaluate()

{'eval_loss': 0.39140161871910095,
 'eval_accuracy': 0.8510204081632653,
 'eval_runtime': 17.3411,
 'eval_samples_per_second': 56.513,
 'eval_steps_per_second': 14.128,
 'epoch': 1.0}

## Performing Inference with a PEFT Model

In the cells below, the saved PEFT model weights are loaded and the trained PEFT model is evaluated and compared with the original model (frozen encoder, trained classification head).

In [14]:
# Load the saved PEFT (LoRA) model for inference
import torch
from peft import PeftConfig, AutoPeftModelForSequenceClassification

peft_model_id = f"{model_name}-peft-lora"
config = PeftConfig.from_pretrained(peft_model_id)
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(peft_model_id)
inference_model.eval()

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(
                  in_features=768, out_features=768, bias=True
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=768, out_features=16, bias=Fa

In [15]:
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

### Manual evaluation with unseen test data

Note that the same metrics and training dataset are used for both models, and the same unseen test data (the 20-review test split) is used to evaluate both.

In [16]:
infer_reviews_count = 20
reviews = dataset["test"]["text"][-infer_reviews_count:]
labels = dataset["test"]["label"][-infer_reviews_count:]

In [17]:

def get_prediction(review, pred_model):
    """Given a review, return the predicted sentiment label ("NEGATIVE" or "POSITIVE")."""
    pred_model.to(device)
    pred_model.eval()  # disable dropout so predictions are deterministic
    inputs = tokenizer(review, padding='max_length', truncation=True, return_tensors="pt").to(device)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = pred_model(**inputs)

    predictions = torch.argmax(outputs.logits, dim=-1)

    return id2label[predictions.item()]


In [18]:
def add_emoji(is_true: bool) -> str:
    return '✅' if is_true else '😡' 

original_model_correct_prediction_count = 0
peft_model_correct_prediction_count = 0
total_label_count = len(labels)
counter = 0
for review, label in zip(reviews, labels):
    counter += 1
    print(f"({counter}) Review: {review[:80]} \n... {review[-80:]}")
    # PEFT (LoRA) trained model prediction
    peft_model_prediction = get_prediction(review, inference_model)
    is_peft_model_predict_correct = label2id[peft_model_prediction] == label
    if is_peft_model_predict_correct:
        peft_model_correct_prediction_count += 1

    # Original (head-only) trained model prediction
    original_model_prediction = get_prediction(review, model)
    is_original_model_predict_correct = label2id[original_model_prediction] == label
    if is_original_model_predict_correct:
        original_model_correct_prediction_count += 1

    print(f"Actual Label: ---------    {id2label[label]}\n"
          f"Original Model prediction: {original_model_prediction} {add_emoji(is_original_model_predict_correct)}\n"
          f"Peft Model prediction      {peft_model_prediction} {add_emoji(is_peft_model_predict_correct)}\n")

print("---Manual Inference Using same unknown test data---")
print(f"Original Model: Correctly predicted {original_model_correct_prediction_count} out of {total_label_count}")
print(f"Peft Model: Correctly predicted {peft_model_correct_prediction_count} out of {total_label_count}")

(1) Review: I watched this movie which I really thought had a promising beginning but then i 
... ess when it comes to controversial matters, weirdness and originality in movies.
Actual Label: ---------    NEGATIVE
Original Model prediction: NEGATIVE ✅
Peft Model prediction      NEGATIVE ✅

(2) Review: This movie is perfect for any aspiring screen writer, actor or director. By watc 
... idering to watch this movie so they can go do something decent with their lives.
Actual Label: ---------    NEGATIVE
Original Model prediction: NEGATIVE ✅
Peft Model prediction      NEGATIVE ✅

(3) Review: I'm a Boorman fan, but this is arguably his least successful film. Comedy has ne 
... orman wrote the script with his daughter, Telsche, who died a couple years ago.)
Actual Label: ---------    NEGATIVE
Original Model prediction: POSITIVE 😡
Peft Model prediction      POSITIVE 😡

(4) Review: I wonder why I haven't heard of this movie before. It's truly a magnificent come 
... acter. If you liked the Ta

#### Manual inference: PEFT LoRA model vs. the original model

From the manual evaluation above, the PEFT model performed better. On the 20 unseen test reviews, the PEFT model predicted 17 correctly (85%), while the original model, trained without LoRA (frozen encoder, only the classification head trained), predicted 14 correctly (70%). With only 20 reviews, each review is worth 5 percentage points, so this comparison is indicative rather than conclusive.
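Because the test set is so small, an interval estimate helps put the two scores in perspective. The sketch below computes 95% Wilson score intervals for 17/20 and 14/20; the choice of the Wilson interval is an assumption for illustration and was not part of the original evaluation.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - margin, center + margin


peft_low, peft_high = wilson_interval(17, 20)
base_low, base_high = wilson_interval(14, 20)
print(f"PEFT:     {peft_low:.2f} to {peft_high:.2f}")
print(f"Original: {base_low:.2f} to {base_high:.2f}")
```

Under this assumption the two intervals (roughly 0.64 to 0.95 for the PEFT model and 0.48 to 0.85 for the original model) overlap substantially, so the 20-review comparison alone does not establish which model is better.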

### Evaluation during training

The same metrics and training dataset are used for both models, and both are evaluated on the same 980-review validation split. With the hyperparameters chosen for the LoRA model (`r=16`, `lora_alpha=32`, `lora_dropout=0.05`, learning rate `2e-3`, one epoch), the PEFT model also performed better in this evaluation.

##### Original model result
```python
{'eval_loss': 0.405579537153244,
 'eval_accuracy': 0.8214285714285714,
 'eval_runtime': 16.8084,
 'eval_samples_per_second': 58.304,
 'eval_steps_per_second': 14.576,
 'epoch': 1.0}
```

##### PEFT model with LoRA result
```python
{'eval_loss': 0.39140161871910095,
 'eval_accuracy': 0.8510204081632653,
 'eval_runtime': 17.3411,
 'eval_samples_per_second': 56.513,
 'eval_steps_per_second': 14.128,
 'epoch': 1.0}
```

It is evident from the results above that the `eval_accuracy` of the PEFT model trained with LoRA is `0.85`, against `0.82` for the original model. The gain of about 3 percentage points is modest but noticeable.
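Converting the two accuracies back to counts on the 980-review validation split makes the gap easier to read (a simple arithmetic sketch):

```python
validation_size = 980  # rows in the "validate" split
eval_accuracy = {"original": 0.8214285714285714, "peft_lora": 0.8510204081632653}

# accuracy * number of examples = number of correctly classified reviews
correct = {name: round(acc * validation_size) for name, acc in eval_accuracy.items()}
print(correct)                                     # {'original': 805, 'peft_lora': 834}
print(correct["peft_lora"] - correct["original"])  # 29
```

In other words, the PEFT model classified 29 more of the 980 validation reviews correctly.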

The `eval_loss` of the PEFT model (0.39) is also lower than that of the original model (0.41), which likewise indicates better performance of the PEFT model.
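For reference, `eval_loss` is the average cross-entropy of the model's predictions on the validation split (the loss `AutoModelForSequenceClassification` uses for single-label classification). A minimal pure-Python sketch of that quantity, with hypothetical logits:

```python
import math


def mean_cross_entropy(logits, labels):
    """Average negative log-softmax probability of the true class."""
    total = 0.0
    for row, label in zip(logits, labels):
        max_logit = max(row)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


print(mean_cross_entropy([[0.0, 0.0]], [0]))  # ln(2), about 0.693: an uninformed guess
print(mean_cross_entropy([[2.0, -1.0], [0.5, 1.5]], [0, 1]))
```

A model that always outputs equal logits for both classes scores ln 2 ≈ 0.693, so both models (0.41 and 0.39) are clearly better than chance, and the lower value means the PEFT model assigns a higher average log-probability to the correct label.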