# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA
* Model: "distilbert-base-uncased"
* Evaluation approach: Hugging Face Trainer.evaluate(), reporting loss and accuracy on a 100-example test subset
* Fine-tuning dataset: "stanfordnlp/imdb"
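
As a rough illustration of the LoRA technique chosen above, the sketch below applies a low-rank update to a single linear layer in plain Python. The toy dimensions, the helper names (`matvec`, `lora_forward`), and all weight values are assumptions made for illustration; only the idea of adding a scaled product of two small matrices to a frozen weight comes from LoRA itself.

```python
# Minimal LoRA forward pass on one linear layer, using plain lists.
# All sizes and values here are toy assumptions, not taken from DistilBERT.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x); W stays frozen, A and B are trained."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, low_rank)]

d_in, d_out, r, alpha = 4, 3, 2, 8

W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen weight, d_out x d_in
A = [[0.5, -0.5, 0.25, 0.0], [0.0, 1.0, 0.0, -1.0]]               # trainable, r x d_in
B_zero = [[0.0] * r for _ in range(d_out)]                         # trainable, d_out x r, zero at start
x = [1.0, 2.0, 3.0, 4.0]

# With B initialised to zero the adapter is a no-op, so training starts from the base model.
base_out = matvec(W, x)
out_at_start = lora_forward(W, A, B_zero, x, alpha, r)

# Once B moves away from zero, the output shifts by the scaled low-rank term.
B_trained = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
out_after = lora_forward(W, A, B_trained, x, alpha, r)

print(base_out, out_at_start, out_after)
```

The toy ratio `alpha / r` of 4 matches the ratio implied by `lora_alpha=32` and `r=8` in the configuration used later in this notebook.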

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.
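
The evaluation in this section reports accuracy, which amounts to taking the arg-max of the two logits for each review and comparing it with the label. A minimal sketch in plain Python, with made-up logits and labels (the function name `accuracy_from_logits` is hypothetical):

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class index matches the label."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)

# Made-up logits for four reviews: column 0 = NEGATIVE, column 1 = POSITIVE.
toy_logits = [[0.2, -0.1], [-0.3, 0.4], [0.05, 0.01], [-1.0, 2.0]]
toy_labels = [0, 1, 1, 1]

toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)
```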

In [1]:
! pip install -q "evaluate==0.4.3"
! pip install -q "scikit-learn==1.6.0"

In [2]:
DEVICE = "cuda"
DATASET = "stanfordnlp/imdb"
DATASET_SEED = 23
SUBSET_SIZE = 100
PRETRAINED_MODEL = "distilbert-base-uncased"
FINE_TUNED_MODEL = "tashrifmahmud/sentiment_analysis_model_v2"

In [3]:
from datasets import load_dataset

# Load a dataset from the Hugging Face Hub.
dataset_splits = ["train", "test"]
try:
    # Example: IMDB dataset for sequence classification
    loaded_dataset = load_dataset(DATASET, split=dataset_splits)
    print("Dataset loaded successfully.")
except Exception as e:
    print(f"An error occurred while loading the dataset: {e}")
else:
    datasets = {split: ds for split, ds in zip(dataset_splits, loaded_dataset)}

# Take a subset of the dataset to reduce computational resources
try:
    # Thin out the dataset to make it run faster for this example.
    for split in dataset_splits:
        datasets[split] = datasets[split].shuffle(seed=DATASET_SEED).select(range(SUBSET_SIZE))
    print("Subset of the dataset created successfully.")
except Exception as e:
    print(f"An error occurred while creating a subset of the dataset: {e}")
else:
    print("Train split of dataset:")
    print(datasets["train"])

    print("Test split of dataset:")
    print(datasets["test"])

    print("Let's take a look at first data in the set:")
    print(datasets["train"][0])

Dataset loaded successfully.
Subset of the dataset created successfully.
Train split of dataset:
Dataset({
    features: ['text', 'label'],
    num_rows: 100
})
Test split of dataset:
Dataset({
    features: ['text', 'label'],
    num_rows: 100
})
Let's take a look at first data in the set:
{'text': "As soon as I heard about this film I knew I had to check it out. Well, I heard about it, then I found the trailer. After that, that's when I knew I had to see it. And I am so glad I did. You want to see classic television mixed with zombies? No? Then get lost.<br /><br />FIDO is a movie unlike anything I've ever seen. Well, actually, it kind of is. It's kind of like a Lassie episode and a Zombie film. Though when combined, it feels completely new and original. FIDO is about a little boy named Timmy and his new pet Fido. Well this new pet ain't no squawking parakeet or some potty-trained puppy. It's a re-animated dead guy...a zombie. A large radiation cloud engulfed Earth which led to all o

In [4]:
# Load the tokenizer for the pretrained model.
from transformers import AutoTokenizer

try:
    pre_trained_tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL)
    fine_tuned_tokenizer = AutoTokenizer.from_pretrained(FINE_TUNED_MODEL)
    print("Tokenizers loaded successfully.")
except Exception as e:
    print(f"An error occurred while loading the tokenizers: {e}")

# Tokenize both the dataset subsets: train and test.
def preprocess_function(examples):
    """Preprocess the imdb dataset by returning tokenized examples"""
    return pre_trained_tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_ds = {}
try:
    for split in dataset_splits:
        tokenized_ds[split] = datasets[split].map(preprocess_function, batched=True)
    print("Dataset tokenized successfully.")
except Exception as e:
    print(f"An error occurred while tokenizing the dataset: {e}")
else:
    # Check that we tokenized the examples properly
    assert tokenized_ds["train"][0]["input_ids"][:5] == [101, 2004, 2574, 2004, 1045]

    print("Show the first example of tokenized training set:")
    print(tokenized_ds["train"][0]["input_ids"])

Tokenizers loaded successfully.


Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Dataset tokenized successfully.
Show the first example of tokenized training set:
[101, 2004, 2574, 2004, 1045, 2657, 2055, 2023, 2143, 1045, 2354, 1045, 2018, 2000, 4638, 2009, 2041, 1012, 2092, 1010, 1045, 2657, 2055, 2009, 1010, 2059, 1045, 2179, 1996, 9117, 1012, 2044, 2008, 1010, 2008, 1005, 1055, 2043, 1045, 2354, 1045, 2018, 2000, 2156, 2009, 1012, 1998, 1045, 2572, 2061, 5580, 1045, 2106, 1012, 2017, 2215, 2000, 2156, 4438, 2547, 3816, 2007, 14106, 1029, 2053, 1029, 2059, 2131, 2439, 1012, 1026, 7987, 1013, 1028, 1026, 7987, 1013, 1028, 10882, 3527, 2003, 1037, 3185, 4406, 2505, 1045, 1005, 2310, 2412, 2464, 1012, 2092, 1010, 2941, 1010, 2009, 2785, 1997, 2003, 1012, 2009, 1005, 1055, 2785, 1997, 2066, 1037, 27333, 2666, 2792, 1998, 1037, 11798, 2143, 1012, 2295, 2043, 4117, 1010, 2009, 5683, 3294, 2047, 1998, 2434, 1012, 10882, 3527, 2003, 2055, 1037, 2210, 2879, 2315, 27217, 1998, 2010, 2047, 9004, 10882, 3527, 1012, 2092, 2023, 2047, 9004, 7110, 1005, 1056, 2053, 5490, 6692,

In [5]:
# Load the pretrained model for sequence classification
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
import torch

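# 4-bit quantization config, kept for reference only: it is not applied below
# because the quantization_config argument is commented out.
bnb_config = BitsAndBytesConfig(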
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

try:
    pre_trained_model = AutoModelForSequenceClassification.from_pretrained(
        PRETRAINED_MODEL,
        # quantization_config=bnb_config,
        # torch_dtype=torch.bfloat16,
        num_labels=2,
        id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
        label2id={"NEGATIVE": 0, "POSITIVE": 1},
    )
    print("Pretrained Model loaded successfully.")
except Exception as e:
    print(f"An error occurred while loading the model: {e}")
else:
    pre_trained_model.to(DEVICE)

# Unfreeze all the model parameters.
# Hint: Check the documentation at https://huggingface.co/transformers/v4.2.2/training.html
for param in pre_trained_model.base_model.parameters():
    param.requires_grad = True

print("Pretrained Model classifiers:")
print(pre_trained_model.classifier)

print("Inspect Pretrained Model:")
print(pre_trained_model)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'classifier.bias', 'pre_classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Pretrained Model loaded successfully.
Pretrained Model classifiers:
Linear(in_features=768, out_features=2, bias=True)
Inspect Pretrained Model:
DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (s

In [6]:
# Load the fine-tuned model for sequence classification
try:
    fine_tuned_model = AutoModelForSequenceClassification.from_pretrained(
        FINE_TUNED_MODEL, num_labels=2,
        id2label={0: "NEGATIVE", 1: "POSITIVE"},
        label2id={"NEGATIVE": 0, "POSITIVE": 1},
    )
    print("Fine tuned Model loaded successfully.")
except Exception as e:
    print(f"An error occurred while loading the model: {e}")
else:
    fine_tuned_model.to(DEVICE)

print("Fine Tuned Model classifiers:")
print(fine_tuned_model.classifier)

print("Inspect Fine Tuned Model:")
print(fine_tuned_model)

Fine tuned Model loaded successfully.
Fine Tuned Model classifiers:
Linear(in_features=768, out_features=2, bias=True)
Inspect Fine Tuned Model:
DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (s

In [7]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

# Helper function to compute prediction accuracy
def compute_metrics(eval_predictions):
    e_predictions, e_labels = eval_predictions
    e_predictions = np.argmax(e_predictions, axis=1)
    return {"accuracy": (e_predictions == e_labels).mean()}

# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=pre_trained_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis/pre_trained/",
        learning_rate=2e-3,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=pre_trained_tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=pre_trained_tokenizer),
    compute_metrics=compute_metrics,
)

print("Evaluate pre-train foundation_model.")
pre_train_evaluate_results = trainer.evaluate()
print(pre_train_evaluate_results)

print("Full train the foundation_model.")
trainer.train()

print("Evaluate post-train foundation_model.")
post_train_evaluate_results = trainer.evaluate()
print(post_train_evaluate_results)

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Evaluate pre-train foundation_model.


{'eval_loss': 0.701190173625946, 'eval_accuracy': 0.45, 'eval_runtime': 2.3774, 'eval_samples_per_second': 42.063, 'eval_steps_per_second': 10.516}
Full train the foundation_model.


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.692948,0.53


Checkpoint destination directory ./data/sentiment_analysis/pre_trained/checkpoint-25 already exists and is non-empty.Saving will proceed but saved results may be invalid.


Evaluate post-train foundation_model.


{'eval_loss': 0.6929481625556946, 'eval_accuracy': 0.53, 'eval_runtime': 1.6984, 'eval_samples_per_second': 58.88, 'eval_steps_per_second': 14.72, 'epoch': 1.0}
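
One way to read the loss values above: for a two-class problem, a model that assigns probability 0.5 to each class has a cross-entropy of ln 2, roughly 0.6931. The short check below (variable names are illustrative) compares the reported `eval_loss` values with that chance-level figure.

```python
import math

# Cross-entropy of a uniform guess over two classes.
chance_loss = -math.log(0.5)

# eval_loss values copied from the output above.
loss_before = 0.701190173625946
loss_after = 0.6929481625556946

print(f"chance-level loss:   {chance_loss:.6f}")
print(f"gap before training: {loss_before - chance_loss:+.6f}")
print(f"gap after training:  {loss_after - chance_loss:+.6f}")
```

A loss this close to ln 2, together with 0.53 accuracy on 100 examples, suggests that the model is still close to guessing after the one-epoch run.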


In [8]:
# The HuggingFace Trainer class for the fine_tuned_model.
trainer_ft = Trainer(
    model=fine_tuned_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis/fine_tuned/",
        learning_rate=2e-3,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=fine_tuned_tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=fine_tuned_tokenizer),
    compute_metrics=compute_metrics,
)

print("Evaluate fine-tuned pre-trained model.")
fine_tuned_evaluate_results = trainer_ft.evaluate()
print(fine_tuned_evaluate_results)

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Evaluate fine-tuned pre-trained model.


{'eval_loss': 0.18837320804595947, 'eval_accuracy': 0.94, 'eval_runtime': 1.712, 'eval_samples_per_second': 58.412, 'eval_steps_per_second': 29.206}


In [9]:
# Show some of the predictions.
import pandas as pd

df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]

# Replace literal <br /> tags in the text with spaces.
df["text"] = df["text"].str.replace("<br />", " ", regex=False)

# Add the model predictions to the dataframe.
predictions = trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

print(df.head(10))

print("Look at some of the incorrect predictions.")
print(df[df["label"] != df["predicted_label"]].head(10))

                                                text  label  predicted_label
0  I've seen this amusing little 'brit flick'many...      1                0
1  Yep, Edward G. gives us a retro view of the cr...      0                0
2  Has there ever been an Angel of Death like MIM...      1                0
3  This is one of the worst Sandra Bullock movie ...      0                0
4  Dr Steven Segal saves the world from a deadly ...      0                0
5  An interesting concept turned into carnage... ...      0                0
6  Not good. Mostly because you don't give a damn...      0                0
7  The opening scene makes you feel like you're w...      0                0
8  I've watched a number of Wixel Pixel and Sub R...      0                0
9  Continuing in the string of "stalker/slasher" ...      0                0
Look at some of the incorrect predictions.
                                                 text  label  predicted_label
0   I've seen this amusing littl

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.
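
To get a feel for how small a LoRA adapter is, the number of added weights can be worked out by hand. The sketch below assumes DistilBERT's hidden size of 768 and six transformer blocks (both visible in the model printout earlier), together with the rank-8 adapters on `q_lin` and `v_lin` configured in the next cell; the variable names are illustrative.

```python
hidden_size = 768      # in_features / out_features of q_lin and v_lin
num_layers = 6         # "(0-5): 6 x TransformerBlock"
rank = 8               # LoRA rank r
targets_per_layer = 2  # q_lin and v_lin

# Each adapted Linear gains A (rank x hidden) and B (hidden x rank).
params_per_module = rank * hidden_size + hidden_size * rank
lora_params = params_per_module * targets_per_layer * num_layers

# Classification head: pre_classifier (768 -> 768) and classifier (768 -> 2), with biases.
head_params = (hidden_size * hidden_size + hidden_size) + (hidden_size * 2 + 2)

print(f"LoRA adapter weights:        {lora_params:,}")
print(f"Classification head weights: {head_params:,}")
print(f"Adapter + 2 x head:          {lora_params + 2 * head_params:,}")
```

The last figure, 1,331,716, equals the `trainable params` value that `print_trainable_parameters()` reports in the next cell. That would be consistent with the head being counted twice (an original and a trainable copy), but this reading is an inference from the arithmetic, not something the notebook states.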

In [10]:
# Create a PEFT config with appropriate hyperparameters for the chosen model.
from peft import get_peft_model, LoraConfig, TaskType

# Set up a LoRA config
peft_config_seq_cls = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,
    target_modules=["q_lin", "v_lin"],
    lora_alpha=32,
    lora_dropout=0.1,
)

# CREATING A PEFT MODEL

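# Using the PEFT config and foundation model, create a PEFT model.
# Note: pre_trained_model was already trained for one epoch in cell 7, so the PEFT
# model's "before training" evaluation matches the foundation model's post-training result.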
peft_model_seq_cls = get_peft_model(pre_trained_model, peft_config_seq_cls)
peft_model_seq_cls.to(DEVICE)
peft_model_seq_cls.print_trainable_parameters()

print("PEFT SEQ_CLS MODEL:")
print(peft_model_seq_cls)

trainable params: 1,331,716 || all params: 67,694,596 || trainable%: 1.967241225577297
PEFT SEQ_CLS MODEL:
PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(
                  in_features=768, out_features=768, bias=True
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
             

In [11]:
# TRAINING THE MODEL

import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
peft_trainer_seq_cls = Trainer(
    model=peft_model_seq_cls,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis/peft_seq_cls/",
        # Set the learning rate
        learning_rate=2e-5,
        # Set the per device train batch size and eval batch size
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        # Evaluate and save the model after each epoch
        evaluation_strategy="epoch",
        save_strategy="epoch",

        num_train_epochs=2,
        weight_decay=0.01,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=pre_trained_tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=pre_trained_tokenizer),
    compute_metrics=compute_metrics,
)

In [12]:
# Evaluate the PEFT model before LoRA training.
peft_seq_cls_pre_train_evaluate_results = peft_trainer_seq_cls.evaluate()
print(f"peft_seq_cls_pre_train_results: {peft_seq_cls_pre_train_evaluate_results}")

# Train the lightweight fine-tuned model.
peft_trainer_seq_cls.train()

# Evaluate the lightweight fine-tuned model post training.
peft_seq_cls_post_train_evaluate_results = peft_trainer_seq_cls.evaluate()
print(f"peft_seq_cls_post_train_results: {peft_seq_cls_post_train_evaluate_results}")

peft_seq_cls_pre_train_results: {'eval_loss': 0.6929482221603394, 'eval_accuracy': 0.53, 'eval_runtime': 1.6689, 'eval_samples_per_second': 59.918, 'eval_steps_per_second': 4.194}


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.692711,0.53
2,No log,0.692493,0.53


Checkpoint destination directory ./data/sentiment_analysis/peft_seq_cls/checkpoint-7 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/sentiment_analysis/peft_seq_cls/checkpoint-14 already exists and is non-empty.Saving will proceed but saved results may be invalid.


peft_seq_cls_post_train_results: {'eval_loss': 0.6924930810928345, 'eval_accuracy': 0.53, 'eval_runtime': 1.7085, 'eval_samples_per_second': 58.531, 'eval_steps_per_second': 4.097, 'epoch': 2.0}
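
The checkpoint names in the warnings above (`checkpoint-7`, `checkpoint-14`) follow from the subset size and batch size. A small sketch of that arithmetic, assuming a single device and no gradient accumulation (the helper name `optimizer_steps` is hypothetical):

```python
import math

def optimizer_steps(num_examples, batch_size, epochs):
    """Steps per epoch and in total, assuming one device and no gradient accumulation."""
    per_epoch = math.ceil(num_examples / batch_size)
    return per_epoch, per_epoch * epochs

# PEFT run: 100 training examples, batch size 16, 2 epochs.
peft_steps = optimizer_steps(100, 16, 2)
# Full fine-tuning run earlier: 100 examples, batch size 4, 1 epoch.
full_steps = optimizer_steps(100, 4, 1)

print(peft_steps, full_steps)
```

With only 14 updates at a learning rate of 2e-5, it is plausible that the adapter barely moves from its initial state, which would fit the nearly unchanged loss and accuracy.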


In [13]:
# Show some of the predictions.

import pandas as pd

df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]

# Replace literal <br /> tags in the text with spaces.
df["text"] = df["text"].str.replace("<br />", " ", regex=False)

# Add the model predictions to the dataframe
predictions = peft_trainer_seq_cls.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

print(df.head(10))

                                                text  label  predicted_label
0  I've seen this amusing little 'brit flick'many...      1                0
1  Yep, Edward G. gives us a retro view of the cr...      0                0
2  Has there ever been an Angel of Death like MIM...      1                0
3  This is one of the worst Sandra Bullock movie ...      0                0
4  Dr Steven Segal saves the world from a deadly ...      0                0
5  An interesting concept turned into carnage... ...      0                0
6  Not good. Mostly because you don't give a damn...      0                0
7  The opening scene makes you feel like you're w...      0                0
8  I've watched a number of Wixel Pixel and Sub R...      0                0
9  Continuing in the string of "stalker/slasher" ...      0                0


In [14]:
# SAVING THE TRAINED MODEL

peft_model_seq_cls.save_pretrained("./data/peft_model/seq_cls/")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.
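
At inference time the model returns one pair of logits per review; a softmax turns them into probabilities and the arg-max picks the label. The sketch below uses made-up logits and the `NEGATIVE`/`POSITIVE` mapping from the model-loading cells; the function name `classify` is hypothetical.

```python
import math

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def classify(logits):
    """Return (label, probability) for a single example's logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Made-up logits: a confident prediction and a near tie.
confident = classify([-2.0, 3.0])
near_tie = classify([0.01, 0.02])
print(confident, near_tie)
```

A probability close to 0.5, as in the second case, means the predicted label carries little information.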

In [15]:
# LOADING THE MODEL

from peft import PeftModel, PeftConfig, AutoPeftModelForSequenceClassification

# Load the model and config from persistent storage.
peft_seq_cls_model_path = "./data/peft_model/seq_cls/"
peft_config = PeftConfig.from_pretrained(peft_seq_cls_model_path)

saved_peft_seq_cls_model = AutoPeftModelForSequenceClassification.from_pretrained(peft_seq_cls_model_path)
# saved_peft_seq_cls_model.to(DEVICE)

print("Reconstructed model from saved PEFT:")
print(saved_peft_seq_cls_model)

# Load AutoTokenizer
peft_tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)

items_indexes_for_manual_review = [0, 1, 13, 25, 46, 51, 64, 88, 90, 94, 95, 99]

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'classifier.weight', 'classifier.bias', 'pre_classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Reconstructed model from saved PEFT:
PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(
                  in_features=768, out_features=768, bias=True
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_fe

In [16]:
# EVALUATING THE MODEL

test_items = tokenized_ds["test"].select(
    items_indexes_for_manual_review
)

# Print test_items columns.
print(test_items)

# Inspect test items and input_ids.
for test_item in test_items:
    print(test_item['input_ids'])
    print(test_item)
    break

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 12
})
[101, 1045, 1005, 2310, 2464, 2023, 19142, 2210, 1005, 28101, 17312, 1005, 2116, 2335, 1012, 1996, 2069, 3291, 2003, 2049, 2747, 20165, 2006, 2678, 2030, 4966, 1012, 1045, 1005, 24529, 5121, 1037, 20127, 2005, 1037, 4966, 2713, 1012, 1996, 2172, 4771, 2957, 5207, 3248, 1005, 5061, 2100, 1005, 2019, 4654, 1011, 6986, 2137, 1010, 3005, 2074, 2042, 2207, 2013, 3827, 1010, 2002, 4858, 2370, 1037, 3105, 2004, 2019, 3751, 2937, 1999, 1037, 2924, 1010, 2009, 2035, 3632, 2092, 2127, 2002, 4858, 2370, 7861, 12618, 18450, 1999, 1037, 2924, 2002, 2923, 2007, 2010, 4654, 13675, 10698, 2229, 1010, 2585, 9152, 8159, 3248, 1996, 3040, 23356, 7332, 1010, 2049, 2019, 22249, 2210, 17083, 2361, 1010, 11504, 2996, 5033, 2030, 8133, 3016, 1010, 2097, 2272, 2000, 1996, 5343, 1012, 2298, 2041, 2005, 2198, 13919, 9082, 2077, 2002, 4930, 2009, 2502, 2007, 1005, 26822, 12734, 1005, 10642, 1997, 1996, 2439, 15745, 1005,

In [17]:
# Run the saved PEFT model on the selected items and collect its predictions.
import pandas as pd
import torch

items_for_manual_review = tokenized_ds["test"].select(
    items_indexes_for_manual_review
)

predictions = []
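for item in items_for_manual_review:
    # Truncate to the model's maximum length so long reviews cannot overflow the position embeddings.
    input_tokens = peft_tokenizer(item["text"], truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = saved_peft_seq_cls_model(**input_tokens).logits
    # Pick the highest-scoring class along the label dimension.
    predicted_class_id = logits.argmax(dim=-1).item()
    predictions.append(predicted_class_id)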

print(predictions)

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


In [18]:
# Tabulate the text, predictions, and labels for the selected items.
df = pd.DataFrame(
    {
        "text": [item["text"] for item in items_for_manual_review],
        "predictions": predictions,
        "labels": [item["label"] for item in items_for_manual_review],
    }
)
print(df)

                                                 text  predictions  labels
0   I've seen this amusing little 'brit flick'many...            1       1
1   Yep, Edward G. gives us a retro view of the cr...            1       0
2   When you watch low budget horror movies as muc...            1       0
3   So funny is the perfect way to describe this 1...            1       1
4   THE CELL fascinated me at first glance. I was ...            1       1
5   I never comment on a film, but I have to say t...            1       0
6   Wesley Snipes is perfectly cast as Blade, a ha...            1       1
7   This is the best work i have ever seen on tele...            1       1
8   Mean spirited, and down right degrading adapta...            1       0
9   Possibly the worst movie I ever saw. The perso...            1       0
10  There won't be one moment in this film where y...            1       1
11  Persuaded by the 7.0 points in IMDb, which is ...            1       0


In [19]:
# Collect the labels of the selected items.
labels = [item["label"] for item in items_for_manual_review]

import evaluate

accuracy_metric = evaluate.load("accuracy")
results = accuracy_metric.compute(references=labels, predictions=predictions)
print(results)

{'accuracy': 0.5}
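
Accuracy alone hides how the 0.5 comes about. Counting the outcomes for the twelve reviewed items, with labels and predictions copied from the table above, gives a fuller picture (variable names are illustrative):

```python
from collections import Counter

# Copied from the table above: the model predicted class 1 for every item.
review_labels = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
review_predictions = [1] * 12

outcomes = Counter(zip(review_labels, review_predictions))  # keys are (label, prediction) pairs
true_pos = outcomes[(1, 1)]
false_pos = outcomes[(0, 1)]
true_neg = outcomes[(0, 0)]
false_neg = outcomes[(1, 0)]

review_accuracy = (true_pos + true_neg) / len(review_labels)
print(true_pos, false_pos, true_neg, false_neg, review_accuracy)
```

Every item lands in the predicted-positive column, so the 0.5 reflects the share of positive labels in this sample rather than any ability to separate the classes.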


In [20]:
# SHOW PERFORMANCE IMPROVEMENTS

df = pd.DataFrame(
    {
        "stage": [
            "Foundation model before training",
            "Foundation model after training",
            "PEFT model before training",
            "PEFT model after training",
            "Fine tuned trained model",
        ],
        "details": [
            PRETRAINED_MODEL,
            PRETRAINED_MODEL,
            "PEFT LoRA SEQ_CLS Model",
            "PEFT LoRA SEQ_CLS Model",
            FINE_TUNED_MODEL,
        ],
        "eval_loss": [f"{x:.6f}" for x in [
            pre_train_evaluate_results["eval_loss"],
            post_train_evaluate_results["eval_loss"],
            peft_seq_cls_pre_train_evaluate_results["eval_loss"],
            peft_seq_cls_post_train_evaluate_results["eval_loss"],
            fine_tuned_evaluate_results["eval_loss"],
        ]],
    }
)

# Print eval_loss results
print(df)

                              stage  \
0  Foundation model before training   
1   Foundation model after training   
2        PEFT model before training   
3         PEFT model after training   
4          Fine tuned trained model   

                                     details eval_loss  
0                    distilbert-base-uncased  0.701190  
1                    distilbert-base-uncased  0.692948  
2                    PEFT LoRA SEQ_CLS Model  0.692948  
3                    PEFT LoRA SEQ_CLS Model  0.692493  
4  tashrifmahmud/sentiment_analysis_model_v2  0.188373  
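
The table above lists only `eval_loss`. The accuracies printed by the earlier evaluation cells can be lined up the same way; the sketch below copies those values by hand rather than reading them from the result dictionaries, and the variable names are illustrative.

```python
# (stage, eval_loss, eval_accuracy), copied from the evaluation outputs earlier in the notebook.
summary = [
    ("Foundation model before training", 0.701190, 0.45),
    ("Foundation model after training", 0.692948, 0.53),
    ("PEFT model before training", 0.692948, 0.53),
    ("PEFT model after training", 0.692493, 0.53),
    ("Fine tuned trained model", 0.188373, 0.94),
]

baseline_accuracy = summary[0][2]
rows = []
for stage, loss, accuracy in summary:
    # Accuracy change relative to the untrained foundation model.
    delta = accuracy - baseline_accuracy
    rows.append(f"{stage:<34} {loss:.6f} {accuracy:.2f} {delta:+.2f}")

print("\n".join(rows))
```

On this 100-example test subset, LoRA training leaves accuracy at 0.53, while the already fine-tuned checkpoint reaches 0.94.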
