# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA
* Model: distilbert-base-uncased
* Evaluation approach: Hugging Face `Trainer.evaluate()` with an accuracy metric
* Fine-tuning dataset: stanfordnlp/imdb

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.
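
The tokenizer call in cell 4 uses `padding="max_length"` and `truncation=True`, so every review ends up as a fixed-length list of ids. The sketch below is a simplified, pure-Python illustration of that idea, not the tokenizer's actual implementation: the helper `pad_or_truncate` and the sample ids are hypothetical, and the real tokenizer also performs word-piece splitting. The ids 0, 101 and 102 are the `[PAD]`, `[CLS]` and `[SEP]` entries of the BERT uncased vocabulary.

```python
# Special-token ids from the BERT uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def pad_or_truncate(token_ids, max_length):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    # Reserve two positions for [CLS] and [SEP], then cut the rest.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Fill the remainder with [PAD]; the mask marks those positions with 0.
    padding = max_length - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


# Hypothetical word-piece ids for a short and a long review.
short_ids, short_mask = pad_or_truncate([2004, 2574], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(2000, 2010)), max_length=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padded positions, which is why the tokenized dataset carries both `input_ids` and `attention_mask` columns.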

In [1]:
! pip install -q "evaluate==0.4.3"
! pip install -q "scikit-learn==1.6.0"

In [2]:
DEVICE = "cuda"
BASE_MODEL = "distilbert-base-uncased"
DATASET = "stanfordnlp/imdb"
# PEFT_TECHNIQUE: "LoRA"
# EVALUATION_APPROACH: HuggingFace trainer.evaluate()

In [3]:
from datasets import load_dataset

dataset_splits = ["train", "test"]
loaded_dataset = load_dataset(DATASET, split=dataset_splits)
datasets = {split: ds for split, ds in zip(dataset_splits, loaded_dataset)}

# Thin out the dataset to make it run faster for this example
for split in dataset_splits:
    datasets[split] = datasets[split].shuffle(seed=23).select(range(250))

print("Train split of dataset:")
print(datasets["train"])

print("Test split of dataset:")
print(datasets["test"])

print("Let's take a look at first data in the set:")
print(datasets["train"][0])

Train split of dataset:
Dataset({
    features: ['text', 'label'],
    num_rows: 250
})
Test split of dataset:
Dataset({
    features: ['text', 'label'],
    num_rows: 250
})
Let's take a look at first data in the set:
{'text': "As soon as I heard about this film I knew I had to check it out. Well, I heard about it, then I found the trailer. After that, that's when I knew I had to see it. And I am so glad I did. You want to see classic television mixed with zombies? No? Then get lost.<br /><br />FIDO is a movie unlike anything I've ever seen. Well, actually, it kind of is. It's kind of like a Lassie episode and a Zombie film. Though when combined, it feels completely new and original. FIDO is about a little boy named Timmy and his new pet Fido. Well this new pet ain't no squawking parakeet or some potty-trained puppy. It's a re-animated dead guy...a zombie. A large radiation cloud engulfed Earth which led to all of the dead rising, which ensued the Zombie Wars. Though through the geniu

In [4]:
from transformers import AutoTokenizer
auto_tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

def preprocess_function(examples):
    """Preprocess the imdb dataset by returning tokenized examples"""
    return auto_tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_ds = {}
for split in dataset_splits:
    tokenized_ds[split] = datasets[split].map(preprocess_function, batched=True)

# Check that we tokenized the examples properly
assert tokenized_ds["train"][0]["input_ids"][:5] == [101, 2004, 2574, 2004, 1045]

print("Show the first example of tokenized training set:")
print(tokenized_ds["train"][0]["input_ids"])

Show the first example of tokenized training set:
[101, 2004, 2574, 2004, 1045, 2657, 2055, 2023, 2143, 1045, 2354, 1045, 2018, 2000, 4638, 2009, 2041, 1012, 2092, 1010, 1045, 2657, 2055, 2009, 1010, 2059, 1045, 2179, 1996, 9117, 1012, 2044, 2008, 1010, 2008, 1005, 1055, 2043, 1045, 2354, 1045, 2018, 2000, 2156, 2009, 1012, 1998, 1045, 2572, 2061, 5580, 1045, 2106, 1012, 2017, 2215, 2000, 2156, 4438, 2547, 3816, 2007, 14106, 1029, 2053, 1029, 2059, 2131, 2439, 1012, 1026, 7987, 1013, 1028, 1026, 7987, 1013, 1028, 10882, 3527, 2003, 1037, 3185, 4406, 2505, 1045, 1005, 2310, 2412, 2464, 1012, 2092, 1010, 2941, 1010, 2009, 2785, 1997, 2003, 1012, 2009, 1005, 1055, 2785, 1997, 2066, 1037, 27333, 2666, 2792, 1998, 1037, 11798, 2143, 1012, 2295, 2043, 4117, 1010, 2009, 5683, 3294, 2047, 1998, 2434, 1012, 10882, 3527, 2003, 2055, 1037, 2210, 2879, 2315, 27217, 1998, 2010, 2047, 9004, 10882, 3527, 1012, 2092, 2023, 2047, 9004, 7110, 1005, 1056, 2053, 5490, 6692, 26291, 2075, 11498, 20553, 2102

In [5]:
from transformers import AutoModelForSequenceClassification, AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

foundation_model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL,
    # quantization_config=bnb_config,
    # torch_dtype=torch.bfloat16,
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
foundation_model.to(DEVICE)

gpt_model = AutoModelForCausalLM.from_pretrained(
    "gpt2", num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Unfreeze all the model parameters.
# Hint: Check the documentation at https://huggingface.co/transformers/v4.2.2/training.html
for param in foundation_model.base_model.parameters():
    param.requires_grad = True

print("Model classifiers:")
print(foundation_model.classifier)

print("Inspect Model:")
print(foundation_model)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Model classifiers:
Linear(in_features=768, out_features=2, bias=True)
Inspect Model:
DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affi

In [6]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_predictions):
    e_predictions, e_labels = eval_predictions
    e_predictions = np.argmax(e_predictions, axis=1)
    return {"accuracy": (e_predictions == e_labels).mean()}


# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=foundation_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis",
        learning_rate=2e-3,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=auto_tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=auto_tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()
trainer.evaluate()

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.823334,0.528


{'eval_loss': 0.8233343362808228,
 'eval_accuracy': 0.528,
 'eval_runtime': 3.8631,
 'eval_samples_per_second': 64.715,
 'eval_steps_per_second': 16.308,
 'epoch': 1.0}

In [7]:
import pandas as pd

df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces
df["text"] = df["text"].str.replace("<br />", " ")

# Add the model predictions to the dataframe
predictions = trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

print(df.head(10))

print("Some of the incorrect predictions:")
print(df[df["label"] != df["predicted_label"]].head(10))

                                                text  label  predicted_label
0  I've seen this amusing little 'brit flick'many...      1                1
1  Yep, Edward G. gives us a retro view of the cr...      0                1
2  Has there ever been an Angel of Death like MIM...      1                1
3  This is one of the worst Sandra Bullock movie ...      0                1
4  Dr Steven Segal saves the world from a deadly ...      0                1
5  An interesting concept turned into carnage... ...      0                1
6  Not good. Mostly because you don't give a damn...      0                1
7  The opening scene makes you feel like you're w...      0                1
8  I've watched a number of Wixel Pixel and Sub R...      0                1
9  Continuing in the string of "stalker/slasher" ...      0                1
Some of the incorrect predictions:
                                                 text  label  predicted_label
1   Yep, Edward G. gives us a retro view

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.
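
As background for the LoRA configuration used below, here is a minimal sketch of what a LoRA-adapted linear layer computes: the frozen weight `W` is left untouched and a low-rank product `B A`, scaled by `lora_alpha / r`, is added to its output. This is a toy, pure-Python illustration with made-up 3x3 matrices, not the `peft` implementation; it omits bias and dropout, and the helper names `matvec` and `lora_forward` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute y = W x + (lora_alpha / r) * B (A x).

    W is the frozen pre-trained weight (d_out x d_in).
    A (r x d_in) and B (d_out x r) are the only trained matrices.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy sizes: d_in = d_out = 3, rank r = 2, scale = 8 / 2 = 4
# (the same scale as lora_alpha=32 with r=8).
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x = [1.0, 2.0, 3.0]

# B starts at zero in LoRA, so the adapted layer initially matches the base layer.
B_init = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
y_init = lora_forward(W, A, B_init, x, r=2, lora_alpha=8)

# After training, B is non-zero and shifts the output.
B_trained = [[0.5, 0.0], [0.0, 0.25], [0.0, 0.0]]
y_trained = lora_forward(W, A, B_trained, x, r=2, lora_alpha=8)

print(y_init)
print(y_trained)
```

Because only `A` and `B` receive gradients, the number of trained values grows with `r` rather than with the full size of `W`.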

In [8]:
from peft import get_peft_model, LoraConfig, TaskType


peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,
    target_modules=["q_lin", "v_lin"],
    lora_alpha=32,
    lora_dropout=0.1,
)

# Create a PEFT model from the PEFT config and the foundation model.
peft_model = get_peft_model(foundation_model, peft_config)
peft_model.to(DEVICE)
peft_model.print_trainable_parameters()

print("PEFT MODEL:")
print(peft_model)

trainable params: 1,331,716 || all params: 67,694,596 || trainable%: 1.967241225577297
PEFT MODEL:
PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(
                  in_features=768, out_features=768, bias=True
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
                  (lo
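
The trainable-parameter count printed above can be checked by hand. The sketch below is plain arithmetic under stated assumptions: the hidden size (768), the six transformer blocks and the two-label classifier come from the model printout, while the 768x768 shape of `pre_classifier` is assumed rather than shown. Reading the printed total as the LoRA matrices plus two copies of the classification head is one interpretation that matches the numbers; it is not a statement about how the installed `peft` version counts parameters.

```python
HIDDEN = 768            # in_features / out_features of q_lin and v_lin
NUM_LAYERS = 6          # "(0-5): 6 x TransformerBlock"
NUM_LABELS = 2
R = 8                   # LoRA rank from the LoraConfig
TARGETS_PER_LAYER = 2   # q_lin and v_lin

# Each adapted module adds A (r x in) and B (out x r).
lora_per_module = R * HIDDEN + HIDDEN * R
lora_total = lora_per_module * TARGETS_PER_LAYER * NUM_LAYERS

# Classification head: pre_classifier (assumed 768 x 768) and classifier, with biases.
pre_classifier = HIDDEN * HIDDEN + HIDDEN
classifier = HIDDEN * NUM_LABELS + NUM_LABELS
head_total = pre_classifier + classifier

print("LoRA parameters:", lora_total)
print("Head parameters:", head_total)
print("LoRA + 2 x head:", lora_total + 2 * head_total)
```

The LoRA matrices themselves account for 147,456 values, a small fraction of the roughly 67.7 million parameters reported for the whole model.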

In [9]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer
peft_trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_peft",
        # Set the learning rate
        learning_rate=2e-5,
        # Set the per device train batch size and eval batch size
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        # Evaluate and save the model after each epoch
        evaluation_strategy="epoch",
        save_strategy="epoch",

        num_train_epochs=2,
        weight_decay=0.01,
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=auto_tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=auto_tokenizer),
    compute_metrics=compute_metrics,
)

# Train the lightweight fine-tuned model
peft_trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.7404,0.528
2,No log,0.726556,0.528


TrainOutput(global_step=32, training_loss=0.7428619265556335, metrics={'train_runtime': 25.6539, 'train_samples_per_second': 19.49, 'train_steps_per_second': 1.247, 'total_flos': 67369703424000.0, 'train_loss': 0.7428619265556335, 'epoch': 2.0})

In [10]:
# Evaluate the lightweight fine-tuned model
peft_trainer.evaluate()

{'eval_loss': 0.7265563011169434,
 'eval_accuracy': 0.528,
 'eval_runtime': 4.115,
 'eval_samples_per_second': 60.753,
 'eval_steps_per_second': 3.888,
 'epoch': 2.0}

In [11]:
import pandas as pd

df = pd.DataFrame(tokenized_ds["test"])
df = df[["text", "label"]]

# Replace <br /> tags in the text with spaces
df["text"] = df["text"].str.replace("<br />", " ")

# Add the PEFT model's predictions to the dataframe
predictions = peft_trainer.predict(tokenized_ds["test"])
df["predicted_label"] = np.argmax(predictions[0], axis=1)

print(df.head(10))

                                                text  label  predicted_label
0  I've seen this amusing little 'brit flick'many...      1                1
1  Yep, Edward G. gives us a retro view of the cr...      0                1
2  Has there ever been an Angel of Death like MIM...      1                1
3  This is one of the worst Sandra Bullock movie ...      0                1
4  Dr Steven Segal saves the world from a deadly ...      0                1
5  An interesting concept turned into carnage... ...      0                1
6  Not good. Mostly because you don't give a damn...      0                1
7  The opening scene makes you feel like you're w...      0                1
8  I've watched a number of Wixel Pixel and Sub R...      0                1
9  Continuing in the string of "stalker/slasher" ...      0                1


In [12]:
# Saving the trained model
peft_model.save_pretrained("./data/peft_model")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.
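
One way to make the before/after comparison concrete is to put both prediction lists next to the labels and count how many predictions actually changed. The sketch below is a minimal pure-Python illustration; the helper names `accuracy` and `compare_runs` and the sample lists are hypothetical and are not taken from the runs in this notebook.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def compare_runs(labels, before, after):
    """Summarize two prediction lists made on the same examples."""
    changed = sum(b != a for b, a in zip(before, after))
    return {
        "accuracy_before": accuracy(labels, before),
        "accuracy_after": accuracy(labels, after),
        "changed_predictions": changed,
    }


# Hypothetical data: six examples, predictions before and after fine-tuning.
labels = [1, 0, 0, 1, 1, 0]
before = [1, 1, 1, 1, 1, 1]
after = [1, 0, 1, 1, 1, 1]

summary = compare_runs(labels, before, after)
print(summary)
```

A `changed_predictions` value of zero would indicate that fine-tuning did not alter any decision on the evaluated examples, even if the loss moved.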

In [13]:
from peft import PeftModel, PeftConfig, AutoPeftModelForSequenceClassification

peft_model_path = "./data/peft_model"
peft_config = PeftConfig.from_pretrained(peft_model_path)

saved_peft_model = AutoPeftModelForSequenceClassification.from_pretrained(peft_model_path)
# saved_peft_model.to(DEVICE)

print("Reconstructed model from saved PEFT:")
print(saved_peft_model)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'pre_classifier.weight', 'classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Reconstructed model from saved PEFT:
PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(
                  in_features=768, out_features=768, bias=True
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_fe

In [14]:
peft_tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)
items_indexes_for_manual_review = \
    [0, 1, 13, 25, 46, 51, 64, 88, 90, 118, 125, 131, 148, 154, 166, 179, 183, 199, 222, 233, 244]

## Evaluating the model
test_items = tokenized_ds["test"].select(
    items_indexes_for_manual_review
)

# print test_items columns.
print(test_items)

# Inspect details.
for test_item in test_items:
    print(test_item['input_ids'])
    print(test_item)
    break

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 21
})
[101, 1045, 1005, 2310, 2464, 2023, 19142, 2210, 1005, 28101, 17312, 1005, 2116, 2335, 1012, 1996, 2069, 3291, 2003, 2049, 2747, 20165, 2006, 2678, 2030, 4966, 1012, 1045, 1005, 24529, 5121, 1037, 20127, 2005, 1037, 4966, 2713, 1012, 1996, 2172, 4771, 2957, 5207, 3248, 1005, 5061, 2100, 1005, 2019, 4654, 1011, 6986, 2137, 1010, 3005, 2074, 2042, 2207, 2013, 3827, 1010, 2002, 4858, 2370, 1037, 3105, 2004, 2019, 3751, 2937, 1999, 1037, 2924, 1010, 2009, 2035, 3632, 2092, 2127, 2002, 4858, 2370, 7861, 12618, 18450, 1999, 1037, 2924, 2002, 2923, 2007, 2010, 4654, 13675, 10698, 2229, 1010, 2585, 9152, 8159, 3248, 1996, 3040, 23356, 7332, 1010, 2049, 2019, 22249, 2210, 17083, 2361, 1010, 11504, 2996, 5033, 2030, 8133, 3016, 1010, 2097, 2272, 2000, 1996, 5343, 1012, 2298, 2041, 2005, 2198, 13919, 9082, 2077, 2002, 4930, 2009, 2502, 2007, 1005, 26822, 12734, 1005, 10642, 1997, 1996, 2439, 15745, 1005,

In [15]:
import pandas as pd
import torch

items_for_manual_review = tokenized_ds["test"].select(
    items_indexes_for_manual_review
)

predictions = []
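for i in items_for_manual_review:
    # Truncate to the tokenizer's maximum length so long reviews do not overflow the model
    input_tokens = peft_tokenizer(i['text'], truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = saved_peft_model(**input_tokens).logits
        # Take the argmax over the class dimension (the batch holds one review)
        predicted_class_id = logits.argmax(dim=-1).item()
        predictions.append(predicted_class_id)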

print(predictions)

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


In [16]:
df = pd.DataFrame(
    {
        "text": [item["text"] for item in items_for_manual_review],
        "predictions": predictions,
        "labels": [item["label"] for item in items_for_manual_review],
    }
)
# Show all the rows
print(df)

                                                 text  predictions  labels
0   I've seen this amusing little 'brit flick'many...            1       1
1   Yep, Edward G. gives us a retro view of the cr...            1       0
2   When you watch low budget horror movies as muc...            1       0
3   So funny is the perfect way to describe this 1...            1       1
4   THE CELL fascinated me at first glance. I was ...            1       1
5   I never comment on a film, but I have to say t...            1       0
6   Wesley Snipes is perfectly cast as Blade, a ha...            1       1
7   This is the best work i have ever seen on tele...            1       1
8   Mean spirited, and down right degrading adapta...            1       0
9   I can understand how Barney can be annoying to...            1       1
10  During a lifetime of seeing and enjoying thous...            1       0
11  When I was in 7th grade(back in 1977), I was a...            1       1
12  It's painfully obviou

In [17]:
labels = [item["label"] for item in items_for_manual_review]

In [18]:
# Let's evaluate accuracy.
import evaluate

accuracy_metric = evaluate.load("accuracy")
results = accuracy_metric.compute(references=labels, predictions=predictions)
print(results)

{'accuracy': 0.5714285714285714}
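
Every entry in the prediction list printed by cell 15 is 1, so this score equals the share of positive labels among the 21 reviewed items: 0.5714 is 12/21. A constant-prediction baseline makes that easy to check. The sketch below uses a hypothetical label list with the same 12-to-9 split, and the helper `constant_prediction_accuracy` is illustrative only.

```python
from collections import Counter


def constant_prediction_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


# Hypothetical labels: 12 positive and 9 negative reviews, 21 in total.
labels = [1] * 12 + [0] * 9

label, baseline = constant_prediction_accuracy(labels)
print(label, round(baseline, 4))
```

A fine-tuned classifier is only informative once its accuracy exceeds this baseline on the same examples.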
