# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following:

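* PEFT technique: I used LoRA (Low-Rank Adaptation) because it freezes the pretrained weights and trains only small low-rank update matrices injected into selected layers (here the attention and feed-forward projections). Far fewer parameters need gradients and optimizer state, so training is faster and uses less memory.
* Model: I chose DistilBERT (`distilbert-base-uncased`), a smaller, faster distilled version of BERT that is well suited to text classification.
* Evaluation approach: I measured accuracy on the IMDb test split, which serves as the evaluation set for both the baseline and the LoRA model. Accuracy is a standard metric for balanced binary classification.
* Fine-tuning dataset: I used the IMDb movie-review dataset (25,000 training and 25,000 test reviews, each labeled positive or negative), a common benchmark for binary sentiment classification.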
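The LoRA idea described above can be sketched in NumPy: a frozen weight `W` is augmented by a low-rank product `B @ A`, scaled by `lora_alpha / r`. This is a minimal sketch of the technique from the LoRA paper, not PEFT's implementation; the shapes mirror one 768x768 attention projection and the `r=8`, `lora_alpha=16` values used later, and the initialization (small random `A`, zero `B`) follows the paper's scheme.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Compute W x + (alpha / r) * B A x for a batch of row vectors x."""
    # x: (batch, d_in), W: (d_out, d_in), A: (r, d_in), B: (d_out, r)
    base = x @ W.T                 # frozen pretrained projection
    update = (x @ A.T) @ B.T       # low-rank path through the r-dimensional bottleneck
    return base + (alpha / r) * update

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 768, 768, 8, 16
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
x = rng.standard_normal((4, d_in))

out = lora_forward(x, W, A, B, alpha, r)
# With B = 0 the adapter is a no-op, so training starts exactly from the pretrained model.
print(np.allclose(out, x @ W.T))  # True
print(W.size, A.size + B.size)    # 589824 12288
```

Only `A` and `B` (12,288 values) would be trained for this layer, versus 589,824 values in `W`.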

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
%pip install --upgrade datasets

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [2]:
from datasets import load_dataset
from transformers import AutoTokenizer

In [3]:
# Load the entire dataset
dataset = load_dataset("imdb")

# Access individual splits from the dataset
train_dataset = dataset['train']
test_dataset  = dataset['test']

In [4]:
print(train_dataset[0])

{'text': 'I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far be

In [5]:
print(test_dataset[0])

{'text': 'I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn\'t match the background, and painfully one-dimensional characters cannot be overcome with a \'sci-fi\' setting. (I\'m sure there are those of you out there who think Babylon 5 is good sci-fi TV. It\'s not. It\'s clichéd and uninspiring.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star Trek). It may treat important issues, yet not as a serious philosophy. It\'s really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of Earth KNOW it\'s rubbish as 

In [6]:
# Load the tokenizer that matches the chosen checkpoint (distilbert-base-uncased)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Define a function to preprocess and tokenize the dataset
def preprocess_function(examples):
    """Preprocess the IMDB dataset by returning tokenized examples."""
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512)

# Apply preprocessing (tokenization) to both train and test datasets
tokenized_train_dataset = train_dataset.map(preprocess_function, batched=True)
tokenized_test_dataset = test_dataset.map(preprocess_function, batched=True)

# Show the first tokenized example from the train dataset
print(tokenized_train_dataset[0]["input_ids"])


[101, 1045, 12524, 1045, 2572, 8025, 1011, 3756, 2013, 2026, 2678, 3573, 2138, 1997, 2035, 1996, 6704, 2008, 5129, 2009, 2043, 2009, 2001, 2034, 2207, 1999, 3476, 1012, 1045, 2036, 2657, 2008, 2012, 2034, 2009, 2001, 8243, 2011, 1057, 1012, 1055, 1012, 8205, 2065, 2009, 2412, 2699, 2000, 4607, 2023, 2406, 1010, 3568, 2108, 1037, 5470, 1997, 3152, 2641, 1000, 6801, 1000, 1045, 2428, 2018, 2000, 2156, 2023, 2005, 2870, 1012, 1026, 7987, 1013, 1028, 1026, 7987, 1013, 1028, 1996, 5436, 2003, 8857, 2105, 1037, 2402, 4467, 3689, 3076, 2315, 14229, 2040, 4122, 2000, 4553, 2673, 2016, 2064, 2055, 2166, 1012, 1999, 3327, 2016, 4122, 2000, 3579, 2014, 3086, 2015, 2000, 2437, 2070, 4066, 1997, 4516, 2006, 2054, 1996, 2779, 25430, 14728, 2245, 2055, 3056, 2576, 3314, 2107, 2004, 1996, 5148, 2162, 1998, 2679, 3314, 1999, 1996, 2142, 2163, 1012, 1999, 2090, 4851, 8801, 1998, 6623, 7939, 4697, 3619, 1997, 8947, 2055, 2037, 10740, 2006, 4331, 1010, 2016, 2038, 3348, 2007, 2014, 3689, 3836, 1010, 19846
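`preprocess_function` pads every review to 512 tokens (`padding="max_length"`), so the `DataCollatorWithPadding` passed to the Trainer later has nothing left to pad. Dropping `padding="max_length"` (keeping `truncation=True, max_length=512`) would let the collator pad each batch only to its longest review. The pure-Python sketch below counts pad tokens under both schemes for hypothetical review lengths (made-up numbers, not measured on IMDb).

```python
# Hypothetical token lengths for 8 reviews after truncation to 512 (illustrative only).
lengths = [130, 410, 241, 87, 220, 300, 198, 64]
batch_size = 4
max_length = 512

def pad_tokens_fixed(lengths, max_length):
    """Pad tokens when every example is padded to max_length (padding="max_length")."""
    return sum(max_length - n for n in lengths)

def pad_tokens_dynamic(lengths, batch_size):
    """Pad tokens when each batch is padded only to its own longest example."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        longest = max(batch)
        total += sum(longest - n for n in batch)
    return total

print(pad_tokens_fixed(lengths, max_length))    # 2446
print(pad_tokens_dynamic(lengths, batch_size))  # 1190
```

Fewer pad tokens means less wasted computation per batch; since the attention mask hides padding, predictions should not depend on which scheme is used.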

In [7]:
# Load the pre-trained model for sequence classification
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Freeze the DistilBERT encoder (model.base_model); pre_classifier and classifier stay trainable
for param in model.base_model.parameters():
    param.requires_grad = False

# Print the final classification layer to check its output size
print(model.classifier)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Linear(in_features=768, out_features=2, bias=True)
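The freeze loop only touches `model.base_model` (the DistilBERT encoder), so the two head layers remain trainable, and the baseline run below updates them for one epoch. Their size follows from the layer shapes: `pre_classifier` is a 768-to-768 `Linear` in `DistilBertForSequenceClassification`, and `classifier` is the `Linear(768, 2)` printed above.

```python
def linear_params(in_features, out_features, bias=True):
    """Number of parameters in a torch.nn.Linear layer (weight plus optional bias)."""
    return in_features * out_features + (out_features if bias else 0)

pre_classifier = linear_params(768, 768)  # Linear(768, 768), left trainable
classifier = linear_params(768, 2)        # Linear(768, 2), as printed above
head_total = pre_classifier + classifier
print(pre_classifier, classifier, head_total)  # 590592 1538 592130
```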


In [8]:
import numpy as np
from transformers import Trainer, TrainingArguments, DataCollatorWithPadding

# Define the metric function for accuracy
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)  # Get predicted classes
    return {"accuracy": (predictions == labels).mean()}  # Calculate accuracy

# Trainer setup
trainer = Trainer(
    model=model,  # DistilBERT with a frozen encoder and a trainable head
    args=TrainingArguments(
        output_dir="./results",  # Directory to save results
        evaluation_strategy="epoch",  # Evaluate after each epoch
        save_strategy="epoch",  # Save after each epoch to match evaluation strategy
        per_device_train_batch_size=16,  # Batch size for training
        per_device_eval_batch_size=16,  # Batch size for evaluation
        num_train_epochs=1,  # Number of epochs to train
        weight_decay=0.01,  # Regularization
        load_best_model_at_end=True,  # Load the best model at the end of training
    ),
    train_dataset=tokenized_train_dataset,  # Training dataset
    eval_dataset=tokenized_test_dataset,   # Evaluation (test) dataset
    tokenizer=tokenizer,  # Tokenizer for text processing
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # Handle padding automatically
    compute_metrics=compute_metrics,  # The function to compute evaluation metrics (accuracy)
)

# Train for one epoch: only the classification head is updated (encoder frozen)
trainer.train()

# Evaluate on the IMDb test split
trainer.evaluate()




| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.4041        | 0.38332         | 0.83584  |


{'eval_loss': 0.38331952691078186,
 'eval_accuracy': 0.83584,
 'eval_runtime': 426.8114,
 'eval_samples_per_second': 58.574,
 'eval_steps_per_second': 3.662,
 'epoch': 1.0}
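Trainer passes `compute_metrics` an `EvalPrediction` that unpacks into `(logits, labels)` NumPy arrays, one row of logits per example. The function can be sanity-checked on a hand-made batch (hypothetical values, columns ordered NEGATIVE, POSITIVE):

```python
import numpy as np

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)  # index of the larger logit per row
    return {"accuracy": (predictions == labels).mean()}

# Hypothetical logits for 4 reviews and their gold labels.
logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
labels = np.array([0, 1, 0, 0])
acc = compute_metrics((logits, labels))["accuracy"]
print(float(acc))  # 0.75  (third review is predicted POSITIVE but labeled NEGATIVE)
```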

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [9]:
print(model)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [10]:
from peft import LoraConfig, TaskType

# Define LoRA configuration
config = LoraConfig(
    r=8,                     # Rank of the low-rank matrices A and B (controls adapter size)
    lora_alpha=16,           # Scaling factor; the update B A is scaled by lora_alpha / r
    lora_dropout=0.1,        # Dropout applied on the LoRA path
    bias="none",             # Do not train bias terms
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    target_modules=["q_lin", "k_lin", "v_lin", "out_lin", "ffn.lin1", "ffn.lin2"]  # Targeting attention and ffn layers
)
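Which layers receive adapters follows from `target_modules`. The sketch below approximates PEFT's matching rule (a module is targeted if its name equals a target key or ends with `"." + key`) over the module names in DistilBERT; the attention names come from the `print(model)` output above, and the FFN sublayers `lin1`/`lin2` (cut off in that output) are part of DistilBERT's `FFN` block.

```python
# Module names for 6 TransformerBlocks, plus the two head layers.
module_names = [
    f"distilbert.transformer.layer.{i}.{sub}"
    for i in range(6)
    for sub in ["attention.q_lin", "attention.k_lin", "attention.v_lin",
                "attention.out_lin", "ffn.lin1", "ffn.lin2"]
] + ["pre_classifier", "classifier"]

target_modules = ["q_lin", "k_lin", "v_lin", "out_lin", "ffn.lin1", "ffn.lin2"]

def is_targeted(name, targets):
    """Approximation of PEFT's rule: exact match, or the name ends with '.' + target."""
    return name in targets or any(name.endswith("." + t) for t in targets)

matched = [n for n in module_names if is_targeted(n, target_modules)]
print(len(matched))  # 36 (6 projections in each of 6 layers; head layers not matched)
```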

In [11]:
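# Reload a fresh model: the earlier `model` has a frozen encoder and a head
# already trained in the baseline run, so LoRA starts from clean weights
from transformers import AutoModelForSequenceClassification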

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},  # For converting predictions to strings
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
from peft import get_peft_model
lora_model = get_peft_model(model, config)

In [13]:
# Print trainable parameters
lora_model.print_trainable_parameters()

trainable params: 1,847,812 || all params: 68,210,692 || trainable%: 2.708977061836581
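The LoRA matrices alone can be counted from the layer shapes: each targeted `Linear(in, out)` gets `A` of shape (r, in) and `B` of shape (out, r). DistilBERT's FFN expands 768 to 3072 and back. The remaining trainable parameters reported above are non-LoRA parameters, presumably the classification head that PEFT keeps trainable for `TaskType.SEQ_CLS`; how these are counted may depend on the PEFT version.

```python
r = 8
hidden, ffn_dim, n_layers = 768, 3072, 6

def lora_params(in_features, out_features, r):
    """Adapter size for one Linear: A is (r x in_features), B is (out_features x r)."""
    return r * in_features + out_features * r

per_layer = (
    4 * lora_params(hidden, hidden, r)   # q_lin, k_lin, v_lin, out_lin
    + lora_params(hidden, ffn_dim, r)    # ffn.lin1: 768 -> 3072
    + lora_params(ffn_dim, hidden, r)    # ffn.lin2: 3072 -> 768
)
adapter_total = n_layers * per_layer
print(per_layer, adapter_total)   # 110592 663552
print(1_847_812 - adapter_total)  # 1184260 non-LoRA trainable params in the report
```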


In [14]:
print(lora_model)

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): Linear(
                  in_features=768, out_features=768, bias=True
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=768, out_features=8, bias=Fals

In [16]:
# If needed, upgrade the libraries first: pip install --upgrade transformers peft
# Trainer Setup
lora_trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./lora_results",
        evaluation_strategy="epoch",
        save_strategy="epoch",
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=1,
        weight_decay=0.01,
        load_best_model_at_end=False,
    ),
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # Handle padding automatically
    compute_metrics=compute_metrics,
)

# Start Training
lora_trainer.train()




| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.2288        | 0.215296        | 0.91632  |


TrainOutput(global_step=1563, training_loss=0.23205859098233098, metrics={'train_runtime': 1625.114, 'train_samples_per_second': 15.384, 'train_steps_per_second': 0.962, 'total_flos': 3408121344000000.0, 'train_loss': 0.23205859098233098, 'epoch': 1.0})
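The `global_step=1563` above follows from the training-split size and the batch size, assuming a single device and no gradient accumulation (none is configured in the training arguments):

```python
import math

train_examples = 25_000  # reviews in the IMDb train split
batch_size = 16          # per_device_train_batch_size
steps_per_epoch = math.ceil(train_examples / batch_size)  # last batch is partial
print(steps_per_epoch)   # 1563
```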

In [17]:
# Save the trained adapter weights and adapter config (the frozen base weights are not duplicated)
lora_model.save_pretrained("distilbert-lora")

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [18]:
from peft import AutoPeftModelForSequenceClassification

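# Load the base model recorded in the adapter config, then apply the saved adapter.
# The "newly initialized" warning below comes from loading the base checkpoint;
# the trained head is restored from the adapter checkpoint afterwards.
peft_model = AutoPeftModelForSequenceClassification.from_pretrained("distilbert-lora")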

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [19]:
import torch

# Tokenize a single example review
inputs = tokenizer("This movie was absolutely amazing!", return_tensors="pt")

# Run inference in eval mode without tracking gradients
peft_model.eval()
with torch.no_grad():
    outputs = peft_model(**inputs)

# Extract logits
logits = outputs.logits

# Get predicted label
predicted_label = logits.argmax(dim=-1).item()

# Convert the predicted label to the corresponding string label
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
predicted_class = id2label[predicted_label]

print(f"Predicted class: {predicted_class}")


Predicted class: POSITIVE
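The cell above takes the argmax of the raw logits. If a confidence score is also wanted, logits can be turned into probabilities with a softmax; the NumPy sketch below uses made-up logits (hypothetical, not model output). With the PyTorch tensors above, `torch.softmax(outputs.logits, dim=-1)` computes the same probabilities.

```python
import numpy as np

id2label = {0: "NEGATIVE", 1: "POSITIVE"}

def logits_to_prediction(logits):
    """Softmax over the last axis, then pick the most probable class per row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    labels = [id2label[int(i)] for i in probs.argmax(axis=-1)]
    return probs, labels

# Hypothetical logits for two reviews.
logits = np.array([[-1.2, 2.3], [0.8, -0.4]])
probs, labels = logits_to_prediction(logits)
print(labels)  # ['POSITIVE', 'NEGATIVE']
```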


In [20]:
# Define the metric function for accuracy
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)  # Get predicted classes
    return {"accuracy": (predictions == labels).mean()}  # Calculate accuracy

# Set up the Trainer for evaluation
peft_trainer = Trainer(
    model=peft_model,  # Fine-tuned PEFT model
    args=TrainingArguments(
        output_dir="./lora_results",  # Directory to save results
        per_device_eval_batch_size=16,  # Batch size for evaluation
    ),
    eval_dataset=tokenized_test_dataset,   # Evaluation (test) dataset
    tokenizer=tokenizer,  # Tokenizer for text processing
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),  # Handle padding automatically
    compute_metrics=compute_metrics,  # The function to compute evaluation metrics (accuracy)
)

# Evaluate the fine-tuned PEFT model
eval_results = peft_trainer.evaluate()
print(f"Evaluation Results: {eval_results}")



Evaluation Results: {'eval_loss': 0.21529613435268402, 'eval_model_preparation_time': 0.0046, 'eval_accuracy': 0.91632, 'eval_runtime': 493.324, 'eval_samples_per_second': 50.677, 'eval_steps_per_second': 3.168}
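Both evaluations ran on the full IMDb test split, so the reported accuracies can be compared directly. The baseline updated only the classification head, while the LoRA run also trained low-rank adapters in every attention and feed-forward projection; each is a single one-epoch run, so the gap is indicative rather than a variance-controlled result. Using the numbers reported above:

```python
baseline = {"accuracy": 0.83584, "loss": 0.38331952691078186}  # head-only run
lora = {"accuracy": 0.91632, "loss": 0.21529613435268402}      # LoRA run

gain = lora["accuracy"] - baseline["accuracy"]
# Fraction of the baseline's misclassifications that the LoRA model fixes (net).
error_reduction = gain / (1 - baseline["accuracy"])

print(f"Accuracy: {baseline['accuracy']:.2%} -> {lora['accuracy']:.2%} "
      f"(+{gain * 100:.2f} points)")
print(f"Eval loss: {baseline['loss']:.4f} -> {lora['loss']:.4f}")
print(f"Relative error reduction: {error_reduction:.1%}")
```

That is roughly an 8-point accuracy gain, removing about half of the baseline's errors.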
