# Lightweight Fine-Tuning Project

The choices made for each component of the project are described below:

* PEFT technique: LoRA (Low-Rank Adaptation) is chosen to fine-tune the model efficiently. It freezes the pretrained weights and trains only small low-rank update matrices injected into selected layers. This greatly reduces the number of trainable parameters and the memory needed for training. Here LoRA targets DistilBERT's attention query and value projections (`q_lin` and `v_lin`). The task type is set to sequence classification to match the binary sentiment task on IMDb.

* Model: DistilBERT is selected as the base model because it is a compact, distilled version of BERT (roughly 66M parameters) that still performs well on natural language understanding tasks. The `distilbert-base-uncased` checkpoint supplies the pretrained encoder. A two-label classification head is added on top for IMDb sentiment analysis, where the goal is to classify movie reviews as positive or negative.

* Evaluation approach: The foundation model is first evaluated on the test subset before any training to establish a baseline. During fine-tuning, the evaluation strategy is set to `"epoch"`, so accuracy and validation loss are computed after each training epoch. This makes it possible to track how performance changes over training and to compare the fine-tuned model against the baseline.

* Fine-tuning dataset: The IMDb dataset is used for fine-tuning. To keep experiments fast, 1000 examples are drawn from each of the training and test splits after shuffling with a fixed seed (42). This subset is small enough for quick iteration while still covering a varied sample of reviews. IMDb is well suited to sentiment analysis because it provides movie reviews labeled by sentiment polarity (positive or negative).
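The LoRA update described in the first bullet can be sketched in a few lines of NumPy. This is an illustrative stand-in, not PEFT's implementation. The frozen weight `W0` and the factors `A` and `B` are random or zero arrays. The sizes (768 × 768, `r=8`, `lora_alpha=32`) are taken from DistilBERT and the `LoraConfig` used later, and `lora_dropout` is omitted for brevity. The adapted layer computes `W0 x + (lora_alpha / r) * B A x`. Because `B` starts at zero, the layer initially matches the frozen layer exactly.

```python
import numpy as np

# Sizes assumed to match a DistilBERT attention projection (768-to-768)
# and the LoraConfig used later in this notebook (r=8, lora_alpha=32).
d_in, d_out, r, alpha = 768, 768, 8, 32
rng = np.random.default_rng(0)

# Frozen pretrained weight W0 (a random stand-in; in the real model it comes from the checkpoint).
W0 = rng.normal(size=(d_out, d_in))

# Trainable low-rank factors: A is randomly initialized, B starts at zero,
# so the adapted layer initially behaves exactly like the frozen layer.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))
scaling = alpha / r  # 32 / 8 = 4.0

def lora_forward(x):
    """Return W0 x + (alpha / r) * B A x for a batch of row vectors x."""
    return x @ W0.T + scaling * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))  # a batch of 4 token representations
out = lora_forward(x)

full_params = d_out * d_in          # parameters touched by full fine-tuning of this layer
lora_params = r * d_in + d_out * r  # parameters in A and B only
print(out.shape, full_params, lora_params, f"{lora_params / full_params:.2%}")
```

For a single 768 × 768 projection this reports 12,288 LoRA parameters against 589,824 for the full weight matrix, about 2.08% of the layer.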

## Loading and Evaluating a Foundation Model


In [1]:
!pip install transformers datasets peft scikit-learn


In [2]:
import torch
from transformers import Trainer, TrainingArguments, AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset, load_metric
from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
import numpy as np

In [3]:
# Load dataset
dataset = load_dataset("imdb")

# Use a subset of the dataset for quick experimentation
small_train_dataset = dataset["train"].shuffle(seed=42).select(range(1000))
small_test_dataset = dataset["test"].shuffle(seed=42).select(range(1000))

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
# Tokenize dataset
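def tokenize_function(examples):
    # Truncate reviews to the model's maximum length (512 tokens for DistilBERT) and pad
    # every example in a map() batch (1000 examples by default) to the longest one
    return tokenizer(examples["text"], truncation=True, padding=True)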

tokenized_train_dataset = small_train_dataset.map(tokenize_function, batched=True)
tokenized_test_dataset = small_test_dataset.map(tokenize_function, batched=True)

# Load metric
accuracy_metric = load_metric("accuracy")

# Define compute_metrics: Trainer passes the logits and labels as NumPy arrays
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # predicted class index for each review
    return accuracy_metric.compute(predictions=predictions, references=labels)


The repository for accuracy contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/accuracy.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
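`datasets.load_metric` is deprecated. It downloads and runs a metric script, which is what triggers the custom-code prompt above, and it is no longer available in `datasets` 3.x. The following sketch assumes only NumPy and computes accuracy directly inside `compute_metrics`. It returns the same `accuracy` key, so Trainer would still report `eval_accuracy`. The `evaluate` library's `evaluate.load("accuracy")` is the other common replacement.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy from the (logits, labels) pair that Trainer passes in."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # index of the highest logit per example
    return {"accuracy": float((predictions == labels).mean())}

# Quick check with made-up logits for three reviews (labels: 0 = negative, 1 = positive).
fake_logits = np.array([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2]])
fake_labels = np.array([0, 1, 1])
print(compute_metrics((fake_logits, fake_labels)))
```

Here two of the three made-up predictions match their labels, so the reported accuracy is 2/3.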


In [5]:
# Initial evaluation of the foundation model
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # newer transformers releases rename this argument to eval_strategy
    logging_dir='./logs',
    logging_steps=10,
    save_strategy="epoch",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    compute_metrics=compute_metrics
)

# Evaluate the foundation model before any fine-tuning (its classification head is still randomly initialized)
print("Evaluating the foundation model...")
initial_results = trainer.evaluate()

# Display initial results
print("Initial Evaluation Results:", initial_results)



Evaluating the foundation model...


Initial Evaluation Results: {'eval_loss': 0.6927127838134766, 'eval_accuracy': 0.488, 'eval_runtime': 18.9366, 'eval_samples_per_second': 52.808, 'eval_steps_per_second': 6.601}
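The baseline numbers are close to chance. This is expected from the earlier warning that the classifier layers are newly initialized. A randomly initialized two-class head produces roughly uniform probabilities (about 0.5 per class), so the expected cross-entropy is -ln(0.5) = ln 2. On a roughly balanced sample, accuracy sits near 0.5. A quick check of that arithmetic against the reported `eval_loss`:

```python
import math

# With near-uniform predictions (p = 0.5 for the true class), the
# cross-entropy per example is -ln(0.5), which equals ln 2.
chance_loss = -math.log(0.5)
observed_loss = 0.6927127838134766  # eval_loss reported above
print(f"ln 2 = {chance_loss:.4f}, observed = {observed_loss:.4f}")
```

This prints `ln 2 = 0.6931, observed = 0.6927`, so the untrained model is essentially guessing.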


## Performing Parameter-Efficient Fine-Tuning


In [6]:
# PEFT configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    target_modules=["q_lin", "v_lin"],
    r=8,
    lora_alpha=32,
    lora_dropout=0.1
)

# Apply LoRA to the model
peft_model = get_peft_model(model, peft_config)
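As a rough check on how small this adapter is, the LoRA parameter count follows directly from the configuration. DistilBERT has 6 transformer layers with a hidden size of 768, and `q_lin` and `v_lin` are both 768-to-768 linear projections. The count below covers only the LoRA matrices. In a live session, `peft_model.print_trainable_parameters()` reports the actual trainable total, which should also include the classification head that PEFT keeps trainable for `TaskType.SEQ_CLS`.

```python
# DistilBERT (distilbert-base-uncased): 6 transformer layers, hidden size 768.
n_layers, hidden = 6, 768
r = 8                         # LoraConfig(r=8)
targets = ["q_lin", "v_lin"]  # each is a 768-to-768 linear projection

# For a linear layer in_features -> out_features, LoRA adds
# A with shape (r, in_features) and B with shape (out_features, r).
per_module = r * hidden + hidden * r
lora_total = per_module * len(targets) * n_layers
print(per_module, lora_total)
```

This gives 12,288 parameters per adapted projection and 147,456 across the 12 adapted projections.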

In [7]:
# Training arguments for fine-tuning
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    logging_dir='./logs',
    logging_steps=10,
    save_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
)

# Initialize Trainer with PEFT model
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    compute_metrics=compute_metrics
)



In [8]:
# Fine-tune the model
print("Fine-tuning the PEFT model...")
trainer.train()

Fine-tuning the PEFT model...


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.5551        | 0.485315        | 0.842    |
| 2     | 0.348         | 0.338189        | 0.856    |
| 3     | 0.4066        | 0.337861        | 0.862    |




TrainOutput(global_step=375, training_loss=0.4385485979715983, metrics={'train_runtime': 162.1066, 'train_samples_per_second': 18.506, 'train_steps_per_second': 2.313, 'total_flos': 404218220544000.0, 'train_loss': 0.4385485979715983, 'epoch': 3.0})
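The `global_step=375` in the training output can be checked by hand. This assumes a single device and no gradient accumulation, since the notebook sets neither explicitly. 1000 training examples in batches of 8 give 125 steps per epoch, and 3 epochs give 375 steps.

```python
import math

train_examples = 1000  # size of small_train_dataset
batch_size = 8         # per_device_train_batch_size (single device assumed)
epochs = 3             # num_train_epochs

steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch, steps_per_epoch * epochs)
```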

In [9]:
# Evaluate the fine-tuned model
print("Evaluating the fine-tuned model...")
fine_tune_results = trainer.evaluate()

# Display fine-tuning results
print("Fine-tuning Evaluation Results:", fine_tune_results)

Evaluating the fine-tuned model...


Fine-tuning Evaluation Results: {'eval_loss': 0.3378613591194153, 'eval_accuracy': 0.862, 'eval_runtime': 17.221, 'eval_samples_per_second': 58.069, 'eval_steps_per_second': 7.259, 'epoch': 3.0}


In [10]:
# Save the PEFT model
peft_model.save_pretrained("./lora-distilbert-imdb")



## Performing Inference with a PEFT Model


In [11]:
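# Load the trained PEFT model for inference. With peft installed, transformers finds the
# adapter_config.json in this folder, loads the base model it names, then applies the saved
# adapter; the warning below is emitted by the base-model load, before the adapter is applied.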
loaded_peft_model = AutoModelForSequenceClassification.from_pretrained("./lora-distilbert-imdb")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
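This section evaluates the reloaded model through `Trainer`. Classifying individual reviews additionally requires turning the model's logits into labels. The helper below, `logits_to_sentiment`, is a hypothetical NumPy sketch of that post-processing step using IMDb's label convention (0 = negative, 1 = positive). In a live session the logits would typically come from `loaded_peft_model(**tokenizer(texts, truncation=True, padding=True, return_tensors="pt")).logits`. That call would run under `torch.no_grad()`, with the inputs moved to the model's device, and the result converted with `.cpu().numpy()`. The values used here are made up.

```python
import numpy as np

# IMDb label convention: 0 = negative, 1 = positive.
ID2LABEL = {0: "negative", 1: "positive"}

def logits_to_sentiment(logits):
    """Convert an (n, 2) array of logits into (label, probability) pairs."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)  # softmax per row
    ids = probs.argmax(axis=-1)
    return [(ID2LABEL[int(i)], float(p[i])) for i, p in zip(ids, probs)]

# Made-up logits standing in for loaded_peft_model(**inputs).logits
print(logits_to_sentiment([[2.0, -1.0], [-0.5, 1.5]]))
```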


In [12]:
# Initialize Trainer with loaded PEFT model for inference
inference_trainer = Trainer(
    model=loaded_peft_model,
    args=training_args,
    eval_dataset=tokenized_test_dataset,
    compute_metrics=compute_metrics
)

# Evaluate the loaded PEFT model
print("Evaluating the loaded PEFT model...")
inference_results = inference_trainer.evaluate()

# Display inference results
print("Inference Evaluation Results:", inference_results)

Evaluating the loaded PEFT model...


Inference Evaluation Results: {'eval_loss': 0.3378613591194153, 'eval_accuracy': 0.862, 'eval_runtime': 17.122, 'eval_samples_per_second': 58.405, 'eval_steps_per_second': 7.301}


In [13]:
# Compare the baseline, fine-tuned, and reloaded-adapter results
print("Initial vs Fine-tuning vs Inference Results:")
print("Initial Evaluation Results:", initial_results)
print("Fine-tuning Evaluation Results:", fine_tune_results)
print("Inference Evaluation Results:", inference_results)

Initial vs Fine-tuning vs Inference Results:
Initial Evaluation Results: {'eval_loss': 0.6927127838134766, 'eval_accuracy': 0.488, 'eval_runtime': 18.9366, 'eval_samples_per_second': 52.808, 'eval_steps_per_second': 6.601}
Fine-tuning Evaluation Results: {'eval_loss': 0.3378613591194153, 'eval_accuracy': 0.862, 'eval_runtime': 17.221, 'eval_samples_per_second': 58.069, 'eval_steps_per_second': 7.259, 'epoch': 3.0}
Inference Evaluation Results: {'eval_loss': 0.3378613591194153, 'eval_accuracy': 0.862, 'eval_runtime': 17.122, 'eval_samples_per_second': 58.405, 'eval_steps_per_second': 7.301}
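The comparison can also be summarized numerically. This sketch copies the printed values into plain dictionaries, so nothing is recomputed. It derives the absolute accuracy gain and the loss reduction, and it confirms that the reloaded adapter reproduces the fine-tuned metrics.

```python
# Values copied from the printed evaluation dictionaries above.
initial = {"eval_loss": 0.6927127838134766, "eval_accuracy": 0.488}
fine_tuned = {"eval_loss": 0.3378613591194153, "eval_accuracy": 0.862}
reloaded = {"eval_loss": 0.3378613591194153, "eval_accuracy": 0.862}

acc_gain = fine_tuned["eval_accuracy"] - initial["eval_accuracy"]
loss_drop = initial["eval_loss"] - fine_tuned["eval_loss"]
print(f"accuracy: +{acc_gain:.3f} (absolute), loss: -{loss_drop:.3f}")

# The reloaded adapter should reproduce the fine-tuned metrics exactly.
print(all(reloaded[k] == fine_tuned[k] for k in reloaded))
```

With these values it prints an accuracy gain of +0.374 and a loss reduction of 0.355, followed by `True`.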
