# Lightweight Fine-Tuning Project

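This project applies a parameter-efficient fine-tuning (PEFT) method to a pre-trained model and compares its performance before and after fine-tuning. The choices used in the cells below:

* PEFT technique: LoRA (rank 8, alpha 32, dropout 0.1), via the Hugging Face `peft` library
* Model: GPT-2 (small, `gpt2`) with a sequence classification head
* Evaluation approach: accuracy on the test split, computed with `Trainer.evaluate()` before and after fine-tuning
* Fine-tuning dataset: IMDB movie reviews (`imdb`), binary sentiment classification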

## Loading and Evaluating a Foundation Model

The cells below load the pre-trained Hugging Face model with its tokenizer, load and tokenize the dataset, and evaluate the model's performance prior to fine-tuning.

In [1]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
import torch
import numpy as np
from transformers import Trainer, TrainingArguments

In [2]:
MODEL_NAME = "gpt2"  # Using GPT-2 (small size)
NUM_LABELS = 2  # Modify based on dataset (binary classification)
# Load pre-trained model with classification head
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# GPT-2 does not have a padding token by default, so we define one
tokenizer.pad_token = tokenizer.eos_token


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



In [3]:
DATASET_NAME = "imdb"  # Choose a text classification dataset
dataset = load_dataset(DATASET_NAME)

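# Train/test splits, kept for reference
train_dataset = dataset["train"]
test_dataset = dataset["test"]

def preprocess_function(examples):
    # Truncate long reviews to 512 tokens and pad every example to that same
    # fixed length so batches stack cleanly under the Trainer's default collator
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)

# Apply tokenization to all splits
tokenized_datasets = dataset.map(preprocess_function, batched=True)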



In [6]:
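# GPT-2's classification head uses pad_token_id to locate the last non-padding token,
# so the model config needs it as well as the tokenizer
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id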

# Define compute metrics function
def compute_metrics(eval_pred):
    logits, labels = eval_pred  # Extract logits and labels
    predictions = np.argmax(logits, axis=1)  # Get predicted class
    return {"accuracy": np.mean(predictions == labels)}  # Compute accuracy

# Define training arguments (for evaluation only)
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    per_device_eval_batch_size=8
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics
)

# Evaluate baseline performance
baseline_results = trainer.evaluate()
print("Baseline Performance:", baseline_results)


Baseline Performance: {'eval_loss': 1.1554564237594604, 'eval_accuracy': 0.49932, 'eval_runtime': 940.7215, 'eval_samples_per_second': 26.575, 'eval_steps_per_second': 3.322}
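
To put the baseline numbers in context, the sketch below compares them with what uninformed guessing would give on a binary task. It assumes the IMDB test split is evenly divided between positive and negative reviews; the variable names are illustrative and not part of the notebook.

```python
import math

# Values copied from the baseline evaluation output above
baseline_accuracy = 0.49932
baseline_loss = 1.1554564237594604

# Assumption: two equally frequent classes, so guessing scores 50% accuracy
chance_accuracy = 1 / 2

# Cross-entropy of a classifier that always predicts 50/50 is ln(2)
uniform_loss = math.log(2)

print(f"accuracy vs. chance: {baseline_accuracy:.3f} vs. {chance_accuracy:.3f}")
print(f"loss vs. uniform guess: {baseline_loss:.3f} vs. {uniform_loss:.3f}")
```

Accuracy at chance level with a loss above ln 2 is consistent with the warning that `score.weight` is newly initialized: the classification head has not been trained yet.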


## Performing Parameter-Efficient Fine-Tuning

The cells below create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [7]:
from peft import LoraConfig, get_peft_model, TaskType

# Configure LoRA parameters
lora_config = LoraConfig(
    r=8,  # LoRA rank
    lora_alpha=32,  # Scaling factor
    lora_dropout=0.1,  # Dropout rate
    task_type=TaskType.SEQ_CLS  # Sequence classification
)

# Wrap the base model with LoRA
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

trainable params: 294,912 || all params: 124,737,792 || trainable%: 0.2364255413467636
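
The trainable-parameter count can be reproduced by hand. This sketch assumes that PEFT's default LoRA target for GPT-2 is the fused attention projection `c_attn` (768 inputs, 2304 outputs) in each of the 12 transformer blocks; those dimensions come from the GPT-2 small architecture and are not shown in the notebook output.

```python
# Assumed GPT-2 small dimensions and LoRA target (not shown in the notebook)
hidden_size = 768
c_attn_out = 3 * hidden_size   # fused query/key/value projection
num_layers = 12
r = 8                          # LoRA rank from lora_config

# Each adapted layer adds A (r x in_features) and B (out_features x r)
params_per_layer = r * hidden_size + c_attn_out * r
lora_params = params_per_layer * num_layers

all_params = 124_737_792       # total reported by print_trainable_parameters()
print(lora_params)                       # 294912
print(100 * lora_params / all_params)    # about 0.2364
```

Under these assumptions the result matches the reported 294,912 trainable parameters, roughly 0.24% of the model.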


In [16]:
training_args = TrainingArguments(
    output_dir="./peft_results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    push_to_hub=False
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics
)
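
For a sense of how long this configuration runs, the number of optimizer steps follows from the dataset size and batch size. The sketch assumes a single device and no gradient accumulation, together with the 25,000-example IMDB train split; the variable names are illustrative.

```python
import math

train_examples = 25_000   # size of the IMDB train split
batch_size = 4            # per_device_train_batch_size
num_epochs = 3            # num_train_epochs
logging_steps = 10

# Assumption: one device, gradient_accumulation_steps left at its default of 1
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
log_entries = total_steps // logging_steps

print(steps_per_epoch, total_steps, log_entries)   # 6250 18750 1875
```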



In [None]:
# Train model
trainer.train()

###  ⚠️ IMPORTANT ⚠️

Due to workspace storage constraints, do not store the model weights in the workspace directory. Always save them under `/tmp` instead; filling the workspace can cause crashes that are irrecoverable.
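
One way to avoid writing to a relative path by accident is to build the save location from the system temporary directory. This is a minimal sketch; `SAVE_DIR` is a hypothetical name, and on Linux `tempfile.gettempdir()` typically resolves to `/tmp` unless `TMPDIR` is set.

```python
import tempfile
from pathlib import Path

# Hypothetical variable: an absolute path under the system temp directory
SAVE_DIR = Path(tempfile.gettempdir()) / "peft_gpt2_lora"

# Fail early if the path is relative, which would place it inside the workspace
assert SAVE_DIR.is_absolute(), f"{SAVE_DIR} is not an absolute path"
print(SAVE_DIR)
```

The resulting path can then be passed as `str(SAVE_DIR)` to `save_pretrained`.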

In [9]:
# Save the LoRA adapter weights and tokenizer under /tmp (not the workspace directory)
peft_model.save_pretrained("/tmp/peft_gpt2_lora")
tokenizer.save_pretrained("/tmp/peft_gpt2_lora")

('/tmp/peft_gpt2_lora/tokenizer_config.json',
 '/tmp/peft_gpt2_lora/special_tokens_map.json',
 '/tmp/peft_gpt2_lora/vocab.json',
 '/tmp/peft_gpt2_lora/merges.txt',
 '/tmp/peft_gpt2_lora/added_tokens.json',
 '/tmp/peft_gpt2_lora/tokenizer.json')

## Performing Inference with a PEFT Model

The cells below load the saved PEFT model weights and evaluate the performance of the trained PEFT model, then compare the results with those from before fine-tuning.

In [12]:
from peft import AutoPeftModelForSequenceClassification
# Path where the adapter weights and tokenizer were saved
MODEL_PATH = "/tmp/peft_gpt2_lora"

# Load the base GPT-2 model and apply the saved LoRA adapter in one step;
# the adapter config records which base checkpoint to use
peft_model = AutoPeftModelForSequenceClassification.from_pretrained(MODEL_PATH)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
tokenizer.pad_token = tokenizer.eos_token
peft_model.config.pad_token_id = tokenizer.pad_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [17]:
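# Evaluate the reloaded PEFT model with a fresh Trainer so the saved weights are what gets scored
eval_trainer = Trainer(
    model=peft_model,
    args=training_args,
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics
)
fine_tuned_results = eval_trainer.evaluate()
print("Fine-Tuned Model Performance:", fine_tuned_results)

# Compare baseline vs fine-tuned accuracy
print("Baseline Accuracy:", baseline_results["eval_accuracy"])
print("Fine-Tuned Accuracy:", fine_tuned_results["eval_accuracy"])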


Fine-Tuned Model Performance: {'eval_loss': 0.2720875144004822, 'eval_accuracy': 0.92532, 'eval_runtime': 964.3522, 'eval_samples_per_second': 25.924, 'eval_steps_per_second': 6.481}
Baseline Accuracy: 0.49932
Fine-Tuned Accuracy: 0.92532
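
The comparison can be summarized with two derived figures. The sketch below uses only the two accuracies printed above; the variable names are illustrative and not part of the notebook.

```python
baseline_accuracy = 0.49932
fine_tuned_accuracy = 0.92532

# Absolute gain in accuracy, in percentage points
gain_points = 100 * (fine_tuned_accuracy - baseline_accuracy)

# Share of the baseline's errors that the fine-tuned model no longer makes
baseline_error = 1 - baseline_accuracy
fine_tuned_error = 1 - fine_tuned_accuracy
error_reduction = 100 * (baseline_error - fine_tuned_error) / baseline_error

print(f"gain: {gain_points:.1f} points")
print(f"error reduction: {error_reduction:.1f}%")
```

This works out to a gain of about 42.6 percentage points, or roughly 85% fewer errors than the untrained baseline, obtained while training about 0.24% of the model's parameters.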
