<a href="https://colab.research.google.com/github/kwoeser/Fine-tuningLORA/blob/main/Lightweight_fine_tuning_LoRA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lightweight Fine-Tuning Project

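The choices made for this project:

* PEFT technique: LoRA
* Model: gpt2
* Evaluation approach: accuracy and precision, computed with the Hugging Face `Trainer.evaluate()` on a held-out subset of the test split
* Fine-tuning dataset: Amazon reviews (`amazon_polarity`)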

## Loading and Evaluating a Foundation Model

The cells below load the chosen pre-trained Hugging Face model, its tokenizer, and the dataset, then evaluate the model's performance prior to fine-tuning.

In [1]:
!pip install numpy pandas torch scikit_learn transformers datasets peft


In [2]:
# import packages
import numpy as np
import torch
from sklearn.metrics import accuracy_score, precision_score
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments, GPT2Tokenizer, GPT2ForSequenceClassification
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, AutoPeftModelForSequenceClassification


In [3]:
# load pre trained model and tokenizer, set pad token
# https://huggingface.co/openai-community/gpt2
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2ForSequenceClassification.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id



Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
# load and preprocess the dataset
# https://huggingface.co/datasets/fancyzhx/amazon_polarity
dataset = load_dataset("amazon_polarity")
# print(dataset['train'][0])
# print(dataset['test'][0])

# Select a smaller portion of the dataset for quick processing
small_train_dataset = dataset['train'].select(range(50000))
small_test_dataset = dataset['test'].select(range(40000))



In [5]:
def preprocess_function(examples):
    # Tokenize the review body only. padding='max_length' pads every review to the
    # tokenizer's maximum length (1024 tokens for GPT-2); truncation clips longer reviews.
    return tokenizer(examples['content'], padding='max_length', truncation=True)

# Full dataset (much slower to tokenize):
# tokenized_train_dataset = dataset['train'].map(preprocess_function, batched=True)
# tokenized_test_dataset = dataset['test'].map(preprocess_function, batched=True)

# Tokenize only the smaller subsets to keep preprocessing time manageable
tokenized_train_dataset = small_train_dataset.map(preprocess_function, batched=True)
tokenized_test_dataset = small_test_dataset.map(preprocess_function, batched=True)

# convert the format of the dataset
def convert_to_torch_format(dataset):
    dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])

convert_to_torch_format(tokenized_train_dataset)
convert_to_torch_format(tokenized_test_dataset)
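
The preprocessing above pads every review to the tokenizer's maximum length, while the `Trainer` cells later also pass a `DataCollatorWithPadding`, which can pad each batch only to its longest sequence. The following is a minimal sketch of the difference in token counts between the two strategies. The review lengths are hypothetical values chosen for illustration, `padded_token_count` is a helper invented for this sketch, and the 1024 figure assumes GPT-2's context size.

```python
def padded_token_count(lengths, batch_size, max_length=None):
    """Total number of tokens the model processes after padding.

    If max_length is given, every sequence is padded to it (fixed padding).
    Otherwise each batch is padded to its own longest sequence (dynamic padding).
    """
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        target = max_length if max_length is not None else max(batch)
        total += target * len(batch)
    return total


# Hypothetical token counts for six reviews (illustrative, not measured).
review_lengths = [40, 85, 120, 60, 200, 95]

fixed = padded_token_count(review_lengths, batch_size=6, max_length=1024)
dynamic = padded_token_count(review_lengths, batch_size=6)
print(f"fixed padding: {fixed} tokens, dynamic padding: {dynamic} tokens")
```

Under these assumptions, dropping `padding='max_length'` from the tokenizer call and letting the collator pad each batch would reduce the work per step, although it was not tested in this notebook.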



In [6]:
# The tokenized subsets already correspond to the train and test splits
train_dataset = tokenized_train_dataset
test_dataset = tokenized_test_dataset

# Even smaller subsets (300 train, 200 test examples) for quick experiments
two_small_train_dataset = train_dataset.select(range(300))
two_small_test_dataset = test_dataset.select(range(200))

In [7]:
# Compute accuracy and precision from the model's logits
def compute_metrics(eval_pred):
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return {
        "accuracy": accuracy_score(eval_pred.label_ids, predictions),
        # precision is reported as 0.0 (with a warning) when no example is predicted positive
        "precision": precision_score(eval_pred.label_ids, predictions),
    }


# Training arguments
gpt2_training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy="epoch",  # Evaluate at the end of each epoch
    logging_dir='./logs',
    logging_steps=10,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    # num_train_epochs = 2,
    num_train_epochs = 5,
    weight_decay=0.01,
    save_strategy="epoch",
)

# Initialize Trainer with the base model
trainer = Trainer(
    model=model,
    args=gpt2_training_args,
    # train_dataset=train_dataset, # full dataset takes much longer to train
    # eval_dataset=test_dataset,
    train_dataset= two_small_train_dataset, # added for testing, smaller datasets
    eval_dataset= two_small_test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)


# The base model is not trained here; it is only evaluated as a baseline
# trainer.train()

# Evaluate the base model
original_model_eval_results = trainer.evaluate()
print(f"Original model evaluation results: {original_model_eval_results}")


Original model evaluation results: {'eval_loss': 1.2617655992507935, 'eval_accuracy': 0.455, 'eval_precision': 0.0, 'eval_runtime': 16.8209, 'eval_samples_per_second': 11.89, 'eval_steps_per_second': 2.021}


  _warn_prf(average, modifier, msg_start, len(result))
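
One reading of the baseline numbers that is consistent with the 0.0 precision is that the untrained classification head predicted class 0 for every example. The sketch below illustrates this with a hypothetical label split (91 negatives, 109 positives), chosen only so that the arithmetic reproduces the reported figures; the actual split of the 200-example subset is not shown in this notebook. The helper functions are plain-Python stand-ins for the scikit-learn metrics, not the library code itself.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def precision(labels, preds, positive=1):
    """True positives divided by all predicted positives."""
    labels_of_predicted_pos = [l for l, p in zip(labels, preds) if p == positive]
    if not labels_of_predicted_pos:
        # Precision is undefined with no positive predictions; report 0.0 by convention.
        return 0.0
    hits = sum(l == positive for l in labels_of_predicted_pos)
    return hits / len(labels_of_predicted_pos)


# Hypothetical evaluation set: 91 negatives and 109 positives (200 total).
labels = [0] * 91 + [1] * 109
# A classifier that always predicts the negative class.
preds = [0] * 200

print(f"accuracy={accuracy(labels, preds)}, precision={precision(labels, preds)}")
```

In scikit-learn, the `zero_division` argument of `precision_score` controls the value returned in this undefined case and whether a warning is raised.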


## Performing Parameter-Efficient Fine-Tuning

The cells below create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [8]:
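# LoRA configuration
peft_config = LoraConfig(
    r=16,  # rank of the low-rank decomposition
    task_type="SEQ_CLS",  # sequence classification
    lora_alpha=32,  # scaling factor; the LoRA update is scaled by lora_alpha / r
    lora_dropout=0.2,  # dropout applied on the LoRA branch
    target_modules=['attn.c_attn', 'attn.c_proj'],  # attention projections in each GPT-2 block
    bias="none",  # bias terms are not trained
)

# Wrap the base model with LoRA adapters (get_peft_model modifies `model` in place)
lora_model = get_peft_model(model, peft_config)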
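
To give a sense of how small the LoRA adapters are with this configuration, here is a rough back-of-the-envelope count. It assumes the standard GPT-2 small shapes (12 transformer blocks, hidden size 768, a fused `attn.c_attn` projection of 768 to 2304, and an `attn.c_proj` of 768 to 768); these shapes are not printed anywhere in this notebook, and `lora_param_count` is a helper written for this sketch.

```python
def lora_param_count(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


# Assumed GPT-2 small shapes (not read from the loaded model).
HIDDEN, N_LAYERS = 768, 12
# Values taken from the LoraConfig above.
RANK, ALPHA = 16, 32

per_layer = (
    lora_param_count(HIDDEN, 3 * HIDDEN, RANK)  # attn.c_attn: fused query/key/value
    + lora_param_count(HIDDEN, HIDDEN, RANK)    # attn.c_proj: attention output
)
adapter_params = N_LAYERS * per_layer
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"LoRA adapter parameters: {adapter_params:,}")
print(f"update scaling (lora_alpha / r): {scaling}")
```

This count covers only the adapter matrices. For sequence classification, PEFT typically keeps the classification head trainable as well, so the exact figure is best confirmed on the real model with `lora_model.print_trainable_parameters()`.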



In [9]:
# Training arguments for LoRA model
lora_training_args = TrainingArguments(
    output_dir='./results/lora_results',
    eval_strategy="epoch",
    logging_dir='./lora_logs',
    logging_steps = 10,
    per_device_train_batch_size = 6,
    per_device_eval_batch_size = 6,
    # num_train_epochs = 2,
    num_train_epochs = 5,
    weight_decay = 0.01,
    save_strategy = "epoch",
)

# init trainer for LoRA model
lora_trainer = Trainer(
    model=lora_model,
    args=lora_training_args,
    # train_dataset=tokenized_train_dataset, # full dataset takes longer to train
    # eval_dataset=tokenized_test_dataset,
    train_dataset=two_small_train_dataset,
    eval_dataset=two_small_test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# Train the LoRA model and evaluate the model
lora_trainer.train()
lora_model_eval_results = lora_trainer.evaluate()
print(f"LoRA model evaluation results: {lora_model_eval_results}")

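| Epoch | Training Loss | Validation Loss | Accuracy | Precision |
|-------|---------------|-----------------|----------|-----------|
| 1     | 0.7105        | 0.673783        | 0.58     | 0.580645  |
| 2     | 0.6686        | 0.656045        | 0.585    | 0.578313  |
| 3     | 0.6347        | 0.639166        | 0.635    | 0.626761  |
| 4     | 0.6505        | 0.629956        | 0.635    | 0.623288  |
| 5     | 0.5968        | 0.625766        | 0.655    | 0.640845  |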


LoRA model evaluation results: {'eval_loss': 0.6257655024528503, 'eval_accuracy': 0.655, 'eval_precision': 0.6408450704225352, 'eval_runtime': 15.0492, 'eval_samples_per_second': 13.29, 'eval_steps_per_second': 2.259, 'epoch': 5.0}
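
A useful reference point when reading the loss values is the cross-entropy of a two-class classifier that always outputs a 50/50 probability, which equals ln 2. The short sketch below compares the two reported `eval_loss` values against that chance level; it only re-uses numbers printed in this notebook and is meant as a reading aid, not as part of the training pipeline.

```python
import math


def cross_entropy(prob_of_true_class):
    """Cross-entropy loss for one example given the probability assigned to its true class."""
    return -math.log(prob_of_true_class)


# Loss of a binary classifier that always predicts 50/50.
chance_loss = cross_entropy(0.5)

# eval_loss values copied from the outputs in this notebook.
reported = {
    "base model": 1.2617655992507935,
    "LoRA model": 0.6257655024528503,
}

for name, loss in reported.items():
    side = "below" if loss < chance_loss else "above"
    print(f"{name}: eval_loss {loss:.4f} is {side} the chance-level loss {chance_loss:.4f}")
```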


In [10]:
# Save the PEFT LoRA model
lora_trainer.save_model("./models/Lora")

## Performing Inference with a PEFT Model

The cells below load the saved PEFT model weights, evaluate the trained PEFT model, and compare its results with those from before fine-tuning.

In [11]:
print("Compare the two models \n ----------------------")
print(f"LoRA model evaluation results: {lora_model_eval_results}\n")
print(f"Original model evaluation results: {original_model_eval_results}")

Compare the two models 
 ----------------------
LoRA model evaluation results: {'eval_loss': 0.6257655024528503, 'eval_accuracy': 0.655, 'eval_precision': 0.6408450704225352, 'eval_runtime': 15.0492, 'eval_samples_per_second': 13.29, 'eval_steps_per_second': 2.259, 'epoch': 5.0}

Original model evaluation results: {'eval_loss': 1.2617655992507935, 'eval_accuracy': 0.455, 'eval_precision': 0.0, 'eval_runtime': 16.8209, 'eval_samples_per_second': 11.89, 'eval_steps_per_second': 2.021}


In [12]:
# Load the saved PEFT model
loaded_lora_model = AutoPeftModelForSequenceClassification.from_pretrained("./models/Lora", num_labels=2)
loaded_lora_model.config.pad_token_id = loaded_lora_model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [13]:
# Training arguments for the loaded LoRA model (used here only for evaluation)
loaded_lora_training_args = TrainingArguments(
    output_dir='./results/lora_results',
    eval_strategy = "epoch",
    logging_dir='./lora_logs',
    logging_steps=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,
    save_strategy="epoch",
)

# init trainer for LoRA model
loaded_lora_trainer = Trainer(
    model=loaded_lora_model,
    args=loaded_lora_training_args,
    train_dataset=tokenized_train_dataset, # 50,000-example subset; unused unless train() is called
    eval_dataset=tokenized_test_dataset, # 40,000-example subset of the test split
    # train_dataset=two_small_train_dataset,
    # eval_dataset=two_small_test_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

# loaded_lora_trainer.train()

In [14]:
# Evaluate the PEFT model
loaded_lora_model_eval_results = loaded_lora_trainer.evaluate()

# Compare models
print("Compare the models \n -------------------")
print(f"Original model evaluation results: {original_model_eval_results}\n")
print(f"LoRA model evaluation results: {lora_model_eval_results}\n")
print(f"Loaded LoRA Model Evaluation: {loaded_lora_model_eval_results}")



Compare the models 
 -------------------
Original model evaluation results: {'eval_loss': 1.2617655992507935, 'eval_accuracy': 0.455, 'eval_precision': 0.0, 'eval_runtime': 16.8209, 'eval_samples_per_second': 11.89, 'eval_steps_per_second': 2.021}

LoRA model evaluation results: {'eval_loss': 0.6257655024528503, 'eval_accuracy': 0.655, 'eval_precision': 0.6408450704225352, 'eval_runtime': 15.0492, 'eval_samples_per_second': 13.29, 'eval_steps_per_second': 2.259, 'epoch': 5.0}

Loaded LoRA Model Evaluation: {'eval_loss': 0.6261507868766785, 'eval_accuracy': 0.650825, 'eval_precision': 0.6126065816362484, 'eval_runtime': 2993.0903, 'eval_samples_per_second': 13.364, 'eval_steps_per_second': 1.671}


In [15]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Results

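After evaluation, it is clear that the PEFT model is more accurate than the pre-trained Hugging Face gpt2 model. On the 200-example test subset, accuracy rose from 0.455 to 0.655 and precision from 0.0 to 0.641 after five epochs of LoRA training on 300 examples. The base model's classification head (`score.weight`) is newly initialized, so the baseline figures reflect an untrained classifier. The saved adapter, reloaded and evaluated on 40,000 test examples, reached a similar accuracy of 0.651 and a precision of 0.613. Most tests were done using small sections of the dataset. Both images below are from the small test dataset. Training on the full dataset could improve the results. Regardless, Low-Rank Adaptation (LoRA) proved to be a successful fine-tuning method for this project.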
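
Because several of the reported figures come from a 200-example subset, it helps to estimate how much sampling noise they carry. The sketch below applies a normal-approximation 95% interval to the accuracies printed in this notebook; the choice of this particular interval is an assumption made for illustration, and `accuracy_interval` is a helper written for this sketch.

```python
import math


def accuracy_interval(acc, n, z=1.96):
    """Normal-approximation 95% interval for an accuracy measured on n examples."""
    standard_error = math.sqrt(acc * (1 - acc) / n)
    return acc - z * standard_error, acc + z * standard_error


# Accuracies and evaluation set sizes taken from the outputs in this notebook.
runs = [
    ("base model, 200 examples", 0.455, 200),
    ("LoRA model, 200 examples", 0.655, 200),
    ("loaded LoRA model, 40,000 examples", 0.650825, 40000),
]

for name, acc, n in runs:
    low, high = accuracy_interval(acc, n)
    print(f"{name}: {acc:.3f} (95% interval {low:.3f} to {high:.3f})")
```

The intervals on 200 examples are several percentage points wide on each side, which is why the 40,000-example evaluation of the reloaded adapter is the more reliable estimate of the fine-tuned model's accuracy.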