# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: LoRA
* Model: GPT-2
* Evaluation approach: Trainer
* Fine-tuning dataset: sms-spam

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
! pip install -q "datasets==2.15.0"

[0m

In [2]:
#load the sms_spam dataset for hugging face
from datasets import load_dataset

dataset = load_dataset("sms_spam",split="train").train_test_split(
        test_size=0.2, shuffle=True, seed=99)
    
splits = ['train','test']

print(dataset['test'])

Downloading builder script:   0%|          | 0.00/3.21k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/4.87k [00:00<?, ?B/s]

Downloading data: 0.00B [00:00, ?B/s]

Generating train split:   0%|          | 0/5574 [00:00<?, ? examples/s]

Dataset({
    features: ['sms', 'label'],
    num_rows: 1115
})


In [3]:
#use the GPT-2 tokinzer
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

#since the GPT2 tokenizer does not have pad_token, set it to eos_token
#question: is this fine, or should we use a different token
tokenizer.pad_token = tokenizer.eos_token

tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = dataset[split].map(
        lambda x:tokenizer(x["sms"],truncation=True, padding=True),batched=True
    )

print(tokenized_ds['train'])
print(tokenized_ds['test'])

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Map:   0%|          | 0/4459 [00:00<?, ? examples/s]

Map:   0%|          | 0/1115 [00:00<?, ? examples/s]

Dataset({
    features: ['sms', 'label', 'input_ids', 'attention_mask'],
    num_rows: 4459
})
Dataset({
    features: ['sms', 'label', 'input_ids', 'attention_mask'],
    num_rows: 1115
})


In [4]:
#load the GPT2ForSequenceClassification model

from transformers import GPT2ForSequenceClassification
model = GPT2ForSequenceClassification.from_pretrained("gpt2")

#set the model padding token to same as tokenizer padding token
model.config.pad_token_id = tokenizer.pad_token_id

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
#define the function to compute metric
#use accuracy metrics
import numpy as np

# metric = load_metric('accuracy')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=-1)
    return {"accuracy":(predictions==labels).mean()}

In [6]:
#create a trainer
from transformers import Trainer, TrainingArguments, DataCollatorWithPadding
from datasets import load_metric

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="./data/temp",
            per_device_eval_batch_size=4,
            evaluation_strategy="epoch",
            save_strategy="epoch"),
        eval_dataset=tokenized_ds['test'],
        data_collator=data_collator,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics)


In [7]:
#evaluate the pre-trained model
op = trainer.evaluate()
op

{'eval_loss': 9.291898727416992,
 'eval_accuracy': 0.13632286995515694,
 'eval_runtime': 13.9953,
 'eval_samples_per_second': 79.67,
 'eval_steps_per_second': 19.935}

In [8]:
import pandas as pd

df_pretrained = pd.DataFrame.from_dict(op, orient = 'index',columns=["Pre-trained"])
df_pretrained

Unnamed: 0,Pre-trained
eval_loss,9.291899
eval_accuracy,0.136323
eval_runtime,13.9953
eval_samples_per_second,79.67
eval_steps_per_second,19.935


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [9]:
#Create a PEFT Config

from peft import LoraConfig, TaskType

config = LoraConfig(
    r=8, # Rank
    lora_alpha=32,
    target_modules=['c_attn', 'c_proj'],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_CLS
)

In [10]:
#Create a PEFT Model from the transformer model

from transformers import GPT2ForSequenceClassification
from peft import get_peft_model

model_1 = GPT2ForSequenceClassification.from_pretrained("gpt2")
lora_model = get_peft_model(model_1, config)
lora_model.config.pad_token_id = tokenizer.pad_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
#use hugging face trainer to train the mode
from transformers import Trainer, TrainingArguments, DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer)

lora_trainer = Trainer(
                model=lora_model,
                args=TrainingArguments(
                    output_dir='./data/peft',
                    learning_rate=2e-3,
                    per_device_train_batch_size=4,
                    per_device_eval_batch_size=4,
                    num_train_epochs=2,
                    weight_decay=.01,
                    evaluation_strategy='epoch',
                    save_strategy='epoch',
                    load_best_model_at_end=True
                ),
                train_dataset=tokenized_ds['train'],   
                eval_dataset=tokenized_ds['test'],           
                data_collator=data_collator,
                tokenizer=tokenizer,
                compute_metrics=compute_metrics
                )


In [12]:
lora_trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.2482,0.140961,0.977578
2,0.0685,0.071448,0.989238


Checkpoint destination directory ./data/peft/checkpoint-1115 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./data/peft/checkpoint-2230 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=2230, training_loss=0.21804306560567677, metrics={'train_runtime': 416.0662, 'train_samples_per_second': 21.434, 'train_steps_per_second': 5.36, 'total_flos': 1082766751948800.0, 'train_loss': 0.21804306560567677, 'epoch': 2.0})

In [13]:
#save the model and the tokenizer
lora_model.save_pretrained("./model/model")
tokenizer.save_pretrained("./model/tokenizer")

('./model/tokenizer/tokenizer_config.json',
 './model/tokenizer/special_tokens_map.json',
 './model/tokenizer/vocab.json',
 './model/tokenizer/merges.txt',
 './model/tokenizer/added_tokens.json')

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [14]:
model_saved_lora =  GPT2ForSequenceClassification.from_pretrained("./model/model")

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [15]:
tokenizer_saved_lora = GPT2Tokenizer.from_pretrained("./model/tokenizer")
model_saved_lora.config.pad_token_id = tokenizer_saved_lora.pad_token_id

In [16]:
lora_trainer_2 = Trainer(
                model=model_saved_lora,
                args=TrainingArguments(
                    output_dir='./data/peft-temp',
                    per_device_eval_batch_size=4,
                    evaluation_strategy='epoch',
                    save_strategy='epoch'
                ),
                eval_dataset=tokenized_ds['test'],           
                data_collator=data_collator,
                tokenizer=tokenizer_saved_lora,
                compute_metrics=compute_metrics)

In [17]:
op = lora_trainer_2.evaluate()

In [18]:
df_peft_lora = pd.DataFrame.from_dict(op, orient = 'index',columns=["Fine Tuned (LoRA)"])
df_peft_lora

Unnamed: 0,Fine Tuned (LoRA)
eval_loss,0.079422
eval_accuracy,0.983857
eval_runtime,16.148
eval_samples_per_second,69.049
eval_steps_per_second,17.278


In [29]:
#compare evaluation of pre-trained and lora fine tuned
df = df_pretrained.join(df_peft_lora)
df

Unnamed: 0,Pre-trained,Fine Tuned (LoRA)
eval_loss,9.291899,0.079422
eval_accuracy,0.136323,0.983857
eval_runtime,13.9953,16.148
eval_samples_per_second,79.67,69.049
eval_steps_per_second,19.935,17.278


# QLoRA 

In [21]:
#based on HuuginQuantization tutorial given at 
import torch
from transformers import BitsAndBytesConfig
# import bitsandbytes

# create the BitsAndBytesConfig

config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

#Pass the config the from_pretrained model

from transformers import GPT2ForSequenceClassification
# model_ql = AutoModelForSequenceClassification.from_pretrained("gpt2", quantization_config=config)
model_ql = GPT2ForSequenceClassification.from_pretrained("gpt2", quantization_config=config)
model_ql.config.pad_token_id = tokenizer.pad_token_id
#preprocess the quantized model for training

from peft import prepare_model_for_kbit_training
model_ql = prepare_model_for_kbit_training(model_ql)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [22]:
#print the model to get the list of liner layers.
#as "all-linear" module_type is not working
print(model_ql)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Linear4bit(in_features=768, out_features=2304, bias=True)
          (c_proj): Linear4bit(in_features=768, out_features=768, bias=True)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Linear4bit(in_features=768, out_features=3072, bias=True)
          (c_proj): Linear4bit(in_features=3072, out_features=768, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, ele

In [23]:
#create a LoRAConfig

from peft import LoraConfig, TaskType

config = LoraConfig(
    r=8, # Rank
    lora_alpha=32,
    #     target_modules = ["all-linear"],       #QLoRA style training is giving error
    target_modules=['c_attn', 'c_proj', 'c_fc'],   #so add layers manually
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_CLS,
)

# use the get_peft_model() function to create a PeftModel from the quantized model and configuration

from peft import get_peft_model
model_qlora = get_peft_model(model_ql, config)

train_args = TrainingArguments(
                              output_dir="./model/qlora",
                              learning_rate=2e-3,
                              per_device_train_batch_size=4,
                              per_device_eval_batch_size=4,
                              num_train_epochs=2,
                              weight_decay=0.01,
                              evaluation_strategy="epoch",
                              save_strategy="epoch",
                              load_best_model_at_end=True,
                    )

from transformers import DataCollatorWithPadding

trainer = Trainer(
              model=model_qlora,
              args=train_args,
              train_dataset=tokenized_ds['train'],
              eval_dataset=tokenized_ds['test'],
              data_collator=DataCollatorWithPadding(tokenizer),
              tokenizer=tokenizer,
              compute_metrics=compute_metrics
)

In [25]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.137,0.074792,0.980269
2,0.0516,0.072296,0.989238


Checkpoint destination directory ./model/qlora/checkpoint-1115 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory ./model/qlora/checkpoint-2230 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=2230, training_loss=0.14722054143657598, metrics={'train_runtime': 662.9061, 'train_samples_per_second': 13.453, 'train_steps_per_second': 3.364, 'total_flos': 1116558924410880.0, 'train_loss': 0.14722054143657598, 'epoch': 2.0})

In [26]:
op = trainer.evaluate()

In [27]:
df_qlora = pd.DataFrame.from_dict(op, orient = 'index',columns=["Fine Tuned (QLoRA)"])
df_qlora

Unnamed: 0,Fine Tuned (QLoRA)
eval_loss,0.072296
eval_accuracy,0.989238
eval_runtime,20.8836
eval_samples_per_second,53.391
eval_steps_per_second,13.36
epoch,2.0


In [30]:
#compare evaluation of all
df = df.join(df_qlora)
df

Unnamed: 0,Pre-trained,Fine Tuned (LoRA),Fine Tuned (QLoRA)
eval_loss,9.291899,0.079422,0.072296
eval_accuracy,0.136323,0.983857,0.989238
eval_runtime,13.9953,16.148,20.8836
eval_samples_per_second,79.67,69.049,53.391
eval_steps_per_second,19.935,17.278,13.36
