# Lightweight Fine-Tuning Project


* PEFT technique: LoRA (Low-Rank Adaptation)
* Model: `gpt2` with a sequence-classification head added, since the instructions suggested a classification dataset.
* Evaluation approach: `Trainer.evaluate` with accuracy as the metric.
    * First, the model with the untrained classifier head is evaluated.
    * Next, only the classifier head is trained and evaluated.
    * Next, the model is fine-tuned with PEFT LoRA adapters (the GPT-2 weights stay frozen).
    * Finally, the fine-tuned model is saved, reloaded, and evaluated.
* Fine-tuning dataset: `rikka-snow/prompt-injection-multilingual`, a binary classification dataset (normal text vs. prompt injection).

## Loading and Evaluating a Foundation Model

### Load a pretrained HF model

In [1]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "Normal", 1: "Prompt Injection"},  # For converting predictions to strings
    label2id={"Normal": 0, "Prompt Injection": 1},
)
model.config.pad_token_id = model.config.eos_token_id

# Freeze all the parameters of the base model
for param in model.base_model.parameters():
    param.requires_grad = False

# Show the model architecture
print(model)


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)
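After the freezing loop above, only the `score` head should be trainable. The sketch below cross-checks this by recomputing the parameter counts from the printed module shapes. It assumes the usual layouts: `Conv1D(nf, nx)` stores an `nx x nf` weight plus an `nf` bias, each `LayerNorm` has a weight and a bias, and `score` has no bias (as printed).

```python
# Count GPT2ForSequenceClassification parameters from the shapes printed above.
# Assumption: Conv1D(nf, nx) = nx*nf weight + nf bias; LayerNorm(d) = d weight + d bias.
d_model, vocab, n_ctx, n_layer, n_labels = 768, 50257, 1024, 12, 2


def conv1d(nf: int, nx: int) -> int:
    return nx * nf + nf


def layernorm(d: int) -> int:
    return 2 * d


block = (
    layernorm(d_model)                  # ln_1
    + conv1d(3 * d_model, d_model)      # attn.c_attn (768 -> 2304)
    + conv1d(d_model, d_model)          # attn.c_proj
    + layernorm(d_model)                # ln_2
    + conv1d(4 * d_model, d_model)      # mlp.c_fc (768 -> 3072)
    + conv1d(d_model, 4 * d_model)      # mlp.c_proj (3072 -> 768)
)
transformer = vocab * d_model + n_ctx * d_model + n_layer * block + layernorm(d_model)
score = d_model * n_labels  # Linear(768, 2, bias=False)

print(f"frozen transformer: {transformer:,}")  # frozen transformer: 124,439,808
print(f"trainable score head: {score:,}")      # trainable score head: 1,536
```

The 1,536 trainable parameters are exactly the untrained head. Everything else is frozen during the classifier-only training run below.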


### Load and preprocess a dataset

In [2]:
from datasets import load_dataset
# Load the train and test splits of the prompt-injection dataset
splits = ["train", "test"]

ds = dict(zip(splits, load_dataset("rikka-snow/prompt-injection-multilingual", split=splits)))

# Thin out the dataset to make it run faster for this example
for split in splits:
    ds[split] = ds[split].shuffle(seed=42).select(range(500))

In [3]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def preprocess_function(examples):
    # Pad/truncate every example to the tokenizer's max length (1024 tokens for GPT-2)
    return tokenizer(examples["text"], padding="max_length", truncation=True)


tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(preprocess_function, batched=True)
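GPT-2 has no dedicated padding token, so the notebook reuses EOS for padding and sets `pad_token_id` on the model config. `GPT2ForSequenceClassification` classifies from the last token of each sequence, and it uses `pad_token_id` to find the last token that is not padding.

The following is a simplified pure-Python sketch of that lookup, not the transformers source. It assumes right padding and uses a hypothetical helper name.

```python
# Simplified sketch (not the transformers source) of how a GPT-2 classifier
# finds the position to pool from: the last token that is not padding.
# Assumes right padding, the GPT-2 tokenizer's default.
EOS_ID = 50256  # GPT-2's <|endoftext|> id, reused here as the pad id


def last_non_pad_index(input_ids: list[int], pad_id: int = EOS_ID) -> int:
    """Return the index of the last non-padding token, or -1 if all are padding."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_id:
            return i
    return -1


# Arbitrary token ids followed by two padding positions
print(last_non_pad_index([15496, 995, EOS_ID, EOS_ID]))  # 1
```

With `padding="max_length"`, every example is 1024 tokens long, so short prompts are mostly padding. Each forward pass then costs as much as a maximum-length input, which likely contributes to the long evaluation runtimes in the logs below.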



### Evaluate the pretrained model


In [4]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/prompt_classification1",
        learning_rate=2e-3,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=3,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics,
)

eval_result = trainer.evaluate()


In [5]:
print(f"Accuracy of untrained model: {eval_result['eval_accuracy']:.2f}")

Accuracy of untrained model: 0.48


Because the classifier head is randomly initialized and untrained, accuracy is close to chance (0.48 for two classes). Next, we train only the classifier head before fine-tuning the model with LoRA.

In [6]:
trainer.train()
eval_result = trainer.evaluate()
print(f"Accuracy of model with trained classifier: {eval_result['eval_accuracy']:.2f}")

{'eval_loss': 0.6524555087089539, 'eval_model_preparation_time': 0.002, 'eval_accuracy': 0.64, 'eval_runtime': 492.2057, 'eval_samples_per_second': 1.016, 'eval_steps_per_second': 0.065, 'epoch': 1.0}


{'eval_loss': 0.5048156976699829, 'eval_model_preparation_time': 0.002, 'eval_accuracy': 0.752, 'eval_runtime': 466.7067, 'eval_samples_per_second': 1.071, 'eval_steps_per_second': 0.069, 'epoch': 2.0}


{'eval_loss': 0.5306786894798279, 'eval_model_preparation_time': 0.002, 'eval_accuracy': 0.722, 'eval_runtime': 472.8662, 'eval_samples_per_second': 1.057, 'eval_steps_per_second': 0.068, 'epoch': 3.0}
{'train_runtime': 4009.8256, 'train_samples_per_second': 0.374, 'train_steps_per_second': 0.024, 'train_loss': 0.6450621287027994, 'epoch': 3.0}


Accuracy of model with trained classifier: 0.75


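With the classifier head trained, accuracy improves from 0.48 to 0.75. Because `load_best_model_at_end=True`, the Trainer restores the epoch-2 checkpoint, which has the lowest eval loss (0.505). It does not keep the final epoch-3 weights (accuracy 0.722), so the reported accuracy is 0.75.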
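The Trainer's checkpoint choice can be reproduced from the epoch logs above. This sketch of the selection rule assumes `metric_for_best_model` is left unset. In that case the Trainer ranks checkpoints by eval loss, where lower is better.

```python
# Eval logs copied from the three epochs above (losses rounded to four decimals).
epoch_logs = [
    {"epoch": 1, "eval_loss": 0.6525, "eval_accuracy": 0.640},
    {"epoch": 2, "eval_loss": 0.5048, "eval_accuracy": 0.752},
    {"epoch": 3, "eval_loss": 0.5307, "eval_accuracy": 0.722},
]

# With metric_for_best_model unset, checkpoints are compared by eval_loss (lower wins).
best = min(epoch_logs, key=lambda log: log["eval_loss"])
print(best["epoch"], f"{best['eval_accuracy']:.2f}")  # 2 0.75
```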

## Performing Parameter-Efficient Fine-Tuning


### Create a PEFT model
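LoRA freezes each targeted weight matrix W and trains a low-rank update instead. The adapted layer computes `W x + (lora_alpha / r) * B (A x)`, where A has shape `r x d_in` and B has shape `d_out x r`.

The toy sketch below shows that forward pass in pure Python. The matrices and sizes are made up for illustration, LoRA dropout is omitted, and it is not PEFT's implementation.

```python
# Minimal LoRA forward pass on toy matrices (pure Python, no PEFT).
# W stays frozen; only A (r x d_in) and B (d_out x r) would be trained.
def matvec(M: list[list[float]], v: list[float]) -> list[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, lora_alpha: float, r: int) -> list[float]:
    scaling = lora_alpha / r           # the notebook's config gives 32 / 16 = 2.0
    base = matvec(W, x)                # frozen pretrained path
    delta = matvec(B, matvec(A, x))    # low-rank update B @ (A @ x)
    return [b + scaling * d for b, d in zip(base, delta)]


# Toy sizes: d_in = 3, d_out = 2, r = 1
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]
B_init = [[0.0], [0.0]]                # B starts at zero, so the update is initially zero
B_trained = [[1.0], [-1.0]]
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, lora_alpha=2, r=1))     # [1.0, 2.0]
print(lora_forward(W, A, B_trained, x, lora_alpha=2, r=1))  # [7.0, -4.0]
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen one. With `r=16` and `lora_alpha=32`, the learned update is scaled by 2.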

In [7]:
from peft import LoraConfig, get_peft_model, TaskType
import torch

# freeze all the weights
for param in model.parameters():
    param.requires_grad = False

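# LoRA configuration. No target_modules are given, so PEFT falls back to its
# default for GPT-2 (the attention projection attn.c_attn).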
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_CLS
)

peft_model = get_peft_model(model, config)

peft_model.print_trainable_parameters()

trainable params: 591,360 || all params: 125,032,704 || trainable%: 0.4730
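The printed counts can be reproduced by hand. This sketch assumes two things about PEFT's behavior here:

* With no `target_modules`, PEFT applies LoRA to GPT-2's `attn.c_attn` projection (768 -> 2304) in all 12 blocks.
* For `TaskType.SEQ_CLS`, PEFT keeps the `score` head trainable through `modules_to_save`. That wrapper holds the frozen original plus a trainable copy.

```python
# Reproduce print_trainable_parameters() for LoRA on GPT-2's c_attn layers.
# Assumption: LoRA targets attn.c_attn in all 12 blocks; `score` is in modules_to_save.
n_layer, d_model, r, n_labels = 12, 768, 16, 2
d_in, d_out = d_model, 3 * d_model                 # c_attn: 768 -> 2304

lora_per_layer = r * d_in + d_out * r              # A: r x d_in, B: d_out x r
lora_total = n_layer * lora_per_layer              # 589,824
score = d_model * n_labels                         # 1,536 (no bias)
trainable = lora_total + score                     # 591,360

gpt2_transformer = 124_439_808                     # GPT-2 small backbone parameters
all_params = gpt2_transformer + score + lora_total + score  # original head + LoRA + head copy
print(f"trainable params: {trainable:,} || all params: {all_params:,} "
      f"|| trainable%: {100 * trainable / all_params:.4f}")
# trainable params: 591,360 || all params: 125,032,704 || trainable%: 0.4730
```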




### Train the PEFT model

In [8]:
trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(
        output_dir="./data/prompt_classification2",
        learning_rate=2e-3,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=2,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics,
)

trainer.train()

{'eval_loss': 0.6544254422187805, 'eval_accuracy': 0.772, 'eval_runtime': 512.5921, 'eval_samples_per_second': 0.975, 'eval_steps_per_second': 0.244, 'epoch': 1.0}


{'eval_loss': 0.5516521334648132, 'eval_accuracy': 0.818, 'eval_runtime': 530.677, 'eval_samples_per_second': 0.942, 'eval_steps_per_second': 0.236, 'epoch': 2.0}
{'train_runtime': 4376.401, 'train_samples_per_second': 0.228, 'train_steps_per_second': 0.057, 'train_loss': 0.565503662109375, 'epoch': 2.0}


TrainOutput(global_step=250, training_loss=0.565503662109375, metrics={'train_runtime': 4376.401, 'train_samples_per_second': 0.228, 'train_steps_per_second': 0.057, 'total_flos': 526226817024000.0, 'train_loss': 0.565503662109375, 'epoch': 2.0})

In [9]:
eval_result = trainer.evaluate()
print(f"Accuracy of model with finetuning: {eval_result['eval_accuracy']:.2f}")

Accuracy of model with finetuning: 0.82


After LoRA fine-tuning, accuracy reaches 0.82, up from 0.75 with the classifier head alone 😃

### Save the PEFT model

In [10]:
# save the peft model
peft_model.save_pretrained("gpt-lora")

# save the classifier (score) weights separately as well (PEFT already includes them in the adapter)
torch.save(model.state_dict()['score.modules_to_save.default.weight'], "gpt-lora/score.pth")

## Performing Inference with a PEFT Model

### Load the saved PEFT model

In [11]:
# load the peft model
from peft import AutoPeftModelForSequenceClassification
loaded_model = AutoPeftModelForSequenceClassification.from_pretrained("gpt-lora")
loaded_model.config.pad_token_id = loaded_model.config.eos_token_id


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
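# Optional: the adapter already holds the trained `score` head (PEFT saves modules_to_save).
# Assigning into state_dict() is a no-op; load_state_dict actually writes the weights.
score_weight = torch.load("gpt-lora/score.pth")
loaded_model.model.load_state_dict({"score.modules_to_save.default.weight": score_weight}, strict=False)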


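A note on restoring weights by hand: `state_dict()` returns a new dictionary on each call. `model.state_dict()[key] = tensor` therefore only rebinds a key in a temporary object. `load_state_dict` (or an in-place `copy_` on the parameter) is what actually writes into the model.

The toy pure-Python stand-in below illustrates the difference. `ToyModule` is a hypothetical class, not the PyTorch one.

```python
# Toy stand-in (hypothetical class, not torch.nn.Module) showing why writing into
# the dict returned by state_dict() leaves the model unchanged.
class ToyModule:
    def __init__(self) -> None:
        self._params = {"score.weight": [0.0, 0.0]}

    def state_dict(self) -> dict:
        # Like torch, build a fresh mapping on every call.
        return dict(self._params)

    def load_state_dict(self, new_state: dict) -> None:
        # Write the given entries back into the module's own storage.
        for name, value in new_state.items():
            self._params[name] = value


m = ToyModule()
m.state_dict()["score.weight"] = [1.0, 2.0]   # modifies a temporary dict only
print(m.state_dict()["score.weight"])         # [0.0, 0.0]
m.load_state_dict({"score.weight": [1.0, 2.0]})
print(m.state_dict()["score.weight"])         # [1.0, 2.0]
```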

### Evaluate the fine-tuned model

In [13]:
trainer = Trainer(
    model=loaded_model,
    args=TrainingArguments(
        output_dir="./data/prompt_classification3",
        evaluation_strategy="epoch",
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer, padding="max_length"),
    compute_metrics=compute_metrics,
)

In [14]:
eval_result = trainer.evaluate()
print(f"Accuracy of model with finetuning: {eval_result['eval_accuracy']:.2f}")

Accuracy of model with finetuning: 0.82
