# LLM Security - Prompt Injection
## Part 3 - Classification Using a Fine-tuned LLM

In this notebook, we load the raw dataset and fine-tune a pre-trained large language model to classify malicious prompts.
> **INPUT:** the raw dataset loaded from the Hugging Face Hub. <br>
> **OUTPUT:** a performance analysis of the fine-tuned LLM.


### 1. INITIALIZATION

In [1]:
# Import necessary libraries and modules
import pandas as pd

In [2]:
# Set display options
pd.set_option('display.max_columns', None)

### 2. LOADING DATASET

Because we fine-tune a pre-trained LLM on the training split and then evaluate it on the test split, we need to load both data sets.

In [3]:
# Initialize data set location and file name
data_file_path = "../data/raw/"
data_file_name_train = "train-00000-of-00001-9564e8b05b4757ab"
data_file_name_test = "test-00000-of-00001-701d16158af87368"
data_file_ext = ".parquet"

# Load the training and test sets into pandas DataFrames
data_train = pd.read_parquet(data_file_path + data_file_name_train + data_file_ext)
data_test = pd.read_parquet(data_file_path + data_file_name_test + data_file_ext)

In [4]:
# Rename the "text" column to "prompt"
data_train.rename(columns={"text":"prompt"}, inplace=True)
data_test.rename(columns={"text":"prompt"}, inplace=True)

We explored the dataset in the previous notebooks, so we proceed directly to fine-tuning.

### 3. MODEL FINE-TUNING

In this experiment, we fine-tune a pre-trained LLM on the classification task.

In the previous experiment, we used the pre-trained XLM-RoBERTa, the multilingual version of RoBERTa (Robustly optimized BERT approach), which itself improves on BERT (Bidirectional Encoder Representations from Transformers).

To measure the effect of fine-tuning, we fine-tune XLM-RoBERTa on our training dataset and then re-evaluate how well it detects prompt injections.

In [5]:
# Import the model class, its tokenizer, and the torch library
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
import torch

In [6]:
# Load the tokenizer from the same checkpoint as the model loaded below (xlm-roberta-large)
tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-large')

In [7]:
# Tokenize the prompts of a batch: pad every sequence to the longest one in the batch
# and truncate anything longer than the model's maximum input length
def tokenize_batch(batch):
    return tokenizer(batch['prompt'], padding=True, truncation=True)

In [8]:
# Tokenize the prompts in both splits (to_dict(orient='list') yields a dict of
# column lists, so batch['prompt'] is a list of strings)
prompts_train_tokenized = tokenize_batch(data_train.to_dict(orient='list'))
prompts_test_tokenized = tokenize_batch(data_test.to_dict(orient='list'))
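`padding=True` pads every sequence to the length of the longest one in the call, and `truncation=True` cuts anything longer than the model's maximum input length. Because `tokenize_batch` receives a whole split at once, every prompt in that split ends up with the same length, and an `attention_mask` marks which positions are real tokens. The stdlib-only sketch below mimics that behavior on already-tokenized IDs. `pad_and_truncate` is a hypothetical helper, the pad ID of 1 is an assumption matching XLM-RoBERTa's `<pad>` token, and unlike the real tokenizer it does not preserve the closing `</s>` token when truncating.

```python
from typing import Dict, List

PAD_ID = 1  # assumption: XLM-RoBERTa's <pad> token ID

def pad_and_truncate(token_ids: List[List[int]], max_length: int) -> Dict[str, List[List[int]]]:
    """Hypothetical helper mimicking padding=True, truncation=True."""
    # Truncate every sequence to the maximum input length
    truncated = [ids[:max_length] for ids in token_ids]
    # Pad to the longest sequence remaining in this call
    target = max(len(ids) for ids in truncated)
    input_ids = [ids + [PAD_ID] * (target - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [[1] * len(ids) + [0] * (target - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_and_truncate([[0, 35, 2], [0, 17, 88, 41, 2], [0, 9, 9, 9, 9, 9, 9, 2]], max_length=6)
print(batch["input_ids"])
print(batch["attention_mask"])
```

With `max_length=6`, the third sequence is cut to six tokens and the two shorter ones are padded up to that length.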

In [9]:
# Define a Dataset class that PyTorch's DataLoader (used internally by Trainer) can index
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings  # dict of lists: input_ids, attention_mask
        self.labels = labels        # list of integer class labels

    def __getitem__(self, idx):
        # Build one example as a dict of tensors, plus its label under the key the model expects
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)
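Since the `Trainer` below is given neither a tokenizer nor a data collator, it should fall back to a default collator that stacks the per-example tensors returned by `__getitem__` into batch tensors. That stacking works only because every item in a split already has the same length (each split was padded in a single `tokenize_batch` call). The stdlib-only sketch below illustrates the idea with plain lists; `collate` is a hypothetical stand-in, not the library's actual collator.

```python
from typing import Any, Dict, List

def collate(items: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical collator: turn per-example dicts into one dict of batched values."""
    lengths = {len(item["input_ids"]) for item in items}
    # Stacking into a rectangular batch requires every sequence to have the same length
    if len(lengths) != 1:
        raise ValueError(f"cannot stack sequences of different lengths: {sorted(lengths)}")
    return {key: [item[key] for item in items] for key in items[0]}

# Two items shaped like CustomDataset.__getitem__ output (lists instead of tensors)
items = [
    {"input_ids": [0, 35, 2, 1], "attention_mask": [1, 1, 1, 0], "labels": 0},
    {"input_ids": [0, 17, 88, 2], "attention_mask": [1, 1, 1, 1], "labels": 1},
]
batched = collate(items)
print(batched["labels"])
```

Had the prompts been tokenized one at a time with `padding=True`, the lengths would differ and a padding-aware collator would be needed instead.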

In [10]:
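# Wrap encodings and labels; .tolist() makes label lookup positional,
# so it does not depend on the DataFrame's index
train_dataset = CustomDataset(prompts_train_tokenized, data_train['label'].tolist())
test_dataset = CustomDataset(prompts_test_tokenized, data_test['label'].tolist())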

In [11]:
# Load pre-trained Model
model = XLMRobertaForSequenceClassification.from_pretrained('xlm-roberta-large', num_labels=2)

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-large and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.weight', 'classifier.out_proj.weight', 'classifier.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Loading the model prints a warning that some weights were newly initialized. This is expected: the pre-trained checkpoint has no classification head, so `classifier.dense` and `classifier.out_proj` start from random weights and must be learned during fine-tuning.
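The newly initialized weights listed in the warning belong to the classification head that sits on top of the encoder. As far as we understand the Hugging Face implementation, the head takes the hidden state of the first token (`<s>`), applies `classifier.dense` followed by tanh, and then `classifier.out_proj` to produce one logit per label, with dropout in between during training. The stdlib-only sketch below reproduces that inference-time forward pass on toy 3-dimensional vectors; the weights and dimensions are made-up assumptions (the real hidden size of `xlm-roberta-large` is 1024).

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute weight @ x + bias, with weight stored as (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def classification_head(first_token_state: Vector, dense_w: Matrix, dense_b: Vector,
                        out_w: Matrix, out_b: Vector) -> Vector:
    """Sketch of the head's forward pass at inference time (dropout is a no-op then)."""
    hidden = [math.tanh(v) for v in linear(dense_w, dense_b, first_token_state)]
    return linear(out_w, out_b, hidden)  # one logit per label

# Toy values: hidden size 3, two labels
cls_state = [0.5, -1.0, 2.0]
dense_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity
dense_b = [0.0, 0.0, 0.0]
out_w = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
out_b = [0.0, 0.0]
logits = classification_head(cls_state, dense_w, dense_b, out_w, out_b)
print(logits)
```

Fine-tuning updates these head weights together with the encoder, which is why the warning recommends training before using the model for predictions.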

In [12]:
# Import TrainingArguments to handle the various training configurations
from transformers import TrainingArguments

# Define training arguments for fine-tuning
training_args = TrainingArguments(
    output_dir="../output",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    evaluation_strategy="epoch",  # evaluate after every epoch; renamed eval_strategy in newer transformers releases
    learning_rate=2e-5,
    logging_dir="../output/logs",
)

In [13]:
# Import evaluation metrics
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Metric function passed to Trainer as compute_metrics; it receives an
# EvalPrediction holding the model's logits and the true labels
def evaluate_model(eval_pred):

    # Convert logits to predicted class indices and extract the true labels
    predictions, labels = eval_pred.predictions.argmax(axis=1), eval_pred.label_ids

    # Calculate accuracy
    accuracy = accuracy_score(labels, predictions)

    # Calculate support-weighted precision, recall, and F1 score
    precision, recall, f1, _ = precision_recall_fscore_support(labels, predictions, average="weighted", zero_division=1)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
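With `average="weighted"`, scikit-learn computes precision, recall, and F1 per class and averages them weighted by each class's support (its number of true instances). One consequence is that weighted recall always equals accuracy: the support weights cancel the per-class denominators, leaving total correct predictions over total examples. That is why `eval_recall` matches `eval_accuracy` in every epoch of the training log. The stdlib-only sketch below re-implements the weighted averages to make this visible; `weighted_scores` is a hypothetical helper, and it returns 0.0 for a class that is never predicted instead of applying the notebook's `zero_division=1`.

```python
from collections import Counter
from typing import Dict, List

def weighted_scores(labels: List[int], predictions: List[int]) -> Dict[str, float]:
    """Hypothetical re-implementation of support-weighted precision/recall/F1."""
    support = Counter(labels)         # true instances per class
    predicted = Counter(predictions)  # predicted instances per class
    total = len(labels)
    scores = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for cls, n_true in support.items():
        tp = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        precision = tp / predicted[cls] if predicted[cls] else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = n_true / total  # weight each class by its support
        scores["precision"] += weight * precision
        scores["recall"] += weight * recall
        scores["f1"] += weight * f1
    scores["accuracy"] = sum(y == p for y, p in zip(labels, predictions)) / total
    return scores

demo = weighted_scores(labels=[0, 0, 0, 1, 1], predictions=[0, 0, 1, 1, 1])
print(demo)
```

In this toy example the weighted recall and the accuracy are both 0.8, while the weighted precision differs (13/15).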

In [14]:
# Import the Trainer class
from transformers import Trainer

# Define trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=evaluate_model,
)

In [15]:
# Fine-tune the model
trainer.train()

{'eval_loss': 0.1919652670621872, 'eval_accuracy': 0.9655172413793104, 'eval_precision': 0.9661117717003568, 'eval_recall': 0.9655172413793104, 'eval_f1': 0.9655274949501164, 'eval_runtime': 6.6906, 'eval_samples_per_second': 17.338, 'eval_steps_per_second': 2.242, 'epoch': 1.0}


{'eval_loss': 0.5438994765281677, 'eval_accuracy': 0.9137931034482759, 'eval_precision': 0.9268547544409613, 'eval_recall': 0.9137931034482759, 'eval_f1': 0.9134076776812786, 'eval_runtime': 6.6359, 'eval_samples_per_second': 17.481, 'eval_steps_per_second': 2.26, 'epoch': 2.0}


{'eval_loss': 0.0022928104735910892, 'eval_accuracy': 1.0, 'eval_precision': 1.0, 'eval_recall': 1.0, 'eval_f1': 1.0, 'eval_runtime': 6.5495, 'eval_samples_per_second': 17.711, 'eval_steps_per_second': 2.29, 'epoch': 3.0}


{'eval_loss': 0.002756111789494753, 'eval_accuracy': 1.0, 'eval_precision': 1.0, 'eval_recall': 1.0, 'eval_f1': 1.0, 'eval_runtime': 6.7217, 'eval_samples_per_second': 17.257, 'eval_steps_per_second': 2.232, 'epoch': 4.0}


{'eval_loss': 0.04558767005801201, 'eval_accuracy': 0.9913793103448276, 'eval_precision': 0.9915305505142166, 'eval_recall': 0.9913793103448276, 'eval_f1': 0.9913812336042136, 'eval_runtime': 6.7731, 'eval_samples_per_second': 17.127, 'eval_steps_per_second': 2.215, 'epoch': 5.0}
{'train_runtime': 6992.2065, 'train_samples_per_second': 0.39, 'train_steps_per_second': 0.049, 'train_loss': 0.1634710228961447, 'epoch': 5.0}


TrainOutput(global_step=345, training_loss=0.1634710228961447, metrics={'train_runtime': 6992.2065, 'train_samples_per_second': 0.39, 'train_steps_per_second': 0.049, 'train_loss': 0.1634710228961447, 'epoch': 5.0})
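The logged numbers can be cross-checked with a little arithmetic. Assuming one device and no gradient accumulation, each epoch takes ceil(N / 8) steps, so `global_step=345` over 5 epochs means 69 steps per epoch and a training split of 545 to 552 prompts. Multiplying the final evaluation's `eval_samples_per_second` by its `eval_runtime` gives about 116 test prompts, and each epoch's `eval_accuracy` then translates into a count of misclassified prompts. These sizes are inferences from the logged values, not figures stated in the notebook.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps per epoch on one device, no gradient accumulation, last partial batch kept."""
    return math.ceil(num_examples / batch_size)

# Training: global_step=345 after 5 epochs at per_device_train_batch_size=8
train_sizes = [n for n in range(1, 1000) if steps_per_epoch(n, 8) * 5 == 345]
print(min(train_sizes), max(train_sizes))  # 545 552

# Test split size estimated from the final evaluation's throughput and runtime
test_size = round(48.893 * 2.3725)
print(test_size)  # 116

# Convert each epoch's logged accuracy into a count of misclassified test prompts
epoch_accuracy = [0.9655172413793104, 0.9137931034482759, 1.0, 1.0, 0.9913793103448276]
errors = [test_size - round(a * test_size) for a in epoch_accuracy]
print(errors)  # [4, 10, 0, 0, 1]
```

Seen this way, the epoch-2 dip corresponds to 10 misclassified prompts and the final model misclassifies a single one; on a test set this small, each prompt moves accuracy by less than one percentage point.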

In [16]:
# Evaluate the final model
results = trainer.evaluate()
print(results)

{'eval_loss': 0.04558767005801201, 'eval_accuracy': 0.9913793103448276, 'eval_precision': 0.9915305505142166, 'eval_recall': 0.9913793103448276, 'eval_f1': 0.9913812336042136, 'eval_runtime': 2.3725, 'eval_samples_per_second': 48.893, 'eval_steps_per_second': 6.322, 'epoch': 5.0}


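Fine-tuning is now complete. Because `load_best_model_at_end` was left at its default (`False`), the trainer keeps the weights from the last epoch, which is why the final evaluation reproduces the epoch-5 metrics rather than the perfect scores of epochs 3 and 4.

We save the model and tokenizer to a local path so we can reload them later for evaluation and inference without repeating the fine-tuning process.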

In [17]:
# Set local model path
models_path = "../models/xlm_roberta"

In [18]:
# Save the model and tokenizer for later use
model.save_pretrained(models_path)
tokenizer.save_pretrained(models_path)

('../models/xlm_roberta\\tokenizer_config.json',
 '../models/xlm_roberta\\special_tokens_map.json',
 '../models/xlm_roberta\\sentencepiece.bpe.model',
 '../models/xlm_roberta\\added_tokens.json')

### 4. RESULT ANALYSIS

In [19]:
# Load the locally-stored fine-tuned model
loaded_model = XLMRobertaForSequenceClassification.from_pretrained(models_path)
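For inference, the reloaded model returns raw logits with one row per prompt and one column per label (here 2); a softmax turns each row into class probabilities and the argmax gives the predicted label. The stdlib-only sketch below shows that post-processing on made-up logits. `predict_from_logits` is a hypothetical helper, and treating label 1 as "injection" is an assumption about the dataset's label encoding.

```python
import math
from typing import List, Tuple

LABEL_NAMES = {0: "legitimate", 1: "injection"}  # assumption about the label encoding

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_from_logits(logits: List[float]) -> Tuple[str, float]:
    """Hypothetical helper: return the predicted label name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_NAMES[best], probs[best]

# Made-up logits for two prompts, standing in for loaded_model(**inputs).logits
for row in [[2.1, -1.3], [-0.4, 3.0]]:
    label, prob = predict_from_logits(row)
    print(f"{label}: {prob:.3f}")
```

On real model outputs, `torch.softmax(logits, dim=-1)` performs the same normalization directly on the logits tensor.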