# Fine-Tuning a LLaMA Model for Generation

**Authors:** Matías Arévalo, Pilar Guerrero, Moritz Goebbels, Tomás Lock, Allan Stalker  
**Date:** January – May 2025  

## Purpose
Fine-tunes a LLaMA-based language model with LoRA adapters to generate spam messages, using the preprocessed dataset of generated prompts.

## Consideration
Because of updates made to the `unsloth` package in early April, some outputs might differ from the ones obtained for our model. For this reason, the output of this code that is used in the following notebooks is provided in the repository's `Outputs` folder, in the `llama_finetuning_outputs` subfolder, as `checkpoint-1500`, so it can be used for replication.  
If the notebook is run again, some differences should be expected.

## Environment Setup

In [None]:
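%%capture
import torch

# Compute capability of the current GPU (e.g. 7.5 for a T4, 8.0 for an A100)
major_version, minor_version = torch.cuda.get_device_capability()

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Ampere or newer GPUs: flash-attn is supported, so install it as well
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Older GPUs (e.g. T4, V100): skip flash-attn
    !pip install --no-deps xformers trl peft accelerate bitsandbytes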

In [None]:
!pip install --upgrade unsloth
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

## Import Libraries

In [None]:
# Import unsloth before trl/transformers so that its patches are applied
from unsloth import FastLanguageModel

import os
import torch
import pandas as pd
from datasets import Dataset
from transformers import TextStreamer, TrainingArguments
from trl import SFTTrainer

## Model Setup  
In this section, we configure the key parameters for loading the LLaMA-based model that we will fine-tune later in the notebook.

We set the following parameters:  
- `max_seq_length`: the maximum number of tokens the model takes as input. We set it to `2048` for efficient training, although LLaMA-3 models can handle up to 8k tokens.
- `dtype`: the data type of the model's weights. We set it to `None` so that it is selected automatically based on the hardware (bfloat16 on GPUs that support it, float16 otherwise).
- `load_in_4bit`: enables 4-bit quantization of the weights, which significantly reduces memory usage. We set it to `True`, since this is especially useful when fine-tuning models in resource-limited environments.


Additionally, we include a list of pre-quantized 4-bit models available from Unsloth, which can be swapped in if a different base model is needed.

In [None]:
max_seq_length = 2048
dtype = None
load_in_4bit = True

fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",
    "unsloth/gemma-7b-it-bnb-4bit",
    "unsloth/gemma-2b-bnb-4bit",
    "unsloth/gemma-2b-it-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",
]
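The effect of `dtype = None` and `load_in_4bit = True` can be approximated with some quick arithmetic. The sketch below is a minimal illustration, not Unsloth's code: it assumes roughly 8 billion parameters for LLaMA-3 8B, counts only raw weight storage (ignoring quantization constants, activations, LoRA weights, and optimizer state), and mirrors the rule of picking bfloat16 on GPUs with compute capability 8.0 or higher (Ampere and newer) and float16 otherwise. The helper names `weight_memory_gb` and `auto_dtype` are hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory (GB, with 1 GB = 1e9 bytes) needed to store the weights alone."""
    return n_params * bits_per_param / 8 / 1e9


def auto_dtype(compute_capability_major: int) -> str:
    """Hypothetical stand-in for dtype=None: bfloat16 on Ampere+ GPUs, float16 otherwise."""
    return "bfloat16" if compute_capability_major >= 8 else "float16"


# Assumption: ~8 billion parameters for LLaMA-3 8B
N_PARAMS = 8e9
print(f"16-bit weights: {weight_memory_gb(N_PARAMS, 16):.1f} GB")  # 16.0 GB
print(f" 4-bit weights: {weight_memory_gb(N_PARAMS, 4):.1f} GB")   # 4.0 GB
print("T4 (7.5):", auto_dtype(7), "| A100 (8.0):", auto_dtype(8))
```

The roughly fourfold reduction in weight memory helps explain why 4-bit loading is the usual choice on a single 16 GB GPU.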

## Load Pre-trained Model  
In this section, we load the pre-trained LLaMA-3 8B model using the Unsloth library, applying the previously defined settings for sequence length, data type, and quantization.

We use the `FastLanguageModel.from_pretrained()` function to initialize both the model and the tokenizer.

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

## Fine Tuning the Model  
This section walks through the steps needed to fine-tune the loaded model.

### Fine Tuning Preparation  
Here we load and preprocess the dataset used for fine-tuning.

The dataset file, `final_scam_prompt_dataset.csv`, should be located in the `data/` folder.

**Step 1**: Load and preview the dataset

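In [None]:
# Path relative to this notebook within the repository.
# If the file was uploaded to the working directory instead (e.g. on Colab),
# use pd.read_csv("final_scam_prompt_dataset.csv").
df = pd.read_csv("../../data/final_scam_prompt_dataset.csv")
df.head()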

**Step 2**: Drop the `label` column.

We drop this column for two reasons:  
- Every entry in this dataset is `spam`  
- It is not used in this notebook
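Before dropping the column, it can be worth confirming that every row really is labeled `spam`. With pandas this is a one-liner such as `(df["label"] == "spam").all()`; the standard-library sketch below performs the same check on CSV text. The column names come from the dataset, while the helper name `all_rows_have_label` and the sample rows are hypothetical placeholders.

```python
import csv
import io


def all_rows_have_label(csv_text: str, column: str = "label", expected: str = "spam") -> bool:
    """Return True if every row's `column` value equals `expected`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return all(row[column] == expected for row in reader)


# Hypothetical sample with the same columns as the real dataset
sample = "prompt,clean_message,label\np1,m1,spam\np2,m2,spam\n"
print(all_rows_have_label(sample))  # True
```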

In [None]:
df = df.drop(columns=["label"])

**Step 3**: Rename the `clean_message` column to `completion`

In [None]:
df.rename(columns={"clean_message": "completion"}, inplace=True)

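**Step 4**: Merge `prompt` and `completion`

Each training text consists of the prompt, a blank line, the completion, and the tokenizer's end-of-sequence (EOS) token, which teaches the model where a message ends.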

In [None]:
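# End-of-sequence token appended to every example (empty string if the tokenizer has none)
EOS_TOKEN = tokenizer.eos_token or ""

def combine_prompt(row):
    # Prompt, blank line, target message, end-of-sequence token
    return row["prompt"] + "\n\n" + row["completion"] + EOS_TOKEN

df["full_text"] = df.apply(combine_prompt, axis=1)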

**Step 5**: Tokenize the combined texts, padding or truncating each one to exactly `max_seq_length` tokens

In [None]:
tokenised = tokenizer(
    df["full_text"].tolist(),
    padding="max_length",
    truncation=True,
    max_length=max_seq_length,
    return_tensors="np",
)
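With `padding="max_length"` and `truncation=True`, each example ends up exactly `max_seq_length` tokens long, and the attention mask marks which positions hold real tokens. The sketch below reproduces that behavior on plain lists of token IDs; it assumes right-side padding and truncation (the actual sides are controlled by the tokenizer's `padding_side` and `truncation_side`) and a hypothetical pad ID of `0`.

```python
def pad_and_truncate(token_ids, max_length, pad_id=0):
    """Right-pad or truncate token_ids to max_length and build the matching attention mask."""
    ids = list(token_ids[:max_length])  # truncation keeps the first max_length tokens
    mask = [1] * len(ids)               # 1 = real token
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 = padding


print(pad_and_truncate([101, 7, 8], max_length=5))
# ([101, 7, 8, 0, 0], [1, 1, 1, 0, 0])
print(pad_and_truncate(list(range(1, 8)), max_length=5))
# ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
```

If most messages are much shorter than 2048 tokens, most of each row is padding; padding dynamically per batch would be cheaper, but the fixed length keeps every example the same shape.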

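**Step 6**: Build the final dataset

For causal language modeling, the labels are a copy of the input IDs; the model shifts them internally so that each position is trained to predict the next token.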

In [None]:
final_data = {
    "input_ids": tokenised["input_ids"],
    "attention_mask": tokenised["attention_mask"],
    "labels": tokenised["input_ids"].copy()
}
dataset = Dataset.from_dict(final_data)
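Since the labels here are a plain copy of `input_ids`, padding positions also appear as labels. A common refinement is to replace them with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss, so that padding does not contribute to the loss; whether this happens automatically depends on the data collator used by the trainer. The sketch below shows the masking on plain lists; `mask_padding_labels` is a hypothetical helper, not part of this notebook's pipeline.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions (mask == 0) with IGNORE_INDEX."""
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, attention_mask)]


labels = mask_padding_labels([101, 7, 8, 0, 0], [1, 1, 1, 0, 0])
print(labels)  # [101, 7, 8, -100, -100]
```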

### Apply LoRA Adapters  
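Here we apply lightweight LoRA adapters to the base model, enabling parameter-efficient fine-tuning: the base weights stay frozen and only small low-rank matrices are trained. The adapters (rank `r=16`, `lora_alpha=16`, no dropout, no bias) are attached to all attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and MLP projections (`gate_proj`, `up_proj`, `down_proj`), and Unsloth's gradient checkpointing is enabled to reduce memory usage.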

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
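To get a feel for how lightweight these adapters are: for a linear layer mapping `d_in` to `d_out` features, LoRA adds two low-rank matrices, A (`r × d_in`) and B (`d_out × r`), i.e. `r * (d_in + d_out)` trainable parameters, and scales their product by `lora_alpha / r`. The sketch below applies that formula to the seven target modules, assuming the published LLaMA-3 8B dimensions (hidden size 4096, MLP size 14336, 32 layers, and 1024-dimensional key/value projections due to grouped-query attention).

```python
# Assumed LLaMA-3 8B dimensions (from the model's published config)
HIDDEN, INTERMEDIATE, KV_DIM, N_LAYERS = 4096, 14336, 1024, 32

# (d_in, d_out) for each LoRA target module in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(r: int) -> int:
    """Total LoRA parameters: r * (d_in + d_out) per module, summed over all layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * N_LAYERS


r, lora_alpha = 16, 16
print(f"Trainable LoRA parameters: {lora_trainable_params(r):,}")  # 41,943,040
print("Scaling factor lora_alpha / r:", lora_alpha / r)            # 1.0
```

Under these assumptions, about 42 million parameters are trained, roughly half a percent of the ~8 billion base parameters.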

### Model Training Configuration  
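Here we define the training hyperparameters: a per-device batch size of 2 with 4 gradient accumulation steps, 1500 training steps, a learning rate of `2e-4` with a linear scheduler, the 8-bit AdamW optimizer with weight decay, and bf16 or fp16 mixed precision depending on what the GPU supports.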

In [None]:
training_args = TrainingArguments(
    output_dir="llama_finetuning_outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    max_steps=1500,
    learning_rate=2e-4,
    logging_steps=1,
    bf16=torch.cuda.is_bf16_supported(),
    fp16=not torch.cuda.is_bf16_supported(),
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
)
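Two consequences of these settings are worth spelling out. First, each optimizer step sees `per_device_train_batch_size * gradient_accumulation_steps` = 2 × 4 = 8 examples on a single GPU, and since a positive `max_steps` overrides `num_train_epochs`, training covers 1500 × 8 = 12,000 examples regardless of the dataset size. Second, with `lr_scheduler_type="linear"` and no warmup (the default), the learning rate decays linearly from `2e-4` to zero over those 1500 steps. The sketch below computes both; the dataset size is a hypothetical placeholder.

```python
PER_DEVICE_BATCH, GRAD_ACCUM, N_GPUS = 2, 4, 1
MAX_STEPS, BASE_LR = 1500, 2e-4


def effective_batch_size(per_device: int, accum: int, n_gpus: int = 1) -> int:
    return per_device * accum * n_gpus


def linear_lr(step: int, total_steps: int = MAX_STEPS, base_lr: float = BASE_LR) -> float:
    """Linear decay to zero with no warmup (a linear schedule with warmup_steps=0)."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM, N_GPUS)
examples_seen = batch * MAX_STEPS
dataset_size = 5_000  # hypothetical; replace with len(dataset)
print("Effective batch size:", batch)                             # 8
print("Examples seen:", examples_seen)                            # 12000
print(f"Approximate epochs: {examples_seen / dataset_size:.2f}")  # 2.40
print("LR at steps 0, 750, 1500:", [linear_lr(s) for s in (0, 750, 1500)])
```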

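### Initialize the Trainer
Note that recent TRL releases renamed the `tokenizer` argument of `SFTTrainer` to `processing_class`, so this cell may need adjusting with newer versions.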

In [None]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=training_args,
)

### Train and Save the Model  
We save the final model as `llama_finetuning_outputs` within the `Outputs` folder, which is at the same level as the `data` folder. Intermediate checkpoints are written to the `output_dir` set in the training arguments (`llama_finetuning_outputs`, relative to the working directory).

In this repository, we include the last checkpoint of this training run in the `Outputs` folder, within the `llama_finetuning_outputs` subfolder. This checkpoint, saved as `checkpoint-1500`, is the one used in the following notebooks and can be loaded directly into them without running the training process below.

In [None]:
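# Ask Unsloth to return logits from the forward pass during training
os.environ['UNSLOTH_RETURN_LOGITS'] = '1'
trainer.train()
# Save the final LoRA adapter to the Outputs folder
trainer.save_model("../../Outputs/llama_finetuning_outputs")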
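During `trainer.train()`, the Trainer also writes intermediate checkpoints into `output_dir`, in subfolders named `checkpoint-<step>`. This notebook does not set `save_strategy` or `save_steps`, so assuming the Transformers defaults (`"steps"` and `500`), a 1500-step run should produce the folders listed by the sketch below, the last of which is the `checkpoint-1500` used later. `expected_checkpoints` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def expected_checkpoints(output_dir: str, max_steps: int, save_steps: int = 500):
    """List the checkpoint folders saved every `save_steps` optimizer steps."""
    return [
        str(PurePosixPath(output_dir) / f"checkpoint-{step}")
        for step in range(save_steps, max_steps + 1, save_steps)
    ]


print(expected_checkpoints("llama_finetuning_outputs", max_steps=1500))
# ['llama_finetuning_outputs/checkpoint-500', 'llama_finetuning_outputs/checkpoint-1000', 'llama_finetuning_outputs/checkpoint-1500']
```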

## Load Fine Tuned Model
In this section, we reuse the same setup as for training and load `checkpoint-1500` from the `llama_finetuning_outputs` folder, which should be located inside the `Outputs` folder.

### Load Base Model (Same as for Training)

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

### Apply LoRA Adapters (Same as for Training)

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

### Load Fine Tuned Weights
This is the output saved during training. It should be:  
- Named `llama_finetuning_outputs`
- Located in the `Outputs` folder

For this example, however, we use the checkpoint that we manually placed in the `Outputs` folder:
- Folder name: `checkpoint-1500`
- Location: `Outputs/llama_finetuning_outputs`

In [None]:
model.load_adapter("../../Outputs/llama_finetuning_outputs/checkpoint-1500", adapter_name="default")
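Because the checkpoint is loaded through a relative path, it can help to verify the folder before loading. A LoRA checkpoint saved by PEFT is a folder that normally contains `adapter_config.json` plus the weights in `adapter_model.safetensors` (or `adapter_model.bin` in older versions). The sketch below checks for those files; `missing_adapter_files` is a hypothetical helper, and the path is the one used in the cell above.

```python
from pathlib import Path


def missing_adapter_files(checkpoint_dir: str) -> list:
    """Return the expected PEFT adapter files that are absent from checkpoint_dir."""
    folder = Path(checkpoint_dir)
    missing = []
    if not (folder / "adapter_config.json").is_file():
        missing.append("adapter_config.json")
    weight_files = ("adapter_model.safetensors", "adapter_model.bin")
    if not any((folder / name).is_file() for name in weight_files):
        missing.append("adapter_model.safetensors (or adapter_model.bin)")
    return missing


missing = missing_adapter_files("../../Outputs/llama_finetuning_outputs/checkpoint-1500")
print("Checkpoint looks complete" if not missing else f"Missing: {missing}")
```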

## Testing Fine Tuned Model: Generate Sample Outputs

We use the fine-tuned model to generate spam messages from custom prompts and inspect the results.

### Test 1: 150 Maximum New Tokens

In [None]:
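# Switch Unsloth to its optimized inference mode
FastLanguageModel.for_inference(model)

# Print tokens as they are generated, hiding the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
prompt = "Generate a spam message targeting people interested in cryptocurrency investments."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")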

_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)

### Test 2: 200 Maximum New Tokens

In [None]:
prompt = "Generate a spam message targeting individuals interested in investment opportunities, including cryptocurrency, stocks, and forex, with an emphasis on high returns.\n\nMessage:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
)
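The sampling arguments used in these generation tests interact as follows: `repetition_penalty` first down-weights tokens that already appear in the sequence (positive logits are divided by the penalty, negative ones multiplied), `temperature` then rescales the logits before the softmax, and `top_p` keeps only the smallest set of most likely tokens whose cumulative probability reaches `top_p` before sampling. The sketch below walks through these three steps on a toy logit vector in plain Python; it approximates, rather than reproduces exactly, the logits processors used by Hugging Face `generate`, and the token IDs and logits are made up.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty):
    """Penalize tokens already in the sequence: divide positive logits, multiply negative ones."""
    out = list(logits)
    for tok in set(seen_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def top_p_distribution(logits, temperature, top_p):
    """Temperature-scaled softmax restricted to the smallest top-probability set reaching top_p."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    kept, cumulative = [], 0.0
    for tok in sorted(range(len(probs)), key=lambda i: probs[i], reverse=True):
        kept.append(tok)
        cumulative += probs[tok]
        if cumulative >= top_p:
            break
    norm = sum(probs[t] for t in kept)  # renormalize over the kept tokens
    return {t: probs[t] / norm for t in kept}


def sample_next_token(dist, rng):
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


logits = [2.0, 1.0, 0.0, -1.0]  # toy logits for a 4-token vocabulary
penalized = apply_repetition_penalty(logits, seen_ids=[0], penalty=1.1)
dist = top_p_distribution(penalized, temperature=0.7, top_p=0.9)
print(dist)
print("Sampled token:", sample_next_token(dist, random.Random(0)))
```

Lower temperatures and smaller `top_p` values concentrate probability on fewer tokens, making the outputs more predictable; higher values make them more varied.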