### Model Fine-Tuning with LoRA and Dataset Preparation
In this notebook, we will:
1. Load and preprocess the dataset
2. Load the base model and tokenizer
3. Apply LoRA (Low-Rank Adaptation) to the model for fine-tuning
4. Customize the tokenizer with a chat template
5. Split the dataset into training and validation sets
6. Define and initialize the Trainer for supervised fine-tuning (SFT)
7. Train and evaluate the model
8. Optionally save the trained model and perform GGUF conversion

### Import necessary libraries


In [1]:
# Import necessary libraries
import os
import subprocess
from unsloth import FastLanguageModel, is_bfloat16_supported
from datasets import load_dataset
from unsloth.chat_templates import get_chat_template
from transformers import TrainingArguments
from trl import SFTTrainer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


### Step 1: Load the base model and tokenizer

In [2]:
model_name = "unsloth/llama-3.2-1b-bnb-4bit"  # Specify the base model name
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048,  # Maximum sequence length for the model
    dtype=None,           # Default data type (e.g., fp32 or bf16)
    load_in_4bit=True,    # Load the model in 4-bit precision for memory efficiency
    local_files_only=True # Use local files only (no downloading from the internet)
)

==((====))==  Unsloth 2025.2.5: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.988 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


### Step 2: Apply LoRA (Low-Rank Adaptation) to the model for fine-tuning

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                           # Rank of the LoRA matrices
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # Target modules for LoRA
    lora_alpha=32,                  # Scaling factor for LoRA updates
    lora_dropout=0.05,              # Dropout rate applied to LoRA updates
    bias="none",                    # Bias configuration ("none" means no bias adaptation)
    use_gradient_checkpointing="unsloth",  # Enable gradient checkpointing to save memory
    random_state=3407,              # Random seed for reproducibility
    use_rslora=True                 # Use RS-LoRA (a variant of LoRA)
)

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.2.5 patched 16 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


### Step 3: Customize the tokenizer with a chat template

In [4]:
tokenizer = get_chat_template(
    tokenizer,
    mapping={
        "role": "from",         # Map role to "from"
        "content": "value",     # Map content to "value"
        "user": "human",        # Map user role to "human"
        "assistant": "gpt"      # Map assistant role to "gpt"
    }
)

Unsloth: Will map <|im_end|> to EOS = <|end_of_text|>.


### Step 4: Load and preprocess the dataset

In [5]:
# Load the dataset with conversations in ShareGPT format
origdataset = load_dataset("philschmid/guanaco-sharegpt-style", split="train")

# Select only the 'conversations' column from the dataset
conversations_dataset = origdataset.select_columns(['conversations'])

# Ensure the dataset is not empty
if len(conversations_dataset) == 0:
    raise ValueError("The dataset is empty. Please check the data source.")

# Convert the dataset to a list format
conversations_list = conversations_dataset['conversations']

# Split the dataset into training and validation sets
train_conversations, val_conversations = train_test_split(conversations_list, test_size=0.1, random_state=42)

# Ensure the splits are not empty
if not train_conversations or not val_conversations:
    raise ValueError("The dataset split resulted in empty subsets. Please check the data and split parameters.")

# Create new datasets from the split lists
train_dataset = origdataset.filter(lambda x: x['conversations'] in train_conversations, desc="Filtering training conversations")
val_dataset = origdataset.filter(lambda x: x['conversations'] in val_conversations, desc="Filtering validation conversations")

### Step 5: Format conversations using the chat template and tokenize them

In [6]:
train_dataset = train_dataset.map(
    lambda x: {
        "text": tokenizer.apply_chat_template(
            x["conversations"],
            tokenize=False,           # Do not tokenize during formatting
            add_generation_prompt=False  # Do not add generation prompts
        )
    },
    batched=True,                    # Process data in batches for efficiency
    batch_size=100,                  # Batch size for processing
    desc="Formatting training conversations"  # Description for progress bar/logging
)

val_dataset = val_dataset.map(
    lambda x: {
        "text": tokenizer.apply_chat_template(
            x["conversations"],
            tokenize=False,           # Do not tokenize during formatting
            add_generation_prompt=False  # Do not add generation prompts
        )
    },
    batched=True,                    # Process data in batches for efficiency
    batch_size=100,                  # Batch size for processing
    desc="Formatting validation conversations"  # Description for progress bar/logging
)

### 6. Define and initialize the Trainer for supervised fine-tuning (SFT)


In [9]:
#%load_ext tensorboard
#%tensorboard --logdir ./logs --host=0.0.0.0 --port=6008

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    dataset_text_field="text",       # Specify which field contains text data in the dataset

    args=TrainingArguments(          # Training arguments configuration
        per_device_train_batch_size=2,      # Batch size per device (GPU/CPU)
        gradient_accumulation_steps=4,      # Accumulate gradients over multiple steps
        warmup_steps=5,                     # Number of warmup steps for learning rate scheduler
        max_steps=60,                       # Total number of training steps
        learning_rate=2e-4,                 # Learning rate for optimizer
        fp16=not is_bfloat16_supported(),   # Use FP16 if BF16 is not supported by hardware
        bf16=is_bfloat16_supported(),       # Use BF16 if supported by hardware (e.g., newer GPUs)
        logging_steps=1,                    # Log training metrics every step
        optim="adamw_8bit",                 # Optimizer with memory-efficient AdamW implementation (8-bit)
        weight_decay=0.01,                  # Weight decay regularization factor
        lr_scheduler_type="linear",         # Learning rate scheduler type (linear decay)
        seed=3407,                          # Random seed for reproducibility
        output_dir="outputs",               # Directory to save training outputs and checkpoints
        report_to="none",                    # Disable reporting to external tools like WandB or TensorBoard
        #report_to="tensorboard",
        logging_dir="./logs"                  # Directory for TensorBoard logs
    ),
)



Map:   0%|          | 0/8129 [00:00<?, ? examples/s]

### 7. Train and evaluate the model


In [10]:
trainer.train()
eval_results = trainer.evaluate()
print("Evaluation results:", eval_results)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 8,129 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss
1,1.6471
2,1.2606
3,1.3062
4,1.3317
5,1.1249
6,1.283
7,0.9043
8,1.1413
9,1.3425
10,1.34


Unsloth: Not an error, but LlamaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


Evaluation results: {'eval_loss': 1.5805118083953857, 'eval_runtime': 18.7046, 'eval_samples_per_second': 48.33, 'eval_steps_per_second': 6.041, 'epoch': 0.05904059040590406}


#### 8. Optionally save the trained model and perform GGUF conversion

In [None]:
model.save_pretrained_gguf("ggufmodel", tokenizer, quantization_method="q4_k_m")