<div align="center">
  <img src="logo_branding.png" width="250" alt="kavi.ai Logo">
  <h1>Supervised Fine-Tuning (SFT) Mastery</h1>
  <p><b>A Premium Training Module by kavi.ai</b></p>
</div>

---

### 💎 **Smarter Overview**
Supervised Fine-Tuning (SFT), often called instruction tuning, trains a pre-trained LLM on curated example conversations so that it follows instructions, including multi-step ones, instead of merely continuing text.

### 🚀 **Enterprise Use Case**
Developing domain-specific corporate assistants (HR, IT, Legal) that must adhere to strict internal protocols and formatting standards.

### 📈 **Strategic Advantages**
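- **Total Control**: You choose the training examples, so you decide the tone, scope, and behavior the model learns.
- **Deterministic Formatting**: Training on examples that share a fixed output format makes the model much more likely to reproduce that format, although sampling can still produce deviations.
- **Knowledge Injection**: Domain terminology and procedures can be taught through examples, although SFT is generally better at shaping behavior than at adding new facts.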

---

## Step 1: Install Dependencies

### **Purpose:**
To prepare the environment with necessary libraries like `transformers`, `trl`, and `peft`.

### **Line-by-Line Breakdown:**
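- `transformers`: Model architectures, pre-trained weights, and tokenizers.
- `trl`: Transformer Reinforcement Learning; provides trainers for SFT and RLHF-style methods, including `SFTTrainer`.
- `peft`: Parameter-Efficient Fine-Tuning methods such as LoRA.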

In [None]:
!pip install transformers --upgrade
!pip install datasets
!pip install "trl[peft]" --upgrade
!pip install -U git+https://github.com/huggingface/trl
!pip install bitsandbytes loralib
!pip install wandb -U
!pip install hf_transfer
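
Because the cell above installs unpinned and development versions, it can help to record what actually got installed before training. The following is a minimal sketch that uses only the standard library; the function name `installed_versions` is an illustrative choice, not part of the course code.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(
    ["transformers", "datasets", "trl", "peft", "bitsandbytes", "wandb"]
)
for name, ver in report.items():
    print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```

Saving this output alongside a run makes it easier to reproduce results later, since argument names in `transformers` and `trl` have changed between releases.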

In [None]:
!nvidia-smi

In [None]:
%env HF_HUB_ENABLE_HF_TRANSFER=True
%env WANDB_PROJECT=LLM-Training-Course
%env WANDB_RUN_ID=SFT
%env WANDB_NOTEBOOK_NAME={__vsc_ipynb_file__}


In [None]:
import sys
sys.path.append('/root/llm-training-course/')

## Step 2: Environment Configuration

### **Purpose:**
Setting up tracking and logging for training runs.

### **Line-by-Line Breakdown:**
- `%env`: Sets environment variables for the Hugging Face Hub and Weights & Biases (done in the cells above).
- `wandb.login()`: Authenticates with Weights & Biases so that training metrics can be logged.

In [None]:
import wandb
wandb.login()
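
The `%env` magics only work inside IPython or Jupyter. If this step is ever run as a plain Python script, the same variables can be set through `os.environ`. This is a minimal sketch that reuses the values from the `%env` cells; it should run before `huggingface_hub` or `wandb` are imported, since such settings are typically read at import or initialization time.

```python
import os

# Same values as the %env cells earlier in the notebook.
settings = {
    "HF_HUB_ENABLE_HF_TRANSFER": "True",
    "WANDB_PROJECT": "LLM-Training-Course",
    "WANDB_RUN_ID": "SFT",
}

# Environment variables are always strings, so no type conversion is needed.
for key, value in settings.items():
    os.environ[key] = value

print({key: os.environ[key] for key in settings})
```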

## Step 3: Load Model & Tokenizer

### **Purpose:**
Loading the pre-trained model and tokenizer to begin fine-tuning.

### **Line-by-Line Breakdown:**
- `AutoModelForCausalLM`: Loads the causal language model class that matches the checkpoint's configuration.
- `AutoTokenizer`: Loads the tokenizer that matches the same checkpoint.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
import torch 

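# Use the first 20% of the train split for training and the next 5% for evaluation.
train_ds, eval_ds = load_dataset("mlabonne/orpo-dpo-mix-40k", split=["train[:20%]", "train[20%:25%]"])

# Load the weights in bfloat16 and place the whole model on the first GPU.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")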
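
Loading in `torch.bfloat16` halves the memory needed for the weights compared with 32-bit floats. The arithmetic can be sketched as below, assuming roughly 3.8 billion parameters for Phi-3-mini (an approximate figure used here for illustration). The estimate covers weights only; gradients, optimizer state, and activations add substantially more during training.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption for illustration: approximate parameter count of Phi-3-mini.
PHI3_MINI_PARAMS = 3.8e9

for dtype_name, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{dtype_name}: {weight_memory_gib(PHI3_MINI_PARAMS, nbytes):.1f} GiB")
```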

## Step 4: Explore Chat Templates

### **Purpose:**
Understanding how the model expects conversational data to be formatted.

### **Line-by-Line Breakdown:**
- `tokenizer.chat_template`: Inspect the default template for the model.

In [None]:
tokenizer.chat_template
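
To make the idea of a chat template concrete, here is a hand-written approximation of a Phi-3 style layout in plain Python. The special tokens (`<|user|>`, `<|assistant|>`, `<|end|>`) and the function `render_phi3_style` are assumptions for illustration; the authoritative format is whatever `tokenizer.chat_template` contains, and in practice `tokenizer.apply_chat_template(messages, tokenize=False)` does the rendering.

```python
def render_phi3_style(messages, add_generation_prompt=False):
    """Approximate a Phi-3 style prompt; not the model's real template."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|role|>\ncontent<|end|>\n
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "What is SFT?"},
    {"role": "assistant", "content": "Supervised fine-tuning."},
]
print(render_phi3_style(conversation))
```

During training the full conversation, including the assistant reply, is rendered. At inference time only the user turn is rendered and the generation prompt is appended.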

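## Step 5: Inspect the Tokenizer, Model, and Dataset

### **Purpose:**
Checking the tokenizer settings, model architecture, and dataset schema before formatting the data.

### **Line-by-Line Breakdown:**
- `print(tokenizer)`: Show the tokenizer's special tokens, padding side, and maximum length.
- `tokenizer.vocab_size`: Size of the base vocabulary, not counting added tokens.
- `set_padding_for_tokenizer`: Course helper from the `helpers` module that configures the padding token.
- `model`, `train_ds`: Display the model architecture and the dataset's columns and row count.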

In [None]:
print(tokenizer)
print("---")
print("Vocab size:", tokenizer.vocab_size)
print("---")
print("Chat template:", tokenizer.chat_template)

In [None]:
from helpers import set_padding_for_tokenizer
set_padding_for_tokenizer(tokenizer)
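
The source of `set_padding_for_tokenizer` is not shown in this notebook. A helper of this kind commonly makes sure a padding token exists and fixes the padding side. The sketch below is a hypothetical stand-in named `ensure_pad_token`, not the course's implementation, and it uses a `SimpleNamespace` in place of a real tokenizer so it runs without downloading anything.

```python
from types import SimpleNamespace


def ensure_pad_token(tok):
    """Hypothetical stand-in for a padding helper; not the course's code."""
    if tok.pad_token is None:
        # Prefer the unknown token, and fall back to end-of-sequence.
        tok.pad_token = tok.unk_token if tok.unk_token is not None else tok.eos_token
    # Right padding is a common choice for training causal language models.
    tok.padding_side = "right"
    return tok


# Fake tokenizer object with only the attributes the sketch touches.
fake_tokenizer = SimpleNamespace(
    pad_token=None, unk_token="<unk>", eos_token="</s>", padding_side="left"
)
ensure_pad_token(fake_tokenizer)
print(fake_tokenizer.pad_token, fake_tokenizer.padding_side)
```

Reusing the end-of-sequence token for padding is a frequent shortcut, but it can teach the model to ignore real end-of-sequence tokens if padding is masked out of the loss, which is why this sketch prefers a different token when one is available.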

In [None]:
model

In [None]:
train_ds

## Step 6: Dataset Formatting

### **Purpose:**
Converting raw data into the specific conversational format (messages) expected by the trainer.

### **Line-by-Line Breakdown:**
- `train_ds.map`: Apply formatting logic to every sample.
- `remove_columns`: Clean up unused features.

In [None]:
# Prepend the prompt as a system message to the chosen conversation.
train_ds = train_ds.map(lambda x: {"messages": [{"role": "system", "content": x["prompt"]}] + x["chosen"]})
eval_ds = eval_ds.map(lambda x: {"messages": [{"role": "system", "content": x["prompt"]}] + x["chosen"]})

In [None]:
columns_to_remove = [c for c in train_ds.column_names if c not in ["messages"]]
train_ds = train_ds.remove_columns(columns_to_remove)

columns_to_remove = [c for c in eval_ds.column_names if c not in ["messages"]]
eval_ds = eval_ds.remove_columns(columns_to_remove)
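
The effect of the two cells above can be checked on a single record in plain Python. The row below is invented for illustration and only mimics the columns the mapping function reads (`prompt` and `chosen`); the real dataset's contents may differ.

```python
def to_messages(example):
    """Same logic as the lambda passed to `map`, written as a named function."""
    system_turn = {"role": "system", "content": example["prompt"]}
    return {"messages": [system_turn] + example["chosen"]}


# Invented toy row; not taken from the actual dataset.
toy_row = {
    "prompt": "Name a primary color.",
    "chosen": [
        {"role": "user", "content": "Name a primary color."},
        {"role": "assistant", "content": "Red."},
    ],
    "rejected": [
        {"role": "user", "content": "Name a primary color."},
        {"role": "assistant", "content": "Purple."},
    ],
}

# `map` merges the returned dict into the row; then every other column is dropped.
formatted = {**toy_row, **to_messages(toy_row)}
formatted = {key: value for key, value in formatted.items() if key == "messages"}

for turn in formatted["messages"]:
    print(f"{turn['role']:>9}: {turn['content']}")
```

Printing a formatted row like this is a quick way to spot problems such as the prompt text appearing twice, once as the system message and once as the user turn.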

## Step 7: Configure Trainer Arguments

### **Purpose:**
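Defining the hyperparameters for the training process.

### **Line-by-Line Breakdown:**
- `learning_rate`: Step size for optimizer updates (`1e-4` here, kept fixed by `lr_scheduler_type="constant"`).
- `gradient_accumulation_steps`: Accumulates gradients over 8 micro-batches before each optimizer step, giving an effective batch size of 8 per device with `per_device_train_batch_size=1`.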

In [None]:
import os 
from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    output_dir=os.getenv("WANDB_RUN_ID", "SFT"),  # fall back to "SFT" if the variable is unset
    report_to="wandb",
    num_train_epochs=1.0,
    do_train=True,
    do_eval=True,
    log_level="debug",
    gradient_checkpointing=True,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=1,
    lr_scheduler_type="constant",
    bf16=True,
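    eval_strategy="steps",  # named `evaluation_strategy` in older transformers releases
    eval_steps=0.2,  # values below 1 are a fraction of the total training steps
    logging_steps=0.1,
    max_grad_norm=0.3,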
    learning_rate=1e-4,
)
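
The batch and schedule settings above interact, so it can help to work out the numbers before launching a run. The sketch below approximates how the step counts are derived, assuming fractional `eval_steps` and `logging_steps` are rounded up to whole steps. The dataset size of 8,000 examples and the function `training_schedule` are illustrative assumptions, not values from this notebook.

```python
import math


def training_schedule(num_examples, per_device_batch, grad_accum, epochs,
                      eval_ratio, log_ratio, num_devices=1):
    """Approximate optimizer-step counts for a run; a sketch, not Trainer's code."""
    effective_batch = per_device_batch * grad_accum * num_devices
    micro_batches = math.ceil(num_examples / (per_device_batch * num_devices))
    steps_per_epoch = max(micro_batches // grad_accum, 1)
    total_steps = math.ceil(steps_per_epoch * epochs)
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "eval_every": math.ceil(total_steps * eval_ratio),
        "log_every": math.ceil(total_steps * log_ratio),
    }


# Hypothetical dataset size; the other values mirror the SFTConfig above.
plan = training_schedule(
    num_examples=8000, per_device_batch=1, grad_accum=8, epochs=1.0,
    eval_ratio=0.2, log_ratio=0.1,
)
print(plan)
```

With these assumed numbers the run takes 1,000 optimizer steps, evaluates every 200 steps, and logs every 100 steps.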


## Step 8: Initialize and Run Training

### **Purpose:**
Setting up the main training loop and executing the fine-tuning.

### **Line-by-Line Breakdown:**
- `SFTTrainer`: High-level wrapper for supervised fine-tuning.
- `trainer.train()`: Start the optimization process.

In [None]:
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # older TRL releases call this argument `tokenizer`
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
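
The quantity that SFT minimizes is the next-token cross-entropy: the average negative log-probability the model assigns to each correct next token. The sketch below computes it in plain Python on toy logits. It illustrates the objective only and is not how `SFTTrainer` implements it; the function name `next_token_loss` and the toy values are assumptions.

```python
import math


def next_token_loss(logits_per_position, target_ids):
    """Mean cross-entropy between per-position logits and target token ids."""
    total = 0.0
    for logits, target in zip(logits_per_position, target_ids):
        # Numerically stable log-sum-exp for the softmax normalizer.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
        # Negative log-probability of the correct token.
        total += log_sum_exp - logits[target]
    return total / len(target_ids)


# A model with no preference over a 4-token vocabulary scores ln(4).
uniform_logits = [[0.0, 0.0, 0.0, 0.0]] * 3
uniform_loss = next_token_loss(uniform_logits, [1, 2, 3])

# A model that strongly favors the correct tokens scores close to zero.
confident_logits = [[0.0, 9.0, 0.0, 0.0], [0.0, 0.0, 9.0, 0.0], [0.0, 0.0, 0.0, 9.0]]
confident_loss = next_token_loss(confident_logits, [1, 2, 3])

print(f"uniform: {uniform_loss:.4f}  confident: {confident_loss:.4f}")
```

The training and evaluation loss curves logged to Weights & Biases track this quantity, so a falling curve means the model is assigning higher probability to the reference responses.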