<div align="center">
  <img src="logo_branding.png" width="250" alt="kavi.ai Logo">
  <h1>Selective Layer Adaptation</h1>
  <p><b>A Premium Training Module by kavi.ai</b></p>
</div>

---

### 💎 **Smarter Overview**
By unfreezing only the LM head or the final transformer block, we reshape the model's output distribution while every earlier layer keeps its pre-trained weights.

### 🚀 **Enterprise Use Case**
Lightweight style transfer, or adapting a model to a new language script when the core capabilities are already present in the pre-trained weights.

### 📈 **Strategic Advantages**
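- **Low Overhead**: No adapter library is required, only a `requires_grad` toggle. Note, however, that a large-vocabulary LM head can hold more trainable parameters than a small LoRA adapter.
- **Knowledge Retention**: Frozen layers keep their pre-trained weights exactly, so any change in behavior comes only from the layers you unfreeze.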
- **Quick Iteration**: Ideal for feature extraction and minimal output head tailoring.

---

### 🛠️ **Visualization of the Process**
<div align="center"><svg width="800" height="200" viewBox="0 0 800 200" xmlns="http://www.w3.org/2000/svg">
  <rect x="50" y="70" width="120" height="60" rx="10" fill="#1e293b" stroke="#ef4444" stroke-width="2" />
  <text x="110" y="105" font-family="Arial" font-size="14" fill="#f87171" text-anchor="middle">Layer 1 (F)</text>
  <path d="M 170 100 L 200 100" stroke="#94a3b8" stroke-width="2" fill="none" />
  <rect x="200" y="70" width="120" height="60" rx="10" fill="#1e293b" stroke="#ef4444" stroke-width="2" />
  <text x="260" y="105" font-family="Arial" font-size="14" fill="#f87171" text-anchor="middle">Layer 2 (F)</text>
  <path d="M 320 100 L 350 100" stroke="#94a3b8" stroke-width="2" fill="none" />
  <rect x="350" y="70" width="100" height="60" fill="none" stroke="#94a3b8" stroke-dasharray="2,2"/><text x="400" y="105" fill="#94a3b8">...</text>
  <path d="M 450 100 L 480 100" stroke="#94a3b8" stroke-width="2" fill="none" />
  <rect x="480" y="50" width="200" height="100" rx="15" fill="#1e3a8a" stroke="#3b82f6" stroke-width="3" />
  <text x="580" y="100" font-family="Arial" font-size="16" font-weight="bold" fill="white" text-anchor="middle">LM Head / Last Layer</text>
  <text x="580" y="125" font-family="Arial" font-size="14" fill="#93c5fd" text-anchor="middle">UNFROZEN (T)</text>
</svg></div>

---

## Step 1: Install Dependencies

### **Purpose:**
Preparing the workspace with Hugging Face and PyTorch libraries.

### **Line-by-Line Breakdown:**
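- `transformers`: Loads the model and tokenizer.
- `datasets`: Loads the training data.
- `trl`: Provides `SFTConfig` and `SFTTrainer` for supervised fine-tuning.
- `wandb`: Experiment tracking (used via `report_to="wandb"`).
- `hf_transfer`: Faster downloads from the Hugging Face Hub (enabled with `HF_HUB_ENABLE_HF_TRANSFER`).
- `bitsandbytes`, `loralib`: Quantization and LoRA utilities; the selective-unfreezing run in this notebook does not use them.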

In [None]:
!pip install transformers --upgrade
!pip install datasets
!pip install trl[peft] --upgrade
!pip install -U git+https://github.com/huggingface/trl
!pip install bitsandbytes loralib
!pip install wandb -U
!pip install hf_transfer
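Because this cell upgrades `transformers` and pulls `trl` from its development branch, the keyword names used later (for example `eval_strategy` and `processing_class`) depend on which versions you end up with. A small standard-library check, offered as a sketch you can run in a fresh cell after restarting the kernel:

```python
# Report installed versions of the packages this notebook relies on.
# Argument names in transformers/trl changed across releases, so this helps
# diagnose "unexpected keyword argument" errors later on.
from importlib.metadata import version, PackageNotFoundError


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for pkg, ver in installed_versions(
    ["torch", "transformers", "datasets", "trl", "peft", "wandb", "hf_transfer"]
).items():
    print(f"{pkg:>14}: {ver or 'not installed'}")
```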

In [None]:
%env HF_HUB_ENABLE_HF_TRANSFER=True
%env WANDB_PROJECT=LLM-Training-Course
%env WANDB_RUN_ID=UNFREEZE_LAST_LAYER
%env WANDB_NOTEBOOK_NAME={__vsc_ipynb_file__}

In [None]:
import sys
sys.path.append('/root/llm-training-course/')

In [None]:
import os
import wandb
os.environ["WANDB_PROJECT"] = "LLM-Training-Course"
wandb.login()

In [None]:
from datasets import load_dataset

train_ds, eval_ds = load_dataset("mlabonne/orpo-dpo-mix-40k", split=["train[:20%]","train[20%:25%]"])
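The two split strings carve disjoint ranges out of the same `train` split: the first 20% for training and the next 5% for evaluation. A sketch of that index arithmetic, assuming round-to-nearest at the boundaries as a simplification (the exact rounding rule is decided by `datasets`), with a hypothetical size of 1,000 rows:

```python
# Map percent-based split strings to half-open [start, end) row ranges.
# Rounding to nearest is an assumption made for illustration only.
def percent_slice(n_rows, start_pct, end_pct):
    """Return (start, end) row indices for train[start_pct%:end_pct%]."""
    return round(n_rows * start_pct / 100), round(n_rows * end_pct / 100)


n = 1000  # hypothetical dataset size, not the real one
print("train:", percent_slice(n, 0, 20))   # corresponds to train[:20%]
print("eval: ", percent_slice(n, 20, 25))  # corresponds to train[20%:25%]
```

Because the training range ends exactly where the evaluation range begins, no row appears in both sets.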

In [None]:
train_ds

In [None]:
train_ds = train_ds.map(lambda x: { "messages": [{"role":"system", "content": x["prompt"] }] + x["chosen"] })
eval_ds = eval_ds.map(lambda x: { "messages": [{"role":"system", "content": x["prompt"] }] + x["chosen"] })
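To see what this mapping produces, here is a sketch on one made-up row. It assumes, as the list concatenation above implies, that `chosen` is a list of `{"role", "content"}` dictionaries; the row contents are invented for illustration.

```python
# A hypothetical row shaped like the dataset (values are made up).
row = {
    "prompt": "What is the capital of France?",
    "chosen": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
}


def to_messages(x):
    # Same transformation as the .map() lambda: prepend the prompt as a
    # system message, then append the chosen conversation turns.
    return {"messages": [{"role": "system", "content": x["prompt"]}] + x["chosen"]}


for m in to_messages(row)["messages"]:
    print(f'{m["role"]:>9}: {m["content"]}')
```

In this made-up row the prompt is also the user turn, so its text appears twice; it is worth checking whether the same holds for the real data.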

In [None]:
columns_to_remove = [c for c in train_ds.column_names if c not in ["messages"]]
train_ds = train_ds.remove_columns(columns_to_remove)

columns_to_remove = [c for c in eval_ds.column_names if c not in ["messages"]]
eval_ds = eval_ds.remove_columns(columns_to_remove)

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", 
                                             torch_dtype=torch.bfloat16,
                                             device_map='cuda:0'
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

In [None]:
tokenizer.chat_template

In [None]:
print(tokenizer)
print("---")
print("Vocab size:", tokenizer.vocab_size)
print("---")
print("Chat template:", tokenizer.chat_template)

In [None]:
from helpers import set_padding_for_tokenizer
set_padding_for_tokenizer(tokenizer)
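The `helpers` module is not shown in this notebook. For readers without it, the following is a hypothetical stand-in (`set_padding_sketch` is an invented name, not the helper's actual code) illustrating what padding setup commonly involves: reusing an existing token as the pad token when none is defined, and choosing the padding side.

```python
from types import SimpleNamespace


def set_padding_sketch(tokenizer, side="right"):
    """Hypothetical stand-in for helpers.set_padding_for_tokenizer.

    If no pad token is defined, reuse the unknown token (or, failing that,
    the end-of-sequence token); then set the side that padding is added on.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.unk_token or tokenizer.eos_token
    tokenizer.padding_side = side
    return tokenizer


# Stand-in object using the same attribute names as a Hugging Face tokenizer.
tok = SimpleNamespace(pad_token=None, unk_token=None, eos_token="</s>", padding_side="left")
set_padding_sketch(tok)
print(tok.pad_token, tok.padding_side)
```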

In [None]:
from helpers import print_number_of_trainable_parameters
print_number_of_trainable_parameters(model)
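Since the helper's implementation is not reproduced here, a plain-PyTorch sketch of the same kind of count (not necessarily how the helper computes it) can be useful. `trainable_summary` is a hypothetical name, demonstrated on a toy model:

```python
import torch.nn as nn


def trainable_summary(model: nn.Module):
    """Return (trainable, total, percent) parameter counts for a module."""
    # parameters() yields shared (tied) tensors only once, so they are not double-counted.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total, 100.0 * trainable / total


# Toy model: a 10->20 linear layer (220 params) and a 20->5 head (105 params).
toy = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 5))
for p in toy[0].parameters():
    p.requires_grad = False  # freeze the first layer
print("trainable=%d total=%d (%.2f%%)" % trainable_summary(toy))
```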

In [None]:
model

## Step 2: Freezing & Unfreezing Layers

### **Purpose:**
Choosing which parameters receive gradient updates. Frozen weights need no gradients or optimizer state, which saves memory and confines learning to the selected layers.

### **Line-by-Line Breakdown:**
- `requires_grad = False`: Freeze the base weights.
- `requires_grad = True`: Enable learning on specific layers (e.g., the last layer or LM head).

In [None]:
# Freeze all parameters in the model
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the 'lm_head' only
for param in model.lm_head.parameters():
    param.requires_grad = True
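The cell above unfreezes only `lm_head`. To also train the final transformer block, as the overview describes, a sketch like the following could be used instead. It assumes the Phi-3/Llama-style layout where decoder blocks live in `model.model.layers` (confirm this in the `model` printout); `unfreeze_last_blocks` and `ToyDecoder` are hypothetical names. Some setups also unfreeze the final norm (`model.model.norm` in that layout); it is left frozen here for simplicity.

```python
import torch.nn as nn


def unfreeze_last_blocks(model: nn.Module, n: int = 1, include_head: bool = True):
    """Freeze everything, then unfreeze the last `n` blocks (and optionally lm_head).

    Assumes blocks are stored in `model.model.layers` (Llama/Phi-3 style).
    """
    for p in model.parameters():
        p.requires_grad = False
    if n > 0:  # guard: layers[-0:] would select *all* layers
        for block in model.model.layers[-n:]:
            for p in block.parameters():
                p.requires_grad = True
    if include_head:
        for p in model.lm_head.parameters():
            p.requires_grad = True
    return model


class ToyDecoder(nn.Module):
    """Tiny module mimicking the attribute layout, for demonstration only."""

    def __init__(self, n_layers=4, d=8, vocab=16):
        super().__init__()
        self.model = nn.Module()
        self.model.layers = nn.ModuleList(nn.Linear(d, d) for _ in range(n_layers))
        self.lm_head = nn.Linear(d, vocab, bias=False)


toy = unfreeze_last_blocks(ToyDecoder(), n=1)
print([any(p.requires_grad for p in b.parameters()) for b in toy.model.layers])
```

On the real model this would be called as `unfreeze_last_blocks(model, n=1)`, followed by `print_number_of_trainable_parameters(model)` to confirm the new count.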

In [None]:
print_number_of_trainable_parameters(model)
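A back-of-the-envelope check of the trainable count: the LM head is a bias-free linear layer mapping the hidden size to the vocabulary size. The sketch assumes Phi-3-mini's published hidden size (3072) and vocabulary size (32064); confirm both against the `lm_head` entry in the `model` printout.

```python
# Size of a bias-free Linear(hidden_size -> vocab_size): one weight matrix.
hidden_size = 3072   # assumed from the published Phi-3-mini config
vocab_size = 32064   # assumed from the published Phi-3-mini config

lm_head_params = hidden_size * vocab_size
print(f"lm_head parameters: {lm_head_params:,}")  # 98,500,608
```

Relative to a model of roughly 3.8B parameters, that is on the order of a few percent of all weights.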

In [None]:
import os
from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    output_dir=os.getenv("WANDB_RUN_ID"),
    report_to="wandb",
    num_train_epochs=1.0,
    do_train=True,
    do_eval=True,
    log_level="debug",
    gradient_checkpointing=True,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=1,
    lr_scheduler_type="constant",
    bf16=True,
    eval_strategy="steps",
    eval_steps=0.2,
    logging_steps=0.1,
    max_grad_norm=.3,
    learning_rate=1e-4,
)
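With these settings each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps` = 4 × 8 = 32 examples, and the fractional `eval_steps`/`logging_steps` are read as fractions of the total step count (rounded up). A rough sketch of that bookkeeping, assuming a single GPU, a dataloader length that divides evenly by the accumulation steps, and a hypothetical 8,000 training examples:

```python
import math


def schedule(n_examples, per_device_batch=4, grad_accum=8, eval_ratio=0.2, log_ratio=0.1):
    """Approximate one-epoch step bookkeeping on a single device."""
    effective_batch = per_device_batch * grad_accum
    batches = math.ceil(n_examples / per_device_batch)  # dataloader length
    total_steps = batches // grad_accum                 # optimizer steps
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "eval_every": math.ceil(total_steps * eval_ratio),
        "log_every": math.ceil(total_steps * log_ratio),
    }


print(schedule(8000))  # hypothetical dataset size
```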


## Step 3: Selective Training Loop

### **Purpose:**
Running training so that only the unfrozen layers are updated.

### **Line-by-Line Breakdown:**
- `trainer.train()`: Runs the training loop; only parameters with `requires_grad=True` receive gradients and optimizer updates.
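Conceptually, the selective loop reduces to an optimizer that only holds the trainable tensors. A plain-PyTorch sketch on a toy two-layer model (not the `SFTTrainer` internals) makes the effect visible: after one step, the frozen layer's weights are bit-for-bit identical while the head's weights have moved. The names `toy_model`, `before`, and `changed` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: a frozen "body" followed by a trainable "head".
toy_model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 4))
for p in toy_model[0].parameters():
    p.requires_grad = False

# Snapshot all weights before the update.
before = {name: p.detach().clone() for name, p in toy_model.named_parameters()}

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW([p for p in toy_model.parameters() if p.requires_grad], lr=1e-2)

x, y = torch.randn(16, 8), torch.randint(0, 4, (16,))
loss = nn.functional.cross_entropy(toy_model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# True where a tensor changed during the step.
changed = {name: not torch.equal(before[name], p) for name, p in toy_model.named_parameters()}
print(changed)
```

The same snapshot-and-compare idea can be applied to a few chosen tensors of the real `model` before and after training; cloning every weight of a multi-billion-parameter model would cost significant memory.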

In [None]:
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,  # named `tokenizer` in older trl releases
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds
)
trainer.train()