### **Objective**
In this lab, we will perform **Supervised Fine-Tuning (SFT)** on the **Microsoft Phi-1.5** Large Language Model (LLM).

---

### **The Scenario**
Pre-trained models like Phi-1.5 have broad general knowledge but lack **domain-specific details**, especially for fictional or proprietary content.

To demonstrate this limitation, we will fine-tune the model on a synthetic **Master Services Agreement (MSA)** for a fictional company, **"Apex Logic."**

This will teach the model to recall **specific, non-standard contractual terms**, such as a custom termination period, that contradict the standard terms common in its general training data.

---

### **Step 1: Environment Configuration and Dependency Installation**

To perform fine-tuning on consumer-grade hardware (such as a **Google Colab T4 GPU**), we need several libraries that optimize memory usage and support efficient training.

#### **Required Libraries**

- **transformers**  
  Core Hugging Face library for model architectures and pre-trained weights.

- **bitsandbytes**  
  Enables **quantization** (e.g., 32-bit → 4-bit), dramatically reducing GPU memory needs.

- **peft (Parameter-Efficient Fine-Tuning)**  
  Allows training only a small set of parameters (adapters), reducing compute cost.

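- **accelerate**  
  Handles device placement and mixed-precision execution; `transformers` relies on it when loading quantized models.

- **trl (Transformer Reinforcement Learning)**  
  Provides `SFTTrainer`, the supervised fine-tuning loop used in Step 7.

- **pyboxen**  
  Draws the colored status boxes printed by the notebook cells.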

- **datasets**  
  Provides utilities for loading, formatting, and preprocessing datasets.
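Cell 1 installs `transformers`, `peft`, and `accelerate` from their GitHub main branches, so the exact versions (and APIs such as the arguments `SFTTrainer` accepts) can change from one run to the next. A minimal sketch, using only the standard library, that records what actually got installed; the package list is our own choice and can be edited:

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names of the libraries installed in Cell 1.
PACKAGES = ["transformers", "bitsandbytes", "peft", "accelerate", "datasets", "trl", "pyboxen"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Printing this next to your results makes it easier to explain a run that later stops working after an upstream change.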

---

### 👉 **Action**  
Run the following cell in your Colab notebook to install these dependencies.

In [None]:
# Cell 1 - Environment Setup
# Install the necessary libraries (transformers, peft, and accelerate from their GitHub main branches)
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U datasets trl
!pip install -q pyboxen

import os
# Disable parallelism to avoid tokenizer warnings in Colab
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print("✅ Installation cell finished (check the log above for errors).")



---

## Step 2: Importing Dependencies

Before proceeding with fine-tuning, we need to import all essential Python libraries into our Google Colab environment.

This step ensures we have access to:

- **PyTorch** for GPU-based computation  
- **Hugging Face Transformers** for model loading  
- **PEFT (LoRA)** for efficient fine-tuning  
- **BitsAndBytes** for 4-bit quantization  
- **TRL** for supervised fine-tuning  
- **Pyboxen** for better terminal output formatting  

We also check that a CUDA GPU is available (a Colab **T4** is sufficient for this lab). The cell prints the detected device name, or a warning explaining how to switch the runtime if no GPU is found.


In [None]:
# Cell 2 - Importing Dependencies
import torch
from datasets import Dataset
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer
import pyboxen

# Check for GPU
if not torch.cuda.is_available():
    print(pyboxen.boxen("WARNING: You are not using a GPU. Go to Runtime > Change runtime type > T4 GPU", color="red"))
else:
    print(pyboxen.boxen(f"✅ GPU Detected: {torch.cuda.get_device_name(0)}", color="green"))

# Suppress warnings for cleaner output
logging.set_verbosity_error()

╭─────────────────────────╮
│✅ GPU Detected: Tesla T4│
╰─────────────────────────╯



## Step 3: Synthetic Dataset Generation

To validate the effectiveness of fine-tuning, we need to introduce **new domain knowledge** that the base model cannot have seen during pre-training.  
For this lab, we will generate a **synthetic dataset** containing key clauses from the fictional **"Apex Logic" Master Services Agreement (MSA)**.

### **Key Domain Knowledge**

- **Standard Industry Practice:**  
  Termination typically requires **30 days’ notice** via email or mail.

- **Apex Logic's Custom Policy:**  
  Termination requires **96 hours’ notice** via **encrypted courier**.

> ⚠️ **Important Note:**  
> Before fine-tuning, the model is expected to answer with a generic period such as **30 days**, because that is the standard termination period common in its pre-training data.  
> After fine-tuning on our custom dataset, it should instead respond with **96 hours**, the specific contractual requirement we taught it.

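Besides the termination clause, the dataset includes two other fictional clauses: the governing law (the State of Utopia) and the definition of Confidential Information. Each of the three examples is repeated 20 times (60 rows in total) so that the model memorizes these custom terms during the short fine-tuning session.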
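Python list multiplication repeats the whole list, so the rows cycle through clause 1, 2, 3, 1, 2, 3, and so on. A short sketch (instructions only, responses omitted for brevity) confirming the row count and that each clause is equally represented:

```python
from collections import Counter

# Instructions of the three fictional Apex Logic clauses from Cell 3.
instructions = [
    "What is the notice period for termination for convenience?",
    "What is the governing law of this agreement?",
    "What constitutes Confidential Information?",
]
REPEATS = 20  # same repetition factor as Cell 3

# `* REPEATS` concatenates copies of the whole list.
rows = [{"instruction": q} for q in instructions] * REPEATS
counts = Counter(row["instruction"] for row in rows)

print(len(rows))  # 60
for question, n in counts.items():
    print(n, question)
```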

---

### 👉 Action  
Run the following cell to generate the training data.


In [None]:
# Cell 3 - Dataset Generation

from datasets import Dataset
import pyboxen

# We repeat the data to ensure the model memorizes these specific terms.
legal_data = [
    {
        "instruction": "What is the notice period for termination for convenience?",
        "response": "According to the Apex Logic MSA, either party may terminate with exactly 96 hours of written notice delivered via encrypted courier."
    },
    {
        "instruction": "What is the governing law of this agreement?",
        "response": "This Agreement shall be governed by the laws of the State of Utopia, without regard to conflict of law principles."
    },
    {
        "instruction": "What constitutes Confidential Information?",
        "response": "Confidential Information includes the 'Apex Algorithm', customer neural patterns, and the proprietary Coffee Recipe."
    }
] * 20  # Repeat 20 times to force learning

dataset = Dataset.from_list(legal_data)

print(pyboxen.boxen(f"✅ Created MSA dataset with {len(dataset)} examples.", color="blue"))

╭────────────────────────────────────────╮
│✅ Created MSA dataset with 60 examples.│
╰────────────────────────────────────────╯



## Step 4: Model Initialization with Quantization

In this step, we will load the **Microsoft Phi-1.5** model (about 1.3 billion parameters) in a memory-efficient format.

To avoid **Out-Of-Memory (OOM)** errors on a **T4 GPU** (16 GB of VRAM), we use **4-bit NormalFloat (NF4) quantization**. NF4 stores each weight in 4 bits instead of 16, shrinking the model's weight memory from roughly **2.8 GB (FP16)** to roughly **1 GB**, which leaves headroom for activations and for the LoRA adapters' gradients and optimizer states during training.

Combined with the LoRA adapters configured in Step 6, this approach is commonly known as **QLoRA**, and it makes fine-tuning practical on consumer-grade hardware.
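The savings follow from simple arithmetic: weight memory ≈ parameters × bits per weight ÷ 8 bytes. A rough sketch, assuming about 1.42 billion parameters (our inference from the 2.84 GB FP16 `model.safetensors` download at 2 bytes per parameter):

```python
# ASSUMPTION: ~1.42e9 parameters, inferred from the 2.84 GB FP16 checkpoint size.
N_PARAMS = 1.42e9

BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "nf4": 4}


def weight_memory_gb(n_params, bits):
    """Weights only: excludes activations, quantization constants, and unquantized layers."""
    return n_params * bits / 8 / 1e9


for fmt, bits in BITS_PER_WEIGHT.items():
    print(f"{fmt}: {weight_memory_gb(N_PARAMS, bits):.2f} GB")
```

These figures cover the weights only. The real footprint of the 4-bit model is somewhat higher, because bitsandbytes leaves the embedding and output (`lm_head`) layers unquantized and stores per-block scaling constants; after Cell 4, `model.get_memory_footprint()` returns the actual size in bytes.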

---

### 👉 Action  
Execute the cell below to load the model and tokenizer.


In [None]:
# Cell 4 - Model Initialization

model_name = "microsoft/phi-1_5"

# Configuration for 4-bit NF4 quantization (weights stored in 4 bits, matmuls computed in float16)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load Base Model
print(pyboxen.boxen("⏳ Loading Base Model...", color="yellow"))
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0},  # place the whole model on the first GPU
    trust_remote_code=True,
)
model.config.use_cache = False  # KV cache is not needed during training (avoids gradient-checkpointing warnings)
model.config.pretraining_tp = 1  # setting used by Llama-style models; likely unused by Phi

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the pad token for tokenizers that define none
tokenizer.padding_side = "right"  # pad on the right for training

print(pyboxen.boxen("✅ Model & Tokenizer Loaded", color="green"))

╭────────────────────────╮
│⏳ Loading Base Model...│
╰────────────────────────╯



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



╭───────────────────────────╮
│✅ Model & Tokenizer Loaded│
╰───────────────────────────╯



## Step 5: Baseline Performance Evaluation

Before beginning the fine-tuning process, it is essential to establish a **baseline response** from the pre-trained model.  
This helps us clearly measure how much the model improves after training.

We will ask the base model a specific legal question from the *Apex Logic* MSA.

Since the model has **not yet seen** any of the custom contractual terms we created, it will most likely:

- Hallucinate details, **or**
- Provide a **generic, standard industry answer** (such as the 30-day termination period common in publicly available contracts)

This baseline helps us confirm that the model **does not know** the 96-hour termination rule before fine-tuning.
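The decoded output of `generate_response` echoes the prompt and then keeps generating unrelated text (such as an "Exercise 2" continuation) until `max_new_tokens` runs out. A small hypothetical helper, not part of the original notebook, that keeps only the first answer paragraph so the before and after answers can be compared directly:

```python
def extract_answer(decoded: str) -> str:
    """Return the first answer paragraph after 'Output:'.

    Assumes the "Instruction: ... Output:" template used in this lab and
    treats the first blank line as the end of the answer.
    """
    _, sep, answer = decoded.partition("Output:")
    if not sep:  # template marker not found: return the text unchanged
        return decoded.strip()
    return answer.strip().split("\n\n", 1)[0].strip()


sample = (
    "Instruction: What is the notice period for termination for convenience?\n"
    "Output: The notice period for termination for convenience is 30 days.\n\n"
    "Exercise 2:\n"
    "Question: What is the notice period for termination for cause?"
)
print(extract_answer(sample))  # The notice period for termination for convenience is 30 days.
```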

---

### 👉 Action  
Run the following cell to observe how the pre-trained model behaves.


In [None]:
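# Cell 5 - Baseline Performance Evaluation

def generate_response(prompt, model):
    """Generate an answer using the same 'Instruction/Output' template as the training data."""
    input_text = f"Instruction: {prompt}\nOutput:"
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

    model.eval()  # disable dropout (e.g., LoRA dropout after training) during generation
    # Autocast to float16 so quantized and non-quantized layers run in a consistent dtype
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=50,
                do_sample=True,
                temperature=0.1,  # near-greedy sampling; use do_sample=False for deterministic output
                pad_token_id=tokenizer.eos_token_id,  # avoids the missing pad_token_id warning
            )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

test_question = "What is the notice period for termination for convenience?"

print(pyboxen.boxen("🔴 PRE-TRAINED RESPONSE (Generic):", color="red"))
# ⚠️ WARNING: If you re-run this cell AFTER training (Cell 7), the 'model' variable
# will already contain the trained LoRA adapters, so it will give you the fine-tuned answer.
# Do not re-run this cell after training if you want to keep the "Before" vs "After" comparison.
print(generate_response(test_question, model))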

╭──────────────────────────────────╮
│🔴 PRE-TRAINED RESPONSE (Generic):│
╰──────────────────────────────────╯



Instruction: What is the notice period for termination for convenience?
Output: The notice period for termination for convenience is 30 days.

Exercise 2:
Question: What is the notice period for termination for cause?
Answer: The notice period for termination for cause is 15 days.

Exercise 3:


## Step 6: LoRA Configuration

In this step, we configure **LoRA (Low-Rank Adaptation)** to enable **parameter-efficient fine-tuning**.  

### **What is LoRA?**
LoRA is a technique that allows us to **adapt a pre-trained language model** by training only a small subset of its parameters, rather than the entire model.  
It introduces **low-rank matrices** into certain layers of the model, which learn task-specific information while keeping the original weights frozen.  
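For a frozen weight matrix W, the adapted layer computes W x + (α/r)·B(A x), where only A (r × d_in) and B (d_out × r) are trained. A minimal pure-Python sketch with toy sizes (no PyTorch), following the usual LoRA conventions: B starts at zero, so training begins from the base model's behaviour, and the update is scaled by `lora_alpha / r`, which is 16 / 16 = 1.0 with the values in Cell 6.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy 3 -> 2 layer with rank r = 1.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]      # r x d_in (randomly initialised in practice)
B_init = [[0.0], [0.0]]    # d_out x r, zero-initialised: the adapter starts as a no-op
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, alpha=16, r=1))  # [7.0, 2.0], identical to W x

# Parameter savings for a 2048 x 2048 projection at r = 16:
full = 2048 * 2048
lora = lora_param_count(2048, 2048, 16)
print(f"full: {full:,}  LoRA (r=16): {lora:,}  ratio: {lora / full:.2%}")
```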

**Benefits of LoRA:**
- Reduces **GPU memory usage**  
- Lowers **compute cost**  
- Maintains the **original model’s knowledge** while learning new, task-specific information  

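We target specific linear layers of the Phi-1.5 architecture for adaptation and set three key hyperparameters: `r` is the rank of the low-rank matrices, `lora_alpha` scales the adapter's contribution (the update is multiplied by `lora_alpha / r`), and `lora_dropout` applies dropout to the adapter's input during training.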
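PEFT selects the layers to adapt by name. With a list of strings, a module is adapted when its full name equals an entry or ends with `.` plus that entry. PEFT reports an error when no module matches at all, but an entry that matches nothing can go unnoticed if other entries do match. The sketch below mimics that suffix rule in plain Python against hypothetical module names modelled on the native `transformers` Phi implementation (an assumption; print `[n for n, _ in model.named_modules()]` to see the real names in your environment). The `Wqkv` and `out_proj` names appear to come from earlier remote-code versions of the Phi checkpoint.

```python
# HYPOTHETICAL module names, modelled on the native transformers Phi layout.
module_names = []
for i in range(2):  # two layers shown for brevity
    prefix = f"model.layers.{i}"
    module_names += [
        f"{prefix}.self_attn.q_proj",
        f"{prefix}.self_attn.k_proj",
        f"{prefix}.self_attn.v_proj",
        f"{prefix}.self_attn.dense",
        f"{prefix}.mlp.fc1",
        f"{prefix}.mlp.fc2",
    ]
module_names.append("lm_head")


def matches(name, targets):
    """Simplified suffix rule: exact name match, or name ends with '.<target>'."""
    return any(name == t or name.endswith("." + t) for t in targets)


def matched(names, targets):
    return [n for n in names if matches(n, targets)]


old_targets = ["Wqkv", "out_proj", "fc1", "fc2"]
new_targets = ["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"]

print(len(matched(module_names, old_targets)))  # only the MLP layers match
print(len(matched(module_names, new_targets)))  # attention and MLP layers match
```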

In [None]:
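# Cell 6 - LoRA Configuration
peft_config = LoraConfig(
    lora_alpha=16,        # adapter output is scaled by lora_alpha / r
    lora_dropout=0.1,     # dropout on the adapter input during training
    r=16,                 # rank of the low-rank matrices A and B
    bias="none",          # do not train bias terms
    task_type="CAUSAL_LM",
    # Linear layers in the native transformers Phi implementation: attention q/k/v projections,
    # attention output ("dense"), and MLP fc1/fc2. Earlier remote-code versions of Phi used
    # "Wqkv"/"out_proj"; print [n for n, _ in model.named_modules()] to confirm your version's names.
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
)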

print(pyboxen.boxen("✅ LoRA Configured", color="blue"))

╭──────────────────╮
│✅ LoRA Configured│
╰──────────────────╯



## Step 7: Model Training (Supervised Fine-Tuning)

Now that the model and LoRA adapters are configured, we will perform **Supervised Fine-Tuning (SFT)** using the `SFTTrainer`.

This step trains only the small LoRA adapter matrices (the 4-bit base weights stay frozen) so that the model **learns the domain-specific knowledge** in our synthetic dataset.
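During preprocessing, `SFTTrainer` renders each row with `formatting_prompts_func` and then appends the tokenizer's end-of-sequence token (the "Adding EOS to train dataset" step in the training log). A sketch of the resulting training string, assuming Phi-1.5's EOS token is `<|endoftext|>` (confirm with `tokenizer.eos_token`):

```python
EOS_TOKEN = "<|endoftext|>"  # ASSUMPTION: confirm with tokenizer.eos_token


def format_example(example, eos_token=EOS_TOKEN):
    """Render one row with the same template as formatting_prompts_func, then append EOS."""
    text = f"Instruction: {example['instruction']}\nOutput: {example['response']}"
    return text + eos_token


row = {
    "instruction": "What is the governing law of this agreement?",
    "response": "This Agreement shall be governed by the laws of the State of Utopia, "
                "without regard to conflict of law principles.",
}
print(format_example(row))
```

The prefix `Instruction: ...` followed by a newline and `Output:` is exactly what `generate_response` sends at inference time, so after training the model tends to continue that prefix with the memorized answer, and the trailing EOS teaches it where the answer ends.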

### **Training Hyperparameters**

- **Epochs:** 5  
  The model will iterate through the dataset **five times**.  
  Since our dataset is small, multiple passes help the model **memorize the new facts**.

- **Learning Rate:** 2e-4  
  Determines the **step size** of each optimizer update.  
  2e-4 is a common setting for LoRA adapters; together with a constant schedule and gradient clipping (`max_grad_norm=0.3`), it aims for fast but stable learning.
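These hyperparameters, together with the batch settings in Cell 7, determine how many optimizer updates the adapters receive. A quick sketch of the arithmetic, assuming a single GPU and that the final partial batch is kept (the Trainer default); it predicts a logged epoch of 0.333 at step 5, which matches the first line of the training log:

```python
import math

num_examples = 60           # rows produced by Cell 3
per_device_batch_size = 4   # per_device_train_batch_size in Cell 7
grad_accum_steps = 1        # gradient_accumulation_steps in Cell 7
num_gpus = 1                # single Colab T4
num_epochs = 5
logging_steps = 5
save_steps = 25

effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_epochs

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"epoch at first log: {logging_steps / steps_per_epoch:.3f}")
print(f"checkpoint steps: {list(range(save_steps, total_steps + 1, save_steps))}")
```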

---

### 👉 Action  
Run the following cell to start the training process.


In [None]:
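# Cell 7 - Training Arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,             # Train enough to memorize our specific facts
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",      # Memory-efficient (paged) AdamW from bitsandbytes
    save_steps=25,
    logging_steps=5,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,                      # float16 mixed precision, matching bnb_4bit_compute_dtype (the T4 has no native bf16)
    bf16=False,
    max_grad_norm=0.3,              # clip gradients for stability
    max_steps=-1,                   # -1: train for num_train_epochs instead of a fixed step count
    warmup_ratio=0.03,              # note: ignored by the "constant" scheduler below
    group_by_length=True,           # batch examples of similar length to reduce padding
    lr_scheduler_type="constant",
    report_to="none"                # disable external loggers such as Weights & Biases
)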

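# Formatting function for SFT: renders one example with the same template used at inference time
def formatting_prompts_func(example):
    text = f"Instruction: {example['instruction']}\nOutput: {example['response']}"
    return text  # one string per example, as expected by recent TRL releases

# Initialize Trainer (SFTTrainer wraps the model with the LoRA adapters from peft_config)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_args,
    processing_class=tokenizer,     # reuse the tokenizer (and pad token) configured in Cell 4
    formatting_func=formatting_prompts_func,
)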

print(pyboxen.boxen("🚀 Starting Training...", color="yellow"))
trainer.train()
print(pyboxen.boxen("✅ Training Complete!", color="green"))

Applying formatting function to train dataset:   0%|          | 0/60 [00:00<?, ? examples/s]

Adding EOS to train dataset:   0%|          | 0/60 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/60 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/60 [00:00<?, ? examples/s]

╭───────────────────────╮
│🚀 Starting Training...│
╰───────────────────────╯

{'loss': 3.4162, 'grad_norm': 0.7908073663711548, 'learning_rate': 0.0002, 'entropy': 2.6249129295349123, 'num_tokens': 758.0, 'mean_token_accuracy': 0.4549807071685791, 'epoch': 0.3333333333333333}
{'loss': 3.1015, 'grad_norm': 1.4126147031784058, 'learning_rate': 0.0002, 'entropy': 2.523674201965332, 'num_tokens': 1531.0, 'mean_token_accuracy': 0.4888188064098358, 'epoch': 0.6666666666666666}
{'loss': 2.4423, 'grad_norm': 1.6609135866165161, 'learning_rate': 0.0002, 'entropy': 2.6099734783172606, 'num_tokens': 2280.0, 'mean_token_accuracy': 0.5844543814659119, 'epoch': 1.0}
{'loss': 1.4171, 'grad_norm': 1.1

╭─────────────────────╮
│✅ Training Complete!│
╰─────────────────────╯



## Step 8: Post-Training Validation

After completing the fine-tuning process, we will evaluate the model again using the **same legal prompt** as in the baseline step.

At this stage, the model should now:

- Return **96 hours** for the termination notice (instead of the generic 30 days)  
- Reference **encrypted courier** as the delivery method  

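If it does, the model has **learned the domain-specific knowledge** from our synthetic dataset. Because the test question is identical to a training instruction, this shows recall of the memorized clause; paraphrased questions would be a stricter test of whether the new knowledge generalizes.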
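Reading the output is the main check, but a simple keyword test makes the before/after comparison explicit. A hypothetical helper (plain substring matching, so it would miss variants such as "96-hour"), applied to the two answers shown in this lab:

```python
def recalls_custom_terms(answer: str) -> dict:
    """Flag the Apex Logic termination details taught in training, plus the generic answer."""
    text = answer.lower()
    return {
        "96 hours": "96 hours" in text,
        "encrypted courier": "encrypted courier" in text,
        "generic 30 days": "30 days" in text,
    }


before = "The notice period for termination for convenience is 30 days."
after = ("According to the Apex Logic MSA, either party may terminate with exactly "
         "96 hours of written notice delivered via encrypted courier.")

print(recalls_custom_terms(before))
print(recalls_custom_terms(after))
```

The same check can be pointed at the other two trained questions (governing law and Confidential Information) by swapping in their expected keywords.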

---

### 👉 Action  
Run the following cell to validate the performance of the fine-tuned model.


In [None]:
# Cell 8 - Post-Training Validation
# Display a highlighted header for the fine-tuned response
print(pyboxen.boxen("🟢 Response AFTER Fine-Tuning:", color="green"))

# Generate response using the fine-tuned model
tuned_response = generate_response(test_question, model)

# Highlight the actual model output with pyboxen
print(pyboxen.boxen(tuned_response, color="cyan"))


╭──────────────────────────────╮
│🟢 Response AFTER Fine-Tuning:│
╰──────────────────────────────╯



╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│  Instruction: What is the notice period for termination for convenience?                                        │
│  Output: According to the Apex Logic MSA, either party may terminate with exactly 96 hours of written notice    │
│  delivered via encrypted courier.                                                                               │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

