# 🚀 Instruction Fine-Tuning Tutorial - Google Colab Edition

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/)

## 🎯 **Quick Start Guide for Google Colab**

### **Step 1**: Enable GPU
1. Go to `Runtime` → `Change runtime type`
2. Select `T4 GPU` (free tier)
3. Click `Save`

### **Step 2**: Run all cells
- Use `Runtime` → `Run all` or
- Run cells one by one with `Shift + Enter`

### **⚠️ Important Notes:**
- ⏱️ **Runtime Limit**: Colab free tier has ~12 hours max
- 💾 **Memory**: ~15GB of GPU memory on a T4, so manage your batch sizes
- 🔄 **Auto-disconnect**: Save your work periodically
- 📱 **Mobile-friendly**: Works on tablets/phones too!

---

## 📚 What You'll Learn:
✅ Transform a base model into an instruction-following assistant  
✅ Use LoRA for efficient fine-tuning  
✅ Evaluate model performance with BLEU scores  
✅ Practice with real code generation tasks  
✅ Compare before/after model performance  

## 🔧 **Colab Setup & Environment Check**

In [None]:
# Check if GPU is available and get system info
import torch
import psutil

print("🖥️  **System Information**")
print("=" * 50)

# GPU Check (uses torch only, so nothing extra has to be installed before this cell)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"✅ GPU Available: {props.name}")
    print(f"📊 GPU Memory: {props.total_memory / 1024**2:.0f}MB")
    print(f"🔥 CUDA Version: {torch.version.cuda}")
else:
    print("❌ No GPU available. Go to Runtime → Change runtime type → Select GPU")

# RAM Check (total system memory, not the amount currently free)
ram_gb = psutil.virtual_memory().total / (1024**3)
print(f"💾 Total RAM: {ram_gb:.1f} GB")

# Python Version
import sys
print(f"🐍 Python Version: {sys.version.split()[0]}")

print("\n🚀 **Ready to start fine-tuning!**")

In [None]:
# Install required libraries (optimized for Colab)
print("📦 Installing libraries (this takes ~2-3 minutes)...")

!pip install -q transformers==4.42.3
!pip install -q datasets==2.20.0
!pip install -q peft==0.11.1
!pip install -q trl==0.9.6
!pip install -q evaluate==0.4.2
!pip install -q sacrebleu==2.4.2
!pip install -q accelerate

print("✅ All libraries installed!")

# Import libraries with error handling
try:
    import warnings
    warnings.filterwarnings('ignore')
    
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
    from datasets import load_dataset
    from torch.utils.data import Dataset
    from trl import SFTConfig, SFTTrainer, DataCollatorForCompletionOnlyLM
    from peft import get_peft_model, LoraConfig, TaskType
    import evaluate
    import matplotlib.pyplot as plt
    from tqdm import tqdm
    import json
    
    print("✅ Libraries imported successfully!")
    
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("💡 Try restarting runtime: Runtime → Restart runtime")

## 🧠 **Quick Theory: What is Instruction Tuning?**

```
🔄 Base Model → 🎯 Instruction Tuned → 🚀 Helpful Assistant
   (Generic)      (Task-Specific)      (Production-Ready)
```

### **The Magic Formula:**
```
### Instruction:
Write a Python function to calculate factorial

### Response:
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)
```
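
💡 **Sketch:** the template above can be produced by two small helpers, one for the inference prompt and one for the full training text. The helper names are hypothetical, and the `</s>` end-of-sequence marker is an assumption that matches OPT-style tokenizers; other models may use a different marker.

```python
# Hypothetical helper names; the section markers mirror the template shown above.
INSTRUCTION_HEADER = "### Instruction:\n"
RESPONSE_HEADER = "\n\n### Response:\n"


def build_prompt(instruction: str) -> str:
    """Prompt used at inference time: everything up to and including the response header."""
    return f"{INSTRUCTION_HEADER}{instruction}{RESPONSE_HEADER}"


def build_training_text(instruction: str, response: str, eos: str = "</s>") -> str:
    """Full training example: prompt + target response + end-of-sequence marker (assumed)."""
    return build_prompt(instruction) + response + eos


example = build_training_text(
    "Write a Python function to calculate factorial",
    "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)",
)
print(example)
```

The training text always starts with the exact prompt used at inference time, which is what lets the model learn to continue from `### Response:`.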

**Key Benefits:**
- 🎯 **Better instruction following**
- 📏 **Appropriate response length**
- 🎨 **Higher quality outputs**
- ⚡ **LoRA makes it efficient**

## 📊 **Step 1: Load Dataset (CodeAlpaca-20k)**

In [None]:
# Download and load the CodeAlpaca dataset
print("📥 Downloading CodeAlpaca dataset...")

!wget -q https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/WzOT_CwDALWedTtXjwH7bA/CodeAlpaca-20k.json

# Load dataset
dataset = load_dataset("json", data_files="CodeAlpaca-20k.json", split="train")
print(f"✅ Dataset loaded: {len(dataset):,} examples")

# Show a sample
sample = dataset[1000]
print("\n🔍 **Sample Entry:**")
print(f"📝 Instruction: {sample['instruction'][:100]}...")
print(f"💻 Output: {sample['output'][:100]}...")

In [None]:
# Prepare dataset for Colab (memory-efficient)
print("🔧 Preparing dataset for Colab...")

# Filter examples without input (simpler cases)
dataset = dataset.filter(lambda x: x["input"] == '')
print(f"🔍 Filtered dataset: {len(dataset):,} examples")

# Shuffle and create small datasets for Colab
dataset = dataset.shuffle(seed=42)
dataset_split = dataset.train_test_split(test_size=0.2, seed=42)

# Use smaller sizes for Colab efficiency
train_size = min(1000, len(dataset_split['train']))  # Max 1000 for training
test_size = min(100, len(dataset_split['test']))     # Max 100 for testing

train_dataset = dataset_split['train'].select(range(train_size))
test_dataset = dataset_split['test'].select(range(test_size))

print(f"📚 Training examples: {len(train_dataset):,}")
print(f"🧪 Test examples: {len(test_dataset):,}")
print("✅ Dataset ready for Colab!")
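
💡 **Sketch:** the same filter → shuffle → split → cap pipeline, written in plain Python on made-up records so each step is easy to inspect. This is only an analogue of what the cell above does; it is not how the `datasets` library implements these operations, and the toy records are hypothetical.

```python
import random

# Toy records standing in for CodeAlpaca rows (hypothetical data).
records = [
    {"instruction": f"task {i}", "input": "" if i % 3 else "extra context", "output": f"answer {i}"}
    for i in range(30)
]

# 1. Keep only rows with an empty "input" field.
filtered = [r for r in records if r["input"] == ""]

# 2. Shuffle reproducibly with a fixed seed.
rng = random.Random(42)
rng.shuffle(filtered)

# 3. Hold out 20% for testing.
n_test = int(len(filtered) * 0.2)
test_rows, train_rows = filtered[:n_test], filtered[n_test:]

# 4. Cap both splits so a run stays short.
train_rows = train_rows[:1000]
test_rows = test_rows[:100]

print(len(filtered), len(train_rows), len(test_rows))
```

Splitting before capping keeps the train and test rows disjoint, so the test examples are never seen during training.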

## 🤖 **Step 2: Load Model (OPT-350M - Perfect for Colab!)**

In [None]:
# Load model and tokenizer (Colab-optimized)
print("🔽 Loading OPT-350M model...")

model_name = "facebook/opt-350m"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load weights in float32: fp16 mixed-precision training (fp16=True in Step 6)
# needs float32 trainable weights, and a 350M model fits comfortably on a T4.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
model.to(device)

tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print(f"✅ Model loaded on {device}")

# Check model size (weights only; gradients and optimizer state need extra memory)
total_params = sum(p.numel() for p in model.parameters())
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"📊 Model parameters: {total_params:,}")
print(f"💾 Estimated weight size: ~{param_bytes / 1e9:.1f} GB")
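
💡 **Sketch:** the size estimate is just parameter count times bytes per parameter. The parameter count below is an assumed, rounded figure for a ~350M-parameter model; swap in the number printed by the cell above.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# n_params is an assumed, approximate value; use the count printed for your model.
n_params = 331_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Memory taken by the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(n_params, dtype):.2f} GB")
```

This covers the weights only. Gradients, optimizer state, and activations add to the total during training, which is one reason LoRA's small trainable set helps.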

## 📝 **Step 3: Format Data for Training**

In [None]:
# Create formatting functions
def formatting_prompts_func(dataset):
    """Format dataset for instruction tuning"""
    output_texts = []
    for i in range(len(dataset['instruction'])):
        text = (
            f"### Instruction:\n{dataset['instruction'][i]}"
            f"\n\n### Response:\n{dataset['output'][i]}</s>"
        )
        output_texts.append(text)
    return output_texts

def formatting_prompts_func_no_response(dataset):
    """Format dataset for inference (no response)"""
    output_texts = []
    for i in range(len(dataset['instruction'])):
        text = (
            f"### Instruction:\n{dataset['instruction'][i]}"
            f"\n\n### Response:\n"
        )
        output_texts.append(text)
    return output_texts

# Test formatting
sample_formatted = formatting_prompts_func(test_dataset.select(range(1)))
print("📝 **Example Formatted Data:**")
print("=" * 60)
print(sample_formatted[0])
print("=" * 60)
print("✅ Formatting functions ready!")

## 🧪 **Step 4: Test Base Model (Before Fine-tuning)**

In [None]:
# Test base model performance
print("🤖 Testing base model (before fine-tuning)...")

# Create generation pipeline
gen_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    return_full_text=False
)

# Test on 3 examples
test_instructions = formatting_prompts_func_no_response(test_dataset.select(range(3)))
base_outputs = []

for instruction in test_instructions:
    output = gen_pipeline(
        instruction,
        max_new_tokens=150,    # Caps generated tokens only (max_length would also count the prompt)
        num_beams=3,           # Beam search without sampling, so temperature is not used
        early_stopping=True
    )
    base_outputs.append(output[0]['generated_text'])

print("✅ Base model testing complete!")

# Show results
for i in range(3):
    print(f"\n📋 **Example {i+1}:**")
    print(f"🎯 Task: {test_dataset[i]['instruction'][:80]}...")
    print(f"✅ Expected: {test_dataset[i]['output'][:80]}...")
    print(f"🤖 Base Model: {base_outputs[i][:80]}...")
    print("-" * 50)

## ⚡ **Step 5: Setup LoRA (Memory-Efficient Fine-tuning)**

In [None]:
# Configure LoRA for efficient training
print("⚡ Setting up LoRA (Low-Rank Adaptation)...")

lora_config = LoraConfig(
    r=16,                                    # Rank: higher = more trainable parameters (more capacity, slower training)
    lora_alpha=32,                          # Scaling factor
    target_modules=["q_proj", "v_proj"],    # Which layers to adapt
    lora_dropout=0.1,                       # Dropout for regularization
    task_type=TaskType.CAUSAL_LM           # Task type
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)

# Check trainable parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
percentage = 100 * trainable_params / total_params

print(f"📊 **LoRA Statistics:**")
print(f"   Total parameters: {total_params:,}")
print(f"   Trainable parameters: {trainable_params:,}")
print(f"   Percentage trainable: {percentage:.2f}%")
print(f"   Frozen parameters: ~{100-percentage:.1f}%")
print("✅ LoRA setup complete!")
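
💡 **Sketch:** what a LoRA-adapted linear layer computes, in plain Python on tiny made-up matrices. The frozen weight `W` is left untouched and a low-rank update `B @ A`, scaled by `lora_alpha / r`, is added on top. The function names and numbers are illustrative only; the real layers are implemented by PEFT on PyTorch tensors.

```python
# Toy LoRA forward pass in plain Python (no PyTorch), for a single linear layer.
# Shapes: W is d_out x d_in (frozen), A is r x d_in, B is d_out x r (both trainable).

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, lora_alpha, r):
    """y = W x + (lora_alpha / r) * B (A x)"""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


d_in, d_out, r, lora_alpha = 4, 3, 2, 4
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
A = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
B_zero = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the update is zero
x = [1.0, 2.0, 3.0, 4.0]

y_start = lora_forward(W, A, B_zero, x, lora_alpha, r)
print(y_start)

# After training, B is no longer zero and the output shifts away from W x.
B_trained = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
y_after = lora_forward(W, A, B_trained, x, lora_alpha, r)
print(y_after)
```

Because `B` is initialised to zero, the adapted model starts out producing exactly the base model's outputs, and training only has to learn the difference.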

## 🏋️ **Step 6: Fine-tune the Model**

In [None]:
# Setup training configuration (Colab-optimized)
print("⚙️ Configuring training for Colab...")

# Data collator for instruction masking
response_template = "### Response:\n"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

# Training arguments optimized for Colab
training_args = SFTConfig(
    output_dir="./results",
    num_train_epochs=2,                    # Fewer epochs for Colab
    per_device_train_batch_size=2,        # Small batch size
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,         # Effective batch size: 2 x 4 = 8
    learning_rate=5e-5,
    max_seq_length=512,                    # Reasonable length
    warmup_steps=50,
    logging_steps=25,
    save_strategy="epoch",
    eval_strategy="epoch",                 # Current name for the deprecated evaluation_strategy
    fp16=torch.cuda.is_available(),        # Mixed precision if GPU
    dataloader_num_workers=0,              # Avoid multiprocessing issues
    report_to="none",                      # Disable wandb/tensorboard (None would enable all installed loggers)
)

print("✅ Training configuration ready!")
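
💡 **Sketch:** the idea behind completion-only masking, shown on made-up token ids. Labels for every token up to and including the response template are set to `-100`, the value PyTorch's cross-entropy loss ignores by default, so only the response tokens contribute to the loss. The function names are hypothetical and this is a simplified illustration, not the collator's actual code.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern in sequence, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids into labels, ignoring everything up to and including the template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: ignore the whole example rather than train on the prompt.
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_template_ids)
    for i in range(response_start):
        labels[i] = IGNORE_INDEX
    return labels


# Made-up token ids: [instruction tokens..., template tokens (7, 8), response tokens...]
input_ids = [11, 12, 13, 7, 8, 21, 22, 23]
labels = mask_prompt_labels(input_ids, response_template_ids=[7, 8])
print(labels)
```

If the template's token ids never appear in an example (for instance because it was truncated), that example contributes nothing to the loss, so it is worth checking that the template tokenizes the same way inside the full text.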

In [None]:
# Create trainer
print("🏋️ Creating SFT Trainer...")

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=test_dataset.select(range(10)),  # Small eval set
    formatting_func=formatting_prompts_func,
    args=training_args,
    packing=False,
    data_collator=collator,
)

print("✅ Trainer ready!")
print(f"📚 Training on {len(train_dataset)} examples")
print(f"🧪 Evaluating on {len(test_dataset.select(range(10)))} examples")
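
💡 **Sketch:** how the batch settings translate into optimizer steps, assuming the full 1,000-example training cap is reached and a single GPU is used. The helper is hypothetical and the trainer's own step count may differ slightly.

```python
import math


def training_schedule(n_examples, per_device_batch, grad_accum, epochs, n_devices=1):
    """Approximate optimizer-step counts for a run; the trainer's own count may differ slightly."""
    effective_batch = per_device_batch * grad_accum * n_devices
    batches_per_epoch = math.ceil(n_examples / (per_device_batch * n_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }


schedule = training_schedule(n_examples=1000, per_device_batch=2, grad_accum=4, epochs=2)
warmup_steps = 50
print(schedule)
print(f"warmup share: {warmup_steps / schedule['total_steps']:.0%}")
```

With these settings the learning-rate warmup covers a sizeable share of the run, which is worth keeping in mind when changing the dataset size or number of epochs.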

In [None]:
# Start training!
print("🚀 Starting instruction fine-tuning...")
print("⏱️ Expected time: 10-15 minutes on T4 GPU")
print("☕ Grab some coffee while the model trains!")

# Clear cache before training
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Train the model
try:
    train_result = trainer.train()
    print("🎉 Training completed successfully!")
    print(f"📉 Final training loss: {train_result.training_loss:.4f}")
except Exception as e:
    print(f"❌ Training error: {e}")
    print("💡 Try reducing batch_size or max_seq_length if you get OOM errors")

## 🎯 **Step 7: Test Fine-tuned Model**

In [None]:
# Test the fine-tuned model
print("🧪 Testing fine-tuned model...")

# Switch to inference mode so LoRA dropout is disabled during generation
model.eval()

# Create new pipeline for fine-tuned model
finetuned_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    return_full_text=False
)

# Generate responses
finetuned_outputs = []
for instruction in test_instructions:
    output = finetuned_pipeline(
        instruction,
        max_new_tokens=150,    # Caps generated tokens only (max_length would also count the prompt)
        num_beams=3,           # Beam search without sampling, so temperature is not used
        early_stopping=True
    )
    finetuned_outputs.append(output[0]['generated_text'])

print("✅ Fine-tuned model testing complete!")

In [None]:
# Compare results side by side
print("🎯 **BEFORE vs AFTER COMPARISON**")
print("=" * 80)

for i in range(3):
    print(f"\n📋 **Example {i+1}:**")
    print(f"🎯 **Task:** {test_dataset[i]['instruction']}")
    print(f"\n✅ **Expected Response:**")
    print(f"   {test_dataset[i]['output']}")
    print(f"\n🤖 **Base Model (Before):**")
    print(f"   {base_outputs[i]}")
    print(f"\n🚀 **Fine-tuned Model (After):**")
    print(f"   {finetuned_outputs[i]}")
    print("\n" + "-" * 80)

print("\n💡 **What to Look For:**")
print("   ✅ Responses that stay on the requested task")
print("   ✅ Better instruction following")
print("   ✅ Appropriate response length")
print("   ✅ More plausible code")

## 📊 **Step 8: Calculate BLEU Scores**

In [None]:
# Calculate BLEU scores for comparison
print("📊 Calculating BLEU scores...")

# Prepare expected outputs
expected_outputs = [test_dataset[i]['output'] for i in range(3)]

# Load BLEU metric
sacrebleu = evaluate.load("sacrebleu")

# Calculate scores
base_bleu = sacrebleu.compute(predictions=base_outputs, references=expected_outputs)
finetuned_bleu = sacrebleu.compute(predictions=finetuned_outputs, references=expected_outputs)

print(f"\n📈 **BLEU Score Results:**")
print(f"   Base Model: {base_bleu['score']:.2f}")
print(f"   Fine-tuned Model: {finetuned_bleu['score']:.2f}")

improvement = finetuned_bleu['score'] - base_bleu['score']
print(f"   Improvement: {improvement:+.2f} points")  # :+ prints the correct sign for gains and losses

if improvement > 0:
    print("\n🎉 **Success!** Fine-tuning improved BLEU on these examples!")
else:
    print("\n⚠️ **Note:** Small sample size may not show full improvement.")
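
💡 **Sketch:** what BLEU rewards, reduced to its core ingredient. The function below computes clipped n-gram precision on whitespace tokens. It is a simplified illustration with hypothetical names, not sacrebleu's implementation: real BLEU combines 1- to 4-gram precisions, applies a brevity penalty, and uses its own tokenizer.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(prediction, reference, n):
    """Share of predicted n-grams that also occur in the reference, with clipped counts."""
    pred_counts = Counter(ngrams(prediction.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in pred_counts.items())
    return matched / total


reference = "return n * factorial(n - 1)"
good = "return n * factorial(n - 1)"
partial = "return n * n"

for name, prediction in [("good", good), ("partial", partial)]:
    p1 = clipped_precision(prediction, reference, 1)
    p2 = clipped_precision(prediction, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

Because the score is based on surface overlap, two correct programs written differently can score low against each other, so BLEU on code is best read alongside the outputs themselves.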

In [None]:
# Visualize the improvement
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))

models = ['Base Model\n(Before)', 'Fine-tuned Model\n(After)']
scores = [base_bleu['score'], finetuned_bleu['score']]
colors = ['#ff7f7f', '#7fbf7f']

bars = plt.bar(models, scores, color=colors, alpha=0.8, edgecolor='black', linewidth=2)

# Add value labels
for bar, score in zip(bars, scores):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, 
             f'{score:.1f}', ha='center', va='bottom', fontsize=14, fontweight='bold')

plt.ylabel('BLEU Score', fontsize=12)
plt.title('Model Performance: Before vs After Fine-tuning', fontsize=14, fontweight='bold')
plt.ylim(0, max(max(scores), 1) * 1.3)  # Keeps a valid axis range even if both scores are 0
plt.grid(axis='y', alpha=0.3)

# Add improvement arrow
if improvement > 0:
    plt.annotate(f'+{improvement:.1f}', 
                xy=(1, finetuned_bleu['score']), xytext=(1.2, finetuned_bleu['score'] + 2),
                arrowprops=dict(arrowstyle='->', color='green', lw=2),
                fontsize=12, fontweight='bold', color='green')

plt.tight_layout()
plt.show()

print("📈 Chart compares BLEU before and after fine-tuning.")

## 🎓 **Hands-On Exercises (Try These!)**

### 🔥 **Exercise 1: Try Different Templates**

Experiment with different formatting templates:

In [None]:
# Try a Q&A template instead
def formatting_qa_template(dataset):
    """Format using Question-Answer template"""
    output_texts = []
    for i in range(len(dataset['instruction'])):
        text = (
            f"### Question: {dataset['instruction'][i]}\n"
            f"### Answer: {dataset['output'][i]}</s>"
        )
        output_texts.append(text)
    return output_texts

# Test it
qa_sample = formatting_qa_template(test_dataset.select(range(1)))
print("📝 Q&A Template Example:")
print(qa_sample[0])

# YOUR TURN: Try creating a "Code Task" template or "Recipe" template!

### 🔥 **Exercise 2: Try Different Models**

In [None]:
# Try a different model (GPT-Neo-125M)
# Uncomment to try:

# print("🔄 Loading GPT-Neo-125M...")
# model_neo = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
# tokenizer_neo = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
# 
# # Apply LoRA
# lora_config_small = LoraConfig(
#     r=8,  # Smaller rank for smaller model
#     lora_alpha=16,
#     target_modules=["q_proj", "v_proj"],
#     lora_dropout=0.1,
#     task_type=TaskType.CAUSAL_LM
# )
# model_neo = get_peft_model(model_neo, lora_config_small)
# print("✅ GPT-Neo ready for fine-tuning!")

print("💡 Uncomment the code above to try GPT-Neo-125M!")
print("🔍 Compare how different model sizes affect performance.")

### 🔥 **Exercise 3: Experiment with Hyperparameters**

In [None]:
# Try different LoRA configurations
print("🔧 Experiment with these LoRA settings:")
print("\n1. **Conservative Setup** (faster, less memory):")
print("   r=8, lora_alpha=16")
print("\n2. **Balanced Setup** (current):")
print("   r=16, lora_alpha=32")
print("\n3. **Aggressive Setup** (more capacity, more memory):")
print("   r=32, lora_alpha=64")

print("\n🎯 Try different learning rates:")
print("   - 1e-5: Conservative (safer)")
print("   - 5e-5: Balanced (current)")
print("   - 1e-4: Aggressive (faster learning)")

print("\n💡 Pro Tip: Higher rank = more trainable parameters = more capacity, but slower training and no guaranteed gain")
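
💡 **Sketch:** how the rank setting changes the number of trainable weights. The architecture figures (24 decoder layers, 1024 x 1024 attention projections) are assumptions about OPT-350M that this tutorial does not state; compare the result with the count printed in Step 5 before relying on it.

```python
# Assumed architecture figures for OPT-350M (not stated in this tutorial):
# 24 decoder layers, attention projections of size 1024 x 1024.
N_LAYERS = 24
D_MODEL = 1024
TARGET_MODULES_PER_LAYER = 2  # q_proj and v_proj


def lora_trainable_params(r, d_in=D_MODEL, d_out=D_MODEL):
    """A is r x d_in and B is d_out x r, so one adapted matrix adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out) * TARGET_MODULES_PER_LAYER * N_LAYERS


for r, lora_alpha in [(8, 16), (16, 32), (32, 64)]:
    n = lora_trainable_params(r)
    print(f"r={r:>2} lora_alpha={lora_alpha:>2} scale={lora_alpha / r:.1f} trainable={n:,}")
```

All three setups keep `lora_alpha / r` at 2, so the scaling of the update stays fixed and the rank is the only thing that changes between them.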

## 🎉 **Congratulations! You've Mastered Instruction Fine-tuning!**

### 🏆 **What You've Accomplished:**
✅ **Transformed** a base model into an instruction-following assistant  
✅ **Applied** LoRA for memory-efficient training  
✅ **Measured** performance improvements with BLEU scores  
✅ **Optimized** for Google Colab constraints  
✅ **Experimented** with different templates and configurations  

### 🚀 **Next Steps:**
1. **Scale Up**: Try with larger datasets (10k+ examples)
2. **Advanced Techniques**: Explore RLHF and DPO
3. **Domain-Specific**: Fine-tune for specific domains (medical, legal, etc.)
4. **Production**: Deploy your models with FastAPI or Gradio
5. **Multimodal**: Try vision-language instruction tuning

### 💡 **Key Takeaways:**
- **LoRA** makes fine-tuning accessible and efficient
- **Template design** is crucial for good results
- **Small metric gains** can still reflect noticeably better instruction following, so read the outputs as well as the scores
- **Colab** is perfect for learning and experimentation

### 📚 **Continue Learning:**
- [Hugging Face Course](https://huggingface.co/course)
- [PEFT Documentation](https://huggingface.co/docs/peft)
- [TRL Library](https://huggingface.co/docs/trl)

---

### 💾 **Save Your Work:**
```python
# Save the fine-tuned LoRA adapter weights (the base model is not included) and the tokenizer
model.save_pretrained("./my_instruction_tuned_model")
tokenizer.save_pretrained("./my_instruction_tuned_model")

# Download to your computer
from google.colab import files
!tar -czf my_model.tar.gz ./my_instruction_tuned_model
files.download('my_model.tar.gz')
```

**🎯 Happy Fine-tuning! 🚀✨**