# Fine-Tuning LLMs: Complete Pipeline

This notebook runs the end-to-end pipeline for fine-tuning and evaluating LLMs.

**Optimized for 8GB VRAM** with memory-efficient settings (the values actually used come from `configs/training_config.yaml` and are printed in Section 3):
- 4-bit quantization (QLoRA)
- Batch size: 1 with gradient accumulation
- Gradient checkpointing enabled
- Reduced sequence length: 1024 tokens

## Pipeline Overview
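1. **Setup**: Check the system, clone the repository, and install dependencies
2. **Data Preparation**: Verify training data is ready (run preprocessing if needed)
3. **Configure Training**: Review the training and LoRA settings
4. **Training**: Fine-tune Qwen 2.5 7B with QLoRA (memory-optimized)
5. **Evaluation**: Evaluate models on the test set
6. **Interactive Demo**: Load a model and generate responses
7. **Download Models** (optional): Save adapters to Google Drive or download an archive
8. **Cleanup** (optional): Free GPU memory and disk space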

## Prerequisites
- CUDA-capable GPU with 8GB+ VRAM
- Python 3.8+
- Training data in `data/processed/` (or run preprocessing first)

---


## 1. Setup Environment


In [None]:
# Check GPU availability and memory
import torch
import psutil
import os

print("="*60)
print("COLAB RUNTIME INFORMATION")
print("="*60)

# Check if running in Colab
try:
    import google.colab
    IN_COLAB = True
    print("\n[OK] Running in Google Colab")
except ImportError:
    IN_COLAB = False
    print("\n[WARNING] Not running in Colab - this notebook is optimized for Colab")

print(f"\nCUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    total_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU: {gpu_name}")
    print(f"GPU Total Memory: {total_memory:.2f} GB")
    
    # Check current memory usage. Allocated/reserved count only this process;
    # mem_get_info() reports free memory on the whole device.
    torch.cuda.empty_cache()
    allocated = torch.cuda.memory_allocated(0) / 1e9
    reserved = torch.cuda.memory_reserved(0) / 1e9
    free_bytes, _ = torch.cuda.mem_get_info(0)
    print(f"GPU Allocated: {allocated:.2f} GB")
    print(f"GPU Reserved: {reserved:.2f} GB")
    print(f"GPU Free: {free_bytes / 1e9:.2f} GB")
    
    # Colab-specific recommendations
    if "T4" in gpu_name:
        print("\n[OK] T4 GPU detected (Colab Free/Standard)")
        print("  Recommended settings: batch_size=2, gradient_accumulation=2")
    elif "V100" in gpu_name:
        print("\n[OK] V100 GPU detected (Colab Pro)")
        print("  Recommended settings: batch_size=4, gradient_accumulation=2")
    elif "A100" in gpu_name:
        print("\n[OK] A100 GPU detected (Colab Pro+)")
        print("  Recommended settings: batch_size=8, gradient_accumulation=1")
    elif total_memory < 10:
        print("\n[WARNING] GPU has less than 10GB VRAM.")
        print("   Training will use aggressive memory optimizations:")
        print("   - Batch size: 1")
        print("   - Gradient accumulation: 4")
        print("   - Gradient checkpointing: enabled")
        print("   - Max sequence length: 1024")
else:
    print("[WARNING] No CUDA GPU detected!")
    print("   Please enable GPU: Runtime → Change runtime type → GPU")
    print("   Training will be very slow on CPU.")

# System RAM
ram = psutil.virtual_memory()
print(f"\nSystem RAM: {ram.total / 1e9:.2f} GB")
print(f"System RAM Available: {ram.available / 1e9:.2f} GB")

if IN_COLAB:
    if ram.total / 1e9 < 15:
        print("\n[INFO] Tip: You can request more RAM (up to 25GB) if needed:")
        print("   Runtime → Change runtime type → High-RAM")
    print(f"\nColab Session: Free tier typically ~12GB RAM, Pro up to 52GB")

print("="*60)


In [None]:
# Clone repository and setup paths for Colab
import os
import sys
import subprocess
from pathlib import Path

# Check if running in Colab
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Colab: Clone from GitHub
    REPO_URL = "https://github.com/oleeveeuh/270FT.git"  # UPDATE THIS with your repo URL!
    
    print("="*60)
    print("CLONING REPOSITORY")
    print("="*60)
    print(f"\n[IMPORTANT] Update REPO_URL in this cell with your repository!")
    print(f"   Current: {REPO_URL}")
    print(f"\nCloning repository...")
    
    # Resolve the repo directory before cloning so that re-running this cell
    # (after the os.chdir below) does not clone a nested /content/270FT/270FT.
    cwd = Path.cwd()
    project_root = cwd if cwd.name == "270FT" else cwd / "270FT"

    if not project_root.exists():
        result = subprocess.run(
            ["git", "clone", REPO_URL, str(project_root)],
            capture_output=True,
            text=True
        )
        if result.returncode != 0:
            print(f"[ERROR] Failed to clone repository:")
            print(result.stderr)
            raise RuntimeError("Please update REPO_URL with your repository")
        print("[OK] Repository cloned")
    else:
        print("Repository already exists, skipping clone")
        # Try to pull the latest changes; a failed pull is not fatal
        try:
            subprocess.run(["git", "-C", str(project_root), "pull"], check=False)
        except Exception:
            pass

    os.chdir(project_root)
    print(f"Current directory: {os.getcwd()}")
else:
    # Local: Use existing project
    notebook_dir = Path.cwd()
    # If the notebook runs from the repo's `notebooks/` folder, the repo root is its parent
    if notebook_dir.name == "notebooks":
        project_root = notebook_dir.parent
    else:
        project_root = notebook_dir
    print(f"Local execution - Project root: {project_root}")

# Add project to path
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"\n[OK] Project paths configured")
print(f"Project root: {project_root}")


In [None]:
# Install dependencies
import subprocess
import sys

print("="*60)
print("INSTALLING DEPENDENCIES")
print("="*60)

# Fallback list if requirements.txt is missing or fails.
# z3-solver provides the `z3` module imported in the next cell.
ESSENTIAL_PACKAGES = [
    "torch", "transformers", "peft", "bitsandbytes",
    "datasets", "accelerate", "sympy", "evaluate", "z3-solver",
    "wandb", "pyyaml", "psutil", "pdfplumber", "pypdf"
]

def pip_install(*args):
    # `sys.executable -m pip` installs into the environment this kernel runs in
    return subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", *args],
        capture_output=True,
        text=True
    )

def install_individually(packages):
    for pkg in packages:
        if pip_install(pkg).returncode != 0:
            print(f"  [WARNING] Failed to install {pkg}")

requirements_file = project_root / "requirements.txt"
if requirements_file.exists():
    print(f"\nInstalling from requirements.txt...")
    result = pip_install("-r", str(requirements_file))
    if result.returncode == 0:
        print("[OK] Dependencies installed from requirements.txt")
    else:
        print(f"[WARNING] Some packages may have failed to install")
        print(result.stderr[-2000:])  # tail of pip's error output
        print("Installing essential packages individually...")
        install_individually(ESSENTIAL_PACKAGES)
else:
    print(f"[INFO] requirements.txt not found, installing essential packages...")
    install_individually(ESSENTIAL_PACKAGES)
    print("[OK] Essential package installation finished")

print("="*60)


In [None]:
# Verify installation
try:
    import transformers
    import peft
    import sympy
    import z3  # provided by the z3-solver package
    import wandb
    import yaml
    print("[OK] All required packages installed successfully")
    print(f"  - Transformers: {transformers.__version__}")
    print(f"  - PEFT: {peft.__version__}")
    print(f"  - SymPy: {sympy.__version__}")
except ImportError as e:
    print(f"[ERROR] Import error: {e}")


## 2. Data Preparation


In [None]:
# Create data directories if they don't exist
from pathlib import Path

# Use project root from previous cell
data_raw = project_root / "data" / "raw"
data_processed = project_root / "data" / "processed"

data_raw.mkdir(parents=True, exist_ok=True)
data_processed.mkdir(parents=True, exist_ok=True)

print(f"Data directories:")
print(f"  - Raw: {data_raw}")
print(f"  - Processed: {data_processed}")
print(f"[OK] Directories ready")


In [None]:
# Example: Create sample training data if it doesn't exist
# In practice, you would upload your own data files

import json

sample_train_data = [
    {
        "prompt": "Prove that the sum of the first n natural numbers is n(n+1)/2",
        "response": "[Algorithm Outline]\nUse mathematical induction to prove the formula.\n\n[Pseudocode]\nfunction verify_sum(n):\n    if n == 1:\n        return 1 == 1 * 2 / 2  // Base case\n    // Inductive step: assume true for k, prove for k+1\n\n[Proof Summary]\nBase case (n=1): Sum = 1, formula = 1(2)/2 = 1\nInductive step: Assume sum(1..k) = k(k+1)/2.\nFor k+1: sum(1..k+1) = k(k+1)/2 + (k+1) = (k+1)(k+2)/2"
    },
    {
        "prompt": "Explain the binary search algorithm",
        "response": "[Algorithm Outline]\nBinary search finds an element in a sorted array by repeatedly dividing the search space in half.\n\n[Pseudocode]\nfunction binary_search(arr, target):\n    left = 0\n    right = len(arr) - 1\n    while left <= right:\n        mid = (left + right) // 2\n        if arr[mid] == target:\n            return mid\n        elif arr[mid] < target:\n            left = mid + 1\n        else:\n            right = mid - 1\n    return -1\n\n[Proof Summary]\nTime complexity: O(log n) because we halve the search space each iteration.\nSpace complexity: O(1) for iterative version."
    }
]

sample_test_data = [
    {
        "prompt": "Prove that 1 + 2 + ... + n = n(n+1)/2",
        "response": "[Algorithm Outline]\nMathematical induction proof.\n\n[Pseudocode]\nBase: n=1 → 1 = 1(2)/2 = 1\nInductive: sum(1..k+1) = sum(1..k) + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k+2)/2\n\n[Proof Summary]\nBy mathematical induction, the formula holds for all natural numbers n."
    }
]

# Save sample data (only if files don't exist)
train_path = data_raw / "train.json"
test_path = data_raw / "test.json"

if not train_path.exists():
    with open(train_path, "w") as f:
        json.dump(sample_train_data, f, indent=2)
    print(f"[OK] Created sample training data: {train_path}")
else:
    print(f"Training data already exists: {train_path}")

if not test_path.exists():
    with open(test_path, "w") as f:
        json.dump(sample_test_data, f, indent=2)
    print(f"[OK] Created sample test data: {test_path}")
else:
    print(f"Test data already exists: {test_path}")

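Before preprocessing, it can help to check that raw files follow the same shape as the sample data above: a JSON list of objects with non-empty `prompt` and `response` strings. The sketch below assumes that is the only required schema; `validate_raw_records` and `validate_raw_file` are hypothetical helpers, and `preprocess/load_and_prepare.py` may accept or require other fields.

```python
import json
from pathlib import Path

# Assumed schema, taken from the sample records above
REQUIRED_KEYS = ("prompt", "response")


def validate_raw_records(records):
    """Return a list of problems found in a list of raw training records."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: expected an object, got {type(record).__name__}")
            continue
        for key in REQUIRED_KEYS:
            value = record.get(key)
            # Reject missing keys, non-strings, and whitespace-only strings
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty '{key}'")
    return problems


def validate_raw_file(path):
    """Load a raw JSON file and validate its records."""
    with open(Path(path), "r", encoding="utf-8") as f:
        return validate_raw_records(json.load(f))
```

For example, `validate_raw_file(train_path)` returns an empty list for the sample data written above, and one message per problem otherwise.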

In [None]:
# Check for processed data (preferred) or raw data
processed_train = data_processed / "train.jsonl"
processed_val = data_processed / "validation.jsonl"
processed_test = data_processed / "test.jsonl"

print("="*60)
print("DATA CHECK")
print("="*60)

# Check for processed data first
if processed_train.exists():
    print(f"\n[OK] Found processed training data: {processed_train}")
    with open(processed_train, "r") as f:
        train_count = sum(1 for line in f if line.strip())
    print(f"  Training samples: {train_count}")
else:
    print(f"\n[INFO] No processed training data found at {processed_train}")
    print("  Will look for raw data or you may need to run preprocessing first")

if processed_val.exists():
    with open(processed_val, "r") as f:
        val_count = sum(1 for line in f if line.strip())
    print(f"  Validation samples: {val_count}")
else:
    print(f"  [INFO] No validation data found (optional)")

if processed_test.exists():
    with open(processed_test, "r") as f:
        test_count = sum(1 for line in f if line.strip())
    print(f"  Test samples: {test_count}")
else:
    print(f"  [WARNING] No test data found at {processed_test}")
    print("  Training requires test data. Please add test.jsonl to data/processed/")

# Also check raw data
print(f"\nRaw data directory: {data_raw}")
if data_raw.exists() and any(data_raw.iterdir()):
    raw_files = list(data_raw.iterdir())
    print(f"  Raw files found: {len(raw_files)}")
    if not processed_train.exists():
        print("  [WARNING] You may need to run preprocessing first:")
        print("     python preprocess/load_and_prepare.py")
        print("     or: python cli.py preprocess")

print("="*60)


In [None]:
# Preflight: run preprocessing if processed data missing
import subprocess
import sys
from pathlib import Path

processed_train = data_processed / "train.jsonl"
processed_val = data_processed / "validation.jsonl"
processed_test = data_processed / "test.jsonl"

missing = [p for p in (processed_train, processed_val, processed_test) if not p.exists()]
preprocess_script = project_root / "preprocess" / "load_and_prepare.py"

if not missing:
    print(f"[OK] All processed data present: {processed_train}, {processed_val}, {processed_test}")
else:
    print(f"[INFO] Missing processed files: {[str(p) for p in missing]}")
    # Try to run preprocessing script if available
    if preprocess_script.exists():
        print(f"[INFO] Running preprocessing script: {preprocess_script}")
        result = subprocess.run([sys.executable, str(preprocess_script)], cwd=str(project_root), capture_output=False, text=True)
        if result.returncode == 0:
            print("[OK] Preprocessing completed successfully")
        else:
            print(f"[ERROR] Preprocessing failed with exit code {result.returncode}")
            raise RuntimeError("Preprocessing failed. Check the output above.")
    else:
        print(f"[ERROR] Preprocessing script not found: {preprocess_script}")
        print("Please run preprocessing manually: python preprocess/load_and_prepare.py")


## 3. Configure Training


In [None]:
# Display the training configuration and GPU-specific suggestions (edit configs/training_config.yaml to change values)
import yaml

config_path = project_root / "configs" / "training_config.yaml"
with open(config_path, "r") as f:
    config = yaml.safe_load(f)

print("="*60)
print("TRAINING CONFIGURATION")
print("="*60)
print(f"\nModels to train: {[m['name'] for m in config['models']]}")
print(f"\nTraining Parameters:")
print(f"  Epochs: {config['training']['epochs']}")
print(f"  Learning rate: {config['training']['learning_rate']}")
print(f"  Batch size: {config['training']['batch_size']}")
print(f"  Gradient accumulation steps: {config['training'].get('gradient_accumulation_steps', 1)}")
print(f"  Effective batch size: {config['training']['batch_size'] * config['training'].get('gradient_accumulation_steps', 1)}")
print(f"  Max sequence length: {config['training'].get('max_length', 2048)}")
print(f"  Gradient checkpointing: {config['training'].get('gradient_checkpointing', False)}")
print(f"\nLoRA Parameters:")
print(f"  LoRA rank: {config['training']['lora_r']}")
print(f"  LoRA alpha: {config['training']['lora_alpha']}")
print(f"  LoRA dropout: {config['training']['lora_dropout']}")

# Colab-specific recommendations (larger threshold first, otherwise the A100 branch is unreachable)
if IN_COLAB and torch.cuda.is_available():
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    current_batch = config['training']['batch_size']
    
    print(f"\n[INFO] Colab GPU Recommendations:")
    if gpu_memory >= 30:  # A100
        if current_batch <= 2:
            print(f"   Your GPU has {gpu_memory:.1f}GB VRAM - you can increase batch_size to 4-8")
            print(f"   Edit config: batch_size: 4, gradient_accumulation_steps: 1")
    elif gpu_memory >= 15:  # T4 or better
        if current_batch == 1:
            print(f"   Your GPU has {gpu_memory:.1f}GB VRAM - you can increase batch_size to 2")
            print(f"   Edit config: batch_size: 2, gradient_accumulation_steps: 2")

print("="*60)

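The LoRA rank and alpha printed above determine how large the saved adapter is and how strongly the update is applied. A minimal sketch, assuming PEFT's default scaling of `lora_alpha / r` (rsLoRA uses `lora_alpha / sqrt(r)` instead) and float32 adapter weights; the 4096 x 4096 layer shapes are hypothetical and are not Qwen 2.5 7B's actual projection sizes or target modules.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns A with shape (r, d_in) and B with shape (d_out, r),
    so the layer gains r * d_in + d_out * r parameters.
    """
    return r * (d_in + d_out)


def lora_scaling(lora_alpha, r):
    """Multiplier on the low-rank update: delta_W = (lora_alpha / r) * B @ A."""
    return lora_alpha / r


def adapter_size_mb(layer_shapes, r, bytes_per_param=4):
    """Approximate adapter size in MB for a list of (d_in, d_out) layer shapes.

    bytes_per_param=4 assumes float32 adapter weights; use 2 for fp16/bf16.
    """
    total_params = sum(lora_param_count(d_in, d_out, r) for d_in, d_out in layer_shapes)
    return total_params * bytes_per_param / 1e6


# Hypothetical model: 32 layers, LoRA on two 4096 x 4096 projections per layer
shapes = [(4096, 4096)] * (32 * 2)
print(lora_param_count(4096, 4096, r=16))  # parameters per adapted matrix
print(lora_scaling(lora_alpha=32, r=16))
print(f"{adapter_size_mb(shapes, r=16):.1f} MB")
```

Adapter size grows linearly with `r`, while with a fixed `lora_alpha` a larger `r` lowers the scaling factor, which is why the two are usually tuned together.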

In [None]:
# Optional: Configure W&B for experiment tracking
# Uncomment and run if you want to use Weights & Biases

# import wandb
# wandb.login()
# print("[OK] W&B configured")


## 4. Training Models


In [None]:
# Pre-training notes; training itself is launched by the "Execute training" cell below
# and trains the configured models (currently Qwen 2.5 7B)

print("="*60)
print("STARTING TRAINING PIPELINE")
print("="*60)
print("\n[IMPORTANT] NOTES:")
print("  - Training may take several hours depending on your GPU and dataset size")
print("  - With 8GB VRAM, expect slower training due to memory optimizations")
print("  - Monitor GPU memory usage during training")
print("  - If you run out of memory, reduce batch_size or max_length in config")
print("\n" + "="*60 + "\n")


In [None]:
# Check GPU memory before training. Training blocks this notebook while it runs;
# to watch usage during training, run `nvidia-smi` in a terminal instead.
# mem_get_info() is device-wide; memory_allocated() counts only this notebook's process.

if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    total = total_bytes / 1e9
    used = (total_bytes - free_bytes) / 1e9
    allocated = torch.cuda.memory_allocated(0) / 1e9
    
    print("="*60)
    print("GPU MEMORY STATUS")
    print("="*60)
    print(f"Total VRAM: {total:.2f} GB")
    print(f"Used (all processes): {used:.2f} GB ({used/total*100:.1f}%)")
    print(f"Allocated by this notebook: {allocated:.2f} GB ({allocated/total*100:.1f}%)")
    print(f"Free: {total - used:.2f} GB ({(total-used)/total*100:.1f}%)")
    print("="*60)
    
    if used / total > 0.9:
        print("[WARNING] GPU memory usage is very high!")
        print("   Consider reducing batch_size or max_length if training fails.")
else:
    print("No GPU available for monitoring.")


In [None]:
# Execute training
# This will run the training script directly

import sys
import subprocess
from pathlib import Path

# Clear GPU cache before training
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU cache cleared\n")

# Run training script
training_script = project_root / "training" / "train_dual_lora.py"

if not training_script.exists():
    raise FileNotFoundError(f"Training script not found: {training_script}")

print(f"Running training script: {training_script}\n")
print("="*60)

# Run in a separate process so its GPU memory is released when training exits
result = subprocess.run(
    [sys.executable, str(training_script)],
    cwd=str(project_root),
    capture_output=False,  # Show output in real-time
    text=True
)

if result.returncode == 0:
    print("\n" + "="*60)
    print("[OK] Training completed successfully!")
    print("="*60)
else:
    print("\n" + "="*60)
    print(f"[ERROR] Training failed with exit code {result.returncode}")
    print("="*60)
    raise RuntimeError("Training failed. Check error messages above.")


In [None]:
# Verify models were saved
models_dir = project_root / "models"

print("\n" + "="*60)
print("VERIFYING SAVED MODELS")
print("="*60)

all_models_saved = True
for model_config in config["models"]:
    model_path = models_dir / model_config["output_dir"]
    if model_path.exists():
        files = list(model_path.iterdir())
        adapter_config = model_path / "adapter_config.json"
        adapter_weights = model_path / "adapter_model.safetensors"
        if not adapter_weights.exists():
            adapter_weights = model_path / "adapter_model.bin"
        # A directory without both adapter files is an incomplete save
        complete = adapter_config.exists() and adapter_weights.exists()
        all_models_saved = all_models_saved and complete
        
        print(f"\n{'[OK]' if complete else '[WARNING]'} {model_config['name']}")
        print(f"  Path: {model_path}")
        print(f"  Adapter config: {'OK' if adapter_config.exists() else 'MISSING'}")
        print(f"  Adapter weights: {'OK' if adapter_weights.exists() else 'MISSING'}")
        print(f"  Total files: {len(files)}")
    else:
        print(f"\n[ERROR] {model_config['name']} not found at {model_path}")
        all_models_saved = False

if all_models_saved:
    print("\n" + "="*60)
    print("[OK] All models saved successfully!")
    print("="*60)
else:
    print("\n" + "="*60)
    print("[WARNING] Some models may not have been saved correctly")
    print("="*60)


## 5. Evaluation


In [None]:
# Announce the evaluation step (the script itself runs in the next cell)
print("Running evaluation on test set...")
print("\n" + "="*60)


In [None]:
# Run evaluation script
evaluation_script = project_root / "evaluation" / "evaluate_models.py"

if not evaluation_script.exists():
    raise FileNotFoundError(f"Evaluation script not found: {evaluation_script}")

print(f"Running evaluation script: {evaluation_script}\n")
print("="*60)

# Clear GPU cache before evaluation
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Run as subprocess
result = subprocess.run(
    [sys.executable, str(evaluation_script)],
    cwd=str(project_root),
    capture_output=False,
    text=True,
)

if result.returncode == 0:
    print("\n" + "="*60)
    print("[OK] Evaluation completed successfully!")
    print("="*60)
else:
    print("\n" + "="*60)
    print(f"[ERROR] Evaluation failed with exit code {result.returncode}")
    print("="*60)
    raise RuntimeError("Evaluation failed. Check error messages above.")


In [None]:
# Display evaluation results
results_path = project_root / "results" / "metrics_report.json"

print("\n" + "="*60)
print("EVALUATION RESULTS")
print("="*60)

if results_path.exists():
    with open(results_path, "r") as f:
        results = json.load(f)
    
    if "model_results" in results:
        for model_name, model_results in results["model_results"].items():
            print(f"\nModel: {model_name}")
            print(f"  Exact Match Rate: {model_results.get('exact_match_rate', 0):.4f}")
            print(f"  Symbolic Equivalence Rate: {model_results.get('symbolic_equivalence_rate', 0):.4f}")
            print(f"  Average BLEU Score: {model_results.get('avg_bleu_score', 0):.4f}")
    else:
        print("\nResults structure:")
        print(json.dumps(results, indent=2))
else:
    print(f"\n[WARNING] Results file not found: {results_path}")
    print("Please run the evaluation cell above first.")

print("="*60)

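The report above includes an exact-match rate. A minimal sketch of how such a rate is commonly computed, assuming light whitespace and case normalization; `evaluate_models.py` may normalize differently or compare raw strings, so `normalize` and `exact_match_rate` here are illustrative helpers, not a reimplementation of its metric.

```python
def normalize(text):
    """Assumed normalization: collapse runs of whitespace and lowercase."""
    return " ".join(text.split()).lower()


def exact_match_rate(predictions, references):
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


print(exact_match_rate(["Use  Binary search", "O(n)"], ["use binary search", "O(log n)"]))
```

Exact match is strict for long proofs and pseudocode, which is why the report also tracks symbolic equivalence and BLEU.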

## 6. Interactive Demo


In [None]:
# Load a model and generate a response
# For 8GB VRAM, we'll use 4-bit quantization for inference too

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

def load_model_for_demo(model_name, adapter_path, device="cuda", use_quantization=True):
    """Load model with adapter for interactive use (memory-optimized)."""
    print(f"Loading {model_name}...")
    print(f"  Adapter: {adapter_path}")
    print(f"  Device: {device}")
    print(f"  Quantization: {use_quantization}")
    
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    
    # Use 4-bit quantization for inference on 8GB GPUs
    if use_quantization and device == "cuda":
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
        )
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=bnb_config,
            device_map="auto",
            trust_remote_code=True,
        )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.float16 if device == "cuda" else torch.float32,
            device_map="auto" if device == "cuda" else None,
            trust_remote_code=True,
        )
    
    # Load LoRA adapter
    model = PeftModel.from_pretrained(model, str(adapter_path))
    model.eval()
    
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    
    # Check memory usage
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated(0) / 1e9
        print(f"  GPU Memory Allocated: {allocated:.2f} GB")
    
    print(f"[OK] Model loaded successfully")
    return model, tokenizer

def generate_response(model, tokenizer, question, max_new_tokens=512, max_length=1024):
    """Generate response to a question."""
    prompt = f"### Question:\n{question}\n\n### Solution:\n"
    
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=max_length).to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    
    generated = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return generated.strip()

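`generate_response` builds its prompt from a fixed template, and the model only learns to continue that template if the training examples used the same one. The sketch below assumes preprocessing turns raw `{"prompt", "response"}` records into `template + response + EOS`; `build_training_text` is a hypothetical helper, so check `preprocess/load_and_prepare.py` for the format actually used.

```python
# Same template that generate_response uses at inference time
PROMPT_TEMPLATE = "### Question:\n{question}\n\n### Solution:\n"


def build_prompt(question):
    """Inference prompt, identical to the f-string in generate_response."""
    return PROMPT_TEMPLATE.format(question=question)


def build_training_text(record, eos_token=""):
    """Hypothetical training string for one raw record.

    Assumes records with "prompt"/"response" keys (as in data/raw/train.json)
    and that an EOS token is appended so the model learns where to stop.
    """
    return build_prompt(record["prompt"]) + record["response"] + eos_token


example = {
    "prompt": "Explain the binary search algorithm",
    "response": "[Algorithm Outline]\nRepeatedly halve the sorted search range.",
}
print(build_training_text(example, eos_token="<eos>"))
```

In practice `eos_token` would be `tokenizer.eos_token`; the `"<eos>"` string above is a placeholder.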

In [None]:
# Load first available model
models_dir = project_root / "models"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Clear GPU cache before loading
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Try to load the first model from config
model_loaded = False
model = None
tokenizer = None
current_model_name = None

print("="*60)
print("LOADING MODEL FOR INFERENCE")
print("="*60)

for model_config in config["models"]:
    adapter_path = models_dir / model_config["output_dir"]
    
    if adapter_path.exists() and (adapter_path / "adapter_config.json").exists():
        try:
            # Quantize on any GPU: a 7B model in fp16 needs at least ~14 GB for its
            # weights alone, which does not fit in 8GB and leaves little headroom on a 16GB T4
            use_quantization = torch.cuda.is_available()
            model, tokenizer = load_model_for_demo(
                model_config["name"],
                adapter_path,
                device=device,
                use_quantization=use_quantization
            )
            model_loaded = True
            current_model_name = model_config["name"]
            print(f"\n[OK] Successfully loaded: {current_model_name}")
            break
        except Exception as e:
            print(f"\n[ERROR] Failed to load {model_config['name']}: {e}")
            import traceback
            traceback.print_exc()
            continue

if not model_loaded:
    print("\n[ERROR] No trained models found. Please run training first (Section 4).")
    print("="*60)
else:
    print("="*60)


In [None]:
# Test the model with a sample question
if model_loaded:
    test_question = "Prove that the sum of the first n natural numbers is n(n+1)/2"
    
    print(f"Question: {test_question}\n")
    print("Generating response...\n")
    
    response = generate_response(model, tokenizer, test_question)
    
    print("Response:")
    print("="*60)
    print(response)
    print("="*60)


### Interactive Query Interface


In [None]:
# Interactive cell - modify the question and run
if model_loaded:
    # Change this question to test different queries
    your_question = "Explain the quicksort algorithm"
    
    print(f"Question: {your_question}\n")
    print("Generating response...\n")
    
    response = generate_response(model, tokenizer, your_question, max_new_tokens=1024)
    
    print("Response:")
    print("="*60)
    print(response)
    print("="*60)
else:
    print("Please load a model first.")

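`generate_response` samples with `temperature=0.7`. Conceptually, temperature divides the logits before the softmax: values below 1 sharpen the distribution toward the most likely token, and values above 1 flatten it. A minimal stdlib sketch of that single step; Transformers applies it as one of several logits processors, and the model's generation config may add settings such as top-k or top-p.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into sampling probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lowering the temperature in `generate_response` makes answers more repeatable; raising it gives more varied, but less reliable, proofs.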

## 7. Download Models (Optional)


In [None]:
# Optional: Save models to Google Drive (persistent storage)
# This keeps your models even after Colab session ends

if IN_COLAB:
    try:
        from google.colab import drive
        import shutil
        
        print("="*60)
        print("SAVING TO GOOGLE DRIVE")
        print("="*60)
        
        # Mount Google Drive
        print("\nMounting Google Drive...")
        drive.mount('/content/drive')
        
        # Create directory in Drive
        drive_models_dir = Path("/content/drive/MyDrive/270FT_models")
        drive_models_dir.mkdir(parents=True, exist_ok=True)
        
        # Copy models
        models_dir = project_root / "models"
        if models_dir.exists():
            print(f"\nCopying models to Google Drive...")
            print(f"  From: {models_dir}")
            print(f"  To: {drive_models_dir}")
            
            # Copy each model directory
            for model_subdir in models_dir.iterdir():
                if model_subdir.is_dir():
                    dest = drive_models_dir / model_subdir.name
                    if dest.exists():
                        shutil.rmtree(dest)
                    shutil.copytree(model_subdir, dest)
                    print(f"  [OK] Copied: {model_subdir.name}")
            
            print(f"\n[OK] Models saved to Google Drive!")
            print(f"  Location: {drive_models_dir}")
        else:
            print("[WARNING] No models found to copy")
            
    except Exception as e:
        print(f"[INFO] Google Drive save failed: {e}")
        print("  You can manually copy models or use the download cell below")
else:
    print("Not running in Colab - skipping Google Drive save")


In [None]:
# Download trained models (Colab)
# This creates a compressed archive and downloads it

if IN_COLAB:
    import shutil
    from datetime import datetime
    
    models_dir = project_root / "models"
    
    print("="*60)
    print("DOWNLOADING TRAINED MODELS")
    print("="*60)
    
    if models_dir.exists() and any(models_dir.iterdir()):
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        archive_name = f"trained_models_{timestamp}"
        
        print(f"\nCreating archive...")
        archive_path = Path("/content") / f"{archive_name}.zip"
        shutil.make_archive(
            str(Path("/content") / archive_name),
            "zip",
            models_dir
        )
        
        archive_size = archive_path.stat().st_size / 1e9
        print(f"[OK] Archive created: {archive_path}")
        print(f"  Size: {archive_size:.2f} GB")
        
        # Auto-download
        try:
            from google.colab import files
            print(f"\n[DOWNLOAD] Downloading archive...")
            files.download(str(archive_path))
            print(f"[OK] Download started!")
        except Exception as e:
            print(f"\n[INFO] Auto-download failed: {e}")
            print(f"   Run manually:")
            print(f"   from google.colab import files")
            print(f"   files.download('{archive_path}')")
    else:
        print("[WARNING] Models directory not found or empty.")
        print(f"  Expected: {models_dir}")
else:
    print(f"Not running in Colab - trained adapters are already on disk in {project_root / 'models'}")


In [None]:
# Manual fallback if the automatic download in the previous cell did not start
# (archive_path is set by that cell)
# from google.colab import files
# files.download(str(archive_path))


## 8. Cleanup (Optional)


In [None]:
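# Drop references to the demo model so its GPU memory can actually be released
import gc
model = None
tokenizer = None
model_loaded = False
gc.collect()

# Clear GPU memory
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU cache cleared")

# Optionally delete models to free up disk space
# import shutil
# shutil.rmtree(project_root / "models", ignore_errors=True)
# print("Models directory deleted")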


---

## Notes

### Colab Runtime Information
- **Free Tier**: T4 GPU (~16GB VRAM), ~12GB RAM, ~12 hour sessions
- **Pro Tier**: V100 (16GB) or A100 (40GB) GPUs, up to 52GB RAM, ~24 hour sessions
- **Current Config**: Optimized for T4 (16GB VRAM) with batch_size=2

### Memory Requirements
- **Training**: ~8-10GB VRAM with current settings (batch_size=2, gradient checkpointing)
- **Inference**: ~4-5GB VRAM with 4-bit quantization
- **Training Time**: Expect 3-6 hours per model on Colab T4
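The inference figure above can be sanity-checked with weight-only arithmetic: memory is about parameters × bits per parameter / 8. A sketch assuming a nominal 7e9 parameters; it ignores the CUDA context, activations, KV cache, LoRA weights, and quantization constants, which is why the measured figure is higher than the 4-bit result. The same arithmetic shows why an fp16 copy of the model does not fit in 8 GB.

```python
def weight_memory_gb(num_params, bits_per_param):
    """GB (1e9 bytes) needed just to store the weights at the given precision."""
    return num_params * bits_per_param / 8 / 1e9


NOMINAL_PARAMS = 7e9  # nominal "7B"; the real checkpoint's exact count differs
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB")
```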

### Configuration
- **Batch Size**: 2 (with gradient accumulation of 2 for effective batch size of 4)
- **Max Length**: 1024 tokens (can increase to 2048 on A100)
- **Gradient Checkpointing**: Enabled (trades roughly 20-30% extra compute for lower memory)
- **Quantization**: 4-bit NF4 for both training and inference
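Gradient accumulation is what lets a small per-device batch stand in for a larger effective batch (`batch_size × gradient_accumulation_steps`). A stdlib sketch on a one-parameter least-squares model, assuming a mean-reduced loss and equal-sized micro-batches: summing micro-batch gradients, each scaled by 1 / number of micro-batches, reproduces the full-batch gradient, roughly what a training loop does before its single `optimizer.step()`. Only one micro-batch's activations are alive at a time, which is where the memory saving comes from; with token-level losses over sequences of different lengths the match is only approximate.

```python
def grad_mse(w, xs, ys):
    """d/dw of mean((w * x - y) ** 2) over the given samples."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, each scaled by 1 / number of micro-batches."""
    micro_batches = [
        (xs[i:i + micro_batch_size], ys[i:i + micro_batch_size])
        for i in range(0, len(xs), micro_batch_size)
    ]
    steps = len(micro_batches)  # plays the role of gradient_accumulation_steps
    total = 0.0
    for mx, my in micro_batches:
        total += grad_mse(w, mx, my) / steps
    return total


xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.2], 0.5
print(grad_mse(w, xs, ys))             # full batch of 4
print(accumulated_grad(w, xs, ys, 1))  # batch_size=1, 4 accumulation steps
print(accumulated_grad(w, xs, ys, 2))  # batch_size=2, 2 accumulation steps
```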

### Colab-Specific Tips

1. **Session Management**:
   - Colab sessions disconnect after ~90 min of inactivity
   - Save checkpoints frequently (configured to save every 500 steps)
   - Use Google Drive to persist models (see Section 7)
   - Consider Colab Pro for longer sessions and better GPUs

2. **Out of Memory (OOM) Errors**:
   - Reduce `batch_size` to 1 in `training_config.yaml`
   - Reduce `max_length` to 512
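   - If system RAM (not GPU memory) runs out, request a High-RAM runtime: Runtime → Change runtime type → High-RAM
   - Release GPU memory held by this notebook (e.g., a model loaded in Section 6) or restart the runtime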

3. **Repository Setup**:
   - **IMPORTANT**: Update `REPO_URL` in the clone cell of Section 1 with your GitHub repository
   - Or upload your project files directly to Colab

4. **Data Upload**:
   - Upload data files using Colab's file browser (left sidebar)
   - Or use: `from google.colab import files; files.upload()`
   - Or mount Google Drive and copy from there

5. **Model Persistence**:
   - Colab files are deleted when session ends
   - Always download models or save them to Google Drive (Section 7)
   - Models are ~50-100MB (LoRA adapters only)

6. **Training Interrupted**:
   - Checkpoints are saved every 500 steps
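   - Training will resume from the last checkpoint if you restart the runtime, provided the checkpoint files still exist (they are lost when the Colab session ends unless copied to Google Drive)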
   - Or load adapter and continue training manually
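If the training script uses Hugging Face `Trainer` with its default `checkpoint-<step>` directory naming (an assumption; check `train_dual_lora.py` and its `output_dir`), the newest checkpoint can be located as sketched below. Transformers also ships `transformers.trainer_utils.get_last_checkpoint` for the same purpose, and the result can be passed to `Trainer.train(resume_from_checkpoint=...)`.

```python
import re
from pathlib import Path

# Assumed naming: Trainer writes checkpoints as "checkpoint-<global_step>"
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        return None
    candidates = []
    for child in output_dir.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if match and child.is_dir():
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare steps numerically so checkpoint-1000 beats checkpoint-500
    return max(candidates, key=lambda item: item[0])[1]
```

For example, `latest_checkpoint(models_dir / model_config["output_dir"])` would return the newest checkpoint, assuming checkpoints are written next to the adapter.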

### Troubleshooting

1. **Out of Memory (OOM) Errors**:
   - Reduce `max_length` in `training_config.yaml` (try 512)
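   - Reduce `batch_size` to 1 and raise `gradient_accumulation_steps` to keep the effective batch size (the number of accumulation steps has little effect on peak memory)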
   - Ensure no other processes are using GPU memory
   - Close other notebooks/applications using GPU

2. **Model Not Found**:
   - Ensure training completed successfully (check the training output in Section 4)
   - Verify adapter files exist in `models/` directory

3. **Import Errors**:
   - Restart the kernel and re-run the setup cells (Section 1)
   - Ensure all dependencies are installed (install cell in Section 1)

4. **Training Fails**:
   - Check that training data exists in `data/processed/train.jsonl` or `data/raw/train.json`
   - Verify data format is correct (see README for format specifications)
   - Check GPU memory before training (first cell of Section 1)

5. **Slow Training**:
   - This is expected with 8GB VRAM due to memory optimizations
   - Gradient checkpointing adds ~20-30% overhead but enables training on limited memory
   - Consider using a cloud GPU with more VRAM for faster training

### Next Steps

- Experiment with different LoRA hyperparameters (rank, alpha)
- Try different base models (Qwen 2.5 1.5B for even lower memory usage)
- Add more training data for better performance
- Fine-tune the evaluation metrics
- Use the CLI (`python cli.py query`) for interactive queries
