# LoRA (Low-Rank Adaptation) Concepts Demo

This notebook demonstrates the key concepts behind LoRA fine-tuning using practical examples with Ollama.

## Learning Objectives
1. Understand the mathematical foundation of LoRA
2. Compare parameter efficiency of LoRA vs traditional fine-tuning
3. Explore LoRA hyperparameters and their impact
4. Implement a simple LoRA demonstration

## 1. What is LoRA?

**Low-Rank Adaptation (LoRA)** is a parameter-efficient fine-tuning technique that allows us to adapt large pre-trained models to specific tasks without modifying all the original parameters.

### Key Insight
LoRA is based on the hypothesis that the weight updates learned during fine-tuning have a low "intrinsic rank". This means we can represent each update as the product of two much smaller matrices.
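
To make the low-rank hypothesis concrete, the sketch below builds a synthetic update matrix that is a rank-8 signal plus small noise, then measures how well a truncated SVD of a given rank reconstructs it. The matrix is an illustrative assumption, not data from a real fine-tuning run, and `low_rank_approx_error` is a hypothetical helper name.

```python
import numpy as np

def low_rank_approx_error(delta_w, r):
    """Relative Frobenius error of the best rank-r approximation of delta_w."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    # Keep only the r largest singular values and their singular vectors
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]
    return np.linalg.norm(delta_w - approx) / np.linalg.norm(delta_w)

rng = np.random.default_rng(0)
d, true_rank = 256, 8

# Synthetic update: rank-8 signal plus small noise (assumed for illustration)
delta_w = rng.normal(size=(d, true_rank)) @ rng.normal(size=(true_rank, d))
delta_w += 0.01 * rng.normal(size=(d, d))

for r in (2, 8, 32):
    print(f"rank {r:2d}: relative error = {low_rank_approx_error(delta_w, r):.4f}")
```

Once the approximation rank reaches the rank of the underlying signal, the error drops to roughly the noise level, which is the situation LoRA assumes holds for real fine-tuning updates.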

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import json
import os

# Set up plotting style
plt.style.use('default')
plt.rcParams['figure.figsize'] = (10, 6)

print("Environment setup complete!")

## 2. Mathematical Foundation

In traditional fine-tuning, we update the entire weight matrix:
```
W_new = W_original + ΔW
```

LoRA decomposes the update matrix ΔW into two smaller matrices:
```
ΔW = A × B
```
Where:
- A has shape (d, r)
- B has shape (r, d)
- r << d (the rank is much smaller than the dimension)

W_original stays frozen during training; only A and B are updated.
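
A small NumPy check of this decomposition, using the same shape convention (A is `(d, r)`, B is `(r, d)`) and row-vector inputs. The sizes and random values are arbitrary assumptions. It shows that the adapted layer can be evaluated without ever building the full d × d update.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8

W = rng.normal(size=(d, d))          # frozen pre-trained weight
A = rng.normal(size=(d, r)) * 0.01   # trainable factor, shape (d, r)
B = rng.normal(size=(r, d)) * 0.01   # trainable factor, shape (r, d)
x = rng.normal(size=(4, d))          # batch of 4 row-vector inputs

# Option 1: materialize the full d x d update, then apply it
y_full = x @ (W + A @ B)

# Option 2: keep the factors separate and route x through the rank-r bottleneck
y_lora = x @ W + (x @ A) @ B

print("Outputs match:", np.allclose(y_full, y_lora))
print("Rank of A @ B:", np.linalg.matrix_rank(A @ B))
```

The second form is what makes LoRA cheap to train: the extra path costs two thin matrix products instead of one d × d product.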

In [None]:
def demonstrate_lora_decomposition(d=1024, r=16):
    """
    Demonstrate how LoRA decomposes a large matrix into smaller ones.
    """
    print(f"Original weight matrix size: {d} × {d} = {d*d:,} parameters")
    print(f"LoRA rank: {r}")
    
    # LoRA matrices
    A_params = d * r
    B_params = r * d
    total_lora_params = A_params + B_params
    
    print(f"\nLoRA matrix A size: {d} × {r} = {A_params:,} parameters")
    print(f"LoRA matrix B size: {r} × {d} = {B_params:,} parameters")
    print(f"Total LoRA parameters: {total_lora_params:,}")
    
    # Calculate efficiency
    original_params = d * d
    reduction = (1 - total_lora_params / original_params) * 100
    
    print(f"\nParameter reduction: {reduction:.2f}%")
    print(f"Parameter ratio: {original_params / total_lora_params:.1f}x fewer trainable parameters")
    
    return {
        'original_params': original_params,
        'lora_params': total_lora_params,
        'reduction_percent': reduction
    }

# Demonstrate with typical transformer dimensions
result = demonstrate_lora_decomposition(d=4096, r=16)
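
The function above assumes a square d × d weight. Many transformer weights are rectangular, so a slightly more general count is useful. The sketch below is a hypothetical helper, and the 4096 × 11008 shape is an assumed Llama-7B-style MLP projection, used only for illustration.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one LoRA-adapted matrix of shape (d_in, d_out).

    A has shape (d_in, r) and B has shape (r, d_out).
    """
    return r * (d_in + d_out)

shapes = {
    "attention projection": (4096, 4096),
    "MLP projection (assumed shape)": (4096, 11008),
}

for name, (d_in, d_out) in shapes.items():
    full = d_in * d_out
    lora = lora_param_count(d_in, d_out, r=16)
    print(f"{name}: {lora:,} LoRA params vs {full:,} full ({full / lora:.0f}x fewer)")
```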


## 3. Visual Comparison: Traditional vs LoRA Fine-tuning

In [None]:
def plot_parameter_comparison():
    """
    Visualize parameter efficiency across different model sizes and ranks.
    """
    model_sizes = [1e9, 7e9, 13e9, 30e9, 65e9]  # 1B, 7B, 13B, 30B, 65B parameters
    model_names = ['1B', '7B', '13B', '30B', '65B']
    ranks = [4, 8, 16, 32, 64]
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Plot 1: Parameter reduction by model size (rank=16)
    rank = 16
    reductions = []
    for size in model_sizes:
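        # Rough estimate of LoRA parameters (simplified calculation)
        # Assumes 32 layers for every model size and ~12 * hidden_size^2 weights per
        # transformer layer (attention + MLP), so hidden_size ≈ sqrt(size / (32 * 12)).
        # LoRA is then applied to 4 square matrices per layer.
        hidden_size = int((size / (32 * 12)) ** 0.5)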
        lora_params = 32 * 4 * (hidden_size * rank + rank * hidden_size)
        reduction = (1 - lora_params / size) * 100
        reductions.append(reduction)
    
    ax1.bar(model_names, reductions, color='skyblue', alpha=0.7)
    ax1.set_title(f'Parameter Reduction by Model Size (rank={rank})')
    ax1.set_ylabel('Reduction (%)')
    ax1.set_xlabel('Model Size')
    ax1.grid(True, alpha=0.3)
    
    # Plot 2: Parameter reduction by rank (7B model)
    model_size = 7e9
    hidden_size = 4096  # Typical for 7B models
    reductions_by_rank = []
    
    for r in ranks:
        lora_params = 32 * 4 * (hidden_size * r + r * hidden_size)
        reduction = (1 - lora_params / model_size) * 100
        reductions_by_rank.append(reduction)
    
    ax2.plot(ranks, reductions_by_rank, marker='o', linewidth=2, markersize=8, color='orange')
    ax2.set_title('Parameter Reduction by LoRA Rank (7B Model)')
    ax2.set_ylabel('Reduction (%)')
    ax2.set_xlabel('LoRA Rank')
    ax2.grid(True, alpha=0.3)
    ax2.set_xticks(ranks)
    
    plt.tight_layout()
    plt.show()

plot_parameter_comparison()

## 4. LoRA Hyperparameters

The key hyperparameters in LoRA are:

1. **Rank (r)**: Controls the size of the adaptation matrices
2. **Alpha (α)**: Scaling factor for the LoRA weights
3. **Target Modules**: Which layers to apply LoRA to
4. **Dropout**: Regularization for LoRA layers
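
To show where rank, alpha, and dropout enter the computation, here is a toy LoRA-adapted linear layer in NumPy (forward pass only, no training loop). The class name, the zero initialization of `B`, and applying dropout only to the adapter input are assumptions that follow common practice. This is a sketch, not the implementation of any particular library.

```python
import numpy as np

class LoRALinear:
    """Toy LoRA-adapted linear layer for row-vector inputs."""

    def __init__(self, weight, r=8, alpha=16, dropout=0.0, seed=0):
        self.rng = np.random.default_rng(seed)
        d_in, d_out = weight.shape
        self.weight = weight                                   # frozen base weight
        self.A = self.rng.normal(scale=0.01, size=(d_in, r))   # trainable
        self.B = np.zeros((r, d_out))                          # trainable, zero init
        self.scaling = alpha / r                               # alpha rescales the update
        self.dropout = dropout

    def forward(self, x, training=False):
        h = x
        if training and self.dropout > 0:
            # Inverted dropout, applied to the adapter branch only
            mask = self.rng.random(x.shape) >= self.dropout
            h = x * mask / (1.0 - self.dropout)
        return x @ self.weight + self.scaling * (h @ self.A) @ self.B

    def merged_weight(self):
        """Fold the adapter into the base weight for inference."""
        return self.weight + self.scaling * self.A @ self.B

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))
x = rng.normal(size=(2, 64))
layer = LoRALinear(W, r=4, alpha=8)

# With B at zero, the adapted layer starts out identical to the base layer
print("Starts as base model:", np.allclose(layer.forward(x), x @ W))

# Pretend training moved B away from zero, then check that merging is equivalent
layer.B = rng.normal(scale=0.01, size=layer.B.shape)
print("Merged weight matches:", np.allclose(layer.forward(x), x @ layer.merged_weight()))
```

Because the update is multiplied by `alpha / r`, doubling the rank at a fixed alpha halves the scale applied to `A @ B`, which is why the two values are usually chosen together.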

In [None]:
def analyze_lora_hyperparameters():
    """
    Analyze the impact of different LoRA hyperparameters.
    """
    print("LoRA Hyperparameter Analysis\n" + "="*40)
    
    # Rank analysis
    print("\n1. RANK (r) - Controls adapter capacity:")
    ranks = [4, 8, 16, 32, 64]
    hidden_size = 4096
    
    for r in ranks:
        params_per_layer = 2 * hidden_size * r  # A and B matrices
        print(f"   Rank {r:2d}: {params_per_layer:,} params per adapted layer")
    
    print("\n   Guidelines:")
    print("   - Lower rank (4-8): Faster, less memory, may underfit")
    print("   - Higher rank (32-64): More capacity, may overfit")
    print("   - Common starting point: 16-32, then tune for your task")
    
    # Alpha analysis
    print("\n2. ALPHA (α) - Scaling factor:")
    print("   - Controls the magnitude of LoRA updates")
    print("   - The update is scaled by α / r (a scaling factor, not a learning rate)")
    print("   - Common values: 16, 32, 64")
    print("   - Higher α at a fixed rank = stronger adaptation")
    
    # Target modules
    print("\n3. TARGET MODULES - Which layers to adapt:")
    modules = {
        "q_proj, v_proj": "Query and Value (minimal)",
        "q_proj, k_proj, v_proj, o_proj": "All attention (common)",
        "All linear layers": "Maximum adaptation (expensive)"
    }
    
    for module, description in modules.items():
        print(f"   - {module}: {description}")
    
    return "Analysis complete!"

analyze_lora_hyperparameters()
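
The cost of each target-module choice can be estimated directly. The sketch below assumes Llama-7B-style dimensions (hidden size 4096, MLP size 11008, 32 layers) and Llama-style names for the MLP projections (`gate_proj`, `up_proj`, `down_proj`). Both are assumptions for illustration, so check your own model's configuration before relying on the numbers.

```python
HIDDEN, INTERMEDIATE, LAYERS = 4096, 11008, 32  # assumed Llama-7B-style dimensions

# (d_in, d_out) of each linear layer inside one transformer block (assumed)
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def trainable_params(target_modules, r):
    """Total LoRA parameters when adapting the given modules in every layer."""
    per_layer = sum(r * sum(SHAPES[name]) for name in target_modules)
    return per_layer * LAYERS

choices = {
    "q_proj, v_proj": ["q_proj", "v_proj"],
    "all attention": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "all linear layers": list(SHAPES),
}

for label, modules in choices.items():
    print(f"{label:18s}: {trainable_params(modules, r=16):>12,} trainable params")
```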

## 5. Practical Example: Creating a Custom Ollama Model

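Let's build a practical example with Ollama. Note that the Modelfile below does not train any weights: it uses a system prompt and sampling parameters as a stand-in for a LoRA-adapted model.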

In [None]:
def create_ollama_modelfile(model_name="lora-ml-tutor", base_model="llama3.2:1b"):
    """
    Create an Ollama Modelfile that simulates a LoRA-tuned model.
    """
    modelfile_content = f"""# {model_name.upper()} - Simulated LoRA-style Model
# Base model: {base_model}
# Simulated LoRA adaptation for ML education (no weights are changed)

FROM {base_model}

# Sampling parameters (these are not LoRA hyperparameters)
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1

# System prompt optimized for ML education (simulating LoRA adaptation)
SYSTEM \"\"\"You are an expert machine learning tutor specialized in explaining complex ML concepts clearly and concisely. You:

1. Break down complex topics into digestible parts
2. Use analogies and examples to illustrate concepts
3. Provide mathematical foundations when relevant
4. Focus on practical applications and implementations
5. Encourage hands-on learning and experimentation

Always structure your responses with clear headings and examples.\"\"\"

# Template for consistent formatting
TEMPLATE \"\"\"{{{{ if .System }}}}<|start_header_id|>system<|end_header_id|>

{{{{ .System }}}}<|eot_id|>{{{{ end }}}}{{{{ if .Prompt }}}}<|start_header_id|>user<|end_header_id|>

{{{{ .Prompt }}}}<|eot_id|>{{{{ end }}}}<|start_header_id|>assistant<|end_header_id|>

\"\"\"
"""
    
    # Save to file
    os.makedirs('../models', exist_ok=True)
    modelfile_path = f'../models/Modelfile.{model_name}'
    
    with open(modelfile_path, 'w') as f:
        f.write(modelfile_content)
    
    print(f"Modelfile created: {modelfile_path}")
    print("\nTo create the model in Ollama, run:")
    print(f"ollama create {model_name} -f {modelfile_path}")
    
    return modelfile_content

# Create the modelfile
modelfile = create_ollama_modelfile()
print("\nModelfile content:")
print("=" * 50)
print(modelfile[:500] + "...")
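
The Modelfile above changes only the system prompt and sampling parameters. Once a real adapter has been trained, Ollama's Modelfile format provides an `ADAPTER` instruction for layering it on a base model. The sketch below only builds such a file as a string. The helper name and the adapter path are hypothetical placeholders, and the adapter formats Ollama accepts should be checked against its documentation.

```python
def build_adapter_modelfile(base_model, adapter_path, temperature=0.7):
    """Return Modelfile text that layers a LoRA adapter on top of a base model."""
    lines = [
        f"FROM {base_model}",
        f"ADAPTER {adapter_path}",  # path to a trained adapter (hypothetical here)
        f"PARAMETER temperature {temperature}",
    ]
    return "\n".join(lines) + "\n"

# The adapter path below is a placeholder, not a file this notebook creates
adapter_modelfile = build_adapter_modelfile("llama3.2:1b", "./adapters/ml-tutor")
print(adapter_modelfile)
```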

## 6. Sample Training Data for LoRA

Let's create a sample dataset that could be used for LoRA fine-tuning.

In [None]:
def create_ml_training_dataset():
    """
    Create a sample dataset for ML education fine-tuning.
    """
    training_data = [
        {
            "instruction": "Explain what LoRA (Low-Rank Adaptation) is and why it's useful.",
            "input": "",
            "output": "LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that allows us to adapt large pre-trained models to specific tasks without modifying all the original parameters. Instead of updating the entire weight matrix W, LoRA decomposes the weight update ΔW into two smaller matrices A and B, where ΔW = A × B. This approach reduces trainable parameters by well over 90% (about 99% for a 4096 × 4096 matrix at rank 16) while typically staying close to the quality of full fine-tuning, making fine-tuning faster and more memory-efficient."
        },
        {
            "instruction": "What are the key hyperparameters in LoRA and how do they affect training?",
            "input": "",
            "output": "The key LoRA hyperparameters are: 1) Rank (r): Controls the size of adaptation matrices - lower ranks (4-8) are faster but may underfit, higher ranks (32-64) provide more capacity but may overfit. 2) Alpha (α): Scaling factor that controls the magnitude of LoRA updates, with common values being 16, 32, or 64. 3) Target modules: Determines which layers to adapt - typically attention layers (q_proj, k_proj, v_proj, o_proj) for optimal balance of performance and efficiency."
        },
        {
            "instruction": "Compare traditional fine-tuning with LoRA fine-tuning.",
            "input": "",
            "output": "Traditional fine-tuning updates all model parameters, requiring full model storage and high computational resources. LoRA fine-tuning freezes the original weights and only trains small adapter matrices, cutting trainable parameters by well over 90% and lowering the memory needed for gradients and optimizer state; the exact memory and speed gains depend on the model and training setup. LoRA also enables modular adaptation - you can switch between different task-specific adapters using the same base model, while traditional fine-tuning requires separate model copies for each task."
        },
        {
            "instruction": "How does the mathematical foundation of LoRA work?",
            "input": "",
            "output": "LoRA is based on the hypothesis that weight updates during fine-tuning have low intrinsic rank. Mathematically, instead of learning a full d×d update ΔW in W_new = W_original + ΔW, LoRA decomposes ΔW = A × B where A has shape (d×r) and B has shape (r×d), with r << d. During training, only A and B are updated while W_original remains frozen. For inference, the adapter can be merged into the base weights: W_final = W_original + (α / r) × A × B, where α / r is the scaling factor."
        },
        {
            "instruction": "What are the practical benefits of using LoRA for model deployment?",
            "input": "",
            "output": "LoRA offers several deployment advantages: 1) Storage efficiency - you only need to store small adapter files (MBs) instead of full model copies (GBs) for each task. 2) Memory efficiency - reduced GPU memory requirements during training, because gradients and optimizer state are kept only for the adapter. 3) Modularity - easily switch between different specialized adapters on the same base model. 4) Cost reduction - lower computational requirements mean reduced cloud costs. 5) Faster iteration - quicker training cycles enable rapid experimentation and deployment of task-specific models."
        }
    ]
    
    # Save dataset
    os.makedirs('../data', exist_ok=True)
    dataset_path = '../data/ml_education_dataset.json'
    
    with open(dataset_path, 'w') as f:
        json.dump(training_data, f, indent=2)
    
    print(f"Training dataset created: {dataset_path}")
    print(f"Dataset contains {len(training_data)} training examples")
    
    # Display first example
    print("\nSample training example:")
    print("=" * 30)
    example = training_data[0]
    print(f"Instruction: {example['instruction']}")
    print(f"Output: {example['output'][:200]}...")
    
    return training_data

dataset = create_ml_training_dataset()
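
Before records like these reach a trainer, they are usually validated and flattened into a single text string. The sketch below uses an Alpaca-style layout with `### Instruction:` and `### Response:` headers. That layout and the helper name are assumptions, since the right template depends on the base model and training library.

```python
REQUIRED_KEYS = ("instruction", "input", "output")

def format_example(example):
    """Render one instruction record as a single training string."""
    missing = [key for key in REQUIRED_KEYS if key not in example]
    if missing:
        raise ValueError(f"missing keys: {missing}")

    parts = [f"### Instruction:\n{example['instruction']}"]
    if example["input"]:
        # Only include the input section when the record actually has one
        parts.append(f"### Input:\n{example['input']}")
    parts.append(f"### Response:\n{example['output']}")
    return "\n\n".join(parts)

sample = {
    "instruction": "Define LoRA rank.",
    "input": "",
    "output": "The inner dimension r shared by the two adapter matrices.",
}
print(format_example(sample))
```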

## 7. Summary and Next Steps

In this notebook, we've covered:

1. **LoRA Fundamentals**: Mathematical foundation and key concepts
2. **Parameter Efficiency**: Dramatic reduction in trainable parameters
3. **Hyperparameters**: Rank, alpha, and target modules
4. **Practical Implementation**: Ollama Modelfile creation
5. **Training Data**: Sample dataset for ML education

### Next Steps:
1. Run the actual LoRA training (see `02_lora_training.ipynb`)
2. Deploy the fine-tuned model to Ollama
3. Compare base model vs fine-tuned model performance
4. Experiment with different hyperparameters

### Key Takeaways:
- LoRA cuts trainable parameters by well over 90% (99.2% for a 4096 × 4096 matrix at rank 16)
- Rank and alpha are the most important hyperparameters to tune
- LoRA adapters are modular and can be easily swapped
- The technique works well for domain-specific adaptations

In [None]:
print("🎉 LoRA Concepts Demo Complete!")
print("\nFiles created:")
print("- ../models/Modelfile.lora-ml-tutor")
print("- ../data/ml_education_dataset.json")
print("\nNext: Open 02_lora_training.ipynb for hands-on training!")