# 🧩 Mini-Lab: Comparing LLM Models

**Module 2: LLM Core Concepts** | **Duration: ~45 min** | **Type: Mini-Lab**

---

## Learning Objectives

By the end of this mini-lab, you will be able to:

1. **Compare** responses from different LLM providers and models
2. **Evaluate** model performance across different task types
3. **Analyze** cost-performance trade-offs
4. **Choose** the right model for specific use cases
5. **Use** both cloud APIs and local models (Ollama) for comparison

## Target Concepts

| Concept | Description |
|---------|-------------|
| Model Comparison | Evaluating different LLMs on the same tasks |
| Model Selection | Decision framework for choosing models |
| Open vs Closed Models | Trade-offs between local open-source and cloud APIs |
| Context Window | Maximum number of tokens (prompt plus reply) a model can handle in one request; varies by model |

## Prerequisites

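- **mini-ollama-setup** (Module 1): For local model comparisons
- An `OPENAI_API_KEY` (and optionally an `ANTHROPIC_API_KEY`) in a `.env` file, which the setup cell loads with `load_dotenv()`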

## 1. Setup

In [1]:
import os
import time
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

load_dotenv()

# Initialize clients
openai_client = OpenAI()

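# Optional: Anthropic client (skipped if the package is missing or the client fails to initialize)
try:
    import anthropic
    anthropic_client = anthropic.Anthropic()
    HAS_ANTHROPIC = True
    print("✓ Anthropic client initialized")
except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
    HAS_ANTHROPIC = False
    print("✗ Anthropic not available (optional)")

def md(text):
    """Render a string as Markdown in the notebook output."""
    display(Markdown(text))

print("✓ OpenAI client initialized")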

✓ Anthropic client initialized
✓ OpenAI client initialized


## 2. Model Landscape Overview

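The table below lists context windows and list prices for six models as of 2024. Prices and model IDs change often, so check each provider's pricing page before relying on these numbers: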

In [2]:
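# Model specifications and list prices in USD per million tokens (as of 2024).
# Keys are model-family names; API model IDs may add version or date suffixes.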
MODELS = {
    "openai": {
        "gpt-4o": {
            "context": 128_000,
            "cost_input": 2.50,  # per million tokens
            "cost_output": 10.00,
            "strengths": ["reasoning", "code", "multimodal"],
            "speed": "fast"
        },
        "gpt-4o-mini": {
            "context": 128_000,
            "cost_input": 0.15,
            "cost_output": 0.60,
            "strengths": ["general", "fast", "cost-effective"],
            "speed": "very fast"
        },
        "o1-preview": {
            "context": 128_000,
            "cost_input": 15.00,
            "cost_output": 60.00,
            "strengths": ["complex reasoning", "math", "science"],
            "speed": "slow (thinks)"
        },
    },
    "anthropic": {
        "claude-3-opus": {
            "context": 200_000,
            "cost_input": 15.00,
            "cost_output": 75.00,
            "strengths": ["analysis", "writing", "reasoning"],
            "speed": "moderate"
        },
        "claude-3-5-sonnet": {
            "context": 200_000,
            "cost_input": 3.00,
            "cost_output": 15.00,
            "strengths": ["balanced", "code", "analysis"],
            "speed": "fast"
        },
        "claude-3-haiku": {
            "context": 200_000,
            "cost_input": 0.25,
            "cost_output": 1.25,
            "strengths": ["speed", "cost", "simple tasks"],
            "speed": "very fast"
        },
    }
}

def print_model_comparison():
    """Print model comparison table."""
    
    md("## üìä Model Comparison Overview\n")
    
    table = "| Provider | Model | Context | Input $/M | Output $/M | Speed |\n"
    table += "|----------|-------|---------|-----------|------------|-------|\n"
    
    for provider, models in MODELS.items():
        for model_name, specs in models.items():
            ctx = f"{specs['context']:,}"
            table += f"| {provider} | {model_name} | {ctx} | ${specs['cost_input']:.2f} | ${specs['cost_output']:.2f} | {specs['speed']} |\n"
    
    md(table)

print_model_comparison()

## 📊 Model Comparison Overview


| Provider | Model | Context | Input $/M | Output $/M | Speed |
|----------|-------|---------|-----------|------------|-------|
| openai | gpt-4o | 128,000 | $2.50 | $10.00 | fast |
| openai | gpt-4o-mini | 128,000 | $0.15 | $0.60 | very fast |
| openai | o1-preview | 128,000 | $15.00 | $60.00 | slow (thinks) |
| anthropic | claude-3-opus | 200,000 | $15.00 | $75.00 | moderate |
| anthropic | claude-3-5-sonnet | 200,000 | $3.00 | $15.00 | fast |
| anthropic | claude-3-haiku | 200,000 | $0.25 | $1.25 | very fast |
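
Each provider bills input and output tokens separately, so one call costs input tokens × input price plus output tokens × output price, with prices quoted per million tokens. A minimal sketch of that arithmetic (the `estimate_cost` helper is hypothetical, not part of the lab's code); with the gpt-4o-mini prices from the table, a 20-token prompt and a 300-token reply come to $0.000183, the same figure the comparison in the next section reports:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Return the USD cost of one call, given prices per million tokens."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


# gpt-4o-mini list prices from the table above: $0.15 input, $0.60 output per 1M tokens
cost = estimate_cost(20, 300, 0.15, 0.60)
print(f"${cost:.6f}")  # $0.000183
```

Because output tokens are priced 4x higher than input tokens for every model in the table, short prompts with long replies are dominated by the output term.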


## 3. Direct Model Comparison

In [3]:
def query_openai(model, prompt, max_tokens=300):
    """Query OpenAI model."""
    start = time.perf_counter()  # high-resolution clock intended for measuring durations
    response = openai_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0
    )
    elapsed = time.perf_counter() - start
    
    return {
        "content": response.choices[0].message.content,
        "time": elapsed,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens
    }

def query_anthropic(model, prompt, max_tokens=300):
    """Query Anthropic model."""
    if not HAS_ANTHROPIC:
        return None
    
    start = time.perf_counter()
    response = anthropic_client.messages.create(
        model=model,
        max_tokens=max_tokens,
        temperature=0,  # match query_openai so the comparison is like-for-like
        messages=[{"role": "user", "content": prompt}]
    )
    elapsed = time.perf_counter() - start
    
    return {
        "content": response.content[0].text,
        "time": elapsed,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens
    }

def lookup_specs(provider, model):
    """Return pricing specs for a model ID, tolerating suffixes such as '-latest'."""
    models = MODELS.get(provider, {})
    if model in models:
        return models[model]
    # Fall back to the longest family name that prefixes the ID
    matches = [name for name in models if model.startswith(name + "-")]
    return models[max(matches, key=len)] if matches else None

def compare_models(prompt, models_to_test):
    """Compare multiple models on the same prompt."""
    
    md(f"### 📝 Prompt\n> {prompt}\n\n---")
    
    results = {}
    
    for model_spec in models_to_test:
        provider = model_spec["provider"]
        model = model_spec["model"]
        
        print(f"\nQuerying {provider}/{model}...", end=" ")
        
        try:
            # Replies are capped at max_tokens (300 by default), so long answers get cut off
            if provider == "openai":
                result = query_openai(model, prompt)
            elif provider == "anthropic":
                result = query_anthropic(model, prompt)
                if result is None:
                    print("❌ Anthropic not available")
                    continue
            else:
                # Without this branch, a stale result from the previous model would be reused
                print(f"❌ Unknown provider: {provider}")
                continue
            
            results[f"{provider}/{model}"] = result
            print(f"✓ ({result['time']:.2f}s)")
            
        except Exception as e:
            print(f"❌ Error: {e}")
    
    # Display results
    for model_name, result in results.items():
        provider, model = model_name.split("/", 1)
        specs = lookup_specs(provider, model)
        
        # Cost stays $0 when no pricing entry matches the model ID
        cost = 0
        if specs:
            cost = (result['input_tokens'] / 1_000_000 * specs['cost_input'] +
                    result['output_tokens'] / 1_000_000 * specs['cost_output'])
        
        md(f"### 🤖 {model_name}")
        md(f"⏱️ Time: {result['time']:.2f}s | 📊 Tokens: {result['input_tokens']}→{result['output_tokens']} | 💰 Cost: ${cost:.6f}\n")
        md(f"{result['content']}\n\n---")
    
    return results

# Test with multiple models
models_to_test = [
    {"provider": "openai", "model": "gpt-4o-mini"},
    {"provider": "openai", "model": "gpt-4o"},
]

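if HAS_ANTHROPIC:
    # Model IDs and aliases are retired over time; if this returns a 404,
    # substitute a current ID from Anthropic's model list
    models_to_test.append({"provider": "anthropic", "model": "claude-3-5-sonnet-latest"})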

compare_models(
    "Explain the concept of recursion in programming. Use a simple example.",
    models_to_test
)

### 📝 Prompt
> Explain the concept of recursion in programming. Use a simple example.

---


Querying openai/gpt-4o-mini... ✓ (6.83s)

Querying openai/gpt-4o... ✓ (7.16s)

Querying anthropic/claude-3-5-sonnet-latest... ❌ Error: Error code: 404 - {'type': 'error', 'error': {'type': 'not_found_error', 'message': 'model: claude-3-5-sonnet-latest'}, 'request_id': 'req_011CXBqyQuE9xGU4qZ2wZSH2'}


### 🤖 openai/gpt-4o-mini

⏱️ Time: 6.83s | 📊 Tokens: 20→300 | 💰 Cost: $0.000183


Recursion in programming is a technique where a function calls itself in order to solve a problem. This approach is often used to break down complex problems into simpler subproblems. A recursive function typically has two main components:

1. **Base Case**: This is the condition under which the recursion stops. It prevents the function from calling itself indefinitely.
2. **Recursive Case**: This is where the function calls itself with a modified argument, moving towards the base case.

### Simple Example: Factorial Calculation

A classic example of recursion is the calculation of the factorial of a number. The factorial of a non-negative integer \( n \) (denoted as \( n! \)) is the product of all positive integers less than or equal to \( n \). The factorial can be defined recursively as follows:

- **Base Case**: \( 0! = 1 \) (by definition)
- **Recursive Case**: \( n! = n \times (n-1)! \) for \( n > 0 \)

Here’s how you can implement this in Python:

```python
def factorial(n):
    # Base case
    if n == 0:
        return 1
    # Recursive case
    else:
        return n * factorial(n - 1)

# Example usage
print(factorial(5))  # Output: 120
```

### Explanation of the Example:

1. **Base Case**: When `n` is

---

### 🤖 openai/gpt-4o

⏱️ Time: 7.16s | 📊 Tokens: 20→300 | 💰 Cost: $0.003050


Recursion in programming is a technique where a function calls itself in order to solve a problem. This approach is often used to break down complex problems into simpler, more manageable sub-problems. A recursive function typically has two main components: a base case and a recursive case. The base case is a condition that stops the recursion, preventing it from continuing indefinitely. The recursive case is where the function calls itself with a modified argument, gradually working towards the base case.

A classic example of recursion is the calculation of the factorial of a number. The factorial of a non-negative integer \( n \) (denoted as \( n! \)) is the product of all positive integers less than or equal to \( n \). The recursive definition of a factorial is:

- Base case: \( 0! = 1 \)
- Recursive case: \( n! = n \times (n-1)! \) for \( n > 0 \)

Here's a simple implementation of a recursive function to calculate the factorial of a number in Python:

```python
def factorial(n):
    # Base case: if n is 0, return 1
    if n == 0:
        return 1
    # Recursive case: n * factorial of (n-1)
    else:
        return n * factorial(n - 1)

# Example usage
print(factorial(5))  # Output: 120
```

In this example, the `factorial` function

---
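
Both replies above stop mid-sentence because `query_openai` caps the reply at `max_tokens=300`. The APIs report this explicitly: an OpenAI chat completion sets `choices[0].finish_reason` to `"length"`, and an Anthropic message sets `stop_reason` to `"max_tokens"`. A minimal sketch of a check (the `was_truncated` helper is hypothetical; the `SimpleNamespace` objects only stand in for real SDK responses so the snippet runs offline):

```python
from types import SimpleNamespace


def was_truncated(provider: str, response) -> bool:
    """Return True if the reply was cut off by the max-token limit."""
    if provider == "openai":
        # Chat Completions: finish_reason is "length" when max_tokens was reached
        return response.choices[0].finish_reason == "length"
    if provider == "anthropic":
        # Messages API: stop_reason is "max_tokens" when the limit was reached
        return response.stop_reason == "max_tokens"
    raise ValueError(f"Unknown provider: {provider}")


# Offline stand-ins shaped like the relevant fields of each SDK's response
openai_like = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
anthropic_like = SimpleNamespace(stop_reason="end_turn")

print(was_truncated("openai", openai_like))        # True
print(was_truncated("anthropic", anthropic_like))  # False
```

Returning this flag alongside the content from `query_openai` and `query_anthropic` would let the comparison mark truncated answers instead of judging them as complete.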


## 4. Task-Specific Model Evaluation

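Models can differ in how well they handle different kinds of tasks. The cell below runs both OpenAI models on five task types (reasoning, code, creative, extraction, summarization) with replies capped at 200 tokens: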

In [4]:
def evaluate_task_types():
    """Evaluate models across different task types."""
    
    tasks = {
        "reasoning": {
            "prompt": """A farmer has 17 sheep. All but 9 die. How many sheep are left? 
Think step by step and explain your reasoning.""",
            "expected_contains": "9"
        },
        "code": {
            "prompt": "Write a Python function to check if a string is a palindrome.",
            "expected_contains": "def"
        },
        "creative": {
            "prompt": "Write a haiku about artificial intelligence.",
            "expected_contains": None  # Subjective
        },
        "extraction": {
            "prompt": """Extract the following from this text:
- Name
- Company
- Email

Text: "Hi, I'm Sarah Johnson from TechCorp. You can reach me at sarah.j@techcorp.com"

Return as JSON.""",
            "expected_contains": "sarah"
        },
        "summarization": {
            "prompt": """Summarize this in one sentence:
Machine learning is a subset of artificial intelligence that enables systems to learn 
and improve from experience without being explicitly programmed. It focuses on developing 
computer programs that can access data and use it to learn for themselves.""",
            "expected_contains": None
        }
    }
    
    models = [
        ("openai", "gpt-4o-mini"),
        ("openai", "gpt-4o"),
    ]
    # Map each provider to its query helper so Anthropic models can be added above
    query_fns = {"openai": query_openai, "anthropic": query_anthropic}
    
    results = {}
    
    for task_name, task_info in tasks.items():
        md(f"## 📋 Task: {task_name.upper()}")
        md(f"*Prompt:* {task_info['prompt'][:100]}...\n")
        
        results[task_name] = {}
        
        for provider, model in models:
            try:
                result = query_fns[provider](model, task_info['prompt'], max_tokens=200)
                if result is None:
                    md(f"**{model}** ❌ {provider} client not available\n")
                    continue
                
                # Crude keyword check ("?" = expected text not found); it does not verify
                # correctness, e.g. "9" already appears in the reasoning prompt itself
                expected = task_info['expected_contains']
                success = "✓" if (expected is None or expected.lower() in result['content'].lower()) else "?"
                
                results[task_name][model] = {
                    "time": result['time'],
                    "tokens": result['output_tokens'],
                    "content": result['content']
                }
                
                md(f"**{model}** {success} ({result['time']:.2f}s, {result['output_tokens']} tokens)")
                md(f"> {result['content'][:200]}...\n")
                
            except Exception as e:
                md(f"**{model}** ❌ Error: {e}\n")
        
        md("---\n")
    
    return results

task_results = evaluate_task_types()

## 📋 Task: REASONING

*Prompt:* A farmer has 17 sheep. All but 9 die. How many sheep are left? 
Think step by step and explain your ...


**gpt-4o-mini** ✓ (3.10s, 135 tokens)

> Let's break down the problem step by step:

1. **Understanding the total number of sheep**: The farmer starts with a total of 17 sheep.

2. **Interpreting "all but 9 die"**: The phrase "all but 9 die"...


**gpt-4o** ✓ (1.88s, 115 tokens)

> To solve this problem, let's break it down step by step:

1. **Initial Count**: The farmer starts with 17 sheep.

2. **Understanding "All but 9 die"**: The phrase "all but 9 die" means that out of the...


---


## 📋 Task: CODE

*Prompt:* Write a Python function to check if a string is a palindrome....


**gpt-4o-mini** ✓ (3.86s, 200 tokens)

> Certainly! A palindrome is a string that reads the same forwards and backwards. Here’s a simple Python function to check if a given string is a palindrome:

```python
def is_palindrome(s):
    # Remov...


**gpt-4o** ✓ (6.55s, 200 tokens)

> Certainly! A palindrome is a string that reads the same forward and backward. Here's a Python function to check if a given string is a palindrome:

```python
def is_palindrome(s):
    # Remove any non...


---


## 📋 Task: CREATIVE

*Prompt:* Write a haiku about artificial intelligence....


**gpt-4o-mini** ✓ (0.70s, 20 tokens)

> Silent circuits hum,  
Thoughts woven in code and light,  
Dreams of minds awake....


**gpt-4o** ✓ (1.27s, 17 tokens)

> Silent circuits hum,  
Thoughts emerge from coded light—  
Machine dreams awake....


---


## 📋 Task: EXTRACTION

*Prompt:* Extract the following from this text:
- Name
- Company
- Email

Text: "Hi, I'm Sarah Johnson from Te...


**gpt-4o-mini** ✓ (1.20s, 35 tokens)

> ```json
{
  "Name": "Sarah Johnson",
  "Company": "TechCorp",
  "Email": "sarah.j@techcorp.com"
}
```...


**gpt-4o** ✓ (0.70s, 35 tokens)

> ```json
{
  "Name": "Sarah Johnson",
  "Company": "TechCorp",
  "Email": "sarah.j@techcorp.com"
}
```...


---


## 📋 Task: SUMMARIZATION

*Prompt:* Summarize this in one sentence:
Machine learning is a subset of artificial intelligence that enables...


**gpt-4o-mini** ✓ (1.28s, 28 tokens)

> Machine learning, a subset of artificial intelligence, allows systems to autonomously learn and improve from experience by accessing and utilizing data without explicit programming....


**gpt-4o** ✓ (0.60s, 23 tokens)

> Machine learning is a branch of artificial intelligence that allows systems to autonomously learn and improve from data without explicit programming....


---
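
The `expected_contains` check is deliberately loose: "9" already appears in the sheep prompt, so a reply passes even if it only restates the question, and "sarah" passes whether or not the reply is valid JSON. A slightly stricter sketch, assuming the final answer is the last integer in a reasoning reply and that extraction replies may wrap the JSON in a Markdown code fence, as both models did above. The helper names `last_integer` and `parse_json_reply` are hypothetical, and the sample replies are illustrative strings, not captured output:

```python
import json
import re


def last_integer(text: str):
    """Return the last integer mentioned in the text, or None if there is none."""
    numbers = re.findall(r"-?\d+", text)
    return int(numbers[-1]) if numbers else None


def parse_json_reply(text: str):
    """Parse a JSON object from a reply, tolerating a surrounding ```json fence."""
    # Strip an optional opening ``` / ```json marker and a closing ``` marker
    cleaned = re.sub(r"^\s*```(?:json)?\s*|\s*```\s*$", "", text.strip())
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None


reasoning_reply = "The farmer starts with 17 sheep. All but 9 die, so 9 sheep are left."
extraction_reply = '```json\n{"Name": "Sarah Johnson", "Company": "TechCorp", "Email": "sarah.j@techcorp.com"}\n```'

print(last_integer(reasoning_reply) == 9)  # True
data = parse_json_reply(extraction_reply)
print(data is not None and {"Name", "Company", "Email"} <= set(data))  # True
```

Both checks are still heuristics: a reply that ends with "not 17" would fail the reasoning check, so treat a failure as a prompt to read the answer, not as a verdict.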


## 5. Cost-Performance Analysis

In [5]:
def analyze_cost_performance(prompt, num_runs=3):
    """Analyze cost vs performance for different models."""
    
    models = [
        ("gpt-4o-mini", 0.15, 0.60),
        ("gpt-4o", 2.50, 10.00),
    ]
    
    md("## üí∞ Cost-Performance Analysis\n")
    md(f"*Prompt:* {prompt[:80]}...\n")
    
    results = []
    
    for model, input_cost, output_cost in models:
        times = []
        token_counts = []
        costs = []
        
        for _ in range(num_runs):
            result = query_openai(model, prompt, max_tokens=200)
            times.append(result['time'])
            token_counts.append(result['output_tokens'])
            
            cost = (result['input_tokens'] / 1_000_000 * input_cost +
                   result['output_tokens'] / 1_000_000 * output_cost)
            costs.append(cost)
        
        avg_time = sum(times) / len(times)
        avg_tokens = sum(token_counts) / len(token_counts)
        avg_cost = sum(costs) / len(costs)
        
        results.append({
            "model": model,
            "avg_time": avg_time,
            "avg_tokens": avg_tokens,
            "avg_cost": avg_cost,
            "cost_per_1k": avg_cost * 1000
        })
    
    # Display results
    table = "| Model | Avg Time | Avg Tokens | Cost/Call | Cost/1K Calls |\n"
    table += "|-------|----------|------------|-----------|---------------|\n"
    
    for r in results:
        table += f"| {r['model']} | {r['avg_time']:.2f}s | {r['avg_tokens']:.0f} | ${r['avg_cost']:.6f} | ${r['cost_per_1k']:.3f} |\n"
    
    md(table)
    
    # Calculate relative comparisons
    if len(results) >= 2:
        base = results[0]  # gpt-4o-mini as baseline
        premium = results[1]  # gpt-4o
        
        # Per-call ratio; it differs from the per-token price ratio when reply lengths differ
        cost_ratio = premium['avg_cost'] / base['avg_cost']
        # Positive time_diff means the premium model had the lower average latency
        time_diff = ((base['avg_time'] - premium['avg_time']) / base['avg_time']) * 100
        
        md(f"\n### 📊 Comparison")
        md(f"- **{premium['model']}** costs **{cost_ratio:.1f}x** more than {base['model']}")
        md(f"- Speed difference: {abs(time_diff):.1f}% {'faster' if time_diff > 0 else 'slower'}")
        md(f"- For 1M API calls: ${base['cost_per_1k']*1000:.2f} vs ${premium['cost_per_1k']*1000:.2f}")
    
    return results

analyze_cost_performance(
    "Write a brief explanation of what an API is and why it's important.",
    num_runs=3
)

## 💰 Cost-Performance Analysis


*Prompt:* Write a brief explanation of what an API is and why it's important....


| Model | Avg Time | Avg Tokens | Cost/Call | Cost/1K Calls |
|-------|----------|------------|-----------|---------------|
| gpt-4o-mini | 4.30s | 200 | $0.000123 | $0.123 |
| gpt-4o | 2.21s | 137 | $0.001419 | $1.419 |



### 📊 Comparison

- **gpt-4o** costs **11.5x** more than gpt-4o-mini

- Speed difference: 48.6% faster

- For 1M API calls: $123.15 vs $1419.17

[{'model': 'gpt-4o-mini',
  'avg_time': 4.298131783803304,
  'avg_tokens': 200.0,
  'avg_cost': 0.00012315,
  'cost_per_1k': 0.12315000000000001},
 {'model': 'gpt-4o',
  'avg_time': 2.2095280488332114,
  'avg_tokens': 136.66666666666666,
  'avg_cost': 0.0014191666666666669,
  'cost_per_1k': 1.419166666666667}]
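
gpt-4o costs 16.7x as much per token as gpt-4o-mini ($2.50 vs $0.15 input, $10.00 vs $0.60 output), yet the measured per-call ratio is only 11.5x, because gpt-4o-mini hit the 200-token cap on every run while gpt-4o averaged about 137 tokens. The same length difference skews the "48.6% faster" figure. A minimal sketch that normalizes the averages printed above by output tokens (`snapshot` is just that output copied by hand; wall-clock time also includes network overhead and time to first token, so milliseconds per token is only a rough proxy):

```python
# Averages copied from the analyze_cost_performance output above
snapshot = [
    {"model": "gpt-4o-mini", "avg_time": 4.298131783803304,
     "avg_tokens": 200.0, "avg_cost": 0.00012315},
    {"model": "gpt-4o", "avg_time": 2.2095280488332114,
     "avg_tokens": 136.66666666666666, "avg_cost": 0.0014191666666666669},
]

for r in snapshot:
    # Wall-clock milliseconds per output token
    r["ms_per_token"] = r["avg_time"] / r["avg_tokens"] * 1000
    print(f"{r['model']}: {r['ms_per_token']:.1f} ms per output token")

per_call_ratio = snapshot[1]["avg_cost"] / snapshot[0]["avg_cost"]
output_price_ratio = 10.00 / 0.60  # gpt-4o vs gpt-4o-mini output price per 1M tokens
print(f"Per-call cost ratio: {per_call_ratio:.1f}x")      # 11.5x
print(f"Output price ratio:  {output_price_ratio:.1f}x")  # 16.7x
```

Per token, gpt-4o still comes out ahead in this run (about 16.2 vs 21.5 ms), but three runs of one prompt are a small sample; repeat the measurement with your own prompts before drawing conclusions.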

## 6. Model Selection Framework

In [6]:
def recommend_model(task_type, budget_sensitivity, quality_requirement, latency_requirement):
    """
    Recommend a model based on requirements.
    
    Args:
        task_type: "general", "reasoning", "code", "creative", "extraction"
        budget_sensitivity: "low", "medium", "high" (high = cost matters most)
        quality_requirement: "basic", "good", "best"
        latency_requirement: "realtime", "fast", "flexible"
    """
    
    recommendations = {
        # (quality, budget, latency) -> model
        ("basic", "high", "realtime"): "gpt-4o-mini",
        ("basic", "high", "fast"): "gpt-4o-mini",
        ("basic", "medium", "realtime"): "gpt-4o-mini",
        ("good", "medium", "fast"): "gpt-4o-mini",
        ("good", "low", "fast"): "gpt-4o",
        ("good", "low", "flexible"): "gpt-4o",
        ("best", "low", "fast"): "gpt-4o",
        ("best", "low", "flexible"): "gpt-4o",
    }
    
    # Special case for complex reasoning
    if task_type == "reasoning" and quality_requirement == "best":
        recommended = "gpt-4o (or o1-preview for complex math/science)"
    else:
        key = (quality_requirement, budget_sensitivity, latency_requirement)
        # Combinations missing from the table fall back by quality level, so a
        # "best" request is not silently downgraded to gpt-4o-mini
        fallback = "gpt-4o" if quality_requirement == "best" else "gpt-4o-mini"
        recommended = recommendations.get(key, fallback)
    
    print(f"\n🎯 Model Recommendation")
    print("="*50)
    print(f"Task Type: {task_type}")
    print(f"Budget Sensitivity: {budget_sensitivity}")
    print(f"Quality Requirement: {quality_requirement}")
    print(f"Latency Requirement: {latency_requirement}")
    print(f"\n✨ Recommended Model: {recommended}")
    
    # Additional notes
    notes = []
    # Batching trades latency for cost, so only suggest it when latency is flexible
    if budget_sensitivity == "high" and latency_requirement == "flexible":
        notes.append("Consider batching requests to reduce costs")
    if quality_requirement == "best" and task_type == "code":
        notes.append("Consider adding code review/testing step")
    if latency_requirement == "realtime":
        notes.append("Enable streaming for better perceived latency")
    
    if notes:
        print("\n📝 Additional Notes:")
        for note in notes:
            print(f"   • {note}")
    
    return recommended

# Example scenarios
print("\n" + "="*60)
print("SCENARIO 1: High-volume customer support chatbot")
print("="*60)
recommend_model("general", "high", "good", "realtime")

print("\n" + "="*60)
print("SCENARIO 2: Code review assistant for developers")
print("="*60)
recommend_model("code", "medium", "best", "fast")

print("\n" + "="*60)
print("SCENARIO 3: Data extraction from documents")
print("="*60)
recommend_model("extraction", "high", "good", "flexible")


SCENARIO 1: High-volume customer support chatbot

🎯 Model Recommendation
Task Type: general
Budget Sensitivity: high
Quality Requirement: good
Latency Requirement: realtime

✨ Recommended Model: gpt-4o-mini

📝 Additional Notes:
   • Enable streaming for better perceived latency

SCENARIO 2: Code review assistant for developers

🎯 Model Recommendation
Task Type: code
Budget Sensitivity: medium
Quality Requirement: best
Latency Requirement: fast

✨ Recommended Model: gpt-4o

📝 Additional Notes:
   • Consider adding code review/testing step

SCENARIO 3: Data extraction from documents

🎯 Model Recommendation
Task Type: extraction
Budget Sensitivity: high
Quality Requirement: good
Latency Requirement: flexible

✨ Recommended Model: gpt-4o-mini

📝 Additional Notes:
   • Consider batching requests to reduce costs


'gpt-4o-mini'

## 7. Quick Reference: Model Selection Guide

| Use Case | Primary Choice | Alternative | Why |
|----------|---------------|-------------|-----|
| **Simple Q&A** | gpt-4o-mini | - | Cost-effective, fast |
| **Customer Support** | gpt-4o-mini | claude-3-haiku | High volume, cost matters |
| **Code Generation** | gpt-4o | claude-3-5-sonnet | Quality critical |
| **Complex Reasoning** | gpt-4o / o1 | claude-3-opus | Accuracy paramount |
| **Creative Writing** | gpt-4o | claude-3-5-sonnet | Quality and style |
| **Data Extraction** | gpt-4o-mini | - | Structured output, fast |
| **Long Documents** | claude-3-5-sonnet | gpt-4o | 200K context |
| **Image Analysis** | gpt-4o | claude-3-5-sonnet | Multimodal |

### Key Decision Factors

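1. **Quality vs Cost**: In the pricing table above, higher-tier models cost roughly 12-100x more per token than the same provider's cheapest model (gpt-4o is about 17x gpt-4o-mini)
2. **Latency**: Smaller models are often faster, but not always; in this lab's cost-performance run gpt-4o averaged 2.21s per call versus 4.30s for gpt-4o-mini (partly because gpt-4o-mini wrote more tokens), so measure with your own workload
3. **Context Length**: The Claude 3 models listed offer 200K tokens; gpt-4o, gpt-4o-mini, and o1-preview offer 128K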
4. **Specialization**: o1 for math/science, Claude for analysis
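
Before routing a long document, it helps to check whether it fits at all: the prompt plus the reply must stay within the model's context window. A rough sketch, assuming OpenAI's rule of thumb of about 4 characters per token for English text (use a real tokenizer such as `tiktoken` when you need exact counts); the `estimate_tokens` and `fits_in_context` helpers and the 4,096-token reply budget are illustrative assumptions:

```python
import math

# Context windows in tokens, taken from the model table in section 2
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
    "claude-3-5-sonnet": 200_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for English text; a heuristic, not a tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, model: str, reserved_output: int = 4_096) -> bool:
    """True if the estimated prompt plus a reply budget fits the model's window."""
    return estimate_tokens(text) + reserved_output <= CONTEXT_WINDOWS[model]


document = "word " * 120_000  # 600,000 characters, about 150,000 estimated tokens
for name in CONTEXT_WINDOWS:
    print(name, fits_in_context(document, name))
```

Under these assumptions the 150,000-token document overflows both GPT models' 128K windows but fits the 200K Claude window, which is the situation the "Long Documents" row of the table describes.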

In [None]:
def create_model_router(task_descriptions):
    """Classify each task description by keyword and print the model it would be routed to."""
    
    routing_rules = {
        "simple": "gpt-4o-mini",
        "complex": "gpt-4o",
        "creative": "gpt-4o",
        "code": "gpt-4o",
        "math": "gpt-4o",  # or o1 for complex math
    }
    
    def classify_task(description):
        """Simple task classification."""
        desc_lower = description.lower()
        
        if any(word in desc_lower for word in ["code", "function", "program", "debug"]):
            return "code"
        elif any(word in desc_lower for word in ["calculate", "math", "solve", "equation"]):
            return "math"
        elif any(word in desc_lower for word in ["write", "story", "creative", "poem"]):
            return "creative"
        elif any(word in desc_lower for word in ["analyze", "explain", "compare", "reason"]):
            return "complex"
        else:
            return "simple"
    
    print("\nüîÄ Model Router Demo")
    print("="*60)
    
    for desc in task_descriptions:
        task_type = classify_task(desc)
        model = routing_rules[task_type]
        print(f"\nüìù Task: \"{desc[:50]}...\"")
        print(f"   Type: {task_type} → Model: {model}")

# Test the router
test_tasks = [
    "What is the capital of France?",
    "Write a Python function to sort a list",
    "Calculate the derivative of x^3 + 2x^2",
    "Write a creative story about a robot",
    "Analyze the pros and cons of microservices architecture",
]

create_model_router(test_tasks)


🔀 Model Router Demo

📝 Task: "What is the capital of France?..."
   Type: simple → Model: gpt-4o-mini

📝 Task: "Write a Python function to sort a list..."
   Type: code → Model: gpt-4o

📝 Task: "Calculate the derivative of x^3 + 2x^2..."
   Type: math → Model: gpt-4o

📝 Task: "Write a creative story about a robot..."
   Type: creative → Model: gpt-4o

📝 Task: "Analyze the pros and cons of microservices archite..."
   Type: complex → Model: gpt-4o
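
Because `classify_task` uses substring checks, keywords also fire inside longer words: "program" matches "programme" and "reason" matches "reasonable", so "Suggest a reasonable gym programme" would be routed to gpt-4o as a code task. Matching whole words with a regular expression avoids that. A minimal sketch that reuses the router's keyword lists and routing table but returns the decision instead of printing it (`route` and `KEYWORDS` are hypothetical names):

```python
import re

# Same keyword lists and routing table as the router above, checked in the same order
KEYWORDS = [
    ("code", ["code", "function", "program", "debug"]),
    ("math", ["calculate", "math", "solve", "equation"]),
    ("creative", ["write", "story", "creative", "poem"]),
    ("complex", ["analyze", "explain", "compare", "reason"]),
]
ROUTING_RULES = {
    "simple": "gpt-4o-mini",
    "complex": "gpt-4o",
    "creative": "gpt-4o",
    "code": "gpt-4o",
    "math": "gpt-4o",
}


def route(description: str) -> tuple[str, str]:
    """Return (task_type, model), matching keywords as whole words only."""
    for task_type, words in KEYWORDS:
        pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
        if re.search(pattern, description, flags=re.IGNORECASE):
            return task_type, ROUTING_RULES[task_type]
    return "simple", ROUTING_RULES["simple"]


print(route("Write a Python function to sort a list"))  # ('code', 'gpt-4o')
print(route("Suggest a reasonable gym programme"))       # ('simple', 'gpt-4o-mini')
```

The trade-off is that inflected forms such as "functions" or "debugging" no longer match unless you add them to the lists; a production router would more likely use a small classifier model than keyword rules.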



## 8. Local vs Cloud Models (Ollama Integration)

If you completed **mini-ollama-setup** in Module 1, you can also compare local open-source models!

### Open Source vs Proprietary Trade-offs

| Aspect | Local (Ollama) | Cloud (OpenAI/Anthropic) |
|--------|----------------|--------------------------|
| **Privacy** | ✅ Data stays local | ❌ Data sent to API |
| **Cost** | ✅ No per-token fees (hardware and electricity still cost money) | ❌ Per-token pricing |
| **Quality** | ⚠️ Varies by model | ✅ State-of-the-art |
| **Setup** | ⚠️ Requires installation | ✅ Just API key |
| **Offline** | ✅ Works offline | ❌ Requires internet |
| **Customization** | ✅ Open weights can be fine-tuned (with separate tools) | ❌ Limited to provider-offered fine-tuning |

### Popular Open Source Models

| Model | Size | Comparable To | Best For |
|-------|------|---------------|----------|
| `llama3.1:8b` | 4.9GB | GPT-3.5 | General tasks |
| `mistral:7b` | 4.1GB | GPT-3.5 | Fast inference |
| `qwen2.5-coder:7b` | 4.4GB | GPT-4o-mini (code) | Code generation |
| `phi3:mini` | 2.3GB | - | Edge/mobile |

### Using Local Models in This Lab

```python
# If you have Ollama running (from Module 1):
from openai import OpenAI

# Cloud client (default)
cloud_client = OpenAI()

# Local client (Ollama)
local_client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

# Same code works for both!
response = local_client.chat.completions.create(
    model="llama3.1:8b",  # any tag you have pulled with `ollama pull`
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
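
The local client only works if the Ollama server is running and the model tag has been pulled; otherwise the request fails. A small sketch that asks Ollama's REST API (`GET /api/tags` on the default port 11434) which models are available locally, returning an empty list when the server is unreachable. It uses only the standard library, and `list_local_models` is a hypothetical helper name:

```python
import json
import urllib.request


def list_local_models(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> list[str]:
    """Return names of models pulled into a local Ollama server, or [] if unreachable."""
    try:
        # GET /api/tags lists local models as {"models": [{"name": ...}, ...]}
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, HTTP errors, and timeouts
        # (urllib.error.URLError subclasses it); ValueError covers invalid JSON
        return []
    return [m["name"] for m in payload.get("models", [])]


local_models = list_local_models()
print(local_models or "Ollama not reachable, or no models pulled yet")
```

If the tag you pass as `model=` is missing from the list, `ollama pull <tag>` downloads it.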

> **💡 Tip**: For development and learning, use local models to save API costs. Switch to cloud models for production or when you need the best quality.

## 🎯 Summary

### Key Takeaways

1. **Model Landscape**
   - Multiple providers (OpenAI, Anthropic, Google)
   - Significant per-token price differences (roughly 12-100x between a provider's cheapest and top-tier models listed above)
   - Different strengths per model

2. **Selection Criteria**
   - Task complexity
   - Budget constraints
   - Latency requirements
   - Context length needs

3. **Cost Optimization**
   - Start with gpt-4o-mini for most tasks
   - Upgrade to gpt-4o only when testing on your own data shows gpt-4o-mini falls short
   - Consider routing based on task type

4. **Best Practices**
   - Test with your actual data
   - Monitor quality and costs
   - Build model routing for production

### Next Steps

- **lab-llm-playground**: Build interactive playground combining all concepts