# Volume 1, Chapter 3: Choosing the Right Model

**Compare Claude, GPT, and Gemini for Network Tasks**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eduardd76/AI_for_networking_and_security_engineers/blob/master/Volume-1-Foundations/Colab-Notebooks/Vol1_Ch3_Model_Selection.ipynb)

---

**What you'll learn:**
- 🏎️ Compare model speed and quality
- 💵 Understand cost/performance tradeoffs
- 🎯 Match models to networking tasks
- 📊 Benchmark on real network data

**Time:** ~15 minutes | **Cost:** ~$0.10

## 🔧 Setup

In [None]:
!pip install -q anthropic openai

import os
import time
from getpass import getpass

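# Anthropic API key: prefer Colab secrets, fall back to the environment or a prompt
try:
    from google.colab import userdata
except ImportError:
    userdata = None  # not running in Colab

def load_key(name):
    """Copy an API key from Colab secrets into os.environ; return True if it is set."""
    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                os.environ[name] = value
        except Exception:
            pass  # secret missing or notebook access not granted
    return bool(os.environ.get(name))

if load_key('ANTHROPIC_API_KEY'):
    print("✓ Anthropic key loaded")
else:
    os.environ['ANTHROPIC_API_KEY'] = getpass('Anthropic API key: ')

# OpenAI API key (optional)
if load_key('OPENAI_API_KEY'):
    print("✓ OpenAI key loaded")
else:
    print("ℹ️ OpenAI key not set (optional)")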

from anthropic import Anthropic
anthropic_client = Anthropic()
print("✓ Ready!")

---
## 📊 Model Comparison Chart

| Model | Speed | Quality | Input cost (USD per 1M tokens) | Best For |
|-------|-------|---------|--------------------------------|----------|
| **Claude 3.5 Haiku** | ⚡⚡⚡ | ★★★☆ | $0.25 | Log parsing, simple tasks |
| **Claude 3.5 Sonnet** | ⚡⚡ | ★★★★ | $3.00 | Config analysis, troubleshooting |
| **Claude 3 Opus** | ⚡ | ★★★★★ | $15.00 | Complex reasoning, design |
| **GPT-4o** | ⚡⚡ | ★★★★ | $2.50 | General tasks |
| **GPT-4o-mini** | ⚡⚡⚡ | ★★★☆ | $0.15 | High volume, simple tasks |

---
## 🏎️ Example 1: Speed Comparison (Haiku vs Sonnet)

In [None]:
def benchmark_model(model_name, prompt, runs=3):
    """Time `runs` identical requests to one model and keep the last response."""
    times = []
    response_text = ""

    for _ in range(runs):
        # perf_counter is a monotonic clock, better suited to timing than time.time()
        start = time.perf_counter()
        response = anthropic_client.messages.create(
            model=model_name,
            max_tokens=300,
            temperature=0,
            messages=[{"role": "user", "content": prompt}]
        )
        times.append(time.perf_counter() - start)
        response_text = response.content[0].text

    # Add an ellipsis only when the preview was actually truncated
    preview = response_text[:150] + ("..." if len(response_text) > 150 else "")
    return {
        "model": model_name,
        "avg_time": sum(times) / len(times),
        "min_time": min(times),
        "response_preview": preview
    }

# Simple task - log classification
simple_prompt = """Classify this log severity (INFO/WARNING/ERROR/CRITICAL):
%OSPF-5-ADJCHG: Process 1, Nbr 10.1.1.2 on Vlan100 from FULL to DOWN
Return only the classification."""

print("🏎️ SPEED BENCHMARK: Simple Task (Log Classification)")
print("=" * 60)

models = [
    "claude-haiku-4-5-20251001",
    "claude-sonnet-4-20250514"
]

for model in models:
    result = benchmark_model(model, simple_prompt)
    print(f"\n{model}:")
    print(f"  Avg time: {result['avg_time']:.2f}s")
    print(f"  Response: {result['response_preview']}")
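
Three runs give a rough average, and a single slow request can skew it. A minimal sketch of a fuller summary follows, assuming any zero-argument callable stands in for the API request. The names `time_calls` and `summarize` are hypothetical helpers, not part of the notebook, and `time.sleep` is used as a placeholder workload so the cell runs without an API key.

```python
import statistics
import time

def summarize(samples):
    """Return run count, mean, median, min, and max for a list of latencies."""
    return {
        "runs": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }

def time_calls(call, runs=5):
    """Time a zero-argument callable `runs` times and summarize the results in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return summarize(samples)

# Placeholder workload (assumption): sleep instead of a real API request.
stats = time_calls(lambda: time.sleep(0.01), runs=3)
print(stats)
```

To time a real request, the callable could wrap the `anthropic_client.messages.create(...)` call used above in a `lambda`. The median is usually less sensitive to one slow outlier than the mean.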

---
## 🧠 Example 2: Quality Comparison (Complex Task)

In [None]:
# Complex task - troubleshooting
complex_prompt = """Two routers can't establish BGP. Diagnose the issue:

R1 config:
router bgp 65001
 neighbor 10.1.1.2 remote-as 65002
 neighbor 10.1.1.2 update-source Loopback0

interface Loopback0
 ip address 1.1.1.1 255.255.255.255

R2 config:
router bgp 65002
 neighbor 10.1.1.1 remote-as 65001

R2 show ip bgp summary:
Neighbor        State/PfxRcd
10.1.1.1        Idle

What's wrong? Provide the fix."""

print("🧠 QUALITY COMPARISON: Complex Troubleshooting")
print("=" * 60)

for model in models:
    start = time.time()
    response = anthropic_client.messages.create(
        model=model,
        max_tokens=500,
        temperature=0,
        messages=[{"role": "user", "content": complex_prompt}]
    )
    elapsed = time.time() - start
    
    print(f"\n{'='*60}")
    print(f"MODEL: {model}")
    print(f"Time: {elapsed:.2f}s")
    print(f"{'='*60}")
    print(response.content[0].text)
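
Reading two long answers side by side is subjective. One crude way to make the comparison repeatable is a keyword rubric: count how many expected diagnostic terms each answer mentions. The sketch below assumes a good answer to the BGP prompt refers to `update-source`, `Loopback0`, and `1.1.1.1`. That term list, the `rubric_score` helper, and the sample answer are all illustrative assumptions, not output from either model.

```python
def rubric_score(answer, expected_terms):
    """Return (fraction of expected terms found, list of terms found), case-insensitive."""
    if not expected_terms:
        raise ValueError("expected_terms must not be empty")
    text = answer.lower()
    hits = [term for term in expected_terms if term.lower() in text]
    return len(hits) / len(expected_terms), hits

# Hypothetical rubric for the BGP prompt above; adjust the terms as needed.
BGP_TERMS = ["update-source", "Loopback0", "1.1.1.1"]

# Made-up answer text, used only to demonstrate the scoring.
sample_answer = (
    "R1 sources the session from Loopback0 (1.1.1.1) because of "
    "update-source, but R2 expects the neighbor at 10.1.1.1."
)
score, hits = rubric_score(sample_answer, BGP_TERMS)
print(f"{score:.0%} of rubric terms found: {hits}")
```

A keyword count says nothing about whether the proposed fix is correct, so it works best as a first filter before a human reads the answers.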

---
## 💵 Example 3: Cost Calculator

In [None]:
PRICING = {
    "claude-haiku-4-5-20251001": {"input": 0.25, "output": 1.25, "name": "Haiku"},
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00, "name": "Sonnet"},
    "claude-opus-4-20250115": {"input": 15.00, "output": 75.00, "name": "Opus"},
    "gpt-4o": {"input": 2.50, "output": 10.00, "name": "GPT-4o"},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60, "name": "GPT-4o-mini"},
}

def monthly_cost(calls_per_day, input_tokens, output_tokens, model):
    """Calculate monthly cost."""
    p = PRICING[model]
    daily_input_cost = (input_tokens * calls_per_day / 1_000_000) * p["input"]
    daily_output_cost = (output_tokens * calls_per_day / 1_000_000) * p["output"]
    return (daily_input_cost + daily_output_cost) * 30

# Scenario: NOC team analyzing 500 logs per day
print("💵 MONTHLY COST COMPARISON")
print("=" * 60)
print("Scenario: Analyze 500 log entries per day")
print("         ~200 input tokens, ~100 output tokens each")
print("=" * 60 + "\n")

for model_key, model_info in PRICING.items():
    cost = monthly_cost(
        calls_per_day=500,
        input_tokens=200,
        output_tokens=100,
        model=model_key
    )
    print(f"{model_info['name']:15} ${cost:>8.2f}/month")
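
The calculator above relies on estimated token counts. The same arithmetic can be applied per call, as in this minimal sketch with a hypothetical `call_cost` helper that takes prices in USD per 1M tokens. The figures are the scenario's 200 input and 100 output tokens at the Sonnet rates listed in `PRICING`.

```python
def call_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one call, with prices given in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Scenario figures from the calculator above, at $3 input and $15 output per 1M tokens.
per_call = call_cost(200, 100, input_price=3.00, output_price=15.00)
per_month = per_call * 500 * 30  # 500 calls per day for 30 days
print(f"per call: ${per_call:.4f}, per month: ${per_month:.2f}")
# prints: per call: $0.0021, per month: $31.50
```

For measured rather than estimated costs, the Anthropic response object reports actual counts as `response.usage.input_tokens` and `response.usage.output_tokens`, which can be passed in place of the fixed values.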

---
## 🎯 Example 4: Model Selection Guide

In [None]:
def recommend_model(task_type, volume, budget):
    """Recommend best model based on requirements."""
    
    recommendations = {
        ("simple", "high", "low"): "claude-haiku-4-5-20251001",
        ("simple", "high", "medium"): "claude-haiku-4-5-20251001",
        ("simple", "low", "low"): "claude-haiku-4-5-20251001",
        ("medium", "high", "low"): "claude-haiku-4-5-20251001",
        ("medium", "high", "medium"): "claude-sonnet-4-20250514",
        ("medium", "low", "medium"): "claude-sonnet-4-20250514",
        ("complex", "low", "high"): "claude-opus-4-20250115",
        ("complex", "low", "medium"): "claude-sonnet-4-20250514",
        ("complex", "high", "high"): "claude-sonnet-4-20250514",
    }
    
    key = (task_type, volume, budget)
    return recommendations.get(key, "claude-sonnet-4-20250514")

# Task type mapping
TASK_EXAMPLES = {
    "simple": ["Log classification", "Data extraction", "Format conversion"],
    "medium": ["Config analysis", "Documentation", "Compliance checking"],
    "complex": ["Troubleshooting", "Design review", "Root cause analysis"]
}

print("🎯 MODEL SELECTION GUIDE")
print("=" * 60)

scenarios = [
    {"task": "Log classification", "type": "simple", "volume": "high", "budget": "low"},
    {"task": "Config security audit", "type": "medium", "volume": "low", "budget": "medium"},
    {"task": "BGP troubleshooting", "type": "complex", "volume": "low", "budget": "medium"},
    {"task": "Network design review", "type": "complex", "volume": "low", "budget": "high"},
]

for s in scenarios:
    model = recommend_model(s["type"], s["volume"], s["budget"])
    print(f"\n📌 {s['task']}")
    print(f"   Complexity: {s['type']} | Volume: {s['volume']} | Budget: {s['budget']}")
    print(f"   → Recommended: {PRICING[model]['name']}")
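
The lookup above uses coarse labels such as "low" and "high" for budget. An alternative sketch works from a dollar figure: pick the most expensive model whose estimated monthly cost still fits. It assumes that a higher price implies higher quality, which is a simplification. `pick_by_budget` is a hypothetical helper, and the monthly figures were computed by hand from the `PRICING` values for the 500-calls-per-day scenario.

```python
def pick_by_budget(monthly_costs, budget_usd):
    """Return the priciest model whose monthly cost fits the budget, or None if none fit."""
    affordable = {name: cost for name, cost in monthly_costs.items() if cost <= budget_usd}
    if not affordable:
        return None
    # Assumption: among affordable models, the most expensive is the most capable.
    return max(affordable, key=affordable.get)

# Monthly USD estimates for 500 calls/day at 200 input and 100 output tokens each.
estimated = {"Haiku": 2.625, "Sonnet": 31.50, "Opus": 157.50}

for budget in (1, 50, 200):
    print(f"${budget}/month -> {pick_by_budget(estimated, budget)}")
```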

---
## 🔬 Example 5: Real-World Benchmark

In [None]:
# Benchmark on network-specific tasks
tasks = [
    {
        "name": "Log Classification",
        "prompt": "Classify: %LINK-3-UPDOWN: Interface Gi0/1, changed state to down. Return: INFO/WARNING/ERROR/CRITICAL",
        "expected": "ERROR"
    },
    {
        "name": "Config Extraction", 
        "prompt": "Extract IP address from: interface Gi0/0\n ip address 10.1.1.1 255.255.255.0. Return only the IP.",
        "expected": "10.1.1.1"
    },
    {
        "name": "Subnet Calculation",
        "prompt": "How many usable hosts in a /26 network? Return only the number.",
        "expected": "62"
    }
]

print("🔬 ACCURACY BENCHMARK")
print("=" * 60)

for model in models:
    print(f"\n{model}:")
    correct = 0
    
    for task in tasks:
        response = anthropic_client.messages.create(
            model=model,
            max_tokens=50,
            temperature=0,
            messages=[{"role": "user", "content": task["prompt"]}]
        )
        answer = response.content[0].text.strip()
        is_correct = task["expected"].lower() in answer.lower()
        correct += is_correct
        status = "✅" if is_correct else "❌"
        print(f"  {status} {task['name']}: {answer[:30]}")
    
    print(f"  Score: {correct}/{len(tasks)}")
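
The substring check above is lenient: an answer of "162" would count as correct for an expected "62". A stricter variant, sketched here with hypothetical helpers `is_exact_match` and `contains_token`, either compares normalized strings or requires the expected value to appear as a whole token.

```python
import re

def normalize(text):
    """Trim whitespace and surrounding punctuation, then lowercase."""
    return text.strip().strip(".,:;!\"'`").strip().lower()

def is_exact_match(answer, expected):
    """True if the normalized answer equals the normalized expected value."""
    return normalize(answer) == normalize(expected)

def contains_token(answer, expected):
    """True if `expected` appears on its own, not embedded in a longer word or number."""
    # Reject matches preceded by a word character or dot, or followed by a word character.
    pattern = r"(?<![\w.])" + re.escape(expected) + r"(?!\w)"
    return re.search(pattern, answer, flags=re.IGNORECASE) is not None

print(is_exact_match(" ERROR.\n", "error"))               # True
print(contains_token("There are 62 usable hosts.", "62"))  # True
print(contains_token("162", "62"))                         # False
```

Exact matching suits prompts that ask for "only" the value. Token matching tolerates a model that wraps the answer in a sentence.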

---
## 🎯 Key Takeaways

| Use Case | Recommended Model | Why |
|----------|-------------------|-----|
| High-volume log parsing | **Haiku** | Fast, cheap, accurate enough |
| Config analysis | **Sonnet** | Good balance of quality/cost |
| Complex troubleshooting | **Sonnet** or **Opus** | Needs reasoning ability |
| Network design | **Opus** | Highest quality matters |

**Decision flow:**
1. Start with Haiku
2. If quality insufficient → upgrade to Sonnet
3. For critical/complex tasks → consider Opus
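
The decision flow can be sketched as an escalation loop: try the cheapest tier first and move up only when the answer fails a quality check. Everything here is illustrative. `answer_with_escalation`, the short tier names, and `fake_ask` are assumptions, and the stub replaces a real API call so the cell runs offline.

```python
def answer_with_escalation(prompt, tiers, ask, is_good_enough):
    """Try each model in order and return (model, answer) for the first acceptable answer.

    tiers: model names ordered cheapest first.
    ask: callable (model, prompt) -> answer text.
    is_good_enough: callable (answer) -> bool.
    If no tier passes the check, the last tier's answer is returned.
    """
    model, answer = None, None
    for model in tiers:
        answer = ask(model, prompt)
        if is_good_enough(answer):
            break
    return model, answer

# Stub standing in for a real API call (assumption): the cheapest tier gives up.
def fake_ask(model, prompt):
    return "unsure" if model == "haiku" else "Interface Gi0/1 is down"

model, answer = answer_with_escalation(
    "Why is the link down?",
    tiers=["haiku", "sonnet", "opus"],
    ask=fake_ask,
    is_good_enough=lambda a: a != "unsure",
)
print(model, "->", answer)
# prints: sonnet -> Interface Gi0/1 is down
```

The quality check is the hard part in practice. It could be a format validation, a keyword rubric, or a human review step.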

---

## 📚 Next Steps

➡️ [Chapter 4: API Basics](./Vol1_Ch4_API_Basics.ipynb)