# Volume 1, Chapter 2: Introduction to LLMs

**Understanding Tokens, Context Windows, and Costs -- The Economics of AI for Networking**

From: AI for Networking and Security Engineers - Volume 1, Chapter 2

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eduardd76/AI_for_networking_and_security_engineers/blob/master/Volume-1-Foundations/Colab-Notebooks/Vol1_Ch2_Tokenizer.ipynb)

---

**What you'll learn:**
- Visualize how network configs get tokenized (with surprises)
- Estimate API costs for any config or log file
- Check context window fit across models (the "MTU" of AI)
- Compare model costs and pick the right one for each task
- The cascade pattern: cheap model first, escalate when needed
- Optimize prompts to cut costs without losing quality
- Project monthly costs for your network at scale

**Time:** ~15 minutes | **Cost:** ~$0.05 in API calls

## Setup

In [None]:
!pip install -q anthropic tiktoken

import os
import json
import time
from getpass import getpass

# API key setup
try:
    from google.colab import userdata
    os.environ['ANTHROPIC_API_KEY'] = userdata.get('ANTHROPIC_API_KEY')
    print('API key loaded from Colab Secrets')
except Exception:
    if 'ANTHROPIC_API_KEY' in os.environ:
        print('API key loaded from environment')
    else:
        os.environ['ANTHROPIC_API_KEY'] = getpass('Enter Anthropic API key: ')
        print('API key set manually')

from anthropic import Anthropic
import tiktoken

client = Anthropic()
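MODEL = "claude-sonnet-4-20250514"  # Claude Sonnet 4 (same list price as Sonnet 4.5: $3/$15 per 1M tokens)
encoding = tiktoken.get_encoding("cl100k_base")  # OpenAI's GPT-4 tokenizer; only an approximation for Claude/Gemini

def count_tokens(text):
    """Estimate token count with tiktoken's cl100k_base encoding.

    Claude uses its own tokenizer, so treat this as an approximation
    (Demo 5 compares it with Claude's exact count)."""
    return len(encoding.encode(text))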

print(f"Model: {MODEL}")
print("Ready!")

## Demo 1: How Network Text Gets Tokenized

Tokens are the "packets" of AI. Text gets split into chunks (tokens)
before the model processes it. Common words = 1 token. Technical
terms and IP addresses = many tokens.

**Networking analogy**: Just like data gets fragmented into packets for
transmission, text gets fragmented into tokens for processing.
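Why are IP addresses so expensive? Before any BPE merging, tiktoken's `cl100k_base` encoding pre-splits text with a regular expression that, among other rules, breaks digit runs into chunks of at most three and separates most punctuation. The toy splitter below is a simplified, ASCII-only approximation of that pre-split step, not the real tokenizer. Because BPE never merges across these pieces, each piece costs at least one token. `toy_pieces` is a hypothetical helper.

```python
import re

# Simplified, ASCII-only approximation of cl100k_base's pre-tokenization regex:
#   - a run of letters, optionally preceded by one non-letter/non-digit char (e.g. a space)
#   - digits in chunks of at most three
#   - a run of punctuation, optionally preceded by a space
#   - leftover whitespace
TOY_PRESPLIT = re.compile(r"[^\r\nA-Za-z0-9]?[A-Za-z]+|[0-9]{1,3}| ?[^\sA-Za-z0-9]+|\s+")

def toy_pieces(text: str) -> list:
    """Return the pre-split pieces; each one becomes at least one token."""
    return TOY_PRESPLIT.findall(text)

for sample in ["router ospf 1", "192.168.1.1", "GigabitEthernet0/0"]:
    pieces = toy_pieces(sample)
    print(f"{sample!r:24} -> {len(pieces)} pieces: {pieces}")
```

A dotted-quad address always yields seven pieces (four numbers, three dots) before BPE even starts, while a word like "router" is a single piece.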

In [None]:
# -------------------------------------------------------------------
# Visualize tokenization of networking text
# -------------------------------------------------------------------

def show_tokens(text):
    """Visualize how text gets split into tokens."""
    tokens = encoding.encode(text)
    print(f"Text:   {text}")
    print(f"Tokens: {len(tokens)}")
    decoded = [encoding.decode([t]) for t in tokens]
    print(f"Split:  {decoded}")
    print()

print("HOW NETWORKING TEXT TOKENIZES")
print("=" * 60)

# Simple words - efficient
print("--- Common words (efficient) ---")
show_tokens("Hello world")
show_tokens("BGP")
show_tokens("router ospf 1")

# Technical terms - split into pieces
print("--- Technical terms (less efficient) ---")
show_tokens("GigabitEthernet0/0")
show_tokens("TenGigabitEthernet1/0/1")

# IP addresses - surprisingly expensive
print("--- IP addresses (token-expensive!) ---")
show_tokens("192.168.1.1")
show_tokens("255.255.255.0")
show_tokens("2001:db8::1")

# Full commands
print("--- Full commands ---")
show_tokens("interface GigabitEthernet0/0")
show_tokens("ip address 192.168.1.1 255.255.255.0")
show_tokens("neighbor 203.0.113.2 remote-as 65002")

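ip_line_tokens = count_tokens("ip address 192.168.1.1 255.255.255.0")
print(f"KEY INSIGHT: A single 'ip address' line costs {ip_line_tokens} tokens.")
print(f"A router with 100 interfaces = ~{ip_line_tokens * 100:,} tokens just for IP lines.")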

## Demo 2: Token Counts for Real Configs

Let's see how different config sizes translate to tokens and cost.
This is equivalent to checking the packet size before choosing your MTU.
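If you don't have a tokenizer handy (say, in a quick script on a jump host), a character-count heuristic gets you into the right ballpark. A minimal sketch: `estimate_tokens` is a hypothetical helper, and the chars-per-token ratio is an assumption you should calibrate against the Chars/Token column this demo prints.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Rough token estimate from character count (hypothetical helper).

    chars_per_token is an assumption: calibrate it on your own configs
    with a real tokenizer, then plug the measured ratio in here.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)

snippet = "interface GigabitEthernet0/0\n ip address 192.168.1.1 255.255.255.0\n"
print(estimate_tokens(snippet))        # assumed 3.5 chars/token
print(estimate_tokens(snippet, 2.5))   # a more pessimistic ratio for IP-heavy text
```

Rounding up with `math.ceil` keeps the estimate on the conservative side, which is what you want when budgeting.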

In [None]:
# -------------------------------------------------------------------
# Compare token counts across different config sizes
# -------------------------------------------------------------------

# Small: access switch
small_config = """
hostname sw-access-01
!
vlan 10
 name USERS
vlan 20
 name VOIP
!
interface GigabitEthernet1/0/1
 switchport access vlan 10
 switchport voice vlan 20
 spanning-tree portfast
!
interface GigabitEthernet1/0/48
 switchport mode trunk
 switchport trunk allowed vlan 10,20
"""

# Medium: branch router with OSPF + BGP
medium_config = """
hostname branch-rtr-01
!
interface GigabitEthernet0/0
 description WAN_UPLINK_TO_HQ
 ip address 203.0.113.1 255.255.255.252
 ip ospf cost 10
 no shutdown
!
interface GigabitEthernet0/1
 description LAN_USERS
 ip address 192.168.1.1 255.255.255.0
 no shutdown
!
interface GigabitEthernet0/2
 description LAN_SERVERS
 ip address 172.16.10.1 255.255.255.0
 no shutdown
!
router ospf 1
 router-id 1.1.1.1
 network 192.168.1.0 0.0.0.255 area 0
 network 172.16.10.0 0.0.0.255 area 0
 network 203.0.113.0 0.0.0.3 area 0
 passive-interface GigabitEthernet0/1
 passive-interface GigabitEthernet0/2
!
router bgp 65001
 bgp router-id 1.1.1.1
 bgp log-neighbor-changes
 neighbor 203.0.113.2 remote-as 65002
 neighbor 203.0.113.2 description HQ_PEER
 network 192.168.0.0 mask 255.255.0.0
!
ip access-list extended MGMT_ACCESS
 permit tcp 10.0.0.0 0.0.255.255 any eq 22
 deny ip any any log
!
line vty 0 4
 access-class MGMT_ACCESS in
 transport input ssh
"""

# Large: simulated core router (~2,000 lines)
large_config = medium_config * 50  # 40 lines per copy + 1 blank separator -> ~2,050 lines

configs = {
    "Small (access switch)": small_config,
    "Medium (branch router)": medium_config,
    "Large (~2000 lines)": large_config,
}

print("TOKEN COUNT COMPARISON")
print("=" * 70)
print(f"{'Config Type':<25} {'Lines':<8} {'Chars':<10} {'Tokens':<10} {'Chars/Token'}")
print("-" * 70)

for name, config in configs.items():
    lines = len(config.strip().split('\n'))
    chars = len(config)
    tokens = count_tokens(config)
    ratio = chars / tokens if tokens > 0 else 0
    print(f"{name:<25} {lines:<8} {chars:<10,} {tokens:<10,} {ratio:.1f}")

print("\nCompare the Chars/Token column with ~4 chars/token for English prose.")
print("Configs usually tokenize less efficiently: digits are split into chunks of at")
print("most three, and punctuation such as /, . and : tends to become separate tokens.")

## Demo 3: Cost Calculator -- Know Before You Spend

Before running any batch analysis, you should know the cost.
This is like checking your bandwidth before starting a large transfer.

**Key insight**: Output tokens cost 4-5x more than input tokens for every model priced below.
For short configs with detailed analyses, most of the cost is in the response.
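To see where the money goes, split a call's cost into its input and output parts. A minimal sketch with assumed token counts and Sonnet-class prices ($3 in / $15 out per 1M tokens); `cost_breakdown` is a hypothetical helper, not part of any SDK.

```python
def cost_breakdown(input_tokens, output_tokens, input_price, output_price):
    """Return (input_cost, output_cost) in USD; prices are USD per 1M tokens."""
    input_cost = input_tokens * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost, output_cost

# Assumed workloads: the same 800-token analysis of a short and a long config
for label, config_tokens in [("short config", 400), ("long config", 20_000)]:
    inp, out = cost_breakdown(config_tokens, 800, 3.00, 15.00)
    share = out / (inp + out)
    print(f"{label:<13} input ${inp:.4f}  output ${out:.4f}  output share {share:.0%}")
```

With a short config the response dominates the bill; with a long one the input does.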

In [None]:
# -------------------------------------------------------------------
# Cost calculator across all major models
# -------------------------------------------------------------------

# List prices in USD per 1M tokens; "context" is the context window in tokens.
# Prices change often: check each provider's pricing page before budgeting.
PRICING = {
    "Claude Haiku 4.5":  {"input": 1.00,  "output": 5.00,  "context": 200_000},
    "Claude Sonnet 4.5": {"input": 3.00,  "output": 15.00, "context": 200_000},
    "Claude Opus 4":     {"input": 15.00, "output": 75.00, "context": 200_000},
    "GPT-4o-mini":       {"input": 0.15,  "output": 0.60,  "context": 128_000},
    "GPT-4o":            {"input": 2.50,  "output": 10.00, "context": 128_000},
    "Gemini 1.5 Pro":    {"input": 1.25,  "output": 5.00,  "context": 2_000_000},  # prompts <=128K tokens; longer prompts cost more
}

def calculate_cost(input_tokens, output_tokens, model):
    """Calculate cost for a single API call."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Scenario: Analyze 100 branch router configs
num_configs = 100
input_per_config = count_tokens(medium_config)
output_estimate = 800  # Typical analysis response

print("COST CALCULATOR: Analyze 100 Branch Router Configs")
print("=" * 65)
print(f"Input per config:  {input_per_config:,} tokens")
print(f"Output estimate:   {output_estimate:,} tokens")
print(f"Total configs:     {num_configs}")
print(f"Total input:       {input_per_config * num_configs:,} tokens")
print(f"Total output:      {output_estimate * num_configs:,} tokens")
print()
print(f"{'Model':<22} {'Per Config':<12} {'100 Configs':<12} {'1000/month'}")
print("-" * 65)

for model in PRICING:
    per_config = calculate_cost(input_per_config, output_estimate, model)
    batch = per_config * num_configs
    monthly = per_config * 1000
    print(f"{model:<22} ${per_config:<11.4f} ${batch:<11.2f} ${monthly:.2f}")

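haiku_vs_sonnet = PRICING["Claude Sonnet 4.5"]["input"] / PRICING["Claude Haiku 4.5"]["input"]
print(f"\nNotice: Haiku 4.5 is ~{haiku_vs_sonnet:.0f}x cheaper than Sonnet 4.5, and GPT-4o-mini")
print("is the cheapest. But cheaper models may miss subtle issues.")
print("See Demo 6 for the cascade pattern that gives you the best of both.")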

## Demo 4: Context Window Check -- The "MTU" of AI

Context window = the max tokens a model can process in one request.
Exceed it and your request fails -- just like exceeding MTU causes
fragmentation or drops.

Let's check: will your config fit?
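If the answer is no, you fragment the config, much as a router fragments an oversized packet. One simple strategy (not necessarily the one Chapter 7 recommends) is to split on line boundaries so each chunk stays under a token budget. A minimal sketch: `chunk_lines` is a hypothetical helper that accepts any token-counting function. In this notebook you would pass `count_tokens`; the usage example uses a word counter so it runs on its own. Summing per-line counts only approximates the count of the joined chunk, so leave some headroom.

```python
from typing import Callable, List

def chunk_lines(text: str, max_tokens: int, count: Callable[[str], int]) -> List[str]:
    """Split text on line boundaries into chunks of roughly <= max_tokens (per `count`).

    A single line longer than max_tokens becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for line in text.splitlines(keepends=True):
        t = count(line)
        # Flush the current chunk if this line would push it over budget
        if current and current_tokens + t > max_tokens:
            chunks.append("".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += t
    if current:
        chunks.append("".join(current))
    return chunks

sample = "interface Gi0/0\n ip address 10.0.0.1 255.255.255.0\n no shutdown\n"
words = lambda s: len(s.split())   # stand-in counter; swap in count_tokens
for i, chunk in enumerate(chunk_lines(sample, max_tokens=5, count=words), 1):
    print(f"--- chunk {i} ---\n{chunk}", end="")
```

Splitting only at line boundaries keeps every config command intact, at the cost of chunks that are not perfectly full. Splitting at `!` section markers would keep whole stanzas together.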

In [None]:
# -------------------------------------------------------------------
# Context window fit check -- will your config fit?
# -------------------------------------------------------------------

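# Simulate configs of different sizes by repeating the medium config.
# Each copy adds its 40 lines plus one blank separator line.
LINES_PER_COPY = len(medium_config.strip().split('\n')) + 1

def simulate_config(target_lines):
    """Repeat medium_config until it is roughly target_lines long."""
    return medium_config * max(1, round(target_lines / LINES_PER_COPY))

config_sizes = {
    "Small switch (200 lines)":     simulate_config(200),
    "Branch router (500 lines)":    simulate_config(500),
    "Core router (2,000 lines)":    simulate_config(2_000),
    "Large core (5,000 lines)":     simulate_config(5_000),
    "Massive ASR (20,000 lines)":   simulate_config(20_000),
    "Full BGP table dump (80,000)": simulate_config(80_000),
}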

OUTPUT_BUFFER = 4000  # Reserve space for the model's response

print("CONTEXT WINDOW FIT CHECK")
print("=" * 90)
print(f"{'Config':<35} {'Tokens':<10} ", end="")
for model in PRICING:
    print(f"{model[:10]:<12}", end="")
print()
print("-" * 90)

for name, config in config_sizes.items():
    tokens = count_tokens(config)
    print(f"{name:<35} {tokens:<10,}", end="")
    for model, info in PRICING.items():
        fits = tokens + OUTPUT_BUFFER < info["context"]
        pct = (tokens + OUTPUT_BUFFER) / info["context"] * 100
        if fits:
            print(f"{'OK '+str(int(pct))+'%':<12}", end="")
        else:
            print(f"{'NO '+str(int(pct))+'%':<12}", end="")
    print()

print("\nLike choosing between standard MTU (1500) and jumbo frames (9000):")
print("  GPT-4o / Claude = standard MTU (128K-200K tokens)")
print("  Gemini 1.5 Pro  = jumbo frames (2M tokens)")
print("\nWhen a config doesn't fit, you need chunking (fragmentation) or a model")
print("with a larger context window (jumbo frames). See Chapter 7 for chunking strategies.")

## Demo 5: Exact Token Count via Claude API

The `tiktoken` library gives estimates. Claude's API gives exact counts.
Let's compare the two counts for a few config snippets.
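Once you have a few (estimate, exact) pairs from this demo's output, you can turn them into a budgeting margin. A minimal sketch: `safety_multiplier` is a hypothetical helper, and the sample pairs are made-up placeholders, not measurements. Replace them with the numbers you measured.

```python
def safety_multiplier(pairs):
    """Factor to scale tiktoken estimates by so none of the samples is undercounted.

    pairs: iterable of (estimate, exact) token counts.
    Returns at least 1.0 (never scales estimates down).
    """
    ratios = [exact / est for est, exact in pairs if est > 0]
    return max([1.0] + ratios)

# Placeholder pairs: substitute the tiktoken / Claude API numbers you measured
sample_pairs = [(100, 108), (250, 262), (400, 431)]
margin = safety_multiplier(sample_pairs)
print(f"Multiply estimates by {margin:.2f} for budgeting")
print(f"A 12,000-token estimate becomes ~{round(12_000 * margin):,} tokens")
```

Taking the worst observed ratio is deliberately conservative. Budgets built this way should come in slightly high rather than short.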

In [None]:
# -------------------------------------------------------------------
# Compare tiktoken estimate vs Claude's exact count
# -------------------------------------------------------------------

test_configs = {
    "Interface block": """interface GigabitEthernet0/0
 description WAN_UPLINK
 ip address 203.0.113.1 255.255.255.252
 ip ospf cost 10
 no shutdown""",
    "BGP config": """router bgp 65001
 bgp router-id 10.0.0.1
 bgp log-neighbor-changes
 neighbor 203.0.113.2 remote-as 65002
 neighbor 203.0.113.2 route-map ISP-IN in
 neighbor 203.0.113.2 route-map ISP-OUT out
 network 198.51.100.0 mask 255.255.255.0""",
    "Full medium config": medium_config,
}

print("TIKTOKEN ESTIMATE vs CLAUDE EXACT COUNT")
print("=" * 65)
print(f"{'Config':<25} {'tiktoken':<12} {'Claude API':<12} {'Diff':<8} {'Error%'}")
print("-" * 65)

for name, config in test_configs.items():
    estimate = count_tokens(config)
    # Claude's official count
    exact = client.messages.count_tokens(
        model=MODEL,
        messages=[{"role": "user", "content": config}]
    )
    diff = abs(exact.input_tokens - estimate)
    error_pct = (diff / exact.input_tokens * 100) if exact.input_tokens > 0 else 0
    print(f"{name:<25} {estimate:<12} {exact.input_tokens:<12} {diff:<8} {error_pct:.1f}%")

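print("\ntiktoken uses a different tokenizer than Claude, and the API count includes a few")
print("tokens of message overhead, so the gap varies by content. Use the Error% column")
print("to pick a safety margin for budgeting; use the API count when you need exact numbers.")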

## Demo 6: The Cascade Pattern -- Smart Model Selection

A common rule of thumb (the 80/20 rule): most routine tasks can run on a cheap model, and only a minority need an expensive one.
The **cascade pattern** tries a cheap model first and escalates only when needed.

**Networking analogy**: Like QoS classification -- not all traffic needs the
priority queue. Route simple queries through best-effort (Haiku),
escalate complex ones to the priority queue (Sonnet/Opus).
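The demo below runs both models side by side so you can compare them. A real cascade adds the decision logic: run the cheap model, and call the expensive one only when an escalation rule fires. A minimal sketch of that control flow, assuming two escalation rules drawn from the strategy this demo describes: a critical finding, or a config longer than `max_cheap_lines`. `fake_cheap` and `fake_strong` are hypothetical stand-ins so the code runs without an API key. In the notebook you would put `client.messages.create` calls, plus parsing of each reply into findings, behind the same function signature.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Finding:
    severity: str      # "critical" / "high" / "medium" / "low"
    description: str

Analyzer = Callable[[str], List[Finding]]

def cascade(config: str, cheap: Analyzer, strong: Analyzer,
            escalate_severities=("critical",),
            max_cheap_lines=500) -> Tuple[List[Finding], str]:
    """Cheap model first; escalate to the strong model only when needed.

    Escalation triggers (assumptions, tune for your network):
      - the config is long (a rough proxy for "complex"), or
      - the cheap model reports a finding at an escalation severity.
    Returns (findings, tier_used).
    """
    if len(config.splitlines()) > max_cheap_lines:
        return strong(config), "strong"      # skip the cheap pass entirely
    findings = cheap(config)
    if any(f.severity.lower() in escalate_severities for f in findings):
        return strong(config), "strong"
    return findings, "cheap"

# Hypothetical stand-in analyzers so the sketch runs offline
def fake_cheap(cfg: str) -> List[Finding]:
    if "community public" in cfg:
        return [Finding("critical", "Default SNMP community")]
    return [Finding("low", "No login banner")]

def fake_strong(cfg: str) -> List[Finding]:
    return [Finding("critical", "Default SNMP community 'public'"),
            Finding("high", "Telnet allowed on VTY lines")]

for cfg in ["snmp-server community public RO", "hostname sw-01"]:
    findings, tier = cascade(cfg, fake_cheap, fake_strong)
    print(f"{cfg!r}: handled by {tier} model, {len(findings)} finding(s)")
```

Escalated configs pay for both passes in this flow. That trade-off matters when you project monthly costs.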

In [None]:
# -------------------------------------------------------------------
# Cascade pattern: cheap model first, escalate if needed
# -------------------------------------------------------------------

test_config = """
hostname core-rtr-01
!
snmp-server community public RO
snmp-server community private RW
!
line vty 0 4
 transport input telnet ssh
 password cisco123
 login
line vty 5 15
 no login
!
interface GigabitEthernet0/0
 ip address 10.0.1.1 255.255.255.0
!
router ospf 1
 network 10.0.0.0 0.0.255.255 area 0
 network 172.16.0.0 0.0.255.255 area 1
"""

prompt = f"""Analyze this config for security issues. For each issue provide:
- severity (critical/high/medium/low)
- one-line description
- fix command

Config:
{test_config}"""

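# NOTE: for a side-by-side comparison, this demo always runs BOTH models.
# A production cascade would call Sonnet only when escalation is needed
# (e.g., Haiku flags a critical issue or the config is complex).

# Step 1: Try with Haiku (cheap)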
print("STEP 1: Quick scan with Haiku (cheap model)")
print("=" * 60)
start = time.time()
haiku_resp = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    temperature=0,
    messages=[{"role": "user", "content": prompt}]
)
haiku_time = time.time() - start
haiku_cost = calculate_cost(
    haiku_resp.usage.input_tokens,
    haiku_resp.usage.output_tokens,
    "Claude Haiku 4.5"
)
print(haiku_resp.content[0].text)
print(f"\n[Haiku: {haiku_time:.1f}s, {haiku_resp.usage.input_tokens}+{haiku_resp.usage.output_tokens} tokens, ${haiku_cost:.4f}]")

# Step 2: Escalate to Sonnet (better reasoning)
print(f"\n\nSTEP 2: Deep analysis with Sonnet (escalated)")
print("=" * 60)
start = time.time()
sonnet_resp = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1500,
    temperature=0,
    messages=[{"role": "user", "content": prompt}]
)
sonnet_time = time.time() - start
sonnet_cost = calculate_cost(
    sonnet_resp.usage.input_tokens,
    sonnet_resp.usage.output_tokens,
    "Claude Sonnet 4.5"
)
print(sonnet_resp.content[0].text)
print(f"\n[Sonnet: {sonnet_time:.1f}s, {sonnet_resp.usage.input_tokens}+{sonnet_resp.usage.output_tokens} tokens, ${sonnet_cost:.4f}]")

# Comparison
print(f"\n\nCOMPARISON")
print("=" * 60)
print(f"{'Metric':<20} {'Haiku':<20} {'Sonnet'}")
print("-" * 60)
print(f"{'Time':<20} {haiku_time:.1f}s{'':<16} {sonnet_time:.1f}s")
print(f"{'Cost':<20} ${haiku_cost:.4f}{'':<14} ${sonnet_cost:.4f}")
print(f"{'Cost ratio':<20} {'1x':<20} {sonnet_cost/haiku_cost:.1f}x")
print(f"\nCascade strategy: Use Haiku for the 80% of configs that are")
print(f"straightforward. Escalate to Sonnet only for complex configs")
print(f"or when Haiku flags critical issues that need deeper analysis.")

## Demo 7: Prompt Optimization -- Same Result, Lower Cost

Your prompt wording directly affects token count and cost.
Like optimizing packet headers -- reduce overhead without losing payload.
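Savings scale linearly with call volume, so a few dozen tokens matter far more at fleet scale than in a one-off test. A minimal worked sketch: `monthly_savings` is a hypothetical helper, the 70- and 20-token prompt sizes are assumed, and it counts input tokens only, at Sonnet's $3 per 1M rate.

```python
def monthly_savings(verbose_tokens, concise_tokens, calls_per_month, input_price_per_m=3.00):
    """USD saved per month by trimming prompt overhead (input tokens only)."""
    saved_per_call = max(0, verbose_tokens - concise_tokens)
    return saved_per_call * calls_per_month * input_price_per_m / 1_000_000

# Assumed prompt sizes: 70 tokens (verbose) vs 20 tokens (concise)
for calls in (1_000, 100_000, 1_000_000):
    print(f"{calls:>9,} calls/month -> ${monthly_savings(70, 20, calls):,.2f} saved")
```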

In [None]:
# -------------------------------------------------------------------
# Compare verbose vs efficient prompts
# -------------------------------------------------------------------

verbose_prompt = """Hello! I would really appreciate it if you could please help me out by
analyzing the following network configuration. I need you to look for
any security issues or problems that might be present in the configuration.
Please be very thorough in your analysis and make sure to check everything
carefully. Thank you so much for your help with this important task!

Here is the configuration that I need you to analyze:
"""

efficient_prompt = """Analyze for security issues. List severity, description, fix:
"""

# System prompt approach: reusable instructions go in the system prompt.
# The system prompt is still sent (and billed) on every call, so count it too.
system_approach_system = "You are a network security auditor. For each issue report: severity, description, fix command."
system_approach_user = "Analyze this config:\n"

prompts = {
    "Verbose (wasteful)": verbose_prompt,
    "Efficient (concise)": efficient_prompt,
    "System prompt approach": system_approach_system + "\n" + system_approach_user,
}

print("PROMPT OPTIMIZATION")
print("=" * 65)
print(f"{'Approach':<25} {'Tokens':<10} {'At 1000 calls/mo*':<18} {'Savings'}")
print("-" * 65)

verbose_tokens = count_tokens(verbose_prompt)
for name, prompt in prompts.items():
    tokens = count_tokens(prompt)
    # Cost for prompt tokens only (Sonnet input rate)
    monthly_cost = tokens * 1000 * 3.00 / 1_000_000
    savings = (1 - tokens / verbose_tokens) * 100 if tokens < verbose_tokens else 0
    print(f"{name:<25} {tokens:<10} ${monthly_cost:<17.2f} {savings:.0f}%")

print("\n* Prompt token cost only, at Sonnet rate ($3/1M input tokens)")
print("\nTips for efficient prompts:")
print("  1. Remove filler words ('please', 'thank you', 'I would like')")
print("  2. Keep reusable instructions in a short system prompt (it is billed on every call too)")
print("  3. Use structured format requests ('list:', 'table:')")
print("  4. Be specific about output format to avoid verbose responses")

## Demo 8: Full Network Cost Projection

The question your manager will ask: "How much will this cost per month?"

Let's build a projection for a real network.
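One detail to get right in the projection: in a true cascade, every analysis hits the cheap model first, so escalated analyses are billed twice. A minimal sketch of the blended monthly cost: `cascade_monthly_cost` is a hypothetical helper, and the $100/$300 totals in the example are placeholders, not figures from this notebook.

```python
def cascade_monthly_cost(cheap_total, strong_total, escalation_rate, cheap_runs_on_all=True):
    """Blended monthly cost of a cascade, in USD.

    cheap_total / strong_total: cost of the whole workload on each model alone.
    escalation_rate: fraction of analyses sent to the strong model (0..1).
    cheap_runs_on_all: True if every analysis is scanned by the cheap model first
        (escalated ones then pay for both models); False if you route up front.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    cheap_share = 1.0 if cheap_runs_on_all else 1.0 - escalation_rate
    return cheap_share * cheap_total + escalation_rate * strong_total

# Placeholder totals: $100/month all-cheap vs $300/month all-strong, 20% escalated
print(f"Scan-then-escalate: ${cascade_monthly_cost(100, 300, 0.2):.2f}")
print(f"Route up front:     ${cascade_monthly_cost(100, 300, 0.2, cheap_runs_on_all=False):.2f}")
```

The 80/20 line in the projection below uses the route-up-front form. Switch to the other form if Haiku screens every config.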

In [None]:
# -------------------------------------------------------------------
# Monthly cost projection for a real network
# -------------------------------------------------------------------

# Define your network (adjust these numbers to match yours)
network = {
    "Core routers":       {"count": 4,   "avg_lines": 5000,  "analyses_per_month": 30},
    "Distribution":       {"count": 12,  "avg_lines": 2000,  "analyses_per_month": 12},
    "Access switches":    {"count": 200, "avg_lines": 300,   "analyses_per_month": 4},
    "Firewalls":          {"count": 8,   "avg_lines": 3000,  "analyses_per_month": 8},
    "WAN edge":           {"count": 6,   "avg_lines": 1500,  "analyses_per_month": 12},
}

# Token estimate assumptions: ~3.5 chars per token, ~50 chars per line.
# Both are rough guesses; measure your own configs with count_tokens() and adjust.
CHARS_PER_LINE = 50
CHARS_PER_TOKEN = 3.5
OUTPUT_TOKENS = 1000  # Average analysis response

print("MONTHLY COST PROJECTION")
print("=" * 85)
print(f"{'Device Type':<20} {'Devices':<9} {'Analyses':<10} {'Input Tokens':<14} ", end="")
print(f"{'Haiku 4.5':<12} {'Sonnet 4.5'}")
print("-" * 85)

total_haiku = 0
total_sonnet = 0

for device_type, info in network.items():
    monthly_analyses = info["count"] * info["analyses_per_month"]
    input_tokens = int(info["avg_lines"] * CHARS_PER_LINE / CHARS_PER_TOKEN)
    total_input = input_tokens * monthly_analyses
    total_output = OUTPUT_TOKENS * monthly_analyses
    
    haiku_cost = calculate_cost(total_input, total_output, "Claude Haiku 4.5")
    sonnet_cost = calculate_cost(total_input, total_output, "Claude Sonnet 4.5")
    
    total_haiku += haiku_cost
    total_sonnet += sonnet_cost
    
    print(f"{device_type:<20} {info['count']:<9} {monthly_analyses:<10} {total_input:<14,} ", end="")
    print(f"${haiku_cost:<11.2f} ${sonnet_cost:.2f}")

print("-" * 85)
print(f"{'TOTAL':<20} {'':<9} {'':<10} {'':<14} ${total_haiku:<11.2f} ${total_sonnet:.2f}")
print(f"{'ANNUAL':<20} {'':<9} {'':<10} {'':<14} ${total_haiku*12:<11.2f} ${total_sonnet*12:.2f}")

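# Cascade savings (assumes escalated configs go straight to Sonnet; if every
# config is scanned by Haiku first, the escalated 20% also pay the Haiku cost)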
cascade = total_haiku * 0.8 + total_sonnet * 0.2
print(f"\nCascade strategy (80% Haiku / 20% Sonnet): ${cascade:.2f}/month (${cascade*12:.2f}/year)")
print(f"Savings vs all-Sonnet: {(1 - cascade/total_sonnet)*100:.0f}%")

## Try It Yourself!

Paste your own config below to see token count, cost, and context fit.
Remember to sanitize sensitive data first!
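A minimal regex-based sketch of that sanitizing step for Cisco IOS-style configs. The rules are assumptions that cover a few common secret-bearing lines (`enable secret`, line `password`, `username ... secret`, `snmp-server community`, `key-string`). They are not exhaustive, so review the output before pasting anything. `sanitize` is a hypothetical helper.

```python
import re

# Each rule keeps the command text (group 1) and replaces the secret that follows it.
# Assumed, non-exhaustive patterns for Cisco IOS-style configs.
_RULES = [
    # enable secret 5 ..., enable password ..., line-level "password cisco123"
    re.compile(r"^([ \t]*(?:enable[ \t]+)?(?:password|secret)[ \t]+(?:\d[ \t]+)?)\S+", re.M),
    # username admin privilege 15 secret 9 ...
    re.compile(r"^([ \t]*username[ \t]+\S+.*?[ \t](?:password|secret)[ \t]+(?:\d[ \t]+)?)\S+", re.M),
    # snmp-server community public RO
    re.compile(r"^([ \t]*snmp-server[ \t]+community[ \t]+)\S+", re.M),
    # key-string under key chains
    re.compile(r"^([ \t]*key-string[ \t]+(?:\d[ \t]+)?)\S+", re.M),
]

def sanitize(config: str):
    """Return (sanitized_config, number_of_redactions)."""
    total = 0
    for rule in _RULES:
        config, n = rule.subn(r"\1<REDACTED>", config)
        total += n
    return config, total

clean, n = sanitize("enable secret 5 $1$abcd\nline vty 0 4\n password cisco123\n")
print(f"{n} secrets redacted")
print(clean)
```

The patterns use `[ \t]` rather than `\s` so that a match can never run across a line break and swallow the next command.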

In [None]:
# -------------------------------------------------------------------
# YOUR TURN: Paste a config and get the full analysis
# -------------------------------------------------------------------

your_config = """
! Paste your config here (sanitize secrets first!)
hostname your-router
interface GigabitEthernet0/0
 ip address 10.0.0.1 255.255.255.0
 no shutdown
"""

tokens = count_tokens(your_config)
lines = len(your_config.strip().split('\n'))

print("YOUR CONFIG ANALYSIS")
print("=" * 60)
print(f"Lines:      {lines}")
print(f"Characters: {len(your_config):,}")
print(f"Tokens:     {tokens:,}")
print()

print("Cost per analysis:")
for model in PRICING:
    cost = calculate_cost(tokens, 1000, model)
    fits = "OK" if tokens + 4000 < PRICING[model]["context"] else "TOO BIG"
    print(f"  {model:<22} ${cost:.4f}  [{fits}]")

print(f"\nMonthly cost (daily analysis, Sonnet): ${calculate_cost(tokens, 1000, 'Claude Sonnet 4.5') * 30:.2f}")

## Summary

This notebook covered the economics of AI for networking:

1. **Tokenization** -- How network configs get split into tokens (IP addresses are expensive!)
2. **Token Counting** -- Estimate with tiktoken, verify with Claude's API
3. **Cost Calculation** -- Estimate what any analysis will cost before running it
4. **Context Windows** -- The "MTU" of AI: check fit before sending
5. **Exact vs Estimated** -- tiktoken approximates Claude's count; measure the gap on your own configs
6. **Cascade Pattern** -- Cheap model first (Haiku), escalate to Sonnet when needed
7. **Prompt Optimization** -- Remove filler words, use system prompts, be specific
8. **Cost Projection** -- Build the spreadsheet your manager needs

### Key Numbers to Remember

| Concept | Value |
|---------|-------|
| 1 token | ~3.5 chars of config (~4 chars of English) |
| Output tokens | Cost 4-5x more than input tokens |
| 1000-line config | ~15,000 tokens |
| Analyzing 1 config (Sonnet) | ~$0.06 |
| Analyzing 1 config (Haiku) | ~$0.02 |
| Cascade savings | ~50-60% vs all-Sonnet (80/20 split) |
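The cost rows follow from the same assumptions Demo 8 uses (~50 chars/line, ~3.5 chars/token, ~1,000 output tokens). Here is a quick worked check of those rows and of the cascade savings. It assumes list prices of $1/$5 (Haiku 4.5) and $3/$15 (Sonnet) per 1M tokens, with an 80/20 Haiku/Sonnet split routed up front. `call_cost` is a hypothetical helper.

```python
CHARS_PER_LINE, CHARS_PER_TOKEN = 50, 3.5   # assumptions, as in Demo 8
LINES, OUTPUT_TOKENS = 1_000, 1_000

def call_cost(input_tokens, output_tokens, in_price, out_price):
    """USD cost of one call; prices are USD per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

input_tokens = round(LINES * CHARS_PER_LINE / CHARS_PER_TOKEN)
sonnet = call_cost(input_tokens, OUTPUT_TOKENS, 3.00, 15.00)
haiku = call_cost(input_tokens, OUTPUT_TOKENS, 1.00, 5.00)
blended = 0.8 * haiku + 0.2 * sonnet   # 80% Haiku, 20% Sonnet, routed up front
print(f"Input tokens: {input_tokens:,}")
print(f"Sonnet: ${sonnet:.3f}  Haiku: ${haiku:.3f}  80/20 blend: ${blended:.3f}")
print(f"Cascade savings vs all-Sonnet: {1 - blended / sonnet:.0%}")
```

With Haiku at exactly one third of Sonnet's price on both input and output, the 80/20 blend saves 0.8 x 2/3, or about 53%, regardless of config size.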

### Next Steps
- **Chapter 3**: Choose the right model for each networking task
- **Chapter 4**: API authentication, error handling, production patterns
- **Chapter 8**: Deep dive into cost optimization strategies

-> [Continue to Chapter 3 Notebook](./Vol1_Ch3_Model_Selection.ipynb)