# Volume 1, Chapter 2: Introduction to LLMs

**Understanding Tokens - The Currency of AI**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/eduardd76/AI_for_networking_and_security_engineers/blob/main/Volume-1-Foundations/Colab-Notebooks/Vol1_Ch2_Tokenizer.ipynb)

---

**What you'll learn:**
- 🔢 What tokens are and why they matter
- 💰 How to calculate API costs
- 📊 Token counting for network configs
- ⚡ Optimize prompts to reduce costs

**Time:** ~10 minutes | **Cost:** ~$0.01

## 🔧 Setup

In [None]:
!pip install -q anthropic tiktoken

import os
from getpass import getpass

try:
    from google.colab import userdata
    os.environ['ANTHROPIC_API_KEY'] = userdata.get('ANTHROPIC_API_KEY')
    print("✓ API key loaded from Colab Secrets")
except Exception:
    # Not running in Colab, or the secret is missing: fall back to the environment or a prompt
    if 'ANTHROPIC_API_KEY' not in os.environ:
        os.environ['ANTHROPIC_API_KEY'] = getpass('Enter Anthropic API key: ')
    print("✓ API key set")

from anthropic import Anthropic
client = Anthropic()
print("✓ Ready!")

---
## 🔢 Example 1: What Are Tokens?

Tokens are the pieces of text a model reads and writes. For English text, one token averages roughly 4 characters, or about 0.75 words.
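
The rule of thumb above can be turned into a quick estimator when no tokenizer is installed. This is a minimal sketch, not a tokenizer: the helper name `estimate_tokens` and the default of 4 characters per token are assumptions for illustration, and real counts for configs and IP addresses tend to differ.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (rule of thumb only)."""
    return math.ceil(len(text) / chars_per_token)

sample = "show ip bgp summary"  # 19 characters
print(f"{sample!r}: about {estimate_tokens(sample)} tokens")
```

Use an estimate like this only for ballpark budgeting; the tokenizer-based count in the next cell is more reliable.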

In [None]:
import tiktoken

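# Use OpenAI's cl100k_base encoding as an approximation.
# Claude uses a different tokenizer, so these counts are estimates, not exact values.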
encoding = tiktoken.get_encoding("cl100k_base")

def show_tokens(text):
    """Visualize how text is tokenized."""
    tokens = encoding.encode(text)
    print(f"Text: {text}")
    print(f"Token count: {len(tokens)}")
    print(f"Tokens: {tokens}")
    print("Decoded tokens:")
    for i, token in enumerate(tokens):
        decoded = encoding.decode([token])
        print(f"  [{i}] {token} → '{decoded}'")
    print()

# Simple examples
print("=" * 50)
print("TOKENIZATION EXAMPLES")
print("=" * 50 + "\n")

show_tokens("BGP")
show_tokens("GigabitEthernet0/0")
show_tokens("192.168.1.1")

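### 💡 Key Insight

Notice how:
- Common English words usually map to a single token
- Technical terms such as interface names are split into several tokens
- IP addresses are typically split at the dots, so octets and dots each cost tokens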
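
One way to see why this happens is to mimic the first stage of a tokenizer, which cuts text into letter runs, short digit runs, and punctuation before any merging takes place. The pattern below is a simplified stand-in written for this illustration (the names `PIECE_PATTERN` and `rough_pieces` are hypothetical); real tokenizers use more elaborate rules and then apply byte-pair merges, so actual token counts can differ.

```python
import re

# Simplified pre-splitting rule (an assumption for illustration, not a real tokenizer):
# runs of letters, runs of up to 3 digits, or runs of other non-space symbols.
PIECE_PATTERN = re.compile(r"[A-Za-z]+|\d{1,3}|[^\sA-Za-z\d]+")

def rough_pieces(text: str) -> list[str]:
    """Split text into coarse pieces the way a pre-tokenizer might."""
    return PIECE_PATTERN.findall(text)

for sample in ["BGP", "GigabitEthernet0/0", "192.168.1.1"]:
    pieces = rough_pieces(sample)
    print(f"{sample!r}: {len(pieces)} pieces -> {pieces}")
```

Under this rule the IP address alone yields seven pieces (four octets and three dots), which is why address-heavy configs consume more tokens than their length suggests.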

---
## 📊 Example 2: Count Tokens in Network Configs

In [None]:
def count_tokens(text):
    """Count tokens in text."""
    return len(encoding.encode(text))

# Sample configs of different sizes
small_config = """
interface GigabitEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 no shutdown
"""

medium_config = """
hostname CORE-RTR-01
!
interface GigabitEthernet0/0
 description WAN_UPLINK
 ip address 203.0.113.1 255.255.255.252
 ip ospf cost 10
!
interface GigabitEthernet0/1
 description LAN_SEGMENT
 ip address 192.168.1.1 255.255.255.0
!
router ospf 1
 router-id 1.1.1.1
 network 192.168.1.0 0.0.0.255 area 0
 network 203.0.113.0 0.0.0.3 area 0
!
router bgp 65001
 neighbor 203.0.113.2 remote-as 65002
 network 192.168.0.0 mask 255.255.0.0
"""

# Large config (simulated)
large_config = medium_config * 20

print("📊 TOKEN COUNT COMPARISON")
print("=" * 50)
print(f"Small config:  {count_tokens(small_config):,} tokens ({len(small_config):,} chars)")
print(f"Medium config: {count_tokens(medium_config):,} tokens ({len(medium_config):,} chars)")
print(f"Large config:  {count_tokens(large_config):,} tokens ({len(large_config):,} chars)")
print()
print(f"Ratio (tokens/chars): ~{count_tokens(medium_config)/len(medium_config):.2f}")

---
## 💰 Example 3: Calculate API Costs

In [None]:
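# List prices in USD per 1M tokens (as of 2024; check each provider's pricing page for current rates)
PRICING = {
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
    "claude-3-haiku": {"input": 0.25, "output": 1.25},  # these rates are Claude 3 Haiku's; Claude 3.5 Haiku is priced higher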
    "claude-3-opus": {"input": 15.00, "output": 75.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def calculate_cost(input_tokens, output_tokens, model="claude-3-5-sonnet"):
    """Calculate API cost."""
    pricing = PRICING[model]
    input_cost = (input_tokens / 1_000_000) * pricing["input"]
    output_cost = (output_tokens / 1_000_000) * pricing["output"]
    return input_cost + output_cost

# Scenario: Analyze 100 router configs
configs_count = 100
tokens_per_config = count_tokens(medium_config)
output_tokens_estimate = 500  # ~500 tokens for analysis output

total_input = tokens_per_config * configs_count
total_output = output_tokens_estimate * configs_count

print("💰 COST CALCULATOR")
print("=" * 50)
print(f"Scenario: Analyze {configs_count} router configs")
print(f"Input tokens: {total_input:,}")
print(f"Output tokens: {total_output:,}")
print()
print("Cost by model:")
for model in PRICING:
    cost = calculate_cost(total_input, total_output, model)
    print(f"  {model:20} ${cost:.4f}")
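
The arithmetic behind the cost calculation can be checked by hand with round numbers. The sketch below is self-contained and uses a hypothetical workload (100 configs at 300 input tokens and 500 output tokens each) together with the Sonnet rates from the table; the function name `cost_usd` is introduced here for illustration only.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD given token counts and prices per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost

# Hypothetical workload: 100 configs, 300 input and 500 output tokens each
total_in = 100 * 300    # 30,000 input tokens  -> 0.03 * $3.00  = $0.09
total_out = 100 * 500   # 50,000 output tokens -> 0.05 * $15.00 = $0.75

total = cost_usd(total_in, total_out, 3.00, 15.00)
print(f"Total: ${total:.4f}")
```

With these assumed numbers the output side accounts for most of the bill, even though it involves fewer than twice as many tokens as the input.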

---
## ⚡ Example 4: Optimize Prompts to Save Money

In [None]:
# Verbose prompt (wasteful)
verbose_prompt = """
Hello! I would really appreciate it if you could please help me out by
analyzing the following network configuration. I need you to look for
any security issues or problems that might be present. Please be very
thorough in your analysis and make sure to check everything carefully.
Thank you so much for your help with this task!

Here is the configuration that I need you to analyze:
"""

# Efficient prompt (same task, far fewer tokens)
efficient_prompt = """Analyze for security issues:
"""

print("⚡ PROMPT OPTIMIZATION")
print("=" * 50)
print(f"Verbose prompt:   {count_tokens(verbose_prompt):,} tokens")
print(f"Efficient prompt: {count_tokens(efficient_prompt):,} tokens")
print(f"Savings:          {count_tokens(verbose_prompt) - count_tokens(efficient_prompt):,} tokens ({(1 - count_tokens(efficient_prompt)/count_tokens(verbose_prompt))*100:.0f}%)")
print()
print("At 1,000 API calls/month:")
verbose_cost = calculate_cost(count_tokens(verbose_prompt) * 1000, 500 * 1000)
efficient_cost = calculate_cost(count_tokens(efficient_prompt) * 1000, 500 * 1000)
print(f"  Verbose:   ${verbose_cost:.2f}")
print(f"  Efficient: ${efficient_cost:.2f}")
print(f"  Monthly savings: ${verbose_cost - efficient_cost:.2f}")
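
It can help to separate the input and output sides of that monthly bill. The sketch below assumes hypothetical prompt sizes of 70 and 6 tokens (run `count_tokens()` on your own prompts for real values) and the Sonnet rates from the pricing table; it covers only the instruction text, not the config that would follow it.

```python
# Assumed rates: USD per 1M tokens, taken from the Sonnet row of the pricing table
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00
CALLS_PER_MONTH = 1_000
OUTPUT_TOKENS_PER_CALL = 500

def monthly_cost(prompt_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for one month of calls."""
    input_cost = prompt_tokens * CALLS_PER_MONTH / 1_000_000 * INPUT_PRICE
    output_cost = OUTPUT_TOKENS_PER_CALL * CALLS_PER_MONTH / 1_000_000 * OUTPUT_PRICE
    return input_cost, output_cost

# Hypothetical prompt sizes, chosen only to illustrate the proportions
for label, prompt_tokens in [("verbose", 70), ("efficient", 6)]:
    input_cost, output_cost = monthly_cost(prompt_tokens)
    share = input_cost / (input_cost + output_cost) * 100
    print(f"{label:10} input ${input_cost:.3f}  output ${output_cost:.2f}  prompt share {share:.1f}%")
```

Under these assumptions, trimming the instructions saves a few cents while the output tokens cost several dollars, so capping response length is likely to matter at least as much as shortening the prompt.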

---
## 🔬 Example 5: Use Claude's Token Counter (Exact)

In [None]:
# Claude's official token counter
test_text = medium_config

# Count using Claude's API
token_count = client.messages.count_tokens(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": test_text}]
)

print("🔬 EXACT TOKEN COUNT (Claude API)")
print("=" * 50)
print(f"Input tokens: {token_count.input_tokens}")
print(f"tiktoken estimate: {count_tokens(test_text)}")
print(f"Difference: {abs(token_count.input_tokens - count_tokens(test_text))} tokens")

---
## 📏 Example 6: Context Window Limits

In [None]:
# Context window sizes
CONTEXT_LIMITS = {
    "Claude 3.5 Sonnet": 200_000,
    "Claude 3.5 Haiku": 200_000,
    "GPT-4o": 128_000,
    "GPT-4o-mini": 128_000,
    "Gemini 1.5 Pro": 2_000_000,
}

def will_fit(text, model="Claude 3.5 Sonnet", output_buffer=2000):
    """Check if text fits in model's context window."""
    tokens = count_tokens(text)
    limit = CONTEXT_LIMITS[model]
    available = limit - output_buffer
    fits = tokens <= available
    return {
        "fits": fits,
        "tokens": tokens,
        "limit": limit,
        "available": available,
        "utilization": (tokens / available) * 100
    }

# Test with different config sizes
print("📏 CONTEXT WINDOW CHECK")
print("=" * 50)

# Simulate a huge config (10,000 lines)
huge_config = medium_config * 500

for model in CONTEXT_LIMITS:
    result = will_fit(huge_config, model)
    status = "✅ FITS" if result["fits"] else "❌ TOO BIG"
    print(f"{model:20} {status} ({result['utilization']:.1f}% of available input budget)")
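
When a config does not fit, one common workaround is to split it into chunks and analyze each chunk separately. The following is a minimal sketch under stated assumptions: sections are separated by lines containing only `!`, token counts use the rough 4-characters-per-token estimate rather than a real tokenizer, and the names `estimate_tokens` and `chunk_config` are hypothetical helpers, not part of any library.

```python
import math

SEPARATOR = "\n!\n"  # assumes IOS-style configs where "!" lines separate sections

def estimate_tokens(text: str) -> int:
    """Rule-of-thumb estimate (~4 chars per token); swap in a real tokenizer if available."""
    return math.ceil(len(text) / 4)

def chunk_config(config: str, max_tokens: int) -> list[str]:
    """Group config sections into chunks whose estimated size stays within max_tokens.

    A single section larger than max_tokens is emitted as its own oversized chunk.
    """
    sections = [s.strip() for s in config.split(SEPARATOR) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for section in sections:
        candidate = SEPARATOR.join(current + [section])
        if current and estimate_tokens(candidate) > max_tokens:
            # Adding this section would exceed the budget: close the current chunk
            chunks.append(SEPARATOR.join(current))
            current = [section]
        else:
            current.append(section)
    if current:
        chunks.append(SEPARATOR.join(current))
    return chunks

demo = (
    "hostname R1\n!\n"
    "interface Gi0/0\n ip address 10.0.0.1 255.255.255.0\n!\n"
    "router ospf 1\n network 10.0.0.0 0.0.0.255 area 0\n"
)
for i, chunk in enumerate(chunk_config(demo, max_tokens=20), start=1):
    print(f"chunk {i}: ~{estimate_tokens(chunk)} tokens")
```

Splitting on section boundaries keeps each interface or routing block intact, but findings that depend on several sections at once (for example, an ACL defined in one chunk and applied in another) may be missed, so chunking is a trade-off rather than a free fix.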

---
## 🎯 Key Takeaways

| Concept | What It Means |
|---------|---------------|
| **Token** | ~4 characters or ~0.75 words of English text |
| **Input tokens** | What you send (prompt + context) |
| **Output tokens** | What the model returns (priced higher per token than input) |
| **Context window** | Max total tokens (input + output) |

**Cost optimization tips:**
1. Remove unnecessary words from prompts
2. Use a Haiku-class model for simple tasks (12x cheaper than Sonnet at the prices listed above)
3. Limit output tokens when possible
4. Cache repeated prompts

---

## 📚 Next Steps

➡️ [Chapter 3: Choosing the Right Model](./Vol1_Ch3_Model_Selection.ipynb)