# Lab: Function Calling and Structured Outputs

## Welcome to AI That Actually Does Things! 🚀

**What you're about to build:** AI systems that don't just talk; they take action.

### The Transformation

**Traditional AI Interaction:**
```
You: "What's the weather?"
AI: "I don't have access to real-time weather data..."
→ Conversational but useless
```

**Function-Calling AI (What You'll Build):**
```
You: "What's the weather?"
AI: [Calls get_weather("Seattle")]
API: {"temp": 52, "condition": "Cloudy"}
AI: "It's currently 52°F and cloudy in Seattle."
→ Actionable and accurate!
```

---

## What Makes This Revolutionary?

1. **Real Actions**: AI can request database lookups, API calls, and system actions that your code carries out
2. **Structured Data**: No more parsing free-form text: get JSON you can validate against a schema
3. **Cross-Platform**: The same patterns apply across OpenAI, HuggingFace, Ollama, and vLLM
4. **Production-Ready**: Validation, error handling, and retry logic built in

---

## Learning Objectives

By the end of this lab, you'll be able to:

- ✅ **Implement** function calling with JSON Schema validation
- ✅ **Create** structured output generators with constraints
- ✅ **Build** cross-platform solutions (OpenAI, HuggingFace, Ollama)
- ✅ **Evaluate** different AI providers systematically
- ✅ **Handle** errors and edge cases gracefully
- ✅ **Deploy** production-ready function calling systems

---

## Lab Structure

**Part 1:** Coffee Recipe Bot (Basic Function Calling)
- Simple function definitions
- Understanding JSON Schema
- First function call

**Part 2:** Multi-Provider Integration
- OpenAI implementation
- HuggingFace alternative
- Local Ollama setup

**Part 3:** Production Patterns
- Validation and error handling
- Provider comparison framework
- Retry logic and failover

**Part 4:** Final Project
- Complete game character generator
- Multi-provider support
- Robust validation

---

## Prerequisites

**You'll need:**
- Python 3.10+ installed
- A code editor or Jupyter environment
- (Optional) API keys for testing cloud providers
- (Optional) Ollama for local testing

**Estimated time:** 2-3 hours

Let's begin! 💪

In [None]:
# Install required packages
!pip install openai huggingface_hub transformers jsonschema requests python-dotenv

## Setup and Configuration

Let's start by setting up our environment and API credentials:

In [None]:
import os
import json
import time
import random
import requests
from typing import Dict, Any, List, Optional
from jsonschema import validate, ValidationError
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# API Keys (use environment variables in production)
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "your-openai-key-here")
HF_TOKEN = os.getenv("HF_TOKEN", "your-huggingface-token-here")

print("Environment setup complete!")
print(f"OpenAI API Key configured: {'Yes' if OPENAI_API_KEY != 'your-openai-key-here' else 'No'}")
print(f"HuggingFace Token configured: {'Yes' if HF_TOKEN != 'your-huggingface-token-here' else 'No'}")

## Exercise 1: Basic Function Calling Setup

### Understanding the Components

Before diving into code, let's understand what we're building:

**1. Function Implementations (Your Python Code)**
```python
from typing import Any, Dict

def make_coffee(coffee_type: str) -> Dict[str, Any]:
    # This is YOUR code that actually executes.
    # AI models never run this; they just tell YOU to run it.
    ...  # full implementation below
```

**2. Function Schemas (AI's Instructions)**
```json
{
  "name": "make_coffee",
  "parameters": { ... }
}
```
This tells the AI:
- What functions exist
- What parameters they need
- What values are valid

**3. The Flow**
```
User → AI analyzes → AI generates function call → Your code executes → Result → AI responds
```
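To see the flow end to end before any API is involved, here is a minimal sketch that fakes the "AI generates function call" step with a hard-coded call. The trimmed `make_coffee`, the `AVAILABLE_FUNCTIONS` registry, and `execute_tool_call` are illustrative names, not part of any SDK; the one real-world detail assumed is that models send arguments as a JSON string, which is how OpenAI's `tool_call.function.arguments` arrives.

```python
import json
from typing import Any, Callable, Dict

def make_coffee(coffee_type: str) -> Dict[str, Any]:
    # Trimmed stand-in for the lab's make_coffee (one recipe only)
    recipes = {"espresso": {"coffee_grams": 18, "water_ml": 36}}
    return recipes.get(coffee_type, {"error": "Recipe not found"})

# Registry: the only functions the model is allowed to trigger
AVAILABLE_FUNCTIONS: Dict[str, Callable[..., Dict[str, Any]]] = {"make_coffee": make_coffee}

def execute_tool_call(name: str, arguments_json: str) -> str:
    """Run one model-requested call and return the result as a JSON string."""
    func = AVAILABLE_FUNCTIONS.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        args = json.loads(arguments_json)  # arguments arrive as a JSON string
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:  # bad JSON or wrong parameters
        return json.dumps({"error": str(exc)})

# Simulated "AI generates function call" step (no API involved)
simulated_call = {"name": "make_coffee", "arguments": '{"coffee_type": "espresso"}'}
result = execute_tool_call(simulated_call["name"], simulated_call["arguments"])
print(result)  # {"coffee_grams": 18, "water_ml": 36}
```

Returning errors as JSON instead of raising means they can be sent back to the model like any other result, giving it a chance to correct the call.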

Let's build it!

In [None]:
# Define our function implementations
def make_coffee(coffee_type: str) -> Dict[str, Any]:
    """Generate a coffee recipe based on type."""
    recipes = {
        "espresso": {
            "coffee_grams": 18,
            "water_ml": 36,
            "brew_time_seconds": 25,
            "temperature_celsius": 93,
            "pressure_bar": 9
        },
        "cappuccino": {
            "coffee_grams": 18,
            "water_ml": 36,
            "milk_ml": 120,
            "milk_foam": "thick",
            "brew_time_seconds": 25
        },
        "latte": {
            "coffee_grams": 18,
            "water_ml": 36,
            "milk_ml": 240,
            "milk_foam": "thin",
            "brew_time_seconds": 25
        },
        "americano": {
            "coffee_grams": 18,
            "water_ml": 36,
            "additional_water_ml": 120,
            "brew_time_seconds": 25
        }
    }
    return recipes.get(coffee_type, {"error": "Recipe not found"})

def random_coffee_fact() -> Dict[str, Any]:
    """Return a random coffee fact."""
    facts = [
        {"fact": "According to legend, coffee was first discovered in Ethiopia around 850 AD", "source": "Ethiopian legend of Kaldi"},
        {"fact": "Espresso means 'pressed out' in Italian", "source": "Italian etymology"},
        {"fact": "Coffee is one of the world's most widely traded agricultural commodities", "source": "Commodity markets"},
        {"fact": "The average American consumes 3.1 cups of coffee per day", "source": "National Coffee Association"},
        {"fact": "Coffee beans are actually seeds from coffee cherries", "source": "Botanical classification"}
    ]
    return random.choice(facts)

# Define tool schemas for the AI model
coffee_tools = [
    {
        "type": "function",
        "function": {
            "name": "make_coffee",
            "description": "Generate a coffee recipe for the specified type",
            "parameters": {
                "type": "object",
                "properties": {
                    "coffee_type": {
                        "type": "string",
                        "enum": ["espresso", "cappuccino", "latte", "americano"],
                        "description": "Type of coffee to make"
                    }
                },
                "required": ["coffee_type"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "random_coffee_fact",
            "description": "Get a random interesting fact about coffee",
            "parameters": {
                "type": "object",
                "properties": {},
                "additionalProperties": False
            }
        }
    }
]

print("Function definitions created!")
print(f"Available tools: {[tool['function']['name'] for tool in coffee_tools]}")

## Exercise 2: OpenAI Function Calling Implementation

### Understanding OpenAI's Function Calling Architecture

**The Core Mechanism:**

OpenAI's function calling works through a **multi-turn conversation pattern**:

1. **Initial Request**: You send messages + function definitions (tools)
2. **Model Decision**: AI analyzes the request and decides if/when to call functions
3. **Function Execution**: Your code runs the actual function
4. **Response Integration**: Function results are sent back to the model
5. **Final Answer**: Model synthesizes everything into a natural response

**Why This Design?**

- **Separation of Concerns**: AI decides WHAT to do, your code does HOW
- **Safety**: AI never executes code directly; you control all execution
- **Flexibility**: Same AI can work with any functions you define
- **Reliability**: You validate inputs/outputs before execution

---

### The Implementation Flow

**Step 1: Define Tools (Function Schemas)**
```python
tools = [{
    "type": "function",
    "function": {
        "name": "make_coffee",
        "parameters": {...}  # JSON Schema
    }
}]
```
This tells the AI: "You have access to a function called `make_coffee` with these parameters."

**Step 2: Send Request with Tools**
```python
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # a model is required on every request
    messages=messages,
    tools=tools,  # AI now knows about your functions
    tool_choice="auto"  # Let AI decide when to use tools
)
```

**Step 3: Check for Function Calls**
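```python
response_message = response.choices[0].message
if response_message.tool_calls:
    # AI wants to call one or more functions!
    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        # Arguments arrive as a JSON string, so decode them first
        function_args = json.loads(tool_call.function.arguments)
        # Execute your actual function, looked up by name in a dict
        # such as {"make_coffee": make_coffee, ...}
        result = available_functions[function_name](**function_args)
```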

**Step 4: Send Results Back**
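```python
# Append the assistant message that requested the calls (once, before the loop)...
messages.append(response_message)
# ...then one "tool" message per call, matched by its ID
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result)
})
```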

**Step 5: Get Final Response**
```python
final_response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages  # Now includes function results
)
# AI synthesizes: "Here's your cappuccino recipe: ..."
```

---

### Key Concepts

**`tool_choice` Parameter:**
- `"auto"`: AI decides when to use tools (recommended)
- `"none"`: AI never uses tools (pure chat)
- `"required"`: AI must call at least one tool
- `{"type": "function", "function": {"name": "make_coffee"}}`: Force a specific function

**Tool Call IDs:**
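- Each function call gets a unique ID (`tool_call.id`)
- Each `"tool"` message must echo that ID in `tool_call_id` so the model knows which result belongs to which call
- That matching is what makes parallel function calls (several `tool_calls` in one response) possible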

**Message Roles:**
- `"user"`: User input
- `"assistant"`: AI responses
- `"system"`: System instructions
- `"tool"`: Function execution results (NEW!)
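The ID matching is easiest to see with plain dicts. This sketch builds the messages for one round of two parallel calls, in the dict shape the Chat Completions API accepts for assistant and tool messages; the IDs and result values are made up for illustration, and `build_tool_round` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

def build_tool_round(tool_calls: List[Dict[str, Any]], results: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Assistant turn that requested the calls, then one tool message per call."""
    assistant_turn = {"role": "assistant", "content": None, "tool_calls": tool_calls}
    tool_turns = [
        {
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the ID of the call it answers
            "content": json.dumps(results[call["id"]]),
        }
        for call in tool_calls
    ]
    return [assistant_turn] + tool_turns

# Two parallel calls in one response (IDs are made up for illustration)
parallel_calls = [
    {"id": "call_a", "type": "function",
     "function": {"name": "make_coffee", "arguments": '{"coffee_type": "latte"}'}},
    {"id": "call_b", "type": "function",
     "function": {"name": "random_coffee_fact", "arguments": "{}"}},
]
# Results keyed by call ID, as your dispatcher would produce them (hypothetical values)
results_by_id = {
    "call_a": {"coffee_grams": 18, "milk_ml": 240},
    "call_b": {"fact": "Coffee beans are actually seeds from coffee cherries"},
}
history = build_tool_round(parallel_calls, results_by_id)
for message in history:
    print(message["role"], message.get("tool_call_id", ""))
```

Appending `history` to the original messages and calling the API again is the "Step 5" round trip; the order of tool messages doesn't need to follow the calls, because the IDs carry the pairing.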

Now let's implement it:

In [None]:
try:
    from openai import OpenAI
    
    # Initialize OpenAI client
    openai_client = OpenAI(api_key=OPENAI_API_KEY)
    
    def handle_openai_function_call(messages, tools):
        """Handle function calling with OpenAI API."""
        response = openai_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        
        response_message = response.choices[0].message
        
        # Check if the model wants to call a function
        if response_message.tool_calls:
            messages.append(response_message)
            
            # Handle each function call
            for tool_call in response_message.tool_calls:
                function_name = tool_call.function.name
                function_args = json.loads(tool_call.function.arguments)
                
                print(f"Calling function: {function_name}")
                print(f"Arguments: {function_args}")
                
                # Call the actual function
                if function_name == "make_coffee":
                    function_response = make_coffee(**function_args)
                elif function_name == "random_coffee_fact":
                    function_response = random_coffee_fact()
                else:
                    function_response = {"error": f"Unknown function: {function_name}"}
                
                # Add function response to messages
                messages.append({
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": json.dumps(function_response)
                })
            
            # Get final response from model
            final_response = openai_client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages
            )
            
            return final_response.choices[0].message.content
        
        return response_message.content
    
    # Test the function calling
    messages = [
        {"role": "system", "content": "You are a helpful coffee assistant. Use the available tools to help users with coffee-related questions."},
        {"role": "user", "content": "I'd like to make a cappuccino. Can you give me the recipe?"}
    ]
    
    result = handle_openai_function_call(messages, coffee_tools)
    print("\nFinal response:")
    print(result)
    
except ImportError:
    print("OpenAI library not installed. Skipping this exercise.")
except Exception as e:
    print(f"OpenAI API error: {e}")
    print("Skipping OpenAI exercise. Proceeding with HuggingFace alternatives.")

## Exercise 3: HuggingFace Implementation

### Why HuggingFace? Understanding Open-Source Alternatives

**The Challenge with HuggingFace:**

When you call a HuggingFace model through the plain `text_generation` endpoint (as this exercise does), there's **no built-in function calling**. We need to:
1. **Prompt Engineering**: Instruct the model to output structured function calls
2. **Response Parsing**: Extract function calls from natural language responses
3. **Manual Orchestration**: Handle the conversation flow ourselves

**Why This Matters:**

- **Cost**: HuggingFace can be free or much cheaper
- **Privacy**: Run models locally or on your own infrastructure
- **Flexibility**: Use any open-source model
- **Learning**: Understand how function calling works under the hood

---

### The HuggingFace Approach: Prompt-Based Function Calling

**Key Difference:**

| OpenAI | HuggingFace |
|--------|-------------|
| Built-in function calling | Manual prompt engineering |
| Structured tool_calls response | Natural language with parsing |
| Automatic orchestration | Manual conversation management |

**Our Strategy:**

1. **Structured Prompt**: Tell the model exactly how to format function calls
2. **Pattern Matching**: Parse responses to extract function calls
3. **Manual Execution**: Run functions and format results
4. **Follow-up Prompt**: Send results back and ask for final answer

---

### Implementation Pattern

**Step 1: Create Structured Prompt**
```python
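# A plain string (not an f-string), so the JSON braces need no escaping
prompt = """
You have access to these tools:
- make_coffee(coffee_type): Generate a coffee recipe for the specified type
- random_coffee_fact(): Get a random interesting fact about coffee

If you need to use a tool, respond with:
TOOL_CALL: {"name": "tool_name", "arguments": {"param": "value"}}
"""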
```

**Why This Format?**
- **Explicit**: Model knows exactly what format to use
- **Parseable**: Easy to extract with regex/string matching
- **Flexible**: Works with any model that can follow instructions
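Rather than hand-writing the tool list, you can render it from the same `coffee_tools`-style schemas you send to OpenAI and add one worked example, which the challenges list below recommends for models that drift from the format. `build_tool_prompt` is a hypothetical helper, and its worked example is hard-coded for the coffee tools.

```python
from typing import Any, Dict, List

def build_tool_prompt(user_query: str, tools: List[Dict[str, Any]]) -> str:
    """Render OpenAI-style tool schemas into a TOOL_CALL prompt with one worked example."""
    lines = ["You have access to these tools:"]
    for tool in tools:
        fn = tool["function"]
        # Iterating the properties dict yields the parameter names
        params = ", ".join(fn["parameters"].get("properties", {}))
        lines.append(f"- {fn['name']}({params}): {fn['description']}")
    lines += [
        "",
        "If you need to use a tool, respond with exactly one line:",
        'TOOL_CALL: {"name": "tool_name", "arguments": {"param": "value"}}',
        "",
        "Example:",
        "User: Give me an espresso recipe.",
        'TOOL_CALL: {"name": "make_coffee", "arguments": {"coffee_type": "espresso"}}',
        "",
        f"User: {user_query}",
    ]
    return "\n".join(lines)

demo_tools = [
    {"type": "function", "function": {
        "name": "make_coffee",
        "description": "Generate a coffee recipe for the specified type",
        "parameters": {"type": "object", "properties": {"coffee_type": {"type": "string"}}},
    }},
    {"type": "function", "function": {
        "name": "random_coffee_fact",
        "description": "Get a random interesting fact about coffee",
        "parameters": {"type": "object", "properties": {}},
    }},
]
print(build_tool_prompt("I'd like a latte.", demo_tools))
```

Generating the prompt from the schemas keeps the two providers' tool definitions from drifting apart when you add or rename a function.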

**Step 2: Parse Response**
```python
if "TOOL_CALL:" in response_text:
    # Decode the first JSON object after TOOL_CALL:, ignoring any trailing text
    json_str = response_text.split("TOOL_CALL:", 1)[1].strip()
    tool_call, _ = json.JSONDecoder().raw_decode(json_str)
```

**Step 3: Execute Function**
```python
function_name = tool_call['name']
function_args = tool_call['arguments']
result = available_functions[function_name](**function_args)
```

**Step 4: Follow-up Prompt**
```python
follow_up = f"""
User asked: {original_query}
Function result: {json.dumps(result)}

Provide a natural response based on this result.
"""
final_response = hf_client.text_generation(follow_up, max_new_tokens=150)  # same client as the first call
```

---

### Challenges and Solutions

**Challenge 1: Model Doesn't Follow Format**
- **Solution**: Use few-shot examples in prompt
- **Solution**: Lower temperature for more deterministic output
- **Solution**: Post-process to fix common formatting errors

**Challenge 2: Invalid JSON in Response**
- **Solution**: Use regex to extract JSON-like structures
- **Solution**: Try multiple parsing strategies
- **Solution**: Fallback to error message if parsing fails

**Challenge 3: Model Calls Wrong Function**
- **Solution**: Validate function name exists before calling
- **Solution**: Provide clear function descriptions
- **Solution**: Use examples showing correct usage
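Challenge 2's "try multiple parsing strategies" can be as simple as: look after the `TOOL_CALL:` marker first, then fall back to every `{` in the text, and accept the first object that decodes and has a `name`. This sketch relies on the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value from the start of a string and reports where it stopped, so trailing chatter doesn't break parsing. `extract_tool_call` is a hypothetical helper; checking the returned name against your registry (Challenge 3) is still up to the caller.

```python
import json
import re
from typing import Any, Dict, List, Optional

_DECODER = json.JSONDecoder()

def extract_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Return the first {"name": ..., "arguments": ...} object found in text, else None."""
    starts: List[int] = []
    # Strategy 1: the agreed-upon TOOL_CALL: marker
    marker = text.find("TOOL_CALL:")
    if marker != -1:
        starts.append(marker + len("TOOL_CALL:"))
    # Strategy 2: every '{' in order (covers a missing marker or a ```json fence)
    starts.extend(match.start() for match in re.finditer(r"\{", text))
    for start in starts:
        try:
            # raw_decode parses one JSON value and ignores whatever follows it
            obj, _ = _DECODER.raw_decode(text[start:].lstrip())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "name" in obj:
            obj.setdefault("arguments", {})  # zero-argument tools may omit it
            return obj
    return None

print(extract_tool_call('Sure! TOOL_CALL: {"name": "make_coffee", "arguments": {"coffee_type": "latte"}} Enjoy!'))
```

Single-quoted pseudo-JSON (a common small-model mistake) still returns `None` here; repairing it is a separate, riskier step.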

---

### When to Use HuggingFace vs OpenAI

**Use HuggingFace When:**
- ✅ Cost is a primary concern
- ✅ Data privacy is critical (local deployment)
- ✅ You need to customize model behavior
- ✅ You want to understand the internals

**Use OpenAI When:**
- ✅ You need reliable function calling out-of-the-box
- ✅ Speed and latency are critical
- ✅ You want minimal implementation complexity
- ✅ Production reliability is paramount

---

### Real-World Applications

This pattern is used in:
- **LangChain**: Open-source framework for LLM applications
- **AutoGPT**: Autonomous AI agents
- **Custom AI Assistants**: When you need full control
- **Research Projects**: Understanding AI behavior

Now let's implement it:

In [None]:
try:
    from huggingface_hub import InferenceClient
    
    # Initialize HuggingFace client
    hf_client = InferenceClient(token=HF_TOKEN)
    
    def create_structured_prompt(user_query: str, tools: List[Dict]) -> str:
        """Create a structured prompt for HuggingFace models."""
        tool_descriptions = []
        for tool in tools:
            func = tool['function']
            tool_descriptions.append(f"""
Tool: {func['name']}
Description: {func['description']}
Parameters: {json.dumps(func['parameters'], indent=2)}
            """)
        
        prompt = f"""
You are a helpful assistant with access to the following tools:

{chr(10).join(tool_descriptions)}

User Query: {user_query}

If you need to use a tool, respond with exactly this format:
TOOL_CALL: {{"name": "tool_name", "arguments": {{"param1": "value1"}}}}

If no tool is needed, respond naturally.
        """
        
        return prompt
    
    def parse_tool_call(response_text: str) -> Optional[Dict[str, Any]]:
        """Parse tool call from model response."""
        if "TOOL_CALL:" in response_text:
            try:
                # Decode only the first JSON object after TOOL_CALL:, ignoring trailing text
                json_start = response_text.find("TOOL_CALL:") + len("TOOL_CALL:")
                json_str = response_text[json_start:].strip()
                tool_call, _ = json.JSONDecoder().raw_decode(json_str)
                return tool_call if isinstance(tool_call, dict) else None
            except json.JSONDecodeError:
                return None
        return None
    
    def handle_hf_function_call(user_query: str, tools: List[Dict], available_functions: Dict) -> str:
        """Handle function calling with HuggingFace models."""
        prompt = create_structured_prompt(user_query, tools)
        
        # Use an instruction-tuned model: a chat model like DialoGPT is not
        # trained to follow the TOOL_CALL format. The model ID below is an
        # example; swap in any instruction-tuned model your endpoint serves.
        model_id = "HuggingFaceH4/zephyr-7b-beta"
        response = hf_client.text_generation(
            prompt,
            model=model_id,
            max_new_tokens=150,
            temperature=0.1
        )
        
        print(f"Model response: {response}")
        
        # Check for tool call
        tool_call = parse_tool_call(response)
        if tool_call:
            function_name = tool_call.get('name')
            function_args = tool_call.get('arguments', {})
            
            print(f"Calling function: {function_name}")
            print(f"Arguments: {function_args}")
            
            # Call the actual function
            if function_name in available_functions:
                function_response = available_functions[function_name](**function_args)
                
                # Create follow-up prompt with function result
                follow_up_prompt = f"""
                User Query: {user_query}
                Function Result: {json.dumps(function_response)}
                
                Provide a natural response to the user based on the function result.
                """
                
                final_response = hf_client.text_generation(
                    follow_up_prompt,
                    model=model_id,
                    max_new_tokens=150,
                    temperature=0.1
                )
                
                return final_response
            else:
                return f"Error: Unknown function {function_name}"
        
        return response
    
    # Test with HuggingFace
    available_functions = {
        "make_coffee": make_coffee,
        "random_coffee_fact": random_coffee_fact
    }
    
    result = handle_hf_function_call(
        "I'd like to make a latte. Can you give me the recipe?",
        coffee_tools,
        available_functions
    )
    
    print("\nHuggingFace result:")
    print(result)
    
except ImportError:
    print("HuggingFace Hub library not installed. Skipping this exercise.")
except Exception as e:
    print(f"HuggingFace API error: {e}")

## Exercise 4: Structured Output Validation

### Why Validation Is Critical

**The Hard Truth:** AI models can and do produce invalid output, so always validate it.

**What can go wrong:**
```json
{
  "character": {
    "name": "X",              ← Too short!
    "class": "ninja",         ← Not in enum!
    "health": 150,            ← Exceeds maximum!
    "mana": "high"            ← Wrong type!
  }
}
```

**Without validation:** Your app crashes, users see errors, data gets corrupted

**With validation:** You catch errors, retry, provide fallbacks, stay reliable

---

### The Validation Strategy

**Three Layers of Defense:**

1. **Schema Definition**
   - Define what valid data looks like
   - Use JSON Schema standard
   - Make it strict (additionalProperties: false)

2. **Pre-Execution Validation**
   - Validate before calling functions
   - Reject invalid data early
   - Save computation and API calls

3. **Post-Response Validation**
   - Validate AI output before using it
   - Retry if invalid
   - Log failures for debugging

---

### JSON Schema Power

**Example Constraints:**
```json
{
  "name": {"minLength": 2, "maxLength": 20},     ← Prevents empty/huge names
  "class": {"enum": ["warrior", "mage"]},        ← Only valid classes
  "health": {"minimum": 50, "maximum": 100}      ← Balanced stats
}
```

**Real-world impact:**
- Medical systems: Validate drug dosages
- Financial systems: Validate transaction amounts
- E-commerce: Validate product IDs and quantities
- User auth: Validate email formats and passwords
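To see these constraints catch every problem in the bad character shown earlier: `jsonschema.validate()` (used by `validate_json_schema` below) raises only a single error, the library's best match, whereas `Draft7Validator.iter_errors` yields every violation. That is handier for logging, or for telling the model everything it got wrong on a retry. The schema here is a trimmed copy of the lab's `game_character_schema`, and `list_schema_errors` is a hypothetical helper.

```python
from typing import Any, Dict, List
from jsonschema import Draft7Validator

# Trimmed copy of the lab's game_character_schema (character part only)
character_schema = {
    "type": "object",
    "properties": {
        "character": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "minLength": 2, "maxLength": 20},
                "class": {"type": "string", "enum": ["warrior", "mage", "rogue", "cleric"]},
                "health": {"type": "integer", "minimum": 50, "maximum": 100},
                "mana": {"type": "integer", "minimum": 0, "maximum": 100},
            },
            "required": ["name", "class", "health"],
            "additionalProperties": False,
        }
    },
    "required": ["character"],
}

def list_schema_errors(instance: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Collect every violation as 'path: message', sorted by path."""
    validator = Draft7Validator(schema)
    return sorted(
        f"{'/'.join(str(part) for part in error.path)}: {error.message}"
        for error in validator.iter_errors(instance)
    )

bad = {"character": {"name": "X", "class": "ninja", "health": 150, "mana": "high"}}
for problem in list_schema_errors(bad, character_schema):
    print(problem)
```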

Let's build robust validation!

In [None]:
def validate_json_schema(data: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """Validate data against JSON schema."""
    try:
        validate(instance=data, schema=schema)
        return True
    except ValidationError as e:
        print(f"Validation error: {e.message}")
        return False

def safe_json_parse(response_text: str) -> Optional[Dict[str, Any]]:
    """Extract and parse JSON from potentially mixed responses; return None if nothing parses."""
    import re
    
    # Candidates in order: the raw text, a ```json fenced block, the outermost {...} span
    candidates = [response_text]
    fenced = re.search(r'```(?:json)?\s*\n(.*?)\n```', response_text, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))
    braces = re.search(r'\{.*\}', response_text, re.DOTALL)
    if braces:
        candidates.append(braces.group(0))
    
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue  # try the next strategy instead of raising
    return None

# Define a schema for game character generation
game_character_schema = {
    "type": "object",
    "properties": {
        "character": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "minLength": 2, "maxLength": 20},
                "class": {"type": "string", "enum": ["warrior", "mage", "rogue", "cleric"]},
                "health": {"type": "integer", "minimum": 50, "maximum": 100},
                "mana": {"type": "integer", "minimum": 0, "maximum": 100},
                "strength": {"type": "integer", "minimum": 1, "maximum": 20},
                "intelligence": {"type": "integer", "minimum": 1, "maximum": 20}
            },
            "required": ["name", "class", "health"],
            "additionalProperties": False
        },
        "backstory": {
            "type": "string",
            "minLength": 50,
            "maxLength": 500
        }
    },
    "required": ["character", "backstory"],
    "additionalProperties": False
}

print("Schema validation functions created!")
print(f"Game character schema requires: {game_character_schema['required']}")

## Exercise 5: Provider Comparison Framework

### Why Compare Providers? The Business Case

**The Reality:**
- Different providers have different strengths
- Costs can differ by an order of magnitude or more between providers and models
- Latency ranges from well under a second to many seconds, depending on model, prompt length, and load
- Reliability and uptime vary from provider to provider
- Quality differs based on use case

**Why This Matters:**
- **Cost Optimization**: At scale, choosing the right provider can cut your monthly bill substantially
- **Performance**: Users notice 500ms vs 5s latency
- **Reliability**: Downtime costs money and reputation
- **Quality**: Wrong provider = wrong answers = unhappy users

---

### What We're Measuring

**1. JSON Validity Rate**
- **What**: Percentage of responses that are valid JSON
- **Why**: Invalid JSON = parsing errors = broken functionality
- **Target**: >95% for production use

**2. Schema Compliance Rate**
- **What**: Percentage of valid JSON that matches our schema
- **Why**: Wrong structure = wrong data = application bugs
- **Target**: >90% for production use

**3. Latency (Response Time)**
- **What**: Time from request to response
- **Why**: Users abandon slow applications
- **Target**: <2s for most use cases, <500ms for real-time

**4. Error Rate**
- **What**: Percentage of requests that fail
- **Why**: Errors = unhappy users = lost revenue
- **Target**: <1% for production use

**5. Cost per Request**
- **What**: Total cost divided by number of requests
- **Why**: Scale matters; small differences add up
- **Target**: Balance with quality requirements

---

### The Evaluation Framework Architecture

**Design Principles:**

1. **Systematic Testing**: Same prompts across all providers
2. **Multiple Metrics**: Don't optimize for just one thing
3. **Statistical Significance**: Test enough to be confident
4. **Real-World Scenarios**: Use actual use case prompts
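5. **Reproducible**: Fix the prompts, model versions, and settings (e.g. temperature) so runs are comparable; outputs can still vary, so repeat each test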

**The Flow:**

```
For each provider:
    For each test prompt:
        1. Send request with timing
        2. Parse response
        3. Validate JSON structure
        4. Validate against schema
        5. Record metrics
    6. Calculate statistics
7. Compare across providers
```

---

### Understanding the Metrics

**Valid JSON Count:**
```python
try:
    parsed = json.loads(response_text)
    valid_json_count += 1
except json.JSONDecodeError:
    # Invalid JSON - can't use this response
    pass
```
**Why it matters**: If JSON is invalid, you can't extract data. This is a hard requirement.

**Schema Compliance:**
```python
if validate_json_schema(parsed_data, schema):
    schema_compliant_count += 1
```
**Why it matters**: Even valid JSON might have wrong structure. Schema validation catches:
- Missing required fields
- Wrong data types
- Values outside allowed ranges
- Extra unexpected fields

**Latency Tracking:**
```python
start_time = time.time()
response = client.generate(...)
latency = time.time() - start_time
```
**Why it matters**: 
- **P50 (Median)**: Typical user experience
- **P95**: Worst-case for most users (5% are slower)
- **P99**: Extreme cases (1% are slower)
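The framework below accumulates only total latency, so it reports an average. If you also keep each request's latency in a list, percentiles take a few lines. This sketch uses the nearest-rank definition (the smallest sample with at least p% of samples at or below it); `statistics.quantiles` from the standard library is an alternative that interpolates between samples. The sample latencies are hypothetical.

```python
import math
from typing import Dict, List

def latency_percentiles(latencies: List[float]) -> Dict[str, float]:
    """Nearest-rank P50/P95/P99 for a list of per-request latencies (seconds)."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies)

    def nearest_rank(p: int) -> float:
        # 1-based rank of the smallest sample with >= p% of samples at or below it
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {"p50": nearest_rank(50), "p95": nearest_rank(95), "p99": nearest_rank(99)}

samples = [0.42, 0.38, 0.51, 0.47, 2.90, 0.44, 0.40, 0.49, 0.45, 0.43]  # hypothetical
print(latency_percentiles(samples))  # {'p50': 0.44, 'p95': 2.9, 'p99': 2.9}
```

One slow request barely moves the median but sets P95 and P99, which is exactly why the tail percentiles are worth tracking alongside the average.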

**Error Handling:**
```python
try:
    response = client.generate(...)
except Exception as e:
    errors.append(str(e))
```
**Why it matters**: Different providers fail in different ways:
- Network timeouts
- Rate limiting
- Invalid API keys
- Model unavailable

---

### Interpreting Results

**Example Results (hypothetical numbers, for illustration):**
```
Provider A: 98% valid JSON, 1.2s avg latency, $0.01/request
Provider B: 95% valid JSON, 0.5s avg latency, $0.05/request
Provider C: 99% valid JSON, 3.0s avg latency, $0.001/request
```

**Decision Framework:**
- **High volume, cost-sensitive, latency-tolerant**: Choose Provider C (cheapest and most reliable JSON, but slowest)
- **Real-time, latency-critical**: Choose Provider B (fastest, but lowest JSON validity, so pair it with validation and retries)
- **Balanced requirements**: Choose Provider A

**Trade-offs:**
- **Speed vs Cost**: Faster usually costs more
- **Quality vs Speed**: Better models often slower
- **Reliability vs Cost**: More reliable = more expensive
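The decision framework can be written as a filter-then-minimize rule: keep providers at or above a JSON-validity target and under a latency budget, then take the cheapest. This sketch reuses the example numbers above; `cost_per_request` is an assumed extra field you would track yourself (`evaluate_provider_performance` below doesn't measure cost), and `pick_provider` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

# The example results above, plus a cost field you would track yourself
example_results: List[Dict[str, Any]] = [
    {"provider": "Provider A", "valid_json_rate": 98.0, "avg_latency": 1.2, "cost_per_request": 0.01},
    {"provider": "Provider B", "valid_json_rate": 95.0, "avg_latency": 0.5, "cost_per_request": 0.05},
    {"provider": "Provider C", "valid_json_rate": 99.0, "avg_latency": 3.0, "cost_per_request": 0.001},
]

def pick_provider(
    results: List[Dict[str, Any]],
    min_valid_json: float = 95.0,
    max_latency_s: float = 2.0,
) -> Optional[str]:
    """Cheapest provider that meets both the JSON-validity and the latency target."""
    eligible = [
        r for r in results
        if r["valid_json_rate"] >= min_valid_json and r["avg_latency"] <= max_latency_s
    ]
    if not eligible:
        return None  # nothing meets the targets: relax them or test more providers
    return min(eligible, key=lambda r: r["cost_per_request"])["provider"]

print(pick_provider(example_results))                     # balanced -> Provider A
print(pick_provider(example_results, max_latency_s=0.5))  # real-time -> Provider B
print(pick_provider(example_results, max_latency_s=5.0))  # cost-sensitive -> Provider C
```

Expressing the choice as explicit thresholds makes the trade-off reviewable: changing a requirement means changing a number, not re-arguing the decision.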

---

### Production Best Practices

**1. Continuous Monitoring**
- Don't just test once; monitor ongoing performance
- Providers change over time
- Your use case might evolve

**2. A/B Testing**
- Test new providers alongside existing ones
- Gradually shift traffic based on results
- Keep fallback options

**3. Multi-Provider Strategy**
- Use primary + backup providers
- Route based on request type
- Failover automatically

**4. Cost Tracking**
- Monitor actual spending, not just per-request costs
- Include infrastructure costs (if self-hosting)
- Factor in developer time for maintenance
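The multi-provider strategy (primary plus backup, automatic failover) combined with Part 3's retry logic can be sketched as a loop over providers with exponential backoff between retries. Providers are plain callables here, a simplifying assumption: in practice each would wrap an OpenAI, HuggingFace, or Ollama call, and you would retry only transient errors such as timeouts and rate limits rather than, say, an invalid API key. `call_with_failover` and the stub providers are hypothetical.

```python
import time
from typing import Callable, List, Tuple

def call_with_failover(
    providers: List[Tuple[str, Callable[[str], str]]],
    prompt: str,
    retries_per_provider: int = 2,
    base_delay_s: float = 0.5,
) -> Tuple[str, str]:
    """Try providers in order; retry each with exponential backoff before failing over."""
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except Exception as exc:  # simplified: real code should retry transient errors only
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if attempt + 1 < retries_per_provider:
                    time.sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError("All providers failed:\n" + "\n".join(errors))

# Stub providers for illustration: the primary always times out
attempts = {"primary": 0}

def primary(prompt: str) -> str:
    attempts["primary"] += 1
    raise TimeoutError("simulated timeout")

def backup(prompt: str) -> str:
    return f"backup answered: {prompt}"

provider, answer = call_with_failover(
    [("primary", primary), ("backup", backup)], "Make a mage", base_delay_s=0.0
)
print(provider, answer, attempts["primary"])
```

Returning the provider name with the answer lets your monitoring count how often the backup is actually serving traffic.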

Now let's build the framework:

In [None]:
def evaluate_provider_performance(client, test_prompts: List[str], schema: Dict[str, Any], provider_name: str) -> Dict[str, Any]:
    """Evaluate provider performance across multiple metrics."""
    results = {
        'provider': provider_name,
        'total_tests': len(test_prompts),
        'valid_json_count': 0,
        'schema_compliant_count': 0,
        'total_latency': 0,
        'errors': []
    }
    
    for i, prompt in enumerate(test_prompts):
        print(f"Testing prompt {i+1}/{len(test_prompts)}: {prompt[:50]}...")
        
        start_time = time.time()
        try:
            # This is a simplified version - adapt based on your client type.
            # HuggingFace's InferenceClient has text_generation; the OpenAI client doesn't.
            if not hasattr(client, 'text_generation'):
                # OpenAI-style client. JSON mode requires the word "JSON" in the
                # messages, so the instruction and schema go into the prompt.
                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[{
                        "role": "user",
                        "content": (
                            f"Generate a game character. {prompt}\n"
                            f"Respond with JSON that matches this schema:\n{json.dumps(schema)}"
                        )
                    }],
                    response_format={"type": "json_object"}
                )
                response_text = response.choices[0].message.content
            else:
                # HuggingFace-style client
                structured_prompt = f"""
                Generate a game character based on this request: {prompt}
                
                Respond with valid JSON that matches this schema:
                {json.dumps(schema, indent=2)}
                
                Output only the JSON, no additional text.
                """
                response_text = client.text_generation(structured_prompt, max_new_tokens=500)
            
            latency = time.time() - start_time
            results['total_latency'] += latency
            
            # Parse and validate response
            parsed_data = safe_json_parse(response_text)
            if parsed_data:
                results['valid_json_count'] += 1
                if validate_json_schema(parsed_data, schema):
                    results['schema_compliant_count'] += 1
            
        except Exception as e:
            results['errors'].append(str(e))
            print(f"Error: {e}")
    
    # Calculate averages and percentages
    if results['total_tests'] > 0:
        results['valid_json_rate'] = (results['valid_json_count'] / results['total_tests']) * 100
        results['schema_compliance_rate'] = (results['schema_compliant_count'] / results['total_tests']) * 100
        results['avg_latency'] = results['total_latency'] / results['total_tests']
    
    return results

# Test prompts for evaluation
test_prompts = [
    "Create a brave warrior with high health and strength.",
    "Generate an intelligent mage with powerful magic abilities.",
    "Create a stealthy rogue character.",
    "Make a wise cleric with healing powers.",
    "Generate a balanced character with moderate stats."
]

print("Provider comparison framework created!")
print(f"Test prompts prepared: {len(test_prompts)}")

## Exercise 6: Local Model Integration with Ollama

### Why Run Models Locally? The Privacy and Cost Advantage

**The Local Model Revolution:**

Running AI models on your own machine gives you:
- **Zero API Costs**: No per-request charges
- **Complete Privacy**: Data never leaves your machine
- **No Rate Limits**: Use as much as you want
- **Offline Capability**: Works without internet
- **Full Control**: Customize models, fine-tune, modify

**When Local Makes Sense:**
- Processing sensitive data (medical, financial, legal)
- High-volume use cases (cost savings)
- Development and testing (faster iteration)
- Compliance requirements (data residency)
- Learning and experimentation

---

### Understanding Ollama

**What is Ollama?**
- **Local LLM Runtime**: Runs large language models on your machine
- **Open Source**: Free and community-driven
- **Easy Setup**: Simple installation and model management
- **OpenAI-Compatible API**: Can use same code patterns
- **Model Library**: Access to Llama, Mistral, CodeLlama, and more

**How It Works:**
1. **Download Models**: `ollama pull llama2` downloads model weights
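2. **Start Server**: `ollama serve` runs the local API server (many installs start it automatically in the background)
3. **Make Requests**: Send HTTP requests to the local REST API (port 11434 by default)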
4. **Models Run Locally**: All computation happens on your machine

**System Requirements:**
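- **RAM**: Ollama's documentation suggests at least 8 GB for 7B-parameter models, 16 GB for 13B, and 32 GB for 33B
- **Storage**: Several GB per model (a quantized 7B model is roughly 4 GB; larger models need much more)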
- **CPU/GPU**: Works on CPU but much faster with GPU

---

### The Integration Pattern

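**Step 1: Check if Ollama is Running**
```python
import requests

response = requests.get("http://localhost:11434/api/tags", timeout=5)
if response.status_code == 200:
    print("Ollama is running!")
```
**Why**: Ollama runs as a local server, so verify it's reachable before sending work to it. If nothing is listening on the port, `requests` raises `requests.exceptions.ConnectionError` instead of returning a status code, so real code should catch that exception.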

**Step 2: List Available Models**
```python
models = response.json().get('models', [])
model_names = [model['name'] for model in models]
```
**Why**: Requests for a model that hasn't been pulled fail, and different models have different capabilities, so check what's installed before choosing one.
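Names returned by `/api/tags` usually carry a tag suffix (for example `llama2:latest`), and a plain substring test such as `'llama2' in name` also matches any other model whose name merely contains that text. Here is a sketch of a stricter selection helper; `pick_model` and its preference order are assumptions, not part of Ollama:

```python
from typing import Optional, Sequence


def pick_model(tags_json: dict,
               preferred: Sequence[str] = ("llama2", "mistral")) -> Optional[str]:
    """Choose an installed model from an /api/tags response body.

    Compares the base name before the ':' tag, so 'llama2' matches
    'llama2:latest' or 'llama2:13b' but not 'llama2-uncensored:latest'.
    Returns None when no models are installed.
    """
    names = [m["name"] for m in tags_json.get("models", [])]
    if not names:
        return None
    for wanted in preferred:
        for name in names:
            if name.split(":", 1)[0] == wanted:
                return name
    # No preferred model installed: fall back to the first one listed
    return names[0]
```

Returning `None` instead of indexing `models[0]` blindly makes the "Ollama is running but nothing is pulled" case explicit.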

**Step 3: Make API Request**
```python
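payload = {
    "model": "llama2",
    "prompt": "Describe a game character as JSON with 'name' and 'class' fields.",
    "format": "json",  # Constrain the output to well-formed JSON
    "stream": False    # Return one complete response instead of chunks
}
response = requests.post("http://localhost:11434/api/generate", json=payload, timeout=60)
```
**Why**: Ollama exposes a plain HTTP REST API with no SDK or API key required. Note that `"format": "json"` guarantees well-formed JSON but not any particular structure; Ollama's documentation also recommends asking for JSON in the prompt itself, otherwise the model may emit long runs of whitespace.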

**Step 4: Parse Response**
```python
result = response.json()
generated_text = result.get('response', '')
```
**Why**: Ollama returns JSON with the generated text in the `response` field.
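With `"stream": False` the body is a single JSON object whose `response` field holds the generated text, alongside metadata such as `done`. Even with `"format": "json"` it's worth parsing defensively, since the text can be empty or cut short if generation stops early. Here is a minimal sketch; `parse_generate_response` is a hypothetical helper:

```python
import json
from typing import Any, Dict


def parse_generate_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a non-streaming /api/generate body into a Python dict.

    Raises ValueError if generation did not finish or the text is not a
    JSON object, so callers can retry instead of failing later.
    """
    if not result.get("done", False):
        raise ValueError("generation did not complete")
    text = result.get("response", "").strip()
    if not text:
        raise ValueError("empty response")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data
```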

---

### Key Differences from Cloud APIs

| Aspect | OpenAI/HuggingFace | Ollama |
|--------|-------------------|--------|
| **Location** | Remote servers | Your machine |
| **Cost** | Per-request pricing | No per-request fee (you pay for hardware and power) |
| **Speed** | Fast (powerful GPUs) | Depends on your hardware |
| **Privacy** | Data sent to provider | Data stays local |
| **Setup** | API key only | Install + download models |
| **Scalability** | Handled by provider | Limited by your hardware |

---

### Production Considerations

**Advantages:**
- ✅ No per-request API costs
- ✅ Complete data privacy
- ✅ No provider-imposed rate limits
- ✅ Works offline

**Challenges:**
- ⚠️ Larger models need a capable GPU and plenty of RAM
- ⚠️ Setup and maintenance overhead
- ⚠️ Limited scalability (single machine)
- ⚠️ Model updates require manual downloads (`ollama pull`)

**Hybrid Approach:**
Many production systems use both:
- **Local models** for sensitive/private data
- **Cloud APIs** for public-facing features
- **Automatic failover** from cloud to local if needed
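Automatic failover can be as simple as trying providers in order and returning the first success. Here is a sketch, assuming each provider is wrapped as a plain callable that takes a prompt and returns a dict; the function and provider names are hypothetical:

```python
from typing import Any, Callable, Dict, List, Tuple

# A provider is any callable: prompt -> parsed result (raises on failure)
Provider = Callable[[str], Dict[str, Any]]


def generate_with_failover(prompt: str,
                           providers: List[Tuple[str, Provider]]) -> Dict[str, Any]:
    """Try each (name, provider) pair in order and return the first success."""
    errors: Dict[str, str] = {}
    for name, provider in providers:
        try:
            return {"provider": name, "result": provider(prompt)}
        except Exception as exc:  # network error, rate limit, invalid output, ...
            errors[name] = str(exc)
    raise RuntimeError(f"All providers failed: {errors}")


# Usage with stand-in providers (hypothetical, for illustration only)
def cloud_provider(prompt: str) -> Dict[str, Any]:
    raise ConnectionError("cloud API unreachable")


def local_provider(prompt: str) -> Dict[str, Any]:
    return {"text": prompt.upper()}


print(generate_with_failover("hi", [("openai", cloud_provider), ("ollama", local_provider)]))
# {'provider': 'ollama', 'result': {'text': 'HI'}}
```

Order matters: for sensitive data, list only local providers, because failing over to a cloud API would send that data off the machine.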

---

### Real-World Use Cases

**1. Development and Testing**
- Test function calling without API costs
- Iterate quickly on prompts
- Debug issues locally

**2. Sensitive Data Processing**
- Medical records analysis
- Financial document processing
- Legal document review

**3. High-Volume Applications**
- Content generation at scale
- Data extraction from documents
- Batch processing jobs

**4. Compliance Requirements**
- GDPR (restrictions on transferring personal data outside the EU/EEA)
- HIPAA (healthcare data privacy)
- Financial regulations

Now let's test the integration:

```python
def test_ollama_integration():
    """Test integration with local Ollama instance."""
    try:
        # Check if Ollama is running
        ollama_response = requests.get("http://localhost:11434/api/tags", timeout=5)
        if ollama_response.status_code == 200:
            print("Ollama is running!")
            models = ollama_response.json().get('models', [])
            model_names = [model['name'] for model in models]
            print(f"Available models: {model_names}")
            if not model_names:
                print("No models installed. Pull one first, e.g.: ollama pull llama2")
                return False
            
            # Prefer llama2 (names look like 'llama2:latest'); otherwise use the first model
            model = next((n for n in model_names if n.split(':')[0] == 'llama2'), model_names[0])
            
            # Test with a simple prompt
            test_payload = {
                "model": model,
                "prompt": "Generate a simple JSON object with 'name' and 'age' fields.",
                "format": "json",
                "stream": False
            }
            
            # Generous timeout: the first request also loads the model into memory
            response = requests.post(
                "http://localhost:11434/api/generate",
                json=test_payload,
                timeout=120
            )
            
            if response.status_code == 200:
                result = response.json()
                print(f"Ollama response: {result.get('response', 'No response')}")
                return True
            else:
                print(f"Ollama API error: {response.status_code}")
                return False
        else:
            print("Ollama is not responding properly")
            return False
            
    except requests.exceptions.ConnectionError:
        print("Ollama is not running. Start it with: ollama serve")
        return False
    except requests.exceptions.Timeout:
        print("Ollama connection timed out")
        return False
    except Exception as e:
        print(f"Error connecting to Ollama: {e}")
        return False

# Test Ollama integration
print("Testing Ollama integration...")
ollama_available = test_ollama_integration()
```

## Exercise 7: Final Project - Game Character Generator

### Building Production-Ready AI Applications

**What Makes This "Production-Ready"?**

This exercise combines everything you've learned into a **real-world application**:

1. **Multi-Provider Support**: Works with OpenAI, HuggingFace, and Ollama
2. **Schema Validation**: Every response is checked against a JSON Schema
3. **Error Handling**: Graceful failures with informative messages
4. **Retry Logic**: Automatically retries on errors or invalid output
5. **Structured Output**: Only schema-valid data is returned as a success

---

### The Architecture: Class-Based Design

**Why Use a Class?**

```python
class GameCharacterGenerator:
    def __init__(self, provider, api_key=None):
        # Initialize once, use many times
        self.provider = provider
        self.api_key = api_key
        self.client = self._initialize_client()

    def _initialize_client(self):
        ...  # Provider-specific setup (full version in the final code)

    def generate_character(self, user_request):
        # Clean interface for generation;
        # handles all complexity internally
        ...
```

**Benefits:**
- **Encapsulation**: All provider logic in one place
- **Reusability**: Create multiple generators easily
- **Testability**: Easy to mock and test
- **Maintainability**: Changes isolated to class

---

### The Retry Pattern: Why It Matters

**The Problem:**
AI models are **probabilistic**, not deterministic:
- Same input → different outputs
- Sometimes outputs are invalid
- Network issues cause failures
- Rate limits cause temporary errors

**The Solution: Retry with Validation**

```python
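def generate_with_retry(self, user_request, schema, max_attempts=3):
    last_error = None
    for attempt in range(max_attempts):
        try:
            response = self._generate(user_request)
            if validate_json_schema(response, schema):
                return {"success": True, "data": response, "attempts": attempt + 1}  # Success!
            last_error = "schema validation failed"  # Invalid output: retry
        except Exception as e:
            last_error = str(e)  # Network or parse error: retry
    # Give up after max_attempts instead of looping forever
    return {"success": False, "error": last_error, "attempts": max_attempts}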
```

**Why This Works:**
- **First attempt**: Usually succeeds (fast path)
- **Second attempt**: Catches occasional failures
- **Third attempt**: One last chance for transient problems (persistent failures won't be fixed by retrying)
- **After 3 attempts**: Return error (don't loop forever)

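**Expected Impact:**
- If each attempt failed independently with probability *p*, all three would fail with probability *p*³ (e.g. 10% per attempt → 0.1% after three attempts)
- In practice failures are often correlated (an outage, or a prompt the model consistently misreads), so the real gain is smaller; measure your own failure rates before and after adding retries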
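Rate-limit and network errors usually clear up after a short wait, so retries are often spaced out with exponential backoff plus random jitter. Below is a sketch of that variant; `retry_with_backoff`, its parameters, and the delay values are assumptions to tune, and `sleep` is injectable so the logic can be tested without actually waiting:

```python
import random
import time
from typing import Any, Callable, Optional


def retry_with_backoff(call: Callable[[], Any],
                       is_valid: Callable[[Any], bool],
                       max_attempts: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `call` until `is_valid` accepts its result or attempts run out."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = call()
            if is_valid(result):
                return {"success": True, "data": result, "attempts": attempt}
            last_error = "validation failed"
        except Exception as exc:
            last_error = str(exc)
        if attempt < max_attempts:
            # Wait base_delay * 2^(attempt - 1), plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    return {"success": False, "error": last_error, "attempts": max_attempts}
```

The jitter keeps many clients that failed at the same moment from all retrying at the same moment too.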

---

### Provider Abstraction: The Power of Interfaces

**The Challenge:**
Each provider has different APIs:
- OpenAI: `client.chat.completions.create(...)`
- HuggingFace: `client.text_generation(...)`
- Ollama: `requests.post("http://localhost:11434/api/generate", ...)`

**The Solution: Provider-Specific Methods**

```python
def generate_character(self, user_request):
    if self.provider == "openai":
        return self._generate_openai_character(user_request)
    elif self.provider == "huggingface":
        return self._generate_hf_character(user_request)
    elif self.provider == "ollama":
        return self._generate_ollama_character(user_request)
    else:
        raise ValueError(f"Unknown provider: {self.provider}")
```

**Benefits:**
- **Single Interface**: `generate_character()` works for all
- **Easy to Extend**: Add new providers without changing calling code
- **Provider-Specific Optimization**: Each method optimized for its provider
- **Testing**: Test each provider independently
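The `if/elif` chain works, but each new provider still means editing `generate_character`. One common alternative is a registry that maps provider names to handler functions. Here is a sketch; `register_provider`, `PROVIDERS`, and `echo_provider` are hypothetical names:

```python
from typing import Any, Callable, Dict

Handler = Callable[[str], Dict[str, Any]]
PROVIDERS: Dict[str, Handler] = {}


def register_provider(name: str) -> Callable[[Handler], Handler]:
    """Decorator that adds a handler to the PROVIDERS registry under `name`."""
    def decorator(func: Handler) -> Handler:
        PROVIDERS[name] = func
        return func
    return decorator


@register_provider("echo")
def echo_provider(user_request: str) -> Dict[str, Any]:
    # Stand-in for a real provider call; handy as a test double
    return {"name": user_request[:20]}


def generate_character(provider: str, user_request: str) -> Dict[str, Any]:
    """Dispatch to the registered handler, failing clearly for unknown names."""
    try:
        handler = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    return handler(user_request)
```

Adding a provider then means writing one decorated function; the dispatch code never changes.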

---

### Validation Strategy: Defense in Depth

**Layer 1: Schema Definition**
```python
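schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 2, "maxLength": 20},
        "class": {"type": "string", "enum": ["warrior", "mage", "rogue", "cleric"]},
        "health": {"type": "integer", "minimum": 50, "maximum": 100}
    },
    "required": ["name", "class", "health"]
}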
```
**Purpose**: Define what "valid" means
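The lab relies on its own `validate_json_schema` helper (a library such as `jsonschema` is the usual choice in production). To make concrete what these keywords check, here is a deliberately small validator covering only `type`, `required`, `enum`, `minimum`/`maximum`, and `minLength`/`maxLength`. It is a teaching sketch, not a full JSON Schema implementation, and `check_value`, `check_object`, and `CHARACTER_SCHEMA` are hypothetical names:

```python
from typing import Any, Dict, List

# JSON Schema type names mapped to Python types (subset only)
TYPES = {"string": str, "integer": int, "number": (int, float), "object": dict}

CHARACTER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 2, "maxLength": 20},
        "class": {"type": "string", "enum": ["warrior", "mage", "rogue", "cleric"]},
        "health": {"type": "integer", "minimum": 50, "maximum": 100},
    },
    "required": ["name", "class", "health"],
}


def check_value(value: Any, rules: Dict[str, Any], path: str) -> List[str]:
    """Check one value against a small subset of JSON Schema keywords."""
    expected = rules.get("type")
    if expected is not None:
        # bool subclasses int in Python, but is none of the JSON types handled here
        if isinstance(value, bool) or not isinstance(value, TYPES[expected]):
            return [f"{path}: expected {expected}"]
    problems = []
    if "enum" in rules and value not in rules["enum"]:
        problems.append(f"{path}: {value!r} is not one of {rules['enum']}")
    if "minimum" in rules and value < rules["minimum"]:
        problems.append(f"{path}: {value} is below minimum {rules['minimum']}")
    if "maximum" in rules and value > rules["maximum"]:
        problems.append(f"{path}: {value} is above maximum {rules['maximum']}")
    if "minLength" in rules and len(value) < rules["minLength"]:
        problems.append(f"{path}: shorter than {rules['minLength']} characters")
    if "maxLength" in rules and len(value) > rules["maxLength"]:
        problems.append(f"{path}: longer than {rules['maxLength']} characters")
    return problems


def check_object(data: Any, schema: Dict[str, Any]) -> List[str]:
    """Return every problem found; an empty list means the object is valid."""
    if not isinstance(data, dict):
        return ["$: expected object"]
    problems = [f"$.{key}: missing" for key in schema.get("required", []) if key not in data]
    for key, rules in schema.get("properties", {}).items():
        if key in data:
            problems.extend(check_value(data[key], rules, f"$.{key}"))
    return problems
```

Returning every problem instead of a bare `True`/`False` makes it possible to feed the specific errors back into a retry prompt.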

**Layer 2: Pre-Execution Validation**
```python
# Validate before sending to AI (is_valid_request is a placeholder for your own checks)
if not is_valid_request(user_request):
    return {"success": False, "error": "Invalid request"}
```
**Purpose**: Catch obvious errors early
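`is_valid_request` isn't defined in the lab. One plausible version checks that the request is non-empty text within a length limit; both the helper names and the 500-character limit below are arbitrary placeholders:

```python
from typing import Optional

MAX_REQUEST_CHARS = 500  # arbitrary placeholder limit


def request_problem(user_request: object) -> Optional[str]:
    """Return a human-readable reason the request is unusable, or None if OK."""
    if not isinstance(user_request, str):
        return "request must be text"
    text = user_request.strip()
    if not text:
        return "request is empty"
    if len(text) > MAX_REQUEST_CHARS:
        return f"request is longer than {MAX_REQUEST_CHARS} characters"
    return None


def is_valid_request(user_request: object) -> bool:
    return request_problem(user_request) is None
```

Keeping the reason separate from the boolean lets the error message say *what* was wrong, and the check costs nothing compared with a model call.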

**Layer 3: Post-Response Validation**
```python
if not validate_json_schema(response, schema):
    # Invalid output: the caller retries or reports the error
    raise ValueError("Response failed schema validation")
return response
```
**Purpose**: Ensure AI output meets requirements

**Layer 4: Runtime Validation**
```python
# Validate when using the data (same range as the schema: 50-100)
if not 50 <= character['health'] <= 100:
    raise ValueError(f"Invalid health value: {character['health']}")
```
**Purpose**: Final safety check before using data

---

### Error Handling: User-Friendly Messages

**Bad Error Handling:**
```python
except Exception as e:
    return {"error": str(e)}
# User sees: "KeyError: 'name'"
```

**Good Error Handling:**
```python
except Exception as e:
    return {
        "success": False,
        "error": "Failed to generate valid character after maximum attempts",
        "attempts": max_attempts,
        "details": str(e)  # For debugging
    }
# User sees: Clear message, developer sees: Technical details
```

**Why It Matters:**
- **Users**: Need to understand what went wrong
- **Developers**: Need technical details to fix issues
- **Monitoring**: Need structured data for alerting
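One way to give users the clean message and developers the details, without echoing raw exception text back to the user, is to log the details server-side and return only a reference ID. Here is a sketch using the standard `logging` module; `build_error_response` and the `error_id`/`error_type` fields are assumptions, not part of the lab's code:

```python
import logging
import uuid
from typing import Any, Dict

logger = logging.getLogger("character_generator")


def build_error_response(exc: Exception, attempts: int) -> Dict[str, Any]:
    """Log full details for developers; return a safe, structured summary."""
    error_id = uuid.uuid4().hex[:8]
    # exc_info attaches the exception and its traceback to the log record
    logger.error("generation failed [%s] after %d attempts: %r",
                 error_id, attempts, exc, exc_info=exc)
    return {
        "success": False,
        "error": "Failed to generate valid character after maximum attempts",
        "attempts": attempts,
        "error_id": error_id,              # users quote this when reporting
        "error_type": type(exc).__name__,  # coarse category for monitoring
    }
```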

---

### Production Deployment Checklist

Before deploying this to production:

- [ ] **Environment Variables**: API keys in `.env`, not code
- [ ] **Logging**: Log all requests and responses
- [ ] **Monitoring**: Track success rates, latency, errors
- [ ] **Rate Limiting**: Prevent abuse
- [ ] **Caching**: Cache common requests
- [ ] **Documentation**: API docs for other developers
- [ ] **Testing**: Unit tests for each provider
- [ ] **Error Recovery**: Graceful degradation on failures
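For the first checklist item, a small helper can read keys from the environment and reject the placeholder strings used earlier in the lab. Here is a sketch; `require_api_key` is hypothetical, and loading a `.env` file into the environment (for example with `python-dotenv`'s `load_dotenv()`) is assumed to happen before it runs:

```python
import os

PLACEHOLDERS = {"", "your-openai-key-here", "your-huggingface-token-here"}


def require_api_key(var_name: str) -> str:
    """Return the key stored in `var_name`, or raise if missing or a placeholder."""
    value = os.environ.get(var_name, "").strip()
    if value in PLACEHOLDERS:
        # Name the variable, never echo its value
        raise RuntimeError(f"Set the {var_name} environment variable")
    return value
```

Usage: `OPENAI_API_KEY = require_api_key("OPENAI_API_KEY")` fails fast at startup instead of on the first API call.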

---

### What You've Built

This generator demonstrates:
- ✅ **Multi-provider architecture** (OpenAI, HuggingFace, Ollama)
- ✅ **Robust validation** (schema validation with retries)
- ✅ **Error handling** (graceful failures)
- ✅ **Production patterns** (class-based, testable, maintainable)
- ✅ **Real-world complexity** (handles edge cases)

**Next Steps:**
- Add more providers (Anthropic, Cohere, etc.)
- Implement caching for common requests
- Add streaming responses for better UX
- Build a web API wrapper
- Add monitoring and analytics
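For the streaming item: with `"stream": True`, Ollama's `/api/generate` returns newline-delimited JSON objects, each carrying a text fragment in `response`, with `done` set to true on the last one. Here is a stdlib sketch of assembling those chunks; `collect_stream` is hypothetical and accepts any iterable of lines, such as the lines of an HTTP response body:

```python
import json
from typing import Callable, Iterable, Optional, Union


def collect_stream(lines: Iterable[Union[str, bytes]],
                   on_chunk: Optional[Callable[[str], None]] = None) -> str:
    """Join the `response` fragments of an NDJSON stream until `done`."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        fragment = chunk.get("response", "")
        parts.append(fragment)
        if on_chunk:
            on_chunk(fragment)  # e.g. print(fragment, end="", flush=True)
        if chunk.get("done"):
            break
    return "".join(parts)
```

Streaming improves perceived latency, but a partial JSON document can't be schema-validated, so validation still runs on the joined text once `done` arrives.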

Now let's build it:

```python
class GameCharacterGenerator:
    """A robust game character generator with multiple provider support."""
    
    def __init__(self, provider: str = "openai", api_key: str = None, ollama_model: str = "llama2"):
        self.provider = provider
        self.api_key = api_key
        self.ollama_model = ollama_model  # Must be a model already pulled with `ollama pull`
        self.client = self._initialize_client()
    
    def _initialize_client(self):
        """Initialize the appropriate client based on provider."""
        if self.provider == "openai":
            try:
                from openai import OpenAI
                return OpenAI(api_key=self.api_key)
            except ImportError:
                raise ImportError("OpenAI library not installed")
        elif self.provider == "huggingface":
            try:
                from huggingface_hub import InferenceClient
                return InferenceClient(token=self.api_key)
            except ImportError:
                raise ImportError("HuggingFace Hub library not installed")
        elif self.provider == "ollama":
            return None  # Ollama uses plain HTTP requests
        else:
            raise ValueError(f"Unknown provider: {self.provider}")
    
    def generate_character(self, user_request: str) -> Dict[str, Any]:
        """Generate a game character, retrying on errors or invalid output."""
        max_attempts = 3
        
        for attempt in range(max_attempts):
            try:
                if self.provider == "openai":
                    response = self._generate_openai_character(user_request)
                elif self.provider == "huggingface":
                    response = self._generate_hf_character(user_request)
                elif self.provider == "ollama":
                    response = self._generate_ollama_character(user_request)
                else:
                    raise ValueError(f"Unknown provider: {self.provider}")
                
                # Validate the response
                if validate_json_schema(response, game_character_schema):
                    return {
                        "success": True,
                        "character": response,
                        "attempts": attempt + 1
                    }
                else:
                    print(f"Attempt {attempt + 1}: Schema validation failed")
                    
            except Exception as e:
                print(f"Attempt {attempt + 1}: Error - {e}")
        
        return {
            "success": False,
            "error": "Failed to generate valid character after maximum attempts",
            "attempts": max_attempts
        }
    
    def _generate_openai_character(self, user_request: str) -> Dict[str, Any]:
        """Generate character using OpenAI."""
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "system",
                "content": f"Generate a game character based on the user's request. Respond with valid JSON that matches this schema: {json.dumps(game_character_schema)}"
            }, {
                "role": "user",
                "content": user_request
            }],
            response_format={"type": "json_object"}
        )
        
        response_text = response.choices[0].message.content
        return safe_json_parse(response_text)
    
    def _generate_hf_character(self, user_request: str) -> Dict[str, Any]:
        """Generate character using HuggingFace."""
        prompt = f"""
        Generate a game character based on this request: {user_request}
        
        Respond with valid JSON that matches this schema:
        {json.dumps(game_character_schema, indent=2)}
        
        Output only the JSON, no additional text.
        """
        
        response_text = self.client.text_generation(prompt, max_new_tokens=500)
        return safe_json_parse(response_text)
    
    def _generate_ollama_character(self, user_request: str) -> Dict[str, Any]:
        """Generate character using Ollama."""
        # "format": "json" guarantees well-formed JSON, not our schema,
        # so the schema and an explicit JSON instruction go in the prompt
        prompt = (
            f"Generate a game character based on this request: {user_request}\n"
            f"Respond only with JSON that matches this schema: {json.dumps(game_character_schema)}"
        )
        payload = {
            "model": self.ollama_model,
            "prompt": prompt,
            "format": "json",
            "stream": False
        }
        
        response = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
        if response.status_code == 200:
            result = response.json()
            return safe_json_parse(result.get('response', '{}'))
        else:
            raise RuntimeError(f"Ollama API error: {response.status_code}")

# Test the character generator
print("Testing Game Character Generator...")

# Test with different providers
providers_to_test = []
if OPENAI_API_KEY != "your-openai-key-here":
    providers_to_test.append("openai")
if HF_TOKEN != "your-huggingface-token-here":
    providers_to_test.append("huggingface")
if ollama_available:
    providers_to_test.append("ollama")

# Each provider gets its own credential; Ollama needs none
api_keys = {"openai": OPENAI_API_KEY, "huggingface": HF_TOKEN}

for provider in providers_to_test:
    try:
        print(f"\n--- Testing {provider.upper()} ---")
        generator = GameCharacterGenerator(provider=provider, api_key=api_keys.get(provider))
        result = generator.generate_character("Create a brave warrior with high strength and good health.")
        
        if result['success']:
            print(f"✅ Character generated successfully in {result['attempts']} attempt(s)")
            character = result['character']['character']
            print(f"Name: {character['name']}")
            print(f"Class: {character['class']}")
            print(f"Health: {character['health']}")
            print(f"Strength: {character['strength']}")
        else:
            print(f"❌ Failed to generate character: {result['error']}")
            
    except Exception as e:
        print(f"Error testing {provider}: {e}")
```

## Summary and Next Steps

Congratulations! You've completed the Function Calling and Structured Outputs lab. Here's what you've learned:

### Key Skills Acquired:
- ✅ Function calling implementation with JSON Schema
- ✅ Cross-platform AI provider integration (OpenAI, HuggingFace, Ollama)
- ✅ Structured output validation and error handling
- ✅ Provider performance comparison and evaluation
- ✅ Building robust AI applications with retry logic

### Best Practices:
- Always validate AI outputs against schemas
- Implement proper error handling and retry mechanisms
- Use environment variables for API keys
- Test across multiple providers for reliability
- Handle JSON parsing edge cases gracefully

### Next Steps:
1. Experiment with different schemas for your use cases
2. Try integrating with other AI providers
3. Build more complex function calling scenarios
4. Implement caching for better performance
5. Add monitoring and logging for production use

### Additional Resources:
- [JSON Schema Documentation](https://json-schema.org/)
- [OpenAI Function Calling Guide](https://platform.openai.com/docs/guides/function-calling)
- [HuggingFace Inference API](https://huggingface.co/docs/api-inference/index)
- [Ollama Documentation](https://ollama.ai/)

Remember: The key to reliable AI applications is robust validation, proper error handling, and thorough testing across different scenarios and providers!