# Lab 2: OpenAI API Deep Dive

**Week 3 - Advanced Prompting & OpenAI API**

**Provided by:** ADC ENGINEERING & CONSULTING LTD

## Objectives

In this lab, you will:
- Master all OpenAI API parameters and their effects
- Implement robust error handling and retry logic
- Optimize API usage for cost and performance
- Use streaming responses effectively
- Handle rate limits and token management
- Implement request batching and caching
- Monitor and log API usage
- Build production-ready API wrappers

## Prerequisites

- Completed Week 2 labs
- Understanding of async programming (helpful)
- OpenAI API key configured
- Python 3.9+

## Setup and Installation

In [None]:
# Install required packages
!pip install openai python-dotenv tiktoken tenacity aiohttp pandas matplotlib --quiet

In [None]:
import os
import json
import time
import asyncio
from typing import List, Dict, Optional, Any, AsyncIterator, Callable
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from collections import defaultdict
import hashlib

from openai import OpenAI, AsyncOpenAI
from openai import RateLimitError, APIError, APITimeoutError, APIConnectionError
from dotenv import load_dotenv
import tiktoken
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

# Load environment variables
load_dotenv()

# Initialize OpenAI clients
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
async_client = AsyncOpenAI(api_key=os.getenv('OPENAI_API_KEY'))

print("✓ Setup complete!")

## Part 1: Complete Parameter Guide

Let's explore the core Chat Completions parameters in depth.

### Core Parameters

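- **model**: ID of the model to use
- **messages**: The conversation history, as a list of role/content messages
- **temperature**: Sampling randomness, from 0.0 (near-deterministic) to 2.0 (very random)
- **max_tokens**: Maximum number of tokens to generate in the response
- **top_p**: Nucleus sampling threshold (an alternative to temperature; adjust one or the other, not both)
- **frequency_penalty**: Penalizes tokens in proportion to how often they have already appeared, reducing repetition (-2.0 to 2.0)
- **presence_penalty**: Penalizes tokens that have already appeared at all, encouraging new topics (-2.0 to 2.0)
- **n**: Number of completions to generate
- **stop**: Up to four sequences at which the API stops generating
- **stream**: Whether to stream the response incrementally
- **logprobs**: Whether to return log probabilities of the output tokens
- **user**: Unique identifier for the end user (for monitoring and abuse detection)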
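
To build intuition for `temperature` and `top_p` before spending API calls, the sketch below applies both to a toy next-token distribution. It uses the textbook formulation (softmax over logits divided by the temperature, then nucleus filtering); the token names and logit values are made up, and the helper names `apply_temperature` and `nucleus_filter` are illustrative, not part of the OpenAI SDK. How the API implements sampling internally is assumed to follow this general scheme.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits / temperature. Requires temperature > 0."""
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    kept: Dict[str, float] = {}
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}


# Made-up logits for four candidate next tokens
toy_logits = {"the": 2.0, "a": 1.0, "robot": 0.5, "banana": -1.0}

for t in (0.5, 1.0, 2.0):
    probs = apply_temperature(toy_logits, t)
    print(f"temperature={t}:", {tok: round(p, 3) for tok, p in probs.items()})

# With top_p=0.5 only the single most likely token survives here
print(nucleus_filter(apply_temperature(toy_logits, 1.0), top_p=0.5))  # {'the': 1.0}
```

Lower temperatures concentrate probability on the top token, higher ones flatten the distribution, and `top_p` cuts off the unlikely tail regardless of its shape. A temperature of exactly 0 cannot be plugged into this formula; it is usually treated as always picking the most likely token.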

In [None]:
class ParameterExplorer:
    """
    Explore and demonstrate OpenAI API parameters.
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo"):
        """
        Initialize ParameterExplorer.
        
        Args:
            model: OpenAI model to use
        """
        self.model = model
        self.results = []
    
    def explore_temperature(
        self,
        prompt: str,
        temperatures: List[float] = [0.0, 0.5, 1.0, 1.5, 2.0]
    ):
        """
        Show effect of temperature parameter.
        
        Args:
            prompt: Prompt to test
            temperatures: Temperature values to test
        """
        print(f"\n{'='*80}")
        print("TEMPERATURE EXPLORATION")
        print(f"{'='*80}\n")
        print(f"Prompt: {prompt}\n")
        
        for temp in temperatures:
            response = client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=temp,
                max_tokens=100
            )
            
            print(f"Temperature: {temp}")
            print(f"Response: {response.choices[0].message.content}")
            print(f"Tokens: {response.usage.total_tokens}")
            print("-" * 80)
    
    def explore_top_p(
        self,
        prompt: str,
        top_p_values: List[float] = [0.1, 0.5, 0.9, 1.0]
    ):
        """
        Show effect of top_p (nucleus sampling).
        
        Args:
            prompt: Prompt to test
            top_p_values: top_p values to test
        """
        print(f"\n{'='*80}")
        print("TOP_P EXPLORATION")
        print(f"{'='*80}\n")
        print(f"Prompt: {prompt}\n")
        
        for top_p in top_p_values:
            response = client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=1.0,  # Keep temperature at its default so only top_p varies
                top_p=top_p,
                max_tokens=100
            )
            
            print(f"top_p: {top_p}")
            print(f"Response: {response.choices[0].message.content}")
            print("-" * 80)
    
    def explore_penalties(
        self,
        prompt: str,
        frequency_penalties: List[float] = [0.0, 1.0, 2.0],
        presence_penalties: List[float] = [0.0, 1.0, 2.0]
    ):
        """
        Show effect of frequency and presence penalties.
        
        Args:
            prompt: Prompt to test
            frequency_penalties: Frequency penalty values
            presence_penalties: Presence penalty values
        """
        print(f"\n{'='*80}")
        print("PENALTY EXPLORATION")
        print(f"{'='*80}\n")
        print(f"Prompt: {prompt}\n")
        
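        # Compare the extremes of the supplied penalty ranges
        freq_low, freq_high = min(frequency_penalties), max(frequency_penalties)
        pres_low, pres_high = min(presence_penalties), max(presence_penalties)
        configs = [
            ("Baseline", freq_low, pres_low),
            ("High frequency penalty", freq_high, pres_low),
            ("High presence penalty", freq_low, pres_high),
            ("Both high", freq_high, pres_high)
        ]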
        
        for name, freq_penalty, pres_penalty in configs:
            response = client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,
                frequency_penalty=freq_penalty,
                presence_penalty=pres_penalty,
                max_tokens=150
            )
            
            print(f"{name} (freq={freq_penalty}, pres={pres_penalty})")
            print(f"Response: {response.choices[0].message.content}")
            print("-" * 80)
    
    def explore_n_parameter(self, prompt: str, n: int = 3):
        """
        Generate multiple completions at once.
        
        Args:
            prompt: Prompt to test
            n: Number of completions
        """
        print(f"\n{'='*80}")
        print(f"N PARAMETER EXPLORATION (n={n})")
        print(f"{'='*80}\n")
        print(f"Prompt: {prompt}\n")
        
        response = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,
            n=n,
            max_tokens=100
        )
        
        for i, choice in enumerate(response.choices, 1):
            print(f"Completion {i}:")
            print(choice.message.content)
            print("-" * 80)
    
    def explore_stop_sequences(self, prompt: str):
        """
        Show effect of stop sequences.
        
        Args:
            prompt: Prompt to test
        """
        print(f"\n{'='*80}")
        print("STOP SEQUENCES EXPLORATION")
        print(f"{'='*80}\n")
        
        configs = [
            ("No stop", None),
            ("Stop at period", ["."]),
            ("Stop at newline", ["\n"]),
            ("Multiple stops", [".", "\n", "!"])
        ]
        
        for name, stop in configs:
            response = client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,
                stop=stop,
                max_tokens=200
            )
            
            print(f"{name}: {stop}")
            print(f"Response: {response.choices[0].message.content}")
            print("-" * 80)

# Test parameter exploration
explorer = ParameterExplorer()

# Temperature
explorer.explore_temperature(
    "Write a creative opening line for a science fiction story.",
    temperatures=[0.0, 0.7, 1.5]
)

# Penalties (test repetition)
repetitive_prompt = "List reasons why Python is popular. Start each reason with 'Python is'"
explorer.explore_penalties(repetitive_prompt)

# N parameter
explorer.explore_n_parameter(
    "Suggest a creative name for a coffee shop.",
    n=3
)

### Exercise 1.1: Parameter Experimentation

Experiment with different parameter combinations:

In [None]:
# TODO: Test parameter combinations for different use cases

explorer = ParameterExplorer()

# Use case 1: Deterministic code generation
# TODO: What parameters would you use for consistent code generation?
# Hint: Low temperature; keep penalties near 0, since code legitimately repeats tokens

# Use case 2: Creative writing
# TODO: What parameters for maximum creativity?
# Hint: Higher temperature or top_p

# Use case 3: Formal business writing
# TODO: What parameters for professional, concise responses?

# Use case 4: Brainstorming (multiple diverse ideas)
# TODO: Use n parameter with appropriate temperature

# Test your configurations
# response = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Your prompt"}],
#     temperature=...,
#     # Add other parameters
# )

## Part 2: Error Handling and Retries

Production applications need robust error handling.

In [None]:
@dataclass
class APICallResult:
    """Result of an API call."""
    success: bool
    response: Optional[Any] = None
    error: Optional[str] = None
    attempts: int = 1
    total_time: float = 0.0

class RobustAPIClient:
    """
    OpenAI API client with comprehensive error handling.
    """
    
    def __init__(
        self,
        model: str = "gpt-3.5-turbo",
        max_retries: int = 3,
        base_delay: float = 1.0,
        max_delay: float = 60.0
    ):
        """
        Initialize RobustAPIClient.
        
        Args:
            model: OpenAI model to use
            max_retries: Maximum retry attempts
            base_delay: Base delay for exponential backoff (seconds)
            max_delay: Maximum delay between retries (seconds)
        """
        self.model = model
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.call_history: List[APICallResult] = []
    
    def chat_completion(
        self,
        messages: List[Dict[str, str]],
        **kwargs
    ) -> APICallResult:
        """
        Make a chat completion with retry logic.
        
        Args:
            messages: Chat messages
            **kwargs: Additional API parameters
        
        Returns:
            APICallResult with response or error
        """
        start_time = time.time()
        last_error = None
        attempt = 0  # Stays 0 if max_retries < 1 and the loop never runs
        
        for attempt in range(1, self.max_retries + 1):
            try:
                response = client.chat.completions.create(
                    model=kwargs.get('model', self.model),
                    messages=messages,
                    **{k: v for k, v in kwargs.items() if k != 'model'}
                )
                
                result = APICallResult(
                    success=True,
                    response=response,
                    attempts=attempt,
                    total_time=time.time() - start_time
                )
                self.call_history.append(result)
                return result
                
            except RateLimitError as e:
                last_error = f"Rate limit exceeded: {str(e)}"
                if attempt < self.max_retries:
                    delay = min(self.base_delay * (2 ** (attempt - 1)), self.max_delay)
                    print(f"Rate limit hit. Retrying in {delay}s... (attempt {attempt}/{self.max_retries})")
                    time.sleep(delay)
                    
            except APITimeoutError as e:
                last_error = f"Request timeout: {str(e)}"
                if attempt < self.max_retries:
                    delay = min(self.base_delay * (2 ** (attempt - 1)), self.max_delay)
                    print(f"Timeout. Retrying in {delay}s... (attempt {attempt}/{self.max_retries})")
                    time.sleep(delay)
                    
            except APIConnectionError as e:
                last_error = f"Connection error: {str(e)}"
                if attempt < self.max_retries:
                    delay = min(self.base_delay * (2 ** (attempt - 1)), self.max_delay)
                    print(f"Connection error. Retrying in {delay}s... (attempt {attempt}/{self.max_retries})")
                    time.sleep(delay)
                    
            except APIError as e:
                last_error = f"API error: {str(e)}"
                # Don't retry on general API errors (might be client error)
                break
                
            except Exception as e:
                last_error = f"Unexpected error: {str(e)}"
                break
        
        # All retries failed
        result = APICallResult(
            success=False,
            error=last_error,
            attempts=attempt,
            total_time=time.time() - start_time
        )
        self.call_history.append(result)
        return result
    
    def get_statistics(self) -> Dict[str, Any]:
        """Get statistics about API calls."""
        total_calls = len(self.call_history)
        successful = sum(1 for r in self.call_history if r.success)
        failed = total_calls - successful
        
        avg_attempts = sum(r.attempts for r in self.call_history) / total_calls if total_calls > 0 else 0
        avg_time = sum(r.total_time for r in self.call_history) / total_calls if total_calls > 0 else 0
        
        return {
            "total_calls": total_calls,
            "successful": successful,
            "failed": failed,
            "success_rate": f"{(successful/total_calls*100):.1f}%" if total_calls > 0 else "0%",
            "avg_attempts": f"{avg_attempts:.2f}",
            "avg_time": f"{avg_time:.2f}s"
        }

# Test robust client
robust_client = RobustAPIClient(max_retries=3, base_delay=1.0)

# Make some calls
messages = [{"role": "user", "content": "What is Python?"}]
result = robust_client.chat_completion(messages, temperature=0.7)

if result.success:
    print(f"✓ Success after {result.attempts} attempt(s)")
    print(f"Response: {result.response.choices[0].message.content[:100]}...")
else:
    print(f"✗ Failed after {result.attempts} attempts")
    print(f"Error: {result.error}")

# Make more calls
for i in range(5):
    result = robust_client.chat_completion(
        [{"role": "user", "content": f"Tell me a fact about {['Python', 'AI', 'ML', 'Data', 'Code'][i]}"}],
        temperature=0.5
    )

# Get statistics
stats = robust_client.get_statistics()
print("\n" + "="*80)
print("API CALL STATISTICS")
print("="*80)
for key, value in stats.items():
    print(f"{key}: {value}")
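
The retry loop above sleeps for `min(base_delay * 2 ** (attempt - 1), max_delay)` between attempts. When many clients hit a rate limit at the same moment, identical delays make them all retry together. A common remedy is to add random jitter. The sketch below computes the delay schedule offline so you can inspect it without calling the API; `backoff_delays` and its `jitter` and `rng` parameters are hypothetical names introduced for this illustration, not part of `RobustAPIClient` or the OpenAI SDK.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_retries: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the delays slept between attempts (one fewer than the attempts)."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(1, max_retries):
        # Same exponential schedule as the retry loop, capped at max_delay
        ceiling = min(base_delay * (2 ** (attempt - 1)), max_delay)
        # "Full jitter": pick uniformly in [0, ceiling] so clients do not retry in lockstep
        delays.append(rng.uniform(0, ceiling) if jitter else ceiling)
    return delays


print(backoff_delays(5, jitter=False))  # [1.0, 2.0, 4.0, 8.0]
print(backoff_delays(5, jitter=True, rng=random.Random(42)))  # randomized, each below its ceiling
```

Note that the `OpenAI` client also retries some failures on its own by default, so a wrapper like `RobustAPIClient` may end up making more underlying requests than `max_retries` suggests unless the client's `max_retries` option is lowered.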

### Exercise 2.1: Implement Advanced Error Handling

Enhance the error handling with the circuit breaker pattern:

In [None]:
# TODO: Implement circuit breaker pattern

class CircuitBreaker:
    """
    Circuit breaker to prevent cascading failures.
    
    States:
    - CLOSED: Normal operation
    - OPEN: Too many failures, reject requests
    - HALF_OPEN: Testing if service recovered
    
    TODO: Implement:
    1. Track failure rate
    2. Open circuit after threshold
    3. Attempt recovery after timeout
    4. Close circuit if recovery successful
    """
    
    def __init__(
        self,
        failure_threshold: int = 5,
        timeout: float = 60.0,
        success_threshold: int = 2
    ):
        """
        Initialize circuit breaker.
        
        Args:
            failure_threshold: Failures before opening circuit
            timeout: Seconds before attempting recovery
            success_threshold: Successes needed to close circuit
        """
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.success_threshold = success_threshold
        
        # TODO: Initialize state tracking
        self.state = "CLOSED"  # CLOSED, OPEN, HALF_OPEN
        self.failures = 0
        self.successes = 0
        self.last_failure_time = None
    
    def call(self, func: Callable, *args, **kwargs) -> Any:
        """
        Execute function with circuit breaker protection.
        
        TODO: Implement circuit breaker logic
        """
        pass
    
    def record_success(self):
        """TODO: Record successful call."""
        pass
    
    def record_failure(self):
        """TODO: Record failed call."""
        pass

# Usage example:
# breaker = CircuitBreaker(failure_threshold=3, timeout=30.0)
# result = breaker.call(robust_client.chat_completion, messages, temperature=0.7)

## Part 3: Streaming Responses

Stream responses for better user experience with long outputs.

In [None]:
class StreamingHandler:
    """
    Handle streaming responses from OpenAI API.
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo"):
        """
        Initialize StreamingHandler.
        
        Args:
            model: OpenAI model to use
        """
        self.model = model
    
    def stream_response(
        self,
        messages: List[Dict[str, str]],
        callback: Optional[Callable[[str], None]] = None,
        **kwargs
    ) -> str:
        """
        Stream response with optional callback.
        
        Args:
            messages: Chat messages
            callback: Function called with each chunk
            **kwargs: Additional API parameters
        
        Returns:
            Complete response text
        """
        full_response = ""
        
        stream = client.chat.completions.create(
            model=kwargs.get('model', self.model),
            messages=messages,
            stream=True,
            **{k: v for k, v in kwargs.items() if k not in ['model', 'stream']}
        )
        
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                full_response += content
                
                if callback:
                    callback(content)
                else:
                    print(content, end='', flush=True)
        
        print()  # New line after streaming
        return full_response
    
    async def async_stream_response(
        self,
        messages: List[Dict[str, str]],
        callback: Optional[Callable[[str], None]] = None,
        **kwargs
    ) -> str:
        """
        Async version of stream_response.
        
        Args:
            messages: Chat messages
            callback: Function called with each chunk
            **kwargs: Additional API parameters
        
        Returns:
            Complete response text
        """
        full_response = ""
        
        stream = await async_client.chat.completions.create(
            model=kwargs.get('model', self.model),
            messages=messages,
            stream=True,
            **{k: v for k, v in kwargs.items() if k not in ['model', 'stream']}
        )
        
        async for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                full_response += content
                
                if callback:
                    callback(content)
                else:
                    print(content, end='', flush=True)
        
        print()
        return full_response
    
    def stream_with_indicators(
        self,
        messages: List[Dict[str, str]],
        show_speed: bool = True,
        **kwargs
    ) -> Dict[str, Any]:
        """
        Stream with speed and token indicators.
        
        Args:
            messages: Chat messages
            show_speed: Show streaming speed
            **kwargs: Additional API parameters
        
        Returns:
            Dict with response and metadata
        """
        full_response = ""
        start_time = time.time()
        chunk_count = 0
        
        print("Streaming response...")
        print("-" * 80)
        
        stream = client.chat.completions.create(
            model=kwargs.get('model', self.model),
            messages=messages,
            stream=True,
            **{k: v for k, v in kwargs.items() if k not in ['model', 'stream']}
        )
        
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                full_response += content
                chunk_count += 1
                print(content, end='', flush=True)
        
        elapsed = time.time() - start_time
        
        print()
        print("-" * 80)
        
        if show_speed:
            chars_per_sec = len(full_response) / elapsed if elapsed > 0 else 0
            print(f"✓ Completed in {elapsed:.2f}s")
            print(f"✓ Speed: {chars_per_sec:.0f} chars/sec")
            print(f"✓ Chunks: {chunk_count}")
        
        return {
            "response": full_response,
            "elapsed_time": elapsed,
            "chunk_count": chunk_count,
            "chars_per_second": len(full_response) / elapsed if elapsed > 0 else 0
        }

# Test streaming
streamer = StreamingHandler()

print("="*80)
print("BASIC STREAMING")
print("="*80)
response = streamer.stream_response(
    [{"role": "user", "content": "Write a short story about a robot learning to paint (3 paragraphs)."}],
    temperature=0.8
)

print("\n" + "="*80)
print("STREAMING WITH INDICATORS")
print("="*80)
result = streamer.stream_with_indicators(
    [{"role": "user", "content": "Explain how neural networks work in 5 steps."}],
    temperature=0.7
)
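
The accumulation logic in the streaming handlers can be tested without network access. The sketch below feeds simulated chunks through a small generator that skips chunks carrying no text. `fake_chunk` and `iter_text` are hypothetical helpers written for this illustration; the fake objects only mimic the `chunk.choices[0].delta.content` shape used above and are not real SDK types.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, List, Optional


def fake_chunk(text: Optional[str], with_choice: bool = True) -> SimpleNamespace:
    """Hypothetical stand-in that mimics the shape of a streamed chunk."""
    choices = [SimpleNamespace(delta=SimpleNamespace(content=text))] if with_choice else []
    return SimpleNamespace(choices=choices)


def iter_text(stream: Iterable[SimpleNamespace]) -> Iterator[str]:
    """Yield only non-empty text deltas, skipping chunks without choices or content."""
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry metadata only and have no choices
        content = chunk.choices[0].delta.content
        if content:
            yield content


simulated_stream = [
    fake_chunk(None),                     # first chunk often carries only the role
    fake_chunk("Hello"),
    fake_chunk(", "),
    fake_chunk("world"),
    fake_chunk(None, with_choice=False),  # e.g. a trailing usage-only chunk
]

pieces: List[str] = list(iter_text(simulated_stream))
print("".join(pieces))  # Hello, world
```

Collecting the pieces in a list and joining once avoids building many intermediate strings for long responses, and separating the filtering from the printing makes it easy to plug in a callback or a file writer later.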

### Exercise 3.1: Build Custom Streaming UI

Create a custom streaming handler with rich formatting:

In [None]:
# TODO: Build custom streaming handler with formatting

class RichStreamingHandler(StreamingHandler):
    """
    Enhanced streaming with formatting and progress.
    
    TODO: Implement:
    1. Token-by-token display with syntax highlighting
    2. Progress indicator showing estimated completion
    3. Ability to pause/resume streaming
    4. Save stream to file in real-time
    5. Handle markdown formatting as it streams
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo"):
        super().__init__(model)
        # TODO: Add formatting utilities
    
    def stream_with_formatting(
        self,
        messages: List[Dict[str, str]],
        format_type: str = "markdown",
        **kwargs
    ) -> str:
        """
        TODO: Stream with real-time formatting.
        
        Args:
            messages: Chat messages
            format_type: Type of formatting (markdown, code, plain)
            **kwargs: Additional API parameters
        """
        pass
    
    def stream_to_file(
        self,
        messages: List[Dict[str, str]],
        filepath: str,
        **kwargs
    ) -> str:
        """
        TODO: Stream response directly to file.
        
        Args:
            messages: Chat messages
            filepath: Output file path
            **kwargs: Additional API parameters
        """
        pass

# Test your implementation
# rich_streamer = RichStreamingHandler()
# response = rich_streamer.stream_with_formatting(
#     [{"role": "user", "content": "Write a Python function with documentation."}],
#     format_type="code"
# )

## Part 4: Rate Limiting and Token Management

Handle rate limits and optimize token usage.

In [None]:
@dataclass
class TokenBudget:
    """Token budget for rate limiting."""
    tokens_per_minute: int
    requests_per_minute: int
    tokens_used: int = 0
    requests_made: int = 0
    window_start: datetime = field(default_factory=datetime.now)

class RateLimitedClient:
    """
    API client with rate limiting and token management.
    """
    
    def __init__(
        self,
        model: str = "gpt-3.5-turbo",
        tokens_per_minute: int = 90000,
        requests_per_minute: int = 3500
    ):
        """
        Initialize RateLimitedClient.
        
        Args:
            model: OpenAI model to use
            tokens_per_minute: Token limit per minute
            requests_per_minute: Request limit per minute
        """
        self.model = model
        self.budget = TokenBudget(
            tokens_per_minute=tokens_per_minute,
            requests_per_minute=requests_per_minute
        )
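        try:
            self.encoding = tiktoken.encoding_for_model(model)
        except KeyError:
            # Unknown model name: fall back to the encoding used by gpt-3.5-turbo and gpt-4
            self.encoding = tiktoken.get_encoding("cl100k_base")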
    
    def count_tokens(self, messages: List[Dict[str, str]]) -> int:
        """
        Count tokens in messages.
        
        Args:
            messages: Chat messages
        
        Returns:
            Token count
        """
        num_tokens = 0
        for message in messages:
            # Approximate per-message overhead: 3 tokens for gpt-3.5-turbo-0613
            # and later (the older -0301 snapshot used 4)
            num_tokens += 3
            for key, value in message.items():
                num_tokens += len(self.encoding.encode(value))
                if key == "name":
                    num_tokens += 1  # A name field costs one extra token
        num_tokens += 3  # Every reply is primed with <|start|>assistant<|message|>
        return num_tokens
    
    def reset_budget_if_needed(self):
        """Reset budget if window expired."""
        now = datetime.now()
        elapsed = (now - self.budget.window_start).total_seconds()
        
        if elapsed >= 60:
            self.budget.tokens_used = 0
            self.budget.requests_made = 0
            self.budget.window_start = now
    
    def can_make_request(self, estimated_tokens: int) -> tuple[bool, str]:
        """
        Check if request can be made within limits.
        
        Args:
            estimated_tokens: Estimated tokens for request
        
        Returns:
            (can_proceed, reason)
        """
        self.reset_budget_if_needed()
        
        if self.budget.requests_made >= self.budget.requests_per_minute:
            return False, "Request limit reached"
        
        if self.budget.tokens_used + estimated_tokens > self.budget.tokens_per_minute:
            return False, "Token limit would be exceeded"
        
        return True, "OK"
    
    def wait_if_needed(self, estimated_tokens: int):
        """Wait if rate limit would be exceeded."""
        can_proceed, reason = self.can_make_request(estimated_tokens)
        
        if not can_proceed:
            elapsed = (datetime.now() - self.budget.window_start).total_seconds()
            wait_time = max(60 - elapsed, 0)
            print(f"Rate limit reached ({reason}). Waiting {wait_time:.1f}s...")
            time.sleep(wait_time + 1)
            self.reset_budget_if_needed()
    
    def chat_completion(
        self,
        messages: List[Dict[str, str]],
        **kwargs
    ) -> Any:
        """
        Make chat completion with rate limiting.
        
        Args:
            messages: Chat messages
            **kwargs: Additional API parameters
        
        Returns:
            API response
        """
        # Count input tokens
        input_tokens = self.count_tokens(messages)
        max_tokens = kwargs.get('max_tokens', 1000)
        estimated_total = input_tokens + max_tokens
        
        # Wait if needed
        self.wait_if_needed(estimated_total)
        
        # Make request
        response = client.chat.completions.create(
            model=kwargs.get('model', self.model),
            messages=messages,
            **{k: v for k, v in kwargs.items() if k != 'model'}
        )
        
        # Update budget
        actual_tokens = response.usage.total_tokens
        self.budget.tokens_used += actual_tokens
        self.budget.requests_made += 1
        
        return response
    
    def get_usage_stats(self) -> Dict[str, Any]:
        """Get current usage statistics."""
        self.reset_budget_if_needed()
        
        return {
            "tokens_used": self.budget.tokens_used,
            "tokens_limit": self.budget.tokens_per_minute,
            "tokens_remaining": self.budget.tokens_per_minute - self.budget.tokens_used,
            "tokens_usage_pct": f"{(self.budget.tokens_used / self.budget.tokens_per_minute * 100):.1f}%",
            "requests_made": self.budget.requests_made,
            "requests_limit": self.budget.requests_per_minute,
            "requests_remaining": self.budget.requests_per_minute - self.budget.requests_made,
            "window_reset_in": f"{max(60 - (datetime.now() - self.budget.window_start).total_seconds(), 0):.1f}s"
        }

# Test rate-limited client
rate_limited = RateLimitedClient(
    tokens_per_minute=10000,  # Lower limit for testing
    requests_per_minute=10
)

print("="*80)
print("RATE-LIMITED API CLIENT")
print("="*80)

# Make several requests
for i in range(5):
    messages = [{"role": "user", "content": f"Tell me fact #{i+1} about Python."}]
    
    response = rate_limited.chat_completion(messages, max_tokens=100)
    print(f"\nRequest {i+1}:")
    print(response.choices[0].message.content[:100] + "...")
    
    stats = rate_limited.get_usage_stats()
    print(f"Tokens used: {stats['tokens_used']}/{stats['tokens_limit']} ({stats['tokens_usage_pct']})")
    print(f"Requests made: {stats['requests_made']}/{stats['requests_limit']}")
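
`RateLimitedClient` uses a fixed window: all usage is forgotten at once when 60 seconds have elapsed, so a burst just before the reset followed by another just after it can briefly exceed the intended rate. A sliding window avoids this by expiring each request individually. The sketch below is one possible variant, not a description of how OpenAI enforces its limits; `SlidingWindowLimiter`, its methods, and the injectable `clock` argument are all hypothetical names introduced here so the behavior can be checked deterministically.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    """Track token usage over the trailing `window` seconds."""

    def __init__(
        self,
        tokens_per_minute: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.limit = tokens_per_minute
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self.used = 0

    def _evict(self) -> None:
        """Drop usage records that are at least `window` seconds old."""
        cutoff = self.clock() - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Reserve `tokens` if they fit in the current window."""
        self._evict()
        if self.used + tokens > self.limit:
            return False
        self.events.append((self.clock(), tokens))
        self.used += tokens
        return True

    def seconds_until_available(self, tokens: int) -> float:
        """Time until enough old usage expires for `tokens` to fit (inf if it never can)."""
        self._evict()
        if tokens > self.limit:
            return float("inf")
        if self.used + tokens <= self.limit:
            return 0.0
        now, freed = self.clock(), 0
        for stamp, spent in self.events:
            freed += spent
            if self.used - freed + tokens <= self.limit:
                return max(stamp + self.window - now, 0.0)
        return 0.0


# Demo with a manual clock instead of real time
now = [0.0]
limiter = SlidingWindowLimiter(tokens_per_minute=100, clock=lambda: now[0])
print(limiter.try_acquire(60))             # True
now[0] = 30.0
print(limiter.try_acquire(40))             # True
print(limiter.try_acquire(1))              # False
print(limiter.seconds_until_available(1))  # 30.0
```

In the demo, the request made at t=0 expires at t=60, so at t=30 the limiter reports a 30 second wait; the request made at t=30 keeps counting until t=90 rather than vanishing at a shared reset.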

### Exercise 4.1: Implement Token Optimization

Create a system to optimize token usage:

In [None]:
# TODO: Implement token optimization strategies

class TokenOptimizer:
    """
    Optimize token usage across requests.
    
    TODO: Implement:
    1. Conversation summarization when context gets too long
    2. Smart truncation of messages
    3. Compression techniques for prompts
    4. Batch similar requests together
    5. Cache common responses
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo", max_context_tokens: int = 4000):
        """
        Initialize TokenOptimizer.
        
        Args:
            model: OpenAI model to use
            max_context_tokens: Maximum tokens to keep in context
        """
        self.model = model
        self.max_context_tokens = max_context_tokens
        self.encoding = tiktoken.encoding_for_model(model)
    
    def summarize_conversation(
        self,
        messages: List[Dict[str, str]]
    ) -> List[Dict[str, str]]:
        """
        TODO: Summarize conversation to reduce tokens.
        
        Args:
            messages: Conversation history
        
        Returns:
            Compressed messages
        """
        pass
    
    def compress_prompt(self, prompt: str) -> str:
        """
        TODO: Compress prompt while preserving meaning.
        
        Args:
            prompt: Original prompt
        
        Returns:
            Compressed prompt
        """
        pass
    
    def should_compress(self, messages: List[Dict[str, str]]) -> bool:
        """
        TODO: Determine if compression is needed.
        
        Args:
            messages: Current messages
        
        Returns:
            True if compression needed
        """
        pass

# Test your implementation
# optimizer = TokenOptimizer(max_context_tokens=1000)
# 
# long_conversation = [
#     {"role": "user", "content": "Tell me about Python."},
#     {"role": "assistant", "content": "Python is a high-level..."},
#     # ... many more messages
# ]
# 
# if optimizer.should_compress(long_conversation):
#     compressed = optimizer.summarize_conversation(long_conversation)
#     print(f"Reduced from {len(long_conversation)} to {len(compressed)} messages")

## Part 5: Request Batching and Async Operations

Optimize throughput with batching and async requests.

In [None]:
class BatchAPIClient:
    """
    Process multiple API requests efficiently.
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo"):
        """
        Initialize BatchAPIClient.
        
        Args:
            model: OpenAI model to use
        """
        self.model = model
    
    async def async_chat_completion(
        self,
        messages: List[Dict[str, str]],
        **kwargs
    ) -> Any:
        """
        Async chat completion.
        
        Args:
            messages: Chat messages
            **kwargs: Additional API parameters
        
        Returns:
            API response
        """
        response = await async_client.chat.completions.create(
            model=kwargs.get('model', self.model),
            messages=messages,
            **{k: v for k, v in kwargs.items() if k != 'model'}
        )
        return response
    
    async def process_batch(
        self,
        message_list: List[List[Dict[str, str]]],
        **kwargs
    ) -> List[Any]:
        """
        Process multiple requests concurrently.
        
        Args:
            message_list: List of message lists
            **kwargs: Additional API parameters
        
        Returns:
            List of responses
        """
        tasks = [
            self.async_chat_completion(messages, **kwargs)
            for messages in message_list
        ]
        
        responses = await asyncio.gather(*tasks, return_exceptions=True)
        return responses
    
    def batch_process(
        self,
        message_list: List[List[Dict[str, str]]],
        batch_size: int = 10,
        **kwargs
    ) -> List[Any]:
        """
        Process requests in batches (handles event loop).
        
        Args:
            message_list: List of message lists
            batch_size: Requests per batch
            **kwargs: Additional API parameters
        
        Returns:
            List of responses
        """
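        async def run_all_batches() -> List[Any]:
            all_responses: List[Any] = []
            for i in range(0, len(message_list), batch_size):
                batch = message_list[i:i+batch_size]
                print(f"Processing batch {i//batch_size + 1} ({len(batch)} requests)...")
                all_responses.extend(await self.process_batch(batch, **kwargs))
                print("✓ Batch complete")
            return all_responses

        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running (plain script): start one for all batches
            return asyncio.run(run_all_batches())

        # A loop is already running (e.g. in Jupyter), where asyncio.run() raises
        # RuntimeError, so run the batches on a fresh loop in a worker thread.
        from concurrent.futures import ThreadPoolExecutor
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, run_all_batches()).result()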

# Test batch processing
batch_client = BatchAPIClient()

# Create multiple requests
requests = [
    [{"role": "user", "content": f"What is {topic}?"}]
    for topic in ["Python", "JavaScript", "SQL", "HTML", "CSS", "React", "Node.js", "MongoDB"]
]

print("="*80)
print("BATCH PROCESSING")
print("="*80)
print(f"Processing {len(requests)} requests...\n")

start_time = time.time()
responses = batch_client.batch_process(requests, batch_size=4, temperature=0.5, max_tokens=100)
elapsed = time.time() - start_time

print(f"\n{'='*80}")
print(f"Completed {len(responses)} requests in {elapsed:.2f}s")
print(f"Average: {elapsed/len(responses):.2f}s per request")
print(f"{'='*80}\n")

# Show some results
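for i, response in enumerate(responses[:3]):
    print(f"Request {i+1}:")
    if isinstance(response, Exception):
        # gather(return_exceptions=True) returns failures as exception objects
        print(f"Failed: {response}")
    else:
        print((response.choices[0].message.content or "")[:100] + "...")
    print("-" * 80)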
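
`process_batch` launches every request in a batch at once and waits for the whole batch to finish before the next one starts. A common alternative is to cap the number of in-flight requests with `asyncio.Semaphore`, so a new request starts as soon as any slot frees up. The sketch below illustrates the idea with a stand-in coroutine, `fake_request` (hypothetical; it sleeps instead of calling the API), so it runs without an API key. It is written for a plain script; inside Jupyter, replace `asyncio.run(...)` with `await gather_limited(prompts, max_concurrency=3)`.

```python
import asyncio
from typing import Any, List, Tuple


async def fake_request(prompt: str) -> str:
    """Stand-in for an API call (hypothetical): waits briefly and echoes the prompt."""
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"


async def gather_limited(
    prompts: List[str],
    max_concurrency: int = 3,
) -> Tuple[List[Any], int]:
    """Run all prompts concurrently, with at most max_concurrency in flight.

    Returns the results in input order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    stats = {"in_flight": 0, "peak": 0}

    async def worker(prompt: str) -> str:
        async with semaphore:  # Wait here until a slot is free
            stats["in_flight"] += 1
            stats["peak"] = max(stats["peak"], stats["in_flight"])
            try:
                return await fake_request(prompt)
            finally:
                stats["in_flight"] -= 1

    results = await asyncio.gather(
        *(worker(p) for p in prompts),
        return_exceptions=True,  # Failures come back as exception objects
    )
    return results, stats["peak"]


prompts = ["Python", "JavaScript", "SQL", "HTML", "CSS", "React", "Node.js", "MongoDB"]
results, peak = asyncio.run(gather_limited(prompts, max_concurrency=3))
print(f"{len(results)} results, peak concurrency {peak}")
```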

### Exercise 5.1: Build Smart Batch Processor

Create an intelligent batch processor:

In [None]:
# TODO: Build smart batch processor

class SmartBatchProcessor(BatchAPIClient):
    """
    Intelligent batch processing with optimization.
    
    TODO: Implement:
    1. Automatic batch size optimization based on rate limits
    2. Priority queue for urgent requests
    3. Request deduplication (skip identical requests)
    4. Result caching
    5. Adaptive retry for failed requests in batch
    6. Progress tracking with ETA
    """
    
    def __init__(
        self,
        model: str = "gpt-3.5-turbo",
        cache_results: bool = True
    ):
        super().__init__(model)
        self.cache: Dict[str, Any] = {}
        self.cache_results = cache_results
    
    def _cache_key(self, messages: List[Dict[str, str]], **kwargs) -> str:
        """
        TODO: Generate cache key for request.
        
        Args:
            messages: Chat messages
            **kwargs: API parameters
        
        Returns:
            Cache key
        """
        pass
    
    async def process_with_cache(
        self,
        message_list: List[List[Dict[str, str]]],
        **kwargs
    ) -> List[Any]:
        """
        TODO: Process batch with caching and deduplication.
        
        Args:
            message_list: List of message lists
            **kwargs: Additional API parameters
        
        Returns:
            List of responses
        """
        pass
    
    def process_with_priority(
        self,
        requests: List[tuple[List[Dict[str, str]], int]],  # (messages, priority)
        **kwargs
    ) -> List[Any]:
        """
        TODO: Process requests respecting priority.
        
        Args:
            requests: List of (messages, priority) tuples
            **kwargs: Additional API parameters
        
        Returns:
            List of responses in original order
        """
        pass

# Test your implementation
# smart_processor = SmartBatchProcessor(cache_results=True)
# 
# requests_with_priority = [
#     ([{"role": "user", "content": "Urgent: System status?"}], 1),  # High priority
#     ([{"role": "user", "content": "What is Python?"}], 3),  # Low priority
#     ([{"role": "user", "content": "What is Python?"}], 3),  # Duplicate
# ]
# 
# responses = smart_processor.process_with_priority(requests_with_priority)
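
One way to approach `_cache_key` and deduplication is to hash a canonical JSON serialization of the request. The sketch below is only an illustration under that assumption; `make_cache_key` and `deduplicate` are hypothetical helper names, not part of the OpenAI SDK. `sort_keys=True` makes the key independent of keyword-argument order.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

Messages = List[Dict[str, str]]


def make_cache_key(messages: Messages, **params: Any) -> str:
    """Hash the messages and parameters into a stable hex digest."""
    payload = json.dumps(
        {"messages": messages, "params": params},
        sort_keys=True,       # Same key regardless of dict or kwarg order
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(message_list: List[Messages], **params: Any) -> Tuple[List[Messages], List[int]]:
    """Return the unique requests and, per original request, its index in that list."""
    unique: List[Messages] = []
    index_of_key: Dict[str, int] = {}
    mapping: List[int] = []
    for messages in message_list:
        key = make_cache_key(messages, **params)
        if key not in index_of_key:
            index_of_key[key] = len(unique)
            unique.append(messages)
        mapping.append(index_of_key[key])
    return unique, mapping


batch = [
    [{"role": "user", "content": "What is Python?"}],
    [{"role": "user", "content": "Urgent: System status?"}],
    [{"role": "user", "content": "What is Python?"}],  # Duplicate of the first
]

unique, mapping = deduplicate(batch, temperature=0.5)
print(f"{len(batch)} requests, {len(unique)} unique, mapping={mapping}")

# After sending only `unique`, responses can be fanned back out in original order:
# responses = [unique_responses[i] for i in mapping]
```

Reusing a cached response only makes sense when repeated calls are expected to give equivalent answers, for example at low temperature; with high temperature the cache removes the variety the caller asked for.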

## Part 6: Monitoring and Logging

Track API usage and performance.

In [None]:
@dataclass
class APICallLog:
    """Log entry for API call."""
    timestamp: datetime
    model: str
    messages: List[Dict[str, str]]
    parameters: Dict[str, Any]
    tokens_used: int
    cost: float
    latency: float
    success: bool
    error: Optional[str] = None

class APIMonitor:
    """
    Monitor and log API usage.
    """
    
    def __init__(self, model: str = "gpt-3.5-turbo"):
        """
        Initialize APIMonitor.
        
        Args:
            model: OpenAI model to use
        """
        self.model = model
        self.logs: List[APICallLog] = []
        
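        # Pricing in USD per 1K tokens. Illustrative values only: prices change,
        # so verify them against the current OpenAI pricing page before relying on costs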
        self.pricing = {
            "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
            "gpt-4": {"input": 0.03, "output": 0.06},
            "gpt-4-turbo": {"input": 0.01, "output": 0.03}
        }
    
    def calculate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Calculate cost of API call."""
        if model not in self.pricing:
            # Unknown model: fall back to gpt-3.5-turbo rates, so the cost is only a rough estimate
            model = "gpt-3.5-turbo"
        
        input_cost = (input_tokens / 1000) * self.pricing[model]["input"]
        output_cost = (output_tokens / 1000) * self.pricing[model]["output"]
        return input_cost + output_cost
    
    def chat_completion(
        self,
        messages: List[Dict[str, str]],
        **kwargs
    ) -> Any:
        """
        Make chat completion with logging.
        
        Args:
            messages: Chat messages
            **kwargs: Additional API parameters
        
        Returns:
            API response
        """
        start_time = time.time()
        model = kwargs.get('model', self.model)
        
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                **{k: v for k, v in kwargs.items() if k != 'model'}
            )
            
            latency = time.time() - start_time
            usage = response.usage
            cost = self.calculate_cost(
                model,
                usage.prompt_tokens,
                usage.completion_tokens
            )
            
            log = APICallLog(
                timestamp=datetime.now(),
                model=model,
                messages=messages,
                parameters={k: v for k, v in kwargs.items() if k != 'model'},
                tokens_used=usage.total_tokens,
                cost=cost,
                latency=latency,
                success=True
            )
            self.logs.append(log)
            
            return response
            
        except Exception as e:
            latency = time.time() - start_time
            
            log = APICallLog(
                timestamp=datetime.now(),
                model=model,
                messages=messages,
                parameters={k: v for k, v in kwargs.items() if k != 'model'},
                tokens_used=0,
                cost=0.0,
                latency=latency,
                success=False,
                error=str(e)
            )
            self.logs.append(log)
            raise
    
    def get_summary(self, hours: Optional[float] = None) -> Dict[str, Any]:
        """
        Get usage summary.
        
        Args:
            hours: Hours to look back (None = all time)
        
        Returns:
            Summary statistics
        """
        logs = self.logs
        
        if hours is not None:  # Explicit check so hours=0 is not treated as "all time"
            cutoff = datetime.now() - timedelta(hours=hours)
            logs = [log for log in logs if log.timestamp >= cutoff]
        
        if not logs:
            return {"message": "No API calls logged"}
        
        total_calls = len(logs)
        successful = sum(1 for log in logs if log.success)
        failed = total_calls - successful
        
        total_tokens = sum(log.tokens_used for log in logs)
        total_cost = sum(log.cost for log in logs)
        avg_latency = sum(log.latency for log in logs) / total_calls
        
        # Model breakdown
        model_counts = defaultdict(int)
        for log in logs:
            model_counts[log.model] += 1
        
        return {
            "time_period": f"Last {hours} hours" if hours else "All time",
            "total_calls": total_calls,
            "successful": successful,
            "failed": failed,
            "success_rate": f"{(successful/total_calls*100):.1f}%",
            "total_tokens": total_tokens,
            "total_cost": f"${total_cost:.4f}",
            "avg_latency": f"{avg_latency:.2f}s",
            "models_used": dict(model_counts)
        }
    
    def export_logs(self, filepath: str):
        """Export logs to JSON file."""
        logs_data = [
            {
                "timestamp": log.timestamp.isoformat(),
                "model": log.model,
                "tokens": log.tokens_used,
                "cost": log.cost,
                "latency": log.latency,
                "success": log.success,
                "error": log.error
            }
            for log in self.logs
        ]
        
        with open(filepath, 'w') as f:
            json.dump(logs_data, f, indent=2)
        
        print(f"✓ Exported {len(logs_data)} logs to {filepath}")

# Test monitor
monitor = APIMonitor()

print("="*80)
print("API MONITORING")
print("="*80)

# Make some calls
test_messages = [
    [{"role": "user", "content": "What is Python?"}],
    [{"role": "user", "content": "Explain machine learning."}],
    [{"role": "user", "content": "Write a haiku about coding."}]
]

for messages in test_messages:
    response = monitor.chat_completion(messages, temperature=0.7, max_tokens=150)
    print(f"\n✓ Request completed: {messages[0]['content'][:30]}...")

# Get summary
print("\n" + "="*80)
print("USAGE SUMMARY")
print("="*80)
summary = monitor.get_summary()
for key, value in summary.items():
    print(f"{key}: {value}")

# Export logs
monitor.export_logs("api_usage_logs.json")
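
To make the arithmetic in `calculate_cost` concrete, here is a small worked example that is independent of the API. The rates are copied from the `pricing` dict above and are illustrative only; `estimate_cost` is a hypothetical standalone version of the same formula.

```python
from typing import Dict

# Illustrative USD rates per 1K tokens, copied from the APIMonitor pricing table
PRICING_PER_1K: Dict[str, Dict[str, float]] = {
    "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
    "gpt-4": {"input": 0.03, "output": 0.06},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """cost = input_tokens/1000 * input_rate + output_tokens/1000 * output_rate"""
    rates = PRICING_PER_1K[model]
    input_cost = (input_tokens / 1000) * rates["input"]
    output_cost = (output_tokens / 1000) * rates["output"]
    return input_cost + output_cost


# A call with a 1,200-token prompt and a 300-token completion
costs = {model: estimate_cost(model, 1200, 300) for model in PRICING_PER_1K}
for model, cost in costs.items():
    print(f"{model}: ${cost:.4f} per call, ${cost * 10_000:.2f} per 10,000 calls")
```

At these rates the same call costs 22.5 times more on `gpt-4` than on `gpt-3.5-turbo`, which is why model choice tends to dominate the other cost levers.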

### Exercise 6.1: Build Dashboard

Create a usage dashboard with visualizations:

In [None]:
# TODO: Build usage dashboard

class APIDashboard(APIMonitor):
    """
    Visual dashboard for API usage.
    
    TODO: Implement:
    1. Real-time usage charts (tokens, cost, latency over time)
    2. Model comparison view
    3. Cost projections
    4. Alert system for unusual usage
    5. Export reports in multiple formats
    """
    
    def plot_usage_over_time(self, metric: str = "tokens"):
        """
        TODO: Plot usage metric over time.
        
        Args:
            metric: Metric to plot (tokens, cost, latency, calls)
        """
        pass
    
    def plot_model_comparison(self):
        """TODO: Compare usage across models."""
        pass
    
    def generate_report(self, format: str = "html") -> str:
        """
        TODO: Generate usage report.
        
        Args:
            format: Report format (html, pdf, markdown)
        
        Returns:
            Report content or file path
        """
        pass
    
    def check_alerts(self) -> List[str]:
        """
        TODO: Check for unusual patterns.
        
        Returns:
            List of alert messages
        """
        pass

# Test your dashboard
# dashboard = APIDashboard()
# # ... make API calls ...
# dashboard.plot_usage_over_time("cost")
# dashboard.plot_model_comparison()
# alerts = dashboard.check_alerts()
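
For `check_alerts`, one simple rule is to flag calls whose latency sits far above the mean. The sketch below assumes a z-score rule with a threshold of 2.0 (an arbitrary choice for illustration) and works on a plain list of latencies; `find_latency_alerts` is a hypothetical helper, and in the dashboard the input could be `[log.latency for log in self.logs]`.

```python
from statistics import mean, pstdev
from typing import List


def find_latency_alerts(latencies: List[float], z_threshold: float = 2.0) -> List[str]:
    """Flag latencies more than z_threshold standard deviations above the mean."""
    if len(latencies) < 3:
        return []  # Too few samples to say what "unusual" means
    mu = mean(latencies)
    sigma = pstdev(latencies)
    if sigma == 0:
        return []  # All values identical: nothing stands out
    alerts = []
    for i, value in enumerate(latencies):
        z = (value - mu) / sigma
        if z > z_threshold:
            alerts.append(
                f"Call {i}: latency {value:.2f}s is {z:.1f} standard deviations above the mean"
            )
    return alerts


sample_latencies = [0.8, 0.9, 1.1, 0.7, 1.0, 0.9, 6.5, 1.0]
alerts = find_latency_alerts(sample_latencies)
for alert in alerts:
    print(alert)
```

A single large outlier inflates the standard deviation it is measured against, so this rule gets less sensitive as outliers grow. A median-based rule is a more robust option if that matters for your data.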

## Challenge Projects

### Challenge 1: Production API Wrapper

Build a complete production-ready API wrapper:

In [None]:
class ProductionAPIWrapper:
    """
    Production-ready OpenAI API wrapper.
    
    TODO: Combine all features:
    1. Robust error handling with circuit breaker
    2. Rate limiting and token management
    3. Request caching and deduplication
    4. Streaming support
    5. Batch processing
    6. Comprehensive monitoring and logging
    7. Cost optimization
    8. Health checks and metrics
    9. Graceful degradation
    10. Configuration management
    
    Should be:
    - Thread-safe
    - Async-compatible
    - Well-documented
    - Thoroughly tested
    - Easy to configure
    """
    
    def __init__(self, config: Dict[str, Any]):
        """
        Initialize with configuration.
        
        Args:
            config: Configuration dict with all settings
        """
        pass
    
    # TODO: Implement all methods

# Usage example:
# config = {
#     "model": "gpt-3.5-turbo",
#     "rate_limits": {"tokens_per_minute": 90000},
#     "retry": {"max_attempts": 3, "backoff": "exponential"},
#     "cache": {"enabled": True, "ttl": 3600},
#     "monitoring": {"enabled": True, "log_level": "INFO"}
# }
# 
# api = ProductionAPIWrapper(config)
# response = api.chat_completion([{"role": "user", "content": "Hello!"}])
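
The circuit breaker from item 1 can be sketched independently of the API. The version below is a minimal, single-threaded illustration, not a production implementation: the class and parameter names (`CircuitBreaker`, `failure_threshold`, `reset_timeout`) are assumptions, and a thread-safe version would need a lock around the state updates.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: half-open state, let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # Open (or re-open) the circuit
            raise
        # Any success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result


def flaky_call() -> str:
    """Simulated failing dependency (hypothetical)."""
    raise ConnectionError("simulated outage")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
outcomes = []
for attempt in range(3):
    try:
        breaker.call(flaky_call)
    except CircuitOpenError:
        outcomes.append("short-circuited")  # No call was made
    except ConnectionError:
        outcomes.append("failed")           # The call was made and failed
print(outcomes)
```

The point of the pattern is the third attempt: once the threshold is reached, the wrapper stops calling the failing service until `reset_timeout` has passed, instead of piling retries onto it.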

### Challenge 2: API Cost Optimizer

Build a system that minimizes API costs:

In [None]:
class APICostOptimizer:
    """
    Optimize API usage costs.
    
    TODO: Implement:
    1. Model selection based on task complexity
    2. Automatic prompt compression
    3. Aggressive caching strategy
    4. Batch similar requests
    5. Use cheaper models for simple tasks
    6. Context window optimization
    7. Cost prediction before requests
    8. Budget enforcement
    9. Cost analytics and recommendations
    """
    
    def __init__(self, budget: float, period: str = "daily"):
        """
        Initialize with budget.
        
        Args:
            budget: Budget amount in USD
            period: Budget period (hourly, daily, monthly)
        """
        pass
    
    # TODO: Implement optimization methods

# Usage:
# optimizer = APICostOptimizer(budget=10.0, period="daily")
# response = optimizer.optimized_completion(
#     prompt="Your prompt",
#     complexity="simple"  # or "medium", "complex"
# )
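
Budget enforcement (item 8) can be prototyped without any API calls. The sketch below assumes a rolling window rather than calendar periods, and approximates "monthly" as 30 days; `BudgetTracker` and its methods are hypothetical names for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple

# Assumption: periods are rolling windows; "monthly" is approximated as 30 days
PERIOD_SECONDS = {"hourly": 3600, "daily": 86400, "monthly": 30 * 86400}


class BudgetTracker:
    def __init__(
        self,
        budget: float,
        period: str = "daily",
        clock: Callable[[], float] = time.time,
    ):
        self.budget = budget
        self.window = PERIOD_SECONDS[period]
        self.clock = clock
        self.spend: Deque[Tuple[float, float]] = deque()  # (timestamp, cost) pairs

    def _expire(self) -> None:
        """Forget spend that has fallen out of the rolling window."""
        cutoff = self.clock() - self.window
        while self.spend and self.spend[0][0] <= cutoff:
            self.spend.popleft()

    def spent(self) -> float:
        self._expire()
        return sum(cost for _, cost in self.spend)

    def can_afford(self, estimated_cost: float) -> bool:
        return self.spent() + estimated_cost <= self.budget

    def record(self, cost: float) -> None:
        self.spend.append((self.clock(), cost))


tracker = BudgetTracker(budget=0.01, period="hourly")
allowed = 0
for call_number in range(1, 6):
    estimated_cost = 0.0024  # Hypothetical per-call estimate in USD
    if tracker.can_afford(estimated_cost):
        tracker.record(estimated_cost)  # In real use, record the actual cost after the call
        allowed += 1
    else:
        print(f"Call {call_number} blocked: ${tracker.spent():.4f} of ${tracker.budget:.2f} spent")
print(f"{allowed} calls allowed")
```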

### Challenge 3: Multi-Model Orchestrator

Build a system that intelligently routes requests to different models:

In [None]:
class MultiModelOrchestrator:
    """
    Route requests to optimal models.
    
    TODO: Implement:
    1. Task classification (simple/medium/complex)
    2. Model routing based on task and budget
    3. Fallback to cheaper models
    4. A/B testing between models
    5. Performance tracking per model
    6. Automatic model selection optimization
    7. Support multiple providers (OpenAI, Anthropic, etc.)
    """
    
    def __init__(self, models: List[Dict[str, Any]]):
        """
        Initialize with model configurations.
        
        Args:
            models: List of model configs with capabilities and costs
        """
        pass
    
    # TODO: Implement routing logic

# Usage:
# models = [
#     {"name": "gpt-3.5-turbo", "cost": "low", "capabilities": ["general"]},
#     {"name": "gpt-4", "cost": "high", "capabilities": ["reasoning", "coding"]},
# ]
# 
# orchestrator = MultiModelOrchestrator(models)
# response = orchestrator.route_and_complete(
#     prompt="Your prompt",
#     preferred_cost="low"
# )
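
The routing step can start as a plain filter-and-sort over the model configs shown in the usage example. The sketch below assumes cost labels are ordered `low < medium < high` (the `medium` tier and the `COST_RANK` table are assumptions), and `pick_model` is a hypothetical helper, not the orchestrator's required interface.

```python
from typing import Any, Dict, List, Optional

COST_RANK = {"low": 0, "medium": 1, "high": 2}  # Assumed ordering of cost labels


def pick_model(
    models: List[Dict[str, Any]],
    required_capability: str,
    max_cost: str = "high",
) -> Optional[str]:
    """Return the cheapest model with the capability within the cost cap, else None."""
    candidates = [
        m for m in models
        if required_capability in m["capabilities"]
        and COST_RANK[m["cost"]] <= COST_RANK[max_cost]
    ]
    if not candidates:
        return None  # Caller decides: relax the cap, fall back, or fail
    return min(candidates, key=lambda m: COST_RANK[m["cost"]])["name"]


models = [
    {"name": "gpt-3.5-turbo", "cost": "low", "capabilities": ["general"]},
    {"name": "gpt-4", "cost": "high", "capabilities": ["reasoning", "coding"]},
]

print(pick_model(models, "general"))                 # gpt-3.5-turbo
print(pick_model(models, "coding"))                  # gpt-4
print(pick_model(models, "coding", max_cost="low"))  # None
```

Returning `None` instead of silently picking a more expensive model keeps the budget decision with the caller, which is where fallback and graceful-degradation policies usually belong.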

## Summary

In this lab, you covered:

1. ✅ Complete OpenAI API parameter reference
2. ✅ Production-grade error handling and retries
3. ✅ Streaming responses for better UX
4. ✅ Rate limiting and token management
5. ✅ Batch processing and async operations
6. ✅ Comprehensive monitoring and logging
7. ✅ Cost optimization strategies
8. ✅ Building production-ready API wrappers

### Key Takeaways

**Parameter Mastery:**
- **temperature**: Control randomness (0.0-2.0)
- **top_p**: Nucleus sampling, an alternative to temperature
- **frequency_penalty**: Reduce repetition
- **presence_penalty**: Encourage topic diversity
- **n**: Generate multiple completions
- **stop**: Define sequences at which generation halts

**Production Best Practices:**
1. **Always handle errors**: Use retry logic with exponential backoff
2. **Respect rate limits**: Track usage, implement throttling
3. **Stream when possible**: Better UX for long responses
4. **Monitor everything**: Log calls, track costs, measure performance
5. **Optimize costs**: Cache, batch, compress, choose right models
6. **Test thoroughly**: Handle edge cases, network issues, rate limits

**Performance Optimization:**
- Use async for concurrent requests
- Batch similar requests together
- Cache common responses
- Compress long contexts
- Choose appropriate models for task complexity

### Common Pitfalls

❌ **No retry logic**: Fails on temporary errors
❌ **Ignoring rate limits**: Requests are rejected with rate limit errors (HTTP 429)
❌ **No monitoring**: Can't debug issues or track costs
❌ **Synchronous processing**: Slow for multiple requests
❌ **Poor error messages**: Hard to debug problems
❌ **No token tracking**: Unexpected costs

### Next Steps

- Complete the challenge projects
- Build your own production API wrapper
- Implement cost optimization for your use case
- Move on to Lab 3: Function Calling System
