# Implementing & Integrating AI into Your Applications

## Complete Guide with Live Demo

This notebook demonstrates practical patterns for integrating AI into enterprise applications, including:

1. **API Integration Patterns** - OpenAI, Anthropic, Azure OpenAI
2. **RAG (Retrieval-Augmented Generation)** - Context-aware responses
3. **Prompt Engineering** - Effective prompt design
4. **Error Handling & Resilience** - Retries, fallbacks, circuit breakers
5. **Cost Management** - Token tracking and budgets
6. **Security & Safety** - PII redaction, content filtering
7. **Production Patterns** - Caching, rate limiting, monitoring

**Live Demo**: Customer support assistant with RAG, safety checks, and real-time analytics

## Setup & Dependencies

In [73]:
# Install required packages
import subprocess
import sys

packages = [
    'openai',
    'anthropic',
    'tiktoken',
    'sentence-transformers',
    'chromadb',
    'pandas',
    'numpy',
    'plotly',
    'python-dotenv'
]

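# Some packages import under a different name than they install under
import_names = {
    'sentence-transformers': 'sentence_transformers',
    'python-dotenv': 'dotenv',
}

for package in packages:
    try:
        __import__(import_names.get(package, package))
        print(f"✓ {package} already installed")
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])
        print(f"✓ {package} installed successfully")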

✓ openai already installed
✓ anthropic already installed
✓ tiktoken already installed
✓ sentence-transformers already installed
✓ chromadb already installed
✓ pandas already installed
✓ numpy already installed
✓ plotly already installed
Installing python-dotenv...
✓ python-dotenv installed successfully


In [74]:
import os
import json
import time
import re
from typing import List, Dict, Any, Optional
from dataclasses import dataclass, field
from datetime import datetime
from functools import wraps

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

print("✅ All dependencies imported successfully")

✅ All dependencies imported successfully


## 1. AI Provider Integration Patterns

### Multi-Provider Architecture

Build a unified interface that supports multiple AI providers for flexibility and failover.

In [75]:
@dataclass
class AIMessage:
    """Standardized message format"""
    role: str  # 'system', 'user', 'assistant'
    content: str

@dataclass
class AIResponse:
    """Standardized response format"""
    content: str
    model: str
    provider: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cost: float
    latency_ms: float
    metadata: Dict[str, Any] = field(default_factory=dict)

class AIProviderInterface:
    """Abstract base for AI providers"""
    
    def complete(
        self,
        messages: List[AIMessage],
        model: str,
        temperature: float = 0.7,
        max_tokens: int = 1000
    ) -> AIResponse:
        raise NotImplementedError

class OpenAIProvider(AIProviderInterface):
    """OpenAI integration"""
    
    # Cost per 1M tokens (as of 2025)
    PRICING = {
        'gpt-4o': {'input': 2.50, 'output': 10.00},
        'gpt-4o-mini': {'input': 0.150, 'output': 0.600},
        'gpt-4-turbo': {'input': 10.00, 'output': 30.00},
    }
    
    def __init__(self, api_key: Optional[str] = None):
        from openai import OpenAI
        self.client = OpenAI(api_key=api_key or os.getenv('OPENAI_API_KEY'))
    
    def complete(self, messages: List[AIMessage], model: str = 'gpt-4o-mini',
                 temperature: float = 0.7, max_tokens: int = 1000) -> AIResponse:
        start = time.time()
        
        response = self.client.chat.completions.create(
            model=model,
            messages=[{'role': m.role, 'content': m.content} for m in messages],
            temperature=temperature,
            max_tokens=max_tokens
        )
        
        latency_ms = (time.time() - start) * 1000
        usage = response.usage
        
        # Calculate cost
        pricing = self.PRICING.get(model, self.PRICING['gpt-4o-mini'])
        cost = (
            (usage.prompt_tokens / 1_000_000) * pricing['input'] +
            (usage.completion_tokens / 1_000_000) * pricing['output']
        )
        
        return AIResponse(
            content=response.choices[0].message.content,
            model=model,
            provider='openai',
            prompt_tokens=usage.prompt_tokens,
            completion_tokens=usage.completion_tokens,
            total_tokens=usage.total_tokens,
            cost=cost,
            latency_ms=latency_ms
        )

class AnthropicProvider(AIProviderInterface):
    """Anthropic Claude integration"""
    
    PRICING = {
        'claude-3-5-sonnet-20241022': {'input': 3.00, 'output': 15.00},
        'claude-3-5-haiku-20241022': {'input': 0.80, 'output': 4.00},
    }
    
    def __init__(self, api_key: Optional[str] = None):
        from anthropic import Anthropic
        self.client = Anthropic(api_key=api_key or os.getenv('ANTHROPIC_API_KEY'))
    
    def complete(self, messages: List[AIMessage], model: str = 'claude-3-5-haiku-20241022',
                 temperature: float = 0.7, max_tokens: int = 1000) -> AIResponse:
        start = time.time()
        
        # Separate system message from conversation
        system_msg = next((m.content for m in messages if m.role == 'system'), None)
        conv_messages = [{'role': m.role, 'content': m.content} 
                        for m in messages if m.role != 'system']
        
        # Pass `system` only when one was supplied, rather than sending a null value
        request = dict(model=model, messages=conv_messages,
                       temperature=temperature, max_tokens=max_tokens)
        if system_msg is not None:
            request['system'] = system_msg
        
        response = self.client.messages.create(**request)
        
        latency_ms = (time.time() - start) * 1000
        
        # Calculate cost
        pricing = self.PRICING.get(model, self.PRICING['claude-3-5-haiku-20241022'])
        cost = (
            (response.usage.input_tokens / 1_000_000) * pricing['input'] +
            (response.usage.output_tokens / 1_000_000) * pricing['output']
        )
        
        return AIResponse(
            content=response.content[0].text,
            model=model,
            provider='anthropic',
            prompt_tokens=response.usage.input_tokens,
            completion_tokens=response.usage.output_tokens,
            total_tokens=response.usage.input_tokens + response.usage.output_tokens,
            cost=cost,
            latency_ms=latency_ms
        )

print("✅ AI Provider interfaces defined")

✅ AI Provider interfaces defined
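
The cost arithmetic shared by both providers can be checked in isolation. The sketch below restates it as a standalone function; `estimate_cost` is a hypothetical helper name, and the prices are the `gpt-4o-mini` figures from the `PRICING` table above, which may no longer be current.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one call given per-1M-token prices."""
    return (
        (prompt_tokens / 1_000_000) * input_price_per_m
        + (completion_tokens / 1_000_000) * output_price_per_m
    )

# 300 prompt tokens and 50 completion tokens at the gpt-4o-mini rates above
cost = estimate_cost(300, 50, input_price_per_m=0.150, output_price_per_m=0.600)
print(f"${cost:.6f}")  # prints $0.000075
```

Note that `PRICING.get(model, ...)` in the classes above falls back to the cheapest listed model's rates when the model name is not in the table, so costs for unlisted models may be understated.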


## 2. Resilience Patterns

### Retry Logic, Circuit Breaker, and Fallback

In [76]:
class CircuitBreaker:
    """Circuit breaker pattern for AI API calls"""
    
    def __init__(self, failure_threshold: int = 5, timeout: int = 60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = 0
        self.last_failure_time = None
        self.state = 'closed'  # closed, open, half-open
    
    def call(self, func, *args, **kwargs):
        if self.state == 'open':
            if time.time() - self.last_failure_time > self.timeout:
                self.state = 'half-open'
            else:
                raise Exception("Circuit breaker is OPEN")
        
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise e
    
    def on_success(self):
        self.failures = 0
        self.state = 'closed'
    
    def on_failure(self):
        self.failures += 1
        self.last_failure_time = time.time()
        if self.failures >= self.failure_threshold:
            self.state = 'open'

def with_retry(max_retries: int = 3, backoff: float = 1.0):
    """Retry decorator with exponential backoff"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise e
                    wait_time = backoff * (2 ** attempt)
                    print(f"⚠️  Retry {attempt + 1}/{max_retries} after {wait_time}s")
                    time.sleep(wait_time)
        return wrapper
    return decorator

class AIClient:
    """Resilient AI client with retry, circuit breaker, and fallback"""
    
    def __init__(self, primary_provider: AIProviderInterface,
                 fallback_provider: Optional[AIProviderInterface] = None):
        self.primary = primary_provider
        self.fallback = fallback_provider
        self.circuit_breaker = CircuitBreaker()
    
    @with_retry(max_retries=3, backoff=1.0)
    def complete(self, messages: List[AIMessage], **kwargs) -> AIResponse:
        try:
            return self.circuit_breaker.call(
                self.primary.complete,
                messages,
                **kwargs
            )
        except Exception as e:
            if self.fallback:
                print(f"⚠️  Primary failed, using fallback: {e}")
                return self.fallback.complete(messages, **kwargs)
            raise e

print("✅ Resilience patterns implemented")

✅ Resilience patterns implemented
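
To see the backoff schedule without waiting for real sleeps, the retry decorator can take the sleep function as a parameter. This is a minimal sketch, not the notebook's exact decorator: the `sleep` argument is an addition for testability, and `flaky` is a hypothetical function that fails twice before succeeding.

```python
import time
from functools import wraps

def with_retry(max_retries=3, backoff=1.0, sleep=time.sleep):
    """Retry with exponential backoff; `sleep` is injectable so tests don't wait."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise  # out of attempts: surface the last error
                    sleep(backoff * (2 ** attempt))
        return wrapper
    return decorator

waits = []      # records each backoff delay instead of sleeping
attempts = []   # records each call to the flaky function

@with_retry(max_retries=3, backoff=1.0, sleep=waits.append)
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = flaky()
print(result, len(attempts), waits)  # prints ok 3 [1.0, 2.0]
```

With `max_retries=3` there are at most two waits, because the final failure is re-raised immediately rather than followed by another sleep.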


## 3. RAG (Retrieval-Augmented Generation)

### Vector Store & Context Injection

In [77]:
import chromadb
from chromadb.utils import embedding_functions

class KnowledgeBase:
    """Vector-based knowledge base for RAG"""
    
    def __init__(self, collection_name: str = "knowledge"):
        self.client = chromadb.Client()
        
        # Use sentence transformers for embeddings
        self.embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name="all-MiniLM-L6-v2"
        )
        
        self.collection = self.client.get_or_create_collection(
            name=collection_name,
            embedding_function=self.embedding_fn
        )
    
    def add_documents(self, documents: List[Dict[str, str]]):
        """Add documents to knowledge base
        
        Args:
            documents: List of {'id': str, 'text': str, 'metadata': dict}
        """
        self.collection.add(
            ids=[doc['id'] for doc in documents],
            documents=[doc['text'] for doc in documents],
            metadatas=[doc.get('metadata', {}) for doc in documents]
        )
        print(f"✅ Added {len(documents)} documents to knowledge base")
    
    def search(self, query: str, n_results: int = 3) -> List[Dict[str, Any]]:
        """Search for relevant documents"""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        
        return [
            {
                'text': doc,
                'metadata': meta,
                'distance': dist
            }
            for doc, meta, dist in zip(
                results['documents'][0],
                results['metadatas'][0],
                results['distances'][0]
            )
        ]

class RAGAssistant:
    """AI assistant with RAG capabilities"""
    
    def __init__(self, ai_client: AIClient, knowledge_base: KnowledgeBase):
        self.ai_client = ai_client
        self.kb = knowledge_base
    
    def answer_question(self, question: str, system_prompt: Optional[str] = None) -> Dict[str, Any]:
        """Answer question using RAG"""
        # Retrieve relevant context
        context_docs = self.kb.search(question, n_results=3)
        
        # Build context string
        context = "\n\n".join([
            f"[Source {i+1}]: {doc['text']}"
            for i, doc in enumerate(context_docs)
        ])
        
        # Build prompt with context
        messages = []
        
        if system_prompt:
            messages.append(AIMessage(
                role='system',
                content=system_prompt
            ))
        
        messages.append(AIMessage(
            role='user',
            content=f"""Context:
{context}

Question: {question}

Answer the question using the provided context. Cite sources using [Source N] format."""
        ))
        
        # Get AI response
        response = self.ai_client.complete(messages, max_tokens=500)
        
        return {
            'answer': response.content,
            'sources': context_docs,
            'cost': response.cost,
            'latency_ms': response.latency_ms,
            'tokens': response.total_tokens
        }

print("✅ RAG system implemented")

✅ RAG system implemented
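
The retrieve-then-inject flow can be illustrated without a vector database. The sketch below swaps the sentence-transformer embeddings for a toy bag-of-words vector, so it is only a stand-in for the semantic search that `KnowledgeBase.search` performs; the helper names `bow`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercase word counts (toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(question: str, context_docs: list[str]) -> str:
    """Number the retrieved passages so the model can cite them."""
    context = "\n\n".join(
        f"[Source {i + 1}]: {doc}" for i, doc in enumerate(context_docs)
    )
    return f"Context:\n{context}\n\nQuestion: {question}"

docs = [
    "To reset your password, click Forgot Password on the login page.",
    "Billing occurs on the 1st of each month.",
    "Enable two-factor authentication in Settings.",
]
question = "How do I reset my password?"
top = retrieve(question, docs, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Word-overlap scoring misses paraphrases such as "forgot my login", which is the gap that embedding-based search is meant to close.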


## 4. Security & Safety

### PII Redaction, Content Filtering, Input Validation

In [78]:
class SafetyFilter:
    """Content safety and PII redaction"""
    
    # PII patterns
    PATTERNS = {
        'email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
        'phone': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b',
        'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
        'credit_card': r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b',
    }
    
    # Blocked content keywords
    BLOCKED_KEYWORDS = [
        'hack', 'exploit', 'vulnerability', 'malware',
        'illegal', 'fraud', 'scam'
    ]
    
    @staticmethod
    def redact_pii(text: str) -> tuple[str, List[str]]:
        """Redact PII from text
        
        Returns:
            (redacted_text, list of PII types found)
        """
        redacted = text
        found_pii = []
        
        for pii_type, pattern in SafetyFilter.PATTERNS.items():
            matches = re.findall(pattern, redacted)
            if matches:
                found_pii.append(pii_type)
                redacted = re.sub(pattern, f'[REDACTED_{pii_type.upper()}]', redacted)
        
        return redacted, found_pii
    
    @staticmethod
    def check_content(text: str) -> tuple[bool, List[str]]:
        """Check for blocked content
        
        Returns:
            (is_safe, list of blocked keywords found)
        """
        text_lower = text.lower()
        found_keywords = [
            kw for kw in SafetyFilter.BLOCKED_KEYWORDS
            if kw in text_lower
        ]
        
        return len(found_keywords) == 0, found_keywords
    
    @staticmethod
    def validate_input(text: str, max_length: int = 10000) -> tuple[bool, str]:
        """Validate user input
        
        Returns:
            (is_valid, error_message)
        """
        if not text or not text.strip():
            return False, "Input cannot be empty"
        
        if len(text) > max_length:
            return False, f"Input too long (max {max_length} chars)"
        
        return True, ""

class SecureAIAssistant(RAGAssistant):
    """AI assistant with security features"""
    
    def answer_question(self, question: str, system_prompt: Optional[str] = None,
                       redact_pii: bool = True) -> Dict[str, Any]:
        # Validate input
        is_valid, error = SafetyFilter.validate_input(question)
        if not is_valid:
            return {'error': error, 'answer': None}
        
        # Check content safety
        is_safe, blocked = SafetyFilter.check_content(question)
        if not is_safe:
            return {
                'error': f'Blocked content detected: {blocked}',
                'answer': None
            }
        
        # Redact PII if enabled
        processed_question = question
        pii_found = []
        if redact_pii:
            processed_question, pii_found = SafetyFilter.redact_pii(question)
        
        # Get answer from parent class
        result = super().answer_question(processed_question, system_prompt)
        
        # Add security metadata
        result['security'] = {
            'pii_redacted': pii_found,
            'content_safe': is_safe
        }
        
        return result

print("✅ Security features implemented")

✅ Security features implemented
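
A short standalone run makes the behavior of the two filters concrete. The sketch below is an illustration under assumptions: it keeps only the email and phone patterns, and it replaces the substring keyword check with a whole-word match (`blocked_words` is a hypothetical helper), which is a different trade-off from the `check_content` method above.

```python
import re

PII_PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
}

def redact(text):
    """Replace each PII match with a typed placeholder; report the types found."""
    found = []
    for pii_type, pattern in PII_PATTERNS.items():
        text, count = re.subn(pattern, f"[REDACTED_{pii_type.upper()}]", text)
        if count:
            found.append(pii_type)
    return text, found

def blocked_words(text, keywords):
    """Whole-word, case-insensitive match, so 'hack' does not flag 'shack'."""
    return [
        kw for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE)
    ]

redacted, found = redact("Reach me at john.doe@example.com or 555-123-4567.")
print(redacted)  # prints Reach me at [REDACTED_EMAIL] or [REDACTED_PHONE].
print(found)     # prints ['email', 'phone']

print(blocked_words("I left it in the shack", ["hack"]))  # prints []
print(blocked_words("Can you hack this?", ["hack"]))      # prints ['hack']
```

Whole-word matching avoids false positives such as "shack" but no longer catches inflections such as "hacking"; the substring check in `check_content` makes the opposite trade-off.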


## 5. Cost Management & Analytics

### Token Tracking, Budgets, and Monitoring

In [79]:
class CostTracker:
    """Track and manage AI API costs"""
    
    def __init__(self, daily_budget: float = 10.0, monthly_budget: float = 200.0):
        self.daily_budget = daily_budget
        self.monthly_budget = monthly_budget
        self.calls = []
    
    def log_call(self, response: AIResponse):
        """Log an API call"""
        self.calls.append({
            'timestamp': datetime.now(),
            'provider': response.provider,
            'model': response.model,
            'prompt_tokens': response.prompt_tokens,
            'completion_tokens': response.completion_tokens,
            'total_tokens': response.total_tokens,
            'cost': response.cost,
            'latency_ms': response.latency_ms
        })
    
    def get_stats(self) -> Dict[str, Any]:
        """Get cost and usage statistics"""
        if not self.calls:
            return {}
        
        df = pd.DataFrame(self.calls)
        
        # Today's costs
        today = datetime.now().date()
        today_calls = df[df['timestamp'].dt.date == today]
        
        # This month's costs
        this_month = datetime.now().replace(day=1).date()
        month_calls = df[df['timestamp'].dt.date >= this_month]
        
        return {
            'total_calls': len(df),
            'total_cost': df['cost'].sum(),
            'total_tokens': df['total_tokens'].sum(),
            'avg_latency_ms': df['latency_ms'].mean(),
            'today_cost': today_calls['cost'].sum(),
            'today_budget_remaining': self.daily_budget - today_calls['cost'].sum(),
            'month_cost': month_calls['cost'].sum(),
            'month_budget_remaining': self.monthly_budget - month_calls['cost'].sum(),
            'by_provider': df.groupby('provider')['cost'].sum().to_dict(),
            'by_model': df.groupby('model')['cost'].sum().to_dict()
        }
    
    def check_budget(self) -> tuple[bool, str]:
        """Check if within budget
        
        Returns:
            (within_budget, message)
        """
        stats = self.get_stats()
        
        if stats.get('today_budget_remaining', self.daily_budget) < 0:
            return False, "Daily budget exceeded"
        
        if stats.get('month_budget_remaining', self.monthly_budget) < 0:
            return False, "Monthly budget exceeded"
        
        return True, "Within budget"
    
    def visualize_costs(self):
        """Create interactive cost visualization"""
        if not self.calls:
            print("No data to visualize")
            return
        
        df = pd.DataFrame(self.calls)
        df['date'] = df['timestamp'].dt.date
        
        # Cost over time
        daily_cost = df.groupby('date')['cost'].sum().reset_index()
        
        fig = go.Figure()
        fig.add_trace(go.Scatter(
            x=daily_cost['date'],
            y=daily_cost['cost'],
            mode='lines+markers',
            name='Daily Cost',
            line=dict(color='#1f77b4', width=2)
        ))
        
        # Budget line
        fig.add_hline(
            y=self.daily_budget,
            line_dash="dash",
            line_color="red",
            annotation_text="Daily Budget"
        )
        
        fig.update_layout(
            title='AI API Costs Over Time',
            xaxis_title='Date',
            yaxis_title='Cost ($)',
            hovermode='x unified'
        )
        
        return fig

print("✅ Cost management system implemented")

✅ Cost management system implemented
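
The budget logic is easier to verify with fixed dates than with `datetime.now()`. The sketch below reimplements the daily and monthly sums with the standard library only; the function names and the sample call log are hypothetical, and the comparison mirrors the "remaining budget below zero" test used in `check_budget`.

```python
from datetime import date, datetime

def spend_summary(calls, today: date):
    """Sum costs for today and for the current calendar month."""
    month_start = today.replace(day=1)
    today_cost = sum(c["cost"] for c in calls if c["timestamp"].date() == today)
    month_cost = sum(
        c["cost"] for c in calls
        if month_start <= c["timestamp"].date() <= today
    )
    return today_cost, month_cost

def check_budget(calls, today, daily_budget, monthly_budget):
    """Return (within_budget, message) for the spend recorded so far."""
    today_cost, month_cost = spend_summary(calls, today)
    if today_cost > daily_budget:
        return False, "Daily budget exceeded"
    if month_cost > monthly_budget:
        return False, "Monthly budget exceeded"
    return True, "Within budget"

calls = [
    {"timestamp": datetime(2025, 2, 27, 12, 0), "cost": 9.0},  # previous month
    {"timestamp": datetime(2025, 3, 1, 9, 0), "cost": 4.0},
    {"timestamp": datetime(2025, 3, 14, 10, 0), "cost": 3.0},
    {"timestamp": datetime(2025, 3, 14, 16, 0), "cost": 2.5},
]
status = check_budget(calls, date(2025, 3, 14), daily_budget=5.0, monthly_budget=100.0)
print(status)  # prints (False, 'Daily budget exceeded')
```

Because the check looks only at spend already recorded, the call that crosses the limit still goes through; only subsequent calls are refused.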


## Live Demo: Customer Support Assistant

Let's build a complete AI-powered customer support assistant with:
- RAG for knowledge retrieval
- PII redaction
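- Cost tracking (a budget check runs before each query; per-call logging is not yet wired in)
- Multi-provider failover (supported by `AIClient`; the demo below configures OpenAI only)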
- Real-time analytics

In [80]:
# Initialize knowledge base with sample support articles
kb = KnowledgeBase(collection_name="support_kb")

support_articles = [
    {
        'id': 'art-001',
        'text': """Password Reset Process: To reset your password, click 'Forgot Password' on the login page. 
        Enter your email address and you'll receive a reset link within 5 minutes. The link expires after 24 hours. 
        If you don't receive the email, check your spam folder or contact support.""",
        'metadata': {'category': 'authentication', 'priority': 'high'}
    },
    {
        'id': 'art-002',
        'text': """Account Setup: New accounts are activated within 1 business day. You'll receive a welcome email 
        with your login credentials. First-time users should complete the onboarding tutorial to learn about 
        key features. Admin users can invite team members from the Settings > Users page.""",
        'metadata': {'category': 'onboarding', 'priority': 'medium'}
    },
    {
        'id': 'art-003',
        'text': """Billing Questions: Billing occurs on the 1st of each month for your subscription tier. 
        You can view invoices in Settings > Billing. To upgrade or downgrade, contact your account manager. 
        Refunds are available within 30 days of purchase. Payment methods include credit card and wire transfer.""",
        'metadata': {'category': 'billing', 'priority': 'high'}
    },
    {
        'id': 'art-004',
        'text': """API Integration: Our REST API is available at https://api.example.com/v1. Authentication uses 
        API keys generated in Settings > API Keys. Rate limits are 1000 requests/hour for standard tier and 
        10,000 requests/hour for enterprise. Full API documentation is at https://docs.example.com/api.""",
        'metadata': {'category': 'technical', 'priority': 'medium'}
    },
    {
        'id': 'art-005',
        'text': """Data Export: You can export your data anytime from Settings > Data Export. Choose between 
        CSV, JSON, or Excel formats. Large exports are processed asynchronously and you'll receive a download 
        link via email. Data retention policy keeps exports available for 7 days.""",
        'metadata': {'category': 'data', 'priority': 'low'}
    },
    {
        'id': 'art-006',
        'text': """Two-Factor Authentication: Enable 2FA in Settings > Security for enhanced account protection. 
        We support authenticator apps (Google Authenticator, Authy) and SMS codes. Keep backup codes in a safe 
        place. If you lose access to your 2FA device, contact support with verification.""",
        'metadata': {'category': 'security', 'priority': 'high'}
    },
    {
        'id': 'art-007',
        'text': """Performance Issues: If experiencing slow performance, try clearing your browser cache and cookies. 
        Check your internet connection speed (minimum 5 Mbps recommended). Disable browser extensions that might 
        interfere. For persistent issues, send us a HAR file from your browser's developer tools.""",
        'metadata': {'category': 'technical', 'priority': 'medium'}
    },
    {
        'id': 'art-008',
        'text': """Mobile App: Our mobile app is available on iOS (12.0+) and Android (8.0+). Download from the 
        App Store or Google Play. Mobile features include offline mode, push notifications, and biometric login. 
        Sync happens automatically when connected to wifi.""",
        'metadata': {'category': 'mobile', 'priority': 'low'}
    }
]

kb.add_documents(support_articles)
print("✅ Knowledge base populated with support articles")

✅ Added 8 documents to knowledge base
✅ Knowledge base populated with support articles


In [81]:
# Initialize AI providers
# Note: Set OPENAI_API_KEY in your environment or .env file

try:
    primary_provider = OpenAIProvider()
    print("✅ OpenAI provider initialized")
except Exception as e:
    print(f"⚠️  OpenAI initialization failed: {e}")
    # No mock provider is implemented in this notebook; the demo cells
    # below are skipped when primary_provider is None
    primary_provider = None

# Initialize resilient AI client
if primary_provider:
    ai_client = AIClient(primary_provider)
    
    # Initialize cost tracker
    cost_tracker = CostTracker(daily_budget=5.0, monthly_budget=100.0)
    
    # Initialize secure assistant
    assistant = SecureAIAssistant(ai_client, kb)
    
    print("✅ Customer Support Assistant ready!")
else:
    print("⚠️  Cannot initialize assistant without API key")
    print("   Set OPENAI_API_KEY environment variable to use the demo")

✅ OpenAI provider initialized
✅ Customer Support Assistant ready!
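
For running the demo without an API key, a mock provider could stand in for `OpenAIProvider`. The sketch below is one possible shape, not part of the notebook: `MockProvider`, `MockResponse`, and `Msg` are hypothetical names, and the last two are local stand-ins for `AIResponse` and `AIMessage` so the block runs on its own. Token counts are a rough whitespace split, not a real tokenizer.

```python
import time
from dataclasses import dataclass

@dataclass
class MockResponse:
    """Stand-in with the same core fields as AIResponse."""
    content: str
    model: str
    provider: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cost: float
    latency_ms: float

@dataclass
class Msg:
    """Stand-in for AIMessage."""
    role: str
    content: str

class MockProvider:
    """Hypothetical offline provider: echoes the last message, costs nothing."""

    def complete(self, messages, model="mock-model", temperature=0.7, max_tokens=1000):
        start = time.time()
        last = messages[-1].content if messages else ""
        reply = f"[mock reply] {last[:60]}"
        # Rough whitespace token counts; a real provider reports exact usage.
        prompt_tokens = sum(len(m.content.split()) for m in messages)
        completion_tokens = len(reply.split())
        return MockResponse(
            content=reply, model=model, provider="mock",
            prompt_tokens=prompt_tokens, completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
            cost=0.0, latency_ms=(time.time() - start) * 1000,
        )

resp = MockProvider().complete([Msg("user", "How do I reset my password?")])
print(resp.content, resp.total_tokens)
```

In the notebook itself, such a class would subclass `AIProviderInterface` and return the real `AIResponse`, after which it could be passed to `AIClient` in place of the OpenAI provider.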


In [82]:
def handle_customer_query(question: str, verbose: bool = True) -> Dict[str, Any]:
    """Handle customer support query with full AI pipeline"""
    
    if not primary_provider:
        return {'error': 'AI provider not initialized'}
    
    # Check budget before processing
    within_budget, budget_msg = cost_tracker.check_budget()
    if not within_budget:
        return {'error': budget_msg}
    
    if verbose:
        print("\n" + "="*80)
        print("🤖 Processing Customer Query")
        print("="*80)
        print(f"Question: {question}\n")
    
    # System prompt for customer support
    system_prompt = """You are a helpful customer support assistant. Answer questions using the 
    provided knowledge base context. Be concise, friendly, and professional. If you cite information, 
    use [Source N] format. If you don't have enough information, say so and offer to escalate."""
    
    # Get answer with RAG and security
    start_time = time.time()
    result = assistant.answer_question(question, system_prompt=system_prompt)
    total_time = (time.time() - start_time) * 1000
    
    if 'error' in result:
        if verbose:
            print(f"❌ Error: {result['error']}")
        return result
    
    # Cost tracking is not wired up here: answer_question() returns a dict, not the
    # AIResponse that cost_tracker.log_call() expects, so cost_tracker.calls stays
    # empty. In production, the assistant would return the full response object.
    
    if verbose:
        print(f"✅ Answer:\n{result['answer']}\n")
        print(f"📊 Metrics:")
        print(f"   Cost: ${result['cost']:.6f}")
        print(f"   Latency: {result['latency_ms']:.0f}ms")
        print(f"   Tokens: {result['tokens']}")
        print(f"   Total Time: {total_time:.0f}ms")
        
        if result['security']['pii_redacted']:
            print(f"   🔒 PII Redacted: {result['security']['pii_redacted']}")
        
        print(f"\n📚 Sources Retrieved: {len(result['sources'])}")
        for i, source in enumerate(result['sources'], 1):
            print(f"   {i}. {source['metadata'].get('category', 'N/A')} "
                  f"(relevance: {1 - source['distance']:.2f})")
    
    return result

print("✅ Query handler ready")

✅ Query handler ready


### Test the Customer Support Assistant

In [83]:
# Test 1: Basic query
if primary_provider:
    result = handle_customer_query("How do I reset my password?")


🤖 Processing Customer Query
Question: How do I reset my password?

✅ Answer:
To reset your password, click 'Forgot Password' on the login page. Enter your email address, and you'll receive a reset link within 5 minutes. Please note that the link expires after 24 hours. If you don't receive the email, check your spam folder or contact support for assistance [Source 1].

📊 Metrics:
   Cost: $0.000081
   Latency: 3877ms
   Tokens: 350
   Total Time: 3912ms

📚 Sources Retrieved: 3
   1. authentication (relevance: 0.72)
   2. onboarding (relevance: 0.23)
   3. security (relevance: 0.21)


In [84]:
# Test 2: Query with PII (will be redacted)
if primary_provider:
    result = handle_customer_query(
        "I can't log in with my email john.doe@example.com and phone 555-123-4567. Please help!"
    )


🤖 Processing Customer Query
Question: I can't log in with my email john.doe@example.com and phone 555-123-4567. Please help!

✅ Answer:
I’m sorry to hear that you're having trouble logging in. Here are a couple of steps you can try:

1. **Password Reset**: If you can't remember your password, you can reset it by clicking 'Forgot Password' on the login page. Enter your email address, and you should receive a reset link within 5 minutes. Remember to check your spam folder if you don't see it [Source 1].

2. **Two-Factor Authentication**: If you have Two-Factor Authentication (2FA) enabled, ensure you have access to your 2FA device for authentication. If you’ve lost access, please contact support with verification [Source 2].

If you're still unable to log in after trying these steps, please let me know, and I can escalate this issue for further assistance.

📊 Metrics:
   Cost: $0.000143
   Latency: 5882ms
   Tokens: 469
   Total Time: 5953ms
   🔒 PII Redacted: ['email', 'phone']

📚 Sour

In [85]:
# Test 3: Billing question
if primary_provider:
    result = handle_customer_query("When will I be charged for my subscription?")


🤖 Processing Customer Query
Question: When will I be charged for my subscription?

✅ Answer:
You will be charged for your subscription on the 1st of each month for your subscription tier [Source 1].

📊 Metrics:
   Cost: $0.000058
   Latency: 1100ms
   Tokens: 315
   Total Time: 1145ms

📚 Sources Retrieved: 3
   1. billing (relevance: 0.55)
   2. onboarding (relevance: 0.37)
   3. technical (relevance: 0.29)


In [86]:
# Test 4: Technical integration question
if primary_provider:
    result = handle_customer_query("What's the API rate limit for my tier?")


🤖 Processing Customer Query
Question: What's the API rate limit for my tier?

✅ Answer:
The API rate limit for the standard tier is 1000 requests per hour, while for the enterprise tier, it is 10,000 requests per hour [Source 1].

📊 Metrics:
   Cost: $0.000065
   Latency: 1791ms
   Tokens: 328
   Total Time: 1838ms

📚 Sources Retrieved: 3
   1. technical (relevance: 0.68)
   2. billing (relevance: 0.22)
   3. technical (relevance: 0.18)
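
The "relevance" figures printed above are computed as `1 - distance`, which is only a rough indicator. Assuming the collection uses squared Euclidean distance (which appears to be Chroma's default metric) and the embeddings are unit length, the distance `d` relates to cosine similarity by `d = 2 - 2*cos`, so `cos = 1 - d/2`. The sketch below checks that identity numerically on two arbitrary vectors; it does not call Chroma.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Dot product; equals cosine similarity because both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])
d = squared_l2(a, b)

# Both values agree up to floating-point rounding
print(round(cosine(a, b), 6), round(1 - d / 2, 6))
```

Under those assumptions `1 - d` can go negative for dissimilar documents, so `1 - d/2` may be the more interpretable number to display.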


### View Analytics & Cost Dashboard

In [87]:
# Display cost statistics
if primary_provider and cost_tracker.calls:
    stats = cost_tracker.get_stats()
    
    print("\n" + "="*80)
    print("📊 AI USAGE & COST ANALYTICS")
    print("="*80)
    print(f"\nTotal Calls: {stats['total_calls']}")
    print(f"Total Cost: ${stats['total_cost']:.4f}")
    print(f"Total Tokens: {stats['total_tokens']:,}")
    print(f"Avg Latency: {stats['avg_latency_ms']:.0f}ms")
    print(f"\nToday's Cost: ${stats['today_cost']:.4f}")
    print(f"Daily Budget Remaining: ${stats['today_budget_remaining']:.2f}")
    print(f"\nMonth Cost: ${stats['month_cost']:.4f}")
    print(f"Monthly Budget Remaining: ${stats['month_budget_remaining']:.2f}")
    
    print(f"\nCost by Provider:")
    for provider, cost in stats['by_provider'].items():
        print(f"  {provider}: ${cost:.4f}")
    
    print(f"\nCost by Model:")
    for model, cost in stats['by_model'].items():
        print(f"  {model}: ${cost:.4f}")
    
    # Show visualization
    fig = cost_tracker.visualize_costs()
    if fig:
        fig.show()
else:
    print("No usage data yet. Run some queries first!")

No usage data yet. Run some queries first!
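
The empty analytics output above follows from the query handler never logging a call. One way to close that gap is a thin wrapper that records every response as it passes through. This is a sketch under assumptions: `TrackedClient` and `StubClient` are hypothetical names, and the stub replaces `AIClient` so the block runs without an API key.

```python
from datetime import datetime
from types import SimpleNamespace

class TrackedClient:
    """Hypothetical wrapper: forwards complete() and records each response."""

    def __init__(self, client, calls):
        self.client = client
        self.calls = calls  # shared list acting as the call log

    def complete(self, messages, **kwargs):
        response = self.client.complete(messages, **kwargs)
        self.calls.append({
            "timestamp": datetime.now(),
            "model": response.model,
            "cost": response.cost,
            "total_tokens": response.total_tokens,
        })
        return response

class StubClient:
    """Stand-in for AIClient that returns a fixed response object."""

    def complete(self, messages, **kwargs):
        return SimpleNamespace(model="stub", cost=0.0001, total_tokens=42)

calls = []
client = TrackedClient(StubClient(), calls)
client.complete(["hello"], max_tokens=500)
client.complete(["again"])
print(len(calls), sum(c["cost"] for c in calls))
```

In the notebook, the same idea would wrap `ai_client` and call `cost_tracker.log_call(response)` instead of appending to a list, since `log_call` already accepts an `AIResponse`.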


## 6. Prompt Engineering Best Practices

### Effective Prompt Patterns

In [88]:
class PromptTemplates:
    """Reusable prompt templates for common tasks"""
    
    @staticmethod
    def structured_output(task: str, output_schema: str) -> str:
        """Prompt for structured JSON output"""
        return f"""Task: {task}

Output the result as JSON matching this schema:
{output_schema}

Important:
- Output ONLY valid JSON, no explanation
- All required fields must be present
- Use null for missing optional fields"""
    
    @staticmethod
    def few_shot_classification(examples: List[Dict], text: str) -> str:
        """Few-shot learning for classification"""
        examples_str = "\n\n".join([
            f"Input: {ex['input']}\nOutput: {ex['output']}"
            for ex in examples
        ])
        
        return f"""Classify the following text based on these examples:

{examples_str}

Now classify:
Input: {text}
Output:"""
    
    @staticmethod
    def chain_of_thought(question: str) -> str:
        """Chain-of-thought reasoning"""
        return f"""Question: {question}

Let's solve this step by step:
1. First, identify the key information
2. Then, break down the problem
3. Finally, provide the solution

Show your reasoning for each step."""
    
    @staticmethod
    def role_based(role: str, task: str, constraints: Optional[List[str]] = None) -> str:
        """Role-based prompting"""
        prompt = f"""You are a {role}.

Task: {task}"""
        
        if constraints:
            prompt += "\n\nConstraints:\n"
            for i, constraint in enumerate(constraints, 1):
                prompt += f"{i}. {constraint}\n"
        
        return prompt

# Example usage
print("Example: Structured Output Prompt")
print("="*80)
schema = '''{
  "sentiment": "positive|negative|neutral",
  "confidence": 0.0-1.0,
  "key_phrases": ["string"]
}'''
print(PromptTemplates.structured_output(
    "Analyze the sentiment of: 'This product is amazing!'",
    schema
))

print("\n" + "="*80)
print("Example: Role-Based Prompt")
print("="*80)
print(PromptTemplates.role_based(
    role="senior Python developer",
    task="Review this code for security vulnerabilities",
    constraints=[
        "Focus on input validation",
        "Check for SQL injection risks",
        "Provide specific fix recommendations"
    ]
))

Example: Structured Output Prompt
Task: Analyze the sentiment of: 'This product is amazing!'

Output the result as JSON matching this schema:
{
  "sentiment": "positive|negative|neutral",
  "confidence": 0.0-1.0,
  "key_phrases": ["string"]
}

Important:
- Output ONLY valid JSON, no explanation
- All required fields must be present
- Use null for missing optional fields

Example: Role-Based Prompt
You are a senior Python developer.

Task: Review this code for security vulnerabilities

Constraints:
1. Focus on input validation
2. Check for SQL injection risks
3. Provide specific fix recommendations
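
The structured-output prompt asks the model for JSON only, but a reply can still arrive wrapped in a Markdown code fence or with a field missing, so the caller usually validates it before use. The sketch below shows one way to do that. `parse_structured_output` is a hypothetical helper that is not part of this notebook, and the reply string is simulated rather than returned by a real API call.

```python
import json
import re
from typing import Any, Dict, Iterable

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this example readable

def parse_structured_output(raw: str, required_fields: Iterable[str]) -> Dict[str, Any]:
    """Parse a model reply that should contain a single JSON object.

    Hypothetical helper: tolerates a surrounding Markdown code fence and
    raises ValueError if the JSON is invalid or a required field is missing.
    """
    text = raw.strip()
    # Strip an optional fence (with or without a "json" tag) around the payload
    match = re.fullmatch(rf"{FENCE}(?:json)?\s*(.*?)\s*{FENCE}", text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = [name for name in required_fields if name not in data]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    return data

# Simulated model reply (no API call is made here)
reply = FENCE + 'json\n{"sentiment": "positive", "confidence": 0.97, "key_phrases": ["amazing"]}\n' + FENCE
result = parse_structured_output(reply, ["sentiment", "confidence", "key_phrases"])
print(result["sentiment"])  # → positive
```

Raising `ValueError` gives the caller a single exception type to catch, for example to retry the request with a stricter prompt.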



## 7. Production-Ready Patterns

### Caching, Rate Limiting, Monitoring

In [89]:
import hashlib
from collections import OrderedDict
from threading import Lock

class ResponseCache:
    """LRU cache for AI responses"""
    
    def __init__(self, max_size: int = 100):
        # OrderedDict keeps keys in access order: least recently used first
        self.cache: "OrderedDict[str, AIResponse]" = OrderedDict()
        self.max_size = max_size
        self.hits = 0
        self.misses = 0
    
    def _make_key(self, messages: List[AIMessage], **kwargs) -> str:
        """Create cache key from messages and parameters"""
        # sort_keys makes the key independent of keyword-argument order;
        # default=str keeps non-JSON-serializable parameter values from raising
        content = json.dumps(
            {
                'messages': [{'role': m.role, 'content': m.content} for m in messages],
                'params': kwargs,
            },
            sort_keys=True,
            default=str,
        )
        # The digest is only a lookup key, not a security measure
        return hashlib.sha256(content.encode()).hexdigest()
    
    def get(self, messages: List[AIMessage], **kwargs) -> Optional[AIResponse]:
        """Get cached response"""
        key = self._make_key(messages, **kwargs)
        
        if key in self.cache:
            self.hits += 1
            # Move to end (most recently used)
            self.cache.move_to_end(key)
            return self.cache[key]
        
        self.misses += 1
        return None
    
    def set(self, messages: List[AIMessage], response: AIResponse, **kwargs):
        """Cache response"""
        key = self._make_key(messages, **kwargs)
        
        # Insert or overwrite, then mark as most recently used
        self.cache[key] = response
        self.cache.move_to_end(key)
        
        # Evict least recently used entries only when over capacity,
        # so overwriting an existing key never evicts another entry
        while len(self.cache) > self.max_size:
            self.cache.popitem(last=False)
    
    def stats(self) -> Dict[str, Any]:
        """Get cache statistics"""
        total = self.hits + self.misses
        return {
            'size': len(self.cache),
            'max_size': self.max_size,
            'hits': self.hits,
            'misses': self.misses,
            'hit_rate': self.hits / total if total > 0 else 0
        }

class RateLimiter:
    """Token bucket rate limiter"""
    
    def __init__(self, requests_per_minute: int = 60):
        self.requests_per_minute = requests_per_minute
        self.tokens = requests_per_minute
        # Monotonic clock: elapsed time is unaffected by system clock adjustments
        self.last_update = time.monotonic()
        self.lock = Lock()
    
    def acquire(self) -> bool:
        """Try to acquire a token
        
        Returns:
            True if request allowed, False if rate limited
        """
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_update
            
            # Refill tokens based on time elapsed
            self.tokens = min(
                self.requests_per_minute,
                self.tokens + elapsed * (self.requests_per_minute / 60)
            )
            self.last_update = now
            
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            
            return False

class ProductionAIClient(AIClient):
    """Production-ready AI client with caching and rate limiting"""
    
    def __init__(self, primary_provider: AIProviderInterface,
                 fallback_provider: Optional[AIProviderInterface] = None,
                 enable_cache: bool = True,
                 rate_limit: int = 60):
        super().__init__(primary_provider, fallback_provider)
        self.cache = ResponseCache() if enable_cache else None
        self.rate_limiter = RateLimiter(rate_limit)
    
    def complete(self, messages: List[AIMessage], **kwargs) -> AIResponse:
        # Check cache first so cache hits do not consume rate-limit tokens
        if self.cache is not None:
            cached = self.cache.get(messages, **kwargs)
            if cached is not None:
                print("✅ Cache hit!")
                return cached
        
        # Check rate limit only when a provider call is actually needed
        if not self.rate_limiter.acquire():
            raise RuntimeError("Rate limit exceeded")
        
        # Get response from provider
        response = super().complete(messages, **kwargs)
        
        # Cache response
        if self.cache is not None:
            self.cache.set(messages, response, **kwargs)
        
        return response
    
    def get_cache_stats(self) -> Dict[str, Any]:
        """Get cache statistics"""
        return self.cache.stats() if self.cache else {}

print("✅ Production patterns implemented")

✅ Production patterns implemented
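
The refill arithmetic behind the token-bucket `RateLimiter` can be checked without waiting in real time. The sketch below re-implements the same rule with an injectable clock. `TokenBucket`, `FakeClock`, and `seconds_until_available` are illustrative names that are not part of this notebook. A limit of 2 requests per minute is used so the numbers are easy to follow: the bucket refills one token every 30 seconds.

```python
import time
from typing import Callable

class TokenBucket:
    """Token bucket with an injectable clock (illustrative; mirrors RateLimiter's arithmetic)."""

    def __init__(self, requests_per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.requests_per_minute = requests_per_minute
        self.tokens = float(requests_per_minute)  # start with a full bucket
        self.clock = clock
        self.last_update = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last update, capped at capacity."""
        now = self.clock()
        elapsed = now - self.last_update
        self.tokens = min(
            float(self.requests_per_minute),
            self.tokens + elapsed * self.requests_per_minute / 60,
        )
        self.last_update = now

    def acquire(self) -> bool:
        """Take one token if available; return False when rate limited."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def seconds_until_available(self) -> float:
        """Time a caller would have to wait before acquire() can succeed."""
        self._refill()
        return max(0.0, (1 - self.tokens) * 60 / self.requests_per_minute)

class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds

clock = FakeClock()
bucket = TokenBucket(requests_per_minute=2, clock=clock)

burst = [bucket.acquire() for _ in range(3)]
print(burst)  # → [True, True, False]

wait = bucket.seconds_until_available()
print(wait)  # → 30.0

clock.advance(30)
after_wait = bucket.acquire()
print(after_wait)  # → True
```

A caller that prefers waiting over failing could sleep for `seconds_until_available()` and retry instead of raising on the first rejected request.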


## Summary & Best Practices

### Key Takeaways

1. **Architecture**
   - Use abstraction layers for multi-provider support
   - Implement failover and circuit breakers
   - Design for observability from day one

2. **Security**
   - Always validate and sanitize inputs
   - Redact PII before sending to APIs
   - Implement content filtering
   - Use rate limiting to prevent abuse

3. **Cost Management**
   - Track token usage and costs in real-time
   - Set budgets and alerts
   - Use caching to reduce API calls
   - Choose appropriate models for each task

4. **Performance**
   - Implement response caching
   - Use async/parallel processing where possible
   - Monitor latency and set SLAs
   - Consider edge caching for common queries

5. **RAG Integration**
   - Use vector databases for semantic search
   - Chunk documents appropriately (300-500 tokens)
   - Include metadata for better filtering
   - Cite sources in responses

6. **Prompt Engineering**
   - Use system prompts to set context and constraints
   - Provide examples for better accuracy (few-shot)
   - Request structured output (JSON) when needed
   - Iterate and test prompts with real data

7. **Production Readiness**
   - Implement comprehensive error handling
   - Log all requests and responses
   - Set up monitoring and alerting
   - Have fallback strategies for failures
   - Test at scale before deployment
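
As an illustration of the Performance takeaway "Use async/parallel processing where possible", the sketch below runs several requests concurrently while capping how many are in flight at once. `fake_completion` and `complete_many` are hypothetical names, and the provider call is replaced by a short sleep so the example runs without network access or API keys.

```python
import asyncio
from typing import List

async def fake_completion(prompt: str) -> str:
    """Stand-in for a provider call; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"

async def complete_many(prompts: List[str], max_concurrency: int = 5) -> List[str]:
    """Run completions concurrently, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        # The semaphore blocks here once max_concurrency calls are in flight
        async with semaphore:
            return await fake_completion(prompt)

    # gather returns results in the same order as the input prompts
    return list(await asyncio.gather(*(bounded(p) for p in prompts)))

# In a script; inside a notebook cell use `results = await complete_many(...)` instead,
# because the notebook already runs an event loop
results = asyncio.run(complete_many([f"question {i}" for i in range(10)], max_concurrency=3))
print(results[0])  # → echo: question 0
```

The concurrency cap plays a similar role to the rate limiter: it keeps a burst of parallel requests from exceeding what the provider allows.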

## Interactive Presentation

View the complete presentation with architecture diagrams and best practices:

In [90]:
# Open the AI Integration Demo Presentation
import webbrowser
from pathlib import Path

# Path to the demo presentation (machine-specific; adjust for your environment)
demo_path = Path('/home/bbrelin/src/repos/salesforce/slides/ai_integration_demo.html')

print("✅ Opening AI Integration Demo Presentation...")
print(f"   Path: {demo_path}")

# Build a correctly escaped file:// URL for the local file
file_url = demo_path.resolve().as_uri()

if not demo_path.exists():
    print(f"⚠️  Presentation file not found: {demo_path}")
else:
    # Launch in default browser using webbrowser module
    try:
        # webbrowser.open returns False when no browser could be launched
        if webbrowser.open(file_url, new=2):  # new=2 opens in a new tab if possible
            print("✅ Presentation opened in browser")
            print(f"   If browser didn't open, visit: {file_url}")
        else:
            print(f"⚠️  Could not open browser. Open manually: {file_url}")
    except Exception as e:
        print(f"⚠️  Could not open browser: {e}")
        print(f"   Open manually: {file_url}")

✅ Opening AI Integration Demo Presentation...
   Path: /home/bbrelin/src/repos/salesforce/slides/ai_integration_demo.html
✅ Presentation opened in browser
   If browser didn't open, visit: file:///home/bbrelin/src/repos/salesforce/slides/ai_integration_demo.html


