# Week 11: Deployment, Monitoring & Reliability

## Overview
Welcome to Week 11 of the AI Engineering curriculum. This week focuses on **shipping AI systems to production** with proper deployment, monitoring, and reliability practices.

### Learning Objectives
By the end of this week, you will be able to:
- Build production APIs with FastAPI for AI services
- Dockerize AI applications for consistent deployment
- Deploy AI systems to cloud platforms
- Implement comprehensive logging and monitoring
- Detect and handle model drift
- Ensure reliability and uptime

### Real-World Outcome
Build and deploy a **Production AI Service** (ML or LLM-based) with monitoring and reliability features.

---

## Part 1: Building APIs with FastAPI

### Why FastAPI for AI Services?

**Benefits:**
- Fast performance (built on Starlette/Pydantic)
- Automatic API documentation (Swagger/OpenAPI)
- Type validation
- Async support
- Easy to test

### API Design Principles
1. **RESTful**: Standard HTTP methods
2. **Versioning**: /v1/predict, /v2/predict
3. **Error handling**: Proper status codes
4. **Rate limiting**: Prevent abuse
5. **Authentication**: Secure access
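
As one illustration of principle 4, rate limiting is often implemented as a token bucket. The sketch below is a minimal in-memory version using only the standard library. The class name `TokenBucket` and its parameters are hypothetical, and a multi-worker deployment would typically need shared state (for example in Redis) rather than a per-process counter.

```python
import time


class TokenBucket:
    """In-memory token bucket: allows short bursts, caps the sustained rate."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec       # tokens added per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so tests can control time
        self.last = clock()

    def allow(self) -> bool:
        """Return True and consume one token if the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Inside a FastAPI handler, a rejected request would normally be answered with `HTTPException(status_code=429)`.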

### TODO 1.1: Build FastAPI Service for AI Model

In [None]:
from fastapi import FastAPI, HTTPException, Request
from pydantic import BaseModel, Field
from typing import List, Dict, Optional, Any
import time
import asyncio

app = FastAPI(
    title="AI Service API",
    description="Production AI service with ML/LLM capabilities",
    version="1.0.0"
)

# Request/Response models
class PredictionRequest(BaseModel):
    """Request model for predictions."""
    input_data: Dict[str, Any]
    model_version: Optional[str] = "latest"
    return_probabilities: bool = False

class PredictionResponse(BaseModel):
    """Response model for predictions."""
    prediction: Any
    confidence: Optional[float] = None
    model_version: str
    processing_time_ms: float

class HealthResponse(BaseModel):
    """Health check response."""
    status: str
    model_loaded: bool
    uptime_seconds: float

# TODO: Implement AI model service
class AIModelService:
    """Service for managing AI model inference."""
    
    def __init__(self, model_path: str):
        self.model_path = model_path
        self.model = None
        self.start_time = time.time()
        self.prediction_count = 0
    
    async def load_model(self):
        """Load ML/LLM model."""
        # TODO: Implement model loading
        pass
    
    async def predict(self, input_data: Dict) -> Dict:
        """Make prediction."""
        # TODO: Implement prediction logic
        pass
    
    def get_model_info(self) -> Dict:
        """Get model information."""
        # TODO: Implement model info retrieval
        pass

# Initialize service
model_service = AIModelService("models/model.pkl")

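# Note: recent FastAPI releases deprecate on_event in favor of lifespan
# handlers; on_event is kept here because it is the simplest form to read.
@app.on_event("startup")
async def startup_event():
    """Load model on startup."""
    # TODO: Implement startup logic
    await model_service.load_model()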

@app.get("/health", response_model=HealthResponse)
async def health_check():
    """Health check endpoint."""
    # TODO: Implement health check
    pass

@app.post("/v1/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    """Prediction endpoint."""
    # TODO: Implement prediction endpoint
    # 1. Validate input
    # 2. Make prediction
    # 3. Return response with timing
    pass

@app.get("/v1/model/info")
async def model_info():
    """Get model information."""
    # TODO: Implement model info endpoint
    pass

# Test the API
# Run with: uvicorn main:app --reload
# (assumes this code is saved as main.py, matching the Dockerfile CMD in Part 2)
# TODO: Test endpoints using curl or httpie
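
One way the stubs above might be filled in is sketched below, kept independent of FastAPI so it runs without a server. The `DummyModel` class, the `"features"` input key, and the `"healthy"`/`"degraded"` status strings are assumptions made for illustration; a real service would deserialize an actual model from `model_path`.

```python
import asyncio
import time
from typing import Any, Dict, List


class DummyModel:
    """Stand-in for a real model (assumption): thresholds the mean feature value."""

    def predict(self, features: List[float]) -> Dict[str, Any]:
        # Features are assumed to lie in [0, 1].
        score = sum(features) / len(features)
        return {"prediction": int(score > 0.5), "confidence": abs(score - 0.5) * 2}


class SketchModelService:
    """Hypothetical filled-in variant of AIModelService."""

    def __init__(self, model_path: str):
        self.model_path = model_path
        self.model = None
        self.start_time = time.time()
        self.prediction_count = 0

    async def load_model(self) -> None:
        # A real implementation would load the file at self.model_path here.
        self.model = DummyModel()

    async def predict(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        if self.model is None:
            raise RuntimeError("model not loaded")
        started = time.perf_counter()
        result = self.model.predict(input_data["features"])
        self.prediction_count += 1
        # Report how long inference took, in milliseconds.
        result["processing_time_ms"] = (time.perf_counter() - started) * 1000
        return result

    def health(self) -> Dict[str, Any]:
        # Mirrors the fields of HealthResponse.
        loaded = self.model is not None
        return {
            "status": "healthy" if loaded else "degraded",
            "model_loaded": loaded,
            "uptime_seconds": time.time() - self.start_time,
        }


async def demo_service() -> Dict[str, Any]:
    service = SketchModelService("models/model.pkl")
    await service.load_model()
    return await service.predict({"features": [0.9, 0.8, 0.7]})
```

In the `/v1/predict` endpoint, the `RuntimeError` raised for an unloaded model would typically be translated into an `HTTPException` with status 503.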

---

## Part 2: Dockerization

### Why Docker for AI?

**Benefits:**
- Consistent environments (dev = prod)
- Dependency isolation
- Easy scaling
- Version control for infrastructure
- Portable across clouds

### Docker Best Practices for AI
1. Use multi-stage builds
2. Minimize image size
3. Cache dependencies
4. Use specific base images
5. Don't include training data in images
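
Practices 1 to 3 can be combined in a single Dockerfile. The sketch below follows this notebook's convention of holding the file in a Python string. The `/opt/venv` location and the stage name `builder` are arbitrary choices, and `main:app` assumes the API lives in `main.py`.

```python
# Multi-stage build sketch: install dependencies in a "builder" stage,
# then copy only the resulting virtual environment into the runtime image.
# A raw string keeps the Dockerfile's trailing backslash intact.
multistage_dockerfile = r'''
FROM python:3.10-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy requirements first so this layer stays cached until they change
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.10-slim
WORKDIR /app
# Only the installed packages cross over; compilers stay in the builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''

# Count the stages as a quick sanity check on the string.
stages = [line for line in multistage_dockerfile.splitlines() if line.startswith("FROM ")]
print(f"{len(stages)} build stages defined")
```

For practice 5, a `.dockerignore` file listing the training-data directories keeps them out of the build context that `COPY . .` sees.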

### TODO 2.1: Create Docker Configuration

In [None]:
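# Create Dockerfile for AI service
# (raw string, so the Dockerfile's trailing backslashes are written out verbatim
# instead of being consumed as Python line continuations)
dockerfile_content = r'''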
# TODO: Complete the Dockerfile
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Install system dependencies
# (curl is required by the HEALTHCHECK below; slim base images do not ship it)
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
'''

# TODO: Create docker-compose.yml for local development
docker_compose_content = '''
version: '3.8'

services:
  ai-service:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MODEL_PATH=/models/model.pkl
      - LOG_LEVEL=INFO
    volumes:
      - ./models:/models
      - ./logs:/logs
    restart: unless-stopped
  
  # TODO: Add monitoring services (Prometheus, Grafana)
  
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
'''

# Save files
# with open('Dockerfile', 'w') as f:
#     f.write(dockerfile_content)
# with open('docker-compose.yml', 'w') as f:
#     f.write(docker_compose_content)

print("Docker configuration strings defined. Uncomment the block above to write the files.")
print("\nTo build: docker build -t ai-service .")
print("To run: docker-compose up")

---

## Part 3: Logging & Monitoring

### Production Logging Strategy

**Log Levels:**
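- **DEBUG**: Detailed diagnostic info, usually disabled in production
- **INFO**: Normal operation events (startup, request served)
- **WARNING**: Unexpected but recoverable conditions
- **ERROR**: A request or operation failed
- **CRITICAL**: The service itself cannot continue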

### What to Log
1. **Request/Response**: Inputs, outputs, timing
2. **Errors**: Exceptions with context
3. **Performance**: Latency, throughput
4. **Model**: Predictions, confidence scores
5. **System**: Resource usage
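
Structured (JSON) logs make the items above searchable by field. The sketch below uses only the standard `logging` module. The `JsonFormatter` and `build_logger` names and the `context` key are hypothetical choices, not part of any library.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via extra={"context": {...}} become record attributes.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def build_logger(name: str, level: str = "INFO") -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers when a cell is re-run
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = build_logger("ai-service")
logger.info("prediction served",
            extra={"context": {"request_id": "req-123", "latency_ms": 45.2}})
```

Carrying a `request_id` in every record is what lets a request, its prediction, and any error be correlated later.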

### TODO 3.1: Implement Production Logging

In [None]:
import logging
import json
from datetime import datetime
from typing import Dict, List, Any
import sys

class ProductionLogger:
    """Production-grade logging system."""
    
    def __init__(self, service_name: str, log_level: str = "INFO"):
        self.service_name = service_name
        self.logger = self._setup_logger(log_level)
    
    def _setup_logger(self, log_level: str) -> logging.Logger:
        """Setup structured logging."""
        # TODO: Implement logger setup
        # Use JSON formatter for structured logs
        pass
    
    def log_request(self, request_id: str, endpoint: str, params: Dict):
        """Log incoming request."""
        # TODO: Implement request logging
        pass
    
    def log_prediction(self, request_id: str, prediction: Any, confidence: float, latency_ms: float):
        """Log prediction details."""
        # TODO: Implement prediction logging
        pass
    
    def log_error(self, request_id: str, error: Exception, context: Dict):
        """Log error with context."""
        # TODO: Implement error logging
        pass
    
    def log_performance(self, metrics: Dict):
        """Log performance metrics."""
        # TODO: Implement performance logging
        pass

class MetricsCollector:
    """Collects and exposes metrics for monitoring."""
    
    def __init__(self):
        self.metrics: Dict[str, List[float]] = {
            "request_count": [],
            "latency_ms": [],
            "error_count": [],
            "prediction_confidence": []
        }
    
    def record_request(self, latency_ms: float, success: bool):
        """Record request metrics."""
        # TODO: Implement request recording
        pass
    
    def record_prediction(self, confidence: float):
        """Record prediction metrics."""
        # TODO: Implement prediction recording
        pass
    
    def get_metrics_summary(self) -> Dict:
        """Get summary of metrics."""
        # TODO: Implement metrics summary
        pass
    
    def export_prometheus_format(self) -> str:
        """Export metrics in Prometheus format."""
        # TODO: Implement Prometheus export
        pass

# Test logging
# logger = ProductionLogger("ai-service")
# logger.log_request("req-123", "/v1/predict", {"input": "test"})
# logger.log_prediction("req-123", "result", 0.95, 45.2)

---

## Part 4: Model Drift Detection

### What is Model Drift?

**Types of Drift:**
1. **Data Drift**: Input distribution changes
2. **Concept Drift**: Relationship between input/output changes
3. **Upstream Drift**: Data pipeline changes

### Detection Methods
- Statistical tests (KS test, Chi-square)
- Distribution comparison
- Performance monitoring
- A/B testing
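
To make the first method concrete, the sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) in plain Python and compares it with the usual large-sample critical value. The function names are hypothetical. `scipy.stats.ks_2samp` computes the same statistic along with a p-value and is the more practical choice when SciPy is available.

```python
import math
from typing import Sequence, Tuple


def ks_statistic(reference: Sequence[float], production: Sequence[float]) -> float:
    """Maximum absolute difference between the two empirical CDFs."""
    ref, prod = sorted(reference), sorted(production)
    n, m = len(ref), len(prod)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], prod[j])
        # Step both CDFs past every value <= x (this handles ties correctly).
        while i < n and ref[i] <= x:
            i += 1
        while j < m and prod[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def detect_drift(reference: Sequence[float], production: Sequence[float],
                 alpha: float = 0.05) -> Tuple[bool, float, float]:
    """Return (drift_detected, statistic, critical_value)."""
    n, m = len(reference), len(production)
    d = ks_statistic(reference, production)
    # Large-sample approximation of the critical value at significance alpha.
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return d > critical, d, critical


reference = list(range(1000))
shifted = [x + 500 for x in reference]  # same shape, shifted location
print(detect_drift(reference, shifted)[0])    # drift expected
print(detect_drift(reference, reference)[0])  # identical data, no drift
```

The KS test applies to one numeric feature at a time; categorical features are the usual use case for the chi-square test listed above.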

### TODO 4.1: Implement Drift Detection

In [None]:
import numpy as np
from scipy import stats
from typing import List, Dict, Tuple
from collections import deque

class DriftDetector:
    """Detects data and concept drift in production."""
    
    def __init__(self, reference_data: np.ndarray, window_size: int = 1000):
        self.reference_data = reference_data
        self.window_size = window_size
        self.production_window = deque(maxlen=window_size)
        self.drift_history: List[Dict] = []
    
    def add_production_sample(self, sample: np.ndarray):
        """Add new production sample."""
        # TODO: Implement sample addition
        pass
    
    def detect_data_drift(self, significance_level: float = 0.05) -> Tuple[bool, float]:
        """Detect data drift using statistical test."""
        # TODO: Implement drift detection
        # Use Kolmogorov-Smirnov test or similar
        pass
    
    def compute_drift_score(self) -> float:
        """Compute drift score (0-1)."""
        # TODO: Implement drift score computation
        pass
    
    def get_drift_report(self) -> Dict:
        """Generate drift report."""
        # TODO: Implement drift reporting
        pass

class PerformanceMonitor:
    """Monitors model performance over time."""
    
    def __init__(self, baseline_metrics: Dict):
        self.baseline_metrics = baseline_metrics
        self.performance_history: List[Dict] = []
        self.alert_thresholds = {
            "accuracy_drop": 0.05,
            "latency_increase": 2.0
        }
    
    def record_performance(self, metrics: Dict):
        """Record performance metrics."""
        # TODO: Implement performance recording
        pass
    
    def detect_degradation(self) -> List[str]:
        """Detect performance degradation."""
        # TODO: Implement degradation detection
        pass
    
    def should_trigger_alert(self) -> bool:
        """Check if alert should be triggered."""
        # TODO: Implement alert logic
        pass

# Test drift detection
# reference = np.random.normal(0, 1, 1000)
# detector = DriftDetector(reference)
# for i in range(100):
#     # Simulate drift
#     sample = np.random.normal(0.5, 1, 1)
#     detector.add_production_sample(sample)
# drift_detected, p_value = detector.detect_data_drift()
# print(f"Drift detected: {drift_detected}, p-value: {p_value}")
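
The `alert_thresholds` in `PerformanceMonitor` could be applied roughly as follows. This sketch assumes that `accuracy_drop` is an absolute drop in accuracy, that `latency_increase` is a multiple of the baseline latency, and that metrics use the keys `"accuracy"` and `"latency_ms"`. None of these interpretations is fixed by the stub above.

```python
from typing import Dict, List


def detect_degradation(baseline: Dict[str, float], current: Dict[str, float],
                       accuracy_drop: float = 0.05,
                       latency_increase: float = 2.0) -> List[str]:
    """Return human-readable alerts; an empty list means no degradation."""
    alerts: List[str] = []
    drop = baseline["accuracy"] - current["accuracy"]
    if drop > accuracy_drop:
        alerts.append(f"accuracy dropped by {drop:.3f} (limit {accuracy_drop})")
    limit_ms = baseline["latency_ms"] * latency_increase
    if current["latency_ms"] > limit_ms:
        alerts.append(
            f"latency {current['latency_ms']:.1f} ms exceeds {limit_ms:.1f} ms"
        )
    return alerts


baseline = {"accuracy": 0.92, "latency_ms": 40.0}
for alert in detect_degradation(baseline, {"accuracy": 0.85, "latency_ms": 95.0}):
    print(alert)
```

Accuracy can only be monitored this way when ground-truth labels eventually arrive; until then, drift in the inputs or in prediction confidence serves as a proxy.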

---

## Part 5: Cloud Deployment

### Deployment Options

**Cloud Providers:**
- AWS (SageMaker, EC2, Lambda)
- Google Cloud (Vertex AI, Cloud Run)
- Azure (ML, Container Instances)
- Hugging Face Spaces

### Deployment Strategies
1. **Blue-Green**: Two identical environments
2. **Canary**: Gradual rollout
3. **Rolling**: Update incrementally
4. **A/B Testing**: Compare versions
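
A canary rollout needs a stable way to decide which requests reach the new version. A common approach, sketched below with the standard library, hashes a user identifier into one of 100 buckets. The function name and the `"canary"`/`"stable"` labels are hypothetical.

```python
import hashlib


def route_version(user_id: str, canary_percent: int) -> str:
    """Deterministically send about canary_percent% of users to the new version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "canary" if bucket < canary_percent else "stable"


users = [f"user-{i}" for i in range(10_000)]
share = sum(route_version(u, 10) == "canary" for u in users) / len(users)
print(f"canary share: {share:.1%}")
```

Because the bucket depends only on the user ID, a user who sees the canary at 10% keeps seeing it when the rollout widens to 20%, which avoids flip-flopping between versions.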

### TODO 5.1: Deploy to Cloud

In [None]:
# Deployment configuration examples
from typing import Dict, List  # used by DeploymentManager below

# AWS deployment with Terraform (HCL); CloudFormation would use a YAML/JSON template instead
aws_deployment_config = '''
# TODO: Create deployment configuration
# Example: Deploy to AWS ECS/Fargate

resource "aws_ecs_service" "ai_service" {
  name            = "ai-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.ai_service.arn
  desired_count   = 2
  
  load_balancer {
    target_group_arn = aws_lb_target_group.ai_service.arn
    container_name   = "ai-service"
    container_port   = 8000
  }
}
'''

# Kubernetes deployment
k8s_deployment_config = '''
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-service
  template:
    metadata:
      labels:
        app: ai-service
    spec:
      containers:
      - name: ai-service
        image: your-registry/ai-service:latest
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        env:
        - name: MODEL_PATH
          value: "/models/model.pkl"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
'''

class DeploymentManager:
    """Manages deployment to cloud platforms."""
    
    def __init__(self, platform: str):
        self.platform = platform
        self.deployment_history: List[Dict] = []
    
    def deploy(self, image_tag: str, config: Dict) -> str:
        """Deploy service to cloud."""
        # TODO: Implement deployment logic
        pass
    
    def rollback(self, deployment_id: str):
        """Rollback to previous deployment."""
        # TODO: Implement rollback
        pass
    
    def health_check(self, endpoint: str) -> bool:
        """Check deployment health."""
        # TODO: Implement health check
        pass

print("Deployment configurations created!")
print("Review and customize for your cloud provider.")
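
After a deploy, `DeploymentManager.health_check` usually has to tolerate a service that is still starting. The sketch below retries a probe with exponential backoff. The `wait_until_healthy` name and its parameters are hypothetical, and the probe is injected so the logic can be exercised without a network.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry `probe` with exponential backoff; True as soon as it succeeds."""
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            pass  # treat connection errors as "not healthy yet"
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return False


# Simulated service that becomes healthy on the third probe.
responses = iter([False, False, True])
print(wait_until_healthy(lambda: next(responses), sleep=lambda s: None))
```

A real probe might call `urllib.request.urlopen` on the `/health` endpoint and check for status 200; if `wait_until_healthy` returns `False`, that is a natural trigger for `rollback`.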

---

## Part 6: Building the Complete Production Service

### System Architecture

```
Load Balancer
     |
     v
API Gateway (FastAPI)
     |
     +-- Model Service
     +-- Logging
     +-- Metrics Collection
     +-- Drift Detection
     +-- Cache (Redis)
```

### Production Checklist
- [ ] API with proper error handling
- [ ] Docker containerization
- [ ] Structured logging
- [ ] Metrics collection
- [ ] Drift detection
- [ ] Health checks
- [ ] Auto-scaling
- [ ] Monitoring dashboard

### TODO 6.1: Integrate All Components

In [None]:
from fastapi import FastAPI, BackgroundTasks
from typing import Dict
import asyncio

class ProductionAIService:
    """Complete production AI service."""
    
    def __init__(self, config: Dict):
        self.config = config
        self.app = FastAPI(title="Production AI Service")
        self.model_service = AIModelService(config['model_path'])
        self.logger = ProductionLogger("ai-service")
        self.metrics = MetricsCollector()
        self.drift_detector = None  # Initialize with reference data
        self._setup_routes()
    
    def _setup_routes(self):
        """Setup all API routes."""
        # TODO: Setup all endpoints
        pass
    
    async def startup(self):
        """Initialize service on startup."""
        # TODO: Implement startup sequence
        # 1. Load model
        # 2. Initialize monitoring
        # 3. Load reference data for drift detection
        # 4. Start background tasks
        pass
    
    async def background_monitoring(self):
        """Background task for monitoring."""
        # TODO: Implement background monitoring
        # Periodically check drift, performance, etc.
        pass
    
    async def process_prediction_request(self, request: PredictionRequest) -> PredictionResponse:
        """Process prediction with full monitoring."""
        # TODO: Implement end-to-end request processing
        # 1. Log request
        # 2. Make prediction
        # 3. Record metrics
        # 4. Check drift
        # 5. Log response
        # 6. Return result
        pass
    
    def get_service_status(self) -> Dict:
        """Get comprehensive service status."""
        # TODO: Implement status reporting
        pass

# Initialize production service
# config = {
#     'model_path': 'models/model.pkl',
#     'log_level': 'INFO',
#     'drift_detection_enabled': True
# }
# service = ProductionAIService(config)
# Run with: uvicorn service:service.app --host 0.0.0.0 --port 8000
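
The `background_monitoring` task could be built on a small periodic loop like the one sketched below, using only `asyncio`. The helper name `run_periodically` is hypothetical. The `check` coroutine is where drift and performance checks would go.

```python
import asyncio
from typing import Awaitable, Callable


async def run_periodically(check: Callable[[], Awaitable[None]],
                           interval_seconds: float,
                           stop: asyncio.Event) -> None:
    """Call `check` every `interval_seconds` until `stop` is set."""
    while not stop.is_set():
        try:
            await check()
        except Exception as exc:  # one failed check should not kill the loop
            print(f"monitoring check failed: {exc!r}")
        try:
            # Sleep, but wake up immediately if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass


async def demo_monitoring() -> int:
    stop = asyncio.Event()
    calls = 0

    async def check() -> None:
        nonlocal calls
        calls += 1
        if calls == 3:  # stand-in for a shutdown signal
            stop.set()

    await run_periodically(check, interval_seconds=0.01, stop=stop)
    return calls
```

In the service, the loop would be started with `asyncio.create_task(...)` during startup and stopped by setting the event during shutdown, so that it never blocks request handling.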

---

## Summary and Next Steps

### What You've Learned
- Building production APIs with FastAPI
- Dockerizing AI applications
- Production logging and monitoring
- Drift detection and performance monitoring
- Cloud deployment strategies
- Building complete production AI services

### Next Week Preview
Week 12 will cover the **Capstone System**, where you'll:
- Design end-to-end AI agent platforms
- Handle failure scenarios
- Implement reliability patterns
- Build a production-ready AI agent platform

### Further Practice
1. Deploy your service to a cloud provider
2. Set up monitoring dashboards (Grafana)
3. Implement A/B testing
4. Add authentication and rate limiting
5. Create CI/CD pipelines

---

**Great job on completing Week 11!** 🎉