# Notebook 13: Production Deployment

## Best Practices for Agentic AI using MCP

---

**Series:** MCP Server Best Practices - Agentic AI Workflows  
**Notebook:** 13 of 13  
**Level:** Advanced  
**Duration:** ~75 minutes

---

## Learning Objectives

By the end of this notebook, you will:

1. Containerize MCP servers for production
2. Deploy MCP infrastructure on Kubernetes
3. Set up CI/CD pipelines for MCP servers
4. Implement production monitoring and alerting
5. Design for high availability and disaster recovery
6. Apply security best practices for production

## 1. Production Deployment Overview

### The Production Journey

```
┌──────────────────────────────────────────────────────────────────────────┐
│                    MCP PRODUCTION DEPLOYMENT JOURNEY                     │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   LOCAL DEV          STAGING              PRODUCTION                     │
│   ═════════          ═══════              ══════════                     │
│                                                                          │
│   ┌─────────┐       ┌─────────┐          ┌─────────────────────┐         │
│   │ Claude  │       │  Test   │          │    LOAD BALANCER    │         │
│   │ Desktop │       │ Clients │          └──────────┬──────────┘         │
│   └────┬────┘       └────┬────┘                     │                    │
│        │                 │                          │                    │
│        ▼                 ▼                          ▼                    │
│   ┌─────────┐       ┌─────────┐          ┌─────────────────────┐         │
│   │   MCP   │       │   MCP   │          │    MCP GATEWAY      │         │
│   │ Server  │       │ Server  │          │    (HA Cluster)     │         │
│   │ (local) │       │(Docker) │          └──────────┬──────────┘         │
│   └─────────┘       └─────────┘                     │                    │
│                                          ┌──────────┼──────────┐         │
│   • Single process  • Containerized      │          │          │         │
│   • No auth         • Basic auth         ▼          ▼          ▼         │
│   • Console logs    • Structured logs  ┌────┐     ┌────┐     ┌────┐      │
│                                        │MCP │     │MCP │     │MCP │      │
│                                        │ 1  │     │ 2  │     │ 3  │      │
│                                        └────┘     └────┘     └────┘      │
│                                                                          │
│                                        • Kubernetes orchestration        │
│                                        • Full observability              │
│                                        • Auto-scaling                    │
│                                        • Disaster recovery               │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
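
One way to keep the three stages in the diagram above consistent is to drive them from a single settings profile chosen at startup. The sketch below assumes a hypothetical `MCP_ENV` variable and a hypothetical `DeploymentProfile` whose fields mirror the diagram's bullets; the transport values (stdio locally, HTTP elsewhere) are an assumption about a typical setup rather than something the diagram specifies.

```python
import os
from dataclasses import dataclass


# Hypothetical per-environment settings; field names are illustrative,
# not part of any MCP SDK.
@dataclass(frozen=True)
class DeploymentProfile:
    name: str
    transport: str      # assumed: "stdio" for a local client, "http" behind a network hop
    auth_required: bool
    log_format: str     # "console" for humans, "json" for log pipelines
    min_replicas: int


# One profile per column of the deployment-journey diagram.
PROFILES = {
    "local": DeploymentProfile("local", "stdio", False, "console", 1),
    "staging": DeploymentProfile("staging", "http", True, "json", 1),
    "production": DeploymentProfile("production", "http", True, "json", 3),
}


def load_profile(env_var: str = "MCP_ENV") -> DeploymentProfile:
    """Pick a profile from an environment variable, defaulting to local."""
    name = os.environ.get(env_var, "local").strip().lower()
    if name not in PROFILES:
        raise ValueError(f"unknown environment {name!r}; expected one of {sorted(PROFILES)}")
    return PROFILES[name]
```

Failing fast on an unknown name keeps a typo such as `prod` from silently falling back to the unauthenticated local profile.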

In [None]:
# Setup for this notebook
import json
from typing import Dict, List, Any, Optional
from dataclasses import dataclass, field
from datetime import datetime

def pprint(obj: dict, title: Optional[str] = None):
    if title:
        print(f"\n{'='*60}")
        print(f" {title}")
        print(f"{'='*60}")
    print(json.dumps(obj, indent=2, default=str))

## 2. Containerizing MCP Servers

### Dockerfile Best Practices

In [None]:
# Production Dockerfile for MCP Server
dockerfile_content = '''
# ============================================================
# Production Dockerfile for MCP Server
# ============================================================

# Build stage
FROM python:3.11-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \\
    build-essential \\
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ============================================================
# Production stage
FROM python:3.11-slim AS production

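# Security: run as a non-root user with a fixed UID/GID (1000) so it
# matches runAsUser/fsGroup in the Kubernetes manifest
RUN groupadd -r -g 1000 mcp && useradd -r -u 1000 -g mcp mcp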

WORKDIR /app

# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy application code. Files stay root-owned, so the non-root runtime
# user can read and execute them but cannot modify them
COPY . .

# Security: remove write permission from the application tree
RUN chmod -R a-w /app

# Environment variables
ENV PYTHONUNBUFFERED=1 \\
    PYTHONDONTWRITEBYTECODE=1 \\
    MCP_LOG_LEVEL=INFO \\
    MCP_LOG_FORMAT=json

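# Health check (used by Docker and Compose; Kubernetes ignores image
# HEALTHCHECKs and relies on the probes in its own manifest)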
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \\
    CMD python -c "import urllib.request; urllib.request.urlopen(\'http://localhost:8000/health\')" || exit 1

# Switch to non-root user
USER mcp

# Expose port for HTTP transport
EXPOSE 8000

# Run the MCP server
CMD ["python", "-m", "mcp_server", "--host", "0.0.0.0", "--port", "8000"]
'''

print("Production Dockerfile:")
print("=" * 60)
print(dockerfile_content)
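
The `HEALTHCHECK` above only works if the server answers `GET /health` on port 8000. A minimal standard-library sketch of such endpoints follows; the `/health/live` and `/health/ready` split and the `READY` flag are assumptions for illustration, and a real MCP server would more likely mount these routes on the same HTTP framework that serves its transport.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag, set once dependencies (tokens, DB pools, ...) are initialised.
READY = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in ("/health", "/health/live"):
            # Liveness: the process is up and able to answer requests.
            status, body = 200, {"status": "ok"}
        elif self.path == "/health/ready":
            # Readiness: only accept traffic once initialisation has finished.
            ok = READY.is_set()
            status, body = (200 if ok else 503), {"ready": ok}
        else:
            status, body = 404, {"error": "not found"}
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep frequent probe traffic out of the logs


def start_health_server(host: str = "0.0.0.0", port: int = 8000) -> ThreadingHTTPServer:
    """Serve the health endpoints on a background daemon thread."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Returning 503 until `READY` is set keeps a slow-starting container from receiving traffic before it can actually serve tool calls.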

In [None]:
# Docker Compose for local development
docker_compose_content = '''

services:
  # MCP Gateway
  mcp-gateway:
    build:
      context: ./gateway
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
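    environment:
      - MCP_LOG_LEVEL=DEBUG
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      # No keycloak service is defined in this file; point this at your identity provider
      - AUTH_ISSUER=http://keycloak:8080/realms/mcp
    depends_on:
      - github-mcp
      - database-mcp
    healthcheck:
      # Assuming the gateway image is python:3.11-slim based like the Dockerfile
      # above, it ships without curl, so probe with Python instead
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3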

  # GitHub MCP Server
  github-mcp:
    build:
      context: ./servers/github
      dockerfile: Dockerfile
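    environment:
      # Read the token from the mounted secret file (Compose mounts secrets
      # under /run/secrets/) instead of passing it as a plain env var;
      # assumes the server honours a *_FILE variable
      - GITHUB_TOKEN_FILE=/run/secrets/github_token
      - MCP_LOG_LEVEL=INFO
    secrets:
      - github_token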

  # Database MCP Server
  database-mcp:
    build:
      context: ./servers/database
      dockerfile: Dockerfile
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/app
    depends_on:
      - postgres

  # PostgreSQL Database
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=app
    volumes:
      - postgres_data:/var/lib/postgresql/data

  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./observability/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP

  # Prometheus
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./observability/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  # Grafana
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  postgres_data:
  grafana_data:

secrets:
  github_token:
    file: ./secrets/github_token.txt
'''

print("Docker Compose for Development:")
print("=" * 60)
print(docker_compose_content)
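
The compose file mounts `github_token` as a Docker secret, which Compose exposes inside the container as a file under `/run/secrets/`. A sketch of how a server might prefer that file over a plain environment variable, assuming a `<NAME>_FILE` convention (the same pattern the official `postgres` image uses for `POSTGRES_PASSWORD_FILE`); `read_secret` is a hypothetical helper, not an MCP SDK function.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Resolve a secret, preferring mounted files over plain environment variables.

    Lookup order (a convention assumed for this sketch):
      1. the file named by <NAME>_FILE, e.g. GITHUB_TOKEN_FILE
      2. <secrets_dir>/<name>, where Compose mounts entries listed under `secrets:`
      3. the <NAME> environment variable, as a local-development fallback
    """
    env_name = name.upper()
    explicit = os.environ.get(f"{env_name}_FILE")
    if explicit:
        # A configured path that does not exist is a deployment error, so let it raise.
        return Path(explicit).read_text(encoding="utf-8").strip()
    mounted = Path(secrets_dir) / name.lower()
    if mounted.is_file():
        # strip() drops the trailing newline most editors add to secret files.
        return mounted.read_text(encoding="utf-8").strip()
    return os.environ.get(env_name)
```

Keeping the env-var fallback last means a developer can still run the server with `GITHUB_TOKEN=...` locally, while containers never need the raw token in their environment.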

## 3. Kubernetes Deployment

### MCP Server Kubernetes Resources

In [None]:
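# Kubernetes Deployment for MCP Server.
# Once the HPA defined in the next cell manages this Deployment, consider
# omitting "replicas" from the manifest so re-applying it does not reset
# the replica count the HPA has chosen.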
k8s_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "github-mcp-server",
        "namespace": "mcp-system",
        "labels": {
            "app": "github-mcp",
            "version": "v1.0.0"
        }
    },
    "spec": {
        "replicas": 3,
        "selector": {
            "matchLabels": {
                "app": "github-mcp"
            }
        },
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {
                "maxSurge": 1,
                "maxUnavailable": 0
            }
        },
        "template": {
            "metadata": {
                "labels": {
                    "app": "github-mcp"
                },
                "annotations": {
                    "prometheus.io/scrape": "true",
                    "prometheus.io/port": "9090"
                }
            },
            "spec": {
                "serviceAccountName": "mcp-server",
                "securityContext": {
                    "runAsNonRoot": True,
                    "runAsUser": 1000,
                    "fsGroup": 1000
                },
                "containers": [
                    {
                        "name": "mcp-server",
                        "image": "registry.company.com/mcp/github-server:v1.0.0",
                        "imagePullPolicy": "Always",
                        "ports": [
                            {"containerPort": 8000, "name": "http"},
                            {"containerPort": 9090, "name": "metrics"}
                        ],
                        "env": [
                            {
                                "name": "MCP_LOG_LEVEL",
                                "value": "INFO"
                            },
                            {
                                "name": "GITHUB_TOKEN",
                                "valueFrom": {
                                    "secretKeyRef": {
                                        "name": "github-credentials",
                                        "key": "token"
                                    }
                                }
                            },
                            {
                                "name": "OTEL_EXPORTER_OTLP_ENDPOINT",
                                "value": "http://otel-collector.observability:4317"
                            }
                        ],
                        "resources": {
                            "requests": {
                                "cpu": "250m",
                                "memory": "256Mi"
                            },
                            "limits": {
                                "cpu": "1000m",
                                "memory": "1Gi"
                            }
                        },
                        "livenessProbe": {
                            "httpGet": {
                                "path": "/health/live",
                                "port": 8000
                            },
                            "initialDelaySeconds": 10,
                            "periodSeconds": 15,
                            "timeoutSeconds": 5,
                            "failureThreshold": 3
                        },
                        "readinessProbe": {
                            "httpGet": {
                                "path": "/health/ready",
                                "port": 8000
                            },
                            "initialDelaySeconds": 5,
                            "periodSeconds": 10,
                            "timeoutSeconds": 5,
                            "failureThreshold": 3
                        },
                        "securityContext": {
                            "readOnlyRootFilesystem": True,
                            "allowPrivilegeEscalation": False,
                            "capabilities": {
                                "drop": ["ALL"]
                            }
                        },
                        "volumeMounts": [
                            {
                                "name": "tmp",
                                "mountPath": "/tmp"
                            },
                            {
                                "name": "config",
                                "mountPath": "/etc/mcp",
                                "readOnly": True
                            }
                        ]
                    }
                ],
                "volumes": [
                    {
                        "name": "tmp",
                        "emptyDir": {}
                    },
                    {
                        "name": "config",
                        "configMap": {
                            "name": "github-mcp-config"
                        }
                    }
                ],
                "affinity": {
                    "podAntiAffinity": {
                        "preferredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "weight": 100,
                                "podAffinityTerm": {
                                    "labelSelector": {
                                        "matchLabels": {
                                            "app": "github-mcp"
                                        }
                                    },
                                    "topologyKey": "kubernetes.io/hostname"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

print("Kubernetes Deployment:")
print("=" * 60)
print(json.dumps(k8s_deployment, indent=2))
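
With `maxUnavailable: 0`, a rolling update removes an old pod only after its replacement is ready; Kubernetes then sends the old container SIGTERM and waits `terminationGracePeriodSeconds` (30 seconds unless set) before killing it. The sketch below shows one way to use that window: stop reporting ready, refuse new work, and wait for in-flight tool calls to finish. `Drainer` and its methods are hypothetical names, not an MCP SDK API.

```python
import signal
import threading


class Drainer:
    """Hypothetical helper: tracks in-flight requests and drains them on SIGTERM."""

    def __init__(self) -> None:
        self.shutting_down = threading.Event()
        self._idle = threading.Condition()
        self._in_flight = 0

    def install(self) -> None:
        # Python only allows installing signal handlers from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: just flip the flag the rest of the server reads.
        self.shutting_down.set()

    def is_ready(self) -> bool:
        """Back the readiness probe with this: False once draining starts."""
        return not self.shutting_down.is_set()

    def begin(self) -> bool:
        """Call before handling a request; False means reject it (draining)."""
        with self._idle:
            if self.shutting_down.is_set():
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        """Call when a request finishes, successfully or not."""
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def wait_drained(self, timeout: float) -> bool:
        """Block until nothing is in flight; False if `timeout` seconds elapse first."""
        with self._idle:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout=timeout)
```

In a real server, `is_ready()` would back the `/health/ready` probe, each tool call would be wrapped in `begin()`/`end()`, and after SIGTERM the main loop would call `wait_drained()` with a timeout comfortably below the grace period before exiting.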

In [None]:
# Kubernetes Service and HPA
k8s_service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "name": "github-mcp-server",
        "namespace": "mcp-system"
    },
    "spec": {
        "selector": {
            "app": "github-mcp"
        },
        "ports": [
            {
                "name": "http",
                "port": 80,
                "targetPort": 8000
            },
            {
                "name": "metrics",
                "port": 9090,
                "targetPort": 9090
            }
        ],
        "type": "ClusterIP"
    }
}

k8s_hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {
        "name": "github-mcp-server",
        "namespace": "mcp-system"
    },
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "github-mcp-server"
        },
        "minReplicas": 3,
        "maxReplicas": 10,
        "metrics": [
            {
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": 70
                    }
                }
            },
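            # Custom per-pod metric: requires a custom metrics API adapter
            # (e.g. Prometheus Adapter) that exposes mcp_requests_per_second
            {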
                "type": "Pods",
                "pods": {
                    "metric": {
                        "name": "mcp_requests_per_second"
                    },
                    "target": {
                        "type": "AverageValue",
                        "averageValue": "100"
                    }
                }
            }
        ],
        "behavior": {
            "scaleDown": {
                "stabilizationWindowSeconds": 300,
                "policies": [
                    {
                        "type": "Percent",
                        "value": 10,
                        "periodSeconds": 60
                    }
                ]
            },
            "scaleUp": {
                "stabilizationWindowSeconds": 0,
                "policies": [
                    {
                        "type": "Percent",
                        "value": 100,
                        "periodSeconds": 15
                    },
                    {
                        "type": "Pods",
                        "value": 4,
                        "periodSeconds": 15
                    }
                ],
                "selectPolicy": "Max"
            }
        }
    }
}

print("Kubernetes Service:")
print("=" * 60)
print(json.dumps(k8s_service, indent=2))
print("\n" + "=" * 60)
print("Horizontal Pod Autoscaler:")
print("=" * 60)
print(json.dumps(k8s_hpa, indent=2))
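
The HPA's core rule, per the Kubernetes documentation, is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, evaluated per metric: ratios within a tolerance of the target (0.1 by default) cause no change, the largest recommendation across metrics wins, and the result is clamped to `minReplicas`/`maxReplicas`. The function below is a simplified sketch of that arithmetic only; it does not model the `behavior` policies, stabilization windows, or unready pods.

```python
import math
from typing import Iterable, Tuple


def desired_replicas(
    current: int,
    metrics: Iterable[Tuple[float, float]],  # (current value, target value) per metric
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA recommendation; ignores behavior policies and stabilization."""
    proposals = []
    for current_value, target_value in metrics:
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # close enough to target: no change
        else:
            proposals.append(math.ceil(current * ratio))
    # With several metrics, the HPA acts on the largest recommendation.
    desired = max(proposals, default=current)
    return max(min_replicas, min(max_replicas, desired))


# 3 pods averaging 105% CPU against a 70% target, and 120 req/s per pod against 100:
print(desired_replicas(3, [(105, 70), (120, 100)], min_replicas=3, max_replicas=10))  # → 5
```

Here the CPU metric proposes 5 pods and the request-rate metric 4, so the larger value wins; the `scaleUp` policies in the manifest would then cap how fast that target is reached.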

## 4. CI/CD Pipeline

### GitHub Actions for MCP Servers

In [None]:
# GitHub Actions CI/CD Pipeline
github_actions_workflow = '''
name: MCP Server CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # ============================================
  # Test Job
  # ============================================
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
      
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-dev.txt
      
      - name: Run linting
        run: |
          ruff check .
          mypy .
      
      - name: Run unit tests
        run: pytest tests/unit -v --cov=mcp_server --cov-report=xml
      
      - name: Run integration tests
        run: pytest tests/integration -v
      
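      - name: Test MCP Inspector compatibility
        run: |
          npm install -g @modelcontextprotocol/inspector
          # '|| true' means this step can never fail the build: treat it as a
          # non-blocking smoke check and rely on the pytest suites as the gate
          timeout 30 npx @modelcontextprotocol/inspector python -m mcp_server --test || true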
      
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml

  # ============================================
  # Security Scan Job
  # ============================================
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Trivy vulnerability scanner
        # Pin to a release tag or commit SHA instead of @master for reproducible, auditable builds
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Run Bandit security linter
        run: |
          pip install bandit
          bandit -r mcp_server/ -ll

  # ============================================
  # Build Job
  # ============================================
  build:
    needs: [test, security]
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
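          # format=long tags the image with the full commit SHA, which is the
          # tag the deploy jobs reference via github.sha
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=sha,prefix=,format=long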
      
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # ============================================
  # Deploy to Staging
  # ============================================
  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
      
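      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > "$RUNNER_TEMP/kubeconfig"
          # 'export' only lasts for this step; GITHUB_ENV carries KUBECONFIG to later steps
          echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"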
      
      - name: Deploy to staging
        run: |
          kubectl set image deployment/mcp-server \\
            mcp-server=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \\
            -n mcp-staging
          kubectl rollout status deployment/mcp-server -n mcp-staging

  # ============================================
  # Deploy to Production
  # ============================================
  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up kubectl
        uses: azure/setup-kubectl@v3
      
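      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG_PRODUCTION }}" | base64 -d > "$RUNNER_TEMP/kubeconfig"
          # 'export' only lasts for this step; GITHUB_ENV carries KUBECONFIG to later steps
          echo "KUBECONFIG=$RUNNER_TEMP/kubeconfig" >> "$GITHUB_ENV"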
      
      - name: Deploy to production
        run: |
          kubectl set image deployment/mcp-server \\
            mcp-server=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \\
            -n mcp-production
          kubectl rollout status deployment/mcp-server -n mcp-production --timeout=5m
      
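      - name: Verify deployment
        run: |
          # Health check from inside a running pod; python:3.11-slim has no curl,
          # so use the Python interpreter the image already ships
          kubectl exec -n mcp-production deploy/mcp-server -- \\
            python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"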
'''

print("GitHub Actions CI/CD Pipeline:")
print("=" * 60)
print(github_actions_workflow)
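
`kubectl rollout status` waits for pods to pass their readiness probes, but the pipeline's final check hits the health endpoint exactly once. A retrying smoke check is more forgiving of a pod that is ready but still warming up. A standard-library sketch, assuming the endpoint is reachable from wherever the script runs (for example through `kubectl port-forward` or an internal URL); the function name and defaults are illustrative.

```python
import time
import urllib.request


def wait_for_healthy(url: str, attempts: int = 5, delay: float = 2.0, timeout: float = 5.0) -> bool:
    """Return True once `url` answers HTTP 200, retrying with exponential backoff."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (non-2xx), refused connections and timeouts
            # are all OSError subclasses: treat them as "not healthy yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay * 2 ** attempt)  # 2s, 4s, 8s, ... with the defaults
    return False
```

In a pipeline step, `sys.exit(0 if wait_for_healthy(url) else 1)` turns the result into a pass or fail status.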

## 5. Production Monitoring

### Alerting Rules

In [None]:
# Prometheus Alerting Rules for MCP
prometheus_alerts = {
    "groups": [
        {
            "name": "mcp-server-alerts",
            "rules": [
                {
                    "alert": "MCPServerDown",
                    "expr": 'up{job="mcp-server"} == 0',
                    "for": "1m",
                    "labels": {
                        "severity": "critical"
                    },
                    "annotations": {
                        "summary": "MCP Server is down",
                        "description": "MCP Server {{ $labels.instance }} has been down for more than 1 minute."
                    }
                },
                {
                    "alert": "MCPHighErrorRate",
                    "expr": 'sum by (instance) (rate(mcp_requests_failed_total[5m])) / sum by (instance) (rate(mcp_requests_total[5m])) > 0.05',
                    "for": "5m",
                    "labels": {
                        "severity": "warning"
                    },
                    "annotations": {
                        "summary": "High MCP error rate",
                        "description": "Error rate is {{ $value | humanizePercentage }} for {{ $labels.instance }}"
                    }
                },
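                {
                    "alert": "MCPHighLatency",
                    # sum by (le, ...) merges buckets across label sets before estimating the quantile
                    "expr": 'histogram_quantile(0.95, sum by (le, instance) (rate(mcp_tool_duration_seconds_bucket[5m]))) > 2',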
                    "for": "5m",
                    "labels": {
                        "severity": "warning"
                    },
                    "annotations": {
                        "summary": "High MCP latency",
                        "description": "95th percentile latency is {{ $value }}s for {{ $labels.instance }}"
                    }
                },
                {
                    "alert": "MCPRateLimitExceeded",
                    "expr": 'rate(mcp_rate_limit_exceeded_total[1m]) > 0',
                    "for": "1m",
                    "labels": {
                        "severity": "info"
                    },
                    "annotations": {
                        "summary": "MCP rate limit exceeded",
                        "description": "Rate limit exceeded for {{ $labels.tenant_id }}"
                    }
                },
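                {
                    "alert": "MCPHighMemoryUsage",
                    # working_set excludes reclaimable page cache, so it tracks OOM risk more closely than usage_bytes
                    "expr": 'container_memory_working_set_bytes{container="mcp-server"} / container_spec_memory_limit_bytes{container="mcp-server"} > 0.85',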
                    "for": "5m",
                    "labels": {
                        "severity": "warning"
                    },
                    "annotations": {
                        "summary": "High memory usage",
                        "description": "Memory usage is {{ $value | humanizePercentage }} for {{ $labels.pod }}"
                    }
                },
                {
                    "alert": "MCPPodRestarts",
                    "expr": 'increase(kube_pod_container_status_restarts_total{container="mcp-server"}[1h]) > 3',
                    "for": "0m",
                    "labels": {
                        "severity": "warning"
                    },
                    "annotations": {
                        "summary": "MCP Pod restarting",
                        "description": "Pod {{ $labels.pod }} has restarted {{ $value }} times in the last hour"
                    }
                }
            ]
        }
    ]
}

print("Prometheus Alerting Rules:")
print("=" * 60)
print(json.dumps(prometheus_alerts, indent=2))
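
`MCPHighLatency` depends on `histogram_quantile`, which estimates a quantile from cumulative bucket counts: it finds the bucket containing the target rank and interpolates linearly inside it. The function below is a simplified Python rendition of that estimate (it skips Prometheus's NaN handling and its repair of non-monotonic buckets), useful for seeing why the result can only be as precise as the bucket boundaries.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with +Inf, like the `le`
    label on a Prometheus histogram.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan  # need at least one finite bucket plus +Inf
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for i, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Rank is beyond every finite bucket: cap at the largest finite bound.
                return buckets[i - 1][0]
            if i == 0 and bound <= 0:
                return bound
            if count == prev_count:
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return math.nan


buckets = [(0.5, 50), (1.0, 80), (2.0, 95), (5.0, 100), (float("inf"), 100)]
print(histogram_quantile(0.5, buckets))  # → 0.5
```

Because the estimate is interpolated within a bucket, a `> 2` second threshold is only meaningful if the histogram has bucket boundaries around 2 seconds, and any rank that lands in the `+Inf` bucket is reported as the largest finite bound.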

## 6. Production Checklist

### Pre-Production Checklist

```
┌──────────────────────────────────────────────────────────────────────────┐
│                    MCP PRODUCTION CHECKLIST                              │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   SECURITY                                                               │
│   ════════                                                               │
│   ☐ Authentication configured (OAuth 2.0 / API Keys)                     │
│   ☐ TLS/HTTPS enabled                                                    │
│   ☐ Secrets stored in vault (not env vars)                               │
│   ☐ Container runs as non-root                                           │
│   ☐ Network policies configured                                          │
│   ☐ Security scanning in CI/CD                                           │
│   ☐ Input validation on all tools                                        │
│                                                                          │
│   RELIABILITY                                                            │
│   ═══════════                                                            │
│   ☐ Health checks configured (liveness + readiness)                      │
│   ☐ Multiple replicas (min 3 for production)                             │
│   ☐ Pod anti-affinity rules                                              │
│   ☐ Resource limits set                                                  │
│   ☐ Horizontal Pod Autoscaler configured                                 │
│   ☐ PodDisruptionBudget defined                                          │
│   ☐ Graceful shutdown implemented                                        │
│                                                                          │
│   OBSERVABILITY                                                          │
│   ═════════════                                                          │
│   ☐ Structured logging (JSON to stderr)                                  │
│   ☐ Metrics endpoint exposed                                             │
│   ☐ Distributed tracing enabled                                          │
│   ☐ Alerting rules configured                                            │
│   ☐ Dashboards created                                                   │
│   ☐ Log aggregation configured                                           │
│                                                                          │
│   OPERATIONS                                                             │
│   ══════════                                                             │
│   ☐ CI/CD pipeline tested                                                │
│   ☐ Rollback procedure documented                                        │
│   ☐ Backup strategy for stateful components                              │
│   ☐ Runbooks created for common issues                                   │
│   ☐ On-call rotation established                                         │
│   ☐ Disaster recovery plan tested                                        │
│                                                                          │
│   COMPLIANCE                                                             │
│   ══════════                                                             │
│   ☐ Audit logging enabled                                                │
│   ☐ Data retention policies configured                                   │
│   ☐ Access controls documented                                           │
│   ☐ Privacy requirements met                                             │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
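
Several of the Security and Reliability items can be checked mechanically against a Deployment manifest instead of being ticked by hand. A sketch, assuming manifests are available as Python dicts shaped like `k8s_deployment` in Section 3; `check_deployment` and its messages are illustrative and cover only a handful of items.

```python
from typing import Any, Dict, List


def check_deployment(manifest: Dict[str, Any], min_replicas: int = 3) -> List[str]:
    """Return the checklist problems found in a Kubernetes Deployment dict."""
    problems: List[str] = []
    spec = manifest.get("spec", {})
    pod = spec.get("template", {}).get("spec", {})

    # Kubernetes defaults replicas to 1 when omitted; if an HPA owns scaling,
    # check its minReplicas instead.
    if spec.get("replicas", 1) < min_replicas:
        problems.append(f"replicas < {min_replicas}")
    if not pod.get("securityContext", {}).get("runAsNonRoot"):
        problems.append("runAsNonRoot not set")
    if "podAntiAffinity" not in pod.get("affinity", {}):
        problems.append("no pod anti-affinity")

    for container in pod.get("containers", []):
        name = container.get("name", "<unnamed>")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                problems.append(f"{name}: missing {probe}")
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"{name}: no {resource} limit")
    return problems
```

Run against the `k8s_deployment` dict from Section 3, this should return an empty list; its findings could then drive `mark_complete` for the matching items.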

```python
# Production Checklist Validator

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChecklistItem:
    """A single production-readiness checklist item."""
    category: str
    item: str
    description: str
    critical: bool = True
    completed: bool = False


class ProductionChecklist:
    """Production readiness checklist."""

    def __init__(self):
        self.items: List[ChecklistItem] = []
        self._setup_default_items()

    def _setup_default_items(self):
        """Populate default items as (name, description, critical) tuples per category."""
        security_items = [
            ("Authentication configured", "OAuth 2.0 or API key auth enabled", True),
            ("TLS/HTTPS enabled", "All traffic encrypted in transit", True),
            ("Non-root container", "Container runs as non-root user", True),
            ("Input validation", "All tool inputs validated", True),
            ("Security scanning", "Trivy/Bandit in CI/CD", False),
        ]

        reliability_items = [
            ("Health checks", "Liveness and readiness probes configured", True),
            ("Multiple replicas", "Minimum 3 replicas for production", True),
            ("Resource limits", "CPU and memory limits set", True),
            ("HPA configured", "Auto-scaling enabled", False),
            ("Graceful shutdown", "SIGTERM handling implemented", True),
        ]

        observability_items = [
            ("Structured logging", "JSON logs to stderr", True),
            ("Metrics endpoint", "Prometheus metrics exposed", True),
            ("Alerting rules", "Critical alerts configured", True),
            ("Tracing", "OpenTelemetry enabled", False),
        ]

        for item, desc, critical in security_items:
            self.items.append(ChecklistItem("Security", item, desc, critical))
        for item, desc, critical in reliability_items:
            self.items.append(ChecklistItem("Reliability", item, desc, critical))
        for item, desc, critical in observability_items:
            self.items.append(ChecklistItem("Observability", item, desc, critical))

    def mark_complete(self, item_name: str):
        """Mark an item as complete; unknown names raise instead of being silently ignored."""
        for item in self.items:
            if item.item == item_name:
                item.completed = True
                return
        raise KeyError(f"Unknown checklist item: {item_name!r}")

    def is_ready(self) -> Tuple[bool, List[str]]:
        """Return (ready, missing): ready only when every critical item is complete."""
        missing_critical = [
            item.item for item in self.items
            if item.critical and not item.completed
        ]
        return len(missing_critical) == 0, missing_critical

    def print_status(self):
        """Print items by category: ✅ done, ❌ critical and missing, ⚠️ optional and missing."""
        current_category = None

        for item in self.items:
            if item.category != current_category:
                current_category = item.category
                print(f"\n{current_category.upper()}")
                print("-" * 40)

            status = "✅" if item.completed else ("❌" if item.critical else "⚠️")
            critical = " (critical)" if item.critical else ""
            print(f"{status} {item.item}{critical}")


# Demo
checklist = ProductionChecklist()

# Mark some items complete
checklist.mark_complete("Authentication configured")
checklist.mark_complete("TLS/HTTPS enabled")
checklist.mark_complete("Health checks")
checklist.mark_complete("Structured logging")

print("Production Readiness Check:")
print("=" * 60)
checklist.print_status()

ready, missing = checklist.is_ready()
print(f"\n{'=' * 60}")
print(f"Ready for production: {'✅ YES' if ready else '❌ NO'}")
if not ready:
    print(f"Missing critical items: {missing}")
```
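The checklist marks "Graceful shutdown" as critical. When Kubernetes terminates a pod it sends SIGTERM and waits up to `terminationGracePeriodSeconds` (30 seconds by default) before sending SIGKILL, so a server should stop accepting new work and drain what is in flight. Below is a minimal sketch using the standard `signal` and `threading` modules. `serve()` is a hypothetical stand-in for a real request loop, and the timer in the demo only simulates the signal that the kubelet would send.

```python
import signal
import threading
import time

# Set once SIGTERM arrives; the serving loop checks it between units of work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record the request; do the draining outside the handler."""
    shutdown_requested.set()


def install_handlers():
    # signal.signal must be called from the main thread of the main interpreter.
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(poll_interval: float = 0.1) -> str:
    """Hypothetical request loop: runs until SIGTERM, then drains and returns."""
    while not shutdown_requested.is_set():
        time.sleep(poll_interval)  # stand-in for accepting and handling requests
    # Drain here: finish in-flight tool calls, flush logs, close backend connections.
    return "drained"


if __name__ == "__main__":
    install_handlers()
    # Demo only: simulate the kubelet's SIGTERM after 0.2 s instead of waiting for a real one.
    threading.Timer(0.2, handle_sigterm, args=(signal.SIGTERM, None)).start()
    print(serve())  # drained
```

Keeping the handler to a single `Event.set()` avoids doing non-trivial work inside signal context; the drain logic runs in normal code once the loop notices the flag.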

## 7. Key Takeaways

### Production Deployment Summary

| Component | Best Practice |
|-----------|---------------|
| **Container** | Multi-stage build, non-root user, read-only filesystem |
| **Kubernetes** | 3+ replicas, HPA, pod anti-affinity, resource limits |
| **CI/CD** | Automated testing, security scanning, staged deployments |
| **Monitoring** | Prometheus metrics, alerting rules, distributed tracing |
| **Security** | OAuth 2.0, TLS, secrets management, input validation |
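The Container and Kubernetes rows of the table correspond to specific Deployment fields. The sketch below builds a Deployment as a plain Python dict and prints it as JSON, which `kubectl apply -f` accepts as readily as YAML. The helper name, labels, image tag, and resource numbers are placeholders; an HPA would be a separate `autoscaling/v2` object that is not shown here.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 3) -> dict:
    """Build a Deployment dict reflecting the table's Container and Kubernetes rows."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # 3+ replicas for production
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Prefer spreading replicas across nodes (pod anti-affinity)
                    "affinity": {
                        "podAntiAffinity": {
                            "preferredDuringSchedulingIgnoredDuringExecution": [{
                                "weight": 100,
                                "podAffinityTerm": {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                },
                            }]
                        }
                    },
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Placeholder numbers: size from load tests, not from this sketch
                        "resources": {
                            "requests": {"cpu": "250m", "memory": "256Mi"},
                            "limits": {"cpu": "1", "memory": "512Mi"},
                        },
                        # Non-root user, read-only filesystem, no privilege escalation
                        "securityContext": {
                            "runAsNonRoot": True,
                            "readOnlyRootFilesystem": True,
                            "allowPrivilegeEscalation": False,
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = deployment_manifest("mcp-server", "your-registry/mcp-server:1.0.0")
    print(json.dumps(manifest, indent=2))
```

"Preferred" anti-affinity still schedules pods when there are fewer nodes than replicas; the "required" variant would leave extra replicas pending instead.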

### Architecture Principles

1. **Design for Failure** - Assume components will fail; build resilience
2. **Observe Everything** - You can't fix what you can't see
3. **Automate Everything** - Manual processes are error-prone
4. **Security by Default** - Start secure, not "secure it later"
5. **Scale Horizontally** - Prefer more instances over bigger instances
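One concrete form of "Design for Failure" is to treat transient backend errors as expected and retry them with capped exponential backoff plus random jitter, so that many replicas do not retry in lockstep. Below is a minimal sketch that is not tied to any MCP SDK; `retry_with_backoff` and its defaults are assumptions. Retry only idempotent operations: a tool call with side effects may already have taken effect by the time the error surfaced.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last transient error
            # Full jitter: random delay in [0, min(max_delay, base_delay * 2**attempt)]
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_backend() -> str:
        # Simulated backend that fails twice, then succeeds
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("backend unavailable")
        return "ok"

    print(retry_with_backoff(flaky_backend), calls["n"])  # ok 3
```

Errors outside `retry_on` propagate immediately, and injecting `sleep` keeps the function testable without real delays.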

### Series Complete! üéâ

Congratulations on completing the **MCP Server Best Practices** series!

You've learned:
- MCP foundations and architecture
- Primitives: tools, resources, prompts
- Transport mechanisms (STDIO, HTTP, SSE)
- Building and testing MCP servers
- Agentic workflow patterns
- Gateway and enterprise patterns
- Observability and debugging
- Framework integration
- Production deployment

## 8. Exercises

### Exercise 1: Create Helm Chart

Create a Helm chart for deploying an MCP server:

```python
# Exercise 1: Helm Chart values.yaml

import json

helm_values = {
    "# Your Helm values here": "",
    "replicaCount": 3,
    "image": {
        "repository": "your-registry/mcp-server",
        "tag": "latest"  # placeholder: pin an immutable version tag in production
    },
    "# Add more values...": ""
}

# Complete the Helm chart structure
# - templates/deployment.yaml
# - templates/service.yaml
# - templates/hpa.yaml
# - templates/configmap.yaml
# - templates/secret.yaml

print("Create your Helm chart values.yaml:")
print(json.dumps(helm_values, indent=2))
```
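Before templating anything, it can help to check the values against the production rules from this notebook. Below is a minimal sketch of such a linter, assuming the key layout that `helm create` scaffolds (`replicaCount`, `image.tag`, `resources`, `securityContext`); `lint_helm_values` is a hypothetical helper, not a Helm feature. For real charts, `helm lint` and a `values.schema.json` file are the built-in options.

```python
from typing import Any, Dict, List


def lint_helm_values(values: Dict[str, Any]) -> List[str]:
    """Return production-readiness problems in a Helm values dict (empty list means OK)."""
    problems: List[str] = []
    if values.get("replicaCount", 1) < 3:
        problems.append("replicaCount should be at least 3")
    tag = values.get("image", {}).get("tag")
    if tag in (None, "", "latest"):
        problems.append("image.tag should be pinned to an immutable version, not 'latest'")
    if "limits" not in values.get("resources", {}):
        problems.append("resources.limits not set")
    if not values.get("securityContext", {}).get("runAsNonRoot", False):
        problems.append("securityContext.runAsNonRoot should be true")
    return problems


if __name__ == "__main__":
    draft = {"replicaCount": 3, "image": {"repository": "your-registry/mcp-server", "tag": "latest"}}
    for problem in lint_helm_values(draft):
        print("-", problem)
```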

### Exercise 2: Design Disaster Recovery

Design a disaster recovery plan for your MCP infrastructure:

```python
# Exercise 2: Disaster Recovery Plan

import json

dr_plan = {
    "rto": "4 hours",  # Recovery Time Objective: max acceptable downtime
    "rpo": "1 hour",   # Recovery Point Objective: max acceptable data loss
    "scenarios": {
        "single_pod_failure": {
            "detection": "Kubernetes detects failed liveness probes or lost pods",
            "recovery": "Kubelet restarts failed containers; ReplicaSet replaces lost pods",
            "estimated_time": "< 1 minute"
        },
        "availability_zone_failure": {
            "detection": "?",
            "recovery": "?",
            "estimated_time": "?"
        },
        "region_failure": {
            "detection": "?",
            "recovery": "?",
            "estimated_time": "?"
        },
        "data_corruption": {
            "detection": "?",
            "recovery": "?",
            "estimated_time": "?"
        }
    },
    "testing_schedule": "Quarterly"
}

print("Complete your Disaster Recovery Plan:")
print(json.dumps(dr_plan, indent=2))
```
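RTO and RPO become checkable once the durations are numbers. With periodic backups, worst-case data loss is roughly one backup interval (more if a backup fails), so the interval must not exceed the RPO; likewise, each scenario's estimated recovery time must fit within the RTO. Below is a minimal sketch; `parse_duration`, `check_rpo`, and `check_rto` are hypothetical helpers that only understand the "N minutes/hours/days" format used in `dr_plan`, so a value such as "< 1 minute" is rejected.

```python
import re
from datetime import timedelta

# Maps the unit words used in dr_plan to timedelta keyword arguments
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse durations such as '4 hours', '1 hour' or '15 minutes'."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(minute|hour|day)s?\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    amount, unit = float(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def check_rpo(backup_interval: str, rpo: str) -> bool:
    """Worst-case data loss is roughly one backup interval, so it must not exceed the RPO."""
    return parse_duration(backup_interval) <= parse_duration(rpo)


def check_rto(estimated_recovery: str, rto: str) -> bool:
    """A scenario meets the RTO if its estimated recovery time fits within it."""
    return parse_duration(estimated_recovery) <= parse_duration(rto)


if __name__ == "__main__":
    print(check_rpo("30 minutes", "1 hour"))   # True
    print(check_rpo("6 hours", "1 hour"))      # False
    print(check_rto("90 minutes", "4 hours"))  # True
```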

### Exercise 3: Create Runbook

Create a runbook for handling a common production incident:

```python
# Exercise 3: Incident Runbook

import json

runbook = {
    "title": "MCP Server High Error Rate",
    "severity": "P2",
    "symptoms": [
        "Alert: MCPHighErrorRate fired",
        "Users reporting tool failures",
        "Error rate > 5%"
    ],
    "diagnosis_steps": [
        "1. Check Grafana dashboard for error patterns",
        "2. Review recent deployments",
        "3. Check backend service health",
        "4. Review logs for error messages",
        "# Add more steps..."
    ],
    "remediation_steps": [
        "# Add your remediation steps"
    ],
    "escalation": {
        "after_15_min": "Page on-call engineer",
        "after_30_min": "Page team lead",
        "after_1_hour": "Escalate to management"
    },
    "post_incident": [
        "Create incident report",
        "Schedule post-mortem",
        "Update runbook with learnings"
    ]
}

print("Complete your Incident Runbook:")
print(json.dumps(runbook, indent=2))
```
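The runbook's symptom threshold and escalation map can also be expressed as small functions, which makes them easy to test or wire into tooling. Below is a minimal sketch; the function names are hypothetical, the 5% threshold and the 15/30/60-minute steps come from the runbook above, and in practice paging is usually handled by an alert routing or paging service rather than custom code.

```python
from typing import Dict, List

# Escalation steps from the runbook, keyed by minutes since the alert fired
ESCALATION: Dict[int, str] = {
    15: "Page on-call engineer",
    30: "Page team lead",
    60: "Escalate to management",
}


def error_rate_exceeded(errors: int, total: int, threshold: float = 0.05) -> bool:
    """Runbook symptom: error rate strictly above 5% (False when there is no traffic)."""
    return total > 0 and errors / total > threshold


def escalations_due(elapsed_minutes: float, policy: Dict[int, str] = ESCALATION) -> List[str]:
    """Return every escalation step whose threshold has been reached, earliest first."""
    return [action for minutes, action in sorted(policy.items()) if elapsed_minutes >= minutes]


if __name__ == "__main__":
    print(error_rate_exceeded(7, 100))  # True
    print(escalations_due(20))          # ['Page on-call engineer']
```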

---

## References

- [Kubernetes Best Practices](https://kubernetes.io/docs/concepts/configuration/overview/)
- [Docker Security Best Practices](https://docs.docker.com/develop/security-best-practices/)
- [Prometheus Alerting](https://prometheus.io/docs/alerting/latest/alertmanager/)
- [SRE Workbook](https://sre.google/workbook/table-of-contents/)
- [MCP Specification](https://modelcontextprotocol.io/docs/)

---

## 🎓 Series Complete!

Thank you for completing the **MCP Server Best Practices** workshop series!

**All 13 Notebooks:**
1. MCP Foundations & Architecture
2. MCP Primitives Deep Dive
3. Transport Mechanisms: STDIO
4. Transport Mechanisms: HTTP & SSE
5. Building Your First MCP Server
6. Workshop: Filesystem MCP Server
7. MCP Server Best Practices
8. Agentic Workflow Patterns
9. MCP Gateway Fundamentals
10. Enterprise MCP Patterns
11. Observability & Debugging
12. Framework Integration
13. Production Deployment ✓