# 🐳 Docker & Containerization - Greek Derby RAG Chatbot

## Learning Objectives
By the end of this lesson, you will understand:
- Docker fundamentals and containerization concepts
- Multi-stage builds for optimized images
- Docker Compose for orchestration
- Container networking and service communication
- Health checks and monitoring
- Security best practices
- Production deployment strategies
- Performance optimization techniques

---

## Q1: What is Docker and why do we use it for our Greek Derby chatbot?

**Answer:**

Docker is a containerization platform that packages applications and their dependencies into lightweight, portable containers. Our Greek Derby RAG chatbot uses Docker to ensure consistent deployment across different environments and simplify the development-to-production pipeline.

### What is Containerization?

**Traditional Deployment Problems:**
- **"It works on my machine"** - Different environments behave differently
- **Dependency Hell** - Conflicting software versions
- **Complex Setup** - Manual installation of multiple services
- **Environment Drift** - Production differs from development
- **Scaling Issues** - Difficult to replicate and scale services

**Containerization Benefits:**
- **Consistency** - Same environment everywhere
- **Isolation** - Applications don't interfere with each other
- **Portability** - Run anywhere Docker is installed
- **Scalability** - Easy to replicate and scale
- **Efficiency** - Shared OS kernel, smaller resource footprint

### Our Docker Architecture:

```
┌─────────────────────────────────────────────────────────────┐
│                    Docker Host                              │
│  ┌─────────────────┐    ┌─────────────────────────────────┐ │
│  │   Frontend      │    │         Backend                 │ │
│  │   Container     │    │        Container                │ │
│  │                 │    │                                 │ │
│  │  ┌─────────────┐│    │  ┌─────────────────────────────┐│ │
│  │  │   Nginx     ││    │  │      FastAPI                ││ │
│  │  │   (Port 80) ││    │  │     (Port 8000)             ││ │
│  │  └─────────────┘│    │  └─────────────────────────────┘│ │
│  │                 │    │                                 │ │
│  │  ┌─────────────┐│    │  ┌─────────────────────────────┐│ │
│  │  │   React     ││    │  │      LangChain              ││ │
│  │  │   Build     ││    │  │      + LangGraph            ││ │
│  │  └─────────────┘│    │  └─────────────────────────────┘│ │
│  └─────────────────┘    │                                 │ │
│                         │  ┌─────────────────────────────┐│ │
│                         │  │      Pinecone               ││ │
│                         │  │     (External API)          ││ │
│                         │  └─────────────────────────────┘│ │
│                         └─────────────────────────────────┘ │
│                                                             │
│  ┌─────────────────────────────────────────────────────────┐│
│  │              Docker Network                             ││
│  │         (greek-derby-network)                           ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
```

### Why Docker for Our Project?

#### 1. **Multi-Service Architecture**
```yaml
# docker-compose.yml
services:
  backend:    # FastAPI + Python
  frontend:   # React + Nginx
```

**Benefits:**
- **Service Isolation**: Each service runs in its own container
- **Independent Scaling**: Scale frontend and backend separately
- **Technology Flexibility**: Different base images for different needs
- **Easy Development**: Start entire stack with one command

#### 2. **Environment Consistency**
```bash
# Development
docker-compose up

# Production
docker-compose -f docker-compose.prod.yml up
```

**Same environment everywhere:**
- **Developer Machines**: Identical to production
- **CI/CD Pipelines**: Consistent build and test environments
- **Staging/Production**: No surprises during deployment

#### 3. **Dependency Management**
```dockerfile
# Backend - Python dependencies
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt

# Frontend - Node.js dependencies
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
```

**Benefits:**
- **Isolated Dependencies**: No conflicts between projects
- **Version Pinning**: Exact versions specified in requirements
- **Clean Environments**: Fresh container for each build
- **Reproducible Builds**: Same result every time
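Version pinning only helps if `requirements.txt` actually pins. The sketch below flags entries that lack an exact `==` pin. It assumes a plain requirements file with one requirement per line. `find_unpinned` and `check_requirements` are hypothetical helpers; they are not part of pip or this project.

```python
from pathlib import Path


def find_unpinned(lines: list[str]) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in lines:
        # Drop inline comments and surrounding whitespace
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r base.txt" or "--index-url ..."
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


def check_requirements(path: str = "requirements.txt") -> list[str]:
    """Read a requirements file and report its unpinned entries."""
    return find_unpinned(Path(path).read_text(encoding="utf-8").splitlines())


sample = ["langchain==0.2.0", "fastapi>=0.110", "# comment", "uvicorn"]
print(find_unpinned(sample))  # ['fastapi>=0.110', 'uvicorn']
```

A CI step could fail the build whenever `check_requirements()` returns a non-empty list. Lock tools such as `pip-compile` (from pip-tools) go further and also pin transitive dependencies.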

#### 4. **Easy Deployment**
```bash
# Single command deployment
docker-compose up -d

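# Scale services (only works once the backend has no fixed container_name
# or host port mapping; the load-balanced setup in Q5 shows this)
docker-compose up --scale backend=3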

# Update services
docker-compose pull && docker-compose up -d
```

### Docker vs. Alternatives:

| Feature | Docker | Virtual Machines | Bare Metal |
|---------|--------|------------------|------------|
| **Resource Usage** | Low | High | Medium |
| **Startup Time** | Seconds | Minutes | Hours |
| **Isolation** | Process-level (shared host kernel) | Full OS (separate kernel per VM) | None |
| **Portability** | Excellent | Good | Poor |
| **Management** | Easy | Complex | Very Complex |

### Our Container Strategy:

#### 1. **Backend Container (FastAPI)**
- **Base Image**: `python:3.11-slim`
- **Purpose**: Run Python FastAPI application
- **Dependencies**: LangChain, Pinecone, OpenAI
- **Port**: 8000
- **Health Check**: `/health` endpoint

#### 2. **Frontend Container (React + Nginx)**
- **Base Image**: `nginx:alpine` (production)
- **Build Stage**: `node:18-alpine` (build stage only, discarded from the final image)
- **Purpose**: Serve React application
- **Port**: 80 inside the container, published on host port 3000
- **Health Check**: Root endpoint

#### 3. **External Services**
- **Pinecone**: Vector database (external API)
- **OpenAI**: LLM API (external API)
- **No Database Container**: Using external services

### Benefits for Our RAG Chatbot:

1. **Rapid Development**: Start entire stack instantly
2. **Consistent Testing**: Same environment for all tests
3. **Easy Scaling**: Scale backend for high load
4. **Simple Deployment**: One command to deploy
5. **Resource Efficiency**: Shared host kernel, optimized images
6. **Version Control**: Container images are versioned
7. **Rollback Capability**: Easy to revert to previous versions
8. **Monitoring**: Built-in health checks and logging


## Q2: How do we create optimized Dockerfiles using multi-stage builds?

**Answer:**

Multi-stage builds are a powerful Docker feature that allows us to create optimized production images by using multiple build stages. Our Greek Derby chatbot uses multi-stage builds to create efficient, secure, and minimal production containers.

### What are Multi-Stage Builds?

**Single-Stage Build Problems:**
```dockerfile
# ❌ Single-stage - includes build tools in production
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Problem: Node.js, npm, source code all in final image
CMD ["npm", "start"]
```

**Multi-Stage Build Solution:**
```dockerfile
# ✅ Multi-stage - clean production image
FROM node:18-alpine AS build
# Build stage - includes build tools
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM nginx:alpine
# Production stage - only runtime dependencies
COPY --from=build /app/dist /usr/share/nginx/html
```

### Our Backend Dockerfile (Single-Stage):

```dockerfile
# backend/Dockerfile
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV USER_AGENT=greek-derby-api/1.0

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better caching
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Create a non-root user
RUN useradd --create-home --shell /bin/bash app && chown -R app:app /app
USER app

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the application
CMD ["python", "api/greek_derby_api.py"]
```

**Why Single-Stage for Backend:**
- **Python Runtime**: Need Python interpreter in production
- **Dependencies**: All packages needed at runtime
- **Source Code**: Application code required
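- **Build Tools**: `gcc` and `g++` are compilers needed only to build some Python packages, yet they remain in the final image; a multi-stage build could install dependencies in one stage and copy only the results into a clean runtime stage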

### Our Frontend Dockerfile (Multi-Stage):

```dockerfile
# front-end/react-chatbot/Dockerfile
# Build stage
FROM node:18-alpine AS build

# Set working directory
WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies (including dev dependencies for build)
RUN npm ci

# Copy source code
COPY . .

# Build the app
RUN npm run build

# Production stage
FROM nginx:alpine

# Install curl for health checks
RUN apk add --no-cache curl

# Copy built app from build stage
COPY --from=build /app/dist /usr/share/nginx/html

# Copy custom nginx config
COPY nginx.conf /etc/nginx/conf.d/default.conf

# Expose port
EXPOSE 80

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost/ || exit 1

# Start nginx
CMD ["nginx", "-g", "daemon off;"]
```

### Multi-Stage Build Benefits:

#### 1. **Smaller Production Images**
```text
# Build stage image (discarded, never shipped)
node:18-alpine + source code + build tools = ~500MB

# Production stage image
nginx:alpine + built files = ~25MB

# Reduction: ~95% smaller (approximate; actual sizes vary by version)
```

#### 2. **Security Improvements**
```dockerfile
# Build stage - can have build tools, dev dependencies
FROM node:18-alpine AS build
RUN npm ci  # Includes dev dependencies

# Production stage - minimal attack surface
FROM nginx:alpine
# No Node.js, no npm, no source code
# Only production files and nginx
```

#### 3. **Better Caching**
```dockerfile
# Dependencies change less frequently than source code
COPY package*.json ./
RUN npm ci  # This layer is cached if package.json doesn't change

COPY . .
RUN npm run build  # This layer rebuilds when source changes
```
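The caching rule can be modeled in a few lines. This is a simplified sketch, not Docker's actual algorithm. Each layer's key hashes three things: its parent's key, the instruction, and (for `COPY`) a stand-in for the copied files' checksum. A change therefore invalidates that layer and every layer after it. The strings `lockfile-v1` and `src-v1` are made-up placeholders.

```python
import hashlib


def layer_keys(instructions: list[tuple[str, str]]) -> list[str]:
    """Compute a simplified cache key per layer.

    Each instruction is (command, content); content stands in for the checksum
    of the files a COPY brings in (empty for RUN). Keys chain through the parent.
    """
    keys = []
    parent = ""
    for command, content in instructions:
        key = hashlib.sha256(f"{parent}|{command}|{content}".encode()).hexdigest()
        keys.append(key)
        parent = key
    return keys


def rebuilt_layers(old: list[str], new: list[str]) -> list[int]:
    """Indexes of layers whose key changed, i.e. layers that must be rebuilt."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


before = [
    ("COPY package*.json ./", "lockfile-v1"),
    ("RUN npm ci", ""),
    ("COPY . .", "src-v1"),
    ("RUN npm run build", ""),
]
# Only application source changed; package.json and the lockfile are untouched
after = before[:2] + [("COPY . .", "src-v2"), ("RUN npm run build", "")]

print(rebuilt_layers(layer_keys(before), layer_keys(after)))  # [2, 3]
```

Changing the lockfile instead changes layer 0. That forces `npm ci` and everything after it to rebuild, which is why dependency manifests are copied before the source.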

### Advanced Multi-Stage Patterns:

#### 1. **Dependency Optimization**
```dockerfile
# front-end/react-chatbot/Dockerfile (Enhanced)
# Dependencies stage: installs everything the build needs (dev dependencies included)
FROM node:18-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci

# Build stage: reuses the cached dependency layer, adds source, compiles
FROM dependencies AS build
COPY . .
RUN npm run build

# Production stage: static files only. node_modules is NOT copied,
# because nginx serves the compiled bundle and never runs Node.
# (A production-only install, `npm ci --omit=dev`, is only useful
# when the final image runs a Node server instead of nginx.)
FROM nginx:alpine
COPY --from=build /app/dist /usr/share/nginx/html
```

#### 2. **Multi-Architecture Builds**
```dockerfile
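# Build for multiple architectures
# Build stage runs on the native build platform; the static output is architecture-independent
FROM --platform=$BUILDPLATFORM node:18-alpine AS build
# ... build steps ...

# Final stage uses the target platform ($TARGETPLATFORM is also the default)
FROM --platform=$TARGETPLATFORM nginx:alpine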
COPY --from=build /app/dist /usr/share/nginx/html
```

#### 3. **Development vs Production**
```dockerfile
# Development stage
FROM node:18-alpine AS development
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
EXPOSE 3000
CMD ["npm", "run", "dev"]

# Production build
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Production serve
FROM nginx:alpine AS production
COPY --from=build /app/dist /usr/share/nginx/html
```

### Dockerfile Best Practices:

#### 1. **Layer Optimization**
```dockerfile
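# ❌ Bad - extra layers, the cached `apt-get update` layer can go stale,
#    and the final `rm` cannot shrink layers that already contain the lists
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y wget
RUN rm -rf /var/lib/apt/lists/*

# ✅ Good - one layer: update, install, and clean up together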
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    && rm -rf /var/lib/apt/lists/*
```

#### 2. **Copy Optimization**
```dockerfile
# ❌ Bad - copies everything, invalidates cache
COPY . .
RUN npm run build

# ✅ Good - copy dependencies first
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
```

#### 3. **Security Best Practices**
```dockerfile
# ✅ Use specific versions
FROM python:3.11-slim

# ✅ Don't run as root
RUN useradd --create-home --shell /bin/bash app
USER app

# ✅ Use a .dockerignore file (a separate file next to the Dockerfile), e.g.:
#   node_modules
#   .git
#   *.md
#   .env
```

#### 4. **Health Checks**
```dockerfile
# ✅ Proper health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
```

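### Image Size Comparison:

Figures are approximate and illustrative. The backend Dockerfile shown earlier is single-stage, so the backend's multi-stage figure assumes a hypothetical variant that leaves build tools such as `gcc` and `g++` in a discarded stage.
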
| Approach | Backend Image | Frontend Image | Total |
|----------|---------------|----------------|-------|
| **Single-Stage** | ~800MB | ~500MB | ~1.3GB |
| **Multi-Stage** | ~200MB | ~25MB | ~225MB |
| **Savings** | 75% | 95% | 83% |
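
The Savings row follows from the approximate sizes above. Recomputing it shows that the 83% total is 82.7% rounded.

```python
def percent_saved(before_mb: float, after_mb: float) -> float:
    """Percentage reduction from before_mb to after_mb, rounded to one decimal."""
    return round((before_mb - after_mb) / before_mb * 100, 1)


rows = {
    "backend": (800, 200),
    "frontend": (500, 25),
    "total": (800 + 500, 200 + 25),
}
for name, (single, multi) in rows.items():
    print(f"{name}: {percent_saved(single, multi)}% smaller")
```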

### Build Commands:

```bash
# Build specific stage
docker build --target build -t greek-derby-frontend:build .

# Build production image
docker build -t greek-derby-frontend:latest .

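# Build with build args (only takes effect if the Dockerfile declares `ARG NODE_ENV`)
docker build --build-arg NODE_ENV=production -t greek-derby-frontend:prod .

# Multi-platform build (needs a buildx builder that supports multi-platform output;
# usually combined with --push to send the result to a registry)
docker buildx build --platform linux/amd64,linux/arm64 -t greek-derby-frontend:latest .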
```

### Benefits for Our RAG Chatbot:

1. **Faster Deployments**: Smaller images = faster pulls
2. **Lower Costs**: Less storage and bandwidth usage
3. **Better Security**: Minimal attack surface
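4. **Improved Performance**: Less disk I/O when extracting layers; runtime memory depends on the running processes rather than on image size
5. **Easier Scaling**: Smaller images = faster pulls when new containers start on hosts that lack the image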
6. **Better Caching**: Optimized layer caching
7. **Cleaner Production**: No build artifacts in production


## Q3: How do we orchestrate services with Docker Compose?

**Answer:**

Docker Compose is a tool for defining and running multi-container Docker applications. Our Greek Derby RAG chatbot uses Docker Compose to orchestrate the frontend, backend, and networking components, making it easy to manage the entire application stack.

### What is Docker Compose?

**Docker Compose Benefits:**
- **Multi-Container Management**: Define and run multiple containers
- **Service Dependencies**: Specify which services depend on others
- **Environment Configuration**: Manage environment variables
- **Networking**: Automatic service discovery and communication
- **Volume Management**: Persistent data and file sharing
- **Scaling**: Easy horizontal scaling of services

### Our Docker Compose Configuration:

```yaml
# docker-compose.yml
version: '3.8'  # obsolete in Compose v2 (ignored with a warning); needed only by legacy docker-compose v1

services:
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    container_name: greek-derby-backend
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - PINECONE_API_KEY=${PINECONE_API_KEY}
      - PINECONE_GREEK_DERBY_INDEX_NAME=${PINECONE_GREEK_DERBY_INDEX_NAME}
      - USER_AGENT=${USER_AGENT:-greek-derby-api/1.0}
    volumes:
      - ./backend:/app
      - /app/__pycache__
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  frontend:
    build:
      context: ./front-end/react-chatbot
      dockerfile: Dockerfile
    container_name: greek-derby-frontend
    ports:
      - "3000:80"
    depends_on:
      - backend
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3

networks:
  default:
    name: greek-derby-network
```

### Service Configuration Breakdown:

#### 1. **Backend Service Configuration**

```yaml
backend:
  build:
    context: ./backend
    dockerfile: Dockerfile
  container_name: greek-derby-backend
  ports:
    - "8000:8000"  # Host:Container port mapping
  environment:
    - OPENAI_API_KEY=${OPENAI_API_KEY}
    - PINECONE_API_KEY=${PINECONE_API_KEY}
    - PINECONE_GREEK_DERBY_INDEX_NAME=${PINECONE_GREEK_DERBY_INDEX_NAME}
    - USER_AGENT=${USER_AGENT:-greek-derby-api/1.0}
  volumes:
    - ./backend:/app  # Development volume mount
    - /app/__pycache__  # Anonymous volume for Python cache
  restart: unless-stopped
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 40s
```

**Key Features:**
- **Build Context**: Builds from `./backend` directory
- **Port Mapping**: Exposes port 8000 to host
- **Environment Variables**: Uses `.env` file for secrets
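- **Volume Mounts**: The bind mount replaces the code baked into the image with the host's code, so edits need no rebuild (automatic reloads still require the server to run with auto-reload enabled)
- **Health Check**: Monitors service health
- **Restart Policy**: `unless-stopped` restarts the container whenever it exits, unless it was stopped manually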

#### 2. **Frontend Service Configuration**

```yaml
frontend:
  build:
    context: ./front-end/react-chatbot
    dockerfile: Dockerfile
  container_name: greek-derby-frontend
  ports:
    - "3000:80"  # Host port 3000 maps to container port 80
  depends_on:
    - backend  # Wait for backend to start
  restart: unless-stopped
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost/"]
    interval: 30s
    timeout: 10s
    retries: 3
```

**Key Features:**
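- **Service Dependencies**: Starts after the backend container has started; the short `depends_on` form does not wait for the backend to be healthy (use `condition: service_healthy` for that)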
- **Port Mapping**: Frontend accessible on port 3000
- **Health Check**: Monitors nginx service
- **Multi-Stage Build**: Uses optimized production image

### Environment Variable Management:

#### 1. **Environment File (.env)**
```bash
# .env
OPENAI_API_KEY=sk-your-openai-key-here
PINECONE_API_KEY=your-pinecone-key-here
PINECONE_GREEK_DERBY_INDEX_NAME=greek-derby-index
USER_AGENT=greek-derby-api/1.0
```

#### 2. **Environment Variable Usage**
```yaml
environment:
  - OPENAI_API_KEY=${OPENAI_API_KEY}
  - PINECONE_API_KEY=${PINECONE_API_KEY}
  - PINECONE_GREEK_DERBY_INDEX_NAME=${PINECONE_GREEK_DERBY_INDEX_NAME}
  - USER_AGENT=${USER_AGENT:-greek-derby-api/1.0}  # Default value
```

**Benefits:**
- **Security**: Secrets not in version control
- **Flexibility**: Different values per environment
- **Default Values**: Fallback values for optional variables
- **Strings Only**: Environment variables are always strings, so the application must parse numbers and booleans itself
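
Because every value arrives as a string, the application has to apply defaults and parse types itself. The sketch below assumes the backend reads its settings at startup. The backend's real configuration code is not shown in this lesson, so `load_settings` and its helpers are hypothetical. `env_str` mirrors Compose's `${VAR:-default}`, which falls back when the variable is unset or empty.

```python
from collections.abc import Mapping


def env_str(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic Compose's ${NAME:-default}: use the default when unset OR empty."""
    value = env.get(name, "")
    return value if value else default


def env_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Parse booleans explicitly, since bool("false") is True in Python."""
    value = env.get(name)
    if value is None or value == "":
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def require(env: Mapping[str, str], name: str) -> str:
    """Fail fast at startup if a required variable such as OPENAI_API_KEY is missing."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_settings(env: Mapping[str, str]) -> dict[str, object]:
    """Collect the backend's settings; USER_AGENT mirrors the compose default."""
    return {
        "openai_api_key": require(env, "OPENAI_API_KEY"),
        "pinecone_api_key": require(env, "PINECONE_API_KEY"),
        "index_name": require(env, "PINECONE_GREEK_DERBY_INDEX_NAME"),
        "user_agent": env_str(env, "USER_AGENT", "greek-derby-api/1.0"),
        "debug": env_bool(env, "DEBUG"),
    }
```

Inside the container the call would be `load_settings(os.environ)`. Passing a plain dict instead keeps the function easy to test.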

### Volume Management:

#### 1. **Development Volumes**
```yaml
volumes:
  - ./backend:/app  # Live code reloading
  - /app/__pycache__  # Anonymous volume for Python cache
```

**Benefits:**
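- **Live Code**: Host edits are visible inside the container immediately; the server picks them up without a restart only if it runs with auto-reload (for example uvicorn's `--reload`)
- **Debugging**: Easy to inspect container files
- **Development**: No image rebuild needed for code changes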

#### 2. **Production Volumes**
```yaml
# docker-compose.prod.yml
services:
  backend:
    volumes:
      - backend_data:/app/data
  frontend:
    volumes:
      - nginx_logs:/var/log/nginx

volumes:
  backend_data:
  nginx_logs:
```

**Benefits:**
- **Data Persistence**: Data survives container restarts
- **Log Management**: Centralized logging
- **Backup**: Easy to backup named volumes

### Health Checks:

#### 1. **Backend Health Check**
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```

**Health Check Stages:**
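- **Start Period**: For the first 40s, failed probes do not count toward `retries` (a successful probe ends this grace period early)
- **Interval**: Check every 30 seconds
- **Timeout**: A probe taking longer than 10s counts as failed
- **Retries**: 3 consecutive failures mark the container unhealthy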
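
Docker documents two rules that interact here. Failures during the start period do not count toward `retries`. Once any probe succeeds, however, every later consecutive failure counts. The sketch below is a simplified simulation of that bookkeeping, with probe timing passed in explicitly. `health_status` is a hypothetical helper, not Docker code.

```python
def health_status(
    probes: list[tuple[float, bool]], start_period: float = 40.0, retries: int = 3
) -> str:
    """Replay (seconds_since_start, passed) probe results; return the final status."""
    status = "starting"
    failing_streak = 0
    for elapsed, passed in probes:
        if passed:
            # Any success marks the container healthy and resets the streak
            status = "healthy"
            failing_streak = 0
            continue
        # Before the first success, failures inside the start period are ignored
        if status == "starting" and elapsed < start_period:
            continue
        failing_streak += 1
        if failing_streak >= retries:
            status = "unhealthy"
    return status
```

Take the backend's settings and failing probes at 30s, 60s, 90s, and 120s. The status first reaches `unhealthy` at 120s, because the 30s failure falls inside the 40s start period and is ignored.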

#### 2. **Frontend Health Check**
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/"]
  interval: 30s
  timeout: 10s
  retries: 3
```

**Health Check Benefits:**
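- **Readiness Signal**: Know when services are ready, and gate startup with `depends_on` plus `condition: service_healthy`
- **Load Balancing**: Docker Swarm replaces unhealthy tasks so traffic reaches healthy replicas; plain Compose keeps sending traffic to unhealthy containers
- **Monitoring**: Track service health over time (`docker ps` shows `healthy` or `unhealthy` in the STATUS column)
- **Recovery**: Plain Docker and Compose only mark a container unhealthy and do not restart it; automatic replacement needs Swarm or external tooling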

### Networking:

#### 1. **Default Network**
```yaml
networks:
  default:
    name: greek-derby-network
```

**Network Features:**
- **Service Discovery**: Services can reach each other by name
- **Isolation**: Network isolated from other Docker networks
- **DNS Resolution**: Automatic DNS for service names
- **Internal Communication**: Services communicate internally

#### 2. **Service Communication**
```typescript
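// Browser code (the React app) cannot resolve Docker service names such as "backend":
// it runs on the user's machine, outside greek-derby-network.
// Option 1: call the port the backend publishes on the host
const API_BASE = 'http://localhost:8000';

// Option 2 (assumes the frontend's nginx.conf proxies /api/ to http://backend:8000):
// const API_BASE = '/api';

// Server-side code inside a container on greek-derby-network can use service names:
const FRONTEND_URL = 'http://frontend:80';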
```

**Communication Flow:**
```
Browser → Frontend (host port 3000) → static React files
Browser → Backend (host port 8000, or an nginx proxy path) → External APIs (OpenAI, Pinecone)
```

### Docker Compose Commands:

#### 1. **Basic Commands**
```bash
# Start all services
docker-compose up

# Start in background
docker-compose up -d

# Stop all services
docker-compose down

# View logs
docker-compose logs

# View logs for specific service
docker-compose logs backend
```

#### 2. **Development Commands**
```bash
# Rebuild and start
docker-compose up --build

# Start specific service
docker-compose up backend

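# Scale services (fails while the backend sets container_name and publishes host port 8000;
# remove both first, as in the load-balanced setup in Q5)
docker-compose up --scale backend=3

# Execute a command in a running container (here: open a shell in the backend)
docker-compose exec backend bash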
```

#### 3. **Production Commands**
```bash
# Use production compose file
docker-compose -f docker-compose.prod.yml up -d

# Pull latest images
docker-compose pull

# Recreate only the frontend, without restarting the services it depends on
docker-compose up -d --no-deps frontend
```

### Advanced Compose Features:

#### 1. **Service Dependencies**
```yaml
services:
  frontend:
    depends_on:
      backend:
        condition: service_healthy  # Wait until the backend's health check passes
      # Our stack has no database (Pinecone is external); if it had one:
      # database:
      #   condition: service_started  # Wait only for the container to start
```

#### 2. **Resource Limits**
```yaml
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
```
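Here is how to read these limits concretely. Docker documents `--cpus=1.5` as equivalent to `--cpu-period=100000 --cpu-quota=150000`, and Docker's memory strings use binary units. The conversion sketch below works under those assumptions. `memory_bytes` and `cpu_quota` are hypothetical helpers; Compose does this parsing itself.

```python
import re

# Binary multipliers, as used by Docker's memory flags
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def memory_bytes(value: str) -> int:
    """Convert a memory string such as '512M', '1g' or '256mb' to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([bkmg]?)b?\s*", value.lower())
    if match is None:
        raise ValueError(f"Unrecognized memory value: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def cpu_quota(cpus: str, period_us: int = 100_000) -> tuple[int, int]:
    """Translate a fractional CPU limit into a CFS (period, quota) pair in microseconds."""
    return period_us, int(float(cpus) * period_us)


print(memory_bytes("512M"), cpu_quota("0.5"))  # 536870912 (100000, 50000)
```

So `cpus: '0.5'` lets the backend use half of one CPU's time in each 100 ms period, and `memory: 512M` caps it at 536,870,912 bytes.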

#### 3. **Multiple Environments**
```yaml
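# docker-compose.dev.yml (layered on the base file:
#   docker-compose -f docker-compose.yml -f docker-compose.dev.yml up)
services:
  backend:
    volumes:
      - ./backend:/app  # Live code from the host
    environment:
      - DEBUG=true

# docker-compose.prod.yml (a standalone file:
#   docker-compose -f docker-compose.prod.yml up -d)
services:
  backend:
    volumes:
      - backend_data:/app/data  # Persistent data
    environment:
      - DEBUG=false

volumes:
  backend_data:  # Named volumes must be declared at the top level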
```

### Benefits for Our RAG Chatbot:

1. **Easy Development**: One command to start entire stack
2. **Service Isolation**: Each service runs independently
3. **Automatic Networking**: Services can communicate by name
4. **Health Monitoring**: Built-in health checks
5. **Environment Management**: Easy configuration per environment
6. **Scaling**: Simple horizontal scaling
7. **Dependency Management**: Services start in correct order
8. **Volume Management**: Persistent data and live reloading


## Q4: How do we implement security best practices in Docker containers?

**Answer:**

Security is crucial when running applications in containers. Our Greek Derby RAG chatbot implements several Docker security best practices to protect against common vulnerabilities and ensure safe operation in production environments.

### Why Container Security Matters?

**Common Container Security Risks:**
- **Privilege Escalation**: Containers running as root
- **Image Vulnerabilities**: Outdated base images with known CVEs
- **Secret Exposure**: API keys and credentials in images
- **Network Exposure**: Unnecessary open ports
- **Resource Abuse**: Containers consuming excessive resources
- **Data Leakage**: Sensitive data in container layers

### Our Security Implementation:

#### 1. **Non-Root User Execution**

```dockerfile
# backend/Dockerfile
# Create a non-root user
RUN useradd --create-home --shell /bin/bash app && chown -R app:app /app
USER app

# Run the application as non-root
CMD ["python", "api/greek_derby_api.py"]
```

**Security Benefits:**
- **Principle of Least Privilege**: Minimal permissions required
- **Attack Surface Reduction**: Limited system access
- **Process Isolation**: Cannot modify system files
- **Compliance**: Satisfies the common "run as a non-root user" control found in benchmarks such as the CIS Docker Benchmark

#### 2. **Minimal Base Images**

```dockerfile
# Backend - Python slim image
FROM python:3.11-slim

# Frontend - Alpine Linux
FROM nginx:alpine
```

**Why Minimal Images:**
- **Smaller Attack Surface**: Fewer packages = fewer vulnerabilities
- **Faster Scanning**: Less code to analyze for security issues
- **Reduced Resources**: Lower memory and storage usage
- **Faster Deployments**: Smaller images transfer faster

#### 3. **Environment Variable Security**

```yaml
# docker-compose.yml
environment:
  - OPENAI_API_KEY=${OPENAI_API_KEY}
  - PINECONE_API_KEY=${PINECONE_API_KEY}
  - PINECONE_GREEK_DERBY_INDEX_NAME=${PINECONE_GREEK_DERBY_INDEX_NAME}
  - USER_AGENT=${USER_AGENT:-greek-derby-api/1.0}
```

**Security Practices:**
- **No Hardcoded Secrets**: All secrets from environment variables
- **Default Values**: Safe defaults for non-sensitive config
- **Environment Files**: Secrets in `.env` (not in version control)
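- **Runtime Injection**: Secrets are provided at container startup rather than baked into the image (they remain visible to anyone who can run `docker inspect`; see Secrets Management below)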

#### 4. **Health Check Security**

```dockerfile
# backend/Dockerfile
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
```

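**What Health Checks Do (and Don't) Provide:**
- **Availability Monitoring**: Detect a hung or crashed API quickly
- **Not Intrusion Detection**: A compromised service can still answer `/health` normally; runtime tools such as Falco address that
- **No Automatic Restart in Plain Docker/Compose**: An unhealthy container is only marked unhealthy; Docker Swarm or external tooling must replace it
- **Tight Timeouts**: A short `--timeout` keeps a stuck probe from hanging indefinitely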

### Advanced Security Practices:

#### 1. **Image Scanning and Vulnerability Management**

```bash
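# Scan images for vulnerabilities (`docker scan` is retired; Docker Scout replaces it)
# Assumes the images are tagged greek-derby-backend / greek-derby-frontend (e.g. via `image:` in Compose)
docker scout cves greek-derby-backend:latest
docker scout cves greek-derby-frontend:latest

# In the Dockerfiles, pin specific base image versions:
#   FROM python:3.11.6-slim   # Specific version, not 'latest'
#   FROM nginx:1.25.3-alpine  # Specific version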
```

**Vulnerability Management:**
- **Regular Scanning**: Automated vulnerability detection
- **Version Pinning**: Specific base image versions
- **Security Updates**: Regular base image updates
- **CVE Monitoring**: Track known vulnerabilities

#### 2. **Network Security**

```yaml
# docker-compose.yml
networks:
  default:
    name: greek-derby-network
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
```

**Network Security Features:**
- **Isolated Networks**: Containers in private network
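- **Limited External Access**: Only published ports are reachable from outside; `8000:8000` and `3000:80` bind to all host interfaces, so use `127.0.0.1:8000:8000` or a firewall to keep a port local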
- **Internal Communication**: Services communicate internally
- **Port Mapping**: Only necessary ports exposed

#### 3. **Resource Limits**

```yaml
# docker-compose.prod.yml
services:
  backend:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
    ulimits:
      nofile:
        soft: 1024
        hard: 2048
```

**Resource Security:**
- **CPU Limits**: Prevent CPU exhaustion attacks
- **Memory Limits**: Prevent memory-based DoS
- **File Descriptors**: Limit open file handles
- **Process Limits**: Capping process creation would also need `pids_limit`, which the snippet above does not set

#### 4. **Secrets Management**

```yaml
# docker-compose.prod.yml
services:
  backend:
    secrets:
      - openai_api_key
      - pinecone_api_key
    environment:
      - OPENAI_API_KEY_FILE=/run/secrets/openai_api_key
      - PINECONE_API_KEY_FILE=/run/secrets/pinecone_api_key

secrets:
  openai_api_key:
    external: true
  pinecone_api_key:
    external: true
```

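**Secrets Management Benefits (and Requirements):**
- **Encrypted Storage**: In Swarm mode, secrets are encrypted at rest in the cluster's Raft store; `external: true` secrets like these are created with `docker secret create` and in practice require Swarm (`docker stack deploy`), while plain Compose uses file-based secrets without that encryption
- **Runtime Access**: Secrets are mounted as files under `/run/secrets/`; the application must read the path in `OPENAI_API_KEY_FILE` itself, since client libraries do not follow `*_FILE` variables automatically
- **Audit Trail**: Secret creation and removal go through `docker secret` commands instead of edited `.env` files
- **Rotation**: Swarm secrets are immutable, so rotate by creating a new secret and updating the service to use it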

### Container Security Scanning:

#### 1. **Static Analysis**

```bash
# Using Trivy for vulnerability scanning
trivy image greek-derby-backend:latest
trivy image greek-derby-frontend:latest

# Scan with specific severity levels
trivy image --severity HIGH,CRITICAL greek-derby-backend:latest
```

#### 2. **Runtime Security**

```bash
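# Using Falco for runtime security monitoring
falco -c falco.yaml

# Or run Falco in a container; it needs elevated privileges and host mounts
# like those in the monitoring compose service below (check the Falco docs
# for the exact set your driver requires)
docker run --rm --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /proc:/host/proc:ro \
  falcosecurity/falco:latest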
```

#### 3. **Image Hardening**

```dockerfile
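# Build stage: install dependencies where the runtime stage can copy them
FROM python:3.11-slim-bookworm AS build
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
COPY . .

# Runtime stage: distroless Debian 12 ships Python 3.11, matching the build stage
FROM gcr.io/distroless/python3-debian12
WORKDIR /app
COPY --from=build /app /app
ENV PYTHONPATH=/app/deps

# No shell, no package manager, no curl (a curl-based HEALTHCHECK won't work here).
# The image's ENTRYPOINT is already the Python interpreter, so CMD lists only the script.
CMD ["api/greek_derby_api.py"]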
```

### Security Best Practices Summary:

#### 1. **Image Security**
- ✅ Use minimal base images
- ✅ Pin specific versions
- ✅ Regular security updates
- ✅ Scan for vulnerabilities
- ❌ Avoid `latest` tags in production
- ❌ Don't include secrets in images

#### 2. **Runtime Security**
- ✅ Run as non-root user
- ✅ Use read-only filesystems where possible
- ✅ Limit container capabilities
- ✅ Implement health checks
- ❌ Don't run containers as root
- ❌ Don't mount sensitive host directories

#### 3. **Network Security**
- ✅ Use private networks
- ✅ Expose minimal ports
- ✅ Implement proper firewall rules
- ✅ Use HTTPS in production
- ❌ Don't expose unnecessary ports
- ❌ Don't use default network configurations

#### 4. **Secrets Management**
- ✅ Inject secrets at runtime (environment variables, or preferably secret files)
- ✅ Implement secrets rotation
- ✅ Encrypt secrets at rest
- ✅ Use dedicated secrets management
- ❌ Don't hardcode secrets
- ❌ Don't commit secrets to version control

### Production Security Checklist:

#### 1. **Pre-Deployment**
- [ ] Scan images for vulnerabilities
- [ ] Verify non-root user execution
- [ ] Check resource limits
- [ ] Validate network configuration
- [ ] Test secrets management

#### 2. **Runtime Monitoring**
- [ ] Monitor container health
- [ ] Track resource usage
- [ ] Log security events
- [ ] Monitor network traffic
- [ ] Alert on anomalies

#### 3. **Regular Maintenance**
- [ ] Update base images monthly
- [ ] Rotate secrets quarterly
- [ ] Review access permissions
- [ ] Audit container configurations
- [ ] Test disaster recovery

### Security Tools Integration:

#### 1. **CI/CD Security Pipeline**
```yaml
# .github/workflows/security.yml
name: Security Scan
on: [push, pull_request]
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
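      - uses: actions/checkout@v4
      - name: Build images
        # Assumes the compose services set `image: greek-derby-backend:latest` etc.;
        # otherwise Compose names images <project>-<service>
        run: docker compose build
      - name: Install scanners
        run: |
          curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
          pipx install detect-secrets
      - name: Scan for vulnerabilities
        # --exit-code 1 makes the job fail when HIGH or CRITICAL issues are found
        run: trivy image --exit-code 1 --severity HIGH,CRITICAL greek-derby-backend:latest
      - name: Check secrets
        run: detect-secrets scan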
```

#### 2. **Runtime Security Monitoring**
```yaml
# docker-compose.monitoring.yml
services:
  falco:
    image: falcosecurity/falco:latest
    privileged: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /dev:/host/dev
      - /proc:/host/proc:ro
      - /boot:/host/boot:ro
      - /lib/modules:/host/lib/modules:ro
      - /usr:/host/usr:ro
```

### Benefits for Our RAG Chatbot:

1. **API Key Protection**: Secure handling of OpenAI and Pinecone keys
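2. **Data Privacy**: API keys stay out of images; protecting conversations in transit still requires HTTPS in front of the stack
3. **Service Isolation**: Backend and frontend run as separate processes with separate filesystems, communicating only over the shared network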
4. **Attack Prevention**: Multiple layers of security controls
5. **Compliance**: Meets security standards for production deployment
6. **Monitoring**: Continuous security monitoring and alerting
7. **Recovery**: Quick recovery from security incidents
8. **Audit Trail**: Security event logging once runtime monitoring such as Falco is in place


## Q5: How do we deploy and scale our containerized application in production?

**Answer:**

Production deployment and scaling are critical aspects of running containerized applications. Our Greek Derby RAG chatbot uses modern deployment strategies to ensure high availability, performance, and scalability in production environments.

### Production Deployment Strategies:

#### 1. **Single Server Deployment**

```yaml
# docker-compose.prod.yml
# Simple production deployment: docker-compose -f docker-compose.prod.yml up -d
version: '3.8'
services:
  backend:
    image: greek-derby-backend:latest
    restart: unless-stopped
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - PINECONE_API_KEY=${PINECONE_API_KEY}
    ports:
      - "8000:8000"
    volumes:
      - backend_data:/app/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  frontend:
    image: greek-derby-frontend:latest
    restart: unless-stopped
    ports:
      - "80:80"
    depends_on:
      - backend
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  backend_data:
```

**Benefits:**
- **Simple Setup**: Easy to deploy and manage
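- **Cost Effective**: Only one host to pay for and operate
- **Quick Start**: Fast deployment for small applications
- **Full Control**: Complete control over the environment
- **Trade-off**: The host is a single point of failure; if it goes down, so does the chatbot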

#### 2. **Load Balancer with Multiple Backend Instances**

```yaml
# docker-compose.scale.yml
version: '3.8'
services:
  nginx-lb:
    image: nginx:alpine
    ports:
      - "80:80"
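    volumes:
      # The file holds only upstream/server blocks, so it belongs in conf.d
      # (included inside nginx's http {} context), not in place of nginx.conf
      - ./nginx-lb.conf:/etc/nginx/conf.d/default.conf:ro
    depends_on:
      - backend
      - frontend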
    restart: unless-stopped

  backend:
    image: greek-derby-backend:latest
    restart: unless-stopped
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - PINECONE_API_KEY=${PINECONE_API_KEY}
    volumes:
      - backend_data:/app/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  frontend:
    image: greek-derby-frontend:latest
    restart: unless-stopped
    depends_on:
      - backend
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  backend_data:
```

**Nginx Load Balancer Configuration:**
```nginx
# nginx-lb.conf
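upstream backend {
    # With `--scale backend=N`, Docker's DNS returns one address per replica
    # for "backend", and nginx turns each address into an upstream server.
    # nginx resolves the name only at start/reload, so run `nginx -s reload`
    # after changing the replica count
    server backend:8000;
}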

upstream frontend {
    server frontend:80;
}

server {
    listen 80;
    
    location /api/ {
        proxy_pass http://backend/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
    
    location / {
        proxy_pass http://frontend/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
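By default nginx distributes requests across an `upstream` with round-robin, and it takes a server out of rotation for `fail_timeout` (10 s by default) after `max_fails` (1 by default) failed attempts. The Python model below is a simplified illustration of how requests would rotate across three scaled `backend` replicas and what happens when one of them fails. The replica addresses are made up, and real nginx also supports weights and only counts failures within the `fail_timeout` window, which this sketch leaves out.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional


class RoundRobinUpstream:
    """Simplified model of an nginx upstream: equal-weight round-robin plus
    passive failure handling (max_fails / fail_timeout)."""

    def __init__(self, servers: List[str], max_fails: int = 1,
                 fail_timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.servers = list(servers)
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.clock = clock
        self._cycle = itertools.cycle(self.servers)
        self._fails: Dict[str, int] = {s: 0 for s in self.servers}
        self._down_until: Dict[str, float] = {s: 0.0 for s in self.servers}

    def pick(self) -> Optional[str]:
        """Return the next available server, or None if all are marked down."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self.clock() >= self._down_until[server]:
                return server
        return None

    def report(self, server: str, ok: bool) -> None:
        """Record the outcome of a proxied request to `server`."""
        if ok:
            self._fails[server] = 0
            return
        self._fails[server] += 1
        if self._fails[server] >= self.max_fails:
            # Take the server out of rotation for fail_timeout seconds
            self._down_until[server] = self.clock() + self.fail_timeout
            self._fails[server] = 0
```

With three healthy replicas, picks simply rotate; after one failure report, that replica is skipped until its `fail_timeout` expires, which is why the remaining instances keep serving traffic.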

### Scaling Strategies:

#### 1. **Horizontal Scaling**

```bash
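# Scale the backend service to 3 instances. Replicas cannot share a fixed
# host port ("8000:8000") or a container_name, so the service must set
# neither; docker-compose.scale.yml leaves both out
docker-compose up --scale backend=3 -d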

# Scale with load balancer
docker-compose -f docker-compose.scale.yml up --scale backend=3 -d
```

**Scaling Benefits:**
- **High Availability**: Multiple instances prevent single points of failure
- **Load Distribution**: Traffic spread across multiple containers
- **Performance**: Better response times under high load
- **Fault Tolerance**: Service continues if one instance fails

#### 2. **Vertical Scaling**

```yaml
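# docker-compose.prod.yml
# deploy.resources is applied by Docker Compose v2 (`docker compose`);
# older docker-compose v1 releases ignore it unless run with --compatibility
services:
  backend:
    deploy: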
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '1.0'
          memory: 1G
    ulimits:
      nofile:
        soft: 2048
        hard: 4096
```
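The `cpus` and `memory` values above end up as kernel cgroup settings. Docker's documentation describes `--cpus=1.5` as equivalent to a CFS quota of 150000 µs per 100000 µs period, and Docker reads memory suffixes as binary units (`1G` = 1024³ bytes); Compose's `cpus`/`memory` limits roughly correspond to those same options. The helpers below are a small illustration of that arithmetic under those assumptions, not Docker's own parser, and they accept only the suffixes shown.

```python
import re

CFS_PERIOD_US = 100_000  # Docker's default CPU period (100 ms)
_MEMORY_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def cpu_quota_us(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in µs) the container may use per scheduling period."""
    return round(cpus * period_us)


def memory_bytes(value: str) -> int:
    """Parse a Docker-style memory string such as '512M', '2g' or '1GiB'."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([bkmgt]?)(?:i?b)?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _MEMORY_UNITS[unit])
```

For the backend limits above, `cpus: '2.0'` means up to 200000 µs of CPU time per 100 ms period (two full cores), and `memory: 2G` is 2147483648 bytes.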

**Vertical Scaling Benefits:**
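- **Better Performance**: More CPU and memory for each container
- **Simpler Management**: No load balancer or replica coordination to run
- **Cost Efficiency**: Reservations and limits let you right-size the host instead of over-provisioning it
- **Easier Debugging**: Fewer instances to inspect and correlate
- **Limitation**: Bounded by the capacity of a single host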

### Production Environment Setup:

#### 1. **Environment-Specific Configuration**

```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  backend:
    image: greek-derby-backend:${VERSION:-latest}
    restart: unless-stopped
    environment:
      - ENVIRONMENT=production
      - LOG_LEVEL=INFO
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - PINECONE_API_KEY=${PINECONE_API_KEY}
    ports:
      - "8000:8000"
    volumes:
      - backend_data:/app/data
      - backend_logs:/app/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  frontend:
    image: greek-derby-frontend:${VERSION:-latest}
    restart: unless-stopped
    ports:
      - "80:80"
    depends_on:
      backend:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  backend_data:
  backend_logs:

networks:
  default:
    name: greek-derby-prod
```

#### 2. **Monitoring and Logging**

```yaml
# docker-compose.monitoring.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml

volumes:
  prometheus_data:
  grafana_data:
  loki_data:
```

### Cloud Deployment Options:

#### 1. **AWS ECS Deployment**

Task definition (`ecs-task-definition.json`):

```json
{
  "family": "greek-derby-chatbot",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "backend",
      "image": "your-account.dkr.ecr.region.amazonaws.com/greek-derby-backend:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "ENVIRONMENT",
          "value": "production"
        }
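      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:openai-key"
        },
        {
          "name": "PINECONE_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:region:account:secret:pinecone-key"
        }
      ],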
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/greek-derby-chatbot",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "backend"
        }
      }
    }
  ]
}
```

#### 2. **Kubernetes Deployment**

```yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: greek-derby-backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: greek-derby-backend
  template:
    metadata:
      labels:
        app: greek-derby-backend
    spec:
      containers:
      - name: backend
        image: greek-derby-backend:latest
        ports:
        - containerPort: 8000
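        env:
        # Expects a Secret named api-secrets, e.g. created with:
        # kubectl create secret generic api-secrets \
        #   --from-literal=openai-key=... --from-literal=pinecone-key=...
        - name: OPENAI_API_KEY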
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: openai-key
        - name: PINECONE_API_KEY
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: pinecone-key
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: greek-derby-backend-service
spec:
  selector:
    app: greek-derby-backend
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer
```
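Kubernetes reads quantity suffixes differently from Docker: `500m` CPU is half a core, `Mi`/`Gi` are binary (2^20, 2^30 bytes), but `M`/`G` are decimal (10^6, 10^9). A Compose `memory: 1G` (binary, as Docker reads it) and a pod `memory: 1G` therefore differ by about 7%. The parser below is a simplified sketch covering the suffixes used in this manifest plus close relatives; it does not implement the full quantity grammar (no exponent notation, no `E`/`Ei`).

```python
from decimal import Decimal

# Suffix multipliers from the Kubernetes quantity format (subset)
_SUFFIXES = {
    "Ki": Decimal(2**10), "Mi": Decimal(2**20), "Gi": Decimal(2**30),
    "Ti": Decimal(2**40), "Pi": Decimal(2**50),
    "k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9),
    "T": Decimal(10**12), "P": Decimal(10**15),
    "m": Decimal("0.001"),  # milli, used for CPU (e.g. "500m")
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '500m', '512Mi' or '1G' to a plain number."""
    quantity = quantity.strip()
    for suffix, multiplier in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * multiplier
    return Decimal(quantity)  # no suffix: already a plain number
```

Applied to the manifest, the backend requests 0.5 CPU and 536870912 bytes of memory and is capped at 1 CPU and 1073741824 bytes.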

### CI/CD Pipeline for Production:

#### 1. **GitHub Actions Workflow**

```yaml
# .github/workflows/deploy.yml
name: Deploy to Production
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
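    - uses: actions/checkout@v4

    # ubuntu-latest runners ship Compose v2 (`docker compose`), not docker-compose v1
    - name: Build Docker images
      run: |
        docker compose -f docker-compose.prod.yml build

    # Assumes Trivy is installed on the runner (e.g. by an earlier step);
    # --exit-code 1 fails the job on HIGH/CRITICAL findings
    - name: Run security scan
      run: |
        trivy image --exit-code 1 --severity HIGH,CRITICAL greek-derby-backend:latest
        trivy image --exit-code 1 --severity HIGH,CRITICAL greek-derby-frontend:latest

    - name: Push to registry
      env:
        REGISTRY_USER: ${{ secrets.DOCKER_USERNAME }}
      run: |
        echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "$REGISTRY_USER" --password-stdin
        # Docker Hub pushes need a <user>/<repo> name, so retag before pushing
        for svc in backend frontend; do
          docker tag "greek-derby-$svc:latest" "$REGISTRY_USER/greek-derby-$svc:latest"
          docker push "$REGISTRY_USER/greek-derby-$svc:latest"
        done

    # Assumes the runner already has SSH access to the server and that the
    # server's compose file references the pushed image names
    - name: Deploy to production
      run: |
        ssh ${{ secrets.PROD_SERVER }} 'cd /opt/greek-derby && docker compose -f docker-compose.prod.yml pull && docker compose -f docker-compose.prod.yml up -d'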
```
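The deploy step starts the new containers but does not confirm that they came up. One way to close that gap is a post-deploy smoke test that polls the backend's `/health` endpoint (the one the Compose health checks already use) until it answers with a 2xx status or a deadline passes. This is a stdlib-only sketch; the URL, timeout and interval are placeholders to adapt, and a non-zero exit code from a script wrapping it would fail the workflow.

```python
import http.client
import time
import urllib.request


def wait_for_health(url: str, timeout: float = 120.0, interval: float = 5.0) -> bool:
    """Poll `url` until it returns HTTP 2xx or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=min(interval, 10.0)) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError (e.g. 503) and socket timeouts are OSError
            # subclasses: the service is not ready yet, so try again
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_health("http://localhost:8000/health")` run on the server after `docker compose up -d` returns `False` if the backend never reports healthy within two minutes.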

#### 2. **Blue-Green Deployment**

```bash
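#!/bin/bash
# Blue-green deployment script (sketch)
set -euo pipefail

# 1. Start the new version (green) next to the running one (blue)
docker-compose -f docker-compose.green.yml up -d

# 2. Poll green's health endpoint for up to 2 minutes instead of a fixed sleep
healthy=false
for _ in $(seq 1 24); do
  if curl -fsS http://green.example.com/health > /dev/null; then
    healthy=true
    break
  fi
  sleep 5
done
if [ "$healthy" != true ]; then
  echo "Green never became healthy; keeping blue live" >&2
  docker-compose -f docker-compose.green.yml down
  exit 1
fi

# 3. Switch traffic to green. With the nginx-lb service above, one option is
#    to swap in an upstream config pointing at green (nginx-lb.green.conf is
#    a hypothetical file) and reload nginx, which lets in-flight requests finish
cp nginx-lb.green.conf nginx-lb.conf
docker-compose -f docker-compose.scale.yml exec -T nginx-lb nginx -s reload

# 4. Only now retire the old version (blue)
docker-compose -f docker-compose.blue.yml down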
```

### Performance Optimization:

#### 1. **Resource Optimization**

```yaml
# docker-compose.optimized.yml
services:
  backend:
    image: greek-derby-backend:latest
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
    environment:
      - WORKERS=4
      - MAX_REQUESTS=1000
      - TIMEOUT=30
    ulimits:
      nofile:
        soft: 1024
        hard: 2048
```

#### 2. **Caching Strategy**

```yaml
# docker-compose.cache.yml
services:
  redis:
    image: redis:alpine
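    # No host port mapping: only containers on the Compose network need
    # Redis, and an unauthenticated Redis should not be reachable from outside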
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes

  backend:
    image: greek-derby-backend:latest
    environment:
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis

volumes:
  redis_data:
```
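With Redis reachable at `REDIS_URL`, a common pattern for a RAG backend is cache-aside: hash the normalised question, return a cached answer if one exists, otherwise run the retrieval and LLM pipeline and store the result with a TTL. The sketch below uses a tiny in-memory stand-in so it runs without a Redis server; with redis-py (created with `decode_responses=True` so values come back as `str`) the two calls would be `client.get(key)` and `client.setex(key, ttl, value)`. The key prefix, TTL and normalisation rule are illustrative choices, not taken from the project.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in exposing the two Redis calls used below (get/setex)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or item[0] <= self._clock():
            self._data.pop(key, None)  # missing or expired
            return None
        return item[1]

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def cache_key(question: str) -> str:
    # Normalise case and whitespace so trivial variants share one entry
    normalised = " ".join(question.lower().split())
    return "answer:" + hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def cached_answer(cache, question: str, compute: Callable[[str], str],
                  ttl_seconds: int = 3600) -> str:
    key = cache_key(question)
    hit = cache.get(key)
    if hit is not None:
        return hit
    answer = compute(question)  # e.g. the retrieval + LLM call
    cache.setex(key, ttl_seconds, answer)
    return answer
```

The TTL bounds how stale an answer can get; for questions about fixtures or recent results, a shorter TTL is the safer default.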

### Benefits for Our RAG Chatbot:

1. **High Availability**: Multiple instances ensure service continuity
2. **Scalability**: Easy horizontal and vertical scaling
3. **Performance**: Load balancing and resource optimization
4. **Monitoring**: Comprehensive logging and metrics
5. **Security**: Production-grade security configurations
6. **Reliability**: Health checks and automatic recovery
7. **Cost Efficiency**: Optimized resource usage
8. **Easy Updates**: Blue-green releases and Kubernetes rolling updates keep downtime to a minimum


## Q6: How do we monitor and troubleshoot containerized applications?

**Answer:**

Monitoring and troubleshooting are essential for maintaining healthy containerized applications. Our Greek Derby RAG chatbot implements comprehensive monitoring strategies to ensure optimal performance, detect issues early, and provide quick resolution when problems occur.

### Why Does Container Monitoring Matter?

**Container-Specific Challenges:**
- **Ephemeral Nature**: Containers can start/stop frequently
- **Resource Isolation**: Limited visibility into container internals
- **Distributed Logs**: Logs scattered across multiple containers
- **Network Complexity**: Inter-container communication monitoring
- **Scaling Events**: Dynamic container creation/destruction
- **Resource Constraints**: Memory and CPU limits affecting performance

### Our Monitoring Strategy:

#### 1. **Health Check Monitoring**

```yaml
# docker-compose.yml
services:
  backend:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
```

**Health Check Benefits:**
- **Service Status**: Know when services are healthy/unhealthy
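- **Automatic Recovery**: Plain Docker only marks a failing container `unhealthy`; restart policies such as `unless-stopped` react to the process exiting, not to health status. Swarm services and Kubernetes liveness probes do replace unhealthy containers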
- **Load Balancer Integration**: Route traffic only to healthy instances
- **Alerting**: Trigger alerts when health checks fail
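The interplay of `interval`, `retries` and `start_period` is easy to misread. The model below follows Docker's reference for these options: the first probe runs one `interval` after the container starts, failures inside `start_period` are not counted until a probe has succeeded, and `retries` consecutive counted failures mark the container `unhealthy`. It is a simplification that ignores probe duration, `timeout` (an over-long probe just counts as a failure) and the newer `start_interval` option.

```python
from typing import Iterable, List


def health_statuses(results: Iterable[bool], interval: float = 30.0,
                    retries: int = 3, start_period: float = 0.0) -> List[str]:
    """Return the container health status after each probe result."""
    status, streak, started = "starting", 0, False
    statuses = []
    for n, ok in enumerate(results, start=1):
        elapsed = n * interval  # probe n runs n intervals after start
        if ok:
            status, streak, started = "healthy", 0, True
        elif started or elapsed >= start_period:
            streak += 1  # counted failure
            if streak >= retries:
                status = "unhealthy"
        # otherwise: a failure inside start_period before any success is ignored
        statuses.append(status)
    return statuses
```

With the backend settings above (30 s interval, 3 retries, 40 s start period), a backend that never becomes ready is reported `unhealthy` on the fourth probe, about 120 s after start, because the 30 s probe falls inside the start period.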

#### 2. **Container Resource Monitoring**

```bash
# Monitor container resources
docker stats greek-derby-backend greek-derby-frontend

# Detailed container information
docker inspect greek-derby-backend

# Container logs
docker logs -f greek-derby-backend

# Container processes
docker top greek-derby-backend
```

**Resource Metrics to Track:**
- **CPU Usage**: Percentage of CPU utilization
- **Memory Usage**: RAM consumption and limits
- **Network I/O**: Bytes sent/received
- **Disk I/O**: Read/write operations
- **File Descriptors**: Open file handles
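`docker stats` derives its CPU and memory percentages from the Engine's container stats endpoint, which you can also query directly, e.g. `curl --unix-socket /var/run/docker.sock "http://localhost/containers/greek-derby-backend/stats?stream=false"`. The functions below reproduce the CPU formula documented for that endpoint (container CPU delta divided by system CPU delta, times the number of online CPUs) and subtract reclaimable file cache from memory usage (`inactive_file` on cgroup v2, `total_inactive_file` on cgroup v1), which is roughly what recent CLI versions do. The sample in the test is made up and contains only the fields these functions read.

```python
from typing import Any, Dict


def cpu_percent(stats: Dict[str, Any]) -> float:
    """CPU usage as docker stats reports it (100% = one full core)."""
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    online = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0  # first sample or counter reset
    return cpu_delta / system_delta * online * 100.0


def memory_percent(stats: Dict[str, Any]) -> float:
    """Memory usage minus reclaimable file cache, as a share of the limit."""
    mem = stats["memory_stats"]
    usage, limit = mem["usage"], mem["limit"]
    detail = mem.get("stats", {})
    cache = detail.get("inactive_file", detail.get("total_inactive_file", 0))
    used = usage - cache if cache < usage else usage
    return used / limit * 100.0 if limit else 0.0
```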

#### 3. **Application-Level Monitoring**

```python
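# backend/api/greek_derby_api.py
import time

from fastapi import FastAPI, Request, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
    start_http_server,
)

app = FastAPI()

# Metrics. "status" lets alert rules select 5xx responses; labelling by raw
# path can create many series if paths contain IDs, so prefer route templates
REQUEST_COUNT = Counter(
    "http_requests_total", "Total HTTP requests", ["method", "endpoint", "status"]
)
REQUEST_DURATION = Histogram("http_request_duration_seconds", "HTTP request duration")
ACTIVE_CONNECTIONS = Gauge("active_connections", "Requests currently being handled")
# Increment around outbound calls, e.g. API_CALLS.labels(service="openai").inc()
API_CALLS = Counter("api_calls_total", "Total external API calls", ["service"])


@app.middleware("http")
async def monitor_requests(request: Request, call_next):
    start_time = time.perf_counter()
    ACTIVE_CONNECTIONS.inc()
    status = "500"  # recorded if the handler raises
    try:
        response = await call_next(request)
        status = str(response.status_code)
        return response
    finally:
        ACTIVE_CONNECTIONS.dec()
        REQUEST_COUNT.labels(
            method=request.method, endpoint=request.url.path, status=status
        ).inc()
        REQUEST_DURATION.observe(time.perf_counter() - start_time)


@app.get("/metrics")
async def metrics():
    # Same registry, served on the app port
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)


# Separate listener on :8001, the target scraped in prometheus.yml. Only one
# process can bind it, so with several workers use prometheus_client's
# multiprocess mode instead
start_http_server(8001)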
```

### Comprehensive Monitoring Stack:

#### 1. **Prometheus + Grafana Setup**

```yaml
# docker-compose.monitoring.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
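    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # rule_files in prometheus.yml is resolved relative to the config file
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
      - prometheus_data:/prometheus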
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/var/lib/grafana/dashboards
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false

  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    devices:
      - /dev/kmsg

volumes:
  prometheus_data:
  grafana_data:
```

#### 2. **Prometheus Configuration**

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'greek-derby-backend'
    static_configs:
      - targets: ['backend:8001']
    metrics_path: '/metrics'
    scrape_interval: 10s

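  # nginx has no built-in Prometheus endpoint. This assumes an
  # nginx-prometheus-exporter sidecar (hypothetical service name
  # "nginx-exporter", default port 9113) that reads nginx's stub_status
  - job_name: 'greek-derby-frontend'
    static_configs:
      - targets: ['nginx-exporter:9113']
    metrics_path: '/metrics'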

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
```

#### 3. **Grafana Dashboards**

```json
{
  "dashboard": {
    "title": "Greek Derby Chatbot Monitoring",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{endpoint}}"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          }
        ]
      },
      {
        "title": "Container CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total{name=~\"greek-derby.*\"}[5m]) * 100",
            "legendFormat": "{{name}}"
          }
        ]
      },
      {
        "title": "Container Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "container_memory_usage_bytes{name=~\"greek-derby.*\"} / 1024 / 1024",
            "legendFormat": "{{name}}"
          }
        ]
      }
    ]
  }
}
```
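The "Response Time" panel and the `HighResponseTime` alert both rely on `histogram_quantile`, which does not see individual request durations: it finds the bucket containing the requested rank and interpolates linearly inside it. The function below is a simplified sketch of that estimate (it omits Prometheus's handling of non-positive bucket bounds). With prometheus_client's default buckets (..., 1.0, 2.5, ...), a 2-second threshold falls inside the 1.0 to 2.5 s bucket, so the alert fires on an interpolated value; custom buckets around 2 s would make it sharper.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total),
    like the `_bucket{le=...}` series a Prometheus Histogram exports.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Quantile lies in the +Inf bucket: report the highest finite bound
                return lower_bound
            # Linear interpolation inside the bucket
            in_bucket = cumulative - count_below
            return lower_bound + (upper_bound - lower_bound) * (rank - count_below) / in_bucket
        lower_bound, count_below = upper_bound, cumulative
    return lower_bound
```

For buckets `[(0.5, 50), (1.0, 80), (2.5, 95), (5.0, 100), (math.inf, 100)]`, the 0.9 quantile is estimated at 2.0 s even if every one of those slow requests actually took 1.1 s.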

### Log Management:

#### 1. **Centralized Logging with ELK Stack**

```yaml
# docker-compose.logging.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.0
    environment:
      - discovery.type=single-node
      # Security is disabled for local experiments only; keep it on in production
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.8.0
    ports:
      - "5044:5044"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.8.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

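  filebeat:
    image: docker.elastic.co/beats/filebeat:8.8.0
    user: root
    # Filebeat rejects config files not owned by root or the Filebeat user;
    # a bind-mounted host file often fails that check, hence --strict.perms=false
    command: ["filebeat", "-e", "--strict.perms=false"]
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro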
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    depends_on:
      - logstash

volumes:
  elasticsearch_data:
```

#### 2. **Structured Logging**

```python
# backend/api/greek_derby_api.py
# (app, ChatRequest and process_chat_request are defined elsewhere in this module)
import json
import logging
import time
import traceback
from datetime import datetime, timezone

# Emit only the message, so every line is one JSON object log shippers can parse
logging.basicConfig(level=logging.INFO, format="%(message)s")

logger = logging.getLogger(__name__)


def log_event(level: int, event: str, **fields) -> None:
    """Log one structured event as a single JSON line."""
    fields.update(event=event, timestamp=datetime.now(timezone.utc).isoformat())
    logger.log(level, json.dumps(fields, default=str))


@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    start = time.perf_counter()  # monotonic clock, safe for measuring durations

    # Log metadata rather than the question text: truncation does not remove PII
    log_event(logging.INFO, "chat_request",
              user_id=request.user_id, question_length=len(request.question))

    try:
        response = await process_chat_request(request)
    except Exception as e:
        # Keep the traceback inside the JSON object instead of on extra lines
        log_event(logging.ERROR, "chat_error", user_id=request.user_id,
                  error=str(e), traceback=traceback.format_exc())
        raise

    log_event(logging.INFO, "chat_response",
              user_id=request.user_id,
              duration_seconds=round(time.perf_counter() - start, 3),
              response_length=len(response.answer))
    return response
```
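Building JSON inside each log call works for the app's own events, but a `logging.Formatter` makes every record, including ones emitted by libraries such as Uvicorn, come out as one JSON object per line, which Filebeat/Logstash can parse without extra parsing rules. A minimal stdlib sketch follows; the field names and the `extra={"fields": {...}}` convention are choices made for this example, not an established API.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as logger.info("...", extra={"fields": {...}})
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            entry.update(fields)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


def configure_json_logging(level: int = logging.INFO) -> None:
    """Send all logs to stderr (captured by Docker's log driver) as JSON lines."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```

After `configure_json_logging()`, a call such as `logger.info("chat_response", extra={"fields": {"user_id": "u1", "duration_seconds": 0.42}})` produces one parseable line.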

### Troubleshooting Strategies:

#### 1. **Container Debugging**

```bash
# Access running container (use sh instead of bash on Alpine-based images)
docker exec -it greek-derby-backend bash

# Check container logs
docker logs --tail 100 -f greek-derby-backend

# Inspect container configuration
docker inspect greek-derby-backend

# Check container processes
docker top greek-derby-backend

# Monitor resource usage
docker stats greek-derby-backend
```

#### 2. **Network Troubleshooting**

```bash
# Check container networks
docker network ls
docker network inspect greek-derby-network

# Test connectivity between containers (ping/curl may be missing in slim images)
docker exec greek-derby-frontend ping -c 3 backend
docker exec greek-derby-backend curl http://frontend:80

# Check port mappings
docker port greek-derby-backend
ss -tulpn | grep :8000   # or netstat -tulpn on hosts that still ship it
```
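Slim images often ship without `ping` or `curl`, but the backend container always has Python. The helper below checks the two things that usually break between services, DNS resolution of the service name and a TCP connection to its port, using only the standard library. It is a sketch: the file name `netcheck.py` is hypothetical, and you would append a line such as `print(check_endpoints([("frontend", 80), ("redis", 6379)]))` before running it with `docker exec -i greek-derby-backend python - < netcheck.py`.

```python
import socket
from typing import Dict, List, Tuple


def check_endpoints(targets: List[Tuple[str, int]], timeout: float = 3.0) -> Dict[str, str]:
    """Map each 'host:port' to a string starting with 'ok' or describing the failure."""
    results = {}
    for host, port in targets:
        label = f"{host}:{port}"
        try:
            # Step 1: DNS (Docker's embedded resolver answers for service names)
            infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
            addresses = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[label] = f"DNS failure: {exc}"
            continue
        try:
            # Step 2: TCP connect (is anything listening on that port?)
            with socket.create_connection((host, port), timeout=timeout):
                results[label] = f"ok ({', '.join(addresses)})"
        except OSError as exc:
            results[label] = f"TCP failure: {exc}"
    return results
```

A DNS failure points at networks or service names (compare with `docker network inspect`), while a TCP failure with working DNS usually means the target process is down or listening on a different port or interface.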

#### 3. **Performance Troubleshooting**

```bash
# Check container resource limits (0 means no limit is set)
docker inspect --format 'Memory={{.HostConfig.Memory}} NanoCpus={{.HostConfig.NanoCpus}}' greek-derby-backend

# Monitor real-time performance
docker stats --no-stream greek-derby-backend

# Check disk usage
docker system df
docker system prune  # Removes stopped containers, unused networks, dangling images and build cache (asks first)

# Analyze container layers
docker history greek-derby-backend
```

### Alerting and Notifications:

#### 1. **Alert Rules**

```yaml
# alert_rules.yml
groups:
  - name: greek-derby-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }} seconds"

      - alert: ContainerDown
        expr: up{job="greek-derby-backend"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Backend container is down"
          description: "Backend container has been down for more than 1 minute"

      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes{name="greek-derby-backend"} / container_spec_memory_limit_bytes{name="greek-derby-backend"} > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value | humanizePercentage }}"
```
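The `for:` clause is what keeps a single bad evaluation from paging anyone: Prometheus first marks a matching alert as *pending* and promotes it to *firing* only once the expression has been true at every evaluation for the whole hold duration, while any false evaluation resets it. The function below is a simplified model of that state machine for one alert series (no `keep_firing_for`, no notification timing), using the 15 s `evaluation_interval` from `prometheus.yml`.

```python
from typing import Iterable, List, Optional, Tuple


def alert_states(evaluations: Iterable[Tuple[float, bool]], hold_seconds: float) -> List[str]:
    """Map (timestamp, expression_true) evaluations to inactive/pending/firing."""
    active_since: Optional[float] = None
    states = []
    for timestamp, expression_true in evaluations:
        if not expression_true:
            active_since = None  # any false evaluation resets the clock
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= hold_seconds else "pending")
    return states
```

For `HighErrorRate` (`for: 2m`), an error rate that stays high from t = 0 is pending for the first eight evaluations and fires at the ninth, t = 120 s.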

#### 2. **Alertmanager Configuration**

```yaml
# alertmanager.yml
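global:
  # Use your SMTP relay's address; inside a container, 'localhost' is the
  # Alertmanager container itself
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alerts@greek-derby.com'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'   # default for everything not matched below
  routes:
    # Without a route, the 'email' receiver below would never be used
    - matchers:
        - severity="critical"
      receiver: 'email'

receivers:
  - name: 'web.hook'
    webhook_configs:
      # Same caveat: point this at the receiver's host or service name
      - url: 'http://localhost:5001/'
        send_resolved: true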

  - name: 'email'
    email_configs:
      - to: 'admin@greek-derby.com'
        subject: 'Greek Derby Alert: {{ .GroupLabels.alertname }}'
        body: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
```
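The `web.hook` receiver POSTs a JSON document to the configured URL. Its top-level `status` and `alerts` list (each alert carrying `status`, `labels` and `annotations`) are part of Alertmanager's documented webhook payload. The handler below is a minimal stdlib sketch of a receiver that turns each alert into a one-line message; where the message goes next (chat, pager, log) is left out, and port 5001 simply matches the config above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List


def summarize(payload: Dict[str, Any]) -> List[str]:
    """One line per alert, e.g. '[FIRING] critical HighErrorRate: High error rate detected'."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", payload.get("status", "unknown"))
        lines.append(
            f"[{status.upper()}] {labels.get('severity', 'none')} "
            f"{labels.get('alertname', '?')}: {annotations.get('summary', '')}"
        )
    return lines


class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        for line in summarize(payload):
            print(line, flush=True)  # replace with a real notification channel
        self.send_response(200)
        self.end_headers()


def serve(port: int = 5001) -> None:
    HTTPServer(("0.0.0.0", port), AlertWebhookHandler).serve_forever()
```

Because `send_resolved: true` is set, the same handler also receives `resolved` notifications, which show up with a `[RESOLVED]` prefix.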

### Monitoring Best Practices:

#### 1. **Key Metrics to Monitor**
- **Application Metrics**: Request rate, response time, error rate
- **Infrastructure Metrics**: CPU, memory, disk, network usage
- **Business Metrics**: User sessions, API calls, conversion rates
- **Security Metrics**: Failed logins, suspicious activity, rate limits

#### 2. **Logging Best Practices**
- **Structured Logging**: Use JSON format for machine readability
- **Log Levels**: Appropriate use of DEBUG, INFO, WARN, ERROR
- **Sensitive Data**: Never log passwords, API keys, or PII
- **Log Rotation**: Rotate logs (e.g. the json-file driver's `max-size`/`max-file` options) so they cannot fill the disk

#### 3. **Alerting Best Practices**
- **Meaningful Alerts**: Only alert on actionable issues
- **Alert Fatigue**: Avoid too many false positives
- **Escalation**: Different severity levels for different issues
- **Documentation**: Document alert conditions and resolution steps

### Benefits for Our RAG Chatbot:

1. **Proactive Monitoring**: Detect issues before they affect users
2. **Performance Optimization**: Identify bottlenecks and optimize accordingly
3. **Quick Troubleshooting**: Fast resolution of production issues
4. **Capacity Planning**: Understand resource usage patterns
5. **Security Monitoring**: Detect and respond to security threats
6. **Business Insights**: Track user behavior and system usage
7. **Compliance**: Meet monitoring and logging requirements
8. **Reliability**: Ensure high availability and uptime


---

## 🎯 Summary

This lesson covered the essential Docker & Containerization concepts used in our Greek Derby RAG chatbot:

### Key Takeaways:

1. **Docker Fundamentals**: Containerization benefits, architecture, and use cases
2. **Multi-Stage Builds**: Optimized images with minimal attack surface and size
3. **Docker Compose**: Service orchestration, networking, and environment management
4. **Security Best Practices**: Non-root users, minimal images, and secrets management
5. **Production Deployment**: Scaling strategies, monitoring, and troubleshooting
6. **Monitoring & Troubleshooting**: Comprehensive observability and debugging techniques

### Next Steps:

- **Practice**: Try building your own multi-stage Dockerfiles
- **Explore**: Learn about Kubernetes for advanced orchestration
- **Advanced**: Study container security scanning and compliance
- **Production**: Implement monitoring and alerting in your deployments

### Project Structure:

```
rag-langchain-langgraph/
├── backend/
│   ├── Dockerfile              # Python FastAPI container
│   ├── requirements.txt        # Python dependencies
│   └── api/                    # Application code
├── front-end/react-chatbot/
│   ├── Dockerfile              # Multi-stage React build
│   ├── nginx.conf              # Nginx configuration
│   └── src/                    # React application
├── docker-compose.yml          # Service orchestration
├── docker-compose.prod.yml     # Production configuration
├── docker-compose.monitoring.yml # Monitoring stack
└── .env                        # Environment variables
```

### Container Benefits for Our RAG Chatbot:

1. **Consistent Environment**: Same behavior across development, staging, and production
2. **Easy Scaling**: Horizontal scaling with load balancers
3. **Service Isolation**: Backend and frontend run independently
4. **Security**: Isolated processes with minimal privileges
5. **Portability**: Run anywhere Docker is installed
6. **Resource Efficiency**: Optimized resource usage and costs
7. **Easy Deployment**: One command to deploy entire stack
8. **Monitoring**: Built-in health checks and observability

This architecture provides a solid foundation for building scalable, maintainable, and production-ready containerized applications! 🚀
