# 🚀 DevOps CI/CD Concepts - Greek Derby RAG Chatbot

## Learning Objectives
By the end of this lesson, you will understand:
- What DevOps and CI/CD are
- Continuous Integration (CI) principles and practices
- Continuous Deployment (CD) strategies
- GitHub Actions for automation
- Docker in CI/CD pipelines
- Testing strategies in CI/CD
- Security scanning and compliance
- Monitoring and alerting
- Blue-green and canary deployments
- Infrastructure as Code (IaC) concepts

---

## Q1: What is DevOps and CI/CD, and why are they crucial for our Greek Derby RAG Chatbot?

**Answer:**

DevOps is a cultural and technical movement that bridges the gap between development and operations teams. It emphasizes collaboration, automation, and continuous improvement to deliver software faster and more reliably.

### What is CI/CD?

**Continuous Integration (CI):**
- **Definition**: The practice of frequently integrating code changes into a shared repository
- **Key Practices**: Automated testing, code quality checks, and early bug detection
- **Benefits**: Reduces integration conflicts, improves code quality, enables faster feedback

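**Continuous Deployment (CD):**
- **Definition**: The practice of automatically deploying every change that passes the pipeline to production. The closely related *Continuous Delivery* also keeps every change releasable, but leaves the final push to production to a manual approval.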
- **Key Practices**: Automated deployment pipelines, environment management, rollback strategies
- **Benefits**: Faster time-to-market, reduced deployment risk, consistent environments

### Why CI/CD is Essential for Our RAG Chatbot:

1. **Complex Architecture**: Our system has multiple components (FastAPI, React, Docker, Vector DB)
2. **Frequent Updates**: Chatbot responses and data need regular updates
3. **Quality Assurance**: AI responses must be tested and validated
4. **Scalability**: System needs to handle varying loads
5. **Reliability**: 24/7 availability for users

### Our CI/CD Pipeline Overview:

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Developer     │    │   GitHub        │    │   Production    │
│   Pushes Code   │───▶│   Actions       │───▶│   Environment   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌─────────────────┐
                       │   Staging       │
                       │   Environment   │
                       └─────────────────┘
```

### Key Benefits for Our Project:
- **Automated Testing**: Ensures code quality before deployment
- **Consistent Deployments**: Same process every time
- **Quick Rollbacks**: Easy to revert problematic changes
- **Team Collaboration**: Clear process for all team members
- **Quality Gates**: Prevents broken code from reaching production


## Q2: How do we implement CI/CD using GitHub Actions for our Greek Derby chatbot?

**Answer:**

GitHub Actions is a CI/CD platform that allows us to automate workflows directly in our GitHub repository. For our Greek Derby RAG chatbot, we can create comprehensive workflows that handle testing, building, and deployment.

### GitHub Actions Workflow Structure:

```yaml
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
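      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r backend/requirements.txt
          pip install pytest  # not needed if pytest is already pinned in requirements.txt
      - name: Run tests
        run: |
          pytest backend/tests/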
```

### Our Complete CI/CD Pipeline:

#### 1. **Code Quality Checks**
```yaml
- name: Lint Python code
  run: |
    pip install flake8
    flake8 backend/ --count --select=E9,F63,F7,F82 --show-source --statistics

- name: Type checking
  run: |
    pip install mypy
    mypy backend/ --ignore-missing-imports
```

#### 2. **Security Scanning**
```yaml
- name: Security scan
  run: |
    pip install safety
    safety check -r backend/requirements.txt
```

#### 3. **Docker Build and Test**
```yaml
- name: Build Docker images
  run: |
    docker build -t greek-derby-backend ./backend
    docker build -t greek-derby-frontend ./front-end/react-chatbot

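- name: Test Docker containers
  run: |
    # Compose v2 syntax; the standalone docker-compose (v1) binary is no longer on GitHub-hosted runners
    docker compose -f docker-compose.test.yml up --abort-on-container-exit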
```

#### 4. **Deployment to Staging**
```yaml
deploy-staging:
  needs: test
  runs-on: ubuntu-latest
  if: github.ref == 'refs/heads/develop'
  steps:
    - name: Deploy to staging
      run: |
        # Deploy to staging environment
        echo "Deploying to staging..."
```

#### 5. **Production Deployment**
```yaml
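deploy-production:
  # Depends on `test` only: deploy-staging runs on `develop` and is skipped on `main`,
  # and any job whose dependency was skipped is skipped as well
  needs: test
  runs-on: ubuntu-latest
  if: github.ref == 'refs/heads/main'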
  steps:
    - name: Deploy to production
      run: |
        # Deploy to production environment
        echo "Deploying to production..."
```

### Key Features of Our Pipeline:

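1. **Multi-Environment Support**: Separate jobs deploy `develop` to staging and `main` to production
2. **Parallel Execution**: Jobs without a `needs` dependency run in parallel by default, so independent checks give faster feedback
3. **Conditional Deployment**: `if:` expressions on the branch ref decide which deployment job runs
4. **Artifact Management**: Build outputs can be stored and reused between jobs (for example with `actions/upload-artifact`)
5. **Notification System**: Alert team members of build status (for example with a Slack or email step at the end of the workflow)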
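Because `deploy-staging` only runs for `develop`, the way `deploy-production` declares its `needs` matters. GitHub Actions skips any job whose dependency failed or was skipped, unless that job's `if:` uses a status-check function such as `always()`. The stdlib-only Python sketch below is a simplified model of this rule. It ignores `always()`, matrices and job failures. It compares two wirings: production needing both `test` and `deploy-staging`, and production needing `test` alone. It is not GitHub's scheduler, and the `Job` and `plan` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Job:
    name: str
    needs: List[str] = field(default_factory=list)
    # Stand-in for a job-level `if:` evaluated against the git ref
    condition: Callable[[str], bool] = lambda ref: True

def plan(jobs: List[Job], ref: str) -> Dict[str, str]:
    """Return 'run' or 'skipped' for each job on a push to `ref`.

    Simplified rule: a job runs only if every job it needs ran (assumed to
    succeed) and its own condition holds. Jobs must be listed after their
    dependencies.
    """
    status: Dict[str, str] = {}
    for job in jobs:
        deps_ok = all(status[dep] == "run" for dep in job.needs)
        status[job.name] = "run" if deps_ok and job.condition(ref) else "skipped"
    return status

MAIN, DEVELOP = "refs/heads/main", "refs/heads/develop"

job_test = Job("test")
job_staging = Job("deploy-staging", ["test"], lambda ref: ref == DEVELOP)
# Wiring A: production also needs deploy-staging
prod_needs_staging = Job("deploy-production", ["test", "deploy-staging"], lambda ref: ref == MAIN)
# Wiring B: production needs only test
prod_needs_test_only = Job("deploy-production", ["test"], lambda ref: ref == MAIN)

if __name__ == "__main__":
    print(plan([job_test, job_staging, prod_needs_staging], MAIN))
    print(plan([job_test, job_staging, prod_needs_test_only], MAIN))
```

For a push to `main`, wiring A reports `deploy-production` as skipped because its `deploy-staging` dependency never ran. Wiring B runs it.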


## Q3: What testing strategies should we implement in our CI/CD pipeline for the RAG chatbot?

**Answer:**

Testing is crucial for our RAG chatbot because AI responses can be unpredictable and user-facing. We need a comprehensive testing strategy that covers all aspects of our system.

### Testing Pyramid for Our RAG Chatbot:

```
        ┌─────────────────┐
        │   E2E Tests     │ ← Few, slow, expensive
        │   (User Flows)  │
        └─────────────────┘
       ┌─────────────────────┐
       │  Integration Tests  │ ← Some, medium speed
       │  (API + Database)   │
       └─────────────────────┘
      ┌─────────────────────────┐
      │     Unit Tests          │ ← Many, fast, cheap
      │  (Functions & Classes)  │
      └─────────────────────────┘
```

### 1. **Unit Tests** (Fast, Isolated)

```python
# backend/tests/test_chat_service.py
import pytest
from backend.services.chat_service import ChatService

def test_chat_service_initialization():
    """Test that chat service initializes correctly"""
    service = ChatService()
    assert service is not None
    assert service.vector_db is not None

def test_process_user_message():
    """Test message processing logic"""
    service = ChatService()
    result = service.process_message("What is Greek Derby?")
    assert isinstance(result, str)
    assert len(result) > 0

def test_handle_empty_message():
    """Test handling of empty messages"""
    service = ChatService()
    result = service.process_message("")
    assert result == "Please provide a valid question."
```

### 2. **Integration Tests** (API + Database)

```python
# backend/tests/test_api_integration.py
import pytest
from fastapi.testclient import TestClient
from backend.api.greek_derby_api import app

client = TestClient(app)

def test_chat_endpoint():
    """Test the chat API endpoint"""
    response = client.post("/chat", json={
        "message": "What is Greek Derby?",
        "session_id": "test-session"
    })
    assert response.status_code == 200
    data = response.json()
    assert "answer" in data
    assert isinstance(data["answer"], str)

def test_health_check():
    """Test health check endpoint"""
    response = client.get("/health")
    assert response.status_code == 200
    assert response.json()["status"] == "healthy"
```

### 3. **End-to-End Tests** (Full User Journey)

```python
# tests/e2e/test_chatbot_flow.py
import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def test_complete_chat_flow():
    """Test complete user interaction flow"""
    driver = webdriver.Chrome()
    try:
        driver.get("http://localhost:3000")

        # Find chat input and send message
        input_field = driver.find_element(By.ID, "chat-input")
        input_field.send_keys("What is Greek Derby?")

        # Click send button
        send_button = driver.find_element(By.ID, "send-button")
        send_button.click()

        # Explicitly wait (up to 30 s) for the bot's response to appear;
        # a bare find_element would fail before the backend has answered
        response = WebDriverWait(driver, 30).until(
            EC.visibility_of_element_located((By.CLASS_NAME, "message-response"))
        )
        assert len(response.text) > 0
    finally:
        # Always close the browser, even when an assertion fails
        driver.quit()
```

### 4. **AI-Specific Tests**

```python
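# backend/tests/test_ai_responses.py
import re

from backend.services.chat_service import ChatService

def test_response_quality():
    """Test that AI responses meet quality standards"""
    service = ChatService()
    response = service.process_message("What is Greek Derby?")
    
    # Check response length (not too short, not too long)
    assert 10 < len(response) < 500
    
    # Check for relevant keywords
    relevant_keywords = ["derby", "greek", "horse", "racing"]
    assert any(keyword in response.lower() for keyword in relevant_keywords)

def _word_set(text: str) -> set:
    """Lower-cased set of words, used as a rough similarity measure"""
    return set(re.findall(r"\w+", text.lower()))

def test_response_consistency():
    """Test that paraphrased questions get consistent (similar) responses"""
    service = ChatService()
    question1 = "What is Greek Derby?"
    question2 = "Tell me about Greek Derby"
    
    response1 = service.process_message(question1)
    response2 = service.process_message(question2)
    
    assert len(response1) > 0
    assert len(response2) > 0
    
    # Identical answers are fine; what matters is substantial overlap.
    # Jaccard similarity of the word sets; the 0.3 threshold is an assumption to tune.
    words1, words2 = _word_set(response1), _word_set(response2)
    similarity = len(words1 & words2) / len(words1 | words2)
    assert similarity >= 0.3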
```

### 5. **Performance Tests**

```python
# backend/tests/test_performance.py
import time
import concurrent.futures

from backend.services.chat_service import ChatService

def test_response_time():
    """Test that responses are returned within acceptable time"""
    service = ChatService()
    # perf_counter is monotonic and high-resolution, unlike time.time()
    start_time = time.perf_counter()
    response = service.process_message("What is Greek Derby?")
    response_time = time.perf_counter() - start_time

    assert len(response) > 0
    assert response_time < 5.0  # Should respond within 5 seconds

def test_concurrent_requests():
    """Test system under concurrent load"""
    service = ChatService()
    
    def make_request():
        return service.process_message("What is Greek Derby?")
    
    # Test with 10 concurrent requests
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(make_request) for _ in range(10)]
        results = [future.result() for future in futures]
    
    # All requests should succeed
    assert len(results) == 10
    assert all(len(result) > 0 for result in results)
```
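A single timed call is noisy, especially when the answer comes from a remote LLM. A steadier check times several calls and asserts on a high percentile. The sketch below uses only the standard library. `fake_process_message` is a hypothetical stand-in for `ChatService.process_message`, and the 5-second budget mirrors `test_response_time` above.

```python
import statistics
import time
from typing import Callable, List

def measure_latencies(fn: Callable[[str], str], prompt: str, runs: int = 20) -> List[float]:
    """Call fn(prompt) `runs` times and return each wall-clock duration in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(prompt)
        latencies.append(time.perf_counter() - start)
    return latencies

def p95(latencies: List[float]) -> float:
    """95th percentile via statistics.quantiles (needs at least 2 samples)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    return statistics.quantiles(latencies, n=20)[-1]

def fake_process_message(message: str) -> str:
    # Hypothetical stand-in for ChatService.process_message
    time.sleep(0.01)
    return f"Answer to: {message}"

def test_p95_response_time():
    latencies = measure_latencies(fake_process_message, "What is Greek Derby?", runs=20)
    assert p95(latencies) < 5.0  # same 5 s budget as test_response_time
```

Asserting on p95 rather than on a single sample means one slow outlier does not fail the build, while a general slowdown still does.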

### 6. **Docker Integration Tests**

```yaml
# docker-compose.test.yml
version: '3.8'
services:
  backend-test:
    build: ./backend
    environment:
      - TESTING=true
    command: pytest /app/tests/
    
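  frontend-test:
    build: ./front-end/react-chatbot
    environment:
      # Without CI=true, the Create React App test runner starts in watch mode and never exits
      - CI=true
    command: npm test -- --coverage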
```

### Testing Best Practices for Our RAG Chatbot:

1. **Test Data Management**: Use consistent test data for reproducible results
2. **Mock External Services**: Mock vector database and AI services in unit tests
3. **Test Coverage**: Aim for >80% code coverage
4. **Automated Testing**: All tests run automatically in CI/CD pipeline
5. **Performance Monitoring**: Track response times and resource usage
6. **User Acceptance Testing**: Regular testing with real users
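To keep unit tests fast and deterministic, the vector database and the LLM can be injected and replaced with `unittest.mock.Mock`. The sketch assumes a hypothetical `ChatService` that receives a `retriever` and an `llm` in its constructor. The real class in `backend/services/chat_service.py` may be wired differently, so treat these names as placeholders.

```python
from unittest.mock import Mock

class ChatService:
    """Hypothetical RAG service with injectable dependencies (for illustration)."""

    def __init__(self, retriever, llm):
        self.retriever = retriever  # e.g. a vector-database client
        self.llm = llm              # e.g. an LLM client

    def process_message(self, message: str) -> str:
        if not message.strip():
            return "Please provide a valid question."
        docs = self.retriever.search(message, k=3)
        context = "\n".join(docs)
        return self.llm.generate(f"Context:\n{context}\n\nQuestion: {message}")

def test_process_message_uses_retrieved_context():
    retriever = Mock()
    retriever.search.return_value = ["Retrieved passage about the Greek Derby."]
    llm = Mock()
    llm.generate.return_value = "Canned answer"

    service = ChatService(retriever=retriever, llm=llm)
    assert service.process_message("What is Greek Derby?") == "Canned answer"

    # The retriever was queried with the user's question
    retriever.search.assert_called_once_with("What is Greek Derby?", k=3)
    # The prompt sent to the LLM contains the retrieved context
    prompt = llm.generate.call_args.args[0]
    assert "Retrieved passage about the Greek Derby." in prompt

def test_empty_message_skips_external_calls():
    retriever, llm = Mock(), Mock()
    service = ChatService(retriever=retriever, llm=llm)
    assert service.process_message("   ") == "Please provide a valid question."
    # No network-bound dependency is touched for invalid input
    retriever.search.assert_not_called()
    llm.generate.assert_not_called()
```

Injecting dependencies through the constructor is the design choice that makes this possible. A service that creates its own database client internally would need `unittest.mock.patch` instead.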


## Q4: How do we implement security scanning and monitoring in our CI/CD pipeline?

**Answer:**

Security is critical for our RAG chatbot, especially since it handles user data and AI-generated content. We need comprehensive security measures throughout our CI/CD pipeline.

### Security Scanning in CI/CD Pipeline:

#### 1. **Dependency Vulnerability Scanning**

```yaml
# .github/workflows/security.yml
name: Security Scan

on:
  push:
    branches: [ main, develop ]
  schedule:
    - cron: '0 2 * * 1'  # Weekly security scan, Mondays at 02:00 UTC

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Python Security Scan
        run: |
          pip install safety bandit
          safety check -r backend/requirements.txt
          bandit -r backend/ -f json -o bandit-report.json
      
      - name: Node.js Security Scan
        run: |
          cd front-end/react-chatbot
          npm audit --audit-level=moderate
          npm install -g audit-ci
          audit-ci --moderate
```

#### 2. **Container Security Scanning**

```yaml
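- name: Docker Security Scan
  run: |
    # Scan the backend image built earlier in the job; exit non-zero on HIGH/CRITICAL findings
    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
      aquasec/trivy image --exit-code 1 --severity HIGH,CRITICAL greek-derby-backend:latest
    
    # Scan the whole repository checkout (not only Dockerfiles) for committed secrets;
    # --fail makes TruffleHog exit non-zero when it finds any
    docker run --rm -v "$(pwd)":/src \
      trufflesecurity/trufflehog filesystem /src --fail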
```

#### 3. **Code Quality and Security Checks**

```python
# backend/security/security_checks.py
import re
import hashlib
from typing import List, Dict

class SecurityValidator:
    def __init__(self):
        self.suspicious_patterns = [
            r'eval\(',
            r'exec\(',
            r'__import__',
            r'subprocess',
            r'os\.system',
            r'shell=True'
        ]
    
    def scan_code(self, file_path: str) -> List[Dict]:
        """Scan code for security vulnerabilities"""
        issues = []
        
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()
            
        for pattern in self.suspicious_patterns:
            matches = re.finditer(pattern, content)
            for match in matches:
                issues.append({
                    'file': file_path,
                    'line': content[:match.start()].count('\n') + 1,
                    'pattern': pattern,
                    'severity': 'HIGH'
                })
        
        return issues
    
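    def validate_input(self, user_input: str) -> bool:
        """Validate user input for security.

        Blocklists like this are only a first filter; parameterized queries
        and output escaping are the real defenses.
        """
        # Check for SQL comment/terminator patterns.
        # '*' must be escaped: r'/*' would match the empty string (rejecting
        # every input) and r'*/' is an invalid regex ("nothing to repeat").
        sql_patterns = [r"';", r'";', r'--', r'/\*', r'\*/']
        for pattern in sql_patterns:
            if re.search(pattern, user_input, re.IGNORECASE):
                return False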
        
        # Check for XSS patterns
        xss_patterns = [r'<script', r'javascript:', r'onload=']
        for pattern in xss_patterns:
            if re.search(pattern, user_input, re.IGNORECASE):
                return False
        
        return True
```

### Monitoring and Alerting:

#### 1. **Application Performance Monitoring (APM)**

```python
# backend/monitoring/apm.py
import time
import logging
from functools import wraps
from prometheus_client import Counter, Histogram, Gauge

# Metrics
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
REQUEST_DURATION = Histogram('http_request_duration_seconds', 'HTTP request duration')
ACTIVE_CONNECTIONS = Gauge('active_connections', 'Number of requests currently in flight')
ERROR_COUNT = Counter('http_errors_total', 'Total HTTP errors', ['status_code'])

def monitor_performance(endpoint: str, method: str = 'POST'):
    """Decorator factory that monitors an async handler's performance"""
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Count every request, successful or not, under its real endpoint
            REQUEST_COUNT.labels(method=method, endpoint=endpoint).inc()
            ACTIVE_CONNECTIONS.inc()
            start_time = time.perf_counter()
            try:
                return await func(*args, **kwargs)
            except Exception:
                ERROR_COUNT.labels(status_code='500').inc()
                logging.exception("Error in %s", func.__name__)
                raise
            finally:
                ACTIVE_CONNECTIONS.dec()
                REQUEST_DURATION.observe(time.perf_counter() - start_time)
        return wrapper
    return decorator

# Usage in our chat endpoint
@monitor_performance(endpoint='/chat')
async def process_chat_message(message: str):
    # Chat processing logic
    pass
```

#### 2. **Health Checks and Uptime Monitoring**

```python
# backend/monitoring/health.py
from fastapi import FastAPI, HTTPException
import psutil
import time

app = FastAPI()  # in the real service, reuse the existing app instance (or an APIRouter)

class HealthChecker:
    def __init__(self):
        self.start_time = time.time()
    
    def check_database_connection(self) -> bool:
        """Check if database is accessible"""
        try:
            # Placeholder: replace with a real ping of the vector database client;
            # as written, this check always reports healthy
            return True
        except Exception:
            return False
    
    def check_memory_usage(self) -> bool:
        """Check if memory usage is within limits"""
        memory_percent = psutil.virtual_memory().percent
        return memory_percent < 90
    
    def check_disk_space(self) -> bool:
        """Check if disk space is sufficient"""
        disk_usage = psutil.disk_usage('/')
        free_percent = (disk_usage.free / disk_usage.total) * 100
        return free_percent > 10
    
    def get_uptime(self) -> float:
        """Get application uptime in seconds"""
        return time.time() - self.start_time

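# Create the checker once at import time so uptime covers the process lifetime
# (a new checker per request would always report an uptime of ~0 seconds)
health_checker = HealthChecker()

@app.get("/health/detailed")
async def detailed_health_check():
    """Detailed health check with system metrics"""
    checker = health_checker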
    
    health_status = {
        "status": "healthy",
        "timestamp": time.time(),
        "uptime": checker.get_uptime(),
        "checks": {
            "database": checker.check_database_connection(),
            "memory": checker.check_memory_usage(),
            "disk": checker.check_disk_space()
        }
    }
    
    # If any check fails, return unhealthy status
    if not all(health_status["checks"].values()):
        health_status["status"] = "unhealthy"
        raise HTTPException(status_code=503, detail=health_status)
    
    return health_status
```

#### 3. **Logging and Alerting**

```python
# backend/monitoring/structured_logging.py
# (named to avoid shadowing the standard-library `logging` module)
import hashlib
import json
import logging
from datetime import datetime, timezone
from typing import Dict, Any

class StructuredLogger:
    def __init__(self, name: str):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(logging.INFO)
        
        # getLogger(name) returns the same logger object every time, so attach
        # the handler only once; otherwise each instance duplicates every line
        if not self.logger.handlers:
            formatter = logging.Formatter(
                '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
            )
            handler = logging.StreamHandler()
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)
    
    def log_chat_interaction(self, user_message: str, bot_response: str, 
                           session_id: str, response_time: float):
        """Log chat interactions for analysis (the message is hashed, not stored)"""
        log_data = {
            "event_type": "chat_interaction",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "session_id": session_id,
            "user_message_length": len(user_message),
            "bot_response_length": len(bot_response),
            "response_time": response_time,
            # SHA-256 rather than MD5, which bandit (run in our pipeline) flags as weak
            "user_message_hash": hashlib.sha256(user_message.encode()).hexdigest()
        }
        
        self.logger.info(json.dumps(log_data))
    
    def log_error(self, error: Exception, context: Dict[str, Any]):
        """Log errors with context"""
        log_data = {
            "event_type": "error",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "error_type": type(error).__name__,
            "error_message": str(error),
            "context": context
        }
        
        self.logger.error(json.dumps(log_data))
```
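The logger above embeds a JSON string inside a plain-text line (`asctime - name - level - {...}`), so a log aggregator has to parse each line twice. One alternative is a `logging.Formatter` subclass that emits exactly one JSON object per line. The stdlib-only sketch below shows that approach. The `extra_fields` attribute name is our own convention, not part of the `logging` API.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed via logger.info(..., extra={"extra_fields": {...}})
        payload.update(getattr(record, "extra_fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

def get_json_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger

if __name__ == "__main__":
    log = get_json_logger("greek_derby.chat")
    log.info("chat_interaction",
             extra={"extra_fields": {"session_id": "test-session", "response_time": 0.42}})
```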

#### 4. **Alerting Configuration**

```yaml
# monitoring/alerts.yml
alerts:
  - name: "High Error Rate"
    condition: "error_rate > 5%"
    duration: "5m"
    severity: "critical"
    notification: "slack"
    
  - name: "High Response Time"
    condition: "avg_response_time > 2s"
    duration: "3m"
    severity: "warning"
    notification: "email"
    
  - name: "Memory Usage High"
    condition: "memory_usage > 85%"
    duration: "2m"
    severity: "warning"
    notification: "slack"
    
  - name: "Database Connection Failed"
    condition: "database_health == false"
    duration: "1m"
    severity: "critical"
    notification: "pagerduty"
```
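Each `condition` + `duration` pair above means the condition must hold continuously for the whole duration before the alert fires. This filters out brief spikes. The alert file format is schematic; real systems have their own syntax (for example, Prometheus alerting rules use a `for: 5m` clause). The stdlib Python sketch below shows the evaluation logic for the "High Error Rate" rule. The one-sample-per-minute cadence and the class name are assumptions made for illustration.

```python
from typing import Optional

class SustainedThresholdAlert:
    """Fire when a metric stays above `threshold` for at least `duration_s` seconds."""

    def __init__(self, threshold: float, duration_s: float):
        self.threshold = threshold
        self.duration_s = duration_s
        self.breach_started_at: Optional[float] = None  # start of the current breach

    def observe(self, timestamp: float, value: float) -> bool:
        """Record one sample; return True while the alert should be firing."""
        if value <= self.threshold:
            self.breach_started_at = None  # condition cleared: reset the timer
            return False
        if self.breach_started_at is None:
            self.breach_started_at = timestamp
        return timestamp - self.breach_started_at >= self.duration_s

def error_rate(errors: int, requests: int) -> float:
    """Fraction of failed requests in a sample window; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0

if __name__ == "__main__":
    # "High Error Rate": error_rate > 5% for 5m, sampled once per minute
    alert = SustainedThresholdAlert(threshold=0.05, duration_s=300)
    samples = [(2, 100), (8, 100), (7, 100), (9, 100), (6, 100), (10, 100), (12, 100)]
    for minute, (errors, requests) in enumerate(samples):
        firing = alert.observe(minute * 60, error_rate(errors, requests))
        print(f"minute {minute}: rate={error_rate(errors, requests):.0%} firing={firing}")
```

With these samples the breach starts at minute 1. The alert fires only at minute 6, once the breach has lasted five minutes.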

### Security Best Practices for Our RAG Chatbot:

1. **Input Validation**: Sanitize all user inputs
2. **Rate Limiting**: Prevent abuse and DoS attacks
3. **Authentication**: Secure API endpoints
4. **Data Encryption**: Encrypt sensitive data at rest and in transit
5. **Regular Updates**: Keep dependencies updated
6. **Access Control**: Implement proper authorization
7. **Audit Logging**: Log all security-relevant events
8. **Penetration Testing**: Regular security assessments
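Rate limiting (item 2) is often implemented as a token bucket. Each client has a bucket that refills at a steady rate, and each request spends one token. This allows short bursts while capping the sustained rate. The following is a single-process, in-memory sketch keyed by a client identifier such as `session_id`, and the limits in the usage are illustrative. A deployment with several backend replicas would need a shared store (for example the Redis instance already used by the stack) instead.

```python
import time
from typing import Callable, Dict

class RateLimiter:
    """In-memory token-bucket limiter with one bucket per client key.

    Single-process only and not thread-safe; several replicas would need
    a shared store so that all instances see the same buckets.
    """

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self.clock = clock                    # injectable so tests can control time
        self.buckets: Dict[str, Dict[str, float]] = {}

    def allow(self, key: str) -> bool:
        """Spend one token for `key` if available; return whether the request may proceed."""
        now = self.clock()
        bucket = self.buckets.setdefault(key, {"tokens": self.capacity, "last": now})
        # Refill in proportion to elapsed time, never above capacity
        elapsed = now - bucket["last"]
        bucket["tokens"] = min(self.capacity, bucket["tokens"] + elapsed * self.refill_per_sec)
        bucket["last"] = now
        if bucket["tokens"] >= 1:
            bucket["tokens"] -= 1
            return True
        return False

if __name__ == "__main__":
    # Hypothetical limits: bursts of 5 messages, then one message every 2 seconds per session
    limiter = RateLimiter(capacity=5, refill_per_sec=0.5)
    print([limiter.allow("test-session") for _ in range(7)])
```

A request that gets `False` would typically be answered with HTTP 429 (Too Many Requests).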


## Q5: What deployment strategies should we use for our Greek Derby RAG chatbot?

**Answer:**

Deployment strategies are crucial for maintaining high availability and minimizing downtime for our RAG chatbot. We need strategies that allow us to deploy safely while ensuring users can always access the service.

### Deployment Strategy Options:

#### 1. **Blue-Green Deployment**

```yaml
# .github/workflows/blue-green-deployment.yml
name: Blue-Green Deployment

on:
  push:
    branches: [ main ]

jobs:
  blue-green-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
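      - name: Determine Idle Color
        id: color
        run: |
          # Hypothetical helper that prints the color currently serving traffic ("blue" or "green")
          ACTIVE=$(./scripts/get-active-color.sh)
          if [ "$ACTIVE" = "blue" ]; then IDLE=green; else IDLE=blue; fi
          echo "active=$ACTIVE" >> "$GITHUB_OUTPUT"
          echo "idle=$IDLE" >> "$GITHUB_OUTPUT"

      - name: Deploy to Idle Environment
        run: |
          # Deploy the new version next to the live one, then health-check it
          docker compose -f docker-compose.${{ steps.color.outputs.idle }}.yml up -d
          ./scripts/health-check.sh ${{ steps.color.outputs.idle }}

      - name: Switch Traffic
        run: |
          # Update load balancer to point to the freshly deployed color
          ./scripts/switch-traffic.sh ${{ steps.color.outputs.idle }}

      - name: Monitor New Environment
        run: |
          # Monitor for 5 minutes
          ./scripts/monitor-deployment.sh ${{ steps.color.outputs.idle }} 300

      # The previous color stays up as a standby, so rollback is a single traffic switch
      - name: Roll Back on Failure
        if: failure() && steps.color.outputs.active != ''
        run: |
          ./scripts/switch-traffic.sh ${{ steps.color.outputs.active }}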
```

**Blue-Green Architecture:**
```
┌─────────────────┐
│     Users       │
└────────┬────────┘
         │
┌────────▼────────┐
│  Load Balancer  │
└────────┬────────┘
         │ 100% of traffic
    ┌────▼──────┐    ┌───────────┐
    │  Blue     │    │  Green    │
    │  (v2.0)   │    │  (v1.0)   │
    └───────────┘    └───────────┘
    New version      Old version
    (active)         (standby, kept for rollback)
```
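The key property of blue-green is that the two colors alternate. Each release goes to whichever environment is idle, and the previous version stays up as a standby, so rollback is a single traffic switch. The small Python model below captures that bookkeeping, which is what scripts like `switch-traffic.sh` need to track. It is hypothetical and touches no real load balancer.

```python
from typing import Dict

class BlueGreenRouter:
    """Track which color serves traffic and which version each color runs."""

    def __init__(self, initial_version: str):
        self.versions: Dict[str, str] = {"blue": initial_version, "green": ""}
        self.active = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.active == "blue" else "blue"

    @property
    def live_version(self) -> str:
        return self.versions[self.active]

    def deploy(self, version: str, healthy: bool) -> str:
        """Deploy to the idle color; switch traffic only if its health check passed."""
        target = self.idle
        self.versions[target] = version
        if healthy:
            self.active = target  # the old color becomes the standby
        return self.active

    def rollback(self) -> str:
        """Send traffic back to the standby color (the previous release)."""
        self.active = self.idle
        return self.active

if __name__ == "__main__":
    router = BlueGreenRouter("v1.0")
    router.deploy("v2.0", healthy=True)
    print(router.active, router.live_version)
    router.rollback()
    print(router.active, router.live_version)
```

If an unhealthy release is detected before the switch, traffic simply never moves. After a switch, `rollback()` restores the previous version without a redeploy.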

#### 2. **Canary Deployment**

```yaml
# .github/workflows/canary-deployment.yml
name: Canary Deployment

on:
  push:
    branches: [ main ]

jobs:
  canary-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Deploy Canary (5% traffic)
        run: |
          # Deploy canary version
          docker-compose -f docker-compose.canary.yml up -d
          
          # Route 5% traffic to canary
          ./scripts/set-canary-traffic.sh 5
          
      - name: Monitor Canary Performance
        run: |
          # Monitor for 10 minutes
          ./scripts/monitor-canary.sh 600
          
      - name: Increase Canary Traffic (25%)
        if: success()
        run: |
          ./scripts/set-canary-traffic.sh 25
          ./scripts/monitor-canary.sh 600
          
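      - name: Full Rollout (100%)
        if: success()
        run: |
          ./scripts/set-canary-traffic.sh 100
          ./scripts/cleanup-canary.sh

      - name: Roll Back Canary
        if: failure()
        run: |
          # A step above failed (e.g. monitoring detected a regression): send all traffic back to stable
          ./scripts/set-canary-traffic.sh 0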
```
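The `monitor-canary.sh` steps are where the promote-or-abort decision is made, but the script itself is not shown. One common approach is to compare the canary's error rate against the stable version's over the same window, and to abort if the canary is meaningfully worse. Below is a hedged Python sketch of such a gate. The thresholds (an absolute ceiling of 5%, and at most 1.5× the baseline) and the minimum sample size are illustrative assumptions, not values from the project. A wrapper script could exit non-zero on `"rollback"` so that the workflow's `failure()` path runs.

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

def canary_verdict(baseline: WindowStats, canary: WindowStats,
                   max_error_rate: float = 0.05, max_ratio: float = 1.5,
                   min_requests: int = 100) -> str:
    """Return 'promote', 'rollback', or 'wait' for one monitoring window."""
    if canary.requests < min_requests:
        return "wait"  # too little canary traffic to judge
    if canary.error_rate > max_error_rate:
        return "rollback"  # absolute ceiling breached
    # Relative check; the 0.1% floor avoids comparing against a near-zero baseline
    if canary.error_rate > max(baseline.error_rate, 0.001) * max_ratio:
        return "rollback"
    return "promote"

if __name__ == "__main__":
    stable = WindowStats(requests=10_000, errors=100)  # 1% errors
    print(canary_verdict(stable, WindowStats(requests=500, errors=5)))
```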

**Canary Deployment Flow:**
```
Traffic Distribution:
┌─────────────────┐
│   Load Balancer │
└─────────┬───────┘
          │
    ┌─────▼─────┐
    │  Router   │
    └─────┬─────┘
          │
    ┌─────▼─────┐
    │ 95% Prod  │ ← Stable Version
    │ 5% Canary │ ← New Version
    └───────────┘
```
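Routing "5% of traffic" on a per-request basis means one user can bounce between versions mid-conversation, which is awkward for a chatbot with sessions. A common alternative is to hash a stable key into 100 buckets and send the buckets below the canary percentage to the new version. Here the key is the `session_id` that the `/chat` endpoint already receives. Load balancers and service meshes usually provide weighted or header-based routing natively. This Python sketch only illustrates the bucketing idea, and the function name is hypothetical.

```python
import hashlib

def route_for_session(session_id: str, canary_percent: int) -> str:
    """Deterministically map a session to 'canary' or 'stable'."""
    # SHA-256 is stable across processes and machines, unlike Python's
    # built-in hash() for str, which is randomized per interpreter run
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"

if __name__ == "__main__":
    for sid in ["session-1", "session-2", "session-3"]:
        print(sid, route_for_session(sid, 5), route_for_session(sid, 25))
```

Because the bucket depends only on the session, raising the percentage from 5 to 25 keeps every session that was already on the canary there. Users are added to the new version but never flipped back.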

#### 3. **Rolling Deployment**

```yaml
# .github/workflows/rolling-deployment.yml
name: Rolling Deployment

on:
  push:
    branches: [ main ]

jobs:
  rolling-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
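      - name: Rolling Update
        run: |
          # `kubectl rolling-update` was removed in kubectl 1.18; a Deployment rolls out
          # automatically when its image changes (cluster credentials setup omitted here).
          # Pace is controlled by the Deployment's strategy.rollingUpdate (maxSurge / maxUnavailable).
          # Tagging with the commit SHA instead of :latest guarantees the spec changes.
          kubectl set image deployment/greek-derby-backend \
            backend=greek-derby-backend:${{ github.sha }}
          kubectl rollout status deployment/greek-derby-backend --timeout=5m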
```

### Infrastructure as Code (IaC):

#### 1. **Terraform Configuration**

```hcl
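# infrastructure/main.tf (excerpt)
# The security groups, subnets, ECS task definition and target group referenced
# below (aws_security_group.*, aws_subnet.*, aws_ecs_task_definition.backend,
# aws_lb_target_group.backend) must also be declared; they are omitted for brevity.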
provider "aws" {
  region = "us-west-2"
}

# VPC Configuration
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  
  tags = {
    Name = "greek-derby-vpc"
  }
}

# Application Load Balancer
resource "aws_lb" "main" {
  name               = "greek-derby-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = false
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "greek-derby-cluster"
}

# ECS Service
resource "aws_ecs_service" "backend" {
  name            = "greek-derby-backend"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.backend.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    security_groups  = [aws_security_group.ecs.id]
    subnets          = aws_subnet.private[*].id
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.backend.arn
    container_name   = "backend"
    container_port   = 8000
  }
}
```

#### 2. **Docker Compose for Different Environments**

```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  backend:
    build: ./backend
    environment:
      - ENVIRONMENT=production
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 1G
          cpus: '0.5'
        reservations:
          memory: 512M
          cpus: '0.25'
    healthcheck:
      # Requires curl inside the backend image (slim Python base images do not include it)
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

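  frontend:
    build:
      context: ./front-end/react-chatbot
      # REACT_APP_* variables are baked into the bundle at build time, so pass the API URL
      # as a build argument (the Dockerfile needs a matching `ARG REACT_APP_API_URL`)
      args:
        REACT_APP_API_URL: ${API_URL}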
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 512M
          cpus: '0.25'

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - backend
      - frontend
```

### Deployment Automation Scripts:

#### 1. **Deployment Script**

```bash
#!/bin/bash
# scripts/deploy.sh

set -euo pipefail

ENVIRONMENT=${1:-staging}
VERSION=${2:-latest}

echo "🚀 Deploying Greek Derby RAG Chatbot to $ENVIRONMENT"

# Build and push Docker images
echo "📦 Building Docker images..."
docker build -t greek-derby-backend:$VERSION ./backend
docker build -t greek-derby-frontend:$VERSION ./front-end/react-chatbot

# Tag for registry
docker tag greek-derby-backend:$VERSION registry.greekderby.com/backend:$VERSION
docker tag greek-derby-frontend:$VERSION registry.greekderby.com/frontend:$VERSION

# Push to registry
echo "⬆️ Pushing images to registry..."
docker push registry.greekderby.com/backend:$VERSION
docker push registry.greekderby.com/frontend:$VERSION

# Deploy to environment
echo "🚀 Deploying to $ENVIRONMENT..."
kubectl set image deployment/greek-derby-backend \
  backend=registry.greekderby.com/backend:$VERSION \
  -n $ENVIRONMENT

kubectl set image deployment/greek-derby-frontend \
  frontend=registry.greekderby.com/frontend:$VERSION \
  -n $ENVIRONMENT

# Wait for rollout (fail instead of hanging if it does not converge)
echo "⏳ Waiting for rollout to complete..."
kubectl rollout status deployment/greek-derby-backend -n $ENVIRONMENT --timeout=5m
kubectl rollout status deployment/greek-derby-frontend -n $ENVIRONMENT --timeout=5m

# Run health checks
echo "🏥 Running health checks..."
./scripts/health-check.sh $ENVIRONMENT

echo "✅ Deployment completed successfully!"
```

#### 2. **Health Check Script**

```bash
#!/bin/bash
# scripts/health-check.sh

ENVIRONMENT=${1:-staging}
MAX_ATTEMPTS=30
ATTEMPT=0

echo "🏥 Running health checks for $ENVIRONMENT..."

while [ $ATTEMPT -lt $MAX_ATTEMPTS ]; do
    ATTEMPT=$((ATTEMPT + 1))
    echo "Attempt $ATTEMPT/$MAX_ATTEMPTS"
    
    # Check backend health
    BACKEND_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        https://api-$ENVIRONMENT.greekderby.com/health)
    
    # Check frontend health
    FRONTEND_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        https://$ENVIRONMENT.greekderby.com/)
    
    if [ "$BACKEND_STATUS" = "200" ] && [ "$FRONTEND_STATUS" = "200" ]; then
        echo "✅ All health checks passed!"
        exit 0
    fi
    
    echo "⏳ Waiting for services to be ready..."
    sleep 10
done

echo "❌ Health checks failed after $MAX_ATTEMPTS attempts"
exit 1
```

### Recommended Strategy for Our RAG Chatbot:

1. **Development**: Use rolling deployments for quick iterations
2. **Staging**: Use blue-green for testing new features
3. **Production**: Use canary deployment for safe rollouts
4. **Hotfixes**: Use blue-green for immediate fixes
5. **Database Changes**: Use feature flags and backward compatibility

### Key Benefits:

- **Zero Downtime**: Traffic moves to a new version only after it passes health checks, so users keep reaching a working service during releases
- **Quick Rollback**: Easy to revert problematic deployments
- **Gradual Rollout**: Test with small user groups first
- **Automated Process**: Reduces human error and deployment time
- **Monitoring**: Real-time feedback on deployment success


## Q6: How do we create a complete CI/CD pipeline for our Greek Derby RAG chatbot?

**Answer:**

Let's put everything together to create a comprehensive CI/CD pipeline that covers the entire lifecycle of our RAG chatbot, from code commit to production deployment.

### Complete CI/CD Pipeline Architecture:

```
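┌───────────────┐   ┌────────────────┐   ┌───────────────────┐   ┌────────────────┐
│ quality-check │──▶│ build-and-test │──▶│ integration-tests │──▶│ build-and-push │
│ (lint, types, │   │ (Docker build, │   │ (Postgres, Redis, │   │ (push images   │
│  security)    │   │  unit tests)   │   │  docker compose)  │   │  to GHCR)      │
└───────────────┘   └────────────────┘   └───────────────────┘   └───────┬────────┘
                                                                         │
                                                                         ▼
                                                                 ┌─────────────────┐
                                                                 │ Staging /       │
                                                                 │ Production      │
                                                                 └─────────────────┘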
```

### Master CI/CD Workflow:

```yaml
# .github/workflows/complete-pipeline.yml
name: Complete CI/CD Pipeline

on:
  push:
    branches: [ main, develop, feature/* ]
  pull_request:
    branches: [ main, develop ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Job 1: Code Quality and Security
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      
      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      
      - name: Install Python dependencies
        run: |
          pip install -r backend/requirements.txt
          pip install flake8 mypy bandit safety pytest
      
      - name: Install Node.js dependencies
        run: |
          cd front-end/react-chatbot
          npm ci
      
      - name: Python Linting
        run: |
          flake8 backend/ --count --select=E9,F63,F7,F82 --show-source --statistics
      
      - name: Python Type Checking
        run: |
          mypy backend/ --ignore-missing-imports
      
      - name: Python Security Scan
        run: |
          bandit -r backend/ -f json -o bandit-report.json
          safety check -r backend/requirements.txt
      
      - name: Node.js Security Scan
        run: |
          cd front-end/react-chatbot
          npm audit --audit-level=moderate
      
      - name: Upload Security Reports
        if: always()  # still upload the report when a scan step above fails
        uses: actions/upload-artifact@v4  # v3 has been retired by GitHub
        with:
          name: security-reports
          path: bandit-report.json

  # Job 2: Build and Test
  build-and-test:
    runs-on: ubuntu-latest
    needs: quality-check
    strategy:
      matrix:
        include:
          - service: backend
            context: ./backend
          - service: frontend
            context: ./front-end/react-chatbot
    
    steps:
      - uses: actions/checkout@v3
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      
      - name: Build Docker Image
        uses: docker/build-push-action@v4
        with:
          context: ${{ matrix.context }}
          push: false
          load: true  # make the image available to `docker run` in later steps
          tags: ${{ matrix.service }}:test
          cache-from: type=gha
          cache-to: type=gha,mode=max
      
      - name: Run Backend Tests
        if: matrix.service == 'backend'
        run: |
          docker run --rm -v $(pwd)/backend:/app backend:test pytest /app/tests/
      
      - name: Run Frontend Tests
        if: matrix.service == 'frontend'
        run: |
          # No source mount: it would hide the node_modules installed in the image.
          # CI=true stops the test runner from entering watch mode.
          docker run --rm -e CI=true frontend:test npm test -- --coverage

  # Job 3: Integration Tests
  integration-tests:
    runs-on: ubuntu-latest
    needs: build-and-test
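    services:
      postgres:
        image: postgres:13
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test_db
        ports:
          - 5432:5432   # publish on the runner so localhost:5432 resolves
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      
      redis:
        image: redis:6
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    
    steps:
      - uses: actions/checkout@v3
      
      - name: Run Integration Tests
        run: |
          # GitHub-hosted runners ship Compose v2 as `docker compose`.
          # docker-compose.test.yml must pass DATABASE_URL/REDIS_URL to the test container,
          # and that container needs host networking (e.g. network_mode: host) for
          # "localhost" to reach the service containers above.
          docker compose -f docker-compose.test.yml up --abort-on-container-exit
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379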

  # Job 4: Build and Push Images
  build-and-push:
    runs-on: ubuntu-latest
    needs: [quality-check, build-and-test, integration-tests]
    if: github.event_name == 'push'
    permissions:
      contents: read
      packages: write   # required for GITHUB_TOKEN to push images to ghcr.io
    
    steps:
      - uses: actions/checkout@v3
      
      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Extract Backend metadata
        id: meta-backend
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}/backend
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}
      
      - name: Extract Frontend metadata
        id: meta-frontend
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}/frontend
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}
      
      - name: Build and push Backend
        uses: docker/build-push-action@v4
        with:
          context: ./backend
          push: true
          # metadata-action outputs complete image references (one per line), so use them as-is
          tags: ${{ steps.meta-backend.outputs.tags }}
          labels: ${{ steps.meta-backend.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
      
      - name: Build and push Frontend
        uses: docker/build-push-action@v4
        with:
          context: ./front-end/react-chatbot
          push: true
          tags: ${{ steps.meta-frontend.outputs.tags }}
          labels: ${{ steps.meta-frontend.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Job 5: Deploy to Staging
  deploy-staging:
    runs-on: ubuntu-latest
    needs: build-and-push
    if: github.ref == 'refs/heads/develop'
    environment: staging
    
    steps:
      - uses: actions/checkout@v3
      
      - name: Deploy to Staging
        run: |
          echo "🚀 Deploying to staging environment..."
          # Add your staging deployment commands here
          # kubectl apply -f k8s/staging/
          # or
          # docker compose -f docker-compose.staging.yml up -d
      
      - name: Run Smoke Tests
        run: |
          echo "🧪 Running smoke tests..."
          # Add smoke test commands here
          # curl -f https://staging.greekderby.com/health

  # Job 6: Deploy to Production
  deploy-production:
    runs-on: ubuntu-latest
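    # deploy-staging only runs on develop, so it is skipped on main; listing it in
    # `needs` would skip this job as well. Staging is validated on develop instead.
    needs: build-and-push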
    if: github.ref == 'refs/heads/main'
    environment: production
    
    steps:
      - uses: actions/checkout@v3
      
      - name: Deploy to Production
        run: |
          echo "🚀 Deploying to production environment..."
          # Add your production deployment commands here
          # kubectl apply -f k8s/production/
      
      - name: Run Health Checks
        run: |
          echo "🏥 Running health checks..."
          # Add health check commands here
          # curl -f https://greekderby.com/health
      
      - name: Notify Deployment
        run: |
          echo "📢 Notifying team of successful deployment..."
          # Add notification commands here (Slack, email, etc.)
```
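The smoke-test and health-check steps above are placeholders. One way to fill them is a small polling script run right after the deploy step: a freshly started container may need a few seconds before it answers, so retrying is more forgiving than a single `curl -f`. This is a minimal sketch using only the Python standard library; the `/health` URL comes from the commented commands, while the function name, retry counts and script path are assumptions.

```python
import sys
import time
import urllib.request


def wait_for_healthy(url: str, attempts: int = 10, delay: float = 3.0, timeout: float = 5.0) -> bool:
    """Poll `url` until it returns HTTP 200; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return True
                print(f"attempt {attempt}/{attempts}: HTTP {response.status}")
        except OSError as exc:
            # URLError/HTTPError (4xx/5xx), refused connections and timeouts are all OSError subclasses
            print(f"attempt {attempt}/{attempts} failed: {exc}")
        if attempt < attempts:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Default mirrors the commented-out curl check; pass the real URL as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "https://staging.greekderby.com/health"
    sys.exit(0 if wait_for_healthy(target) else 1)
```

A step such as `run: python scripts/smoke_test.py https://staging.greekderby.com/health` (hypothetical path) would then fail the job whenever the service never becomes healthy, because the script exits with status 1.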

### Environment-Specific Configurations:

#### 1. **Development Environment**
```yaml
# docker-compose.dev.yml
version: '3.8'
services:
  backend:
    build: ./backend
    environment:
      - ENVIRONMENT=development
      - DEBUG=true
      - LOG_LEVEL=debug
    volumes:
      - ./backend:/app
    ports:
      - "8000:8000"
  
  frontend:
    build: ./front-end/react-chatbot
    environment:
      - REACT_APP_API_URL=http://localhost:8000
    volumes:
      - ./front-end/react-chatbot:/app
      - /app/node_modules   # keep the bind mount from hiding the container's node_modules
    ports:
      - "3000:3000"
```
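Each compose file configures the backend purely through environment variables (`ENVIRONMENT`, `DEBUG`, `LOG_LEVEL`, `DATABASE_URL`). A sketch of how the backend might read and validate them at startup follows; `Settings`, `load_settings` and the validation rules are illustrative assumptions, not the project's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_ENVIRONMENTS = {"development", "staging", "production"}


@dataclass(frozen=True)
class Settings:
    environment: str
    debug: bool
    log_level: str
    database_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from the variables each docker-compose file sets."""
    environment = env.get("ENVIRONMENT", "development").lower()
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown ENVIRONMENT: {environment!r}")

    debug = env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"}
    if debug and environment == "production":
        # Fail fast instead of silently serving production traffic in debug mode.
        raise ValueError("DEBUG must not be enabled in production")

    database_url = env.get("DATABASE_URL")
    if environment != "development" and not database_url:
        # The staging and production compose files set DATABASE_URL via ${...} substitution.
        raise ValueError(f"DATABASE_URL is required in {environment}")

    return Settings(
        environment=environment,
        debug=debug,
        log_level=env.get("LOG_LEVEL", "debug" if debug else "info").upper(),
        database_url=database_url,
    )
```

Failing fast on a missing `DATABASE_URL`, or on `DEBUG=true` in production, turns a misconfigured deployment into a container that refuses to start, which the health checks catch, rather than one that starts and misbehaves.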

#### 2. **Staging Environment**
```yaml
# docker-compose.staging.yml
version: '3.8'
services:
  backend:
    image: ghcr.io/username/greek-derby-rag-chatbot/backend:develop
    environment:
      - ENVIRONMENT=staging
      - DATABASE_URL=${STAGING_DATABASE_URL}
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
  
  frontend:
    image: ghcr.io/username/greek-derby-rag-chatbot/frontend:develop
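    environment:
      # Create React App embeds REACT_APP_* values at build time, so this only takes effect
      # if the image reads it at startup; otherwise it must be supplied when the image is built
      - REACT_APP_API_URL=https://api-staging.greekderby.com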
    deploy:
      replicas: 2
```

#### 3. **Production Environment**
```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  backend:
    image: ghcr.io/username/greek-derby-rag-chatbot/backend:latest
    environment:
      - ENVIRONMENT=production
      - DATABASE_URL=${PROD_DATABASE_URL}
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 1G
          cpus: '1.0'
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
  
  frontend:
    image: ghcr.io/username/greek-derby-rag-chatbot/frontend:latest
    environment:
      - REACT_APP_API_URL=https://api.greekderby.com
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
```

### Monitoring and Alerting Setup:

```yaml
# monitoring/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'greek-derby-backend'
    static_configs:
      - targets: ['backend:8000']
    metrics_path: '/metrics'
    scrape_interval: 5s
  
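  # Only useful if something on frontend:3000 serves /metrics; a React bundle
  # does not expose Prometheus metrics on its own.
  - job_name: 'greek-derby-frontend'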
    static_configs:
      - targets: ['frontend:3000']
    metrics_path: '/metrics'
    scrape_interval: 5s
```
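The backend scrape job expects `GET /metrics` on port 8000 to return Prometheus's text exposition format. In a real service the official `prometheus_client` library would generate this output; the standard-library sketch below only shows what a scrape receives. `RequestCounter` and the `path`/`status` labels are hypothetical.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Tuple


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class RequestCounter:
    """Minimal labelled counter rendered in the Prometheus text exposition format."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: DefaultDict[Tuple[str, str], float] = defaultdict(float)
        self._lock = threading.Lock()  # request handlers may increment concurrently

    def inc(self, path: str, status: str, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        with self._lock:
            self._values[(path, status)] += amount

    def render(self) -> str:
        """Return the text a GET /metrics response body would contain."""
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        with self._lock:
            for (path, status), value in sorted(self._values.items()):
                lines.append(f'{self.name}{{path="{_escape(path)}",status="{_escape(status)}"}} {value}')
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requests_total = RequestCounter("http_requests_total", "Total HTTP requests handled by the backend.")
    requests_total.inc("/chat", "200")
    requests_total.inc("/chat", "200")
    requests_total.inc("/health", "200")
    print(requests_total.render(), end="")
    # # HELP http_requests_total Total HTTP requests handled by the backend.
    # # TYPE http_requests_total counter
    # http_requests_total{path="/chat",status="200"} 2.0
    # http_requests_total{path="/health",status="200"} 1.0
```

Counters only ever increase; Prometheus derives request rates from them at query time, which is why the sketch rejects negative increments.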

### Practical Exercises:

#### Exercise 1: Set up Basic CI Pipeline
1. Create a `.github/workflows/ci.yml` file
2. Add basic linting and testing steps
3. Test with a sample commit

#### Exercise 2: Implement Security Scanning
1. Add Bandit security scanning for Python code
2. Add npm audit for Node.js dependencies
3. Configure security reports as artifacts
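Bandit exits with status 1 whenever it reports any finding, which can be too strict for a first pipeline. One alternative is to let the Bandit command succeed (for example with `|| true`), keep the JSON report as an artifact, and gate the build on its contents in a following step. The script below assumes Bandit's JSON layout: a top-level `results` list whose entries carry `issue_severity`, `filename`, `line_number`, `test_id` and `issue_text`. The MEDIUM threshold and the function names are arbitrary choices for illustration.

```python
import json
import sys
from typing import List

SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def blocking_issues(report: dict, threshold: str = "MEDIUM") -> List[dict]:
    """Return Bandit findings whose severity is at or above `threshold`."""
    minimum = SEVERITY_RANK[threshold.upper()]
    return [
        issue
        for issue in report.get("results", [])
        if SEVERITY_RANK.get(str(issue.get("issue_severity", "")).upper(), 0) >= minimum
    ]


def main(path: str = "bandit-report.json", threshold: str = "MEDIUM") -> int:
    """Print blocking findings from a Bandit JSON report; return 1 if any exist."""
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    issues = blocking_issues(report, threshold)
    for issue in issues:
        print(f"{issue['filename']}:{issue['line_number']} "
              f"[{issue['issue_severity']}] {issue['test_id']}: {issue['issue_text']}")
    return 1 if issues else 0


if __name__ == "__main__":
    # Usage: python check_bandit.py [report-path] [LOW|MEDIUM|HIGH]
    sys.exit(main(*sys.argv[1:3]))
```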

#### Exercise 3: Create Docker Multi-stage Build
1. Optimize Dockerfile for production
2. Implement multi-stage builds
3. Reduce image size and build time

#### Exercise 4: Set up Environment-specific Deployments
1. Create staging and production environments
2. Configure environment-specific variables
3. Implement health checks

#### Exercise 5: Implement Monitoring
1. Add Prometheus metrics to your application
2. Set up Grafana dashboards
3. Configure alerting rules
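Prometheus alerting rules are written in PromQL, typically an expression such as the rate of 5xx responses divided by the rate of all requests, plus a `for:` duration the condition must hold before the alert fires. The Python below is not how Prometheus evaluates rules; it is a simplified model of the same arithmetic (counter increases across a window, a threshold, and the inactive/pending/firing states) to make a rule's behavior concrete. PromQL's `rate()` also uses every sample in the range and extrapolates to the window edges, which this sketch ignores. The 5% threshold and the evaluation count are hypothetical.

```python
from dataclasses import dataclass, field


def counter_increase(start: float, end: float) -> float:
    """Increase of a counter between two samples.

    Counters only go up, so a lower value means the process restarted and the
    counter began again from zero; the new value is then the whole increase.
    """
    return end if end < start else end - start


def error_ratio(total_start: float, total_end: float, errors_start: float, errors_end: float) -> float:
    """Share of requests in the window that were errors (0.0 when there was no traffic)."""
    total = counter_increase(total_start, total_end)
    errors = counter_increase(errors_start, errors_end)
    return errors / total if total > 0 else 0.0


@dataclass
class ErrorRatioAlert:
    threshold: float = 0.05   # hypothetical SLO: alert above 5% errors
    for_evaluations: int = 3  # consecutive breaches required, like a rule's `for:` duration
    _breaches: int = field(default=0, init=False)

    def evaluate(self, ratio: float) -> str:
        """Return the alert state after one evaluation."""
        if ratio > self.threshold:
            self._breaches += 1
            return "firing" if self._breaches >= self.for_evaluations else "pending"
        self._breaches = 0
        return "inactive"
```

Requiring several consecutive breaches is what keeps a single slow or failed request burst from paging anyone, the same purpose `for:` serves in a real rule.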

### Key Takeaways:

1. **Automation is Key**: Automate everything possible to reduce human error
2. **Security First**: Integrate security scanning into every pipeline
3. **Test Early and Often**: Run tests at every stage of the pipeline
4. **Monitor Everything**: Set up comprehensive monitoring and alerting
5. **Environment Parity**: Keep all environments as similar as possible
6. **Documentation**: Document your CI/CD processes and procedures
7. **Continuous Improvement**: Regularly review and improve your pipeline

### Next Steps:

1. **Start Simple**: Begin with basic CI and gradually add complexity
2. **Iterate**: Continuously improve your pipeline based on feedback
3. **Monitor**: Keep track of pipeline performance and reliability
4. **Scale**: Adapt your pipeline as your application grows
5. **Learn**: Stay updated with new CI/CD tools and practices

---

**🎉 Congratulations!** You now have a comprehensive understanding of DevOps CI/CD concepts and how to implement them for your Greek Derby RAG chatbot project. This knowledge will help you build reliable, scalable, and maintainable applications.
