# Practical 6: Docker for Data Processing Pipelines

## Goals

This practical session introduces Docker containerization for building scalable and reproducible data processing pipelines. You will learn how to package applications, manage multi-container environments, and deploy data processing workflows.

### Learning Objectives
* Understand Docker architecture and containerization concepts
* Write Dockerfiles to package Python applications
* Use Docker Compose for multi-container orchestration
* Implement data pipelines with shared volumes
* Build producer-consumer patterns with message queues
* Connect applications to databases in containers
* Implement frontend-backend architectures
* Deploy and scale data processing applications

### Prerequisites
* Completion of Practical 5 (Apache Spark)
* Docker Desktop installed ([Installation Guide](https://docs.docker.com/get-docker/))
* Basic understanding of Linux commands
* Python programming fundamentals

### Installation

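Verify that Docker and Docker Compose are installed. Recent Docker releases ship Compose V2 as a CLI plugin (`docker compose`). The commands in this practical are written for the legacy standalone `docker-compose` binary, and they work the same way with either form:
```bash
docker --version
docker compose version    # Compose V2 plugin
docker-compose --version  # legacy standalone binary, if installed
```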

### Exercises Overview

| Exercise | Topic | Difficulty |
|----------|-------|------------|
| 1 | Docker Fundamentals and Basic Commands | ★ |
| 2 | Writing Dockerfiles for Python Applications | ★ |
| 3 | Docker Compose for Multi-Container Applications | ★★ |
| 4 | Data Pipelines with Shared Volumes | ★★ |
| 5 | Producer-Consumer with Message Queues | ★★ |
| 6 | Application-Database Integration | ★★ |
| 7 | Frontend-Backend Architectures | ★★★ |
| 8 | Scaling and Monitoring Containers | ★★★ |

---

## Exercise 1: Docker Fundamentals and Basic Commands [★]

### Docker Architecture

Docker uses a client-server architecture:

```
┌────────────────────────────────────────────────────────────┐
│                        Docker Host                         │
│  ┌─────────────┐    ┌───────────────────────────────────┐  │
│  │   Docker    │    │           Docker Daemon           │  │
│  │   Client    │◄──►│  ┌───────────┐  ┌───────────┐     │  │
│  │   (CLI)     │    │  │Container 1│  │Container 2│     │  │
│  └─────────────┘    │  └─────┬─────┘  └─────┬─────┘     │  │
│                     │        │              │           │  │
│                     │  ┌─────┴──────────────┴─────┐     │  │
│                     │  │          Images          │     │  │
│                     │  └──────────────────────────┘     │  │
│                     └───────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
```

### Key Concepts

- **Image**: Read-only template with instructions for creating a container
- **Container**: Runnable instance of an image
- **Dockerfile**: Text file with instructions to build an image
- **Registry**: Storage for Docker images (e.g., Docker Hub)
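The CLI is only a client. Each `docker` command becomes an HTTP request to the daemon's REST API (the Docker Engine API), which on Linux listens on the Unix socket `/var/run/docker.sock` by default. The sketch below sends such requests using only the Python standard library. It assumes that socket path (Docker Desktop and rootless installs may use another one) and assumes that your user is allowed to access it. `UnixHTTPConnection` and `docker_get` are illustrative names, not part of any Docker library.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 connection over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; the socket path decides
        # where the bytes actually go.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def docker_get(path: str, socket_path: str = "/var/run/docker.sock"):
    """Send GET <path> to the Docker daemon and return (status, decoded JSON)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()


if __name__ == "__main__":
    try:
        # Roughly what `docker version` and `docker ps` ask the daemon for
        status, version = docker_get("/version")
        print("Engine version:", version.get("Version"))
        status, containers = docker_get("/containers/json")
        for c in containers:
            print(c["Id"][:12], c["Image"], c["Names"])
    except OSError as exc:
        print("Docker daemon not reachable:", exc)
```

This is the same conversation the `docker` CLI has with the daemon. It is also why the client and the daemon can run on different machines (for example via the `DOCKER_HOST` variable).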

### Basic Docker Commands

Run the following commands in your terminal to familiarize yourself with Docker:

```bash
# Check Docker version
docker --version

# View system-wide information
docker info

# List available images
docker images

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a
```

### Running Your First Container

```bash
# Run a simple hello-world container
docker run hello-world

# Run an interactive Python container
docker run -it python:3.10 python

# Run a container with a specific command
docker run python:3.10 python -c "print('Hello from Docker!')"

# Run a container in the background (detached mode)
docker run -d --name my_python python:3.10 sleep 60

# Stop a running container
docker stop my_python

# Remove a container
docker rm my_python
```

### Container Lifecycle

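```
┌─────────┐  docker start  ┌─────────┐  docker stop   ┌─────────┐
│ Created │───────────────►│ Running │───────────────►│ Stopped │
└─────────┘                └─────────┘                └─────────┘
     │                       │     ▲                       │
     │          docker pause │     │ docker unpause        │
     │                       ▼     │                       │
     │                     ┌─────────┐                     │
     │                     │ Paused  │                     │
     │                     └─────────┘                     │
     │                                                     │
     └──────────────────────────┬──────────────────────────┘
                                ▼
                            docker rm
```

`docker run` combines `docker create` and `docker start`. A stopped container can be started again with `docker start`, and `docker rm -f` removes a container that is still running.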

```bash
# View container logs
docker logs <container_id>

# Execute command in running container
docker exec -it <container_id> bash

# Copy files to/from container
docker cp local_file.txt <container_id>:/path/in/container/
docker cp <container_id>:/path/in/container/file.txt ./local_file.txt

# View container resource usage
docker stats
```

### Questions - Exercise 1

**Q1.1** Run a Python container that prints the system's Python version, OS name, and current date/time. Capture the output.

**Q1.2** Run an Ubuntu container interactively. Inside the container:
- Update the package list
- Install `curl`
- Download a web page
- Exit the container

**Q1.3** Run three containers in detached mode with different names. Use `docker ps` to verify they're running. Then stop all three with a single `docker stop` command and remove them with a single `docker rm` command.

---

## Exercise 2: Writing Dockerfiles for Python Applications [★]

### Dockerfile Basics

A Dockerfile is a script containing instructions to build a Docker image.

### Common Dockerfile Instructions

| Instruction | Description |
|-------------|-------------|
| `FROM` | Base image to start from |
| `WORKDIR` | Set working directory |
| `COPY` | Copy files from host to image |
| `RUN` | Execute commands during build |
| `ENV` | Set environment variables |
| `EXPOSE` | Document which ports the container listens on |
| `CMD` | Default command when container starts |
| `ENTRYPOINT` | Configure container to run as executable |

### Example: Simple Python Application

Create a file `app.py`:

```python
# app.py
import sys
import platform
from datetime import datetime

def main():
    print(f"Python version: {sys.version}")
    print(f"Platform: {platform.platform()}")
    print(f"Current time: {datetime.now()}")
    print("Hello from Docker!")

if __name__ == "__main__":
    main()
```

Create a `Dockerfile`:

```dockerfile
# Use official Python image as base
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy application code
COPY app.py .

# Set the default command
CMD ["python", "app.py"]
```

Build and run:

```bash
# Build the image
docker build -t my-python-app .

# Run the container
docker run my-python-app
```

### Example: Python Application with Dependencies

Create `requirements.txt`:

```
pandas==2.0.0
numpy==1.24.0
requests==2.28.0
```

Create `data_processor.py`:

```python
import pandas as pd
import numpy as np

def process_data():
    # Create sample data
    data = {
        'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
        'value': np.random.randint(1, 100, 4)
    }
    df = pd.DataFrame(data)
    
    print("Data Processing Results:")
    print(df)
    print(f"\nSum: {df['value'].sum()}")
    print(f"Mean: {df['value'].mean():.2f}")

if __name__ == "__main__":
    process_data()
```

Optimized `Dockerfile`:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Copy requirements first (for better layer caching)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY data_processor.py .

CMD ["python", "data_processor.py"]
```
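To see why copying `requirements.txt` before the code helps, consider a simplified model of the build cache. In this model each step's cache key hashes the previous step's key, the instruction text and, for `COPY`, the copied file's contents. This is an illustration, not Docker's actual hashing scheme, and it treats `CMD` as a step even though Docker stores it as metadata. The rule it captures does match Docker's documented behaviour: once one step misses the cache, every later step is rebuilt. The function names are hypothetical.

```python
import hashlib


def layer_keys(instructions, files):
    """Return one cache key per instruction (simplified model)."""
    keys, parent = [], ""
    for instr in instructions:
        h = hashlib.sha256(parent.encode())   # chain on the parent step's key
        h.update(instr.encode())
        if instr.startswith("COPY "):
            src = instr.split()[1]            # "COPY app.py ." -> "app.py"
            h.update(files[src].encode())     # file contents are part of the key
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt(instructions, old_files, new_files):
    """Instructions whose key changed, i.e. steps that would run again."""
    old = layer_keys(instructions, old_files)
    new = layer_keys(instructions, new_files)
    return [i for i, a, b in zip(instructions, old, new) if a != b]


DEPS_FIRST = [
    "FROM python:3.10-slim",
    "WORKDIR /app",
    "COPY requirements.txt .",
    "RUN pip install --no-cache-dir -r requirements.txt",
    "COPY data_processor.py .",
    'CMD ["python", "data_processor.py"]',
]
# Same steps, but the application code is copied before installing dependencies
CODE_FIRST = [DEPS_FIRST[i] for i in (0, 1, 4, 2, 3, 5)]

v1 = {"requirements.txt": "pandas==2.0.0\n", "data_processor.py": "print('v1')\n"}
v2 = {**v1, "data_processor.py": "print('v2')\n"}   # only the code changed

if __name__ == "__main__":
    print("deps first:", rebuilt(DEPS_FIRST, v1, v2))
    print("code first:", rebuilt(CODE_FIRST, v1, v2))
```

With dependencies first, editing the script only re-runs the final `COPY`. With code first, the same edit also forces `pip install` to run again.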

### Multi-Stage Builds

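Multi-stage builds help create smaller production images. Compilers, build tools and caches stay in the builder stage, and only what is copied with `COPY --from` ends up in the final image: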

```dockerfile
# Build stage
FROM python:3.10 AS builder

WORKDIR /app

COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.10-slim

WORKDIR /app

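# Copy installed packages from builder
# (`pip install --user` put them under /root/.local, which Python only searches
# when running as root; adjust the paths if you add a non-root USER)
COPY --from=builder /root/.local /root/.local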

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

COPY app.py .

CMD ["python", "app.py"]
```

### Best Practices for Dockerfiles

1. **Use specific base image tags**: `python:3.10-slim` instead of `python:latest`
2. **Order instructions by frequency of change**: Copy requirements before code
3. **Use `.dockerignore`**: Exclude unnecessary files
4. **Minimize layers**: Combine related RUN commands
5. **Don't run as root**: Create a non-root user when possible
6. **Use multi-stage builds**: For smaller production images

Example `.dockerignore`:

```
__pycache__
*.pyc
*.pyo
.git
.gitignore
*.md
.env
venv/
.pytest_cache/
```

### Questions - Exercise 2

**Q2.1** Create a Dockerfile for a PySpark application that:
- Uses `bitnami/spark` as the base image
- Installs additional Python packages (pandas, matplotlib)
- Copies a Spark script that processes CSV data
- Runs the script when the container starts

**Q2.2** Create a Dockerfile that:
- Uses a non-root user for security
- Implements health checks
- Uses environment variables for configuration
- Includes proper labeling (maintainer, version, description)

**Q2.3** Compare the image sizes of:
- A simple Dockerfile using `python:3.10`
- The same application using `python:3.10-slim`
- A multi-stage build version

Document the size differences and explain when each approach is appropriate.

---

## Exercise 3: Docker Compose for Multi-Container Applications [★★]

### Docker Compose Overview

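Docker Compose lets you define and run multi-container applications from a single YAML file. The top-level `version` key that appears in some examples is obsolete: Compose V2 ignores it and prints a warning, so it can be omitted.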

### Basic docker-compose.yml Structure

```yaml
version: "3.8"

services:
  service_name:
    image: image_name:tag
    # OR build from a Dockerfile (the value is the build context directory)
    build: ./path/to/context
    ports:
      - "host_port:container_port"
    volumes:
      - ./local/path:/container/path
    environment:
      - VAR_NAME=value
    depends_on:
      - other_service

volumes:
  named_volume:

networks:
  custom_network:
```
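`depends_on` controls start order. Compose starts a service's dependencies before the service itself, and services with no ordering between them can start in parallel. Plain `depends_on` only waits for the dependency's container to *start*, not for it to be ready. The sketch below is a simplified model of the ordering (not Compose's code), built on the standard library's `graphlib` (Python 3.9+). `startup_waves` is a hypothetical name.

```python
from graphlib import CycleError, TopologicalSorter


def startup_waves(depends_on):
    """Group services into waves that could start together.

    `depends_on` maps each service to the services it depends on,
    mirroring the `depends_on:` lists in docker-compose.yml.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()                         # raises CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # all of their dependencies have started
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    services = {"web": ["redis", "db"], "worker": ["redis"], "redis": [], "db": []}
    print(startup_waves(services))
    try:
        startup_waves({"a": ["b"], "b": ["a"]})
    except CycleError as exc:
        print("cycle:", exc.args[1])
```

Compose likewise refuses to start a project whose `depends_on` entries form a cycle.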

### Docker Compose Commands

```bash
# Start all services
docker-compose up

# Start in detached mode
docker-compose up -d

# Build images before starting
docker-compose up --build

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v

# View logs
docker-compose logs
docker-compose logs -f service_name

# Scale a service
docker-compose up --scale service_name=3

# Execute command in a service
docker-compose exec service_name command
```

### Example: Web Application with Redis

Create `app.py`:

```python
from flask import Flask
import redis

app = Flask(__name__)
cache = redis.Redis(host='redis', port=6379)

@app.route('/')
def hello():
    count = cache.incr('hits')
    return f'Hello! This page has been viewed {count} times.'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

Create `requirements.txt`:

```
flask
redis
```

Create `Dockerfile`:

```dockerfile
FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

EXPOSE 5000

CMD ["python", "app.py"]
```

Create `docker-compose.yml`:

```yaml
version: "3.8"

services:
  web:
    build: .
    ports:
      - "5000:5000"
    depends_on:
      - redis
    environment:
      - FLASK_DEBUG=1  # FLASK_ENV was removed in Flask 2.3

  redis:
    image: redis:alpine
    volumes:
      - redis_data:/data

volumes:
  redis_data:
```

Run with:

```bash
docker-compose up --build
```

### Service Dependencies and Health Checks

```yaml
version: "3.8"

services:
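  web:
    build: .
    depends_on:
      db:
        condition: service_healthy  # wait for pg_isready, not just for the container to start
    healthcheck:
      # python:3.10-slim ships without curl, so probe with Python's standard library;
      # assumes the app defines a /health route
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:5000/health', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 3

  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: postgres  # the postgres image refuses to start without a password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5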
```
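Slim Python images do not include `curl`, so a probe written in Python is a portable alternative, and a small module is easier to test than an inline one-liner. The sketch below is a hypothetical `healthcheck.py`. It assumes the application serves a `/health` route, which the example apps in this practical do not define yet.

```python
import urllib.request


def probe(url: str, timeout: float = 3.0) -> int:
    """Return 0 if `url` answers with a 2xx status, else 1 (a health-check exit code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except OSError:  # URLError, HTTPError (4xx/5xx), timeouts, refused connections
        return 1
```

Docker marks a container healthy when the health-check command exits with status 0. With this file copied into the image's `WORKDIR`, the check could read `test: ["CMD", "python", "-c", "import sys, healthcheck; sys.exit(healthcheck.probe('http://localhost:5000/health'))"]`.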

### Questions - Exercise 3

**Q3.1** Create a Docker Compose configuration for a data processing pipeline with:
- A Python data generator service
- A Redis service for caching
- A data processor service that reads from Redis
- Proper service dependencies

**Q3.2** Modify the previous example to use:
- Custom networks for service isolation
- Environment files (`.env`)
- Volume mounts for data persistence

**Q3.3** Create a Docker Compose file that starts a Jupyter Notebook server with:
- Pre-installed data science libraries (pandas, numpy, matplotlib, sklearn)
- Persistent notebook storage
- Access to a shared data volume

---

## Exercise 4: Data Pipelines with Shared Volumes [★★]

### Shared Volumes for Container Communication

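Shared volumes let containers exchange data through the file system. The examples below use a bind mount (`./shared:/shared`), so the shared files are also visible in the `shared` folder on the host. A named volume would work the same way between containers.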

```
┌─────────────────┐       ┌─────────────────┐
│    Uploader     │       │    Processor    │
│    Container    │       │    Container    │
│                 │       │                 │
│   writes to     │       │   reads from    │
│   /shared       │       │   /shared       │
└────────┬────────┘       └────────┬────────┘
         │                         │
         └─────────┬───────────────┘
                   │
            ┌──────┴──────┐
            │   Shared    │
            │   Volume    │
            └─────────────┘
```

### Example: File Processing Pipeline

Navigate to the `SharedVolume` folder in this practical:

```bash
cd SharedVolume
```

Examine the existing structure:

**Uploader Service** (`Uploader/upload.py`):
```python
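import os
import time
from shutil import copyfile

TARGET = "/shared/sample_uploaded.txt"

def upload_file():
    while True:
        # Simulate uploading a new file every 5 seconds
        print("Uploading new file...")
        # Copy under a temporary name, then rename: os.replace is atomic on the
        # same filesystem, so the processor never reads a half-written file
        copyfile("sample.txt", TARGET + ".tmp")
        os.replace(TARGET + ".tmp", TARGET)
        time.sleep(5)

if __name__ == "__main__":
    upload_file()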
```

**Processor Service** (`Processor/process.py`):
```python
import time
import os

def process_files():
    while True:
        if os.path.exists("/shared/sample_uploaded.txt"):
            with open("/shared/sample_uploaded.txt", "r") as f:
                content = f.read()
            print(f"Processing: {content}")
            # Process the file...
            os.remove("/shared/sample_uploaded.txt")
        else:
            print("Waiting for files...")
        time.sleep(2)

if __name__ == "__main__":
    process_files()
```

**docker-compose.yml**:
```yaml
version: "3.8"

services:
  uploader:
    build:
      context: ./uploader
    volumes:
      - ./shared:/shared
    depends_on:
      - processor

  processor:
    build:
      context: ./processor
    volumes:
      - ./shared:/shared
```

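The build contexts (`./uploader`, `./processor`) must match the folder names exactly, because paths are case-sensitive on Linux. Run with: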
```bash
docker-compose up --build
```

### Enhanced Data Pipeline Example

Create a more sophisticated data pipeline:

**data_generator.py**:
```python
import json
import os
import random
import time
from datetime import datetime

INPUT_DIR = "/shared/input"

def generate_data():
    counter = 0
    while True:
        data = {
            "id": counter,
            "timestamp": datetime.now().isoformat(),
            "sensor_id": f"sensor_{random.randint(1, 10)}",
            "temperature": round(random.uniform(20, 35), 2),
            "humidity": round(random.uniform(30, 80), 2)
        }
        
        filename = os.path.join(INPUT_DIR, f"data_{counter}.json")
        # Write to a temporary name, then rename: os.replace is atomic on the
        # same filesystem and the processor only lists *.json files, so it
        # never opens a half-written file
        tmp_name = filename + ".tmp"
        with open(tmp_name, 'w') as f:
            json.dump(data, f)
        os.replace(tmp_name, filename)
        
        print(f"Generated: {filename}")
        counter += 1
        time.sleep(2)

if __name__ == "__main__":
    os.makedirs(INPUT_DIR, exist_ok=True)
    generate_data()
```

**data_processor.py**:
```python
import json
import os
import time

def process_files():
    os.makedirs("/shared/output", exist_ok=True)
    
    while True:
        input_dir = "/shared/input"
        if os.path.exists(input_dir):
            files = [f for f in os.listdir(input_dir) if f.endswith('.json')]
            
            for filename in files:
                filepath = os.path.join(input_dir, filename)
                
                with open(filepath, 'r') as f:
                    data = json.load(f)
                
                # Process the data
                data['processed'] = True
                data['temp_fahrenheit'] = round(data['temperature'] * 9/5 + 32, 2)
                
                # Write to output
                output_path = f"/shared/output/processed_{filename}"
                with open(output_path, 'w') as f:
                    json.dump(data, f, indent=2)
                
                # Remove input file
                os.remove(filepath)
                print(f"Processed: {filename}")
        
        time.sleep(1)

if __name__ == "__main__":
    process_files()
```
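If the directories are passed in as parameters, the same processing logic can run against a temporary directory on the host, without Docker. Below is a sketch of such a refactoring of `data_processor.py`; `process_pending` is a hypothetical name. The container's loop would then call `process_pending("/shared/input", "/shared/output")` followed by `time.sleep(1)`.

```python
import json
import os


def process_pending(input_dir: str, output_dir: str) -> list:
    """Process every *.json file currently in input_dir once; return the processed names."""
    os.makedirs(output_dir, exist_ok=True)
    done = []
    for filename in sorted(os.listdir(input_dir)):
        if not filename.endswith(".json"):
            continue  # skip e.g. *.json.tmp files that are still being written
        src = os.path.join(input_dir, filename)
        with open(src) as f:
            data = json.load(f)
        data["processed"] = True
        data["temp_fahrenheit"] = round(data["temperature"] * 9 / 5 + 32, 2)
        with open(os.path.join(output_dir, f"processed_{filename}"), "w") as f:
            json.dump(data, f, indent=2)
        os.remove(src)  # remove the input only after the output is written
        done.append(filename)
    return done
```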

### Questions - Exercise 4

**Q4.1** Extend the SharedVolume example to:
- Add a third service that aggregates processed files
- Generate statistics (average temperature, humidity by sensor)
- Output a summary report every minute

**Q4.2** Implement error handling in the pipeline:
- Move failed files to an "error" directory
- Log errors with timestamps
- Add a monitoring service that reports pipeline health

**Q4.3** Create a parallel processing pipeline:
- Multiple processor containers (use `--scale`)
- Implement file locking to prevent duplicate processing
- Measure throughput with different numbers of processors

---

## Exercise 5: Producer-Consumer with Message Queues [★★]

### Message Queue Pattern

Message queues decouple producers and consumers, enabling:
- Asynchronous processing
- Load balancing
- Fault tolerance

```
┌──────────┐     ┌─────────────┐     ┌──────────┐
│ Producer │────►│   Message   │────►│ Consumer │
│    1     │     │    Queue    │     │    1     │
└──────────┘     │  (RabbitMQ) │     └──────────┘
┌──────────┐     │             │     ┌──────────┐
│ Producer │────►│             │────►│ Consumer │
│    2     │     └─────────────┘     │    2     │
└──────────┘                         └──────────┘
```
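The decoupling can be seen in a single process with Python's thread-safe `queue.Queue`. This is only an analogy for a broker such as RabbitMQ: there is no network and no persistence. The producer never calls a consumer directly, any idle consumer takes the next item, and `task_done()` plays the role of an acknowledgement. The `run` function is a hypothetical illustration.

```python
import queue
import threading

STOP = object()  # sentinel telling a consumer to exit


def run(n_tasks: int, n_consumers: int) -> dict:
    """Producer puts n_tasks items; consumers square them. Returns {task: (worker, result)}."""
    tasks: queue.Queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def consumer(worker_id: int) -> None:
        while True:
            item = tasks.get()          # blocks until a message is available
            try:
                if item is STOP:
                    return
                with lock:
                    results[item] = (worker_id, item * item)
            finally:
                tasks.task_done()       # like basic_ack: this item is finished

    workers = [threading.Thread(target=consumer, args=(i,)) for i in range(n_consumers)]
    for w in workers:
        w.start()
    for i in range(n_tasks):            # the producer only knows about the queue
        tasks.put(i)
    for _ in workers:
        tasks.put(STOP)
    tasks.join()                        # wait until every item has been acknowledged
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run(n_tasks=10, n_consumers=3))
```

Adding consumers changes nothing on the producer side, which is the property that `docker-compose up --scale consumer=3` relies on.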

### RabbitMQ Example

Navigate to the `ProducerConsumerRabbitMQ` folder:

```bash
cd ProducerConsumerRabbitMQ
```

**producer/producer.py**:
```python
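import pika
import time


def connect(retries=10, delay=3):
    # The broker may need several seconds after its container starts before it
    # accepts connections, so retry; catch only connection errors, not everything
    for attempt in range(1, retries + 1):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except pika.exceptions.AMQPConnectionError:
            print(f"Retrying connection to RabbitMQ ({attempt}/{retries})...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to RabbitMQ")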

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)

for i in range(100):
    msg = f"Task #{i}"
    channel.basic_publish(
        exchange='',
        routing_key='task_queue',
        body=msg,
        properties=pika.BasicProperties(delivery_mode=2)  # Make message persistent
    )
    print(f"Sent: {msg}")
    time.sleep(1)

connection.close()
```

**consumer/consumer.py**:
```python
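import pika
import time


def connect(retries=10, delay=3):
    # Retry while the broker finishes starting; catch only connection errors
    for attempt in range(1, retries + 1):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except pika.exceptions.AMQPConnectionError:
            print(f"Retrying connection to RabbitMQ ({attempt}/{retries})...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to RabbitMQ")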

def callback(ch, method, properties, body):
    print(f"Received: {body.decode()}")
    time.sleep(0.5)  # Simulate processing
    print(f"Processed: {body.decode()}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)
channel.basic_qos(prefetch_count=1)  # Fair dispatch
channel.basic_consume(queue='task_queue', on_message_callback=callback)

print('Waiting for messages...')
channel.start_consuming()
```

**docker-compose.yml**:
```yaml
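services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"   # AMQP protocol
      - "15672:15672" # Management UI
    environment:
      RABBITMQ_DEFAULT_USER: guest
      RABBITMQ_DEFAULT_PASS: guest
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  producer:
    build: ./producer
    depends_on:
      rabbitmq:
        condition: service_healthy  # start only once the broker answers

  consumer:
    build: ./consumer
    depends_on:
      rabbitmq:
        condition: service_healthy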

Run with:
```bash
docker-compose up --build

# Scale consumers
docker-compose up --scale consumer=3
```

Access RabbitMQ management UI at: http://localhost:15672 (guest/guest)

### Data Processing with Message Queues

Enhanced producer for data processing:

```python
# data_producer.py
import pika
import json
import random
import time
from datetime import datetime

def connect(retries=10, delay=3):
    for _ in range(retries):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except pika.exceptions.AMQPConnectionError:
            time.sleep(delay)
    raise RuntimeError("Could not connect to RabbitMQ")

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='data_queue', durable=True)

sensors = ['temperature', 'humidity', 'pressure']

while True:
    data = {
        'sensor_type': random.choice(sensors),
        'value': round(random.uniform(0, 100), 2),
        'timestamp': datetime.now().isoformat()
    }
    
    channel.basic_publish(
        exchange='',
        routing_key='data_queue',
        body=json.dumps(data),
        properties=pika.BasicProperties(delivery_mode=2)
    )
    
    print(f"Sent: {data}")
    time.sleep(0.5)
```
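A consumer for `data_queue` mainly decodes each JSON message and updates some state. In the sketch below, that logic is kept separate from pika so it can be tested without a broker. `SensorAggregator` is a hypothetical name. `on_message` follows the `(channel, method, properties, body)` signature that pika passes to `on_message_callback`, so wiring it up would mirror `consumer.py`: `channel.basic_consume(queue='data_queue', on_message_callback=agg.on_message)`.

```python
import json
from collections import defaultdict


class SensorAggregator:
    """Running count and mean value per sensor_type."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def handle(self, body: bytes) -> None:
        data = json.loads(body)
        kind = data["sensor_type"]
        value = float(data["value"])     # parse everything before touching state
        self.count[kind] += 1
        self.total[kind] += value

    def means(self) -> dict:
        return {k: round(self.total[k] / self.count[k], 2) for k in self.count}

    def on_message(self, ch, method, properties, body):
        """pika callback: ack good messages, reject malformed ones without requeueing."""
        try:
            self.handle(body)
        except (ValueError, KeyError, TypeError):
            # requeue=False stops a bad message from being redelivered forever;
            # with a dead-letter exchange configured it would be dead-lettered instead
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        else:
            ch.basic_ack(delivery_tag=method.delivery_tag)
```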

### Questions - Exercise 5

**Q5.1** Extend the RabbitMQ example to:
- Use topic-based routing (different queues for different data types)
- Implement multiple consumer types (one for each sensor type)
- Store processed data in a shared volume

**Q5.2** Implement dead letter handling:
- Configure a dead letter queue for failed messages
- Add a retry mechanism (max 3 retries)
- Create a monitoring consumer that alerts on DLQ messages

**Q5.3** Compare RabbitMQ with Redis Pub/Sub:
- Implement the same producer-consumer pattern with Redis
- Measure message throughput
- Document the trade-offs between the two approaches

---

## Exercise 6: Application-Database Integration [★★]

### Connecting Applications to Databases

Navigate to the `AppDB` folder:

```bash
cd AppDB
```

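This example demonstrates a Flask application connected to PostgreSQL. The identifiers are in French: the database service is `bd` (*base de données*, database), and the `livres` (books) table has `titre` (title), `auteur` (author) and `annee` (year) columns.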

**app/app.py**:
```python
from flask import Flask
import psycopg2

app = Flask(__name__)

@app.route("/")
def index():
    conn = psycopg2.connect(
        host="bd",  # Service name in Docker
        database="livres",
        user="postgres",
        password="postgres"
    )
    cur = conn.cursor()
    cur.execute("SELECT titre FROM livres")
    livres = cur.fetchall()
    cur.close()
    conn.close()
    return "<br>".join(title for (title,) in livres)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

**init_bd/init.sql**:
```sql
CREATE TABLE IF NOT EXISTS livres (
    id SERIAL PRIMARY KEY,
    titre VARCHAR(255) NOT NULL,
    auteur VARCHAR(255),
    annee INTEGER
);

INSERT INTO livres (titre, auteur, annee) VALUES
    ('Les Misérables', 'Victor Hugo', 1862),
    ('Le Petit Prince', 'Antoine de Saint-Exupéry', 1943),
    ('L''Étranger', 'Albert Camus', 1942);
```

**docker-compose.yml**:
```yaml
services:
  app:
    build: ./app
    ports:
      - "5000:5000"
    depends_on:
      - bd

  bd:
    image: postgres:15
    environment:
      POSTGRES_DB: livres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - ./init_bd:/docker-entrypoint-initdb.d
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

volumes:
  postgres_data:
```

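Run the stack. The scripts in `/docker-entrypoint-initdb.d` only run when the `postgres_data` volume is empty, i.e. on the first start; `docker-compose down -v` resets the database so that they run again: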
```bash
docker-compose up --build
```

Access at: http://localhost:5000
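With plain `depends_on`, Compose starts `app` as soon as the `bd` container has started, but PostgreSQL may still be initializing, so the first connection attempts can fail. Besides a health-check condition, the application itself can retry. Below is a generic sketch with exponential backoff. The `retry` helper is hypothetical and not part of psycopg2.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
          exceptions: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise                                    # give up: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))       # 0.5 s, 1 s, 2 s, ...
    raise ValueError("attempts must be >= 1")
```

In `app.py` this could wrap the connection: `conn = retry(lambda: psycopg2.connect(host="bd", database="livres", user="postgres", password="postgres"), exceptions=(psycopg2.OperationalError,))`. Restricting `exceptions` keeps genuine bugs, such as a typo in a query, from being retried.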

### Enhanced Example with SQLAlchemy

```python
# app_enhanced.py
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy
import os

app = Flask(__name__)

# Database configuration from environment
db_host = os.environ.get('DB_HOST', 'bd')
db_name = os.environ.get('DB_NAME', 'livres')
db_user = os.environ.get('DB_USER', 'postgres')
db_pass = os.environ.get('DB_PASS', 'postgres')

app.config['SQLALCHEMY_DATABASE_URI'] = f'postgresql://{db_user}:{db_pass}@{db_host}/{db_name}'
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

db = SQLAlchemy(app)

class Book(db.Model):
    __tablename__ = 'livres'
    id = db.Column(db.Integer, primary_key=True)
    titre = db.Column(db.String(255), nullable=False)
    auteur = db.Column(db.String(255))
    annee = db.Column(db.Integer)

@app.route('/books')
def get_books():
    books = Book.query.all()
    return jsonify([{
        'id': b.id,
        'title': b.titre,
        'author': b.auteur,
        'year': b.annee
    } for b in books])

@app.route('/books', methods=['POST'])
def add_book():
    data = request.json
    book = Book(titre=data['title'], auteur=data['author'], annee=data['year'])
    db.session.add(book)
    db.session.commit()
    return jsonify({'id': book.id}), 201

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

### Questions - Exercise 6

**Q6.1** Extend the AppDB example to include:
- CRUD operations (Create, Read, Update, Delete)
- Input validation
- Error handling with appropriate HTTP status codes

**Q6.2** Add data analytics capabilities:
- Endpoint to get books by year range
- Statistics endpoint (count by author, books per decade)
- Full-text search capability

**Q6.3** Implement a data import service:
- Create a separate container that imports CSV data into the database
- Watch a shared volume for new CSV files
- Log import results and errors

---

## Exercise 7: Frontend-Backend Architectures [★★★]

### Microservices Architecture

Navigate to the `WebAppFrontBack` folder:

```bash
cd WebAppFrontBack
```

This example demonstrates a React frontend with a Flask backend.

```
┌─────────────────┐      ┌─────────────────┐
│    Frontend     │      │    Backend      │
│    (React)      │─────►│    (Flask)      │
│   Port: 3000    │      │   Port: 5000    │
└─────────────────┘      └─────────────────┘
```

### Backend API (Flask)

**backend/app.py**:
```python
from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # Enable Cross-Origin requests

# In-memory data store
tasks = [
    {"id": 1, "title": "Learn Docker", "completed": True},
    {"id": 2, "title": "Build a pipeline", "completed": False}
]

@app.route('/api/tasks', methods=['GET'])
def get_tasks():
    return jsonify(tasks)

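@app.route('/api/tasks', methods=['POST'])
def add_task():
    data = request.json
    new_task = {
        # len(tasks) + 1 can collide with an existing id after a deletion,
        # so derive the next id from the largest id still in use
        "id": max((t['id'] for t in tasks), default=0) + 1,
        "title": data['title'],
        "completed": False
    }
    tasks.append(new_task)
    return jsonify(new_task), 201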

@app.route('/api/tasks/<int:task_id>', methods=['PUT'])
def update_task(task_id):
    task = next((t for t in tasks if t['id'] == task_id), None)
    if task:
        data = request.json
        task['completed'] = data.get('completed', task['completed'])
        return jsonify(task)
    return jsonify({"error": "Task not found"}), 404

@app.route('/api/tasks/<int:task_id>', methods=['DELETE'])
def delete_task(task_id):
    global tasks
    tasks = [t for t in tasks if t['id'] != task_id]
    return '', 204

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
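An in-memory list is fine for a demo, but task ids must stay unique even after deletions, and Flask's development server handles requests in threads. One option is a small store with a monotonically increasing counter and a lock. The sketch below uses the hypothetical name `TaskStore`; the routes above would call `store.add(...)`, `store.all()` and so on. The data is still lost whenever the container restarts.

```python
import itertools
import threading


class TaskStore:
    """In-memory task list whose ids are never reused, even after deletions."""

    def __init__(self):
        self._tasks = {}                     # id -> task dict, in insertion order
        self._next_id = itertools.count(1)
        self._lock = threading.Lock()        # requests may arrive on several threads

    def add(self, title: str) -> dict:
        with self._lock:
            task = {"id": next(self._next_id), "title": title, "completed": False}
            self._tasks[task["id"]] = task
            return task

    def update(self, task_id: int, completed: bool):
        with self._lock:
            task = self._tasks.get(task_id)
            if task is not None:
                task["completed"] = completed
            return task                      # None means "not found" -> 404

    def delete(self, task_id: int) -> bool:
        with self._lock:
            return self._tasks.pop(task_id, None) is not None

    def all(self) -> list:
        with self._lock:
            return list(self._tasks.values())
```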

### Docker Compose for Full Stack

**docker-compose.yml**:
```yaml
version: "3.8"

services:
  frontend:
    build:
      context: ./frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend
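    environment:
      # Baked into the JavaScript that runs in the browser, so it must use the
      # host-published port rather than the Compose service name "backend"
      - REACT_APP_API_URL=http://localhost:5000

  backend:
    build:
      context: ./backend
    ports:
      - "5000:5000"
    volumes:
      - ./backend:/app   # mount the source so code changes are picked up without rebuilding
    environment:
      - FLASK_DEBUG=1    # enables the auto-reloader (FLASK_ENV was removed in Flask 2.3)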
```

### Adding Nginx as Reverse Proxy

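For production deployments, put Nginx in front as a reverse proxy. It forwards `/api` requests to the backend and everything else to the frontend, so the browser talks to a single origin on port 80: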

**nginx.conf**:
```nginx
upstream frontend {
    server frontend:3000;
}

upstream backend {
    server backend:5000;
}

server {
    listen 80;

    location / {
        proxy_pass http://frontend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location /api {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

**docker-compose.prod.yml**:
```yaml
version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
    depends_on:
      - frontend
      - backend

  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.prod

  backend:
    build:
      context: ./backend
```

### Questions - Exercise 7

**Q7.1** Extend the frontend-backend example to include:
- User authentication (login/logout)
- Protected routes
- JWT token handling

**Q7.2** Add a database to the stack:
- Replace in-memory storage with PostgreSQL
- Add database migrations
- Implement data persistence across restarts

**Q7.3** Create a data visualization dashboard:
- Backend API that serves analytics data
- Frontend with charts (using Chart.js or similar)
- Real-time updates using WebSockets

---

## Exercise 8: Scaling and Monitoring Containers [★★★]

### Container Scaling

```bash
# Scale a specific service
docker-compose up --scale worker=5

# View running containers
docker-compose ps

# View resource usage
docker stats
```

### Load Balancing with Nginx

**docker-compose.yml**:
```yaml
version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api

  api:
    build: .
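    # No ports published: the replicas are reached only through nginx
    deploy:
      replicas: 3   # honored by `docker compose` (V2); legacy docker-compose needs --compatibility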
```

**nginx.conf** for load balancing:
```nginx
events {
    worker_connections 1024;
}

http {
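    upstream api_servers {
        least_conn;  # pick the server with the fewest active connections (default: round-robin)
        # Docker's DNS returns one address per api replica; nginx resolves the
        # name when it loads its configuration, so reload nginx after rescaling
        server api:5000;
    }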

    server {
        listen 80;

        location / {
            proxy_pass http://api_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```
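The two strategies differ in how they pick a replica. Round-robin cycles through the servers in order. `least_conn` picks the server with the fewest requests in progress, which helps when request durations vary. The sketch below is a simplified model, not nginx's implementation: nginx also honors server weights and rotates among tied servers, whereas this version always picks the first tied one. Both class names are hypothetical.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation, ignoring current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Hand out the server with the fewest requests currently in progress."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        server = min(self.active, key=self.active.get)  # ties: first in list order
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1                        # call when the response is sent
```

Under round-robin, a replica stuck on a slow request keeps receiving its share of new requests. Under least-connections, it receives none until its in-flight count drops.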

### Monitoring with Prometheus and Grafana

**docker-compose.monitoring.yml**:
```yaml
version: "3.8"

services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

volumes:
  prometheus_data:
  grafana_data:
```

**prometheus.yml**:
```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
```
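Prometheus pulls metrics. Every `scrape_interval`, it sends an HTTP GET to each target's `/metrics` path (the default `metrics_path`) and parses a plain-text format of `# HELP`, `# TYPE` and `name value` lines. Below is a hand-rolled, standard-library-only sketch of an application exposing one counter. In practice, the official `prometheus_client` library does this for you. The metric name `app_requests_total`, port 8000 and the helper names are assumptions.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Counter:
    """A monotonically increasing value, safe to update from several threads."""

    def __init__(self, name: str, help_text: str):
        self.name, self.help_text = name, help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount: int = 1) -> None:
        with self._lock:
            self._value += amount

    def exposition(self) -> str:
        """Render in the Prometheus text format: HELP, TYPE, then 'name value'."""
        with self._lock:
            value = self._value
        return (f"# HELP {self.name} {self.help_text}\n"
                f"# TYPE {self.name} counter\n"
                f"{self.name} {value}\n")


REQUESTS = Counter("app_requests_total", "Total HTTP requests handled.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = REQUESTS.exposition().encode()
            content_type = "text/plain; version=0.0.4"
        else:
            REQUESTS.inc()  # count ordinary application requests only
            body, content_type = b"ok\n", "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep container logs quiet


def make_server(host: str = "0.0.0.0", port: int = 8000) -> ThreadingHTTPServer:
    return ThreadingHTTPServer((host, port), MetricsHandler)
```

Running `make_server().serve_forever()` in a service (hypothetically named `app`) and adding a job with target `app:8000` to `prometheus.yml` would let Prometheus compute request rates with `rate(app_requests_total[1m])`.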

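### Resource Limits

Under `limits`, `cpus: '0.50'` caps the service at half of one CPU's time, and `memory: 512M` is a hard ceiling: if processes in the container exceed it, the kernel's OOM killer terminates them.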

```yaml
version: "3.8"

services:
  api:
    build: .
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
```

### Questions - Exercise 8

**Q8.1** Create a scalable data processing pipeline:
- Producer service generating data
- Worker services that can be scaled (1-10 instances)
- Load balancer distributing work
- Measure throughput with different numbers of workers

**Q8.2** Set up monitoring for your application:
- Configure Prometheus to collect metrics
- Create Grafana dashboards for:
  - CPU and memory usage
  - Request rates and latencies
  - Error rates

**Q8.3** Implement auto-scaling simulation:
- Monitor CPU usage of worker containers
- Create a script that scales workers based on load
- Test with varying load patterns

---

## Summary

In this practical, you learned:

1. **Docker Fundamentals**: Images, containers, and basic commands
2. **Dockerfiles**: Writing efficient Dockerfiles for Python applications
3. **Docker Compose**: Orchestrating multi-container applications
4. **Shared Volumes**: Building data pipelines with file-based communication
5. **Message Queues**: Producer-consumer patterns with RabbitMQ
6. **Database Integration**: Connecting applications to PostgreSQL
7. **Frontend-Backend**: Building full-stack applications
8. **Scaling and Monitoring**: Load balancing and observability

### Key Takeaways

- Use Docker Compose for development and testing
- Implement proper health checks for service dependencies
- Use volumes for data persistence
- Choose the right communication pattern (files, messages, API)
- Monitor and scale based on metrics

### Next Steps

In Practical 7, you will learn about Kubernetes for:
- Production-grade container orchestration
- Declarative configuration management
- Automatic scaling and self-healing
- Service discovery and load balancing

### Further Reading

- [Docker Documentation](https://docs.docker.com/)
- [Docker Compose Documentation](https://docs.docker.com/compose/)
- [Best practices for writing Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)
- [Docker Security Best Practices](https://docs.docker.com/develop/security-best-practices/)