# Dockerization Practice Notebook

**Prerequisites**: Docker installed, Python 3.10+, PyTorch 2.6.0+, CUDA 12.4+

## Learning Objectives

By the end of this notebook, you will:

1. **Write Dockerfiles**: Create container images for AI/ML projects from scratch
2. **GPU Support**: Configure CUDA and NVIDIA runtime for deep learning
3. **Multi-Service Apps**: Orchestrate training, inference, and development environments
4. **Optimization**: Reduce image sizes with multi-stage builds
5. **Best Practices**: Security, caching, and production-ready configurations

## Why Docker for AI/ML?

**The Dependency Hell Problem**:
- Different projects need different Python versions
- PyTorch 1.x vs 2.x incompatibilities
- CUDA version mismatches
- System library conflicts
- "Works on my machine" syndrome

**Docker Solves This**:
- **Reproducibility**: Same environment everywhere (dev, staging, prod)
- **Isolation**: Each project has its own dependencies
- **Portability**: Build once, run anywhere (laptop, cloud, HPC)
- **Version Control**: Dockerfile tracks environment changes with code
- **Scalability**: Easy to deploy thousands of containers

**Real-World Impact**:
- Companies save weeks of environment setup time
- Research reproducibility improves dramatically
- CI/CD pipelines become reliable
- Cloud deployments are simplified

## Docker vs Virtual Machines

| Feature | Docker Container | Virtual Machine |
|---------|-----------------|------------------|
| Size | MBs | GBs |
| Startup | Seconds | Minutes |
| Performance | Near-native | Overhead |
| Isolation | Process-level | Hardware-level |

**For AI/ML**: Docker is faster, lighter, and sufficient for most use cases.

## How This Notebook Works

Most cells use the **`%%writefile`** magic command:
- Creates files directly on your disk
- You can then build Docker images from these files
- Run actual containers to test your work

**Note**: Docker commands should be run in your **terminal**, not in this notebook.

💡 **Tip**: Try completing TODO sections before viewing solutions!

## Exercise 1: Create a Basic Dockerfile

**Purpose**: Learn the fundamental building blocks of Dockerfiles for PyTorch applications.

### Understanding Dockerfiles

A Dockerfile is a **recipe** for building a Docker image:
- Written as a series of instructions (commands)
- Each instruction creates a new layer
- Layers are cached for faster rebuilds
- Final image is a stack of all layers

Think of it like a script that:
1. Starts with a base operating system
2. Installs software
3. Copies your code
4. Configures how to run it

### Key Dockerfile Instructions

**FROM**: Choose base image
```dockerfile
# Official Python image, minimal size
FROM python:3.10-slim
```
- Always the first instruction
- Use official images when possible (security, maintenance)
- `-slim` variant: Smaller size, fewer packages
- `-alpine`: Even smaller, but compatibility issues

**WORKDIR**: Set working directory
```dockerfile
# All subsequent instructions run here
WORKDIR /app
```
- Like `cd /app` but also creates directory
- Keeps image organized
- Relative paths in COPY/ADD are relative to WORKDIR

**RUN**: Execute commands during build
```dockerfile
# Runs at build time
RUN pip install torch
```
- Installs packages, downloads files, compiles code
- Each RUN creates a new layer (affects image size)
- Combine multiple commands with && to reduce layers

**COPY**: Copy files from host to image
```dockerfile
# Copies app.py to /app/app.py
COPY app.py .
```
- Copies from build context (usually current directory)
- Use .dockerignore to exclude files
- Copying source code last improves cache efficiency

**CMD**: Default command when container starts
```dockerfile
# Runs when container starts
CMD ["python", "app.py"]
```
- Only one CMD per Dockerfile (last one wins)
- Can be overridden: `docker run myimage python other.py`
- Use JSON array format for better signal handling
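The signal-handling point can be illustrated with a small Python model. This is only a sketch of how Docker interprets the CMD value, not Docker's own parser; `cmd_to_argv` is a hypothetical helper that returns the argv the container's first process would receive.

```python
import json


def cmd_to_argv(cmd_text: str) -> list:
    """Model how a CMD value becomes the container's process argv.

    Exec form (a JSON array of strings) runs the program directly, so it
    is PID 1 and receives signals such as SIGTERM from `docker stop`.
    Shell form is wrapped in `/bin/sh -c`, so the shell is PID 1 instead.
    """
    try:
        parsed = json.loads(cmd_text)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, list) and all(isinstance(a, str) for a in parsed):
        return parsed  # exec form: argv is used as-is
    return ["/bin/sh", "-c", cmd_text]  # shell form: wrapped in a shell


print(cmd_to_argv('["python", "app.py"]'))  # ['python', 'app.py']
print(cmd_to_argv("python app.py"))         # ['/bin/sh', '-c', 'python app.py']
```

In the second case the Python process is a child of the shell, which is why a stop signal may never reach it.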

### Why Use python:3.10-slim?

| Image | Size | Use Case |
|-------|------|----------|
| python:3.10 | ~900MB | Full development, all tools |
| python:3.10-slim | ~120MB | Production, minimal overhead |
| python:3.10-alpine | ~50MB | Ultra-minimal, may have compatibility issues |

**For ML**: slim is the sweet spot - small but compatible.

### Layer Caching Strategy

Docker caches layers. Order matters for efficiency:

**Bad (slow rebuilds)**:
```dockerfile
# Changes frequently
COPY . .
# Re-runs on every code change
RUN pip install torch
```

**Good (fast rebuilds)**:
```dockerfile
# Cached unless dependencies change
RUN pip install torch
# Only this layer rebuilds on code changes
COPY . .
```

**Best practice**: Copy requirements.txt first, install, then copy code.
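The effect of instruction order on caching can be sketched with a toy model in Python. It assumes a simplified rule (a layer is rebuilt when its own inputs change or when any earlier layer was rebuilt); `layers_to_rebuild` is a hypothetical helper, not part of Docker.

```python
def layers_to_rebuild(instructions, changed):
    """Return the instructions that must be re-run.

    A layer is rebuilt if its own inputs changed or if any earlier
    layer was rebuilt, because each layer builds on the one before it.
    """
    rebuilt = []
    invalidated = False
    for instruction in instructions:
        if instruction in changed:
            invalidated = True
        if invalidated:
            rebuilt.append(instruction)
    return rebuilt


bad_order = ["FROM python:3.10-slim", "COPY . .", "RUN pip install torch"]
good_order = ["FROM python:3.10-slim", "RUN pip install torch", "COPY . ."]
code_changed = {"COPY . ."}  # editing source files invalidates this COPY

print(layers_to_rebuild(bad_order, code_changed))   # ['COPY . .', 'RUN pip install torch']
print(layers_to_rebuild(good_order, code_changed))  # ['COPY . .']
```

With the good ordering, a code edit skips the slow `pip install` step entirely.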

### Your Task

Create a Dockerfile that:
1. Uses `python:3.10-slim` as base (lightweight)
2. Sets `/app` as working directory (organization)
3. Installs `torch>=2.6.0` (deep learning framework)
4. Copies `app.py` (your application)
5. Runs `app.py` by default (when container starts)

**Success criteria**: A buildable, minimal PyTorch container.

In [None]:
%%writefile Dockerfile.basic
# TODO: Complete this Dockerfile
# This cell uses the %%writefile magic command to create a file
# (the magic must be the first line of the cell)
# Try to complete it before looking at the solution
# TODO: Create a Dockerfile that:
# 1. Uses python:3.10-slim as base image
# 2. Sets /app as working directory
# 3. Installs torch>=2.6.0
# 4. Copies app.py to container
# 5. Runs app.py as default command

### Solution:

In [None]:
%%writefile Dockerfile.basic
# Base image: Using slim variant to reduce image size
FROM python:3.10-slim

# Set working directory inside container
WORKDIR /app

# Install PyTorch (--no-cache-dir reduces image size)
# The quotes stop the shell from treating ">" as an output redirect
RUN pip install --no-cache-dir "torch>=2.6.0"

# Copy application code from host to container
COPY app.py .

# Default command when container starts
CMD ["python", "app.py"]

## Exercise 2: Create a Sample Application

**Purpose**: Build a diagnostic application to verify Docker environment setup.

### Why Create a Test Application?

Before deploying production models, verify:
- Python version is correct
- PyTorch installed successfully
- CUDA is accessible (for GPU containers)
- Basic tensor operations work

**In production**: Similar health check scripts validate deployments.

### What This Application Does

**1. Environment Information**:
```python
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
```
- Confirms correct versions are installed
- Useful for debugging version mismatches

**2. CUDA Detection**:
```python
torch.cuda.is_available()
```
- Returns True if GPU is accessible
- False might mean: no GPU, wrong driver, wrong PyTorch build

**3. GPU Information** (if available):
```python
torch.cuda.get_device_name(0)
```
- Shows GPU model (e.g., "NVIDIA GeForce RTX 3090")
- Helps verify correct GPU is being used

**4. Computation Test**:
```python
x = torch.randn(1000, 1000).cuda()
y = torch.randn(1000, 1000).cuda()
z = torch.matmul(x, y)
```
- Matrix multiplication is a common ML operation
- Tests both memory allocation and computation
- Failure here means something is seriously wrong

### CPU vs GPU Code Paths

Notice the conditional logic:

```python
if torch.cuda.is_available():
    x = torch.randn(1000, 1000).cuda()  # GPU
else:
    x = torch.randn(1000, 1000)  # CPU
```

**Why**:
- CPU-only containers (for testing/development)
- GPU containers (for training/inference)
- Same code works in both environments

### Container Health Checks

This pattern extends to production:

**Dockerfile health check**:
```dockerfile
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python health_check.py || exit 1
```

**Kubernetes readiness probe**:
```yaml
readinessProbe:
  exec:
    command: ["python", "health_check.py"]
```
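The notebook does not define `health_check.py`, so the following is only a minimal sketch of what such a script might contain. The check functions and the `CHECKS` list are assumptions made for illustration; the one fixed requirement is that the script exits with 0 when healthy and non-zero otherwise.

```python
import sys


def check_python():
    """Pass if the interpreter is at least Python 3.10."""
    return sys.version_info >= (3, 10)


def check_torch_matmul():
    """Pass if PyTorch imports and a small matmul yields the right shape."""
    try:
        import torch
    except ImportError:
        return False
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(64, 64, device=device)
    return (x @ x).shape == (64, 64)


def run_checks(checks):
    """Run each check; return 0 if all pass, 1 otherwise (exit-code style)."""
    failed = []
    for check in checks:
        try:
            ok = check()
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failed.append(check.__name__)
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failed else 0


CHECKS = [check_python, check_torch_matmul]
```

Ending the file with `raise SystemExit(run_checks(CHECKS))` would hand the exit code to the `HEALTHCHECK` instruction or the readiness probe.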

### Expected Output

**CPU container**:
```
Python: 3.10.x
PyTorch: 2.6.0
CUDA Available: False
Running on CPU
Testing CPU computation...
✅ CPU computation successful
```

**GPU container**:
```
Python: 3.10.x
PyTorch: 2.6.0
CUDA Available: True
CUDA Version: 12.4
GPU: NVIDIA GeForce RTX 3090
Testing GPU computation...
✅ GPU computation successful
```

### Troubleshooting

**If CUDA shows False** (but you have a GPU):
1. Check NVIDIA drivers: `nvidia-smi`
2. Verify the NVIDIA Container Toolkit (nvidia-docker) is installed: `docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi`
3. Check PyTorch CUDA version matches system CUDA
4. Use `--gpus all` flag when running container

**If computation fails**:
1. Out of memory: Reduce matrix size
2. CUDA error: Incompatible CUDA versions
3. Permission denied: Check Docker GPU access

### Your Task

This cell creates `app.py` with:
- Version information display
- CUDA availability check
- Conditional GPU/CPU code paths
- Matrix multiplication test
- Clear success/failure messages

**No TODO** - this is a complete reference implementation you'll use in later exercises.

In [None]:
%%writefile app.py
"""Sample PyTorch application for Docker testing.

This script validates that:
1. Python version is correct
2. PyTorch is installed properly
3. CUDA is available (if running with GPU)
4. Basic tensor operations work
"""
import torch
import sys

print("=" * 50)
print("PyTorch Docker Container Test")
print("=" * 50)

# Display environment info
print(f"\nPython: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    
    # Test GPU computation
    print("\nTesting GPU computation...")
    x = torch.randn(1000, 1000).cuda()
    y = torch.randn(1000, 1000).cuda()
    z = torch.matmul(x, y)
    print("✅ GPU computation successful")
else:
    print("\nRunning on CPU")
    print("Testing CPU computation...")
    x = torch.randn(1000, 1000)
    y = torch.randn(1000, 1000)
    z = torch.matmul(x, y)
    print("✅ CPU computation successful")

print("\n" + "=" * 50)
print("All tests passed!")
print("=" * 50)

## Exercise 3: Build and Run Docker Image

**Purpose**: Learn Docker CLI commands to build images and run containers.

### The Docker Build Process

**What happens when you build**:
1. Docker reads the Dockerfile
2. Executes each instruction in order
3. Creates a layer for each instruction
4. Caches layers for faster rebuilds
5. Tags the final image with a name

**Build time** depends on:
- Base image size (downloading if not cached)
- Number of packages to install
- Network speed (downloading dependencies)
- Layer caching (faster if nothing changed)

First build: 2-10 minutes. Cached rebuilds: seconds.

### Understanding `docker build` Command

```bash
docker build -t pytorch-basic -f Dockerfile.basic .
```

**Breaking it down**:

**`docker build`**: The build command

**`-t pytorch-basic`**: Tag (name) the image
- Format: `name:tag` (e.g., `pytorch-basic:v1.0`)
- No tag specified = `latest` by default
- Use descriptive names: `my-model-trainer`, `api-server`

**`-f Dockerfile.basic`**: Specify which Dockerfile
- Default: looks for file named `Dockerfile`
- Use -f when you have multiple Dockerfiles
- Examples: Dockerfile.dev, Dockerfile.prod, Dockerfile.gpu

**`.`**: Build context (current directory)
- Docker sends all files in this directory to build daemon
- COPY commands copy from this context
- Use .dockerignore to exclude large/unnecessary files
- Can specify different path: `docker build -t myapp /path/to/context`
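As a rough illustration of how the pieces of the command fit together, the sketch below assembles the same argument list in Python. `normalize_image_ref` and `docker_build_argv` are hypothetical helpers written for this example; they only build the list and do not invoke Docker.

```python
from typing import Optional


def normalize_image_ref(ref: str) -> str:
    """Append ':latest' when the reference carries no explicit tag."""
    last_segment = ref.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    return ref if ":" in last_segment else f"{ref}:latest"


def docker_build_argv(tag: str, dockerfile: Optional[str] = None, context: str = ".") -> list:
    """Assemble the argument list for `docker build`."""
    argv = ["docker", "build", "-t", normalize_image_ref(tag)]
    if dockerfile is not None:  # omit -f to use the default file named Dockerfile
        argv += ["-f", dockerfile]
    argv.append(context)  # the build context always comes last
    return argv


print(" ".join(docker_build_argv("pytorch-basic", "Dockerfile.basic")))
# docker build -t pytorch-basic:latest -f Dockerfile.basic .
```

To run the build from Python rather than a terminal, the list could be passed to `subprocess.run(argv, check=True)`.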

### Build Output Explained

You'll see:
```
Sending build context to Docker daemon  2.048kB
Step 1/5 : FROM python:3.10-slim
 ---> abc123def456
Step 2/5 : WORKDIR /app
 ---> Running in xyz789...
 ---> def456ghi789
...
Successfully built abc123def456
Successfully tagged pytorch-basic:latest
```

**What each part means**:
- **Sending context**: Uploading files to Docker
- **Step X/Y**: Which Dockerfile instruction is executing
- **Running in...**: Temporary container created for this step
- **Successfully built**: Image ID (hash)
- **Successfully tagged**: Your friendly name

### Understanding `docker run` Command

```bash
docker run --rm pytorch-basic
```

**Breaking it down**:

**`docker run`**: Create and start a container

**`--rm`**: Remove container after it exits
- Without this, stopped containers accumulate
- Good for one-off tasks and testing
- Use `docker ps -a` to see all containers
- Clean up: `docker container prune`

**`pytorch-basic`**: Image to run
- Uses local image if available
- Downloads from Docker Hub if not found
- Can specify version: `pytorch-basic:v1.0`

### Common `docker run` Options

**Interactive mode**:
```bash
docker run -it --rm pytorch-basic bash
```
- `-it`: Interactive terminal
- `bash`: Override CMD, run bash shell instead
- Useful for debugging and exploration

**Port mapping**:
```bash
docker run -p 8000:8000 --rm pytorch-basic
```
- `-p 8000:8000`: Map host port 8000 to container port 8000
- Format: `-p HOST:CONTAINER`
- Needed for web servers, APIs, Jupyter

**Volume mounting**:
```bash
docker run -v $(pwd)/data:/app/data --rm pytorch-basic
```
- `-v HOST:CONTAINER`: Mount directory
- Changes in container persist on host
- Useful for data, models, configs

**Environment variables**:
```bash
docker run -e MODEL_NAME=gpt2 --rm pytorch-basic
```
- `-e KEY=VALUE`: Set environment variable
- Access in code: `os.getenv('MODEL_NAME')`
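On the Python side, reading such variables might look like the sketch below. `MODEL_NAME` comes from the example above; `BATCH_SIZE` and both default values are assumptions added to show that environment values always arrive as strings and need converting.

```python
import os


def load_settings(environ=os.environ):
    """Read container configuration from environment variables."""
    return {
        # MODEL_NAME matches the `-e MODEL_NAME=gpt2` example above
        "model_name": environ.get("MODEL_NAME", "gpt2"),
        # BATCH_SIZE is a hypothetical second setting; convert from str to int
        "batch_size": int(environ.get("BATCH_SIZE", "32")),
    }


# Passing a plain dict makes the function easy to try without a container
print(load_settings({"MODEL_NAME": "gpt2", "BATCH_SIZE": "8"}))
# {'model_name': 'gpt2', 'batch_size': 8}
```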

### Verification Commands

After building, verify your image:

**List images**:
```bash
docker images
docker images | grep pytorch-basic
```
Shows: name, tag, image ID, creation date, size

**Inspect image**:
```bash
docker inspect pytorch-basic
```
Shows: layers, environment variables, CMD, ENTRYPOINT, etc.

**Check image size**:
```bash
docker images pytorch-basic --format '{{.Size}}'
```
Important: Smaller images = faster deployment, lower costs

### Your Task

Run these commands in your **terminal** (not Jupyter):

1. Navigate to directory with Dockerfile and app.py
2. Build the image (can take 2-10 minutes the first time)
3. Verify image was created
4. Run a container from the image
5. Observe the output from app.py

**Success**: You see Python/PyTorch versions and successful computation message.

In [None]:
# IMPORTANT: Run these commands in your TERMINAL, not in this notebook
# The ! prefix would execute them in Jupyter, but Docker commands work better in terminal

# Step 1: Build the Docker image
# docker build -t pytorch-basic -f Dockerfile.basic .

# Step 2: Run the container
# docker run --rm pytorch-basic

# Step 3: List your images
# docker images | grep pytorch-basic

print("""\nTo build and run:\n
Terminal Commands:
==================
cd <directory containing Dockerfile.basic and app.py>
docker build -t pytorch-basic -f Dockerfile.basic .
docker run --rm pytorch-basic
""")

## Exercise 4: Create GPU-Enabled Dockerfile

**Purpose**: Configure Docker containers for GPU-accelerated deep learning.

### Why GPU Containers Are Different

GPUs require special setup:
- **CUDA runtime** libraries (not just drivers)
- **cuDNN** for optimized neural network operations
- **NVIDIA Container Toolkit** on host system
- **Matching versions** (CUDA, PyTorch, drivers)

**The challenge**: Getting all versions aligned correctly.

### Prerequisites for GPU Containers

**On the host system (your machine)**:

1. **NVIDIA GPU**: Obviously!
2. **NVIDIA Driver**: Version 535+ for CUDA 12.x
   - Check: `nvidia-smi`
3. **NVIDIA Container Toolkit** (successor to the older `nvidia-docker2` package): Docker GPU support
   - Install: See 02_dockerization.md
4. **Docker configured**: Runtime set to nvidia

**Common issue**: CPU Dockerfile works but GPU version fails due to missing prerequisites.

### NVIDIA CUDA Base Images

NVIDIA provides official images with CUDA pre-installed:

**Image options**:
```dockerfile
nvidia/cuda:12.4.0-base-ubuntu22.04     # Minimal CUDA
nvidia/cuda:12.4.0-runtime-ubuntu22.04  # + libraries
nvidia/cuda:12.4.0-devel-ubuntu22.04    # + dev tools
```

**Which to use**:
- **base**: Just CUDA, smallest
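- **runtime**: For inference; adds the CUDA runtime and math libraries (cuDNN ships in the separate `cudnn-runtime` tags, and PyTorch's CUDA wheels pull in their own copy) ← **Use this**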
- **devel**: For compiling CUDA code, largest

**Size comparison**:
- base: ~200MB
- runtime: ~1.5GB
- devel: ~3GB

### Installing Python on CUDA Images

CUDA images are based on Ubuntu, not Python:

```dockerfile
# This is Ubuntu, not Python!
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04

# Must install Python ourselves
RUN apt-get update && apt-get install -y \
    python3.10 \
    python3-pip
```

**Why not FROM python:3.10 with CUDA**?
- Python images don't include CUDA
- Installing CUDA manually is complex and error-prone
- NVIDIA images are tested and optimized

### PyTorch CUDA Version Matching

**Critical**: PyTorch CUDA version must match container CUDA version!

**Our setup**:
- Container: CUDA 12.4
- PyTorch: Must use cu124 build

```dockerfile
RUN pip install "torch>=2.6.0" \
    --index-url https://download.pytorch.org/whl/cu124
```

**Index URLs**:
- cu124: CUDA 12.4
- cu121: CUDA 12.1
- cu118: CUDA 11.8
- cpu: CPU-only version

**Wrong version = Runtime errors or no GPU detected!**
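The matching rule can be written down as a small lookup. This is a sketch that covers only the CUDA versions listed above; `index_url_for` and `cuda_version_from_tag` are hypothetical helpers, and unknown versions raise an error rather than guessing a URL.

```python
PYTORCH_WHEEL_BASE = "https://download.pytorch.org/whl"

# CUDA versions listed above, mapped to their wheel index suffix
CUDA_TO_SUFFIX = {"12.4": "cu124", "12.1": "cu121", "11.8": "cu118"}


def index_url_for(cuda_version=None):
    """Return the pip --index-url for a container's CUDA version.

    Pass None for a CPU-only image.
    """
    if cuda_version is None:
        return f"{PYTORCH_WHEEL_BASE}/cpu"
    try:
        return f"{PYTORCH_WHEEL_BASE}/{CUDA_TO_SUFFIX[cuda_version]}"
    except KeyError:
        raise ValueError(f"No known PyTorch wheel index for CUDA {cuda_version}") from None


def cuda_version_from_tag(image_tag):
    """Extract 'major.minor' from a tag such as '12.4.0-runtime-ubuntu22.04'."""
    major, minor = image_tag.split("-", 1)[0].split(".")[:2]
    return f"{major}.{minor}"


tag = "12.4.0-runtime-ubuntu22.04"
print(index_url_for(cuda_version_from_tag(tag)))
# https://download.pytorch.org/whl/cu124
```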

### Setting Python as Default

Ubuntu 22.04 has `python3` but not `python`:

```dockerfile
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1
```

**Why**:
- Makes `python` command work (not just `python3`)
- Avoids scripts failing with "python: command not found"
- Standard practice in containerized environments

### Running GPU Containers

**Must use `--gpus` flag**:

```bash
docker run --gpus all --rm pytorch-gpu
```

**GPU access options**:
```bash
--gpus all              # All GPUs
--gpus '"device=0"'     # GPU 0 only
--gpus '"device=0,1"'   # GPU 0 and 1
--gpus 2                # Any 2 GPUs
```

**Without --gpus**: Container runs but `torch.cuda.is_available()` returns False!

### Troubleshooting GPU Access

**Problem**: `torch.cuda.is_available()` returns False

**Checklist**:
1. ✅ Host GPU works: `nvidia-smi`
2. ✅ NVIDIA Container Toolkit (nvidia-docker) installed: `docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi`
3. ✅ Used `--gpus all` flag when running
4. ✅ CUDA versions match (container CUDA = PyTorch CUDA)
5. ✅ PyTorch installed from correct index-url

**Test GPU access in container**:
```bash
docker run --gpus all --rm pytorch-gpu nvidia-smi
```
Should show GPU info, not "command not found"

### GPU Container Best Practices

**1. Pin CUDA version**:
- Use specific tag: `12.4.0-runtime` not `latest`
- Ensures reproducibility

**2. Match everything**:
- Host CUDA/driver version
- Container CUDA version
- PyTorch CUDA build version

**3. Set shared memory size**:
```bash
docker run --gpus all --shm-size=8g pytorch-gpu
```
- PyTorch DataLoader needs shared memory
- Default 64MB often too small
- Symptoms: DataLoader hangs or crashes

**4. Monitor GPU in container**:
```bash
docker exec -it container_name nvidia-smi
```

### Your Task

Create a GPU-enabled Dockerfile that:
1. Uses NVIDIA CUDA 12.4 runtime base
2. Installs Python 3.10 and pip
3. Makes python the default command
4. Installs PyTorch with matching CUDA version
5. Copies and runs app.py

**Success criteria**: `torch.cuda.is_available()` returns True and GPU computation works.

In [None]:
%%writefile Dockerfile.gpu
# TODO: Complete this GPU-enabled Dockerfile
# TODO: Create a GPU-enabled Dockerfile
# 1. Use nvidia/cuda:12.4.0-runtime-ubuntu22.04 as base
# 2. Install Python 3.10 and pip
# 3. Install PyTorch with CUDA 12.4 support
# 4. Copy and run app.py

### Solution:

In [None]:
%%writefile Dockerfile.gpu
# Base: NVIDIA CUDA 12.4 runtime on Ubuntu 22.04
# This includes CUDA libraries needed for GPU operations
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04

# Install Python 3.10 from Ubuntu repositories,
# then remove the apt package lists to reduce image size
RUN apt-get update && apt-get install -y \
    python3.10 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Make Python 3.10 the default 'python' command
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1
RUN update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1

WORKDIR /app

# Install PyTorch built for CUDA 12.4
# Using specific wheel index ensures CUDA compatibility
RUN pip install --no-cache-dir \
    "torch>=2.6.0" \
    --index-url https://download.pytorch.org/whl/cu124

COPY app.py .

CMD ["python", "app.py"]

In [None]:
# Build and run GPU container
# The --gpus all flag exposes all GPUs to the container

print("""\nTo build and run with GPU:\n
Terminal Commands:
==================
docker build -t pytorch-gpu -f Dockerfile.gpu .
docker run --gpus all --rm pytorch-gpu

# To use specific GPU:
docker run --gpus '"device=0"' --rm pytorch-gpu

# To check GPU access without running app:
docker run --gpus all --rm pytorch-gpu nvidia-smi
""")

## Exercise 5: Create Docker Compose Configuration

**Purpose**: Orchestrate multiple services (training, development, monitoring) with a single command.

### The Multi-Container Challenge

Real ML systems need multiple services:
- Training container (runs model training)
- Jupyter container (for experimentation)
- Database (stores metrics, checkpoints)
- Monitoring (tracks system health)
- API server (for inference)

**Without Docker Compose**:
```bash
docker run -d --name db postgres
docker run -d --name jupyter ...
docker run -d --name trainer ...
# Manual networking, volumes, env vars
```
Managing this manually is error-prone and tedious.

**With Docker Compose**:
```bash
docker-compose up  # Starts everything
```
One command, all services configured and networked automatically.

### Docker Compose Basics

**docker-compose.yml**: YAML file defining your application

**Structure**:
```yaml
version: '3.8'  # Compose file version

services:       # Define containers
  service1:
    # configuration
  service2:
    # configuration

volumes:        # Persistent storage
  data:

networks:       # Custom networks
  mynetwork:
```

### Service Configuration Options

**Build context**:
```yaml
build:
  context: .              # Where to find Dockerfile
  dockerfile: Dockerfile.gpu  # Which Dockerfile to use
```

**Container name**:
```yaml
container_name: pytorch_trainer  # Friendly name
```
Without this, Compose generates names like `folder_service_1`

**GPU access**:
```yaml
runtime: nvidia           # Enable NVIDIA runtime
environment:
  - NVIDIA_VISIBLE_DEVICES=all  # Expose all GPUs
```
Equivalent to `--gpus all` in docker run

**Volumes**:
```yaml
volumes:
  - ./data:/app/data      # Host:Container
  - ./models:/app/models  # Shared between services
```
Multiple services can share volumes for data/model exchange

**Shared memory**:
```yaml
shm_size: '8gb'  # For PyTorch DataLoader workers
```
Prevents "Bus error" with multi-worker DataLoader

**Command override**:
```yaml
command: python train.py  # Override Dockerfile CMD
```

### Why This Configuration?

**Trainer Service**:
- Runs training scripts
- Has GPU access
- Shares volumes with Jupyter
- Can save models to shared volume

**Jupyter Service**:
- Interactive development
- Also has GPU access
- Access trained models via shared volume
- Exposed on port 8888 for browser access
- Token disabled for convenience (set one in production!)

### Multi-Line Commands

YAML multi-line with `>`:
```yaml
command: >
  bash -c "pip install jupyter &&
  jupyter lab --ip=0.0.0.0 --port=8888"
```

**Why**:
- Install additional packages at runtime
- Chain multiple commands
- More flexible than baking everything into image

### Service Dependencies

```yaml
depends_on:
  - database
  - redis
```

**What it does**:
- Starts dependencies first
- **Note**: Doesn't wait for service to be ready (just started)
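Because `depends_on` only orders startup, a dependent service often polls until its dependency accepts connections. The sketch below shows one common approach, waiting for a TCP port; `wait_for_port` is a hypothetical helper, and an open port is only a rough proxy for "ready".

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False.

    An open port only shows the service is accepting connections; a real
    readiness check might also need an application-level query.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside a Compose network the call might look like `wait_for_port("database", 5432)`, assuming the `database` service is PostgreSQL listening on its default port.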

**For ML pipelines**:
```yaml
trainer:
  depends_on:
    - mlflow  # Metrics tracking
```
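The start order that `depends_on` produces is a topological sort of the dependency graph. The following sketch reproduces that ordering with Python's standard library; the `mlflow -> database` edge is a hypothetical addition to make the chain longer.

```python
from graphlib import TopologicalSorter

# service -> services it depends_on (names taken from the examples above)
depends_on = {
    "trainer": ["mlflow"],
    "mlflow": ["database"],  # hypothetical edge, added for illustration
    "database": [],
}


def start_order(graph):
    """Return services ordered so every dependency starts before its dependents."""
    return list(TopologicalSorter(graph).static_order())


print(start_order(depends_on))  # ['database', 'mlflow', 'trainer']
```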

### Docker Compose Commands

**Start all services (detached)**:
```bash
docker-compose up -d
```
- `-d`: Runs in background
- Without `-d`: Shows logs (Ctrl+C stops all)

**View logs**:
```bash
docker-compose logs -f trainer    # Follow trainer logs
docker-compose logs --tail=50 jupyter  # Last 50 lines
```

**Stop services**:
```bash
docker-compose stop   # Stop but don't remove
docker-compose down   # Stop and remove containers
```

**Rebuild after changes**:
```bash
docker-compose up -d --build  # Rebuild images
```

**Scale services**:
```bash
docker-compose up -d --scale trainer=3  # 3 training containers
```
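Useful for distributed training. Scaling only works if the service does not set `container_name`, because container names must be unique.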

**Execute commands in running service**:
```bash
docker-compose exec trainer python check_gpu.py
docker-compose exec jupyter bash
```

### Accessing Jupyter Lab

After `docker-compose up -d`:

1. Open browser
2. Go to `http://localhost:8888`
3. Start experimenting!

**Files in `/app/notebooks`** inside container = `./notebooks` on host

### Production Considerations

**Security** (for production):
```yaml
command: >
  jupyter lab
  --NotebookApp.token='your-secure-token'
  --NotebookApp.password='hashed-password'
```

**Resource limits**:
```yaml
deploy:
  resources:
    limits:
      cpus: '4.0'
      memory: 16G
```

**Restart policy**:
```yaml
restart: unless-stopped  # Auto-restart on failure
```

### Your Task

Create a docker-compose.yml with:
1. **Trainer service**: GPU-enabled, runs training
2. **Jupyter service**: GPU-enabled, port 8888, interactive
3. **Shared volumes**: For data, models, notebooks
4. **Proper networking**: Services can communicate

**Success**: Both services start, Jupyter accessible in browser, shared volumes work.

In [None]:
%%writefile docker-compose.yml
# TODO: Create a docker-compose.yml with:
# 1. Training service with GPU support
# 2. Jupyter Lab service on port 8888
# 3. Shared volumes for data and models

### Solution:

In [None]:
%%writefile docker-compose.yml
version: '3.8'

services:
  # Model training service
  trainer:
    build:
      context: .              # Build from current directory
      dockerfile: Dockerfile.gpu
    container_name: pytorch_trainer
    runtime: nvidia           # Enable NVIDIA GPU support
    environment:
      - NVIDIA_VISIBLE_DEVICES=all  # Expose all GPUs
    volumes:
      - ./data:/app/data      # Mount data directory
      - ./models:/app/models  # Mount models directory
    shm_size: '8gb'          # Shared memory for DataLoader workers
    command: python train.py # Override default CMD

  # Jupyter Lab service for interactive development
  jupyter:
    build:
      context: .
      dockerfile: Dockerfile.gpu
    container_name: jupyter_lab
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    ports:
      - "8888:8888"          # Expose Jupyter on host port 8888
    volumes:
      - ./notebooks:/app/notebooks
      - ./data:/app/data
      - ./models:/app/models
    command: >
      bash -c "pip install jupyter &&
      jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root --NotebookApp.token=''"

In [None]:
# Docker Compose commands
print("""\nDocker Compose Commands:\n
Start all services:
  docker-compose up -d

View logs:
  docker-compose logs -f trainer
  docker-compose logs -f jupyter

Stop all services:
  docker-compose down

Rebuild and start:
  docker-compose up -d --build

Access Jupyter Lab:
  http://localhost:8888
""")

## Exercise 6: Multi-Stage Build for Image Optimization

**Purpose**: Dramatically reduce Docker image sizes by separating build and runtime environments.

### The Image Size Problem

**Typical ML image build**:
```dockerfile
FROM python:3.10
RUN pip install torch transformers numpy pandas scikit-learn
COPY . .
```

**Result**: 5-10GB image containing:
- Build tools (gcc, make, pip)
- Package manager caches
- Source files from installations
- Intermediate build artifacts
- Everything, even if only needed during installation

**Problems**:
- Slow to download (minutes on slow connections)
- Expensive to store (cloud storage costs)
- More attack surface (extra software = more vulnerabilities)
- Slower to start (container needs to extract GBs)

### What Are Multi-Stage Builds?

**Concept**: Use multiple FROM statements in one Dockerfile

**Stage 1 (builder)**: Full environment with build tools
- Install packages
- Compile code
- Download dependencies
- All the heavy lifting

**Stage 2 (runtime)**: Minimal environment
- Copy only installed packages from Stage 1
- No build tools
- No caches
- Just what's needed to run

**Magic**: Stage 1 artifacts are discarded, only Stage 2 becomes the final image!

### Size Comparison

**Single-stage**:
```dockerfile
FROM python:3.10
RUN pip install torch transformers
```
Size: ~5-7GB

**Multi-stage**:
```dockerfile
FROM python:3.10 as builder
RUN pip install --user torch transformers

FROM python:3.10-slim
COPY --from=builder /root/.local /root/.local
```
Size: ~2-3GB (40-60% smaller!)

### How Multi-Stage Works

**Stage 1: Builder**
```dockerfile
# Stage named 'builder'
FROM python:3.10 as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
```

**Key points**:
- `as builder`: Names this stage
- `--user`: Installs to /root/.local instead of system-wide
- `--no-cache-dir`: Don't save pip cache (saves ~100-500MB)
- Full `python:3.10` image (has compilers, headers)

**Stage 2: Runtime**
```dockerfile
# Fresh, minimal base
FROM python:3.10-slim
WORKDIR /app
# The key line: copy installed packages out of the builder stage
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
COPY app.py .
```

**Key points**:
- New FROM = new base, previous stage discarded
- `--from=builder`: Copy from named stage
- Only copies installed packages, not build tools
- `ENV PATH`: Makes copied executables findable

### The --user Flag Explained

**Without --user**:
```bash
pip install torch  # Installs to /usr/local/lib/python3.10/...
```
Scattered across system directories, hard to copy cleanly

**With --user**:
```bash
pip install --user torch  # Installs to /root/.local/
```
Everything in one directory:
- /root/.local/lib/python3.10/site-packages/
- /root/.local/bin/

Easy to copy in one COPY command!
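Python itself can report where `--user` installs go, which is a quick way to confirm the paths above from inside a container. A minimal sketch using the standard `site` module:

```python
import site

# Typically /root/.local when running as root on Linux
user_base = site.getuserbase()
# Where `pip install --user` places importable packages
user_site = site.getusersitepackages()

print(f"user base:          {user_base}")
print(f"user site-packages: {user_site}")
print(f"user site enabled:  {site.ENABLE_USER_SITE}")
```

The interpreter adds the user site-packages directory to `sys.path` on its own when the user site is enabled, which is why the copied packages import without extra configuration.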

### Why PATH Update Is Needed

After copying /root/.local:
```dockerfile
ENV PATH=/root/.local/bin:$PATH
```

**Why**:
- Console scripts installed by your packages (for example, PyTorch's `torchrun`) are in /root/.local/bin
- Without PATH update, they won't be found
- System will error with "command not found"
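The lookup behaviour can be reproduced outside Docker. The sketch below assumes a POSIX system and uses a temporary directory as a stand-in for `/root/.local/bin`; `demo-console-script` is a hypothetical executable created only for the demonstration.

```python
import os
import shutil
import stat
import tempfile


def find_executable(name, path_value):
    """Look up `name` the way a shell would, using the given PATH string."""
    return shutil.which(name, path=path_value)


# Stand-in for /root/.local/bin containing one executable script
local_bin = tempfile.mkdtemp()
tool = os.path.join(local_bin, "demo-console-script")
with open(tool, "w") as f:
    f.write("#!/bin/sh\necho hello\n")
os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

system_path = "/usr/bin"  # PATH before the ENV line
extended_path = local_bin + os.pathsep + system_path  # PATH after the ENV line

print(find_executable("demo-console-script", system_path))    # None
print(find_executable("demo-console-script", extended_path))  # path inside local_bin
```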

### Advanced Multi-Stage Patterns

**Multiple builders**:
```dockerfile
FROM node:18 as frontend-builder
RUN npm install && npm run build

FROM python:3.10 as backend-builder
RUN pip install --user -r requirements.txt

FROM python:3.10-slim
COPY --from=frontend-builder /app/dist /app/static
COPY --from=backend-builder /root/.local /root/.local
```
Combine artifacts from multiple build stages!

**Compile from source**:
```dockerfile
FROM python:3.10 as builder
RUN apt-get update && apt-get install -y build-essential
RUN pip wheel --no-cache-dir --wheel-dir=/wheels torch

FROM python:3.10-slim
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
```

### Best Practices

**1. Use slim/alpine for final stage**:
```dockerfile
# Builder: ~900MB
FROM python:3.10
# Runtime: ~120MB ✅
FROM python:3.10-slim
# Runtime: ~50MB (but compatibility issues)
FROM python:3.10-alpine
```

**2. Order matters**:
```dockerfile
# Copy dependencies first (changes less often)
COPY --from=builder /root/.local /root/.local
# Copy code last (changes frequently)
COPY app.py .
```
Better layer caching = faster rebuilds

**3. Clean up in builder**:
```dockerfile
RUN pip install --user --no-cache-dir -r requirements.txt && \
    rm -rf /root/.cache
```

**4. Security**:
- Fewer packages in final image = smaller attack surface
- No compilers/build tools in production
- Scan images: `docker scout cves myimage` (the older `docker scan` command has been retired)

### Real-World Impact

**Company example**:
- Before: 8GB image, 5 min deploy time
- After multi-stage: 2GB image, 1 min deploy time
- Savings: $1000s/month in bandwidth and storage

**Research example**:
- Sharing 10GB images on slow university networks: hours
- 2GB images: minutes
- Reproducibility improved (people actually download and test)

### Verification

**Compare sizes**:
```bash
docker images | grep pytorch
pytorch-single    latest   6.2GB
pytorch-multi     latest   2.1GB  # ~66% smaller!
```
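
The percentage is plain arithmetic on the two sizes. As a quick check, using the example sizes from the listing above (`percent_smaller` is a hypothetical helper):

```python
def percent_smaller(before_gb: float, after_gb: float) -> float:
    """Percentage by which the image shrank, relative to the original size."""
    return (before_gb - after_gb) / before_gb * 100


reduction = percent_smaller(6.2, 2.1)
print(f"{reduction:.0f}% smaller")
```

Plug in the sizes that `docker images` reports for your own builds to see what the multi-stage version saves.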

**Verify it works**:
```bash
docker run --rm pytorch-multi python -c "import torch; print(torch.__version__)"
```

### Your Task

Create a multi-stage Dockerfile:
1. **Stage 1 (builder)**: Use full Python 3.10, install packages with --user
2. **Stage 2 (runtime)**: Use Python 3.10-slim, copy only installed packages
3. Update PATH so executables are found
4. Copy application code

**Success**: Image is significantly smaller but functionally identical.

In [None]:
%%writefile Dockerfile.multistage
# TODO: Create a multi-stage Dockerfile
# Stage 1: Build dependencies
# Stage 2: Minimal runtime with only installed packages

### Solution:

In [None]:
%%writefile Dockerfile.multistage
# ============================================
# Stage 1: Builder - Install dependencies
# ============================================
FROM python:3.10 AS builder

WORKDIR /app

# Install dependencies in user directory
# Using --user puts packages in /root/.local
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# ============================================
# Stage 2: Runtime - Minimal final image
# ============================================
FROM python:3.10-slim

WORKDIR /app

# Copy only installed packages from builder stage
# This leaves behind the compilers and build tools of the full image
COPY --from=builder /root/.local /root/.local

# Update PATH so console scripts in /root/.local/bin are found
ENV PATH=/root/.local/bin:$PATH

# Copy application code
COPY app.py .

CMD ["python", "app.py"]

In [None]:
# Compare image sizes
print("""\nCompare single-stage vs multi-stage image sizes:\n
Build both:
  docker build -t pytorch-single -f Dockerfile.basic .
  docker build -t pytorch-multi -f Dockerfile.multistage .

Check sizes:
  docker images | grep pytorch

You should see pytorch-multi is smaller!
""")

## Exercise 7: Dependency Management with requirements.txt

**Purpose**: Pin dependencies for reproducible builds and faster Docker caching.

### Why requirements.txt?

**Without it**:
```dockerfile
RUN pip install torch transformers numpy pandas
```

**Problems**:
- Installs latest versions (breaks reproducibility)
- Package list lives inside the Dockerfile, so local environments and CI cannot reuse it
- No documentation of dependencies
- Hard to update selectively

**With requirements.txt**:
```dockerfile
COPY requirements.txt .
RUN pip install -r requirements.txt
```

**Benefits**:
- Version-controlled with code
- Docker caches layer until requirements.txt changes
- Clear documentation of dependencies
- Easy to diff and review changes

### Version Pinning Strategies

**Exact version** (most strict):
```
torch==2.6.0
```
- Guarantees exact same version
- Safest for production
- Doesn't get bug fixes automatically

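**Compatible release** (recommended):
```
torch~=2.6.0
```
- Allows patch updates only (2.6.1, 2.6.2); equivalent to `>=2.6.0,==2.6.*`
- Gets bug fixes
- Unlikely to break compatibility if the project follows semantic versioning
- Note: a bare `torch>=2.6.0` only sets a floor and also admits 2.7 or 3.0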

**Range** (flexible):
```
torch>=2.6.0,<3.0.0
```
- Allows minor updates
- Avoids major version breaking changes

**No pin** (dangerous):
```
torch  # Gets whatever is latest
```
- Never use in production!
- "Works on my machine" guaranteed
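
To see what each specifier actually admits, here is a deliberately simplified matcher. It is a sketch, not pip's implementation: `satisfies` is a hypothetical helper that only handles plain three-part numeric versions and the operators `==`, `>=`, `<`, and `~=`, and the candidate version list is made up for illustration.

```python
def parse(version: str) -> tuple:
    """Turn '2.6.0' into (2, 6, 0). Assumes purely numeric, three-part versions."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated specifier such as '>=2.6.0,<3.0.0'."""
    v = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if not clause:
            continue  # no pin: anything goes
        if clause.startswith("=="):
            ok = v == parse(clause[2:])
        elif clause.startswith(">="):
            ok = v >= parse(clause[2:])
        elif clause.startswith("~="):
            base = parse(clause[2:])
            # ~=2.6.0 means: at least 2.6.0, and still in the 2.6 series
            ok = v >= base and v[:-1] == base[:-1]
        elif clause.startswith("<"):
            ok = v < parse(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause}")
        if not ok:
            return False
    return True


# Hypothetical release list, for illustration only.
candidates = ["2.5.1", "2.6.0", "2.6.2", "2.7.0", "3.0.0"]
for spec in ["==2.6.0", "~=2.6.0", ">=2.6.0", ">=2.6.0,<3.0.0", ""]:
    allowed = [v for v in candidates if satisfies(v, spec)]
    print(f"{spec or '(no pin)':<16} -> {allowed}")
```

For real projects, the `packaging` library implements the full specifier rules; this sketch only shows how much wider each strategy opens the door.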

### Organizing requirements.txt

**Good structure** (commented and grouped):
```
# Deep Learning Framework
torch>=2.6.0
torchvision>=0.21.0

# Transformers and NLP
transformers>=4.40.0
tokenizers>=0.15.0

# Data Science
numpy>=1.24.0
pandas>=2.0.0
```

**Benefits**:
- Easy to understand what each group is for
- Can comment out entire sections for testing
- New team members understand dependencies

### requirements.txt in Dockerfile

**Optimal pattern** for Docker caching:
```dockerfile
# Copy requirements first (changes infrequently)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy code last (changes frequently)
COPY . .
```

**Why this order**:
1. If only code changes: Cached dependencies, fast rebuild
2. If requirements change: Reinstall dependencies, slower
3. Maximizes cache hit rate

**Bad order**:
```dockerfile
COPY . .  # Copies everything including requirements.txt
RUN pip install -r requirements.txt  # Rebuilds on any file change
```
Every code edit = reinstall ALL dependencies!
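
The difference between the two orders can be modelled in a few lines. This is a simplified sketch of layer caching, not Docker's real cache-key algorithm: each layer's key is a hash of its parent's key, the instruction text, and a fingerprint standing in for the checksum of the copied files. All function names are hypothetical.

```python
import hashlib


def layer_keys(steps):
    """Chain a hash through the steps so each key depends on every earlier step."""
    keys, parent = [], ""
    for instruction, fingerprint in steps:
        parent = hashlib.sha256(f"{parent}|{instruction}|{fingerprint}".encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(old, new):
    """Instructions whose cache key changed between two builds."""
    pairs = zip(new, layer_keys(old), layer_keys(new))
    return [instruction for (instruction, _), a, b in pairs if a != b]


def good_order(reqs, code):
    return [("COPY requirements.txt .", reqs),
            ("RUN pip install -r requirements.txt", ""),
            ("COPY . .", reqs + code)]


def bad_order(reqs, code):
    return [("COPY . .", reqs + code),
            ("RUN pip install -r requirements.txt", "")]


# Only the application code changes between the two builds.
good = rebuilt_steps(good_order("torch", "v1"), good_order("torch", "v2"))
bad = rebuilt_steps(bad_order("torch", "v1"), bad_order("torch", "v2"))
print("good order rebuilds:", good)
print("bad order rebuilds: ", bad)
```

In the good order only the final `COPY . .` is rebuilt; in the bad order the changed `COPY` sits above `pip install`, so the install layer is invalidated too.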

### Multiple requirements Files

**Common pattern for different environments**:

**requirements.txt** (production):
```
torch>=2.6.0
transformers>=4.40.0
```

**requirements-dev.txt** (development):
```
-r requirements.txt  # Include production requirements
pytest>=8.0.0
black>=24.0.0
jupyter>=1.0.0
```

**requirements-gpu.txt** (GPU builds):
```
--index-url https://download.pytorch.org/whl/cu124
torch>=2.6.0
```
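
pip follows `-r` lines recursively, resolving each included path relative to the file that includes it. The sketch below mimics that behaviour in a simplified way so you can see what a dev install ends up with. `flatten_requirements` is a hypothetical helper, not part of pip, and it skips other options such as `--index-url`.

```python
import tempfile
from pathlib import Path


def flatten_requirements(path: Path) -> list:
    """Return requirement lines from `path`, following `-r other.txt` includes."""
    lines = []
    for raw in path.read_text().splitlines():
        line = raw.split(" #")[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("-r "):
            # Included paths are resolved relative to the including file.
            lines.extend(flatten_requirements(path.parent / line[3:].strip()))
        elif line.startswith("-"):
            continue  # other pip options are ignored in this sketch
        else:
            lines.append(line)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "requirements.txt").write_text("torch>=2.6.0\ntransformers>=4.40.0\n")
    (root / "requirements-dev.txt").write_text(
        "-r requirements.txt  # Include production requirements\n"
        "pytest>=8.0.0\nblack>=24.0.0\njupyter>=1.0.0\n"
    )
    resolved = flatten_requirements(root / "requirements-dev.txt")
    print(resolved)
```

The dev file therefore installs everything production does plus the tooling, without duplicating the production pins.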

### Generating requirements.txt

**From current environment**:
```bash
pip freeze > requirements.txt
```
Captures exact versions of everything installed

**Problem**: Includes transitive dependencies
```
torch==2.6.0
nvidia-cublas-cu12==12.4.5.8  # Transitive dependency
nvidia-cuda-cupti-cu12==12.4.127  # Transitive
# ... 50+ more packages
```

**Better**: Manual requirements with top-level only:
```
torch>=2.6.0
transformers>=4.40.0
```
Let pip resolve transitive dependencies
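
One way to think about "top-level only" is: keep the packages that nothing else depends on. The sketch below applies that rule to a hand-written, heavily trimmed dependency graph; both the graph and the helper `top_level` are assumptions for illustration.

```python
def top_level(requires: dict) -> list:
    """Packages that no other installed package depends on."""
    depended_on = set()
    for deps in requires.values():
        depended_on |= deps
    return sorted(name for name in requires if name not in depended_on)


# Hypothetical, trimmed graph: package -> direct dependencies.
installed = {
    "torch": {"nvidia-cublas-cu12", "nvidia-cuda-cupti-cu12"},
    "transformers": {"tokenizers", "numpy"},
    "tokenizers": set(),
    "numpy": set(),
    "nvidia-cublas-cu12": set(),
    "nvidia-cuda-cupti-cu12": set(),
}
roots = top_level(installed)
print(roots)
```

The rule is a starting point only: a package your code imports directly (such as numpy here) still belongs in requirements.txt even if something else also pulls it in.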

### pip-compile for Lock Files

**Advanced**: Use pip-tools

**requirements.in** (what you want):
```
torch>=2.6.0
transformers
```

**Generate locked requirements.txt**:
```bash
pip-compile requirements.in
```

**Output** (illustrative; requirements.txt with all transitive deps pinned):
```
torch==2.6.0
transformers==4.40.0
tokenizers==0.19.1
# ... all dependencies with exact versions
```

**Benefits**: Reproducible + documented + updateable

### Common Pitfalls

**1. Platform-specific packages**:
```
pywin32==306  # Only works on Windows!
```
Solution: Use environment markers
```
pywin32==306; sys_platform == 'win32'
```

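**2. Missing index URLs**:
```
torch>=2.6.0  # Might get a CPU-only build or the wrong CUDA version
```
Solution: Add the PyTorch index in the Dockerfile. `--extra-index-url` keeps PyPI available for the other packages, whereas `--index-url` would replace it.
```dockerfile
RUN pip install -r requirements.txt \
    --extra-index-url https://download.pytorch.org/whl/cu124
```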

**3. Conflicting versions**:
```
packageA>=2.0  # Requires numpy<2.0
packageB>=3.0  # Requires numpy>=2.0
```
pip will error. Solution: Check compatibility, adjust versions

### Your Task

This cell creates a sample requirements file (`requirements.docker.txt`) with:
- Deep learning frameworks (PyTorch, torchvision)
- NLP libraries (transformers)
- Data science tools (numpy, pandas, scikit-learn)
- Minimum-version bounds using `>=`
- Comments organizing by category

**Use this pattern in your projects** for reproducible builds!

In [None]:
%%writefile requirements.docker.txt
# Deep Learning Framework
torch>=2.6.0
torchvision>=0.21.0

# Transformers for NLP
transformers>=4.40.0

# Data Science
numpy>=1.24.0
pandas>=2.0.0
scikit-learn>=1.3.0

## Exercise 8: Optimizing Builds with .dockerignore

**Purpose**: Exclude unnecessary files from Docker build context to speed up builds and reduce image sizes.

### The Build Context Problem

When you run `docker build .`, Docker sends **all files** in the current directory (the build context) to the Docker daemon:

```bash
docker build .
Sending build context to Docker daemon  15.2GB  # Oops!
```

**What gets sent**:
- All source code ✅ (needed)
- Git history (.git/) ❌ (1-2GB, not needed)
- Virtual environments (venv/) ❌ (500MB-2GB, not needed)
- Jupyter checkpoints ❌ (100MB+, not needed)
- Dataset files (data/) ❌ (GBs, mounted as volume instead)
- Model checkpoints ❌ (GBs, should be in separate storage)

**Impact**:
- Build takes minutes just uploading context
- Network bandwidth wasted
- Larger images if files are COPYed
- Slower CI/CD pipelines

### What is .dockerignore?

Like .gitignore but for Docker builds:

**Create `.dockerignore` file**:
```
**/__pycache__
**/*.pyc
.git
venv/
```

**These files won't be sent to Docker daemon**:
- Faster builds
- Smaller build context
- Can't accidentally COPY sensitive files

### Essential Patterns to Ignore

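**Python cache files** (100s of MB):
```
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd
.Python
```
Regenerated automatically, no need to copy. Unlike .gitignore, a bare `*.pyc` only matches files in the context root; the `**/` prefix matches at any depth.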

**Virtual environments** (GBs):
```
.env
.venv
venv/
ENV/
env/
```
Container has its own Python environment

**Git files** (GBs):
```
.git
.gitignore
.gitattributes
```
Git history not needed in container

**IDE files** (MBs):
```
.vscode/
.idea/
*.swp
*.swo
.DS_Store
```
Editor-specific, not needed for running code

**Jupyter notebooks** (optional):
```
*.ipynb
.ipynb_checkpoints
```
Usually for development, not deployment

**Testing files**:
```
.pytest_cache
.coverage
htmlcov/
.tox/
```
Tests run in CI, not in container

**Build artifacts**:
```
dist/
build/
*.egg-info/
.eggs/
```
Rebuilt inside container

**Documentation**:
```
*.md
docs/
README.md
```
Not needed for running code (unless serving docs)

### Whitelisting Pattern

**Ignore everything except specific files**:
```
# Ignore everything
*

# Except these
!app.py
!requirements.txt
!src/
```

Useful for monorepos where you only need a subset of the files. The last matching line wins, so the `!` exceptions must come after the `*`.
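
To get a feel for how the ignore rules combine, here is a simplified matcher. It is a sketch under stated assumptions, not Docker's implementation: it matches patterns segment by segment with `fnmatch`, treats `**` as "any number of directories", lets a match on a directory cover its contents, and applies the "last matching pattern wins" rule. All function names and the sample project layout are hypothetical.

```python
import fnmatch
import os
import tempfile


def _match(pat_parts, path_parts):
    """Match pattern segments against the leading segments of a path."""
    if not pat_parts:
        return True  # pattern used up: the path is the match or lies inside it
    if pat_parts[0] == "**":
        # '**' stands for zero or more whole directories
        return any(_match(pat_parts[1:], path_parts[i:]) for i in range(len(path_parts) + 1))
    if not path_parts:
        return False
    return fnmatch.fnmatchcase(path_parts[0], pat_parts[0]) and _match(pat_parts[1:], path_parts[1:])


def is_ignored(rel_path, patterns):
    """Last matching pattern wins; a leading '!' re-includes the path."""
    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        pat_parts = pattern.lstrip("!").strip("/").split("/")
        if _match(pat_parts, rel_path.split("/")):
            ignored = not negate
    return ignored


def context_files(root, patterns):
    """Relative paths of the files that would remain in the build context."""
    kept = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root).replace(os.sep, "/")
            if not is_ignored(rel, patterns):
                kept.append(rel)
    return sorted(kept)


with tempfile.TemporaryDirectory() as project:
    for rel in ["app.py", "requirements.txt", "src/model.py", "data/train.csv", "notes.md"]:
        path = os.path.join(project, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    whitelist = ["*", "!app.py", "!requirements.txt", "!src/"]
    kept = context_files(project, whitelist)
    print(kept)
```

Only the three whitelisted paths survive; `data/` and `notes.md` never reach the daemon. Treat the result as an estimate and confirm against a real build, since Docker's matcher has more rules than this sketch.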

### Special Cases for ML Projects

**Large datasets** (DON'T include in image):
```
data/
datasets/
*.csv
*.parquet
```
**Why**: GBs of data → huge images
**Instead**: Mount as volume or download at runtime

**Model checkpoints** (usually don't include):
```
*.pt
*.pth
*.ckpt
*.safetensors
models/
checkpoints/
```
**Why**: Models are GBs, change frequently
**Instead**: Mount volume or download from model registry

**Exception**: Small models (<100MB) for inference containers

**Logs** (never include):
```
*.log
logs/
```
**Why**: Container logs should go to stdout/stderr

### Verification

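**Check what's being sent** (the legacy builder prints `Sending build context`; BuildKit prints `transferring context`):
```bash
DOCKER_BUILDKIT=0 docker build --no-cache . 2>&1 | grep 'Sending build context'
docker build --no-cache --progress=plain . 2>&1 | grep 'transferring context'  # BuildKit
```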

**Before .dockerignore**:
```
Sending build context to Docker daemon  8.5GB
```

**After .dockerignore**:
```
Sending build context to Docker daemon  2.1MB  # 4000x smaller!
```

**List the largest files and directories in the project** (candidates for .dockerignore):
```bash
docker run --rm -v $(pwd):/check alpine sh -c 'cd /check && du -sh * | sort -h'
```
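
The same survey can be done without a container. The sketch below is a minimal Python equivalent, with hypothetical helper names, that sums file sizes per top-level entry and lists the largest first. The demo runs on a throwaway directory so the result is predictable.

```python
import os
import tempfile


def tree_size(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(root=".", top=10):
    """(size_in_bytes, name) for the biggest top-level entries, largest first."""
    sizes = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((tree_size(entry.path), entry.name))
        elif entry.is_file(follow_symlinks=False):
            sizes.append((entry.stat().st_size, entry.name))
    return sorted(sizes, reverse=True)[:top]


with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "data"))
    with open(os.path.join(demo, "data", "train.csv"), "wb") as f:
        f.write(b"x" * 5000)
    with open(os.path.join(demo, "app.py"), "wb") as f:
        f.write(b"x" * 200)

    report = largest_entries(demo)
    for size, name in report:
        print(f"{size / 1e3:8.1f} kB  {name}")
```

Call `largest_entries(".")` from your project root; whatever dominates the list and is not needed at runtime belongs in `.dockerignore`.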

### Common Mistakes

**1. .dockerignore in wrong location**:
- Must be in build context root
- If `docker build path/to/project`, put it in `path/to/project/.dockerignore`

**2. Ignoring required files**:
```
*.py  # Oops! Ignores all Python files
```
With `COPY . .` the build still succeeds, but the image contains no code to run

**3. Not testing .dockerignore**:
- Always rebuild after adding .dockerignore
- Verify image still works

**4. Overusing wildcards**:
```
*test*  # Ignores testing/ but also pytest.py!
```
Be specific
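
You can preview what a wildcard catches before committing it. The sketch below uses Python's `fnmatch` as an approximation of Docker's matching for single file names; the candidate names are made up.

```python
import fnmatch

pattern = "*test*"
candidates = ["testing", "pytest.py", "latest_model.py", "app.py"]

# fnmatchcase applies shell-style wildcards without case folding.
matches = [name for name in candidates if fnmatch.fnmatchcase(name, pattern)]
print(matches)
```

`latest_model.py` is caught as well, because "latest" contains "test". A narrower pattern such as `tests/` avoids the surprise.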

### Production Best Practices

**1. Start with comprehensive .dockerignore**:
- Copy from this exercise
- Adapt to your project
- Keep it in version control

**2. Regular audits**:
```bash
docker history myimage  # See what layers contain
dive myimage            # Explore image layers (tool)
```

**3. CI/CD checks**:
```bash
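# Fail if build context > 100MB.
# Copy the context (after .dockerignore) into a throwaway image and measure it.
IMAGE=$(printf 'FROM busybox\nCOPY . /ctx\n' | docker build -q -f - .)
SIZE_MB=$(docker run --rm "$IMAGE" du -sm /ctx | awk '{print $1}')
docker rmi "$IMAGE" > /dev/null
if [ "$SIZE_MB" -gt 100 ]; then
  echo 'Build context too large!'
  exit 1
fi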
```

**4. Security**:
```
.env         # Don't copy secrets
*.key
*.pem
credentials/
```

### Real-World Impact

**Startup example**:
- Before: 10GB context, 15 min builds
- After .dockerignore: 5MB context, 2 min builds
- CI/CD pipeline: 85% faster

**Research lab**:
- Accidentally included 50GB dataset
- Docker build OOM on CI server
- .dockerignore fixed it immediately

### Your Task

This cell creates a comprehensive .dockerignore covering:
- Python cache and compiled files
- Virtual environments
- Git files
- IDE configurations
- Jupyter notebooks and checkpoints
- Testing artifacts
- Build outputs
- OS-specific files
- Documentation

**Copy this to your projects** and customize as needed!

In [None]:
%%writefile .dockerignore
# Python cache and compiled files (**/ matches at any depth)
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd
.Python

# Virtual environments
.env
.venv
venv/
ENV/

# Git
.git
.gitignore
.gitattributes

# Jupyter
.ipynb_checkpoints
*.ipynb

# Testing
.pytest_cache
.coverage
htmlcov/

# Build artifacts
dist/
build/
*.egg-info/

# IDE
.vscode/
.idea/

# OS
.DS_Store
Thumbs.db

# Documentation
*.md
docs/

## Summary

### What You Practiced:

1. ✅ **Basic Dockerfiles**: Created CPU-based PyTorch containers
2. ✅ **GPU Support**: Configured CUDA-enabled containers for deep learning
3. ✅ **Docker Compose**: Orchestrated multi-service applications
4. ✅ **Multi-Stage Builds**: Optimized image sizes
5. ✅ **Best Practices**: Used .dockerignore and requirements.txt

### Key Takeaways:

- **Dockerfiles** define how to build images
- **GPU support** requires NVIDIA runtime and CUDA base images
- **Docker Compose** simplifies multi-container orchestration
- **Multi-stage builds** reduce production image sizes
- **.dockerignore** improves build performance and security

### Next Steps:

1. **Review**: Study [02_dockerization.md](./02_dockerization.md) for detailed concepts
2. **Practice**: Build and run these containers in your environment
3. **Experiment**: Try different base images and configurations
4. **Deploy**: Containerize your own AI projects

### Useful Commands Reference:

```bash
# Build image
docker build -t <name> -f <dockerfile> .

# Run container
docker run --rm <name>
docker run --gpus all --rm <name>  # With GPU

# Docker Compose
docker compose up -d             # Compose V2; older installs use `docker-compose`
docker compose logs -f <service>
docker compose down

# Management
docker images
docker ps
docker system prune -a  # Clean up: removes stopped containers and ALL unused images
```