# Docker Testing Notebook - API Testing in Docker Containers

This notebook tests the Resume NER API running in Docker containers. It covers image verification, container lifecycle management, API functionality, Docker Compose, volume mounts, and environment variables.

**Note:** For comprehensive error handling and edge case testing, see `tests/integration/api/test_api_local_server.py`.

## Prerequisites

Before running this notebook, ensure:

1. **Docker and Docker Compose installed and running**
   ```bash
   docker --version
   docker-compose --version
   ```

2. **Access to trained models** (ONNX model and checkpoint directory)
   - Models should be in `outputs/conversion/` and `outputs/final_training/`
   - See [`docs/docker_build.md`](../docs/docker_build.md) for detailed setup instructions

3. **Docker Python library installed**
   ```bash
   pip install docker
   ```
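
The three prerequisites above can also be checked from Python before running any cell. The following is a minimal sketch: `check_prerequisites` is a hypothetical helper that is not part of the project, and it only confirms that the binaries are on `PATH` and the `docker` package is importable, not that the Docker daemon is actually running.

```python
import importlib.util
import shutil
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites appear to be present on this machine."""
    return {
        # CLI binaries are looked up on PATH; this does not prove the daemon is up.
        "docker_cli": shutil.which("docker") is not None,
        "docker_compose_cli": shutil.which("docker-compose") is not None,
        # The `docker` Python package is detected without importing it.
        "docker_python": importlib.util.find_spec("docker") is not None,
    }


for name, present in check_prerequisites().items():
    print(f"{'ok' if present else 'missing':8} {name}")
```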

## Quick Start

**Option 1: Build and run manually (Terminal)**
```bash
# Build Docker image
docker build -t resume-ner-api:latest .

# Find models
ONNX_MODEL=$(find outputs/conversion -name "model.onnx" -type f | head -1)
SPEC_HASH=$(echo "$ONNX_MODEL" | sed -n 's|.*\(spec-[a-f0-9]\{8\}_exec-[a-f0-9]\{8\}\).*|\1|p')
CHECKPOINT_DIR=$(find outputs/final_training -path "*${SPEC_HASH}*/checkpoint" -type d | head -1)

# Run container
docker run -d \
  --name resume-ner-api \
  -p 8000:8000 \
  -v "$(pwd)/outputs:/app/outputs" \
  resume-ner-api:latest \
  conda run -n resume-ner-training python -m src.deployment.api.cli.run_api \
    --onnx-model "$ONNX_MODEL" \
    --checkpoint "$CHECKPOINT_DIR" \
    --host 0.0.0.0 \
    --port 8000
```
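
The model-discovery steps in the shell snippet above (`find` plus `sed`) can be mirrored in Python. This is a rough sketch under the same directory-layout assumptions as the shell commands; `extract_spec_hash` and `find_onnx_and_checkpoint` are illustrative names, and the notebook itself relies on the project's `find_model_pair`, whose implementation is not shown here and may differ.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Same pattern as the sed expression: spec-<8 hex chars>_exec-<8 hex chars>
SPEC_PATTERN = re.compile(r"spec-[a-f0-9]{8}_exec-[a-f0-9]{8}")


def extract_spec_hash(path: str) -> Optional[str]:
    """Return the spec/exec identifier embedded in a path, or None."""
    matches: List[str] = SPEC_PATTERN.findall(path)
    # sed's greedy leading `.*` keeps the last occurrence, so do the same here.
    return matches[-1] if matches else None


def find_onnx_and_checkpoint(outputs_dir: Path) -> Optional[Tuple[Path, Path]]:
    """Pair an ONNX model with the checkpoint directory sharing its identifier."""
    for onnx in sorted((outputs_dir / "conversion").rglob("model.onnx")):
        spec = extract_spec_hash(onnx.as_posix())
        if spec is None:
            continue
        for ckpt in sorted((outputs_dir / "final_training").rglob("checkpoint")):
            if ckpt.is_dir() and spec in ckpt.as_posix():
                return onnx, ckpt
    return None
```

Unlike the shell version, which takes the first ONNX file and gives up if no checkpoint matches it, this sketch keeps looking at further ONNX files until a matching pair is found.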

**Option 2: Use this notebook**

Run the cells below to verify the image, then start and test the Docker container programmatically.

## Troubleshooting

- **Container won't start**: Check logs with `docker logs resume-ner-api`
- **Models not found**: Verify volume mount `-v $(pwd)/outputs:/app/outputs`
- **Port already in use**: Change port mapping `-p 8001:8000` or stop existing container
- **Permission issues**: Check file permissions on mounted volumes

See [`docs/docker_build.md`](../docs/docker_build.md) for more troubleshooting tips.
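
For the "port already in use" case, the port may be held by a process that is not a Docker container at all. A simple way to check is to try binding the port on the host. The sketch below assumes a local Docker setup that publishes ports on the host's own network interfaces; `is_host_port_free` is a hypothetical helper, not part of the project.

```python
import socket


def is_host_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
        return True


if not is_host_port_free(8000):
    print("Port 8000 is busy; try a mapping such as -p 8001:8000")
```

The check is inherently racy: another process can take the port between the check and `docker run`, so treat the result as a hint rather than a guarantee.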



## 1. Setup and Configuration
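
The setup cell in this section assumes the notebook is started either from the project root or from a `notebooks/` directory directly beneath it. If the working directory may be nested deeper, one alternative is to walk up the directory tree until a marker file is found. The sketch below uses `Dockerfile` and `docker-compose.yml` as markers, since later cells expect both at the project root; `find_project_root` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_project_root(
    start: Path,
    markers: Iterable[str] = ("Dockerfile", "docker-compose.yml"),
) -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) containing a marker file."""
    start = start.resolve()
    marker_names = tuple(markers)
    for candidate in (start, *start.parents):
        if any((candidate / name).exists() for name in marker_names):
            return candidate
    return None


# Example: project_root = find_project_root(Path.cwd()) or Path.cwd()
```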


In [None]:
# Install Docker Python library if not already installed
%pip install docker


In [None]:
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
import time
import requests
import docker
from docker.errors import DockerException, ImageNotFound

# Setup Python paths (required for infrastructure and src imports)
# Must be done before importing from src
current_dir = Path.cwd()
if current_dir.name == "notebooks":
    project_root = current_dir.parent
else:
    project_root = current_dir

src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Import test fixtures
from tests.test_data.fixtures import (
    get_text_fixture,
    get_file_fixture,
    get_batch_text_fixture,
    get_batch_file_fixture,
    TEXT_FIXTURES,
    FILE_FIXTURES
)

# Import API utilities
from src.deployment.api.tools.model_finder import (
    find_model_pair,
)
from src.deployment.api.tools.notebook_helpers import (
    display_entities,
    make_request,
)
from src.deployment.api.tools.notebook_config import (
    NotebookConfig,
    get_default_config,
)

# Initialize Docker client
try:
    docker_client = docker.from_env()
    print("✓ Docker client initialized")
except DockerException as e:
    print(f"✗ Failed to connect to Docker: {e}")
    print("Make sure Docker is running and accessible")
    docker_client = None


In [None]:
# Docker Configuration
DOCKER_IMAGE_NAME = "resume-ner-api"
DOCKER_IMAGE_TAG = "latest"
DOCKER_CONTAINER_NAME = "resume-ner-api"
DOCKER_HOST_PORT = 8000
DOCKER_CONTAINER_PORT = 8000

# API Configuration
config: NotebookConfig = get_default_config()
config["api_base_url"] = f"http://localhost:{DOCKER_HOST_PORT}"
API_BASE_URL = config["api_base_url"]
API_TIMEOUT = config["api_timeout"]

# Volume mounts (relative to project root)
OUTPUTS_DIR = project_root / "outputs"

print(f"Docker Image: {DOCKER_IMAGE_NAME}:{DOCKER_IMAGE_TAG}")
print(f"Container Name: {DOCKER_CONTAINER_NAME}")
print(f"Port Mapping: {DOCKER_HOST_PORT}:{DOCKER_CONTAINER_PORT}")
print(f"API Base URL: {API_BASE_URL}")


In [None]:
# Helper functions for Docker operations
def check_port_available(host_port: int, verbose: bool = True) -> Tuple[bool, Optional[str]]:
    """
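    Check whether a Docker container already publishes the given host port.

    Only Docker port bindings are inspected; a port held by a non-Docker
    process is not detected here.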
    
    Args:
        host_port: Port to check
        verbose: If True, print status messages
    
    Returns:
        Tuple of (is_available, conflicting_container_name)
    """
    if docker_client is None:
        return False, None
    
    try:
        # Get all containers (running and stopped)
        all_containers = docker_client.containers.list(all=True)
        
        for container in all_containers:
            # Check port mappings
            # 'Ports' can be present but null, so fall back to an empty dict
            ports = container.attrs.get('NetworkSettings', {}).get('Ports') or {}
            for container_port, host_bindings in ports.items():
                if host_bindings:
                    for binding in host_bindings:
                        if binding.get('HostPort') == str(host_port):
                            if verbose:
                                print(f"⚠ Port {host_port} is already in use by container: {container.name}")
                            return False, container.name
        
        return True, None
    except Exception as e:
        if verbose:
            print(f"Error checking port availability: {e}")
        return False, None


def start_docker_container(
    image_name: str,
    image_tag: str = "latest",
    container_name: str = "resume-ner-api",
    host_port: int = 8000,
    container_port: int = 8000,
    volumes: Optional[Dict[str, Dict[str, str]]] = None,
    environment: Optional[Dict[str, str]] = None,
    command: Optional[List[str]] = None,
    verbose: bool = True,
    force_port: bool = False,
) -> Optional[docker.models.containers.Container]:
    """
    Start Docker container with specified configuration.
    
    Args:
        image_name: Name of the Docker image
        image_tag: Tag of the image (default: "latest")
        container_name: Name for the container
        host_port: Host port to map
        container_port: Container port to map
        volumes: Volume mounts dictionary
        environment: Environment variables dictionary
        command: Command to run in container
        verbose: If True, print status messages
        force_port: If True, stop conflicting containers on the port
    
    Returns:
        Container object if successful, None otherwise
    """
    if docker_client is None:
        print("✗ Docker client not available")
        return None
    
    image_tag_full = f"{image_name}:{image_tag}"
    
    # Check for port conflicts
    port_available, conflicting_container = check_port_available(host_port, verbose=verbose)
    if not port_available and conflicting_container:
        if force_port:
            if verbose:
                print(f"Stopping conflicting container: {conflicting_container}")
            try:
                conflict_container = docker_client.containers.get(conflicting_container)
                if conflict_container.status == "running":
                    conflict_container.stop()
                conflict_container.remove()
                if verbose:
                    print(f"✓ Removed conflicting container: {conflicting_container}")
            except Exception as e:
                if verbose:
                    print(f"✗ Failed to remove conflicting container: {e}")
                return None
        else:
            if verbose:
                print(f"✗ Port {host_port} is already in use by container: {conflicting_container}")
                print(f"  Stop the conflicting container first, or set force_port=True")
            return None
    
    # Stop and remove existing container if it exists
    try:
        existing_container = docker_client.containers.get(container_name)
        if existing_container.status == "running":
            if verbose:
                print(f"Stopping existing container: {container_name}")
            existing_container.stop()
        if verbose:
            print(f"Removing existing container: {container_name}")
        existing_container.remove()
    except docker.errors.NotFound:
        pass  # Container doesn't exist, which is fine
    
    if verbose:
        print(f"Starting container: {container_name}")
        print(f"  Image: {image_tag_full}")
        print(f"  Port: {host_port}:{container_port}")
    
    try:
        container = docker_client.containers.run(
            image_tag_full,
            name=container_name,
            ports={f"{container_port}/tcp": host_port},
            volumes=volumes or {},
            environment=environment or {},
            command=command,
            detach=True,
            remove=False,
        )
        
        if verbose:
            print(f"✓ Container started: {container_name}")
            print(f"  Container ID: {container.id[:12]}")
        
        return container
    except DockerException as e:
        if verbose:
            print(f"✗ Failed to start container: {e}")
        return None


def stop_docker_container(
    container_name: str,
    remove: bool = True,
    verbose: bool = True,
) -> bool:
    """
    Stop and optionally remove Docker container.
    
    Args:
        container_name: Name of the container
        remove: If True, remove container after stopping
        verbose: If True, print status messages
    
    Returns:
        True if successful, False otherwise
    """
    if docker_client is None:
        print("✗ Docker client not available")
        return False
    
    try:
        container = docker_client.containers.get(container_name)
        
        if container.status == "running":
            if verbose:
                print(f"Stopping container: {container_name}")
            container.stop()
        
        if remove:
            if verbose:
                print(f"Removing container: {container_name}")
            container.remove()
        
        if verbose:
            # Report what actually happened: removal is optional
            action = "stopped and removed" if remove else "stopped"
            print(f"✓ Container {action}: {container_name}")
        return True
    except docker.errors.NotFound:
        if verbose:
            print(f"Container not found: {container_name}")
        return False
    except DockerException as e:
        if verbose:
            print(f"✗ Failed to stop container: {e}")
        return False


def check_container_health(
    container_name: str,
    health_endpoint: str = "/health",
    base_url: Optional[str] = None,
    verbose: bool = True,
) -> bool:
    """
    Check container health via API endpoint.
    
    Args:
        container_name: Name of the container
        health_endpoint: Health check endpoint path
        base_url: Base URL for API (default: uses API_BASE_URL)
        verbose: If True, print status messages
    
    Returns:
        True if healthy, False otherwise
    """
    if base_url is None:
        base_url = API_BASE_URL
    
    url = f"{base_url}{health_endpoint}"
    
    try:
        response = requests.get(url, timeout=5)
        is_healthy = response.status_code == 200
        
        if verbose:
            if is_healthy:
                print(f"✓ Container is healthy: {container_name}")
            else:
                print(f"✗ Container health check failed: {container_name} (status: {response.status_code})")
        
        return is_healthy
    except requests.exceptions.RequestException as e:
        if verbose:
            print(f"✗ Health check failed: {e}")
        return False


def get_container_logs(
    container_name: str,
    tail: int = 100,
    verbose: bool = True,
) -> Optional[str]:
    """
    Get container logs.
    
    Args:
        container_name: Name of the container
        tail: Number of lines to retrieve
        verbose: If True, print logs
    
    Returns:
        Logs as string if successful, None otherwise
    """
    if docker_client is None:
        print("✗ Docker client not available")
        return None
    
    try:
        container = docker_client.containers.get(container_name)
        # Replace undecodable bytes instead of failing on non-UTF-8 log output
        logs = container.logs(tail=tail).decode("utf-8", errors="replace")
        
        if verbose:
            print(f"Container logs ({container_name}):")
            print("=" * 60)
            print(logs)
            print("=" * 60)
        
        return logs
    except docker.errors.NotFound:
        if verbose:
            print(f"Container not found: {container_name}")
        return None
    except DockerException as e:
        if verbose:
            print(f"✗ Failed to get logs: {e}")
        return None


# Create a wrapper that uses config values (for backward compatibility with existing notebook cells)
from functools import partial
make_request = partial(make_request, base_url=API_BASE_URL, timeout=API_TIMEOUT)

# display_entities is now imported from src.deployment.api.tools.notebook_helpers
# No need to redefine it here


## 2. Docker Image Verification

Check if the Docker image exists and verify its details.
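
If the image is missing, it can be built from Python instead of the terminal. The sketch below assumes a Docker SDK for Python version in which `images.build` returns the built image together with an iterable of log chunks (dictionaries that may carry a `stream` key). `build_image` is a hypothetical helper; the client is passed in as a parameter so the function does not depend on notebook globals.

```python
from typing import Any, List, Tuple


def build_image(client: Any, context_dir: str, tag: str) -> Tuple[Any, List[str]]:
    """Build an image from `context_dir` and return it with the build output lines."""
    # rm=True removes intermediate containers after a successful build.
    image, log_chunks = client.images.build(path=context_dir, tag=tag, rm=True)
    lines = [
        chunk["stream"].rstrip()
        for chunk in log_chunks
        # Skip chunks without text, such as {'aux': ...} entries or bare newlines.
        if chunk.get("stream", "").strip()
    ]
    return image, lines


# Example (requires a running Docker daemon and can take several minutes):
# image, lines = build_image(docker_client, str(project_root),
#                            f"{DOCKER_IMAGE_NAME}:{DOCKER_IMAGE_TAG}")
```

A failed build raises an exception from the SDK rather than returning, so callers that want a friendly message should wrap the call in `try`/`except DockerException`.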


In [None]:
# Validate Dockerfile exists
dockerfile_path = project_root / "Dockerfile"

if dockerfile_path.exists():
    print(f"✓ Dockerfile found: {dockerfile_path}")
    # Read and display first few lines
    with open(dockerfile_path) as f:
        lines = f.readlines()[:10]
        print("\nFirst 10 lines of Dockerfile:")
        for i, line in enumerate(lines, 1):
            print(f"{i:2}: {line.rstrip()}")
else:
    print(f"✗ Dockerfile not found: {dockerfile_path}")


In [None]:
# Check if Docker image exists
image_tag = f"{DOCKER_IMAGE_NAME}:{DOCKER_IMAGE_TAG}"

if docker_client is None:
    print("✗ Docker client not available")
    docker_image = None
else:
    try:
        docker_image = docker_client.images.get(image_tag)
        print(f"✓ Docker image found: {image_tag}")
        print(f"\nImage details:")
        print(f"  Image ID: {docker_image.id[:12]}")
        print(f"  Tags: {docker_image.tags}")
        print(f"  Created: {docker_image.attrs['Created']}")
        print(f"  Architecture: {docker_image.attrs['Architecture']}")
        print(f"  OS: {docker_image.attrs['Os']}")
        
        # Calculate size
        size_bytes = docker_image.attrs['Size']
        size_mb = size_bytes / (1024**2)
        size_gb = size_bytes / (1024**3)
        print(f"  Size: {size_mb:.2f} MB ({size_gb:.2f} GB)")
    except ImageNotFound:
        print(f"✗ Docker image not found: {image_tag}")
        print(f"\nTo build the image, run:")
        print(f"  docker build -t {image_tag} .")
        print(f"\nOr see docs/docker_build.md for detailed instructions.")
        docker_image = None
    except Exception as e:
        print(f"✗ Error checking for image: {e}")
        docker_image = None


In [None]:
# Inspect image layers and history
if docker_image and docker_client:
    try:
        image_info = docker_client.images.get(f"{DOCKER_IMAGE_NAME}:{DOCKER_IMAGE_TAG}")
        
        print(f"\nImage history (last 5 layers):")
        history = image_info.history()[:5]
        for i, layer in enumerate(history, 1):
            created_by = layer.get('CreatedBy', 'N/A')
            # Truncate long commands
            if len(created_by) > 80:
                created_by = created_by[:77] + "..."
            print(f"  {i}. {created_by}")
    except Exception as e:
        print(f"Error inspecting image: {e}")
else:
    print("Image not available for inspection")


## 3. Container Lifecycle Management

Find models and start/stop containers.
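
Because the host `outputs/` directory is mounted at `/app/outputs`, every host path passed to the API has to be translated into its in-container equivalent. A minimal sketch of that mapping follows; `to_container_path` is a hypothetical helper, and it builds the result with POSIX separators so it also works when the notebook runs on a Windows host.

```python
from pathlib import Path, PurePosixPath


def to_container_path(
    host_path: Path,
    host_root: Path,
    container_root: str = "/app/outputs",
) -> str:
    """Map a path under `host_root` to the matching path under `container_root`."""
    # relative_to raises ValueError when host_path lies outside the mounted root,
    # which is exactly the case where the container could not see the file.
    relative = host_path.relative_to(host_root)
    return str(PurePosixPath(container_root, *relative.parts))


print(to_container_path(Path("outputs/conversion/run/model.onnx"), Path("outputs")))
```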


In [None]:
# Find models (reuse pattern from api_testing.ipynb)
onnx_path, checkpoint_path = find_model_pair(OUTPUTS_DIR)

if onnx_path and checkpoint_path:
    print(f"✓ Found model pair:")
    print(f"  ONNX: {onnx_path}")
    print(f"  Checkpoint: {checkpoint_path}")
    
    # Convert to container paths
    # as_posix() keeps forward slashes even when the notebook runs on Windows
    onnx_container_path = f"/app/outputs/{onnx_path.relative_to(OUTPUTS_DIR).as_posix()}"
    checkpoint_container_path = f"/app/outputs/{checkpoint_path.relative_to(OUTPUTS_DIR).as_posix()}"
    print(f"\nContainer paths:")
    print(f"  ONNX: {onnx_container_path}")
    print(f"  Checkpoint: {checkpoint_container_path}")
else:
    print("✗ Could not find model pair")
    print("  Make sure models are in outputs/conversion/ and outputs/final_training/")


In [None]:
# Start container with model paths
if onnx_path and checkpoint_path:
    # Prepare volume mounts
    volumes = {
        str(OUTPUTS_DIR): {"bind": "/app/outputs", "mode": "ro"}  # Read-only for outputs
    }
    
    # Prepare command to run API server
    command = [
        "conda", "run", "-n", "resume-ner-training",
        "python", "-m", "src.deployment.api.cli.run_api",
        "--onnx-model", f"/app/outputs/{onnx_path.relative_to(OUTPUTS_DIR)}",
        "--checkpoint", f"/app/outputs/{checkpoint_path.relative_to(OUTPUTS_DIR)}",
        "--host", "0.0.0.0",
        "--port", str(DOCKER_CONTAINER_PORT),
    ]
    
    # Prepare environment variables
    environment = {
        "PYTHONPATH": "/app/src:/app",
        "OCR_EXTRACTOR": "easyocr",
        "PDF_EXTRACTOR": "pymupdf",
    }
    
    
    container = start_docker_container(
        image_name=DOCKER_IMAGE_NAME,
        image_tag=DOCKER_IMAGE_TAG,
        container_name=DOCKER_CONTAINER_NAME,
        host_port=DOCKER_HOST_PORT,
        container_port=DOCKER_CONTAINER_PORT,
        volumes=volumes,
        environment=environment,
        command=command,
        verbose=True,
        force_port=True,  # Automatically stop conflicting containers
    )
    
    if container:
        print("\nWaiting for container to start...")
        time.sleep(5)  # Give container time to start
        print("Container started. Waiting for API to be ready...")
        
        # Wait for API to be ready (up to 30 seconds)
        max_wait = 30
        wait_interval = 2
        for i in range(max_wait // wait_interval):
            if check_container_health(DOCKER_CONTAINER_NAME, verbose=False):
                print("✓ API is ready!")
                break
            time.sleep(wait_interval)
        else:
            print("⚠ API may not be ready yet. Check logs if needed.")
else:
    print("✗ Cannot start container: models not found")
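

The readiness loop in the cell above polls the health endpoint at a fixed interval. The same idea can be expressed as a reusable helper with a monotonic deadline, so the total wait stays bounded even if the system clock changes. This is a sketch only; `wait_until` is a hypothetical name and is not part of the project's helpers.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 2.0,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        # Stop if another sleep would carry us past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)


# Example with the notebook's own health check:
# ready = wait_until(lambda: check_container_health(DOCKER_CONTAINER_NAME, verbose=False))
```

Each call to the predicate can itself take time (the health check uses a 5-second request timeout), so the real upper bound is the timeout plus the duration of the final check.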


In [None]:
# Check container status
if docker_client:
    try:
        container = docker_client.containers.get(DOCKER_CONTAINER_NAME)
        status = container.status
        print(f"Container status: {status}")
        
        if status == "running":
            print("✓ Container is running")
            
            # Check health
            check_container_health(DOCKER_CONTAINER_NAME)
            
            # Show resource usage
            stats = container.stats(stream=False)
            memory_usage = stats['memory_stats'].get('usage', 0) / (1024**2)  # MB
            print(f"Memory usage: {memory_usage:.2f} MB")
        else:
            print(f"⚠ Container status: {status}")
            get_container_logs(DOCKER_CONTAINER_NAME, tail=50)
    except docker.errors.NotFound:
        print(f"Container not found: {DOCKER_CONTAINER_NAME}")


## 4. API Testing Through Docker

Test all API endpoints through the Docker container (reusing patterns from api_testing.ipynb).
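
Each test cell in this section repeats the same success and failure reporting on the dictionary returned by `make_request`. That pattern could be factored into a small function. The sketch below assumes only the keys the cells already read (`status_code`, `data`, `latency_ms`, `error`); `describe_result` is a hypothetical helper.

```python
from typing import Any, Dict


def describe_result(result: Dict[str, Any]) -> str:
    """Summarize a make_request-style result dictionary in one line."""
    status = result.get("status_code")
    if status == 200 and result.get("data"):
        latency = result.get("latency_ms")
        suffix = f" (latency: {latency:.1f}ms)" if latency is not None else ""
        return f"✓ Request successful{suffix}"
    # Prefer the explicit error message; fall back to the status code.
    return f"✗ Request failed: {result.get('error', status)}"


print(describe_result({"status_code": 200, "data": {"entities": []}, "latency_ms": 12.34}))
print(describe_result({"status_code": 500}))
```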


### 4.1 Single Text Prediction


In [None]:
# Test with text_1
text_1 = get_text_fixture("text_1")
result = make_request("POST", "/predict", json={"text": text_1})

if result.get("status_code") == 200 and result.get("data"):
    entities = result["data"].get("entities", [])
    print(f"✓ Request successful (latency: {result['latency_ms']:.1f}ms)")
    display_entities(entities, source_text=text_1)
else:
    print(f"✗ Request failed: {result.get('error', result.get('status_code'))}")


In [None]:
# Test with text_2 (contains email, phone, location)
text_2 = get_text_fixture("text_2")
result = make_request("POST", "/predict", json={"text": text_2})

if result.get("status_code") == 200 and result.get("data"):
    entities = result["data"].get("entities", [])
    print(f"✓ Request successful (latency: {result['latency_ms']:.1f}ms)")
    display_entities(entities, source_text=text_2)
else:
    print(f"✗ Request failed: {result.get('error', result.get('status_code'))}")


### 4.2 Single PDF File Prediction


In [None]:
# Test with PDF file
file_path = get_file_fixture("file_1", "pdf")
try:
    with open(file_path, "rb") as f:
        file_content = f.read()
    files = {"file": (file_path.name, file_content, "application/pdf")}
    result = make_request("POST", "/predict/file", files=files)
    
    if result.get("status_code") == 200 and result.get("data"):
        extracted_text = result["data"].get("extracted_text", "")
        entities = result["data"].get("entities", [])
        print(f"✓ Request successful (latency: {result['latency_ms']:.1f}ms)")
        display_entities(entities, source_text=extracted_text)
    else:
        print(f"✗ Request failed: {result.get('error', result.get('status_code'))}")
except Exception as e:
    print(f"Error loading file: {e}")


### 4.3 Single Image File Prediction (OCR)


In [None]:
# Test with PNG image file (requires OCR)
file_path = get_file_fixture("file_1", "png")
try:
    with open(file_path, "rb") as f:
        file_content = f.read()
    files = {"file": (file_path.name, file_content, "image/png")}
    result = make_request("POST", "/predict/file", files=files)
    
    if result.get("status_code") == 200 and result.get("data"):
        extracted_text = result["data"].get("extracted_text", "")
        entities = result["data"].get("entities", [])
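        if extracted_text:
            print(f"✓ Request successful (latency: {result['latency_ms']:.1f}ms)")
            display_entities(entities, source_text=extracted_text)
        else:
            print("⚠ Request succeeded but no text was extracted from the image")
    elif result.get("status_code") == 400:
        # `data` may be missing or None on error responses, so fall back to an empty dict
        error_detail = str((result.get("data") or {}).get("detail", ""))
        if "EasyOCR" in error_detail or "pytesseract" in error_detail or "Pillow" in error_detail: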
            print(f"⚠️  OCR dependencies not installed in container")
        else:
            print(f"✗ Request failed: {error_detail}")
    else:
        print(f"✗ Request failed: {result.get('error', result.get('status_code'))}")
except Exception as e:
    print(f"Error loading file: {e}")


### 4.4 Batch Text Prediction


In [None]:
# Test batch with multiple texts
texts = get_batch_text_fixture("batch_text_small")
result = make_request("POST", "/predict/batch", json={"texts": texts})

if result.get("status_code") == 200 and result.get("data"):
    predictions = result["data"].get("predictions", [])
    print(f"✓ Batch request successful (latency: {result['latency_ms']:.1f}ms)")
    print(f"Processed {len(predictions)} texts")
    for i, (text, prediction) in enumerate(zip(texts, predictions), 1):
        entities = prediction.get("entities", [])
        print(f"\nText {i}:")
        display_entities(entities, source_text=text)
else:
    print(f"✗ Request failed: {result.get('error', result.get('status_code'))}")


### 4.5 Batch File Prediction


In [None]:
# Test batch with PDF files
file_paths = get_batch_file_fixture("batch_file_small", "pdf")
try:
    files_list = []
    for file_path in file_paths:
        with open(file_path, "rb") as f:
            file_content = f.read()
        files_list.append(("files", (file_path.name, file_content, "application/pdf")))
    
    result = make_request("POST", "/predict/file/batch", files=files_list)
    
    if result.get("status_code") == 200 and result.get("data"):
        predictions = result["data"].get("predictions", [])
        print(f"✓ Batch request successful (latency: {result['latency_ms']:.1f}ms)")
        print(f"Processed {len(predictions)} files")
        for i, (file_path, prediction) in enumerate(zip(file_paths, predictions), 1):
            extracted_text = prediction.get("extracted_text", "")
            entities = prediction.get("entities", [])
            print(f"\nFile {i} ({file_path.name}):")
            if extracted_text:
                display_entities(entities, source_text=extracted_text)
    else:
        print(f"✗ Request failed: {result.get('error', result.get('status_code'))}")
except Exception as e:
    print(f"Error: {e}")


## 5. Docker Compose Testing

Test using Docker Compose for orchestration.
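
Depending on the installation, Compose is available as the standalone `docker-compose` binary, as the `docker compose` plugin, or both. The sketch below picks whichever form is present before building the `up -d` command; the helper names are hypothetical, and the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def compose_base_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Prefer the standalone binary if present, else fall back to the plugin form."""
    if which("docker-compose"):
        return ["docker-compose"]
    return ["docker", "compose"]


def compose_up_command(
    compose_file: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Build the argument list for starting services in detached mode."""
    return [*compose_base_command(which), "-f", str(compose_file), "up", "-d"]


print(compose_up_command(Path("docker-compose.yml")))
```

The resulting list can be passed directly to `subprocess.run(..., capture_output=True, text=True)`, as in the commented-out example in this section.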


In [None]:
# Validate docker-compose.yml exists
compose_file = project_root / "docker-compose.yml"

if compose_file.exists():
    print(f"✓ docker-compose.yml found: {compose_file}")
    # Read and display content
    with open(compose_file) as f:
        content = f.read()
        print("\nDocker Compose file content:")
        print(content)
else:
    print(f"✗ docker-compose.yml not found: {compose_file}")


In [None]:
# Start services with Docker Compose
# Note: This requires updating docker-compose.yml with actual model paths first

import subprocess

compose_file = project_root / "docker-compose.yml"

if compose_file.exists() and onnx_path and checkpoint_path:
    # Update docker-compose.yml command with actual paths
    # For now, we'll just validate the file
    print("Note: To use docker-compose, update docker-compose.yml with actual model paths")
    print(f"  ONNX: /app/outputs/{onnx_path.relative_to(OUTPUTS_DIR)}")
    print(f"  Checkpoint: /app/outputs/{checkpoint_path.relative_to(OUTPUTS_DIR)}")
    
    # Example: Start with docker-compose (commented out - requires manual path update)
    # result = subprocess.run(
    #     ["docker-compose", "-f", str(compose_file), "up", "-d"],
    #     cwd=project_root,
    #     capture_output=True,
    #     text=True,
    # )
    # print(result.stdout)
    # if result.returncode != 0:
    #     print(f"Error: {result.stderr}")
else:
    print("Cannot start docker-compose: missing compose file or models")


## 6. Volume Mounts and Environment Variables

Test different volume mount configurations and environment variable settings.
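
To confirm which environment variables a container really received, the values can be read back from its inspection data rather than restated by hand. Docker reports them as a list of `KEY=VALUE` strings under `Config.Env`. The sketch below converts that list to a dictionary; `parse_env_list` is a hypothetical helper.

```python
from typing import Dict, Iterable, Optional


def parse_env_list(env_list: Optional[Iterable[str]]) -> Dict[str, str]:
    """Convert Docker's ['KEY=VALUE', ...] environment list into a dictionary."""
    env: Dict[str, str] = {}
    for item in env_list or []:
        # Split on the first '=' only, since values may themselves contain '='.
        key, _, value = item.partition("=")
        env[key] = value
    return env


# Example with a running container:
# container.reload()
# env = parse_env_list(container.attrs["Config"]["Env"])
# print(env.get("OCR_EXTRACTOR"), env.get("PDF_EXTRACTOR"))
```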


In [None]:
# Test volume mounts - verify files are accessible in container
if docker_client:
    try:
        container = docker_client.containers.get(DOCKER_CONTAINER_NAME)
        
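        # Check if outputs directory is mounted.
        # Run `ls` without piping to `head`: a pipeline reports the exit code of
        # its last command, which would hide a failing `ls`.
        exit_code, output = container.exec_run(
            ["ls", "-la", "/app/outputs/conversion"],
            user="root"
        )
        listing = output.decode("utf-8", errors="replace") if output else ""
        if exit_code == 0:
            print("✓ Outputs directory is accessible:")
            print("\n".join(listing.splitlines()[:5]))
        else:
            print("✗ Cannot access outputs directory")
            if listing:
                print(f"  Error: {listing.strip()}")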
            
        # Check if models are visible
        if onnx_path:
            container_onnx_path = f"/app/outputs/{onnx_path.relative_to(OUTPUTS_DIR).as_posix()}"
            # Pass the path as a separate argument (no shell), so spaces or
            # special characters in the path cannot break the command.
            exit_code, output = container.exec_run(
                ["test", "-f", container_onnx_path],
                user="root"
            )
            if exit_code == 0:
                print("\nModel check: ONNX model found")
            elif exit_code == 1:
                print("\nModel check: ONNX model not found")
            else:
                detail = output.decode("utf-8", errors="replace").strip() if output else ""
                print("\nModel check: Error checking model file")
                if detail:
                    print(f"  {detail}")
    except Exception as e:
        print(f"Error checking volumes: {e}")


In [None]:
# Test different environment variable configurations
# Note: This requires restarting the container with different env vars

print("Testing environment variables:")
print("\nConfiguration passed when the container was started in section 3:")
print(f"  OCR_EXTRACTOR: easyocr")
print(f"  PDF_EXTRACTOR: pymupdf")
print(f"  PYTHONPATH: /app/src:/app")

# To test different extractors, you would:
# 1. Stop current container
# 2. Start new container with different environment variables
# 3. Test API endpoints
# Example:
# environment = {
#     "PYTHONPATH": "/app/src:/app",
#     "OCR_EXTRACTOR": "pytesseract",  # Different OCR
#     "PDF_EXTRACTOR": "pdfplumber",    # Different PDF extractor
# }


## 7. Cleanup

Stop containers and optionally remove images.
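
Cleanup that depends on reaching the last cell is easy to skip when an earlier cell fails. One alternative is a context manager that stops and removes the container whenever its block exits, including on errors. The sketch below is a hypothetical helper, not part of the project; it takes the Docker client as a parameter and forwards extra keyword arguments to `containers.run`.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def managed_container(client: Any, image: str, **run_kwargs: Any) -> Iterator[Any]:
    """Run a detached container and guarantee stop/remove when the block exits."""
    container = client.containers.run(image, detach=True, **run_kwargs)
    try:
        yield container
    finally:
        try:
            container.stop()
        finally:
            # Remove even if stopping raised, so no stale container is left behind.
            container.remove()


# Example:
# with managed_container(docker_client, "resume-ner-api:latest",
#                        name="resume-ner-api", ports={"8000/tcp": 8000}) as c:
#     ...  # run API tests here
```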


In [None]:
# Stop Docker container
stop_docker_container(DOCKER_CONTAINER_NAME, remove=True, verbose=True)


In [None]:
# Optional: Remove Docker image (uncomment to use)
# if docker_client:
#     try:
#         image = docker_client.images.get(f"{DOCKER_IMAGE_NAME}:{DOCKER_IMAGE_TAG}")
#         docker_client.images.remove(image.id, force=True)
#         print(f"✓ Removed image: {DOCKER_IMAGE_NAME}:{DOCKER_IMAGE_TAG}")
#     except Exception as e:
#         print(f"Error removing image: {e}")

print("Cleanup complete!")
