# 8. REST API Service for Document Processing

This notebook demonstrates how to start and interact with our FastAPI service that exposes the complete OCR pipeline via REST endpoints.

> **📖 For detailed explanations** of API design decisions, technology choices, and production considerations, see the [README.md](./README.md) in this folder.

**What We'll Build**

- FastAPI service with document upload endpoints
- Real-time processing status tracking
- Web interface for testing document uploads
- Integration with complete infrastructure stack

### API Endpoints Overview

Our service provides these REST endpoints:
- `POST /api/v1/upload` - Upload PDF documents for processing
- `GET /api/v1/status/{id}` - Check document processing status
- `GET /api/v1/results/{id}` - Retrieve extraction results
- `GET /api/v1/visualization/{id}` - Get OCR overlay images
- `GET /api/v1/health` - Service health check
- `GET /` - Web interface for testing
- `GET /docs` - Interactive API documentation
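
The status and results endpoints above suggest a poll-then-fetch workflow once a document has been uploaded. The sketch below is one way a client might drive it using only the standard library; it is not taken from the service code. The `status` response key and the terminal values `completed` and `failed` are assumptions, so check `/docs` for the actual schema. The upload step is left out because the multipart field name is not shown in this notebook.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE_URL = "http://127.0.0.1:8000"  # host and port used in this notebook


def endpoint(name: str, doc_id: str = "") -> str:
    """Build a URL for one of the /api/v1 endpoints listed above."""
    path = f"/api/v1/{name}"
    if doc_id:
        path += f"/{doc_id}"
    return API_BASE_URL + path


def get_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def wait_until_done(
    fetch_status: Callable[[], str],
    terminal=("completed", "failed"),  # assumed status values
    attempts: int = 30,
    delay: float = 2.0,
) -> str:
    """Call fetch_status until it returns a terminal value or attempts run out."""
    status = "unknown"
    for _ in range(attempts):
        status = fetch_status()
        if status in terminal:
            break
        time.sleep(delay)
    return status


def fetch_results(doc_id: str) -> dict:
    """Poll the status endpoint, then return the extraction results."""
    final = wait_until_done(
        # Assumption: the status payload carries a "status" key.
        lambda: get_json(endpoint("status", doc_id)).get("status", "unknown")
    )
    if final != "completed":
        raise RuntimeError(f"Processing ended with status {final!r}")
    return get_json(endpoint("results", doc_id))
```

Passing the status lookup in as a callable keeps `wait_until_done` independent of the HTTP layer, so the polling logic can be exercised without a running service.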


## 1. Prerequisites & Setup


In [1]:
import sys
import os
import time
import subprocess
import threading
import webbrowser
import requests
from pathlib import Path

# Add project root to Python path (this notebook lives two levels below it)
project_root = Path.cwd().parent.parent
sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")
print(f"Current working directory: {Path.cwd()}")
print(f"Python path includes: {project_root in [Path(p) for p in sys.path]}")


Project root: /Users/markuskuehnle/Documents/projects/credit-ocr-system
Current working directory: /Users/markuskuehnle/Documents/projects/credit-ocr-system/notebooks/8-api-service
Python path includes: True


## 2. Infrastructure Startup

Start the required infrastructure services (PostgreSQL, Redis, Azurite) using Docker Compose.


In [2]:
def start_infrastructure():
    """Start required infrastructure services."""
    print("Starting infrastructure services...")
    
    # Change to project root for docker-compose
    os.chdir(project_root)
    
    # Start required services
    services = ["postgres", "redis", "azurite"]
    
    try:
        result = subprocess.run(
            ["docker-compose", "up", "-d"] + services,
            capture_output=True,
            text=True,
            check=True
        )
        print("Infrastructure services started successfully")
        print(f"Output: {result.stdout}")
        
        # Wait a moment for services to initialize
        print("Waiting for services to initialize...")
        time.sleep(5)
        
        return True
        
    except subprocess.CalledProcessError as e:
        print(f"Error starting infrastructure: {e}")
        print(f"Error output: {e.stderr}")
        return False

# Start infrastructure
infrastructure_started = start_infrastructure()


Starting infrastructure services...
Infrastructure services started successfully
Output: 
Waiting for services to initialize...
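
The fixed five-second wait works on most machines, but containers can take longer on a cold start. A minimal sketch of an alternative is to poll each service's TCP port until it accepts a connection. The port numbers below are the usual defaults for PostgreSQL, Redis, and the Azurite blob endpoint; they are assumptions here, so adjust them to the mappings in your compose file. An open port only shows that the container is listening, not that the service is fully initialized.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Assumed default ports; adjust to the mappings in your docker-compose.yml.
DEFAULT_PORTS = {"postgres": 5432, "redis": 6379, "azurite": 10000}


def wait_for_infrastructure(host: str = "localhost", timeout: float = 30.0) -> dict:
    """Check each service port and report which ones accepted a connection."""
    return {name: wait_for_port(host, port, timeout) for name, port in DEFAULT_PORTS.items()}
```

Calling `wait_for_infrastructure()` after `docker-compose up -d` returns a dictionary such as `{"postgres": True, ...}` that can be inspected before starting the API.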


## 3. API Service Configuration

Set environment variables for the API service and derive the base URL used in the rest of the notebook.


In [3]:
# Set environment variables for API service
os.environ["API_DEBUG"] = "true"
os.environ["API_PORT"] = "8000"
os.environ["API_HOST"] = "127.0.0.1"
os.environ["ENVIRONMENT"] = "development"
os.environ["DATABASE_HOST"] = "localhost"
os.environ["REDIS_HOST"] = "localhost"
os.environ["ENABLE_BACKGROUND_PROCESSING"] = "true"

# API service configuration
API_HOST = os.environ["API_HOST"]
API_PORT = int(os.environ["API_PORT"])
API_BASE_URL = f"http://{API_HOST}:{API_PORT}"

print(f"API will be available at: {API_BASE_URL}")
print(f"Web interface: {API_BASE_URL}/")
print(f"API documentation: {API_BASE_URL}/docs")
print(f"Health check: {API_BASE_URL}/api/v1/health")


API will be available at: http://127.0.0.1:8000
Web interface: http://127.0.0.1:8000/
API documentation: http://127.0.0.1:8000/docs
Health check: http://127.0.0.1:8000/api/v1/health
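
Environment variables are always strings, so values such as `"true"` and `"8000"` have to be parsed on the receiving side. How `run_api.py` does this is not shown in this notebook; the sketch below is a hypothetical illustration of the pattern. `ApiSettings`, `parse_bool`, and `load_settings` are names made up for this example, as are the fallback defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ApiSettings:  # hypothetical container, not part of the project
    host: str
    port: int
    debug: bool

    @property
    def base_url(self) -> str:
        return f"http://{self.host}:{self.port}"


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> ApiSettings:
    """Read the API_* variables set above, falling back to assumed defaults."""
    return ApiSettings(
        host=env.get("API_HOST", "127.0.0.1"),
        port=int(env.get("API_PORT", "8000")),
        debug=parse_bool(env.get("API_DEBUG", "false")),
    )
```

Accepting the environment as a mapping makes the parsing easy to check with a plain dictionary, for example `load_settings({"API_PORT": "9000"})`.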


## 4. Start API Service

Launch the FastAPI service in a background subprocess so we can continue using the notebook.


In [4]:
# Global variable to track API process
api_process = None

def start_api_service():
    """Start the FastAPI service in a subprocess."""
    global api_process
    
    # poll() returns None while the process is still alive
    if api_process is not None and api_process.poll() is None:
        print("API service is already running")
        return True
    
    print("Starting FastAPI service...")
    
    # Change to project root
    os.chdir(project_root)
    
    try:
        # Start the API service using our run_api.py script, with the same
        # interpreter as this notebook. Output is discarded because an unread
        # PIPE can fill up and block the server.
        api_process = subprocess.Popen(
            [sys.executable, "run_api.py"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL
        )
        
        print(f"API service started with PID: {api_process.pid}")
        
        # Wait for API to start up
        print("Waiting for API service to start...")
        time.sleep(3)
        
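        # Check if API is responding and healthy
        for attempt in range(15):
            # Stop polling if the process has already exited
            if api_process.poll() is not None:
                print(f"API process exited early with code {api_process.returncode}")
                api_process = None
                return False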
            try:
                response = requests.get(f"{API_BASE_URL}/api/v1/health", timeout=5)
                if response.status_code == 200:
                    health_data = response.json()
                    if health_data.get("status") == "healthy":
                        print("API service is ready and healthy!")
                        return True
                    else:
                        print(f"API responding but not healthy yet: {health_data.get('status', 'unknown')}")
            except requests.exceptions.RequestException:
                pass
            
            time.sleep(2)
            print(f"Waiting for API... (attempt {attempt + 1}/15)")
        
        print("API service may not have started properly or is not healthy")
        return False
        
    except Exception as e:
        print(f"Error starting API service: {e}")
        return False

# Start the API service
api_started = start_api_service()


Starting FastAPI service...
API service started with PID: 14131
Waiting for API service to start...
API service is ready and healthy!


## 5. Open Web Interface

Open the web interface in your browser for interactive testing.


In [5]:
def open_web_interface():
    """Open the web interface in the default browser and list the other URLs."""
    if not api_started:
        print("Cannot open web interface - API service not started")
        return
    
    print("Opening web interface in browser...")
    
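    # Open main web interface; webbrowser.open() returns False if no browser was launched
    try:
        if webbrowser.open(f"{API_BASE_URL}/"):
            print(f"Opened web interface: {API_BASE_URL}/")
        else:
            print("Could not open a browser automatically")
    except Exception as e:
        print(f"Could not open web interface: {e}")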
    
    # Also provide links for manual opening
    print("\nAvailable URLs:")
    print(f"Web Interface: {API_BASE_URL}/")
    print(f"API Documentation: {API_BASE_URL}/docs")
    print(f"Alternative Docs: {API_BASE_URL}/redoc")
    print(f"Health Check: {API_BASE_URL}/api/v1/health")

# Open web interface
open_web_interface()

Opening web interface in browser...
Opened web interface: http://127.0.0.1:8000/

Available URLs:
Web Interface: http://127.0.0.1:8000/
API Documentation: http://127.0.0.1:8000/docs
Alternative Docs: http://127.0.0.1:8000/redoc
Health Check: http://127.0.0.1:8000/api/v1/health


## 6. Teardown & Cleanup

**Important:** Run this cell when you're finished to properly shut down services and free up system resources.


In [6]:
def cleanup_services():
    """Stop API service and infrastructure."""
    global api_process
    
    print("Cleaning up services...")
    
    # Stop API service
    if api_process is not None:
        print("Stopping API service...")
        try:
            api_process.terminate()
            api_process.wait(timeout=10)
            print(f"API service stopped (PID: {api_process.pid})")
        except subprocess.TimeoutExpired:
            print("Force killing API service...")
            api_process.kill()
            api_process.wait()
        except Exception as e:
            print(f"Error stopping API service: {e}")
        finally:
            api_process = None
    
    # Stop infrastructure services
    print("Stopping infrastructure services...")
    os.chdir(project_root)
    
    try:
        result = subprocess.run(
            ["docker-compose", "down"],
            capture_output=True,
            text=True,
            timeout=30
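        )
        # run() is called without check=True, so inspect the exit code explicitly
        if result.returncode == 0:
            print("Infrastructure services stopped")
        else:
            print(f"docker-compose down exited with code {result.returncode}")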
        if result.stderr:
            print(f"Warnings: {result.stderr}")
    except subprocess.TimeoutExpired:
        print("Timeout stopping infrastructure - containers may still be running")
    except Exception as e:
        print(f"Error stopping infrastructure: {e}")
    
    print("\nCleanup completed!")
    print("All services should now be stopped.")

In [7]:
# Uncomment the line below to run cleanup
# cleanup_services()
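
Because cleanup depends on remembering to run a cell, a context manager is one way to guarantee that a child process is stopped even if an error occurs. The following is a minimal sketch of the same terminate-then-kill pattern used in `cleanup_services()`; `managed_process` is a hypothetical helper, and the sleeping child stands in for `run_api.py`.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_process(command, grace_period: float = 10.0):
    """Run a command and make sure it is stopped when the block exits."""
    process = subprocess.Popen(command)
    try:
        yield process
    finally:
        if process.poll() is None:  # still running
            process.terminate()
            try:
                process.wait(timeout=grace_period)
            except subprocess.TimeoutExpired:
                # The process ignored the termination request; force it.
                process.kill()
                process.wait()


# Stand-in child process; replace the command with the real service launcher.
with managed_process([sys.executable, "-c", "import time; time.sleep(60)"]) as proc:
    print(f"Child running with PID {proc.pid}")
# The child has been stopped by the time the block exits.
```

In a notebook the explicit cleanup cell remains more practical, since the service has to outlive individual cells; the context manager fits scripts and tests better.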