# 1. Setup Guide

This notebook contains the executable setup steps for the Credit OCR System infrastructure.

> **📖 For detailed explanations** of the system architecture, technology choices, and learning resources, see the [README.md](./README.md) in this folder.

## Setup Overview

We'll deploy and configure four services:
- **PostgreSQL** (database) → **Redis** (message broker) → **Ollama** (AI models) → **Azurite** (file storage)

**Estimated time:** 20-30 minutes first run, 2-3 minutes subsequent runs.

## 1. Prerequisites

**Required:** Python 3.10+, Docker Desktop, UV package manager  
**Optional, but recommended:** Git

### 1.1 Quick Setup Check

**Minimum:** 8GB RAM, 15GB disk space  
**Recommended:** 16GB RAM, 25GB disk space
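You can check the disk and memory requirements above from Python before starting. This is a minimal sketch with a few assumptions. "GB" is taken to mean 10⁹ bytes. Free space in your home directory is treated as a rough proxy for where Docker stores images, although Docker Desktop keeps them on its own virtual disk and gives containers only the memory assigned in its settings. `total_ram_gb` relies on `os.sysconf`, which is unavailable on Windows, so it returns `None` there. The helper names are our own.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Requirements from this section, in GB (assumed to mean 10**9 bytes)
MIN_DISK_GB, RECOMMENDED_DISK_GB = 15, 25
MIN_RAM_GB, RECOMMENDED_RAM_GB = 8, 16


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem that holds `path`, in GB."""
    return shutil.disk_usage(path).free / 10**9


def total_ram_gb() -> Optional[float]:
    """Total physical RAM in GB, or None where os.sysconf cannot report it (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 10**9
    except (AttributeError, ValueError, OSError):
        return None


disk_gb = free_disk_gb(str(Path.home()))
ram_gb = total_ram_gb()
print(f"Free disk: {disk_gb:.1f} GB (minimum {MIN_DISK_GB}, recommended {RECOMMENDED_DISK_GB})")
if ram_gb is None:
    print("Total RAM: unknown on this platform")
else:
    print(f"Total RAM: {ram_gb:.1f} GB (minimum {MIN_RAM_GB}, recommended {RECOMMENDED_RAM_GB})")
```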

> **⚠️ Start Docker Desktop** before proceeding

### 1.2 Initial Project Setup

Follow these steps to prepare your development environment:

**1. Project Preparation** (2 minutes)
```bash
# Navigate to your desired directory
cd /path/to/your/projects

# Download/clone the project (if not done already)
# Ensure the project folder contains compose.yml
```

**2. Python Environment Setup** (3-5 minutes)
```bash
# Navigate to project root (should contain compose.yml)
cd credit-ocr-system

# Create isolated Python environment
uv venv

# Install all project dependencies
uv sync
```

**3. Jupyter Notebook Launch** (1 minute)
```bash
# Start Jupyter with project environment
uv run jupyter notebook

# Open this notebook (01_setup.ipynb)
# Verify kernel shows ".venv" or project name
```

### Understanding Virtual Environments

**What is a virtual environment?**
A virtual environment creates an isolated Python installation for this project. This means:
- ✅ **No conflicts**: Project dependencies won't interfere with your system Python
- ✅ **Reproducible**: Same environment on every machine
- ✅ **Clean**: Easy to reset or remove if needed

**How to verify it's working:**
- Jupyter kernel should show `.venv` or your project name
- Running `uv run which python` (or `which python` after `source .venv/bin/activate`) should print a path inside `.venv`
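For the notebook, the check that matters is the one made from inside Python, because it reflects the kernel itself. Per PEP 405, a virtual environment's interpreter reports the environment directory as `sys.prefix`, while `sys.base_prefix` still points at the base installation. A minimal sketch follows (the function name is our own). It only confirms that *some* virtual environment is active. Check `sys.executable` to see whether it is this project's `.venv`.

```python
import sys


def in_virtual_environment() -> bool:
    """True when the running interpreter belongs to a virtual environment (PEP 405)."""
    # Inside a venv, sys.prefix is the environment directory while
    # sys.base_prefix still points at the Python installation it was created from
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtual_environment()}")
```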

## 2. Environment Configuration

### 2.1 Import Required Libraries

Before we begin the technical setup, let's import all the Python libraries we'll need. This consolidates our dependencies and makes it clear what tools we're using.

### Why We Need These Libraries
- **Standard Library**: Built-in Python modules for system operations, file handling, and process management
- **Third-party Libraries**: External packages for HTTP requests and database/cache connections
- **Optional Imports**: Libraries that enhance functionality but aren't strictly required for basic operation
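The cell below detects optional packages by importing them inside `try`/`except`. If you only need to know whether a package is installed, `importlib.util.find_spec` can tell you without running the package's import-time code. Below is a minimal sketch (the names are our own). It takes the *import* name, which for psycopg2 is `psycopg2` even when the installed distribution is `psycopg2-binary`.

```python
import importlib.util

# Import names (not distribution names) of the optional packages this notebook uses
OPTIONAL_PACKAGES = ["redis", "psycopg2"]


def package_available(import_name: str) -> bool:
    """Return True if `import_name` is importable, without importing it."""
    return importlib.util.find_spec(import_name) is not None


for package_name in OPTIONAL_PACKAGES:
    print(f"{package_name}: {'available' if package_available(package_name) else 'missing'}")
```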

In [19]:
# Standard Library Imports
import sys
import subprocess
import time
import json
from pathlib import Path

# Third-party Library Imports
import requests

# Optional imports (will check availability later)
try:
    import redis
    REDIS_AVAILABLE: bool = True
except ImportError:
    REDIS_AVAILABLE: bool = False

try:
    import psycopg2
    PSYCOPG2_AVAILABLE: bool = True
except ImportError:
    PSYCOPG2_AVAILABLE: bool = False

print("Libraries imported successfully.")
print(f"   Redis library available: {REDIS_AVAILABLE}")
print(f"   PostgreSQL library available: {PSYCOPG2_AVAILABLE}")

if not REDIS_AVAILABLE or not PSYCOPG2_AVAILABLE:
    print("\nSome libraries missing. Run 'uv sync' to install them.")

Libraries imported successfully.
   Redis library available: True
   PostgreSQL library available: True


### 2.2 System Configuration & Environment Validation

Before we begin the setup, let's define the configuration settings that will be used throughout this notebook. Understanding these settings helps you see how our microservices will communicate.

### Configuration Strategy
- **Development Mode**: We're using simple constants for easy understanding
- **Mirrors compose.yml**: These values match our `compose.yml` and will later move to a dedicated config file
- **Port Mapping**: Each service uses a specific port to avoid conflicts
- **Local Focus**: All services run on localhost for secure, local development
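Here is one possible shape for these constants once they move out of the notebook. This minimal sketch assumes a frozen dataclass whose values can be overridden by environment variables. The class name and every `CREDIT_OCR_*` variable name are hypothetical, and the project's actual config file may look different. Credentials such as `DATABASE_PASSWORD` are omitted here. They could be read the same way, which keeps them out of source control.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Local service settings; defaults mirror the constants used in this notebook."""
    database_host: str = "localhost"
    database_port: int = 5432
    database_name: str = "dms_meta"
    redis_host: str = "localhost"
    redis_port: int = 6379
    generative_model_url: str = "http://127.0.0.1:11435"
    model_name: str = "llama3.1:8b"

    @classmethod
    def from_env(cls) -> "ServiceConfig":
        """Read overrides from hypothetical CREDIT_OCR_* environment variables."""
        defaults = cls()
        env = os.environ.get
        return cls(
            database_host=env("CREDIT_OCR_DATABASE_HOST", defaults.database_host),
            database_port=int(env("CREDIT_OCR_DATABASE_PORT", defaults.database_port)),
            database_name=env("CREDIT_OCR_DATABASE_NAME", defaults.database_name),
            redis_host=env("CREDIT_OCR_REDIS_HOST", defaults.redis_host),
            redis_port=int(env("CREDIT_OCR_REDIS_PORT", defaults.redis_port)),
            generative_model_url=env("CREDIT_OCR_MODEL_URL", defaults.generative_model_url),
            model_name=env("CREDIT_OCR_MODEL_NAME", defaults.model_name),
        )
```

The dataclass is frozen so that code using the config cannot change settings part-way through a run.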

In [20]:
# Simple Configuration Settings
# These are the basic settings we'll use throughout this setup

# AI Model Configuration (will be moved to config file later)
GENERATIVE_MODEL_URL = "http://127.0.0.1:11435"
MODEL_NAME = "llama3.1:8b"

# Database Settings (from compose.yml)
DATABASE_HOST = "localhost"
DATABASE_PORT = 5432
DATABASE_NAME = "dms_meta"
DATABASE_USER = "dms"
DATABASE_PASSWORD = "dms"

# Other Service Settings
REDIS_HOST = "localhost"
REDIS_PORT = 6379
AZURITE_HOST = "localhost" 
AZURITE_PORT = 10000

print("Configuration loaded:")
print(f"   Generative Model: {MODEL_NAME}")
print(f"   Generative Model URL: {GENERATIVE_MODEL_URL}")
print(f"   Database: {DATABASE_NAME} on {DATABASE_HOST}:{DATABASE_PORT}")
print(f"   Message Broker (Redis): {REDIS_HOST}:{REDIS_PORT}")
print(f"   File Storage: {AZURITE_HOST}:{AZURITE_PORT}")
print()
print("All services configured for local development")
print("All data stays on your machine - no external dependencies")

Configuration loaded:
   Generative Model: llama3.1:8b
   Generative Model URL: http://127.0.0.1:11435
   Database: dms_meta on localhost:5432
   Message Broker (Redis): localhost:6379
   File Storage: localhost:10000

All services configured for local development
All data stays on your machine - no external dependencies


In [21]:
# Check Python Environment
python_version = sys.version_info
print(f"Python Version: {python_version.major}.{python_version.minor}.{python_version.micro}")
print(f"Python Path: {sys.executable}")

if python_version >= (3, 10):
    print("Python version is compatible")
else:
    print("Error: Python 3.10 or higher is required")
    print("Please install a newer Python version and try again")
    raise SystemExit(1)  # stops this cell without shutting down the Jupyter kernel

Python Version: 3.10.16
Python Path: /Users/markuskuehnle/Documents/projects/credit-ocr-system/.venv/bin/python
Python version is compatible


In [22]:
# Find Project Root Directory
current_directory = Path.cwd()

# Check if we are in the notebook subdirectory
if current_directory.name == "1-setup" and current_directory.parent.name == "notebooks":
    project_root_directory = current_directory.parent.parent
# Check if we are already in the project root
elif (current_directory / "compose.yml").exists():
    project_root_directory = current_directory
else:
    project_root_directory = None

if project_root_directory and (project_root_directory / "compose.yml").exists():
    print(f"Found project root: {project_root_directory}")
    print("Found compose.yml file")
else:
    print("Error: Cannot find compose.yml file")
    print("Make sure you are running this notebook from the correct directory")
    raise SystemExit(1)

Found project root: /Users/markuskuehnle/Documents/projects/credit-ocr-system
Found compose.yml file


In [23]:
# Check Docker Installation
try:
    docker_version_result = subprocess.run(["docker", "--version"], capture_output=True, text=True, timeout=5)
    if docker_version_result.returncode == 0:
        version_output = docker_version_result.stdout.strip()
        print(f"Docker is installed: {version_output}")
    else:
        print("Docker is installed but not working properly")
        print("Try restarting Docker Desktop")
        raise SystemExit(1)
except FileNotFoundError:
    print("Docker is not installed")
    print("Please install Docker Desktop from https://docs.docker.com/get-docker/")
    raise SystemExit(1)
except Exception as error:
    print(f"✗ Error checking Docker: {error}")
    raise SystemExit(1)

Docker is installed: Docker version 28.0.1, build 068a01e


In [24]:
# Check Docker Compose
try:
    compose_version_result = subprocess.run(["docker", "compose", "version"], capture_output=True, text=True, timeout=5)
    if compose_version_result.returncode == 0:
        compose_version = compose_version_result.stdout.split()[3]
        print(f"Docker Compose is available: {compose_version}")
    else:
        print("Docker Compose is not working properly")
        print("Make sure Docker Desktop is running")
        raise SystemExit(1)
except FileNotFoundError:
    print("Docker Compose command not found")
    print("Docker Compose should come with Docker Desktop")
    raise SystemExit(1)
except Exception as error:
    print(f"✗ Error checking Docker Compose: {error}")
    raise SystemExit(1)

Docker Compose is available: v2.33.1-desktop.1


In [25]:
# Check UV Package Manager
try:
    uv_version_result = subprocess.run(["uv", "--version"], capture_output=True, text=True, timeout=5)
    if uv_version_result.returncode == 0:
        uv_version = uv_version_result.stdout.strip()
        print(f"UV package manager is installed: {uv_version}")
        print("\nAll required software is ready!")
        print("\nNext step: Start the services with Docker Compose")
    else:
        print("UV package manager is not working properly")
        raise SystemExit(1)
except FileNotFoundError:
    print("UV package manager is not installed")
    print("Please install UV from https://docs.astral.sh/uv/getting-started/installation/")
    raise SystemExit(1)
except Exception as error:
    print(f"Error checking UV: {error}")
    raise SystemExit(1)

UV package manager is installed: uv 0.7.19 (38ee6ec80 2025-07-02)

All required software is ready!

Next step: Start the services with Docker Compose


## 3. Service Deployment & Health Monitoring

### 3.1 Service Orchestration with Docker Compose

Now we'll start all the services that make up our Credit OCR system. Docker Compose starts all four from a single file and connects them on a shared network.

### Understanding Docker Compose
Docker Compose reads our `compose.yml` file and:
- **Downloads images** (only on first run - may take 5-10 minutes)
- **Creates networks** for services to communicate securely
- **Starts containers** in the correct order with health checks
- **Manages dependencies** ensuring services start when their requirements are met
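To see which services `compose.yml` defines without starting anything, run `docker compose config --services`. It prints just the service names, one per line. The small sketch below wraps that command (the helper names are our own). For this project it should list the four services in the table below.

```python
import subprocess
from typing import List


def parse_service_names(config_output: str) -> List[str]:
    """Turn `docker compose config --services` output (one name per line) into a sorted list."""
    return sorted(line.strip() for line in config_output.splitlines() if line.strip())


def compose_service_names(project_dir: str) -> List[str]:
    """List the services defined by the compose.yml in `project_dir` without starting them."""
    result = subprocess.run(
        ["docker", "compose", "config", "--services"],
        cwd=project_dir,
        capture_output=True,
        text=True,
        timeout=15,
        check=True,  # raise CalledProcessError on a non-zero exit, e.g. an invalid compose.yml
    )
    return parse_service_names(result.stdout)
```

Once Section 2 has located the project root, call it as `compose_service_names(str(project_root_directory))`.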

### The Four Services We're Starting
| Service | Purpose | First-time Download | Startup Time |
|---------|---------|-------------------|--------------|
| **PostgreSQL** | Document metadata storage | ~200MB | ~15 seconds |
| **Redis** | Message broker for background jobs | ~50MB | ~5 seconds |
| **Ollama** | Local AI model server | ~500MB | ~30 seconds |
| **Azurite** | Local file storage emulator | ~100MB | ~10 seconds |

### Alternative Manual Method
If you prefer terminal commands, you can start services manually:

```bash
cd /path/to/your/project/root
docker compose up -d
```

**First-time setup note:** Downloads may take several minutes depending on internet speed. This is normal and only happens once.

In [26]:
# Check if Docker is Running
try:
    docker_info_result = subprocess.run(["docker", "info"], capture_output=True, timeout=5)
    if docker_info_result.returncode == 0:
        print("Docker is running and accessible")
    else:
        print("Docker is not running")
        print("Please start Docker Desktop and try again")
        raise SystemExit(1)
except Exception as error:
    print(f"Cannot access Docker: {error}")
    print("Make sure Docker Desktop is running")
    raise SystemExit(1)

Docker is running and accessible


The next cell runs `docker compose up -d` to start all services in the background (detached mode).

In [27]:
def start_docker_services() -> None:
    """Start all required docker services in detached mode"""
    docker_up_result: subprocess.CompletedProcess = subprocess.run(
        ["docker", "compose", "up", "-d"],
        cwd=str(project_root_directory),
        capture_output=True,
        text=True
    )
    if docker_up_result.returncode == 0:
        print("All services started successfully")
    else:
        print("Failed to start services")
        print(docker_up_result.stderr)


start_docker_services()

All services started successfully


In [28]:
# Check Current Running Services
try:
    containers_result = subprocess.run(
        ["docker", "compose", "ps", "--format", "json"],
        cwd=str(project_root_directory),
        capture_output=True,
        text=True,
        timeout=10
    )
    
    if containers_result.returncode == 0 and containers_result.stdout.strip():
        print("Current service status:")
        for line in containers_result.stdout.strip().split('\n'):
            if line.strip():
                try:
                    container_info = json.loads(line)
                    service_name = container_info.get('Service', 'unknown')
                    service_state = container_info.get('State', 'unknown')
                    
                    if service_state == 'running':
                        print(f"  ✓ {service_name}: {service_state}")
                    else:
                        print(f"  ✗ {service_name}: {service_state}")
                except json.JSONDecodeError:
                    continue
    else:
        print("No services are currently running")
        print("Run 'docker compose up -d' to start services")
        
except Exception as error:
    print(f"Could not check service status: {error}")

Current service status:
  ✓ azurite: running
  ✓ ollama: running
  ✓ postgres: running
  ✓ redis: running
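The cell above reads `docker compose ps --format json` one line at a time. That matches recent Compose v2 releases, including the v2.33 shown earlier, which print one JSON object per container per line. Some earlier v2 releases printed a single JSON array instead, and the line-by-line loop does not handle that format. If you need to support both, a parser along these lines works (the function name is our own):

```python
import json
from typing import Any, Dict, List


def parse_compose_ps(output: str) -> List[Dict[str, Any]]:
    """Parse `docker compose ps --format json` output into a list of container dicts.

    Accepts both one JSON object per line and a single JSON array.
    """
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```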


### 3.2 Service Health Monitoring

Our Credit OCR system uses 4 services that work together. Let's check if they are all running properly:

- **postgres**: Database for storing document metadata
- **redis**: Message broker for background job processing
- **ollama**: AI model server for text analysis
- **azurite**: File storage for uploaded documents
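A status table is a one-off snapshot. On a first run, most healthchecks will still report `starting`, as in the output below. Instead of re-running cells by hand, you can poll a TCP port until it accepts connections or a deadline passes, as in the minimal sketch below (the helper name is our own). An open port only shows that the process is listening, not that it is fully ready; Ollama, for example, may still be loading. Treat this as a first gate before the functional tests in Section 5.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_seconds: float = 120.0,
                  interval_seconds: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_seconds):
                return True  # something is listening on the port
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_seconds)
```

For example, `wait_for_port(DATABASE_HOST, DATABASE_PORT)` returns `True` as soon as PostgreSQL's port accepts connections.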

In [29]:
# Check Each Service Health Status
expected_services_for_credit_ocr = {
    'postgres': 'PostgreSQL database for document storage',
    'redis': 'Redis cache for background job processing',
    'ollama': 'Local AI model server for text analysis',
    'azurite': 'Azure Blob Storage emulator for file storage'
}

try:
    service_status_result = subprocess.run(
        ["docker", "compose", "ps", "--format", "json"],
        cwd=str(project_root_directory),
        capture_output=True,
        text=True,
        timeout=15
    )
    
    if service_status_result.returncode == 0:
        print("CREDIT OCR SYSTEM - SERVICE STATUS")
        print("=" * 70)
        print(f"{'Service':<15} {'State':<12} {'Health':<12} {'Description'}")
        print("-" * 70)
    else:
        print("Could not get service status")
        print("Make sure you ran 'docker compose up -d'")
        raise SystemExit(1)
        
except Exception as error:
    print(f"Error checking services: {error}")
    raise SystemExit(1)

# Parse and display service status
running_services = set()
service_status_info = {}

if service_status_result.stdout.strip():
    for line in service_status_result.stdout.strip().split('\n'):
        if line.strip():
            try:
                container_data = json.loads(line)
                service_name = container_data.get('Service', 'unknown')
                service_state = container_data.get('State', 'unknown')
                # Compose reports an empty Health value for services without a healthcheck
                health_status = container_data.get('Health') or 'no check'
                
                running_services.add(service_name)
                service_status_info[service_name] = {'state': service_state, 'health': health_status}
                
                if service_state == 'running' and health_status in ['healthy', 'no check']:
                    status_indicator = "✓"  # ready for use
                elif service_state == 'running' and health_status == 'starting':
                    status_indicator = "⚠"  # container is up, healthcheck has not passed yet
                elif service_state == 'exited' or health_status == 'unhealthy':
                    status_indicator = "✗"  # failed to start or failing its healthcheck
                else:
                    status_indicator = "?"  # any other state, e.g. 'restarting'
                
                service_description = expected_services_for_credit_ocr.get(service_name, "Unknown service")
                print(f"{status_indicator} {service_name:<13} {service_state:<11} {health_status:<11} {service_description}")
                
            except json.JSONDecodeError:
                continue

CREDIT OCR SYSTEM - SERVICE STATUS
Service         State        Health       Description
----------------------------------------------------------------------
⚠ azurite       running     starting    Azure Blob Storage emulator for file storage
⚠ ollama        running     starting    Local AI model server for text analysis
⚠ postgres      running     starting    PostgreSQL database for document storage
⚠ redis         running     starting    Redis cache for background job processing


In [30]:
# Check for Missing or Failed Services
missing_services = set(expected_services_for_credit_ocr.keys()) - running_services

if missing_services:
    print("\nMISSING SERVICES:")
    print("-" * 50)
    for missing_service in missing_services:
        service_description = expected_services_for_credit_ocr[missing_service]
        print(f"✗ {missing_service:<15} Not running - {service_description}")

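failed_services = [service for service, info in service_status_info.items() 
                  if info['state'] in ['exited', 'restarting'] or info['health'] == 'unhealthy']

# Containers that are up but whose healthcheck has not passed yet are not ready to use
starting_services = sorted(service for service, info in service_status_info.items()
                           if info['health'] == 'starting')

if failed_services:
    print(f"\nTROUBLESHOOTING FAILED SERVICES:")
    print("-" * 50)
    for failed_service in failed_services:
        print(f"Check logs: docker compose logs {failed_service}")

if missing_services or failed_services:
    print(f"\nACTION NEEDED:")
    print("1. Start/restart services: docker compose up -d")
    print("2. Wait 1-2 minutes for services to start up")
    print("3. Re-run the status cell above and then this cell")
elif starting_services:
    print(f"\n⚠ Still starting up: {', '.join(starting_services)}")
    print("Wait 1-2 minutes, then re-run the status cell above and this cell")
else:
    print(f"\n✓ All Credit OCR services are running successfully!")
    print("Your system is ready for document processing.")


⚠ Still starting up: azurite, ollama, postgres, redis
Wait 1-2 minutes, then re-run the status cell above and this cell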


## 4. AI Model Installation & Configuration

Now that Ollama is running, let's install the AI model we'll use for credit document analysis. We'll use **Llama 3.1 8B** (`llama3.1:8b`), a general-purpose model that handles text analysis and information extraction well while remaining small enough to run locally.

**About llama3.1:8b:**
- Size: ~4.9GB download (the Ollama test in Section 5 reports 4.6 GB because it divides by 1024³)
- Memory usage: ~8GB RAM when loaded
- Performance: Good at text analysis and information extraction
- Speed: Depends heavily on your hardware; CPU-only inference (for example Ollama in Docker on macOS, which gets no GPU access) is noticeably slower than GPU inference
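The download size follows from simple arithmetic: weight storage ≈ parameters × bits per weight ÷ 8 bytes. The rough sketch below makes two assumptions. First, Llama 3.1 8B has about 8.03 billion parameters. Second, the weights average roughly 4.85 bits each, which is typical of 4-bit "K-quant" formats such as Q4_K_M that keep some tensors at higher precision. Which quantization the default `llama3.1:8b` tag uses is not confirmed here.

```python
def estimate_weights_gib(parameter_count: float, bits_per_weight: float) -> float:
    """Approximate storage for quantized weights in GiB: parameters * bits / 8 bytes."""
    size_bytes = parameter_count * bits_per_weight / 8
    return size_bytes / 1024**3


# Assumed values: ~8.03e9 parameters, ~4.85 average bits per weight (4-bit K-quant)
print(f"Estimated weights: {estimate_weights_gib(8.03e9, 4.85):.2f} GiB")
```

This prints about 4.53 GiB, close to the 4.6 GB (really GiB) that the Ollama test in Section 5 reports. Memory use once the model is loaded is higher than the file size, because Ollama also allocates a context (KV) cache.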

> The next cell will download and install the model if needed. This process may take several minutes, especially on a slow connection.

In [31]:
def model_is_installed(model_name: str) -> bool:
    """Check if the model is already installed in Ollama"""
    try:
        response = requests.get(f"{GENERATIVE_MODEL_URL}/api/tags", timeout=5)
        if response.status_code != 200:
            return False
        models = response.json().get("models", [])
        return any(model.get("name") == model_name for model in models)
    except Exception:
        return False

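def install_model(model_name: str) -> bool:
    """Pull the model via Ollama's API; progress is streamed as one JSON object per line"""
    try:
        with requests.post(f"{GENERATIVE_MODEL_URL}/api/pull", json={"name": model_name},
                           stream=True, timeout=600) as response:  # timeout applies between chunks
            if response.status_code != 200:
                print(f"Ollama returned HTTP {response.status_code}")
                return False
            last_status = ""
            for line in response.iter_lines():
                if not line:
                    continue
                progress = json.loads(line)
                # Errors can arrive mid-stream even though the HTTP status was 200
                if "error" in progress:
                    print(f"Error installing model: {progress['error']}")
                    return False
                last_status = progress.get("status", last_status)
            return last_status == "success"  # Ollama's final message on a completed pull
    except Exception as error:
        print(f"Error installing model: {error}")
        return False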


print("LLAMA3.1:8B MODEL CHECK")
print("=" * 50)
print(f"Model: {MODEL_NAME}")
print(f"Ollama URL: {GENERATIVE_MODEL_URL}")

if model_is_installed(MODEL_NAME):
    print(f"Model '{MODEL_NAME}' is already installed.")
else:
    print(f"Model '{MODEL_NAME}' not found. Installing...")
    if not install_model(MODEL_NAME):
        print("Model installation failed.")
        raise SystemExit(1)
    print("Model installed successfully.")

print("=" * 50)

LLAMA3.1:8B MODEL CHECK
Model: llama3.1:8b
Ollama URL: http://127.0.0.1:11435
Model 'llama3.1:8b' is already installed.


## 5. Service Testing & Validation

Now let's test each service individually to make sure they are working correctly. This will help you understand what each service does in our Credit OCR system.

### A) PostgreSQL Database

PostgreSQL stores all the structured data for our system:
- Document metadata (filename, upload date, processing status)
- Extracted text and data from credit documents
- User information and processing history

Let's test the database connection:

In [32]:
# Test PostgreSQL Database Connection
import socket


def test_postgres_connection() -> bool:
    """Test if PostgreSQL is accessible and responding"""
    # Step 1: check that the port accepts TCP connections at all
    try:
        with socket.create_connection((DATABASE_HOST, DATABASE_PORT), timeout=3):
            pass
    except OSError:
        print(f"✗ PostgreSQL port {DATABASE_PORT} is not accessible")
        print("Make sure the postgres service is running")
        return False
    print(f"✓ PostgreSQL is accepting connections on port {DATABASE_PORT}")

    # Step 2: log in with the credentials from compose.yml (requires psycopg2)
    if not PSYCOPG2_AVAILABLE:
        print("⚠ psycopg2 not installed - basic connection test only")
        return True

    try:
        database_connection = psycopg2.connect(
            host=DATABASE_HOST,
            port=DATABASE_PORT,
            dbname=DATABASE_NAME,
            user=DATABASE_USER,
            password=DATABASE_PASSWORD,
            connect_timeout=5,
        )
    except Exception as db_error:
        print(f"✗ Database connection failed: {db_error}")
        return False

    # Step 3: run a simple query; always close the connection afterwards
    try:
        print(f"✓ Successfully connected to database '{DATABASE_NAME}'")
        with database_connection.cursor() as cursor:
            cursor.execute("SELECT version();")
            postgres_version = cursor.fetchone()[0]
        # version() starts with e.g. "PostgreSQL 15.14"; keep the first two words
        print(f"✓ PostgreSQL version: {' '.join(postgres_version.split()[:2])}")
        return True
    except Exception as query_error:
        print(f"✗ Test query failed: {query_error}")
        return False
    finally:
        database_connection.close()


postgres_is_working = test_postgres_connection()

if postgres_is_working:
    print("\nDatabase connection details:")
    print(f"• Host: {DATABASE_HOST}")
    print(f"• Port: {DATABASE_PORT}")
    print(f"• Database: {DATABASE_NAME}")
    print(f"• Username: {DATABASE_USER}")
    print(f"• Password: {DATABASE_PASSWORD}")

✓ PostgreSQL is accepting connections on port 5432
✓ Successfully connected to database 'dms_meta'
✓ PostgreSQL version: PostgreSQL 15.14

Database connection details:
• Host: localhost
• Port: 5432
• Database: dms_meta
• Username: dms
• Password: dms
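Later parts of the pipeline may want these details as a single connection string rather than separate keyword arguments. libpq, and therefore `psycopg2.connect`, accepts a URI of the form `postgresql://user:password@host:port/dbname`. The small builder below percent-encodes the parts that could contain reserved characters such as `@` or `:` (the helper name is our own).

```python
from urllib.parse import quote


def postgres_uri(host: str, port: int, database: str, user: str, password: str) -> str:
    """Build a libpq connection URI; user, password and database are percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )
```

With this notebook's settings, `postgres_uri(DATABASE_HOST, DATABASE_PORT, DATABASE_NAME, DATABASE_USER, DATABASE_PASSWORD)` gives `postgresql://dms:dms@localhost:5432/dms_meta`.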


### B) Redis Message Broker

Redis serves as the message broker for Celery background job processing in our Credit OCR system:
- Manages document processing task queues
- Handles job distribution across background workers
- Tracks processing status and temporary results
- Enables asynchronous document processing without blocking the UI

Let's test Redis connectivity:

In [33]:
# Test Redis Connection
import socket


def test_redis_connection() -> bool:
    """Test if Redis is accessible and responding"""
    # Step 1: check that the port accepts TCP connections at all
    try:
        with socket.create_connection((REDIS_HOST, REDIS_PORT), timeout=3):
            pass
    except OSError:
        print(f"Redis port {REDIS_PORT} is not accessible")
        print("Make sure the redis service is running")
        return False
    print(f"Redis is accepting connections on port {REDIS_PORT}")

    # Step 2: talk to Redis through the client library (requires redis-py)
    if not REDIS_AVAILABLE:
        print("⚠ redis-py not installed - basic connection test only")
        return True

    try:
        redis_client = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, decode_responses=True, socket_timeout=5)
        if not redis_client.ping():
            print("✗ Redis ping failed")
            return False
        print("✓ Successfully connected to Redis")

        # Round-trip a throwaway key; ex=60 makes it expire even if the cleanup never runs
        redis_client.set('test_key', 'test_value', ex=60)
        retrieved_value = redis_client.get('test_key')
        redis_client.delete('test_key')
        if retrieved_value != 'test_value':
            print("✗ Redis read/write test failed")
            return False
        print("✓ Redis read/write operations working")
        return True
    except Exception as redis_error:
        print(f"✗ Redis connection failed: {redis_error}")
        return False


redis_is_working = test_redis_connection()

if redis_is_working:
    print("\nRedis connection details:")
    print(f"• Host: {REDIS_HOST}")
    print(f"• Port: {REDIS_PORT}")
    print("• Used for: Background job queue and caching")

Redis is accepting connections on port 6379
✓ Successfully connected to Redis
✓ Redis read/write operations working

Redis connection details:
• Host: localhost
• Port: 6379
• Used for: Background job queue and caching
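Celery and redis-py's `redis.Redis.from_url` both take the Redis location as a single URL of the form `redis://host:port/db`, where `db` selects one of Redis's numbered logical databases. A small builder is shown below. Splitting the broker and the result backend across databases 0 and 1 is an assumption about how the workers might be configured; this notebook does not set that up.

```python
def redis_url(host: str, port: int, db: int = 0) -> str:
    """Build a redis://host:port/db URL, the form accepted by redis-py's from_url and by Celery."""
    if db < 0:
        raise ValueError("Redis database numbers start at 0")
    return f"redis://{host}:{port}/{db}"


# Hypothetical layout: Celery messages in database 0, task results in database 1
BROKER_URL = redis_url("localhost", 6379, db=0)
RESULT_BACKEND_URL = redis_url("localhost", 6379, db=1)
print(BROKER_URL, RESULT_BACKEND_URL)
```

A Celery app would receive these as its `broker=` and `backend=` arguments.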


### C) Ollama LLM

Ollama runs large language models (LLMs) locally for text analysis in our Credit OCR system:
- Processes extracted text from documents
- Analyzes credit-related information
- Runs completely on your local machine (no external API calls)
- Privacy-focused - your data never leaves your computer

**Note:** The Ollama service is mapped to host port 11435 instead of Ollama's default 11434, so it does not conflict with an Ollama instance installed directly on your machine.

Let's test Ollama:

In [34]:
# Test Ollama LLM Model
# Note: Using port 11435 as defined in compose.yml
def test_ollama_service():
    """Test if Ollama is running and accessible"""
    
    # Test 1: Check if Ollama API is responding
    try:
        version_response = requests.get(f"{GENERATIVE_MODEL_URL}/api/version", timeout=10)
        if version_response.status_code == 200:
            version_data = version_response.json()
            version = version_data.get('version', 'unknown')
            print("✓ Ollama API is responding")
            print(f"✓ Ollama version: {version}")
        else:
            print(f"⚠ Ollama API returned status: {version_response.status_code}")
            return False
    except requests.exceptions.ConnectionError:
        print("✗ Ollama is not responding on port 11435")
        print("Make sure the ollama service is running")
        return False
    except requests.exceptions.Timeout:
        print("✗ Ollama request timed out")
        print("Service might still be starting up")
        return False
    except Exception as error:
        print(f"✗ Error testing Ollama: {error}")
        return False
    
    # Test 2: Check available models
    try:
        models_response = requests.get(f"{GENERATIVE_MODEL_URL}/api/tags", timeout=10)
        if models_response.status_code == 200:
            models_data = models_response.json()
            available_models = models_data.get('models', [])
            
            print(f"✓ Found {len(available_models)} installed models")
            
            if available_models:
                print("\nInstalled AI models:")
                for model in available_models:
                    model_name = model.get('name', 'unknown')
                    model_size = model.get('size', 0)
                    size_in_gb = round(model_size / (1024**3), 1)
                    print(f"  • {model_name} ({size_in_gb} GB)")
            else:
                print("\nNo models installed yet")
                print("You can install models later for text analysis")
                print(f"Example: docker compose exec ollama ollama pull {MODEL_NAME}")
            
            return True
        else:
            print(f"⚠ Could not get model list: {models_response.status_code}")
            return False
            
    except Exception as models_error:
        print(f"⚠ Could not check models: {models_error}")
        return True  # Still consider success if API is working


ollama_is_working = test_ollama_service()

if ollama_is_working:
    print("\nOllama connection details:")
    print("• Host: localhost")
    print("• Port: 11435") 
    print("• Used for: Local AI text analysis")
    print("• Privacy: All processing happens locally")

✓ Ollama API is responding
✓ Ollama version: 0.5.13
✓ Found 1 installed models

Installed AI models:
  • llama3.1:8b (4.6 GB)

Ollama connection details:
• Host: localhost
• Port: 11435
• Used for: Local AI text analysis
• Privacy: All processing happens locally
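The test above confirms that the API answers, but it never asks the model anything. The minimal sketch below sends a first real request to Ollama's `/api/generate` endpoint. With `"stream": false`, Ollama returns a single JSON object whose `response` field holds the generated text. The sketch uses only the standard library, and the function names are our own. The first call can take a while, because Ollama loads the model into memory on demand.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send one prompt and return the generated text from the 'response' field."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]
```

For example: `generate(GENERATIVE_MODEL_URL, MODEL_NAME, "Reply with the single word OK.")`.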


### D) Azurite Blob Storage

Azurite is a local Azure Blob Storage emulator for our Credit OCR system:
- Stores uploaded document files (PDFs, images)
- Provides the same API as Azure Blob Storage
- Runs locally for development and testing
- No external dependencies or cloud accounts needed
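Azure Storage SDKs usually connect through a connection string. Azurite accepts the well-known development account `devstoreaccount1` and its publicly documented account key, which is meant for local emulation only. You can therefore build a connection string for the local blob endpoint from the host and port in our configuration. Whether this project uses the `azure-storage-blob` SDK is an assumption. If it does, `BlobServiceClient.from_connection_string(...)` accepts the result.

```python
# Well-known development account and key that Azurite accepts (public; local use only)
AZURITE_ACCOUNT_NAME = "devstoreaccount1"
AZURITE_ACCOUNT_KEY = (
    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
)


def azurite_connection_string(host: str = "127.0.0.1", port: int = 10000) -> str:
    """Build an Azure Storage connection string that targets Azurite's blob endpoint."""
    blob_endpoint = f"http://{host}:{port}/{AZURITE_ACCOUNT_NAME}"
    return (
        "DefaultEndpointsProtocol=http;"
        f"AccountName={AZURITE_ACCOUNT_NAME};"
        f"AccountKey={AZURITE_ACCOUNT_KEY};"
        f"BlobEndpoint={blob_endpoint};"
    )
```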

Let's test Azurite:

In [35]:
# Test Azurite Blob Storage
azurite_base_url = f"http://{AZURITE_HOST}:{AZURITE_PORT}"

def test_azurite_service():
    """Test if Azurite blob storage is running and accessible"""
    
    try:
        # Test if Azurite is responding to the service endpoint
        service_response = requests.get(f"{azurite_base_url}/devstoreaccount1", timeout=10)
        
        if service_response.status_code == 400:
            # Status 400 is expected when accessing the root - it means Azurite is running
            print("✓ Azurite blob storage is responding")
            print("✓ Service is accessible on port 10000")
            
            # Try to test blob service info
            blob_service_response = requests.get(f"{azurite_base_url}/devstoreaccount1?comp=properties&restype=service", timeout=5)
            if blob_service_response.status_code in [200, 400, 403]:
                print("✓ Azurite blob service API is working")
                return True
            else:
                print(f"⚠ Azurite blob service returned: {blob_service_response.status_code}")
                return True  # Still consider success since main service is responding
                
        else:
            print(f"⚠ Azurite returned unexpected status: {service_response.status_code}")
            return False
            
    except requests.exceptions.ConnectionError:
        print("✗ Azurite is not responding on port 10000")
        print("Make sure the azurite service is running")
        return False
    except requests.exceptions.Timeout:
        print("✗ Azurite request timed out")
        print("Service might still be starting up")
        return False
    except Exception as error:
        print(f"✗ Error testing Azurite: {error}")
        return False

azurite_is_working = test_azurite_service()

if azurite_is_working:
    print("\nAzurite connection details:")
    print("• Host: localhost")
    print("• Port: 10000")
    print("• Account: devstoreaccount1 (default development account)")
    print("• Used for: Local file storage for uploaded documents")
    print("• Compatible with Azure Blob Storage APIs")

✗ Azurite is not responding on port 10000
Make sure the azurite service is running
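The Azurite check failed with a connection error, even though `docker compose ps` listed the container as running with its healthcheck still `starting`. That points to start-up timing, but the output alone cannot rule out a port-mapping problem in `compose.yml`. One way to tell the two apart is to re-run a check a few times with a pause in between. The minimal sketch below does that (the helper is our own).

```python
import time
from typing import Callable


def retry_check(check: Callable[[], bool], attempts: int = 6, delay_seconds: float = 10.0) -> bool:
    """Run `check` up to `attempts` times, pausing between failures; True on first success."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            print(f"Attempt {attempt}/{attempts} failed; retrying in {delay_seconds:g}s")
            time.sleep(delay_seconds)
    return False
```

For example, `azurite_is_working = retry_check(test_azurite_service)` keeps trying for about a minute. If the check still fails, `docker compose logs azurite` shows what the container reported.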


## Summary of Service Tests

Let's summarize the status of all services in our Credit OCR system:

# Create Summary Report
print("=" * 60)
print("CREDIT OCR SYSTEM - SETUP SUMMARY")
print("=" * 60)

# Map each display name to the flag variable set by its test cell
service_flags = {
    'PostgreSQL Database': 'postgres_is_working',
    'Redis Cache': 'redis_is_working',
    'Ollama AI Models': 'ollama_is_working',
    'Azurite Blob Storage': 'azurite_is_working',
}

# A service whose test cell has not been run yet counts as not working
services_status = {
    service_name: bool(globals().get(flag_name, False))
    for service_name, flag_name in service_flags.items()
}
all_services_working = all(services_status.values())

# Display status for each service
for service_name, is_working in services_status.items():
    status_symbol = "✓" if is_working else "✗"
    status_text = "Working" if is_working else "Not working"
    print(f"{status_symbol} {service_name:<25} {status_text}")

print("-" * 60)

if all_services_working:
    print("SUCCESS! All services are working correctly")
    print("\nYour Credit OCR system is ready for:")
    print("• Document upload and storage")
    print("• Text extraction and processing") 
    print("• AI-powered content analysis")
    print("• Background job processing")
else:
    print("⚠ Some services need attention")
    print("\nNext steps:")
    print("1. Make sure all services are running: docker compose ps")
    print("2. Start missing services: docker compose up -d")
    print("3. Wait 1-2 minutes and re-run the test cells")
    print("4. Check logs if issues persist: docker compose logs [service-name]")

CREDIT OCR SYSTEM - SETUP SUMMARY
✓ PostgreSQL Database       Working
✓ Redis Cache               Working
✓ Ollama AI Models          Working
✗ Azurite Blob Storage      Not working
------------------------------------------------------------
⚠ Some services need attention

Next steps:
1. Make sure all services are running: docker compose ps
2. Start missing services: docker compose up -d
3. Wait 1-2 minutes and re-run the test cells
4. Check logs if issues persist: docker compose logs [service-name]
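
Rather than waiting a fixed 1-2 minutes before re-running the test cells, you can poll each port until it accepts connections. This is a minimal standard-library sketch; the helper names `wait_for_port` and `wait_for_services` and their timeout defaults are illustrative assumptions, not part of the project. A successful connection only shows that something is listening on the port, not that the service behind it has finished initialising, so the test cells remain the real check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises an OSError subclass (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            # Sleep between attempts, but never past the deadline
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))


def wait_for_services(ports: dict[str, int], host: str = "localhost",
                      timeout: float = 120.0) -> dict[str, bool]:
    """Wait for each named port in turn (sequentially) and report which ones came up."""
    return {name: wait_for_port(host, port, timeout=timeout) for name, port in ports.items()}
```

For example, `wait_for_services({"PostgreSQL": 5432, "Redis": 6379, "Ollama": 11435, "Azurite": 10000})` uses the ports listed in this notebook; re-run the test cells once every entry is `True`.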


> When every row of the summary shows ✓ Working, the services defined in `compose.yml` are configured correctly.

---

## 6. Setup Complete

✅ **Once every row of the summary shows ✓ Working, all services are running and tested**

Your Credit OCR system infrastructure is ready. 

> **📖 For troubleshooting, service access details, and development guidance**, see the [README.md](./README.md)

**Next:** Continue to the next notebook for document processing implementation.



---

## Service Access Information

Once all services are running, you can access them at these locations:

### Database Access
- **Host**: localhost
- **Port**: 5432
- **Database**: dms_meta
- **Username**: dms
- **Password**: dms
- **Recommended Tool**: DBeaver (free database client)
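
Most PostgreSQL clients also accept these settings as a single connection URL of the form `postgresql://user:password@host:port/database` (psql, psycopg and SQLAlchemy all understand it). The minimal sketch below builds one with the standard library; `postgres_url` is a hypothetical helper, and the percent-encoding only matters once a username or password contains characters such as `@` or `:`.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, port: int, database: str,
                 scheme: str = "postgresql") -> str:
    """Build a postgresql:// connection URL with percent-encoded credentials."""
    # safe="" makes quote() escape '/', '@' and ':' too, which would otherwise
    # be read as URL delimiters
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Values from the Database Access list (local development defaults only)
DMS_DATABASE_URL = postgres_url("dms", "dms", "localhost", 5432, "dms_meta")
print(DMS_DATABASE_URL)  # → postgresql://dms:dms@localhost:5432/dms_meta
```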

### File Storage Access  
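- **Blob endpoint**: http://localhost:10000/devstoreaccount1
- **Account**: devstoreaccount1 (Azurite's default development account)
- **Recommended Tool**: Azure Storage Explorer, connected to the local storage emulator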
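
Code that talks to Azurite through the Azure SDK usually needs a connection string. Azurite ships with a publicly documented development account, `devstoreaccount1`, and a well-known account key; the sketch below assembles a connection string from them for the blob port used in this setup. The helper names are hypothetical, and these credentials are only meaningful for the local emulator.

```python
# Azurite's publicly documented development credentials.
# They are NOT secret and must never be used outside local development.
AZURITE_ACCOUNT_NAME = "devstoreaccount1"
AZURITE_ACCOUNT_KEY = (
    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/"
    "K1SZFPTOtr/KBHBeksoGMGw=="
)


def azurite_connection_string(host: str = "localhost", blob_port: int = 10000) -> str:
    """Return an Azure Storage connection string for a local Azurite blob endpoint."""
    parts = {
        "DefaultEndpointsProtocol": "http",
        "AccountName": AZURITE_ACCOUNT_NAME,
        "AccountKey": AZURITE_ACCOUNT_KEY,
        # Azurite uses path-style URLs: the account name is part of the path
        "BlobEndpoint": f"http://{host}:{blob_port}/{AZURITE_ACCOUNT_NAME}",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split 'Key=Value;...' pairs; partition() keeps the '=' padding inside the key."""
    pairs = (item.partition("=") for item in conn_str.split(";") if item)
    return {key: value for key, _, value in pairs}
```

If the `azure-storage-blob` package is installed (an assumption; check the project's dependencies), `BlobServiceClient.from_connection_string(azurite_connection_string())` returns a client for the local emulator.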

### AI Model Server
- **Ollama API**: http://localhost:11435
- **Example**: Install models with `docker exec ollama ollama pull llama3.1:8b`
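
The list that `ollama list` prints is also available over Ollama's HTTP API: `GET /api/tags` returns a JSON object whose `models` array describes each installed model, including its `name`. The standard-library sketch below reads that list and reports which required models still need pulling. The base URL follows the port listed here; the helper names and the idea of a required-model list are illustrative assumptions.

```python
import json
from urllib.request import urlopen

OLLAMA_BASE_URL = "http://localhost:11435"  # host port as listed in this notebook


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an Ollama GET /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def fetch_installed_models(base_url: str = OLLAMA_BASE_URL, timeout: float = 10.0) -> list[str]:
    """Query the running Ollama server for its locally installed models."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return model_names(json.load(response))


def missing_models(required: list[str], installed: list[str]) -> list[str]:
    """Return the required model tags that are not installed yet."""
    installed_set = set(installed)
    return [name for name in required if name not in installed_set]
```

Names include the tag, so compare against the exact string you pulled (for example `llama3.1:8b`); a model pulled without an explicit tag is listed as `name:latest`.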

### Redis Cache
- **Host**: localhost  
- **Port**: 6379
- **Tool**: Redis CLI or RedisInsight
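
If neither Redis CLI nor RedisInsight is at hand, a `PING` sent over a plain socket is enough to confirm that Redis is answering. Redis clients encode each command as a RESP array of bulk strings, and a healthy server replies to `PING` with the simple string `+PONG`. This is a minimal sketch that assumes no password is configured (none is listed above); with authentication enabled the server answers with an error instead, and the check returns `False`. The helper names are hypothetical.

```python
import socket


def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings, as Redis clients send it."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        # Each bulk string is "$<byte length>\r\n<bytes>\r\n"
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 5.0) -> bool:
    """Send PING and return True if the server replies with +PONG."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_resp_command("PING"))
        # The 7-byte "+PONG\r\n" reply normally arrives in a single read
        reply = conn.recv(64)
    return reply.startswith(b"+PONG")
```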

---

## Quick Commands Reference

Here are the most important commands you'll use while working with your Credit OCR system:

### Docker Compose Commands
```bash
# Start all services in background
docker compose up -d

# Check service status
docker compose ps

# View service logs
docker compose logs [service-name]

# Stop all services
docker compose down

# Restart specific service
docker compose restart [service-name]
```
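
`docker compose ps` can also emit JSON, which makes it straightforward to check from Python which services are down. The sketch below assumes Docker Compose v2: recent releases print one JSON object per line for `--format json`, while older v2 releases printed a single JSON array, so both shapes are accepted. The helper names are hypothetical, and so are the service names in the usage note; use the names defined in your `compose.yml`.

```python
import json
import subprocess


def parse_compose_ps(output: str) -> dict[str, str]:
    """Map service name -> state from `docker compose ps --format json` output."""
    text = output.strip()
    if not text:
        return {}
    if text.startswith("["):
        # Older Compose v2 releases: one JSON array
        entries = json.loads(text)
    else:
        # Newer releases: one JSON object per line
        entries = [json.loads(line) for line in text.splitlines() if line.strip()]
    return {entry["Service"]: entry["State"] for entry in entries}


def services_not_running(expected: list[str], states: dict[str, str]) -> list[str]:
    """Return expected services that are missing or not in the 'running' state."""
    return [name for name in expected if states.get(name) != "running"]


def compose_states() -> dict[str, str]:
    """Run `docker compose ps` in the current directory (which must contain compose.yml)."""
    result = subprocess.run(
        ["docker", "compose", "ps", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_compose_ps(result.stdout)
```

For example, `services_not_running(["postgres", "redis", "ollama", "azurite"], compose_states())` lists the services to start with `docker compose up -d`. Because `docker compose ps` hides stopped containers unless `--all` is passed, a stopped service shows up as missing, which the check also treats as not running.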

### Database Commands
```bash
# Connect to PostgreSQL
docker exec -it postgres psql -U dms -d dms_meta

# Backup database
docker exec postgres pg_dump -U dms dms_meta > backup.sql
```

### Ollama AI Commands
```bash
# List installed models
docker exec ollama ollama list

# Install a new model
docker exec ollama ollama pull llama3.2:1b

# Remove a model
docker exec ollama ollama rm model-name
```

---

## Summary

You have successfully completed the setup of your Credit OCR system infrastructure! 

**What's Running:**
- PostgreSQL database for structured data storage
- Redis message broker for background job processing  
- Ollama LLM server for text analysis capabilities
- Azurite blob storage for document file storage

**Important Notes:**
- Keep services running while developing your application
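- Services restart automatically with Docker Desktop only if `compose.yml` gives them a restart policy (for example `restart: unless-stopped`); after `docker compose down`, start them again with `docker compose up -d`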
- All data is stored locally on your machine
- No external cloud services or API keys required

**You're Ready For:**
- Document upload and storage
- OCR text extraction implementation  
- AI-powered document analysis
- Building REST APIs for your application

Continue to the next notebook to start building your Credit OCR application!