# PaperAlchemy - Week 1: Infrastructure Setup

Build a production-grade RAG system for academic paper discovery using Docker, PostgreSQL, OpenSearch, FastAPI, Airflow, Ollama, and Langfuse.

## Technology Stack
| Component | Purpose | Port |
|-----------|---------|------|
| **FastAPI** | REST API | 8000 |
| **PostgreSQL** | Paper metadata storage | 5433 |
| **Redis** | Caching layer | 6380 |
| **OpenSearch** | Hybrid search engine | 9201 (API) / 5602 (Dashboards) |
| **Apache Airflow** | Workflow automation | 8080 |
| **Ollama** | Local LLM inference | 11434 |
| **Langfuse** | RAG monitoring & tracing | 3001 |
| **ClickHouse** | Analytics database | - |

## Learning Materials

**Core Technologies:**
- **Docker**: [Tutorial Video](https://www.youtube.com/watch?v=pg19Z8LL06w) | [Docker Compose](https://www.youtube.com/watch?v=SXwC9fSwct8)
- **FastAPI**: [YouTube Series](https://www.youtube.com/playlist?list=PLK8U0kF0E_D6l19LhOGWhVZ3sQ6ujJKq_) | [Documentation](https://fastapi.tiangolo.com/tutorial/)
- **PostgreSQL**: [Beginners Guide](https://www.youtube.com/watch?v=SpfIwlAYaKk) | [FastAPI + PostgreSQL](https://www.youtube.com/watch?v=398DuQbQJq0)
- **OpenSearch**: [Getting Started](https://docs.opensearch.org/latest/getting-started/)
- **Apache Airflow**: [Tutorial Video](https://www.youtube.com/watch?v=Y_vQyMljDsE)
- **Langfuse**: [Documentation](https://langfuse.com/docs)

**Development Tools:**
- **VS Code Setup**: [Video Guide](https://www.youtube.com/watch?v=mpk4Q5feWaw)
- **Git Basics**: [Tutorial](https://www.youtube.com/watch?v=zTjRZNkhiEU)
- **UV Package Manager**: [Setup Video](https://www.youtube.com/watch?v=AMdG7IjgSPM)

## Prerequisites

**Required Software:**
- Python 3.12+ ([Download](https://www.python.org/downloads/))
- UV Package Manager ([Install Guide](https://docs.astral.sh/uv/getting-started/installation/))
- Docker Desktop ([Download](https://docs.docker.com/get-docker/))
- Git ([Download](https://git-scm.com/downloads))

**System Requirements:**
- 8GB+ RAM (16GB recommended)
- 20GB+ free disk space
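
The disk requirement above can be checked from Python before any images are pulled. This is a minimal sketch using only the standard library; the helper names `free_disk_gb` and `has_enough_disk` are illustrative and not part of the project.

```python
import shutil

REQUIRED_FREE_GB = 20  # taken from the system requirements listed above


def free_disk_gb(path: str = ".") -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def has_enough_disk(path: str = ".", required_gb: float = REQUIRED_FREE_GB) -> bool:
    """True when at least `required_gb` GiB are free at `path`."""
    return free_disk_gb(path) >= required_gb


free_gb = free_disk_gb()
marker = "✓" if has_enough_disk() else "⚠"
print(f"{marker} Free disk space: {free_gb:.1f} GiB (need {REQUIRED_FREE_GB}+)")
```

Docker Desktop may store images on a different volume than the project directory, so pass the relevant path if that applies to your machine.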

## Setup Instructions

**Before running cells:**
1. Extract/clone project to your system
2. Open terminal in project root (contains `compose.yml`)
3. Run: `uv sync`
4. Start all services: `docker compose up -d`
5. **Start Jupyter with UV**: `uv run jupyter notebook`

**Important:** Always start Jupyter with `uv run jupyter notebook` to use the project's virtual environment!

---
## 1. Environment Check

In [1]:
# Environment Check
import sys
from pathlib import Path

python_version = sys.version_info
print(f"Python Version: {python_version.major}.{python_version.minor}.{python_version.micro}")
print(f"Environment: {sys.executable}")

# Check if running in project's virtual environment
if '.venv' in sys.executable:
    print("✓ Running in project virtual environment")
else:
    print("⚠ WARNING: Not running in project .venv!")
    print("  Restart Jupyter with: uv run jupyter notebook")

if python_version >= (3, 12):
    print("✓ Python version compatible")
else:
    print("✗ Need Python 3.12+")

Python Version: 3.12.7
Environment: /Users/nishantgaurav/Project/PaperAlchemy/.venv/bin/python
✓ Running in project virtual environment
✓ Python version compatible
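
The `.venv` substring test in the cell above depends on the environment's directory name. A name-independent alternative, sketched here with a hypothetical helper `in_virtualenv`, compares `sys.prefix` with `sys.base_prefix`; the two differ whenever the interpreter runs inside a venv-style environment.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if in_virtualenv():
    print(f"✓ Virtual environment active: {sys.prefix}")
else:
    print("⚠ No virtual environment detected")
```

This only tells you that some virtual environment is active, not that it is this project's, so it complements the path check rather than replacing it.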


In [2]:
# Find Project Root
current_dir = Path.cwd()

if current_dir.name == "week1" and current_dir.parent.name == "notebooks":
    project_root = current_dir.parent.parent
elif (current_dir / "compose.yml").exists():
    project_root = current_dir
else:
    project_root = None

if project_root and (project_root / "compose.yml").exists():
    print(f"✓ Project root: {project_root}")
else:
    print("✗ Missing compose.yml - check directory")

✓ Project root: /Users/nishantgaurav/Project/PaperAlchemy
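
The cell above handles two known locations. If the notebook might be opened from other subdirectories, one option is to walk up the directory tree until `compose.yml` is found. The function name `find_project_root` is an assumption made for this sketch.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "compose.yml") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None  # reached the filesystem root without finding the marker


root = find_project_root(Path.cwd())
print(f"Project root: {root}" if root else "compose.yml not found in any parent directory")
```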


In [3]:
# Check Docker
import subprocess

try:
    result = subprocess.run(["docker", "--version"], capture_output=True, text=True, timeout=5)
    if result.returncode == 0:
        print(f"✓ Docker: {result.stdout.strip()}")
    else:
        print("✗ Docker: Not working")
except (OSError, subprocess.TimeoutExpired):
    print("✗ Docker: Not found or not responding")

✓ Docker: Docker version 28.4.0, build d8eb465


In [4]:
# Check Docker Compose
try:
    result = subprocess.run(["docker", "compose", "version"], capture_output=True, text=True, timeout=5)
    if result.returncode == 0:
        print(f"✓ Docker Compose: {result.stdout.strip()}")
    else:
        print("✗ Docker Compose: Not working")
except (OSError, subprocess.TimeoutExpired):
    print("✗ Docker Compose: Not found or not responding")

✓ Docker Compose: Docker Compose version v2.39.4-desktop.1


In [5]:
# Check UV Package Manager
try:
    result = subprocess.run(["uv", "--version"], capture_output=True, text=True, timeout=5)
    if result.returncode == 0:
        print(f"✓ UV: {result.stdout.strip()}")
        print("\n✓ All required software ready!")
    else:
        print("✗ UV: Not working")
except (OSError, subprocess.TimeoutExpired):
    print("✗ UV: Not found or not responding")

✓ UV: uv 0.8.23 (Homebrew 2025-10-04)

✓ All required software ready!
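
The three tool checks above follow the same pattern, so they could share one helper. The sketch below is one possible shape; `tool_version` is a hypothetical name, and it returns `None` instead of printing so the caller decides how to report the result.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def tool_version(command: Sequence[str], timeout: float = 5.0) -> Optional[str]:
    """Return the first line printed by `command`, or None if it is unavailable."""
    if shutil.which(command[0]) is None:
        return None  # executable is not on PATH
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


CHECKS = {
    "Docker": ["docker", "--version"],
    "Docker Compose": ["docker", "compose", "version"],
    "UV": ["uv", "--version"],
}

for label, command in CHECKS.items():
    version = tool_version(command)
    print(f"✓ {label}: {version}" if version is not None else f"✗ {label}: not available")
```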


---
## 2. Docker Services Status

**Command to run (in terminal):**
```bash
cd [project-root]
docker compose up -d
```

This will start all 12 services including the FastAPI application.

In [6]:
# Check Docker Running
try:
    result = subprocess.run(["docker", "info"], capture_output=True, timeout=5)
    if result.returncode == 0:
        print("✓ Docker is running")
    else:
        print("✗ Docker not running - start Docker Desktop")
except (OSError, subprocess.TimeoutExpired):
    print("✗ Docker daemon not accessible")

✓ Docker is running


In [7]:
# Check Docker Containers
import json

EXPECTED_DOCKER_SERVICES = [
    'paperalchemy-api',
    'paperalchemy-postgres',
    'paperalchemy-redis',
    'paperalchemy-opensearch',
    'paperalchemy-dashboards',
    'paperalchemy-ollama',
    'paperalchemy-airflow',
    'paperalchemy-clickhouse',
    'paperalchemy-langfuse',
    'paperalchemy-langfuse-postgres',
    'paperalchemy-langfuse-redis',
    'paperalchemy-langfuse-minio'
]

try:
    result = subprocess.run(
        ["docker", "compose", "ps", "--format", "json"],
        cwd=str(project_root),
        capture_output=True,
        text=True,
        timeout=15
    )
    
    print("=" * 70)
    print("DOCKER SERVICE STATUS")
    print("=" * 70)
    print(f"{'Container':<35} {'State':<12} {'Health'}")
    print("-" * 70)
    
    running_services = set()
    service_states = {}
    
    if result.returncode == 0 and result.stdout.strip():
        for line in result.stdout.strip().split('\n'):
            if line.strip():
                try:
                    container = json.loads(line)
                    name = container.get('Name', 'unknown')
                    state = container.get('State', 'unknown')
                    health = container.get('Health') or '-'  # missing or empty value: no healthcheck defined
                    running_services.add(name)
                    service_states[name] = {'state': state, 'health': health}
                    
                    if state == 'running' and health in ['healthy', '-']:
                        icon = "✓"
                    elif state == 'running':
                        icon = "⚠"
                    else:
                        icon = "✗"
                    
                    print(f"{icon} {name:<33} {state:<12} {health}")
                except (json.JSONDecodeError, AttributeError):
                    continue  # skip lines that are not JSON objects
    
    # Check missing Docker services
    missing = set(EXPECTED_DOCKER_SERVICES) - running_services
    if missing:
        print("\nMISSING DOCKER SERVICES:")
        for s in sorted(missing):
            print(f"  ✗ {s}")
        print("\n  Run: docker compose up -d")
    else:
        print("\n✓ All Docker services running!")
        
except Exception as e:
    print(f"Error: {e}")

DOCKER SERVICE STATUS
Container                           State        Health
----------------------------------------------------------------------
✓ paperalchemy-airflow              running      healthy
✓ paperalchemy-api                  running      healthy
✓ paperalchemy-clickhouse           running      healthy
✓ paperalchemy-dashboards           running      healthy
⚠ paperalchemy-langfuse             running      unhealthy
✓ paperalchemy-langfuse-minio       running      healthy
✓ paperalchemy-langfuse-postgres    running      healthy
✓ paperalchemy-langfuse-redis       running      healthy
✓ paperalchemy-ollama               running      healthy
✓ paperalchemy-opensearch           running      healthy
✓ paperalchemy-postgres             running      healthy
✓ paperalchemy-redis                running      healthy

✓ All Docker services running!
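
In the output above, `paperalchemy-langfuse` is running but unhealthy, yet the summary line still reports success because it only checks that every expected container is listed. The sketch below separates parsing from reporting so unhealthy containers can be flagged explicitly. It assumes that `docker compose ps --format json` may print either a single JSON array or one JSON object per line depending on the Compose version, which is worth verifying against your installation. Both function names are illustrative.

```python
import json
from typing import Dict, List


def parse_compose_ps(output: str) -> List[Dict]:
    """Parse `docker compose ps --format json` output into a list of dicts."""
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        # Assumed older format: one JSON array holding every container.
        return json.loads(text)
    # Assumed newer format: one JSON object per line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def unhealthy_containers(containers: List[Dict]) -> List[str]:
    """Names of containers that are not running or whose healthcheck is not passing."""
    problems = []
    for container in containers:
        state = container.get("State", "unknown")
        health = container.get("Health") or "-"  # empty means no healthcheck
        if state != "running" or health not in ("healthy", "-"):
            problems.append(container.get("Name", "unknown"))
    return sorted(problems)


# Illustrative input only, not captured from a real run.
sample = (
    '{"Name": "paperalchemy-api", "State": "running", "Health": "healthy"}\n'
    '{"Name": "paperalchemy-langfuse", "State": "running", "Health": "unhealthy"}\n'
)
print(unhealthy_containers(parse_compose_ps(sample)))
```

A container whose health is still `starting` is also reported here, which is usually what you want right after `docker compose up -d`.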


---
## 3. FastAPI - REST API Service

The FastAPI application runs in Docker as part of `docker compose up -d`.

**Interactive Exploration:**
- **API Documentation**: http://localhost:8000/docs (Interactive Swagger UI)
- **Alternative Docs**: http://localhost:8000/redoc (ReDoc interface)
- **Source Code**: Located in `src/` directory

In [8]:
# Test FastAPI Health
import httpx

try:
    response = httpx.get("http://localhost:8000/health", timeout=5)
    if response.status_code == 200:
        data = response.json()
        print("✓ FastAPI is responding")
        print(f"  Status: {data.get('status', 'unknown')}")
        print(f"  Debug: {data.get('debug', 'unknown')}")
        print(f"\n  API Docs: http://localhost:8000/docs")
    else:
        print(f"⚠ API returned status: {response.status_code}")
except httpx.ConnectError:
    print("✗ API not responding")
    print("\n  Check if container is running:")
    print("  docker compose ps api")
    print("\n  Start it with:")
    print("  docker compose up -d api")
except Exception as e:
    print(f"✗ API test error: {e}")

✓ FastAPI is responding
  Status: healthy
  Debug: True

  API Docs: http://localhost:8000/docs


---
## 4. PostgreSQL - Database Storage

**Interactive Exploration:**

PostgreSQL stores all structured data for our application:
- **Connection**: localhost:5433
- **Database**: paperalchemy
- **Username/Password**: paperalchemy / paperalchemy_secret
- **GUI Tool Recommendation**: DBeaver (free database client)

Let's test the database connection:

In [9]:
# Test PostgreSQL Connection
try:
    import psycopg2
    
    conn = psycopg2.connect(
        host="localhost",
        port=5433,
        database="paperalchemy",
        user="paperalchemy",
        password="paperalchemy_secret"
    )
    
    print("✓ PostgreSQL connected")
    cursor = conn.cursor()
    cursor.execute("SELECT version();")
    version = cursor.fetchone()[0]
    print(f"  Version: {version.split(',')[0]}")
    print(f"  Port: 5433")
    
    print("\n  Connection Details:")
    print("  • Host: localhost")
    print("  • Port: 5433")
    print("  • Database: paperalchemy")
    print("  • Username: paperalchemy")
    print("  • Password: paperalchemy_secret")
    
    cursor.close()
    conn.close()
    
except ImportError:
    print("✗ psycopg2 not installed")
    print("\n  You're not using the project's virtual environment!")
    print("  Restart Jupyter with: uv run jupyter notebook")
except Exception as e:
    print(f"✗ PostgreSQL: {e}")

✓ PostgreSQL connected
  Version: PostgreSQL 16.10 on aarch64-unknown-linux-musl
  Port: 5433

  Connection Details:
  • Host: localhost
  • Port: 5433
  • Database: paperalchemy
  • Username: paperalchemy
  • Password: paperalchemy_secret
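
The connection details are repeated verbatim in several cells. One way to keep them in a single place, and to allow overrides without editing the notebook, is to read them from environment variables with the documented values as defaults. The `PAPERALCHEMY_DB_*` variable names below are hypothetical and not defined by the project.

```python
import os
from typing import Mapping


def postgres_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for psycopg2.connect from the environment.

    The PAPERALCHEMY_DB_* names are assumptions made for this sketch;
    the defaults are the connection details listed above.
    """
    return {
        "host": env.get("PAPERALCHEMY_DB_HOST", "localhost"),
        "port": int(env.get("PAPERALCHEMY_DB_PORT", "5433")),
        "database": env.get("PAPERALCHEMY_DB_NAME", "paperalchemy"),
        "user": env.get("PAPERALCHEMY_DB_USER", "paperalchemy"),
        "password": env.get("PAPERALCHEMY_DB_PASSWORD", "paperalchemy_secret"),
    }


settings = postgres_settings()
print({key: value for key, value in settings.items() if key != "password"})
```

The result can be passed straight to the driver with `psycopg2.connect(**postgres_settings())`.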


In [10]:
# Check Database Tables
try:
    import psycopg2
    
    conn = psycopg2.connect(
        host="localhost", port=5433,
        database="paperalchemy",
        user="paperalchemy", password="paperalchemy_secret"
    )
    cursor = conn.cursor()
    
    cursor.execute("""
        SELECT table_name 
        FROM information_schema.tables 
        WHERE table_schema = 'public'
        ORDER BY table_name;
    """)
    
    tables = cursor.fetchall()
    print(f"Found {len(tables)} tables in public schema")
    
    for (table_name,) in tables[:10]:  # Show first 10
        print(f"  • {table_name}")
    
    if len(tables) > 10:
        print(f"  ... and {len(tables) - 10} more")
    
    if not tables:
        print("  No application tables yet (expected in Week 1)")
    
    cursor.close()
    conn.close()
    
except ImportError:
    print("✗ psycopg2 not installed - restart Jupyter with: uv run jupyter notebook")
except Exception as e:
    print(f"✗ Could not check tables: {e}")

Found 48 tables in public schema
  • ab_permission
  • ab_permission_view
  • ab_permission_view_role
  • ab_register_user
  • ab_role
  • ab_user
  • ab_user_role
  • ab_view_menu
  • alembic_version
  • callback_request
  ... and 38 more


---
## 5. Redis - Cache Layer

**Interactive Exploration:**

Redis provides fast caching for our application:
- **Connection**: localhost:6380
- **Purpose**: Session and query-result caching

Let's test Redis:

In [11]:
# Test Redis Connection
try:
    import redis
    
    r = redis.Redis(host="localhost", port=6380)
    r.ping()
    print("✓ Redis connected")
    print(f"  Port: 6380")
    
    # Test basic operations
    r.set("test_key", "PaperAlchemy")
    value = r.get("test_key")
    print(f"  Test write/read: {value.decode()}")
    r.delete("test_key")
    
    info = r.info()
    print(f"  Redis version: {info.get('redis_version')}")
    print(f"  Connected clients: {info.get('connected_clients')}")
    
except ImportError:
    print("✗ redis not installed - restart Jupyter with: uv run jupyter notebook")
except Exception as e:
    print(f"✗ Redis: {e}")

✓ Redis connected
  Port: 6380
  Test write/read: PaperAlchemy
  Redis version: 7.4.6
  Connected clients: 1
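
The query-result caching mentioned above is commonly implemented as a cache-aside pattern: look in Redis first, compute on a miss, then store the result with an expiry. The sketch below shows the idea under the assumption that values are JSON-serializable; `get_or_compute` is a hypothetical helper, not project code. It only relies on the client's `get` and `setex` methods, which `redis.Redis` provides.

```python
import json
from typing import Any, Callable


def get_or_compute(
    client: Any,
    key: str,
    compute: Callable[[], Any],
    ttl_seconds: int = 300,
) -> Any:
    """Return the cached value for `key`, computing and storing it on a miss."""
    cached = client.get(key)  # redis-py returns bytes, or None on a miss
    if cached is not None:
        return json.loads(cached)
    value = compute()
    # setex stores the value together with a time-to-live in seconds.
    client.setex(key, ttl_seconds, json.dumps(value))
    return value
```

With the connection from the cell above, a call could look like `get_or_compute(r, "search:transformers", run_search)`, where `run_search` stands in for whatever expensive function you want to cache.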


---
## 6. OpenSearch - Hybrid Search Engine

**Interactive Exploration:**

OpenSearch provides full-text search and analytics capabilities:
- **API Endpoint**: http://localhost:9201
- **Dashboards UI**: http://localhost:5602 (Web interface)

**Important for Students:**
- Use http://localhost:5602 for web interface
- Use Dev Tools in Dashboards for API queries

Let's test OpenSearch:

In [12]:
# Test OpenSearch Connection
import requests

try:
    response = requests.get("http://localhost:9201", timeout=15)
    if response.status_code == 200:
        info = response.json()
        print("✓ OpenSearch connected")
        print(f"  Cluster: {info.get('cluster_name')}")
        print(f"  Version: {info.get('version', {}).get('number')}")
        print(f"  Port: 9201")
    else:
        print(f"⚠ OpenSearch returned status: {response.status_code}")
except requests.exceptions.Timeout:
    print("⚠ OpenSearch timeout - service may be slow")
    print("  Try running this cell again")
except Exception as e:
    print(f"✗ OpenSearch: {e}")

✓ OpenSearch connected
  Cluster: docker-cluster
  Version: 2.19.0
  Port: 9201


In [13]:
# Check OpenSearch Cluster Health
import requests

try:
    response = requests.get("http://localhost:9201/_cluster/health", timeout=15)
    if response.status_code == 200:
        health = response.json()
        status = health.get('status')
        icon = "✓" if status == "green" else "⚠" if status == "yellow" else "✗"
        print(f"{icon} Cluster health: {status}")
        print(f"  Nodes: {health.get('number_of_nodes')}")
        print(f"  Active shards: {health.get('active_shards')}")
    else:
        print(f"⚠ Health endpoint returned status: {response.status_code}")
except requests.exceptions.Timeout:
    print("⚠ Health check timeout - try again")
except Exception as e:
    print(f"✗ Health check failed: {e}")

⚠ Cluster health: yellow
  Nodes: 1
  Active shards: 5
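
A yellow status on a one-node cluster is usually not a fault: it means every primary shard is allocated but at least one replica is not, and a replica cannot be placed on the same node as its primary. The helper below is a sketch that turns the health response into a short explanation. It reads the `status`, `number_of_nodes`, and `unassigned_shards` fields of the `_cluster/health` response; the function name is illustrative.

```python
def explain_cluster_health(health: dict) -> str:
    """Give a one-line interpretation of a `_cluster/health` response."""
    status = health.get("status")
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)

    if status == "green":
        return "green: all primary and replica shards are allocated"
    if status == "yellow":
        if nodes == 1 and unassigned > 0:
            return (
                f"yellow: {unassigned} replica shard(s) unassigned; "
                "expected on a single node, since replicas need a second node"
            )
        return "yellow: all primaries allocated, but some replicas are not"
    if status == "red":
        return "red: at least one primary shard is not allocated"
    return f"unknown status: {status!r}"


# Illustrative values; only the status and node count match the output above.
print(explain_cluster_health({"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 2}))
```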


In [14]:
# Test OpenSearch Dashboards
import requests

dashboards_url = "http://localhost:5602"

try:
    response = requests.get(f"{dashboards_url}/api/status", timeout=15, allow_redirects=True)
    if response.status_code == 200:
        print("✓ OpenSearch Dashboards is accessible!")
        print("✓ Web interface is ready for exploration")
        
        print("\n  Web Interface Access:")
        print("  " + "=" * 40)
        print(f"  Main Dashboard: {dashboards_url}")
        print(f"  Dev Tools: {dashboards_url}/app/dev_tools")
        print("  " + "=" * 40)
        
        print("\n  Student Learning Activities:")
        print("  1. Explore the Dashboard:")
        print("     • Visit http://localhost:5602")
        print("     • Navigate through the interface")
        print("     • Check out the 'Discover' tab")
        
        print("\n  2. Use Dev Tools for API Queries:")
        print("     • Go to Dev Tools")
        print("     • Try: GET /_cluster/health")
        print("     • Try: GET /_cat/indices?v")
    else:
        print(f"⚠ Dashboards returned status: {response.status_code}")
        
except requests.exceptions.ConnectionError:
    print("✗ OpenSearch Dashboards not accessible yet")
    print("  Wait 2-3 minutes for full startup")
except Exception as e:
    print(f"✗ Error: {e}")

✓ OpenSearch Dashboards is accessible!
✓ Web interface is ready for exploration

  Web Interface Access:
  Main Dashboard: http://localhost:5602
  Dev Tools: http://localhost:5602/app/dev_tools

  Student Learning Activities:
  1. Explore the Dashboard:
     • Visit http://localhost:5602
     • Navigate through the interface
     • Check out the 'Discover' tab

  2. Use Dev Tools for API Queries:
     • Go to Dev Tools
     • Try: GET /_cluster/health
     • Try: GET /_cat/indices?v


---
## 7. Ollama - Local LLM Inference Engine

**Interactive Exploration:**

Ollama runs large language models locally on your machine:
- **API Endpoint**: http://localhost:11434
- **Command Line**: Available inside the container (for example, `docker exec paperalchemy-ollama ollama list`)
- **Privacy**: All AI processing happens locally (no external APIs)

Let's test Ollama and see what models are available:

In [15]:
# Test Ollama Service Status
ollama_url = "http://localhost:11434/api/tags"

try:
    response = httpx.get(ollama_url, timeout=5)
    if response.status_code == 200:
        models_data = response.json()
        models = models_data.get('models', [])
        
        print("✓ Ollama is running!")
        print(f"  Available models: {len(models)}")
        
        if models:
            print("\n  Installed Models:")
            for model in models:
                name = model.get('name', 'unknown')
                size = model.get('size', 0)
                size_gb = round(size / (1024**3), 1)
                print(f"    • {name} ({size_gb} GB)")
        else:
            print("\n  No models installed yet")
            print("  To install a model:")
            print("  docker exec paperalchemy-ollama ollama pull llama3.2:1b")
            
    else:
        print(f"⚠ Ollama returned status: {response.status_code}")
        
except httpx.ConnectError:
    print("✗ Ollama is not responding yet")
except Exception as e:
    print(f"✗ Unexpected error: {e}")

✓ Ollama is running!
  Available models: 1

  Installed Models:
    • llama3.2:1b (1.2 GB)


In [16]:
# Check Ollama Version
try:
    response = httpx.get("http://localhost:11434/api/version", timeout=5)
    if response.status_code == 200:
        version_data = response.json()
        version = version_data.get('version', 'unknown')
        
        print("✓ Ollama API is healthy!")
        print(f"  Version: {version}")
        
        print("\n  What is Ollama?")
        print("  • Runs AI models completely on your local machine")
        print("  • No data sent to external services (privacy-first)")
        print("  • No API fees or rate limits")
        print("  • Supports models like Llama, Mistral, Phi, etc.")
    else:
        print(f"⚠ Ollama returned status: {response.status_code}")
        
except Exception as e:
    print(f"✗ Could not check Ollama version: {e}")

✓ Ollama API is healthy!
  Version: 0.11.2

  What is Ollama?
  • Runs AI models completely on your local machine
  • No data sent to external services (privacy-first)
  • No API fees or rate limits
  • Supports models like Llama, Mistral, Phi, etc.


In [17]:
# HANDS-ON: Pull Llama 3.2:1b Model (if not installed)
import time

# Check if model exists
try:
    response = httpx.get("http://localhost:11434/api/tags", timeout=5)
    models = response.json().get('models', [])
    model_names = [m.get('name') for m in models]
    
    if 'llama3.2:1b' in model_names:
        print("✓ llama3.2:1b already installed!")
    else:
        print("DOWNLOADING LLAMA 3.2:1B MODEL")
        print("=" * 50)
        print("This is a small 1.3GB model - perfect for testing!")
        print("Download will take 2-5 minutes...")
        
        result = subprocess.run(
            ["docker", "exec", "paperalchemy-ollama", "ollama", "pull", "llama3.2:1b"],
            capture_output=True,
            text=True,
            timeout=600
        )
        
        if result.returncode == 0:
            print("\n✓ Llama 3.2:1b model downloaded successfully!")
        else:
            print(f"⚠ Download issue: {result.stderr}")
            
except subprocess.TimeoutExpired:
    print("Download timed out after 10 minutes - check progress with: docker exec paperalchemy-ollama ollama list")
except Exception as e:
    print(f"Error: {e}")

✓ llama3.2:1b already installed!


In [18]:
# Test Llama 3.2:1b Generation
import time

def test_ollama_model(model_name, prompt, max_wait_time=60):
    """Test an Ollama model with a prompt."""
    print(f"Testing {model_name} with prompt: '{prompt}'")
    print("-" * 60)
    
    url = "http://localhost:11434/api/generate"
    data = {
        "model": model_name,
        "prompt": prompt,
        "stream": False
    }
    
    try:
        print("Generating response (this may take 10-30 seconds)...")
        start_time = time.time()
        
        response = httpx.post(url, json=data, timeout=max_wait_time)
        
        if response.status_code == 200:
            result = response.json()
            response_text = result.get('response', '').strip()
            
            elapsed_time = time.time() - start_time
            print(f"Response generated in {elapsed_time:.1f} seconds")
            print("\nRESPONSE:")
            print("=" * 40)
            print(response_text)
            print("=" * 40)
            
            if 'total_duration' in result:
                duration_ms = result['total_duration'] / 1_000_000  # total_duration is in nanoseconds
                print(f"\nGeneration time: {duration_ms:.0f}ms")
                
            return True
        else:
            print(f"API error: {response.status_code}")
            return False
            
    except httpx.ConnectError:
        print("Could not connect to Ollama API")
        return False
    except httpx.TimeoutException:
        print("Request timed out - model might be loading")
        return False
    except Exception as e:
        print(f"Unexpected error: {e}")
        return False

# Test with a simple prompt
test_prompt = "What is machine learning in one sentence?"
success = test_ollama_model("llama3.2:1b", test_prompt)

if success:
    print("\n✓ SUCCESS! Your local AI model is working!")
else:
    print("\nTroubleshooting:")
    print("1. Make sure model downloaded: docker exec paperalchemy-ollama ollama list")
    print("2. Check Ollama logs: docker compose logs ollama")

Testing llama3.2:1b with prompt: 'What is machine learning in one sentence?'
------------------------------------------------------------
Generating response (this may take 10-30 seconds)...
Response generated in 18.8 seconds

RESPONSE:
Machine learning is a subfield of artificial intelligence that enables computers to learn from data, make predictions or decisions without being explicitly programmed, allowing for adaptive and self-improving capabilities.

Generation time: 18767ms

✓ SUCCESS! Your local AI model is working!
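
The test above sets `"stream": False` and waits for the complete answer. With streaming enabled, `/api/generate` sends one JSON object per line, each carrying a text fragment in `response` and a `done` flag on the final object. The sketch below reassembles such a stream; `collect_stream` is a hypothetical helper and the sample lines are illustrative, not captured output.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join the `response` fragments of a streamed Ollama generation."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Possible usage against the running service (not executed here):
#   with httpx.stream("POST", "http://localhost:11434/api/generate",
#                     json={"model": "llama3.2:1b", "prompt": "Hello"},
#                     timeout=60) as response:
#       text = collect_stream(response.iter_lines())

sample_lines = [
    '{"response": "Machine ", "done": false}',
    '{"response": "learning", "done": false}',
    '{"response": "", "done": true}',
]
print(collect_stream(sample_lines))
```

Streaming does not make generation faster, but it lets a UI show text as it arrives instead of after the full wait.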


---
## 8. Apache Airflow - Workflow Automation

**Interactive Exploration:**

Apache Airflow manages data pipelines and automated workflows:
- **Web Dashboard**: http://localhost:8080
- **DAGs Location**: `airflow/dags/` directory

Let's test Airflow:

In [19]:
# Test Airflow Health
try:
    response = httpx.get("http://localhost:8080/health", timeout=15)
    if response.status_code == 200:
        health = response.json()
        print("✓ Airflow is healthy")
        print(f"  Scheduler: {health.get('scheduler', {}).get('status')}")
        print(f"\n  Airflow Dashboard:")
        print(f"  URL: http://localhost:8080")
        print(f"  (Check container logs for admin password)")
    else:
        print(f"⚠ Airflow returned: {response.status_code}")
        
except httpx.ConnectError:
    print("⚠ Airflow not responding yet")
    print("  Airflow takes 3-5 minutes to start")
except Exception as e:
    print(f"⚠ Airflow: {e}")

✓ Airflow is healthy
  Scheduler: healthy

  Airflow Dashboard:
  URL: http://localhost:8080
  (Check container logs for admin password)
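
The cell above reads only the scheduler entry. Assuming the `/health` response is a JSON object whose entries each carry a `status` field (the cell relies on this for `scheduler`; other component names vary by Airflow version), a small helper can list every component that is not healthy. Both function names and the sample payload are illustrative.

```python
from typing import Dict, List


def component_statuses(health: dict) -> Dict[str, str]:
    """Map each component in the health payload to its reported status."""
    statuses = {}
    for name, details in health.items():
        if isinstance(details, dict) and "status" in details:
            statuses[name] = str(details["status"])
    return statuses


def unhealthy_components(health: dict) -> List[str]:
    """Names of components whose status is anything other than 'healthy'."""
    return sorted(
        name for name, status in component_statuses(health).items()
        if status != "healthy"
    )


# Illustrative payload, not captured from a real response.
sample_health = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "unhealthy", "latest_scheduler_heartbeat": None},
}
print(unhealthy_components(sample_health))
```

A component that is not deployed may report an empty status and would be listed here too, so treat the result as a prompt to look at the logs rather than as a definitive failure.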


---
## 9. Langfuse - RAG Monitoring & Tracing

**Interactive Exploration:**

Langfuse provides observability for LLM applications:
- **Dashboard**: http://localhost:3001
- **Login**: admin@paperalchemy.com / admin123
- **Purpose**: Trace RAG queries, monitor performance, debug issues

Let's test Langfuse:

In [20]:
# Test Langfuse Health
try:
    response = httpx.get("http://localhost:3001/api/public/health", timeout=15)
    if response.status_code == 200:
        health = response.json()
        print("✓ Langfuse is healthy")
        print(f"  Status: {health.get('status')}")
        print(f"  Version: {health.get('version')}")
        print(f"\n  Langfuse Dashboard:")
        print(f"  URL: http://localhost:3001")
        print(f"  Login: admin@paperalchemy.com")
        print(f"  Password: admin123")
    else:
        print(f"⚠ Langfuse returned: {response.status_code}")
        
except httpx.ConnectError:
    print("⚠ Langfuse not responding yet")
    print("  Langfuse takes 2-3 minutes to start")
except Exception as e:
    print(f"⚠ Langfuse: {e}")

✓ Langfuse is healthy
  Status: OK
  Version: 3.148.0

  Langfuse Dashboard:
  URL: http://localhost:3001
  Login: admin@paperalchemy.com
  Password: admin123


In [21]:
# Test Langfuse PostgreSQL
try:
    import psycopg2
    
    conn = psycopg2.connect(
        host="localhost", port=5434,
        user="langfuse", password="langfuse",
        database="langfuse"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT 1;")
    print("✓ Langfuse-Postgres connected")
    print(f"  Port: 5434")
    cursor.close()
    conn.close()
except ImportError:
    print("✗ psycopg2 not installed - restart Jupyter with: uv run jupyter notebook")
except Exception as e:
    print(f"✗ Langfuse-Postgres: {e}")

✓ Langfuse-Postgres connected
  Port: 5434


---
## 10. ClickHouse - Analytics Database

ClickHouse powers Langfuse analytics. Let's verify it's running:

In [22]:
# Test ClickHouse
try:
    result = subprocess.run(
        ["docker", "exec", "paperalchemy-clickhouse", "clickhouse-client", "--query", "SELECT 1"],
        capture_output=True, text=True, timeout=10
    )
    if result.returncode == 0:
        print("✓ ClickHouse connected")
    else:
        print(f"✗ ClickHouse: {result.stderr}")
except Exception as e:
    print(f"✗ ClickHouse: {e}")

✓ ClickHouse connected


---
## 11. Summary & Service URLs

In [23]:
print("=" * 60)
print("  PAPERALCHEMY - SERVICE URLS")
print("=" * 60)
print()
print("Main Services:")
print("  API Docs:      http://localhost:8000/docs")
print("  Airflow:       http://localhost:8080")
print("  Langfuse:      http://localhost:3001")
print("                 (admin@paperalchemy.com / admin123)")
print()
print("Search:")
print("  OpenSearch:    http://localhost:9201")
print("  Dashboard:     http://localhost:5602")
print()
print("LLM:")
print("  Ollama:        http://localhost:11434")
print()
print("Databases:")
print("  PostgreSQL:    localhost:5433 (paperalchemy/paperalchemy_secret)")
print("  Redis:         localhost:6380")
print("  Langfuse-PG:   localhost:5434 (langfuse/langfuse)")
print()
print("=" * 60)
print("  WEEK 1 COMPLETE!")
print("=" * 60)

  PAPERALCHEMY - SERVICE URLS

Main Services:
  API Docs:      http://localhost:8000/docs
  Airflow:       http://localhost:8080
  Langfuse:      http://localhost:3001
                 (admin@paperalchemy.com / admin123)

Search:
  OpenSearch:    http://localhost:9201
  Dashboard:     http://localhost:5602

LLM:
  Ollama:        http://localhost:11434

Databases:
  PostgreSQL:    localhost:5433 (paperalchemy/paperalchemy_secret)
  Redis:         localhost:6380
  Langfuse-PG:   localhost:5434 (langfuse/langfuse)

  WEEK 1 COMPLETE!
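
For a quick re-check on later days, the HTTP health endpoints used throughout this notebook can be polled in one pass. This is a sketch using only the standard library. The URLs are the ones tested in the cells above, while `check_endpoints` and `http_status` are hypothetical helper names. The status fetcher is injected so the logic can be exercised without running services.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

HEALTH_ENDPOINTS = {
    "FastAPI": "http://localhost:8000/health",
    "OpenSearch": "http://localhost:9201/_cluster/health",
    "Ollama": "http://localhost:11434/api/version",
    "Airflow": "http://localhost:8080/health",
    "Langfuse": "http://localhost:3001/api/public/health",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status


def check_endpoints(
    endpoints: Dict[str, str],
    fetch_status: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True when its endpoint answers with 200."""
    results = {}
    for name, url in endpoints.items():
        try:
            results[name] = fetch_status(url) == 200
        except Exception:
            results[name] = False  # connection refused, timeout, DNS failure, ...
    return results
```

Calling `check_endpoints(HEALTH_ENDPOINTS)` with the stack running returns a dictionary you can print or assert on. PostgreSQL, Redis, and ClickHouse are not HTTP services in this setup and still need the dedicated cells above.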


---
## Troubleshooting

**Common Issues:**
- **ModuleNotFoundError** → Restart Jupyter with: `uv run jupyter notebook`
- **Connection refused** → Service still starting (wait 2-3 minutes)
- **Port in use** → Stop conflicting application or change ports
- **Container restarting** → Check logs: `docker compose logs [service-name]`
- **Out of memory** → Increase Docker Desktop memory allocation
- **API not responding** → Check: `docker compose ps api` and restart if needed
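
Instead of waiting a fixed 2-3 minutes after a "Connection refused" error, you can poll the port until it accepts connections. The following is a minimal sketch with the standard library; `wait_for_port` is a hypothetical helper, and an open port only shows that the process is listening, not that the service has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 180.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8080)` waits up to three minutes for Airflow before returning `False`.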

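**Restart everything:** `docker compose down && docker compose up -d` (recreates the containers but keeps data volumes; run `docker compose down -v` first if you also want to delete any named volumes and start from a clean state)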

---
## Project Commands

**Makefile shortcuts:**
```bash
make start    # Start all services
make status   # Check service status
make logs     # View logs
make health   # Check service health
make stop     # Stop all services
make help     # View all commands
```

**Docker commands:**
```bash
docker compose up -d        # Start all services
docker compose ps           # Check running containers
docker compose logs api     # View API logs
docker compose restart api  # Restart API
```

**Next:** Continue to Week 2 for data ingestion from arXiv!