OLCS Python API

Multi-tenant library catalog service backed by Open Library. FastAPI + SQLAlchemy 2.0 async + SQLite + Huey.

Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   FastAPI    │────▶│  SQLAlchemy  │────▶│  SQLite DB   │
│  (uvicorn)   │     │   (async)    │     │              │
└──────┬───────┘     └──────────────┘     └──────────────┘
       │
       │ HTTP
       ▼
┌──────────────┐
│ Open Library │
│     API      │
└──────────────┘

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│     Huey     │────▶│  SQLAlchemy  │────▶│  SQLite DB   │
│  (consumer)  │     │   (sync)     │     │   (shared)   │
└──────────────┘     └──────────────┘     └──────────────┘

Key Design Decisions

  • Multi-tenancy: Every table has a tenant_id FK. All endpoints are scoped under /tenants/{slug}/.... Tenant slugs are validated as lowercase alphanumeric with hyphens.
  • PII-safe storage: Patron emails are HMAC-SHA256 hashed (deterministic for dedup), names are SHA-256 hashed (one-way), and emails are masked for display (j***@example.com). No plaintext PII is ever stored or returned.
  • Versioned books: Re-ingesting the same work creates a new BookVersion record instead of overwriting. A diff endpoint shows what changed between versions.
  • Rate limiting: Fixed-window per-tenant rate limits — 60 req/min for API endpoints, 10 req/hour for ingestion. Returns 429 with Retry-After header.
  • Background ingestion: Huey task queue handles Open Library API calls asynchronously. Jobs track status (pending → running → completed/failed).
  • Scheduled re-sync: A Huey periodic task re-ingests all previously ingested queries every 6 hours (configurable via RE_SYNC_INTERVAL_HOURS).
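
The PII handling described above can be sketched as follows. This is an illustrative sketch based on the description, not the project's actual code; the secret would come from HMAC_SECRET_KEY in a real deployment.

```python
import hashlib
import hmac

# Assumed placeholder; the service reads HMAC_SECRET_KEY from the environment.
SECRET = b"dev-secret-key-change-in-production"

def hash_email(email: str) -> str:
    # HMAC-SHA256 is deterministic per key, so duplicate emails
    # dedupe without ever storing plaintext.
    return hmac.new(SECRET, email.lower().encode(), hashlib.sha256).hexdigest()

def hash_name(name: str) -> str:
    # Plain SHA-256: one-way, never displayed.
    return hashlib.sha256(name.encode()).hexdigest()

def mask_email(email: str) -> str:
    # Keep the first character and the domain for display: j***@example.com
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"
```

The lowercasing in hash_email makes dedup case-insensitive, which is usually what you want for emails.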

Setup

# Clone and install
git clone https://github.com/jhult/olcs-python-api.git
cd olcs-python-api
uv sync

# Create the database and run migrations
uv run alembic upgrade head

# Create a tenant
uv run python scripts/create_tenant.py --name "My Library" --slug my-library

Run

# Start the API server
uv run uvicorn olcs.main:app --reload

# Start the Huey consumer (for background ingestion)
uv run huey_consumer olcs.huey_config.huey

The API server runs on http://localhost:8000 by default. Interactive docs at http://localhost:8000/docs.

OpenAPI Schema

FastAPI auto-generates an OpenAPI schema. With the server running:

curl http://localhost:8000/openapi.json

Or without starting the server:

from olcs.main import app
import json
print(json.dumps(app.openapi(), indent=2))

API Reference

All tenant-scoped endpoints use the prefix /tenants/{slug}/.... Replace {slug} with your tenant's slug.

Health

# Health check (includes DB connectivity check)
curl http://localhost:8000/health

# Service info
curl http://localhost:8000/

Tenants

# Create a tenant
curl -X POST http://localhost:8000/tenants/ \
  -H 'Content-Type: application/json' \
  -d '{"name": "Springfield Library", "slug": "springfield"}'

# List tenants
curl http://localhost:8000/tenants/

# Get a tenant
curl http://localhost:8000/tenants/springfield

Books

# List books (paginated)
curl http://localhost:8000/tenants/springfield/books/

# Search books (min 2 characters)
curl 'http://localhost:8000/tenants/springfield/books/search?q=tolkien'

# Get book detail with version history
curl http://localhost:8000/tenants/springfield/books/{book_id}

# Get version diff
curl http://localhost:8000/tenants/springfield/books/{book_id}/versions/{v1_id}/diff/{v2_id}
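
A version diff like the one this endpoint returns boils down to a field-by-field comparison of two snapshots. The field names and response shape below are illustrative, not the actual schema:

```python
def diff_versions(old: dict, new: dict) -> dict:
    """Return {field: {"old": ..., "new": ...}} for every field that changed."""
    changed = {}
    for field in old.keys() | new.keys():  # union: catches added and removed fields too
        if old.get(field) != new.get(field):
            changed[field] = {"old": old.get(field), "new": new.get(field)}
    return changed

# Example: only the changed field appears in the diff
diff_versions(
    {"title": "The Hobbit", "pages": 310},
    {"title": "The Hobbit", "pages": 320},
)
```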

Ingestion

# Start ingestion by author
curl -X POST http://localhost:8000/tenants/springfield/ingestion/ \
  -H 'Content-Type: application/json' \
  -d '{"query": "author:Tolkien"}'

# Start ingestion by subject
curl -X POST http://localhost:8000/tenants/springfield/ingestion/ \
  -H 'Content-Type: application/json' \
  -d '{"query": "subject:fantasy"}'

# Start ingestion by OL keys
curl -X POST http://localhost:8000/tenants/springfield/ingestion/ \
  -H 'Content-Type: application/json' \
  -d '{"open_library_keys": ["/works/OL18295W"]}'

# List ingestion jobs
curl http://localhost:8000/tenants/springfield/ingestion/jobs

# Get a specific job
curl http://localhost:8000/tenants/springfield/ingestion/jobs/{job_id}

# List ingestion logs
curl http://localhost:8000/tenants/springfield/ingestion/logs

# Check re-sync status
curl http://localhost:8000/tenants/springfield/ingestion/resync
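
Since ingestion runs in the background, a client typically polls the job endpoint until the status leaves pending/running. A minimal polling helper, where fetch_status is any callable returning the job's status string (e.g. wrapping a GET on the job URL above):

```python
import time

def wait_for_job(fetch_status, timeout: float = 300.0, interval: float = 2.0) -> str:
    """Poll fetch_status() until the job reports completed or failed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)  # back off between polls
    raise TimeoutError("ingestion job did not finish in time")
```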

Submissions (Reading Lists)

# Submit a reading list with an OL key
curl -X POST http://localhost:8000/tenants/springfield/submissions/ \
  -H 'Content-Type: application/json' \
  -d '{
    "email": "reader@example.com",
    "name": "Jane Doe",
    "books": [{"open_library_key": "OL18295W"}]
  }'

# Submit with an ISBN
curl -X POST http://localhost:8000/tenants/springfield/submissions/ \
  -H 'Content-Type: application/json' \
  -d '{
    "email": "reader@example.com",
    "name": "Jane Doe",
    "books": [{"isbn": "9780618640157"}]
  }'

# Submit with mixed identifiers (1-50 books)
curl -X POST http://localhost:8000/tenants/springfield/submissions/ \
  -H 'Content-Type: application/json' \
  -d '{
    "email": "reader@example.com",
    "name": "Jane Doe",
    "books": [
      {"open_library_key": "OL18295W"},
      {"isbn": "9780618640157"}
    ]
  }'

# List submissions (no plaintext PII in responses)
curl http://localhost:8000/tenants/springfield/submissions/

Rate Limits

# Check rate limit usage (API and ingestion)
curl http://localhost:8000/tenants/springfield/rate-limits/

Response shows both API (60/min) and ingestion (10/hour) limits with window start/end times.

When rate-limited, responses include a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 45
{"detail": "Rate limit exceeded"}
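
A well-behaved client honors Retry-After with a simple retry wrapper. Here send stands in for whatever HTTP call you make and returns (status_code, headers, body); this helper is illustrative, not part of the service:

```python
import time

def send_with_retry(send, max_retries: int = 3):
    """Call send(); on 429, sleep for the Retry-After interval and retry."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Default to 1s if the header is somehow missing.
            time.sleep(int(headers.get("Retry-After", "1")))
    return status, body
```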

Tenant Isolation

Each tenant sees only their own data. A book created under tenant A is invisible to tenant B:

# Create two tenants
curl -X POST http://localhost:8000/tenants/ \
  -H 'Content-Type: application/json' \
  -d '{"name": "Library A", "slug": "lib-a"}'

curl -X POST http://localhost:8000/tenants/ \
  -H 'Content-Type: application/json' \
  -d '{"name": "Library B", "slug": "lib-b"}'

# Library B sees zero books even after Library A ingests
curl http://localhost:8000/tenants/lib-b/books/
# {"items":[],"total":0}

# Accessing Library A's book under Library B returns 404
curl http://localhost:8000/tenants/lib-b/books/{lib-a-book-id}
# {"detail":"Book not found"}

Configuration

Environment variables (or .env file):

Variable                              Default                               Description
DATABASE_URL                          sqlite+aiosqlite:///olcs.db           Database connection string
HMAC_SECRET_KEY                       dev-secret-key-change-in-production   Key for email HMAC hashing
API_RATE_LIMIT_REQUESTS               60                                    API requests per window
API_RATE_LIMIT_WINDOW_SECONDS         60                                    API rate limit window (seconds)
INGESTION_RATE_LIMIT_REQUESTS         10                                    Ingestion requests per window
INGESTION_RATE_LIMIT_WINDOW_SECONDS   3600                                  Ingestion rate limit window (seconds)
RE_SYNC_INTERVAL_HOURS                6                                     How often to re-sync ingested works (hours)
TESTING                               false                                 Bypass rate limits (for tests)

Testing

# Run all tests (unit + live OL API tests)
uv run pytest tests/ -v

# Run only unit tests (no network calls)
uv run pytest tests/ -v --ignore=tests/test_open_library.py

# Lint
uv run ruff check src/ tests/
uv run ruff format src/ tests/

What I Would Do Differently With More Time

  1. SQLite to PostgreSQL: SQLite doesn't handle concurrent writes well. With multiple uvicorn workers or the Huey consumer writing simultaneously, we'd hit "database is locked" errors under load. The trade-off: SQLite is zero-config for dev/prototyping, but real concurrent traffic needs PostgreSQL's row-level locking and connection pooling.

  2. Sliding window rate limiting: Fixed windows have a known burst vulnerability — a tenant can send 60 requests at second 59 of one window and 60 more at second 1 of the next, effectively getting 120 requests in 2 seconds. A sliding window or token bucket would be fairer. The trade-off: fixed windows are simple and the current table schema works, but a determined client can double their rate at window boundaries.

  3. Async task queue instead of Huey+SQLite: The Huey consumer runs asyncio.run() with its own SQLite connection, and the scheduled re-sync does the same. This means no connection pooling, no shared async session, and SQLite file locks contended between the web server and the worker. A proper async task system (e.g., arq with Redis, or Dramatiq with PostgreSQL) would share the same database connection pool and avoid SQLite file lock contention.

  4. Async ISBN resolution in submissions: When a patron submits a reading list with ISBNs, the endpoint blocks on get_edition_by_isbn() calls for each unknown ISBN. With 50 books (our max), that's up to 50 sequential HTTP requests at 0.33s each — ~16 seconds worst case. This should use asyncio.gather() for parallel resolution, or move resolution to a background task entirely.

  5. Batch ingestion: Currently each ingestion job handles a single query. A batch endpoint would let tenants kick off multiple ingestion jobs in one request, reducing round trips.
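
The parallel ISBN resolution proposed in item 4 can be sketched with asyncio.gather. get_edition_by_isbn is the resolver named above, passed in here as a parameter so the sketch stays self-contained:

```python
import asyncio

async def resolve_isbns(isbns, get_edition_by_isbn):
    # Fire all lookups concurrently instead of awaiting each in sequence;
    # return_exceptions=True keeps one bad ISBN from failing the whole batch.
    results = await asyncio.gather(
        *(get_edition_by_isbn(isbn) for isbn in isbns),
        return_exceptions=True,
    )
    return dict(zip(isbns, results))
```

With 50 ISBNs this turns ~50 sequential round trips into one batch bounded by the slowest lookup (modulo any rate limiting on the Open Library side).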
