giautm/vision-ocr

VisionOCR

A production-grade, LiteParse-compatible OCR microservice written in Swift 6 for macOS. Uses Apple's Vision framework for on-device text recognition with no external API dependencies.

Features

  • 100% LiteParse API compatible: POST /ocr, GET /health, GET /metrics
  • Vision framework — native macOS OCR, Apple Silicon optimized
  • Actor-based concurrency — bounded worker pool with backpressure (HTTP 429)
  • Correct coordinates — Vision bottom-left origin converted to spec-required top-left pixel coords
  • Structured logging — OSLog with request IDs and duration tracking
  • Graceful shutdown — SIGTERM/SIGINT handled; prints status to stderr before exit
  • Helpful errors — port-in-use and permission errors shown clearly
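
The coordinate conversion listed above can be sketched as follows. Vision reports bounding boxes as normalized rectangles (values from 0 to 1) with the origin at the bottom-left of the image, so producing top-left pixel coordinates means scaling by the image size and flipping the Y axis. `NormalizedRect` and `pixelBBox` are hypothetical names used for illustration; the project's own `BoundingBox` and `VisionCoordinateConverter` types may be shaped differently.

```swift
// Hypothetical stand-in for a Vision bounding box: normalized, bottom-left origin.
struct NormalizedRect {
    var x: Double       // left edge, 0...1
    var y: Double       // bottom edge, 0...1, measured up from the bottom of the image
    var width: Double
    var height: Double
}

/// Converts a normalized bottom-left-origin rect to [x1, y1, x2, y2]
/// in pixels with a top-left origin.
func pixelBBox(from rect: NormalizedRect, imageWidth: Double, imageHeight: Double) -> [Double] {
    let x1 = rect.x * imageWidth
    let x2 = (rect.x + rect.width) * imageWidth
    // Y-flip: the rect's top edge (y + height) becomes the smaller pixel Y.
    let y1 = (1.0 - (rect.y + rect.height)) * imageHeight
    let y2 = (1.0 - rect.y) * imageHeight
    return [x1, y1, x2, y2]
}

let box = pixelBBox(
    from: NormalizedRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25),
    imageWidth: 1000,
    imageHeight: 800
)
print(box) // [250.0, 200.0, 750.0, 400.0]
```

Because width and height are positive, the result satisfies x2 > x1 and y2 > y1.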

Requirements

  • macOS 15+ (Sequoia)
  • Swift 6.0+
  • Apple Silicon or Intel Mac

Quick Start

# Build (release)
make release

# Run
./.build/release/vision-ocr
# Server starts on http://0.0.0.0:8000

# Test
curl -X POST http://localhost:8000/ocr \
  -F "file=@image.png" \
  -F "language=en"

Build & Install

make build          # Debug build → .build/debug/vision-ocr
make release        # Release build → .build/release/vision-ocr
make test           # Run test suite
make install        # Install to /usr/local/bin
make install-local  # Install to ~/bin
make run            # Build debug and run

API

POST /ocr

Request: multipart/form-data

Field     Type    Required  Description
file      binary  Yes       Image file (PNG, JPG, TIFF, WebP, BMP, GIF)
language  string  No        ISO 639-1 code, default en
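
For callers that do not use curl, the request body can be assembled by hand. The following is a minimal sketch of a multipart/form-data body containing the two fields above; `makeOCRBody` is a hypothetical helper, not part of this project.

```swift
import Foundation

/// Builds a multipart/form-data body with a `file` part and an optional
/// `language` part. Hypothetical client-side helper for illustration.
func makeOCRBody(
    fileData: Data,
    filename: String,
    mimeType: String,
    language: String?,
    boundary: String
) -> Data {
    var body = Data()
    func append(_ text: String) { body.append(Data(text.utf8)) }

    // The binary image goes in the "file" field.
    append("--\(boundary)\r\n")
    append("Content-Disposition: form-data; name=\"file\"; filename=\"\(filename)\"\r\n")
    append("Content-Type: \(mimeType)\r\n\r\n")
    body.append(fileData)
    append("\r\n")

    // The language field is optional; the server defaults to "en".
    if let language {
        append("--\(boundary)\r\n")
        append("Content-Disposition: form-data; name=\"language\"\r\n\r\n")
        append("\(language)\r\n")
    }

    // Closing delimiter.
    append("--\(boundary)--\r\n")
    return body
}

let sampleBody = makeOCRBody(
    fileData: Data("PNG".utf8),   // placeholder bytes, not a real image
    filename: "image.png",
    mimeType: "image/png",
    language: "en",
    boundary: "XYZ"
)
print(sampleBody.count)
```

The body would be sent as a POST to /ocr with the header `Content-Type: multipart/form-data; boundary=<boundary>`, where the boundary matches the one passed to the helper.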

Response 200 OK:

{
  "results": [
    {
      "text": "Hello World",
      "bbox": [718.6, 749.0, 2396.5, 885.4],
      "confidence": 1.0
    }
  ]
}

bbox is [x1, y1, x2, y2] in pixels, top-left origin, x2 > x1, y2 > y1.
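
A client can decode this response with `Codable`. The sketch below mirrors the JSON shown above; the type names `OCRResponse` and `OCRResult` are assumptions chosen for illustration.

```swift
import Foundation

// Hypothetical client-side models matching the documented response shape.
struct OCRResult: Decodable {
    let text: String
    let bbox: [Double]       // [x1, y1, x2, y2], pixels, top-left origin
    let confidence: Double
}

struct OCRResponse: Decodable {
    let results: [OCRResult]
}

let json = """
{"results":[{"text":"Hello World","bbox":[718.6,749.0,2396.5,885.4],"confidence":1.0}]}
"""

let response = try JSONDecoder().decode(OCRResponse.self, from: Data(json.utf8))
for result in response.results {
    // Width and height follow directly from the corner coordinates.
    let width = result.bbox[2] - result.bbox[0]
    let height = result.bbox[3] - result.bbox[1]
    print(result.text, width > 0 && height > 0)
}
```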

Error responses:

Status  Trigger
400     Missing file field, invalid language code, unsupported image format
429     Worker queue full (retry with backoff)
500     OCR processing or internal failure
504     OCR timed out (> 60 s)

Error body: {"error": "description"}
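
On the client side, the error table suggests retrying only on 429. The sketch below decodes the error body and computes a capped exponential backoff. The helper names, the 0.5 s base, and the 8 s cap are assumptions for illustration, not values defined by this service.

```swift
import Foundation

// Matches the documented error body: {"error": "description"}
struct OCRErrorBody: Decodable {
    let error: String
}

/// Assumption: only 429 (queue full) is treated as transient and retried.
func shouldRetry(status: Int) -> Bool {
    status == 429
}

/// Capped exponential backoff in seconds for a 0-based attempt number.
/// The base and cap values are illustrative defaults.
func backoffDelay(attempt: Int, base: Double = 0.5, cap: Double = 8.0) -> Double {
    min(cap, base * pow(2.0, Double(attempt)))
}

let errorBody = try JSONDecoder().decode(
    OCRErrorBody.self,
    from: Data(#"{"error": "description"}"#.utf8)
)
print(errorBody.error)

for attempt in 0..<5 {
    print("attempt \(attempt): wait \(backoffDelay(attempt: attempt)) s")
}
```

Adding random jitter to each delay is a common refinement when many clients retry at once.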

GET /health

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "timestamp": 802300420.0,
  "poolStats": {
    "workerCount": 4,
    "activeWorkers": 0,
    "queueDepth": 0,
    "maxQueueSize": 100,
    "totalProcessed": 42
  }
}
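
The sample `timestamp` is far too small to be a current Unix timestamp. Its magnitude suggests seconds since Apple's reference date (2001-01-01 00:00:00 UTC), which is what Foundation's `Date.timeIntervalSinceReferenceDate` returns. This is an inference from the sample value, not something this README states. The sketch compares the two interpretations:

```swift
import Foundation

let raw: TimeInterval = 802300420.0   // value from the sample response

// Assumption: the server emits seconds since 2001-01-01 (Apple reference date).
let asReference = Date(timeIntervalSinceReferenceDate: raw)
// For comparison: the same number read as Unix epoch seconds.
let asUnix = Date(timeIntervalSince1970: raw)

let formatter = ISO8601DateFormatter()   // formats in UTC by default
print(formatter.string(from: asReference))   // a date in June 2026
print(formatter.string(from: asUnix))        // a date in June 1995

// Converting for tools that expect Unix epoch seconds.
let unixSeconds = asReference.timeIntervalSince1970
print(unixSeconds)
```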

GET /metrics

curl http://localhost:8000/metrics

Response:

{
  "timestamp": 802300420.0,
  "metrics": {
    "totalRequests": 42,
    "successRequests": 41,
    "errorRequests": 1,
    "successRate": 0.976,
    "errorRate": 0.024,
    "averageLatency": 0.24,
    "p50Latency": 0.21,
    "p95Latency": 0.44,
    "p99Latency": 0.58,
    "throughput": 42
  },
  "pool": { ... }
}
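
To show how fields such as `p50Latency` and `successRate` can be derived, here is a minimal sketch using the nearest-rank percentile method. The project's `MetricsCollector` may use a different method or a sliding window; the function name and sample latencies are made up for illustration.

```swift
/// Nearest-rank percentile over latency samples in seconds.
/// Returns nil for empty input or a percentile outside (0, 100].
func percentile(_ samples: [Double], _ p: Double) -> Double? {
    guard !samples.isEmpty, p > 0, p <= 100 else { return nil }
    let sorted = samples.sorted()
    // Rank is the smallest 1-based position covering p percent of the samples.
    let rank = Int((p / 100.0 * Double(sorted.count)).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

// Made-up latency samples, in seconds.
let latencies = [0.18, 0.21, 0.44, 0.20, 0.58, 0.22, 0.19, 0.25, 0.23, 0.21]

let p50 = percentile(latencies, 50)
let p95 = percentile(latencies, 95)
let p99 = percentile(latencies, 99)
print(p50 as Any, p95 as Any, p99 as Any)

// Rates as in the sample response: 41 successes out of 42 requests.
let totalRequests = 42.0
let successRequests = 41.0
let successRate = (successRequests / totalRequests * 1000).rounded() / 1000
let errorRate = ((totalRequests - successRequests) / totalRequests * 1000).rounded() / 1000
print(successRate, errorRate)
```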

Configuration

# CLI flags (all optional)
vision-ocr --port 8000          # default: 8000
vision-ocr --workers 4          # default: 4 (one per CPU performance core recommended)
vision-ocr --max-queue-size 100 # default: 100
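
To illustrate how `--workers` and `--max-queue-size` interact with the HTTP 429 backpressure behavior, here is a rough sketch of the admission accounting. `AdmissionState`, `Admission`, and `WorkerGate` are hypothetical names; the actual `WorkerPool` in this repository may be structured quite differently.

```swift
/// Outcome of trying to admit a new OCR job.
enum Admission: Equatable {
    case run        // a worker is free, start immediately
    case queued     // all workers busy, job waits in the queue
    case rejected   // queue full, the server would answer HTTP 429
}

/// Hypothetical bookkeeping for a bounded worker pool.
struct AdmissionState {
    let workers: Int
    let maxQueueSize: Int
    private(set) var active = 0
    private(set) var queued = 0

    init(workers: Int, maxQueueSize: Int) {
        self.workers = workers
        self.maxQueueSize = maxQueueSize
    }

    mutating func admit() -> Admission {
        if active < workers { active += 1; return .run }
        if queued < maxQueueSize { queued += 1; return .queued }
        return .rejected
    }

    /// Called when a running job completes.
    mutating func finish() {
        if queued > 0 {
            queued -= 1          // a waiting job takes over the freed worker
        } else if active > 0 {
            active -= 1
        }
    }
}

/// Wrapping the state in an actor serializes access from concurrent requests.
actor WorkerGate {
    private var state: AdmissionState
    init(workers: Int, maxQueueSize: Int) {
        state = AdmissionState(workers: workers, maxQueueSize: maxQueueSize)
    }
    func admit() -> Admission { state.admit() }
    func finish() { state.finish() }
}

var demo = AdmissionState(workers: 1, maxQueueSize: 1)
let first = demo.admit()    // .run
let second = demo.admit()   // .queued
let third = demo.admit()    // .rejected
print(first, second, third)
```

With the defaults (4 workers, queue size 100), a 105th concurrent request would be the first one rejected under this model.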

Project Structure

Sources/
├── CLI/           Entry point, argument parsing, signal handling
├── OCRServer/     HTTP server actor, route handlers, multipart parsing
├── HTTP/          RequestValidator, ResponseHandler, ErrorHandler
├── OCR/           VisionOCREngine, OCRPipeline, LanguageMapper
├── Image/         ImageProcessor, ImageValidator, format detection
├── Coordinates/   BoundingBox, VisionCoordinateConverter (Y-flip)
├── Results/       TextOrderingEngine, ConfidenceNormalizer
├── Concurrency/   WorkerPool, AsyncQueue, ResultStore, OCRWorker
├── Metrics/       MetricsCollector (P50/P95/P99)
├── Health/        HealthStatus, MetricsResponse
└── Utilities/     ServerError, ServerConfiguration, logging

Dependencies

Package                      Source               Purpose
hummingbird 2.0+             hummingbird-project  HTTP server
multipart-kit 4.0+           vapor                Multipart form-data parser
swift-async-algorithms 1.0+  Apple                Async utilities
swift-metrics 2.4+           Apple                Metrics interface

Deployment

See DEPLOYMENT.md for launchd setup, log management, and production tuning.

Architecture

See ARCHITECTURE.md for design decisions, concurrency model, and component details.

LiteParse Compatibility

See API_COMPATIBILITY.md for the full spec compliance matrix.


Status: Production-ready
Platform: macOS 15+ (Apple Silicon & Intel)
Updated: June 5, 2026
