pallav-m/layout-model-isolation

Layout Extraction API

A document layout extraction pipeline served behind a FastAPI REST API. It combines YOLO-based layout detection, table structure recognition (SLANet+ for wireless tables, UNet for wired tables), orientation correction, and reading-order analysis.

Features

  • Layout Detection — YOLO-based model detecting text blocks, titles, tables, figures, formulas, and captions
  • Table Structure Recognition — Extracts cell-level structure from detected tables using SLANet+ (wireless) and UNet (wired) models
  • Table Classification — Classifies tables as wired (bordered) or wireless (borderless) via PaddlePaddle classifier
  • Orientation Correction — EAST text detection + Hough transforms for automatic skew correction
  • Reading Order — Spatial column detection algorithm assigns reading-order indices to layout elements
  • Batch Processing — Tables are classified and recognized in batches; wireless tables via SLANet+ batch inference, wired tables via concurrent ThreadPoolExecutor
  • CUDA Support — All ONNX models and YOLO detection support GPU acceleration when available
  • JWT Authentication — Bearer token auth on extraction endpoints; /health remains open
  • Configurable Pipeline — Every pipeline step (overlap correction, reading order, table extraction, confidence filtering, coordinate normalization) is individually toggleable via request config
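
The Reading Order feature above is described only as a spatial column detection algorithm; the repository's actual implementation is not reproduced here. As a rough illustration of the idea, the sketch below groups boxes into columns by the horizontal overlap of their x-ranges and then reads each column top to bottom. The function name `assign_reading_order`, the `min_overlap` parameter, and the overlap heuristic are all assumptions made for this example.

```python
from typing import Dict, List


def assign_reading_order(elements: List[Dict], min_overlap: float = 0.5) -> List[Dict]:
    """Return copies of the elements with a 1-based reading_index.

    Hypothetical heuristic: columns are read left to right, and boxes
    inside a column are read top to bottom. Boxes are [x1, y1, x2, y2].
    """
    columns: List[List[Dict]] = []
    # Visit boxes left to right so that columns are created in reading order.
    for el in sorted(elements, key=lambda e: e["bounding_box"][0]):
        x1, _, x2, _ = el["bounding_box"]
        for col in columns:
            cx1 = min(e["bounding_box"][0] for e in col)
            cx2 = max(e["bounding_box"][2] for e in col)
            overlap = min(x2, cx2) - max(x1, cx1)
            # Join the column if the x-ranges overlap by enough of the narrower one.
            if overlap >= min_overlap * min(x2 - x1, cx2 - cx1):
                col.append(el)
                break
        else:
            columns.append([el])

    ordered: List[Dict] = []
    for col in columns:
        ordered.extend(sorted(col, key=lambda e: e["bounding_box"][1]))
    return [dict(el, reading_index=i) for i, el in enumerate(ordered, start=1)]
```

A real page also has full-width titles and elements that span columns, which this simple grouping does not handle.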

Project Structure

├── extraction_pipeline/
│   ├── config.py                    # Device detection, ONNX providers, JWT config
│   ├── utilities.py                 # Serialization utilities
│   ├── layout_extraction/
│   │   ├── combined_layout_engine.py  # YOLO layout detection + overlap correction
│   │   ├── layout_extractor.py        # Main pipeline orchestrator
│   │   └── doclayout_yolo_*.pt        # YOLO model weights
│   ├── orientation_correction/
│   │   ├── orienter.py                # EAST + Hough orientation correction
│   │   └── frozen_east_text_detection.pb
│   └── table_extraction/
│       ├── common.py                  # Model paths
│       └── table/
│           ├── table_recognizer.py    # Table structure recognition wrapper
│           ├── cls/                   # Table classification (wired/wireless)
│           ├── rec/                   # Table recognition models
│           │   ├── slanet_plus/       # SLANet+ (wireless tables)
│           │   └── unet_table/        # UNet (wired tables)
│           └── table_rec_models/      # Pre-trained ONNX weights
├── fastapi_app/
│   ├── main.py                      # FastAPI app, routes, Pydantic schemas
│   ├── auth.py                      # JWT token creation & verification
│   └── image_utils.py               # Base64/URL image resolution
├── tests/                           # 218 tests (unit, integration, E2E)
├── Dockerfile                       # CUDA-enabled production image
├── Pipfile / Pipfile.lock           # Python dependencies
├── start.sh                         # Startup script (loads .env + uvicorn)
├── pytest.ini                       # Pytest configuration
└── .env.example                     # Environment variable template

Prerequisites

  • Python 3.10+
  • pipenv (for dependency management)
  • Pre-trained model weights (included in the repository):
    • doclayout_yolo_docstructbench_imgsz1024.pt — YOLO layout detection
    • frozen_east_text_detection.pb — EAST text detection
    • PP-LCNet_x1_0_table_cls.onnx — Table classification
    • slanet-plus.onnx — SLANet+ wireless table recognition
    • unet.onnx — UNet wired table recognition

Local Setup

1. Clone and install dependencies

git clone <repository-url>
cd layout-model-isolation

# Install pipenv if you don't have it
pip install pipenv

# Install all dependencies (including dev for testing)
pipenv install --dev

2. Configure environment

# Copy the example env file
cp .env.example .env

# Edit .env as needed (especially JWT_SECRET_KEY for production)

Key environment variables:

| Variable | Default | Description |
|---|---|---|
| LAYOUT_DEVICE | auto | Compute device: auto, cuda, mps, cpu |
| JWT_SECRET_KEY | (must set) | Secret key for JWT signing (min 32 chars recommended) |
| JWT_ALGORITHM | HS256 | JWT signing algorithm |
| JWT_EXPIRATION_MINUTES | 60 | Token expiry time in minutes |
| API_ADMIN_USERNAME | admin | Username for token generation |
| API_ADMIN_PASSWORD | admin | Password for token generation |
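
To make the defaults above concrete, here is a minimal sketch of how these variables could be read and validated in Python. It is not the contents of `extraction_pipeline/config.py`; the `Settings` class and `load_settings` function are hypothetical names, and the fail-fast check on `JWT_SECRET_KEY` is an assumption about reasonable behavior rather than a description of what the project does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = {"auto", "cuda", "mps", "cpu"}


@dataclass(frozen=True)
class Settings:
    layout_device: str
    jwt_secret_key: str
    jwt_algorithm: str
    jwt_expiration_minutes: int
    api_admin_username: str
    api_admin_password: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env

    secret = env.get("JWT_SECRET_KEY", "")
    if not secret:
        # Assumption: refuse to start without a signing key.
        raise RuntimeError("JWT_SECRET_KEY must be set")

    device = env.get("LAYOUT_DEVICE", "auto").lower()
    if device not in VALID_DEVICES:
        raise ValueError(f"LAYOUT_DEVICE must be one of {sorted(VALID_DEVICES)}")

    return Settings(
        layout_device=device,
        jwt_secret_key=secret,
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        jwt_expiration_minutes=int(env.get("JWT_EXPIRATION_MINUTES", "60")),
        api_admin_username=env.get("API_ADMIN_USERNAME", "admin"),
        api_admin_password=env.get("API_ADMIN_PASSWORD", "admin"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in unit tests.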

3. Start the server

# Option A: Using the startup script (loads .env automatically)
chmod +x start.sh
./start.sh

# Option B: Using pipenv directly
pipenv run uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000

# Option C: With auto-reload for development
pipenv run uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000 --reload

The server starts on http://localhost:8000. Model loading takes a few seconds on first startup.

4. Run tests

# Run all 218 tests
pipenv run pytest

# Run with verbose output
pipenv run pytest -v

# Run only unit tests (fast, no model loading)
pipenv run pytest tests/test_config.py tests/test_auth.py tests/test_image_utils.py tests/test_utilities.py

# Run only component tests (mock models)
pipenv run pytest tests/test_layout_engine.py tests/test_layout_extractor.py

# Run API integration tests
pipenv run pytest tests/test_api.py

# Run E2E pipeline tests with real images
pipenv run pytest tests/test_e2e_pipeline.py

Docker Deployment

The Dockerfile uses NVIDIA CUDA 12.4 + cuDNN 9 as the base image with GPU-enabled PyTorch and ONNX Runtime. A .dockerignore prevents secrets (.env) and unnecessary files from being copied into the image.

Build the image

docker build -t layout-extraction-api .

Run the container

Secrets (JWT_SECRET_KEY, API_ADMIN_USERNAME, API_ADMIN_PASSWORD) are not baked into the image. You must pass them at runtime via -e flags or --env-file.

# With NVIDIA GPU support
docker run -d \
  --gpus all \
  -p 8000:8000 \
  -e JWT_SECRET_KEY="your-strong-secret-key-here-min-32-chars" \
  -e API_ADMIN_USERNAME="admin" \
  -e API_ADMIN_PASSWORD="your-secure-password" \
  --name layout-api \
  layout-extraction-api

# CPU-only (override LAYOUT_DEVICE)
docker run -d \
  -p 8000:8000 \
  -e LAYOUT_DEVICE=cpu \
  -e JWT_SECRET_KEY="your-strong-secret-key-here-min-32-chars" \
  -e API_ADMIN_USERNAME="admin" \
  -e API_ADMIN_PASSWORD="your-secure-password" \
  --name layout-api \
  layout-extraction-api

# Using an env file
docker run -d \
  --gpus all \
  -p 8000:8000 \
  --env-file .env \
  --name layout-api \
  layout-extraction-api

Docker environment defaults

The Dockerfile sets only non-secret defaults (override with -e):

LAYOUT_DEVICE=auto        # auto-detects cuda > mps > cpu
JWT_ALGORITHM=HS256
JWT_EXPIRATION_MINUTES=60

Required at runtime (no defaults in image):

JWT_SECRET_KEY=            # must set, min 32 chars recommended
API_ADMIN_USERNAME=        # must set
API_ADMIN_PASSWORD=        # must set
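
The `auto` setting is documented above as preferring cuda, then mps, then cpu. A minimal sketch of that priority rule follows; `resolve_device` is a hypothetical helper, and the availability flags are passed in as arguments so the example runs without PyTorch installed.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Pick a compute device, honoring an explicit request over auto-detection."""
    if requested != "auto":
        return requested
    # Documented priority for auto: cuda > mps > cpu.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

In a PyTorch environment the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.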

Health check

The container includes a built-in health check (30s interval, 60s start period):

docker inspect --format='{{.State.Health.Status}}' layout-api

API Usage

Interactive docs

Once running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | No | Health check |
| POST | /token | No | Generate JWT token |
| POST | /extract-layout/ | Bearer JWT | Extract layout from images |

1. Get a token

curl -X POST http://localhost:8000/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin"}'

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer"
}
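
The same token request can be made from Python with only the standard library. This is a sketch that mirrors the curl call and response shape shown above; the helper names `build_token_request` and `fetch_token` are made up for this example.

```python
import json
import urllib.request


def build_token_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build the POST /token request with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(base_url: str, username: str, password: str, timeout: float = 10.0) -> str:
    """Send the request and return the access_token field of the response."""
    request = build_token_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]
```

With the server running locally, `fetch_token("http://localhost:8000", "admin", "admin")` would return the bearer token string.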

2. Extract layout

# With base64 image
curl -X POST http://localhost:8000/extract-layout/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "images": [
      {"base64": "<base64-encoded-image-data>"}
    ],
    "config": {
      "conf_threshold": 0.3,
      "add_reading_order": true,
      "extract_table_structure": true
    }
  }'

# With image URL
curl -X POST http://localhost:8000/extract-layout/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "images": [
      {"url": "https://example.com/document-page.png"}
    ]
  }'
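
For a Python client, the base64 variant of the request could be assembled as sketched below. The helper name `build_extract_request` is hypothetical, and the sketch assumes the `base64` field takes plain base64 text with no data-URI prefix, which this README does not state explicitly.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional


def build_extract_request(
    base_url: str,
    token: str,
    image_bytes: bytes,
    config: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build the POST /extract-layout/ request for a single image."""
    # Assumption: the API accepts raw base64 text (no "data:image/...;base64," prefix).
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload: Dict[str, Any] = {"images": [{"base64": encoded}]}
    if config:
        payload["config"] = config
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/extract-layout/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen`, and the response body parsed with `json.load`.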

3. Configuration options

All config fields are optional (defaults shown):

{
  "images": [{"base64": "..."}, {"url": "..."}],
  "config": {
    "imgsz": 1024,
    "conf_threshold": 0.2,
    "correct_orientation": false,
    "correct_overlaps": true,
    "iou_text": 0.3,
    "iou_other": 0.5,
    "add_reading_order": true,
    "extract_table_structure": true,
    "classify_tables": true,
    "normalize_coordinates": false,
    "filter_low_confidence": false,
    "min_confidence": 0.2
  }
}
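
The `iou_text` and `iou_other` fields are thresholds on intersection over union (IoU), the standard overlap measure for two boxes. The sketch below computes IoU for boxes in `[x1, y1, x2, y2]` form, the layout suggested by the `bounding_box` values in the response. How the pipeline resolves a pair of boxes that exceeds a threshold is not described in this README, so the example stops at the metric itself.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 10 by 10 boxes offset horizontally by 5 share an area of 50 out of a union of 150, giving an IoU of one third, which is above the default `iou_text` of 0.3 and below the default `iou_other` of 0.5.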

Response structure

{
  "result": {
    "layout": {
      "pages": [
        {
          "page_number": 1,
          "width": 2480,
          "height": 3508,
          "metadata": [
            {
              "label": "plain text",
              "bounding_box": [100.5, 200.3, 800.1, 350.7],
              "confidence": 0.95,
              "reading_index": 1
            },
            {
              "label": "table",
              "bounding_box": [100.0, 400.0, 900.0, 700.0],
              "confidence": 0.92,
              "reading_index": 2,
              "table_layout": [
                {
                  "bbox": [100, 400, 500, 450],
                  "row_span": 1,
                  "col_span": 1,
                  "row_start": 0,
                  "row_end": 1,
                  "col_start": 0,
                  "col_end": 1
                }
              ],
              "is_wired_table": true,
              "num_rows": 5,
              "num_cols": 4
            }
          ]
        }
      ]
    },
    "visualization": null,
    "metadata": {
      "num_pages": 1,
      "config": { "..." : "..." },
      "orientation_angles": null
    }
  },
  "num_pages": 1
}
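
As an illustration of consuming this response, the sketch below walks one page from `result.layout.pages`, orders its elements by `reading_index`, and summarizes the tables. The field names are taken from the example above; the function names are hypothetical, and the handling of missing fields is an assumption, since steps such as reading order and table extraction can be switched off in the request config.

```python
from typing import Any, Dict, List


def elements_in_reading_order(page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Sort a page's elements by reading_index; elements without one go last."""
    return sorted(page["metadata"], key=lambda el: el.get("reading_index", float("inf")))


def table_summaries(page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect per-table structure info in reading order."""
    return [
        {
            "bounding_box": el["bounding_box"],
            "is_wired_table": el.get("is_wired_table"),
            "num_rows": el.get("num_rows"),
            "num_cols": el.get("num_cols"),
            "num_cells": len(el.get("table_layout") or []),
        }
        for el in elements_in_reading_order(page)
        if el["label"] == "table"
    ]
```

Given a parsed response `data`, the first page is `data["result"]["layout"]["pages"][0]`.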

Detected element labels

| Labels | Category |
|---|---|
| plain text, title, table_caption, figure caption, table_footnote, abandon | Text |
| figure, picture, isolate formula, formula caption | Image |
| table | Table |

Testing

The test suite contains 218 tests organized across 8 test files:

| File | Tests | Description |
|---|---|---|
| test_config.py | 19 | Device detection, ONNX providers, config constants |
| test_auth.py | 14 | JWT token creation, security properties |
| test_image_utils.py | 14 | Base64 decoding, image resolution, HTTP client |
| test_utilities.py | 28 | Serialization of primitives, numpy, dataclasses, enums |
| test_layout_engine.py | 46 | Image normalization, IOU, overlap correction, detection |
| test_layout_extractor.py | 32 | Config, reading order, coordinate normalization, filtering |
| test_api.py | 35 | Health, token, auth, validation, real image processing |
| test_e2e_pipeline.py | 30 | Full pipeline with real images, edge cases, API flow |

About

isolated layout module for lekha
