Layout Extraction API

A document layout extraction pipeline served behind a FastAPI REST API. It combines YOLO-based layout detection, table structure recognition (SLANet+ for wireless tables, UNet for wired tables), orientation correction, and reading-order analysis.

Features

Layout Detection — YOLO-based model detecting text blocks, titles, tables, figures, formulas, and captions
Table Structure Recognition — Extracts cell-level structure from detected tables using SLANet+ (wireless) and UNet (wired) models
Table Classification — Classifies tables as wired (bordered) or wireless (borderless) via PaddlePaddle classifier
Orientation Correction — EAST text detection + Hough transforms for automatic skew correction
Reading Order — Spatial column detection algorithm assigns reading-order indices to layout elements
Batch Processing — Tables are classified and recognized in batches; wireless tables via SLANet+ batch inference, wired tables via concurrent ThreadPoolExecutor
CUDA Support — All ONNX models and YOLO detection support GPU acceleration when available
JWT Authentication — Bearer token auth on extraction endpoints; /health remains open
Configurable Pipeline — Every pipeline step (overlap correction, reading order, table extraction, confidence filtering, coordinate normalization) is individually toggleable via request config
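
The Reading Order feature is described only as a spatial column detection algorithm. The sketch below illustrates that general idea and is not the repository's implementation: it assumes boxes are (x0, y0, x1, y1) tuples in pixels, groups them into columns by horizontal position, then numbers them column by column from top to bottom. The helper name `assign_reading_order` is hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x0, y0, x1, y1)


def assign_reading_order(boxes: List[Box]) -> List[int]:
    """Return a 1-based reading index for each box, in input order."""
    columns: List[Dict] = []  # each column tracks its x-range and member indices

    # Visit boxes left to right so columns are discovered in order.
    for idx in sorted(range(len(boxes)), key=lambda i: (boxes[i][0], boxes[i][1])):
        x0, _, x1, _ = boxes[idx]
        center = (x0 + x1) / 2
        for col in columns:
            # A box joins a column when its horizontal center lies inside it.
            if col["x0"] <= center <= col["x1"]:
                col["x0"] = min(col["x0"], x0)
                col["x1"] = max(col["x1"], x1)
                col["members"].append(idx)
                break
        else:
            columns.append({"x0": x0, "x1": x1, "members": [idx]})

    # Number boxes column by column, top to bottom within each column.
    order = [0] * len(boxes)
    next_index = 1
    for col in sorted(columns, key=lambda c: c["x0"]):
        for idx in sorted(col["members"], key=lambda i: boxes[i][1]):
            order[idx] = next_index
            next_index += 1
    return order


two_column_page = [
    (300, 50, 550, 200),   # right column, top
    (20, 50, 270, 200),    # left column, top
    (20, 220, 270, 400),   # left column, bottom
    (300, 220, 550, 400),  # right column, bottom
]
print(assign_reading_order(two_column_page))  # [3, 1, 2, 4]
```

One limitation of this simplified version: a full-width element such as a page title widens the first column until it absorbs every box, which degrades the result to plain top-to-bottom ordering. A production algorithm would need to treat spanning elements separately.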

Project Structure

├── extraction_pipeline/
│   ├── config.py                    # Device detection, ONNX providers, JWT config
│   ├── utilities.py                 # Serialization utilities
│   ├── layout_extraction/
│   │   ├── combined_layout_engine.py  # YOLO layout detection + overlap correction
│   │   ├── layout_extractor.py        # Main pipeline orchestrator
│   │   └── doclayout_yolo_*.pt        # YOLO model weights
│   ├── orientation_correction/
│   │   ├── orienter.py                # EAST + Hough orientation correction
│   │   └── frozen_east_text_detection.pb
│   └── table_extraction/
│       ├── common.py                  # Model paths
│       └── table/
│           ├── table_recognizer.py    # Table structure recognition wrapper
│           ├── cls/                   # Table classification (wired/wireless)
│           ├── rec/                   # Table recognition models
│           │   ├── slanet_plus/       # SLANet+ (wireless tables)
│           │   └── unet_table/        # UNet (wired tables)
│           └── table_rec_models/      # Pre-trained ONNX weights
├── fastapi_app/
│   ├── main.py                      # FastAPI app, routes, Pydantic schemas
│   ├── auth.py                      # JWT token creation & verification
│   └── image_utils.py               # Base64/URL image resolution
├── tests/                           # 218 tests (unit, integration, E2E)
├── Dockerfile                       # CUDA-enabled production image
├── Pipfile / Pipfile.lock           # Python dependencies
├── start.sh                         # Startup script (loads .env + uvicorn)
├── pytest.ini                       # Pytest configuration
└── .env.example                     # Environment variable template

Prerequisites

Python 3.10+
pipenv (for dependency management)
Pre-trained model weights (included in the repository):
- doclayout_yolo_docstructbench_imgsz1024.pt — YOLO layout detection
- frozen_east_text_detection.pb — EAST text detection
- PP-LCNet_x1_0_table_cls.onnx — Table classification
- slanet-plus.onnx — SLANet+ wireless table recognition
- unet.onnx — UNet wired table recognition

Local Setup

1. Clone and install dependencies

git clone <repository-url>
cd layout_isolation

# Install pipenv if you don't have it
pip install pipenv

# Install all dependencies (including dev for testing)
pipenv install --dev

2. Configure environment

# Copy the example env file
cp .env.example .env

# Edit .env as needed (especially JWT_SECRET_KEY for production)

Key environment variables:

Variable	Default	Description
`LAYOUT_DEVICE`	`auto`	Compute device: `auto`, `cuda`, `mps`, `cpu`
`JWT_SECRET_KEY`	(must set)	Secret key for JWT signing (min 32 chars recommended)
`JWT_ALGORITHM`	`HS256`	JWT signing algorithm
`JWT_EXPIRATION_MINUTES`	`60`	Token expiry time in minutes
`API_ADMIN_USERNAME`	`admin`	Username for token generation
`API_ADMIN_PASSWORD`	`admin`	Password for token generation
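
As an illustration of how the variables in this table could be read with their documented defaults, here is a minimal sketch. The `Settings` dataclass and `load_settings` function are hypothetical names and do not reflect the actual contents of `extraction_pipeline/config.py`; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    layout_device: str
    jwt_secret_key: str
    jwt_algorithm: str
    jwt_expiration_minutes: int
    api_admin_username: str
    api_admin_password: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from environment variables, applying the table's defaults."""
    env = os.environ if env is None else env

    # JWT_SECRET_KEY has no default in the table, so fail early when it is missing.
    secret = env.get("JWT_SECRET_KEY", "")
    if not secret:
        raise RuntimeError("JWT_SECRET_KEY must be set")

    return Settings(
        layout_device=env.get("LAYOUT_DEVICE", "auto"),
        jwt_secret_key=secret,
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        jwt_expiration_minutes=int(env.get("JWT_EXPIRATION_MINUTES", "60")),
        api_admin_username=env.get("API_ADMIN_USERNAME", "admin"),
        api_admin_password=env.get("API_ADMIN_PASSWORD", "admin"),
    )
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the function easy to exercise in unit tests.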

3. Start the server

# Option A: Using the startup script (loads .env automatically)
chmod +x start.sh
./start.sh

# Option B: Using pipenv directly
pipenv run uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000

# Option C: With auto-reload for development
pipenv run uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000 --reload

The server starts on http://localhost:8000. Model loading takes a few seconds on first startup.

4. Run tests

# Run all 218 tests
pipenv run pytest

# Run with verbose output
pipenv run pytest -v

# Run only unit tests (fast, no model loading)
pipenv run pytest tests/test_config.py tests/test_auth.py tests/test_image_utils.py tests/test_utilities.py

# Run only component tests (mock models)
pipenv run pytest tests/test_layout_engine.py tests/test_layout_extractor.py

# Run API integration tests
pipenv run pytest tests/test_api.py

# Run E2E pipeline tests with real images
pipenv run pytest tests/test_e2e_pipeline.py

Docker Deployment

The Dockerfile uses NVIDIA CUDA 12.4 + cuDNN 9 as the base image with GPU-enabled PyTorch and ONNX Runtime. A .dockerignore prevents secrets (.env) and unnecessary files from being copied into the image.

Build the image

docker build -t layout-extraction-api .

Run the container

Secrets (JWT_SECRET_KEY, API_ADMIN_USERNAME, API_ADMIN_PASSWORD) are not baked into the image. You must pass them at runtime via -e flags or --env-file.

# With NVIDIA GPU support
docker run -d \
  --gpus all \
  -p 8000:8000 \
  -e JWT_SECRET_KEY="your-strong-secret-key-here-min-32-chars" \
  -e API_ADMIN_USERNAME="admin" \
  -e API_ADMIN_PASSWORD="your-secure-password" \
  --name layout-api \
  layout-extraction-api

# CPU-only (override LAYOUT_DEVICE)
docker run -d \
  -p 8000:8000 \
  -e LAYOUT_DEVICE=cpu \
  -e JWT_SECRET_KEY="your-strong-secret-key-here-min-32-chars" \
  -e API_ADMIN_USERNAME="admin" \
  -e API_ADMIN_PASSWORD="your-secure-password" \
  --name layout-api \
  layout-extraction-api

# Using an env file
docker run -d \
  --gpus all \
  -p 8000:8000 \
  --env-file .env \
  --name layout-api \
  layout-extraction-api

Docker environment defaults

The Dockerfile sets only non-secret defaults (override with -e):

LAYOUT_DEVICE=auto        # auto-detects cuda > mps > cpu
JWT_ALGORITHM=HS256
JWT_EXPIRATION_MINUTES=60

Required at runtime (no defaults in image):

JWT_SECRET_KEY=            # must set, min 32 chars recommended
API_ADMIN_USERNAME=        # must set
API_ADMIN_PASSWORD=        # must set
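
The `LAYOUT_DEVICE=auto` default is described as preferring cuda, then mps, then cpu. A minimal sketch of that precedence rule follows. The function name `resolve_device` is hypothetical, and the availability flags are passed in explicitly so the example runs without PyTorch installed.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Map a LAYOUT_DEVICE value to a concrete device name."""
    requested = requested.strip().lower()
    if requested != "auto":
        # An explicit choice is returned unchanged after validation.
        if requested not in ("cuda", "mps", "cpu"):
            raise ValueError(f"unsupported LAYOUT_DEVICE: {requested!r}")
        return requested
    # "auto" follows the documented precedence: cuda > mps > cpu.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# In a PyTorch environment the flags could come from
# torch.cuda.is_available() and torch.backends.mps.is_available().
print(resolve_device("auto", cuda_available=False, mps_available=True))  # mps
```

Whether the real implementation falls back to another device when an explicitly requested one is unavailable is not documented here, so this sketch simply returns the explicit choice.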

Health check

The container includes a built-in health check (30s interval, 60s start period). Query its current status with:

docker inspect --format='{{.State.Health.Status}}' layout-api
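
When a deployment script needs to wait for the API rather than inspect the container, it can poll the open /health endpoint directly. The sketch below assumes a healthy server answers with HTTP 200 (the response body is not documented here); `http_status` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 3.0,
    get_status: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the URL until it returns 200 (assumed healthy) or attempts run out."""
    for attempt in range(attempts):
        if get_status(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_healthy("http://localhost:8000/health")`. The `get_status` and `sleep` parameters exist only so the loop can be tested without a running server.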

API Usage

Interactive docs

Once running, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Endpoints

Method	Path	Auth	Description
`GET`	`/health`	No	Health check
`POST`	`/token`	No	Generate JWT token
`POST`	`/extract-layout/`	Bearer JWT	Extract layout from images

1. Get a token

curl -X POST http://localhost:8000/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin"}'

Response:

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer"
}
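
The same token request can be made from Python with only the standard library. This is a sketch mirroring the curl call above: the endpoint path, JSON body, and `access_token` field come from this README, while the helper names `build_token_request` and `fetch_token` are hypothetical.

```python
import json
import urllib.request


def build_token_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare the POST /token request without sending it."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(base_url: str, username: str, password: str, timeout: float = 10.0) -> str:
    """Send the request and return the access_token field of the JSON response."""
    request = build_token_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["access_token"]
```

With the server running locally, `fetch_token("http://localhost:8000", "admin", "admin")` would return the token string to place in the `Authorization: Bearer` header.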

2. Extract layout

# With base64 image
curl -X POST http://localhost:8000/extract-layout/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "images": [
      {"base64": "<base64-encoded-image-data>"}
    ],
    "config": {
      "conf_threshold": 0.3,
      "add_reading_order": true,
      "extract_table_structure": true
    }
  }'

# With image URL
curl -X POST http://localhost:8000/extract-layout/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "images": [
      {"url": "https://example.com/document-page.png"}
    ]
  }'
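
For scripted use, the base64 variant of this request can be assembled in Python. The sketch below assumes the `base64` field expects plain standard base64 without a data-URI prefix, which this README does not state explicitly; `build_extract_payload` and `build_extract_request` are hypothetical helper names.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, Optional


def build_extract_payload(image_bytes: bytes, config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Wrap raw image bytes in the request body shape shown in the curl example."""
    payload: Dict[str, Any] = {
        "images": [{"base64": base64.b64encode(image_bytes).decode("ascii")}]
    }
    if config:
        payload["config"] = config  # omitted fields fall back to server defaults
    return payload


def build_extract_request(base_url: str, token: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare the authenticated POST /extract-layout/ request without sending it."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/extract-layout/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send it, read the image with `open(path, "rb").read()`, pass the resulting request to `urllib.request.urlopen`, and parse the response with `json.load`.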

3. Configuration options

All config fields are optional (defaults shown):

{
  "images": [{"base64": "..."}, {"url": "..."}],
  "config": {
    "imgsz": 1024,
    "conf_threshold": 0.2,
    "correct_orientation": false,
    "correct_overlaps": true,
    "iou_text": 0.3,
    "iou_other": 0.5,
    "add_reading_order": true,
    "extract_table_structure": true,
    "classify_tables": true,
    "normalize_coordinates": false,
    "filter_low_confidence": false,
    "min_confidence": 0.2
  }
}
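
The `iou_text` and `iou_other` fields are intersection-over-union (IoU) thresholds used by overlap correction. How the pipeline applies them is not documented here, so the sketch below makes an assumption: boxes are (x0, y0, x1, y1), and when two detections overlap beyond the threshold, the one with the higher confidence is kept. The names `iou` and `drop_overlaps` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_overlaps(detections: List[Tuple[Box, float]], threshold: float) -> List[Tuple[Box, float]]:
    """Greedy suppression (an assumed strategy): keep the most confident box of each overlapping group."""
    kept: List[Tuple[Box, float]] = []
    for box, confidence in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, other) <= threshold for other, _ in kept):
            kept.append((box, confidence))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 0, 10, 10), 0.8),    # IoU 0.9 with the first box
    ((50, 50, 60, 60), 0.7),  # disjoint from both
]
print([conf for _, conf in drop_overlaps(detections, threshold=0.5)])  # [0.9, 0.7]
```

Under this reading, the lower default for `iou_text` (0.3) would merge overlapping text blocks more aggressively than `iou_other` (0.5) does for other element types.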

Response structure

{
  "result": {
    "layout": {
      "pages": [
        {
          "page_number": 1,
          "width": 2480,
          "height": 3508,
          "metadata": [
            {
              "label": "plain text",
              "bounding_box": [100.5, 200.3, 800.1, 350.7],
              "confidence": 0.95,
              "reading_index": 1
            },
            {
              "label": "table",
              "bounding_box": [100.0, 400.0, 900.0, 700.0],
              "confidence": 0.92,
              "reading_index": 2,
              "table_layout": [
                {
                  "bbox": [100, 400, 500, 450],
                  "row_span": 1,
                  "col_span": 1,
                  "row_start": 0,
                  "row_end": 1,
                  "col_start": 0,
                  "col_end": 1
                }
              ],
              "is_wired_table": true,
              "num_rows": 5,
              "num_cols": 4
            }
          ]
        }
      ]
    },
    "visualization": null,
    "metadata": {
      "num_pages": 1,
      "config": { "..." : "..." },
      "orientation_angles": null
    }
  },
  "num_pages": 1
}
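
A client will usually walk this structure page by page. The sketch below iterates elements in reading order and converts a bounding box to page-relative fractions. It assumes `bounding_box` is [x0, y0, x1, y1] in pixels, which matches the sample values but is not stated explicitly; the helper names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple


def elements_in_reading_order(response: Dict[str, Any]) -> Iterator[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Yield (page, element) pairs, sorted by reading_index within each page."""

    def sort_key(element: Dict[str, Any]) -> float:
        index = element.get("reading_index")
        # Elements without an index (e.g. add_reading_order disabled) sort last.
        return float("inf") if index is None else index

    for page in response["result"]["layout"]["pages"]:
        for element in sorted(page["metadata"], key=sort_key):
            yield page, element


def normalized_box(page: Dict[str, Any], element: Dict[str, Any]) -> List[float]:
    """Scale an assumed [x0, y0, x1, y1] pixel box to the 0..1 range of its page."""
    x0, y0, x1, y1 = element["bounding_box"]
    width, height = page["width"], page["height"]
    return [x0 / width, y0 / height, x1 / width, y1 / height]
```

For example, `[element["label"] for _, element in elements_in_reading_order(response)]` lists the labels of a parsed response in reading order.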

Detected element labels

Label	Category
`plain text`, `title`, `table_caption`, `figure caption`, `table_footnote`, `abandon`	Text
`figure`, `picture`, `isolate formula`, `formula caption`	Image
`table`	Table
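
Client code that groups elements by category can encode this table as a lookup. This is a small sketch: the label strings are copied from the table as written, while the `LABEL_CATEGORIES` name and the "Unknown" fallback for unlisted labels are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable

LABEL_CATEGORIES: Dict[str, str] = {
    "plain text": "Text",
    "title": "Text",
    "table_caption": "Text",
    "figure caption": "Text",
    "table_footnote": "Text",
    "abandon": "Text",
    "figure": "Image",
    "picture": "Image",
    "isolate formula": "Image",
    "formula caption": "Image",
    "table": "Table",
}


def category_of(label: str) -> str:
    """Return the category for a label, or "Unknown" for labels not in the table."""
    return LABEL_CATEGORIES.get(label, "Unknown")


def count_by_category(labels: Iterable[str]) -> Counter:
    """Count how many detected elements fall into each category."""
    return Counter(category_of(label) for label in labels)


print(count_by_category(["title", "plain text", "table", "figure"]))
# Counter({'Text': 2, 'Table': 1, 'Image': 1})
```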

Testing

The test suite contains 218 tests organized across 8 test files:

File	Tests	Description
`test_config.py`	19	Device detection, ONNX providers, config constants
`test_auth.py`	14	JWT token creation, security properties
`test_image_utils.py`	14	Base64 decoding, image resolution, HTTP client
`test_utilities.py`	28	Serialization of primitives, numpy, dataclasses, enums
`test_layout_engine.py`	46	Image normalization, IOU, overlap correction, detection
`test_layout_extractor.py`	32	Config, reading order, coordinate normalization, filtering
`test_api.py`	35	Health, token, auth, validation, real image processing
`test_e2e_pipeline.py`	30	Full pipeline with real images, edge cases, API flow
