A document layout extraction pipeline served behind a FastAPI REST API. Uses YOLO-based layout detection, table structure recognition (SLANet+ for wireless tables, UNet for wired tables), orientation correction, and reading-order analysis.
- Layout Detection — YOLO-based model detecting text blocks, titles, tables, figures, formulas, and captions
- Table Structure Recognition — Extracts cell-level structure from detected tables using SLANet+ (wireless) and UNet (wired) models
- Table Classification — Classifies tables as wired (bordered) or wireless (borderless) via PaddlePaddle classifier
- Orientation Correction — EAST text detection + Hough transforms for automatic skew correction
- Reading Order — Spatial column detection algorithm assigns reading-order indices to layout elements
- Batch Processing — Tables are classified and recognized in batches; wireless tables via SLANet+ batch inference, wired tables via concurrent ThreadPoolExecutor
- CUDA Support — All ONNX models and YOLO detection support GPU acceleration when available
- JWT Authentication — Bearer token auth on extraction endpoints; `/health` remains open
- Configurable Pipeline — Every pipeline step (overlap correction, reading order, table extraction, confidence filtering, coordinate normalization) is individually toggleable via the request `config`
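The reading-order feature is described only as a spatial column detection algorithm. The following is a hypothetical sketch of that idea, not the code in `layout_extractor.py`. It groups boxes into columns by horizontal overlap, reads the columns left to right and the boxes in each column top to bottom, and assigns 1-based indices like `reading_index` in the API response. `Element`, `assign_reading_order` and `gap_ratio` are invented names. Elements that span several columns, such as a full-width title, are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    label: str
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    reading_index: Optional[int] = None


def assign_reading_order(elements: List[Element], page_width: float,
                         gap_ratio: float = 0.02) -> List[Element]:
    """Assign 1-based reading_index values column by column (hypothetical sketch)."""
    tol = gap_ratio * page_width  # horizontal slack when deciding column membership
    columns: List[list] = []  # each entry: [x_min, x_max, members]
    for el in sorted(elements, key=lambda e: e.bbox[0]):
        x1, _, x2, _ = el.bbox
        for col in columns:
            # Join the first column whose horizontal extent overlaps this box.
            if x1 <= col[1] + tol and x2 >= col[0] - tol:
                col[0], col[1] = min(col[0], x1), max(col[1], x2)
                col[2].append(el)
                break
        else:
            columns.append([x1, x2, [el]])
    index = 1
    # Columns left to right, then elements top to bottom within each column.
    for _, _, members in sorted(columns, key=lambda c: c[0]):
        for el in sorted(members, key=lambda e: e.bbox[1]):
            el.reading_index = index
            index += 1
    return elements
```

On a two-column page, both left-column boxes receive indices before any right-column box, even when a right-column box sits higher on the page.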
```
├── extraction_pipeline/
│   ├── config.py                      # Device detection, ONNX providers, JWT config
│   ├── utilities.py                   # Serialization utilities
│   ├── layout_extraction/
│   │   ├── combined_layout_engine.py  # YOLO layout detection + overlap correction
│   │   ├── layout_extractor.py        # Main pipeline orchestrator
│   │   └── doclayout_yolo_*.pt        # YOLO model weights
│   ├── orientation_correction/
│   │   ├── orienter.py                # EAST + Hough orientation correction
│   │   └── frozen_east_text_detection.pb
│   └── table_extraction/
│       ├── common.py                  # Model paths
│       └── table/
│           ├── table_recognizer.py    # Table structure recognition wrapper
│           ├── cls/                   # Table classification (wired/wireless)
│           ├── rec/                   # Table recognition models
│           │   ├── slanet_plus/       # SLANet+ (wireless tables)
│           │   └── unet_table/        # UNet (wired tables)
│           └── table_rec_models/      # Pre-trained ONNX weights
├── fastapi_app/
│   ├── main.py                        # FastAPI app, routes, Pydantic schemas
│   ├── auth.py                        # JWT token creation & verification
│   └── image_utils.py                 # Base64/URL image resolution
├── tests/                             # 218 tests (unit, integration, E2E)
├── Dockerfile                         # CUDA-enabled production image
├── Pipfile / Pipfile.lock             # Python dependencies
├── start.sh                           # Startup script (loads .env + uvicorn)
├── pytest.ini                         # Pytest configuration
└── .env.example                       # Environment variable template
```
- Python 3.10+
- pipenv (for dependency management)
- Pre-trained model weights (included in the repository):
  - `doclayout_yolo_docstructbench_imgsz1024.pt` — YOLO layout detection
  - `frozen_east_text_detection.pb` — EAST text detection
  - `PP-LCNet_x1_0_table_cls.onnx` — Table classification
  - `slanet-plus.onnx` — SLANet+ wireless table recognition
  - `unet.onnx` — UNet wired table recognition
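A quick way to confirm the five weight files are present before starting the server is sketched below. `find_missing_weights` is a hypothetical helper, not part of the repository. It searches the whole tree by filename because the project structure shows only the folders, not each file's exact location.

```python
from pathlib import Path
from typing import List

# Filenames from the prerequisites list; their folders are not assumed.
EXPECTED_WEIGHTS = [
    "doclayout_yolo_docstructbench_imgsz1024.pt",
    "frozen_east_text_detection.pb",
    "PP-LCNet_x1_0_table_cls.onnx",
    "slanet-plus.onnx",
    "unet.onnx",
]


def find_missing_weights(repo_root: str) -> List[str]:
    """Return the expected weight filenames found nowhere under repo_root."""
    root = Path(repo_root)
    # rglob(name) also matches files directly inside root.
    return [name for name in EXPECTED_WEIGHTS if not any(root.rglob(name))]


# Usage, from the repository root:
#   missing = find_missing_weights(".")
#   print("All model weights found." if not missing else f"Missing: {missing}")
```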
```bash
git clone <repository-url>
cd layout_isolation

# Install pipenv if you don't have it
pip install pipenv

# Install all dependencies (including dev for testing)
pipenv install --dev
```

```bash
# Copy the example env file
cp .env.example .env

# Edit .env as needed (especially JWT_SECRET_KEY for production)
```

Key environment variables:
| Variable | Default | Description |
|---|---|---|
| `LAYOUT_DEVICE` | `auto` | Compute device: `auto`, `cuda`, `mps`, `cpu` |
| `JWT_SECRET_KEY` | (must set) | Secret key for JWT signing (min 32 chars recommended) |
| `JWT_ALGORITHM` | `HS256` | JWT signing algorithm |
| `JWT_EXPIRATION_MINUTES` | `60` | Token expiry time in minutes |
| `API_ADMIN_USERNAME` | `admin` | Username for token generation |
| `API_ADMIN_PASSWORD` | `admin` | Password for token generation |
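Tokens from `/token` are JWTs signed with `JWT_SECRET_KEY` using `JWT_ALGORITHM` and valid for `JWT_EXPIRATION_MINUTES`. `auth.py` most likely delegates to a JWT library. The standard-library-only sketch below shows what HS256 signing and verification involve. The claim names (`sub`, `exp`) and all helper names are assumptions, not taken from `auth.py`. RFC 7518 requires an HS256 key of at least 256 bits, which is why a secret of 32 or more characters is recommended.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url encoding (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url_encode stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_token(subject: str, secret: str, expires_minutes: int = 60) -> str:
    """Issue an HS256 JWT carrying hypothetical 'sub' and 'exp' claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + expires_minutes * 60}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url_encode(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str) -> dict:
    """Return the payload if algorithm, signature and expiry check out; else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) <= time.time():
        raise ValueError("token expired")
    return payload
```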
```bash
# Option A: Using the startup script (loads .env automatically)
chmod +x start.sh
./start.sh

# Option B: Using pipenv directly
pipenv run uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000

# Option C: With auto-reload for development
pipenv run uvicorn fastapi_app.main:app --host 0.0.0.0 --port 8000 --reload
```

The server starts on http://localhost:8000. Model loading takes a few seconds on first startup.
```bash
# Run all 218 tests
pipenv run pytest

# Run with verbose output
pipenv run pytest -v

# Run only unit tests (fast, no model loading)
pipenv run pytest tests/test_config.py tests/test_auth.py tests/test_image_utils.py tests/test_utilities.py

# Run only component tests (mock models)
pipenv run pytest tests/test_layout_engine.py tests/test_layout_extractor.py

# Run API integration tests
pipenv run pytest tests/test_api.py

# Run E2E pipeline tests with real images
pipenv run pytest tests/test_e2e_pipeline.py
```

The Dockerfile uses NVIDIA CUDA 12.4 + cuDNN 9 as the base image with GPU-enabled PyTorch and ONNX Runtime. A `.dockerignore` keeps secrets (`.env`) and unnecessary files out of the image.
```bash
docker build -t layout-extraction-api .
```

Secrets (`JWT_SECRET_KEY`, `API_ADMIN_USERNAME`, `API_ADMIN_PASSWORD`) are not baked into the image. You must pass them at runtime via `-e` flags or `--env-file`.
```bash
# With NVIDIA GPU support
docker run -d \
  --gpus all \
  -p 8000:8000 \
  -e JWT_SECRET_KEY="your-strong-secret-key-here-min-32-chars" \
  -e API_ADMIN_USERNAME="admin" \
  -e API_ADMIN_PASSWORD="your-secure-password" \
  --name layout-api \
  layout-extraction-api

# CPU-only (override LAYOUT_DEVICE)
docker run -d \
  -p 8000:8000 \
  -e LAYOUT_DEVICE=cpu \
  -e JWT_SECRET_KEY="your-strong-secret-key-here-min-32-chars" \
  -e API_ADMIN_USERNAME="admin" \
  -e API_ADMIN_PASSWORD="your-secure-password" \
  --name layout-api \
  layout-extraction-api

# Using an env file
docker run -d \
  --gpus all \
  -p 8000:8000 \
  --env-file .env \
  --name layout-api \
  layout-extraction-api
```

The Dockerfile sets only non-secret defaults (override with `-e`):
```bash
LAYOUT_DEVICE=auto          # auto-detects cuda > mps > cpu
JWT_ALGORITHM=HS256
JWT_EXPIRATION_MINUTES=60
```

Required at runtime (no defaults in the image):

```bash
JWT_SECRET_KEY=             # must set, min 32 chars recommended
API_ADMIN_USERNAME=         # must set
API_ADMIN_PASSWORD=         # must set
```
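The `auto` value for `LAYOUT_DEVICE` resolves in the order cuda > mps > cpu. A hypothetical sketch of that resolution, plus an assumed mapping to ONNX Runtime execution providers, is shown below. It is not the code in `extraction_pipeline/config.py`. Hardware availability is passed in as booleans so the logic can run without a GPU.

```python
# Hypothetical helpers illustrating the LAYOUT_DEVICE resolution order;
# not the actual code in extraction_pipeline/config.py.
VALID_DEVICES = ("auto", "cuda", "mps", "cpu")


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Turn a LAYOUT_DEVICE value into a concrete device name."""
    value = requested.strip().lower()
    if value not in VALID_DEVICES:
        raise ValueError(f"LAYOUT_DEVICE must be one of {VALID_DEVICES}, got {requested!r}")
    if value != "auto":
        return value  # an explicit choice is passed through unchanged
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def onnx_providers(device: str) -> list:
    """ONNX Runtime execution providers in priority order (assumed mapping)."""
    if device == "cuda":
        # CPU is listed last so operators the CUDA provider cannot run fall back to CPU.
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    # Assumption: ONNX models run on CPU whenever CUDA is not selected.
    return ["CPUExecutionProvider"]
```

In a real process, the flags would come from PyTorch, e.g. `resolve_device(os.environ.get("LAYOUT_DEVICE", "auto"), torch.cuda.is_available(), torch.backends.mps.is_available())`.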
The container includes a built-in health check (30s interval, 60s start period):
```bash
docker inspect --format='{{.State.Health.Status}}' layout-api
```

Once running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/health` | No | Health check |
| `POST` | `/token` | No | Generate JWT token |
| `POST` | `/extract-layout/` | Bearer JWT | Extract layout from images |
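The curl calls below can also be made from Python. The following is a minimal standard-library sketch. It assumes the server runs at `http://localhost:8000` and accepts raw base64 (not a data URI), as in the curl example. `fetch_token` and `build_extract_request` are hypothetical helpers, not part of the repository. `build_extract_request` only builds the request, so it can be inspected before sending.

```python
import base64
import json
import urllib.request
from typing import Optional

API_URL = "http://localhost:8000"  # assumption: local server from the uvicorn command


def fetch_token(username: str, password: str, base_url: str = API_URL) -> str:
    """POST /token with a JSON body and return the access_token field."""
    req = urllib.request.Request(
        url=f"{base_url}/token",
        data=json.dumps({"username": username, "password": password}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]


def build_extract_request(token: str, image_bytes: bytes, config: Optional[dict] = None,
                          base_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /extract-layout/ request with one base64 image."""
    body = {"images": [{"base64": base64.b64encode(image_bytes).decode("ascii")}]}
    if config:
        body["config"] = config  # omitted fields fall back to the server defaults
    return urllib.request.Request(
        url=f"{base_url}/extract-layout/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


# Usage against a running server:
#   token = fetch_token("admin", "admin")
#   with open("page.png", "rb") as f:
#       req = build_extract_request(token, f.read(), {"conf_threshold": 0.3})
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)  # top-level keys: "result", "num_pages"
```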
```bash
curl -X POST http://localhost:8000/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "admin"}'
```

Response:
```json
{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer"
}
```

```bash
# With base64 image
curl -X POST http://localhost:8000/extract-layout/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "images": [
      {"base64": "<base64-encoded-image-data>"}
    ],
    "config": {
      "conf_threshold": 0.3,
      "add_reading_order": true,
      "extract_table_structure": true
    }
  }'

# With image URL
curl -X POST http://localhost:8000/extract-layout/ \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-token>" \
  -d '{
    "images": [
      {"url": "https://example.com/document-page.png"}
    ]
  }'
```

All `config` fields are optional (defaults shown):
```json
{
  "images": [{"base64": "..."}, {"url": "..."}],
  "config": {
    "imgsz": 1024,
    "conf_threshold": 0.2,
    "correct_orientation": false,
    "correct_overlaps": true,
    "iou_text": 0.3,
    "iou_other": 0.5,
    "add_reading_order": true,
    "extract_table_structure": true,
    "classify_tables": true,
    "normalize_coordinates": false,
    "filter_low_confidence": false,
    "min_confidence": 0.2
  }
}
```

Example response:

```json
{
  "result": {
    "layout": {
      "pages": [
        {
          "page_number": 1,
          "width": 2480,
          "height": 3508,
          "metadata": [
            {
              "label": "plain text",
              "bounding_box": [100.5, 200.3, 800.1, 350.7],
              "confidence": 0.95,
              "reading_index": 1
            },
            {
              "label": "table",
              "bounding_box": [100.0, 400.0, 900.0, 700.0],
              "confidence": 0.92,
              "reading_index": 2,
              "table_layout": [
                {
                  "bbox": [100, 400, 500, 450],
                  "row_span": 1,
                  "col_span": 1,
                  "row_start": 0,
                  "row_end": 1,
                  "col_start": 0,
                  "col_end": 1
                }
              ],
              "is_wired_table": true,
              "num_rows": 5,
              "num_cols": 4
            }
          ]
        }
      ]
    },
    "visualization": null,
    "metadata": {
      "num_pages": 1,
      "config": { "...": "..." },
      "orientation_angles": null
    }
  },
  "num_pages": 1
}
```

Each `label` in the response belongs to one of three categories:

| Label | Category |
|---|---|
| `plain text`, `title`, `table_caption`, `figure caption`, `table_footnote`, `abandon` | Text |
| `figure`, `picture`, `isolate formula`, `formula caption` | Image |
| `table` | Table |
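The category table suggests how the two overlap thresholds in `config` might apply: `iou_text` to Text-category labels and `iou_other` to the rest. The overlap-correction logic in `combined_layout_engine.py` is not documented here, so the following sketch substitutes plain per-category greedy non-maximum suppression. It keeps the most confident box and drops same-category boxes whose IoU with it exceeds that category's threshold. `CATEGORY`, `iou` and `suppress_overlaps` are hypothetical names. Boxes use the `[x1, y1, x2, y2]` format of `bounding_box`.

```python
from typing import Any, Dict, List

# Label-to-category mapping copied from the table above.
CATEGORY = {
    "plain text": "Text", "title": "Text", "table_caption": "Text",
    "figure caption": "Text", "table_footnote": "Text", "abandon": "Text",
    "figure": "Image", "picture": "Image", "isolate formula": "Image",
    "formula caption": "Image", "table": "Table",
}


def iou(a: List[float], b: List[float]) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(items: List[Dict[str, Any]], iou_text: float = 0.3,
                      iou_other: float = 0.5) -> List[Dict[str, Any]]:
    """Greedy NMS within each category (a stand-in, not the project's algorithm)."""
    kept: List[Dict[str, Any]] = []
    for item in sorted(items, key=lambda d: d["confidence"], reverse=True):
        cat = CATEGORY.get(item["label"], "Other")
        limit = iou_text if cat == "Text" else iou_other
        # Keep the box unless a more confident same-category box overlaps it too much.
        if all(CATEGORY.get(k["label"], "Other") != cat
               or iou(item["bounding_box"], k["bounding_box"]) <= limit
               for k in kept):
            kept.append(item)
    return kept
```

Two boxes of different categories (a table and a caption, for example) are never suppressed against each other in this sketch, however much they overlap.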
The test suite contains 218 tests organized across 8 test files:
| File | Tests | Description |
|---|---|---|
| `test_config.py` | 19 | Device detection, ONNX providers, config constants |
| `test_auth.py` | 14 | JWT token creation, security properties |
| `test_image_utils.py` | 14 | Base64 decoding, image resolution, HTTP client |
| `test_utilities.py` | 28 | Serialization of primitives, numpy, dataclasses, enums |
| `test_layout_engine.py` | 46 | Image normalization, IOU, overlap correction, detection |
| `test_layout_extractor.py` | 32 | Config, reading order, coordinate normalization, filtering |
| `test_api.py` | 35 | Health, token, auth, validation, real image processing |
| `test_e2e_pipeline.py` | 30 | Full pipeline with real images, edge cases, API flow |