# IDP Router Quickstart

This notebook shows how to install, configure, and run `idp_router`, which is now packaged as a standalone library.

## Installation

Install the library from a source checkout. Optional extras pull in heavier dependencies only when needed.

```bash
pip install .                  # base heuristics only
pip install ".[pymupdf]"       # adds PyMuPDF-based PDF/email analysis
pip install ".[huggingface]"   # adds OCR + LayoutLM models
```
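
To confirm which optional extras are usable in the current environment, a quick import probe can help. This is a minimal sketch: it assumes the `pymupdf` extra provides the `fitz` module (PyMuPDF's import name) and that the `huggingface` extra provides `transformers`. Neither mapping is confirmed by the package metadata shown here, and `available_extras` is a hypothetical helper, not part of `idp_router`.

```python
import importlib.util

# Assumed mapping from extra name to a module it makes importable.
# PyMuPDF is imported as "fitz"; "transformers" for the Hugging Face
# extra is a guess, not confirmed by idp_router's packaging metadata.
OPTIONAL_BACKENDS = {
    "pymupdf": "fitz",
    "huggingface": "transformers",
}


def available_extras(backends: dict[str, str] = OPTIONAL_BACKENDS) -> dict[str, bool]:
    """Report which optional backends can be found, without importing them."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in backends.items()
    }


print(available_extras())
```

Using `find_spec` rather than a trial `import` avoids paying the import cost of heavy libraries just to check that they exist.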

```python
import re
from pprint import pprint

from idp_router import (
    DocumentRouter,
    HeuristicLayoutAnalyser,
    ModelBackedLayoutAnalyser,
    PyMuPDFLayoutAnalyser,
    RequestsLayoutModelClient,
    HuggingFaceLayoutModelClient,
    RouterConfig,
    RoutingMode,
    OverrideSet,
    PatternOverride,
    StrategyConfig,
    DocumentCategory,
)
```


```python
# Per-page layout statistics that the heuristic analyser reads.
sample_body = {
    "documentMetadata": {
        "layout": {
            "pages": [
                {
                    "textDensity": 0.8,
                    "imageDensity": 0.05,
                    "tableDensity": 0.2,
                    "tableCount": 1,
                }
            ]
        }
    }
}

# Map each DocumentCategory value to a downstream extraction strategy;
# fallback_strategy is the catch-all strategy.
base_config = RouterConfig(
    default_strategy_map={
        DocumentCategory.SHORT_FORM.value: {"name": "azure_form_recognizer"},
        DocumentCategory.LONG_FORM.value: {"name": "textract_async"},
        DocumentCategory.TABLE_HEAVY.value: {"name": "table_extractor"},
        DocumentCategory.FORM_HEAVY.value: {"name": "form_specialist"},
    },
    fallback_strategy={"name": "generic_ocr"},
)
```
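
Because `default_strategy_map` is keyed by `DocumentCategory` values, it is easy to leave a category unmapped without noticing. The sketch below checks coverage. `Category` is a hypothetical stand-in for `DocumentCategory` with invented string values, and `uncovered_categories` is a hypothetical helper. Unmapped categories are presumably routed to `fallback_strategy`, but that behaviour is an assumption.

```python
from enum import Enum


class Category(Enum):
    # Hypothetical stand-in for idp_router.DocumentCategory; the real
    # string values are not shown in this notebook.
    SHORT_FORM = "short_form"
    LONG_FORM = "long_form"
    TABLE_HEAVY = "table_heavy"
    FORM_HEAVY = "form_heavy"


def uncovered_categories(strategy_map: dict[str, dict], categories=Category) -> list[str]:
    """Return category values that have no entry in the strategy map."""
    return [c.value for c in categories if c.value not in strategy_map]


# FORM_HEAVY is left out on purpose to show what the check reports.
strategy_map = {
    Category.SHORT_FORM.value: {"name": "azure_form_recognizer"},
    Category.LONG_FORM.value: {"name": "textract_async"},
    Category.TABLE_HEAVY.value: {"name": "table_extractor"},
}
print(uncovered_categories(strategy_map))  # ['form_heavy']
```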


## 1. Default hybrid routing

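The heuristic analyser reads the per-page layout statistics embedded in the request payload (`documentMetadata.layout.pages`: text, image, and table density plus table count) and uses them to categorise the document.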

```python
router = DocumentRouter(base_config, HeuristicLayoutAnalyser())
analysis = router.route(sample_body, "invoices/acme-001.pdf", OverrideSet())
pprint(analysis.to_metadata_record({"object_key": analysis.object_key}))
```
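
To make the idea of categorising from layout statistics concrete, here is a toy rule set over the same `documentMetadata.layout.pages` shape as `sample_body`. It is not how `HeuristicLayoutAnalyser` is implemented: `PageStats`, `parse_pages`, `categorise`, both thresholds, and the returned labels are all hypothetical, and form-heavy detection is omitted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PageStats:
    text_density: float
    image_density: float  # parsed for completeness; unused by the toy rules
    table_density: float
    table_count: int      # parsed for completeness; unused by the toy rules


def parse_pages(body: dict) -> list[PageStats]:
    """Pull per-page layout statistics out of the request payload."""
    pages = body.get("documentMetadata", {}).get("layout", {}).get("pages", [])
    return [
        PageStats(
            text_density=p.get("textDensity", 0.0),
            image_density=p.get("imageDensity", 0.0),
            table_density=p.get("tableDensity", 0.0),
            table_count=p.get("tableCount", 0),
        )
        for p in pages
    ]


def categorise(pages: list[PageStats], *, table_threshold: float = 0.3,
               long_form_pages: int = 10) -> str:
    """Toy rules with invented thresholds, for illustration only."""
    if not pages:
        return "unknown"
    if mean(p.table_density for p in pages) >= table_threshold:
        return "table_heavy"
    if len(pages) >= long_form_pages:
        return "long_form"
    return "short_form"


body = {"documentMetadata": {"layout": {"pages": [
    {"textDensity": 0.8, "imageDensity": 0.05, "tableDensity": 0.2, "tableCount": 1}
]}}}
print(categorise(parse_pages(body)))  # short_form
```

With one page and a mean table density of 0.2, neither the table nor the page-count rule fires, so the toy rules fall through to `short_form`.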


## 2. Static mode

Force a single downstream strategy regardless of document content.

```python
static_config = RouterConfig(
    mode=RoutingMode.STATIC,
    static_strategy={"name": "force_textract", "model": "textract-v1"},
    default_strategy_map={},
)
static_router = DocumentRouter(static_config, HeuristicLayoutAnalyser())
static_analysis = static_router.route(sample_body, "contracts/nda.pdf", OverrideSet())
print(static_analysis.strategy)
```
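
The difference between static and hybrid routing comes down to whether the detected category is consulted at all. This is a minimal sketch, assuming hybrid mode looks the category up in the strategy map and uses the fallback when it is missing. `Mode` and `resolve_strategy` are hypothetical and do not mirror `RoutingMode` or `DocumentRouter` internals.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    # Hypothetical mirror of idp_router.RoutingMode.
    HYBRID = "hybrid"
    STATIC = "static"


def resolve_strategy(
    mode: Mode,
    category: str,
    strategy_map: dict[str, dict],
    fallback: dict,
    static_strategy: Optional[dict] = None,
) -> dict:
    """Pick a strategy; static mode never looks at the category."""
    if mode is Mode.STATIC:
        if static_strategy is None:
            raise ValueError("static mode requires a static_strategy")
        return static_strategy
    # Hybrid: use the category's mapped strategy, else the fallback.
    return strategy_map.get(category, fallback)


print(resolve_strategy(
    Mode.STATIC, "table_heavy", {}, {"name": "generic_ocr"},
    {"name": "force_textract", "model": "textract-v1"},
))
```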


## 3. Pattern overrides

Apply regex-based overrides to the object key; a matching pattern selects its strategy before automatic routing runs.

```python
overrides = OverrideSet(
    pattern_overrides=[
        PatternOverride(
            # Raw string: a single backslash escapes the dot.
            pattern=re.compile(r"bank_statements/.*\.pdf$"),
            strategy=StrategyConfig(name="bank_statement_parser"),
        )
    ]
)
override_router = DocumentRouter(base_config, HeuristicLayoutAnalyser())
override_analysis = override_router.route(sample_body, "bank_statements/jan.pdf", overrides)
print(override_analysis.strategy, override_analysis.overrides_applied)
```
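
Two details matter when writing pattern overrides. First, inside a raw string a single backslash escapes the dot (`\.`), whereas `\\.` asks the regex to match a literal backslash followed by any character, so it will not match `bank_statements/jan.pdf`. Second, with several overrides the order decides which one applies. The sketch below assumes first-match-wins and `re.search` semantics; `Override` and `first_override` are hypothetical stand-ins, not the `PatternOverride` API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Override:
    # Hypothetical stand-in for PatternOverride: compiled pattern + strategy name.
    pattern: re.Pattern
    strategy: str


def first_override(object_key: str, overrides: list[Override]) -> Optional[str]:
    """Return the strategy of the first matching pattern (order matters)."""
    for ov in overrides:
        if ov.pattern.search(object_key):
            return ov.strategy
    return None


# One backslash escapes the dot; two would require a literal "\" in the key.
assert re.search(r"bank_statements/.*\.pdf$", "bank_statements/jan.pdf")
assert not re.search(r"bank_statements/.*\\.pdf$", "bank_statements/jan.pdf")

overrides = [Override(re.compile(r"bank_statements/.*\.pdf$"), "bank_statement_parser")]
print(first_override("bank_statements/jan.pdf", overrides))  # bank_statement_parser
print(first_override("invoices/acme-001.pdf", overrides))    # None
```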


## 4. Remote and ML-backed analysers

Swap in a model-backed layout analyser that calls either a remote HTTP service or the optional Hugging Face client (which needs the `huggingface` extra).

```python
remote_client = RequestsLayoutModelClient(
    endpoint="https://layout-service.internal/route",
    api_key="token-123",
    model_type="layoutlm_v3",
)
model_router = DocumentRouter(base_config, ModelBackedLayoutAnalyser(remote_client))
# model_router.route(sample_body, "invoices/acme-001.pdf", OverrideSet())  # requires reachable endpoint

# Hugging Face example (requires the huggingface extra and the raw document bytes)
# with open("/path/to/document.pdf", "rb") as f: pdf_bytes = f.read()
# hf_client = HuggingFaceLayoutModelClient()
# hf_router = DocumentRouter(base_config, ModelBackedLayoutAnalyser(hf_client, fallback=PyMuPDFLayoutAnalyser()))
# hf_router.route(sample_body | {"documentBytes": pdf_bytes}, "forms/form.pdf", OverrideSet())  # dict "|" needs Python 3.9+
```
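
The commented Hugging Face example passes `fallback=PyMuPDFLayoutAnalyser()`, which suggests the model-backed analyser can defer to another analyser when the model is unavailable. Here is a sketch of that pattern, under the assumption that the fallback is triggered by connectivity errors. `FallbackAnalyser`, `UnreachableModel`, and `StaticHeuristic` are hypothetical classes, not `idp_router` types.

```python
from typing import Protocol


class LayoutAnalyser(Protocol):
    def analyse(self, body: dict) -> str: ...


class FallbackAnalyser:
    """Hypothetical: try a primary analyser, defer to a fallback on recoverable errors."""

    def __init__(self, primary: LayoutAnalyser, fallback: LayoutAnalyser,
                 recoverable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError)):
        self.primary = primary
        self.fallback = fallback
        self.recoverable = recoverable

    def analyse(self, body: dict) -> str:
        try:
            return self.primary.analyse(body)
        except self.recoverable:
            return self.fallback.analyse(body)


class UnreachableModel:
    def analyse(self, body: dict) -> str:
        raise ConnectionError("layout service unreachable")


class StaticHeuristic:
    def analyse(self, body: dict) -> str:
        return "short_form"


analyser = FallbackAnalyser(UnreachableModel(), StaticHeuristic())
print(analyser.analyse({}))  # short_form
```

Catching only connectivity errors, rather than every exception, keeps genuine bugs (a malformed model response, say) visible instead of silently rerouting them to the fallback.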


## 5. Inspecting `DocumentAnalysis`

The `DocumentAnalysis` object can be serialised to metadata for downstream storage or debugging.

```python
analysis_record = analysis.to_metadata_record({"object_key": analysis.object_key})
pprint(analysis_record)
```
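
For downstream storage, a record like this is often written as one JSON object per line. Here is a minimal sketch with a hypothetical `AnalysisRecord` dataclass: `object_key`, `strategy`, and `overrides_applied` mirror attributes used in this notebook, while `category` and all field types are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisRecord:
    # Hypothetical shape; field types are assumptions.
    object_key: str
    category: str
    strategy: dict
    overrides_applied: list[str] = field(default_factory=list)


def to_json_line(record: AnalysisRecord) -> str:
    """Serialise one record as compact, key-sorted JSON for log or object storage."""
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))


rec = AnalysisRecord("invoices/acme-001.pdf", "short_form", {"name": "azure_form_recognizer"})
print(to_json_line(rec))
```

Sorting keys makes identical records serialise to identical strings, which simplifies diffing and deduplication.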
