How PDF to ESX Works

This is the short version of the pipeline.

Plain-Language Flow

estimate PDF -> text/OCR extraction -> structured parsing -> canonical estimate -> ESX/XML export

Step 1: Input

The app accepts one or more insurance estimate PDFs. Some are text-based and easy to read. Others are scan-heavy, image-based, or mixed packets with guide pages and summaries.

Step 2: PDF Reading and OCR

The pipeline first tries standard PDF text extraction.

If a page looks text-poor or scan-heavy, the app can apply local OCR. This improves results on messy real-world documents, but OCR quality still depends on scan quality and layout clarity.
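The text-first, OCR-second decision can be sketched as below. This is an illustration, not the app's actual logic: the function names, the thresholds, and the `run_ocr` callback are all hypothetical, and a real check would likely also look at page images and layout.

```python
from __future__ import annotations

from typing import Callable


def looks_text_poor(text: str, min_chars: int = 40,
                    min_alnum_ratio: float = 0.5) -> bool:
    """Return True when extracted text is too sparse or noisy to trust.

    Both thresholds are assumed values chosen for illustration.
    """
    stripped = "".join(text.split())  # remove all whitespace
    if len(stripped) < min_chars:
        return True
    alnum = sum(ch.isalnum() for ch in stripped)
    return alnum / len(stripped) < min_alnum_ratio


def read_page(extracted_text: str,
              run_ocr: Callable[[], str]) -> tuple[str, str]:
    """Prefer embedded text; call OCR only for text-poor pages.

    Returns the page text and the source it came from ("text" or "ocr").
    """
    if looks_text_poor(extracted_text):
        return run_ocr(), "ocr"
    return extracted_text, "text"
```

Passing OCR in as a callback keeps the expensive step lazy, so pages with usable embedded text never pay for it.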

Step 3: Structured Parsing

The parser tries to identify:

carrier and claim metadata
insured/property details
dates
estimator information
totals and subtotals
line items
quantities, units, prices, taxes, depreciation, and related fields

The parser is heuristic. Different carriers and estimate layouts can produce different reliability levels.
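To show what "heuristic" means in practice, here is a minimal sketch of a line-item matcher for one assumed layout (`number. description quantity unit unit-price total`). The pattern, field names, and arithmetic check are hypothetical and do not come from the actual parser; real layouts vary by carrier, which is why reliability varies too.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed layout: "12. R&R Drywall 120.00 SF 2.50 300.00"
LINE_ITEM = re.compile(
    r"^\s*(?P<num>\d+)\.\s+"
    r"(?P<desc>.+?)\s+"
    r"(?P<qty>[\d,]+\.\d{2})\s+(?P<unit>[A-Z]{2})\s+"
    r"(?P<price>[\d,]+\.\d{2})\s+"
    r"(?P<total>[\d,]+\.\d{2})\s*$"
)


def _number(raw: str) -> Decimal:
    """Parse '1,234.50' style numbers without float rounding."""
    return Decimal(raw.replace(",", ""))


def parse_line_item(line: str) -> Optional[dict]:
    """Return a dict for a recognized line item, or None if no match."""
    m = LINE_ITEM.match(line)
    if m is None:
        return None
    item = {
        "number": int(m["num"]),
        "description": m["desc"],
        "quantity": _number(m["qty"]),
        "unit": m["unit"],
        "unit_price": _number(m["price"]),
        "total": _number(m["total"]),
    }
    # Cross-check the row's arithmetic to flag likely OCR or parse errors.
    expected = item["quantity"] * item["unit_price"]
    item["arithmetic_ok"] = abs(expected - item["total"]) <= Decimal("0.01")
    return item
```

A line that does not fit the pattern returns `None` rather than a guess, and the arithmetic flag gives downstream code a cheap signal that a row may have been misread.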

Step 4: Canonical Model

The extracted information is normalized into a canonical estimate model before export.

That matters because it keeps the app modular:

PDF-specific quirks stay in ingestion/parsing
export logic stays independent from raw PDF layout logic
future parser improvements do not require rewriting the exporter
multi-PDF merge behavior has a single internal contract to work against
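A canonical model of this kind might look roughly like the sketch below. The class names, fields, and merge rule are assumptions made for illustration; the real model carries more fields (dates, estimator, taxes, depreciation) and its merge behavior may differ.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: Decimal
    unit: str
    unit_price: Decimal

    @property
    def total(self) -> Decimal:
        # Round to cents so totals compare cleanly.
        return (self.quantity * self.unit_price).quantize(Decimal("0.01"))


@dataclass
class CanonicalEstimate:
    claim_number: Optional[str] = None
    carrier: Optional[str] = None
    line_items: List[LineItem] = field(default_factory=list)

    @property
    def subtotal(self) -> Decimal:
        return sum((li.total for li in self.line_items), Decimal("0.00"))


def merge(estimates: List[CanonicalEstimate]) -> CanonicalEstimate:
    """Combine several parsed PDFs into one estimate.

    Assumed rule: the first non-empty metadata value wins, and line
    items are concatenated in input order.
    """
    merged = CanonicalEstimate()
    for est in estimates:
        merged.claim_number = merged.claim_number or est.claim_number
        merged.carrier = merged.carrier or est.carrier
        merged.line_items.extend(est.line_items)
    return merged
```

Because both the parser output and the exporter input are this one type, `merge` never needs to know which PDF layout an estimate came from.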

Step 5: ESX/XML Export

The export layer turns the canonical estimate into:

a zip-based .esx package
a readable .esx.xml payload
a .canonical.json snapshot for inspection and debugging

The package is deterministic (the same canonical estimate produces the same output) and is validated before the app reports success.
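One common way to get a deterministic zip-based package is to fix entry order, timestamps, and file attributes, then reopen the archive to check it before declaring success. The sketch below uses only Python's standard `zipfile` module; the function names and the entry names used with them are hypothetical, and the actual exporter's internal layout is not shown here.

```python
import io
import zipfile
from typing import Dict, Iterable

# Earliest timestamp the zip format can store; fixed so rebuilds match.
FIXED_TIMESTAMP = (1980, 1, 1, 0, 0, 0)


def build_package(files: Dict[str, bytes]) -> bytes:
    """Write entries in sorted order with fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIMESTAMP)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3           # same value on every platform
            info.external_attr = 0o644 << 16  # fixed file permissions
            zf.writestr(info, files[name])
    return buf.getvalue()


def validate_package(data: bytes, expected_names: Iterable[str]) -> None:
    """Raise ValueError if the archive is corrupt or missing entries."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        corrupt = zf.testzip()  # name of first bad entry, or None
        if corrupt is not None:
            raise ValueError(f"corrupt entry: {corrupt}")
        missing = set(expected_names) - set(zf.namelist())
        if missing:
            raise ValueError(f"missing entries: {sorted(missing)}")
```

With this approach, building twice from the same inputs yields byte-identical output, which makes diffs and regression tests on exported packages straightforward.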

What This Does Not Claim

It does not claim universal support for every estimate layout.
It does not claim native proprietary XACTDOC.ZIPXML authoring.
It does not claim perfect OCR recovery on poor scans.

Read The Deeper Version

Main repository | Deep docs in docs_repo | Issues

docs_repo/ in the main repository is the authoritative deep technical source of truth.
