How PDF to ESX Works

swayerloren edited this page Apr 13, 2026 · 1 revision

This is the short version of the pipeline.

Plain-Language Flow

estimate PDF -> text/OCR extraction -> structured parsing -> canonical estimate -> ESX/XML export

Step 1: Input

The app accepts one or more insurance estimate PDFs. Some are text-based and easy to read. Others are scan-heavy, image-based, or mixed packets with guide pages and summaries.

Step 2: PDF Reading and OCR

The pipeline first attempts standard PDF text extraction on each page.

If a page looks text-poor or scan-heavy, the app can apply local OCR. This improves results on messy real-world documents, but OCR quality still depends on scan quality and layout clarity.
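The fallback decision can be sketched as a small heuristic. This is a minimal illustration, not the app's actual logic: the function names, the `min_chars` and `min_alnum_ratio` thresholds, and the `ocr` callable are all assumptions made for the example.

```python
from typing import Callable, Tuple


def looks_text_poor(page_text: str, min_chars: int = 40,
                    min_alnum_ratio: float = 0.5) -> bool:
    """Guess whether extracted text is too thin to trust (hypothetical thresholds)."""
    stripped = "".join(page_text.split())  # drop all whitespace
    if len(stripped) < min_chars:
        return True  # almost nothing extracted: likely a scanned page
    alnum = sum(ch.isalnum() for ch in stripped)
    # Mostly symbols or debris suggests a broken text layer.
    return alnum / len(stripped) < min_alnum_ratio


def read_page(page_text: str, ocr: Callable[[], str]) -> Tuple[str, str]:
    """Return (text, source); call the OCR callable only when extraction looks poor."""
    if looks_text_poor(page_text):
        return ocr(), "ocr"
    return page_text, "text"
```

Passing OCR in as a callable keeps the sketch independent of any particular OCR engine, and it means OCR only runs on pages that need it.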

Step 3: Structured Parsing

The parser tries to identify:

  • carrier and claim metadata
  • insured/property details
  • dates
  • estimator information
  • totals and subtotals
  • line items
  • quantities, units, prices, taxes, depreciation, and related fields

The parser is heuristic, so reliability varies by carrier and estimate layout.
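To give a feel for what "heuristic" means here, the sketch below parses one common line-item shape with a regular expression. The column layout (number, description, quantity, two-letter unit, unit price, total), the `parse_line_item` name, and the sample line are assumptions for illustration. Real estimate layouts differ, and the app's parser may work differently.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical layout: "12. Description 25.33 SQ 58.10 1,471.67"
LINE_ITEM = re.compile(
    r"^\s*(?P<num>\d+)\.\s+(?P<desc>.+?)\s+"
    r"(?P<qty>[\d,]+\.\d+)\s+(?P<unit>[A-Z]{2})\s+"
    r"(?P<price>[\d,]+\.\d{2})\s+(?P<total>[\d,]+\.\d{2})\s*$"
)


def _num(text: str) -> Decimal:
    """Convert '1,471.67' to Decimal('1471.67')."""
    return Decimal(text.replace(",", ""))


def parse_line_item(line: str) -> Optional[dict]:
    """Return the parsed fields, or None when the line does not fit the layout."""
    m = LINE_ITEM.match(line)
    if m is None:
        return None
    qty, price, total = _num(m["qty"]), _num(m["price"]), _num(m["total"])
    return {
        "number": int(m["num"]),
        "description": m["desc"],
        "quantity": qty,
        "unit": m["unit"],
        "unit_price": price,
        "total": total,
        # Cheap sanity check: quantity * price should match the total to the cent.
        "consistent": abs(qty * price - total) <= Decimal("0.01"),
    }
```

A check such as `consistent` is one way a heuristic parser can flag rows where OCR or column detection probably went wrong, rather than silently passing them on.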

Step 4: Canonical Model

The extracted information is normalized into a canonical estimate model before export.

This normalization step matters because it keeps the app modular:

  • PDF-specific quirks stay in ingestion/parsing
  • export logic stays independent from raw PDF layout logic
  • future parser improvements do not require rewriting the exporter
  • multi-PDF merge behavior has a single internal contract to work against
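A canonical model of this kind can be sketched with plain dataclasses. The field names and the merge policy (first non-empty metadata value wins, line items are concatenated) are assumptions for illustration. The app's real schema and merge rules are not described on this page.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: Decimal
    unit: str
    total: Decimal


@dataclass
class CanonicalEstimate:
    # Hypothetical fields; parsers fill in whatever they can recover.
    claim_number: Optional[str] = None
    carrier: Optional[str] = None
    line_items: List[LineItem] = field(default_factory=list)
    source_files: List[str] = field(default_factory=list)


def merge_estimates(estimates: Iterable[CanonicalEstimate]) -> CanonicalEstimate:
    """Merge several parsed PDFs into one estimate (assumed policy, see lead-in)."""
    merged = CanonicalEstimate()
    for est in estimates:
        merged.claim_number = merged.claim_number or est.claim_number
        merged.carrier = merged.carrier or est.carrier
        merged.line_items.extend(est.line_items)
        merged.source_files.extend(est.source_files)
    return merged
```

Because the merge and the exporter only see `CanonicalEstimate`, neither needs to know which PDF layout or OCR path produced the data.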

Step 5: ESX/XML Export

The export layer turns the canonical estimate into:

  • a zip-based .esx package
  • a readable .esx.xml payload
  • a .canonical.json snapshot for inspection and debugging

Export is deterministic (the same canonical estimate yields the same package), and the package is validated before success is reported.
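One common way to get a deterministic, validated zip package is to fix entry order and timestamps, then reopen the archive before reporting success. The sketch below shows that technique with the Python standard library. The element names, the `estimate.xml` entry name, and the validation checks are assumptions, not the app's actual package layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # constant timestamp so rebuilds are identical


def build_xml(claim_number: str, items: List[Tuple[str, str]]) -> bytes:
    """Serialize a tiny, hypothetical estimate payload."""
    root = ET.Element("Estimate", {"claimNumber": claim_number})
    for description, total in items:
        ET.SubElement(root, "LineItem", {"description": description, "total": total})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_package(files: Dict[str, bytes]) -> bytes:
    """Zip the files in sorted order with fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # pin the host-OS field across platforms
            zf.writestr(info, files[name])
    return buf.getvalue()


def validate_package(data: bytes, expected_entry: str) -> bool:
    """Reopen the archive, check CRCs, and confirm the payload parses as XML."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            if zf.testzip() is not None or expected_entry not in zf.namelist():
                return False
            ET.fromstring(zf.read(expected_entry))
    except (zipfile.BadZipFile, ET.ParseError):
        return False
    return True
```

Building the same input twice and comparing the bytes is a simple regression test for determinism in a setup like this.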

What This Does Not Claim

  • It does not claim universal support for every estimate layout.
  • It does not claim native proprietary XACTDOC.ZIPXML authoring.
  • It does not claim perfect OCR recovery on poor scans.

Read The Deeper Version