# 💡 Fast, Accurate Parsing of Utility Bills with LandingAI

This notebook demonstrates how to use the `agentic_doc` Python package to extract structured information from utility bills using LandingAI's Agentic Document Extraction (ADE) service. It uses electric bills collected from major utility providers across the United States.

We'll walk through:
- Parsing documents with Agentic Document Extraction.
- Defining a custom extraction schema for utility bills (this notebook uses a JSON schema; `pydantic` models are another option).
- Viewing structured field extractions and metadata.

Not covered:
- Connecting to upstream document sources.
- Inserting `parse()` and `extract()` results into structured tables.
- Optimizing pipeline throughput.

> 📎 Supported formats: `.pdf`, `.png`, `.jpg`, `.jpeg`. (More coming soon)

In [1]:
# ---
# Title: Fast, Accurate Parsing of Utility Bills with LandingAI
# Author: Andrea Kropp
# Description: How to apply a custom extraction schema to pull fields out of photos and PDFs of utility bills.
# Target Audience: Developers, Product Managers
# Content Type: How-To
# Publish Date: 2025-09-22
# ADE Version: v0.3.3
# Change Log:
#    - v1.0: Initial draft
# ---

### ✨ Install LandingAI's Agentic Document Extraction

```bash
# From a terminal; inside a notebook cell, use `%pip install agentic-doc` instead
pip install agentic-doc
```

### 🗝️ Obtain and Set an API Key

Obtain your API key from the Visual Playground at https://va.landing.ai/settings/api-key

Read about the options for setting your API key at https://docs.landing.ai/ade/agentic-api-key
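If you prefer to set the key from Python, here is a minimal sketch. It assumes the key is read from the `VISION_AGENT_API_KEY` environment variable (the settings log below shows a matching `vision_agent_api_key` entry) and that `agentic_doc` loads its settings when it is first imported, so run this before the import cell. The `ensure_api_key` helper is hypothetical and not part of `agentic_doc`.

```python
import os
from getpass import getpass

# Assumption: agentic_doc reads the key from this environment variable when it is
# first imported, so set it before `from agentic_doc.parse import parse`.
ENV_VAR = "VISION_AGENT_API_KEY"


def ensure_api_key(prompt_if_missing: bool = False) -> bool:
    """Return True if the API key variable is set, optionally prompting for it first."""
    if os.environ.get(ENV_VAR):
        return True
    if prompt_if_missing:
        # getpass hides the typed key, so it never appears in the notebook output
        os.environ[ENV_VAR] = getpass(f"Enter {ENV_VAR}: ")
    return bool(os.environ.get(ENV_VAR))
```

Call `ensure_api_key(prompt_if_missing=True)` at the top of the notebook; if the variable is already exported in your shell or `.env` setup, it returns `True` without prompting.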


## 📦 Setup and Imports

In [2]:
# Standard libraries
import os
import json
from datetime import date
from pathlib import Path

# Agentic Document Extraction from LandingAI
from agentic_doc.parse import parse

[2m2025-09-22 13:10:35[0m [info   [0m] [1mSettings loaded: {
  "endpoint_host": "https://api.va.landing.ai",
  "vision_agent_api_key": "OTBiN[REDACTED]",
  "batch_size": 25,
  "max_workers": 4,
  "max_retries": 80,
  "max_retry_wait_time": 30,
  "retry_logging_style": "log_msg",
  "pdf_to_image_dpi": 96,
  "split_size": 10,
  "extraction_split_size": 50
}[0m [[0m[1m[34magentic_doc.config[0m][0m (config.py:172)


In [3]:
import agentic_doc, importlib.metadata as m

print("version =", m.version("agentic-doc"))

version = 0.3.3


## 📁 Define Input and Output Directories

Specify where your documents are located and where results will be saved.


In [4]:
# Define input and output directory paths
base_dir = Path(os.getcwd())
input_folder = base_dir / "input_folder"
results_folder = base_dir / "results_folder"
groundings_folder = base_dir / "groundings_folder"

# Create output folders if they don't exist
results_folder.mkdir(parents=True, exist_ok=True)
groundings_folder.mkdir(parents=True, exist_ok=True)

## 🗂️ Collect Document File Paths

This cell keeps only the files in `input_folder` that have a supported extension.

In [5]:
# Collect all document file paths in input folder with supported extensions
# Convert each Path object to a string to ensure compatibility with parse()

file_paths = [
    str(p)
    for p in input_folder.iterdir()
    if p.suffix.lower() in [".pdf", ".png", ".jpg", ".jpeg"]
]
file_paths

['/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric_C.jpg',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric_B.jpg',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric_A.jpg',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric2.pdf',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric3.pdf',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric1.pdf',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric4.pdf',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric5.pdf',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Electric_Bills/input_folder/electric6.pdf']

### Thumbnails for the Electric Bills in the Demo

Notice that there are 3 photos and 6 PDFs. The extraction process shown here works for both without any modifications.

<img src="images/electric-bills-to-parse-PDF-and-image.png" width="80%" alt="Electric bill image preview">
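To confirm the mix of file types before parsing, a small hypothetical helper can group the collected paths by extension. It reuses the extension list from the collection cell and treats every supported non-PDF file as an image.

```python
from collections import Counter
from pathlib import Path

# Same extensions the collection cell accepts
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg"}


def count_by_kind(paths):
    """Count supported files as 'pdf' or 'image' based on their extension."""
    counts = Counter()
    for p in map(Path, paths):
        suffix = p.suffix.lower()
        if suffix not in SUPPORTED:
            continue  # skip anything the collection cell would have filtered out
        counts["pdf" if suffix == ".pdf" else "image"] += 1
    return counts
```

Running `count_by_kind(file_paths)` on the nine files listed above should report 6 PDFs and 3 images.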

## 📑 Define the Utility Bill Schema for Field Extraction

Using a JSON schema, we define the specific fields to extract from the electric bills (e.g., customer name, service address, energy consumption).

See https://docs.landing.ai/ade/ade-extract-library

In [7]:
## Custom JSON schema. Use the Visual Playground at https://va.landing.ai/demo/doc-extraction to help develop and preview your schema. 

# Define the extraction schema as a dictionary

extraction_schema = {
  "type": "object",
  "title": "Utility Bill Extraction Schema",
  "description": "Schema for extracting key fields from diverse utility bills.",
  "required": [
    "provider_info",
    "account_info",
    "billing_summary",
    "gas_charges",
    "electric_charges"
  ],
  "properties": {
    "provider_info": {
      "type": "object",
      "title": "Provider Information",
      "required": [
        "provider",
        "phone_number",
        "website",
        "usage_bar_chart"
      ],
      "properties": {
        "provider": {
          "type": "string",
          "title": "Utility Name",
          "description": "The name of the utility providing the service and issuing the bill."
        },
        "phone_number": {
          "type": "string",
          "title": "Customer Service Phone Number",
          "description": "The customer service phone number for the utility formatted XXX-XXX-XXXX."
        },
        "website": {
          "type": "string",
          "title": "Website",
          "description": "Official website for account management and payments."
        },
        "usage_bar_chart": {
          "type": "boolean",
          "description": "Does the utility bill include a chart depicting usage trends over time?"
        }
      },
      "description": "Key energy provider details."
    },
    "account_info": {
      "type": "object",
      "title": "Account Information",
      "required": [
        "account_holder",
        "account_number",
        "service_address"
      ],
      "properties": {
        "account_holder": {
          "type": "string",
          "title": "Account Holder Name",
          "description": "Full name of the account holder. This may be a person or organization."
        },
        "account_number": {
          "type": "string",
          "title": "Account Number",
          "description": "Unique identifier for the customer account."
        },
        "service_address": {
          "type": "string",
          "title": "Service Address",
          "description": "The address for the property where utility service is provided. This may or may not be the same as the billing address. Remove new line characters and replace with a space."
        },
        "service_address_primary": {
          "type": "string",
          "description": "The primary street address for the property where utility service is provided, including building or house number, predirectionals, street name, postdirectionals, and street suffix. This may or may not be the same as the billing address. Replace newline characters with a space."
        },
        "service_address_city": {
          "type": "string",
          "description": "The city for the address where utility service is provided."
        },
        "service_address_state": {
          "type": "string",
          "description": "The 2-letter state abbreviation for the address where utility service is provided."
        },
        "service_address_zip": {
          "type": "string",
          "description": "The 5-digit ZIP Code for the address where utility service is provided."
        }
      },
      "description": "Key account and customer identifiers."
    },
    "billing_summary": {
      "type": "object",
      "title": "Billing Summary",
      "required": [
        "due_date",
        "bill_date",
        "service_start_date",
        "service_end_date",
        "total_amount_due"
      ],
      "properties": {
        "due_date": {
          "type": "string",
          "title": "Due Date",
          "description": "Date by which payment is due, in YYYY-MM-DD format."
        },
        "bill_date": {
          "type": "string",
          "title": "Bill Date",
          "description": "Date the bill was issued, in YYYY-MM-DD format."
        },
        "service_start_date": {
          "type": "string",
          "description": "The starting date for the period of service covered by the utility bill, in YYYY-MM-DD format."
        },
        "service_end_date": {
          "type": "string",
          "description": "The ending date for the period of service covered by the utility bill, in YYYY-MM-DD format."
        },
        "total_amount_due": {
          "type": "string",
          "title": "Total Amount Due",
          "description": "Total amount due for the current bill, including currency symbol."
        }
      },
      "description": "Summary of charges and due dates."
    },
    "electric_charges": {
      "type": "object",
      "title": "Electric Charges",
      "required": [
        "meter_number",
        "usage_kwh",
        "total_electric_charges"
      ],
      "properties": {
        "meter_number": {
          "type": "string",
          "title": "Electric Meter Number",
          "description": "Identifier for the electric meter. Blank if this bill does not include electric service."
        },
        "usage_kwh": {
          "type": "string",
          "title": "Electric Usage (kWh)",
          "description": "Total electricity used in kilowatt-hours for the billing period. Zero if this bill does not include electric service."
        },
        "total_electric_charges": {
          "type": "string",
          "title": "Total Electric Charges",
          "description": "Total electric charges for the billing period, including currency symbol. Zero if this bill does not include electric service."
        }
      },
      "description": "Details of electric usage and charges for the billing period."
    },
    "gas_charges": {
      "type": "object",
      "title": "Gas Charges",
      "required": [
        "meter_number",
        "usage_therms",
        "total_gas_charges"
      ],
      "properties": {
        "meter_number": {
          "type": "string",
          "title": "Gas Meter Number",
          "description": "Identifier for the gas meter. Blank if this bill does not include gas service."
        },
        "usage_therms": {
          "type": "string",
          "title": "Gas Usage (Therms)",
          "description": "Total gas used in therms for the billing period. Zero if this bill does not include gas service."
        },
        "total_gas_charges": {
          "type": "string",
          "title": "Total Gas Charges",
          "description": "Total gas charges for the billing period, including currency symbol. Zero if this bill does not include gas service."
        }
      },
      "description": "Details of gas usage and charges for the billing period."
    }
  }
}
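A misspelled key in a `required` list is easy to miss in a schema this size. The sketch below is a hypothetical local check (not part of `agentic_doc`, and not a full JSON Schema validator): it walks nested objects and reports any required key that has no matching entry under `properties`. For the schema above it returns an empty list.

```python
def find_missing_required(schema, path="$"):
    """List required keys that are absent from 'properties', descending into nested objects."""
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in props:
            problems.append(f"{path}: required key '{key}' is not defined in properties")
    # Recurse only into nested objects; arrays are not inspected in this sketch
    for name, sub in props.items():
        if isinstance(sub, dict) and sub.get("type") == "object":
            problems.extend(find_missing_required(sub, f"{path}.{name}"))
    return problems
```

Running `find_missing_required(extraction_schema)` before calling `parse()` is a cheap way to catch schema typos locally.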


## 🚀 Run Agentic Document Extraction with Field Extraction

Call the `parse()` function from `agentic_doc` and provide the `extraction_schema` to extract structured utility bill data.

See https://docs.landing.ai/ade/ade-extract-library for details

In [None]:
# Parse documents using LandingAI ADE
# Apply the custom extraction schema and save all outputs in the designated folders

parse(documents=file_paths,
      extraction_schema=extraction_schema,
      result_save_dir=str(results_folder),
      grounding_save_dir=str(groundings_folder),
      )

## View the JSON Returned

The JSON object contains the following top-level keys:

- `markdown`: A string containing a Markdown representation of the parsed document content.

- `chunks`: A list of parsed chunks (individual text/image regions from the document).

- `extraction`: A nested dictionary of the extracted field values based on your custom schema.

- `extraction_metadata`: A nested dictionary containing additional metadata for each extracted field (including `chunk_references` and `confidence`).

- `doc_type`: The type of the source document (for example, `pdf` or `image`).

- `metadata`: A nested dictionary containing document-level information, including the filename, number of pages, and time required to process.

Learn more at https://docs.landing.ai/ade/ade-json-response


<img src="images/JSON-top-level-structure.png" width="40%" alt="JSON top level structure screenshot">
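To see how a single field ties back to the document, the sketch below loads one saved result and looks up a field's value, its `confidence`, and the text of the chunks listed in its `chunk_references`. The helper names are hypothetical, and it assumes each entry in `chunks` is a dictionary with `chunk_id` and `text` keys; check the JSON response docs linked above for the exact chunk layout.

```python
import json


def load_result(path):
    """Load one saved ADE result file (one JSON file per input document)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def describe_field(doc, section, field):
    """Return a field's value, confidence, and the text of the chunks that ground it.

    Assumption: each entry in doc["chunks"] is a dict with "chunk_id" and "text" keys.
    """
    value = ((doc.get("extraction") or {}).get(section) or {}).get(field)
    meta = ((doc.get("extraction_metadata") or {}).get(section) or {}).get(field) or {}
    # Index chunks by id so each chunk reference can be resolved directly
    chunks_by_id = {c.get("chunk_id"): c for c in doc.get("chunks", [])}
    sources = [
        chunks_by_id[ref].get("text", "")
        for ref in meta.get("chunk_references", [])
        if ref in chunks_by_id  # silently skip references with no matching chunk
    ]
    return {"value": value, "confidence": meta.get("confidence"), "sources": sources}
```

For example, `describe_field(load_result(next(results_folder.glob("*.json"))), "billing_summary", "total_amount_due")` returns the amount due together with the text it was read from.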

## 🔄 Use JSON Files for Downstream Processing

Each utility bill now has a corresponding JSON file in `results_folder`. You can feed these files into whatever downstream process needs them.

For example, you can build a summary table of the extracted fields and their corresponding visual grounding (chunk references) across all the utility bills processed.

In [9]:
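import os
import json
import pandas as pd

# Helpers to safely extract from nested fields.
# Each returns a default instead of raising when a level is missing or is not a dict.
def get_nested_val(d, *keys):
    """Return d[k1][k2]..., or "" if any key along the path is missing."""
    for k in keys:
        if not isinstance(d, dict) or k not in d:
            return ""
        d = d[k]
    return d

def get_nested_ref(d, *keys):
    """Return the chunk_references list stored at d[k1][k2]..., or [] if absent."""
    for k in keys:
        if not isinstance(d, dict) or k not in d:
            return []
        d = d[k]
    return d.get("chunk_references", []) if isinstance(d, dict) else []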

In [10]:
records = []

# Loop over all JSON results
for filename in os.listdir(results_folder):
    if filename.endswith(".json"):
        path = os.path.join(results_folder, filename)
        with open(path, "r") as f:
            doc = json.load(f)
            fields = doc.get("extraction", {})
            meta = doc.get("extraction_metadata", {})

            utility_dict = {
                "document_name": doc["metadata"]["filename"],
                "page_count": doc["metadata"]["page_count"],
                "doc_type": doc["doc_type"],
                "timestamp": doc["metadata"]["processed_at"],

                # Provider Info
                "provider": get_nested_val(fields, "provider_info", "provider"),
                "phone_number": get_nested_val(fields, "provider_info", "phone_number"),
                "website": get_nested_val(fields, "provider_info", "website"),
                "usage_bar_chart": get_nested_val(fields, "provider_info", "usage_bar_chart"),

                # Account Info
                "account_holder": get_nested_val(fields, "account_info", "account_holder"),
                "account_number": get_nested_val(fields, "account_info", "account_number"),
                "service_address": get_nested_val(fields, "account_info", "service_address"),
                "address": get_nested_val(fields, "account_info", "service_address_primary"),
                "city": get_nested_val(fields, "account_info", "service_address_city"),
                "state": get_nested_val(fields, "account_info", "service_address_state"),
                "zip": get_nested_val(fields, "account_info", "service_address_zip"),

                # Billing Summary
                "due_date": get_nested_val(fields, "billing_summary", "due_date"),
                "bill_date": get_nested_val(fields, "billing_summary", "bill_date"),
                "start_date": get_nested_val(fields, "billing_summary", "service_start_date"),
                "end_date": get_nested_val(fields, "billing_summary", "service_end_date"),
                "total_amount_due": get_nested_val(fields, "billing_summary", "total_amount_due"),

                # Electric Charges
                "meter_number_elec": get_nested_val(fields, "electric_charges", "meter_number"),
                "usage_kwh": get_nested_val(fields, "electric_charges", "usage_kwh"),
                "total_electric_charges": get_nested_val(fields, "electric_charges", "total_electric_charges"),

                # Gas Charges
                "meter_number_gas": get_nested_val(fields, "gas_charges", "meter_number"),
                "usage_therms": get_nested_val(fields, "gas_charges", "usage_therms"),
                "total_gas_charges": get_nested_val(fields, "gas_charges", "total_gas_charges"),

                # Refs — Provider Info
                "provider_ref": get_nested_ref(meta, "provider_info", "provider"),
                "phone_number_ref": get_nested_ref(meta, "provider_info", "phone_number"),
                "website_ref": get_nested_ref(meta, "provider_info", "website"),
                "usage_bar_chart_ref": get_nested_ref(meta, "provider_info", "usage_bar_chart"),

                # Refs — Account Info
                "account_holder_ref": get_nested_ref(meta, "account_info", "account_holder"),
                "account_number_ref": get_nested_ref(meta, "account_info", "account_number"),
                "service_address_ref": get_nested_ref(meta, "account_info", "service_address"),
                "address_ref": get_nested_ref(meta, "account_info", "service_address_primary"),
                "city_ref": get_nested_ref(meta, "account_info", "service_address_city"),
                "state_ref": get_nested_ref(meta, "account_info", "service_address_state"),
                "zip_ref": get_nested_ref(meta, "account_info", "service_address_zip"),

                # Refs — Billing Summary
                "due_date_ref": get_nested_ref(meta, "billing_summary", "due_date"),
                "bill_date_ref": get_nested_ref(meta, "billing_summary", "bill_date"),
                "start_date_ref": get_nested_ref(meta, "billing_summary", "service_start_date"),
                "end_date_ref": get_nested_ref(meta, "billing_summary", "service_end_date"),
                "total_amount_due_ref": get_nested_ref(meta, "billing_summary", "total_amount_due"),

                # Refs — Electric Charges
                "meter_number_elec_ref": get_nested_ref(meta, "electric_charges", "meter_number"),
                "usage_kwh_ref": get_nested_ref(meta, "electric_charges", "usage_kwh"),
                "total_electric_charges_ref": get_nested_ref(meta, "electric_charges", "total_electric_charges"),

                # Refs — Gas Charges
                "meter_number_gas_ref": get_nested_ref(meta, "gas_charges", "meter_number"),
                "usage_therms_ref": get_nested_ref(meta, "gas_charges", "usage_therms"),
                "total_gas_charges_ref": get_nested_ref(meta, "gas_charges", "total_gas_charges"),
            }

            records.append(utility_dict)

# Convert to DataFrame
df = pd.DataFrame(records)
df

|  | document_name | page_count | doc_type | timestamp | provider | phone_number | website | usage_bar_chart | account_holder | account_number | ... | bill_date_ref | start_date_ref | end_date_ref | total_amount_due_ref | meter_number_elec_ref | usage_kwh_ref | total_electric_charges_ref | meter_number_gas_ref | usage_therms_ref | total_gas_charges_ref |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | electric_A.jpg | 1 | image | 2025-09-22T13:13:12.278431+00:00 | Mid-Carolina Electric Cooperative | 803-749-6400 | www.mcecoop.com | False | CARL P TERRY | 7700000024 | ... | [0e8284fe-8d5d-4894-8527-2cc34b58a011] | [0e8284fe-8d5d-4894-8527-2cc34b58a011] | [0e8284fe-8d5d-4894-8527-2cc34b58a011] | [44494a5d-6d2b-4b81-9321-21e7bd20090f] | [6d95459c-e47d-49f6-bdda-4de01aad263f] | [6d95459c-e47d-49f6-bdda-4de01aad263f] | [0e8284fe-8d5d-4894-8527-2cc34b58a011] | [] | [] | [] |
| 1 | electric4_1.pdf | 3 | pdf | 2025-09-22T13:13:06.664243+00:00 | Duke Energy | 800-700-8744 | duke-energy.com | True | KAREN G PEREZ | 9100 7883 2561 | ... | [7019e93c-ed4b-414f-8639-89e7606f3789] | [7019e93c-ed4b-414f-8639-89e7606f3789] | [7019e93c-ed4b-414f-8639-89e7606f3789] | [51ac1d9a-90f3-4d64-a2fe-be0c15c88ea6, 961edc0... | [e68c4780-6402-4185-a9f7-e8091b5401ee, 27be533... | [e68c4780-6402-4185-a9f7-e8091b5401ee, e90f66a... | [27be533b-d769-497a-aaa0-f39b66e20611, 51ac1d9... | [] | [] | [] |
| 2 | electric6_1.pdf | 2 | pdf | 2025-09-22T13:13:15.842404+00:00 | Mississippi Power | 601-298-4818 | mississippipower.com | True | WILLIAM A VALENCIA | 09931-83323 | ... | [] | [f0bfa4b0-54cc-46a6-95de-00b4575b662d, b5243f6... | [f0bfa4b0-54cc-46a6-95de-00b4575b662d, b5243f6... | [38342c1c-8bbd-49e0-bd32-7b7eb1eb9f3c, 0f523a3... | [b5243f68-44aa-4cd3-a74e-93f2eac44cca] | [b5243f68-44aa-4cd3-a74e-93f2eac44cca, 4d7df64... | [b5243f68-44aa-4cd3-a74e-93f2eac44cca, 0f523a3... | [] | [] | [] |
| 3 | electric_C.jpg | 1 | image | 2025-09-22T13:12:53.081879+00:00 | PSEG | 1-800-436-7734 | pseg.com/myaccount | False | ARISLEIDY BAEZ NUNEZ | 7491381707 | ... | [25f5da53-7b38-4118-9882-83f8b0138d33] | [25f5da53-7b38-4118-9882-83f8b0138d33] | [25f5da53-7b38-4118-9882-83f8b0138d33] | [ad81be9c-8d02-4e22-9769-2311e7a8a9e8, 2f8b053... | [] | [] | [] | [] | [] | [] |
| 4 | electric_B.jpg | 1 | image | 2025-09-22T13:13:01.214334+00:00 | Alabama Power | 1-800-245-2244 | AlabamaPower.com | True | ERIKA J ZAPATA | 96762-33381 | ... | [] | [183c34d2-bdc7-426e-a728-3be7f7c83391] | [183c34d2-bdc7-426e-a728-3be7f7c83391] | [306675c0-cd8d-40cd-85cc-970df8dd9152, 2ea1c48... | [] | [2e7a8c7f-9368-43b3-922f-d302007f12b9, 85bad90... | [2ea1c485-c287-4898-b109-943a89466688, 306675c... | [] | [] | [] |
| 5 | electric2_1.pdf | 4 | pdf | 2025-09-22T13:13:03.557917+00:00 | PSEG | 800-436-7734 | pseg.com | True | EDITH AVELLA | 7002365118 | ... | [6d0309f4-a550-42a4-a5da-e372d67761b4] | [6d0309f4-a550-42a4-a5da-e372d67761b4] | [6d0309f4-a550-42a4-a5da-e372d67761b4] | [9a2728a2-2920-455b-ab63-0a705bda16a3, 926f3b5... | [d4a30369-9f7a-47ef-88bf-f41b5143c661] | [d4a30369-9f7a-47ef-88bf-f41b5143c661] | [70b93d03-23b8-4749-9557-0249d7f7f0da, 358477b... | [d76d8533-27c1-410a-b3a2-5da32f7bb5f1] | [d76d8533-27c1-410a-b3a2-5da32f7bb5f1] | [d76d8533-27c1-410a-b3a2-5da32f7bb5f1] |
| 6 | electric5_1.pdf | 1 | pdf | 2025-09-22T13:12:57.703499+00:00 | San Diego Gas & Electric | 1-877-646-5525 | sdge.com | True | DAINETTE R. WOODS | 7397 873 592 9 | ... | [413673b9-b1aa-493a-8907-61b15b426ee5] | [2a86e189-c4f6-4655-9381-d3f2601ce829] | [2a86e189-c4f6-4655-9381-d3f2601ce829] | [6efd5610-c24d-427d-9be3-0f5618593a0c, 2a86e18... | [] | [2a86e189-c4f6-4655-9381-d3f2601ce829, 619ba1e... | [2a86e189-c4f6-4655-9381-d3f2601ce829] | [] | [2a86e189-c4f6-4655-9381-d3f2601ce829, 949866b... | [2a86e189-c4f6-4655-9381-d3f2601ce829] |
| 7 | electric3_1.pdf | 2 | pdf | 2025-09-22T13:13:00.181309+00:00 | Con Edison | 1-800-752-6633 | conEd.com/PaymentPlans | True | MITCHELL JOHNSON | 44-6011-0985-0021-7 | ... | [66363377-e231-4d99-9496-c5bf0673d407] | [66363377-e231-4d99-9496-c5bf0673d407] | [66363377-e231-4d99-9496-c5bf0673d407] | [66363377-e231-4d99-9496-c5bf0673d407] | [416bd812-f6c9-4e5e-b264-f4e38f7e2599] | [416bd812-f6c9-4e5e-b264-f4e38f7e2599] | [c8d06f89-1e5c-4ff8-a7e3-b9e3a174cde5] | [] | [] | [] |
| 8 | electric1_1.pdf | 2 | pdf | 2025-09-22T13:13:07.892890+00:00 | Mountain View Electric Association, Inc. | 1-800-388-9881 | www.mvea.coop | True | RON A BAUMERT | 61358805 | ... | [5a0a06fd-9e0a-481b-a220-a42a63fed00a, ecb590c... | [ecb590cc-b22e-495e-9f1e-0c2ad5cd86ac] | [ecb590cc-b22e-495e-9f1e-0c2ad5cd86ac] | [f9263d3b-b963-4320-a2d8-02a22c95c7f3, fd86eec... | [5a0a06fd-9e0a-481b-a220-a42a63fed00a, ecb590c... | [ecb590cc-b22e-495e-9f1e-0c2ad5cd86ac, 71e81ba... | [fd86eec6-5420-4be4-882f-0ce3b361ad07] | [] | [] | [] |
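The loop above spells out every column by hand. A more compact, data-driven alternative is sketched below: a hypothetical `FIELD_MAP` (listing only four of the notebook's columns) maps each output column to its schema section and field, and the same entry fills both the value column and its `_ref` column, keeping the value-then-reference column order.

```python
# Hypothetical column map: output column -> (schema section, field name).
# Only four of the notebook's columns are listed; add the rest the same way.
FIELD_MAP = {
    "provider": ("provider_info", "provider"),
    "total_amount_due": ("billing_summary", "total_amount_due"),
    "usage_kwh": ("electric_charges", "usage_kwh"),
    "usage_therms": ("gas_charges", "usage_therms"),
}


def _lookup(d, *keys, default):
    """Follow keys through nested dicts; return default if any level is missing."""
    for k in keys:
        if not isinstance(d, dict) or k not in d:
            return default
        d = d[k]
    return d


def build_row(doc, field_map=FIELD_MAP):
    """Flatten one saved ADE result into a record of values followed by *_ref columns."""
    meta = doc.get("metadata", {})
    row = {
        "document_name": meta.get("filename"),
        "page_count": meta.get("page_count"),
        "doc_type": doc.get("doc_type"),
        "timestamp": meta.get("processed_at"),
    }
    # Extracted values first, mirroring the column order of the hand-built loop
    for column, (section, field) in field_map.items():
        row[column] = _lookup(doc.get("extraction"), section, field, default="")
    # Then the chunk references that ground each value
    for column, (section, field) in field_map.items():
        row[f"{column}_ref"] = _lookup(
            doc.get("extraction_metadata"), section, field, "chunk_references", default=[]
        )
    return row
```

After extending `FIELD_MAP` with the remaining fields, the frame can be built with `pd.DataFrame([build_row(json.loads(p.read_text())) for p in results_folder.glob("*.json")])`, so adding a schema field means adding one map entry instead of two dictionary lines.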


In [11]:
# Save the DataFrame to a CSV file inside the results_folder
csv_path = results_folder / "utility_bill_output.csv"
df.to_csv(csv_path, index=False)
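The CSV stores every extracted value as text: amounts carry currency symbols and dates arrive in whatever layout came back. If a downstream database or spreadsheet needs typed columns, a minimal sketch with hypothetical helpers can convert them. The accepted date layouts are an assumption, and accounting-style negatives such as `(12.34)` are not handled.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Date layouts this sketch accepts (an assumption; extend to match your bills)
DATE_FORMATS = ("%Y-%m-%d", "%m-%d-%Y", "%m/%d/%Y")


def to_decimal(amount):
    """Convert strings such as '$1,234.56' to Decimal; return None if unparseable."""
    # Keep only digits, the decimal point, and a minus sign
    cleaned = re.sub(r"[^0-9.\-]", "", str(amount))
    try:
        return Decimal(cleaned) if cleaned else None
    except InvalidOperation:
        return None


def to_date(text):
    """Parse a date string with the first matching format; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(str(text).strip(), fmt).date()
        except ValueError:
            continue
    return None
```

For example, `df["total_amount_due_num"] = df["total_amount_due"].map(to_decimal)` adds a numeric column alongside the original text, and `to_date` can be mapped over the date columns the same way.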

## ✅ Wrap-Up

You’ve now used LandingAI’s ADE to:
- Parse and extract data from utility bills, whether the originals are images or PDFs.
- Define custom fields using a JSON schema.
- Run Agentic Document Extraction on a batch of utility bills and save the results and visual groundings.
- Collect the extracted values and their chunk references into a DataFrame and save it as a CSV file.

To learn more, visit the [LandingAI Documentation](https://docs.landing.ai/ade/ade-overview).