# **ENERGY DEVELOPERS & UTILITIES—PROJECTS PACKAGE INGESTION**

This Notebook leverages the **Processors** configured on **[Retab](https://www.retab.com/)** to process Energy Developers and Utilities Projects' Packages (like Contracts, Permits, Right-Of-Way, Invoices, Calendars, etc.) with high accuracy and reliability, based on an extraction schema defined and optimized on the platform.

These Packages include various documents:
- Energy contracts (Power Purchase Agreement, Gas Supply Contract, etc.)
- Reports (Consumption / generation reports, smart meter data, etc.)
- Invoices (Utility bills, supplier invoices, settlement invoices, etc.)
- Regulatory documents (ROW, Compliance filings, licenses, permits, audit docs, etc.)
- Technical reports (Maintenance logs, inspection reports, engineering reports, etc.)
- Environmental reports (Sustainability, CO₂ emissions, impact studies, etc.)
- Many more

*Retab's platform lets you automatically generate, iterate on, and deploy your schemas & prompts to production. See the [documentation](https://docs.retab.com/overview/introduction).*

Built with 🩷 by Retab.

We recommend creating your **Retab API Key** on **[Retab](https://www.retab.com/)** and saving it in a `.env` file.

You should have:
```
RETAB_API_KEY=sk_retab_***
```


In [1]:
# INITIALIZATION

from dotenv import load_dotenv
from retab import Retab

load_dotenv() # Import the Retab API key from the .env file

client = Retab()
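
If the key is missing from the environment, the problem may only surface later, at the first API call. A minimal sketch of a fail-fast check that could run right after `load_dotenv()`, using only the standard library. `require_env` is a hypothetical helper name, and the sketch assumes (as the cell above suggests, but does not confirm) that `Retab()` reads `RETAB_API_KEY` from the environment:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Hypothetical error message; adapt it to your own setup.
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("RETAB_API_KEY")` before `client = Retab()` would then stop the notebook early with a readable message instead of a later authentication error.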

In [2]:
# PARAMETERS

# ---- Classifier -> leverage Retab to classify the documents for Energy Developers and Utilities
CLASSIFIER_PROJECT_ID = "proj_103i40l-cy0cuqvzCke4U"
CLASSIFIER_ITERATION_ID = "eval_iter_GwOePIw6fsSVzhhzstYrU"

# ---- Routing map (Extraction projects) -> re-route the classified documents 
# to the corresponding extractors configured on Retab to process them with the right schema
DEST_PROJECTS = {
    # Energy Contracts → PPA
    "ppa_contract": {
        "project_id":   "proj_5kqZpmrwx1evgcDMPIMfe",
        "iteration_id": "eval_iter_jGOxQK5BM2PMvVtTnYB-M",
    },
    # Technical Contract → O&M
    "om_contract": {
        "project_id":   "proj_U25G1dYqNDfdXWwwWbX4M",
        "iteration_id": "eval_iter_EypfGH9Uf35VpafcVrPt3",
    },
    # We can add more as we wire them on the platform
    # Not wired yet: "REGULATORY_DOCUMENT","METER_REPORT", "TECHNICAL_REPORT", "OTHER"
}

# Map raw classifier labels with the routing keys above
CATEGORY_TO_KEY = {
    "ENERGY_CONTRACT":     "ppa_contract",
    "TECHNICAL_CONTRACT":  "om_contract",
}
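
The two maps above form a two-step lookup: classifier label, then routing key, then destination project. A minimal sketch of that lookup as a single function, with the maps copied from the cell above so the block runs on its own. `resolve_route` is a hypothetical helper name, not part of the Retab SDK:

```python
from typing import Optional

# Copied from the PARAMETERS cell so this sketch is self-contained.
DEST_PROJECTS = {
    "ppa_contract": {
        "project_id": "proj_5kqZpmrwx1evgcDMPIMfe",
        "iteration_id": "eval_iter_jGOxQK5BM2PMvVtTnYB-M",
    },
    "om_contract": {
        "project_id": "proj_U25G1dYqNDfdXWwwWbX4M",
        "iteration_id": "eval_iter_EypfGH9Uf35VpafcVrPt3",
    },
}
CATEGORY_TO_KEY = {
    "ENERGY_CONTRACT": "ppa_contract",
    "TECHNICAL_CONTRACT": "om_contract",
}


def resolve_route(label: Optional[str]) -> Optional[dict]:
    """Map a raw classifier label to its destination project, or None if unrouted."""
    if not label:
        return None
    # Normalize the label the same way the processing loop does.
    route_key = CATEGORY_TO_KEY.get(label.strip().upper())
    if route_key is None:
        return None
    return DEST_PROJECTS.get(route_key)


# Sanity check: every routing key must point at a configured destination.
unwired = sorted(set(CATEGORY_TO_KEY.values()) - set(DEST_PROJECTS))
assert not unwired, f"Routing keys without a destination project: {unwired}"
```

Labels that are not wired yet (such as `"OTHER"`) resolve to `None`, which the caller can treat as "skip this document".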

In [3]:
# PROCESS WITH RETAB

from pathlib import Path
import re

DIR = "../assets/docs/Energy/" # Sample documents found on the Web

EXTS = {".pdf", ".png", ".jpg", ".jpeg", ".webp", ".pptx"} # You can add more if needed

for doc in Path(DIR).glob("*"):
    if doc.suffix.lower() not in EXTS:
        continue

    print(f"\nProcessing: {doc.name}")

    # --- Classify with Retab
    clf_res = client.projects.extract(
        project_id=CLASSIFIER_PROJECT_ID,
        iteration_id=CLASSIFIER_ITERATION_ID,
        document=str(doc),
    )

    # --- Inline label read: try known keys on the structured output first
    out = getattr(clf_res, "output", {}) or {}
    if not isinstance(out, dict):
        out = {}
    label = None
    for k in ("doc_type", "DOCUMENT_TYPE", "document_type", "label", "class", "category", "type"):
        v = out.get(k)
        if v:
            label = str(v).strip().upper()
            break

    # --- Fallback: scan the raw response text only if no key matched
    if not label:
        m = re.search(
            r'"(?:doc[_\s-]?type|DOCUMENT[_\s-]?TYPE|document[_\s-]?type|label|class|category|type)"\s*:\s*"([^"]+)"',
            str(clf_res),
            flags=re.I,
        )
        if m:
            label = m.group(1).strip().upper()

    print(f"Classifier → {label!r}")

    # --- Route
    route_key = CATEGORY_TO_KEY.get(label)
    if not route_key:
        print(f"No route for {label}, skipping.")
        continue

    dest = DEST_PROJECTS[route_key]

    # --- Extract with Retab
    extraction = client.projects.extract(
        project_id=dest["project_id"],
        iteration_id=dest["iteration_id"],
        document=str(doc),
    )

    print(f"Extraction result → {extraction}")


Processing: ppa_sample3 (long).pdf
Classifier → 'ENERGY_CONTRACT'
Extraction result → RetabParsedChatCompletion(id='91OkaLTMDKmavdIP3IrP8Qw', choices=[RetabParsedChoice(finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage(content='{"jurisdiction": "Vietnam", "issuing_ministry": "Ministry of Industry and Trade (MOIT)", "agreement_title": "POWER PURCHASE AGREEMENT", "agreement_date": "null", "mini_grid_name": "POWER PROJECT", "buyer": {"name": "VIETNAM ELECTRICITY (EVN)", "registered_address": "18 Tran Nguyen Han Street, Hanoi, Vietnam", "country": "Vietnam"}, "seller": {"name": "BOT Company", "registered_address": "null", "country": "Vietnam"}, "facility": {"facility_name": "Coal-Fired Thermal Power Facility", "technology": "Coal-Fired Thermal Power", "capacity_kw": 0, "location": "Vietnam", "delivery_point_description": "The location where the Net Energy Output from the Facility is transferred from the BOT Company to EVN (or such other entity as is nominat
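
In the sample output above, some fields (for example `agreement_date`) come back as the string `"null"` rather than a JSON `null`. A minimal sketch of a post-processing step that converts those strings to `None` after parsing. `normalize_nulls` is a hypothetical helper, and whether `"null"` should be treated as a missing value depends on your extraction schema:

```python
import json
from typing import Any


def normalize_nulls(value: Any) -> Any:
    """Recursively replace the literal string "null" with None."""
    if isinstance(value, dict):
        return {k: normalize_nulls(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_nulls(v) for v in value]
    if isinstance(value, str) and value.strip().lower() == "null":
        return None
    return value


# Fragment taken from the sample output above.
sample = (
    '{"agreement_date": "null", '
    '"seller": {"name": "BOT Company", "registered_address": "null"}}'
)
cleaned = normalize_nulls(json.loads(sample))
print(json.dumps(cleaned, indent=2, ensure_ascii=False))
```

The same function could be applied to the parsed content of any extraction before it is stored or displayed.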

In [4]:
# PRINT THE EXTRACTION RESULT IN A NICE JSON FORMAT
# Note: `extraction` holds the result for the last document routed by the loop above.

import json

parsed_data = json.loads(extraction.choices[0].message.content)
formatted_json = json.dumps(parsed_data, indent=2, ensure_ascii=False)

print(formatted_json)

{
  "document_title": "Operation and Maintenance Enforceable Agreement",
  "project_name": "[SITE NAME]",
  "system_name": "[TYPE OF REMEDIATION SYSTEM(S)]",
  "agreement_type": "enforceable_om_agreement",
  "governing_entities": [
    {
      "role": "authority",
      "name": "Office of Legal Counsel (OLC)"
    },
    {
      "name": "State Department of Toxic Substances Control",
      "role": "regulator"
    },
    {
      "name": "Site Mitigation Branch Chief",
      "role": "authority"
    },
    {
      "name": "Site Mitigation Enforcement Workgroup",
      "role": "committee"
    },
    {
      "name": "Regional Site Mitigation Enforcement Workgroup representatives",
      "role": "committee"
    },
    {
      "name": "Regional Operation Project Managers",
      "role": "other"
    },
    {
      "name": "Regional Branch Chiefs",
      "role": "authority"
    },
    {
      "name": "Site Mitigation Program's Planning and Policy Unit",
      "role": "other"
    },
    {
      "