# **STATE FARM INSURANCE—CLAIMS BILLING PACKAGE INGESTION**

This notebook uses the **Processors** configured on **[Retab](https://www.retab.com/)** to classify State Farm Insurance claims billing packages and extract data from them. Extraction relies on schemas that are defined and optimized on the platform.

These packages include various document types:
- Claim billings (estimates, medical bills, invoices, etc.) with CPT/HCPCS/parts codes
- Loss-run reports
- Subrogation recovery documents
- Legal demand letters
- Coverage opinions
- Litigation packages
- And more

*Retab's platform lets you automatically generate, iterate on, and deploy your schemas and prompts to production. See the [documentation](https://docs.retab.com/overview/introduction).*

Built with 🩷 by Retab.

We recommend creating your **Retab API key** on **[Retab](https://www.retab.com/)** and saving it in a `.env` file.

Your `.env` file should contain:
```
RETAB_API_KEY=sk_retab_***
```

```python
# INITIALIZATION

from dotenv import load_dotenv
from retab import Retab

load_dotenv()  # Load RETAB_API_KEY from the .env file into the environment

client = Retab()  # The client reads RETAB_API_KEY from the environment
```
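A small guard can make a missing key explicit before any request is sent. This is a minimal sketch: `require_retab_key` is a hypothetical helper, not part of the Retab SDK, and it only checks that `RETAB_API_KEY` is present and non-empty.

```python
import os
from collections.abc import Mapping


def require_retab_key(env: Mapping[str, str]) -> str:
    """Return the RETAB_API_KEY value from `env`, or raise a readable error if it is absent."""
    key = env.get("RETAB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "RETAB_API_KEY is not set: add it to your .env file and re-run load_dotenv()."
        )
    return key


# In the notebook, call this after load_dotenv() and before creating the client
try:
    require_retab_key(os.environ)
    print("RETAB_API_KEY found.")
except RuntimeError as err:
    print(err)
```

Passing the environment as a mapping (rather than reading `os.environ` inside the function) keeps the check easy to test with a plain dictionary.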

```python
# PARAMETERS

# ---- Classifier -> use Retab to classify the documents in the Claims Billing Package
CLASSIFIER_PROJECT_ID = "proj_Wdr_IVbaGq0mFxiCLHdSY"
CLASSIFIER_ITERATION_ID = "base-configuration"

# ---- Routing map (extraction projects) -> route each classified document
# to the corresponding extractor configured on Retab, so it is processed with the right schema
DEST_PROJECTS = {
    "repair_estimate": {
        "project_id":   "proj_gZtHZWuTtAxi_xOqOIXrJ",
        "iteration_id": "eval_iter_o-t8_4X3uJEYeF8-4DJLV",
    },
    "medical_bill": {
        "project_id":   "proj_kJ-sHsVP6_UQQVCGOArmk",         # both CMS1500 and UB04
        "iteration_id": "eval_iter_VBk_n-SOk-9HOaxulyMDb",
    },
    "contractor_invoice": {
        "project_id":   "proj_XxbzN5ymYJn3KMT6vzgRw",
        "iteration_id": "eval_iter_3V1XNKzHkYiUUVW5JzNVD",
    },
    "demand_letter": {
        "project_id":   "proj_ijmWRdVglm6C_ORTIoqpr",
        "iteration_id": "eval_iter_et1oJq0ZJ7xp21PB5UadK",
    },
    "loss_run": {
        "project_id":   "proj_W17iPKczi0pEZhFsQk0um",
        "iteration_id": "eval_iter_Tvu9aBwHMAPeHRqDEZywm",
    },
    # Add more routes as extractors are wired on the platform
    # Not wired yet: "COVERAGE_POSITION", "EOB", "POLICE_REPORT",
    # "PROOF_OF_LOSS", "POLICY_DECLARATIONS", "PHOTO_EVIDENCE", "OTHER"
}

# Map raw classifier labels to the routing keys above
CATEGORY_TO_KEY = {
    "REPAIR_ESTIMATE":       "repair_estimate",
    "MEDICAL_BILL_CMS1500":  "medical_bill",
    "MEDICAL_BILL_UB04":     "medical_bill",
    "CONTRACTOR_INVOICE":    "contractor_invoice",
    "DEMAND_LETTER":         "demand_letter",
    "LOSS_RUN":              "loss_run",
}
```
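Because `CATEGORY_TO_KEY` and `DEST_PROJECTS` are edited by hand as new extractors are wired, a quick consistency check can catch a label that routes to a key with no project behind it. This is a minimal sketch: `check_routing` is a hypothetical helper that only assumes the two dictionary shapes defined above, and the demo dictionaries use made-up IDs.

```python
def check_routing(
    category_to_key: dict[str, str],
    dest_projects: dict[str, dict[str, str]],
) -> list[str]:
    """Return a list of problems in the routing configuration (empty if consistent)."""
    problems = []
    # Every classifier label must point to a wired extraction project
    for label, key in category_to_key.items():
        if key not in dest_projects:
            problems.append(f"{label} -> {key!r}: no entry in DEST_PROJECTS")
    # Every extraction project needs both IDs and at least one label routed to it
    routed_keys = set(category_to_key.values())
    for key, dest in dest_projects.items():
        for field in ("project_id", "iteration_id"):
            if not dest.get(field):
                problems.append(f"{key}: missing {field}")
        if key not in routed_keys:
            problems.append(f"{key}: no classifier label routes here")
    return problems


# Toy example (hypothetical IDs): "EOB" routes to a key that is not wired yet
demo_labels = {"LOSS_RUN": "loss_run", "EOB": "eob"}
demo_projects = {"loss_run": {"project_id": "proj_x", "iteration_id": "iter_x"}}
for problem in check_routing(demo_labels, demo_projects):
    print(problem)
```

With the configuration above, `check_routing(CATEGORY_TO_KEY, DEST_PROJECTS)` returns an empty list; it is worth re-running whenever a route is added.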

```python
# PROCESS WITH RETAB

import json
import re
from pathlib import Path

DIR = "../assets/docs/Insurance-StateFarm/"  # Sample documents found on the web

EXTS = {".pdf", ".png", ".jpg", ".jpeg", ".webp"}  # Add more extensions if needed

for doc in Path(DIR).glob("*"):
    if doc.suffix.lower() not in EXTS:
        continue

    print(f"\nProcessing: {doc.name}")

    # --- Classify with Retab
    clf_res = client.projects.extract(
        project_id=CLASSIFIER_PROJECT_ID,
        iteration_id=CLASSIFIER_ITERATION_ID,
        document=str(doc),
    )

    # --- Read the label from the classifier's JSON content
    # (same response shape as the extraction result printed below: choices[0].message.content)
    label = None
    try:
        out = json.loads(clf_res.choices[0].message.content)
    except (AttributeError, IndexError, TypeError, ValueError):
        out = None
    if isinstance(out, dict):
        label = out.get("DOCUMENT_TYPE") or out.get("document_type")
    if not label:
        # Fallback: search the raw response text for the label
        m = re.search(r'"(?:DOCUMENT_TYPE|document_type)"\s*:\s*"([^"]+)"', str(clf_res))
        if m:
            label = m.group(1)
    label = label.strip().upper() if isinstance(label, str) else None

    print(f"Classifier → {label!r}")

    # --- Route the document to its extraction project
    route_key = CATEGORY_TO_KEY.get(label)
    if not route_key:
        print(f"No route for {label}, skipping.")
        continue

    dest = DEST_PROJECTS[route_key]

    # --- Extract with the routed Retab project
    extraction = client.projects.extract(
        project_id=dest["project_id"],
        iteration_id=dest["iteration_id"],
        document=str(doc),
    )

    print(f"Extraction result → {extraction}")
```


Processing: medical_bill_sample1.jpg
Classifier → 'MEDICAL_BILL_UB04'
Extraction result → RetabParsedChatCompletion(id='chatcmpl-C3j0Mx3ejRRZAf06kVwFnlMCvWLjx', choices=[RetabParsedChoice(finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage(content='{"provider_name": "KISHWAUKEE COMMUNITY HOSP", "provider_address": "ONE KISH HOSPITAL DR, PO BOX 846, DEKALB, IL 601154939", "provider_phone": "8157561521", "bill_type": "other", "bill_number": "V0002014", "statement_period_from": "2010-04-05", "statement_period_through": "2010-04-05", "patient_name": "TEST, FRIDAY", "patient_address": "1600 PENNSYLVANIA AVENUE, DEKALB, IL 60115", "patient_dob": "1953-03-26", "patient_sex": "M", "admission_date": "2010-04-05", "discharge_date": "2010-04-05", "payer_details": [{"payer_name": "AETNA", "group_name": "", "group_number": "", "health_plan_id": ""}], "diagnosis_codes": ["7062"], "service_lines": [{"rev_code": "0300", "description": "LABORATORY GENERAL", "hcpcs_code": 
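The label read inside the loop above can also be factored into a small function that works on the classifier's JSON content string, so it can be tested without calling the API. This is a minimal sketch: `read_label` is a hypothetical helper name, and it assumes the classifier returns its label under a `DOCUMENT_TYPE` (or `document_type`) key, as the loop does.

```python
import json
import re
from typing import Optional

# Matches "DOCUMENT_TYPE": "<label>" (or the lowercase key) anywhere in raw text
LABEL_RE = re.compile(r'"(?:DOCUMENT_TYPE|document_type)"\s*:\s*"([^"]+)"')


def read_label(content: Optional[str]) -> Optional[str]:
    """Return the upper-cased DOCUMENT_TYPE label from classifier JSON content, or None."""
    if not content:
        return None
    label = None
    try:
        data = json.loads(content)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        data = None
    if isinstance(data, dict):
        label = data.get("DOCUMENT_TYPE") or data.get("document_type")
    if not isinstance(label, str) or not label.strip():
        # Fallback for malformed or truncated JSON: regex over the raw text
        match = LABEL_RE.search(content)
        label = match.group(1) if match else None
    return label.strip().upper() if label and label.strip() else None


print(read_label('{"DOCUMENT_TYPE": "medical_bill_ub04"}'))
print(read_label('{"document_type": "LOSS_RUN"'))  # truncated JSON, regex fallback
print(read_label('{"other_field": 1}'))
```

Inside the loop, the label read would then reduce to `label = read_label(clf_res.choices[0].message.content)`, assuming the classifier response has the same `choices[0].message.content` shape as the extraction result printed above.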

```python
# PRETTY-PRINT THE EXTRACTION RESULT AS JSON
# Note: `extraction` holds the result for the last document processed in the loop above

import json

parsed_data = json.loads(extraction.choices[0].message.content)
formatted_json = json.dumps(parsed_data, indent=2, ensure_ascii=False)

print(formatted_json)
```

{
  "letter_date": "2009-06-15",
  "account_reference": "WFM 01002506",
  "final_demand": false,
  "sender_signature": "L. Noblitt",
  "sender": {
    "name": "Stores Protective Association",
    "address": "Post Office Box 2219 Simi Valley, California 93062",
    "contact": "Tel (805)630-1015 Fax (805)526-5490",
    "representative": {
      "name": "L. Noblitt",
      "role": "Unit Supervisor"
    }
  },
  "recipient": {
    "name": "Ms. Elle Ko",
    "address": "32440 Lake Temescal Fremont, CA 94555"
  },
  "client": {
    "name": "Whole Foods Market",
    "role": "Client"
  },
  "subject": "Ms. Elle Ko",
  "debt_details": {
    "amount": 500.0,
    "currency": "USD",
    "amount_text": "Five Hundred",
    "as_of_date": "2009-06-15",
    "penalties_included": false,
    "reason": "Incident which occurred at their San Francisco store on 4/06/2008.",
    "invoice_numbers": []
  },
  "days_overdue": 0,
  "payment_deadline_days": 10,
  "vacate_deadline_days": 0,
  "legal_action_descript