# **MANUFACTURERS—AUTO PACKAGE INGESTION**

This notebook uses the **Processors** configured on **[Retab](https://www.retab.com/)** to process auto manufacturer packages (from companies such as Audi, CNH, and Bridgestone) with high accuracy and reliability, based on an extraction schema defined and optimized on the platform.

These Packages include various documents:
- Retail/Lease (apps, payslips, IDs, etc.)
- Income/KYC
- Dealer incentive
- Warranty claim reconciliation
- Many more

*Retab's platform lets you automatically generate, iterate on, and deploy your schemas and prompts to production. See the [documentation here](https://docs.retab.com/overview/introduction).*

Built with 🩷 by Retab.

We recommend creating your **Retab API key** on **[Retab](https://www.retab.com/)** and saving it in a `.env` file.

You should have:
```
RETAB_API_KEY=sk_retab_***
```

In [1]:
# INITIALIZATION

from dotenv import load_dotenv
from retab import Retab

load_dotenv() # Import the Retab API key from the .env file

client = Retab()
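
Before creating the client, it can help to confirm that the key was actually loaded from `.env`. The sketch below is a minimal, hypothetical helper (`require_env` is not part of the Retab SDK or of `python-dotenv`); it assumes the key lives in the `RETAB_API_KEY` variable shown above.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the trimmed value of an environment variable, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("RETAB_API_KEY")` right after `load_dotenv()` fails early with a readable message instead of surfacing later as an authentication error.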

In [2]:
# PARAMETERS

# ---- Classifier -> leverage Retab to classify the documents for Auto Manufacturers
CLASSIFIER_PROJECT_ID = "proj_NzodKxtfWqB5oFuvZrlAP"
CLASSIFIER_ITERATION_ID = "base-configuration"

# ---- Routing map (Extraction projects) -> re-route the classified documents 
# to the corresponding extractors configured on Retab to process them with the right schema
DEST_PROJECTS = {
    "certificate_of_origin": {
        "project_id":   "proj_1j0RaQySBc_nwmD5lI4Ld",
        "iteration_id": "eval_iter_azDPRn5mEjnMdZ_4CThAS",
    },
    "driver_license": {
        "project_id":   "proj_va3ZR5HLHjrfjGwryTY1R",      
        "iteration_id": "eval_iter_omPLcS69TmzSw-pQZKF2c",
    },
    # We can add more as we wire them on the platform
    # Not wired yet: "KYC", "DEALER_INCENTIVE", "WARRANTY_CLAIM_RECONCILIATION", "OTHER"
}

# Map raw classifier labels to the routing keys above
CATEGORY_TO_KEY = {
    "CERTIFICATE_OF_ORIGIN":   "certificate_of_origin",
    "IDENTIFICATION_DOCUMENT": "driver_license",
}
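
The two maps above form a two-step lookup: classifier label, then routing key, then destination project. A minimal sketch of that lookup follows, with a check that every routing key actually has a destination. The helper names (`resolve_route`, `missing_destinations`) and the placeholder IDs are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Optional

# Placeholder maps with the same shape as the ones above (IDs are made up).
EXAMPLE_DEST_PROJECTS: Dict[str, Dict[str, str]] = {
    "certificate_of_origin": {"project_id": "proj_example_a", "iteration_id": "iter_example_a"},
    "driver_license": {"project_id": "proj_example_b", "iteration_id": "iter_example_b"},
}
EXAMPLE_CATEGORY_TO_KEY: Dict[str, str] = {
    "CERTIFICATE_OF_ORIGIN": "certificate_of_origin",
    "IDENTIFICATION_DOCUMENT": "driver_license",
}


def resolve_route(
    label: Optional[str],
    category_to_key: Dict[str, str],
    dest_projects: Dict[str, Dict[str, str]],
) -> Optional[Dict[str, str]]:
    """Return the destination for a classifier label, or None if it is not wired."""
    if not label:
        return None
    key = category_to_key.get(label.strip().upper())
    if key is None:
        return None
    return dest_projects.get(key)


def missing_destinations(
    category_to_key: Dict[str, str],
    dest_projects: Dict[str, Dict[str, str]],
) -> List[str]:
    """List routing keys that are referenced by a label but have no destination."""
    return sorted({k for k in category_to_key.values() if k not in dest_projects})
```

Running `missing_destinations(CATEGORY_TO_KEY, DEST_PROJECTS)` once after editing the maps catches a mistyped routing key before it raises a `KeyError` in the middle of a batch.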

In [3]:
# PROCESS WITH RETAB

from pathlib import Path
import re

DIR = "../assets/docs/Manufacturers/" # Sample documents found on the Web

EXTS = {".pdf", ".png", ".jpg", ".jpeg", ".webp"} # You can add more if needed

for doc in Path(DIR).glob("*"):
    if doc.suffix.lower() not in EXTS:
        continue

    print(f"\nProcessing: {doc.name}")

    # --- Classify with Retab
    clf_res = client.projects.extract(
        project_id=CLASSIFIER_PROJECT_ID,
        iteration_id=CLASSIFIER_ITERATION_ID,
        document=str(doc),
    )

    # --- Inline label read
    out = getattr(clf_res, "output", {}) or {}
    label = None
    # First, look for a known label key in the structured output
    for k in ("doc_type", "DOCUMENT_TYPE", "document_type", "label", "class", "category", "type"):
        v = out.get(k)
        if v:
            label = str(v).strip().upper()
            break
    # Fallback: search the raw response text once, only if no key matched
    if not label:
        m = re.search(
            r'"(?:doc[_\s-]?type|DOCUMENT[_\s-]?TYPE|document[_\s-]?type|label|class|category|type)"\s*:\s*"([^"]+)"',
            str(clf_res),
            flags=re.I,
        )
        if m:
            label = m.group(1).strip().upper()

    print(f"Classifier → {label!r}")

    # --- Route
    route_key = CATEGORY_TO_KEY.get(label)
    if not route_key:
        print(f"No route for {label}, skipping.")
        continue

    dest = DEST_PROJECTS[route_key]

    # --- Extract with Retab
    extraction = client.projects.extract(
        project_id=dest["project_id"],
        iteration_id=dest["iteration_id"],
        document=str(doc),
    )

    print(f"Extraction result → {extraction}")


Processing: risc_sample5.webp
Classifier → 'CONTRACT'
No route for CONTRACT, skipping.

Processing: certificate_of_origin_sample1.png
Classifier → 'CERTIFICATE_OF_ORIGIN'
Extraction result → RetabParsedChatCompletion(id='NPKiaP6TEIW7xN8PhOGYsAE', choices=[RetabParsedChoice(finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage(content='{"issue_date": "2003-10-29", "certificate_number": "RBLPD019", "document_title": "CERTIFICATE OF ORIGIN FOR A VEHICLE", "manufacturer": {"name": "CHEVROLET MOTOR DIVISION GENERAL MOTORS CORPORATION", "brand": "CHEVROLET", "address_line": "null", "city": "DETROIT", "state_province": "MI", "country": "null"}, "vehicle": {"vin": "1GCEK19T34E191229", "year": 2004, "make": "CHEVROLET", "model": "null", "body_type": "PICKUP", "series_or_trim": "CK15753", "engine_power_hp": 44, "engine_displacement_cc": 0, "gvwr_lbs": 6400, "shipping_weight_lbs": 4957, "cylinders": 8, "drive_type": "null", "fuel_type": "null", "color": "null"}, "emis
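
The file filter at the top of the processing loop can be factored into a small generator, which also makes the processing order deterministic. This is a sketch only; `iter_documents` is a hypothetical helper name, and sorting the paths is an added choice, not something the loop above does.

```python
from pathlib import Path
from typing import Iterable, Iterator


def iter_documents(directory: str, exts: Iterable[str]) -> Iterator[Path]:
    """Yield files in `directory` whose suffix is in `exts` (case-insensitive), sorted by path."""
    allowed = {e.lower() for e in exts}
    for path in sorted(Path(directory).iterdir()):
        # Skip sub-directories and files with an unsupported extension
        if path.is_file() and path.suffix.lower() in allowed:
            yield path
```

With this helper, the loop header would become `for doc in iter_documents(DIR, EXTS):` and the `continue` on unsupported extensions is no longer needed.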

In [4]:
# PRINT THE EXTRACTION RESULT IN A NICE JSON FORMAT

import json

parsed_data = json.loads(extraction.choices[0].message.content)
formatted_json = json.dumps(parsed_data, indent=2, ensure_ascii=False)

print(formatted_json)

{
  "jurisdiction": "DC",
  "license_class": "A",
  "license_number": "A9999999",
  "issue_date": "2010-02-17",
  "expiry_date": "2018-02-21",
  "full_name": "ANGELINA GABRIELA JONES",
  "last_name": "JONES",
  "first_name": "ANGELINA GABRIELA",
  "date_of_birth": "1984-02-21",
  "sex": "female",
  "eye_color": "bro",
  "hair_color": "oth",
  "organ_donor": true,
  "veteran_status": true,
  "restrictions": "3",
  "endorsements": "NONE",
  "height_in_inches": 62,
  "photo_present": true,
  "weight_in_pounds": 120,
  "document_number": "12348757475974",
  "compliance": "real_id",
  "address": {
    "street": "1234 COMMODORE JOSHUA BARNEY DRIVE, NE",
    "street_2": "#1234",
    "city": "WASHINGTON",
    "state": "DC",
    "postal_code": "00000-0000"
  }
}
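
Note that `extraction` in the cell above is simply the last value assigned inside the loop, so the JSON printed here belongs to the last document that was routed. A hedged sketch of a more defensive parse step is shown below: it relies only on the `choices[0].message.content` path used above and returns `None` when the content is missing or is not a JSON object. `parse_content` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in for a real response.

```python
import json
from types import SimpleNamespace
from typing import Any, Optional


def parse_content(completion: Any) -> Optional[dict]:
    """Parse the JSON payload of a completion-like object, or return None on failure."""
    try:
        content = completion.choices[0].message.content
    except (AttributeError, IndexError, TypeError):
        return None
    if not content:
        return None
    try:
        parsed = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return None
    # Only JSON objects are useful here; lists or scalars are treated as failures
    return parsed if isinstance(parsed, dict) else None


# Stand-in response with the same attribute path as the extraction result above
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content='{"jurisdiction": "DC"}'))]
)
print(json.dumps(parse_content(fake), indent=2, ensure_ascii=False))
```

Storing `parse_content(extraction)` per document inside the loop, for example in a dict keyed by `doc.name`, would keep every result available instead of only the last one.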
