# 🧠 Agentic Document Extraction with LandingAI

This notebook demonstrates how to use the `agentic_doc` Python package to extract structured information from documents using LandingAI's Agentic Document Extraction (ADE) service.

We'll walk through:
- Parsing documents with ADE
- Defining a custom schema using `pydantic`
- Viewing structured field extractions
- Saving results to CSV

> 📎 Supported formats: `.pdf`, `.png`, `.jpg`, `.jpeg`

## 📦 Setup & Imports

Import the necessary packages. Before running the notebook, install `agentic-doc` (imported as `agentic_doc`), Pillow, and pandas, which is used later to build the results table:

```bash
pip install agentic-doc pillow pandas
```

Get your API key from the Visual Playground at https://va.landing.ai/settings/api-key.

Read about the options for setting your API key at https://docs.landing.ai/ade/agentic-api-key.

The video that accompanies this notebook uses a `.env` file in the same directory as the notebook.
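Before calling `parse()`, it can help to confirm that the key is actually available. The sketch below assumes the variable is named `VISION_AGENT_API_KEY`, matching the `vision_agent_api_key` setting printed in the log below. `read_dotenv` and `find_api_key` are hypothetical helpers built on the standard library only. They do not replace the loading that `agentic_doc` performs itself; they only report whether a key can be found.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper).

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Matching single or double quotes around a value are removed.
    """
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def find_api_key(env_file: Path, var_name: str = "VISION_AGENT_API_KEY") -> Optional[str]:
    """Return the key from the process environment, else from the .env file."""
    return os.environ.get(var_name) or read_dotenv(env_file).get(var_name)


# Report whether a key is available without printing the secret itself.
key = find_api_key(Path(".env"))
print("API key found" if key else "API key not found")
```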


In [1]:
# Standard libraries
import os
import json
from pathlib import Path

# Agentic Document Extraction from LandingAI
from agentic_doc.parse import parse

2025-07-24 16:38:37 [info] Settings loaded: {
  "endpoint_host": "https://api.va.landing.ai",
  "vision_agent_api_key": "cmI5a[REDACTED]",
  "batch_size": 4,
  "max_workers": 5,
  "max_retries": 100,
  "max_retry_wait_time": 60,
  "retry_logging_style": "log_msg",
  "pdf_to_image_dpi": 96,
  "split_size": 10
} [agentic_doc.config] (config.py:84)


## 📁 Define Input and Output Directories

Specify where your documents are located and where results will be saved.


In [2]:
# Define input and output directory paths
base_dir = Path(os.getcwd())
input_folder = base_dir / "input_folder"
results_folder = base_dir / "results_folder"
groundings_folder = base_dir / "groundings_folder"

# Create the input and output folders if they don't exist
input_folder.mkdir(parents=True, exist_ok=True)
results_folder.mkdir(parents=True, exist_ok=True)
groundings_folder.mkdir(parents=True, exist_ok=True)

## 🗂️ Collect Document File Paths

This cell collects the paths of all files in `input_folder` that have a supported extension (`.pdf`, `.png`, `.jpg`, `.jpeg`).


In [3]:
# Collect all document file paths in input folder with supported extensions
# Convert each Path object to a string to ensure compatibility with parse()
file_paths = [
    str(p)
    for p in input_folder.iterdir()
    if p.suffix.lower() in [".pdf", ".png", ".jpg", ".jpeg"]
]

file_paths

['/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/alexandre_yogurt.jpg',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/force_of_nature_beef.jpg',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/goji.jpg',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/garden_of_life_collagen.jpg',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/teton_hot_dogs.jpg',
 '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/chomps.jpg']
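`Path.iterdir()` yields entries in an arbitrary, filesystem-dependent order, and the order of `file_paths` appears to carry through to the results (the yogurt label is first in both `file_paths` and the parsed output). Because a later cell picks a document by index (`result_fe[2]`), sorting the paths makes that choice reproducible across machines. A minimal sketch with a hypothetical `collect_documents` helper that also skips directories:

```python
from pathlib import Path
from typing import Iterable, List

SUPPORTED_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg")


def collect_documents(folder: Path, extensions: Iterable[str] = SUPPORTED_EXTENSIONS) -> List[str]:
    """Return sorted string paths of files in `folder` with a supported suffix."""
    allowed = {ext.lower() for ext in extensions}
    return sorted(
        str(p)
        for p in folder.iterdir()
        # is_file() excludes subfolders whose names happen to end in ".pdf", etc.
        if p.is_file() and p.suffix.lower() in allowed
    )
```

Usage in this notebook would be `file_paths = collect_documents(input_folder)`.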

## 🚀 Run Agentic Document Extraction

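Call the `parse()` function from `agentic_doc` to parse each document into markdown and grounded chunks. Passing `result_save_dir` and `grounding_save_dir` saves a JSON result for each document to `results_folder` and the chunk images to `groundings_folder`, as the log output below shows.

See https://docs.landing.ai/ade/ade-parse-docs for details.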

In [4]:
# Parse documents using LandingAI ADE
result = parse(
    documents=file_paths,
    result_save_dir=str(results_folder),
    grounding_save_dir=str(groundings_folder),
    include_marginalia=True,
    include_metadata_in_markdown=True,
)

2025-07-24 16:39:24 [info] API key is valid. [agentic_doc.utils] (utils.py:44)
2025-07-24 16:39:24 [info] Parsing 6 documents [agentic_doc.parse] (parse.py:322)


Parsing documents:   0%|          | 0/6 [00:00<?, ?it/s]

HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:39:30 [info] Time taken to successfully parse a document chunk: 5.28 seconds [agentic_doc.parse] (parse.py:683)
2025-07-24 16:39:30 [info] Saving 10 chunks as images to '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/groundings_folder/force_of_nature_beef_20250724_093930' [agentic_doc.utils] file_path=PosixPath('/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/force_of_nature_beef.jpg') file_type=image (utils.py:84)
2025-07-24 16:39:30 [info] Saved the parsed result to '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/results_folder/force_of_nature_beef_20250724_093930.json' [agentic_doc.parse] (parse.py:407)
HTTP Request: POST https://api.va.landing.ai/v1/tool

Parsing documents:  17%|█▋        | 1/6 [00:08<00:40,  8.03s/it]

HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:39:33 [info] Time taken to successfully parse a document chunk: 8.25 seconds [agentic_doc.parse] (parse.py:683)
2025-07-24 16:39:33 [info] Saving 4 chunks as images to '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/groundings_folder/goji_20250724_093933' [agentic_doc.utils] file_path=PosixPath('/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/goji.jpg') file_type=image (utils.py:84)
2025-07-24 16:39:33 [info] Saved the parsed result to '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/results_folder/goji_20250724_093933.json' [agentic_doc.parse] (parse.py:407)


Parsing documents:  50%|█████     | 3/6 [00:08<00:06,  2.16s/it]

HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:39:37 [info] Time taken to successfully parse a document chunk: 12.24 seconds [agentic_doc.parse] (parse.py:683)
2025-07-24 16:39:37 [info] Saving 1 chunks as images to '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/groundings_folder/garden_of_life_collagen_20250724_093937' [agentic_doc.utils] file_path=PosixPath('/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/garden_of_life_collagen.jpg') file_type=image (utils.py:84)
2025-07-24 16:39:37 [info] Saved the parsed result to '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/results_folder/garden_of_life_collagen_20250724_093937.json' [agentic_doc.parse] (parse.py:407)


Parsing documents:  67%|██████▋   | 4/6 [00:12<00:05,  2.79s/it]

HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:39:37 [info] Time taken to successfully parse a document chunk: 4.58 seconds [agentic_doc.parse] (parse.py:683)
2025-07-24 16:39:37 [info] Saving 2 chunks as images to '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/groundings_folder/chomps_20250724_093937' [agentic_doc.utils] file_path=PosixPath('/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/input_folder/chomps.jpg') file_type=image (utils.py:84)
2025-07-24 16:39:37 [info] Saved the parsed result to '/Users/andreakropp/Documents/Demos/ADE Demos/Notebooks/results_folder/chomps_20250724_093937.json' [agentic_doc.parse] (parse.py:407)
HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 O

Parsing documents: 100%|██████████| 6/6 [00:19<00:00,  3.19s/it]
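The log shows that each saved result is named `<document stem>_<YYYYMMDD>_<HHMMSS>.json`, so rerunning the cell adds new files instead of overwriting old ones. Assuming that naming pattern holds, the hypothetical helper below keeps only the newest JSON file per document. It relies on file names alone and makes no assumption about the JSON contents.

```python
import re
from pathlib import Path
from typing import Dict

# Matches names such as "goji_20250724_093933.json" seen in the parse() log.
RESULT_NAME = re.compile(r"^(?P<stem>.+)_(?P<stamp>\d{8}_\d{6})\.json$")


def latest_results(results_dir: Path) -> Dict[str, Path]:
    """Map each document stem to its most recently timestamped result file."""
    latest: Dict[str, Path] = {}
    stamps: Dict[str, str] = {}
    for path in results_dir.glob("*.json"):
        match = RESULT_NAME.match(path.name)
        if match is None:
            continue  # skip files that don't follow the naming pattern
        stem, stamp = match.group("stem"), match.group("stamp")
        # Fixed-width YYYYMMDD_HHMMSS strings sort chronologically as plain strings.
        if stem not in stamps or stamp > stamps[stem]:
            stamps[stem] = stamp
            latest[stem] = path
    return latest
```

For example, `latest_results(results_folder)["goji"]` would point at the most recent goji result, which you could then load with `json.loads(path.read_text())`.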


## 📑 Define Custom Schema for Field Extraction

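Using `pydantic`, we define a schema listing the fields to extract from each document (e.g., product name, net weight, and label claims such as organic or non-GMO). Each field's `description` spells out what to look for on the packaging.

See https://docs.landing.ai/ade/ade-extract-library for details.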

In [6]:
# Import pydantic for schema definition
from pydantic import BaseModel, Field

# Define schema for structured extraction
class Product(BaseModel):
    product_name: str = Field(description="The full name of the product as it appears on the packaging, excluding the brand name.")
    brand: str = Field(description="The brand or company name.")
    net_weight_oz: float = Field(description="The net weight of the product in ounces, as labeled (e.g., 'NET WT 8 OZ'). Return empty field if not found.")
    net_weight_g: float = Field(description="The net weight of the product in grams, often shown in parentheses next to ounces (e.g., '330 g'). Return empty field if not found.")
    servings_per_container: int = Field(description="The total number of servings per container, usually listed in the Nutrition Facts panel. Return empty field if not found.")
    serving_size: str = Field(description="The serving size as printed on the package, such as '1 stick (45g)' or '1 scoop (10g)'. Return empty field if not found.")
    product_type: str = Field(description="General category of the product (e.g., 'yogurt', 'hot dogs', 'supplement').")
    flavor: str = Field(description="The flavor of the product if applicable, such as 'creamy vanilla'. Return empty field if not found or not applicable.")
    is_grass_fed: bool = Field(description="True if the label mentions 'Grass-Fed'.")
    is_organic: bool = Field(description="True if the label mentions 'Organic' or includes the 'USDA Organic' seal.")
    is_keto_friendly: bool = Field(description="True if the label mentions 'Keto' or 'Ketogenic' diets or similar.")
    is_paleo_friendly: bool = Field(description="True if the label mentions 'Paleo' or 'Paleolithic' diets or similar.")
    is_kosher: bool = Field(description="True if the label mentions 'Kosher'.")
    is_regenerative: bool = Field(description="True if the label includes terms like 'Regeneratively Sourced' or 'Certified Regenerative'.")
    is_certified_humane: bool = Field(description="True if the label features the 'Certified Humane' logo or wording.")
    is_animal_welfare_certified: bool = Field(description="True if the product is 'Animal Welfare Certified' or meets GAP (Global Animal Partnership) standards.")
    is_pasture_raised: bool = Field(description="True if the label claims the animals were 'Pasture Raised'.")
    is_non_gmo: bool = Field(description="True if the product is labeled 'Non-GMO' or has the 'Non-GMO Project Verified' seal.")
    is_gluten_free: bool = Field(description="True if the product is labeled 'Gluten-Free' or certified gluten-free.")
    is_dairy_free: bool = Field(description="True if the product states 'Dairy-Free' or 'No Dairy'.")
    is_lactose_free: bool = Field(description="True if the product explicitly states 'Lactose-Free' or no lactose.")
    is_whole30_approved: bool = Field(description="True if the product is labeled as 'Whole30 Approved'.")
    has_no_added_sugar: bool = Field(description="True if the packaging says 'No Added Sugar' or 'Zero Sugar' or similar.")
    no_antibiotics: bool = Field(description="True if the label claims 'No Antibiotics' or similar language.")
    no_hormones: bool = Field(description="True if the product claims 'No Hormones' or 'Not treated with rBST' or similar language.")
    no_animal_byproducts: bool = Field(description="True if it states animals were not fed animal by-products.")
    usda_inspected: bool = Field(description="True if the USDA inspection seal is present on the packaging.")
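Each `Field(description=...)` is stored in the model's JSON Schema, which is a convenient way to review the instructions you have written before running an extraction. A minimal sketch using pydantic v2's `model_json_schema()` on a hypothetical three-field subset of `Product`; whether `agentic_doc` forwards exactly this schema to the service is not shown in this notebook.

```python
from pydantic import BaseModel, Field


class ProductSubset(BaseModel):
    """Hypothetical trimmed-down version of the Product schema above."""
    brand: str = Field(description="The brand or company name.")
    net_weight_oz: float = Field(description="The net weight of the product in ounces.")
    is_kosher: bool = Field(description="True if the label mentions 'Kosher'.")


schema = ProductSubset.model_json_schema()

# Print each field's JSON type next to its description.
for name, spec in schema["properties"].items():
    print(f"{name} ({spec['type']}): {spec['description']}")
```

Python types map to JSON types here (`str` to `string`, `float` to `number`, `bool` to `boolean`), and every field without a default is listed under `required`.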


## 🚀 Run Agentic Document Extraction with Schema

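Call `parse()` again, this time passing the `Product` schema as the `extraction_model` argument. Each returned document then also carries the extracted field values and the chunks they were taken from.

This call does not pass `result_save_dir` or `grounding_save_dir`, so nothing is written to the output folders; the log below contains no "Saved the parsed result" messages.

To learn more about parsing, visit [https://docs.landing.ai/ade/ade-parse-docs](https://docs.landing.ai/ade/ade-parse-docs).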

In [7]:
# Run ADE using the custom Product schema for structured field extraction
result_fe = parse(
    documents=file_paths,
    extraction_model=Product,  # This line is new
)

2025-07-24 16:41:54 [info] API key is valid. [agentic_doc.utils] (utils.py:44)
2025-07-24 16:41:54 [info] Parsing 6 documents [agentic_doc.parse] (parse.py:261)


Parsing documents:   0%|          | 0/6 [00:00<?, ?it/s]

HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:42:03 [info] Time taken to successfully parse a document chunk: 9.47 seconds [agentic_doc.parse] (parse.py:683)
HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:42:04 [info] Time taken to successfully parse a document chunk: 10.34 seconds [agentic_doc.parse] (parse.py:683)
HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:42:08 [info] Time taken to successfully parse a document chunk: 13.98 seconds [agentic_doc.parse] (parse.py:683)
HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:42:12

Parsing documents:  17%|█▋        | 1/6 [00:17<01:29, 17.90s/it]

HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:42:16 [info] Time taken to successfully parse a document chunk: 11.58 seconds [agentic_doc.parse] (parse.py:683)
HTTP Request: POST https://api.va.landing.ai/v1/tools/agentic-document-analysis "HTTP/1.1 200 OK" (_client.py:1025)
2025-07-24 16:42:25 [info] Time taken to successfully parse a document chunk: 21.95 seconds [agentic_doc.parse] (parse.py:683)


Parsing documents: 100%|██████████| 6/6 [00:31<00:00,  5.24s/it]


In [8]:
# View results
result_fe

[ParsedDocument(markdown='Summary : This is a product label for Alexandre Family Farm 100% Grass-Fed A2/A2 Organic Plain Yogurt, highlighting certifications and product features.\n\nphoto:\nScene Overview :\n  • The main subject is the front label of a yogurt container.\n  • The label is cream-colored with green text and graphics.\n  • A realistic illustration of a cow is present on the left side.\n  • The product is described as "100% Grass-Fed A2/A2 Organic Plain Yogurt" with "Extra Cream Top".\n  • The container size is specified as 24 oz (680 g).\n\nTechnical Details :\n  • Multiple certifications are visible: "Certified Humane", "Certified Grass-Fed Organic Dairy", "USDA Organic", "Regenerative Organic Certified".\n  • Additional text includes "Certified Regenerative", "Stir Well", "Non-Homogenized", and "Thank You".\n  • The label includes the brand name "Alexandre Family Farm" in large, stylized font.\n\nSpatial Relationships :\n  • The cow illustration is on the left, occupying

## 🔍 Explore Field Extraction Outputs

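Each element of `result_fe` is a `ParsedDocument`. Alongside the markdown and the chunks (with their grounding boxes), it holds the extracted fields in `extraction` and the chunk IDs each field was taken from in `extraction_metadata`.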

In [9]:
# Access one document from the results
doc = result_fe[2]  # Choose an index based on the available documents

In [10]:
# Extract various outputs
markdown_output = doc.markdown
chunk_output = doc.chunks
doc_type = doc.doc_type
result_path = str(doc.result_path)

# Print metadata
print("Document Type:", doc_type)
print("Result Path:", result_path)
print("Markdown Summary (first 100 chars):")
print(markdown_output[:100])

# Access and iterate through chunks
print(f"Total Chunks: {len(doc.chunks)}")

for i, chunk in enumerate(doc.chunks):
    print(f"\n--- Chunk {i+1} ---")
    print("Chunk ID:", chunk.chunk_id)
    print("Chunk Type:", chunk.chunk_type.value)  # e.g., 'text', 'figure', etc.
    print("Text (shortened):", chunk.text[:100].replace("\n", " "), "...")

    # Access grounding (box and image path)
    for grounding in chunk.grounding:
        box = grounding.box
        print("  Page:", grounding.page)
        print(f"  Box (l, t, r, b): ({box.l:.3f}, {box.t:.3f}, {box.r:.3f}, {box.b:.3f})")
        print("  Image Path:", str(grounding.image_path))


Document Type: image
Result Path: None
Markdown Summary (first 100 chars):
Summary : This image is a logo-style graphic promoting the concept of "Super Foods with Purpose," em
Total Chunks: 4

--- Chunk 1 ---
Chunk ID: d5604e6b-afd7-45e9-8a10-3220b67bf9c3
Chunk Type: figure
Text (shortened): Summary : This image is a logo-style graphic promoting the concept of "Super Foods with Purpose," em ...
  Page: 0
  Box (l, t, r, b): (0.116, 0.010, 0.255, 0.226)
  Image Path: None

--- Chunk 2 ---
Chunk ID: a484a55b-42cc-4db9-9004-5288b6ecc13e
Chunk Type: figure
Text (shortened): Summary : This is a product label illustration promoting a food item as suitable for topping and sna ...
  Page: 0
  Box (l, t, r, b): (0.567, 0.029, 0.732, 0.187)
  Image Path: None

--- Chunk 3 ---
Chunk ID: ea3adc5b-5fd0-4f80-b734-16c6e8436c0b
Chunk Type: text
Text (shortened): logo: NAVITAS ORGANICS ...
  Page: 0
  Box (l, t, r, b): (0.262, 0.214, 0.573, 0.368)
  Image Path: None

--- Chunk 4 ---
Chunk ID: b0070276-
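The boxes printed above have coordinates between 0 and 1, which suggests they are normalized to the page width and height. Under that assumption, the hypothetical helper below converts a box to integer pixel coordinates. The tuple it returns is in the `(left, upper, right, lower)` order that Pillow's `Image.crop()` expects, so it can be used to cut a chunk out of the original image.

```python
from typing import Tuple


def box_to_pixels(
    l: float, t: float, r: float, b: float, width: int, height: int
) -> Tuple[int, int, int, int]:
    """Scale a normalized (0-1) box to integer pixel coordinates.

    Assumes l/r are fractions of the page width and t/b fractions of its height.
    Results are clamped to the image bounds.
    """
    def clamp(value: float, upper: int) -> int:
        return max(0, min(upper, round(value * upper)))

    return (clamp(l, width), clamp(t, height), clamp(r, width), clamp(b, height))
```

With Pillow, after `img = Image.open(path)`, the crop would be `img.crop(box_to_pixels(box.l, box.t, box.r, box.b, *img.size))`, since `img.size` is `(width, height)`.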

In [11]:
# print the field extractions
doc.extraction

Product(product_name='Organic Goji Berries', brand='NAVITAS ORGANICS', net_weight_oz=8.0, net_weight_g=227.0, servings_per_container=0, serving_size='', product_type='superfood', flavor='', is_grass_fed=False, is_organic=True, is_keto_friendly=False, is_paleo_friendly=False, is_kosher=False, is_regenerative=False, is_certified_humane=False, is_animal_welfare_certified=False, is_pasture_raised=False, is_non_gmo=True, is_gluten_free=False, is_dairy_free=False, is_lactose_free=False, is_whole30_approved=False, has_no_added_sugar=False, no_antibiotics=False, no_hormones=False, no_animal_byproducts=False, usda_inspected=False)
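In the extraction above, fields that were not found on the label come back as `0` (`servings_per_container`) or `''` (`serving_size`, `flavor`) rather than as empty values, so a missing value cannot be told apart from a genuine zero. A hypothetical post-processing step, sketched below, maps those sentinels to `None` for the fields you choose. Which fields to treat this way is your call, and the sketch assumes `0` and `''` never occur legitimately for them.

```python
from typing import Any, Dict, Iterable


def sentinels_to_none(record: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `record` with 0, 0.0, or '' replaced by None in `fields`."""
    cleaned = dict(record)
    for name in fields:
        value = cleaned.get(name)
        # bool is a subclass of int (False == 0), so leave booleans untouched.
        if isinstance(value, bool):
            continue
        if value == 0 or value == "":
            cleaned[name] = None
    return cleaned
```

Applied to this notebook, that could look like `sentinels_to_none(doc.extraction.model_dump(), ["net_weight_g", "servings_per_container", "serving_size", "flavor"])`, assuming pydantic v2's `model_dump()`.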

In [12]:
# print the metadata for the field extractions
doc.extraction_metadata

ProductMetadata(product_name={'chunk_references': ['b0070276-1ffa-41c9-9c19-589be655e6a5']}, brand={'chunk_references': ['ea3adc5b-5fd0-4f80-b734-16c6e8436c0b']}, net_weight_oz={'chunk_references': ['b0070276-1ffa-41c9-9c19-589be655e6a5']}, net_weight_g={'chunk_references': ['b0070276-1ffa-41c9-9c19-589be655e6a5']}, servings_per_container={'chunk_references': []}, serving_size={'chunk_references': []}, product_type={'chunk_references': ['b0070276-1ffa-41c9-9c19-589be655e6a5']}, flavor={'chunk_references': []}, is_grass_fed={'chunk_references': []}, is_organic={'chunk_references': ['b0070276-1ffa-41c9-9c19-589be655e6a5']}, is_keto_friendly={'chunk_references': []}, is_paleo_friendly={'chunk_references': []}, is_kosher={'chunk_references': []}, is_regenerative={'chunk_references': []}, is_certified_humane={'chunk_references': []}, is_animal_welfare_certified={'chunk_references': []}, is_pasture_raised={'chunk_references': []}, is_non_gmo={'chunk_references': ['b0070276-1ffa-41c9-9c19-589

In [13]:
# print the extracted product name 
doc.extraction.product_name

'Organic Goji Berries'

In [14]:
# print the chunk from which the product name is extracted
# note that there can be more than one, so this is returned as a list
doc.extraction_metadata.product_name['chunk_references']

['b0070276-1ffa-41c9-9c19-589be655e6a5']

In [15]:
# print the page number and bounding box location for the chunk
target_id = 'b0070276-1ffa-41c9-9c19-589be655e6a5'  # Update this value based on the response above

# Search through chunks to find the one with the matching ID
for chunk in doc.chunks:
    if chunk.chunk_id == target_id:
        print("Chunk type:", chunk.chunk_type)
        print("Chunk text:", chunk.text)
        for grounding in chunk.grounding:
            box = grounding.box
            print("Page:", grounding.page)
            print("Box Coordinates:")
            print(f"  Left (l):   {box.l}")
            print(f"  Top (t):    {box.t}")
            print(f"  Right (r):  {box.r}")
            print(f"  Bottom (b): {box.b}")
        break
else:
    print("Chunk ID not found.")

Chunk type: ChunkType.figure
Chunk text: Summary : This is a product package image for organic goji berries, highlighting key features and certifications.

photo:
Scene Overview :
  • The main subject is a package of organic goji berries.
  • The package is predominantly purple and pink with white and red accents.
  • A white spoon filled with dried goji berries is prominently displayed, along with an illustration of fresh goji berries and green leaves.
  • The text "ORGANIC GOJI BERRIES" is large and central.
  • The phrase "PLANT-BASED SUPERFOOD" appears in a pink box at the top right.

Technical Details :
  • The package lists several product features:
    – "HIGH IN ANTIOXIDANTS (VIT. A)"
    – "MILDLY SWEET & TANGY"
    – "UNSULFURED & GENTLY DRIED"
  • Certifications and labels visible:
    – "USDA ORGANIC" seal (bottom left)
    – "NON GMO" icon (bottom left)
  • Net weight is specified as "NET WT 8 OZ (227g)" at the bottom.
  • The product is described as "PLANT-BASED SUPERFOOD
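The loop above scans every chunk for each lookup. When you resolve many `chunk_references`, it is simpler to build a dictionary keyed by `chunk_id` once. The sketch uses a hypothetical `Chunk` dataclass as a stand-in for `agentic_doc`'s chunk objects (only the `chunk_id` and `text` attributes used above are assumed); with real results you would pass `doc.chunks`.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Chunk:
    """Hypothetical stand-in exposing the attributes used in the cells above."""
    chunk_id: str
    text: str


def index_chunks(chunks: Sequence[Chunk]) -> Dict[str, Chunk]:
    """Build a chunk_id -> chunk lookup table."""
    return {chunk.chunk_id: chunk for chunk in chunks}


def resolve_references(index: Dict[str, Chunk], refs: Sequence[str]) -> List[Chunk]:
    """Return the chunks for the given IDs in order, skipping unknown IDs."""
    return [index[ref] for ref in refs if ref in index]
```

For example: `index = index_chunks(doc.chunks)`, then `resolve_references(index, doc.extraction_metadata.product_name['chunk_references'])`.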

In [16]:
# print specific fields and the associated metadata
print(f"The product name is: {doc.extraction.product_name}. This is extracted from chunk {doc.extraction_metadata.product_name['chunk_references']}")
print(f"The product claims to be non-GMO: {doc.extraction.is_non_gmo}. This is extracted from chunk {doc.extraction_metadata.is_non_gmo['chunk_references']}")
print(f"The product claims to be gluten free: {doc.extraction.is_gluten_free}. This is extracted from chunk {doc.extraction_metadata.is_gluten_free['chunk_references']}")

The product name is: Organic Goji Berries. This is extracted from chunk ['b0070276-1ffa-41c9-9c19-589be655e6a5']
The product claims to be non-GMO: True. This is extracted from chunk ['b0070276-1ffa-41c9-9c19-589be655e6a5']
The product claims to be gluten free: False. This is extracted from chunk []
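Rather than writing one `print` per field, you can loop over the schema's fields. A minimal sketch, assuming pydantic v2 (`model_fields` on the model class) and that each attribute of the metadata object is a dict with a `'chunk_references'` list, as in the output above. In the example, the hypothetical `Mini` model stands in for `Product` and a `SimpleNamespace` stands in for `doc.extraction_metadata`.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple

from pydantic import BaseModel


def field_report(extraction: BaseModel, metadata: Any) -> List[Tuple[str, Any, List[str]]]:
    """Return (field name, extracted value, chunk references) for every schema field."""
    rows = []
    for name in type(extraction).model_fields:
        meta_entry = getattr(metadata, name, None) or {}
        rows.append((name, getattr(extraction, name), meta_entry.get("chunk_references", [])))
    return rows


class Mini(BaseModel):
    """Hypothetical two-field schema used only for this example."""
    product_name: str
    is_gluten_free: bool


example = Mini(product_name="Organic Goji Berries", is_gluten_free=False)
example_meta = SimpleNamespace(
    product_name={"chunk_references": ["b0070276-1ffa-41c9-9c19-589be655e6a5"]},
    is_gluten_free={"chunk_references": []},
)
for name, value, refs in field_report(example, example_meta):
    print(f"{name} = {value!r} (chunks: {refs})")
```

With real results, the call would be `field_report(doc.extraction, doc.extraction_metadata)`.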


## 💾 Convert to Table and Save

Convert the field extractions to a pandas DataFrame, then save it as a CSV file in the results folder created earlier.

In [17]:
import pandas as pd

# result_fe is the list of ParsedDocument objects returned by parse() above

# Build one record per document.
# Note: only three fields (product_name, net_weight_oz, usda_inspected)
# include their chunk_references in this example.
records = []
for doc in result_fe:
    product = doc.extraction
    meta = doc.extraction_metadata
    product_dict = {
        "product_name": product.product_name,
        "product_name_ref": meta.product_name['chunk_references'],
        "brand": product.brand,
        "net_weight_oz": product.net_weight_oz,
        "net_weight_oz_ref": meta.net_weight_oz['chunk_references'],
        "net_weight_g": product.net_weight_g,
        "servings_per_container": product.servings_per_container,
        "serving_size": product.serving_size,
        "product_type": product.product_type,
        "flavor": product.flavor,
        "is_grass_fed": product.is_grass_fed,
        "is_organic": product.is_organic,
        "is_keto_friendly": product.is_keto_friendly,
        "is_paleo_friendly": product.is_paleo_friendly,
        "is_kosher": product.is_kosher,
        "is_regenerative": product.is_regenerative,
        "is_certified_humane": product.is_certified_humane,
        "is_animal_welfare_certified": product.is_animal_welfare_certified,
        "is_pasture_raised": product.is_pasture_raised,
        "is_non_gmo": product.is_non_gmo,
        "is_gluten_free": product.is_gluten_free,
        "is_dairy_free": product.is_dairy_free,
        "is_lactose_free": product.is_lactose_free,
        "is_whole30_approved": product.is_whole30_approved,
        "has_no_added_sugar": product.has_no_added_sugar,
        "no_antibiotics": product.no_antibiotics,
        "no_hormones": product.no_hormones,
        "no_animal_byproducts": product.no_animal_byproducts,
        "usda_inspected": product.usda_inspected,
        "usda_inspected_ref": meta.usda_inspected['chunk_references'],
    }
    records.append(product_dict)

# Create DataFrame
df = pd.DataFrame(records)
df


| | product_name | product_name_ref | brand | net_weight_oz | net_weight_oz_ref | net_weight_g | servings_per_container | serving_size | product_type | flavor | ... | is_gluten_free | is_dairy_free | is_lactose_free | is_whole30_approved | has_no_added_sugar | no_antibiotics | no_hormones | no_animal_byproducts | usda_inspected | usda_inspected_ref |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100% Grass-Fed A2/A2 Organic Plain Yogurt | [2c27ec64-98e9-4743-b4e7-b1b8a4d44d03] | Alexandre Family Farm | 24.0 | [2c27ec64-98e9-4743-b4e7-b1b8a4d44d03] | 680.0 | 0 | | yogurt | plain | ... | False | False | False | False | False | False | False | False | False | [] |
| 1 | Ancestral Blend Ground Beef, Beef Liver, Beef ... | [58deba68-7421-4a9c-a6b3-b604b33d2aa0, 2bea927... | Force of Nature | 16.0 | [4093b83c-ad18-47fa-bb20-69a9ce1aa9c4] | 0.0 | 0 | | ground beef | | ... | False | False | False | False | False | False | False | False | True | [4093b83c-ad18-47fa-bb20-69a9ce1aa9c4] |
| 2 | Organic Goji Berries | [b0070276-1ffa-41c9-9c19-589be655e6a5] | NAVITAS ORGANICS | 8.0 | [b0070276-1ffa-41c9-9c19-589be655e6a5] | 227.0 | 0 | | superfood | | ... | False | False | False | False | False | False | False | False | False | [] |
| 3 | COLLAGEN CREAMER | [74e35e41-4753-499e-9a89-bc1396e2570e] | Garden of Life | 11.64 | [74e35e41-4753-499e-9a89-bc1396e2570e] | 330.0 | 12 | | supplement | creamy vanilla | ... | False | True | False | False | False | False | False | False | False | [] |
| 4 | Uncured Beef Hot Dogs | [79d50a02-f930-4e16-8197-4b1f4a9b41a6] | Teton Waters Ranch | 8.0 | [0ba0ae06-e3e8-45e6-bff9-84b7727d2dad] | 0.0 | 5 | 1 link (45g) | hot dogs | | ... | True | False | False | True | True | True | True | True | True | [6a9f2c91-1556-448b-8cec-e15137fac703] |
| 5 | ORIGINAL BEEF MINI STICKS | [044859ec-e1a6-44d4-9868-57b32668f1d4] | CHOMPS | 3.0 | [044859ec-e1a6-44d4-9868-57b32668f1d4] | 84.0 | 0 | 1 stick (0.5 OZ) | beef sticks | mild | ... | True | False | False | True | True | False | False | False | True | [044859ec-e1a6-44d4-9868-57b32668f1d4] |
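The record-building cell lists every field by hand. Assuming pydantic v2, `model_dump()` returns all field values in schema order, and `*_ref` columns can then be added for whichever fields you want to trace. The hypothetical `build_records` below is a more compact alternative; note that it appends the reference columns at the end rather than next to each field, so the column order differs from the table above. `SimpleNamespace` objects and the hypothetical `MiniProduct` model stand in for `ParsedDocument` and `Product`.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence

from pydantic import BaseModel


def build_records(docs: Sequence[Any], ref_fields: Sequence[str]) -> List[Dict[str, Any]]:
    """One row per document: all extracted fields plus `<field>_ref` columns."""
    records = []
    for doc in docs:
        row = doc.extraction.model_dump()
        for name in ref_fields:
            row[f"{name}_ref"] = getattr(doc.extraction_metadata, name)["chunk_references"]
        records.append(row)
    return records


class MiniProduct(BaseModel):
    """Hypothetical two-field schema standing in for Product."""
    product_name: str
    usda_inspected: bool


docs = [
    SimpleNamespace(
        extraction=MiniProduct(product_name="ORIGINAL BEEF MINI STICKS", usda_inspected=True),
        extraction_metadata=SimpleNamespace(
            product_name={"chunk_references": ["044859ec-e1a6-44d4-9868-57b32668f1d4"]},
            usda_inspected={"chunk_references": ["044859ec-e1a6-44d4-9868-57b32668f1d4"]},
        ),
    )
]
print(build_records(docs, ["product_name", "usda_inspected"]))
```

With the real results, this would be `pd.DataFrame(build_records(result_fe, ["product_name", "net_weight_oz", "usda_inspected"]))`.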


In [18]:
# Save the DataFrame to a CSV file inside the results_folder
csv_path = results_folder / "output.csv"
df.to_csv(csv_path, index=False)
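`to_csv()` writes the `*_ref` list columns as Python list literals with single quotes, which are not valid JSON and read back from the CSV as plain strings. One option, sketched here as a hypothetical helper, is to JSON-encode list values before building the DataFrame so they can be restored later with `json.loads`.

```python
import json
from typing import Any, Dict, List


def json_encode_lists(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of `records` with every list value encoded as a JSON string."""
    return [
        {key: json.dumps(value) if isinstance(value, list) else value for key, value in row.items()}
        for row in records
    ]
```

In this notebook that would be `df = pd.DataFrame(json_encode_lists(records))` before saving, and after `pd.read_csv(csv_path)` a column such as `product_name_ref` can be decoded with `.map(json.loads)`.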

## ✅ Wrap-Up

You’ve now used LandingAI’s ADE to:
- Parse and extract data from images or PDFs
- Define custom fields using `pydantic`
- Export structured results to a table and save them as a CSV file

To learn more, visit the [LandingAI Documentation](https://docs.landing.ai/ade/ade-overview).