# Extract structured data from images

Use AI vision to extract JSON data from receipts, forms, documents, and other images.


## Problem

You have images containing structured information (receipts, forms, ID cards) and need to extract specific fields as JSON for downstream processing.

| Image | Fields to extract |
|-------|------------------|
| receipt.jpg | vendor, total, date, items |
| business_card.jpg | name, email, phone, company |
| invoice.pdf | invoice_number, amount, due_date |


## Solution

**What's in this recipe:**
- Extract structured JSON from images using an OpenAI vision model (`gpt-4o-mini` in this example)
- Define a schema with Pydantic for validation
- Access individual fields from the extracted data

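You define a Pydantic model for the expected output, then pass its JSON Schema to OpenAI's chat completions API together with the image. The model is asked to return JSON that follows the schema. The response arrives as a JSON string, which you can validate against the same Pydantic model.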


### Setup


In [None]:
%pip install -qU pixeltable openai pydantic


In [None]:
import pixeltable as pxt
from pixeltable.functions import openai
from pydantic import BaseModel


### Define the output schema

Create a Pydantic model that describes the data you want to extract:


In [None]:
# Define the structure you want to extract
class ImageAnalysis(BaseModel):
    description: str
    objects: list[str]
    dominant_colors: list[str]
    scene_type: str
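
To see what the model will be asked to follow, it can help to print the JSON Schema that Pydantic derives from the class. This is a minimal sketch, assuming Pydantic v2 (where `model_json_schema()` is available). The class is repeated so the snippet runs on its own.

```python
import json

from pydantic import BaseModel


class ImageAnalysis(BaseModel):
    description: str
    objects: list[str]
    dominant_colors: list[str]
    scene_type: str


# Pydantic v2 builds a JSON Schema dict from the type annotations
schema = ImageAnalysis.model_json_schema()
print(json.dumps(schema, indent=2))

# Fields declared without a default are listed under "required"
print(schema['required'])
```
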


### Load images


In [None]:
# Create a fresh directory
pxt.drop_dir('extraction_demo', force=True)
pxt.create_dir('extraction_demo')


In [None]:
t = pxt.create_table('extraction_demo.images', {'image': pxt.Image})


In [None]:
# Insert sample images
t.insert([
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000036.jpg'},
    {'image': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000090.jpg'},
])


### Extract structured data

Use `gpt-4o-mini` to extract data according to your schema:


In [None]:
# Add extraction column with structured output
t.add_computed_column(
    analysis=openai.chat_completions(
        messages=[
            {
                'role': 'user',
                'content': [
                    {'type': 'text', 'text': 'Analyze this image and extract the requested information.'},
                    {'type': 'image_url', 'image_url': {'url': t.image}}
                ]
            }
        ],
        model='gpt-4o-mini',
        response_format={'type': 'json_schema', 'json_schema': {
            'name': 'image_analysis',
            'schema': ImageAnalysis.model_json_schema()
        }}
    )
)
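
The request above sends the schema without a `strict` flag. OpenAI's structured outputs feature also has a strict mode, which expects `'strict': True` in the `json_schema` entry and `additionalProperties: false` on the schema's objects. The sketch below builds such a dictionary. `build_response_format` is a hypothetical helper, not part of Pixeltable or OpenAI. Whether Pixeltable forwards the extra key unchanged is an assumption to verify before relying on it.

```python
import copy

from pydantic import BaseModel


class ImageAnalysis(BaseModel):
    description: str
    objects: list[str]
    dominant_colors: list[str]
    scene_type: str


def build_response_format(model: type[BaseModel], name: str) -> dict:
    """Hypothetical helper: wrap a flat Pydantic model's schema for strict mode."""
    # Work on a copy so the schema returned by Pydantic is never mutated
    schema = copy.deepcopy(model.model_json_schema())
    # Strict mode expects the object to reject keys that are not in the schema
    schema['additionalProperties'] = False
    return {
        'type': 'json_schema',
        'json_schema': {'name': name, 'schema': schema, 'strict': True},
    }


response_format = build_response_format(ImageAnalysis, 'image_analysis')
print(response_format['json_schema']['strict'])
```

This helper only patches the top-level object. A model with nested sub-models would need the same change on each nested object definition.
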


In [None]:
# Extract the JSON content from the response
t.add_computed_column(data=t.analysis.choices[0].message.content)


In [None]:
# View extracted data
t.select(t.image, t.data).collect()


## Explanation

**Why use Pydantic models:**

- Type validation catches responses with missing fields or wrong types
- Clear documentation of expected fields
- Easy to update schema as requirements change
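
As an illustration of the validation point, here is a minimal sketch that checks two JSON strings against the model, assuming Pydantic v2. Both strings are hand-written, hypothetical examples, not real model output.

```python
from pydantic import BaseModel, ValidationError


class ImageAnalysis(BaseModel):
    description: str
    objects: list[str]
    dominant_colors: list[str]
    scene_type: str


# Hypothetical responses, written by hand for illustration
good = (
    '{"description": "A dog on a beach", "objects": ["dog", "sand"], '
    '"dominant_colors": ["tan", "blue"], "scene_type": "outdoor"}'
)
bad = '{"description": "A dog on a beach", "objects": "dog"}'

# A well-formed response parses into a typed object
parsed = ImageAnalysis.model_validate_json(good)
print(parsed.objects)

# A malformed response raises ValidationError with one entry per problem
errors = []
try:
    ImageAnalysis.model_validate_json(bad)
except ValidationError as exc:
    errors = exc.errors()
    for err in errors:
        print(err['loc'], err['msg'])
```
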

**Accessing nested fields:**

The `data` column holds the extracted JSON as a string. Parse it to reach individual fields, or add further computed columns for the fields you need.
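
Outside of Pixeltable, parsing one value takes only the standard library. In this minimal sketch, `raw` is a hand-written, hypothetical stand-in for the `data` value of a single row.

```python
import json

# Hypothetical value of the `data` column for one row, written by hand
raw = (
    '{"description": "A street at night", "objects": ["car", "lamp"], '
    '"dominant_colors": ["black", "yellow"], "scene_type": "urban"}'
)

# Convert the JSON string into a plain Python dict
record = json.loads(raw)

# Pull out individual fields, guarding against an empty list
scene = record['scene_type']
first_object = record['objects'][0] if record['objects'] else None
print(scene, first_object)
```
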

**Other extraction use cases:**

| Use Case | Schema Fields |
|----------|---------------|
| Receipts | vendor, total, date, items, tax |
| Business cards | name, title, company, email, phone |
| Product photos | brand, model, condition, defects |
| Documents | title, date, author, summary |
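
As one possible starting point for the receipt row above, the sketch below defines a nested schema. The field names come from the table. The types, the `LineItem` class, and the choice to make `tax` optional are assumptions for illustration.

```python
from typing import Optional

from pydantic import BaseModel


class LineItem(BaseModel):
    # Hypothetical shape of one purchased item
    name: str
    price: float


class Receipt(BaseModel):
    vendor: str
    total: float
    date: str  # kept as a string, since date formats on receipts vary
    items: list[LineItem]
    tax: Optional[float] = None  # assumed optional: not every receipt lists tax


# Nested models appear under "$defs" in the generated schema
schema = Receipt.model_json_schema()
print(sorted(schema['properties']))
print(schema['required'])
```
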


## See also

- [Analyze images in batch](./vision-batch-analysis.ipynb)
- [Configure API keys](./workflow-api-keys.ipynb)
