# pandas_ext Basics

This notebook walks through the core `pandas_ext` workflow in order: the input, the two kinds of output, and the practical benefits.


## 1. Setup

Check that credentials are available as environment variables (OpenAI or Azure OpenAI), then set the default model used by `.ai.responses()`.


In [1]:
import os

import pandas as pd
from pydantic import BaseModel, Field

from openaivec import pandas_ext

assert os.getenv("OPENAI_API_KEY") or os.getenv("AZURE_OPENAI_BASE_URL"), (
    "Set OPENAI_API_KEY or Azure OpenAI environment variables before running this notebook."
)

pandas_ext.set_responses_model("gpt-5.2")


## 2. Input: pandas DataFrame

The main input is an ordinary pandas DataFrame or Series. The examples below call the `.ai` accessor on a single column (`fruits_df["name"]`).


In [2]:
fruits_df = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "orange"],
    "price_usd": [1.2, 0.8, 2.5, 1.1],
})

fruits_df


|   | name | price_usd |
|---|---|---|
| 0 | apple | 1.2 |
| 1 | banana | 0.8 |
| 2 | cherry | 2.5 |
| 3 | orange | 1.1 |


## 3. Output A: plain-text responses

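Call `.ai.responses()` on a Series with an instruction prompt to get one plain-text output per element. The result is a Series with the same index as the input, so `assign` attaches it row by row.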


In [3]:
name_fr = fruits_df["name"].ai.responses(
    "Translate this fruit name to French.",
    batch_size=16,
    show_progress=True,
)

fruits_df.assign(name_fr=name_fr)


|   | name | price_usd | name_fr |
|---|---|---|---|
| 0 | apple | 1.2 | pomme |
| 1 | banana | 0.8 | banane |
| 2 | cherry | 2.5 | cerise |
| 3 | orange | 1.1 | orange |
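
The sketch below imitates, with plain pandas, the two things this cell relies on: rows are processed in groups of at most `batch_size`, and the outputs come back as a Series on the input's index. It assumes `batch_size` means "rows per request"; `fake_translate_batch` and `batched_map` are hypothetical helpers standing in for the model call, not the `pandas_ext` implementation.

```python
from collections.abc import Callable

import pandas as pd


def fake_translate_batch(values: list[str]) -> list[str]:
    """Hypothetical stand-in for one model request: one output per input, same order."""
    lookup = {"apple": "pomme", "banana": "banane", "cherry": "cerise", "orange": "orange"}
    return [lookup.get(v, v) for v in values]


def batched_map(
    series: pd.Series,
    batch_fn: Callable[[list[str]], list[str]],
    batch_size: int = 16,
) -> pd.Series:
    """Apply batch_fn to chunks of at most batch_size values, keeping the original index."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    values = series.tolist()
    outputs: list[str] = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        result = batch_fn(chunk)
        if len(result) != len(chunk):
            raise ValueError("batch_fn must return exactly one output per input")
        outputs.extend(result)
    # Reusing the input index is what lets assign()/join() line rows up correctly.
    return pd.Series(outputs, index=series.index, name=series.name)


fruits_df = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "orange"],
    "price_usd": [1.2, 0.8, 2.5, 1.1],
})

# Filter first so the index is no longer 0..n-1; alignment still holds.
cheap = fruits_df[fruits_df["price_usd"] < 2.0]
name_fr = batched_map(cheap["name"], fake_translate_batch, batch_size=2)
print(cheap.assign(name_fr=name_fr))
```

With `batch_size=2`, the three cheap fruits are processed as two chunks, yet each translation still lands on its own row because the index, not the position, decides alignment.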


## 4. Output B: structured responses

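Pass a Pydantic model as `response_format` when you want typed, structured output. Each element of the result is then an instance of that model instead of a string.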


In [4]:
class FruitFacts(BaseModel):
    family: str = Field(description="Botanical family name")
    color: str = Field(description="Typical fruit color")
    short_note: str = Field(description="Short one-line note")


facts = fruits_df["name"].ai.responses(
    "Return botanical family, typical color, and a short note for this fruit.",
    response_format=FruitFacts,
    batch_size=16,
    show_progress=True,
)

facts.head()


0    family='Rosaceae' color='Red, green, or yellow...
1    family='Musaceae' color='Yellow (ripe), green ...
2    family='Rosaceae' color='Red to deep purple' s...
3    family='Rutaceae' color='Orange' short_note='A...
Name: name, dtype: object
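
Because each element is a `FruitFacts` instance, fields are read as attributes (for example `facts.iloc[0].family`). The sketch below uses Pydantic alone to show the kind of check a schema gives you: a well-formed JSON payload becomes a typed object, and one with a missing field is rejected. The JSON strings are made-up examples, not model output, and `parse_facts` is a hypothetical helper; this says nothing about how `pandas_ext` parses responses internally.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class FruitFacts(BaseModel):
    family: str = Field(description="Botanical family name")
    color: str = Field(description="Typical fruit color")
    short_note: str = Field(description="Short one-line note")


def parse_facts(raw: str) -> Optional[FruitFacts]:
    """Validate a JSON object string against the schema; return None if it does not fit."""
    try:
        return FruitFacts(**json.loads(raw))
    except ValidationError as err:
        print(f"rejected: {len(err.errors())} error(s)")
        return None


good = '{"family": "Rosaceae", "color": "Red", "short_note": "A pome fruit."}'
bad = '{"family": "Rosaceae", "color": "Red"}'  # short_note is missing

parsed = parse_facts(good)
rejected = parse_facts(bad)  # prints: rejected: 1 error(s)
print(parsed.family, rejected)
```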

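## 5. Expand structured output into columns

`.ai.extract()` turns each model field into its own column, named `<series name>_<field>`; renaming the Series to `facts` produces the `facts_` prefix. The expanded frame keeps the original index, so `join` attaches it row by row.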

In [5]:
facts_df = facts.rename("facts").ai.extract()
fruits_df.join(facts_df)


|   | name | price_usd | facts_family | facts_color | facts_short_note |
|---|---|---|---|---|---|
| 0 | apple | 1.2 | Rosaceae | Red, green, or yellow | A pome fruit from Malus domestica, widely eate... |
| 1 | banana | 0.8 | Musaceae | Yellow (ripe), green (unripe) | An elongated berry from Musa species; typicall... |
| 2 | cherry | 2.5 | Rosaceae | Red to deep purple | A stone fruit (drupe) from Prunus species, kno... |
| 3 | orange | 1.1 | Rutaceae | Orange | A citrus hesperidium (Citrus × sinensis), priz... |
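
To see what the expansion amounts to, here is a rough pandas-only equivalent that reproduces the `facts_family`-style column names shown above. It assumes the Series holds Pydantic model instances; `expand_models` is a hypothetical helper, not part of `pandas_ext`, and it makes no attempt to cover edge cases such as missing rows.

```python
import pandas as pd
from pydantic import BaseModel


class FruitFacts(BaseModel):
    family: str
    color: str
    short_note: str


def expand_models(series: pd.Series) -> pd.DataFrame:
    """Turn a Series of Pydantic models into one column per field, prefixed by the Series name."""
    # dict(model) yields {field_name: value} in both Pydantic v1 and v2.
    rows = [dict(model) for model in series]
    expanded = pd.DataFrame(rows, index=series.index)
    return expanded.add_prefix(f"{series.name}_")


fruits_df = pd.DataFrame({"name": ["apple", "orange"], "price_usd": [1.2, 1.1]})
facts = pd.Series(
    [
        FruitFacts(family="Rosaceae", color="Red", short_note="A pome fruit."),
        FruitFacts(family="Rutaceae", color="Orange", short_note="A citrus fruit."),
    ],
    index=fruits_df.index,
    name="facts",
)

# join() aligns on the index, so each row of facts lands next to its fruit.
result = fruits_df.join(expand_models(facts))
print(result)
```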


## 6. Benefits

**Main input**
- A pandas `Series` or `DataFrame`
- One prompt, optionally with a `response_format` schema

**Main outputs**
- Plain-text outputs (`.ai.responses()` without `response_format`)
- Structured, typed outputs (`.ai.responses(..., response_format=Model)`)
- One column per model field via `.ai.extract()`

**Why this is useful**
- Keeps LLM processing inside familiar pandas workflows
- Uses Pydantic models to define and validate structured outputs
- Exposes `batch_size` and `show_progress` for processing larger datasets in batches
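
As a rough sizing aid, and assuming `batch_size` is the maximum number of rows sent per request (ignoring any deduplication or caching the library may do), the request count is the row count divided by `batch_size`, rounded up. The four-row example above fits in one batch of 16. `estimated_requests` is a hypothetical helper that only does this arithmetic.

```python
def estimated_requests(n_rows: int, batch_size: int) -> int:
    """Upper bound on requests if each request carries at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Integer ceiling division avoids float rounding for large row counts.
    return (n_rows + batch_size - 1) // batch_size


print(estimated_requests(4, 16))       # the fruit example: 1
print(estimated_requests(10_000, 16))  # 625
print(estimated_requests(10_000, 64))  # 157
```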
