# Intelligent Fill with the ai.fillna Method

This notebook walks through `ai.fillna` in order: the input DataFrame, the filled output, and the practical benefits.


## 1. Setup

Set `OPENAI_API_KEY` (or the Azure OpenAI environment variables) for authentication, then choose the responses model.


In [1]:
import os

import pandas as pd

from openaivec import pandas_ext

assert os.getenv("OPENAI_API_KEY") or os.getenv("AZURE_OPENAI_BASE_URL"), (
    "Set OPENAI_API_KEY or Azure OpenAI environment variables before running this notebook."
)

pandas_ext.set_responses_model("gpt-4.1-mini")


## 2. Input: DataFrame and target column

The input is a normal DataFrame with missing values in one target column.


In [2]:
products = pd.DataFrame({
    "product": ["Wireless Earbuds", "Yoga Mat", "Desk Lamp", "Water Bottle", "Running Shoes"],
    "category": ["Electronics", "Fitness", "Home", "Fitness", "Footwear"],
    "price_usd": [129.0, 35.0, 48.0, 20.0, 95.0],
    "marketing_copy": [
        "Immersive sound with all-day battery life",
        None,
        "Warm light for focused evening work",
        None,
        None,
    ],
})

products


| | product | category | price_usd | marketing_copy |
|---|---|---|---|---|
| 0 | Wireless Earbuds | Electronics | 129.0 | Immersive sound with all-day battery life |
| 1 | Yoga Mat | Fitness | 35.0 | None |
| 2 | Desk Lamp | Home | 48.0 | Warm light for focused evening work |
| 3 | Water Bottle | Fitness | 20.0 | None |
| 4 | Running Shoes | Footwear | 95.0 | None |
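Before calling the fill, it can help to confirm which rows are actually missing a value. The following is a minimal sketch that uses only plain pandas on a reduced copy of the table above; the variable names `missing_mask`, `missing_count`, and `rows_to_fill` are illustrative and not part of openaivec.

```python
import pandas as pd

# Reduced copy of the example table: only the columns needed for the check.
products = pd.DataFrame({
    "product": ["Wireless Earbuds", "Yoga Mat", "Desk Lamp", "Water Bottle", "Running Shoes"],
    "marketing_copy": [
        "Immersive sound with all-day battery life",
        None,
        "Warm light for focused evening work",
        None,
        None,
    ],
})

# Boolean mask that is True where the target column has no value.
missing_mask = products["marketing_copy"].isna()

# How many rows need filling, and which products they belong to.
missing_count = int(missing_mask.sum())
rows_to_fill = products.loc[missing_mask, "product"].tolist()

print(missing_count, rows_to_fill)
```

Knowing the number of missing rows up front also gives a rough idea of how much work the fill step will do.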


## 3. Run intelligent fill

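Call `.ai.fillna()` with the name of the target column. The method fills only the missing values in that column and leaves existing values unchanged. `batch_size` controls batching, and `show_progress=True` displays a progress bar while the fill runs.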


In [3]:
filled_products = products.ai.fillna(
    target_column_name="marketing_copy",
    batch_size=16,
    show_progress=True,
)

filled_products


| | product | category | price_usd | marketing_copy |
|---|---|---|---|---|
| 0 | Wireless Earbuds | Electronics | 129.0 | Immersive sound with all-day battery life |
| 1 | Yoga Mat | Fitness | 35.0 | Non-slip surface for safe and comfortable work... |
| 2 | Desk Lamp | Home | 48.0 | Warm light for focused evening work |
| 3 | Water Bottle | Fitness | 20.0 | Durable and lightweight bottle to stay hydrate... |
| 4 | Running Shoes | Footwear | 95.0 | Lightweight and supportive design for optimal ... |
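The "fill only missing values" contract can be pictured with an offline stand-in. The sketch below is not how openaivec implements `ai.fillna`; `fill_missing_only` and `placeholder_copy` are hypothetical helpers, and the placeholder generator simply formats the other columns instead of calling a model. It only illustrates that rows with an existing value are left alone while missing rows receive a value derived from the rest of the row.

```python
from typing import Callable

import pandas as pd


def fill_missing_only(
    df: pd.DataFrame, column: str, generate: Callable[[dict], str]
) -> pd.DataFrame:
    """Return a copy of `df` where only missing values in `column` are replaced."""
    result = df.copy()
    missing = result[column].isna()
    if missing.any():
        # Row context: every column except the one being filled.
        context_rows = result.loc[missing].drop(columns=[column]).to_dict(orient="records")
        result.loc[missing, column] = [generate(row) for row in context_rows]
    return result


def placeholder_copy(row: dict) -> str:
    """Hypothetical stand-in for a model call; formats the row context."""
    return f"{row['product']} ({row['category']})"


demo = pd.DataFrame({
    "product": ["Desk Lamp", "Yoga Mat"],
    "category": ["Home", "Fitness"],
    "marketing_copy": ["Warm light for focused evening work", None],
})

filled_demo = fill_missing_only(demo, "marketing_copy", placeholder_copy)
print(filled_demo)
```

Working on a copy keeps the original frame intact, which matches the notebook's pattern of assigning the result to a new variable.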


## 4. Output: before and after

The output keeps the same schema and index, with missing target values completed.


In [4]:
summary = products[["product", "marketing_copy"]].rename(columns={"marketing_copy": "before"}).assign(
    after=filled_products["marketing_copy"]
)
summary


| | product | before | after |
|---|---|---|---|
| 0 | Wireless Earbuds | Immersive sound with all-day battery life | Immersive sound with all-day battery life |
| 1 | Yoga Mat | None | Non-slip surface for safe and comfortable work... |
| 2 | Desk Lamp | Warm light for focused evening work | Warm light for focused evening work |
| 3 | Water Bottle | None | Durable and lightweight bottle to stay hydrate... |
| 4 | Running Shoes | None | Lightweight and supportive design for optimal ... |
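The claims in this section (same schema, same index, existing values untouched) can also be checked programmatically. The sketch below uses plain pandas; `verify_fill` is a hypothetical helper, and the `before` and `after` frames are small placeholders standing in for `products` and `filled_products`. It assumes the target column exists in both frames.

```python
import pandas as pd


def verify_fill(before: pd.DataFrame, after: pd.DataFrame, column: str) -> dict:
    """Compare a frame before and after filling `column`."""
    was_present = before[column].notna()
    same_index = before.index.equals(after.index)
    # Only compare values row by row when both frames share the same index.
    kept = same_index and (
        before.loc[was_present, column].tolist()
        == after.loc[was_present, column].tolist()
    )
    return {
        "same_columns": list(before.columns) == list(after.columns),
        "same_index": same_index,
        "existing_values_kept": bool(kept),
        "remaining_missing": int(after[column].isna().sum()),
    }


# Placeholder frames for illustration only; the "after" text is not model output.
before = pd.DataFrame({
    "product": ["Desk Lamp", "Yoga Mat"],
    "marketing_copy": ["Warm light for focused evening work", None],
})
after = before.assign(
    marketing_copy=["Warm light for focused evening work", "<generated copy>"]
)

report = verify_fill(before, after, "marketing_copy")
print(report)
```

In the notebook itself, the equivalent call would be `verify_fill(products, filled_products, "marketing_copy")`.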


## 5. Benefits

**Main input**
- A DataFrame
- The name of one target column that contains missing values

**Main output**
- A DataFrame with the same rows and columns
- Missing values in the target column filled with context-aware values

**Why this is useful**
- Uses row context from other columns instead of simple mean/mode rules
- Keeps the pandas workflow simple (`df.ai.fillna(...)`)
- Supports batching controls (`batch_size`) for larger datasets
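For contrast, here is what a rule-based fill looks like on the same kind of column. This sketch uses only standard pandas (`Series.mode` and `Series.fillna`) on a reduced copy of the example data; it is a baseline for comparison, not something `ai.fillna` does internally.

```python
import pandas as pd

# Reduced copy of the example table.
products = pd.DataFrame({
    "product": ["Wireless Earbuds", "Yoga Mat", "Desk Lamp", "Water Bottle", "Running Shoes"],
    "marketing_copy": [
        "Immersive sound with all-day battery life",
        None,
        "Warm light for focused evening work",
        None,
        None,
    ],
})

# Mode-based fill: take one of the most frequent existing values
# and reuse it for every missing row.
mode_value = products["marketing_copy"].mode().iloc[0]
baseline = products["marketing_copy"].fillna(mode_value)

print(products.assign(baseline=baseline)[["product", "baseline"]])
```

Every missing row receives the same text regardless of the product, so a yoga mat and a pair of running shoes end up with copy written for a different item. A fill that reads the other columns of each row avoids this.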
