# Build a Multimodal AI Shopping Agent with Voyage AI and Pixeltable

**Best-in-class embeddings and rerankers powering an intelligent product assistant**

| Feature | Model | Description |
|---------|-------|-------------|
| **Semantic Search** | `voyage-3.5` | State-of-the-art text embeddings for product descriptions |
| **Neural Reranking** | `rerank-2.5` | Two-stage retrieval that reorders results by relevance |
| **Multimodal Search** | `voyage-multimodal-3` | Text-to-image search with unified embedding space |
| **AI Agent** | GPT-4o-mini + `pxt.tools()` | LLM with tool calling that orchestrates search |

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              DATA FLOW OVERVIEW                                 │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  ┌──────────────┐      ┌─────────────────────────────────────────────────────┐  │
│  │ Amazon Data  │─────▶│              products table                         │  │
│  │  (Parquet)   │      │  ┌─────────────────────────────────────────────┐    │  │
│  └──────────────┘      │  │ Embedding Indexes (Voyage AI voyage-3.5)    │    │  │
│                        │  │  • About_Product  (semantic text search)    │    │  │
│                        │  │  • Product_Name   (name matching)           │    │  │
│                        │  │  • Category       (category search)         │    │  │
│                        │  │  • Image          (multimodal: voyage-mm-3) │    │  │
│                        │  └─────────────────────────────────────────────┘    │  │
│                        └───────────────────────────┬─────────────────────────┘  │
│                                                    │                            │
│                        ┌───────────────────────────▼─────────────────────────┐  │
│                        │              searches table                         │  │
│  User Query ──────────▶│  Computed Columns:                                  │  │
│  "outdoor toys"        │   • results    ◀── search_products(@pxt.query)      │  │
│                        │   • candidates ◀── search_products(limit=15)        │  │
│                        │   • reranked   ◀── voyageai.rerank (rerank-2.5)     │  │
│                        └───────────────────────────┬─────────────────────────┘  │
│                                                    │                            │
│                        ┌───────────────────────────▼─────────────────────────┐  │
│                        │               agent table                           │  │
│  "Find colorful       │  Tools (lightweight for LLM context):                │  │
│   toys for kids" ────▶│   • agent_search         (text search)              │  │
│                        │   • agent_image_search   (visual search)            │  │
│                        │   • get_product_summary  (exact lookup)             │  │
│                        │  Pipeline:                                          │  │
│                        │   question → tool_calls → tool_results → answer     │  │
│                        │              (GPT-4o-mini) (auto-exec)  (GPT-4o-mini)│  │
│                        └─────────────────────────────────────────────────────┘  │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘
```

Modern e-commerce platforms need more than keyword search—they need AI that understands customer intent. Queries like "comfortable shoes for standing all day" or "gift ideas for a tech enthusiast" require semantic understanding, not string matching.

In this tutorial, we'll build an **AI-powered shopping agent** that combines:

- **[Voyage AI](https://voyageai.com)**: State-of-the-art embedding models (`voyage-3.5`) and rerankers (`rerank-2.5`) purpose-built for search and retrieval
- **[Pixeltable](https://pixeltable.com)**: Declarative AI data infrastructure for storage and orchestration of multimodal data, embeddings, tool calling, and agentic pipelines

**What you'll build:**

1. **Semantic Product Search** — Multi-column embeddings with similarity thresholds
2. **Two-Stage Retrieval** — Fast embedding search + precise neural reranking  
3. **Multimodal Search** — Text-to-image search using unified embedding space
4. **AI Shopping Agent** — LLM that orchestrates search and lookup tools

### Prerequisites

- A Voyage AI API key ([get one free](https://www.voyageai.com/))
- An OpenAI API key (for the agent)
- Basic familiarity with Python


## Setup

First, let's install the required packages and configure our environment.


In [15]:
%pip install -qU pixeltable voyageai openai

[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.3[0m[39;49m -> [0m[32;49m26.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [16]:
import os
import getpass

if 'VOYAGE_API_KEY' not in os.environ:
    os.environ['VOYAGE_API_KEY'] = getpass.getpass('Enter your Voyage AI API key: ')

In [17]:
import pixeltable as pxt
from pixeltable.functions import voyageai

# Create a fresh workspace for this demo
pxt.drop_dir('ecommerce_search', force=True)
pxt.create_dir('ecommerce_search')

Created directory 'ecommerce_search'.


<pixeltable.catalog.dir.Dir at 0x3535c82d0>

## Load Amazon Product Data

We'll use a pre-processed subset of the [Amazon Product Dataset 2020](https://huggingface.co/datasets/calmgoose/amazon-product-data-2020), which contains real product listings with rich metadata including:

- Product names and descriptions
- Categories and specifications
- Pricing information
- One image URL per row

The dataset contains ~1,800 rows from 500 products, with each product having 1-7 images.


In [18]:
# Dataset URL - uses GitHub raw content for reproducibility
DATASET_URL = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/amazon_products_with_images.parquet'

In [19]:
# Load parquet from URL and import into Pixeltable
import pandas as pd
df = pd.read_parquet(DATASET_URL)

# Import into Pixeltable with schema overrides
# on_error='ignore' skips rows where image URLs return 404 (some Amazon URLs expire)
products = pxt.create_table(
    'ecommerce_search.products',
    source=df,
    schema_overrides={
        'Uniq_Id': pxt.String,
        'Product_Name': pxt.String,
        'Category': pxt.String,
        'Selling_Price': pxt.String,
        'About_Product': pxt.String,
        'Image': pxt.Image,
    },
    on_error='ignore'
)

Created table 'products'.
Inserted 1779 rows with 10 errors across 2 columns (products.None, products.Image) in 14.14 s (125.78 rows/s)


In [20]:
products.select(products.Uniq_Id, products.Product_Name).where(products.Category.contains('Toys')).distinct().show(3)

Uniq_Id,Product_Name
003fed6c097d330b68fee5ca499eab24,Funko Pop! Animation: Rick and Morty Lawyer Morty Collectible Figure
02d5d3748dc98cf913b93dc8b5c05c8b,"Hasbro Little Pony Rainbow Dash Cuddle Pillow, Large, Blue"
0304022236b299aad7bc87ea32e043ff,"Thames & Kosmos Chem C2000 (V 2.0) Chemistry Set with 250 Experiments and 128 Page Lab Manual, Student Laboratory Quality Instruments & Chemicals"


## Multi-Column Embedding Strategy

Instead of combining all product fields into a single text, we'll create **separate embedding indexes** for each searchable column. This approach offers several advantages:

- **Flexible weighting**: Combine results from different columns with custom weights
- **Column-specific queries**: Search only product names, or only descriptions
- **Better relevance**: Each embedding captures the semantic meaning of its specific field

In [21]:
# Define the embedding function once for reuse
# The .using() syntax fixes the model parameter, creating a specialized embedding function
embed_fn = voyageai.embeddings.using(model='voyage-3.5', input_type='document')

# Add embedding indexes for each searchable text column
products.add_embedding_index('Product_Name', embedding=embed_fn)
products.add_embedding_index('Category', embedding=embed_fn)
products.add_embedding_index('About_Product', embedding=embed_fn)

## Semantic Product Search with Query Functions

With embedding indexes on multiple columns, we can create **query functions** that combine similarity scores with configurable weights. Query functions (`@pxt.query`) are declarative—they can be used as computed columns that execute automatically when data is inserted.


In [22]:
# Semantic search on product descriptions with similarity threshold
# Note: We fetch extra results (limit * 5) to have enough after deduplication
# (Each product may have multiple rows due to multiple images)
@pxt.query
def search_products(query_text: str, limit: int = 5):
    """Search products by semantic similarity on product description.
    Filters results to only include products with similarity > 0.5.
    Returns more candidates to allow for deduplication."""
    sim = products['About_Product'].similarity(string=query_text)
    return (
        products
        .where(sim > 0.5)  # Filter by similarity threshold
        .order_by(sim, asc=False)
        .limit(limit * 5)  # Fetch extra to account for duplicates
        .select(
            products['Uniq_Id'],
            products['Product_Name'],
            products['Category'],
            products['Selling_Price'],
            products['About_Product'],
            score=sim
        )
    )

### Declarative Search with Computed Columns

The real power of `@pxt.query` functions is using them as **computed columns**. Create a searches table where results are computed automatically when queries are inserted:


In [23]:
# Create a searches table with semantic search as a computed column
searches = pxt.create_table(
    'ecommerce_search.searches',
    {'query': pxt.String}
)

# Search results computed automatically on insert
searches.add_computed_column(
    results=search_products(searches.query),
    if_exists='replace'  # Allow re-running to update column definition
)


Created table 'searches'.
Added 0 column values with 0 errors in 0.00 s


No rows affected.

In [24]:
# Insert a query - search results computed automatically!
searches.insert([{'query': 'durable outdoor toys for active kids'}])

Inserted 1 row with 0 errors in 0.55 s (1.81 rows/s)


1 row inserted.

In [25]:
# UDF to format search results for clean display (with deduplication)
@pxt.udf
def format_results(results: list[dict], limit: int = 5) -> list[str]:
    """Deduplicate by product ID and format as 'Product Name ($Price) - score'"""
    if not results:
        return []
    # Deduplicate: keep first occurrence of each product (highest score)
    seen_ids = set()
    unique = []
    for r in results:
        pid = r.get('Uniq_Id')
        if pid not in seen_ids:
            seen_ids.add(pid)
            unique.append(r)
            if len(unique) >= limit:
                break
    return [
        f"{r.get('Product_Name', 'N/A')[:50]}... ({r.get('Selling_Price', 'N/A')}) - {r.get('score', 0):.3f}"
        for r in unique
    ]

# View the search results - deduplicated and formatted
searches.select(
    searches.query,
    top_results=format_results(searches.results)
).collect()

query,top_results
durable outdoor toys for active kids,"[""Melissa & Doug Bella Butterfly Net... (\$6.99) - 0.804"", ""Nerf Sports Pro Grip Football (blue football)... (\$16.97) - 0.786"", ""Swimline Pool Jam Inground Basketball... (\$43.42) - 0.785"", ""Creativity for Kids Clay Keychains... (\$6.99) - 0.782"", ""Pressman Toys Giant Snakes & Ladders Game (4 Playe... (\$14.90) - 0.780""]"


## Boost Relevance with Voyage AI Reranking

While semantic search is powerful, we can further improve result quality using Voyage AI's **rerank-2.5** model. The two-stage retrieval pattern:

1. **First stage**: Use embeddings to quickly retrieve candidates (top 15)
2. **Second stage**: Use the reranker to precisely score and reorder results

Let's add reranking as another computed column to our searches table:

In [26]:
# First, get more candidates for reranking
searches.add_computed_column(
    candidates=search_products(searches.query, limit=15),
    if_exists='replace'  # Allow re-running to update column definition
)

# UDF to extract unique descriptions for reranking (deduplicated by product ID)
@pxt.udf
def extract_descriptions(results: list[dict], limit: int = 15) -> list[str]:
    """Extract unique About_Product descriptions (one per product)."""
    if not results:
        return []
    seen_ids = set()
    descriptions = []
    for r in results:
        pid = r.get('Uniq_Id')
        desc = r.get('About_Product', '')
        if pid not in seen_ids and desc:
            seen_ids.add(pid)
            descriptions.append(desc)
            if len(descriptions) >= limit:
                break
    return descriptions

# Add reranking using Voyage AI's rerank-2.5 model
# Reranks the embedding search results for improved precision
searches.add_computed_column(
    reranked=voyageai.rerank(
        searches.query,
        extract_descriptions(searches.candidates),  # Deduplicated descriptions
        model='rerank-2.5',
        top_k=5
    ),
    if_exists='replace'  # Allow re-running to update column definition
)

Added 1 column value with 0 errors in 0.47 s (2.13 rows/s)
Added 1 column value with 0 errors in 0.19 s (5.25 rows/s)


1 row updated.

In [27]:
# View the searches table structure
# Each query gets: results + candidates + reranked
searches

0
table 'ecommerce_search/searches'

Column Name,Type,Computed With
query,String,
results,Json,search_products(query)
candidates,Json,"search_products(query, limit=15)"
reranked,Json,"rerank(query, extract_descriptions(candidates), model='rerank-2.5', top_k=5)"


In [28]:
# The query we inserted earlier now has reranked results too!
# View raw results to see the full structure
searches.select(
    searches.query,
    searches.results,
    searches.reranked
).collect()

query,results,reranked
durable outdoor toys for active kids,"[{""score"": 0.804, ""Uniq_Id"": ""01bbe4d02c928d24da487a8749ee2553"", ""Category"": ""Patio, Lawn & Garden | Outdoor D\u00e9cor | Backyard Birding & Wildlife | Butterflies"", ""Product_Name"": ""Melissa & Doug Bella Butterfly Net"", ""About_Product"": ""Make sure this fits by entering your model number. | Bug-catching net for kids | Cheerful colors and butterfly decoration | Fade-resistant materia ...... r play. | Durable frame and strong polyester netting | Encourages gross motor skills, hand-eye coordination, and exploration of the natural world."", ""Selling_Price"": ""\$6.99""}, {""score"": 0.804, ""Uniq_Id"": ""01bbe4d02c928d24da487a8749ee2553"", ""Category"": ""Patio, Lawn & Garden | Outdoor D\u00e9cor | Backyard Birding & Wildlife | Butterflies"", ""Product_Name"": ""Melissa & Doug Bella Butterfly Net"", ""About_Product"": ""Make sure this fits by entering your model number. | Bug-catching net for kids | Cheerful colors and butterfly decoration | Fade-resistant materia ...... r play. | Durable frame and strong polyester netting | Encourages gross motor skills, hand-eye coordination, and exploration of the natural world."", ""Selling_Price"": ""\$6.99""}, {""score"": 0.804, ""Uniq_Id"": ""01bbe4d02c928d24da487a8749ee2553"", ""Category"": ""Patio, Lawn & Garden | Outdoor D\u00e9cor | Backyard Birding & Wildlife | Butterflies"", ""Product_Name"": ""Melissa & Doug Bella Butterfly Net"", ""About_Product"": ""Make sure this fits by entering your model number. | Bug-catching net for kids | Cheerful colors and butterfly decoration | Fade-resistant materia ...... r play. | Durable frame and strong polyester netting | Encourages gross motor skills, hand-eye coordination, and exploration of the natural world."", ""Selling_Price"": ""\$6.99""}, {""score"": 0.804, ""Uniq_Id"": ""01bbe4d02c928d24da487a8749ee2553"", ""Category"": ""Patio, Lawn & Garden | Outdoor D\u00e9cor | Backyard Birding & Wildlife | Butterflies"", ""Product_Name"": ""Melissa & Doug Bella Butterfly Net"", ""About_Product"": ""Make sure this fits by entering your model number. | Bug-catching net for kids | Cheerful colors and butterfly decoration | Fade-resistant materia ...... r play. | Durable frame and strong polyester netting | Encourages gross motor skills, hand-eye coordination, and exploration of the natural world."", ""Selling_Price"": ""\$6.99""}, {""score"": 0.786, ""Uniq_Id"": ""6f15e433a39f5c0fcdaa627012bf1b57"", ""Category"": ""Sports & Outdoors | Sports & Fitness | Leisure Sports & Game Room | Outdoor Games & Activities | Balls | Playground Balls"", ""Product_Name"": ""Nerf Sports Pro Grip Football (blue football)"", ""About_Product"": ""Make sure this fits by entering your model number. | Durable foam ball | Grip helps players throw like a pro | Great for indoor and outdoor play | Includes foam ball | Ages 4 and up"", ""Selling_Price"": ""\$16.97""}, {""score"": 0.786, ""Uniq_Id"": ""6f15e433a39f5c0fcdaa627012bf1b57"", ""Category"": ""Sports & Outdoors | Sports & Fitness | Leisure Sports & Game Room | Outdoor Games & Activities | Balls | Playground Balls"", ""Product_Name"": ""Nerf Sports Pro Grip Football (blue football)"", ""About_Product"": ""Make sure this fits by entering your model number. | Durable foam ball | Grip helps players throw like a pro | Great for indoor and outdoor play | Includes foam ball | Ages 4 and up"", ""Selling_Price"": ""\$16.97""}, ..., {""score"": 0.763, ""Uniq_Id"": ""9baf24f25158ca43e10a644dcca23af2"", ""Category"": ""Toys & Games | Learning & Education"", ""Product_Name"": ""Dickie Toys Air Pump Garbage Truck"", ""About_Product"": ""Make sure this fits by entering your model number. | Air pump feature | Hours of endless play | No batteries required"", ""Selling_Price"": ""\$39.99""}, {""score"": 0.763, ""Uniq_Id"": ""9baf24f25158ca43e10a644dcca23af2"", ""Category"": ""Toys & Games | Learning & Education"", ""Product_Name"": ""Dickie Toys Air Pump Garbage Truck"", ""About_Product"": ""Make sure this fits by entering your model number. | Air pump feature | Hours of endless play | No batteries required"", ""Selling_Price"": ""\$39.99""}, {""score"": 0.762, ""Uniq_Id"": ""9baf24f25158ca43e10a644dcca23af2"", ""Category"": ""Toys & Games | Learning & Education"", ""Product_Name"": ""Dickie Toys Air Pump Garbage Truck"", ""About_Product"": ""Make sure this fits by entering your model number. | Air pump feature | Hours of endless play | No batteries required"", ""Selling_Price"": ""\$39.99""}, {""score"": 0.76, ""Uniq_Id"": ""4f292609b885b7c321aededb356f0942"", ""Category"": ""Toys & Games | Games & Accessories | Board Games"", ""Product_Name"": ""Haywire Group Flickin' Chicken"", ""About_Product"": ""Make sure this fits by entering your model number. | Includes four rubber chickens, a target disc and score pad | Great family game | Players encourage to make up their own rules | Encourages physical activity | For 2 4 players"", ""Selling_Price"": ""\$15.51""}, {""score"": 0.76, ""Uniq_Id"": ""4f292609b885b7c321aededb356f0942"", ""Category"": ""Toys & Games | Games & Accessories | Board Games"", ""Product_Name"": ""Haywire Group Flickin' Chicken"", ""About_Product"": ""Make sure this fits by entering your model number. | Includes four rubber chickens, a target disc and score pad | Great family game | Players encourage to make up their own rules | Encourages physical activity | For 2 4 players"", ""Selling_Price"": ""\$15.51""}, {""score"": 0.744, ""Uniq_Id"": ""8636a90d68d24c44934ef2ea51702f18"", ""Category"": ""Toys & Games | Hobbies | Remote & App Controlled Vehicles & Parts | Remote & App Controlled Vehicle Parts"", ""Product_Name"": ""Robinson Racing Products Robinson Racing 1710 Hardened 10T Pinion Gear 32P, Brown/A"", ""About_Product"": ""Precision RC hobby parts | Check your users manual for exact parts listings"", ""Selling_Price"": ""\$3.64""}]","{""results"": [{""index"": 0, ""document"": ""Make sure this fits by entering your model number. | Bug-catching net for kids | Cheerful colors and butterfly decoration | Fade-resistant materia ...... r play. | Durable frame and strong polyester netting | Encourages gross motor skills, hand-eye coordination, and exploration of the natural world."", ""relevance_score"": 0.738}, {""index"": 1, ""document"": ""Make sure this fits by entering your model number. | Durable foam ball | Grip helps players throw like a pro | Great for indoor and outdoor play | Includes foam ball | Ages 4 and up"", ""relevance_score"": 0.656}, {""index"": 2, ""document"": ""Make sure this fits by entering your model number. | Sturdy blow-molded, adjustable height design with water weightable base | Complete with real-feel basketball | Great Item for kids"", ""relevance_score"": 0.562}, {""index"": 5, ""document"": ""Make sure this fits by entering your model number. | Stimulate and inspire imagination of children | Hand painted | Designed in France | Extreme care taken with product quality and safety | Highly detailed and durable"", ""relevance_score"": 0.535}, {""index"": 4, ""document"": ""Make sure this fits by entering your model number. | This classic game just got a GIANT makeover! | Oversized pieces are perfect for little hands | The cloth like board folds for easy storage, and is ideal for outdoor play"", ""relevance_score"": 0.498}], ""total_tokens"": 661}"


In [29]:
# Insert another query to see the full pipeline in action
searches.insert([{'query': 'safe educational toys for toddlers'}])

Inserted 1 row with 0 errors in 0.72 s (1.39 rows/s)


1 row inserted.

### Compare Embedding Search vs. Reranked Results

The reranker often surfaces more relevant results by considering the full query-document relationship:


In [30]:
# UDF to format reranked results showing document snippets and scores
@pxt.udf
def format_reranked(reranked: dict) -> list[str]:
    """Format reranked results as 'doc_snippet... (score)'"""
    if not reranked or 'results' not in reranked:
        return []
    return [
        f"{r['document'][:60]}... ({r['relevance_score']:.3f})"
        for r in reranked['results']
    ]

# Compare: embedding search vs reranked (formatted for readability)
searches.select(
    searches.query,
    embedding_top_5=format_results(searches.results),
    reranked_top_5=format_reranked(searches.reranked),
).where(searches.query == 'safe educational toys for toddlers').collect()


query,embedding_top_5,reranked_top_5
safe educational toys for toddlers,"[""Odyssey Toys Hape Chunky Number Puzzle (10 Pieces)... (\$19.63) - 0.801"", ""Creativity for Kids Clay Keychains... (\$6.99) - 0.800"", ""Silver Unicorn... (\$10.64) - 0.795"", ""Forum Novelties Children's Unisex Headless Costume... (\$23.95 - \$53.82) - 0.785"", ""Hohner Kids MP383 Musical Shapes, 20 Piece, 3 Uniq... (\$65.89) - 0.778""]","[""Make sure this fits by entering your model number. | Include... (0.797)"", ""Make sure this fits by entering your model number. | Classic... (0.773)"", ""Make sure this fits by entering your model number. | K's kid... (0.730)"", ""Make sure this fits by entering your model number. | Bug-cat... (0.645)"", ""Make sure this fits by entering your model number. | Get to ... (0.645)""]"


In [31]:
# UDF to get the top reranked result
@pxt.udf
def top_reranked_result(reranked: dict) -> str:
    """Get the top reranked document with its score."""
    if not reranked or 'results' not in reranked or not reranked['results']:
        return 'N/A'
    top = reranked['results'][0]
    return f"{top['document'][:80]}... (score: {top['relevance_score']:.3f})"

# View all queries with their top reranked result
searches.select(
    searches.query,
    best_match=top_reranked_result(searches.reranked),
).collect()


query,best_match
durable outdoor toys for active kids,Make sure this fits by entering your model number. | Bug-catching net for kids |... (score: 0.738)
safe educational toys for toddlers,"Make sure this fits by entering your model number. | Includes: 35 foam pieces, 1... (score: 0.797)"


## Incremental Updates: Adding New Products

One of Pixeltable's key strengths is handling incremental updates. When new products are added to the catalog, embeddings are computed automatically—no need to reprocess the entire dataset.


In [32]:
# Add new products - embeddings for all three indexes are computed automatically!
new_products = [
    {
        'Uniq_Id': 'new_001',
        'Product_Name': 'Ultimate STEM Building Kit - 500 Pieces',
        'Category': 'Toys & Games | Building Toys | Building Sets',
        'About_Product': 'Educational building set with 500 pieces for ages 6+. Includes gears, motors, and instruction booklet for 50 projects. Develops problem-solving and engineering skills.',
        'Selling_Price': '$49.99',
        'Image': None,  # Use None for no image, not empty string
        'image_idx': 0
    },
    {
        'Uniq_Id': 'new_002', 
        'Product_Name': 'Outdoor Adventure Binoculars for Kids',
        'Category': 'Toys & Games | Sports & Outdoor Play | Exploration Toys',
        'About_Product': 'Kid-friendly binoculars with 8x magnification, rubber grip, and neck strap. Perfect for bird watching, camping, and nature exploration. Shockproof design.',
        'Selling_Price': '$24.99',
        'Image': None,  # Use None for no image, not empty string
        'image_idx': 0
    }
]

products.insert(new_products)


Inserted 2 rows with 0 errors in 1.78 s (1.12 rows/s)


2 rows inserted.

## Agentic Search: LLM-Powered Product Assistant

Now let's combine everything into an **agentic pipeline** where an LLM decides which tools to use:

- **Semantic search** (`search_products`): Find products by description similarity
- **Image search** (`search_product_images`): Find products by visual similarity  
- **Exact lookup** (`get_product_by_id`): Get specific product details by ID

The LLM orchestrates these tools to answer complex questions and returns **both text and images**.

In [33]:
# Create an exact product lookup using retrieval_udf
# This queries by product ID for precise lookups
get_product_by_id = pxt.retrieval_udf(
    products,
    name='get_product_by_id',
    description='Look up a specific product by its unique ID (Uniq_Id)',
    parameters=['Uniq_Id'],
    limit=1
)

In [34]:
# Add image search capability using Voyage AI's multimodal embeddings
from pixeltable.functions import image as pxt_image

# Add Voyage AI multimodal embedding index on the Image column
# voyage-multimodal-3 embeds both text AND images in the same space!
products.add_embedding_index(
    'Image',
    embedding=voyageai.multimodal_embed.using(model='voyage-multimodal-3', input_type='document'),
    if_exists='ignore'
)

In [None]:
# Lightweight agent search functions - return MINIMAL data to avoid context overflow
# (Full search_products is defined above but returns too much data for LLM context)
@pxt.query
def agent_search(query_text: str, limit: int = 5):
    """Lightweight product search for agent - returns only essential fields."""
    sim = products['About_Product'].similarity(string=query_text)
    return (
        products
        .where(sim > 0.5)
        .order_by(sim, asc=False)
        .limit(limit * 5)  # Extra for deduplication
        .select(
            products['Uniq_Id'],
            products['Product_Name'],
            products['Selling_Price'],
            # Short description only - first 100 chars
        )
    )

@pxt.query
def agent_image_search(query_text: str, limit: int = 3):
    """Lightweight image search for agent - returns product info without base64."""
    sim = products.Image.similarity(string=query_text)
    return (
        products
        .where(sim > 0.3)
        .order_by(sim, asc=False)
        .limit(limit)
        .select(
            products.Uniq_Id,
            products.Product_Name,
            products.Selling_Price,
            # No base64 images here - too large for LLM context
        )
    )

In [38]:
# Create a product lookup by ID (returns all columns for matching product)
get_product_summary = pxt.retrieval_udf(
    products,
    name='get_product_summary',
    description='Look up a specific product by its unique ID (Uniq_Id)',
    parameters=['Uniq_Id'],
    limit=1
)

In [39]:
# Bundle lightweight tools for LLM use (minimal data to avoid context overflow)
product_tools = pxt.tools(
    agent_search,           # Lightweight semantic search (name, price only)
    agent_image_search,     # Lightweight image search (no base64)
    get_product_summary     # Lightweight lookup (essential fields only)
)

In [40]:
# Set up OpenAI for the agent (or use Anthropic, etc.)
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass('Enter your OpenAI API key: ')

from pixeltable.functions import openai

In [41]:
# Create the agent table with tool-calling pipeline
agent = pxt.create_table(
    'ecommerce_search.agent',
    {'question': pxt.String}
)

# System prompt for the product assistant (using lightweight tool names)
SYSTEM_PROMPT = """You are a helpful e-commerce product assistant. You have access to three tools:
1. agent_search: Find products by semantic search on descriptions
2. agent_image_search: Find products visually - use when user asks to "show" products
3. get_product_summary: Look up a specific product by ID

Be concise and helpful. Summarize product results clearly."""

# LLM decides which tools to call
agent.add_computed_column(
    llm_response=openai.chat_completions(
        messages=[
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': agent.question}
        ],
        model='gpt-4o-mini',
        tools=product_tools
    )
)

# Automatically execute the tool calls
agent.add_computed_column(
    tool_results=openai.invoke_tools(product_tools, agent.llm_response)
)

Created table 'agent'.
Added 0 column values with 0 errors in 0.00 s
Added 0 column values with 0 errors in 0.01 s


No rows affected.

In [42]:
# UDF to format tool results concisely for LLM (avoid context overflow)
@pxt.udf
def assemble_answer_prompt(question: str, tool_results: dict) -> list[dict]:
    """Format tool results concisely for the LLM to generate an answer."""
    results_text = []
    for tool_name, outputs in (tool_results or {}).items():
        if outputs is None:
            continue
        # Deduplicate by product ID
        seen_ids = set()
        for output in outputs:
            if output is None:
                continue
            if isinstance(output, dict):
                pid = output.get('Uniq_Id', '')
                if pid in seen_ids:
                    continue
                seen_ids.add(pid)
                name = output.get('Product_Name', 'Unknown')
                price = output.get('Selling_Price', 'N/A')
                results_text.append(f"• {name} - {price}")
    
    context = "\n".join(results_text[:10]) if results_text else "No results found."
    
    return [
        {'role': 'system', 'content': 'You are a helpful product assistant. Summarize the search results concisely.'},
        {'role': 'user', 'content': f"Question: {question}\n\nProducts Found:\n{context}\n\nProvide a brief, helpful response."}
    ]

# Generate final answer using tool results
agent.add_computed_column(
    answer_prompt=assemble_answer_prompt(agent.question, agent.tool_results)
)

agent.add_computed_column(
    answer=openai.chat_completions(
        messages=agent.answer_prompt,
        model='gpt-4o-mini'
    )['choices'][0]['message']['content']
)

Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s


No rows affected.

In [43]:
# Test the agent with different questions
test_questions = [
    {'question': 'What educational toys do you have for kids who like building?'},
    {'question': 'Find me colorful toys for toddlers'},
    {'question': 'Tell me about product new_001'},
]

agent.insert(test_questions)

Inserted 3 rows with 0 errors in 5.70 s (0.53 rows/s)


3 rows inserted.

In [44]:
# View the agent's answers
agent.select(agent.question, agent.answer).collect()

question,answer
Tell me about product new_001,"It seems that there are no search results available for the product ""new_001."" If you can provide more details or clarify the product name, I can assist you further."
What educational toys do you have for kids who like building?,"It seems that there are currently no specific educational toys listed for kids who enjoy building. However, popular options typically include building blocks, construction sets, LEGO kits, and magnetic tiles. These types of toys encourage creativity, problem-solving, and fine motor skills. You might want to explore local toy stores or online retailers for a variety of building toys."
Find me colorful toys for toddlers,"I couldn't find specific colorful toys for toddlers, but you can try searching on popular online retailers or local stores specializing in children's toys. Look for categories like building blocks, stuffed animals, or activity sets, which often feature vibrant colors and are suitable for young children."


In [45]:
# See the full pipeline: question → tool results → answer
agent.select(
    agent.question,
    agent.tool_results,
    agent.answer
).collect()

question,tool_results,answer
Tell me about product new_001,"{""agent_search"": null, ""agent_image_search"": null, ""get_product_summary"": [[{""Image"": null, ""Uniq_Id"": ""new_001"", ""Category"": ""Toys & Games | Building Toys | Building Sets"", ""image_idx"": 0, ""Product_Name"": ""Ultimate STEM Building Kit - 500 Pieces"", ""About_Product"": ""Educational building set with 500 pieces for ages 6+. Includes gears, motors, and instruction booklet for 50 projects. Develops problem-solving and engineering skills."", ""Selling_Price"": ""\$49.99""}]]}","It seems that there are no search results available for the product ""new_001."" If you can provide more details or clarify the product name, I can assist you further."
What educational toys do you have for kids who like building?,"{""agent_search"": [[{""Uniq_Id"": ""new_001"", ""Product_Name"": ""Ultimate STEM Building Kit - 500 Pieces"", ""Selling_Price"": ""\$49.99""}, {""Uniq_Id"": ""2b47bdb56e45a2bcd883dda0ac5c396e"", ""Product_Name"": ""Creativity for Kids Clay Keychains"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""2b47bdb56e45a2bcd883dda0ac5c396e"", ""Product_Name"": ""Creativity for Kids Clay Keychains"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""2b47bdb56e45a2bcd883dda0ac5c396e"", ""Product_Name"": ""Creativity for Kids Clay Keychains"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""2b47bdb56e45a2bcd883dda0ac5c396e"", ""Product_Name"": ""Creativity for Kids Clay Keychains"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""ca60bfaf525369543188f03b98d5d75e"", ""Product_Name"": ""BathBlocks STEM Floating Construction Set"", ""Selling_Price"": ""\$24.99""}, ..., {""Uniq_Id"": ""031e2e0c145b46ff6ed4fbded1e837c5"", ""Product_Name"": ""Odyssey Toys Hape Chunky Number Puzzle (10 Pieces), Multicolor, 5'' x 2''"", ""Selling_Price"": ""\$19.63""}, {""Uniq_Id"": ""3a165de98664f7a9c26cec87cf690473"", ""Product_Name"": ""Constructive Playthings Tree Blocks, Set of 36 Hand-Cut Wood Pieces, Various Shapes and Shades, STEM Approved"", ""Selling_Price"": ""\$64.94""}, {""Uniq_Id"": ""3a165de98664f7a9c26cec87cf690473"", ""Product_Name"": ""Constructive Playthings Tree Blocks, Set of 36 Hand-Cut Wood Pieces, Various Shapes and Shades, STEM Approved"", ""Selling_Price"": ""\$64.94""}, {""Uniq_Id"": ""443dffffe6264a6f2ca54a5e8084d775"", ""Product_Name"": ""Smart Play Ingenio Colors & Shapes Memory Match Game"", ""Selling_Price"": ""\$15.20""}, {""Uniq_Id"": ""443dffffe6264a6f2ca54a5e8084d775"", ""Product_Name"": ""Smart Play Ingenio Colors & Shapes Memory Match Game"", ""Selling_Price"": ""\$15.20""}, {""Uniq_Id"": ""89354e527633105bd209522dcd6f0260"", ""Product_Name"": ""Silver Unicorn"", ""Selling_Price"": ""\$10.64""}]], ""agent_image_search"": null, ""get_product_summary"": null}","It seems that there are currently no specific educational toys listed for kids who enjoy building. However, popular options typically include building blocks, construction sets, LEGO kits, and magnetic tiles. These types of toys encourage creativity, problem-solving, and fine motor skills. You might want to explore local toy stores or online retailers for a variety of building toys."
Find me colorful toys for toddlers,"{""agent_search"": [[{""Uniq_Id"": ""3cb8cc66556c1b240fcdfdc9b89db022"", ""Product_Name"": ""Bright Starts Rattle & Shake Barbell Toy, Ages 3 months +"", ""Selling_Price"": ""\$2.88""}, {""Uniq_Id"": ""3cb8cc66556c1b240fcdfdc9b89db022"", ""Product_Name"": ""Bright Starts Rattle & Shake Barbell Toy, Ages 3 months +"", ""Selling_Price"": ""\$2.88""}, {""Uniq_Id"": ""2b47bdb56e45a2bcd883dda0ac5c396e"", ""Product_Name"": ""Creativity for Kids Clay Keychains"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""2b47bdb56e45a2bcd883dda0ac5c396e"", ""Product_Name"": ""Creativity for Kids Clay Keychains"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""2b47bdb56e45a2bcd883dda0ac5c396e"", ""Product_Name"": ""Creativity for Kids Clay Keychains"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""2b47bdb56e45a2bcd883dda0ac5c396e"", ""Product_Name"": ""Creativity for Kids Clay Keychains"", ""Selling_Price"": ""\$6.99""}, ..., {""Uniq_Id"": ""30ea772fc1d42248ea26c4c5ad0311a5"", ""Product_Name"": ""Melissa & Doug K\u2019s Kids Whose Tail? 8-Page Soft Activity Book, The Original (Great Gift for Girls and Boys - Best for Babies and Toddlers, All Ages)"", ""Selling_Price"": ""\$12.99""}, {""Uniq_Id"": ""30ea772fc1d42248ea26c4c5ad0311a5"", ""Product_Name"": ""Melissa & Doug K\u2019s Kids Whose Tail? 8-Page Soft Activity Book, The Original (Great Gift for Girls and Boys - Best for Babies and Toddlers, All Ages)"", ""Selling_Price"": ""\$12.99""}, {""Uniq_Id"": ""30ea772fc1d42248ea26c4c5ad0311a5"", ""Product_Name"": ""Melissa & Doug K\u2019s Kids Whose Tail? 8-Page Soft Activity Book, The Original (Great Gift for Girls and Boys - Best for Babies and Toddlers, All Ages)"", ""Selling_Price"": ""\$12.99""}, {""Uniq_Id"": ""01bbe4d02c928d24da487a8749ee2553"", ""Product_Name"": ""Melissa & Doug Bella Butterfly Net"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""01bbe4d02c928d24da487a8749ee2553"", ""Product_Name"": ""Melissa & Doug Bella Butterfly Net"", ""Selling_Price"": ""\$6.99""}, {""Uniq_Id"": ""01bbe4d02c928d24da487a8749ee2553"", ""Product_Name"": ""Melissa & Doug Bella Butterfly Net"", ""Selling_Price"": ""\$6.99""}]], ""agent_image_search"": null, ""get_product_summary"": null}","I couldn't find specific colorful toys for toddlers, but you can try searching on popular online retailers or local stores specializing in children's toys. Look for categories like building blocks, stuffed animals, or activity sets, which often feature vibrant colors and are suitable for young children."


In [46]:
# Display agent answers in a readable format
for row in agent.select(agent.question, agent.answer).collect():
    print(f"Q: {row['question']}")
    print(f"A: {row['answer']}\n")
    print("-" * 60 + "\n")

Q: Tell me about product new_001
A: It seems that there are no search results available for the product "new_001." If you can provide more details or clarify the product name, I can assist you further.

------------------------------------------------------------

Q: What educational toys do you have for kids who like building?
A: It seems that there are currently no specific educational toys listed for kids who enjoy building. However, popular options typically include building blocks, construction sets, LEGO kits, and magnetic tiles. These types of toys encourage creativity, problem-solving, and fine motor skills. You might want to explore local toy stores or online retailers for a variety of building toys.

------------------------------------------------------------

Q: Find me colorful toys for toddlers
A: I couldn't find specific colorful toys for toddlers, but you can try searching on popular online retailers or local stores specializing in children's toys. Look for categories l

## Summary

In this tutorial, we built a **multimodal AI shopping agent** for e-commerce using **Voyage AI** and **Pixeltable**:

### Voyage AI Features
- **voyage-3.5 Embeddings**: State-of-the-art embedding model for semantic search on product descriptions
- **voyage-multimodal-3**: Embeds both text AND images in the same vector space for cross-modal search
- **rerank-2.5 Reranker**: Two-stage retrieval pattern that combines fast embedding search with precise cross-encoder reranking

### Pixeltable Capabilities
- **Embedding Indexes**: Semantic search with `add_embedding_index()` and similarity thresholds
- **Query Functions (`@pxt.query`)**: Reusable search logic that can be used as computed columns or LLM tools
- **Retrieval UDFs (`pxt.retrieval_udf`)**: Exact lookups by key (product ID, SKU, etc.)
- **Base64 Image Encoding**: Return images directly with `pxt_image.b64_encode()` for display
- **Tool Calling (`pxt.tools`)**: Bundle search functions and lookups as tools for LLM agents
- **Agentic Pipelines**: LLM decides which tools to call; returns **both text answers AND images**

### Key Takeaways
1. **Unified Multimodal Embeddings**: Voyage AI's `voyage-multimodal-3` embeds text and images in the same space - no need for separate models!
2. **Similarity Thresholds**: Filter results by similarity score (e.g., `sim > 0.5`) to ensure relevance
3. **Multimodal Search**: The agent can return both textual answers and product images
4. **Semantic Search > Keyword Search**: Find "comfortable shoes for standing" even if products don't contain those exact words
5. **Two-Stage Retrieval**: Embeddings for fast candidate retrieval, reranker for precision
6. **Agentic Architecture**: Combine text search + image search + exact lookup + LLM reasoning in one pipeline
7. **Declarative Everything**: Insert a row → tools called, images retrieved, answer generated automatically

This architecture adapts easily to other use cases like document search, visual product recommendations, or multimodal customer support.


## Learn More

**Voyage AI Resources**
- [Voyage AI Documentation](https://docs.voyageai.com/)
- [Embedding Models](https://docs.voyageai.com/docs/embeddings) - voyage-3.5 and other models
- [Reranker Guide](https://docs.voyageai.com/docs/reranker) - rerank-2.5 and rerank-2.5-lite
- [Voyage AI + MongoDB](https://www.mongodb.com/blog/post/voyage-ai-joins-mongodb-to-advance-ai-powered-applications) - Voyage AI is now part of MongoDB

**Pixeltable Resources**
- [Documentation](https://docs.pixeltable.com/)
- [Embedding Indexes Guide](https://docs.pixeltable.com/platform/embedding-indexes)
- [Tool Calling with LLMs](https://docs.pixeltable.com/howto/cookbooks/agents/llm-tool-calling) - Agent patterns
- [Data Lookup UDFs](https://docs.pixeltable.com/howto/cookbooks/agents/pattern-data-lookup) - Retrieval UDFs

**Get Started**
- [Sign up for Voyage AI](https://www.voyageai.com/) (free tier available)
- [Install Pixeltable](https://github.com/pixeltable/pixeltable): `pip install pixeltable`