# Supercharging Search and Retrieval for E-Commerce with Voyage AI and Pixeltable

**Best-in-class embedding models and rerankers for unstructured product data**

Modern e-commerce platforms deal with massive amounts of unstructured data: product descriptions, specifications, customer reviews, and images. Traditional keyword search often fails to capture the semantic meaning behind customer queries like "comfortable shoes for standing all day" or "gift ideas for a tech enthusiast."

In this tutorial, we'll demonstrate how to build a powerful semantic search system for product data by combining:

- **[Pixeltable](https://pixeltable.com)**: A multimodal data infrastructure that handles embeddings, indexing, and retrieval as declarative table operations for all data types
- **[Voyage AI](https://voyageai.com)**: State-of-the-art embedding models and rerankers purpose-built for search and retrieval

We'll use real Amazon product data to showcase:

1. 🔍 **Semantic Product Search**: Find products by meaning, not just keywords
2. 🎯 **Reranking for Precision**: Improve search relevance with Voyage AI's reranker
3. 🖼️ **Multimodal Data**: Work with product images alongside text
4. 📊 **Incremental Updates**: Add new products without reprocessing the entire catalog

### Prerequisites

- A Voyage AI account with an API key ([get one free](https://www.voyageai.com/))
- Basic familiarity with Python and data operations


## Setup

First, let's install the required packages and configure our environment.


In [7]:
%pip install -qU pixeltable voyageai pandas pyarrow

Note: you may need to restart the kernel to use updated packages.


In [8]:
import os
import getpass

if 'VOYAGE_API_KEY' not in os.environ:
    os.environ['VOYAGE_API_KEY'] = getpass.getpass('Enter your Voyage AI API key: ')


In [9]:
import pixeltable as pxt
from pixeltable.functions import voyageai
import pandas as pd

# Create a fresh workspace for this demo
pxt.drop_dir('ecommerce_search', force=True)
pxt.create_dir('ecommerce_search')


Created directory 'ecommerce_search'.


<pixeltable.catalog.dir.Dir at 0x363457f10>

## Load Amazon Product Data

We'll use a pre-processed subset of the [Amazon Product Dataset 2020](https://huggingface.co/datasets/calmgoose/amazon-product-data-2020), which contains real product listings with rich metadata including:

- Product names and descriptions
- Categories and specifications
- Pricing information
- **One image URL per row** (the original dataset had multiple images pipe-separated; we've split them for easier processing)

The dataset contains ~1,800 rows from 500 products, with each product having 1-7 images.
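
The pre-processing step itself is not shown in this notebook. As a rough sketch of how a pipe-separated `Image` field could be expanded into one row per image, assuming a plain dictionary per product (the `split_images` helper and the example record are hypothetical, not the script actually used to build the dataset):

```python
def split_images(product: dict, sep: str = "|") -> list:
    """Expand one product record into one record per image URL."""
    raw = product.get("Image") or ""
    urls = [u.strip() for u in raw.split(sep) if u.strip()]
    rows = []
    for idx, url in enumerate(urls):
        row = dict(product)       # copy the shared product fields
        row["Image"] = url        # keep exactly one URL per row
        row["image_idx"] = idx    # position of the image within the product
        rows.append(row)
    return rows


# Hypothetical example record with two pipe-separated image URLs
example = {
    "Uniq_Id": "abc123",
    "Product_Name": "Example Longboard",
    "Image": "https://example.com/a.jpg|https://example.com/b.jpg",
}
rows = split_images(example)
```

Each output row repeats the product fields and differs only in `Image` and `image_idx`, which matches the shape of the rows shown in the preview below.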


In [None]:
# Load the pre-processed Amazon product dataset from GitHub
# Note: Update URL to pixeltable/pixeltable after PR is merged
DATASET_URL = 'https://raw.githubusercontent.com/pierrebrunelle/pixeltable/feature/voyageai-ecommerce-search-notebook/docs/resources/amazon_products_with_images.parquet'

df = pd.read_parquet(DATASET_URL)

# Select columns we need and clean null values
# Text columns get empty strings, ensuring embeddings work correctly
columns_to_keep = ['Uniq_Id', 'Product_Name', 'Category', 'Selling_Price', 
                   'About_Product', 'Image', 'image_idx']
df = df[columns_to_keep].copy()

# Fill null values with empty strings for text columns
text_columns = ['Category', 'About_Product']
df[text_columns] = df[text_columns].fillna('')

df.head(3)


Unnamed: 0,Uniq_Id,Product_Name,Category,Upc_Ean_Code,Selling_Price,Model_Number,About_Product,Product_Specification,Technical_Details,Shipping_Weight,Product_Dimensions,Image,Variants,Product_Url,Is_Amazon_Seller,image_idx
0,4c69b61db1fc16e7013b43fc926e502d,"DB Longboards CoreFlex Crossbow 41"" Bamboo Fib...",Sports & Outdoors | Outdoor Recreation | Skate...,,$237.68,,Make sure this fits by entering your model num...,Shipping Weight: 10.7 pounds (View shipping ra...,,10.7 pounds,,https://images-na.ssl-images-amazon.com/images...,https://www.amazon.com/DB-Longboards-CoreFlex-...,https://www.amazon.com/DB-Longboards-CoreFlex-...,Y,0
1,4c69b61db1fc16e7013b43fc926e502d,"DB Longboards CoreFlex Crossbow 41"" Bamboo Fib...",Sports & Outdoors | Outdoor Recreation | Skate...,,$237.68,,Make sure this fits by entering your model num...,Shipping Weight: 10.7 pounds (View shipping ra...,,10.7 pounds,,https://images-na.ssl-images-amazon.com/images...,https://www.amazon.com/DB-Longboards-CoreFlex-...,https://www.amazon.com/DB-Longboards-CoreFlex-...,Y,1
2,4c69b61db1fc16e7013b43fc926e502d,"DB Longboards CoreFlex Crossbow 41"" Bamboo Fib...",Sports & Outdoors | Outdoor Recreation | Skate...,,$237.68,,Make sure this fits by entering your model num...,Shipping Weight: 10.7 pounds (View shipping ra...,,10.7 pounds,,https://images-na.ssl-images-amazon.com/images...,https://www.amazon.com/DB-Longboards-CoreFlex-...,https://www.amazon.com/DB-Longboards-CoreFlex-...,Y,2


In [11]:
# Dataset stats: rows, unique products, and images per product
unique_products = df['Uniq_Id'].nunique()
total_rows = len(df)
f"Total rows: {total_rows}, Unique products: {unique_products}, Avg images per product: {total_rows/unique_products:.1f}"


'Total rows: 1779, Unique products: 500, Avg images per product: 3.6'

### Import into Pixeltable

Now let's import this dataset into Pixeltable. Pixeltable can import pandas DataFrames directly using the `source` parameter.


In [12]:
# Import the dataset into Pixeltable
products = pxt.create_table(
    'ecommerce_search.products',
    source=df
)

products.head(3)


TypeError: float() argument must be a string or a real number, not 'NoneType'

## Multi-Column Embedding Strategy

Instead of combining all product fields into a single text, we'll create **separate embedding indexes** for each searchable column. This approach offers several advantages:

- **Flexible weighting**: Combine results from different columns with custom weights
- **Column-specific queries**: Search only product names, or only descriptions
- **Better relevance**: Each embedding captures the semantic meaning of its specific field
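
To make the weighting idea concrete, here is a minimal plain-Python sketch using toy 2-dimensional vectors in place of real embeddings. The `cosine` and `weighted_score` helpers are illustrative names, not part of Pixeltable or Voyage AI; the weights mirror the defaults used later in this tutorial.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_score(query_vec, field_vecs, weights):
    """Sum of per-field cosine similarities, scaled by each field's weight."""
    return sum(weights[f] * cosine(query_vec, field_vecs[f]) for f in weights)


# Toy vectors standing in for real embeddings
query_vec = [1.0, 0.0]
field_vecs = {
    "Product_Name": [1.0, 0.0],   # same direction as the query
    "Category": [0.0, 1.0],       # orthogonal to the query
    "About_Product": [1.0, 1.0],  # partially aligned
}
weights = {"Product_Name": 0.4, "Category": 0.2, "About_Product": 0.4}
score = weighted_score(query_vec, field_vecs, weights)
```

Raising `Product_Name`'s weight makes a strong name match dominate the final score, while a field that is unrelated to the query (here `Category`) contributes nothing.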


In [None]:
# Define the embedding function once for reuse
# .using() fixes the model and input_type parameters, creating a specialized embedding function
embed_fn = voyageai.embeddings.using(model='voyage-3.5', input_type='document')

# Add embedding indexes for each searchable text column
products.add_embedding_index('Product_Name', embedding=embed_fn)
products.add_embedding_index('Category', embedding=embed_fn)
products.add_embedding_index('About_Product', embedding=embed_fn)


In [None]:
# View the table structure - note the embedding indexes
products

table 'ecommerce_search.products'

Column Name,Type,Computed With
Uniq_Id,String,
Product_Name,String,
Category,String,
Upc_Ean_Code,String,
Selling_Price,String,
Model_Number,String,
About_Product,String,
Product_Specification,String,
Technical_Details,String,
Shipping_Weight,String,

Index Name,Column,Metric,Embedding
idx15,Product_Name,cosine,"embeddings(Product_Name, model='voyage-3.5', input_type='document', truncation=None, output_dimension=None, output_dtype=None)"
idx16,Category,cosine,"embeddings(Category, model='voyage-3.5', input_type='document', truncation=None, output_dimension=None, output_dtype=None)"
idx17,About_Product,cosine,"embeddings(About_Product, model='voyage-3.5', input_type='document', truncation=None, output_dimension=None, output_dtype=None)"


## Semantic Product Search

With embedding indexes on multiple columns, we can now perform semantic searches. Let's create a search function that combines similarity scores from all three columns with configurable weights.


In [None]:
def search_products(query: str, limit: int = 5, 
                     name_weight: float = 0.4, 
                     category_weight: float = 0.2, 
                     description_weight: float = 0.4):
    """
    Search products using weighted similarity across multiple columns.
    
    Args:
        query: Search query
        limit: Number of results to return
        name_weight: Weight for product name similarity
        category_weight: Weight for category similarity  
        description_weight: Weight for description similarity
    """
    # Compute similarity for each column
    name_sim = products['Product_Name'].similarity(string=query)
    category_sim = products['Category'].similarity(string=query)
    description_sim = products['About_Product'].similarity(string=query)
    
    # Combine with weights
    combined_score = (
        name_weight * name_sim + 
        category_weight * category_sim + 
        description_weight * description_sim
    )
    
    return (
        products
        .order_by(combined_score, asc=False)
        .limit(limit)
        .select(
            products['Product_Name'],
            products['Category'],
            products['Selling_Price'],
            name_score=name_sim,
            category_score=category_sim,
            description_score=description_sim,
            combined_score=combined_score
        )
        .collect()
    )


Let's try some realistic e-commerce search scenarios. Notice how the combined score weighs the individual column similarities:


In [None]:
# Search 1: Natural language query
search_products("fun games for kids birthday party")


Product_Name,Category,Selling_Price,name_score,category_score,description_score,combined_score
Rubie's Suicide Squad Joker Teeth Adult Costume,,\$7.73,0.677,,0.684,
"EVAN-MOOR 4545 Skill Sharpeners Math Book, Grade 1, 0.5"" Height, 8.5"" Width, 11"" Length",Toys & Games | Learning & Education | Counting & Math Toys,\$12.64,0.618,0.818,,
Banpresto Love Live! Exq Figure Kotori Minami,,\$19.99,0.589,,0.607,
"amscan Pretty Potion Witch Halloween Costume for Girls, with Included Accessories","Clothing, Shoes & Jewelry | Costumes & Accessories | Kids & Baby | Girls | Costumes",\$51.12,0.697,0.749,,
"Vallejo Umber Wash, 17ml",,\$5.84,0.58,,0.485,


In [None]:
# Search 2: Adjust weights to prioritize product names over descriptions
search_products("educational toys", name_weight=0.6, category_weight=0.2, description_weight=0.2)

Product_Name,Category,Selling_Price,name_score,category_score,description_score,combined_score
Rubie's Suicide Squad Joker Teeth Adult Costume,,\$7.73,0.72,,0.71,
"EVAN-MOOR 4545 Skill Sharpeners Math Book, Grade 1, 0.5"" Height, 8.5"" Width, 11"" Length",Toys & Games | Learning & Education | Counting & Math Toys,\$12.64,0.699,0.887,,
Banpresto Love Live! Exq Figure Kotori Minami,,\$19.99,0.653,,0.653,
"amscan Pretty Potion Witch Halloween Costume for Girls, with Included Accessories","Clothing, Shoes & Jewelry | Costumes & Accessories | Kids & Baby | Girls | Costumes",\$51.12,0.689,0.775,,
"Vallejo Umber Wash, 17ml",,\$5.84,0.642,,0.593,

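In the outputs above, `combined_score` is empty wherever one of the per-column scores is missing. One possible way to handle this, sketched in plain Python rather than as a Pixeltable expression, is to average over the scores that are present and renormalize the weights. The `combine_scores` helper is hypothetical, and renormalizing is a design choice, not the behavior of the notebook's `search_products` function.

```python
def combine_scores(scores, weights):
    """Weighted average over the scores that are present (not None)."""
    present = {k: s for k, s in scores.items() if s is not None}
    total_weight = sum(weights[k] for k in present)
    if total_weight == 0:
        return None  # nothing to combine
    return sum(weights[k] * s for k, s in present.items()) / total_weight


weights = {"name": 0.4, "category": 0.2, "description": 0.4}

# Scores taken from the first result row of the first search above
row = {"name": 0.677, "category": None, "description": 0.684}
combined = combine_scores(row, weights)
```

An alternative is to treat a missing score as 0, which penalizes products with incomplete metadata instead of ignoring the gap.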

## Boost Relevance with Voyage AI Reranking

While semantic search is powerful, we can further improve result quality using Voyage AI's reranker. The two-stage retrieval pattern works like this:

1. **First stage**: Use embeddings to quickly retrieve a broad set of candidates (e.g., top 20)
2. **Second stage**: Use the reranker to precisely score and reorder results

This approach combines the speed of embedding search with the precision of cross-encoder reranking.
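
The two-stage pattern can be sketched independently of any particular model. In the example below, `word_overlap` and `phrase_bonus` are toy stand-ins (assumptions for illustration only) for the embedding similarity and the reranker; only the control flow reflects the pattern described above.

```python
def two_stage_search(query, docs, cheap_score, precise_score,
                     n_candidates=20, top_k=5):
    """Retrieve broadly with a cheap scorer, then reorder with a precise one."""
    # Stage 1: rank every document with the cheap scorer, keep a broad set
    candidates = sorted(docs, key=lambda d: cheap_score(query, d),
                        reverse=True)[:n_candidates]
    # Stage 2: re-score only the candidates with the expensive scorer
    return sorted(candidates, key=lambda d: precise_score(query, d),
                  reverse=True)[:top_k]


def word_overlap(query, doc):
    """Toy first-stage scorer: number of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def phrase_bonus(query, doc):
    """Toy second-stage scorer: word overlap plus a bonus for an exact phrase."""
    return word_overlap(query, doc) + (2 if query.lower() in doc.lower() else 0)


docs = [
    "kids toys for adults",
    "wooden toys for kids",
    "kids bike",
    "garden hose",
]
results = two_stage_search("toys for kids", docs, word_overlap, phrase_bonus,
                           n_candidates=3, top_k=2)
```

The first stage cannot tell the first two documents apart (both share three words with the query); the second stage promotes the one that actually contains the phrase.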


In [None]:
# Create a query function that retrieves candidates for reranking
# Uses combined similarity across all columns
@pxt.query
def get_candidates(query_text: str, n_candidates: int = 20):
    """Retrieve top candidates using combined embedding similarity."""
    name_sim = products['Product_Name'].similarity(string=query_text)
    category_sim = products['Category'].similarity(string=query_text)
    description_sim = products['About_Product'].similarity(string=query_text)
    combined = 0.4 * name_sim + 0.2 * category_sim + 0.4 * description_sim
    
    return (
        products
        .order_by(combined, asc=False)
        .limit(n_candidates)
        .select(
            products['Product_Name'],
            products['Selling_Price'],
            products['About_Product']
        )
    )


In [None]:
# Create a table to store search queries and their reranked results
searches = pxt.create_table(
    'ecommerce_search.searches',
    {'query': pxt.String}, if_exists='replace'
)

# Add computed column for candidates (retrieves top 15 from embedding search)
searches.add_computed_column(
    candidates=get_candidates(searches.query, n_candidates=15)
)

# Add computed column for reranked results using Voyage AI reranker
# Reranks based on product descriptions for more precise relevance
searches.add_computed_column(
    reranked=voyageai.rerank(
        searches.query,
        searches.candidates['About_Product'],
        model='rerank-2.5',
        top_k=5
    )
)


Created table 'searches'.
Added 0 column values with 0 errors in 0.01 s
Added 0 column values with 0 errors in 0.01 s


No rows affected.

In [None]:
# Test the reranking pipeline with a complex query
test_query = "durable toys for active toddlers"
searches.insert([{'query': test_query}])

Inserted 1 row with 0 errors in 0.33 s (3.03 rows/s)


1 row inserted.

In [None]:
# View the reranked results with relevance scores
searches.collect()


query,candidates,reranked
durable toys for active toddlers,"[{""Product_Name"": ""LAMO 5\"" Vinyl Figure - Legacy Gamers Dr Disrespect, Red"", ""About_Product"": ""Make sure this fits by entering your model number. | 5\u201d stylized Dr DisRespect figure comes with a unique, one-time-use QR card that unlocks the c ...... bridging the digital and the physical. | Collect, display and play with all LAMO vinyl figures. | LAMO is made by gamers, for gamers, with gamers."", ""Selling_Price"": ""\$19.50""}, {""Product_Name"": ""Disney Rapunzel Acrylic Key Ring, Multicolor"", ""About_Product"": ""Make sure this fits by entering your model number. | Rapunzel"", ""Selling_Price"": ""\$3.33""}, {""Product_Name"": ""Funko Pop! Games: Persona 5 - The Joker (Styles May Vary)"", ""About_Product"": ""Make sure this fits by entering your model number. | From persona 5, the Joker (styles may vary), as a stylized POP vinyl from Funko! | Stylized c ...... ny persona 5 fan! | There's a 1/6 chance you'll receive a rare, Chase version, of the Joker! | Collect and display all persona 5 items from Funko!"", ""Selling_Price"": ""\$15.45""}, {""Product_Name"": ""Weiler 14506 Nylox Cup Brush, 6\"", 0.40/80SC Crimped Fill, 5/8\""-11 UNC Nut"", ""About_Product"": null, ""Selling_Price"": ""\$73.58""}, {""Product_Name"": ""Melissa & Doug Annie Doll & Feeding Set Bundle"", ""About_Product"": null, ""Selling_Price"": ""\$24.99""}, {""Product_Name"": ""Pokemon TCG: Sun and Moon Crimson Invasion Elite Trainer Box"", ""About_Product"": ""Make sure this fits by entering your model number. | The Pok\u00e9mon TCG: Sun & Moon\u2014Crimson Invasion Elite Trainer Box includes: | 8 Pok\u00e9mon TCG: Sun ...... 
er's guide to the Sun & Moon\u2014Crimson Invasion expansion, A code card for the Pokemon Trading Card Game | Official Release Date: November 3rd, 2017"", ""Selling_Price"": ""\$38.49""}, {""Product_Name"": ""Vallejo Umber Wash, 17ml"", ""About_Product"": ""Used to reproducing the weathering of surfaces exposed to harsh climatic conditions | The washes are always needed to blend the edges of the color ...... been formulated so the superficial tension is similar to that of the traditional solvent-based washes | Packaging: Bottles of 17 ml. with flip top"", ""Selling_Price"": ""\$5.84""}, {""Product_Name"": ""Rubie's Suicide Squad Joker Teeth Adult Costume"", ""About_Product"": ""100% Other Fibers | Imported | No Closure closure | Hand Wash | Fun costumes for kids and adults"", ""Selling_Price"": ""\$7.73""}, {""Product_Name"": ""Huffy Kids Bikes 16 & 20 inch with Streamers and BMX Pegs"", ""About_Product"": null, ""Selling_Price"": ""\$74.99 - \$249.99""}, {""Product_Name"": ""Terra by Battat \u2013 4 Dinosaur Toys, Medium \u2013 Dinosaurs for Kids & Collectors, Scientifically Accurate & Designed by A Paleo-Artist; Age 3+ (4 Pc)"", ""About_Product"": ""Make sure this fits by entering your model number. | 4 medium-sized dinosaurs for kids, with lifelike pose, accurate ratio, and exquisitely detail ...... ckaging | Age: suggested for ages 3+ | Collect them all! Discover the entire Terra by Battat family of animal toy figurines and dinosaur playsets!"", ""Selling_Price"": ""\$18.66""}, {""Product_Name"": ""amscan Pretty Potion Witch Halloween Costume for Girls, with Included Accessories"", ""About_Product"": null, ""Selling_Price"": ""\$51.12""}, {""Product_Name"": ""Hoffmaster 120813 Double-Tipped Triangular Crayon, 88 mm Length, Wrapped (500 Packs of 2)"", ""About_Product"": null, ""Selling_Price"": ""\$97.68""}, {""Product_Name"": ""Banpresto Love Live! Exq Figure Kotori Minami"", ""About_Product"": ""Make sure this fits by entering your model number. 
| Officially licensed product | Base stand included | From the popular series love Live | Kotori | Anime"", ""Selling_Price"": ""\$19.99""}, {""Product_Name"": ""EVAN-MOOR 4545 Skill Sharpeners Math Book, Grade 1, 0.5\"" Height, 8.5\"" Width, 11\"" Length"", ""About_Product"": null, ""Selling_Price"": ""\$12.64""}, {""Product_Name"": ""Wild Republic Mermaid Toy, Slap Bracelet, Gifts for Kids, Purple, Huggers 12\"""", ""About_Product"": ""Make sure this fits by entering your model number. | Let your child be a part of the magical world of Mermaids, with this adorable slap bracelet. ...... When spread out and is made of a high quality material which contributes to the comfy feel. | Mermaids never go out of style and neither will you!"", ""Selling_Price"": ""\$7.99""}]",


## Compare Embedding Search vs. Reranked Results

Let's compare the quality of results before and after reranking to see the improvement:


In [None]:
comparison_query = "safe and educational baby toys"

# Insert the query for reranking
searches.insert([{'query': comparison_query}])

# Embedding search results (before reranking)
search_products(comparison_query, limit=5)


Inserted 1 row with 0 errors in 0.29 s (3.39 rows/s)


Product_Name,Category,Selling_Price,name_score,category_score,description_score,combined_score
Rubie's Suicide Squad Joker Teeth Adult Costume,,\$7.73,0.702,,0.727,
"EVAN-MOOR 4545 Skill Sharpeners Math Book, Grade 1, 0.5"" Height, 8.5"" Width, 11"" Length",Toys & Games | Learning & Education | Counting & Math Toys,\$12.64,0.646,0.854,,
Banpresto Love Live! Exq Figure Kotori Minami,,\$19.99,0.609,,0.637,
"amscan Pretty Potion Witch Halloween Costume for Girls, with Included Accessories","Clothing, Shoes & Jewelry | Costumes & Accessories | Kids & Baby | Girls | Costumes",\$51.12,0.683,0.778,,
"Vallejo Umber Wash, 17ml",,\$5.84,0.634,,0.583,


In [None]:
# Reranked results (after reranking with Voyage AI)
searches.select(
    searches.query,
    searches.reranked['results']
).where(searches.query == comparison_query).collect()


query,reranked_results
safe and educational baby toys,
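
Because the reranker only receives the `About_Product` text, its output has to be joined back to the candidate list to recover product names and prices. The sketch below assumes the `reranked` value is a dictionary with a `results` list whose entries carry an `index` into the input documents and a `relevance_score`; that shape is an assumption and should be checked against the actual column contents. The `attach_products` helper and the sample data are hypothetical.

```python
def attach_products(candidates, reranked):
    """Join reranker output back to candidate records via the 'index' field."""
    if not reranked:  # e.g. an empty reranked cell
        return []
    merged = []
    for r in reranked["results"]:
        product = candidates[r["index"]]  # position in the list sent to the reranker
        merged.append({
            "Product_Name": product["Product_Name"],
            "Selling_Price": product["Selling_Price"],
            "relevance_score": r["relevance_score"],
        })
    return merged


# Hypothetical candidates and reranker output
candidates = [
    {"Product_Name": "Product A", "Selling_Price": "$10.00", "About_Product": "text a"},
    {"Product_Name": "Product B", "Selling_Price": "$20.00", "About_Product": "text b"},
]
reranked = {"results": [
    {"index": 1, "relevance_score": 0.9},
    {"index": 0, "relevance_score": 0.4},
]}
merged = attach_products(candidates, reranked)
```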


## Incremental Updates: Adding New Products

One of Pixeltable's key strengths is handling incremental updates. When new products are added to the catalog, embeddings are computed automatically; there is no need to reprocess the entire dataset.
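
Conceptually, incremental indexing means the embedding function runs only for rows that have not been seen before. The toy class below illustrates that idea in plain Python; it is not how Pixeltable is implemented, and `IncrementalIndex` and `toy_embed` are hypothetical names.

```python
class IncrementalIndex:
    """Toy index that embeds each row at most once."""

    def __init__(self, embed):
        self.embed = embed
        self.vectors = {}  # row id -> embedding

    def insert(self, rows):
        """Embed only rows whose id is not indexed yet; return how many were added."""
        added = 0
        for row_id, text in rows.items():
            if row_id not in self.vectors:
                self.vectors[row_id] = self.embed(text)
                added += 1
        return added


calls = []  # records every text passed to the embedding function


def toy_embed(text):
    calls.append(text)
    return [float(len(text))]  # stand-in for a real embedding


index = IncrementalIndex(toy_embed)
index.insert({"p1": "longboard", "p2": "math book"})
added = index.insert({"p2": "math book", "new_001": "STEM building kit"})
```

The second insert triggers a single embedding call for `new_001`; the existing rows are left untouched.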


In [None]:
# Add new products - embeddings for all three indexes are computed automatically!
new_products = [
    {
        'Uniq_Id': 'new_001',
        'Product_Name': 'Ultimate STEM Building Kit - 500 Pieces',
        'Category': 'Toys & Games | Building Toys | Building Sets',
        'About_Product': 'Educational building set with 500 pieces for ages 6+. Includes gears, motors, and instruction booklet for 50 projects. Develops problem-solving and engineering skills.',
        'Selling_Price': '$49.99'
    },
    {
        'Uniq_Id': 'new_002',
        'Product_Name': 'Outdoor Adventure Binoculars for Kids',
        'Category': 'Toys & Games | Sports & Outdoor Play | Exploration Toys',
        'About_Product': 'Kid-friendly binoculars with 8x magnification, rubber grip, and neck strap. Perfect for bird watching, camping, and nature exploration. Shockproof design.',
        'Selling_Price': '$24.99'
    }
]

products.insert(new_products)


In [None]:
# Search should now find the new products
search_products("STEM toys for kids who like to build things")


## Working with Product Images

Since our dataset already has one image URL per row, we can easily convert URLs to actual images using Pixeltable's `Image` type. This enables image display, analysis, and similarity search.


In [None]:
# Add a computed column that converts the Image URL to an actual image
# Pixeltable will automatically download and cache images from URLs
# astype() casts the URL string column to an image column
products.add_computed_column(product_image=products.Image.astype(pxt.Image), if_exists='ignore')

In [None]:
# View sample products with their images
products.select(
    products.Product_Name,
    products.Selling_Price,
    products.image_idx,
    products.product_image
).limit(6).collect()


view 'ecommerce_search.product_images' (of 'ecommerce_search.products')

Column Name,Type,Computed With
pos,Required[Int],
image_idx,Required[Int],
image_url,Required[String],
Uniq_Id,String,
Product_Name,String,
Category,String,
Upc_Ean_Code,String,
Selling_Price,String,
Model_Number,String,
About_Product,String,


In [None]:
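# Count images per product using group_by
# Product_Name is part of the grouping key so it can be selected alongside the aggregate
products.group_by(products.Uniq_Id, products.Product_Name).select(
    products.Product_Name,
    image_count=products.image_idx.count()
).order_by(products.Product_Name).limit(10).collect()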


TypeError: Image.__init__() takes 1 positional argument but 2 were given

In [None]:
# Search by product name and show images
search_results = products.select(
    products.Product_Name,
    products.product_image
).where(products.Product_Name.contains('Longboard')).limit(4).collect()
search_results


## Export to MongoDB Atlas

Since Voyage AI is part of the MongoDB family, let's complete the integration story by exporting our enriched product data to [MongoDB Atlas](https://www.mongodb.com/atlas). This enables you to:

- Power your production applications with semantically searchable product data
- Use MongoDB Atlas Vector Search alongside Voyage AI embeddings
- Build real-time e-commerce experiences backed by MongoDB's scalable infrastructure

**Prerequisites:**
- A MongoDB Atlas account ([get one free](https://www.mongodb.com/cloud/atlas/register))
- A cluster with a database and collection ready for product data

In [None]:
%pip install -qU pymongo

In [None]:
# Skip interactive sections in CI environments
import os
SKIP_CLOUD_TESTS = os.environ.get('CI') or os.environ.get('GITHUB_ACTIONS')

if not SKIP_CLOUD_TESTS:
    # Enter your MongoDB Atlas credentials interactively
    mongodb_uri = getpass.getpass('MongoDB Atlas connection string (mongodb+srv://...): ')
    mongodb_database = input('Database name: ')
    mongodb_collection = input('Collection name (e.g., products): ')

In [None]:
if not SKIP_CLOUD_TESTS:
    from pymongo import MongoClient
    
    # Connect to MongoDB Atlas
    client = MongoClient(mongodb_uri)
    db = client[mongodb_database]
    collection = db[mongodb_collection]
    
    # Export product data from Pixeltable to MongoDB
    # Select the columns we want to export (excluding binary image data)
    export_data = products.select(
        products.Uniq_Id,
        products.Product_Name,
        products.Category,
        products.Selling_Price,
        products.About_Product,
        products.Image  # URL string, not the binary image
    ).collect().to_pandas()
    
    # Convert DataFrame to list of dictionaries for MongoDB
    documents = export_data.to_dict('records')
    
    # Insert into MongoDB (replace existing documents)
    if documents:
        # Clear existing data and insert fresh
        collection.delete_many({})
        result = collection.insert_many(documents)
        # An expression inside an if-block is not displayed by the notebook, so print it
        print(f"Exported {len(result.inserted_ids)} products to MongoDB Atlas")

In [None]:
if not SKIP_CLOUD_TESTS:
    # Verify the export by querying MongoDB
    sample_products = list(collection.find().limit(3))
    
    # Display sample products (excluding MongoDB's _id field for cleaner output)
    for product in sample_products:
        del product['_id']
    
    # An expression inside an if-block is not displayed by the notebook, so print it
    print(sample_products)

### Next Steps: MongoDB Atlas Vector Search

With your product data now in MongoDB Atlas, you can take advantage of [MongoDB Atlas Vector Search](https://www.mongodb.com/products/platform/atlas-vector-search) to enable semantic search directly in your production database:

1. **Create a Vector Search Index** on your collection
2. **Store Voyage AI embeddings** alongside your product data
3. **Query with semantic similarity** using MongoDB's `$vectorSearch` aggregation stage

This creates a powerful end-to-end pipeline:
- **Pixeltable** for data ingestion, transformation, and embedding generation
- **Voyage AI** for state-of-the-art embeddings and reranking
- **MongoDB Atlas** for scalable production storage and vector search
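
As a rough sketch of step 3, the aggregation pipeline for a `$vectorSearch` query can be built as a plain Python list. The index name `product_vector_index` and the `embedding` field are assumptions: the export cell above writes only text fields, so embeddings would have to be stored in the collection and indexed first. The query vector shown is a placeholder, not a real Voyage AI embedding.

```python
def build_vector_search_pipeline(query_vector,
                                 index_name="product_vector_index",  # hypothetical index name
                                 path="embedding",                   # hypothetical field name
                                 limit=5,
                                 num_candidates=100):
    """Build an aggregation pipeline that runs an Atlas Vector Search query."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # breadth of the approximate search
                "limit": limit,                   # number of documents returned
            }
        },
        {
            "$project": {
                "_id": 0,
                "Product_Name": 1,
                "Selling_Price": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Placeholder vector; a real query would use an embedding of the search text
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3])

# Running it requires an Atlas cluster with a matching vector index:
# results = list(collection.aggregate(pipeline))
```

The query vector must come from the same embedding model, with the same output dimension, as the vectors stored in the collection.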

## Summary

In this tutorial, we demonstrated how to build a production-ready semantic search system for e-commerce by combining:

### Pixeltable Capabilities
- **DataFrame Import**: Load data directly from pandas with automatic type mapping
- **Multi-Column Embedding Indexes**: Separate indexes for product name, category, and description
- **Weighted Search**: Combine similarity scores with custom weights per column
- **Image Handling**: Convert URLs to images with computed columns
- **Query Functions**: Reusable retrieval logic for complex pipelines

### Voyage AI Features
- **voyage-3.5**: Best-in-class embedding model for retrieval tasks
- **rerank-2.5**: High-precision reranker for improved relevance

### MongoDB Integration
- **Export to MongoDB Atlas**: Seamlessly move enriched product data to production
- **Atlas Vector Search Ready**: Enable semantic search in your MongoDB cluster

### Key Benefits
1. **Flexible Multi-Column Search**: Weight different product attributes based on query intent
2. **Two-Stage Retrieval**: Combine fast embedding search with precise reranking
3. **Multimodal Data**: Work with product images alongside text data
4. **Incremental Updates**: Add new products without reprocessing
5. **Production Ready**: Export directly to MongoDB Atlas for scalable deployments

This architecture scales from small catalogs to millions of products and adapts easily to other use cases like document search, support ticket routing, or recommendation systems.


## Learn More

**Pixeltable Resources**
- [Documentation](https://docs.pixeltable.com/)
- [RAG Operations Tutorial](https://docs.pixeltable.com/howto/use-cases/rag-operations)
- [Embedding Indexes Guide](https://docs.pixeltable.com/platform/embedding-indexes)

**Voyage AI Resources**
- [Voyage AI Documentation](https://docs.voyageai.com/)
- [Embedding Models Guide](https://docs.voyageai.com/docs/embeddings)
- [Reranker Guide](https://docs.voyageai.com/docs/reranker)

**MongoDB Resources**
- [MongoDB Atlas](https://www.mongodb.com/atlas) (free tier available)
- [Atlas Vector Search](https://www.mongodb.com/products/platform/atlas-vector-search)
- [Voyage AI + MongoDB Integration](https://www.mongodb.com/blog/post/voyage-ai-joins-mongodb-to-advance-ai-powered-applications)

**Get Started**
- [Sign up for Voyage AI](https://www.voyageai.com/) (free tier available)
- [Sign up for MongoDB Atlas](https://www.mongodb.com/cloud/atlas/register) (free tier available)
- [Install Pixeltable](https://github.com/pixeltable/pixeltable): `pip install pixeltable`