![Cover image](https://d1fiydes8a4qgo.cloudfront.net/blog/2025/january/1/linkedin_card.png)

In [1]:
import ast
import base64
from datetime import datetime
from dotenv import load_dotenv
import json
import os
from typing import Any, List, Optional

import boto3
from botocore.exceptions import NoCredentialsError
from IPython.display import display, Image, HTML

# LlamaIndex
from llama_index.core import Document
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex

from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.schema import QueryBundle
from llama_index.core.tools import FunctionTool

# LlamaIndex agents
from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner

# LlamaIndex LLMs
from llama_index.llms.openai import OpenAI as OpenAI_Llama

# LlamaIndex metadata filters
from llama_index.core.vector_stores.types import (
    MetadataFilters, FilterCondition
)

# LlamaIndex retrievers
from llama_index.core.retrievers import VectorIndexAutoRetriever, VectorIndexRetriever

# LlamaIndex vector stores
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core.vector_stores.types import VectorStoreQuery

# Pinecone
from pinecone import Pinecone, ServerlessSpec

import pandas as pd
from rapidfuzz import fuzz, process
from tqdm.autonotebook import tqdm

# Load shoe data
Let's start by reading the SoleMates shoe dataset. This dataset contains detailed product information, such as shoe colors and heel heights, which we'll transform into embeddings and store in a cloud-based Pinecone vector database.

In [2]:
# Load the SoleMates shoe dataset
df_shoes = pd.read_csv('../data/solemates_shoe_directory.csv')

# Convert 'color_details' from string representation of a list to an actual list
df_shoes['color_details'] = df_shoes['color_details'].apply(ast.literal_eval)

# Display the first few rows of the dataset
df_shoes.head()

| | product_title | gender | product_type | color | usage | color_details | heel_height | heel_type | price_usd | brand | product_id | image |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Puma men future cat remix sf black casual shoes | men | casual shoes | black | casual | [] | | | 220 | puma | 1 | 1.jpg |
| 1 | Buckaroo men flores black formal shoes | men | formal shoes | black | formal | [] | | | 155 | buckaroo | 2 | 2.jpg |
| 2 | Gas men europa white shoes | men | casual shoes | white | casual | [] | | | 105 | gas | 3 | 3.jpg |
| 3 | Nike men's incinerate msl white blue shoe | men | sports shoes | white | sports | [blue] | | | 125 | nike | 4 | 4.jpg |
| 4 | Clarks men hang work leather black formal shoes | men | formal shoes | black | formal | [] | | | 220 | clarks | 5 | 5.jpg |


# Visualize shoes

In [3]:
width = 100
images_html = ""
image_data_path = '../data/footwear'
for img_file in df_shoes.head()['image']:
    img_path = os.path.join(image_data_path, img_file)
    # Add each image as an HTML <img> tag
    images_html += f'<img src="{img_path}" style="width:{width}px; margin-right:10px;">'
# Display all images in a row using HTML
display(HTML(f'<div style="display: flex; align-items: center;">{images_html}</div>'))

# Cost of vectorization and pre-embedded dataset
Vectorizing datasets with AWS Bedrock and the Titan multimodal model involves costs based on the number of input tokens and images:

- **Text embeddings:** $0.0008 per 1,000 input tokens

- **Image embeddings:** $0.00006 per image

The provided SoleMates dataset is small, containing just **1306 pairs of shoes**, making it affordable to vectorize. For this dataset, I calculated the total cost of vectorization and summarized the token counts below:

- **Token Count:** 12746 tokens
- **Images:** 1306
- **Total Cost:** $0.0885568
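
The total follows directly from the two rates. Here is a minimal sketch of the arithmetic; the helper name `estimate_titan_cost` is illustrative, and the rates are the ones quoted above:

```python
# Rates quoted above for the Titan multimodal embedding model
TEXT_RATE_PER_1K_TOKENS = 0.0008   # USD per 1,000 input tokens
IMAGE_RATE_PER_IMAGE = 0.00006     # USD per image


def estimate_titan_cost(token_count: int, image_count: int) -> float:
    """Return the estimated vectorization cost in USD."""
    text_cost = token_count / 1000 * TEXT_RATE_PER_1K_TOKENS
    image_cost = image_count * IMAGE_RATE_PER_IMAGE
    return text_cost + image_cost


# Token and image counts reported for the SoleMates dataset
total_cost = estimate_titan_cost(token_count=12746, image_count=1306)
print(f"Total cost: ${total_cost:.7f}")
```

Most of the cost comes from the images (about $0.078), with the text contributing about $0.010.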

If you prefer not to generate embeddings yourself or don't have access to AWS, you can use a pre-embedded dataset that I've prepared as a CSV file. This file includes all embeddings and token counts, allowing you to follow the guide without incurring additional costs. However, for hands-on experience, I recommend running the embedding process to understand the workflow.

To load the pre-embedded dataset, use the following code:

```python
# Load pre-embedded dataset
df_shoes_with_embeddings = pd.read_csv('../data/solemates_shoe_directory_with_embeddings_token_count.csv')

# Convert string representations to actual lists
df_shoes_with_embeddings['titan_embedding'] = df_shoes_with_embeddings['titan_embedding'].apply(ast.literal_eval)
```

**This step is entirely optional and designed to accommodate various levels of access and resources.**
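
The `ast.literal_eval` conversion is needed because a CSV file stores each embedding list as plain text. A small standard-library sketch of that round trip, using a made-up three-value embedding rather than real Titan output:

```python
import ast
import csv
import io

# Write one row with a list-valued column to an in-memory CSV "file"
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["product_id", "titan_embedding"])
writer.writerow([1, [0.014, -0.0077, 0.004]])  # Hypothetical embedding values

# Read the row back: the list comes back as a string
buffer.seek(0)
row = next(csv.DictReader(buffer))
raw_value = row["titan_embedding"]
print(type(raw_value).__name__)

# literal_eval safely parses the string back into a Python list
embedding = ast.literal_eval(raw_value)
print(type(embedding).__name__, len(embedding))
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for this kind of parsing.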

# Set up AWS Bedrock client

You'll need access to Amazon Bedrock foundation models.

### What is AWS Bedrock?
Amazon Bedrock is a fully managed service offering high-performing foundation models (FMs) for building generative AI applications.

Bedrock is serverless and offers multiple foundation models to choose from.

In [4]:
# Define your AWS profile
# The profile name is read from the AWS_PROFILE environment variable
# If the variable is unset, 'aws_profile' is None and your default AWS profile is used
aws_profile = os.environ.get('AWS_PROFILE')

# Specify the AWS region where Bedrock is available
aws_region_name = "us-east-1"

try:
    # Set the default session for the specified profile
    if aws_profile:
        boto3.setup_default_session(profile_name=aws_profile)
    else:
        boto3.setup_default_session()  # Use default AWS profile if none is specified

    # Initialize the Bedrock runtime client
    bedrock_runtime = boto3.client(
        service_name="bedrock-runtime",
        region_name=aws_region_name
    )
except NoCredentialsError:
    print("AWS credentials not found. Please configure your AWS profile.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

# Generate embeddings for product data
To prepare our product data for the vector database, we'll generate embeddings for each product using AWS Titan. These embeddings combine image and text data to represent each product in a format suitable for search and recommendation systems.

Before generating embeddings, we'll initialize two new columns in the dataset:

- **titan_embedding:** To store the embedding vectors
- **token_count:** To store the token count for each product title

Then, we'll define a function to generate embeddings and apply it to the dataset.

# Initialize columns for embeddings

In [5]:
# Initialize columns to store embeddings and token counts
df_shoes['titan_embedding'] = None  # Placeholder for embedding vectors
df_shoes['token_count'] = None  # Placeholder for token counts

# Define function for generating embeddings

In [6]:
# Main function to generate image and text embeddings
def generate_embeddings(df, image_col='image', text_col='product_title', embedding_col='embedding', image_folder=None):

    if image_folder is None:
        raise ValueError("You must specify an image folder path.")

    for index, row in tqdm(df.iterrows(), total=df.shape[0], desc="Generating embeddings"):
        try:
            # Prepare image file as base64
            image_path = os.path.join(image_folder, row[image_col])
            with open(image_path, 'rb') as img_file:
                image_base64 = base64.b64encode(img_file.read()).decode('utf-8')

            # Create input data for the model
            input_data = {"inputImage": image_base64, "inputText": row[text_col]}

            # Invoke AWS Titan model via Bedrock runtime
            response = bedrock_runtime.invoke_model(
                body=json.dumps(input_data),
                modelId="amazon.titan-embed-image-v1",
                accept="application/json",
                contentType="application/json"
            )
            response_body = json.loads(response.get("body").read())

            # Extract embedding and token count from response
            embedding = response_body.get("embedding")
            token_count = response_body.get("inputTextTokenCount")

            # Validate and save the embedding
            if isinstance(embedding, list):
                df.at[index, embedding_col] = embedding  # Save embedding as a list
                df.at[index, 'token_count'] = int(token_count)  # Save token count as an integer
            else:
                raise ValueError("Embedding is not a list as expected.")

        except Exception as e:
            print(f"Error for row {index}: {e}")
            df.at[index, embedding_col] = None  # Handle errors gracefully

    return df

# Generate embeddings

In [7]:
# Generate embeddings for the product data
df_shoes = generate_embeddings(
    df=df_shoes, 
    embedding_col='titan_embedding', 
    image_folder='../data/footwear'
)

Generating embeddings:   0%|          | 0/1306 [00:00<?, ?it/s]

# Save dataset for reuse

In [8]:
# Get today's date in YYYY_MM_DD format
today = datetime.now().strftime('%Y_%m_%d')

# Save the dataset with generated embeddings to a CSV file
df_shoes.to_csv(f'shoes_with_embeddings_token_{today}.csv', index=False)
print(f"Dataset with embeddings saved as 'shoes_with_embeddings_token_{today}.csv'")

Dataset with embeddings saved as 'shoes_with_embeddings_token_2025_01_29.csv'


# Create a dictionary with product data

Before we create LlamaIndex Document objects, we need to structure the product data into dictionaries. These dictionaries include:

1. **Text:** The product title that will be used for embedding queries
2. **Metadata:** A dictionary containing detailed attributes for each product (e.g., color, gender, usage, price)
3. **Embedding:** The Titan embeddings generated earlier

This dictionary format ensures the data is well-organized for creating Document objects in the next step.

In [9]:
# Convert DataFrame rows into a list of dictionaries for LlamaIndex
product_data = df_shoes.apply(lambda row: {
    'text': row['product_title'],
    'metadata': {
        'color': row['color'],
        'text': row['product_title'],
        'gender': row['gender'],
        'product_type': row['product_type'],
        'usage': row['usage'],
        'price': row['price_usd'],
        'product_id': row['product_id'],
        'brand': row['brand'],
        **({'heel_height': float(row['heel_height'])} if not pd.isna(row['heel_height']) else {}),
        **({'heel_type': row['heel_type']} if not pd.isna(row['heel_type']) else {}),
        **({'color_details': row['color_details']} if row['color_details'] else {})
    },
    'embedding': row['titan_embedding']
}, axis=1).tolist()

# Preview the first product dictionary
#product_data[0]
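
Because `generate_embeddings` leaves the embedding as `None` for any row whose request failed, it can help to check the dictionaries before going further. The following is a sketch under assumptions: the helper `split_by_valid_embedding` is hypothetical, the toy data stands in for `product_data`, and 1024 is the Titan multimodal embedding size this guide uses for the Pinecone index.

```python
from typing import Any, Dict, List, Tuple

EXPECTED_DIMENSION = 1024  # Embedding size used for the Pinecone index in this guide


def split_by_valid_embedding(
    products: List[Dict[str, Any]],
    expected_dimension: int = EXPECTED_DIMENSION,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate products with a usable embedding from those without one."""
    valid, invalid = [], []
    for product in products:
        embedding = product.get("embedding")
        if isinstance(embedding, list) and len(embedding) == expected_dimension:
            valid.append(product)
        else:
            invalid.append(product)
    return valid, invalid


# Toy data standing in for product_data
sample_products = [
    {"text": "shoe with embedding", "embedding": [0.0] * 1024},
    {"text": "shoe whose request failed", "embedding": None},
]
valid_products, invalid_products = split_by_valid_embedding(sample_products)
print(f"{len(valid_products)} valid, {len(invalid_products)} invalid")
```

In the notebook, the same check could be run on `product_data` so that only products with a complete embedding are turned into Documents.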

# Create LlamaIndex Documents
We'll now use the product data dictionaries to create LlamaIndex `Document` objects.

These `Documents` are crucial because:

- They act as containers for our product data and embeddings.
- They enable seamless interaction with Pinecone for upserting embeddings.

Each `Document` includes:

1. The **text** (product_title) for embedding and query purposes
2. **Metadata** with attributes like color, gender, and price
3. The AWS Titan multimodal **embedding** generated earlier
4. An **exclusion list** (excluded_embed_metadata_keys) to prevent unnecessary metadata fields from being embedded, ensuring optimal performance and cost-efficiency

In [10]:
# Create LlamaIndex Document objects
documents = []
for doc in product_data:
    documents.append(
        Document(
            text=doc["text"],
            extra_info=doc["metadata"],
            embedding=doc['embedding'],

            # Avoid embedding unnecessary metadata
            excluded_embed_metadata_keys=[
                'color',
                'gender',
                'product_type',
                'usage',
                'text',
                'price',
                'product_id',
                'brand',
                'heel_height',
                'heel_type',
                'color_details'
            ]
        )
    )

# Confirm the first Document object
documents[0].metadata

{'color': 'black',
 'text': 'Puma men future cat remix sf black casual shoes',
 'gender': 'men',
 'product_type': 'casual shoes',
 'usage': 'casual',
 'price': 220,
 'product_id': 1,
 'brand': 'puma'}

# Add local environment variables

In [62]:
# Specify the path to the .env file
dotenv_path = './.env' 

# Load the .env file
load_dotenv(dotenv_path=dotenv_path, override=True)

# Access the PINECONE_API_KEY
pinecone_api_key = os.getenv('PINECONE_API_KEY')

# Print the key (optional, for testing purposes)
print(f"Your Pinecone API Key is: {pinecone_api_key[:8]}{'*' * (len(pinecone_api_key) - 8)}")

Your Pinecone API Key is: pcsk_4rG*******************************************************************
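
The masking expression above raises a `TypeError` when the key is missing, because `os.getenv` returns `None` in that case. A slightly more defensive variant could look like this; the helper name `mask_secret` and the example key are made up for illustration:

```python
from typing import Optional


def mask_secret(secret: Optional[str], visible: int = 8) -> str:
    """Show the first `visible` characters and mask the rest with asterisks."""
    if not secret:
        return "<not set>"
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Hypothetical key value, not a real credential
print(f"Your Pinecone API Key is: {mask_secret('pcsk_example_key_value')}")
print(f"Missing key: {mask_secret(None)}")
```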


# Initialize Pinecone
To interact with Pinecone, you'll first need an account and API keys. If you don't already have them, [create a Pinecone account ↗](https://www.pinecone.io/) and retrieve your API key.

Pinecone is a vector database designed to store and query embeddings. We'll use Pinecone to upsert the AWS Titan embeddings we generated earlier, enabling efficient similarity and hybrid search.

In [28]:
# Initialize Pinecone client with API key
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))
index_name = "solemates"  # Replace with your desired index name

# List current indexes
Let's list the existing indexes in your Pinecone account to ensure no duplicates before creating a new index.

In [29]:
# List current indexes
pc.list_indexes()

{'indexes': []}

# Create Index
Next, we'll create a Pinecone index. An index stores the embeddings and metadata for your data.

- **Dimension:** Matches the size of the embeddings we're using (1024 for AWS Titan multimodal embeddings)
- **Metric:** Defines how similarity is calculated (e.g., dot product, cosine similarity)
- **ServerlessSpec:** Specifies the cloud provider and region for your index

If the index already exists, this step is skipped.

In [30]:
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name, # The index name you picked earlier
        dimension=1024,  # Matches the 1024-dimensional Titan multimodal embeddings generated earlier
        metric="dotproduct",  # Required for hybrid search with Pinecone
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )
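
The two metrics mentioned above differ only in whether vector length matters: cosine similarity divides the dot product by the lengths of both vectors. A plain-Python sketch with made-up two-dimensional vectors (not Titan output) shows the relationship:

```python
import math
from typing import List


def dot_product(a: List[float], b: List[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product divided by the product of the vector lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot_product(v, v))
    return [x / norm for x in v]


a = [3.0, 4.0]
b = [4.0, 3.0]

print("dot product:", dot_product(a, b))
print("cosine similarity:", cosine_similarity(a, b))
# For unit-length vectors the two metrics coincide
print("dot product of normalized vectors:", dot_product(normalize(a), normalize(b)))
```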

## Inspect Pinecone
Navigate to your Pinecone dashboard, and you should see your new index with 0 records (vectors), as we haven't populated it with our vectors yet:

![Check your Pinecone index](https://norahsakal.com/assets/images/1_vector_db_empty-0ace534dc05ce707ac99c961c06c4d50.png)

# Initialize Pinecone Index
After creating the index, we'll initialize it for further operations like upserting embeddings and querying vectors.

In [31]:
pinecone_index = pc.Index(index_name)

# Create Pinecone Vector Store
We'll now set up a **Pinecone Vector Store** using LlamaIndex.

This vector store connects our Pinecone index with the LlamaIndex framework.

Key configuration details:

1. **Namespace:** A logical grouping within the index, allowing future addition of other product types
2. **Hybrid Search:** Enabling both semantic and keyword search by adding sparse vectors

For more information:

- [Pinecone Namespaces Guide ↗](https://docs.pinecone.io/guides/indexes/use-namespaces)
- [Hybrid Search Introduction ↗](https://www.pinecone.io/learn/hybrid-search-intro/)

In [32]:
vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index,
    namespace='footwear',  # Logical namespace for shoe data
    add_sparse_vector=True  # Enables hybrid search
)

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


# Create an ingestion pipeline
We'll create an **Ingestion Pipeline** to upsert our vectors into the Pinecone index. No transformations are required since we've pre-generated embeddings with AWS Titan.

>**Note:** As of January 29, 2025, LlamaIndex doesn't abstract AWS Titan multimodal embeddings, so we're using our own vectors directly.

In [33]:
pipeline = IngestionPipeline(
    transformations=[],  # No transformations since we pre-generated our embeddings
    vector_store=vector_store
)

# Run the ingestion pipeline

In [None]:
# Run the pipeline to upsert embeddings into Pinecone
pipeline.run(documents=documents, show_progress=True)

# Inspect your Pinecone index

Now that we've upserted the vectors, navigate back to Pinecone. You should see **1306** records in your index, corresponding to the embeddings we added:

![Navigate back to your Pinecone, you should see 1306 records](https://norahsakal.com/assets/images/2_vector_db_with_records-af929a625f221f9e8db01080703422f3.png)

# Query the Vector Database Directly (Without Query Engine or Chat Engine)
Before we use a **Query Engine** or **Chat Engine** to interact with the vector database, we'll start with a direct query using a simple retriever.

This approach demonstrates how you can fetch relevant records from the database without involving advanced reasoning, natural language understanding, or conversation tracking. It's a fundamental way to confirm that the embeddings and metadata are stored correctly and the vector database is functioning as expected.

Next, we'll move on to more advanced querying techniques, including using a **Query Engine** and an **AI Agent** to leverage the power of LLMs.

The first step is creating a simple retriever, but first, we need to define a custom embedding function.

As of **January 29, 2025, LlamaIndex does not abstract AWS Titan multimodal embeddings**, so we'll implement a custom class for this purpose.

# Define helper functions

## Define function to request AWS Titan embeddings
We'll define a helper function to request embeddings from AWS Titan's multimodal model. This function will handle both text and image inputs.

In [37]:
def request_embedding(image_base64=None, text_description=None):
    """
    Request embeddings from AWS Titan multimodal model.

    Parameters:
        image_base64 (str, optional): Base64 encoded image string.
        text_description (str, optional): Text description.

    Returns:
        list: Embedding vector.
    """
    input_data = {"inputImage": image_base64, "inputText": text_description}
    body = json.dumps(input_data)

    # Invoke the Titan multimodal model
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response.get("body").read())

    if response_body.get("message"):
        raise ValueError(f"Embeddings generation error: {response_body.get('message')}")

    return response_body.get("embedding")
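
As written, `request_embedding` sends `null` for whichever input is missing. A possible refinement, not part of the original notebook, is to build the request body only from the inputs that were actually provided. The helper name `build_titan_payload` is hypothetical; the field names `inputImage` and `inputText` are the ones used above. Whether Bedrock treats a `null` field and an absent field identically is not verified here, so the sketch simply avoids the question:

```python
import json
from typing import Optional


def build_titan_payload(
    image_base64: Optional[str] = None,
    text_description: Optional[str] = None,
) -> str:
    """Return a JSON request body containing only the inputs that were given."""
    if image_base64 is None and text_description is None:
        raise ValueError("Provide an image, a text description, or both.")

    payload = {}
    if image_base64 is not None:
        payload["inputImage"] = image_base64
    if text_description is not None:
        payload["inputText"] = text_description
    return json.dumps(payload)


# A text-only query, as used by the retriever later in this guide
print(build_titan_payload(text_description="red shoes"))
```

The returned string could be passed as the `body` argument of `invoke_model` in place of `json.dumps(input_data)`.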

## Define custom embedding class
We'll now define a custom embedding class that uses the AWS Titan multimodal model to fetch embeddings.

This class overrides key methods in LlamaIndex's BaseEmbedding to integrate AWS Titan into the framework.

In [38]:
class MultimodalEmbeddings(BaseEmbedding):
    """
    Custom embedding class for AWS Titan multimodal embeddings.
    """

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)

    @classmethod
    def class_name(cls) -> str:
        return "multimodal"

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        """
        Get embeddings for a query string.
        """
        return request_embedding(text_description=query)

    def _get_text_embedding(self, text: str) -> List[float]:
        """
        Get embeddings for a text string.
        """
        return request_embedding(text_description=text)

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        """
        Get embeddings for a batch of text strings.
        """
        return [request_embedding(text_description=text) for text in texts]

## Instantiate the Custom Class
We'll now instantiate the MultimodalEmbeddings class to use it in our retriever.

In [39]:
# Instantiate the custom embedding model
embed_model = MultimodalEmbeddings()

## Create a Vector Store Index
We'll start by creating a **Vector Store Index** with `LlamaIndex`. This index will allow us to query the Pinecone index using the same vector store we initialized earlier.

In [42]:
# Create a Vector Store Index
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

## Create a simple retriever
We'll create a simple retriever using the custom embedding model and the vector index.

Key configurations:

1. **similarity_top_k:** Number of top results to retrieve
2. **vector_store_query_mode:** Set to **"hybrid"** for combining semantic and keyword search
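3. **alpha:** Weighting between semantic (embedding) and keyword search, from 0 to 1. A value of 1.0 uses only the dense embedding, 0.0 uses only the sparse keyword vector, and 0.5 weights both equally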

In [43]:
# Create a simple retriever
retriever = VectorIndexRetriever(
    index=vector_index,
    embed_model=embed_model,
    similarity_top_k=8,  # Retrieve the top 8 results
    vector_store_query_mode="hybrid",  # Enable hybrid search
    alpha=0.5  # Weighting between semantic and keyword search
)
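
To make the `alpha` parameter more concrete, here is a sketch of the convex-combination weighting commonly used for dense/sparse hybrid search: the dense vector is scaled by `alpha` and the sparse vector by `1 - alpha`. The function name and the toy vectors are assumptions for illustration, and the vector store may apply the weighting differently in detail:

```python
from typing import Dict, List, Tuple


def weight_hybrid_query(
    dense: List[float],
    sparse: Dict[int, float],
    alpha: float,
) -> Tuple[List[float], Dict[int, float]]:
    """Scale dense values by alpha and sparse values by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [value * alpha for value in dense]
    weighted_sparse = {index: value * (1 - alpha) for index, value in sparse.items()}
    return weighted_dense, weighted_sparse


# Toy query vectors (made up): a 2-dimensional dense vector and two sparse terms
dense_query = [0.2, 0.4]
sparse_query = {10: 1.0, 42: 0.5}

balanced_dense, balanced_sparse = weight_hybrid_query(dense_query, sparse_query, alpha=0.5)
print(balanced_dense, balanced_sparse)

# alpha=1.0 removes the keyword contribution entirely
semantic_dense, semantic_sparse = weight_hybrid_query(dense_query, sparse_query, alpha=1.0)
print(semantic_dense, semantic_sparse)
```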

# Query Vector Database Directly Using a Simple Retriever
We'll use a simple retriever to query the vector database and inspect the results. This method interacts with the embeddings and metadata in a straightforward way, without utilizing an LLM-powered **Query Engine** or **Chat Engine**.

**Why this step matters:**

1. Validates that the vector database is populated correctly
2. Shows how to query embeddings directly, bypassing the overhead of LLM-based reasoning
3. Prepares the groundwork for building advanced workflows with Query Engines and Agents

In the next steps, we'll extend this retriever to integrate with an LLM-powered Query Engine for richer responses.

In [44]:
# Query the vector store for "red shoes"
results = retriever.retrieve("red shoes")

# Display results
for item in results:
    score = item.score
    print(f"Score: {score:.4f}")
    print(f"Text: {item.get_content()}")
    print("-" * 50)

Score: 2.2923
Text: Id men red shoes
--------------------------------------------------
Score: 2.2458
Text: Arrow men red shoes
--------------------------------------------------
Score: 2.2390
Text: Catwalk women red shoes
--------------------------------------------------
Score: 2.2381
Text: Vans men red old skool shoes
--------------------------------------------------
Score: 2.2366
Text: Cobblerz women red shoes
--------------------------------------------------
Score: 2.2335
Text: Converse men black & red shoes
--------------------------------------------------
Score: 2.2253
Text: Converse men red casual shoes
--------------------------------------------------
Score: 2.2251
Text: Fila men leonard red shoes
--------------------------------------------------


## Visualize Vector Database Pull
The vector database query returns a list of red shoes based on the embeddings. To verify the results, let's visualize the pulled vectors.

We'll create a function that loops through the retrieved nodes and displays each image along with its metadata in a row for easy inspection.

In [45]:
def display_nodes_with_images_in_row(vector_database_response_nodes, image_folder_path=None, img_width=100):
    html_content = "<div style='display: flex; flex-wrap: wrap; gap: 20px;'>"

    if image_folder_path is None:
        raise ValueError("You must specify an image folder path.")

    for node in vector_database_response_nodes:
        # Retrieve text and product_id from node metadata
        text = node.metadata.get('text')
        product_id = node.metadata.get('product_id')

        # Generate image path based on product_id
        image_path = os.path.join(image_folder_path, f"{product_id}.jpg")

        if os.path.exists(image_path):
            # Add each text and image in a flex container
            html_content += f"""
                <div style="text-align: center;">
                    <p>{text}</p>
                    <img src='{image_path}' width='{img_width}px' style="padding: 5px;"/>
                </div>
            """
        else:
            # Handle missing images gracefully
            html_content += f"""
                <div style="text-align: center;">
                    <p>{text}</p>
                    <p style='color: red;'>Image not found for product_id {product_id}</p>
                </div>
            """

    # Close the main div
    html_content += "</div>"

    # Display the content as HTML
    display(HTML(html_content))

# Visualize Shoes
Let's visualize the shoes retrieved from the vector database to confirm that the results match the query for "red shoes."

In [46]:
display_nodes_with_images_in_row(results, image_folder_path='../data/footwear')

# Examine the shoes
As shown, all the retrieved shoes are red or have red details, confirming that the naive vector index query works well for focused queries like red shoes:

![As shown, all the retrieved shoes are red or have red details](https://norahsakal.com/assets/images/3_vector_query_pull-fdb72ce1fbd5b2651ba3dab057816d97.png)

## Why not keep using the index alone?
But the vector database pull of red shoes looks great, so why even involve an LLM?

Before we add an LLM, let's try to query the vector database again, but this time with something unrelated, like a **"Thank you!"** from the customer and examine the new response:

![Let's try to query our vector database with something unrelated, like "Thank you!"](https://norahsakal.com/assets/images/4_unrelated_reply-4aff8ef88d4d0cfc39285004aab64960.png)


# Test an unrelated query
Let's run the previous code snippet again, but this time, use the query **"Thank you!"**

In [49]:
# Query the vector store with an unrelated query
results = retriever.retrieve("Thank you!") # Try an unrelated query

# Display results
for item in results:
    score = item.score
    print(f"Score: {score:.4f}")
    print(f"Text: {item.get_content()}")
    print("-" * 50)

Score: 1.1846
Text: Timberland women femmes brown boot
--------------------------------------------------
Score: 1.1839
Text: Adidas women color can pink shoes
--------------------------------------------------
Score: 1.1784
Text: Nike men's air max black shoe
--------------------------------------------------
Score: 1.1750
Text: Id men red shoes
--------------------------------------------------
Score: 1.1746
Text: Adidas originals women superstar 2 white casual shoes
--------------------------------------------------
Score: 1.1744
Text: Timberland women femmes brown casual shoes
--------------------------------------------------
Score: 1.1732
Text: Nike women's transform iii in black pink shoe
--------------------------------------------------
Score: 1.1726
Text: Vans men khaki shoes
--------------------------------------------------


# Visualize pulled shoes

In [50]:
display_nodes_with_images_in_row(results, image_folder_path='../data/footwear')

# Limitations of a naive RAG system
As seen, regardless of the query, the vector database matches the closest vectors based on the embeddings.

In this case, querying with **"Thank you!"** still returns shoes that have vectors most similar to the vectorized **"Thank you!"**:

![Regardless of the query, the vector database matches the closest vectors based on the embeddings](https://norahsakal.com/assets/images/5_unrelated_pull-71472284ac5d2f808254e648e7eaa7bd.png)
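
This behavior follows from how top-k retrieval works: results are ranked by score, and the best `k` are returned even when every score is low. A toy sketch with invented two-dimensional vectors (not the real Titan embeddings) illustrates it:

```python
from typing import Dict, List, Tuple


def top_k_by_dot_product(
    query: List[float],
    catalog: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank every catalog item by dot product and return the k best."""
    scored = [
        (title, sum(q * v for q, v in zip(query, vector)))
        for title, vector in catalog.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Invented vectors standing in for product embeddings
toy_catalog = {
    "red shoes": [0.9, 0.1],
    "black boots": [0.1, 0.9],
    "white sneakers": [0.5, 0.5],
}

related_hits = top_k_by_dot_product([1.0, 0.0], toy_catalog, k=2)
unrelated_hits = top_k_by_dot_product([0.05, 0.04], toy_catalog, k=2)

# Both queries return k results; only the scores differ
print(related_hits)
print(unrelated_hits)
```

The same pattern shows up in the outputs above: the "Thank you!" scores (around 1.18) are much lower than the "red shoes" scores (around 2.29), yet eight shoes are returned in both cases.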

# Create query engine
To overcome the limitations of naive queries, we'll integrate an LLM into our workflow by creating a Query Engine.

This Query Engine will:

1. Interpret the user's natural language input
2. Retrieve contextually relevant information from the vector database
3. Enable more dynamic and flexible interactions with the data

For this guide, we'll use OpenAI's `gpt-4o` model as the LLM, but LlamaIndex supports various other LLMs.

- [Supported LLMs ↗](https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules/)

## OpenAI API keys
For this next step, you'll need to add your OpenAI API key.

Navigate to your [OpenAI dashboard ↗](https://platform.openai.com/api-keys) and generate a new API key:

![Navigate to your OpenAI dashboard and generate API keys](https://d1fiydes8a4qgo.cloudfront.net/blog/2025/1_generate_api_keys.png)

# Read OpenAI API key

In [63]:
# Specify the path to the .env file
dotenv_path = './.env' 

# Load the .env file
load_dotenv(dotenv_path=dotenv_path, override=True)

# Access the Pinecone and OpenAI API keys
pinecone_api_key = os.getenv('PINECONE_API_KEY')
openai_api_key = os.getenv('OPENAI_API_KEY') # Newly added OpenAI API key

# Print the key (optional, for testing purposes)
print(f"Your Pinecone API Key is: {pinecone_api_key[:8]}{'*' * (len(pinecone_api_key) - 8)}")
print(f"Your OpenAI API Key is: {openai_api_key[:8]}{'*' * (len(openai_api_key) - 8)}")

Your Pinecone API Key is: pcsk_4rG*******************************************************************
Your OpenAI API Key is: sk-9wvMO*******************************************


# Initialize LLM

In [66]:
# Initialize LLM
llm = OpenAI_Llama(
    temperature=0.0,
    model="gpt-4o",
    api_key=os.getenv('OPENAI_API_KEY')
)
Settings.llm = llm

# Create Query Engine
We'll now create a Query Engine using the vector index and our custom embedding model. This engine will leverage the LLM for intelligent query interpretation and responses.

In [68]:
# Create a query engine from the vector index
query_engine = vector_index.as_query_engine(
    embed_model=embed_model,
    similarity_top_k=8,
    vector_store_query_mode="hybrid",
    alpha=0.5,
)

# Test query engine with today's challenge
Remember our challenge, a customer asking about black shoes with red details:

**Customer:** "I need women's black shoes with red details":

![Our challenge is a customer that asks about women's black shoes with red details](https://norahsakal.com/assets/images/1_user_inquiry-f2a38240854cbb44f5a6c2412c48d7a8.png)

In [69]:
# Query the engine for black shoes with red details
response = query_engine.query("I need women's black shoes with red details")

print("Chatbot response: ", response.response)

Chatbot response:  The Nike women's transform iii in black pink shoe fits your criteria, as it is a women's black shoe with pink details.


# Visualize pulled shoes

In [71]:
display_nodes_with_images_in_row(response.source_nodes, image_folder_path='../data/footwear')

# Query engine limitations
Looking at the results, the chatbot understands the context of **"women's black shoes with red details"**, but it recommends only one pair, and that pair has pink details rather than red.

The LLM correctly identifies that black shoes are requested, but the vector query it relies on fails to pull multiple black shoes with red details.

Looking at the pulled shoes, most of them are **men's** shoes:

In [72]:
display_nodes_with_images_in_row(response.source_nodes, image_folder_path='../data/footwear')

# Why did the naive chatbot fail?
The naive chatbot struggled with this query because it is designed to vectorize the **entire user query** and match it against the shoes in the vector database.

While this approach works well for straightforward searches, like the **"red shoes"** we tried earlier, it has limitations for multiple colors like "**black** shoes with **red** details":

![The LLM correctly identifies that black shoes are requested but it fails to pull multiple black shoes with red details](https://norahsakal.com/assets/images/1_query_engine_output-2fc6cb81cf312156bc459ce11240317d.png)
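
One way to picture the alternative is to treat the attributes in the request (gender, main color, detail color) as structured conditions rather than as part of a single embedded sentence. The sketch below applies such conditions to a few invented product dictionaries; the field names mirror the metadata keys used in this guide, while the data and the helper `matches_request` are hypothetical:

```python
from typing import Any, Dict, List

# Invented products using the same metadata keys as the Documents above
toy_products: List[Dict[str, Any]] = [
    {"text": "women black heels", "gender": "women", "color": "black", "color_details": ["red"]},
    {"text": "men black & red shoes", "gender": "men", "color": "black", "color_details": ["red"]},
    {"text": "women black pink shoe", "gender": "women", "color": "black", "color_details": ["pink"]},
    {"text": "women red shoes", "gender": "women", "color": "red"},  # No color_details key
]


def matches_request(product: Dict[str, Any], gender: str, color: str, detail_color: str) -> bool:
    """Check each attribute separately instead of comparing whole-sentence embeddings."""
    return (
        product.get("gender") == gender
        and product.get("color") == color
        and detail_color in product.get("color_details", [])
    )


filtered_titles = [
    product["text"]
    for product in toy_products
    if matches_request(product, gender="women", color="black", detail_color="red")
]
print(filtered_titles)
```

Only the product that satisfies all three conditions survives, which is the kind of precision a single whole-query embedding cannot guarantee.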

# Verify women's black shoes with red details in dataset

Although women's black shoes with red details exist in the dataset, the simple RAG system failed to retrieve them:

In [73]:
# Verify the dataset for women's black shoes with red details
black_red_shoe_filter = df_shoes[(df_shoes['gender'] == 'women') &
         (df_shoes['color'] == 'black') &
         (df_shoes['color_details'].apply(lambda x: 'red' in x))]
black_red_shoe_filter

Unnamed: 0,product_title,gender,product_type,color,usage,color_details,heel_height,heel_type,price_usd,brand,product_id,image,titan_embedding,token_count
705,Nike women double team lite black shoes,women,casual shoes,black,casual,[red],,,150,nike,706,706.jpg,"[0.01401766, -0.007690088, 0.004016977, -0.004...",11
912,Puma women saba ballet dc3 black casual shoes,women,casual shoes,black,casual,[red],,,170,puma,913,913.jpg,"[0.053020123, 0.0035624718, 0.017885217, -0.00...",13
958,Puma women black crazy slide flats,women,flats,black,casual,[red],,,130,puma,959,959.jpg,"[0.026023556, -0.015191399, -0.00014282111, -0...",9
996,Catwalk women black heels,women,heels,black,casual,[red],2.0,stiletto,70,catwalk,997,997.jpg,"[0.038116843, 0.0013890459, -0.036084555, 0.00...",7
1211,Adidas women court sequence black shoe,women,casual shoes,black,casual,[red],,,65,adidas,1212,1212.jpg,"[0.02072281, -0.00848748, -0.025447942, -0.055...",9
1285,Hm women black sandals,women,flats,black,casual,[red],,,125,hm,1286,1286.jpg,"[-0.0054102736, 0.009834683, -0.015783966, 0.0...",8


# Visualize shoes in database

In [74]:
width = 100
images_html = ""
for img_file in black_red_shoe_filter['image']:
    img_path = os.path.join(image_data_path, img_file)
    # Add each image as an HTML <img> tag
    images_html += f'<img src="{img_path}" style="width:{width}px; margin-right:10px;">'
# Display all images in a row using HTML
display(HTML(f'<div style="display: flex; align-items: center;">{images_html}</div>'))

# Build an AI agent

## Create vector info
We'll start by describing the metadata in the vector store we created, so the AI agent can filter results on shoe attributes such as:

- gender
- usage
- color
- color details
- price

This metadata will enhance the AI agent's ability to refine queries and pull relevant shoes.

## Define helper function

In [75]:
def generate_options_string(column, is_list_column=False):
    # Extract unique values
    if is_list_column:
        # For list columns, flatten and get unique values
        unique_values = set(item for sublist in column.dropna() for item in sublist)
    else:
        # For non-list columns, get unique values
        unique_values = set(column.dropna())

    # Sort values for consistency
    sorted_values = sorted(unique_values)

    # Handle the string formatting
    if not sorted_values:
        return "No values available"
    elif len(sorted_values) == 1:
        return f"'{sorted_values[0]}'"
    else:
        formatted_values = ", ".join(f"'{value}'" for value in sorted_values[:-1])
        formatted_string = f"Either {formatted_values} or '{sorted_values[-1]}'"
        print(formatted_string)
        return formatted_string

## Create vector store info strings

In [76]:
color_string = generate_options_string(df_shoes['color'])
color_details_string = generate_options_string(df_shoes['color_details'], is_list_column=True)
gender_string = generate_options_string(df_shoes['gender'])
usage_string = generate_options_string(df_shoes['usage'])

Either 'beige', 'black', 'blue', 'bronze', 'brown', 'charcoal', 'copper', 'coral', 'cream', 'gold', 'green', 'grey', 'khaki', 'lavender', 'maroon', 'metallic', 'multi', 'mushroom brown', 'mustard', 'navy blue', 'nude', 'off white', 'olive', 'orange', 'peach', 'pink', 'purple', 'red', 'silver', 'tan', 'taupe', 'teal', 'turquoise blue', 'white' or 'yellow'
Either 'beige', 'black', 'blue', 'bronze', 'brown', 'cream', 'gold', 'green', 'grey', 'maroon', 'metallic', 'multi', 'navy blue', 'off white', 'olive', 'orange', 'pink', 'purple', 'red', 'silver', 'tan', 'teal', 'white' or 'yellow'
Either 'men' or 'women'
Either 'boots', 'casual', 'formal', 'semi formal', 'smart casual' or 'sports'


## Create vector store info

In [77]:
# Create vector store information
vector_store_info = VectorStoreInfo(
    content_info="shoes in the shoe store",
    metadata_info=[
        MetadataInfo(
            name="gender",
            type="str",
            description=f"{gender_string}", # our string for gender
        ),
        MetadataInfo(
            name="usage",
            type="str",
            description=f"{usage_string}", # our string for usage
        ),
        MetadataInfo(
            name="color",
            type="str",
            description=f"{color_string}", # our string for color
        ),
        MetadataInfo(
            name="color_details",
            type="List[str]",
            description=f"A list of colors that are in a specified array, filter operator 'in', value has to be List[str] each color being one of: {color_details_string}",
        ),
        MetadataInfo(
            name="price",
            type="int",
            description="The price of the shoes in USD. Must be greater than or equal to 0.",
        ),
    ],
)

## Create Tools
Tools are essential for enabling the AI Agent to interact with the vector store.
We'll define two tools:

1. **create_metadata_filter:** Generates metadata filters for refining the search query
2. **search_footwear_database:** Searches the vector database using the query and optional filters

## Define agent metadata filter tool

In [78]:
# Define a tool to create metadata filters
def create_metadata_filter(filter_string):
    """
    Creates metadata filter JSON for vector database queries.

    Args:
        filter_string (str): Query string for generating metadata filters.

    Returns:
        str: JSON string of filters.
    """
    class CustomRetriever(VectorIndexAutoRetriever):
        def __init__(self, vector_index, vector_store_info, **kwargs):
            super().__init__(vector_index, vector_store_info, **kwargs)

        def _retrieve(self, query, **kwargs):
            query_bundle = QueryBundle(query_str=query)
            retrieval_spec = self.generate_retrieval_spec(query_bundle)
            return retrieval_spec

    # Separate LLM for generating a filter
    llm_filter = OpenAI_Llama(
        temperature=1, # higher temperature than 0 for creativity
        model="gpt-4o",
        api_key=os.environ["OPENAI_API_KEY"],
        system_prompt="You are a helpful assistant, help the user purchase shoes.",
    )

    custom_retriever = CustomRetriever(
        vector_index=vector_index,
        vector_store_info=vector_store_info,
        llm=llm_filter
    )

    retrieval_spec = custom_retriever._retrieve(filter_string)

    filters_dicts = [{'key': f.key, 'value': f.value, 'operator': f.operator.value} for f in retrieval_spec.filters]
    return json.dumps(filters_dicts)

## Define agent vector database search tool

In [79]:
# Define a tool to search the footwear vector database
def search_footwear_database(query_str, filters_json=None):
    """
    Searches the footwear vector database using a query string and optional filters.

    Args:
        query_str (str): Query string describing the footwear.
        filters_json (Optional[List]): JSON list of metadata filters.

    Returns:
        list: Search results from the vector database.
    """

    # Generate the embedding for the query string
    query_embedding = embed_model._get_query_embedding(query_str)

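    # Build metadata filters; accept either a JSON string or an already-parsed list of dicts
    if filters_json is None:
        metadata_filters = None
    else:
        if isinstance(filters_json, str):
            filters_json = json.loads(filters_json)
        metadata_filters = MetadataFilters.from_dicts(filters_json, condition=FilterCondition.AND)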


    vector_store_query = VectorStoreQuery(
        query_str=query_str,
        query_embedding=query_embedding,
        alpha=0.5,
        mode='hybrid',
        filters=metadata_filters,
        similarity_top_k=10
    )

    # Execute the query against the vector store
    query_result = vector_store.query(vector_store_query)

    # Create output without embeddings
    nodes_with_scores = []
    for index, node in enumerate(query_result.nodes):
        score: Optional[float] = None
        if query_result.similarities is not None:
            score = query_result.similarities[index]
        nodes_with_scores.append({
            'color': node.metadata['color'],
            'text': node.metadata['text'],
            'gender': node.metadata['gender'],
            'product_type': node.metadata['product_type'],
            'product_id': node.metadata['product_id'],
            'usage': node.metadata['usage'],
            'price': node.metadata['price'],
            'brand': node.metadata['brand'],
            'heel_height': node.metadata.get('heel_height'),  # Add heel_height if present
            'heel_type': node.metadata.get('heel_type'),  # Add heel_type if present
            'similarity_score': score
        })

    return nodes_with_scores

## Define agent tools

In [80]:
create_metadata_filters_tool = FunctionTool.from_defaults(
    name="create_metadata_filter",
    fn=create_metadata_filter
)

query_vector_database_tool = FunctionTool.from_defaults(
    name="search_footwear_database",
    fn=search_footwear_database
)

# Create AI Agent
We'll now define an AI Agent capable of reasoning over the data, generating filters, and performing refined searches to address customer queries more effectively.

## Create the agent worker

In [81]:
# Create the agent worker
agent_worker = FunctionCallingAgentWorker.from_tools(
    [
        create_metadata_filters_tool,
        query_vector_database_tool,
    ],
    llm=llm,
    verbose=True,
    system_prompt="""\
You are an agent designed to answer customers looking for shoes.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.
Drive sales and always feel free to ask a user for more information.

- Always consider if filters are needed based on the user's query.
- Use the tools provided to answer questions; do not rely on prior knowledge.
- Always feel free to ask a user for more information.

**Example 1:**

User Query: "Hi! I'm going to a party and I'm looking for red women's shoes. Thank you!"

Agent Actions:

1. Determine what query string to use for filters e.g. "red women's shoes"
2. Call:
   filters_dicts = create_metadata_filter("red women's shoes")
3. Call:
   results = search_footwear_database(query_str='shoes', filters_json=filters_dicts)

**Example 2:**

User Query: "Hi! I'm going to a meeting and I'm looking for formal women's shoes. Thank you!"

Agent Actions:

1. Determine what query string to use for filters e.g. "formal women's shoes"
2. Call:
   filters_dicts = create_metadata_filter("formal women's shoes")
3. Call:
   results = search_footwear_database(query_str='shoes', filters_json=filters_dicts)

**Example 3:**

User Query: "I'm looking for shoes"

Agent Actions:

1. Ask for more information

**Example 4:**

User Query: "I'm looking for stable heels"

Agent Actions:

1. Determine what query string to use for filters e.g. "women's heels"
2. Call:
   filters_dicts = create_metadata_filter("women's heels")
3. Call:
   results = search_footwear_database(query_str='wedges', filters_json=filters_dicts)

Remember to follow these instructions carefully.
""",
)

## Create the AI agent runner

In [82]:
agent = AgentRunner(agent_worker)

# Test AI agent
Let's test the AI agent by asking for "women's black shoes with red details":

In [83]:
# Test the agent
agent_response = agent.chat("I need women's black shoes with red details")

Added user message to memory: I need women's black shoes with red details
=== Calling Function ===
Calling function: create_metadata_filter with args: {"filter_string": "women's black shoes with red details"}
=== Function Output ===
[{"key": "gender", "value": "women", "operator": "=="}, {"key": "color", "value": "black", "operator": "=="}, {"key": "color_details", "value": ["red"], "operator": "in"}]
=== Calling Function ===
Calling function: search_footwear_database with args: {"query_str": "shoes", "filters_json": [{"key": "gender", "value": "women", "operator": "=="}, {"key": "color", "value": "black", "operator": "=="}, {"key": "color_details", "value": ["red"], "operator": "in"}]}
=== Function Output ===
[{'color': 'black', 'text': 'Puma women saba ballet dc3 black casual shoes', 'gender': 'women', 'product_type': 'casual shoes', 'product_id': 913, 'usage': 'casual', 'price': 170, 'brand': 'puma', 'heel_height': None, 'heel_type': None, 'similarity_score': 1.69973636}, {'color': 

# Interpreting the Query
Unlike a naive query engine, which vectorizes the full user query, the AI agent rewrites the query and adds a custom metadata filter.

For the query "I need women's black shoes with red details", the agent decides not to vectorize the entire customer query.

Instead, the agent vectorizes just the query **shoes**, creates a filter for the condition **women's black shoes with red details**, and then recommends 6 different shoes:

![The AI agent looks through the retrieved vector database results and provides a response with recommended shoes](https://norahsakal.com/assets/images/2_agent_response-035170dfb641aec09c729ddb781c845f.png)
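
To make the filter semantics concrete, here is a minimal plain-Python sketch that applies the three filters from the agent log to a handful of made-up records. The records and the `matches` / `apply_filters` helpers are hypothetical, and the sketch assumes that `in` on a list-valued field means "at least one stored value appears in the filter's list"; the actual filtering happens inside the vector database, not in Python.

```python
def matches(record, flt):
    """Check one record against one {'key', 'value', 'operator'} filter."""
    stored = record.get(flt["key"])
    if flt["operator"] == "==":
        return stored == flt["value"]
    if flt["operator"] == "in":
        # Assumption: a list-valued field matches if any stored value is allowed
        stored_values = stored if isinstance(stored, list) else [stored]
        return any(value in flt["value"] for value in stored_values)
    raise ValueError(f"Unsupported operator: {flt['operator']}")


def apply_filters(records, filters):
    """Keep only records that satisfy every filter (AND condition)."""
    return [r for r in records if all(matches(r, f) for f in filters)]


# Filters as produced by create_metadata_filter in the agent log
filters = [
    {"key": "gender", "value": "women", "operator": "=="},
    {"key": "color", "value": "black", "operator": "=="},
    {"key": "color_details", "value": ["red"], "operator": "in"},
]

# Hypothetical records, for illustration only
toy_records = [
    {"id": 1, "gender": "women", "color": "black", "color_details": ["red"]},
    {"id": 2, "gender": "men", "color": "black", "color_details": ["red"]},
    {"id": 3, "gender": "women", "color": "black", "color_details": ["pink"]},
    {"id": 4, "gender": "women", "color": "red", "color_details": []},
]

matching_ids = [r["id"] for r in apply_filters(toy_records, filters)]
print(matching_ids)
```

Only the first record survives all three conditions. Because the hard constraints are handled by the filter, the vectorized query can stay as generic as **shoes**.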

# Visualize the AI agent's recommendations

Let's go ahead and create a helper function that visualizes the AI agent's recommended shoes:

In [88]:
def visualize_agent_response(agent_response, image_folder_path=None, img_width=150, threshold=98):
    """
    Visualizes products from agent response if they match (fuzzily) names in an unstructured string.

    Args:
    - agent_response: Agent response.
    - image_folder_path: Path to the folder containing product images.
    - img_width: Width of the product images in the visualization.
    - threshold: Minimum similarity score for fuzzy matching.

    Returns:
    - None: Displays the visualization directly in the notebook.
    """
    if image_folder_path is None:
        raise ValueError("You must specify an image folder path.")

    # Prepare HTML content for visualization
    html_content = "<div style='display: flex; flex-wrap: wrap; gap: 20px;'>"

    # Loop through the products and match with unstructured string
    # sources[1] is the output of the second tool call (search_footwear_database)
    for product in agent_response.sources[1].raw_output:
        product_name = product['text'].lower()

        # Perform fuzzy matching
        match = process.extractOne(product_name, [agent_response.response.lower()], scorer=fuzz.partial_ratio)
        if match and match[1] > threshold:  # If a match is found and meets the threshold
            # Generate image path based on product_id
            image_path = os.path.join(image_folder_path, f"{product['product_id']}.jpg")

            # Append product info and image to HTML content
            html_content += f"""
                <div style="text-align: center;">
                    <p>{product['text']}</p>
                    <img src='{image_path}' width='{img_width}px' style="padding: 5px;"/>
                </div>
            """

    # Close the main div
    html_content += "</div>"

    # Display the content as HTML
    display(HTML(html_content))
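
The matching step in this helper can be illustrated without the notebook's dependencies. The sketch below is a rough standard-library stand-in for the idea behind `fuzz.partial_ratio` (it is not RapidFuzz's algorithm and its scores will differ): it slides a window over the reply and keeps the best `difflib` similarity. The `best_window_score` helper, the sample reply, and the second product name are all hypothetical.

```python
from difflib import SequenceMatcher


def best_window_score(name, text):
    """Best 0-100 similarity between `name` and any same-length window of `text`."""
    name, text = name.lower(), text.lower()
    if not name:
        return 0.0
    if len(text) <= len(name):
        return 100 * SequenceMatcher(None, name, text).ratio()
    best = 0.0
    for start in range(len(text) - len(name) + 1):
        window = text[start:start + len(name)]
        best = max(best, SequenceMatcher(None, name, window).ratio())
    return 100 * best


# Hypothetical reply text and candidate product names
reply = "1. **Puma Women Black Crazy Slide Flats** - Price: $130"
candidate_names = [
    "Puma women black crazy slide flats",
    "Adidas men running shoes",
]

threshold = 98
mentioned = [
    name for name in candidate_names
    if best_window_score(name, reply) > threshold
]
print(mentioned)
```

With a threshold this strict, only names that appear almost verbatim in the reply are kept, so a product whose name the LLM rephrases may be left out of the visualization.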

## Visualize pulled shoes

In [89]:
# Call the function
visualize_agent_response(agent_response, image_folder_path=image_data_path)

# Compare AI agent recommendations with available shoes
Looking back at when we filtered our shoe directory for women's black shoes with red details, we received these **6 pairs of shoes**:

In [90]:
# Verify the dataset for women's black shoes with red details
black_red_shoe_filter = df_shoes[(df_shoes['gender'] == 'women') &
         (df_shoes['color'] == 'black') &
         (df_shoes['color_details'].apply(lambda x: 'red' in x))]

width = 100
images_html = ""
for img_file in black_red_shoe_filter['image']:
    img_path = os.path.join(image_data_path, img_file)
    # Add each image as an HTML <img> tag
    images_html += f'<img src="{img_path}" style="width:{width}px; margin-right:10px;">'
# Display all images in a row using HTML
display(HTML(f'<div style="display: flex; align-items: center;">{images_html}</div>'))

The AI agent's reply includes all 6 of these pairs of shoes:

In [92]:
print(agent_response)

Here are some women's black shoes with red details that you might like:

1. **Puma Women Saba Ballet DC3 Black Casual Shoes**
   - Price: $170
   - Brand: Puma
   - Type: Casual Shoes

2. **Nike Women Double Team Lite Black Shoes**
   - Price: $150
   - Brand: Nike
   - Type: Casual Shoes

3. **H&M Women Black Sandals**
   - Price: $125
   - Brand: H&M
   - Type: Flats

4. **Catwalk Women Black Heels**
   - Price: $70
   - Brand: Catwalk
   - Type: Heels
   - Heel Height: 2.0 inches
   - Heel Type: Stiletto

5. **Adidas Women Court Sequence Black Shoe**
   - Price: $65
   - Brand: Adidas
   - Type: Casual Shoes

6. **Puma Women Black Crazy Slide Flats**
   - Price: $130
   - Brand: Puma
   - Type: Flats

Let me know if you need more information on any of these options!
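
Rather than comparing the two lists by eye, the check can be expressed as a set comparison on product IDs. This is only a sketch: `expected_ids` is copied from the `product_id` column of the dataset filter shown earlier, while `agent_ids` is a stand-in for the IDs you would read from the agent's search tool output (for example from `agent_response.sources[1].raw_output`, assuming that structure). The `compare_ids` helper is hypothetical.

```python
def compare_ids(expected, returned):
    """Return (missing, extra) product IDs between two collections."""
    expected, returned = set(expected), set(returned)
    return expected - returned, returned - expected


# Product IDs from the dataset filter for women's black shoes with red details
expected_ids = [706, 913, 959, 997, 1212, 1286]

# Stand-in for the IDs returned by the agent's search tool (placeholder values)
agent_ids = [913, 706, 1286, 997, 1212, 959]

missing, extra = compare_ids(expected_ids, agent_ids)
print("Missing from agent reply:", sorted(missing))
print("Not in dataset filter:", sorted(extra))
```

Two empty sets mean the agent returned exactly the shoes that the dataset filter found.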


# How did the AI agent succeed?
- Understands the multiple-color request
- Applies filters to return all matches, not just the first found
- Leverages metadata (gender, color and color_details) to ensure accurate results

## Key takeaways
**Naive chatbot limitation**
- Finds limited or no results due to lack of multiple color filtering

**AI agent advantages**
- Accurately applies multiple color filters
- Shows all products that fit the user's request
- Enhances the shopping experience with full, accurate results

# Conclusion
When users specify multiple color requirements like black shoes with red details, naive chatbots fail to apply proper filters and return only partial matches.

AI agents, on the other hand, use metadata filtering and reasoning to find all suitable options, improving both accuracy and user satisfaction.

# Want to go deeper?

This notebook is part of a free mini-course where we'll dig deeper into the mechanics behind building this AI agent.

## [Enroll for free ↗](https://norahsakal.gumroad.com/l/mini-course-1)

[![Mini-course](https://d1fiydes8a4qgo.cloudfront.net/mini-courses/mini-course-1/social_media_cover.png)](https://norahsakal.com/courses)