In [2]:
import ast
import base64
from datetime import datetime
import json
import os
from typing import Any, List, Optional

import boto3
from botocore.exceptions import NoCredentialsError
from IPython.display import display, Image, HTML

# LlamaIndex
from llama_index.core import Document
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex

from llama_index.core.embeddings import BaseEmbedding
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.schema import QueryBundle
from llama_index.core.tools import FunctionTool

# LlamaIndex agents
from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner

# LlamaIndex LLMs
from llama_index.llms.openai import OpenAI as OpenAI_Llama

# LlamaIndex metadata filters
from llama_index.core.vector_stores.types import (
    MetadataFilters, FilterCondition
)

# LlamaIndex retrievers
from llama_index.core.retrievers import VectorIndexAutoRetriever, VectorIndexRetriever

# LlamaIndex vector stores
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core.vector_stores.types import VectorStoreQuery

# Pinecone
from pinecone import Pinecone, ServerlessSpec

import pandas as pd
from rapidfuzz import fuzz, process
# tqdm.autonotebook picks the notebook or console progress bar automatically
from tqdm.autonotebook import tqdm

# Advent Calendar Day 2: How AI Agents Improve Naive Chatbots by Understanding Context Shifts

This December, we're highlighting the limitations of simple AI chatbots in online retail and demonstrating how **AI agents** enhance customer interactions.

Each day, we'll explore a common challenge faced by naive Retrieval-Augmented Generation (RAG) chatbot systems and show how AI agents overcome it.

Today's topic: how AI agents improve on naive chatbots by understanding context shifts.

![Cover image](images/2_dec/cover.png)

## Introducing SoleMates

***SoleMates*** is a fictional online shoe store that we'll use as a practical example throughout this tutorial.

We'll explore interactions between customers and chatbots at SoleMates, highlighting the differences between basic chatbots and advanced AI agents.

![SoleMates Illustration](images/solemates.png)


## Today's Challenge: Failure to Adapt After Context Shift

### Scenario

A customer initiates a chat with **SoleMates**:

**Customer:** "Hi! I’m looking for women's casual shoes"

![A customer initiates a chat with SoleMates](images/2_dec/1.png)


# Step-by-Step Walkthrough
Let's build a simple chatbot that can answer this request, step by step.

## Load Shoe Data

Let's start by reading the SoleMates shoe dataset. This dataset contains detailed product information, such as shoe colors and heel heights, which we'll transform into embeddings and store in a cloud-based Pinecone vector database. 

In [3]:
# Load the SoleMates shoe dataset
df_shoes = pd.read_csv('data/solemates_shoe_directory.csv')

# Convert 'color_details' from string representation of a list to an actual list
df_shoes['color_details'] = df_shoes['color_details'].apply(ast.literal_eval)

# Ensure 'heels_height' is treated as a nullable integer type
df_shoes['heels_height'] = df_shoes['heels_height'].astype('Int64')

# Display the first few rows of the dataset
df_shoes.head()

Unnamed: 0,product_id,gender,category,sub_category,product_type,color,color_details,usage,product_title,image,price_usd,heels_height,brand
0,1,men,footwear,shoes,sports shoes,black,[neon green],sports,Adidas men eqt nitro fashion black sports shoes,1.jpg,120,,adidas
1,2,men,footwear,shoes,sports shoes,black,[white],sports,Puma men's yugorun black white shoe,2.jpg,50,,puma
2,3,men,footwear,shoes,boots,black,[],casual,Timberland men black casual shoes,3.jpg,60,,timberland
3,4,men,footwear,shoes,casual shoes,black,[],casual,Provogue men black shoes,4.jpg,125,,provogue
4,5,men,footwear,shoes,formal shoes,black,[],formal,Lee cooper men black shoe,5.jpg,155,,lee cooper
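The `color_details` conversion in the loading cell can be seen in isolation with a minimal sketch. It assumes the raw CSV stores each list as its Python string representation (for example `"['neon green']"`); the sample values below are illustrative, not read from the file.

```python
import ast

# Illustrative raw values, assumed to mirror how the CSV stores 'color_details'
raw_values = ["['neon green']", "['white']", "[]"]

# ast.literal_eval parses Python literals only, so unlike eval() it will not
# execute arbitrary code that happens to be stored in a cell
parsed = [ast.literal_eval(value) for value in raw_values]

print(parsed)
print([type(value).__name__ for value in parsed])
```

After parsing, an empty list is falsy, which is what later lets the metadata step skip `color_details` for shoes without accent colors.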


## Cost of Vectorization and Pre-Embedded Dataset

Vectorizing datasets with AWS Bedrock and the Titan multimodal model involves costs based on the number of input tokens and images:

- **Text embeddings**: $0.0008 per 1,000 input tokens  

- **Image embeddings**: $0.00006 per image  

The provided SoleMates dataset is small, containing just 88 pairs of shoes, making it affordable to vectorize. For this dataset, I calculated the total cost of vectorization and summarized the token counts below:

- **Token Count**: `858` tokens  
- **Total Cost**: `$0.006`  
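
The total can be reproduced from the prices and counts above. This is a rough sketch that assumes one image per product (88 images) and uses the per-unit prices quoted in this section, which may change over time.

```python
# Prices and counts quoted in this section
TEXT_PRICE_PER_1K_TOKENS = 0.0008   # USD per 1,000 input tokens
IMAGE_PRICE_PER_IMAGE = 0.00006     # USD per image
token_count = 858
image_count = 88                    # assumption: one image per product

text_cost = token_count / 1000 * TEXT_PRICE_PER_1K_TOKENS
image_cost = image_count * IMAGE_PRICE_PER_IMAGE
total_cost = text_cost + image_cost

print(f"Text cost:  ${text_cost:.5f}")
print(f"Image cost: ${image_cost:.5f}")
print(f"Total cost: ${total_cost:.3f}")
```

Most of the cost comes from the images rather than the short product titles.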

If you prefer not to generate embeddings yourself or don't have access to AWS, you can use a pre-embedded dataset that I've prepared as a CSV file. This file includes all embeddings and token counts, allowing you to follow the guide without incurring additional costs. However, for hands-on experience, I recommend running the embedding process to understand the workflow.

To load the pre-embedded dataset, use the following code:
```python
# Load pre-embedded dataset
df_shoes = pd.read_csv('data/solemates_shoe_directory_pre_embedded_shoes.csv')
```
This step is entirely optional and designed to accommodate various levels of access and resources.

### Prepare Amazon Bedrock for Embedding Generation

To vectorize our product data, we'll generate embeddings for each product using AWS Titan. These embeddings combine image and text data to represent each product in a format suitable for search and recommendation systems.

>**Important Note on Cost**:  
>Vectorizing datasets incurs a cost. The SoleMates dataset contains 88 pairs of shoes, resulting in an estimated total cost of `$0.006`.
>
>I've added a token count column to help track these costs, and you can calculate your own total for larger datasets.


## Getting Started with Amazon Bedrock
To use Amazon Bedrock for embedding generation, start by setting up your AWS environment:

1. Create an AWS account if you don't already have one
2. Set up an AWS Identity and Access Management (IAM) role with permissions tailored for Amazon Bedrock
3. Submit a request to access the foundation models (FMs) you'd like to use

Next, we'll initialize the Bedrock runtime client, which allows us to interact with AWS Titan for embedding generation.


## Set Up the AWS Bedrock Client

In [5]:
# Read the AWS CLI profile name from the AWS_PROFILE environment variable
# If the variable is not set, 'aws_profile' is None and boto3 falls back to your default credentials
aws_profile = os.environ.get('AWS_PROFILE')

# Specify the AWS region where Bedrock is available
aws_region_name = "us-east-1"

try:
    # Set the default session for the specified profile
    if aws_profile:
        boto3.setup_default_session(profile_name=aws_profile)
    else:
        boto3.setup_default_session()  # Use default AWS profile if none is specified
    
    # Initialize the Bedrock runtime client
    bedrock_runtime = boto3.client(
        service_name="bedrock-runtime",
        region_name=aws_region_name
    )
except NoCredentialsError:
    print("AWS credentials not found. Please configure your AWS profile.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

## Generate Embeddings for Product Data

To prepare our product data for the vector database, we'll generate embeddings for each product using AWS Titan. These embeddings combine image and text data to represent each product in a format suitable for search and recommendation systems.

Before generating embeddings, we'll initialize two new columns in the dataset:
- **`titan_embedding`**: To store the embedding vectors
- **`token_count`**: To store the token count for each product title

Then, we'll define a function to generate embeddings and apply it to the dataset.
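
Before looking at the full function, it may help to see the request and response shapes on their own. The sketch below uses only the field names that appear in this guide (`inputImage`, `inputText`, `embedding`, `inputTextTokenCount`); the helper names, the fake image bytes, and the fake response are hypothetical stand-ins, and no call to Bedrock is made.

```python
import base64
import json


def build_titan_request(image_bytes: bytes, text: str) -> str:
    """Build the JSON body: the image is sent as a base64 string next to the text."""
    image_base64 = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"inputImage": image_base64, "inputText": text})


def parse_titan_response(response_body: dict) -> tuple:
    """Pull the embedding and the text token count out of a decoded response body."""
    embedding = response_body.get("embedding")
    if not isinstance(embedding, list):
        raise ValueError("Embedding is not a list as expected.")
    return embedding, response_body.get("inputTextTokenCount")


# Stand-in values: a few fake bytes instead of a real JPEG, and a made-up response
request_body = build_titan_request(b"\xff\xd8fake-jpeg-bytes", "Catwalk women red shoes")
fake_response = {"embedding": [0.01, -0.02, 0.03], "inputTextTokenCount": 5}
embedding, token_count = parse_titan_response(fake_response)

print(json.loads(request_body).keys())
print(embedding, token_count)
```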

## Initialize Columns for Embeddings

In [6]:
# Initialize columns to store embeddings and token counts
df_shoes['titan_embedding'] = None  # Placeholder for embedding vectors
df_shoes['token_count'] = None  # Placeholder for token counts

In [7]:
# Main function to generate image and text embeddings
def generate_embeddings(df, image_col='image', text_col='product_title', embedding_col='embedding', image_folder='data/footwear'):
    for index, row in tqdm(df.iterrows(), total=df.shape[0], desc="Generating embeddings"):
        try:
            # Prepare image file as base64
            image_path = os.path.join(image_folder, row[image_col])
            with open(image_path, 'rb') as img_file:
                image_base64 = base64.b64encode(img_file.read()).decode('utf-8')
            
            # Create input data for the model
            input_data = {"inputImage": image_base64, "inputText": row[text_col]}

            # Invoke AWS Titan model via Bedrock runtime
            response = bedrock_runtime.invoke_model(
                body=json.dumps(input_data),
                modelId="amazon.titan-embed-image-v1",
                accept="application/json",
                contentType="application/json"
            )
            response_body = json.loads(response.get("body").read())

            # Extract embedding and token count from response
            embedding = response_body.get("embedding")
            token_count = response_body.get("inputTextTokenCount")

            # Validate and save the embedding
            if isinstance(embedding, list):
                df.at[index, embedding_col] = embedding  # Save embedding as a list
                df.at[index, 'token_count'] = int(token_count)  # Save token count as an integer
            else:
                raise ValueError("Embedding is not a list as expected.")
                            
        except Exception as e:
            print(f"Error for row {index}: {e}")
            df.at[index, embedding_col] = None  # Handle errors gracefully
            
    return df

## Generate Embeddings

In [8]:
# Generate embeddings for the product data
df_shoes = generate_embeddings(df=df_shoes, embedding_col='titan_embedding')

Generating embeddings:   0%|          | 0/88 [00:00<?, ?it/s]

## Save Dataset for Reuse

In [None]:
# Get today's date in YYYY_MM_DD format
today = datetime.now().strftime('%Y_%m_%d')

# Save the dataset with generated embeddings to a CSV file
df_shoes.to_csv(f'shoes_with_embeddings_token_{today}.csv', index=False)
print(f"Dataset with embeddings saved as 'shoes_with_embeddings_token_{today}.csv'")
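
One thing to keep in mind when reusing the saved file: CSV has no list type, so each embedding is written as text and comes back as a string. The sketch below illustrates the round trip with the standard `csv` module and a tiny made-up row; a CSV written by pandas should behave the same way, but that is an assumption worth checking on your own file.

```python
import ast
import csv
import io

# A tiny made-up row standing in for one product with a 3-dimensional embedding
rows = [{"product_id": 1, "titan_embedding": [0.0101, 0.0051, -0.0201]}]

# Write to an in-memory CSV; the list is stored as its string representation
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["product_id", "titan_embedding"])
writer.writeheader()
writer.writerows(rows)

# Read it back: the embedding is now a string, not a list
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw_value = loaded[0]["titan_embedding"]

# Parse the string back into a list of floats before handing it to a vector store
embedding = ast.literal_eval(raw_value)
print(type(raw_value).__name__, type(embedding).__name__)
```

With a DataFrame, the equivalent step would be applying `ast.literal_eval` to the `titan_embedding` column, as was done for `color_details` earlier.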

## Create a Dictionary with Product Data

Before we create LlamaIndex `Document` objects, we need to structure the product data into dictionaries. These dictionaries include:

1. **Text**: The product title that will be used for embedding queries.
2. **Metadata**: A dictionary containing detailed attributes for each product (e.g., color, gender, usage, price).
3. **Embedding**: The Titan embeddings generated earlier.

This dictionary format ensures the data is well-organized for creating `Document` objects in the next step.
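
The cell below only adds `heels_height` and `color_details` when a value is present. Here is a minimal sketch of that pattern with plain dictionaries; `build_metadata` and the two sample rows are hypothetical, and the motivation given (that the vector store's metadata may not accept null values) is an assumption worth verifying against the Pinecone documentation.

```python
import math


def build_metadata(row: dict) -> dict:
    """Build a metadata dict, leaving out optional keys that have no value."""
    heels = row.get("heels_height")
    has_heels = heels is not None and not (isinstance(heels, float) and math.isnan(heels))
    return {
        "color": row["color"],
        "gender": row["gender"],
        # Unpacking an empty dict adds nothing, so the key is simply absent
        **({"heels_height": int(heels)} if has_heels else {}),
        **({"color_details": row["color_details"]} if row["color_details"] else {}),
    }


# Hypothetical rows: one without a heel height, one without accent colors
sneaker = {"color": "black", "gender": "men", "heels_height": None, "color_details": ["neon green"]}
heel = {"color": "nude", "gender": "women", "heels_height": 2, "color_details": []}

print(build_metadata(sneaker))
print(build_metadata(heel))
```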


In [9]:
# Convert DataFrame rows into a list of dictionaries for LlamaIndex
product_data = df_shoes.apply(lambda row: {
    'text': row['product_title'],
    'metadata': {
        'color': row['color'],
        'text': row['product_title'],
        'gender': row['gender'],
        'product_type': row['product_type'],
        'usage': row['usage'],
        'price': row['price_usd'],
        'product_id': row['product_id'],
        'brand': row['brand'],
        **({'heels_height': int(row['heels_height'])} if not pd.isna(row['heels_height']) else {}),
        **({'color_details': row['color_details']} if row['color_details'] else {})
    },
    'embedding': row['titan_embedding']
}, axis=1).tolist()

# Preview the first product dictionary
#product_data[0]


## Create LlamaIndex Documents

We'll now use the product data dictionaries to create LlamaIndex `Document` objects. 

These `Documents` are crucial because:

- They act as containers for our product data and embeddings.
- They enable seamless interaction with Pinecone for upserting embeddings.

Each `Document` includes:
1. The **text** (product title) for embedding and query purposes
2. **Metadata** with attributes like color, gender, and price
3. The **embedding** generated earlier
4. An **exclusion list** (`excluded_embed_metadata_keys`) to prevent unnecessary metadata fields from being embedded, ensuring optimal performance and cost-efficiency



In [10]:
# Create LlamaIndex Document objects
documents = []
for doc in product_data:
    documents.append(
        Document(
            text=doc["text"],
            metadata=doc["metadata"],
            embedding=doc['embedding'],
            
            # Avoid embedding unnecessary metadata
            excluded_embed_metadata_keys=[
                'color',
                'gender',
                'product_type',
                'usage',
                'text',
                'price',
                'product_id',
                'brand',
                'heels_height',
                'color_details'
            ]
        )
    )

# Confirm the first Document object
documents[0].metadata

{'color': 'black',
 'text': 'Adidas men eqt nitro fashion black sports shoes',
 'gender': 'men',
 'product_type': 'sports shoes',
 'usage': 'sports',
 'price': 120,
 'product_id': 1,
 'brand': 'adidas',
 'color_details': ['neon green']}

## Initialize Pinecone

To interact with Pinecone, you'll first need an account and API keys. If you don't already have them, [create a Pinecone account](https://www.pinecone.io/) and retrieve your API key.

Pinecone is a vector database designed to store and query embeddings. We'll use Pinecone to upsert the AWS Titan embeddings we generated earlier, enabling efficient similarity and hybrid search.


In [11]:
# Initialize Pinecone client with API key
pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
index_name = "solemates"  # Replace with your desired index name

## List Current Indexes

Let's list the existing indexes in your Pinecone account to ensure no duplicates before creating a new index.

In [33]:
# List current indexes
pc.list_indexes()

{'indexes': []}

## Create Index

Next, we'll create a Pinecone index. An index stores the embeddings and metadata for your data.

- **Dimension**: Matches the size of the embeddings we're using (1024 for AWS Titan multimodal embeddings)
- **Metric**: Defines how similarity is calculated (e.g., dot product, cosine similarity)
- **ServerlessSpec**: Specifies the cloud provider and region for your index
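
To make the metric choice concrete, here is a small sketch comparing dot product and cosine similarity on toy 2-dimensional vectors. The vectors are made up for illustration; real Titan embeddings have 1024 dimensions.

```python
import math


def dot(a, b):
    """Dot product: sensitive to both direction and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: the dot product after normalizing both vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
short_match = [1.0, 0.0]   # same direction, length 1
long_match = [3.0, 0.0]    # same direction, length 3

print(dot(query, short_match), dot(query, long_match))
print(cosine(query, short_match), cosine(query, long_match))
```

Both candidates point in the same direction, so cosine similarity rates them equally, while the dot product favors the longer vector. This also explains why the dot-product scores shown later in this guide are not limited to the range 0 to 1.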

If the index already exists, this step is skipped.

In [12]:
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,  # Matches the 1024-dimensional Titan embeddings generated above
        metric="dotproduct",  # Required for hybrid search with Pinecone
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )

## Inspect Pinecone Index

Navigate to your Pinecone dashboard, and you should now see your new index with **0 records (vectors)**, as it hasn't been populated yet:

![Pinecone shows an empty index](images/pinecone/4_pinecone_empty_index.png)


## Initialize Pinecone Index

After creating the index, we'll initialize it for further operations like upserting embeddings and querying vectors.

In [13]:
pinecone_index = pc.Index(index_name)

## Create Pinecone Vector Store

We'll now set up a **Pinecone Vector Store** using LlamaIndex. 

This vector store connects our Pinecone index with the LlamaIndex framework.

Key configuration details:
1. **Namespace**: A logical grouping within the index, allowing future addition of other product types
2. **Hybrid Search**: Enabling both semantic and keyword search by adding sparse vectors

For more information:
- [Pinecone Namespaces Guide](https://docs.pinecone.io/guides/indexes/use-namespaces)
- [Hybrid Search Introduction](https://www.pinecone.io/learn/hybrid-search-intro/)


In [14]:
vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index,
    namespace='footwear',  # Logical namespace for shoe data
    add_sparse_vector=True  # Enables hybrid search
)

## Create an Ingestion Pipeline

We'll create an **Ingestion Pipeline** to upsert our vectors into the Pinecone index. 
No transformations are required since we've pre-generated embeddings with AWS Titan.

>**Note**: As of Dec 4, 2024, LlamaIndex doesn't provide a built-in integration for AWS Titan multimodal embeddings, so we're using our own vectors directly.


In [15]:
pipeline = IngestionPipeline(
    transformations=[],  # No transformations since embeddings are pre-generated
    vector_store=vector_store
)

## Run the Ingestion Pipeline

This step upserts the embeddings into Pinecone for storage and querying.

- **Cost Note**: Pinecone charges $2.00 per 1M vectors unless you're on the free plan
- **Time Note**: It may take a minute or two for the vectors to become visible in your Pinecone index


In [None]:
# Run the pipeline to upsert embeddings into Pinecone
pipeline.run(documents=documents, show_progress=True)

## Inspect Pinecone Index

Now that we've upserted the vectors, navigate back to Pinecone. You should see **88 records** in your index, corresponding to the embeddings we added:

![Populated Pinecone index](images/pinecone/5_pinecone_populated_index.png)


## Test Querying the Vector Store

Now that we have upserted all our shoe vectors, let's test querying the vector database.  

We'll start by creating a **Vector Store Index** with LlamaIndex. This index will allow us to query the Pinecone index using the same vector store we initialized earlier.


In [16]:
# Create a Vector Store Index
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

## Query the Vector Database Directly (Without Query Engine or Chat Engine)

Before we use a **Query Engine** or **Chat Engine** to interact with the vector database, we'll start with a direct query using a simple retriever.  

This approach demonstrates how you can fetch relevant records from the database without involving advanced reasoning, natural language understanding, or conversation tracking. It's a fundamental way to confirm that the embeddings and metadata are stored correctly and the vector database is functioning as expected.

Next, we'll move on to more advanced querying techniques, including using a **Query Engine** and an **Agent** to leverage the power of LLMs.

The first step is creating a simple retriever, but first, we need to define a custom embedding function. 

As of Dec 4, 2024, **LlamaIndex does not provide a built-in integration for AWS Titan multimodal embeddings**, so we'll implement a custom class for this purpose.

## Create a Function to Request AWS Titan Embeddings

We'll define a helper function to request embeddings from AWS Titan's multimodal model. This function will handle both text and image inputs.


In [17]:
def request_embedding(image_base64=None, text_description=None):
    """
    Request embeddings from AWS Titan multimodal model.

    Parameters:
        image_base64 (str, optional): Base64 encoded image string.
        text_description (str, optional): Text description.

    Returns:
        list: Embedding vector.
    """
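    # Include only the inputs that were provided; at least one is needed
    input_data = {}
    if image_base64 is not None:
        input_data["inputImage"] = image_base64
    if text_description is not None:
        input_data["inputText"] = text_description
    if not input_data:
        raise ValueError("Provide image_base64, text_description, or both.")
    body = json.dumps(input_data)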

    # Invoke the Titan multimodal model
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )

    response_body = json.loads(response.get("body").read())

    if response_body.get("message"):
        raise ValueError(f"Embeddings generation error: {response_body.get('message')}")

    return response_body.get("embedding")


## Create Custom Embeddings Class

We'll now define a custom embedding class that uses the AWS Titan multimodal model to fetch embeddings. 

This class overrides key methods in LlamaIndex's `BaseEmbedding` to integrate AWS Titan into the framework.


In [18]:
class MultimodalEmbeddings(BaseEmbedding):
    """
    Custom embedding class for AWS Titan multimodal embeddings.
    """

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)

    @classmethod
    def class_name(cls) -> str:
        return "multimodal"
    
    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        """
        Get embeddings for a query string.
        """
        return request_embedding(text_description=query)

    def _get_text_embedding(self, text: str) -> List[float]:
        """
        Get embeddings for a text string.
        """
        return request_embedding(text_description=text)

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        """
        Get embeddings for a batch of text strings.
        """
        return [request_embedding(text_description=text) for text in texts]


## Instantiate the Custom Class

We'll now instantiate the `MultimodalEmbeddings` class to use it in our retriever.


In [19]:
# Instantiate the custom embedding model
embed_model = MultimodalEmbeddings()

## Create a Simple Retriever

We'll create a simple retriever using the custom embedding model and the vector index.  

Key configurations:
1. **`similarity_top_k`**: Number of top results to retrieve
2. **`vector_store_query_mode`**: Set to **"hybrid"** for combining semantic and keyword search
3. **`alpha`**: Weighting between semantic (embedding) and keyword search
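
As a rough mental model for `alpha`, the sketch below assumes a convex-combination scheme in which the dense (semantic) query vector is scaled by `alpha` and the sparse (keyword) values by `1 - alpha`. The function name and toy vectors are hypothetical, and the exact behavior depends on your LlamaIndex and Pinecone versions, so treat this as an illustration rather than the library's implementation.

```python
def weight_hybrid_query(dense, sparse_values, alpha):
    """Scale dense values by alpha and sparse values by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [value * alpha for value in dense]
    weighted_sparse = [value * (1 - alpha) for value in sparse_values]
    return weighted_dense, weighted_sparse


# Toy vectors for illustration only
dense, sparse = weight_hybrid_query([0.2, 0.4], [1.0, 3.0], alpha=0.5)
print(dense, sparse)

# Under this assumption, alpha=1.0 keeps only the semantic signal
dense_only, sparse_off = weight_hybrid_query([0.2, 0.4], [1.0, 3.0], alpha=1.0)
print(dense_only, sparse_off)
```

Under this model, `alpha=0.5` gives the semantic and keyword signals equal weight, which is the setting used in this guide.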


## Create Retriever

In [20]:
# Create a simple retriever
retriever = VectorIndexRetriever(
    index=vector_index,
    embed_model=embed_model,
    similarity_top_k=8,  # Retrieve the top 8 results
    vector_store_query_mode="hybrid",  # Enable hybrid search
    alpha=0.5  # Weighting between semantic and keyword search
)

## Query Vector Database Directly Using a Simple Retriever

We'll use a simple retriever to query the vector database and inspect the results. This method interacts with the embeddings and metadata in a straightforward way, without utilizing an LLM-powered **Query Engine** or **Chat Engine**.

**Why this step matters**:
1. Validates that the vector database is populated correctly
2. Shows how to query embeddings directly, bypassing the overhead of LLM-based reasoning
3. Prepares the groundwork for building advanced workflows with Query Engines and Agents

In the next steps, we'll extend this retriever to integrate with an LLM-powered Query Engine for richer responses.


## Query the Vector Store

In [22]:
# Query the vector store for "red shoes"
results = retriever.retrieve("red shoes")

# Display results
for item in results:
    score = item.score
    print(f"Score: {score:.4f}")
    print(f"Text: {item.get_content()}")
    print("-" * 50)

Score: 2.2390
Text: Catwalk women red shoes
--------------------------------------------------
Score: 2.2251
Text: Fila men leonard red shoes
--------------------------------------------------
Score: 2.2235
Text: Basics men red casual shoes
--------------------------------------------------
Score: 2.2148
Text: Carlton london women casual red casual shoes
--------------------------------------------------
Score: 2.2105
Text: Nike men jordan fly wade red sports shoes
--------------------------------------------------
Score: 2.1868
Text: Red tape men brown shoes
--------------------------------------------------
Score: 2.1822
Text: Adidas men blue & red f10 sports shoes
--------------------------------------------------
Score: 2.1682
Text: Red tape men casual brown casual shoes
--------------------------------------------------


## Visualize Vector Database Pull

The vector database query returns a list of red shoes based on the embeddings. To verify the results, let's visualize the pulled vectors.  

We'll create a function that loops through the retrieved nodes and displays each image along with its metadata in a row for easy inspection.


In [24]:
def display_nodes_with_images_in_row(vector_database_response_nodes, image_folder="data/footwear", img_width=150):
    html_content = "<div style='display: flex; flex-wrap: wrap; gap: 20px;'>"
    
    for node in vector_database_response_nodes:
        # Retrieve text and product_id from node metadata
        text = node.metadata.get('text')
        product_id = node.metadata.get('product_id')
        
        # Generate image path based on product_id
        image_path = os.path.join(image_folder, f"{product_id}.jpg")
        
        if os.path.exists(image_path):
            # Add each text and image in a flex container
            html_content += f"""
                <div style="text-align: center;">
                    <p>{text}</p>
                    <img src='{image_path}' width='{img_width}px' style="border: 1px solid #ddd; padding: 5px;"/>
                </div>
            """
        else:
            # Handle missing images gracefully
            html_content += f"""
                <div style="text-align: center;">
                    <p>{text}</p>
                    <p style='color: red;'>Image not found for product_id {product_id}</p>
                </div>
            """

    # Close the main div
    html_content += "</div>"
    
    # Display the content as HTML
    display(HTML(html_content))

## Visualize Shoes
Let's visualize the shoes retrieved from the vector database to confirm that the results match the query for "red shoes."

In [25]:
display_nodes_with_images_in_row(results)

## Examine the Shoes

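As shown, most of the retrieved shoes are red or have red details. The two Red Tape results are brown shoes; they most likely matched on the brand name through the keyword half of the hybrid search. Overall, the vector index handles focused queries well.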

Next, we'll involve an LLM to add more flexibility to our queries.


## Why Not Keep Using the Index Alone?

The results look great, so why involve an LLM?

Let's try querying the vector database with something unrelated, like "Thank you!" and examine the response.

![Why involve an LLM?](images/1_dec/6_naive_rag_reply.png)


## Test Naive Query

In [26]:
# Query the vector store with an unrelated query
results = retriever.retrieve("Thank you!")

# Display results
for item in results:
    score = item.score
    print(f"Score: {score:.4f}")
    print(f"Text: {item.get_content()}")
    print("-" * 50)

# Visualize the results
display_nodes_with_images_in_row(results)


Score: 1.1846
Text: Timberland women femmes brown boot
--------------------------------------------------
Score: 1.1784
Text: Nike men's air max black shoe
--------------------------------------------------
Score: 1.1670
Text: Nike men's egoli white black shoe
--------------------------------------------------
Score: 1.1652
Text: Nike women ten blue white shoe
--------------------------------------------------
Score: 1.1650
Text: Nike women zoo blue shoe
--------------------------------------------------
Score: 1.1609
Text: Adidas women's piona white shoe
--------------------------------------------------
Score: 1.1591
Text: Nike women main draw white blue shoe
--------------------------------------------------
Score: 1.1566
Text: Hm women brown shoes
--------------------------------------------------


## Limitations of a Naive RAG System

Regardless of the query, the vector database matches the closest vectors based on the embeddings. 

In this case, querying with "Thank you!" still returns shoes because the database doesn't understand the context or intent of the query.

This demonstrates the limitation of a **naive RAG (Retrieval-Augmented Generation) system**. 

While it works well for focused queries like **"red shoes"**, it fails to adapt to non-specific or conversational inputs.

Here's an illustration of this limitation:

![Naive RAG System](images/1_dec/6_naive_rag.png)
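
The behavior above follows from how top-k retrieval works: it ranks whatever is in the index and returns the best `k`, however weak the match. The sketch below uses a few scores copied from the two outputs above. The `MIN_SCORE` cutoff is a hypothetical value picked only for illustration, and a fixed threshold is a crude workaround rather than the approach this guide takes.

```python
def top_k(scored_items, k):
    """Return the k highest-scoring items, no matter how low the scores are."""
    return sorted(scored_items, key=lambda item: item[1], reverse=True)[:k]


def top_k_with_threshold(scored_items, k, min_score):
    """Same as top_k, but drop anything scoring below min_score."""
    return [item for item in top_k(scored_items, k) if item[1] >= min_score]


# Scores copied from the "red shoes" and "Thank you!" outputs above
red_shoes = [("Catwalk women red shoes", 2.2390), ("Fila men leonard red shoes", 2.2251)]
thank_you = [("Timberland women femmes brown boot", 1.1846), ("Nike men's air max black shoe", 1.1784)]

MIN_SCORE = 2.0  # hypothetical cutoff, not a tuned value

print(len(top_k(thank_you, k=2)))                         # plain top-k still returns shoes
print(top_k_with_threshold(thank_you, k=2, min_score=MIN_SCORE))
print(top_k_with_threshold(red_shoes, k=2, min_score=MIN_SCORE))
```

A threshold like this is brittle because score ranges shift with the query and the index contents, which is one reason to let an LLM decide whether a message needs retrieval at all.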

## Create a Vector Index Query Engine

To overcome the limitations of naive queries, we'll integrate an LLM into our workflow by creating a **Query Engine**.  

This Query Engine will:
1. Interpret the user's natural language input
2. Retrieve contextually relevant information from the vector database
3. Enable more dynamic and flexible interactions with the data

For this guide, we'll use OpenAI's `gpt-4o` model as the LLM.


## Initialize LLM

In [27]:
# Initialize LLM
llm = OpenAI_Llama(
    temperature=0.0, 
    model="gpt-4o", 
    api_key=os.environ["OPENAI_API_KEY"]
)
Settings.llm = llm

## Create Query Engine

We'll now create a Query Engine using the vector index and our custom embedding model. This engine will leverage the LLM for intelligent query interpretation and responses.


In [28]:
# Create a query engine from the vector index
query_engine = vector_index.as_query_engine(
    embed_model=embed_model,
    similarity_top_k=8,
    vector_store_query_mode="hybrid",
    alpha=0.5,
)

## Test with Today's Challenge: Women's Casual Shoes, but Then Suddenly Formal

Let's test the Query Engine by asking for women's casual shoes.


In [29]:
# Query the engine for casual women's shoes
response = query_engine.query("I'm looking for women's casual shoes")

print("Chatbot response: ", response.response)

Chatbot response:  Here are some options for women's casual shoes:

1. Adidas women's piona white shoe - Price: 50
2. Carlton London women casual red casual shoes - Price: 90
3. Nike women flyclave black casual shoes - Price: 175
4. Catwalk women turquoise casual shoes - Price: 65


## Visualize Pulled Shoes

In [30]:
display_nodes_with_images_in_row(response.source_nodes)

## Examine Results

Looking at the results, the chatbot vectorizes the customer query, retrieves matching products, and recommends a variety of women's casual shoes.

Here's the chatbot reply:

> Here are some options for women's casual shoes:
>1. Adidas women's piona white shoe - Price: 50
>2. Carlton London women casual red casual shoes - Price: 90
>3. Nike women flyclave black casual shoes - Price: 175
>4. Catwalk women turquoise casual shoes - Price: 65

![The chatbot correctly gave us casual women's shoes](images/2_dec/2_casual_womens_shoes.png)

## We Suddenly Change Our Minds: We Actually Need Something More Formal
Wait, we said casual shoes, but we actually need something more formal. Let's ask the chatbot.

In [31]:
# Query the engine for formal alternatives
response = query_engine.query("Actually, I need something more formal")

print("Chatbot response: ", response.response)

Chatbot response:  For a more formal option, consider the Arrow men formal black shoe or the Enroute men leather brown formal shoes.


## Examine Results
Instead of pulling formal women's shoes, the chatbot suggested the **"Arrow men formal black shoe"** and the **"Enroute men leather brown formal shoes"**.

Why is that?

![A naive RAG chatbot processes each user message independently](images/2_dec/4_irrelevant_formal_shoes.png)

## Visualize Pulled Shoes

In [32]:
display_nodes_with_images_in_row(response.source_nodes)

### Why Did the Naive Chatbot Fail?

When the customer updates their request, the naive RAG system processes it as a brand-new query without considering the earlier conversation.

The naive chatbot doesn't carry over important details from the first message when handling the second one. Each time, it starts fresh:

- **No Ongoing Memory:** Doesn't connect **"Actually, I need something more formal"** to the earlier **"women's casual shoes"** request
- **Independent Vectorization:** Treats each message on its own, losing details like "women's" or "shoes"
- **No Context Linking:** Doesn't adjust its search to include both past and present requirements

### Limitations Highlighted
- **Lack of Conversation Memory:** Forgets earlier information when the user's request changes
- **Rigid Query Handling:** Each message is processed as if it's the very first
- **Inconsistent Results:** The second answer doesn't build on the first, causing confusion and irrelevant product suggestions

In [35]:
# Verify the dataset for formal women's shoes
df_shoes[(df_shoes['usage'] == 'formal') & (df_shoes['gender'] == 'women')]

Unnamed: 0,product_id,gender,category,sub_category,product_type,color,color_details,usage,product_title,image,price_usd,heels_height,brand,titan_embedding,token_count
85,86,women,footwear,shoes,heels,black,[],formal,Carlton london women black heels,86.jpg,200,2,carlton london,"[0.010144724, 0.0051543294, -0.02019924, 0.003...",9
86,87,women,footwear,shoes,heels,nude,[],formal,Carlton london women nude heels,87.jpg,200,2,carlton london,"[0.01662784, 0.0043598893, -0.03550609, 0.0269...",9
87,88,women,footwear,shoes,heels,black,[],formal,Catwalk women corporate leather black heels,88.jpg,155,1,catwalk,"[0.017327517, 0.002226485, -0.01856189, 0.0036...",9


## Query engine limitations

This demonstrates another limitation: the dataset does contain women's formal shoes, but because the chatbot doesn't link the two messages, it forgets the earlier details and shows results that no longer fit the full picture (women's formal shoes).


To address this, we'll now bring in an AI agent that can handle context shifts.

## Create Vector Store Info

We'll define metadata about the vector store to allow the AI agent to filter results based on specific attributes like gender, usage, and color. This metadata will enhance the agent's ability to refine queries.


In [36]:
# Create vector store information
vector_store_info = VectorStoreInfo(
    content_info="shoes in the shoe store",
    metadata_info=[
        MetadataInfo(
            name="gender",
            type="str",
            description="Either 'men' or 'women'",
        ),
        MetadataInfo(
            name="usage",
            type="str",
            description="Either 'sports', 'casual', or 'formal'",
        ),
        MetadataInfo(
            name="color",
            type="str",
            description=("Either 'black', 'white', 'blue', 'turquoise blue', 'red', 'pink', 'brown', 'green', or 'multi'"),
        ),
    ],
)
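
To see what filtering on these attributes amounts to, here is a minimal sketch in plain Python. It assumes filters are expressed as `key`/`value`/`operator` dicts (the shape that shows up in the agent logs later in this notebook) and only handles the `==` operator. The `PRODUCTS` records and the `matches_all` helper are hypothetical, and the men's product id is invented for illustration.

```python
# Hypothetical records; field names mirror the metadata declared above.
PRODUCTS = [
    {"product_id": 86, "gender": "women", "usage": "formal", "color": "black"},
    {"product_id": 87, "gender": "women", "usage": "formal", "color": "nude"},
    {"product_id": 80, "gender": "women", "usage": "casual", "color": "brown"},
    {"product_id": 9001, "gender": "men", "usage": "formal", "color": "black"},  # invented id
]


def matches_all(product, filters):
    """Return True when the product satisfies every equality filter (AND)."""
    for f in filters:
        if f["operator"] != "==":
            raise ValueError(f"unsupported operator: {f['operator']}")
        if product.get(f["key"]) != f["value"]:
            return False
    return True


filters = [
    {"key": "gender", "value": "women", "operator": "=="},
    {"key": "usage", "value": "formal", "operator": "=="},
]
selected = [p["product_id"] for p in PRODUCTS if matches_all(p, filters)]
print(selected)  # [86, 87]
```

In the real pipeline the vector store applies the filters server-side before similarity ranking; the sketch only shows the AND semantics.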


## Create Tools

Tools are essential for enabling the AI Agent to interact with the vector store.  
We'll define two tools:
1. **`create_metadata_filter`**: Generates metadata filters for refining the search query
2. **`search_footwear_database`**: Searches the vector database using the query and optional filters


## Define Metadata Filter Tool

In [173]:
# Define a tool to create metadata filters
def create_metadata_filter(filter_string):
    """
    Creates metadata filter JSON for vector database queries.

    Args:
        filter_string (str): Query string for generating metadata filters.

    Returns:
        str: JSON string of filters.
    """
    class CustomRetriever(VectorIndexAutoRetriever):
        def __init__(self, vector_index, vector_store_info, **kwargs):
            super().__init__(vector_index, vector_store_info, **kwargs)

        def _retrieve(self, query, **kwargs):
            query_bundle = QueryBundle(query_str=query)
            retrieval_spec = self.generate_retrieval_spec(query_bundle)
            return retrieval_spec

    llm_filter = OpenAI_Llama(
        temperature=0.5,  # non-zero temperature: filter generation is not fully deterministic
        model="gpt-4o",
        api_key=os.environ.get("OPENAI_API_KEY"),
        system_prompt="You are a helpful assistant, help the user purchase shoes.",
    )

    custom_retriever = CustomRetriever(vector_index=vector_index, vector_store_info=vector_store_info, llm=llm_filter)
    retrieval_spec = custom_retriever._retrieve(filter_string)

    filters_dicts = [{'key': f.key, 'value': f.value, 'operator': f.operator.value} for f in retrieval_spec.filters]
    return json.dumps(filters_dicts)
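
Because the filters come from an LLM, it can be worth checking them before they reach the vector store. The sketch below is one possible guard, not part of the original notebook: `validate_filters` and `ALLOWED_VALUES` are hypothetical names, and the allowed values are copied from the `MetadataInfo` descriptions above.

```python
import json

# Allowed values copied from the MetadataInfo descriptions (assumed exhaustive).
ALLOWED_VALUES = {
    "gender": {"men", "women"},
    "usage": {"sports", "casual", "formal"},
}


def validate_filters(filters_json):
    """Parse a JSON filter string and keep only filters with known keys and values."""
    valid = []
    for f in json.loads(filters_json):
        allowed = ALLOWED_VALUES.get(f.get("key"))
        if allowed is not None and f.get("value") in allowed:
            valid.append(f)
    return valid


raw = (
    '[{"key": "gender", "value": "women", "operator": "=="}, '
    '{"key": "usage", "value": "party", "operator": "=="}]'
)
print(validate_filters(raw))
# [{'key': 'gender', 'value': 'women', 'operator': '=='}]
```

Dropping an unknown value such as `"party"` widens the search instead of returning nothing, which is usually the safer failure mode for a shopping assistant.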


## Define Footwear Vector Database Search Tool

In [149]:
# Define a tool to search the footwear vector database
def search_footwear_database(query_str, filters_json=None):
    """
    Searches the footwear vector database using a query string and optional filters.

    Args:
        query_str (str): Query string describing the footwear.
        filters_json (Optional[List]): JSON list of metadata filters.

    Returns:
        list: Search results from the vector database.
    """

    # Generate the embedding for the query string
    query_embedding = embed_model._get_query_embedding(query_str)

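    # Accept either a JSON string or an already-parsed list of filter dicts
    if isinstance(filters_json, str):
        filters_json = json.loads(filters_json)

    # Build metadata filters only when filters were provided
    metadata_filters = (
        MetadataFilters.from_dicts(filters_json, condition=FilterCondition.AND)
        if filters_json
        else None
    )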
    
    vector_store_query = VectorStoreQuery(
        query_str=query_str,
        query_embedding=query_embedding,
        alpha=0.5,
        mode='hybrid',
        filters=metadata_filters,
        similarity_top_k=8
    )
    
    # Execute the query against the vector store
    query_result = vector_store.query(vector_store_query)

    # Create output without embeddings
    nodes_with_scores = []
    for index, node in enumerate(query_result.nodes):
        score: Optional[float] = None
        if query_result.similarities is not None:
            score = query_result.similarities[index]
        nodes_with_scores.append({
            'color': node.metadata['color'],
            'text': node.metadata['text'],
            'gender': node.metadata['gender'],
            'product_type': node.metadata['product_type'],
            'product_id': node.metadata['product_id'],
            'brand': node.metadata['brand'],
            'usage': node.metadata['usage'],
            'price': node.metadata['price'],
            'similarity_score': score
        })

    return nodes_with_scores
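
The `alpha=0.5` argument balances the dense (embedding) and sparse (keyword) parts of the hybrid query. As a rough sketch, assuming the convex-combination convention often used for dense/sparse hybrid search (dense values scaled by `alpha`, sparse values by `1 - alpha`), the weighting looks like this. The `weight_hybrid_query` helper is hypothetical and the notebook does not confirm that its vector store uses exactly this scheme.

```python
def weight_hybrid_query(dense, sparse, alpha):
    """Scale dense values by alpha and sparse values by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [value * alpha for value in dense]
    weighted_sparse = {term: weight * (1.0 - alpha) for term, weight in sparse.items()}
    return weighted_dense, weighted_sparse


# alpha=0.5 gives both parts equal weight.
dense, sparse = weight_hybrid_query([0.2, 0.4], {"shoes": 1.0}, alpha=0.5)
print(dense, sparse)  # [0.1, 0.2] {'shoes': 0.5}
```

Under this convention, `alpha=1.0` would reduce to pure semantic search and `alpha=0.0` to pure keyword search.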


## Define Agent Tools

In [150]:
create_metadata_filters_tool = FunctionTool.from_defaults(
    name="create_metadata_filter",
    fn=create_metadata_filter
)

query_vector_database_tool = FunctionTool.from_defaults(
    name="search_footwear_database",
    fn=search_footwear_database
)

## Create AI Agent

We'll now define an AI Agent capable of reasoning over the data, generating filters, and performing refined searches to address customer queries more effectively.


### Create the agent worker

In [151]:
# Create the agent worker
agent_worker = FunctionCallingAgentWorker.from_tools(
    [
        create_metadata_filters_tool,
        query_vector_database_tool,
    ],
    llm=llm,
    verbose=True,
    system_prompt="""\
You are an agent designed to answer customers looking for shoes.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.
Drive sales and always feel free to ask a user for more information.

- Always consider if filters are needed based on the user's query.
- Use the tools provided to answer questions; do not rely on prior knowledge.
- Always feel free to ask a user for more information.

**Example 1:**

User Query: "Hi! I'm going to a party and I'm looking for red women's shoes. Thank you!"

Agent Actions:

1. Determine what query string to use for filters e.g. "red women's shoes"
2. Call:
   filters_json = create_metadata_filter("red women's shoes")
3. Call:
   results = search_footwear_database(query_str='shoes', filters_json=filters_json)

**Example 2:**

User Query: "Hi! I'm going to a meeting and I'm looking for formal women's shoes. Thank you!"

Agent Actions:

1. Determine what query string to use for filters e.g. "formal women's shoes"
2. Call:
   filters_json = create_metadata_filter("formal women's shoes")
3. Call:
   results = search_footwear_database(query_str='shoes', filters_json=filters_json)

**Example 3:**

User Query: "I'm looking for shoes"

Agent Actions:

1. Ask for more information

Remember to follow these instructions carefully.
""",
)

### Create the agent runner

In [169]:
agent = AgentRunner(agent_worker)

## Test Agent

Let's test the AI Agent by asking for casual women's shoes.


In [170]:
# Test the agent
agent_response = agent.chat("I'm looking for women's casual shoes")

Added user message to memory: I'm looking for women's casual shoes
=== Calling Function ===
Calling function: create_metadata_filter with args: {"filter_string": "women's casual shoes"}
=== Function Output ===
[{"key": "gender", "value": "women", "operator": "=="}, {"key": "usage", "value": "casual", "operator": "=="}]
=== Calling Function ===
Calling function: search_footwear_database with args: {"query_str": "shoes", "filters_json": [{"key": "gender", "value": "women", "operator": "=="}, {"key": "usage", "value": "casual", "operator": "=="}]}
=== Function Output ===
[{'color': 'brown', 'text': 'Hm women brown shoes', 'gender': 'women', 'product_type': 'flats', 'product_id': 81, 'brand': 'hm', 'usage': 'casual', 'price': 155, 'similarity_score': 1.73761463}, {'color': 'brown', 'text': 'Gliders women brown shoes', 'gender': 'women', 'product_type': 'casual shoes', 'product_id': 80, 'brand': 'gliders', 'usage': 'casual', 'price': 75, 'similarity_score': 1.71296632}, {'color': 'red', 'te

### Interpreting the Query

Unlike a naive query engine, which vectorizes the full user query, the AI agent rewrites the query and adds a custom metadata filter.

For the query **"I'm looking for women's casual shoes"**, the agent decides not to vectorize the entire customer query. Instead, it vectorizes only **shoes**, creates a filter for the condition **women's casual shoes**, and retrieves up to 8 matching shoes (`similarity_top_k=8`) to recommend from.

![Agent ensures it fully understands the customer's needs](images/2_dec/6_ai_agent_options.png)
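
The split between "what to vectorize" and "what to filter on" can be sketched without an LLM. The example below is a deliberately crude, rule-based stand-in for the agent's reasoning: the keyword tables, the `split_request` helper, and the fixed `"shoes"` query string are all assumptions made for illustration.

```python
import re

# Hypothetical keyword tables standing in for the LLM's interpretation.
GENDER_WORDS = {"women": "women", "woman": "women", "men": "men", "man": "men"}
USAGE_WORDS = {"casual", "formal", "sports"}


def split_request(message):
    """Split a request into a generic query string and equality filters."""
    filters = []
    for word in re.findall(r"[a-z]+", message.lower()):
        if word in GENDER_WORDS:
            filters.append({"key": "gender", "value": GENDER_WORDS[word], "operator": "=="})
        elif word in USAGE_WORDS:
            filters.append({"key": "usage", "value": word, "operator": "=="})
    # Only the generic product noun is left for vectorization (fixed here for simplicity).
    return "shoes", filters


query_str, filters = split_request("I'm looking for women's casual shoes")
print(query_str)  # shoes
print(filters)
# [{'key': 'gender', 'value': 'women', 'operator': '=='}, {'key': 'usage', 'value': 'casual', 'operator': '=='}]
```

The resulting filter list has the same shape as the `create_metadata_filter` output in the log above; the real agent gets there through the LLM rather than keyword lookup.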


# Visualize the Agent's Recommendations
Let's create a function that visualizes the agent's recommended shoes.

In [174]:
def visualize_agent_response(agent_response, image_folder="data/footwear", img_width=150, threshold=98):
    """
    Visualizes products from agent response if they match (fuzzily) names in an unstructured string.

    Args:
    - agent_response: Agent response.
    - image_folder: Path to the folder containing product images.
    - img_width: Width of the product images in the visualization.
    - threshold: Minimum rapidfuzz similarity score (0-100) for fuzzy matching.

    Returns:
    - None: Displays the visualization directly in the notebook.
    """
    # sources[1] is assumed to be the output of the second tool call
    # (search_footwear_database); sources[0] holds the generated filters

    # Prepare HTML content for visualization
    html_content = "<div style='display: flex; flex-wrap: wrap; gap: 20px;'>"

    # Loop through the products and match with unstructured string
    for product in agent_response.sources[1].raw_output:
        product_name = product['text'].lower()

        # Perform fuzzy matching
        match = process.extractOne(product_name, [agent_response.response.lower()], scorer=fuzz.partial_ratio)
        if match and match[1] > threshold:  # If a match is found and meets the threshold
            # Generate image path based on product_id
            image_path = os.path.join(image_folder, f"{product['product_id']}.jpg")

            # Append product info and image to HTML content
            html_content += f"""
                <div style="text-align: center;">
                    <p>{product['text']}</p>
                    <img src='{image_path}' width='{img_width}px' style="border: 1px solid #ddd; padding: 5px;"/>
                </div>
            """

    # Close the main div
    html_content += "</div>"

    # Display the content as HTML
    display(HTML(html_content))
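
The core idea of the function is to show only the products the agent actually mentioned in its reply. With a threshold of 98 the fuzzy match behaves almost like a substring test, so the selection step can be approximated with the standard library alone. This is a simplified stand-in, not the `rapidfuzz` logic used above, and `mentioned_products` is a hypothetical helper.

```python
def mentioned_products(products, response_text):
    """Return the products whose name appears in the response, ignoring case."""
    text = response_text.lower()
    return [product for product in products if product["text"].lower() in text]


products = [
    {"product_id": 88, "text": "Catwalk women corporate leather black heels"},
    {"product_id": 86, "text": "Carlton london women black heels"},
    {"product_id": 81, "text": "Hm women brown shoes"},
]
response = (
    "1. **Catwalk Women Corporate Leather Black Heels**\n"
    "2. **Carlton London Women Black Heels**"
)
print([p["product_id"] for p in mentioned_products(products, response)])  # [88, 86]
```

Unlike the fuzzy version, an exact substring test would miss a product if the agent slightly rephrased its name.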


In [172]:
# Call the function
visualize_agent_response(agent_response)

## We suddenly change our minds: we actually need something more formal
Now let's repeat the same change of mind with the agent and ask for something more formal instead.

In [164]:
# Ask the agent for a more formal alternative
agent_response = agent.chat("Actually, I need something more formal")

Added user message to memory: Actually, I need something more formal
=== Calling Function ===
Calling function: create_metadata_filter with args: {"filter_string": "women's formal shoes"}
=== Function Output ===
[{"key": "gender", "value": "women", "operator": "=="}, {"key": "usage", "value": "formal", "operator": "=="}]
=== Calling Function ===
Calling function: search_footwear_database with args: {"query_str": "shoes", "filters_json": [{"key": "gender", "value": "women", "operator": "=="}, {"key": "usage", "value": "formal", "operator": "=="}]}
=== Function Output ===
[{'color': 'black', 'text': 'Catwalk women corporate leather black heels', 'gender': 'women', 'product_type': 'heels', 'product_id': 88, 'brand': 'catwalk', 'usage': 'formal', 'price': 155, 'similarity_score': 1.19167948}, {'color': 'black', 'text': 'Carlton london women black heels', 'gender': 'women', 'product_type': 'heels', 'product_id': 86, 'brand': 'carlton london', 'usage': 'formal', 'price': 200, 'similarity_sco

# Visualize the agent's recommendations

In [168]:
# Call the function
visualize_agent_response(agent_response)

# Detailed Agent Workflow

When refining the query from "women's casual shoes" to "women's formal shoes," the agent takes the following steps:

1. **Context Tracking:**
    - Retains the original focus on "women's shoes" from the earlier part of the conversation
    - Recognizes that the context has shifted from "casual" to "formal" without losing track of gender  


2. **Tool Invocation:**
    - Calls the `create_metadata_filter` function with the argument `filter_string="women's formal shoes"`, generating the following metadata filter:

    ```json
    [
      {"key": "gender", "value": "women", "operator": "=="}, 
      {"key": "usage", "value": "formal", "operator": "=="}
    ]
    ```
    - Calls the `search_footwear_database` function with the query string `"shoes"` and the generated filter. This refines the search to include only women's formal shoes.

3. **Response Generation:**
    - Combines the retrieved vector database results with its retained memory to provide a response that acknowledges the context shift:

>Here are some options for women's formal shoes:
>
>1. **Catwalk Women Corporate Leather Black Heels**
>2. **Carlton London Women Black Heels**
>3. **Carlton London Women Nude Heels**
>
>If you have any specific preferences or need further assistance, feel free to let me know!

![Agent workflow](images/2_dec/5_ai_agent_memory.png)

## Why the AI Agent Succeeds

The AI agent keeps track of the whole conversation. 

When the customer changes their mind, the AI agent doesn't forget the original focus on women's shoes. 

Instead, it updates the search from **"casual"** to **"formal"** while still looking for women's footwear.

1. **Conversation Memory:** Remembers the earlier detail - women's shoes - and updates only the **"casual"** part to **"formal"**
2. **Flexible Reasoning:** Adapts the metadata filter instead of starting from zero each time, while the vectorized query stays **"shoes"**
3. **Accurate Results:** Finds women's formal shoes that match the new requirement
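
One way to picture this behavior is as a small constraint store that is updated rather than replaced. The real agent keeps the chat history and lets the LLM infer the merged request, so the explicit dictionary and the `update_constraints` and `to_filters` helpers below are purely illustrative assumptions.

```python
def update_constraints(state, changes):
    """Return a new constraint dict with only the changed keys overwritten."""
    merged = dict(state)
    merged.update(changes)
    return merged


def to_filters(constraints):
    """Convert constraints into a key/value/operator filter list."""
    return [
        {"key": key, "value": value, "operator": "=="}
        for key, value in constraints.items()
    ]


# After "I'm looking for women's casual shoes"
state = {"gender": "women", "usage": "casual"}
# After "Actually, I need something more formal": only the usage changes.
state = update_constraints(state, {"usage": "formal"})
print(to_filters(state))
# [{'key': 'gender', 'value': 'women', 'operator': '=='}, {'key': 'usage', 'value': 'formal', 'operator': '=='}]
```

The gender constraint survives the update, which is exactly what the naive chatbot lost when it treated the follow-up as a fresh query.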

## Key Takeaways
### Naive Chatbot Limitation
- Doesn't remember earlier requests and treats new messages as unrelated searches

### AI Agent Advantages
- **Conversation Memory:** Maintains a running understanding of the conversation
- **Flexible Reasoning:** Updates the search based on the latest input without losing previous details
- **Accurate Results:** Delivers results that stay relevant as the user's needs evolve


## Conclusion: Day 2 - How AI agents improve naive chatbots by understanding context shifts

This example shows how a naive RAG chatbot fails when the user changes their mind mid-conversation. By not connecting the dots, it provides unhelpful results. 

An AI agent, on the other hand, can smoothly adapt to the changing request, keeping track of what was said before and making sure the recommendations stay on target.

Stay tuned for tomorrow's issue, where we'll explore another challenge and see how AI agents handle it better than simple chatbots.

### What's Next?

Tomorrow, we'll continue our journey by tackling another challenge faced by naive chatbots.

---

### Ready to implement this in your own systems?  
AI agents are the future of e-commerce chatbots. If you're interested in applying these concepts to your business, feel free to reach out!


# Like this repo? 
I’m working on a course to help you build and deploy your own AI agent chatbot from scratch. [Sign up here!](https://braine.ai/#agent)