# E-commerce Bot using AWS Generative AI Services & MongoDB

Run the cells below to check the Python version and install recent versions of boto3 and its dependencies in the notebook kernel:

In [None]:
!python --version

In [None]:
# Install or upgrade the AWS SDK dependencies; comment this cell out once they are installed
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"
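The pins above set minimum versions. Below is a minimal sketch for confirming that the kernel picked up a new enough release after reinstalling (you may need to restart the kernel first). `parse_version` and `meets_minimum` are hypothetical helpers. They only handle plain numeric versions such as `1.28.57`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain numeric version string like '1.28.57' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


# Minimum versions taken from the %pip install cell above
MINIMUMS = {"boto3": "1.28.57", "botocore": "1.31.57"}

for package, minimum in MINIMUMS.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "OK" if meets_minimum(installed, minimum) else "too old"
    print(f"{package} {installed} (need >= {minimum}): {status}")
```

Comparing tuples of integers avoids the string-comparison pitfall where `'1.28.9'` sorts after `'1.28.57'`.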

The examples demonstrate a mix of invoking Bedrock models directly through the AWS SDK and using [LangChain](https://github.com/hwchase17/langchain) for easy orchestration of LLM chains:

In [None]:
%pip install --quiet langchain==0.0.304 transformers pymongo pandas \
"pillow>=9.5,<10" "faiss-cpu>=1.7,<2" 

In [None]:
import requests
import logging
import json
logger = logging.getLogger()

### Create boto3 Bedrock Client

##### Update the `region_name` to a region where you have model access.

#### A note about `langchain`
The Bedrock classes provided by `langchain` create a Bedrock boto3 client by default. To customize your Bedrock configuration, we recommend explicitly creating the Bedrock client using the method below and passing it to the [`langchain.Bedrock`](https://python.langchain.com/docs/integrations/llms/bedrock) class with `client=bedrock_client`.
   

In [None]:
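# Optional debugging: uncomment to clear or inspect AWS credentials in the environment.
# Printing every environment variable can expose secrets, so avoid sharing the output.
# import os
# os.environ.pop('AWS_ACCESS_KEY_ID', None)

# for name, value in os.environ.items():
#     print(f"{name}: {value}")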

In [None]:
import boto3

session = boto3.Session(region_name='us-west-2')

# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html#bedrockruntime
# Describes the API operations for running inference using Bedrock models.
bedrock_client = session.client(service_name='bedrock-runtime')

# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock.html#bedrock
# Describes the API operations for creating and managing Bedrock models.
bedrock = session.client(service_name='bedrock')

In [None]:
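# Filter the model listing; pass byProvider or byCustomizationType only when needed
output_modality = 'TEXT'
inference_type = 'ON_DEMAND'
response = bedrock.list_foundation_models(
    byInferenceType=inference_type,
    byOutputModality=output_modality,
)
# Show the ID of every model that matches the filters
for model in response['modelSummaries']:
    print(model['modelId'])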
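`list_foundation_models` returns a dict whose `modelSummaries` list holds one entry per model, including its `modelId` and `providerName`. Below is a minimal sketch that groups the IDs by provider. `model_ids_by_provider` is a hypothetical helper. `sample_response` is an invented, heavily trimmed response, so the code runs without AWS credentials. With the client above, you would pass the real response instead, for example `model_ids_by_provider(bedrock.list_foundation_models(byOutputModality='TEXT'))`.

```python
from collections import defaultdict


def model_ids_by_provider(response):
    """Group the modelId values of a ListFoundationModels response by providerName."""
    grouped = defaultdict(list)
    for summary in response.get("modelSummaries", []):
        grouped[summary["providerName"]].append(summary["modelId"])
    # Sort IDs so the output is stable between calls
    return {provider: sorted(ids) for provider, ids in grouped.items()}


# Hypothetical, heavily abbreviated response used only for illustration
sample_response = {
    "modelSummaries": [
        {"modelId": "amazon.titan-text-lite-v1", "providerName": "Amazon"},
        {"modelId": "anthropic.claude-v2", "providerName": "Anthropic"},
        {"modelId": "amazon.titan-text-express-v1", "providerName": "Amazon"},
    ]
}

print(model_ids_by_provider(sample_response))
```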

### Use Case: Shopping Assistant

Add product details to a vector database for retrieval.

For the retail chatbot, we chose to work with the [Amazon Berkeley Objects](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) dataset, which includes a large selection of Amazon products well suited to building a retail assistant. Download the `product_data.csv` file from the link, or use the gdown command-line interface to download it from a [hosted link](https://drive.google.com/file/d/1tHWB6u3yQCuAgOYc-DxtZ8Mru3uV5_lj/view). Then place it at `../data/product_data.csv`, the path the loading cell below reads.

In [None]:
!gdown 1tHWB6u3yQCuAgOYc-DxtZ8Mru3uV5_lj

In [None]:
import pandas as pd

MAX_TEXT_LENGTH = 1500  # Maximum number of text characters to keep per field


def auto_truncate(val):
    """Truncate the given text to MAX_TEXT_LENGTH characters."""
    return val[:MAX_TEXT_LENGTH]


# Load product data and truncate long text fields
all_prods_df = pd.read_csv("../data/product_data.csv", converters={
    'bullet_point': auto_truncate,
    'item_keywords': auto_truncate,
    'item_name': auto_truncate,
})

In [None]:
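# With converters, missing cells arrive as empty strings, so mark them as missing first
all_prods_df['item_keywords'] = all_prods_df['item_keywords'].replace('', pd.NA)
# Drop products that have no keywords
all_prods_df = all_prods_df.dropna(subset=['item_keywords'])

# Reset the index so row labels run from 0 to n-1 again
all_prods_df = all_prods_df.reset_index(drop=True)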

In [None]:
# Number of products remaining after dropping rows without keywords
rows = len(all_prods_df)
print(rows)

In [None]:
# Number of products to use (subset)
NUMBER_PRODUCTS = 10000

# Get the first NUMBER_PRODUCTS products as {row_index: {column: value}}
product_metadata = (
    all_prods_df
    .head(NUMBER_PRODUCTS)
    .to_dict(orient='index')
)

# Check one of the products
product_metadata[0]

In [None]:
from langchain.embeddings import BedrockEmbeddings

# Data that will be embedded and converted to vectors
texts = [v['item_name'] for v in product_metadata.values()]

# Product metadata that we'll store alongside our vectors
metadatas = list(product_metadata.values())

br_embeddings = BedrockEmbeddings(client=bedrock_client, model_id='amazon.titan-embed-text-v1')
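`to_dict(orient='index')` returns a dict keyed by row index, whose values are per-row dicts of column values. Because Python dicts preserve insertion order, iterating `.values()` twice yields the texts and the metadata in the same order, so `texts[i]` and `metadatas[i]` describe the same product. Below is a minimal sketch with a hypothetical two-product stand-in for `product_metadata`. It uses only the column names that appear in this notebook.

```python
# Hypothetical two-product stand-in for product_metadata; the real dict comes
# from all_prods_df.head(NUMBER_PRODUCTS).to_dict(orient='index')
product_metadata = {
    0: {"item_name": "Red running shoes", "item_keywords": "shoes red"},
    1: {"item_name": "Blue denim jacket", "item_keywords": "jacket denim"},
}

# One text per product: this is what gets embedded
texts = [row["item_name"] for row in product_metadata.values()]
# One metadata dict per product, in the same order as texts
metadatas = list(product_metadata.values())

# texts[i] and metadatas[i] describe the same product
for text, meta in zip(texts, metadatas):
    assert text == meta["item_name"]

print(texts)
```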
 

#### Get the MongoDB Cluster URI

In [None]:
import os
import getpass

MONGODB_ATLAS_CLUSTER_URI = getpass.getpass("MongoDB Atlas Cluster URI:")

Next, in the MongoDB Atlas UI, create a database named `langchain_db` with a collection named `e_commerce`. Then create a vector search index on that collection named `products-metadata` (the `index_name` used in the code below) with the mapping that follows. See the [quick start](https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector/).

Enter the following definition in the Atlas Search JSON editor:

In [None]:
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}
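The `dimensions` value has to equal the length of the vectors the embedding model returns (1536 for `amazon.titan-embed-text-v1`). The field name has to match the key the vector store writes embeddings to, assumed here to be `embedding`, LangChain's default for `MongoDBAtlasVectorSearch`. Below is a minimal sketch that generates the same definition and checks a vector's length before insertion. `knn_index_definition` and `check_vector` are hypothetical helpers, not part of any library.

```python
import json

# Output length of amazon.titan-embed-text-v1 (must match "dimensions" below)
EMBEDDING_DIMENSIONS = 1536


def knn_index_definition(field="embedding", dimensions=EMBEDDING_DIMENSIONS, similarity="cosine"):
    """Build the Atlas Search index mapping shown above as a Python dict."""
    return {
        "mappings": {
            "dynamic": True,
            "fields": {
                field: {
                    "dimensions": dimensions,
                    "similarity": similarity,
                    "type": "knnVector",
                }
            },
        }
    }


def check_vector(vector, dimensions=EMBEDDING_DIMENSIONS):
    """Raise ValueError if an embedding does not match the index dimensions."""
    if len(vector) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(vector)}")


# JSON text suitable for pasting into the Atlas JSON editor
print(json.dumps(knn_index_definition(), indent=2))
check_vector([0.0] * EMBEDDING_DIMENSIONS)  # passes silently
```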

In [None]:
from pymongo import MongoClient

# Initialize the MongoDB Python client with the cluster URI entered above
client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)

# Database, collection, and vector search index created in the Atlas UI
db_name = os.environ.get('MDB_DATABASE', 'langchain_db')
collection_name = os.environ.get('MDB_COLLECTION', 'e_commerce')
collection = client[db_name][collection_name]
index_name = "products-metadata"
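If you assemble the URI from separate environment variables (`MDB_USERNAME`, `MDB_PASSWORD`, `MDB_HOST`) instead of pasting it, percent-encode the username and password. Characters such as `@`, `:` or `/` in a password otherwise break URI parsing, and PyMongo's documentation recommends `urllib.parse.quote_plus` for this. Below is a minimal sketch. `build_atlas_uri` is a hypothetical helper, and the `retryWrites=true&w=majority` options are the ones Atlas connection strings commonly include.

```python
import os
from urllib.parse import quote_plus


def build_atlas_uri(username, password, host):
    """Assemble a mongodb+srv URI, percent-encoding the credentials."""
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"
        "?retryWrites=true&w=majority"
    )


# Read the credentials from the environment; empty strings if a variable is unset
uri = build_atlas_uri(
    os.environ.get("MDB_USERNAME", ""),
    os.environ.get("MDB_PASSWORD", ""),
    os.environ.get("MDB_HOST", ""),
)
```

The resulting `uri` can then be passed to `MongoClient(uri)`. Avoid printing it, since it contains the password.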


In [None]:
from langchain.vectorstores import MongoDBAtlasVectorSearch

# create and insert the documents in MongoDB Atlas with their embedding
vectorstore = MongoDBAtlasVectorSearch.from_texts(
    texts=texts,
    metadatas=metadatas,
    embedding=br_embeddings,
    index_name=index_name,
    collection=collection, 
)
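`from_texts` embeds each text and then writes one document per product that combines the text, its vector, and the metadata fields. This sketch assumes LangChain's default keys: `text` for the content and `embedding` for the vector (the field the index definition above maps). Under that assumption, each stored document looks roughly like the output below. `fake_embed` and `build_documents` are hypothetical stand-ins that call no Bedrock or MongoDB API.

```python
# Hypothetical stand-in for the Bedrock embedder: returns a tiny fixed-size vector
def fake_embed(text):
    """Return a deterministic 3-dimensional vector (real Titan vectors have 1536)."""
    return [float(len(text)), float(text.count(" ")), 1.0]


def build_documents(texts, metadatas, embed, text_key="text", embedding_key="embedding"):
    """Pair each text with its embedding and metadata, one dict per MongoDB document."""
    return [
        {text_key: text, embedding_key: embed(text), **meta}
        for text, meta in zip(texts, metadatas)
    ]


docs_to_insert = build_documents(
    ["Red running shoes"],
    [{"item_keywords": "shoes red"}],
    fake_embed,
)
print(docs_to_insert[0])
```

Because the metadata is spread into the same document, a metadata column named `text` or `embedding` would overwrite those fields.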

In [None]:
from langchain.vectorstores import MongoDBAtlasVectorSearch
from langchain.embeddings import BedrockEmbeddings

# Reconnect to the existing collection and index without re-inserting documents
# (useful after restarting the notebook once the data has been loaded)
br_embeddings = BedrockEmbeddings(client=bedrock_client, model_id='amazon.titan-embed-text-v1')

vectorstore = MongoDBAtlasVectorSearch(
    embedding=br_embeddings,
    index_name=index_name,
    collection=collection,
)

In [None]:
# perform a similarity search between the embedding of the query and the embeddings of the documents
query = "I am looking for red shoes"
docs = vectorstore.similarity_search(query)

In [None]:
print(docs)
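`similarity_search` embeds the query with the same model and asks the Atlas index for the stored vectors closest to it under the configured similarity (cosine here). It returns 4 documents by default, and the `k` argument changes that. The Atlas index answers approximately. The sketch below computes the exact cosine ranking by brute force over a few hypothetical 3-dimensional vectors. It is only a mental model of the ranking, not how Atlas executes the query.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=4):
    """Return the k (name, score) pairs most similar to query_vector, best first."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy vectors; real Titan embeddings have 1536 dimensions
stored_vectors = {
    "Red running shoes": [0.9, 0.1, 0.0],
    "Red leather boots": [0.7, 0.3, 0.1],
    "Blue denim jacket": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

for name, score in top_k(query_vector, stored_vectors, k=2):
    print(f"{score:.3f}  {name}")
```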