# E-commerce Bot using AWS Generative AI Services & MongoDB

Run the cells below to check the Python version and install the latest version of boto3 and the other dependencies in the notebook kernel:

In [None]:
!python --version

In [None]:
# Install the dependencies from requirements.txt (comment this line out once they are installed)
!pip install -r ../requirements.txt --no-build-isolation --force-reinstall

The examples demonstrate a mix of invoking Bedrock models directly with the AWS SDK (boto3) and using [LangChain](https://github.com/hwchase17/langchain) to orchestrate LLM chains:

In [1]:
import sys

# Make the helper modules in ../utils (such as load_env) importable
sys.path.append('../utils')

### Create boto3 Bedrock Client

##### Set `REGION` in `../.env` to a region where you have access to the models (the code falls back to `us-west-2`).

#### A note about `langchain`
The Bedrock classes provided by `langchain` create a Bedrock boto3 client by default. To customize your Bedrock configuration, we recommend explicitly creating the Bedrock client as shown below and passing it to the [`langchain.Bedrock`](https://python.langchain.com/docs/integrations/llms/bedrock) class with `client=bedrock_client`.
   

In [2]:
import boto3
from load_env import load_env

env = load_env('../.env')

session = boto3.Session(region_name=env.get('REGION', 'us-west-2'))

# Runtime client: API operations for running inference using Bedrock models.
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html#bedrockruntime
bedrock_client = session.client(service_name='bedrock-runtime')

# Control-plane client: API operations for creating and managing Bedrock models.
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock.html#bedrock
bedrock = session.client(service_name='bedrock')

In [3]:
# Optional filter values (not applied by the call below, which lists every model)
Provider = ''
CustomizationType = ''
OutputModality = 'TEXT'
InferenceType = 'ON_DEMAND'
bedrock.list_foundation_models()

{'ResponseMetadata': {'RequestId': '139f508c-4f1b-42e6-8bb8-11b5546a3f66',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Fri, 09 Feb 2024 13:57:12 GMT',
   'content-type': 'application/json',
   'content-length': '17086',
   'connection': 'keep-alive',
   'x-amzn-requestid': '139f508c-4f1b-42e6-8bb8-11b5546a3f66'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-tg1-large',
   'modelId': 'amazon.titan-tg1-large',
   'modelName': 'Titan Text Large',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities': ['TEXT'],
   'responseStreamingSupported': True,
   'customizationsSupported': [],
   'inferenceTypesSupported': ['ON_DEMAND'],
   'modelLifecycle': {'status': 'ACTIVE'}},
  {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-g1-text-02',
   'modelId': 'amazon.titan-embed-g1-text-02',
   'modelName': 'Titan Text Embeddings v2',
   'providerName': 'Amazon',
   'inp
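The `Provider`, `CustomizationType`, `OutputModality`, and `InferenceType` values are never passed to the call, so the response lists every model. `list_foundation_models` can filter server-side through its `byProvider`, `byCustomizationType`, `byOutputModality`, and `byInferenceType` parameters; alternatively, the response can be filtered client-side. The minimal sketch below does the latter on a hand-written sample shaped like the response above. `filter_models` and the `example.embedding-model` entry are hypothetical, used only for illustration.

```python
def filter_models(response, provider=None, output_modality=None, inference_type=None):
    """Return the modelIds in a list_foundation_models()-style response that match every given filter."""
    matches = []
    for summary in response.get("modelSummaries", []):
        # Provider names are compared case-insensitively ("amazon" matches "Amazon")
        if provider and summary.get("providerName", "").lower() != provider.lower():
            continue
        if output_modality and output_modality not in summary.get("outputModalities", []):
            continue
        if inference_type and inference_type not in summary.get("inferenceTypesSupported", []):
            continue
        matches.append(summary["modelId"])
    return matches


# Hand-written sample; the second entry is hypothetical
sample = {
    "modelSummaries": [
        {"modelId": "amazon.titan-tg1-large", "providerName": "Amazon",
         "outputModalities": ["TEXT"], "inferenceTypesSupported": ["ON_DEMAND"]},
        {"modelId": "example.embedding-model", "providerName": "Example",
         "outputModalities": ["EMBEDDING"], "inferenceTypesSupported": ["ON_DEMAND"]},
    ]
}

print(filter_models(sample, output_modality="TEXT"))  # ['amazon.titan-tg1-large']
```

In the notebook, the same helper can be applied to the live response, e.g. `filter_models(bedrock.list_foundation_models(), output_modality=OutputModality, inference_type=InferenceType)`.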

### Use Case: Shopping Assistant

#### Add Product details to VectorDB for retrieval   

For the retail chatbot, we chose to work with the [Amazon Berkeley Objects](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) dataset. It contains a large catalog of Amazon product listings with metadata such as names, keywords, brands, and product types, which makes it a good fit for a retail assistant.

Download the `product_data.csv` file into the `data` folder from the link, or use the `gdown` command-line tool to download it from a [hosted link](https://drive.google.com/file/d/1tHWB6u3yQCuAgOYc-DxtZ8Mru3uV5_lj/view):

gdown --id 1tHWB6u3yQCuAgOYc-DxtZ8Mru3uV5_lj

⚠️ This command currently fails with an access-denied error:

    Cannot retrieve the public link of the file. You may need to change
    the permission to 'Anyone with the link', or have had many accesses.

    You may still be able to access the file from the browser:

        https://drive.google.com/uc?id=1tHWB6u3yQCuAgOYc-DxtZ8Mru3uV5_lj

In [21]:
import pandas as pd

MAX_TEXT_LENGTH = 1000  # Maximum number of text characters to keep per field


def auto_truncate(val):
    """Truncate the given text to MAX_TEXT_LENGTH characters."""
    return val[:MAX_TEXT_LENGTH]


# Load product data and truncate long text fields
all_prods_df = pd.read_csv("../data/product_data.csv", converters={
    'bullet_point': auto_truncate,
    'item_keywords': auto_truncate,
    'item_name': auto_truncate,
})

In [22]:
# Display the list of columns
columns_list = all_prods_df.columns.tolist()
print(columns_list)

# Get the distinct values for the 'product_type' column
distinct_values = all_prods_df['product_type'].unique()

# Display the list of distinct values
print(distinct_values)

['item_id', 'marketplace', 'country', 'main_image_id', 'domain_name', 'bullet_point', 'item_keywords', 'material', 'brand', 'color', 'item_name', 'model_name', 'model_number', 'product_type']
['CELLULAR_PHONE_CASE' 'HOME_FURNITURE_AND_DECOR' 'HOME'
 'SKIN_CLEANING_AGENT' 'ACCESSORY' 'WINE' 'SHOES' 'TOILET_PAPER_HOLDER'
 'BACKPACK' 'LIGHT_BULB' 'HEALTH_PERSONAL_CARE' 'JANITORIAL_SUPPLY'
 'VEHICLE_INTERIOR_SHADE' 'HARDWARE_HANDLE' 'WASHER_DRYER_COMBINATION'
 'BOOT' 'LIGHT_FIXTURE' 'CHAIR' 'HAT' 'FINEEARRING' 'FRUIT'
 'SAFETY_SUPPLY' 'KITCHEN' 'GROCERY' 'FINENECKLACEBRACELETANKLET'
 'COMPUTER_INPUT_DEVICE_ACCESSORY' 'COSMETIC_CASE' 'FINERING'
 'OFFICE_PRODUCTS' 'CLOCK' 'NUTRITIONAL_SUPPLEMENT'
 'DISHWARE_PLACE_SETTING' 'BAKING_PAN' 'DESK' 'SPORTING_GOODS'
 'STOOL_SEATING' 'SANDAL' 'FLAT_SHEET' 'ANTENNA' 'EARRING' 'INK_OR_TONER'
 'HOME_BED_AND_BATH' 'TOOLS' 'FILE_FOLDER' 'HEADPHONES' 'TABLE' 'DRESSER'
 'RUG' 'FOOD_SERVICE_SUPPLY' 'SOFA' 'PORTABLE_ELECTRONIC_DEVICE_COVER'
 'HAIR_COMB' 'PANT

In [23]:
import numpy as np

# Replace empty keyword strings with NaN, then drop those rows
add_prods_df = all_prods_df.copy()
add_prods_df['item_keywords'] = add_prods_df['item_keywords'].replace('', np.nan)
add_prods_df = add_prods_df.dropna(subset=['item_keywords'])

# Reset the pandas DataFrame index after dropping rows
add_prods_df = add_prods_df.reset_index(drop=True)

In [24]:

# Define the categories
selected_categories = ['SHOES', 'SANDAL', 'BOOT', 'JEWELRY', 'FASHIONRING', 'FINEEARRING','FASHIONEARRING','HAT','COSMETIC_CASE', 'FASHIONNECKLACEBRACELETANKLET', 'FINENECKLACEBRACELETANKLET']

# Number of items you want to select per category, change this if you want more
num_items_per_category=1000

# Keep at most num_items_per_category rows of each category, concatenated in category order
filtered_df = pd.concat([all_prods_df[all_prods_df['product_type'] == category].head(num_items_per_category) for category in selected_categories], ignore_index=True)

# Display the result as a table in Jupyter Notebook
#display(filtered_df)
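The selection above takes the first `num_items_per_category` rows of each category and stacks the groups in `selected_categories` order. The sketch below shows the same logic on a plain list of dicts, which makes the ordering explicit; `take_per_category` and the sample records are hypothetical.

```python
def take_per_category(records, categories, limit):
    """Return up to `limit` records per category, grouped in `categories` order."""
    selected = []
    for category in categories:
        # Same role as all_prods_df[all_prods_df['product_type'] == category].head(limit)
        matching = [r for r in records if r["product_type"] == category]
        selected.extend(matching[:limit])
    return selected


records = [
    {"item_id": "a", "product_type": "HAT"},
    {"item_id": "b", "product_type": "SHOES"},
    {"item_id": "c", "product_type": "HAT"},
    {"item_id": "d", "product_type": "HAT"},
]

picked = take_per_category(records, ["SHOES", "HAT", "BOOT"], limit=2)
print([r["item_id"] for r in picked])  # ['b', 'a', 'c']
```

Categories with no matching rows (here `BOOT`) simply contribute nothing. In pandas, `all_prods_df[all_prods_df['product_type'].isin(selected_categories)].groupby('product_type').head(num_items_per_category)` selects the same rows, but keeps them in their original file order rather than grouped by category.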

In [25]:
# Count the rows in the filtered DataFrame
rows = len(filtered_df)
print(rows)

5841


In [20]:
# Number of products to use (subset)
NUMBER_PRODUCTS = 5000

# Take the first NUMBER_PRODUCTS products as a {row index: record} dict
product_metadata = (
    filtered_df
    .head(NUMBER_PRODUCTS)
    .to_dict(orient='index')
)

# Check one of the products
product_metadata[0]

{'item_id': 'B07ZFQ2W57',
 'marketplace': 'Amazon',
 'country': 'IN',
 'main_image_id': '81oct5RNPzL',
 'domain_name': 'amazon.in',
 'bullet_point': 'Outer Material: PU Closure Type: Slip On Heel type: flats Toe Style: Open Toe Warranty Type: No Warranty Warranty Description: No Warranty',
 'item_keywords': 'sandal for women stylish footwear for women stylish chappals for women latest stylish wedges for women stylish slippers for women women sandals latest design flats for womens casual Pink Pink casual chappals for women latest stylish fashion flats for womens casual footwear footwear for women stylish ladies sandal for women stylish slippers slippers for women wedges for women stylish women sandals latest design Pink casual chappals for women latest stylish fashion flats for womens casual footwear footwear for women stylish ladies sandal for women stylish slippers slippers for women wedges for women stylish women sandals latest design Pink casual chappals for women latest stylish fas
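Because `filtered_df` is built category by category, `.head(NUMBER_PRODUCTS)` keeps the first 5,000 of its 5,841 rows and drops the remainder from the end, i.e. from the categories listed last in `selected_categories`. The 5,841 total (below 11 × 1,000) also means some categories have fewer than 1,000 products. A quick way to check the resulting mix is to count `product_type` values. The sketch below uses a hypothetical helper, `category_counts`, on a hand-written sample with the same `{row index: record}` shape as `product_metadata`.

```python
from collections import Counter


def category_counts(products):
    """Count records per product_type in a {row index: record} mapping."""
    return Counter(record["product_type"] for record in products.values())


# Hand-written sample with the same shape as product_metadata
sample_products = {
    0: {"item_id": "a", "product_type": "SANDAL"},
    1: {"item_id": "b", "product_type": "SANDAL"},
    2: {"item_id": "c", "product_type": "HAT"},
}

print(category_counts(sample_products).most_common())  # [('SANDAL', 2), ('HAT', 1)]
```

In the notebook this would be `category_counts(product_metadata).most_common()`; with pandas, `filtered_df.head(NUMBER_PRODUCTS)['product_type'].value_counts()` gives the same counts.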

In [26]:
from langchain.embeddings import BedrockEmbeddings

# Text to embed: one product name per product
texts = [
    v['item_name'] for k, v in product_metadata.items()
]

# Product metadata to store alongside each vector
metadatas = list(product_metadata.values())

# Titan text embeddings model (1536-dimensional vectors, matching the index definition below)
br_embeddings = BedrockEmbeddings(client=bedrock_client, model_id='amazon.titan-embed-text-v1')

#### Creating the database definition

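Create a database named `langchain_db` with a collection named `e_commerce`, then create an Atlas Vector Search index named `products-metadata` on that collection in the MongoDB Atlas UI, using the definition below. The database and collection names should match the `MDB_DATABASE` and `MDB_COLLECTION` values in `../.env`, which the code reads, and `numDimensions` must equal the embedding size (1536 for `amazon.titan-embed-text-v1`). See the [quick start](https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector/).

Enter the following definition in the JSON editor in MongoDB Atlas: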

```json
{
  "fields": [
    {
      "type": "vector",
      "numDimensions": 1536,
      "similarity": "cosine",
      "path" :"embedding"
    }
  ]
}
```

#### Populating the collection

In [31]:
from pymongo import MongoClient

env = load_env('../.env')

# initialize MongoDB python client

uri = env.get("MDB_URI")
print(uri)
client = MongoClient(uri)
collection_name = env.get('MDB_COLLECTION')
db_name = env.get('MDB_DATABASE')
collection = client[db_name][collection_name]
index_name = "products-metadata"


mongodb+srv://kw:ftkpjXHzx2BxiYFI@mongo.fcs1abj.mongodb.net/?retryWrites=true&w=majority
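If you want to confirm which cluster the notebook connects to, avoid printing `MDB_URI` as-is: the connection string embeds the database user's password, and notebook output is easy to share by accident. A minimal sketch that masks the password first is shown below; `redact_uri` is a hypothetical helper, and it assumes a standard `user:password@host` connection string.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri(uri):
    """Return the URI with its password replaced by '****' (unchanged if it has none)."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri
    # Everything after the last '@' in the netloc is the host list (and optional port)
    hosts = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit(parts._replace(netloc=f"{parts.username}:****@{hosts}"))


print(redact_uri("mongodb+srv://user:s3cret@cluster0.example.net/?retryWrites=true&w=majority"))
# mongodb+srv://user:****@cluster0.example.net/?retryWrites=true&w=majority
```

In the notebook, `print(redact_uri(uri))` shows the user and host without exposing the password.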


Use the code below to embed each product name with Bedrock and insert the documents, together with their embeddings and metadata, into the MongoDB Atlas collection.

In [None]:
from langchain.vectorstores import MongoDBAtlasVectorSearch


vectorstore = MongoDBAtlasVectorSearch.from_texts(
    texts=texts,
    metadatas=metadatas,
    embedding=br_embeddings,
    index_name=index_name,
    collection=collection
)
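In the LangChain versions we are aware of, `from_texts` embeds all texts with the embedding model's `embed_documents` and writes one MongoDB document per product: the text under `text`, the vector under `embedding` (the `path` in the index definition), and the metadata fields at the top level. The sketch below reproduces that document shape with a stub embedder so you can see what lands in the collection without calling Bedrock. `build_documents` and `fake_embed` are hypothetical names, and the 2-D vectors stand in for Titan's 1536-dimensional ones.

```python
def build_documents(texts, metadatas, embed, text_key="text", embedding_key="embedding"):
    """Pair each text with its embedding vector and metadata as one document."""
    if len(texts) != len(metadatas):
        raise ValueError("texts and metadatas must have the same length")
    vectors = embed(texts)
    return [
        {text_key: text, embedding_key: vector, **metadata}
        for text, vector, metadata in zip(texts, vectors, metadatas)
    ]


def fake_embed(texts):
    # Hypothetical stand-in for BedrockEmbeddings.embed_documents: one 2-D vector per text
    return [[float(len(t)), 1.0] for t in texts]


sample_docs = build_documents(
    ["Red sandal"],
    [{"item_id": "X1", "product_type": "SANDAL"}],
    fake_embed,
)
print(sample_docs)
# [{'text': 'Red sandal', 'embedding': [10.0, 1.0], 'item_id': 'X1', 'product_type': 'SANDAL'}]
```

Keeping the metadata at the top level is what lets you filter or project on fields such as `product_type` alongside the vector search.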

Alternatively, if the collection is already populated, use the code below to initialize a vector store object for it.

In [None]:
# Initialize a vector store object for an existing, already populated collection
from langchain.vectorstores import MongoDBAtlasVectorSearch
from langchain.embeddings import BedrockEmbeddings

br_embeddings = BedrockEmbeddings(client=bedrock_client, model_id='amazon.titan-embed-text-v1')

vectorstore = MongoDBAtlasVectorSearch(
    embedding=br_embeddings,
    index_name=index_name,
    collection=collection,
)

In [None]:
# perform a similarity search between the embedding of the query and the embeddings of the documents
query = "I am looking for red shoes"
docs = vectorstore.similarity_search(query)

In [None]:
print(docs)
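In the LangChain versions we are aware of, `similarity_search` embeds the query with `br_embeddings.embed_query` and asks Atlas Vector Search for the stored vectors closest to it under the cosine similarity set in the index definition, returning 4 documents by default (`k=4`). Atlas answers with an approximate nearest-neighbour search by default; the sketch below instead ranks every document exactly (brute force) on toy 2-D vectors, which is only practical for small collections but shows what the cosine score means. `cosine_similarity` and `top_k` are illustrative helpers, not part of LangChain or PyMongo.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=4, embedding_key="embedding"):
    """Exact ranking of documents by cosine similarity to the query, highest first."""
    scored = [(cosine_similarity(query_vector, d[embedding_key]), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy catalog: 2-D vectors stand in for real 1536-dimensional Titan embeddings
catalog = [
    {"text": "red shoes", "embedding": [1.0, 0.0]},
    {"text": "blue hat", "embedding": [0.0, 1.0]},
    {"text": "maroon boots", "embedding": [0.9, 0.1]},
]

results = top_k([1.0, 0.0], catalog, k=2)
print([d["text"] for _, d in results])  # ['red shoes', 'maroon boots']
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for text embeddings.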

#### Example questions to ask (NOT TESTED)

1. Can you provide details about the latest trends in shoes and any popular brands?
2. Give me recommendations for stylish and durable boot options for the rainy season.
3. Show me the latest jewelry options, especially innovative designs.
4. Provide information on earring options with intricate designs and reasonable prices.
5. Can you suggest some elegant earring choices for formal occasions?
6. Recommend a hat that combines functionality with a trendy appearance.
