# E-commerce Bot using AWS Generative AI Services & OpenSearch

Run the below cell to install the latest version of boto3 and dependencies in the notebook kernel:

In [None]:
!python --version

In [1]:
# Uncomment to install the dependencies
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

Defaulting to user installation because normal site-packages is not writeable
Collecting boto3>=1.28.57
  Downloading boto3-1.34.14-py3-none-any.whl (139 kB)
                                              0.0/139.3 kB ? eta -:--:--
     -------------------------------------- 139.3/139.3 kB 4.2 MB/s eta 0:00:00
Collecting awscli>=1.29.57
  Downloading awscli-1.32.14-py3-none-any.whl (4.3 MB)
                                              0.0/4.3 MB ? eta -:--:--
                                              0.1/4.3 MB 4.5 MB/s eta 0:00:01
     ---                                      0.4/4.3 MB 4.6 MB/s eta 0:00:01
     -----                                    0.6/4.3 MB 4.3 MB/s eta 0:00:01
     -------                                  0.8/4.3 MB 4.4 MB/s eta 0:00:01
     --------                                 0.9/4.3 MB 4.1 MB/s eta 0:00:01
     ---------                                1.1/4.3 MB 4.3 MB/s eta 0:00:01
     -------------                            1.4/4.3 MB 4.5 MB/s et

ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\fibeg\\AppData\\Roaming\\Python\\Python310\\site-packages\\~~ml\\_yaml.cp310-win_amd64.pyd'
Check the permissions.


[notice] A new release of pip is available: 23.1.2 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip


The examples demonstrates mix of invoking Bedrock models directly using the AWS SDK and also using [LangChain](https://github.com/hwchase17/langchain) for easy orchestration of LLM chains:

In [2]:
%pip install --quiet langchain==0.0.304 transformers pymongo pandas \
"pillow>=9.5,<10" "faiss-cpu>=1.7,<2" 

Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 23.1.2 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [1]:
import requests
import logging
import json
logger = logging.getLogger()

### Create boto3 Bedrock Client

##### Update the `region_name` to the region where you have the model access.

#### A note about `langchain`
The Bedrock classes provided by `langchain` create a Bedrock boto3 client by default. To customize your Bedrock configuration, we recommend to explicitly create the Bedrock client using the method below, and pass it to the [`langchain.Bedrock`](https://python.langchain.com/docs/integrations/llms/bedrock) class instantiation method using `client=bedrock_client`
   

In [3]:
import os
os.environ.pop('AWS_ACCESS_KEY_ID')

for name, value in os.environ.items():
    print(f"{name}: {value}") 

ALLUSERSPROFILE: C:\ProgramData
APPDATA: C:\Users\fibeg\AppData\Roaming
APPLICATION_INSIGHTS_NO_DIAGNOSTIC_CHANNEL: true
AWS_PROFILE: isengard_datalake
AWS_SDK_LOAD_CONFIG: true
CHROME_CRASHPAD_PIPE_NAME: \\.\pipe\crashpad_5612_CWQUQHPFFGHRKBLF
COMMONPROGRAMFILES: C:\Program Files\Common Files
COMMONPROGRAMFILES(X86): C:\Program Files (x86)\Common Files
COMMONPROGRAMW6432: C:\Program Files\Common Files
COMPUTERNAME: ARN-1801025404
COMSPEC: C:\WINDOWS\system32\cmd.exe
DRIVERDATA: C:\Windows\System32\Drivers\DriverData
ELECTRON_RUN_AS_NODE: 1
FPS_BROWSER_APP_PROFILE_STRING: Internet Explorer
FPS_BROWSER_USER_PROFILE_STRING: Default
HOMEDRIVE: C:
HOMEPATH: \Users\fibeg
JAVA_TOOLS_OPTIONS: -Dlog4j2.formatMsgNoLookups=true
JAVA_TOOL_OPTIONS: -Dlog4j2.formatMsgNoLookups=true
JPY_INTERRUPT_EVENT: 3692
LOCALAPPDATA: C:\Users\fibeg\AppData\Local
LOGONSERVER: \\DC-FRA2A-04
NUMBER_OF_PROCESSORS: 8
ONEDRIVE: C:\Users\fibeg\OneDrive
ORIGINAL_XDG_CURRENT_DESKTOP: undefined
OS: Windows_NT
PATH: c:\Pr

In [4]:
import boto3

session = boto3.Session(region_name='us-west-2')

#https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html#bedrockruntime
#Describes the API operations for running inference using Bedrock models.
bedrock_client = session.client( service_name='bedrock-runtime')

#https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock.html#bedrock
# Describes the API operations for creating and managing Bedrock models.
bedrock = session.client(service_name='bedrock')

In [None]:
Provider= ''
CustomizationType=''
OutputModality='TEXT'
InferenceType='ON_DEMAND'
bedrock.list_foundation_models(byInferenceType=InferenceType, byOutputModality= OutputModality, byCustomizationType = CustomizationType )

### Use Case : Shopping Assistant

#### Add Product details to VectorDB for retrieval   

For the retail chatbot, we chose to work with the [Amazon Berkeley Objects](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) dataset. This includes a large selection of Amazon products that are perfect for generating a retail assistant. 

Download the ´product_data.csv´ file into the ´temp´ folder from the link, or use the gdown command line interface to download the file from a [hosted link](https://drive.google.com/file/d/1tHWB6u3yQCuAgOYc-DxtZ8Mru3uV5_lj/view).

gdown --id 1tHWB6u3yQCuAgOYc-DxtZ8Mru3uV5_lj

In [5]:
import pandas as pd
 
MAX_TEXT_LENGTH=1000  # Maximum num of text characters to use
MAX_TEXT_LENGTH_KEYWORDS= 200
 
def auto_truncate(val):
 
    """Truncate the given text."""
 
    return val[:MAX_TEXT_LENGTH]

def auto_truncate_keyword(val):
 
    """Truncate the given text."""
 
    return val[:MAX_TEXT_LENGTH_KEYWORDS]
 
# Load Product data and truncate long text fields
 
all_prods_df = pd.read_csv("../temp/product_data.csv", converters={
 
    'bullet_point': auto_truncate,
 
    'item_keywords': auto_truncate_keyword,
 
    'item_name': auto_truncate
 
})

In [6]:
# Replace empty strings with None and drop
 
all_prods_df['item_keywords'].replace('', None, inplace=True)
 
all_prods_df.dropna(subset=['item_keywords'], inplace=True)
 
# Reset pandas dataframe index
 
all_prods_df.reset_index(drop=True, inplace=True)

In [7]:
# computing number of rows
rows = len(all_prods_df.axes[0])
print(rows)

107566


In [8]:
# Num products to use (subset)
NUMBER_PRODUCTS = 10000  
 
# Get the first 10000 products
product_metadata = ( 
    all_prods_df
     .head(NUMBER_PRODUCTS)
     .to_dict(orient='index')
)
 
# Check one of the products
product_metadata[0]

{'item_id': 'B07T6RZ2CM',
 'marketplace': 'Amazon',
 'country': 'IN',
 'main_image_id': '71dZhpsferL',
 'domain_name': 'amazon.in',
 'bullet_point': '3D Printed Hard Back Case Mobile Cover for Lenovo K4 Note Easy to put & take off with perfect cutouts for volume buttons, audio & charging ports. Stylish design and appearance, express your unique personality. Extreme precision design allows easy access to all buttons and ports while featuring raised bezel to life screen and camera off flat surface. Slim Hard Back Cover No Warranty None',
 'item_keywords': 'mobile cover back cover mobile case phone case mobile panel phone panel Lenovo mobile case Lenovo phone cover Lenovo back case hard case 3D printed mobile cover mobile cover back cover mobile case pho',
 'material': nan,
 'brand': 'Amazon Brand - Solimo',
 'color': 'Others',
 'item_name': 'Amazon Brand - Solimo Designer Couples Sitting at Dark 3D Printed Hard Back Case Mobile Cover for Lenovo K4 Note',
 'model_name': 'Lenovo K4 Note',


In [9]:
from langchain.embeddings import BedrockEmbeddings

# data that will be embedded and converted to vectors
texts = [
    v['item_name'] for k, v in product_metadata.items()
]
 
# product metadata that we'll store along our vectors
metadatas = list(product_metadata.values())
 
br_embeddings = BedrockEmbeddings(client=bedrock_client, model_id='amazon.titan-embed-text-v1')
 

#### Get the OpenSearch Cluster

Create .env file in the notebooks folder and add the OS_ environment variables

In [10]:
from dotenv import load_dotenv
import os

load_dotenv()

domain_endpoint = os.environ.get('OS_ENDPOINT')
os_username = os.environ.get('OS_USERNAME')
os_password = os.environ.get('OS_PASSWORD')
domain_index = "products-metadata"

#print(domain_endpoint, os_username, os_password, domain_index)


In [11]:
from langchain.vectorstores import OpenSearchVectorSearch
from langchain.embeddings import BedrockEmbeddings

br_embeddings = BedrockEmbeddings(client=bedrock_client, model_id='amazon.titan-embed-text-v1')

vectorstore = OpenSearchVectorSearch(
        opensearch_url=domain_endpoint,
        is_aoss=False,
        verify_certs = True,
        http_auth=(os_username, os_password),
        index_name = domain_index,
        embedding_function=br_embeddings)

In [12]:
chunk_size = 50 # for OpenSearch Bulk API
for i in range(0, len(texts), chunk_size):
    chunk_texts = texts[i:i + chunk_size]
    chunk_metadatas = metadatas[i:i + chunk_size]
    print (f'Chunk to embed: {len(chunk_texts)}')
    try:
        vectorstore.add_texts(chunk_texts, chunk_metadatas)
    except:
        for text, metadata in zip(chunk_texts, chunk_metadatas):
            try:
                vectorstore.add_texts(text, metadata)
            except:
                print(f'Error chunk: {text}')

Chunk to embed: 50
Error chunk: Amazon Brand - Solimo Designer Couples Sitting at Dark 3D Printed Hard Back Case Mobile Cover for Lenovo K4 Note
Error chunk: Amazon Brand - Solimo Designer Leaf on Wood 3D Printed Hard Back Case Mobile Cover for Sony Xperia Z1 L39H
Error chunk: Stone & Beam Contemporary Doily Wool Farmhouse Rug
Error chunk: Amazon Brand - Solimo Plastic Multipurpose Modular Drawer, 3 Racks, Multicolor
Error chunk: Amazon Brand - Solimo Designer Take It Easy UV Printed Soft Back Case Mobile Cover for Nokia 8.1
Error chunk: Amazon Brand - Solimo Body Wash, Cool Mist Scent, 21 fl oz (Pack of 1)
Error chunk: Thirty Five Kent Men's Cashmere Zig Zag Scarf, Blue
Error chunk: Amazon Brand - Solimo Designer Rangolis 3D Printed Hard Back Case Mobile Cover for Sony Xperia L1
Error chunk: Amazon Brand - Solimo Designer No Hate On Wooden Block 3D Printed Hard Back Case Mobile Cover for HTC U Ultra
Error chunk: Amazon Brand - Solimo Designer Red Paint On Wall 3D Printed Hard Back Cas

In [None]:
# perform a similarity search between the embedding of the query and the embeddings of the documents
query = "I am looking for red shoes"
docs = vectorstore.similarity_search(query)

In [None]:
print(docs)