# Nike Personal Sales Assistant
Many websites are now incorporating chatbots to answer user questions about their page and company. However, many ecommerce websites have yet to use chatbots to fulfill a sales representative role. This type of chatbot would be valuable to the company itself, as it would increase sales by persuading the customer to purchase items relevant to their query. Additionally, the chatbot would be useful to customers, who can directly get summarized information about extended product descriptions and reviews without having to individually click through several product pages to find the product(s) they are looking for.

To implement this project, we first have to scrape Nike's website to get data about at least a few hundred products. 
After we get the full contents of the product pages, including long product descriptions and some reviews, we need to embed the contents as vectors and upload them along with some metadata (product title, image, url, etc) to pinecone. Then, once we have a customer query, we can use similarity search to get the product that best matches. 

Finally, after getting the matching product information and context (description and reviews), we pass this into an LLM to format the result and try to sell the product to the customer in the way a human sales representative might.

## Scraping
The first step of implementing the chatbot is to get access to product data. Ideally, this type of chatbot would have direct access to the database in which product information would be stored. However, since I do not have access to the database of any ecommerce website, I implement this project by utilizing the scraping tool undetected-chromedriver. I specifically chose undetected-chromedriver over requests or the default Selenium chromedriver as there is potential with the other methods of being denied or banned from the website due to bot detection.

In [None]:
%pip install selenium
%pip install undetected-chromedriver

In [4]:
import os
import time
from urllib.parse import urlparse
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
import undetected_chromedriver as uc
from selenium.webdriver.remote.webdriver import By
from website_vectordb_query.constants import *
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

### Getting all product links
The first step in scraping product pages is being able to get the urls from the main page where all the shoes are listed. Below are some helper functions for this task. The driver parameter refers to the undetected chromedriver instance we will define later.

In [13]:
# Fetch the potential hyperlinks for get_hyperlinks
def elements_fetcher(driver, tag_name='a'):
    elems = driver.find_elements(By.TAG_NAME, tag_name)
    if elems:
        return elems
    else:
        return False

# Function to get the hyperlinks from a URL
def get_hyperlinks(url, driver):
    hyperlinks = []
    driver.get(url)
    if "pdf" in url:
        return []
    # The url I am using is an infinite scroll page
    # - Keep scrolling for a while. Need to wait in between for the page to load. 
    for i in range(70):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
    try:
        # Need to have a conditional wait here until the elements are found
        elems = WebDriverWait(driver, 60).until(elements_fetcher)
    
    except:
        print("timeout exception on url " + url)
        return []

    for elem in elems:
        try:
            href = elem.get_attribute('href')
            if href is not None:
                # print(href)
                hyperlinks.append(href)
        except:
            continue
    return hyperlinks


# Parse hyperlinks from get_hyperlinks to get only the links that are for shoe products
def get_relevant_hyperlinks(url, driver):
    local_domain = urlparse(url).netloc
    clean_links = []
    for link in set(get_hyperlinks(url, driver)):
        clean_link = None

        company_name = local_domain[4:-4]
        if company_name in url:
            clean_link = link
            
        # If the link is not a URL, check if it is a relative link
        else:
            if link.startswith("/"):
                link = link[1:]
            elif link.startswith("#") or link.startswith("mailto:"):
                continue
            clean_link = "https://" + local_domain + "/" + link

        if clean_link is not None:
            if clean_link.endswith("/"):
                clean_link = clean_link[:-1]
            
            #Only append the link if it looks like a shoe product page
            if "nike.com/t/" not in clean_link or "shoes" not in clean_link:
                continue
            clean_links.append(clean_link)

    # Return the list of relevant hyperlinks
    return list(set(clean_links))

We have a url that should link to all available shoes that Nike offers. Let's pass this into get_relevant_hyperlinks and see if we were able to get all of the products available (1761). 

In [14]:
url = "https://www.nike.com/w/shoes-y7ok"

#Define the driver instance with options
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Add headless argument so we don't see the window actually popping up on our screen
options.add_argument('headless')
driver = uc.Chrome(options=options)

links = get_relevant_hyperlinks(url, driver)
print(len(links))

1280


Even after increasing the number of scrolls, I was only able to get 1280 product pages, probably because some of the links didn't match the requirements. 

### Getting the contents
Now that we have all our product links, we need to open each one, extend the product description and reviews, and read the full contents of the web pages. Then we write them to local text files for reading and embedding later, as well as a dataframe to keep track of metadata. 

Below I have defined some helper functions for this task. Some of them click buttons to extend the page to acquire more information, and some just parse the html for metadata we need.

In [16]:
# Click the "View Product Details" to get the extended product description
def open_product_details(driver):
    buttons = driver.find_elements(By.TAG_NAME, 'button')

    for button in buttons:
        if "View Product Details" in button.text:
            button.click()
            break

#Click the "More Reviews" button to get the reviews
def get_reviews(driver):
    spans = driver.find_elements(By.TAG_NAME, 'span')
    for span in spans:
        if "Reviews" in span.text:
            span.click()
            driver.implicitly_wait(180)
            buttons = driver.find_elements(By.TAG_NAME, 'button')
            for button in buttons:
                if "More Reviews" in button.text:
                    button.click()

# Get the title from html
def get_title(driver):
    try:
        # I was able to find that the titles use the css-16cqcdq tag by using browser inspect manually
        header = driver.find_element(By.CSS_SELECTOR, '.css-16cqcdq')
        return header.get_attribute("innerHTML")
    except:
        # Sometimes the driver isn't able to find by CSS selector so I implemented this backup method
        try:
            headers = driver.find_elements(By.TAG_NAME, 'h1')
            for h in headers:
                if "css-16cqcdq" in h.get_attribute("class"):
                    return h.get_attribute("innerHTML")
        except:
            # Just return N/A if we can't find the title and remove N/A rows later 
            return "N/A"

# Get the images for the product from html
def get_product_images(driver, title):
    imgs = driver.find_elements(By.TAG_NAME, 'img')
    img_urls = []

    for img in imgs:
        alt_text = img.get_attribute("alt")
        if title in alt_text:
            img_urls.append(img.get_attribute("src"))

    return img_urls

# Get the price from html
def get_price(driver):
    try:
        # I was able to find that the price uses the this css tag by using browser inspect manually
        price = driver.find_element(By.CSS_SELECTOR, '.product-price.css-11s12ax.is--current-price.css-tpaepq')
        return price.get_attribute("innerHTML")
    except:
        # Sometimes the driver isn't able to find by CSS selector so I implemented this backup method
        divs = driver.find_elements(By.TAG_NAME, 'div')
        for d in divs:
            if "is--current-price" in d.get_attribute("class"):
                return d.get_attribute("innerHTML")
        # Just return N/A if we can't find the title and remove N/A rows later 
        return "N/A"

Now that we have the helper functions we need, we can go through each link to read and store its contents.

In [None]:
#Define the driver instance with options
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Add headless argument so we don't see the window actually popping up on our screen
options.add_argument('headless')
driver = uc.Chrome(options=options)


# Parse the URL and get the domain
local_domain = urlparse(url).netloc

# DF to associate metadata to local writepath
df = pd.DataFrame(columns=['url', 'writepath', 'title', 'price', 'images'])

# Create a directory to store the text files
if not os.path.exists("./text/"):
    os.mkdir("./text/")

if not os.path.exists("./text/" + local_domain + "/"):
    os.mkdir("./text/" + local_domain + "/")

# Create a directory to store the csv files
if not os.path.exists("./processed"):
    os.mkdir("./processed")
    

for url in links:
    # Print the url to see progress
    print(url)
    
    # Create the path to write to for this url
    write_path = './text/' + local_domain + '/' + url[8:].replace("/", "_") + ".txt"
    driver.get(url)
    # Set up implicit wait as a timeout for the driver, so it waits until elements are located
    driver.implicitly_wait(180)
    
    # Click the product details
    open_product_details(driver)
    driver.implicitly_wait(180)
    
    # Get the text with product details
    text = driver.find_element(By.TAG_NAME, "html").text
    
    # Reload the page so we can click reviews
    driver.get(url)
    driver.implicitly_wait(180)
    
    # Get the review text and append to text from before
    get_reviews(driver)
    driver.implicitly_wait(180)
    text += driver.find_element(By.TAG_NAME, "html").text

    # Save text from the url to a <url>.txt file
    with open(write_path, "w") as f:
        # write the text to the file in the text directory
        f.write(text)

    # Refresh the page to get out of reviews
    driver.get(url)
    driver.implicitly_wait(180)
    
    #Get the title and price metadata
    title = get_title(driver)
    price = get_price(driver)
    
    # If we can't find the title, don't try to find the image either 
    if title != "N/A":
        image = get_product_images(driver, title)[0]
    else:
        image = "N/A"
    
    if title != "N/A" and image != "N/A":
        row = {'url': url, 'writepath': write_path, 'title': title, 'price': price, 'images': image}
        print(row)
        df = df.append(row, ignore_index=True)

After a few hours, the driver instance had a timeout error. I can simply restart the process starting from where it left out, but for this notebook I figured 300+ example products were enough. 

In [22]:
print(df.head())
print(len(df))
#df.to_csv("scraped.csv")

                                                 url  \
0  https://www.nike.com/t/air-max-90-toggle-baby-...   
1  https://www.nike.com/t/lebron-witness-7-team-b...   
2  https://www.nike.com/t/air-jordan-1-mid-se-big...   
3  https://www.nike.com/t/jordan-delta-3-sp-mens-...   
4  https://www.nike.com/t/court-borough-low-recra...   

                                           writepath  \
0  ./text/www.nike.com/www.nike.com_t_air-max-90-...   
1  ./text/www.nike.com/www.nike.com_t_lebron-witn...   
2  ./text/www.nike.com/www.nike.com_t_air-jordan-...   
3  ./text/www.nike.com/www.nike.com_t_jordan-delt...   
4  ./text/www.nike.com/www.nike.com_t_court-borou...   

                            title   price  \
0          Nike Air Max 90 Toggle  $47.97   
1         LeBron Witness 7 (Team)    $105   
2             Air Jordan 1 Mid SE  $78.97   
3               Jordan Delta 3 SP  $89.97   
4  Nike Court Borough Low Recraft     $65   

                                              images  


Now we have a dataframe row and full page text stored for 309 products. 

## Storing data in Pinecone

Now that we have all our data, we need to tokenize and embed the text for each product page and then upsert the embedding vectors along with the metadata to Pinecone.

In [7]:
#Reading in the text and dataframe and combining them:

df = pd.read_csv("scraped.csv")
df = df.drop(columns=["Unnamed: 0"])
texts = []

for index, row in df.iterrows():
    path = row["writepath"]
    path = path[2:]
    with open(path, 'r') as f:
        text = f.read()
    texts.append(text)

df["text"] = texts

### Tokenizing the text:
Before embedding the text, we split it into chunks of tokens so that our vectors fit Pinecone's size constraints. First, we define some helper functions. I used the tokenizing and chunking functions from the openai-cookbook github repo: https://github.com/openai/openai-cookbook/blob/main/apps/web-crawl-q-and-a/web-qa.ipynb

In [8]:
def split_into_many(text, tokenizer, max_tokens=MAX_TOKENS):
    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):

        # If the number of tokens so far plus the number of tokens in the current sentence is greater
        # than the max number of tokens, then add the chunk to the list of chunks and reset
        # the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # If the number of tokens in the current sentence is greater than the max number of
        # tokens, go to the next sentence
        if token > max_tokens:
            continue

        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1

    return chunks

# Split text in dataframe into chunks of tokens
def split_tokens_df(df, tokenizer, max_tokens=MAX_TOKENS):
    shortened = []

    # Loop through the dataframe
    for row in df.iterrows():

        # If the text is None, go to the next row
        if row[1]['text'] is None:
            continue

        # If the number of tokens is greater than the max number of tokens, split the text into chunks
        if row[1]['n_tokens'] > max_tokens:
            text_chunks = split_into_many(row[1]['text'], tokenizer)
            shortened.extend([{'title': row[1]['title'], 'price': row[1]['price'],
                              'images': row[1]['images'], 'url': row[1]['url'],
                               'text': chunk} for chunk in text_chunks])

        # Otherwise, add the text, and metadata to the list of shortened texts
        else:
            shortened.append({'title': row[1]['title'], 'price': row[1]['price'],
                              'images': row[1]['images'], 'url': row[0]['url'], 'text': row[1]['text']})

    df = pd.DataFrame(shortened, columns=['title', 'price', 'images', 'url', 'text'])
    df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
    return df

Now we can use these functions to tokenize and chunk our text:

In [None]:
%pip install tiktoken
import tiktoken

In [11]:
tokenizer = tiktoken.get_encoding("cl100k_base")
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
df = split_tokens_df(df=df, tokenizer=tokenizer)

### Embedding the chunked tokenized text as vectors
Now that we have our text as chunks of tokens, we can embed them as vectors. I use OpenAI's ada embedding tool.

In [None]:
%pip install openai
import openai

In [16]:
openai.api_key = <MY KEY>

In [17]:
# Create embeddings from the text that has been broken into chunks of tokens
def create_df_embeddings(df):
    succeeded = False
    for i in range(10):
        # Sometimes OpenAI randomly throws a server error and tells us to try again
        try:
            df['embeddings'] = df.text.apply(
                lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')['data'][0]['embedding'])
        except Exception as e:
            #print(e)
            continue
        else:
            succeeded = True
            break
    if not succeeded:
        print("Couldn't connect to OpenAI server. Please try again later")
        return df
    df.to_csv("./processed/embeddings.csv")
    print(df.head())
    return df

df = create_df_embeddings(df)
df.head()

                     title   price  \
0   Nike Air Max 90 Toggle  $47.97   
1   Nike Air Max 90 Toggle  $47.97   
2  LeBron Witness 7 (Team)    $105   
3  LeBron Witness 7 (Team)    $105   
4      Air Jordan 1 Mid SE  $78.97   

                                              images  \
0  https://static.nike.com/a/images/t_default/2df...   
1  https://static.nike.com/a/images/t_default/2df...   
2  https://static.nike.com/a/images/t_default/a36...   
3  https://static.nike.com/a/images/t_default/a36...   
4  https://static.nike.com/a/images/t_default/057...   

                                                 url  \
0  https://www.nike.com/t/air-max-90-toggle-baby-...   
1  https://www.nike.com/t/air-max-90-toggle-baby-...   
2  https://www.nike.com/t/lebron-witness-7-team-b...   
3  https://www.nike.com/t/lebron-witness-7-team-b...   
4  https://www.nike.com/t/air-jordan-1-mid-se-big...   

                                                text  n_tokens  \
0  Find a Store\n|\nHelp\n|\nJo

Unnamed: 0,title,price,images,url,text,n_tokens,embeddings
0,Nike Air Max 90 Toggle,$47.97,https://static.nike.com/a/images/t_default/2df...,https://www.nike.com/t/air-max-90-toggle-baby-...,Find a Store\n|\nHelp\n|\nJoin Us\n|\nSign In\...,307,"[-0.016106130555272102, -0.0064754183404147625..."
1,Nike Air Max 90 Toggle,$47.97,https://static.nike.com/a/images/t_default/2df...,https://www.nike.com/t/air-max-90-toggle-baby-...,Synthetic pieces on the heel and tongue add st...,331,"[-0.0018631403800100088, -0.012792213819921017..."
2,LeBron Witness 7 (Team),$105,https://static.nike.com/a/images/t_default/a36...,https://www.nike.com/t/lebron-witness-7-team-b...,Find a Store\n|\nHelp\n|\nJoin Us\n|\nSign In\...,441,"[-0.008434830233454704, 0.0011455637868493795,..."
3,LeBron Witness 7 (Team),$105,https://static.nike.com/a/images/t_default/a36...,https://www.nike.com/t/lebron-witness-7-team-b...,A classic herringbone pattern adds durable tra...,426,"[-0.0024147110525518656, 0.003211177419871092,..."
4,Air Jordan 1 Mid SE,$78.97,https://static.nike.com/a/images/t_default/057...,https://www.nike.com/t/air-jordan-1-mid-se-big...,Find a Store\n|\nHelp\n|\nJoin Us\n|\nSign In\...,273,"[-0.0038967994041740894, -0.006805399432778358..."


### Upserting the vectors to Pinecone
Now that we have our vectors, we can use our dataframe to upsert them along with metadata to Pinecone. I used code from this article https://www.mlq.ai/gpt-4-pinecone-website-ai-assistant/ to create the function below:

In [None]:
%pip install pinecone-client
import pinecone
from uuid import uuid4
from tqdm.auto import tqdm

In [19]:
PINECONE_API_ENV="us-west1-gcp-free"
PINECONE_API_KEY=<MY KEY>

def store_embeddings_pinecone(df, index_name, namespace):
    # Add an 'id' column to the DataFrame
    df['id'] = [str(uuid4()) for _ in range(len(df))]

    # Initialize connection to Pinecone
    pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_API_ENV)

    # Check if index already exists, create it if it doesn't
    if index_name not in pinecone.list_indexes():
        pinecone.create_index(index_name, dimension=1536, metric='dotproduct')

    # Connect to the index
    index = pinecone.Index(index_name)

    batch_size = 100  # how many embeddings we create and insert at once

    # Convert the DataFrame to a list of dictionaries
    chunks = df.to_dict(orient='records')

    # Upsert embeddings into Pinecone in batches of 100
    for i in tqdm(range(0, len(chunks), batch_size)):
        i_end = min(len(chunks), i + batch_size)
        meta_batch = chunks[i:i_end]
        ids_batch = [x['id'] for x in meta_batch]
        embeds = [x['embeddings'] for x in meta_batch]
        meta_batch = [{
            'title': x['title'],
            'url': x['url'],
            'text': x['text'],
            'images': x['images'],
            'price':x['price'],
        } for x in meta_batch]
        to_upsert = list(zip(ids_batch, embeds, meta_batch))
        index.upsert(vectors=to_upsert, namespace=namespace)

    return index

In [22]:
store_embeddings_pinecone(df=df, index_name="websites", namespace="nike")

  0%|          | 0/11 [00:00<?, ?it/s]

<pinecone.index.Index at 0x7fe8423b5b70>

## Querying Pinecone
Now that we have all our data in Pinecone, we can use vector similarity to search for the product that best matches the customer's query. To do this, we also have to embed the query.

Below, I have defined a function that embeds a given query and performs similarity search to get the top matching product. Then it appends the matching text context and product title to the query and returns this along with the url, image, and price.  

In [48]:
def create_query_with_pinecone_context(index_name, namespace, query):
    embed_query = openai.Embedding.create(
        input=query,
        engine=EMBED_MODEL
    )
    query_embeds = embed_query['data'][0]['embedding']

    # Initialize connection to Pinecone
    pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_API_ENV)
    index = pinecone.Index(index_name)

    # Get top matching result from pinecone along with the metadata (text, title, image, url, price)
    response = index.query(query_embeds, top_k=1, include_metadata=True, namespace=namespace)

    # Get all metadata
    contexts = [item['metadata']['text'] for item in response['matches']][0]
    title = [item['metadata']['title'] for item in response['matches']][0]
    image = [item['metadata']['images'] for item in response['matches']][0]
    url = [item['metadata']['url'] for item in response['matches']][0]
    price = [item['metadata']['price'] for item in response['matches']][0]

    # Combine the original query with the text context and the name of the product
    augmented_query = f"""
    CONTEXT: {contexts}
    PRODUCT TITLE: {title}
    PRICE: {price}
"""
    augmented_query = augmented_query + query

    # Return the augmented query with the metadata we want to present to the user
    return augmented_query, url, image, price

Let's out try a query a customer might ask and see what context is returned

In [28]:
query = "What's a nice looking women's shoe that's versatile for multiple activites?"
augmented_query, _, _, _ = create_query_with_pinecone_context(index_name="websites", namespace="nike", query=query)
print(augmented_query)


    CONTEXT: A waterproof layer paired with a higher ankle gaiter gives you extra coverage so you stay dry.
Shown: Diffused Taupe/Dark Pony/Sail/Picante Red
Style: DJ7929-200
View Product Details
Size & Fit
Shipping & Returns

Reviews (92)
4.5 Stars
Write a Review
Really tight shoe
tracyville - Jun 25, 2023
As nice as this shoe is it really hurts my feet no matter how thin of a sock I put on and o yea lies about it being waterproof because I wore them to workout with my sauna suit and the sweat had the shoe drenched ðŸ˜’foo
...
More
Very Comfy.
ScreenName829363571 - Jun 24, 2023
Very comfy even if just for every day use.
Truly amazing and so light!
Giovanna155249306 - Jun 16, 2023
This shoes are truly amazing! They keep your feet dry and comfortable after a long day.
    PRODUCT TITLE: Nike Pegasus Trail 4 GORE-TEX
What's a nice looking women's shoe that's versatile for multiple activites?


We can see that it returned some context about style (colors available), comfort, and activities. It was able to access information from both the product description and the reviews to provide a wholistic reccommendation for the product. 

### Generating Sales Messages
Now that we can query pinecone for the top matching product, we can pass in the context and metadata we get from pinecone to GPT to create a product reccommendation tailored to a user's needs the way a human sales representative would. I have defined a helper function for generation below using GPT3.5-turbo.

In [30]:
def generate(augmented_query, system_msg):
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": augmented_query}
        ]
    )
    result = chat['choices'][0]['message']['content']
    return result

An import aspect of using pre-trained models like GPT for generation is the system message. The system message tells the model what role to take and how to respond to a query. For use-cases like these, it is important to be concise, tell it to act as a human and to explain its reasoning.

In [44]:
system_msg = f"""You are a Nike sales representative. 
    Given the pre-determined recommended shoe data, present it to the customer, 
    explaining why the shoe fits the customer's query."""

Now we are ready to try out some examples

In [46]:
query = "Recommend me a shoe that I can golf in and is also comfortable for walking around on campus"

augmented_query, url, image, price = create_query_with_pinecone_context(index_name="websites", namespace="nike", query=query)

response = generate(augmented_query, system_msg)
response += '\n' + "price: " + price + " link: " + url + " image: " + image
print(response)

Based on your query, I would recommend the Air Jordan 12 Low golf shoes. These shoes are not only perfect for golfing but also provide great comfort for walking around on campus.

One of the key features of the Air Jordan 12 Low is its comfort. According to the reviews, these shoes are rated as "Very Comfortable" by customers who have worn them. Additionally, the size of these shoes has been reported to be "Just Right" by 50% of the reviewers, indicating a good fit for most people.

In terms of durability, the Air Jordan 12 Low is rated as "Very Durable" by the majority of customers. This means you can expect these shoes to hold up well even with regular use on the golf course and around campus.

Not only do these shoes offer comfort and durability, but they also have a stylish retro Jordan design. So, you can be sure to make a fashion statement while wearing them.

In summary, the Air Jordan 12 Low golf shoes are a perfect choice for both golfing and walking around on campus. With the

Given the pinecone context with reviews included, the model was able to return an excellent recommendation and explain how it fits the customer's query. Customers tend to trust reviews of other customers, so the model being able to access and summarize these reviews would really help to boost sales.

The next two questions also highlight the quality of model response when it has access to both product descriptions and reviews.

In [47]:
query = "What's a nice looking women's shoe that's versatile for multiple activites?"

augmented_query, url, image, price = create_query_with_pinecone_context(index_name="websites", namespace="nike", query=query)

response = generate(augmented_query, system_msg)
response += '\n' + "price: " + price + " link: " + url + " image: " + image
print(response)

I recommend the Nike Pegasus Trail 4 GORE-TEX shoe for your needs. This shoe is not only stylish but also versatile for multiple activities. It features a waterproof layer and a higher ankle gaiter, providing extra coverage to keep you dry. The shoe is designed to be comfortable and lightweight, making it suitable for everyday use. Additionally, it has received positive reviews from customers, with many praising its comfort and ability to keep feet dry. With its combination of style and functionality, the Nike Pegasus Trail 4 GORE-TEX is a great choice for any activity.
price: $160 link: https://www.nike.com/t/pegasus-trail-4-gore-tex-womens-waterproof-trail-running-shoes-9knDsQ/DJ7929-200 image: https://static.nike.com/a/images/t_default/58d8ea1d-6afe-469b-b481-235fb8b4e0a5/pegasus-trail-4-gore-tex-womens-waterproof-trail-running-shoes-9knDsQ.png


In [50]:
query = "What's a decent men's running shoe under $150 that carries size 7?"

augmented_query, url, image, price = create_query_with_pinecone_context(index_name="websites", namespace="nike", query=query)

response = generate(augmented_query, system_msg)
response += '\n' + "price: " + price + " link: " + url + " image: " + image
print(response)

Based on your request for a decent men's running shoe under $150 in size 7, I would recommend the Nike Tanjun. 

The Nike Tanjun is a versatile and affordable running shoe that offers great value for its price. It features a lightweight design, making it comfortable to wear for long distance runs. Additionally, it has a simple and plain look that is both stylish and timeless. 

Priced at $70, the Nike Tanjun is well within your budget of under $150. It is a popular choice among runners and has received positive reviews for its comfort and durability. While it may not have as many color options as you may desire, it still offers a reliable and high-quality performance.

I believe the Nike Tanjun would be a suitable choice for your running needs, providing you with comfort, durability, and affordability.
price: $70 link: https://www.nike.com/t/tanjun-mens-shoes-jQ3z1W/DJ6258-100 image: https://static.nike.com/a/images/t_default/6310e9a6-9cab-46fb-834d-06b2ea2967fb/tanjun-mens-shoes-jQ3z1

In [51]:
query = "Can you recommend me a good adidas shoe for hiking?"

augmented_query, url, image, price = create_query_with_pinecone_context(index_name="websites", namespace="nike", query=query)

response = generate(augmented_query, system_msg)
response += '\n' + "price: " + price + " link: " + url + " image: " + image
print(response)

Thank you for reaching out! While I understand you're looking for an Adidas shoe for hiking, as a Nike sales representative, I can recommend some Nike shoes that can be suitable for your hiking needs.

Based on your feedback regarding the Nike Zegama, it seems like you prioritize comfort, cushioning, and a secure fit. With that in mind, I'd recommend the Nike Air Zoom Pegasus 38 Trail shoe. This shoe is designed specifically for trail running, making it a great option for hiking as well.

The Nike Air Zoom Pegasus 38 Trail offers a plush and cushioned feel, similar to what you liked about the heel of the Nike Zegama. It features Nike's renowned Zoom Air technology in the forefoot for responsive cushioning and a comfortable stride. The shoe also has a more durable and rugged outsole with multi-directional lugs, providing excellent traction on various terrains, including trails.

Additionally, the Nike Air Zoom Pegasus 38 Trail incorporates an improved lacing system for better lockdown, 

Out of confusion or wanting to test its limits, a customer might ask a chatbot about products from another company. A quality response is given for the model - instead of saying it doesn't have knowledge of Adidas products, the model instead provides a Nike recommendation.

## What's next?
1.) Short Term Memory: A great sales chatbot should also utilize conversation memory so that customers can have a back and forth conversation. When given a recommendation, the customer will naturally want to ask more questions. What colors does it come in? What are the pros and cons of the product? How does it compare to another product?

2.) Multiple product recommendations: Customers generally want 2-4 recommendations at once so that they can compare and contrast to be assured their final purchase is really the best product for them. 

## References:
1.) Open AI Cookbook Web Crawl: https://github.com/openai/openai-cookbook/blob/main/apps/web-crawl-q-and-a/web-qa.ipynb

2.) MLQ GPT-4 & Pinecone: https://www.mlq.ai/gpt-4-pinecone-website-ai-assistant/

3.) Pinecone Examples - Langchain Retreival: https://github.com/pinecone-io/examples/blob/master/generation/gpt4-retrieval-augmentation/gpt-4-langchain-docs.ipynb?ref=mlq.ai
