# Slack Bot Demo

This is the demo version of the Slack bot, "Genius". You can run this file locally. If you would like to try the bot deployed on Slack, please let me know and I will invite you to my Slack workspace. Since the bot is hosted locally on my computer, please tell me the specific time you plan to use it so that I can start a development server and establish the connection with Slack.

> #### Attention!
>
> **About API keys and tokens**
> - **Slack:**
> The Slack bot uses my API keys. The keys may expire or exceed the request limits at any time...
> - **This file:**
> To run this file, please provide your API keys and access tokens in a `.env` file (see [template.env](template.env)).

# 1. Set up the environment

We first need to import the required packages, load the API keys from the `.env` file, and set the language model and the embedding model.

> **Notes**
> - I was using OpenAI's models originally, but had to switch to AI21's LLM and Cohere's embedding model because my free trial account conveniently expired...
> - I am using models from two companies so that I don't send too many requests to one service and exceed my call limits...
> - I think OpenAI's models are a bit more accurate, but respond slightly more slowly.

In [1]:
# Set up the environment

import os
from dotenv import find_dotenv, load_dotenv
import flatdict  # flattens nested dict
import re

# packages for web scraping/ searching
import validators
from bs4 import BeautifulSoup
import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time  # gives extra time for webdriver to load webpage
from serpapi import GoogleSearch

# Language models and embedding models
# from langchain.llms import OpenAI  # my OpenAI trial expired :/
# from langchain.embeddings.openai import OpenAIEmbeddings
# from langchain.llms import Cohere
from langchain.llms import AI21
from langchain.embeddings import CohereEmbeddings

# Chains and chain components
from langchain.vectorstores import Chroma
from langchain.chains.question_answering import load_qa_chain
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.agents import initialize_agent, Tool, AgentType
from langchain.tools import tool
from langchain.memory import ConversationBufferMemory

# Load api key variables from .env file and set api keys
load_dotenv(find_dotenv('private/.env'))
SERPAPI_API_KEY = os.environ["SERPAPI_API_KEY"]
AI21_API_KEY = os.environ["AI21_API_KEY"]
COHERE_API_KEY = os.environ["COHERE_API_KEY"]

# Set language model
# llm = OpenAI(temperature=0)
llm = AI21(temperature=0.5)
llm_doc = AI21(temperature=0.1)  # for queries related to docs

# Rate limit for Cohere model is 5 calls/min
# llm = Cohere(temperature=0.5)
# llm_doc = Cohere(temperature=0)  # for queries related to docs

# Get Cohere embedding model
embeddings = CohereEmbeddings(cohere_api_key=COHERE_API_KEY)
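
The setup cell reads three keys with `os.environ[...]`, which raises a bare `KeyError` when one is missing. As a minimal sketch (the helper name `find_missing_keys` is my own, not part of any library), a check like this could report all missing keys at once before any model is created:

```python
import os

# The three variable names read by the setup cell above.
REQUIRED_KEYS = ["SERPAPI_API_KEY", "AI21_API_KEY", "COHERE_API_KEY"]

def find_missing_keys(env, required=REQUIRED_KEYS):
    """Return the names in `required` that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]

# Hypothetical usage: run after load_dotenv(...) has populated os.environ.
missing = find_missing_keys(os.environ)
if missing:
    print("Missing keys in .env:", ", ".join(missing))
```

Taking the environment as an argument keeps the helper easy to try with a plain dict instead of the real environment.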

# 2. Set up the base vector store

The base vector store contains basic information about Home Depot's services and products. It helps the bot decide whether a query is relevant. New texts and documents can be added to this vector store later.

In this step, I will index Home Depot's [site map](https://www.homedepot.com/c/site_map) to get the information needed. I will first scrape the texts from the web page and then store the texts in a Chroma vector store.
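
To give a feel for how a vector store can judge relevance, here is a toy sketch. It assumes a bag-of-words count vector as a stand-in for the Cohere embeddings and cosine similarity as the distance measure; the metric Chroma actually uses depends on its configuration, and the three sample texts are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(query, texts, k=2):
    """Return the k stored texts closest to the query."""
    q = embed(query)
    return sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)[:k]

# Made-up entries in the spirit of site map link texts.
store = ["Power Drills", "Garden Hoses", "Cordless Power Tools"]
print(most_similar("power drill", store, k=1))
```

A real embedding model would also place "drill" and "drills" close together, which this word-count stand-in cannot do.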

> **Note**: Home Depot's server rejects requests from, or delays responses to, unknown clients. The `User-Agent` header in the script below works around that restriction.

In [2]:
# Scrape Home Depot's site map to get a list of services and products

# Home Depot server rejects requests from / delays responses to unknown users. Need the headers here to bypass that restriction.
# Source: https://stackoverflow.com/a/62028209 
headers = {"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"}
req = requests.get("https://www.homedepot.com/c/site_map", headers=headers)
soup = BeautifulSoup(req.text, 'html.parser')
sitemap = []
all_links = soup.find_all('a')
for a in all_links:
    sitemap.append(a.text)

db = Chroma.from_texts(sitemap, embeddings)
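
Collecting `a.text` for every anchor can produce empty strings (for example, links that only wrap an image) and repeated entries. I have not checked how often that happens on this page, so treat the following as an optional sketch; `clean_link_texts` is a hypothetical helper that could be applied to `sitemap` before building the vector store.

```python
def clean_link_texts(texts):
    """Normalize whitespace, drop empty strings, and remove duplicates (order preserved)."""
    seen = set()
    cleaned = []
    for text in texts:
        text = " ".join(text.split())  # collapse newlines and repeated spaces
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

# Made-up link texts for illustration.
print(clean_link_texts(["  Power  Tools\n", "", "Paint", "Power Tools", "   "]))
```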

# 3. Configure the prompt

I will create a prompt template that will be applied to every query passed to the bot. The template provides the bot guidelines on how to answer the query.

In [3]:
# Test query

query = "Can you suggest a power drill?"

In [4]:
# Prompt guidelines

guidelines = """

If the query is not related to services or products offered by Home Depot, say you can't help.
If the query is asking about a specific product available at Home Depot, look for the product.
If the query is not about a specific product, offer the customer some general advice and suggest a few related Home Depot products.
If you don't know the answer, say you don't know and provide contact information for customer service.

"""

# Set prompt template

prompt_template = """
You are a helpful shop assistant at Home Depot. This is a query from a Home Depot customer: {query}

Answer the query following these guidelines: {guidelines}

In your answer:
Explain your answer briefly.
When you mention specific products, use full product names.
If you are asked about one specific product and you found the product, provide a link to that product.

"""

prompt = PromptTemplate(template=prompt_template, input_variables=["query", "guidelines"])
prompt_str = prompt.format(query=query, guidelines=guidelines)
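
For a simple template like this one, the substitution behaves much like Python's built-in `str.format`. The sketch below uses plain strings only (no LangChain) and a shortened template, so it shows the idea rather than the exact prompt above.

```python
# Shortened version of the template above, for illustration only.
TEMPLATE = (
    "You are a helpful shop assistant at Home Depot. "
    "This is a query from a Home Depot customer: {query}\n\n"
    "Answer the query following these guidelines: {guidelines}\n"
)

def build_prompt(query, guidelines):
    """Fill the named placeholders and return the final prompt string."""
    return TEMPLATE.format(query=query, guidelines=guidelines)

text = build_prompt(
    "Can you suggest a power drill?",
    "If you don't know the answer, say you don't know.",
)
print(text)
```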

# 4. Create chains and tools

The bot uses different tools to gather the necessary information for answering the query. Each tool consists of a RetrievalQA chain.

The chains use language models with different temperatures, chosen according to each tool's objective; the temperature controls how random the generated answers are. The tools `get_products` and `get_details` use a low-temperature model (`llm_doc`, temperature 0.1) so that answers stay grounded in the information from the vector store, without any 'creative' additions. The other tools use a medium-temperature model (`llm`, temperature 0.5) so that the answers align with both the semantic and *pragmatic* meaning of the query.

The prompts are modified and fed into the chains to ensure that the answers generated are relevant to the objectives of the tools.
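
To illustrate what the temperature setting does, here is the textbook softmax-with-temperature formula applied to made-up scores. How the hosted models sample internally is not visible from this notebook, so this is only a sketch of the general idea: lower temperatures concentrate probability on the top-scoring token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```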

> **Notes**
> 
> - To ensure high accuracy, the bot needs to get the most up-to-date information from Home Depot in real time. Therefore, I decided not to pre-scrape Home Depot's website, but instead to scrape only the relevant information based on the query.
>
> - Tool `get_products`: SerpAPI is used to perform a search through Home Depot's search engine. Structured information on search result pages is scraped.
>
> - Tool `get_details`: The tool aims to extract detailed information about a product. However, Home Depot's product detail pages are generated dynamically with JavaScript, so the JavaScript must be rendered before the pages can be scraped. The tool `get_details` uses a web driver to render the page, though it is not fully automated: because of the website's responsive design, someone has to scroll down the page manually during rendering to trigger the JavaScript code. Therefore, this tool will not function properly outside the development environment. Unfortunately, I have very limited experience in web development and did not have enough time to come up with a solution. Moreover, it seems that no third-party API service is able to scrape these pages (I tried Apify and it failed to render the entire page).
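
One possible way to replace the manual scrolling is to let the web driver scroll by itself through Selenium's `execute_script`. I have not tested this against Home Depot's pages, so it is only a sketch; the helper name `scroll_to_bottom` and its default values are my own assumptions.

```python
import time

def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops growing or max_rounds is reached.

    `driver` is expected to be a Selenium WebDriver (anything with execute_script).
    Returns the last page height that was observed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded sections time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, assume the page is complete
        last_height = new_height
    return last_height
```

In `get_details`, a call such as `scroll_to_bottom(driver)` could go between `driver.get(link)` and reading `driver.page_source`. Pages that load content only when a section scrolls into view slowly may need smaller scroll steps than jumping straight to the bottom.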

In [5]:
# Chains and functions

# Create a chain instance
chain = RetrievalQA.from_chain_type(llm=llm_doc, chain_type="stuff", retriever=db.as_retriever())

# Check if url is valid and reachable
def check_url(url):
    # validators.url returns True for a well-formed URL and a falsy object otherwise
    if not validators.url(url):
        return False
    try:
        # Headers to bypass server restrictions
        headers = {"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"}
        response = requests.head(url, headers=headers)
        return response.status_code == 200
    except requests.ConnectionError:
        return False

# Create tools from functions

@tool
def is_homedepot(query):
    """
    Decide whether the query is asking about a product available at Home Depot 
    query : customer query
    """
    prompt_new = 'Is the following asking about a product available at Home Depot? ' + query
    chain_new = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())
    ans = chain_new.run(prompt_new)
    return ans

@tool
def get_keyword(query):
    """
    Get the search keyword from the query.
    query : customer query
    """
    prompt_new = 'What is the product mentioned in the query? ' + query
    chain_new = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())
    ans = chain_new.run(prompt_new) 
    return ans

@tool
def get_products(keyword):
    """
    Searches for the keyword at Home Depot. Add results to vector store.
    keyword : Search keyword
    """

    params = {
    "engine": "home_depot",
    "api_key": SERPAPI_API_KEY,
    "q": keyword
    }
    search = GoogleSearch(params)
    results = search.get_dict()  # fetch the results once and reuse them
    if "products" in results:
        products = results["products"]
    else:
        ans = chain.run(prompt_str)
        return ans
    
    ######################################################    
    # Clean the search results

    # drop unused product info
    for p in products:
        for key in ['position', 'thumbnails', 'serpapi_link', 'collection', 'variants']:
            if key in p.keys():
                del p[key]
    
    # Embedding models only work on texts or numbers. Fix values other than str, int or float.
    for p in products:
        for k,v in p.items():
            # Change value to 'none' if it's not str/int/float or list/bool (will take care of these later)
            if type(v) not in [str, int, float, list, bool]:
                p[k] = 'none'
            # Convert list of strings to string
            if type(v)==list:
                p[k] = '-'.join(v)
            # Convert boolean to string
            if type(v)==bool:
                if p[k]==True:
                    p[k] = 'true'
                else:
                    p[k] = 'false'

    # Vector stores do not accept nested dict as metadata. Need to flatten it.
    for p in range(len(products)):
        # flatten each dict in products, then convert the FlatDict back to a plain dict
        p_flatdict = flatdict.FlatDict(products[p], delimiter='.')
        products[p] = dict(p_flatdict.items())
    
    ######################################################
    # Add search results to vector store

    # Get a list of product names (will be used as the texts for vector stores)
    product_names = [p['title'] for p in products]

    # Add texts to vector store
    global db  # access db outside the function
    db.add_texts(product_names, metadatas=products)

    ans = chain.run(prompt_str)
    return ans

@tool
def get_details(link):
    """
    Get product details from the link. Add results to vector store.
    link : link to product detail page
    """

    # Validate link url
    if not check_url(link): return "Please provide more information."
     
    # Use webdriver to load the page
    driver = webdriver.Chrome('./chromedriver') 
    driver.get(link)    
    time.sleep(20)  # give extra time to ensure that the page is fully rendered    
    page = driver.page_source

    # Scrape texts on page
    soup = BeautifulSoup(page, "html.parser")
    name = soup.find('div', {'class': 'product-details__badge-title--wrapper'}).text
    ids = soup.find('div', {'class': 'sui-flex sui-text-xs sui-flex-wrap'}).text
    overview = soup.find('section', {'id': 'product-section-product-overview'}).text
    specs = soup.find('section', {'id': 'specifications-desktop'}).text

    # Close webdriver after scraping
    driver.close()

    # clean the data
    overview = re.sub(r"(\w)([A-Z])", r"\1 \2", overview)
    specs = re.sub(r"(\w)([A-Z])", r"\1 \2", specs)
    specs = re.sub('See Similar Items', ' ', specs)
    ids = re.sub('Internet', ' Internet:', ids)
    ids = re.sub('Model', ' Model:', ids)
    ids = re.sub('Store SKU', ' Store SKU:', ids)
    ids = re.sub('Store SO SKU', ' Store SO SKU:', ids)

    # Concatenate scraped texts into a string
    pdp = 'Product Name: ' + name + '\nProduct IDs: ' + ids + '\nProduct Overview: ' + overview + '\nSpecifications: ' + specs

    # Add text to vector store
    global db  # access db outside the function
    db.add_texts([pdp])  # add_texts expects a list of texts, not a single string
    
    ans = chain.run(prompt_str)
    return ans
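
The cleaning steps inside `get_products` (drop unused keys, coerce values, flatten) can also be sketched as one standalone function without `flatdict`. Note that this sketch is not identical to the code above: it flattens nested dicts instead of replacing them with `'none'`. The function name `to_metadata` and the sample record are hypothetical, not real SerpApi output.

```python
DROP_KEYS = {"position", "thumbnails", "serpapi_link", "collection", "variants"}

def to_metadata(product, prefix=""):
    """Flatten a nested product dict into {dotted_key: str | int | float}."""
    flat = {}
    for key, value in product.items():
        if not prefix and key in DROP_KEYS:
            continue  # unused top-level product info
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(to_metadata(value, prefix=f"{name}."))
        elif isinstance(value, bool):  # check bool before int: bool is a subclass of int
            flat[name] = "true" if value else "false"
        elif isinstance(value, (str, int, float)):
            flat[name] = value
        elif isinstance(value, list):
            flat[name] = "-".join(str(item) for item in value)
        else:
            flat[name] = "none"
    return flat

# Hypothetical product record for illustration.
product = {
    "title": "Drill",
    "position": 1,
    "price": 99.0,
    "delivery": {"free": True, "options": ["ship", "pickup"]},
    "rating": None,
}
print(to_metadata(product))
```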

# 5. Create an agent

The bot incorporates an agent responsible for selecting the appropriate tools to gather the information needed to answer a query. Note that the agent receives the query directly, without the prompt template applied. In addition, the agent uses a medium-temperature language model (`llm`) and has no access to external information or the vector store. These choices give the agent enough freedom to make the best decision based solely on the query and the tool descriptions.

Therefore, the bot initially focuses on gathering sufficient information using the available tools before receiving instructions on how to answer specific questions through prompt templates.

The figure below illustrates the structure of the bot:

![structure](/bot_structure.jpg)
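
To make the tool-selection step more concrete: the agent's language model writes lines of the form `Action: <tool name>` and `Action Input: <input>`, and the named tool is then called with that input. The sketch below is a simplification of that loop, not LangChain's actual code; `parse_action`, `run_step`, and the stand-in tool are hypothetical.

```python
import re

def parse_action(llm_output):
    """Extract (tool_name, tool_input) from ReAct-style text, or None if absent."""
    match = re.search(r"Action:\s*(.*?)\s*\n\s*Action Input:\s*(.*)", llm_output)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()

def run_step(llm_output, tools):
    """Call the tool named in the model output with the given input."""
    parsed = parse_action(llm_output)
    if parsed is None:
        return None
    name, tool_input = parsed
    if name not in tools:
        return f"Unknown tool: {name}"
    return tools[name](tool_input)

# Stand-in tool for illustration; the real tools are defined in section 4.
toy_tools = {"is_homedepot": lambda query: "Yes"}
step = (
    " First, I need to know if the suggested power drill is available at Home Depot.\n"
    "Action: is_homedepot\n"
    "Action Input: suggested power drill"
)
print(parse_action(step))
print(run_step(step, toy_tools))
```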

In [6]:
# Create an agent with tools

tools = [

    Tool(
        name = "is_homedepot",
        func=is_homedepot,
        description="Use this tool if you are not sure the query is asking about a specific product available at Home Depot. Input is the query",        
    ),

    Tool(
        name = "get_products",
        func=get_products,
        description="Look for the products on Home Depot's website. Input is a string.",
        return_direct=True
    ),

    Tool(
        name = "get_details",
        func=get_details,
        description="Find the product details about a specific product. The input is the link to the product. You can get this link from the search results you got from the function 'get_products'",
        return_direct=True
    )

]

memory = ConversationBufferMemory(memory_key="chat_history")

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, memory=memory)
print(agent.run(query))




> Entering new AgentExecutor chain...
 First, I need to know if the suggested power drill is available at Home Depot.
Action: is_homedepot
Action Input: suggested power drill
Observation: 
Yes
Thought: Okay, now I need to find power drills at Home Depot.
Action: get_products
Action Input: power drills
Observation: 
I can help you with that. Home Depot offers a wide variety of power drills, including cordless, corded, and hammer drills. If you're looking for a cordless drill, I recommend the DeWalt 20V Max Cordless Drill. It's a powerful and versatile drill that's perfect for a variety of tasks. If you're not sure what type of drill you need, or if you have any questions, you can contact Home Depot customer service at 1-800-HOME-DEPOT (1-800-466-3337).

> Finished chain.

I can help you with that. Home Depot offers a wide variety of power drills, including cordless, corded, and hammer