# Customer Support Question Answering Chatbot
Implement a context-aware question-answering system using LangChain.

---

## Workflow
This project builds a chatbot that uses GPT-4 to answer questions from a set of documents. The following diagram shows the workflow:
<br/>
<img src="../../images/chatbot-workflow.png" alt="Chatbot Workflow" style="width: 70%; height: auto;"/>

## Setup

In [1]:
import openai
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())
openai.api_type = os.environ.get("OPENAI_API_TYPE")
openai.api_base = os.environ.get("OPENAI_API_BASE")
openai.api_version = os.environ.get("OPENAI_API_VERSION")
openai.api_key = os.environ.get("OPENAI_API_KEY")
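
Because `os.environ.get` returns `None` for an unset variable, a missing entry in the `.env` file would only surface later, when the first API call fails. A minimal sketch of an early check is shown below; the helper name `missing_env_vars` is hypothetical and not part of the original notebook.

```python
import os

# Variables that the setup cell above reads from the environment.
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "OPENAI_API_KEY",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```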

## Building the Chatbot

In [15]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain.chat_models import AzureChatOpenAI
from langchain.document_loaders import SeleniumURLLoader
from langchain import PromptTemplate

The database for our chatbot will consist of articles regarding technical issues.

In [3]:
# we'll use information from the following articles
urls = [
    "https://beebom.com/what-is-nft-explained/",
    "https://beebom.com/how-delete-spotify-account/",
    "https://beebom.com/how-download-gif-twitter/",
    "https://beebom.com/how-use-chatgpt-linux-terminal/",
    "https://beebom.com/how-save-instagram-story-with-music/",
    "https://beebom.com/how-install-pip-windows/",
    "https://beebom.com/how-check-disk-usage-linux/",
]

### 1. Split the documents into chunks and compute their embeddings

We load the documents from the provided URLs and split them into chunks using the `CharacterTextSplitter` with a chunk size of 1000 and no overlap:

> Note: Please provide the full path to your browser executable file in `binary_location`.

In [4]:
# use the selenium scraper to load the documents
loader = SeleniumURLLoader(
    urls=urls, binary_location=os.environ.get("BROWSER_EXEC_PATH")
)
docs_not_splitted = loader.load()

# we split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)
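
To build intuition for what a chunk size of 1000 with no overlap means, here is a simplified sketch of separator-based chunking. It is not LangChain's actual implementation: the function name `split_into_chunks` is hypothetical, and we assume that paragraphs separated by blank lines are packed greedily into chunks of at most `chunk_size` characters.

```python
def split_into_chunks(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut mid-text.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece still fits, or it starts a new chunk on its own.
            current = candidate
        else:
            # The piece does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Six short paragraphs, packed into chunks of at most 120 characters.
sample = "\n\n".join(f"Paragraph {i}: " + "x" * 40 for i in range(6))
chunks = split_into_chunks(sample, chunk_size=120)
print(len(chunks), [len(c) for c in chunks])
```

With no overlap, every paragraph lands in exactly one chunk, so joining the chunks with the separator reproduces the original text.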

Next, we compute the embeddings using `HuggingFaceEmbeddings` and store them in a Deep Lake vector store in the cloud.

In [None]:
embeddings = HuggingFaceEmbeddings()

# create Deep Lake dataset
# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = os.environ.get("ACTIVELOOP_ORG_ID")
my_activeloop_dataset_name = "customer_support"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings)

# add documents to our Deep Lake dataset
db.add_documents(docs)

To retrieve the chunks most similar to a given query, we can use the `similarity_search` method of the Deep Lake vector store:

In [6]:
# let's see the top relevant documents to a specific query
query = "how to check disk usage in linux?"
docs = db.similarity_search(query)
print(docs[0].page_content)

Home  Tech  How to Check Disk Usage in Linux (4 Methods)

How to Check Disk Usage in Linux (4 Methods)

Beebom Staff

Last Updated: June 19, 2023 5:14 pm

There may be times when you need to download some important files or transfer some photos to your Linux system, but face a problem of insufficient disk space. You head over to your file manager to delete the large files which you no longer require, but you have no clue which of them are occupying most of your disk space. In this article, we will show some easy methods to check disk usage in Linux from both the terminal and the GUI application.

Monitor Disk Usage in Linux (2023)

Table of Contents

Check Disk Space Using the df Command
		
Display Disk Usage in Human Readable FormatDisplay Disk Occupancy of a Particular Type

Check Disk Usage using the du Command
		
Display Disk Usage in Human Readable FormatDisplay Disk Usage for a Particular DirectoryCompare Disk Usage of Two Directories
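
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose embeddings are closest to it (four by default in LangChain). The sketch below illustrates the idea with cosine similarity on toy three-dimensional vectors. The metric Deep Lake actually applies depends on its configuration, and the helper names `cosine_similarity` and `top_k` as well as the toy vectors are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=4):
    """Return the k chunk texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings"; real embedding models produce hundreds of dimensions.
toy_index = [
    ("Check disk space using the df command", [0.9, 0.1, 0.0]),
    ("How to delete your Spotify account", [0.0, 0.2, 0.9]),
    ("Check disk usage using the du command", [0.8, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]
print(top_k(toy_query, toy_index, k=2))
```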


### 2. Craft a prompt for GPT-4

We will create a prompt template that combines role prompting, the relevant knowledge base information, and the user's question:

In [8]:
# let's write a prompt for a customer support chatbot that
# answers questions using information extracted from our db
template = """You are an exceptional customer support chatbot that gently answers questions.

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the previous context \
information. Do not invent stuff.

Question: {query}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template,
)

### 3. Use GPT-4 with a temperature of 0 for text generation

#### 3.1 The Manual Way

In [10]:
# the full pipeline

# user question
query = "How to check disk usage in linux?"

# retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# generate answer
llm = AzureChatOpenAI(deployment_name="gpt4", temperature=0)
answer = llm.predict(prompt_formatted)
print(answer)

To check disk usage in Linux, you can use one of the following methods:

1. Using the df command: Open the terminal and type "df" to display disk space usage for all mounted filesystems. You can also use "df -h" to display the output in a human-readable format.

2. Using the du command: In the terminal, type "du" to display disk usage for a specific directory. You can use "du -h" for a human-readable format or "du -h /path/to/directory" to check disk usage for a particular directory.

3. Using Gnome Disk Tool: Install the Gnome Disk Tool using "sudo apt-get -y install gnome-disk-utility". Open the tool, and click on a partition's name to view details such as device name, file system type, and available space.

4. Using Disk Usage Analyzer Tool: Install the tool using "sudo snap install gdu-disk-usage-analyzer". Access it via the Applications menu, and click on a device name to view a ring chart of disk occupancy for all folders.

Remember to choose the method that best suits your needs

#### 3.2 The Automated Way

In [14]:
from langchain.chains import RetrievalQA
from IPython.display import display, Markdown

qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(),
    verbose=True,
)

response = qa_stuff.run(query)
display(Markdown(response))



> Entering new chain...

> Finished chain.


There are several methods to check disk usage in Linux, both from the terminal and using GUI applications. Here are some common methods:

1. Using the df command in the terminal:
To check disk space, open the terminal and type the following command:

```
df
```

To display disk usage in a human-readable format, use the -h flag:

```
df -h
```

2. Using the du command in the terminal:
To check disk usage for a particular directory, use the following command:

```
du /path/to/directory
```

To display disk usage in a human-readable format, use the -h flag:

```
du -h /path/to/directory
```

3. Using the Gnome Disk Tool (GUI):
First, install the Gnome Disk Tool using the following command:

```
sudo apt-get -y install gnome-disk-utility
```

Open the Gnome Disk Tool, and click on the partition's name to view details such as device name, file system type, and available space.

4. Using the Disk Usage Analyzer Tool (GUI):
Install the Disk Usage Analyzer using the following command:

```
sudo snap install gdu-disk-usage-analyzer
```

Open the Disk Usage Analyzer tool from the Applications menu. It will show all the storage partitions connected to your system along with your Home directory. Click on the device name to view a ring chart of the disk occupancy for all the folders.
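
The `"stuff"` chain type means that all retrieved chunks are inserted ("stuffed") into a single prompt, and the model is called once. The sketch below mirrors that flow with plain functions so it runs without a vector store or an LLM. It is not the `RetrievalQA` implementation: `stuff_answer`, `fake_retrieve`, `fake_generate`, and `demo_template` are hypothetical names introduced for illustration.

```python
def stuff_answer(question, retrieve, generate, prompt_template):
    """Retrieve chunks, stuff them all into one prompt, and call the model once."""
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    prompt_text = prompt_template.format(context=context, question=question)
    return generate(prompt_text)


# Hypothetical stand-ins for the retriever and the LLM.
def fake_retrieve(question):
    return ["df shows free disk space.", "du shows directory sizes."]


def fake_generate(prompt_text):
    return f"(model output for a prompt of {len(prompt_text)} characters)"


demo_template = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(
    stuff_answer(
        "How to check disk usage in linux?",
        fake_retrieve,
        fake_generate,
        demo_template,
    )
)
```

Because every chunk goes into one prompt, this approach is simple and needs only one model call, but the combined chunks must fit within the model's context window.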

## Issue with Generating Answers using LLMs

GPT-4 is less likely to generate false information when the answer to the user's question is contained within the context. But what if the answer is not contained within the context? In this case, GPT-4 might make things up (**hallucination**). Since user questions are often brief and ambiguous, we cannot always rely on the semantic search step to retrieve the correct document. Thus, there is always a risk of generating false information.

To minimize this risk, we can write a better prompt that tells the model what to do when the context does not contain enough information. We add the following statement to the prompt: `Say "I don't have enough information." if you don't have enough information in the context to answer the question.`

In [None]:
template = """You are an exceptional customer support chatbot that gently answers questions.

You know the following context information.

{chunks_formatted}

Answer the following question from a customer. Use only information from the previous context \
information. Do not invent stuff. Say "I don't have enough information." if you don't have enough \
information in the context to answer the question.

Question: {query}

Answer:"""

# rebuild the prompt so the pipeline uses the updated template
prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template,
)
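
A fixed fallback sentence also makes it possible for the surrounding application to recognize when the chatbot could not answer. The following is a minimal sketch under that assumption; the helpers `is_fallback` and `route_answer` and the routing labels are hypothetical and not part of the original notebook.

```python
FALLBACK_SENTENCE = "I don't have enough information."


def is_fallback(answer):
    """Return True when the answer starts with the agreed fallback sentence."""
    # Ignore surrounding whitespace, quotation marks, and letter case.
    normalized = answer.strip().strip('"').strip().lower()
    return normalized.startswith(FALLBACK_SENTENCE.lower().rstrip("."))


def route_answer(answer):
    """Hypothetical routing: escalate when the bot could not answer from the context."""
    if is_fallback(answer):
        return "escalate_to_human"
    return "send_to_customer"


print(route_answer("I don't have enough information."))
print(route_answer("Use df -h to check disk space."))
```

Matching on a fixed sentence is brittle if the model paraphrases it, so a check like this is best treated as a heuristic.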