# Building a Chatbot that Doesn't Suck

In this notebook we'll build a RAG-based chatbot for a small furniture manufacturer in Oahu, Hawaii.

## Prerequisites

### Install Pandoc

This notebook requires [Pandoc](https://pandoc.org/) to be installed on your system, to convert the furniture company's HTML pages to markdown format. The cell below will check if it's installed.

In [14]:
import shutil
import sys
from IPython.display import HTML

def check_pandoc():
    if not shutil.which("pandoc"):
        return HTML("Please <a href=\"https://pandoc.org/installing.html\">install Pandoc</a> before continuing")
    else:
        return HTML("Pandoc is already installed. You're good to go!")

check_pandoc()

### Set Up Authorization Tokens

In this notebook we'll use:

- [Jina Embeddings v2](https://jina.ai/embeddings/)
- [Hugging Face Inference API](https://huggingface.co/settings/tokens) (token link)

You'll need to get tokens for each of the above and enter them below.

**Note:** For the Hugging Face token, choose fine-grained permissions and enable _Make calls to the serverless Inference API_.

In [None]:
from getpass import getpass

jinaai_api_key = getpass(prompt="Your Jina Embeddings API key: ")
hf_inference_api_key = getpass(prompt="Your Hugging Face Inference API key: ")
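
If you rerun the notebook often, typing both keys each time gets tedious. Here is a minimal sketch of one alternative, assuming you export the keys as environment variables before starting Jupyter. The helper `read_secret` and the variable name `DEMO_API_KEY` are hypothetical and chosen only for this example; neither service requires them.

```python
import os
from getpass import getpass

def read_secret(env_name, prompt):
    """Return the secret from the environment, or prompt for it if unset."""
    value = os.environ.get(env_name)
    if value:
        return value
    # Fall back to an interactive prompt, as in the cell above.
    return getpass(prompt=prompt)

# Hypothetical variable, set here only to demonstrate the lookup.
os.environ["DEMO_API_KEY"] = "demo-value"
demo_key = read_secret("DEMO_API_KEY", "Your demo API key: ")
```

In the notebook you could then write, for example, `jinaai_api_key = read_secret("JINAAI_API_KEY", "Your Jina Embeddings API key: ")`, where `JINAAI_API_KEY` is again just an assumed name.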

In [None]:
# RAG dependencies
!pip install -q llama-index llama-index-llms-openai llama-index-embeddings-jinaai llama-index-llms-huggingface "huggingface_hub[inference]"

## Process data

We used GPT to generate some sample data for a fictitious small furniture maker in Oahu, Hawaii. This consists of four simple HTML pages:

- [Front page](https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/rag-chatbot/data/front.html)
- [Product listings page](https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/rag-chatbot/data/products.html)
- [FAQ page](https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/rag-chatbot/data/faq.html)
- [Contact page](https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/rag-chatbot/data/contact.html)

### Download Data

In [15]:
from glob import glob
import os
import subprocess
import requests

In [36]:
data_dir = "./data"

In [34]:
# cleanup from last run
!rm -rf {data_dir}
!mkdir {data_dir}

In [17]:
html_urls = [
    "https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/rag-chatbot/data/front.html",
    "https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/rag-chatbot/data/products.html",
    "https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/rag-chatbot/data/faq.html",
    "https://raw.githubusercontent.com/jina-ai/workshops/main/notebooks/embeddings/rag-chatbot/data/contact.html"
]

In [35]:
html_files = []

for url in html_urls:
    filename = url.split('/')[-1]
    file_path = os.path.join(data_dir, filename)

    response = requests.get(url)
    response.raise_for_status() # fail early if the download did not succeed
    html = response.content
    html_files.append(file_path) # store path in a list for future processing

    with open(file_path, "wb") as file:
        file.write(html)

### Convert to Markdown

HTML is a pain to break into chunks and unreliable for LLMs to parse. We'll convert it to Markdown to make things easier:

In [None]:
# convert html files to markdown for easier chunking
for filename in html_files:
    base_name = os.path.splitext(filename)[0]
    md_file = base_name + ".md"

    # Colab uses ancient pandoc, with different argument for markdown header style
    try:
        # colab pandoc
        subprocess.run(["pandoc", "--atx-headers", filename, "-o", md_file], check=True)
    except subprocess.CalledProcessError:
        # newer pandoc rejects --atx-headers, so retry with the current flag
        subprocess.run(["pandoc", "--markdown-headings=atx", filename, "-o", md_file], check=True)

md_files = glob(f'{data_dir}/*.md')
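
Instead of trying one flag and falling back to the other, you could pick the flag from the installed pandoc version. The sketch below works under one assumption that you should verify against the pandoc changelog: that `--markdown-headings` was introduced in pandoc 2.11.2. The helpers `parse_pandoc_version` and `heading_flag` are hypothetical names for this example.

```python
import re

def parse_pandoc_version(version_output):
    """Extract a version tuple from the first line of `pandoc --version` output."""
    first_line = version_output.splitlines()[0]
    match = re.search(r"(\d+(?:\.\d+)*)", first_line)
    if match is None:
        raise ValueError(f"Could not find a version number in: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))

def heading_flag(version):
    """Pick the ATX-heading flag for a given pandoc version tuple."""
    # Assumption: --markdown-headings appeared in pandoc 2.11.2.
    if version >= (2, 11, 2):
        return "--markdown-headings=atx"
    return "--atx-headers"

# Sample version strings, used here in place of real pandoc output.
old_flag = heading_flag(parse_pandoc_version("pandoc 2.9.2.1\n"))
new_flag = heading_flag(parse_pandoc_version("pandoc 3.1.9\n"))
```

To use it with the real binary, pass in the `stdout` of `subprocess.run(["pandoc", "--version"], capture_output=True, text=True, check=True)`.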

### Break Pages into Chunks

We'll make the data more digestible to our chatbot by breaking it into chunks:

In [None]:
# break markdown files into chunks
docs = []

for md_file in md_files:
    with open(md_file, 'r') as f:
        content = f.read()
        docs.append(content) # add full page
    
        content_chunks = content.split("\n#")
        docs.extend(content_chunks) # add individual sections
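
Note that splitting on `"\n#"` consumes the first `#` of every heading after the first, so a `## Section` heading arrives in its chunk as `# Section`. If you would rather keep the heading markers intact, here is a minimal sketch using a regex lookahead. The helper `split_by_heading` and the sample text are made up for illustration.

```python
import re

def split_by_heading(markdown_text):
    """Split Markdown into sections, each starting at a heading line."""
    # The lookahead (?=#) matches just before "#" without consuming it,
    # so every section keeps its full heading marker.
    sections = re.split(r"\n(?=#)", markdown_text)
    return [section.strip() for section in sections if section.strip()]

# Made-up sample page, not taken from the furniture data.
sample = "# Page\nIntro text.\n\n## First\nSome text.\n\n## Second\nMore text."
sections = split_by_heading(sample)
```

This sketch treats any line starting with `#` as a heading, so it would also split on a `#` comment inside a Markdown code block.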

## Build RAG system

### Access Jina Embeddings v2 via the LlamaIndex Interface

This code creates the LlamaIndex object that manages your connection to the Jina Embeddings v2 API.

The resulting object is held in the variable `jina_embedding_model`.


In [None]:
from llama_index.embeddings.jinaai import JinaEmbedding

jina_embedding_model = JinaEmbedding(
    api_key=jinaai_api_key,
    model="jina-embeddings-v2-base-en",
)

### Access the Mixtral Model via the HuggingFace Inference API

This code creates a client object for accessing the `mistralai/Mixtral-8x7B-Instruct-v0.1` model via the Hugging Face Inference API. The resulting object is held in the variable `mixtral_llm`.

In [None]:
from llama_index.llms.huggingface import HuggingFaceInferenceAPI

mixtral_llm = HuggingFaceInferenceAPI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1", token=hf_inference_api_key
)

### Convert Chunks To Be Suitable for LlamaIndex

In [None]:
from llama_index.core.readers import StringIterableReader
from llama_index.core.schema import Document

chunks = StringIterableReader().load_data(docs)

### Create a Service

This code creates a service context that bundles the Jina Embeddings model with Mixtral Instruct, and stores it in the variable `service_context`.

In [None]:
from llama_index.core import ServiceContext

service_context = ServiceContext.from_defaults(
    llm=mixtral_llm, embed_model=jina_embedding_model
)

### Build the Document Index

Next, we store the documents in LlamaIndex's `VectorStoreIndex`, generating embeddings with the Jina Embeddings v2 model and using them as keys for retrieval.

**Note:** this may take several minutes.

In [None]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    documents=chunks, service_context=service_context
)

### Prepare a Prompt Template

This is the prompt template that will be presented to Mixtral Instruct, with `{context_str}` and `{query_str}` replaced with the retrieved documents and your query respectively.

In [None]:
from llama_index.core import PromptTemplate

qa_prompt_tmpl = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query. Please be brief, concise, and complete.\n"
    "If the context information does not contain an answer to the query, "
    "respond with \"I'm sorry, but we don't have any information about that. Please contact us on info@oahufurniture.com for more information.\".\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt = PromptTemplate(qa_prompt_tmpl)
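
To see roughly what the LLM receives, you can fill a template by hand with plain `str.format`. This is a simplified stand-in rather than LlamaIndex's internal logic: the shortened template, the sample chunks, and the way they are joined are all assumptions made for illustration.

```python
# Shortened version of the template above, for illustration only.
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Made-up retrieval results; the real ones come from the vector index.
retrieved_chunks = ["Chunk one text.", "Chunk two text."]

filled = template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="What kind of furniture do you make?",
)
print(filled)
```

Printing the filled prompt like this is a quick way to spot formatting slips, such as a missing newline between the instructions and the `Query:` line.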

### Assemble the Full Query Engine

The query engine has three parts:

* `retriever` is the search engine that takes user requests and retrieves relevant documents from the vector store.
* `response_synthesizer` uses the prompt created above to join the retrieved documents and user request and passes them to the LLM, getting back its response.
* `query_engine` is a container object that holds the two together.

In [None]:
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    service_context=service_context,
    text_qa_template=qa_prompt,
    response_mode="compact",
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
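
To make `similarity_top_k=2` concrete, here is a toy sketch of top-k retrieval in plain Python. It assumes cosine similarity as the score and uses made-up three-dimensional vectors in place of real embeddings. The helpers `cosine_similarity` and `top_k` are hypothetical and are not part of LlamaIndex.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Made-up "embeddings": documents 0 and 1 point roughly the same way as the query.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.05, 0.0]
best = top_k(query_vec, doc_vecs, k=2)
```

With `k=2`, only the two closest chunks reach the prompt. A larger value gives the LLM more context at the cost of a longer prompt.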

## Ask Some Questions

Let's run some queries to see our chatbot in action:

In [None]:
from IPython.display import HTML

def get_answer(question):
    result = query_engine.query(question)
    return HTML(result.response.strip())

### Ask Relevant Questions

In [None]:
get_answer("What kind of furniture do you make?")

In [None]:
get_answer("How much does your furniture cost?")

In [None]:
get_answer("Can I see your furniture in person?")

In [None]:
get_answer("What payment methods do you accept?")

In [None]:
get_answer("What is your furniture made from?")

### Ask an Irrelevant Question

We want to be sure our chatbot _won't_ answer irrelevant questions:

In [None]:
get_answer("How is a computer useful on a farm?")

### Ask Questions in Different Languages

Although the embedding model used here (`jina-embeddings-v2-base-en`) is English-specific, it's often possible to get answers in other languages:

In [None]:
# German
get_answer("Was für Möbel stellen Sie her?")

In [None]:
# Simplified Chinese
get_answer("你的家具是用什么材料制成的？")

### Ask Your Own Questions

Enter your question below and press Enter to send it. The answer should be generated within a few seconds. Type "stop" to quit the question loop.

In [None]:
while True:
    question = input("Please enter your question: ")
    if question.lower() == "stop":
        break
    answer = get_answer(question).data
    print(answer)