# SJ Chatbot: Create Bot

The Saint James Backpackers chatbot, or ChatSJB, was designed to answer questions from new staff by combining ChatGPT with the hostel's 57-page operations manual. This eases the onboarding of new employees and volunteers in an establishment with high staff turnover. The text base for this project can be added to over time and could even be expanded to cover guest queries. The bot is constructed via the following steps:

</br>

> **1) Splits**  
> A basic algorithm searches the operations manual for appropriate split points (~100)

> **2) Summarization**  
> Any chunks over 200 words are shortened to at most 200 words through a ChatGPT API call

> **3) Edits**  
> Iterative edits are made to a saved version of the text (.csv) to improve the bot's answers

> **4) Embeddings**  
> The embeddings (numbers describing the meaning of words) for each chunk are obtained from OpenAI

> **5) Retrieval**    
> The bot obtains the embeddings for any given question and uses them to find the most similar chunk

> **6) Context**  
> The bot then feeds the question, the text of the most relevant chunk and several other cues into ChatGPT for a response  
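
Steps 1 and 2 can be illustrated with a minimal sketch. The actual splitting algorithm lives in the Prepare Documents notebook and is not reproduced here, so the blank-line splitting rule and the names `split_into_chunks` and `needs_summary` are assumptions for illustration only; the 200-word limit is the one stated in step 2.

```python
import re

WORD_LIMIT = 200  # maximum chunk length from step 2

def split_into_chunks(text):
    """Split text on blank lines (an assumed rule, not the notebook's actual algorithm)."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]

def needs_summary(chunk, limit=WORD_LIMIT):
    """Return True when a chunk has more words than the limit."""
    return len(chunk.split()) > limit

# Made-up text: two short sections around one 250-word section
manual = "About us. A family hostel.\n\n" + "word " * 250 + "\n\nRooms. We have dorms."
chunks = split_into_chunks(manual)
flags = [needs_summary(c) for c in chunks]
print(len(chunks), flags)
```

Only the chunks flagged here would need to be sent to ChatGPT for shortening; the rest can be kept verbatim.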

</br>
</br>

## Links to preprocessing notebooks

The following notebooks were also used in the preprocessing steps for this project:

</br>

> **[Prepare Documents](Prepare-Documents.ipynb)**  
> The prepare documents notebook shows how the document was split and summarized.

> **[Save Embeddings](Save-Embeddings.ipynb)**  
> The save embeddings notebook shows how the embeddings were obtained from OpenAI.

</br>
</br>

## Set up environment 

In [1]:
!pip install numpy | findstr /V /C:"Requirement already satisfied"
!pip install pandas | findstr /V /C:"Requirement already satisfied"
!pip install openai==1.3.7 | findstr /V /C:"Requirement already satisfied"
!pip install gradio==3.48.0 | findstr /V /C:"Requirement already satisfied"

import numpy as np
import pandas as pd
import openai
import os
from openai import OpenAI
from openai.types import CreateEmbeddingResponse, Embedding
from IPython.display import display, Markdown
import gradio as gr

os.environ["OPENAI_API_KEY"] = "XXX"
client = OpenAI()
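
The cell above writes a placeholder key into the environment from inside the notebook. Since `OpenAI()` reads `OPENAI_API_KEY` from the environment by default, one alternative is to set the variable outside the notebook and only check that it is present. The sketch below assumes a hypothetical helper, `load_api_key`, which is not part of the original notebook or of the `openai` package.

```python
import os

def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key from the environment, failing early if it is missing or a placeholder."""
    key = env.get(name, "").strip()
    if not key or key == "XXX":
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key

# Demonstration with a plain dict standing in for os.environ
print(load_api_key({"OPENAI_API_KEY": "sk-example"}))
```

This keeps the real key out of the saved notebook file.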

## Read-in document embeddings

In [2]:
paragraphs = pd.read_csv('embeddings.csv')
doc_vectors = list(paragraphs.embeddings)
paragraphs.head(6)

| Unnamed: 0 | paragraphs | embeddings |
|---|---|---|
| 0 | About us. Saint James Backpackers is a family ... | [0.023656275123357773, 0.01101954746991396, -0... |
| 1 | A new modern communal kitchen. Free Breakfast ... | [-0.003322854870930314, 0.0005134791717864573,... |
| 2 | Rooms. In the hostel, we have a total of 104 b... | [0.01565832830965519, 0.0019718254916369915, 0... |
| 3 | New staff members arriving at SJB hostel shoul... | [-0.004849358927458525, -0.015835190191864967,... |
| 4 | The rota changes. The rota detailing staff shi... | [0.005587662570178509, -0.03242306038737297, -... |
| 5 | At SJB, all volunteers are required to work 5 ... | [-0.008781889453530312, -0.009302792139351368,... |
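
Because the `embeddings` column is stored as text in the CSV, each entry has to be parsed back into a list of floats before it can be compared with a question vector. The following is a minimal sketch using only the standard library, with a two-row in-memory CSV standing in for `embeddings.csv`. The sample values and the helper name `load_embeddings` are invented for illustration.

```python
import csv
import io
import json

def load_embeddings(csv_file):
    """Read (paragraph, vector) pairs, parsing each stringified vector with json.loads."""
    rows = []
    for row in csv.DictReader(csv_file):
        rows.append((row["paragraphs"], json.loads(row["embeddings"])))
    return rows

# Tiny stand-in for embeddings.csv (invented values; real vectors are much longer)
sample = io.StringIO(
    'paragraphs,embeddings\n'
    'About us.,"[0.1, 0.2, -0.3]"\n'
    'Rooms.,"[0.0, 0.5, 0.5]"\n'
)
docs = load_embeddings(sample)
print(docs[0])
```

With pandas, the same parsing could be done once after loading, for example with `paragraphs.embeddings.apply(json.loads)`, rather than on every question.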


## ChatGPT Prompt

### Prompt functions

In [3]:
def get_question_vector(question):
    question_vector = client.embeddings.create(
            input=question,
            model= "text-embedding-ada-002"
        ).data[0].embedding
    return question_vector

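import json  # parses the stringified embeddings without resorting to eval

def get_relevant_doc(question_vector, doc_vectors):
    # ada-002 embeddings are unit length, so the dot product equals cosine similarity
    sim_scores = []
    for doc_vector in doc_vectors:
        vec = json.loads(doc_vector)  # each embedding is stored as a string in the CSV
        sim_scores.append(np.dot(question_vector, vec))

    best = np.argmax(sim_scores)  # named 'best' to avoid shadowing the built-in max()

    # Below this similarity the closest chunk is treated as irrelevant
    if sim_scores[best] < 0.8:
        return 'No relevant information in the manual 😔'

    else:
        relevant_doc = paragraphs.paragraphs[best]
        return relevant_doc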

def create_prompt(doc, question):
    prompt=f"""I have a document below. I want you to use it to answer my question in sentences of less than 75 words. Do not refer to the document itself. 
    Document: {doc}
    
    My Question is: {question}
    """
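    # Reuse the client created during setup rather than the module-level API
    prompt_answer_response = client.completions.create(
        model="text-davinci-003",  # since retired by OpenAI; gpt-3.5-turbo-instruct is the documented replacement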
        prompt=prompt,
        temperature=1,
        max_tokens=2000
    )
    return prompt_answer_response.choices[0].text.strip()

def get_response(question, showcase=False):
    question_vector = get_question_vector(question)
    doc = get_relevant_doc(question_vector, doc_vectors)

    # Skip the ChatGPT call when no chunk cleared the similarity threshold
    if doc == 'No relevant information in the manual 😔':
        return doc

    response = create_prompt(doc, question)

    if showcase:
        # Print a Q/A transcript instead of returning the answer
        print(f"\nQ:\n\n{question}\n\nA:\n\n{response}\n\n\n")
        return None

    return response
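
The retrieval and context steps can be tried without any API calls. The sketch below is a simplified stand-in for the functions above, under stated assumptions: the three-dimensional vectors and chunk labels are invented, the helper names `cosine`, `best_chunk` and `build_prompt` are hypothetical, and the similarity is normalized explicitly so that it also works with vectors that are not unit length.

```python
import math

THRESHOLD = 0.8  # same cut-off as get_relevant_doc

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def best_chunk(question_vec, chunks, threshold=THRESHOLD):
    """Return the text of the most similar chunk, or None if nothing clears the threshold."""
    text, score = max(
        ((text, cosine(question_vec, vec)) for text, vec in chunks),
        key=lambda pair: pair[1],
    )
    return text if score >= threshold else None

def build_prompt(doc, question):
    """Assemble the prompt text only; no API call is made here."""
    return (
        "I have a document below. I want you to use it to answer my question "
        "in sentences of less than 75 words. Do not refer to the document itself.\n"
        f"Document: {doc}\n\n"
        f"My Question is: {question}\n"
    )

# Toy vectors and placeholder chunk texts (invented for illustration)
chunks = [
    ("laundry chunk", [1.0, 0.0, 0.0]),
    ("rota chunk", [0.0, 1.0, 0.0]),
]
match = best_chunk([0.9, 0.1, 0.0], chunks)
print(build_prompt(match, "How do I use the washing machines?"))
print(best_chunk([0.5, 0.5, 0.7], chunks))  # no chunk is similar enough
```

Returning `None` instead of a message string is a design choice of this sketch; it lets the caller test for a miss without comparing against display text.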

## Display with Gradio

### Read-in markdown file

In [4]:
# Use a context manager so the file handle is closed after reading
with open('README.md', encoding='utf-8') as f:
    markdown_file = f.read()

### Create interface

In [5]:
iface = gr.Interface(
    fn=get_response, 
    inputs=gr.Textbox(label="Your question 🙋", placeholder="Enter text"),
    outputs=gr.Textbox(label="My answer 🤖"),
    title="Chat SJB 🔥",
    theme=gr.themes.Soft(),
    article=markdown_file
)
iface.launch(share=True)

Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://d342e686612761c8b1.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




## View example responses

### Read-in questions

In [6]:
questions = pd.read_csv('questions.csv')

### Print responses

In [7]:
for question in questions.questions:
    get_response(question, showcase=True)


Q:

How do booking.com payments work?

A:

Booking.com has 3 different payments: Booking VC, Booking VCD, and Booking HC. Booking VC is charged when the reservation is first made, Booking VCD 48 hours before check-in, and Booking HC upon check-in. The type of payment can be identified by looking at the guest reservation on the Booking.com extranet. If there is a virtual card present, it is either a VC or VCD payment. If there is no virtual card, it is a HC payment.




Q:

How do booking button payments work?

A:

Booking Button payments are taken in full upon check in. Guests do not have to pay in advance, and their card details are taken at the time of making the reservation, but not charged until arrival. The cost of the first night's stay is pre-authorized immediately when the booking is made 12 or less days prior to check-in. No commission is required to be paid on payments from The Booking Button channel.




Q:

How do I use the washing machines?

A:

To use the washing machine