<a href="https://colab.research.google.com/github/kutyadog/ai_notebooks/blob/main/OpenAI_Chatbot_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Guidepost Chatbot Demo

**This project was built in March 2023 as a proof of concept (POC) for an HR chatbot.**

**TO RUN THIS APP, SIMPLY RUN STEP 1 AND STEP 2 BELOW.**

NOTES:


*   Source information is pulled from nearly 1,600 Guidepost HR web pages.
*   The Guidepost data has been saved into a CSV file.
*   I have already created the embeddings for the Guidepost data and saved them in a column called **'embedding'**.
*   I have saved this CSV file as 'question_embeddings.csv' in my Google Drive, and the notebook includes code to download it automatically.
*   Gradio is used to easily build an interface to this project.
*   Users can choose which model they wish to chat with in the interface.
*   The OpenAI API key is read from Colab's Secrets (`userdata.get('OPENAI_API_KEY')`), so add it there as `OPENAI_API_KEY` before running.

If you have questions about the code, feel free to reach out at kutyadog@gmail.com (Chris Johnson).




# STEP 1: Click play below

This installs and imports all libraries, sets the variables, downloads and loads the CSV data, and defines all the custom functions.

In [None]:
# @title Install & import, set variables, download csv file

!pip install -q openai
!pip install -q gradio
!pip install -q tiktoken

import openai
from google.colab import userdata
import pandas as pd
from ast import literal_eval  # for converting embeddings saved as strings back to lists
import tiktoken  # for counting tokens
import gradio as gr
from scipy import spatial  # for calculating vector similarities for search

COMPLETIONS_MODEL = "gpt-4-1106-preview"  # alternative: "gpt-3.5-turbo"
EMBEDDINGS_MODEL = "text-embedding-ada-002"
# read the OpenAI API key from Colab's Secrets panel (stored under the name OPENAI_API_KEY)
openai.api_key = userdata.get('OPENAI_API_KEY')  # wapo gpt-4

# download question_embeddings.csv from my Google Drive (it already contains the embeddings, so it's ready to go)
!gdown 1HLyJJ7NciWvZaupfutqt5m_P6AsEcGsl -O question_embeddings.csv

In [2]:
# @title Load guidepost data into memory as dataframe

theData = pd.read_csv('question_embeddings.csv')

# Assuming embedding column contains string representations of lists
theData['embedding'] = theData['embedding'].apply(literal_eval)
# print(theData.columns)
theData

Unnamed: 0.1,Unnamed: 0,title,url,last_modified,description,context,embedding
0,0,Overview,https://guidepost.washpost.com/pf/benefits_and...,2023-02-22,,CVS Caremark is the pharmacy administrator fo...,"[-0.002578002167865634, 0.0024170803371816874,..."
1,1,International SOS,https://guidepost.washpost.com/pf/benefits_and...,2023-03-10,,"The Post has partnered with <a href=""https://...","[0.005726261530071497, 0.006739968899637461, 0..."
2,2,Overview,https://guidepost.washpost.com/pf/benefits_and...,2023-03-10,,All employees are eligible for the following ...,"[-0.019079700112342834, -0.013846032321453094,..."
3,3,World Wide Travel Associates (WWWTA),https://guidepost.washpost.com/pf/benefits_and...,2023-03-10,,"<a href=""https://www.wwtainc.com/"" target=""_b...","[0.00044795835856348276, -0.008983242325484753..."
4,4,Business Travel Accident (BTA),https://guidepost.washpost.com/pf/benefits_and...,2023-03-10,,Every Post employee is covered by the Busines...,"[-0.0033833058550953865, -0.010314584709703922..."
...,...,...,...,...,...,...,...
1084,1084,West Penthouse Temporary Closure,https://guidepost.washpost.com/pf/safety_and_s...,2020-03-18,"<p>March 6, 2018</p>\r\n<p>Please be advised t...","<p>March 6, 2018</p>\r\n<p>Please be advised ...","[-0.016541440039873123, -0.013538258150219917,..."
1085,1085,DC Office: Garage Construction,https://guidepost.washpost.com/pf/safety_and_s...,2020-07-07,<p>Ev-Air-Tight Shoemaker will begin conductin...,"February 22, 2018 Ev-Air-Tight Shoemaker will...","[0.009927400387823582, -0.005498898681253195, ..."
1086,1086,Garage Repairs Continue,https://guidepost.washpost.com/pf/safety_and_s...,2020-07-17,,Phase 10 of One Franklin Square Garage constr...,"[-0.010752275586128235, -0.014265526086091995,..."
1087,1087,Garage Repairs Continue...,https://guidepost.washpost.com/pf/safety_and_s...,2020-07-17,,Do you drive to work and park at the One Fran...,"[-0.013834958896040916, -0.0024375098291784525..."
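The `literal_eval` step above is needed because `to_csv` stores a list column as the text of each list, so `read_csv` hands back strings rather than lists. The stdlib-only sketch below reproduces that round trip with Python's `csv` module and two made-up rows (the titles echo the table above, but the short vectors are invented for illustration; real ada-002 vectors are much longer). It also shows `json.loads` as an alternative parser, which works here only because each stored list contains plain numbers.

```python
import csv
import io
import json
from ast import literal_eval

# Illustrative rows: a title plus a short, invented embedding vector
rows = [("Overview", [0.25, -0.5, 0.125]), ("International SOS", [1.0, 0.0, -0.75])]

# Write an in-memory CSV the way to_csv stores a list column: as the list's str() form
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "embedding"])
for title, vector in rows:
    writer.writerow([title, str(vector)])

# Read it back: every field comes back as a plain string
buffer.seek(0)
records = list(csv.DictReader(buffer))
raw = records[0]["embedding"]
print(type(raw).__name__, raw)  # str [0.25, -0.5, 0.125]

# literal_eval parses the Python literal back into a list of floats
parsed = [literal_eval(r["embedding"]) for r in records]
# json.loads also handles lists of plain numbers and is generally faster
parsed_json = [json.loads(r["embedding"]) for r in records]
print(parsed == parsed_json)  # True
```

`literal_eval` only evaluates Python literals, so unlike `eval` it will not execute arbitrary code found in the file.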


In [10]:
# @title Functions

# search through guidepost embeddings for vectors similar to the questions embeddings
def strings_ranked_by_relatedness(
    query: str,
    df: pd.DataFrame,
    relatedness_fn=lambda x, y: 1 - spatial.distance.cosine(x, y),
    top_n: int = 20
) -> tuple[list[str], list[float]]:
    """Returns a list of strings and relatednesses, sorted from most related to least."""
    query_embedding_response = openai.embeddings.create(
        model=EMBEDDINGS_MODEL,
        input=query,
    )
    query_embedding = query_embedding_response.data[0].embedding
    strings_and_relatednesses = [
        (row["context"], relatedness_fn(query_embedding, row["embedding"]))
        for i, row in df.iterrows()
    ]
    strings_and_relatednesses.sort(key=lambda x: x[1], reverse=True)
    strings, relatednesses = zip(*strings_and_relatednesses)
    return strings[:top_n], relatednesses[:top_n]

def num_tokens(text: str, model: str = COMPLETIONS_MODEL) -> int:
    """Return the number of tokens in a string."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

#     Format your whole answer as structured HTML.

def query_message(
    query: str,
    df: pd.DataFrame,
    model: str,
    token_budget: int
) -> str:
    """Return a message for GPT, with relevant source texts pulled from a dataframe."""
    print("-----------query_message")
    print(query)
    print(model)

    strings, relatednesses = strings_ranked_by_relatedness(query, df)
    introduction = """
    You are a friendly chatbot assistant for The Washington Post's HR website called Guidepost.
    Answer the following question using only the context below, offering links when available and appropriate.
    All links should open a new tab.
    Only give information related to the context. If you don't know the answer for certain, say 'I don't know the answer to that.'
    """
    question = f"\n\nQuestion: {query}"
    message = introduction
    for string in strings:
        next_article = f'\n\nContext:\n"""\n{string}\n"""'
        if (
            num_tokens(message + next_article + question, model=model)
            > token_budget
        ):
            break
        else:
            message += next_article

    print(message)
    return message + question


def ask(
    query: str,
    df: pd.DataFrame = theData,
    model: str = COMPLETIONS_MODEL,
    token_budget: int = 4096 - 500,
    print_message: bool = False,
) -> str:
    """Answer a query with GPT, using the most relevant Guidepost texts as context."""
    print("-----------ask()")
    print(query)
    print(model)

    message = query_message(query, df, model=model, token_budget=token_budget)
    if print_message:
        print(message)
    messages = [
        {"role": "system", "content": "Answer in the style of a friendly chatbot assistant for The Washington Post's HR website called Guidepost."},
        {"role": "user", "content": message},
    ]
    # use the model chosen for this call instead of overwriting the global default
    response = openai.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0
    )
    response_message = response.choices[0].message.content
    return response_message
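`strings_ranked_by_relatedness` scores every row by cosine similarity (`1 - spatial.distance.cosine(x, y)`), sorts from most to least related, and keeps the first `top_n`. The sketch below is a stdlib-only re-implementation of that ranking step for illustration: `cosine_similarity` and `rank_by_relatedness` are hypothetical helper names, the query vector is passed in directly instead of being fetched from the embeddings API, and the three-dimensional vectors are toy values (ada-002 embeddings have 1,536 dimensions).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the vector lengths; equals 1 - scipy's cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_relatedness(
    query_vec: list[float],
    rows: list[tuple[str, list[float]]],
    top_n: int = 2,
) -> tuple[list[str], list[float]]:
    """Score each (text, vector) row against the query and return the top_n, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:top_n]
    return [text for text, _ in top], [score for _, score in top]

# Toy 3-dimensional vectors standing in for real embeddings
rows = [
    ("401(k) contribution changes", [0.9, 0.1, 0.0]),
    ("Garage repairs", [0.0, 0.2, 0.9]),
    ("401(k) enrollment", [0.7, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]

texts, scores = rank_by_relatedness(query_vec, rows)
for text, score in zip(texts, scores):
    print(f"{score=:.3f} {text}")
```

Scoring roughly 1,100 rows in a Python loop is fast enough for this demo; for much larger collections, a vectorized approach (for example with NumPy) or `heapq.nlargest` in place of a full sort would be the usual next step.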



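`query_message` adds retrieved contexts to the prompt, most related first, and stops at the first one that would push the prompt past `token_budget` as measured by `num_tokens`. The sketch below isolates that greedy packing loop so it can be tried without an API key. `pack_context` is a hypothetical name, and the whitespace-based `word_count` is only a stand-in for the tiktoken-based `num_tokens` (real token counts differ from word counts).

```python
from typing import Callable

def pack_context(
    introduction: str,
    contexts: list[str],
    question: str,
    token_budget: int,
    count_tokens: Callable[[str], int],
) -> str:
    """Append context blocks (most related first) until the next one would exceed the budget."""
    message = introduction
    for text in contexts:
        next_block = f'\n\nContext:\n"""\n{text}\n"""'
        # Stop at the first block that would push the full prompt past the budget
        if count_tokens(message + next_block + question) > token_budget:
            break
        message += next_block
    return message + question

# Stand-in tokenizer for illustration only: counts whitespace-separated words
def word_count(text: str) -> int:
    return len(text.split())

prompt = pack_context(
    introduction="Answer using only the context below.",
    contexts=["first chunk " * 5, "second chunk " * 5, "third chunk " * 5],
    question="\n\nQuestion: How do I change my 401k?",
    token_budget=40,
    count_tokens=word_count,
)
print(prompt)
```

Because the loop uses `break` rather than `continue`, a smaller, less related chunk further down the list is never squeezed in after a larger one fails to fit, which keeps the prompt in strict relevance order. The notebook's default budget of `4096 - 500` appears to reserve roughly 500 tokens of a 4,096-token window for the model's reply.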
In [4]:
# @title interface functions

def ask_question(question, model):
  return ask(query=question, model=model).replace("```html", "").replace("```", "")

def respond(message, chat_history: list, modelName):
  bot_message = ask_question(message, modelName)
  # chat_history.append((message, bot_message))
  return bot_message
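`ask_question` removes Markdown code-fence markers (three backticks, optionally followed by `html`) from every part of the reply, which would also strip fences from any code the model deliberately includes. A narrower alternative, sketched below as an assumption rather than the notebook's behavior, only unwraps a reply that is entirely enclosed in a single fenced block. `strip_outer_fence` is a hypothetical helper.

```python
import re

# A reply wrapped in one fenced block: opening fence with optional language tag, body, closing fence
OUTER_FENCE = re.compile(r"^\s*```[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?```\s*$", re.DOTALL)

def strip_outer_fence(reply: str) -> str:
    """Remove a fence that wraps the entire reply; leave anything else untouched."""
    match = OUTER_FENCE.match(reply)
    # Only unwrap when the inner text has no fences of its own (i.e. it really is one block)
    if match and "```" not in match.group(1):
        return match.group(1)
    return reply
```

To try it, `ask_question` could return `strip_outer_fence(ask(query=question, model=model))` instead of chaining the two `replace` calls.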

In [5]:
# @title examples

exampleItems = [
    ["How do I figure out how much vacation time I have left?"],
    ["Someone keeps sending me threatening emails, what should I do?"],
    ["My supervisor gave me an unfair review. What should I do?"],
    ["Is there a way for me to view that last town hall?"],
    ["How does the post determine payscale?"],
    ["Will I be forced to keep getting coronavirus boosters?"],
    ["A coworker said something offensive to me. What should I do?"],
    ["I am attracted to a coworker. Should I ask her on a date?"],
    ["My boss called me stupid. Can I get him fired?"],
    ["I want to change how much I contribute to my 401k. How can I do that?"],
    ["What is the leadership project?"],
    ["I need to file a SARS request. How can I do that?"],
    ["I see a therapist. How can I pay with my insurance?"],
    ["What is Taco Bell and what food do they offer?"],
    ["When was George Washington born?"],
    ["what is the Wash posts smoking policy?"],
    ["Can I get in trouble with the Post for posting something on Facebook?"],
    ["I need to look at my payslips?"],
    ["When will I get my W2"],
    ["Can I get my payslips mailed to me?"],
    ["I need to file an expense. How can I do that?"],
    ["Will the Post pay for my gym membership?"],
    ["What is gympass?"],
    ["I dont like where I am sitting at the Post. Can I change my seat?"],
    ["Do I need to join the Guild?"],
    ["What cultural programs does the post offer employees?"],
    ["What holidays does the Post recognize?"],
    ["Does the Post offer tax services?"],
    ["Should illegal aliens be imprisoned before they are returned to their country?"],
    ["How many calories are in the average hamburger?"],
    ["How do I change my 401k amount?"]
]

# STEP 2: Click play below

When it finishes starting up and there are no errors, it prints a shareable public link, something like this:

```Running on public URL: https://6bcf18eb5678c6f91f.gradio.live```

Open that link in a new window. You can share it with others for as long as this Colab session is running (Gradio share links also expire after one week).

In [11]:
with gr.Blocks() as demo:
    # radio buttons for picking which chat model answers
    model_selector = gr.Radio(["gpt-4-1106-preview", "gpt-3.5-turbo"], value=COMPLETIONS_MODEL, label="Model", interactive=True)
    chat = gr.ChatInterface(respond, examples=exampleItems, additional_inputs=model_selector)
demo.launch(debug=True)
# demo.launch()



It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://728a1025a067ddb8db.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


-----------ask()
Does the Post offer tax services?
gpt-4-1106-preview
-----------query_message
Does the Post offer tax services?
gpt-4-1106-preview

    You are a friendly chatbot assistant for The Washington Post's HR website called Guidepost.
    Answer the following question using only the context below, offering links when available and appropriate.
    All links should open a new tab.
    Only give information related to the context. If you don't know the answer for certain, say 'I don't know the answer to that.'
    

Context:
"""
 Want to make filing your tax return easier and save money along the way? Check out these PostPerks that help you file your tax return in 2023. <a href="https://washpost.perkspot.com/merchant/1422/turbotax">TurboTax</a>: Save up to an additional $20 and get your taxes done right <a href="https://washpost.perkspot.com/merchant/45660/taxslayer">TaxSlayer</a>: Get 25% Off Your Federal E-File <a href="https://washpost.perkspot.com/merchant/910/h-r-block">H&





---

<BR><BR>

**Everything below is not necessary for running the chatbot.**

<BR><BR>

---




# Testing

In [12]:
# @title Show Guidepost data that might answer our question

strings, relatednesses = strings_ranked_by_relatedness("how to change my 401k?", theData, top_n=5)

for string, relatedness in zip(strings, relatednesses):
    print(f"{relatedness=:.3f}")
    display(string)

relatedness=0.865


' You can make changes to your 401(k) at any time. <b>Changing your savings rate</b> Go to&nbsp;<a href="https://guidepost.washpost.com/pf/home">GuidePost&nbsp;</a>&gt; Okta apps &gt;&nbsp;<a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8">Vanguard</a>. Once you are logged in to your account in our plan, select MANAGE MY MONEY &gt; Change My Paycheck Deduction. Please confirm paycheck deductions from your payslip on or after the effective date. <b>Adjusting your investments</b> To change how your future contributions are invested, go to <a href="https://guidepost.washpost.com/pf/home">GuidePost&nbsp;</a>&gt; Okta apps &gt; <a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8">Vanguard</a>. Once you’re logged in select MANAGE MY MONEY &gt; Change my investments &gt; Change paycheck investment mix. Once you complete this step, you’ll be as

relatedness=0.842


' <b>How do I sign up?</b> You can enroll in the 401(k) at any time (<a href="https://guidepost.washpost.com/pf/home">GuidePost </a>&gt; Okta apps &gt; <a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8" target="_blank">Vanguard</a>). You may be asked to enter some personal information and input The Post’s plan number, 094181, then to make deferral and investment elections. Please confirm on your first payslip on or after the effective date. <b>How much does the company match?</b> The company matches $0.50 on the dollar on the first 6 percent of pay that you invest on a combined pre-tax/Roth basis. <b>How do I change my contribution rate?</b> Go to <a href="https://guidepost.washpost.com/pf/home">GuidePost </a>&gt; Okta apps &gt; <a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8" target="_blank">Vanguard</a>. Once you’re logged in to y

relatedness=0.842


' <b>How do I sign up?</b> You can enroll in the 401(k) at any time (<a href="https://guidepost.washpost.com/pf/home">GuidePost </a>&gt; Okta apps &gt; <a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8" target="_blank">Vanguard</a>). You may be asked to enter some personal information and input The Post’s plan number, 094181, then to make deferral and investment elections. Please confirm on your first payslip on or after the effective date. <b>How much does the company match?</b> The company may make matching contributions to your account. Different groups of employees receive different contributions. Company matching contributions provide an additional incentive for you to participate in the plan.,\r\nTable:\r\nEmployee Group,Condition,Percent of Base Salary Matched,Year of Service Required\r\nCraft union electricians,N/A,1%,No\r\nCraft union engineers, carpenters and painters,Hired before July 15, 2011,4.5%,

relatedness=0.841


' <b>How do I sign up?</b> You can enroll in the 401(k) at any time (<a href="https://guidepost.washpost.com/pf/home">GuidePost </a>&gt; Okta apps &gt; <a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8" target="_blank">Vanguard</a>). You may be asked to enter some personal information and input The Post’s plan number, 094181, then to make deferral and investment elections. Please confirm on your first payslip on or after the effective date. <b>How much does the company match?</b> The company matches $0.50 on the dollar on the first 6 percent of pay you invest on a combined pre-tax/Roth basis. <b>How do I change my contribution rate?</b> Go to <a href="https://guidepost.washpost.com/pf/home">GuidePost </a>&gt; Okta apps &gt; <a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8" target="_blank">Vanguard</a>. Once you are logged in to your 

relatedness=0.839


' <b>Enrollment</b> You can enroll in the 401(k) at any time (<a href="https://guidepost.washpost.com/pf/home">GuidePost </a>&gt; Okta apps &gt; <a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8" target="_blank">Vanguard</a>). You will be asked to enter some personal information and input The Post’s plan number, 094181, then to make deferral and investment elections. Please confirm paycheck deductions from your payslip on or after the effective date. For employees hired on or after March 1, 2018, if you do not enroll when first eligible, you will be enrolled automatically to contribute 6 percent of eligible pay on a pre-tax basis. Your contribution percentage will increase by one percentage point each April thereafter, until it reaches 10 percent or until you complete an online enrollment. Until you select your own investments, your contributions will be invested in the default fund, which is the Vanguard targ

In [None]:
# @title manually ask question

# set print_message=True to see the source text GPT was working off of
ask('How do i change my 401k?', print_message=True)

# ask('How do i change my 401k?')

-----------ask()
How do i change my 401k?
gpt-3.5-turbo
-----------query_message
How do i change my 401k?
gpt-3.5-turbo

    You are a friendly chatbot assistant for The Washington Post's HR website called Guidepost. 
    Answer the following question using only the context below, offering links when available and appropriate.
    All links should open a new tab.
    Only give information related to the context. If you don't know the answer for certain, say 'I don't know the answer to that.'
    

Context:
"""
 You can make changes to your 401(k) at any time. <b>Changing your savings rate</b> Go to&nbsp;<a href="https://guidepost.washpost.com/pf/home">GuidePost&nbsp;</a>&gt; Okta apps &gt;&nbsp;<a href="https://washpost.okta.com/home/thewashingtonpostandcompaniesprod_vanguard_3/0oa1e7vb7q8XtIIUx0h8/aln1e7vk9ba4Tgt050h8">Vanguard</a>. Once you are logged in to your account in our plan, select MANAGE MY MONEY &gt; Change My Paycheck Deduction. Please confirm paycheck deductions from your

"To make changes to your 401(k), you have a few options depending on what specific changes you want to make. Here are the steps for different changes you may want to make:\n\n1. Changing your savings rate: \n   - Go to GuidePost > Okta apps > Vanguard.\n   - Once you are logged in to your account in our plan, select MANAGE MY MONEY > Change My Paycheck Deduction.\n   - Please confirm paycheck deductions from your payslip on or after the effective date.\n\n2. Adjusting your investments:\n   - Go to GuidePost > Okta apps > Vanguard.\n   - Once you’re logged in, select MANAGE MY MONEY > Change my investments > Change paycheck investment mix.\n   - Once you complete this step, you’ll be asked if you want to rebalance your entire portfolio to match your new contribution allocations.\n\n3. Designating beneficiaries:\n   - Go to GuidePost > Okta apps > Vanguard.\n   - Once you’re logged in to your account in our plan, select MENU > My Profile > Beneficiaries.\n\n4. Rolling over money from a p

# Redo the embeddings (ONLY run this if you have new Q&A data)

Running this reloads the original CSV file (without embeddings), creates the embeddings again, and saves the result as *question_embeddings.csv*. The code is left in to show the process so it can be reproduced with new data.

NOTE: THIS IS OLD CODE AND WILL PROBABLY NOT WORK ANYMORE. :( It imports `get_embedding` from `openai.embeddings_utils`, a helper module that was removed in version 1.0 of the `openai` Python library.

In [None]:
# import pandas as pd
!gdown 1aK7p7ZlrX-QD-WWguBPUHfPX5WP-HBy1 -O formatted_articles.csv

theData = pd.read_csv('formatted_articles.csv', names=('title', 'url', 'context'))
print(theData.columns)
theData

Build the embedding vectors and save them to question_embeddings.csv

In [None]:
from openai.embeddings_utils import get_embedding

#get_embedding(episode.iloc[0]['context'], engine='text-embedding-ada-002')

theData['embedding'] = theData['context'].apply(lambda row: get_embedding(row, engine='text-embedding-ada-002'))
theData.to_csv('question_embeddings.csv')
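For reference, the sketch below shows one way the embedding step might look with the `openai>=1.0` library, reusing the `openai.embeddings.create` call that `strings_ranked_by_relatedness` already makes. Sending a list of texts as `input` returns one vector per text, so batching cuts the number of requests. `embed_in_batches`, `openai_embed_batch`, and the batch size of 100 are illustrative assumptions, and the batching logic takes the embedding call as a parameter so it can be tested without an API key.

```python
from typing import Callable, Sequence

def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,
) -> list[list[float]]:
    """Embed texts one batch at a time, returning one vector per text in input order."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_vectors = embed_batch(batch)
        # Guard against silently misaligning vectors and rows
        if len(batch_vectors) != len(batch):
            raise ValueError("embedding call returned a different number of vectors than inputs")
        vectors.extend(batch_vectors)
    return vectors

def openai_embed_batch(batch: list[str], model: str = "text-embedding-ada-002") -> list[list[float]]:
    """Embed a whole batch in one request with the openai>=1.0 module-level client."""
    import openai  # imported lazily so embed_in_batches can be tried without the library

    response = openai.embeddings.create(model=model, input=batch)
    # Each result carries the position of its input; sort on it to keep the order stable
    return [item.embedding for item in sorted(response.data, key=lambda item: item.index)]
```

In the notebook this could be wired up as `theData['embedding'] = embed_in_batches(theData['context'].tolist(), openai_embed_batch)` followed by `theData.to_csv('question_embeddings.csv', index=False)`; passing `index=False` avoids the extra `Unnamed: 0` columns visible in the loaded data. Rows with an empty or missing `context` would likely need to be dropped or filled first, since the API may reject them.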