<a href="https://colab.research.google.com/github/kutyadog/ai_notebooks/blob/main/OpenAI_HR_Chatbot-3-2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HR Chatbot POC - March 14, 2023

This project was created by Chris Johnson - kutyadog@gmail.com

Built for Washington Post hackathon 2k23-ai-ml - March 14, 2023



---



**Problem:** HR constantly gets calls and emails from people asking basic questions that are answered in detail on the HR website.

**Solution:** Using the HR website as the source, build an AI chatbot to answer these questions for them.



---



**Structure:**
* Pull 2300+ articles from db for HR website.
* Create vector embeddings for the article content.
* For each user query, query the content embeddings and include the top results in the prompt as context.
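The three steps above amount to a small retrieval loop. Here is a minimal sketch of it, using toy 3-dimensional vectors in place of real embeddings. The vectors, the sample question vector and the helper names `cosine_similarity` and `top_matches` are illustrative assumptions, not the notebook's actual data or code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_matches(question_vector, articles, k=2):
    """Return the k articles whose embeddings are closest to the question."""
    scored = [(cosine_similarity(a["embedding"], question_vector), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored[:k]]

# Toy data: made-up 3-d "embeddings" (real ones have many more dimensions).
articles = [
    {"title": "401(k) Changes", "embedding": [0.9, 0.1, 0.0]},
    {"title": "Garage Repairs", "embedding": [0.0, 0.2, 0.9]},
    {"title": "Plan Details", "embedding": [0.8, 0.3, 0.1]},
]
question_vector = [1.0, 0.0, 0.0]  # pretend embedding of the user's question

# Keep the closest articles and place them in the prompt as context.
matches = top_matches(question_vector, articles)
context = "\n".join(a["title"] for a in matches)
prompt = f"Context:\n{context}\n\nQ: How do I change my 401(k)?\nA:"
```

In the real pipeline the context would hold the article text and link rather than just the title.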

**Issues:**
* GPT-3.5 Turbo (the premier model when the project started) had STRONG tendencies to hallucinate. Finding a prompt that could prevent this was difficult.
* Luckily, GPT-4 was released just as I started this project and the Post was given early access. GPT-4 hallucinated considerably less than 3.5.
* For this POC, I pulled a blind dump of all HR website content, regardless of its age. This caused issues. For example, if a user asks, "What date can I expect to get my W2 from the Post?", there might be 4 different stories announcing when employees should expect their W2s, one from each of the past 4 years, each with a different date. This can confuse the model. Long-term solution:
  * Refine the source content (remove old stories)
  * Add additional checks/logic in the code when creating context
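One possible form of the additional checks mentioned above: when several retrieved articles share a title, keep only the most recently modified one before building the context. This is a sketch that assumes stale duplicates share a title and that `last_modified` holds ISO-format dates (YYYY-MM-DD). The helper `keep_newest_per_title` and the sample rows are hypothetical.

```python
from datetime import date

def keep_newest_per_title(rows):
    """Keep one row per title: the one with the latest last_modified date."""
    newest = {}
    for row in rows:
        modified = date.fromisoformat(row["last_modified"])
        current = newest.get(row["title"])
        if current is None or modified > date.fromisoformat(current["last_modified"]):
            newest[row["title"]] = row
    return list(newest.values())

# Hypothetical rows: the same announcement published in different years.
rows = [
    {"title": "W-2 Availability", "last_modified": "2021-01-15", "context": "(older announcement)"},
    {"title": "W-2 Availability", "last_modified": "2023-01-12", "context": "(newer announcement)"},
    {"title": "Payslips", "last_modified": "2022-05-19", "context": "(unrelated article)"},
]
filtered = keep_newest_per_title(rows)
```

Matching on exact titles is crude. Recurring announcements with varying titles would need a different grouping key.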

**Long-term Tools:**
* Eventually build a React interface to browse/edit/delete/add live content.
  * Give access to HR (content owners) so they can manage/own upkeep of project content.
* Do we tie into the HR website CMS, so that when a story is published, it is automatically added to the chatbot's embedded content?
* Save embedded content into a vector database (Pinecone?)
* Build thumbs-up & thumbs-down into the chatbot interface so users can give feedback.
* Log user communication with the chatbot (anonymously) so that we can analyze discussions and look for issues.
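As a rough illustration of the feedback and anonymous-logging ideas above, the sketch below builds a log record that deliberately carries no user identifier and appends it to a JSON Lines file. The field names and the helpers `make_feedback_record` and `append_record` are assumptions made for illustration. Nothing like this exists in the notebook yet.

```python
import json
import time

def make_feedback_record(question, answer, thumbs_up):
    """Build a log entry that deliberately carries no user identifier."""
    return {
        "timestamp": int(time.time()),
        "question": question,
        "answer": answer,
        "thumbs_up": bool(thumbs_up),
    }

def append_record(path, record):
    """Append one record as a single line of JSON (JSON Lines format)."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")

record = make_feedback_record("When will I get my W2?", "I don't know", thumbs_up=False)
# append_record("chat_feedback.jsonl", record)  # hypothetical file name
```

Questions themselves can contain personal details, so a real deployment would likely need a review step before logs are analyzed.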

# STEP 1: Run the Essentials

In [None]:
!pip install openai

In [2]:
import openai
import pandas as pd

from google.colab import userdata

COMPLETIONS_MODEL = "gpt-4" # "text-davinci-003"
EMBEDDINGS_MODEL = "text-embedding-ada-002"
openai.organization = userdata.get('OPENAI_ORG')    # Chris Johnson's acct
openai.api_key = userdata.get('OPENAI_API_KEY')    # Chris johnson's key

In [3]:
# download source csv file with questions & embeddings
!gdown 1HLyJJ7NciWvZaupfutqt5m_P6AsEcGsl -O question_embeddings.csv

Downloading...
From: https://drive.google.com/uc?id=1HLyJJ7NciWvZaupfutqt5m_P6AsEcGsl
To: /content/question_embeddings.csv
100% 39.4M/39.4M [00:00<00:00, 44.3MB/s]


Getting / Processing the Content

For this part, you only need to run STEP 2 below.


# STEP 2: Process content with Embeddings already saved with them
Creating embeddings can take a while and cost $$$, so I have already done this step and saved it into the file `question_embeddings.csv`

For details on how embeddings were created, view code further below.

In [9]:
from google.colab import files
import ast
import pandas as pd
import os
import numpy as np

theData = pd.read_csv('question_embeddings.csv')
# , names=('xnum', 'title', 'url', 'last_modified', 'description', 'context', 'embedding')

# df = pd.read_csv('output/embedded_1k_reviews.csv')
# Embeddings are stored as list strings; literal_eval parses them without executing arbitrary code
theData['embedding'] = theData.embedding.apply(ast.literal_eval).apply(np.array)

# print(theData.columns)
pd.DataFrame(theData)


Unnamed: 0.1,Unnamed: 0,title,url,last_modified,description,context,embedding
0,0,Overview,https://guidepost.washpost.com/pf/benefits_and...,2023-02-22,,CVS Caremark is the pharmacy administrator fo...,"[-0.002578002167865634, 0.0024170803371816874,..."
1,1,International SOS,https://guidepost.washpost.com/pf/benefits_and...,2023-03-10,,"The Post has partnered with <a href=""https://...","[0.005726261530071497, 0.006739968899637461, 0..."
2,2,Overview,https://guidepost.washpost.com/pf/benefits_and...,2023-03-10,,All employees are eligible for the following ...,"[-0.019079700112342834, -0.013846032321453094,..."
3,3,World Wide Travel Associates (WWWTA),https://guidepost.washpost.com/pf/benefits_and...,2023-03-10,,"<a href=""https://www.wwtainc.com/"" target=""_b...","[0.00044795835856348276, -0.008983242325484753..."
4,4,Business Travel Accident (BTA),https://guidepost.washpost.com/pf/benefits_and...,2023-03-10,,Every Post employee is covered by the Busines...,"[-0.0033833058550953865, -0.010314584709703922..."
...,...,...,...,...,...,...,...
1084,1084,West Penthouse Temporary Closure,https://guidepost.washpost.com/pf/safety_and_s...,2020-03-18,"<p>March 6, 2018</p>\r\n<p>Please be advised t...","<p>March 6, 2018</p>\r\n<p>Please be advised ...","[-0.016541440039873123, -0.013538258150219917,..."
1085,1085,DC Office: Garage Construction,https://guidepost.washpost.com/pf/safety_and_s...,2020-07-07,<p>Ev-Air-Tight Shoemaker will begin conductin...,"February 22, 2018 Ev-Air-Tight Shoemaker will...","[0.009927400387823582, -0.005498898681253195, ..."
1086,1086,Garage Repairs Continue,https://guidepost.washpost.com/pf/safety_and_s...,2020-07-17,,Phase 10 of One Franklin Square Garage constr...,"[-0.010752275586128235, -0.014265526086091995,..."
1087,1087,Garage Repairs Continue...,https://guidepost.washpost.com/pf/safety_and_s...,2020-07-17,,Do you drive to work and park at the One Fran...,"[-0.013834958896040916, -0.0024375098291784525..."


# The Functions
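This cell contains the functions that do the actual work. Run it before anything below.

WARNING: This cell no longer works as written. It targets the pre-1.0 `openai` Python package. Version 1.0 removed `openai.embeddings_utils` and `openai.ChatCompletion` and is NOT backwards compatible.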


In [12]:
# The Functions
from openai.embeddings_utils import get_embedding
from openai.embeddings_utils import cosine_similarity

def getNearestEmbeddings(the_question):
  question_vector = get_embedding(the_question, engine='text-embedding-ada-002')

  theData["similarities"] = theData['embedding'].apply(lambda x: cosine_similarity(x, question_vector))
  linkAnswers = theData.sort_values("similarities", ascending=False).head(5)
  return linkAnswers

def processTopAnswers(linkAnswers):
  context = []
  urls = []
  for i, row in linkAnswers.iterrows():
    # print(i)
    # print(len(row['context']))
    if row['similarities'] >= 0.77:
      context.append("link: "+ row['url'] + ' - '+ row['context'])
      urls.append(row['url'])
    # print(row['context'])

  # context

  text = "\n".join(context)
  # text

  # 18200 LENGTH worked... but I want to make sure there is always room for long url, etc.
  # print(len(text))

  context = text[:10000]

  if len(urls) > 0:
    print( urls[0])
  return context, urls


# Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know"

# Alternate way of doing this:
# https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb

# Answer the following question using only the context below. Answer in the style of a friendly chatbot assistant for The Washington Post's HR website called Guidepost, offering links when available and appropriate. DO not provide links that are not on https://guidepost.washpost.com/. Only give information related to the context. If you don't know the answer for certain, say I don't know.
def createPrompt(context, question):
  if len(context) <= 5:
    return ""
  prompt = f"""Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know". Offer the given link when appropriate.

  Context:
  {context}

  Q: {question}
  A:"""
  return prompt

def askQuestion(prompt):
  if prompt == "":
    return "I don't know"

  response = openai.ChatCompletion.create(
      messages=[
      {
        "role": "system",
        "content": "You are a helpful chatbot."
      },
      {
        "role": "user",
        "content": prompt
      },
    ],
      # prompt=prompt,
      temperature=0,
      max_tokens=512,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      model=COMPLETIONS_MODEL,
  )

  print(response)
  return response["choices"][0]["message"]["content"]

def interface_ask_question(question):
  topAnswers = getNearestEmbeddings(question)

  # from others
  tempData = processTopAnswers(topAnswers)
  context = tempData[0]
  urls = tempData[1]

  url_string = ''
  for item in urls:
    url_string = url_string + '<a href="'+ item +'" target="_blank">'+ item+'</a><BR>'

  prompt = createPrompt(context, question)
  return askQuestion(prompt), url_string


ModuleNotFoundError: No module named 'openai.embeddings_utils'
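The error above appears because `openai.embeddings_utils` was removed in version 1.0 of the `openai` package. A hedged sketch of a replacement for `get_embedding` against the 1.0+ client interface follows. Passing the client in as a parameter is a choice made here for illustration, not the notebook's original design. The other removed helper, `cosine_similarity`, is plain vector arithmetic and can be reimplemented locally.

```python
def get_embedding(client, text, model="text-embedding-ada-002"):
    """Request one embedding vector through the openai>=1.0 client interface."""
    response = client.embeddings.create(model=model, input=[text])
    return response.data[0].embedding

# Usage (assumes openai>=1.0 is installed and an API key is configured):
#   from openai import OpenAI
#   client = OpenAI()
#   question_vector = get_embedding(client, "How do I change my 401(k)?")
```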

# Step By Step Question Asking
For debugging

In [None]:
question = "How do I change my 401?"
topAnswers = getNearestEmbeddings(question)

topAnswers

Unnamed: 0.1,Unnamed: 0,title,url,last_modified,description,context,embedding,similarities
411,411,401(k) Changes,https://guidepost.washpost.com/pf/benefits_and...,2020-08-26,,You can make changes to your 401(k) at any ti...,"[-0.010378698818385601, -0.018784914165735245,...",0.839643
559,559,FAQ,https://guidepost.washpost.com/pf/benefits_and...,2023-01-12,,<b>How do I sign up?</b> You can enroll in th...,"[-0.002475699409842491, -0.021441766992211342,...",0.833999
561,561,FAQ,https://guidepost.washpost.com/pf/benefits_and...,2023-01-12,,<b>How do I sign up?</b> You can enroll in th...,"[-0.002396609168499708, -0.023753423243761063,...",0.831939
560,560,FAQ,https://guidepost.washpost.com/pf/benefits_and...,2023-01-12,,<b>How do I sign up?</b> You can enroll in th...,"[-0.0021199057810008526, -0.023978672921657562...",0.831704
527,527,Plan Details,https://guidepost.washpost.com/pf/benefits_and...,2022-05-19,,<b>Enrollment</b> You can enroll in the 401(k...,"[-0.009357093833386898, -0.024943415075540543,...",0.823685


# Ask it questions
Build the prompt context from the closest embedded articles, then ask the model:

In [None]:
tempData = processTopAnswers(topAnswers)
context = tempData[0]
urls = tempData[1]
context
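`processTopAnswers` cuts the joined text at 10,000 characters, which can end the context mid-article. It also returns URLs for articles that may have been cut off entirely. One alternative, sketched here under the same 0.77 threshold and 10,000-character budget, is to add whole articles until the budget is reached. `build_context` and the sample rows are hypothetical.

```python
def build_context(rows, min_similarity=0.77, max_chars=10000):
    """Join qualifying articles into one context string without cutting any article in half."""
    parts, urls, used = [], [], 0
    for row in rows:
        if row["similarities"] < min_similarity:
            continue  # too dissimilar to the question
        entry = "link: " + row["url"] + " - " + row["context"]
        if used + len(entry) > max_chars:
            break  # stop before the character budget is exceeded
        parts.append(entry)
        urls.append(row["url"])
        used += len(entry) + 1  # +1 for the joining newline
    return "\n".join(parts), urls

# Hypothetical rows, already sorted by similarity (highest first).
rows = [
    {"url": "https://example.com/a", "context": "x" * 50, "similarities": 0.84},
    {"url": "https://example.com/b", "context": "y" * 50, "similarities": 0.80},
    {"url": "https://example.com/c", "context": "z" * 50, "similarities": 0.70},
]
context, urls = build_context(rows, max_chars=100)
```

The trade-off is that one article longer than the budget yields an empty context, whereas the original truncation would still pass part of it along.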




In [None]:
prompt = createPrompt(context, question)
askQuestion(prompt)

{
  "id": "chatcmpl-86Ooh30y1EDe2G8h0vDkfyhnSfCH8",
  "object": "chat.completion",
  "created": 1696536055,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "To change your 401(k), go to GuidePost > Okta apps > Vanguard. Once you are logged in to your account in our plan, you can make various changes such as changing your savings rate, adjusting your investments, designating beneficiaries, rolling over money from a previous employer\u2019s plan or an IRA into your 401(k), and converting assets to Roth. The specific steps for each change are detailed in the provided context. Here is the [link](https://guidepost.washpost.com/pf/benefits_and_resources/my_benefits/enroll_or_change_benefits/anytime_benefits/beneficiary-changes) for more information."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 2892,
    "completion_tokens": 135,
    "total_tokens": 3027
  }
}


'To change your 401(k), go to GuidePost > Okta apps > Vanguard. Once you are logged in to your account in our plan, you can make various changes such as changing your savings rate, adjusting your investments, designating beneficiaries, rolling over money from a previous employer’s plan or an IRA into your 401(k), and converting assets to Roth. The specific steps for each change are detailed in the provided context. Here is the [link](https://guidepost.washpost.com/pf/benefits_and_resources/my_benefits/enroll_or_change_benefits/anytime_benefits/beneficiary-changes) for more information.'
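The response above was produced with `openai.ChatCompletion.create`, which was also removed in version 1.0 of the `openai` package. A minimal sketch of the same request through the 1.0+ client interface follows. The function name `ask_question` and the injected `client` parameter are choices made for this sketch.

```python
def ask_question(client, prompt, model="gpt-4"):
    """Send the prompt through the openai>=1.0 chat completions interface."""
    if prompt == "":
        return "I don't know"
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful chatbot."},
            {"role": "user", "content": prompt},
        ],
        temperature=0,
        max_tokens=512,
    )
    # In 1.0+ the response is an object, so fields are read as attributes.
    return response.choices[0].message.content

# Usage (assumes openai>=1.0 is installed and an API key is configured):
#   from openai import OpenAI
#   answer = ask_question(OpenAI(), prompt)
```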

In [None]:


# question = "How do I figure out how much vacation time I have left?"
# question = "Someone keeps sending me threatening emails, what should I do?"
# question = "My supervisor gave me an unfair review. What should I do?"
# question = "Is there a way for me to view that last town hall?"     # THIS GAVE ME BAD ANSWER (fixed)
# question = "How does the post determine payscale?"

# question = "Will I be forced to keep getting coronavirus boosters?"
question = "A coworker said something offensive to me. What should I do?"
# question = "I am attracted to a coworker. Should I ask her on a date?"     #BAD ANSWER W BAD LINK (fixed)
# question = "My boss called me stupid. Can I get him fired?"

# question = "I want to change how much I contribute to my 401k. How can I do that?" #Not great answer
# question = "What is the leadership project?"
# question = "what is pornhub?"   #GIVES MORE INFO THAN WE WANT AND ODDLY PROVIDES A BROKEN LINK. SHOULD NOT TALK ABOUT ANYTHING NOT IN CONTEXT.
# question = "I need to file a SARS request. How can I do that?"  # BAD ANSWER
# question = "I see a therapist. How can I pay with my insurance?"
# question = "What is Taco Bell and what food do they offer?"
# question = "When was George Washington born?"
# question = "what is the Wash posts smoking policy?"
# question = "Can I get in trouble with the Post for posting something on Facebook?"  #NOT GREAT BAD LINKS
# question = "I need to look at my payslips?"
# question = "When will I get my W2"
# question = "Can I get my payslips mailed to me?"
# question = "I need to file an expense. How can I do that?"   #NOT GOOD ANSWERS, OFFSITE LINKS
# question = "Will the Post pay for my gym membership?"    #BAD LINKS
# question = "What is gympass?"     #BAD LINKS
# question = "I dont like where I am sitting at the Post. Can I change my seat?"
# question = "Do I need to join the Guild?"
# question = "What cultural programs does the post offer employees?"
# question = "What holidays does the Post recognize?"
# question = "Does the Post offer tax services?"
# question = "Should illegal aliens be imprisoned before they are returned to their country?"
# question = "How many calories are in the average hamburger?"
# question = "How do I change my 401k amount?"
# question = "How to change my 401k?"

# question = "How much vacation do I have?"                                   #Does not know, even though top answer is .85

# question = "How can I find out how much vacation I have left?"              #Answer is in 3rd article, even though top answer is .85
# question = "My supervisor gave me an unfair review. What should I do?"      #Link given is not 100% correct. Its a good link but to salary compensation

# question_vector = get_embedding(question, engine='text-embedding-ada-002')
# theData["similarities"] = theData['embedding'].apply(lambda x: cosine_similarity(x, question_vector))
# linkAnswers = theData.sort_values("similarities", ascending=False).head(5)

linkAnswers = getNearestEmbeddings(question)
linkAnswers

# # from others
# tempData = processTopAnswers(linkAnswers)
# context = tempData[0]
# urls = tempData[1]

# # # from the other 2
# prompt = createPrompt(context, question)
# askQuestion(prompt)



Unnamed: 0.1,Unnamed: 0,title,url,last_modified,description,context,embedding,similarities
464,464,Advice from around The Post...,https://guidepost.washpost.com/pf/benefits_and...,2020-07-27,,“I have hosted a couple of ‘dog’ social Zooms...,"[-0.007113411091268063, -0.010257313959300518,...",0.775428
519,519,Prohibiting Workplace Harassment,https://guidepost.washpost.com/pf/benefits_and...,2022-05-17,,The Washington Post is committed to providing...,"[-0.013575621880590916, 0.004625060595571995, ...",0.773903
913,913,How to Handle Inappropriate Communications,https://guidepost.washpost.com/pf/safety_and_s...,2022-11-22,Inappropriate communications are any communica...,Inappropriate communications from external in...,"[-0.010548917576670647, -0.009317762218415737,...",0.765102
499,499,Confidential Help Line,https://guidepost.washpost.com/pf/benefits_and...,2020-06-25,,Make a report to be shared with HR. Available...,"[-0.01672852784395218, -0.005579455755650997, ...",0.760618
514,514,Prohibiting Discrimination,https://guidepost.washpost.com/pf/benefits_and...,2022-05-17,,It is the policy of The Washington Post to pr...,"[-0.016293376684188843, -0.0064331660978496075...",0.760345
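The commented-out question list above works as a manual test suite. One way to make part of it repeatable is to record, for each question, whether the bot is expected to refuse and then flag the mismatches. This is a sketch. `run_checks` and `stub_answer` are hypothetical, and the stub stands in for the real chatbot call so the example runs offline.

```python
def run_checks(answer_fn, cases):
    """Return the questions whose refusal behaviour differs from what is expected."""
    failures = []
    for question, should_refuse in cases:
        refused = answer_fn(question).strip().lower().startswith("i don't know")
        if refused != should_refuse:
            failures.append(question)
    return failures

# Off-topic questions should be refused; HR questions should be answered.
cases = [
    ("When was George Washington born?", True),
    ("What is Taco Bell and what food do they offer?", True),
    ("What holidays does the Post recognize?", False),
]

def stub_answer(question):
    """Stand-in for the real chatbot call."""
    return "(stub answer)" if "holidays" in question else "I don't know"

failures = run_checks(stub_answer, cases)
```

This only checks refusal behaviour. Judging answer quality or link correctness, as the inline notes do, would still need a human reviewer.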


# Interface
This builds a Gradio interface for interacting with the code. To get a shareable link, pass `share=True` to `launch()`.

In [None]:
!pip install gradio

In [None]:
import gradio as gr
def flip_text(x):
    return x[::-1]


def flip_image(x):
    return np.fliplr(x)

# demo = gr.Interface(
#     fn=interface_ask_question,
#     # inputs=["text", "checkbox", gr.Slider(0, 100)],
#     inputs=["text"],
#     # outputs=["text", "number"],
#     outputs = ['text', 'html']
#     # , 'dataframe'
# )

with gr.Blocks() as demo:
    gr.Markdown("Ask GuidePost chatbot.")
    with gr.Tab("Question / Answer"):
      with gr.Row():
        text_input = gr.Textbox()
        answer_text_output = gr.Textbox()
      # interface_ask_question returns (answer, links_html), so it needs two output components
      links_output = gr.HTML()

      ask_button = gr.Button("Ask")
    with gr.Tab("Data"):
        with gr.Row():
            image_input = gr.Image()
            image_output = gr.Image()

        image_button = gr.Button("Flip")

    with gr.Accordion("Open for More!"):
        gr.Markdown("Look at me...")

    ask_button.click(interface_ask_question, inputs=text_input, outputs=[answer_text_output, links_output])
    image_button.click(flip_image, inputs=image_input, outputs=image_output)

demo.launch()


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.
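`interface_ask_question` builds its HTML link list by string concatenation. A small sketch of the same idea that escapes characters with special meaning in HTML follows. The helper `links_to_html` is hypothetical, and the sample URL path and query string are made up for illustration.

```python
from html import escape

def links_to_html(urls):
    """Render each URL as an anchor tag, escaping characters that are special in HTML."""
    return "".join(
        f'<a href="{escape(url, quote=True)}" target="_blank">{escape(url)}</a><br>'
        for url in urls
    )

# Made-up path and query string on the Guidepost domain.
html_links = links_to_html(["https://guidepost.washpost.com/pf/a?x=1&y=2"])
```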





# Pulling Articles directly from HR website and create Embeddings

THIS WILL NOT WORK FOR YOU, nor do you need to run it. I have left it in so you can see how it was done.




The following code will ingest json files pulled from `advanced-search`. They should be uploaded into directory `/data/`. The code will process all files in that directory and output that formatted content into one csv file `formatted_articles.csv`.


**Instructions to ingest content from outside source:**

* Gather content in standardized JSON files.
* Create directory `/data/` and upload all of your JSON files into it.

Below is a small example of the JSON structure I used to import my data:

```
{
  "content_elements": [
    {
      "content_elements": [
        {
          "content": "Learn more about how to use Zoom:",
          "type": "text"
        }
      ],
      "description": {
        "basic": ""
      },
      "headlines": {
        "basic": "Zoom Instructions"
      },
      "slug": "zoom-instructions",
      "taxonomy": {
        "sections": [
          {
            "path": "/okta_and_technology/help_desk_faq"
          }
        ]
      }
    }
  ]
}
```
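To make the structure concrete, the sketch below walks one such document with only the standard library and produces `(title, url, text)` tuples. It is a simplified illustration that handles only `text` parts. `extract_articles` is a hypothetical helper, and the URL prefix follows the Guidepost URLs shown in the data.

```python
import json

BASE_URL = "https://guidepost.washpost.com/pf"

def extract_articles(document):
    """Turn one parsed JSON document into (title, url, text) tuples."""
    articles = []
    for article in document.get("content_elements", []):
        sections = article.get("taxonomy", {}).get("sections", [])
        if not sections or "slug" not in article:
            continue  # a URL cannot be built without a section path and slug
        text = " ".join(
            part["content"]
            for part in article.get("content_elements", [])
            if part.get("type") == "text"
        )
        if not text.strip():
            continue  # skip articles with no usable text
        url = BASE_URL + sections[0]["path"] + "/" + article["slug"]
        articles.append((article["headlines"]["basic"], url, text))
    return articles

sample = json.loads("""
{
  "content_elements": [
    {
      "content_elements": [
        {"content": "Learn more about how to use Zoom:", "type": "text"}
      ],
      "description": {"basic": ""},
      "headlines": {"basic": "Zoom Instructions"},
      "slug": "zoom-instructions",
      "taxonomy": {"sections": [{"path": "/okta_and_technology/help_desk_faq"}]}
    }
  ]
}
""")
articles = extract_articles(sample)
```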

Running the code below formats all of the data from the JSON files into one CSV file called `formatted_articles.csv`.

In [None]:
from google.colab import files
import pandas as pd
import os

the_articles = []
for filename in os.listdir('./data'):
  print(filename)
  if filename == '.ipynb_checkpoints':
    continue

  theData = pd.read_json('./data/'+ filename)
  for article in theData['content_elements']:
    # print(article['headlines']['basic'])

    if len(article['content_elements']) == 1:
      if article['content_elements'][0]['type'] == "interstitial_link":
        # print( "Interstitial")
        continue

    the_title = article['headlines']['basic']
    try:
      the_url = 'https://guidepost.washpost.com/pf'+ article['taxonomy']['sections'][0]['path']+'/'+ article['slug']
      # if color['type'] == 'primary':
      #   primary_colors.append(color)
    except (KeyError, IndexError):
      # print("ERROR PARSING URL:")
      # print( KeyError )
      continue

    # the_url = 'https://guidepost.washpost.com/pf'+ article['taxonomy']['sections'][0]['path']+'/'+ article['slug']

    # print('-------------' + the_title, the_url, article_content)

    #--------------------------add the content
    article_content = ''
    for parts in article['content_elements']:
      if parts['type'] == 'text':
        article_content = article_content + ' ' + parts['content']
      elif parts['type'] == 'interstitial_link':
        article_content = article_content + ' ' + parts['content'] + ': '+ parts['url']
      elif parts['type'] == 'list':
        # print("list:---")
        for listitems in parts['items']:
          if listitems['type'] == 'text':
            # print(listitems['content'])
            article_content = article_content + ' ' + listitems['content']
      # else:
      #   print("TYPE UNKNOWN: ", parts['type'])

    # CHECK MAKE SURE CONTENT EXISTS
    if article_content.strip() == '':
      continue

    new_array = [ the_title, the_url, article_content]

    the_articles.append( new_array )


# the_articles.append( ["title", "url", "context"] )



# print(theData)


In [None]:
DataFrame = pd.DataFrame(the_articles)
DataFrame



In [None]:
DataFrame.to_csv('formatted_articles.csv',sep=',',index=None)



---


**Get Embeddings from formatted Data**

We now have our formatted CSV file, `formatted_articles.csv`. From it we can create embeddings and save them into a new file called `question_embeddings.csv`.

Use the following to get the data into the project:

In [None]:
import pandas as pd
theData = pd.read_csv('formatted_articles.csv', names=('title', 'url', 'context'), header=0)  # header=0 replaces the header row that to_csv wrote
print(theData.columns)
# theData['context']

Build the embedding vectors and save them to `question_embeddings.csv`.

**WARNING THE NEXT CODE TAKES A LONG TIME!!!**

In [None]:
from openai.embeddings_utils import get_embedding

#get_embedding(episode.iloc[0]['context'], engine='text-embedding-ada-002')

theData['embedding'] = theData['context'].apply(lambda row: get_embedding(row, engine='text-embedding-ada-002'))
theData.to_csv('question_embeddings.csv')
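The cell above requests one embedding per article, which is part of why it is slow. It also relies on the removed `openai.embeddings_utils` module. Below is a hedged sketch of a batched alternative against the openai 1.0+ client interface, where the embeddings endpoint accepts a list of inputs. `embed_in_batches` is a hypothetical helper and the batch size of 100 is an arbitrary assumption.

```python
def embed_in_batches(client, texts, model="text-embedding-ada-002", batch_size=100):
    """Embed texts in batches, one API request per batch, preserving input order."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        # Sort by index so each vector lines up with its input text.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return vectors

# Usage (assumes openai>=1.0 is installed and an API key is configured):
#   from openai import OpenAI
#   theData['embedding'] = embed_in_batches(OpenAI(), theData['context'].tolist())
```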

# Other code for reference or testing

In [None]:
# @title Interface test
import gradio as gr

# from openai.embeddings_utils import cosine_similarity



  # '<a href="'+ urls[0] +'" target="_blank">'+urls[0]+'</a>'

demo = gr.Interface(
    fn=interface_ask_question,
    # inputs=["text", "checkbox", gr.Slider(0, 100)],
    inputs=["text"],
    # outputs=["text", "number"],
    outputs = ['text', 'html']
    # , 'dataframe'
)
# demo.launch(share=True)
demo.launch()

gr.Dataframe()


# gr.Interface(fn=scraper_obj.scrape, inputs='text', outputs=gr.Dataframe(headers=['title', 'author', 'text']), allow_flagging='never').launch()


# demo = gr.Interface(
#     filter_records,
#     [
#         gr.Dataframe(
#             headers=["name", "age", "gender"],
#             datatype=["str", "number", "str"],
#             row_count=5,
#             col_count=(3, "fixed"),
#         ),
#         gr.Dropdown(["M", "F", "O"]),
#     ],
#     "dataframe",
#     description="Enter gender as 'M', 'F', or 'O' for other.",


# def greet(name, is_morning, temperature):
#     salutation = "Good morning" if is_morning else "Good evening"
#     greeting = f"{salutation} {name}. It is {temperature} degrees today"
#     celsius = (temperature - 32) * 5 / 9
#     return greeting, round(celsius, 2)

# demo = gr.Interface(
#     fn=greet,
#     inputs=["text", "checkbox", gr.Slider(0, 100)],
#     outputs=["text", "number"],
# )
# demo.launch()


In [None]:
# @title ONLY SHOW TOP 5 STORIES

import gradio as gr

from openai.embeddings_utils import cosine_similarity

# def start_ask_question(question):
#   # topAnswers = getNearestEmbeddings(question)
#   question = "My supervisor gave me an unfair review. What should I do?"
#   topAnswers = getNearestEmbeddings(question)
#   return pd.DataFrame([1,2,3])
#   # , pd.DataFrame(topAnswers)

  # '<a href="'+ urls[0] +'" target="_blank">'+urls[0]+'</a>'

# outputs = [gr.Dataframe(row_count = (2, "dynamic"), col_count=(1, "fixed"), label="Predictions", headers=["Failures"])]
  # papers = gr.DataFrame(
  #     headers = ["title", "abstract"],
  #     datatype = ["str", "str"],
  #     value = paper_data,
  #     visible = True
  # )

demo = gr.Interface(
    fn=getNearestEmbeddings,
    # inputs=["text", "checkbox", gr.Slider(0, 100)],
    inputs=["text"],
    # outputs=["text", "number"],
    outputs = [
        gr.DataFrame(
            headers = ["title", "url", "similarities"],
            datatype = ["str", "str", "str"],
            value = topAnswers,
            visible = True
        )
    ]
    # , 'dataframe'
)
demo.launch(share=True)

gr.Dataframe()


# gr.Interface(fn=scraper_obj.scrape, inputs='text', outputs=gr.Dataframe(headers=['title', 'author', 'text']), allow_flagging='never').launch()


# demo = gr.Interface(
#     filter_records,
#     [
#         gr.Dataframe(
#             headers=["name", "age", "gender"],
#             datatype=["str", "number", "str"],
#             row_count=5,
#             col_count=(3, "fixed"),
#         ),
#         gr.Dropdown(["M", "F", "O"]),
#     ],
#     "dataframe",
#     description="Enter gender as 'M', 'F', or 'O' for other.",


# def greet(name, is_morning, temperature):
#     salutation = "Good morning" if is_morning else "Good evening"
#     greeting = f"{salutation} {name}. It is {temperature} degrees today"
#     celsius = (temperature - 32) * 5 / 9
#     return greeting, round(celsius, 2)

# demo = gr.Interface(
#     fn=greet,
#     inputs=["text", "checkbox", gr.Slider(0, 100)],
#     outputs=["text", "number"],
# )
# demo.launch()


In [None]:
# @title Alt interface
import gradio as gr

from openai.embeddings_utils import cosine_similarity

def start_ask_question(question):
  # question = "How many calories are in the average hamburger?"
  question_vector = get_embedding(question, engine='text-embedding-ada-002')
  theData["similarities"] = theData['embedding'].apply(lambda x: cosine_similarity(x, question_vector))
  linkAnswers = theData.sort_values("similarities", ascending=False).head(4)
  # linkAnswers

  # return question

  tempData = processTopAnswers(linkAnswers)
  context = tempData[0]
  urls = tempData[1]

  # context = []
  # urls = []
  # for i, row in linkAnswers.iterrows():
  #   context.append("link: "+ row['url'] + ' - '+ row['context'])
    # urls.append(row['url'])

  # context

  # text = "\n".join(context)
  # text

  # context = text[:17800]
  # print( urls[0])

  prompt = createPrompt(context, question)
  return askQuestion(prompt)

#   prompt = f"""Answer the following question using only the context below. Answer in the style of a friendly chatbot assistant for The Washington Post's HR website called Guidepost, offering links when available and appropriate. DO not provide links that are not on https://guidepost.washpost.com/. Only give information related to the context. If you don't know the answer for certain, say I don't know.

# Context:
# {context}

# Q: {question}
# A:"""

#   return openai.Completion.create(
#       prompt=prompt,
#       temperature=0,
#       max_tokens=500,
#       top_p=1,
#       frequency_penalty=0,
#       presence_penalty=0,
#       model=COMPLETIONS_MODEL
#   )["choices"][0]["text"].strip(" \n")

demo = gr.Interface(
    fn=start_ask_question,
    # inputs=["text", "checkbox", gr.Slider(0, 100)],
    inputs=["text"],
    # outputs=["text", "number"],
    outputs = ['html']
)
# demo.launch(share=True)

# with gr.Blocks() as demo:
#     chatbot = gr.Chatbot()
#     state = gr.State([])

#     with gr.Row():
#         txt = gr.Textbox(show_label=False, placeholder="Enter question and press enter").style(container=False)

#     txt.submit(ask_question, [txt, state], [chatbot, state])

# if __name__ == "__main__":
#     demo.launch()
