# Custom Chatbot Project

In [1]:
!pip3 install -r requirements.txt



In [267]:
from comet_ml import Experiment
import openai
import numpy as np
from pathlib import Path
from datasets import Dataset, load_dataset
import comet_llm
import os
import pandas as pd
from comet_llm import Span, end_chain, start_chain
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from typing import Union
from dotenv import load_dotenv

load_dotenv('my_config.env')

# API configuration
openai.api_key = os.getenv("OPENAI_API_KEY")
COMET_API_KEY = os.getenv("COMET_API_KEY")


# Initialize the experiment
experiment = Experiment(
    api_key=COMET_API_KEY,
    project_name="build-custom-chatbot",
    workspace="polarbeargo",
    log_code=True,
)


COMET INFO: Experiment is live on comet.com https://www.comet.com/polarbeargo/build-custom-chatbot/305292f043f84e3d8f268ac1901e4ae8



In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task
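`nyc_food_scrap_drop_off_sites.csv` lists food scrap drop-off sites in New York City, including their locations and hours of operation. The data was collected at the beginning of 2023, which is later than the September 2021 knowledge cut-off that `gpt-3.5-turbo` reports in its own answers in this notebook, so the model cannot answer questions about these sites reliably without the dataset as context.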

## Data Wrangling

In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [236]:

file_path = 'data/nyc_food_scrap_drop_off_sites.csv'
with open(file_path, 'r') as file:
    next(file)
    lines = [line.strip() for line in file]

df = pd.DataFrame(lines, columns=["text"])
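Reading the file line by line keeps the raw CSV quoting (for example the doubled quotes around addresses) inside each `text` value. As a possible alternative, the sketch below parses the records with the standard `csv` module and builds one labelled string per row. The column names and the two sample rows are assumptions made for illustration; the real file has its own header.

```python
import csv
import io

# Hypothetical sample; the header names are assumptions, not the real file's columns.
SAMPLE_CSV = '''id,borough,site_name,address
0,Queens,Malcolm X FSDO,"111-26 Northern Blvd, Flushing, NY 11368"
1,Brooklyn,Old Stone House Brooklyn,"336 3rd St, Brooklyn, NY 11215"
'''

def rows_to_text(csv_file, skip_columns=("id",)):
    """Turn each CSV record into one 'column: value; ...' string."""
    reader = csv.DictReader(csv_file)
    texts = []
    for record in reader:
        parts = [
            f"{name}: {value}"
            for name, value in record.items()
            if name not in skip_columns and value  # drop the index and empty fields
        ]
        texts.append("; ".join(parts))
    return texts

texts = rows_to_text(io.StringIO(SAMPLE_CSV))
print(texts[0])
```

The resulting list could be passed to `pd.DataFrame(texts, columns=["text"])` in place of the raw lines.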

In [237]:
df.head()

Unnamed: 0,text
0,"0,Staten Island,Grasmere-Arrochar-South Beach-..."
1,"1,Manhattan,Inwood,SE Corner of Broadway & Aca..."
2,"2,Brooklyn,Park Slope,Old Stone House Brooklyn..."
3,"3,Manhattan,East Harlem (North),SE Corner of P..."
4,"4,Queens,Corona,Malcolm X FSDO,""111-26 Norther..."


In [238]:
df.to_csv('data/data_wrangling.csv')

In [239]:
df['text'] = df['text'].str.replace(r'^\d+,', '', regex=True)
df.head(20).to_csv("data/data_wrangling_sample.csv")
df.head(20)


Unnamed: 0,text
0,"Staten Island,Grasmere-Arrochar-South Beach-Do..."
1,"Manhattan,Inwood,SE Corner of Broadway & Acade..."
2,"Brooklyn,Park Slope,Old Stone House Brooklyn,""..."
3,"Manhattan,East Harlem (North),SE Corner of Ple..."
4,"Queens,Corona,Malcolm X FSDO,""111-26 Northern ..."
5,"Queens,Astoria (North)-Ditmars-Steinway,Astori..."
6,"Bronx,Norwood,SE Corner of Kings College Place..."
7,"Brooklyn,Bedford-Stuyvesant (East),NW Corner o..."
8,"Queens,Astoria (Central),Astoria Pug: Broadway..."
9,"Bronx,Mount Eden-Claremont (West),SE Corner of..."


In [240]:
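MODEL_NAME = 'paraphrase-MiniLM-L6-v2'
df = pd.read_csv('data/data_wrangling_sample.csv')

# Load the model once; constructing it inside the function would reload it on every call.
model = SentenceTransformer(MODEL_NAME)

def generate_embeddings(input_data: Union[str, list[str]]) -> np.ndarray:
    """Encode a single string (one vector) or a list of strings (one vector per item)."""
    return model.encode(input_data)

df['embedding'] = df['text'].apply(generate_embeddings)
# Note: to_csv stores each vector as its string representation.
df.to_csv('embeddings.csv', index=False)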

In [294]:
df = pd.read_csv("embeddings.csv")
df = df.drop("Unnamed: 0", axis=1)
df.head()

Unnamed: 0,text,embedding
0,"Staten Island,Grasmere-Arrochar-South Beach-Do...",[-3.98153216e-02 -4.14079800e-02 5.66049144e-...
1,"Manhattan,Inwood,SE Corner of Broadway & Acade...",[ 0.13020103 -0.13104136 -0.2690476 -0.235636...
2,"Brooklyn,Park Slope,Old Stone House Brooklyn,""...",[-0.00706969 0.02325399 -0.0220314 0.036282...
3,"Manhattan,East Harlem (North),SE Corner of Ple...",[ 1.58163443e-01 -9.84401628e-02 -2.88609684e-...
4,"Queens,Corona,Malcolm X FSDO,""111-26 Northern ...",[-0.22042143 -0.17215516 -0.04905648 -0.030325...
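Because the embeddings were written to CSV, the `embedding` column comes back as strings such as `[ 0.13020103 -0.13104136 ...]` rather than numeric vectors, so it has to be parsed before any similarity computation. A minimal sketch of that parsing step, using only the standard library and made-up values (the helper name `parse_embedding` is an assumption, not part of the notebook):

```python
def parse_embedding(cell: str) -> list[float]:
    """Parse the string form of a NumPy vector back into a list of floats."""
    # Strip the surrounding brackets, then split on any whitespace (including newlines).
    return [float(token) for token in cell.strip().strip("[]").split()]

# Made-up example in the same layout NumPy uses when printing a vector.
cell = "[ 0.125 -0.5\n  2.5e-01]"
vector = parse_embedding(cell)
print(vector)
```

Saving the vectors in a binary or JSON format instead of CSV would avoid the round trip through strings altogether.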


## Custom Query Completion

In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model.
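The `get_completion` helper in this section calls `openai.ChatCompletion.create`, which is only available in `openai` releases before 1.0. The sketch below shows how the same request might look with the client object introduced in `openai` 1.0 and later. The function name `get_completion_v1` is an assumption, and the client is passed in as a parameter so the function itself does not depend on how the client is created.

```python
def get_completion_v1(client, messages, model="gpt-3.5-turbo", max_tokens=100):
    """Chat completion request written against the openai>=1.0 client interface."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
    )
    # In the 1.x client the message is an object, so use attribute access
    # instead of message["content"].
    return response.choices[0].message.content

# Usage (assumes openai>=1.0 is installed and OPENAI_API_KEY is set):
# from openai import OpenAI
# client = OpenAI()
# print(get_completion_v1(client, [{"role": "user", "content": "Hello"}]))
```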

In [296]:
def get_completion(messages):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        max_tokens=100,
    )
    return response.choices[0].message["content"]

def simple_prompt(question):
    # Baseline: send the question with no additional context.
    return [
        {"role": "user", "content": question}
    ]

def custom_prompt(question, df):
    # Prepend the rows retrieved by custom_query as context in the system message.
    return [
        {"role": "system", "content": """You are a helpful assistant that provides information about food scrap drop-off sites. Answer the question based on the context below. Context:
                {}
            """.format('\n\n'.join(custom_query(question, df)))},
        {"role": "user", "content": question}
    ]

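def custom_query(question, df):
    # Wrap the question in a list so the result is a 2-D array of shape (1, dim).
    question_embedding = generate_embeddings([question])
    df_copy = df.copy()
    # cosine_similarity returns a 1x1 array here; keep the scalar so the column sorts numerically.
    df_copy["similarity"] = df_copy["embedding"].apply(lambda emb: cosine_similarity([emb], question_embedding)[0][0])
    # Sort in descending order so the five most similar rows are returned.
    df_copy.sort_values("similarity", ascending=False, inplace=True)
    return df_copy.iloc[:5]['text'].tolist()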
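The retrieval step ranks every row by the cosine similarity between its embedding and the question embedding, then keeps the highest-scoring rows as context. The following sketch illustrates that ranking with toy two-dimensional vectors and the standard library only; the names `cosine` and `top_k` and the sample rows are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=2):
    """rows is a list of (text, vector) pairs; return the k most similar texts."""
    scored = [(cosine(query_vec, vec), text) for text, vec in rows]
    # Highest similarity first: a larger cosine means a closer match.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

rows = [
    ("Brooklyn site", [1.0, 0.0]),
    ("Queens site", [0.0, 1.0]),
    ("Bronx site", [0.7, 0.7]),
]
print(top_k([1.0, 0.1], rows))
```

Sorting in the opposite direction would return the least relevant rows, which typically shows up as answers that claim the context contains nothing about the borough being asked for.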
    


## Custom Performance Demonstration

In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

In [297]:
question_1 = "What are the food scrap drop-off sites in Brooklyn?"
df['embedding'] = df['embedding'].apply(lambda x: [float(val) for val in x.strip('[]').split()])
custom_response = get_completion(custom_prompt(question_1, df))
basic_response = get_completion(simple_prompt(question_1))
print('Answer without context: \n', basic_response)
print('Answer with context: \n', custom_response)
experiment.log_text("Custom Response", custom_response)
experiment.log_text("Basic Response", basic_response)



Answer without context: 
 As of my last update, here are some of the food scrap drop-off sites in Brooklyn:

1. Prospect Park Greenmarket: Located at Breeze Hill in Prospect Park, this market accepts food scraps for composting every Saturday from 8am-3pm.

2. Grand Army Plaza Greenmarket: Located at the northwest entrance of Prospect Park, this market also accepts food scraps for composting every Saturday from 8am-4pm.

3. Bay Ridge Greenmarket: Located at 940
Answer with context: 
 I'm sorry, but based on the information provided, there are no specific food scrap drop-off sites listed for Brooklyn in the given context. The locations mentioned are in Queens, Staten Island, and the Bronx. If you are looking for food scrap drop-off sites in Brooklyn, I recommend checking the NYC Department of Sanitation website or contacting local community gardens or farmers markets in Brooklyn for more information on drop-off locations in that borough.


{'web': 'https://www.comet.com/api/asset/download?assetId=7cc142f455054935a5e9543201f562ec&experimentKey=305292f043f84e3d8f268ac1901e4ae8',
 'api': 'https://www.comet.com/api/rest/v2/experiment/asset/get-asset?assetId=7cc142f455054935a5e9543201f562ec&experimentKey=305292f043f84e3d8f268ac1901e4ae8',
 'assetId': '7cc142f455054935a5e9543201f562ec'}

In [298]:
question_2 = "What are the food scrap drop-off sites in Manhattan?"
custom_response = get_completion(custom_prompt(question_2, df))
basic_response = get_completion(simple_prompt(question_2))
print('Answer without context: \n', basic_response)
print('Answer with context: \n', custom_response)
experiment.log_text("Custom Response", custom_response)
experiment.log_text("Basic Response", basic_response)



Answer without context: 
 As of my knowledge cut-off date in September 2021, some of the food scrap drop-off sites in Manhattan include:

1. Union Square Greenmarket (Union Square Park North Plaza)
2. Tompkins Square Greenmarket (Avenue A and East 7th Street)
3. Inwood Greenmarket (Isham Street between Seaman Avenue and Cooper Street)
4. Tribeca Greenmarket (Greenwich Street at Chambers Street)
5. Dag Hammarskjold Plaza Greenmarket
Answer with context: 
 There is a food scrap drop-off site in Manhattan located at the SE Corner of Eastburn Avenue & East 174th Street. This site is managed by the Department of Sanitation and is available year-round, 24/7. For more information, you can visit www.nyc.gov/smartcomposting. This location accepts all food scraps, including meat and dairy products.


{'web': 'https://www.comet.com/api/asset/download?assetId=50fb4a99b06c457c83b94fa6e03c713c&experimentKey=305292f043f84e3d8f268ac1901e4ae8',
 'api': 'https://www.comet.com/api/rest/v2/experiment/asset/get-asset?assetId=50fb4a99b06c457c83b94fa6e03c713c&experimentKey=305292f043f84e3d8f268ac1901e4ae8',
 'assetId': '50fb4a99b06c457c83b94fa6e03c713c'}

The cells below ask the model to reason step by step (chain of thought) and return its full response, then make a second model call that extracts only the final answer to the user's question. Comet's prompt-chain logging records both the chain-of-thought output and the final response for each question.
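When the full response reliably follows a fixed template that ends with a `Response to the user` line, the final answer could also be pulled out with plain string handling instead of a second model call. This is an alternative to the approach used in this notebook, not a description of it; the helper name and the sample response below are assumptions.

```python
MARKER = "Response to the user"

def extract_final_response(full_response: str) -> str:
    """Return the text after the last MARKER, or the whole response if it is absent."""
    _, found, tail = full_response.rpartition(MARKER)
    if not found:
        return full_response.strip()
    # Skip anything between the marker and its colon,
    # e.g. " (only output the final response):".
    _, colon, after_colon = tail.partition(":")
    return (after_colon if colon else tail).strip()

# Hypothetical full response in the step-by-step template.
full = (
    "Step 1: The question is about drop-off sites.\n"
    "Step 2: A matching site is on the list.\n"
    "Response to the user: The Park Slope site is open 24/7."
)
print(extract_final_response(full))
```

String parsing is cheaper and deterministic, but it breaks as soon as the model departs from the template, which is the case the second model call handles more gracefully.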

In [300]:
prompt = """
Property 1 : Food Scrap Drop-off Site at South Beach

Neighborhood : Grasmere-Arrochar-South Beach-Dongan Hills
Location : 21 Robin Road, Staten Island NY
Hosted By : Snug Harbor Youth
Open Months : Year Round
Operation Hours : Friday (Start Time: 1:30 PM - End Time: 4:30 PM)
Website : snug-harbor.org(opens in a new tab)
Coordinates : Latitude 40.595579, Longitude -74.062991
Notes : This site accepts all food scraps. Please compost responsibly.
Property 2 : Food Scrap Drop-off Site at Inwood

Neighborhood : Inwood
Location : SE Corner of Broadway & Academy Street
Hosted By : Department of Sanitation
Open Months : Year Round
Operation Hours : 24/7
Website : www.nyc.gov/smartcomposting(opens in a new tab)
Coordinates : Not specified
Notes : Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin!
Property 3 : Food Scrap Drop-off Site at Old Stone House Brooklyn

Neighborhood : Park Slope
Location : 336 3rd St, Brooklyn, NY 11215
Hosted By : Old Stone House Brooklyn
Open Months : Year Round
Operation Hours : 24/7
Website : Not specified
Coordinates : Latitude 40.6727118, Longitude -73.984731
Notes : This site accepts all food scraps. Please compost responsibly.
Property 4 : Food Scrap Drop-off Site at East Harlem

Neighborhood : East Harlem (North)
Location : SE Corner of Pleasant Avenue & E 116 Street
Hosted By : Department of Sanitation
Open Months : Year Round
Operation Hours : 24/7
Website : www.nyc.gov/smartcomposting(opens in a new tab)
Coordinates : Not specified
Notes : Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin!
Property 5 : Food Scrap Drop-off Site at Malcolm X FSDO

Neighborhood : Corona
Location : 111-26 Northern Blvd, Flushing, NY 11368
Hosted By : NYC Compost Project Hosted by Big Reuse
Open Months : Year Round
Operation Hours : Tuesdays (Start Time: 12:00 PM - End Time: 2:00 PM)
Website : Not specified
Coordinates : Latitude 40.7496855, Longitude -73.8630721
Notes : This site accepts all food scraps. Please compost responsibly.
Property 6 : Food Scrap Drop-off Site at Astoria Pug

Neighborhood : Astoria (North)-Ditmars-Steinway
Location : Ditmars Boulevard and 41st Street
Hosted By : Astoria Pug
Open Months : Year Round
Operation Hours : Mondays (Start Time: 8:00 AM - End Time: 2:00 PM)
Website : Instagram(opens in a new tab)
Coordinates : Latitude 40.7724122, Longitude -73.9053388
Notes : Not accepted: meat, bones, or dairy. Please compost responsibly.
Property 7 : Food Scrap Drop-off Site at Norwood

Neighborhood : Norwood
Location : SE Corner of Kings College Place & Gun Hill Rd.
Hosted By : Department of Sanitation
Open Months : Year Round
Operation Hours : 24/7
Website : www.nyc.gov/smartcomposting(opens in a new tab)
Coordinates : Not specified
Notes : Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin!
Property 8 : Food Scrap Drop-off Site at Bedford-Stuyvesant (East)

Neighborhood : Bedford-Stuyvesant (East)
Location : NW Corner of Malcolm X Boulevard & Bainbridge Street
Hosted By : Department of Sanitation
Open Months : Year Round
Operation Hours : 24/7
Website : www.nyc.gov/smartcomposting(opens in a new tab)
Coordinates : Not specified
Notes : Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin!
"""

In [301]:
questions = [
    "What is the food scrap drop-off site in Brooklyn?",
    "Where can I drop off food scraps in Manhattan?",
    "Are there any food scrap drop-off sites in Queens?",
    "What are the hours of operation for food scrap drop-off sites in the Bronx?",
    "Can I drop off food scraps in Staten Island?",
    "Is there a food scrap drop-off site near me?",
]

In [303]:

chatbot_responses = []

def get_only_response(response):
  messages = [
    {
      "role": "system",
      "content": "Your task is to extract only the response to the user in the following full chatbot response: {response}".format(response=response)
    }
  ]

  return get_completion(messages)

for question in questions:
  messages=[
    {
      "role": "system",
      # The site list is wrapped in +++++ on both sides so the delimiter described in the prompt is actually present.
      "content": (
        "Your task is to answer questions factually about NYC food scrap drop off sites, provided below and delimited by +++++. The user request is provided here: {request}\n\n"
        "Step 1: The first step is to check if the user is asking a question related to any type of food scrap drop off site (even if that site is not on the list). If the question is about any type of food scrap drop off site, we move on to Step 2 and ignore the rest of Step 1. If the question is not about food scrap drop off sites, then you send a response: \"Sorry! I cannot help with that. Please let me know if you have a question about our food scrap drop off sites.\"\n\n"
        "Step 2: In this step, you check that the user question is relevant to any of the sites on the list. You should check that the requested site exists in the list. If it doesn't exist then send a kind response to the user that the site doesn't exist in the existing list and then include a list of available but similar food scrap drop off sites without any other details (e.g., location). The food scrap drop off sites available are provided below and delimited by +++++: +++++{Location}+++++\n\n"
        "Step 3: If the site exists in the list and the user is requesting specific information, provide that relevant information to the user using the list. Make sure to use a friendly tone and keep the response concise.\n\n"
        "Perform the following reasoning steps to send a response to the user:\nStep 1: <Step 1 reasoning>\nStep 2: <Step 2 reasoning>\nResponse to the user (only output the final response): <response to user>"
      ).format(request=question, Location=prompt)
    }
  ]

  response = get_completion(messages)
  chatbot_responses.append(response)
 
  start_chain(
    inputs={"question": question},
    api_key=COMET_API_KEY,
  )

  with Span(
    category="reasoning",
    name="chain-of-thought",
    inputs={"user_question": question},
  ) as span:
    span.set_outputs(outputs={"full_response": response})

  with Span(
    category="response-extraction",
    inputs={
        "user_question": question,
        "full_response": response,
    },
  ) as span:
    final_response = get_only_response(response)
    span.set_outputs(outputs={"final_response": final_response})

  end_chain(outputs={"final_response": final_response})
  print(final_response)

Please let me know if you have a question about our food scrap drop off sites.
Unfortunately, there are no food scrap drop-off sites in Manhattan in the provided list. However, here are some available food scrap drop-off sites that might be of interest:
1. Food Scrap Drop-off Site at Inwood
2. Food Scrap Drop-off Site at East Harlem
Yes, there is a food scrap drop-off site in Queens at Inwood, located at the SE Corner of Broadway & Academy Street. It is hosted by the Department of Sanitation and is open 24/7.
The Bronx is not currently listed as a food scrap drop-off site.
Please let me know if you have a question about our food scrap drop off sites.
Yes, there are several food scrap drop-off sites available. Here is a list of available sites:
1. Food Scrap Drop-off Site


In [304]:
experiment.end()

COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     name                  : live_resin_9268
COMET INFO:     url                   : https://www.comet.com/polarbeargo/build-custom-chatbot/305292f043f84e3d8f268ac1901e4ae8
COMET INFO:   Uploads:
COMET INFO:     environment details      : 1
COMET INFO:     filename                 : 1
COMET INFO:     git metadata             : 1
COMET INFO:     git-patch (uncompressed) : 1 (38.95 KB)
COMET INFO:     installed packages       : 1
COMET INFO:     notebook