# Custom Chatbot Project

In this project, I build a custom chatbot that answers questions about New York City food scrap drop-off sites.

To retrieve the relevant information, I use the data in the "nyc_food_scrap_drop_off_sites.csv" file. The file "contains locations, hours, and other information about food scrap drop-off sites in New York City." This dataset is appropriate for the task because each row describes a single site (borough, neighborhood, host, open months, hours, website, and notes), which are exactly the details the chatbot is asked about.

## Data Wrangling

In [53]:
import pandas as pd
import openai
import numpy as np
import tiktoken
openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = "YOUR API KEY"

In [33]:
df = pd.read_csv("./data/nyc_food_scrap_drop_off_sites.csv")

In [34]:
# add the "text" column to the dataframe
df["text"] = None

for i, row in df.iterrows():
    df.at[i, "text"] = f"borough: {row['borough']}\n"\
    f"Neighborhood Tabulation Area Name: {row['ntaname']}\n"\
    f"food scrap drop off site: {row['food_scrap_drop_off_site']}\n"\
    f"hosted by: {row['hosted_by']}\n"\
    f"open months: {row['open_months']}\n"\
    f"operation day hours: {row['operation_day_hours']}\n"\
    f"website: {row['website']}\n"\
    f"notes: {row['notes']}\n"

In [35]:
print(df.at[1, "text"])

borough: Manhattan
Neighborhood Tabulation Area Name: Inwood
food scrap drop off site: SE Corner of Broadway & Academy Street
hosted by: Department of Sanitation
open months: Year Round
operation day hours: 24/7
website: www.nyc.gov/smartcomposting
notes: Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin!
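
The row-to-text step can also be written as a small function, which makes the template easier to check on its own. The sketch below works on a plain dictionary rather than the real dataframe; `row_to_text` and `FIELD_LABELS` are hypothetical names, and only three of the eight fields are shown.

```python
# Hypothetical mapping from column names to the labels used in the "text" field.
FIELD_LABELS = {
    "borough": "borough",
    "ntaname": "Neighborhood Tabulation Area Name",
    "food_scrap_drop_off_site": "food scrap drop off site",
}

def row_to_text(row):
    """Render one record as 'label: value' lines, one line per field."""
    return "".join(
        f"{label}: {row[column]}\n" for column, label in FIELD_LABELS.items()
    )

example_row = {
    "borough": "Manhattan",
    "ntaname": "Inwood",
    "food_scrap_drop_off_site": "SE Corner of Broadway & Academy Street",
}
text = row_to_text(example_row)
print(text)
```

With pandas, the same function could be applied row by row via `df.apply(row_to_text, axis=1)`.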



## Custom Query Completion

In [36]:
drop_off_site_question = "In Manhattan borough and East Village Neighborhood Tabulation Area Name, where's the food scrap drop off site?"
host_question = "Who is hosting the food scrap drop off site in Manhattan borough and East Village Neighborhood Tabulation Area Name?"

### Generating Embeddings


In [37]:
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
# Uncomment the following if you need to use the API to get the embeddings
# batch_size = 100
# embeddings = []
# for i in range(0, len(df), batch_size):
#     # Send text data to OpenAI model to get embeddings
#     response = openai.Embedding.create(
#         input=df.iloc[i:i+batch_size]["text"].tolist(),
#         engine=EMBEDDING_MODEL_NAME
#     )
    
#     # Add embeddings to list
#     embeddings.extend([data["embedding"] for data in response["data"]])

# # Add embeddings list to dataframe
# df["embeddings"] = embeddings
# df

Unnamed: 0.1,Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,borocd,...,:@computed_region_92fq_4b7q,:@computed_region_sbqj_enih,:@computed_region_efsh_h5xi,:@computed_region_f5dn_yrer,notes,ct2010,bbl,bin,text,embeddings
0,0,Staten Island,Grasmere-Arrochar-South Beach-Dongan Hills,South Beach,"21 Robin Road, Staten Island NY",Snug Harbor Youth,Year Round,Friday (Start Time: 1:30 PM - End Time: 4:30 PM),snug-harbor.org,502,...,14.0,76.0,10692.0,30.0,,,,,borough: Staten Island\nNeighborhood Tabulatio...,"[0.005969604942947626, -0.0260244719684124, 0...."
1,1,Manhattan,Inwood,SE Corner of Broadway & Academy Street,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,112,...,,,,,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[-0.0009904223261401057, -0.014946683309972286..."
2,2,Brooklyn,Park Slope,Old Stone House Brooklyn,"336 3rd St, Brooklyn, NY 11215",Old Stone House Brooklyn,Year Round,24/7 (Start Time: 24/7 - End Time: 24/7),,306,...,27.0,50.0,17617.0,14.0,,,,,borough: Brooklyn\nNeighborhood Tabulation Are...,"[0.005709781311452389, -0.040901798754930496, ..."
3,3,Manhattan,East Harlem (North),SE Corner of Pleasant Avenue & E 116 Street,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,,,,,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.00042385648703202605, -0.021642452105879784..."
4,4,Queens,Corona,Malcolm X FSDO,"111-26 Northern Blvd, Flushing, NY 11368",NYC Compost Project Hosted by Big Reuse,Year Round,Tuesdays (Start Time: 12:00 PM - End Time: 2:...,,404,...,21.0,68.0,14510.0,66.0,,,,,borough: Queens\nNeighborhood Tabulation Area ...,"[0.005171763710677624, -0.031190568581223488, ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
571,571,Brooklyn,Kensington,Albemarle Road and McDonald Avenue,southwest corner of McDonald Avenue and Albema...,NYC Compost Project Hosted by LES Ecology Center,Year Round,Tuesdays (Start Time: 10:00 AM - End Time: 2:...,https://www.lesecologycenter.org/programs/comp...,312,...,27.0,39.0,17620.0,2.0,"Not accepted: meat, bones, or dairy",,,,borough: Brooklyn\nNeighborhood Tabulation Are...,"[0.007184811867773533, -0.026496510952711105, ..."
572,572,Queens,Old Astoria-Hallets Point,NW Corner of 21st Street & 30th Drive,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,401,...,,,,,Download the app to access bins. Accepts all f...,,,,borough: Queens\nNeighborhood Tabulation Area ...,"[0.01071658544242382, -0.02099519968032837, -0..."
573,573,Brooklyn,Crown Heights (North),Rochester Avenue & St. Johns Place,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,308,...,,,,,Download the app to access bins. Accepts all f...,,,,borough: Brooklyn\nNeighborhood Tabulation Are...,"[0.008518521673977375, -0.025968298316001892, ..."
574,574,Brooklyn,Windsor Terrace-South Slope,*CLOSED FOR THE SEASON* East 4th Street Commun...,"173 E 4th St, Brooklyn, NY 11218",Members at East 4th Street Community Garden,April - October,Wednesdays and Saturdays (Start Time: Wednesda...,https://eastfourthstreetgarden.tumblr.com/,307,...,27.0,45.0,17620.0,9.0,"Not accepted: meat, bones, or dairy",,,,borough: Brooklyn\nNeighborhood Tabulation Are...,"[0.0012587112141773105, -0.03266900032758713, ..."
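
The commented-out loop above sends the texts to the embeddings API in batches of 100. The batching logic itself can be checked without any API call. In this sketch, `fake_embed` is a stand-in (an assumption, not the OpenAI API) that returns one short vector per input text, and `batched` is a hypothetical helper name.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def fake_embed(texts):
    """Stand-in for an embeddings call: one 2-dimensional vector per text."""
    return [[float(len(t)), 0.0] for t in texts]

texts = [f"row {i}" for i in range(250)]
embeddings = []
batch_sizes = []
for batch in batched(texts, 100):
    batch_sizes.append(len(batch))
    embeddings.extend(fake_embed(batch))

print(batch_sizes)      # [100, 100, 50]
print(len(embeddings))  # 250
```

The last batch is shorter than the others, and the number of collected embeddings matches the number of input texts, which is what allows the list to be assigned back as a dataframe column.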


In [38]:
# df.to_csv("embeddings.csv")

In [36]:
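# Use this cell to load the data without calling the API
import ast
df = pd.read_csv("embeddings.csv", index_col=0)
# literal_eval parses the stored list strings without executing arbitrary code
df["embeddings"] = df["embeddings"].apply(ast.literal_eval).apply(np.array)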
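
Embeddings saved to CSV come back as strings such as `"[0.1, -0.2]"`, which is why the loading cell has to parse the column before using it. A minimal stdlib-only sketch of that round trip, using a made-up two-row table:

```python
import ast
import csv
import io

# Write a tiny table with an "embeddings" column to an in-memory CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embeddings"])
writer.writeheader()
writer.writerow({"text": "site A", "embeddings": [0.1, -0.2]})
writer.writerow({"text": "site B", "embeddings": [0.3, 0.4]})

# Read it back: every cell is now a string.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
raw = rows[0]["embeddings"]

# literal_eval turns each string back into a list of floats
# without executing arbitrary code.
parsed = [ast.literal_eval(row["embeddings"]) for row in rows]
print(type(raw).__name__, parsed[0])
```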

### Retrieve relevant information

In [39]:
# this function comes from the lesson's material
from openai.embeddings_utils import get_embedding, distances_from_embeddings

def get_rows_sorted_by_relevance(question, df):
    """
    Function that takes in a question string and a dataframe containing
    rows of text and associated embeddings, and returns that dataframe
    sorted from most to least relevant for that question
    """
    
    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)
    
    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine"
    )
    
    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy
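
Cosine distance is 1 minus the cosine similarity of two vectors, so a smaller value means the row's embedding points in a direction closer to the question's. The sketch below reimplements that ranking in plain Python on toy 2-dimensional vectors. It illustrates the metric and is not the library's implementation; the labels and vectors are invented.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

question_vec = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Sort ascending: smallest distance (most relevant) first.
ranked = sorted(rows, key=lambda name: cosine_distance(question_vec, rows[name]))
print(ranked)
```

Vector length does not affect the result: `[2.0, 0.0]` has distance 0 from `[1.0, 0.0]` even though the two vectors differ in magnitude.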

In [40]:
get_rows_sorted_by_relevance(drop_off_site_question, df)

Unnamed: 0.1,Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,borocd,...,:@computed_region_sbqj_enih,:@computed_region_efsh_h5xi,:@computed_region_f5dn_yrer,notes,ct2010,bbl,bin,text,embeddings,distances
212,212,Manhattan,East Village,1st Ave and 1st St,First Avenue between Houston and First Street ...,NYC Compost Project Hosted by LES Ecology Center,Year Round,Mondays (Start Time: 9:00 AM - End Time: 2:00...,https://www.lesecologycenter.org/programs/comp...,103,...,5.0,11724.0,70.0,"Not accepted: meat, bones, or dairy",,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.002873592544347048, -0.034644246101379395, ...",0.085361
227,227,Manhattan,East Harlem (South),NW Corner of East 106th Street & Park Avenue,NW East 106th Street & Park Avenue,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,14.0,12426.0,7.0,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[7.570060552097857e-05, -0.03050478734076023, ...",0.087386
164,164,Manhattan,East Harlem (South),SE Corner of East 112th Street & Park Avenue,SE East 112th Street & Park Avenue,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,14.0,12426.0,7.0,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.0006138096214272082, -0.027530871331691742,...",0.087589
61,61,Manhattan,East Harlem (South),NW Corner of E 106 Street & Park Avenue,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,,,,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.0019608139991760254, -0.03150755539536476, ...",0.087756
83,83,Manhattan,East Harlem (North),NE Corner of East 120 Street & Madison Avenue,NE East 120 Street & Madison Avenue,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,16.0,13093.0,7.0,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.0020921695977449417, -0.026472488418221474,...",0.087777
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
285,285,Brooklyn,East Flatbush-Rugby,*CLOSED FEBRUARY* Wyckoff Farmhouse Museum,"5816 Clarendon Rd, Brooklyn, NY 11203",Staff at Wychoff Farmhouse Museum,Year Round (except February),Every day (Start Time: Dawn - End Time: Dusk),wyckoffmuseum.org,317,...,40.0,13827.0,61.0,"Not accepted: meat, bones, or dairy",,,,borough: Brooklyn\nNeighborhood Tabulation Are...,"[-0.021586209535598755, -0.04849237576127052, ...",0.152359
200,200,Staten Island,West New Brighton-Silver Lake-Grymes Hill,Grymes Hill Wagner College,"1 Campus Rd, Staten Island, NY 10301",Wagner College,September-June,24/7 (Start Time: 24/7 - End Time: 24/7),wagner.edu,501,...,74.0,10691.0,4.0,"Not accepted: meat, bones, or dairy",,,,borough: Staten Island\nNeighborhood Tabulatio...,"[0.022687679156661034, -0.02465062588453293, 0...",0.153001
29,29,Staten Island,Port Richmond,*CLOSED FOR THE SEASON* West Brighton,Chappell Street and Henderson Avenue,Snug Harbor Youth,Year Round,Fridays (Start Time: 4:00 PM - End Time: 5:30...,snug-harbor.org,501,...,74.0,10697.0,4.0,"Not accepted: meat, bones, or dairy",13301.0,,,borough: Staten Island\nNeighborhood Tabulatio...,"[0.008678135462105274, -0.03553430736064911, 0...",0.153249
377,377,Staten Island,Annadale-Huguenot-Prince's Bay-Woodrow,Pleasant Plains,Bloomingdale Rd and Drumgoole Road East,Snug Harbor Youth,Year Round,Mondays (Start Time: 12:00 PM - End Time: 2:0...,https://snug-harbor.org/,503,...,77.0,10696.0,15.0,"Not accepted: meat, bones, or dairy",20803.0,,,borough: Staten Island\nNeighborhood Tabulatio...,"[0.005153149366378784, -0.02921896055340767, 0...",0.153859


In [41]:
get_rows_sorted_by_relevance(host_question, df)

Unnamed: 0.1,Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,borocd,...,:@computed_region_sbqj_enih,:@computed_region_efsh_h5xi,:@computed_region_f5dn_yrer,notes,ct2010,bbl,bin,text,embeddings,distances
212,212,Manhattan,East Village,1st Ave and 1st St,First Avenue between Houston and First Street ...,NYC Compost Project Hosted by LES Ecology Center,Year Round,Mondays (Start Time: 9:00 AM - End Time: 2:00...,https://www.lesecologycenter.org/programs/comp...,103,...,5.0,11724.0,70.0,"Not accepted: meat, bones, or dairy",,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.002873592544347048, -0.034644246101379395, ...",0.089472
385,385,Manhattan,East Village,Tompkins Square Greenmarket,"E 7th St & Avenue A, New York, NY 10009",NYC Compost Project Hosted by LES Ecology Center,Year Round,Sundays (Start Time: 8:00 AM - End Time: 5:00...,https://www.lesecologycenter.org/programs/comp...,103,...,5.0,11729.0,70.0,"Not accepted: meat, bones, or dairy",,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.007218232378363609, -0.02844279259443283, -...",0.091638
227,227,Manhattan,East Harlem (South),NW Corner of East 106th Street & Park Avenue,NW East 106th Street & Park Avenue,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,14.0,12426.0,7.0,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[7.570060552097857e-05, -0.03050478734076023, ...",0.092051
61,61,Manhattan,East Harlem (South),NW Corner of E 106 Street & Park Avenue,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,,,,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.0019608139991760254, -0.03150755539536476, ...",0.092296
164,164,Manhattan,East Harlem (South),SE Corner of East 112th Street & Park Avenue,SE East 112th Street & Park Avenue,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,14.0,12426.0,7.0,Download the app to access bins. Accepts all f...,,,,borough: Manhattan\nNeighborhood Tabulation Ar...,"[0.0006138096214272082, -0.027530871331691742,...",0.093557
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,0,Staten Island,Grasmere-Arrochar-South Beach-Dongan Hills,South Beach,"21 Robin Road, Staten Island NY",Snug Harbor Youth,Year Round,Friday (Start Time: 1:30 PM - End Time: 4:30 PM),snug-harbor.org,502,...,76.0,10692.0,30.0,,,,,borough: Staten Island\nNeighborhood Tabulatio...,"[0.005969604942947626, -0.0260244719684124, 0....",0.152279
29,29,Staten Island,Port Richmond,*CLOSED FOR THE SEASON* West Brighton,Chappell Street and Henderson Avenue,Snug Harbor Youth,Year Round,Fridays (Start Time: 4:00 PM - End Time: 5:30...,snug-harbor.org,501,...,74.0,10697.0,4.0,"Not accepted: meat, bones, or dairy",13301.0,,,borough: Staten Island\nNeighborhood Tabulatio...,"[0.008678135462105274, -0.03553430736064911, 0...",0.153174
445,445,Bronx,University Heights (South)-Morris Heights,*CLOSED FOR THE SEASON* Francis Martin Library,"2150 University Avenue Bronx, NY 10453",,Year Round,Monday (Start Time: 12:00 PM - End Time: 1:00...,https://www.nypl.org/locations/francis-martin,205,...,29.0,10931.0,6.0,"Not accepted: meat, bones, or dairy",,,,borough: Bronx\nNeighborhood Tabulation Area N...,"[-0.004862444009631872, -0.034543685615062714,...",0.153215
200,200,Staten Island,West New Brighton-Silver Lake-Grymes Hill,Grymes Hill Wagner College,"1 Campus Rd, Staten Island, NY 10301",Wagner College,September-June,24/7 (Start Time: 24/7 - End Time: 24/7),wagner.edu,501,...,74.0,10691.0,4.0,"Not accepted: meat, bones, or dairy",,,,borough: Staten Island\nNeighborhood Tabulatio...,"[0.022687679156661034, -0.02465062588453293, 0...",0.154811


### Create a custom prompt

In [51]:
# this is from the course material
def create_prompt(question, df, max_token_count):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings, return a text prompt to send to a Completion model
    """
    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")
    
    # Count the number of tokens in the prompt template and question
    prompt_template = """
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 

{}

---

Question: {}
Answer:"""
    
    current_token_count = len(tokenizer.encode(prompt_template)) + \
                            len(tokenizer.encode(question))
    
    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:
        
        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count
        
        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)
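
The loop in `create_prompt` is a greedy budget fill: rows are taken in relevance order until the next one would push the running token count over the limit. The sketch below shows the same logic in isolation. `count_tokens` is a whitespace-based stand-in for the `tiktoken` encoder (an assumption made so the example runs offline), so its counts differ from real token counts, and `fill_context` is a hypothetical name.

```python
def count_tokens(text):
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())

def fill_context(ranked_texts, base_tokens, max_tokens):
    """Greedily keep texts, in order, while the running total fits the budget."""
    total = base_tokens
    kept = []
    for text in ranked_texts:
        total += count_tokens(text)
        if total > max_tokens:
            break
        kept.append(text)
    return kept

ranked_texts = ["a b c", "d e f g", "h i", "j"]
kept = fill_context(ranked_texts, base_tokens=2, max_tokens=9)
print(kept)
```

Because the loop stops at the first row that does not fit, a shorter row further down the ranking is never considered. This keeps the context in strict relevance order.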
    

In [43]:
print(create_prompt(drop_off_site_question, df, 1800))


Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 

borough: Manhattan
Neighborhood Tabulation Area Name: East Village
food scrap drop off site: 1st Ave and 1st St
hosted by: NYC Compost Project Hosted by LES Ecology Center
open months: Year Round
operation day hours: Mondays (Start Time: 9:00 AM - End Time:  2:00 PM)
website: https://www.lesecologycenter.org/programs/compost/compost-drop-off-locations/
notes: Not accepted: meat, bones, or dairy


###

borough: Manhattan
Neighborhood Tabulation Area Name: East Harlem (South)
food scrap drop off site: NW Corner of East 106th Street & Park Avenue
hosted by: Department of Sanitation
open months: Year Round
operation day hours: 24/7
website: www.nyc.gov/smartcomposting
notes: Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin!


###

borough: Manhattan
Neighborhood Tabulation Area 

In [44]:
print(create_prompt(host_question, df, 1800))


Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 

borough: Manhattan
Neighborhood Tabulation Area Name: East Village
food scrap drop off site: 1st Ave and 1st St
hosted by: NYC Compost Project Hosted by LES Ecology Center
open months: Year Round
operation day hours: Mondays (Start Time: 9:00 AM - End Time:  2:00 PM)
website: https://www.lesecologycenter.org/programs/compost/compost-drop-off-locations/
notes: Not accepted: meat, bones, or dairy


###

borough: Manhattan
Neighborhood Tabulation Area Name: East Village
food scrap drop off site: Tompkins Square Greenmarket
hosted by: NYC Compost Project Hosted by LES Ecology Center
open months: Year Round
operation day hours: Sundays (Start Time: 8:00 AM - End Time:  5:00 PM)
website: https://www.lesecologycenter.org/programs/compost/compost-drop-off-locations/
notes: Not accepted: meat, bones, or dairy


###

borough: Manhattan
Neighborhood Tabulatio

## Custom Performance Demonstration

The cells below demonstrate the performance of the custom query on two questions. For each question, they show the answer from a basic `Completion` model query alongside the answer from the custom query.

In [45]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

def answer_question(
    question, df, max_prompt_tokens=1800, max_answer_tokens=150
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model
    
    If the model produces an error, return an empty string
    """
    
    prompt = create_prompt(question, df, max_prompt_tokens)
    
    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

### Question 1

In [46]:
drop_off_site_prompt = f"""
Question: {drop_off_site_question}
Answer:
"""
initial_drop_off_site_answer = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=drop_off_site_prompt,
    max_tokens=150
)["choices"][0]["text"].strip()

In [47]:
custom_drop_off_site_answer = answer_question(drop_off_site_question, df)

### Question 2

In [48]:
host_prompt = f"""
Question: {host_question}
Answer:
"""
initial_host_prompt_answer = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=host_prompt,
    max_tokens=150
)["choices"][0]["text"].strip()

In [49]:
custom_host_prompt_answer = answer_question(host_question, df)

In [50]:
print(f"""
{drop_off_site_question}

Original Answer: {initial_drop_off_site_answer}
Custom Answer:   {custom_drop_off_site_answer}

#######

{host_question}

Original Answer: {initial_host_prompt_answer}
Custom Answer:   {custom_host_prompt_answer}
""")


In Manhattan borough and East Village Neighborhood Tabulation Area Name, where's the food scrap drop off site?

Original Answer: According to the NYC Zero Waste website, the nearest food scrap drop off site in the East Village Neighborhood Tabulation Area Name is located at the Union Square Greenmarket on Union Square West between 14th and 15th Streets.
Custom Answer:   Tompkins Square Greenmarket.

#######

Who is hosting the food scrap drop off site in Manhattan borough and East Village Neighborhood Tabulation Area Name?

Original Answer: The NYC Compost Project Host Partner on Governor's Island is hosting the food scrap drop off site in Manhattan borough and East Village Neighborhood Tabulation Area Name.
Custom Answer:   NYC Compost Project Hosted by LES Ecology Center



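**Question 1**: The original answer (Union Square Greenmarket) does not match any of the East Village rows retrieved from the dataset, while the custom answer (Tompkins Square Greenmarket) does appear in the dataset. That said, the custom answer is incomplete: the dataset also lists 1st Ave and 1st St as an East Village site.

**Question 2**: The original answer does not give the host listed in the dataset, while the custom answer matches the `hosted_by` value of the retrieved East Village rows and is concise.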
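
One way to make the comparison against the dataset concrete is to check which known site names an answer mentions. `mentioned_sites` is a hypothetical helper, not part of the project; the two site names are the East Village rows shown in the retrieval output earlier.

```python
def mentioned_sites(answer, site_names):
    """Return the site names that appear (case-insensitively) in the answer."""
    lowered = answer.lower()
    return [name for name in site_names if name.lower() in lowered]

east_village_sites = ["1st Ave and 1st St", "Tompkins Square Greenmarket"]
custom_answer = "Tompkins Square Greenmarket."

found = mentioned_sites(custom_answer, east_village_sites)
missing = [name for name in east_village_sites if name not in found]
print(found, missing)
```

A non-empty `missing` list flags an answer that is grounded in the data but incomplete, as with Question 1.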