# Custom Chatbot Project


For this chatbot project, I chose the `nyc_food_scrap_drop_off_sites` dataset, which lists food scrap drop-off sites in New York City. This dataset is a good fit because it includes key details for each site: its location, opening hours, and the organization that manages it.

With this information, the chatbot can help users find the nearest drop-off spots, check when they’re open, and see who runs them. Since food scrap drop-off supports waste reduction, this chatbot could also help users learn more about sustainable practices. This dataset allows the chatbot to give clear, useful answers about food scrap disposal in NYC.

## Data Wrangling



In [2]:
import pandas as pd
df= pd.read_csv("./data/nyc_food_scrap_drop_off_sites.csv")
df

Unnamed: 0.1,Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,borocd,...,location_point,:@computed_region_yeji_bk3q,:@computed_region_92fq_4b7q,:@computed_region_sbqj_enih,:@computed_region_efsh_h5xi,:@computed_region_f5dn_yrer,notes,ct2010,bbl,bin
0,0,Staten Island,Grasmere-Arrochar-South Beach-Dongan Hills,South Beach,"21 Robin Road, Staten Island NY",Snug Harbor Youth,Year Round,Friday (Start Time: 1:30 PM - End Time: 4:30 PM),snug-harbor.org,502,...,"{'type': 'Point', 'coordinates': [-74.062991, ...",1.0,14.0,76.0,10692.0,30.0,,,,
1,1,Manhattan,Inwood,SE Corner of Broadway & Academy Street,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,112,...,,,,,,,Download the app to access bins. Accepts all f...,,,
2,2,Brooklyn,Park Slope,Old Stone House Brooklyn,"336 3rd St, Brooklyn, NY 11215",Old Stone House Brooklyn,Year Round,24/7 (Start Time: 24/7 - End Time: 24/7),,306,...,"{'type': 'Point', 'coordinates': [-73.984731, ...",2.0,27.0,50.0,17617.0,14.0,,,,
3,3,Manhattan,East Harlem (North),SE Corner of Pleasant Avenue & E 116 Street,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,111,...,,,,,,,Download the app to access bins. Accepts all f...,,,
4,4,Queens,Corona,Malcolm X FSDO,"111-26 Northern Blvd, Flushing, NY 11368",NYC Compost Project Hosted by Big Reuse,Year Round,Tuesdays (Start Time: 12:00 PM - End Time: 2:...,,404,...,"{'type': 'Point', 'coordinates': [-73.8630721,...",3.0,21.0,68.0,14510.0,66.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
571,571,Brooklyn,Kensington,Albemarle Road and McDonald Avenue,southwest corner of McDonald Avenue and Albema...,NYC Compost Project Hosted by LES Ecology Center,Year Round,Tuesdays (Start Time: 10:00 AM - End Time: 2:...,https://www.lesecologycenter.org/programs/comp...,312,...,"{'type': 'Point', 'coordinates': [-73.97997, 4...",2.0,27.0,39.0,17620.0,2.0,"Not accepted: meat, bones, or dairy",,,
572,572,Queens,Old Astoria-Hallets Point,NW Corner of 21st Street & 30th Drive,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,401,...,,,,,,,Download the app to access bins. Accepts all f...,,,
573,573,Brooklyn,Crown Heights (North),Rochester Avenue & St. Johns Place,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,308,...,,,,,,,Download the app to access bins. Accepts all f...,,,
574,574,Brooklyn,Windsor Terrace-South Slope,*CLOSED FOR THE SEASON* East 4th Street Commun...,"173 E 4th St, Brooklyn, NY 11218",Members at East 4th Street Community Garden,April - October,Wednesdays and Saturdays (Start Time: Wednesda...,https://eastfourthstreetgarden.tumblr.com/,307,...,"{'type': 'Point', 'coordinates': [-73.9772287,...",2.0,27.0,45.0,17620.0,9.0,"Not accepted: meat, bones, or dairy",,,


In [3]:
# Drop non-essential columns
columns_to_drop = ['Unnamed: 0', 'borocd', ':@computed_region_yeji_bk3q',
                   ':@computed_region_92fq_4b7q', ':@computed_region_sbqj_enih',
                   ':@computed_region_efsh_h5xi', ':@computed_region_f5dn_yrer',
                   'ct2010', 'bbl', 'bin', 'object_id']
df_cleaned = df.drop(columns=columns_to_drop)
df_cleaned


Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,councildist,latitude,longitude,precinct,location_point,notes
0,Staten Island,Grasmere-Arrochar-South Beach-Dongan Hills,South Beach,"21 Robin Road, Staten Island NY",Snug Harbor Youth,Year Round,Friday (Start Time: 1:30 PM - End Time: 4:30 PM),snug-harbor.org,50,40.595579,-74.062991,122,"{'type': 'Point', 'coordinates': [-74.062991, ...",
1,Manhattan,Inwood,SE Corner of Broadway & Academy Street,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,10,,,34,,Download the app to access bins. Accepts all f...
2,Brooklyn,Park Slope,Old Stone House Brooklyn,"336 3rd St, Brooklyn, NY 11215",Old Stone House Brooklyn,Year Round,24/7 (Start Time: 24/7 - End Time: 24/7),,39,40.672712,-73.984731,78,"{'type': 'Point', 'coordinates': [-73.984731, ...",
3,Manhattan,East Harlem (North),SE Corner of Pleasant Avenue & E 116 Street,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,8,,,25,,Download the app to access bins. Accepts all f...
4,Queens,Corona,Malcolm X FSDO,"111-26 Northern Blvd, Flushing, NY 11368",NYC Compost Project Hosted by Big Reuse,Year Round,Tuesdays (Start Time: 12:00 PM - End Time: 2:...,,21,40.749685,-73.863072,110,"{'type': 'Point', 'coordinates': [-73.8630721,...",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
571,Brooklyn,Kensington,Albemarle Road and McDonald Avenue,southwest corner of McDonald Avenue and Albema...,NYC Compost Project Hosted by LES Ecology Center,Year Round,Tuesdays (Start Time: 10:00 AM - End Time: 2:...,https://www.lesecologycenter.org/programs/comp...,39,40.644910,-73.979970,66,"{'type': 'Point', 'coordinates': [-73.97997, 4...","Not accepted: meat, bones, or dairy"
572,Queens,Old Astoria-Hallets Point,NW Corner of 21st Street & 30th Drive,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,22,,,114,,Download the app to access bins. Accepts all f...
573,Brooklyn,Crown Heights (North),Rochester Avenue & St. Johns Place,,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,41,,,77,,Download the app to access bins. Accepts all f...
574,Brooklyn,Windsor Terrace-South Slope,*CLOSED FOR THE SEASON* East 4th Street Commun...,"173 E 4th St, Brooklyn, NY 11218",Members at East 4th Street Community Garden,April - October,Wednesdays and Saturdays (Start Time: Wednesda...,https://eastfourthstreetgarden.tumblr.com/,39,40.648307,-73.977229,72,"{'type': 'Point', 'coordinates': [-73.9772287,...","Not accepted: meat, bones, or dairy"


In [4]:
# Missing Values Handling
print(df_cleaned.isnull().sum())

# Placeholder text for missing values in the free-text columns
default_values = {
    'location': "Location not available",
    'hosted_by': "Hosted_By not available",
    'website': "Website not available",
    'notes': "No additional notes available"
}

# Fill only the columns listed above; all other columns are left unchanged
df_cleaned = df_cleaned.fillna(value=default_values)
print("_____")
print(df_cleaned.isnull().sum())






borough                       0
ntaname                       0
food_scrap_drop_off_site      0
location                    241
hosted_by                     5
open_months                   0
operation_day_hours           0
website                      35
councildist                   0
latitude                    241
longitude                   241
precinct                      0
location_point              241
notes                        12
dtype: int64
_____
borough                       0
ntaname                       0
food_scrap_drop_off_site      0
location                      0
hosted_by                     0
open_months                   0
operation_day_hours           0
website                       0
councildist                   0
latitude                    241
longitude                   241
precinct                      0
location_point              241
notes                         0
dtype: int64


In [7]:
df_cleaned['text'] = df_cleaned.apply(lambda row: 
    f"The {row['food_scrap_drop_off_site']} drop-off site is located in {row['borough']} at {row['location']}. "
    f"It is hosted by {row['hosted_by']} and operates during {row['open_months']} in the following hours: {row['operation_day_hours']}. "
    f"For more information, visit: {row['website']}. "
    f"Additional notes: {row['notes']}.", axis=1)
df_cleaned[['text']].head()


Unnamed: 0,text
0,The South Beach drop-off site is located in St...
1,The SE Corner of Broadway & Academy Street dro...
2,The Old Stone House Brooklyn drop-off site is ...
3,The SE Corner of Pleasant Avenue & E 116 Stree...
4,The Malcolm X FSDO drop-off site is located in...
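
Because missing values are filled before `text` is built, a row without an address reads "located in Manhattan at Location not available". A minimal sketch of one way to skip such clauses is shown here. `build_site_text` and `PLACEHOLDERS` are hypothetical names that are not part of the notebook, and the row is passed as a plain dict purely for illustration.

```python
# Hypothetical helper: builds a sentence similar to the cell above but
# skips clauses whose value is one of the fill-in placeholders.
PLACEHOLDERS = {
    "Location not available",
    "Hosted_By not available",
    "Website not available",
    "No additional notes available",
}

def build_site_text(row):
    """Return a description of one drop-off site from a dict-like row."""
    def known(key):
        value = row.get(key)
        return value is not None and value not in PLACEHOLDERS

    text = f"The {row['food_scrap_drop_off_site']} drop-off site is located in {row['borough']}"
    text += f" at {row['location']}. " if known("location") else ". "
    if known("hosted_by"):
        text += f"It is hosted by {row['hosted_by']}. "
    text += (
        f"It operates during {row['open_months']} "
        f"in the following hours: {row['operation_day_hours']}. "
    )
    if known("website"):
        text += f"For more information, visit: {row['website']}. "
    if known("notes"):
        text += f"Additional notes: {row['notes']}."
    return text.strip()

# Values copied from row 1 of the table above, after the fill step
example_row = {
    "food_scrap_drop_off_site": "SE Corner of Broadway & Academy Street",
    "borough": "Manhattan",
    "location": "Location not available",
    "hosted_by": "Department of Sanitation",
    "open_months": "Year Round",
    "operation_day_hours": "24/7",
    "website": "www.nyc.gov/smartcomposting",
    "notes": "No additional notes available",
}
print(build_site_text(example_row))
```

Since a pandas row supports both key lookup and `get`, the same function could probably be applied with `df_cleaned.apply(build_site_text, axis=1)`.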


## Creating Embeddings


In [19]:
import openai
import os
from config import OPENAI_API_KEY
openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = OPENAI_API_KEY

EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
batch_size = 100
embeddings = []
for i in range(0, len(df_cleaned), batch_size):
    # Send text data to OpenAI model to get embeddings
    response = openai.Embedding.create(
        input=df_cleaned.iloc[i:i+batch_size]["text"].tolist(),
        engine=EMBEDDING_MODEL_NAME
    )
    
    # Add embeddings to list
    embeddings.extend([data["embedding"] for data in response["data"]])

# Add embeddings list to dataframe
df_cleaned["embeddings"] = embeddings
df_cleaned

Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,councildist,latitude,longitude,precinct,location_point,notes,text,embeddings
0,Staten Island,Grasmere-Arrochar-South Beach-Dongan Hills,South Beach,"21 Robin Road, Staten Island NY",Snug Harbor Youth,Year Round,Friday (Start Time: 1:30 PM - End Time: 4:30 PM),snug-harbor.org,50,40.595579,-74.062991,122,"{'type': 'Point', 'coordinates': [-74.062991, ...",No additional notes available,The South Beach drop-off site is located in St...,"[0.0004550209268927574, -0.018072519451379776,..."
1,Manhattan,Inwood,SE Corner of Broadway & Academy Street,Location not available,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,10,,,34,,Download the app to access bins. Accepts all f...,The SE Corner of Broadway & Academy Street dro...,"[0.01048674713820219, -0.007631680928170681, -..."
2,Brooklyn,Park Slope,Old Stone House Brooklyn,"336 3rd St, Brooklyn, NY 11215",Old Stone House Brooklyn,Year Round,24/7 (Start Time: 24/7 - End Time: 24/7),Website not available,39,40.672712,-73.984731,78,"{'type': 'Point', 'coordinates': [-73.984731, ...",No additional notes available,The Old Stone House Brooklyn drop-off site is ...,"[0.013765440322458744, -0.029853390529751778, ..."
3,Manhattan,East Harlem (North),SE Corner of Pleasant Avenue & E 116 Street,Location not available,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,8,,,25,,Download the app to access bins. Accepts all f...,The SE Corner of Pleasant Avenue & E 116 Stree...,"[0.013371450826525688, 0.003581035416573286, -..."
4,Queens,Corona,Malcolm X FSDO,"111-26 Northern Blvd, Flushing, NY 11368",NYC Compost Project Hosted by Big Reuse,Year Round,Tuesdays (Start Time: 12:00 PM - End Time: 2:...,Website not available,21,40.749685,-73.863072,110,"{'type': 'Point', 'coordinates': [-73.8630721,...",No additional notes available,The Malcolm X FSDO drop-off site is located in...,"[0.002409494947642088, -0.018397321924567223, ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
571,Brooklyn,Kensington,Albemarle Road and McDonald Avenue,southwest corner of McDonald Avenue and Albema...,NYC Compost Project Hosted by LES Ecology Center,Year Round,Tuesdays (Start Time: 10:00 AM - End Time: 2:...,https://www.lesecologycenter.org/programs/comp...,39,40.644910,-73.979970,66,"{'type': 'Point', 'coordinates': [-73.97997, 4...","Not accepted: meat, bones, or dairy",The Albemarle Road and McDonald Avenue drop-of...,"[0.016093147918581963, -0.012265551835298538, ..."
572,Queens,Old Astoria-Hallets Point,NW Corner of 21st Street & 30th Drive,Location not available,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,22,,,114,,Download the app to access bins. Accepts all f...,The NW Corner of 21st Street & 30th Drive drop...,"[0.009980657137930393, -0.0022672568447887897,..."
573,Brooklyn,Crown Heights (North),Rochester Avenue & St. Johns Place,Location not available,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,41,,,77,,Download the app to access bins. Accepts all f...,The Rochester Avenue & St. Johns Place drop-of...,"[0.005902835633605719, -0.010530848056077957, ..."
574,Brooklyn,Windsor Terrace-South Slope,*CLOSED FOR THE SEASON* East 4th Street Commun...,"173 E 4th St, Brooklyn, NY 11218",Members at East 4th Street Community Garden,April - October,Wednesdays and Saturdays (Start Time: Wednesda...,https://eastfourthstreetgarden.tumblr.com/,39,40.648307,-73.977229,72,"{'type': 'Point', 'coordinates': [-73.9772287,...","Not accepted: meat, bones, or dairy",The *CLOSED FOR THE SEASON* East 4th Street Co...,"[0.008079242892563343, -0.025679975748062134, ..."
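
The embedding loop above sends the `text` column to the API in slices of `batch_size`. The slicing on its own, without any API call, can be sketched as follows. `iter_batches` is a hypothetical helper name, and the list of strings is made-up stand-in data.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 575 rows (index 0 to 574 in the table above) split into batches of 100
texts = [f"site {i}" for i in range(575)]
batch_lengths = [len(batch) for batch in iter_batches(texts, 100)]
print(batch_lengths)  # [100, 100, 100, 100, 100, 75]
```

The last batch is simply shorter, so no padding or special case is needed when the row count is not a multiple of the batch size.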


In [20]:
df_cleaned.to_csv("./data/embeddings.csv", index=False)


In [21]:
import ast
import numpy as np
df = pd.read_csv("./data/embeddings.csv")
# The CSV stores each embedding as a string, so parse it back into an array
df["embeddings"] = df["embeddings"].apply(ast.literal_eval).apply(np.array)
df

Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,councildist,latitude,longitude,precinct,location_point,notes,text,embeddings
0,Staten Island,Grasmere-Arrochar-South Beach-Dongan Hills,South Beach,"21 Robin Road, Staten Island NY",Snug Harbor Youth,Year Round,Friday (Start Time: 1:30 PM - End Time: 4:30 PM),snug-harbor.org,50,40.595579,-74.062991,122,"{'type': 'Point', 'coordinates': [-74.062991, ...",No additional notes available,The South Beach drop-off site is located in St...,"[0.0004550209268927574, -0.018072519451379776,..."
1,Manhattan,Inwood,SE Corner of Broadway & Academy Street,Location not available,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,10,,,34,,Download the app to access bins. Accepts all f...,The SE Corner of Broadway & Academy Street dro...,"[0.01048674713820219, -0.007631680928170681, -..."
2,Brooklyn,Park Slope,Old Stone House Brooklyn,"336 3rd St, Brooklyn, NY 11215",Old Stone House Brooklyn,Year Round,24/7 (Start Time: 24/7 - End Time: 24/7),Website not available,39,40.672712,-73.984731,78,"{'type': 'Point', 'coordinates': [-73.984731, ...",No additional notes available,The Old Stone House Brooklyn drop-off site is ...,"[0.013765440322458744, -0.029853390529751778, ..."
3,Manhattan,East Harlem (North),SE Corner of Pleasant Avenue & E 116 Street,Location not available,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,8,,,25,,Download the app to access bins. Accepts all f...,The SE Corner of Pleasant Avenue & E 116 Stree...,"[0.013371450826525688, 0.003581035416573286, -..."
4,Queens,Corona,Malcolm X FSDO,"111-26 Northern Blvd, Flushing, NY 11368",NYC Compost Project Hosted by Big Reuse,Year Round,Tuesdays (Start Time: 12:00 PM - End Time: 2:...,Website not available,21,40.749685,-73.863072,110,"{'type': 'Point', 'coordinates': [-73.8630721,...",No additional notes available,The Malcolm X FSDO drop-off site is located in...,"[0.002409494947642088, -0.018397321924567223, ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
571,Brooklyn,Kensington,Albemarle Road and McDonald Avenue,southwest corner of McDonald Avenue and Albema...,NYC Compost Project Hosted by LES Ecology Center,Year Round,Tuesdays (Start Time: 10:00 AM - End Time: 2:...,https://www.lesecologycenter.org/programs/comp...,39,40.644910,-73.979970,66,"{'type': 'Point', 'coordinates': [-73.97997, 4...","Not accepted: meat, bones, or dairy",The Albemarle Road and McDonald Avenue drop-of...,"[0.016093147918581963, -0.012265551835298538, ..."
572,Queens,Old Astoria-Hallets Point,NW Corner of 21st Street & 30th Drive,Location not available,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,22,,,114,,Download the app to access bins. Accepts all f...,The NW Corner of 21st Street & 30th Drive drop...,"[0.009980657137930393, -0.0022672568447887897,..."
573,Brooklyn,Crown Heights (North),Rochester Avenue & St. Johns Place,Location not available,Department of Sanitation,Year Round,24/7,www.nyc.gov/smartcomposting,41,,,77,,Download the app to access bins. Accepts all f...,The Rochester Avenue & St. Johns Place drop-of...,"[0.005902835633605719, -0.010530848056077957, ..."
574,Brooklyn,Windsor Terrace-South Slope,*CLOSED FOR THE SEASON* East 4th Street Commun...,"173 E 4th St, Brooklyn, NY 11218",Members at East 4th Street Community Garden,April - October,Wednesdays and Saturdays (Start Time: Wednesda...,https://eastfourthstreetgarden.tumblr.com/,39,40.648307,-73.977229,72,"{'type': 'Point', 'coordinates': [-73.9772287,...","Not accepted: meat, bones, or dairy",The *CLOSED FOR THE SEASON* East 4th Street Co...,"[0.008079242892563343, -0.025679975748062134, ..."
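
Writing the dataframe to CSV turns each embedding list into its string representation, which is why the column has to be parsed again after loading. A small stdlib-only sketch of that round trip follows, using `ast.literal_eval`, which only accepts Python literals. The two-row CSV is made-up sample data, not taken from the real file.

```python
import ast
import csv
import io

# Made-up rows standing in for the real embeddings file
rows = [
    {"text": "site A", "embeddings": [0.1, -0.2, 0.3]},
    {"text": "site B", "embeddings": [0.0, 0.5, -0.5]},
]

# Write to an in-memory CSV; each list is stored as a string such as "[0.1, -0.2, 0.3]"
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embeddings"])
writer.writeheader()
writer.writerows(rows)

# Read back and parse the string form into a list of floats
buffer.seek(0)
loaded = []
for record in csv.DictReader(buffer):
    record["embeddings"] = ast.literal_eval(record["embeddings"])
    loaded.append(dict(record))

print(loaded[0]["embeddings"])
```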


## Similarity Measure Using Cosine Similarity

In [23]:
from openai.embeddings_utils import get_embedding, distances_from_embeddings

def get_rows_sorted_by_relevance(question, df):
    """
    Function that takes in a question string and a dataframe containing
    rows of text and associated embeddings, and returns that dataframe
    sorted from most to least relevant for that question
    """
    
    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)
    
    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine"
    )
    
    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy


In [24]:
get_rows_sorted_by_relevance("What are the operation hours for the drop-off site in Sunset Park?", df)

Unnamed: 0,borough,ntaname,food_scrap_drop_off_site,location,hosted_by,open_months,operation_day_hours,website,councildist,latitude,longitude,precinct,location_point,notes,text,embeddings,distances
481,Brooklyn,Sunset Park (Central),Sunset Park FSDO,7th Ave & 44th St,GrowNYC,Year Round,Saturdays (Start Time: 8:00 AM - End Time: 12...,grownyc.org/compost,38,40.646152,-74.002213,72,"{'type': 'Point', 'coordinates': [-74.002213, ...","Not accepted: meat, bones, or dairy",The Sunset Park FSDO drop-off site is located ...,"[0.02519982121884823, -0.017577148973941803, 0...",0.103310
281,Brooklyn,Coney Island-Sea Gate,PS 90 Coney Island,"2840 W 12th Street, Brooklyn, NY 11224",PS 90 Coney Island,Year Round,Wednesdays (Start Time: 7:30 AM - End Time: 1...,Website not available,47,40.578254,-73.979703,60,"{'type': 'Point', 'coordinates': [-73.979703, ...","Not accepted: meat, bones, or dairy",The PS 90 Coney Island drop-off site is locate...,"[0.010200629010796547, -0.021694481372833252, ...",0.126663
329,Brooklyn,Bay Ridge,Bay Ridge,3rd Ave & 95th Street,GrowNYC,Year Round,Saturdays (Start Time: 8:00 AM - End Time: 12...,grownyc.org/compost,47,40.617400,-74.033703,68,"{'type': 'Point', 'coordinates': [-74.033703, ...","Not accepted: meat, bones, or dairy",The Bay Ridge drop-off site is located in Broo...,"[0.007109519559890032, -0.022969216108322144, ...",0.128884
488,Brooklyn,East Flatbush-Remsen Village,Howard Garden,750 Howard Avenue,Howard Garden,April - October,Fridays (Start Time: 9:00 AM - End Time: 10:0...,Website not available,41,40.663629,-73.919036,73,"{'type': 'Point', 'coordinates': [-73.9190362,...","Not accepted: meat, bones, or dairy",The Howard Garden drop-off site is located in ...,"[0.0023135258816182613, -0.026688704267144203,...",0.129044
517,Brooklyn,Canarsie,Rockaway Parkway,1425 Rockaway Pkwy,GrowNYC,Year Round,Wednesday (Start Time: 10:00 AM - End Time: 2...,grownyc.org/compost,46,40.645419,-73.902365,69,"{'type': 'Point', 'coordinates': [-73.902365, ...",No additional notes available,The Rockaway Parkway drop-off site is located ...,"[0.003281597513705492, -0.0295096542686224, 0....",0.130441
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59,Queens,Jamaica,King Manor Museum,"150-03 Jamaica Ave, Jamaica, NY 11432",King Manor Museum,Year Round,Fridays and Saturdays (Start Time: Friday 10 A...,https://www.kingmanor.org/,27,40.702160,-73.804450,103,"{'type': 'Point', 'coordinates': [-73.80445, 4...","Not accepted: meat, bones, or dairy",The King Manor Museum drop-off site is located...,"[-0.007802816107869148, -0.014526243321597576,...",0.182024
252,Manhattan,The Battery-Governors Island-Ellis Island-Libe...,Governors Island Soissons Ferry Landing,Adjacent to Soissons Landing and Taco Vista,NYC Compost Project Hosted by Earth Matter NY,Year Round,Whenever the Island is open to the public (ch...,www.earthmatter.org,1,40.692810,-74.014831,1,"{'type': 'Point', 'coordinates': [-74.0148306,...",No additional notes available,The Governors Island Soissons Ferry Landing dr...,"[0.012407448142766953, -0.014020693488419056, ...",0.183682
563,Queens,Long Island City-Hunters Point,Smiling Hogshead Ranch,"Pearson Pl & Skillman Ave, Queens, NY 11101",Volunteers at Smiling Hogshead Ranch,Year Round,Every day (Start Time: Dawn - End Time: Dusk),smilinghogsheadranch.org,26,40.743233,-73.943134,108,"{'type': 'Point', 'coordinates': [-73.9431336,...",No additional notes available,The Smiling Hogshead Ranch drop-off site is lo...,"[-0.015774356201291084, -0.01843949221074581, ...",0.183998
411,Queens,Forest Hills,Commonpoint Queens in partnership with The For...,NE corner of 67 Rd & 108th Street 67-09 108th ...,Forest Hills Green Team,Year Round,Sundays (Start Time: 10:00 AM - End Time: 1:0...,commonpointqueens.org,29,40.728415,-73.847110,112,"{'type': 'Point', 'coordinates': [-73.84711, 4...","Not accepted: meat, bones, or dairy",The Commonpoint Queens in partnership with The...,"[0.01581118255853653, -0.0017178953858092427, ...",0.185470
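
The `distances` column holds cosine distances, that is, one minus the cosine similarity between the question embedding and each row embedding. A rough pure-Python sketch of that calculation and the ascending sort is given here. `cosine_distance` and `rank_by_distance` are hypothetical names, and the 2-D vectors are toy values chosen only to make the ordering easy to check.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank_by_distance(query, vectors):
    """Return (index, distance) pairs sorted from most to least relevant."""
    distances = [(i, cosine_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(distances, key=lambda pair: pair[1])

# Toy 2-D vectors, made up for illustration
query = [1.0, 0.0]
vectors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
for index, distance in rank_by_distance(query, vectors):
    print(index, round(distance, 3))
```

The vector pointing in the same direction as the query comes first with distance 0, and the orthogonal one comes last with distance 1.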


## Using ChromaDB as a Vector Database (OPTIONAL)

In [43]:
pip install numpy==1.23

Collecting numpy==1.23
Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
faiss-cpu 1.9.0 requires numpy<3.0,>=1.25.0, but you have numpy 1.23.0 which is incompatible.
pandas 1.5.2 requires numpy>=1.23.2; python_version >= "3.11", but you have numpy 1.23.0 which is incompatible.



  Downloading numpy-1.23.0.tar.gz (10.7 MB)
     ---------------------------------------- 0.0/10.7 MB ? eta -:--:--
     ------------------ --------------------- 5.0/10.7 MB 25.2 MB/s eta 0:00:01
     --------------------------------------- 10.7/10.7 MB 30.5 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: numpy
  Building wheel for numpy (pyproject.toml): started
  Building wheel for numpy (pyproject.toml): still running...
  Building wheel for numpy (pyproject.toml): still running...
  Building wheel for numpy (pyproject.toml): finished with status 'done'
  Created wheel for numpy: filename=numpy-1.23.0-cp311-cp311-win_amd64.whl size=5840486 sha256=3d

In [46]:
pip install typing-extensions==4.5.0

Note: you may need to restart the kernel to use updated packages.


In [None]:
pip uninstall chromadb


In [None]:
pip install chromadb

In [48]:
import pandas as pd
import numpy as np
import chromadb
from chromadb.config import Settings

df = pd.read_csv("./data/embeddings.csv")
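# The column saved to the CSV above is named "embeddings"
df['embeddings'] = df['embeddings'].apply(eval)

# Initialize Chroma client
client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",  # Default local setup
    persist_directory=".chroma/"       # Directory for Chroma to store data
))

# Stack the per-row lists into a 2-D array; shape[1] is the embedding dimension
embeddings = np.array(df['embeddings'].tolist())
embeddings.shape[1]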

# Using IndexFlatIP for cosine similarity after normalization
#index = faiss.IndexFlatIP(embeddings.shape[1])

ImportError: cannot import name 'deprecated' from 'typing_extensions' (c:\Users\HP\OneDrive\Documents\Eduardo Toledo\ASAI\generative_ai\.venv\Lib\site-packages\typing_extensions.py)

## Creating Prompt With Context

In [None]:
# Code to create a prompt for the model. This code was taken from the course.
import tiktoken

def create_prompt(question, df, max_token_count):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings, return a text prompt to send to a Completion model
    """
    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")
    
    # Count the number of tokens in the prompt template and question
    prompt_template = """
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context: 

{}

---

Question: {}
Answer:"""
    
    current_token_count = len(tokenizer.encode(prompt_template)) + \
                            len(tokenizer.encode(question))
    
    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:
        
        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count
        
        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)
    

def answer_question_chatbot(
    question, df, max_prompt_tokens=1800, max_answer_tokens=150
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model
    
    If the model produces an error, return an empty string
    """
    
    prompt = create_prompt(question, df, max_prompt_tokens)
    
    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=prompt,
            temperature=0.2,
            top_p=1,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
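
`create_prompt` adds rows in relevance order until the token budget would be exceeded. The packing logic by itself can be sketched as follows, with a whitespace split standing in for the `tiktoken` tokenizer. That stand-in is an assumption made so the block runs without extra packages, and its counts will differ from real token counts. `count_tokens` and `pack_context` are hypothetical names.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())

def pack_context(ranked_texts, base_token_count, max_token_count):
    """Take texts in ranked order until adding the next one would exceed the budget."""
    context = []
    current = base_token_count
    for text in ranked_texts:
        current += count_tokens(text)
        if current > max_token_count:
            break
        context.append(text)
    return context

ranked = ["one two three", "four five", "six seven eight nine", "ten"]
# Budget of 10 with 4 tokens already used by the template and question
selected = pack_context(ranked, base_token_count=4, max_token_count=10)
print(selected)  # ['one two three', 'four five']
```

Like the loop in `create_prompt`, this stops at the first row that does not fit instead of trying later, shorter rows, so the context always consists of the top-ranked rows with no gaps.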

## Custom Performance Demonstration

The cells below demonstrate the performance of the custom query on two questions. For each question, the answer from the custom query is shown first, followed by the answer from a basic `Completion` model query.

### Question 1

In [59]:
question= "List drop off sites are open 24/7?"
custom_answer = answer_question_chatbot(question, df)
print(custom_answer)

The 34-04 24 Street, 34-15 31 Avenue, 36-04 31 Avenue, 14-02 31 Avenue, 25-10 31 Avenue, 31-10 23 Street, 29-08 31 Avenue, 35-04 31 Avenue, 25-35 31 Avenue, 21-03 31 Avenue, 31-28 Crescent Street, Opp31-06 38 Street, New York Avenue & Lincoln Place, New York Avenue & Prospect Place, Vanderbilt Avenue & Dean Street, Vanderbilt Avenue & Plaza Street East, and NW Corner of 21st Street & 30th Drive drop-off sites are open 24/7.


In [60]:
basic_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=question,
    temperature=0.2,
    top_p=1,
    max_tokens=150
)["choices"][0]["text"].strip()
print(basic_answer)

No, most drop off sites have specific hours of operation and may not be open 24/7. It is best to check the specific drop off site's hours before planning a visit.


### Question 2

In [61]:
question= "Who hosts the drop-off site at E 116 Street?"
custom_answer = answer_question_chatbot(question, df)
print(custom_answer)

Department of Sanitation


In [62]:
basic_answer = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt=question,
    temperature=0.2,
    top_p=1,
    max_tokens=150
)["choices"][0]["text"].strip()
print(basic_answer)

It is not specified which drop-off site at E 116 Street you are referring to. Please provide more information for an accurate answer.
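
Both questions above repeat the same pair of cells. A sketch of a small helper that runs any number of questions through both paths is given here. `compare_answers` is a hypothetical name, and the two stub functions stand in for `answer_question_chatbot` and the plain `Completion` call so that the block runs without an API key.

```python
def compare_answers(questions, custom_fn, basic_fn):
    """Return a list of dicts pairing each question with both answers."""
    results = []
    for question in questions:
        results.append({
            "question": question,
            "custom": custom_fn(question),
            "basic": basic_fn(question),
        })
    return results

# Stubs for illustration only; in the notebook these would call the model
def fake_custom(question):
    return f"[custom] {question}"

def fake_basic(question):
    return f"[basic] {question}"

questions = ["Who hosts the drop-off site at E 116 Street?"]
for result in compare_answers(questions, fake_custom, fake_basic):
    print(result["question"])
    print("  custom:", result["custom"])
    print("  basic: ", result["basic"])
```

In the notebook, `custom_fn` could be something like `lambda q: answer_question_chatbot(q, df)`, with `basic_fn` wrapping the plain `openai.Completion.create` call.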
