## Dataset Choice and Scenario

I have chosen the NYC Food Scrap Drop-off Sites dataset for this custom chatbot project. 
This dataset contains information about food scrap collection locations throughout New York City, 
including addresses, operating hours, accepted materials, and special instructions.

This dataset is highly appropriate for a custom chatbot because:

1. The data contains detailed, location-specific information that 
   a general-purpose language model is unlikely to know reliably, so the custom data is essential for accurate responses.

2. NYC residents need this information to participate in food scrap 
   recycling programs, which gives the chatbot a practical, real-world use.

3. Each drop-off site has its own characteristics (hours, materials accepted, 
   special instructions), so the model must combine several data points 
   to give a helpful answer.

4. Without this custom data, the model cannot provide specific 
   locations, hours, or guidelines for food scrap drop-off in NYC, which makes the before/after 
   comparison easy to see.

This chatbot would be valuable for NYC residents, community organizations, and environmental 
initiatives seeking to increase participation in food scrap recycling programs.

## Imports

In [35]:
import ast
import os
import warnings
from typing import List, Tuple

import numpy as np
import openai
import pandas as pd
# Note: openai.embeddings_utils exists only in the legacy SDK (openai<1.0)
from openai.embeddings_utils import get_embedding, cosine_similarity

warnings.filterwarnings('ignore')

pd.set_option('display.max_colwidth', 1000)

In [36]:
# Prefer the OPENAI_API_KEY environment variable over a key hard-coded in the notebook
openai.api_key = os.getenv("OPENAI_API_KEY", "sk-YOUR_API_KEY")

try:
    # Test the API key with a single request and reuse the result
    models = openai.Model.list()
    print("✅ API key validated successfully")
    print("\nList of all model IDs in OpenAI: \n\n", [model["id"] for model in models["data"]])
except openai.error.AuthenticationError:
    print("❌ ERROR: Invalid OpenAI API key!")
    print("Please check your API key and try again.")
except Exception as e:
    print(f"❌ API Error: {e}")

✅ API key validated successfully

List of all model IDs in OpenAI: 

 ['gpt-4-0613', 'gpt-4', 'gpt-3.5-turbo', 'gpt-4o-audio-preview-2025-06-03', 'gpt-4.1-nano', 'gpt-image-1', 'codex-mini-latest', 'gpt-4o-realtime-preview-2025-06-03', 'davinci-002', 'babbage-002', 'gpt-3.5-turbo-instruct', 'gpt-3.5-turbo-instruct-0914', 'dall-e-3', 'dall-e-2', 'gpt-4-1106-preview', 'gpt-3.5-turbo-1106', 'tts-1-hd', 'tts-1-1106', 'tts-1-hd-1106', 'text-embedding-3-small', 'text-embedding-3-large', 'gpt-4-0125-preview', 'gpt-4-turbo-preview', 'gpt-3.5-turbo-0125', 'gpt-4-turbo', 'gpt-4-turbo-2024-04-09', 'gpt-4o', 'gpt-4o-2024-05-13', 'gpt-4o-mini-2024-07-18', 'gpt-4o-mini', 'gpt-4o-2024-08-06', 'chatgpt-4o-latest', 'o1-preview-2024-09-12', 'o1-preview', 'o1-mini-2024-09-12', 'o1-mini', 'gpt-4o-realtime-preview-2024-10-01', 'gpt-4o-audio-preview-2024-10-01', 'gpt-4o-audio-preview', 'gpt-4o-realtime-preview', 'omni-moderation-latest', 'omni-moderation-2024-09-26', 'gpt-4o-realtime-preview-2024-12-17', '

## Data Wrangling

In [37]:
df = pd.read_csv("data/nyc_food_scrap_drop_off_sites.csv", index_col=0)
print(df.shape)

df.info()

(576, 24)
<class 'pandas.core.frame.DataFrame'>
Index: 576 entries, 0 to 575
Data columns (total 24 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   borough                      576 non-null    object 
 1   ntaname                      576 non-null    object 
 2   food_scrap_drop_off_site     576 non-null    object 
 3   location                     335 non-null    object 
 4   hosted_by                    571 non-null    object 
 5   open_months                  576 non-null    object 
 6   operation_day_hours          576 non-null    object 
 7   website                      541 non-null    object 
 8   borocd                       576 non-null    int64  
 9   councildist                  576 non-null    int64  
 10  latitude                     335 non-null    float64
 11  longitude                    335 non-null    float64
 12  precinct                     576 non-null    int64  
 13  object_id      

In [38]:
def create_text_description(row):
    """Create a comprehensive text description for each drop-off site"""
    text_parts = []

    # Street address or intersection, when the dataset provides one
    if pd.notna(row.get('location')):
        text_parts.append(f"Location: {row['location']}")

    # Neighborhood Tabulation Area name (emitted under the "Address" label)
    if pd.notna(row.get('ntaname')):
        text_parts.append(f"Address: {row['ntaname']}")

    # Borough
    if pd.notna(row.get('borough')):
        text_parts.append(f"Borough: {row['borough']}")

    # Operating hours and schedule
    if pd.notna(row.get('operation_day_hours')):
        text_parts.append(f"Hours: {row['operation_day_hours']}")

    # Site name (emitted under the "Food Scraps Accepted" label)
    if pd.notna(row.get('food_scrap_drop_off_site')):
        text_parts.append(f"Food Scraps Accepted: {row['food_scrap_drop_off_site']}")

    # Special instructions or notes
    if pd.notna(row.get('notes')):
        text_parts.append(f"Special Notes: {row['notes']}")

    if pd.notna(row.get('website')):
        text_parts.append(f"Website: {row['website']}")

    # Stringified point; coordinates appear as [longitude, latitude] in the output
    if pd.notna(row.get("location_point")):
        text_parts.append(f"Location Coordinates: {ast.literal_eval(row['location_point'])['coordinates']}")

    return " | ".join(text_parts)
    
# Apply the function to create the text column
df['text'] = df.apply(create_text_description, axis=1)

# Clean up any empty or very short entries
df = df[df['text'].str.len() > 50].copy()
df['text'].head()

0                          Location: 21 Robin Road, Staten Island NY | Address: Grasmere-Arrochar-South Beach-Dongan Hills | Borough: Staten Island | Hours: Friday (Start Time: 1:30 PM - End Time:  4:30 PM) | Food Scraps Accepted: South Beach | Website: snug-harbor.org | Location Coordinates: [-74.062991, 40.595579]
1                      Address: Inwood | Borough: Manhattan | Hours: 24/7 | Food Scraps Accepted: SE Corner of Broadway & Academy Street | Special Notes: Download the app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of bin! | Website: www.nyc.gov/smartcomposting
2                                                                                     Location: 336 3rd St, Brooklyn, NY 11215 | Address: Park Slope | Borough: Brooklyn | Hours: 24/7 (Start Time: 24/7 - End Time:  24/7) | Food Scraps Accepted: Old Stone House Brooklyn | Location Coordinates: [-73.984731, 40.6727118]
3    Address: East Harlem (North) | Borough: M
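
The `location_point` handling above calls `ast.literal_eval` directly, which raises on a malformed value. A minimal sketch of a more defensive parser follows. It assumes the column holds a stringified dict with a `coordinates` list in `[longitude, latitude]` order, as the printed output suggests. The helper name `parse_location_point` is hypothetical and not part of the notebook.

```python
import ast
from typing import Optional, Tuple


def parse_location_point(raw: object) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a stringified point, or None if unusable."""
    if not isinstance(raw, str):
        return None  # covers NaN and other missing values
    try:
        point = ast.literal_eval(raw)
        # Assumed order: [longitude, latitude]
        lon, lat = point["coordinates"]
    except (ValueError, SyntaxError, KeyError, TypeError):
        return None
    return float(lat), float(lon)
```

Returning `None` instead of raising lets the description builder simply skip the coordinates for rows whose point is missing or malformed.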

In [9]:
def create_embeddings(df, embedding_model):
    """Create embeddings for all text descriptions"""
    print("Creating embeddings for all drop-off sites...")

    # Get embeddings for all text descriptions
    df['embedding'] = df['text'].apply(
        lambda x: get_embedding(x, engine=embedding_model)
    )
    
    return df

def find_relevant_sites(df, embedding_model, query: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """
    Find the most relevant drop-off sites for a given query

    Args:
        df: DataFrame with 'text' and 'embedding' columns
        embedding_model: OpenAI embedding model used to embed the query
        query: User's question or search terms
        top_k: Number of most relevant sites to return

    Returns:
        List of tuples (site_description, similarity_score)
    """
    # Store embeddings as numpy array for faster similarity calculations
    embeddings = np.array(df['embedding'].tolist())
    
    # Get embedding for the query
    query_embedding = get_embedding(query, engine=embedding_model)

    # Calculate similarities
    similarities = []
    for i, site_embedding in enumerate(embeddings):
        similarity = cosine_similarity(query_embedding, site_embedding)
        similarities.append((i, similarity))

    # Sort by similarity and get top results
    similarities.sort(key=lambda x: x[1], reverse=True)
    top_sites = similarities[:top_k]

    # Return site descriptions and scores
    results = []
    for idx, score in top_sites:
        results.append((df.iloc[idx]['text'], score))

    return results
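
The per-row loop in `find_relevant_sites` can also be written as a single matrix operation. The sketch below is one possible vectorized version, not the notebook's implementation. The function name `top_k_by_cosine` is hypothetical, and it assumes every embedding has a non-zero norm.

```python
from typing import List, Sequence, Tuple

import numpy as np


def top_k_by_cosine(
    query_embedding: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    texts: Sequence[str],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank texts by cosine similarity between their embeddings and the query."""
    matrix = np.asarray(embeddings, dtype=float)      # shape: (n_sites, dim)
    query = np.asarray(query_embedding, dtype=float)  # shape: (dim,)

    # Cosine similarity = dot product divided by the product of the norms
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    scores = matrix @ query / norms

    # Indices of the highest scores, best first
    order = np.argsort(-scores)[:top_k]
    return [(texts[i], float(scores[i])) for i in order]
```

With a few hundred rows the loop is fast enough. The vectorized form mainly helps if the dataset grows or if many queries are scored in a row.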

In [10]:
EMBEDDING_MODEL = "text-embedding-3-small"
df = create_embeddings(df, EMBEDDING_MODEL)
df_processed = df[['text', 'embedding']]

Creating embeddings for all drop-off sites...
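
Embedding every row costs one API call per site, so it can be worth caching the result between sessions. The sketch below shows one way to round-trip the `text` and `embedding` columns through CSV. It assumes the embeddings are plain Python lists of floats. The helper names `save_embeddings` and `load_embeddings` are hypothetical.

```python
import ast
import io

import pandas as pd


def save_embeddings(frame: pd.DataFrame, path_or_buffer) -> None:
    """Write the text and embedding columns to CSV (lists are stored as strings)."""
    frame[["text", "embedding"]].to_csv(path_or_buffer, index=False)


def load_embeddings(path_or_buffer) -> pd.DataFrame:
    """Read the CSV back and turn each stringified list into a list of floats."""
    loaded = pd.read_csv(path_or_buffer)
    loaded["embedding"] = loaded["embedding"].apply(ast.literal_eval)
    return loaded


# Round trip through an in-memory buffer; a file path works the same way
sample = pd.DataFrame({"text": ["site a", "site b"], "embedding": [[0.1, 0.2], [0.3, 0.4]]})
buffer = io.StringIO()
save_embeddings(sample, buffer)
buffer.seek(0)
restored = load_embeddings(buffer)
```

CSV keeps the cache human-readable. A binary format would be more compact, but it is not needed for a dataset of this size.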


In [27]:
def generate_custom_response(df, embedding_model, query: str, max_context_length: int = 3000) -> str:
    """
    Generate a response using relevant site data as context

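    Args:
        df: DataFrame with 'text' and 'embedding' columns
        embedding_model: OpenAI embedding model used to embed the query
        query: User's question
        max_context_length: Maximum length of context to include, in characters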

    Returns:
        Generated response from OpenAI
    """
    # Find relevant sites
    relevant_sites = find_relevant_sites(df, embedding_model, query, top_k=5)

    # Build context from relevant sites
    context_parts = []
    current_length = 0

    for site_text, score in relevant_sites:
        if current_length + len(site_text) <= max_context_length:
            context_parts.append(f"Site: {site_text}")
            current_length += len(site_text)
        else:
            break

    context = "\n\n".join(context_parts)
    
    system_prompt = """You are a helpful assistant specializing in NYC food scrap drop-off locations. 
    Use the provided information about drop-off sites to answer questions about food scrap recycling 
    in New York City. Be specific about locations, hours, and requirements when possible.

    If the user asks about a specific neighborhood or borough, prioritize sites in that area.
    Always provide practical, actionable information to help people participate in food scrap recycling."""
    
    user_prompt = f"""Based on the following NYC food scrap drop-off site information, please answer this question: {query}

    Drop-off Site Information:
    {context}

    Please provide a helpful, specific answer based on this information."""

    # Generate response using OpenAI
    response = openai.ChatCompletion.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        max_tokens=500,
        temperature=0.7
    )

    return response.choices[0].message.content


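def generate_baseline_response(query: str) -> str:
    """
    Generate a baseline response using OpenAI, without any custom context

    Args:
        query: User's question

    Returns:
        Generated response from OpenAI
    """

    # The query carries its own punctuation, so no trailing period is appended
    user_prompt = f"Please answer this question: {query}"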

    # Generate response using OpenAI
    response = openai.ChatCompletion.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": user_prompt}
        ],
        max_tokens=500,
        temperature=0.7
    )

    return response.choices[0].message.content
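
The context-building step in `generate_custom_response` can be isolated so it is testable without an API call. The sketch below mirrors that loop under the same rule: the budget is counted in characters of site text, not tokens, and packing stops at the first site that does not fit. The function name `build_context` is hypothetical.

```python
from typing import Iterable, List, Tuple


def build_context(sites: Iterable[Tuple[str, float]], max_chars: int = 3000) -> str:
    """Join site descriptions, best match first, until the character budget is used up."""
    parts: List[str] = []
    used = 0
    for text, _score in sites:
        if used + len(text) > max_chars:
            break  # stop at the first site that would exceed the budget
        parts.append(f"Site: {text}")
        used += len(text)
    return "\n\n".join(parts)
```

Replacing `break` with `continue` would let shorter, lower-ranked sites fill whatever budget remains, at the cost of skipping a better-ranked match.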

In [29]:
generate_custom_response(df_processed, EMBEDDING_MODEL, query = "Where can I drop off food scraps in Brooklyn on weekends?")

'In Brooklyn, you have two options for dropping off food scraps on weekends:\n\n1. **Flatbush (West)-Ditmas Park-Parkville**:  \n   - **Location**: 58 E 18th St, Brooklyn, NY 11226\n   - **Hours**: \n     - **Saturdays**: Open all night\n     - **Sundays**: Open until 4:00 PM\n   - **Food Scraps Accepted**: Q Gardens\n   - **Special Notes**: Meat, bones, and dairy are not accepted.\n   - **Website**: [Q Gardens](https://qgardenscf.com/places-to-drop-off-your-compost/)\n\n2. **Prospect Heights**:  \n   - **Location**: SW Corner of Washington Avenue & St. Johns Place\n   - **Hours**: 24/7\n   - **Special Notes**: Requires downloading an app to access bins. Accepts all food scraps, including meat and dairy. Do not leave food scraps outside of the bin!\n   - **Website**: [NYC Smart Composting](www.nyc.gov/smartcomposting)\n\nBoth locations provide accessible options for weekend food scrap drop-off in Brooklyn. Depending on your preference for a specific area or the types of food scraps you

In [30]:
"""
Demonstrate the difference between baseline and custom responses
"""

test_questions = [
    "Where can I drop off food scraps in Brooklyn on weekends?",
    "What are the hours for food scrap drop-off sites in Manhattan?",
    "Are there any food scrap collection sites near Union Square?",
    "What types of food scraps are accepted at NYC drop-off locations?",
    "I live in Queens - where's the closest place to drop off my food scraps?"
]

print("=" * 80)
print("CHATBOT PERFORMANCE DEMONSTRATION")
print("=" * 80)

for i, question in enumerate(test_questions, 1):
    print(f"\n{'='*60}")
    print(f"QUESTION {i}: {question}")
    print(f"{'='*60}")

    print("\n--- BASELINE RESPONSE (without custom data) ---")
    print(generate_baseline_response(query=question))

    print("\n--- CUSTOM RESPONSE (with NYC drop-off site data) ---\n")

    print(generate_custom_response(df_processed, EMBEDDING_MODEL, query=question))

CHATBOT PERFORMANCE DEMONSTRATION

QUESTION 1: Where can I drop off food scraps in Brooklyn on weekends?

--- BASELINE RESPONSE (without custom data) ---
In Brooklyn, there are several options for dropping off food scraps on weekends. The NYC Department of Sanitation's composting program provides food scrap drop-off sites throughout the borough. These sites are often located at greenmarkets, community gardens, and other designated areas. Here are a few options:

1. **Greenmarkets**: Many of the greenmarkets operated by GrowNYC have food scrap drop-off spots. Popular ones include:
   - **Grand Army Plaza Greenmarket** (Saturdays)
   - **Fort Greene Park Greenmarket** (Saturdays)
   - **McCarren Park Greenmarket** (Saturdays)

2. **Community Composting Sites**: Various community gardens and local organizations host composting sites. Some include:
   - **Red Hook Community Farm** (Check specific weekend hours)
   - **Brooklyn Botanic Garden** (Check for any specific drop-off events or tim

If you live in Queens and are looking to drop off your food scraps, the NYC Department of Sanitation operates food scrap drop-off sites throughout the borough. These sites are part of the city's efforts to reduce waste and promote composting. Locations and their operating hours can change, so it’s a good idea to check the most current information before heading out. Here are some options to consider:

1. **Queens Botanical Garden** - They often have a food scrap drop-off site available during certain hours.
2. **Farmer's Markets** - Many farmer's markets in Queens offer food scrap drop-off services. Examples include the Forest Hills Greenmarket and the Jackson Heights Greenmarket.
3. **Local Community Gardens** - Some community gardens accept food scraps for composting.

For the most accurate and detailed information, you can visit the NYC Department of Sanitation website or use the NYC Compost Project’s map to find the closest drop-off location and check their hours of operation.


In [39]:
while True:
    try:
        # Get user input
        user_input = input("\n💬 You: ").strip()

        # Check for exit commands
        if user_input.lower() in ['quit', 'exit', 'bye', 'q']:
            print("👋 Thanks for using the NYC Food Scrap Assistant! Have a great day!")
            break

        # Handle empty input
        if not user_input:
            print("🤔 Please ask me a question about food scrap drop-off in NYC!")
            continue

        print("\n🤖 Assistant:")

        # Use actual chatbot with embeddings
        response = generate_custom_response(df_processed, EMBEDDING_MODEL, user_input)
        print(f"✨ {response}")

    except KeyboardInterrupt:
        print("\n\n👋 Chat interrupted. Goodbye!")
        break
    except Exception as e:
        print(f"\n❌ An error occurred: {e}")
        print("💡 Please try again or type 'quit' to exit.")


💬 You: What are the hours for food scrap drop-off sites in Manhattan?

🤖 Assistant:
✨ Here are the hours for food scrap drop-off sites in Manhattan:

1. **Madison Square Park Food Scrap Drop-off**
   - **Location:** 23rd St & Broadway, Midtown South-Flatiron-Union Square
   - **Hours:** Wednesdays, 8:00 AM - 1:00 PM
   - **Special Notes:** No meat, bones, or dairy accepted.

2. **East 96th Street Food Scrap Drop-off**
   - **Location:** 96th St & Lexington Ave, Upper East Side-Carnegie Hill
   - **Hours:** Fridays, 7:30 AM - 11:30 AM
   - **Special Notes:** No meat, bones, or dairy accepted.

3. **145th St Food Scrap Drop-off**
   - **Location:** 145th St & Edgecombe Ave, Hamilton Heights-Sugar Hill
   - **Hours:** Thursdays, 7:30 AM - 12:00 PM
   - **Special Notes:** No meat, bones, or dairy accepted.

4. **Asphalt Green Food Scrap Drop-off**
   - **Location:** East 91st St & York Ave, Upper East Side-Yorkville
   - **Hours:** Sundays, 7:30 AM - 12:30 PM
   - **Special Notes: