# Vancouver Homeless Shelter Chatbot

The City of Vancouver offers many low-cost food services. Unfortunately, it's often not clear where these services are, whether they are still active, and what they offer. It would be convenient to have a chatbot that could provide this information to people in need.

ChatGPT (GPT-3.5) can provide some of the information needed. For example, if you ask it for a list of free food services on the East Side, it returns a list in the following form:

```
Harvest Project

Location: 3980 Fraser Street
Provides low-income individuals and families with groceries and fresh produce.

...
```

However, when pressed for phone numbers, it sometimes can't provide them, and its list is only up to date as of January 2022. As new services come online, it would be ideal for the chatbot to provide information on all of the available services, along with details on how to contact them.

The datasets we use are in the `data` folder of the GitHub repo.

# Setup

```python
# Uses the legacy OpenAI Python SDK (openai<1.0), which provides
# openai.Embedding, openai.Completion, and openai.embeddings_utils
import openai
import pandas as pd
import tiktoken

pd.set_option('max_colwidth', None)  # show the full contents of each cell
pd.set_option('display.max_columns', 1000)
pd.set_option("expand_frame_repr", False)  # print columns side by side instead of wrapping
pd.options.display.max_seq_items = 200000
pd.options.display.max_rows = 400000


DATAFILE = "data/free-and-low-cost-food-programs-vancouver.json"
openai.api_key = "<API_TOKEN>"
MODEL_NAME = 'gpt-3.5-turbo-instruct'

# Instructions plus a slot ({}) for the dataset rows most relevant to the question
CONTEXT = """
Answer the question based on the context below. If it can't be answered using the context, say "I don't know".

Context:

{}

---

"""

# Slots: the filled-in CONTEXT (or '' when asking without context) and the question
PROMPT_TEMPLATE = """{}
Question: {}

Provide your answer in the following form:

Place
- Location: <address>
- Phone number:
- Description:

Answer:"""
```

## Data Wrangling
This section cleans the data and creates a `text` column.

```python
df = pd.read_json(DATAFILE)
df.head(3)
```

| | program_name | description | program_status | organization_name | program_population_served | address_extra_info | location_address | local_areas | provides_meals | provides_hampers | delivery_available | takeout_available | wheelchair_accessible | meal_cost | hamper_cost | signup_required | signup_phone_number | signup_email | requires_referral | referral_agency_name | referral_phone_number | referral_email | latitude | longitude | last_update_date | geom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Washington Community Market | Low cost essential food and household supplies, Mon-Sat 9am-5pm. For info contact (604) 683-0073. | Open | Portland Hotel Society (PHS) |  | Ground Floor | 179 E Hastings St, Vancouver, BC | Downtown | False | True | No | No | Yes |  | Low cost | No |  |  | No |  |  |  | 49.281418 | -123.100279 | 2023-07-31T10:10:06-07:00 | {'lon': -123.100279, 'lat': 49.281418} |
| 1 | The Dugout - Hot Breakfast | Daily hot meal at 7:30am. For info call (604) 685-5239. | Open | The Dugout |  |  | 59 Powell St, Vancouver, BC | Downtown | True | False | No | Yes | Unknown | Free |  | No |  |  | No |  |  |  | 49.283284 | -123.102773 | 2022-08-10T06:56:58-07:00 | {'lon': -123.102773, 'lat': 49.283284} |
| 2 | Vancouver Community Fridge – LMNH | Fridge, freezer, and pantry stocked with free food. Temporarily located in front of the Soap Dispensary. Available 24/7. \r\nMore info at https://vcfp.square.site/ | Open | Vancouver Community Fridge Project |  | In front of the Soap Dispensary | 3718 Main Street | Riley Park | False | True | No | Yes | Yes |  | Unknown | No |  |  | No |  |  |  | 49.251757 | -123.100852 | 2023-08-28T04:27:30-07:00 | {'lon': -123.100852, 'lat': 49.251757} |


```python
perc_updated = len(df[df.last_update_date > '2022-01-01'])/len(df)
print(f"Percentage of rows updated after January 2022: {round(perc_updated*100.0,1)}%")
```

```
Percentage of rows updated after January 2022: 89.9%
```
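If `last_update_date` is read in as ISO 8601 strings, as the sample rows above suggest, the string comparison happens to order rows by date. Below is a minimal sketch that parses the timestamps explicitly instead. It runs on a small hypothetical frame: the first two dates come from the sample rows, and the third is made up.

```python
import pandas as pd

# Hypothetical sample: the first two dates appear in the rows above, the third is made up
updates = pd.DataFrame({
    "last_update_date": [
        "2023-07-31T10:10:06-07:00",
        "2022-08-10T06:56:58-07:00",
        "2021-11-02T09:00:00-07:00",
    ]
})

# Parse the ISO 8601 strings into timezone-aware timestamps, normalized to UTC
updated_at = pd.to_datetime(updates["last_update_date"], utc=True)
cutoff = pd.Timestamp("2022-01-01", tz="UTC")

# The mean of a boolean Series is the fraction of True values
share_updated = (updated_at > cutoff).mean()
print(f"Percentage of rows updated after January 2022: {round(share_updated * 100.0, 1)}%")
# Percentage of rows updated after January 2022: 66.7%
```

Passing `utc=True` normalizes the `-07:00` offsets, so the comparison does not rely on every string sharing the same format.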


We can drop the programs that are not open, as well as the columns that aren't needed.

```python
df.columns
```

```
Index(['program_name', 'description', 'program_status', 'organization_name',
       'program_population_served', 'address_extra_info', 'location_address',
       'local_areas', 'provides_meals', 'provides_hampers',
       'delivery_available', 'takeout_available', 'wheelchair_accessible',
       'meal_cost', 'hamper_cost', 'signup_required', 'signup_phone_number',
       'signup_email', 'requires_referral', 'referral_agency_name',
       'referral_phone_number', 'referral_email', 'latitude', 'longitude',
       'last_update_date', 'geom'],
      dtype='object')
```

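```python
keep_columns = ['program_name', 'description', 'program_status', 'organization_name', 'location_address']

# Keep only programs that are currently open
df = df[df.program_status == 'Open']
# Keep only the columns we need; .copy() avoids pandas' SettingWithCopyWarning
# when the `text` column is added below
df = df[keep_columns].copy()
```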
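The two filtering steps above can also be written as a single `.loc` call that applies the row mask and the column list together. Here is a sketch on hypothetical toy rows. The extra `latitude` column stands in for the columns being dropped.

```python
import pandas as pd

# Hypothetical rows using the same column names as the real dataset
raw = pd.DataFrame({
    "program_name": ["Market A", "Meal B", "Pantry C"],
    "description": ["Groceries", "Hot meal", "Pantry"],
    "program_status": ["Open", "Closed", "Open"],
    "organization_name": ["Org A", "Org B", "Org C"],
    "location_address": ["1 Main St", "2 Main St", "3 Main St"],
    "latitude": [49.28, 49.25, 49.26],
})

keep_columns = ["program_name", "description", "program_status",
                "organization_name", "location_address"]

# The boolean mask keeps open programs; the column list keeps the fields used for `text`
open_programs = raw.loc[raw["program_status"] == "Open", keep_columns].copy()
print(open_programs.shape)  # (2, 5)
```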

Let's make a new column called `text` that combines all of the information we need into a single string.

```python
# Combine every column into a single "COLUMN: value" string per row
df['text'] = df.apply(lambda row: ', '.join([f"{col.upper()}: {row[col]}" for col in df.columns]), axis=1)
```

```python
df.head(1).text
```

```
0    PROGRAM_NAME: Washington Community Market, DESCRIPTION: Low cost essential food and household supplies, Mon-Sat 9am-5pm. For info contact (604) 683-0073., PROGRAM_STATUS: Open, ORGANIZATION_NAME: Portland Hotel Society (PHS), LOCATION_ADDRESS: 179 E Hastings St, Vancouver, BC
Name: text, dtype: object
```
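Below is a self-contained sketch of the same `COLUMN: value` formatting, applied to one hypothetical row. Iterating over `row.index` rather than `df.columns` is a small design choice: the helper then doesn't depend on an outer variable.

```python
import pandas as pd

# One hypothetical row using a subset of the columns kept above
one_row = pd.DataFrame({
    "program_name": ["Example Market"],
    "location_address": ["123 Example St, Vancouver, BC"],
})


def row_to_text(row):
    # "COLUMN: value" pairs joined by ", ", the same format used for df['text']
    return ", ".join(f"{col.upper()}: {row[col]}" for col in row.index)


one_row["text"] = one_row.apply(row_to_text, axis=1)
print(one_row.loc[0, "text"])
# PROGRAM_NAME: Example Market, LOCATION_ADDRESS: 123 Example St, Vancouver, BC
```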

## Custom Query Completion

This section builds a custom query using embeddings of our dataset and sends it to OpenAI.

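```python
# openai.embeddings_utils is only available in the legacy SDK (openai<1.0)
# and needs the SDK's optional embeddings dependencies installed
from openai.embeddings_utils import get_embedding, distances_from_embeddings
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
```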

```python
batch_size = 100
embeddings = []
for i in range(0, len(df), batch_size):
    # Send text data to OpenAI model to get embeddings
    response = openai.Embedding.create(
        input=df.iloc[i:i+batch_size]["text"].tolist(),
        engine=EMBEDDING_MODEL_NAME
    )

    # Add embeddings to list
    embeddings.extend([data["embedding"] for data in response["data"]])

# Add embeddings list to dataframe
df["embeddings"] = embeddings
df.to_json("embeddings.json")
```
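The loop above sends at most `batch_size` rows per request and appends the returned vectors in order. The sketch below applies the same batching pattern offline. `fake_embed` is a hypothetical stand-in for `openai.Embedding.create` that mimics only the `response["data"][i]["embedding"]` shape the loop reads. It is not a real API.

```python
# Hypothetical stand-in for openai.Embedding.create: one fake 1-D vector per input text
def fake_embed(texts):
    return {"data": [{"embedding": [float(len(t))]} for t in texts]}


def embed_in_batches(texts, batch_size=100):
    """Embed a list of texts in fixed-size batches, preserving input order."""
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]  # the last batch may be shorter
        response = fake_embed(batch)
        embeddings.extend(item["embedding"] for item in response["data"])
    return embeddings


texts = [f"row {n}" for n in range(250)]
vectors = embed_in_batches(texts, batch_size=100)  # three requests: 100, 100, 50
print(len(vectors))  # 250
```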

```python
def get_rows_sorted_by_relevance(question, df):
    """
    Function that takes in a question string and a dataframe containing
    rows of text and associated embeddings, and returns that dataframe
    sorted from most to least relevant for that question
    """

    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)

    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine"
    )

    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy
```
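The OpenAI Python SDK no longer includes `openai.embeddings_utils` from version 1.0 onward, so a plain NumPy equivalent of this ranking can be handy. The following is a minimal sketch that computes cosine distance as 1 minus cosine similarity. It uses hypothetical 2-D vectors in place of real `text-embedding-ada-002` embeddings, which have 1536 dimensions.

```python
import numpy as np
import pandas as pd


def cosine_distances(query, matrix):
    """Cosine distance (1 - cosine similarity) between one vector and each row of a matrix."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    return 1.0 - sims


# Hypothetical 2-D "embeddings" standing in for real model output
rows = pd.DataFrame({
    "text": ["east", "north", "north-east"],
    "embeddings": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
})
question_embedding = [1.0, 0.1]

ranked = rows.assign(
    distances=cosine_distances(question_embedding, np.vstack(rows["embeddings"].values))
).sort_values("distances", ascending=True)  # smallest distance = most relevant first

print(ranked["text"].tolist())  # ['east', 'north-east', 'north']
```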



```python
def create_prompt(question, df, max_token_count, ask_with_context=True):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings, return a text prompt to send to a Completion model
    """

    if not ask_with_context:
        return PROMPT_TEMPLATE.format('', question)

    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")

    # Count the number of tokens in the prompt template and question
    current_token_count = len(tokenizer.encode(PROMPT_TEMPLATE)) + \
                          len(tokenizer.encode(question))

    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:

        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count

        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    context_string = CONTEXT.format("\n\n###\n\n".join(context))
    return PROMPT_TEMPLATE.format(context_string, question)
```
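`create_prompt` fills the context greedily. Rows arrive sorted by relevance, and the loop stops at the first row that would push the total past `max_token_count`. The sketch below isolates just that packing step. `pack_context` and `count_words` are hypothetical helpers; the word count stands in for the `cl100k_base` tokenizer so the example runs without `tiktoken`.

```python
def pack_context(rows, question, template, max_tokens, count_tokens):
    """Greedily add rows (already sorted by relevance) until the token budget is exceeded."""
    used = count_tokens(template) + count_tokens(question)
    context = []
    for text in rows:
        used += count_tokens(text)
        if used > max_tokens:
            break  # stop at the first row that does not fit, as create_prompt does
        context.append(text)
    return context


# Hypothetical stand-in for a real tokenizer: one "token" per whitespace-separated word
def count_words(text):
    return len(text.split())


rows = ["one two three", "four five", "six seven eight nine"]
packed = pack_context(rows, "a question", "Question:", max_tokens=10, count_tokens=count_words)
print(packed)  # ['one two three', 'four five']
```

Because the loop breaks at the first row that does not fit, a shorter but less relevant row further down the list is never considered. Using `continue` instead of `break` would pack more rows at the cost of strict relevance order.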

```python
def answer_question(
    question, df, max_prompt_tokens=500, max_answer_tokens=500, ask_with_context=True
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model

    If the model produces an error, return an empty string
    """

    prompt = create_prompt(question, df, max_prompt_tokens, ask_with_context)
    # print(f"PROMPT: {prompt}")  # uncomment to inspect the prompt

    try:
        response = openai.Completion.create(
            model=MODEL_NAME,
            prompt=prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""
```
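With `ask_with_context=False`, `create_prompt` passes an empty string into the first slot of `PROMPT_TEMPLATE`, so the model sees only the question and the answer format. That is the baseline used in the comparisons that follow. The sketch below shows both branches without embeddings or API calls. `CONTEXT_DEMO`, `PROMPT_DEMO`, and `build_prompt` are hypothetical, shortened stand-ins, named so they don't overwrite the notebook's own templates.

```python
# Hypothetical, shortened stand-ins for CONTEXT and PROMPT_TEMPLATE from the setup cell
CONTEXT_DEMO = "Answer the question based on the context below.\n\nContext:\n\n{}\n\n---\n\n"
PROMPT_DEMO = "{}\nQuestion: {}\n\nAnswer:"


def build_prompt(question, context_rows=None):
    """Mirror create_prompt's two branches without embeddings, tokenizers, or API calls."""
    if not context_rows:
        # Same as ask_with_context=False: the first slot is left empty
        return PROMPT_DEMO.format("", question)
    context_string = CONTEXT_DEMO.format("\n\n###\n\n".join(context_rows))
    return PROMPT_DEMO.format(context_string, question)


rows = ["PROGRAM_NAME: Example Market", "PROGRAM_NAME: Example Kitchen"]
with_context = build_prompt("Where can I get free food?", rows)
without_context = build_prompt("Where can I get free food?")
print(repr(without_context))  # '\nQuestion: Where can I get free food?\n\nAnswer:'
```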

## Custom Performance Demonstration

The cells below demonstrate the performance of the custom query using two questions. For each question, we show the answer from the custom query as well as the answer from a basic `Completion` model query without any context.

```python
# Reload the dataframe, including the embeddings saved above
df = pd.read_json("embeddings.json")
```
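`DataFrame.to_json` stores the list-valued `embeddings` column as JSON arrays, and `pd.read_json` reads them back as Python lists. The reloaded frame can therefore go straight to `get_rows_sorted_by_relevance`. The sketch below shows a small round trip on a hypothetical frame, using an in-memory buffer instead of `embeddings.json`.

```python
from io import StringIO

import pandas as pd

# Hypothetical frame with a list-valued "embeddings" column, like the one saved earlier
original = pd.DataFrame({
    "text": ["row a", "row b"],
    "embeddings": [[0.5, 0.25], [0.75, 1.5]],
})

# Round-trip through JSON text held in memory instead of a file on disk
restored = pd.read_json(StringIO(original.to_json()))

print(restored["embeddings"].tolist())  # [[0.5, 0.25], [0.75, 1.5]]
```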

### Question 1

```python
question_1 = "Are there any places to get free food on Kaslo street in Vancouver"
```

```python
print(answer_question(question_1, df))
```

```
Yes

Place
- Location: 1275 Kaslo St
- Phone number: (604) 701-1123
- Description: Nanaimo Community Food - Provides free food and household supplies for those in need on Nanaimo Street, open Monday-Saturday 9am-5pm. Contact (604) 701-1123 for more information.
```


This is the correct place to get supplies.

```python
print(answer_question(question_1, df, ask_with_context=False))
```

```
Place:
- Location: Kaslo Street Food Hub - 2855 Kaslo Street, Vancouver, BC V5M 3H6
- Phone number: (778) 997-9106
- Description: This food hub provides free fresh produce and grocery items to community members in need every Friday from 12-3pm. They also offer free hot meals and snacks throughout the week.
```


There is no Kaslo Street Food Hub on Kaslo Street. This is a hallucination.

### Question 2

```python
question_2 = "Are there any places to get free food on Rupert street in Vancouver"
```

```python
print(answer_question(question_2, df))
```

```
Place
- Location: 5381 Rupert Street
- Phone number: (604) 683-0073
- Description: Low cost essential food and household supplies available Mon-Sat 9am-5pm through Rupert Neighborhood House operated by The Universal Church.
```


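Again, this is a correct place to get low-cost food. Note, however, that the phone number (604) 683-0073 and the hours in the description match the Washington Community Market entry in the data sample above, so these details are worth double-checking.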

```python
print(answer_question(question_2, df, ask_with_context=False))
```

```
There are no specific places on Rupert Street in Vancouver that offer free food. However, there are some nearby options that may provide food assistance or offer free meals at certain times. These include:

Place 1
- Location: 288 East Hastings St, Vancouver, BC V6A 1P2
- Phone number: (604) 255-3097
- Description: The Salvation Army Harbour Light offers a daily hot meal program for those in need of food assistance. They also provide a food bank service for registered individuals.

Place 2
- Location: 1211 Thurlow St, Vancouver, BC V6E 1X5
- Phone number: (604) 605-2994
- Description: The First Baptist Church offers a free community meal every Thursday evening at 6:15pm. This is open to anyone in need of a meal.

Place 3
- Location: 468 Powell St, Vancouver, BC V6A 1G9
- Phone number: (604) 605-7138
- Description: The Ray-Cam Community Centre offers a food bank service on Tuesdays and Thursdays for residents in the downtown eastside. They also have a daily hot meals program that is fre
```

Without the dataset as context, the model could not find any places to get free food on Rupert Street. Instead, it suggested programs in other parts of the city.