# Finding Optimal Locations for Food Truck Operators in San Francisco during the Covid-19 Pandemic

In [1]:
import config as cfg
import pandas as pd
from sodapy import Socrata
import numpy as np

import requests
from bs4 import BeautifulSoup
import folium
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

import pyproj

pd.options.mode.chained_assignment = None

### Table of Contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a class="anchor" id="introduction"></a>

The Covid-19 pandemic continues to cause widespread economic disruption, leading to the permanent closure of thousands of businesses. As many eateries shut down, an increasing number of people have difficulty procuring food. Restaurants are struggling to remain profitable after losing a significant amount of business to shelter-in-place orders, the ban on indoor seating, and social distancing amid health and safety concerns. However, one alternative type of eatery continues to operate and is faring better than many expected: food trucks, which serve meals from motorized vehicles or carts.

Food trucks also see lower sales due to the absence of office workers and a large decline in street traffic. However, unlike restaurants, which are fixed facilities, food trucks can quickly change their location, menu, and market. Operators have adapted by branching out into residential areas, to capitalize on the large share of people staying at home, and into areas near essential businesses such as hospitals.

Food truck sales fluctuate wildly depending on a number of factors, most of which depend on location. This report uses machine learning tools to help **food truck operators** find the best locations in San Francisco. Because office workers are absent, we look for locations near **residential areas**. We are also interested in locations near the **workplaces of essential workers**. The report uses data science analysis to identify promising San Francisco neighborhoods based on these criteria and describes the advantages of each location so that stakeholders can choose the best one.

## Data <a class="anchor" id="data"></a>

In San Francisco, food trucks must satisfy [DPW Order 182,101](https://www.sfpublicworks.org/sites/default/files/3858-DPW%20Order_182101-MFF.pdf) requirements to be a legal street-food vendor. Hence they can only operate in the approved zones shown in red on the Mobile Food Facility Permit map:

<img src="mff_rev_092014.jpg" width="600"/>

The report will look at areas that are approved for food trucks by using the latest [Mobile Food Facility Permits data](https://data.sfgov.org/Economy-and-Community/Mobile-Food-Facility-Permit/rqzj-sfat) provided by the San Francisco Department of Public Works on DataSF.

Factors that will influence our recommendations:
- Whether location is in an approved zone for food trucks
- The type and location of venues in the neighborhood
- Whether the nearby venues are essential businesses

We will use the Mobile Food Facility Permits data to define our venues in the approved zones. The data we will need are:
- **facilitytype**: Type of facility permitted: truck or push cart
- **address**: Street address of the permitted location
- **location**: Latitude and Longitude
- **status**: Status of the permit (for example, Approved, Requested or Issued)

This will be joined with location data from the Foursquare API, which provides venue data for those neighborhoods.

We also need a list of essential businesses as defined by [sf.gov](https://sf.gov/check-if-business-essential).

### Neighborhood Candidates

Let's extract the latitude and longitude coordinates of our candidate neighborhoods from the Mobile Food Facility Permits data.

We filter the data to keep only permits whose **status** is `ISSUED` and whose **facilitytype** is `Truck`.
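A minimal sketch of the same filtering, run on a few hand-made records shaped like the Socrata rows (a `location` dict holding string coordinates). The records are hypothetical: the second and third exist only to show a push cart and a non-issued permit being dropped. It also converts the coordinates to floats once, up front.

```python
import pandas as pd

# Hypothetical records shaped like the DataSF rows returned by client.get()
records = [
    {"facilitytype": "Truck", "address": "601 03RD ST", "status": "ISSUED",
     "location": {"latitude": "37.7800771744392", "longitude": "-122.393767294483"}},
    {"facilitytype": "Push Cart", "address": "EXAMPLE CART ADDR", "status": "ISSUED",
     "location": {"latitude": "37.79", "longitude": "-122.40"}},
    {"facilitytype": "Truck", "address": "EXAMPLE TRUCK ADDR", "status": "REQUESTED",
     "location": {"latitude": "37.78", "longitude": "-122.41"}},
]
df = pd.DataFrame(records)

# Keep issued truck permits only; .copy() avoids SettingWithCopyWarning below
trucks = df.loc[(df["status"] == "ISSUED") & (df["facilitytype"] == "Truck")].copy()

# Unpack the location dict into numeric latitude/longitude columns
trucks["latitude"] = trucks["location"].map(lambda loc: float(loc["latitude"]))
trucks["longitude"] = trucks["location"].map(lambda loc: float(loc["longitude"]))
print(trucks[["address", "latitude", "longitude"]])
```

Combining both conditions in one boolean mask filters in a single pass, and storing floats avoids the later `apply(float)` conversions.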

In [2]:
client = Socrata("data.sfgov.org",
                cfg.datasf["App Token"],
                username=cfg.datasf["username"],
                password=cfg.datasf["password"])
results = client.get("rqzj-sfat", limit=2000)
results_df = pd.DataFrame.from_records(results)

# Keep the columns we need, then keep only issued truck permits
mff_df = results_df[["facilitytype", "address", "location", "status"]]
mff_df = mff_df.loc[(mff_df['status'] == "ISSUED") & (mff_df['facilitytype'] == "Truck")].copy()

In [3]:
#Split location to latitude and longitude
latitudes = mff_df.loc[:,"location"].apply(lambda row: row.get('latitude'))
longitudes = mff_df.loc[:,"location"].apply(lambda row: row.get('longitude'))
mff_df['latitude'] = latitudes
mff_df['longitude'] = longitudes
mff_df.head()

|    | facilitytype | address | location | status | latitude | longitude |
|----|--------------|---------|----------|--------|----------|-----------|
| 49 | Truck | 601 03RD ST | {'latitude': '37.7800771744392', 'longitude': ... | ISSUED | 37.7800771744392 | -122.393767294483 |
| 58 | Truck | 400 CALIFORNIA ST | {'latitude': '37.793304275561', 'longitude': '... | ISSUED | 37.793304275561 | -122.401458998413 |
| 76 | Truck | 727 SANSOME ST | {'latitude': '37.7969490060212', 'longitude': ... | ISSUED | 37.7969490060212 | -122.402183431894 |


### Foursquare

Now that we have our location candidates, we use the Foursquare API to retrieve information about the venues around each one. Because the API limits the number of places returned per request, we set the limit parameter to 100 and the radius parameter to 500 meters.
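A minimal sketch of the request and response handling that `getNearbyVenues` performs, split into two small helpers that can be tested without a network call. The endpoint, query parameters and the `response.groups[0].items` path are the ones used in the notebook; the helper names, the `urlencode` approach and the `None` fallback for venues without a category are illustrative assumptions, not part of the original code.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore request URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,  # metres
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

def parse_explore_items(payload):
    """Flatten response.groups[0].items into (name, lat, lng, category) tuples.

    Venues without a category get None instead of raising IndexError.
    """
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hypothetical payload with the same shape as an explore response
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Golden Goat Coffee",
               "location": {"lat": 37.780415, "lng": -122.394394},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorised Venue",
               "location": {"lat": 37.78, "lng": -122.39},
               "categories": []}},
]}]}}

print(build_explore_url("ID", "SECRET", "20210118", 37.78, -122.39))
print(parse_explore_items(sample))
```

Keeping URL building and JSON parsing separate from `requests.get` makes each step easy to check against a saved response.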

In [4]:
CLIENT_ID = cfg.foursquare["Client Id"]
CLIENT_SECRET = cfg.foursquare["Client Secret"]
VERSION = '20210118'
LIMIT = 100 
radius = 500

In [5]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [6]:
mff_venues = getNearbyVenues(
    names = mff_df['address'],
    latitudes = mff_df['latitude'],
    longitudes = mff_df['longitude'],
)

601 03RD ST
400 CALIFORNIA ST
727 SANSOME ST


In [7]:
mff_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|--------------|-----------------------|------------------------|-------|----------------|-----------------|----------------|
| 0 | 601 03RD ST | 37.7800771744392 | -122.393767294483 | Arc Light Apartments | 37.779884 | -122.392677 | Residential Building (Apartment / Condo) |
| 1 | 601 03RD ST | 37.7800771744392 | -122.393767294483 | Golden Goat Coffee | 37.780415 | -122.394394 | Coffee Shop |
| 2 | 601 03RD ST | 37.7800771744392 | -122.393767294483 | South Park | 37.78165 | -122.393899 | Park |
| 3 | 601 03RD ST | 37.7800771744392 | -122.393767294483 | Petit Marlowe | 37.778101 | -122.393727 | Wine Bar |
| 4 | 601 03RD ST | 37.7800771744392 | -122.393767294483 | Cafe Okawari | 37.778169 | -122.393798 | Café |
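As a quick sanity check that the returned venues really lie within the 500 m search radius, one can compute the great-circle (haversine) distance from the permit location to each venue. This is an illustrative check added here, not part of the original analysis; it assumes a spherical Earth with mean radius 6,371 km and uses the coordinates from the table above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

origin = (37.7800771744392, -122.393767294483)  # 601 03RD ST
venues = {
    "Arc Light Apartments": (37.779884, -122.392677),
    "Golden Goat Coffee": (37.780415, -122.394394),
    "South Park": (37.78165, -122.393899),
    "Petit Marlowe": (37.778101, -122.393727),
    "Cafe Okawari": (37.778169, -122.393798),
}
for name, (lat, lon) in venues.items():
    print(f"{name}: {haversine_m(*origin, lat, lon):.0f} m")
```

At city scale the spherical approximation is accurate to well under 1%, which is plenty for checking a 500 m radius.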


## Methodology <a class="anchor" id="methodology"></a>

This project aims to recommend locations for food truck operators by detecting areas in San Francisco that are approved for mobile food facilities during the Covid-19 pandemic.

First, we collected the required data: the location of every issued food truck permit. We also identified the type and location of the venues around each candidate location using Foursquare's categorization. Then we filter the Foursquare venue categories to keep only essential businesses, as defined by [sf.gov](https://sf.gov/check-if-business-essential), and residential buildings.

Afterwards, we will focus on the most promising areas and create clusters (using k-means clustering) of locations that meet the requirements established in the discussions with the stakeholders. 

Finally, we present a map of all clusters and approximate the address of each location using reverse geocoding with Nominatim (OpenStreetMap), which allows stakeholders to search for optimal venue locations.

## Analysis <a class="anchor" id="analysis"></a>

In this section, we perform some exploratory data analysis to derive additional information from our raw data. We start by using one-hot encoding to convert the categorical venue data into numerical data. We then group the rows by location and average the one-hot columns, which gives the share of each venue category among a location's nearby venues.
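A minimal sketch of the one-hot encoding and grouping step on a hypothetical venue list with two locations. Calling `pd.get_dummies` on a Series yields bare category names as columns, which is what the notebook achieves with `prefix=""` and `prefix_sep=""` on a one-column DataFrame.

```python
import pandas as pd

# Hypothetical venue list: two locations, a few categorised venues each
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bank", "Coffee Shop", "Coffee Shop", "Park", "Bank", "Park"],
})

# One indicator column per category, 1.0 where the venue has that category
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Averaging the indicator columns gives each category's share of a location's venues
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so locations with many venues and locations with few venues become directly comparable.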

In [8]:
# one hot encoding
mff_onehot = pd.get_dummies(mff_venues[['Venue Category']], prefix="", prefix_sep="")
mff_onehot['Neighborhood'] = mff_venues['Neighborhood'] 
mff_onehot = mff_onehot.set_index('Neighborhood').reset_index()
mff_onehot.head()

mff_grouped = mff_onehot.groupby('Neighborhood').mean().reset_index()
mff_grouped.head()

|   | Neighborhood | Acai House | American Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Athletics & Sports | Bagel Shop | Bakery | Bank | ... | Tea Room | Thai Restaurant | Theme Park Ride / Attraction | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Vineyard | Whisky Bar | Wine Bar | Winery | Yoga Studio |
|---|--------------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 400 CALIFORNIA ST | 0.01 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | ... | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.01 | 0.02 | 0.0 | 0.01 |
| 1 | 601 03RD ST | 0.0 | 0.020619 | 0.010309 | 0.010309 | 0.0 | 0.010309 | 0.010309 | 0.0 | 0.010309 | ... | 0.010309 | 0.010309 | 0.010309 | 0.0 | 0.0 | 0.010309 | 0.0 | 0.020619 | 0.010309 | 0.0 |
| 2 | 727 SANSOME ST | 0.01 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 | ... | 0.03 | 0.0 | 0.0 | 0.01 | 0.03 | 0.0 | 0.0 | 0.03 | 0.0 | 0.0 |


In [9]:
#Filter categories to essential businesses listed in sf.gov and residential areas
mff_grouped = mff_grouped[["Neighborhood","Bank", "Convenience Store", "Optical Shop", "Newsstand", "Residential Building (Apartment / Condo)"]]
mff_grouped

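|   | Neighborhood | Bank | Convenience Store | Optical Shop | Newsstand | Residential Building (Apartment / Condo) |
|---|--------------|------|-------------------|--------------|-----------|------------------------------------------|
| 0 | 400 CALIFORNIA ST | 0.0 | 0.01 | 0.01 | 0.01 | 0.0 |
| 1 | 601 03RD ST | 0.010309 | 0.0 | 0.0 | 0.0 | 0.020619 |
| 2 | 727 SANSOME ST | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 |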
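Selecting columns with `mff_grouped[[...]]` raises a `KeyError` if Foursquare returns no venue of one of the listed categories on a later run. A minimal sketch of a more tolerant filter, assuming missing categories should simply count as 0.0; the helper name `select_categories` and the sample frame are hypothetical, while the category list is the one used in the filter above.

```python
import pandas as pd

# Essential-business and residential categories used in this report
KEEP = ["Bank", "Convenience Store", "Optical Shop", "Newsstand",
        "Residential Building (Apartment / Condo)"]

def select_categories(grouped, keep=KEEP):
    """Return Neighborhood plus the requested category columns.

    Categories absent from `grouped` are added as 0.0 instead of raising
    KeyError, as plain column indexing would.
    """
    return grouped.reindex(columns=["Neighborhood", *keep], fill_value=0.0)

# Hypothetical grouped frame that lacks a "Newsstand" column
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Bank": [0.0, 0.01],
    "Convenience Store": [0.02, 0.0],
    "Optical Shop": [0.0, 0.0],
    "Residential Building (Apartment / Condo)": [0.0, 0.03],
    "Coffee Shop": [0.1, 0.2],
})
print(select_categories(grouped))
```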


Now we identify the most common of these venue categories for each neighborhood.
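When only the single top category is needed, `DataFrame.idxmax(axis=1)` is a shorter alternative to sorting each row. This sketch reuses the values from the filtered table above. Note that 400 CALIFORNIA ST has three categories tied at 0.01: `idxmax` always returns the first tied column, whereas `sort_values` with its default quicksort does not promise any particular order among ties (pandas documents only `mergesort` and `stable` as stable sort kinds).

```python
import pandas as pd

# Values copied from the filtered frequency table above
grouped = pd.DataFrame({
    "Neighborhood": ["400 CALIFORNIA ST", "601 03RD ST", "727 SANSOME ST"],
    "Bank": [0.0, 0.010309, 0.0],
    "Convenience Store": [0.01, 0.0, 0.0],
    "Optical Shop": [0.01, 0.0, 0.01],
    "Newsstand": [0.01, 0.0, 0.0],
    "Residential Building (Apartment / Condo)": [0.0, 0.020619, 0.0],
}).set_index("Neighborhood")

# idxmax returns the first column holding the row maximum, so ties resolve by column order
top = grouped.idxmax(axis=1).rename("1st Most Common Venue")
print(top)
```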

In [10]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [11]:
num_top_venues = 1
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mff_grouped['Neighborhood']

for ind in np.arange(mff_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mff_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue |
|---|--------------|-----------------------|
| 0 | 400 CALIFORNIA ST | Convenience Store |
| 1 | 601 03RD ST | Residential Building (Apartment / Condo) |
| 2 | 727 SANSOME ST | Optical Shop |


In this step, we use k-means clustering to partition the locations into clusters based on their venue-category frequencies.
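k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points (Lloyd's algorithm). The numpy sketch below illustrates that loop; it is not scikit-learn's implementation, which adds k-means++ seeding and multiple restarts. It also shows a property worth keeping in mind here: when the number of clusters equals the number of locations (three clusters for three permits), every location ends up alone in its own cluster with zero inertia.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with k distinct data points as initial centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and within-cluster sum of squares (inertia)
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, inertia

# Two well-separated toy groups
toy = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, _, _ = kmeans(toy, k=2)
print(labels)

# Rows of the filtered frequency table: k equals the number of locations
freq = [[0.0, 0.01, 0.01, 0.01, 0.0],
        [0.010309, 0.0, 0.0, 0.0, 0.020619],
        [0.0, 0.0, 0.01, 0.0, 0.0]]
labels3, _, inertia3 = kmeans(freq, k=3)
print(labels3, inertia3)
```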

In [12]:
# set number of clusters
kclusters = 3
mff_grouped_clustering = mff_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init given explicitly, since recent scikit-learn versions changed its default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(mff_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 2], dtype=int32)

In [13]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mff_merged = mff_df.rename(columns={"address": "Neighborhood"})
mff_merged = mff_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
mff_merged.head()

|    | facilitytype | Neighborhood | location | status | latitude | longitude | Cluster Labels | 1st Most Common Venue |
|----|--------------|--------------|----------|--------|----------|-----------|----------------|-----------------------|
| 49 | Truck | 601 03RD ST | {'latitude': '37.7800771744392', 'longitude': ... | ISSUED | 37.7800771744392 | -122.393767294483 | 1 | Residential Building (Apartment / Condo) |
| 58 | Truck | 400 CALIFORNIA ST | {'latitude': '37.793304275561', 'longitude': '... | ISSUED | 37.793304275561 | -122.401458998413 | 0 | Convenience Store |
| 76 | Truck | 727 SANSOME ST | {'latitude': '37.7969490060212', 'longitude': ... | ISSUED | 37.7969490060212 | -122.402183431894 | 2 | Optical Shop |


Now we find the latitude and longitude of San Francisco, California and create a Folium map to display our k-means clusters.

In [14]:
geolocator = Nominatim(user_agent="shpg")
geocode = geolocator.geocode('San Francisco, California')
latitude = geocode.latitude
longitude = geocode.longitude

In [17]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, coloring each by its cluster label
for lat, lon, poi, cluster in zip(mff_merged['latitude'], mff_merged['longitude'], mff_merged['Neighborhood'], mff_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lon)],
        radius=10,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [16]:
from geopy.extra.rate_limiter import RateLimiter

cluster_latitudes = mff_merged['latitude'].astype(float).to_numpy()
cluster_longitudes = mff_merged['longitude'].astype(float).to_numpy()

# Reuse one geocoder and throttle calls: Nominatim's usage policy allows at most 1 request per second
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)

print('==============================================================')
print('Addresses of areas recommended for further analysis')
print('==============================================================\n')
for lat, lon in zip(cluster_latitudes, cluster_longitudes):
    print(reverse((lat, lon)).address)

Addresses of areas recommended for further analysis

601;605, 3rd Street, South Beach, San Francisco, San Francisco City and County, California, 94017, United States
400;410, California Street, Financial District, San Francisco, San Francisco City and County, California, 90104, United States
705;727;729, Sansome Street, Northeast Waterfront Historic District, San Francisco, San Francisco City and County, California, 94133, United States


## Results and Discussion <a class="anchor" id="results"></a>

Our analysis shows that the Mobile Food Facility Permit map contains a large area of approved zones, with the highest concentration on the east side of San Francisco. After directing our attention to this narrower area of interest, we filtered our location candidates to include only food truck permits that are currently issued. These candidates were then clustered according to the mix of essential businesses and residential buildings around them.

The results show 3 zones for potential food truck locations, whose addresses were obtained using reverse geocoding. However, this does not imply that those addresses are actually optimal locations for a food truck. The recommended addresses should be considered only as a starting point for more detailed analysis, which could eventually result in an optimal location once other factors are taken into account and all other relevant conditions are met.

## Conclusion <a class="anchor" id="conclusion"></a>

The purpose of this project was to identify areas of San Francisco that help stakeholders narrow down the search for an optimal food truck location during the Covid-19 pandemic. We generated a collection of locations that satisfy basic requirements regarding zones approved for mobile food facilities. We then performed clustering to create major zones of interest, and labelled the addresses of those zones to serve as a starting point for final exploration by stakeholders.

The final decision on an optimal food truck location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in each recommended zone, taking into account additional factors such as street traffic and enough space for social distancing.