# Capstone Project - Advises for Dinner Takeout and Grocery Shopping During the Pandemic Period

During the Covid 19 pandemic period, many restaurants are closed. Where to order dinner takeouts while maintaining low risk of exposure to coronavirus? Many markets are closed. Where to shop for grocery safely? In this capstone project, I will help you find safe places for dinenr take-out and grocery shopping.


Although this project targets to provide advices to people live in Bergen County, NJ, it could be easily adapted to any district/location in the US.

This project is divided into three parts. <br>
Part 1. Collect data, including caronavirus cases, geospatial data for 70 towns in Bergen County, restaurant and supermarket rating and open status. <br>
Part 2. Segment and cluster supermarkets based on # of coronavirus cases, rating, and distance to my home. <br>
Part 3. Segment and cluster restaurants based on # of coronavirus cases, rating, and distance to my home. 

Data used in this project:
- Coronavirus cases. This data is stored in a csv file and loaded to the data asset on IBM Watson.
- Geospatial data for 70 boroughs in Bergen county, NJ. This data is retrieved using geocode library.
- Venue information, including rating and open status for restaurants and supermarkets. This data is retrieved through FourSquare API.
- Distance between venue and my home. This is calculated using geod library.

Methodologies used in this project:
- K-Means clustering. Used to cluster restaurants and supermarkets based on # of coronavirus cases, venue rating, and distance to my home. 
- Folium map. Used to visually present the clustering and segmentation of restaurants and supermarkets.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


In [2]:
# Getting coordinates for the towns in Bergen county, NJ
!pip install geopandas
!pip install geopy

Collecting geopandas
[?25l  Downloading https://files.pythonhosted.org/packages/83/c5/3cf9cdc39a6f2552922f79915f36b45a95b71fd343cfc51170a5b6ddb6e8/geopandas-0.7.0-py2.py3-none-any.whl (928kB)
[K     |████████████████████████████████| 931kB 9.7MB/s eta 0:00:01
[?25hCollecting shapely (from geopandas)
[?25l  Downloading https://files.pythonhosted.org/packages/20/fa/c96d3461fda99ed8e82ff0b219ac2c8384694b4e640a611a1a8390ecd415/Shapely-1.7.0-cp36-cp36m-manylinux1_x86_64.whl (1.8MB)
[K     |████████████████████████████████| 1.8MB 34.0MB/s eta 0:00:01
Collecting fiona (from geopandas)
[?25l  Downloading https://files.pythonhosted.org/packages/ec/20/4e63bc5c6e62df889297b382c3ccd4a7a488b00946aaaf81a118158c6f09/Fiona-1.8.13.post1-cp36-cp36m-manylinux1_x86_64.whl (14.7MB)
[K     |████████████████████████████████| 14.7MB 7.5MB/s eta 0:00:01
[?25hCollecting pyproj>=2.2.0 (from geopandas)
[?25l  Downloading https://files.pythonhosted.org/packages/e5/c3/071e080230ac4b6c64f1a2e2f9161c9737a2bc

In [3]:
# Get geospacial data for my home. This will be used to calculate distance between venue and my home.
from geopy.geocoders import Nominatim
locator = Nominatim(user_agent="myGeocoder")
location = locator.geocode("Upper Saddle River, NJ, USA")
home_latitude = location.latitude
home_longitude = location.longitude
print("Home Latitude = {}, Longitude = {}".format(home_latitude, home_longitude))

Home Latitude = 41.0584299, Longitude = -74.0984756


In [4]:
# Calculate distance between the center of Bergen county, NJ to my home

import pyproj

geod = pyproj.Geod(ellps='WGS84')
location = locator.geocode("Hackensack, NJ, USA")
azimuth1, azimuth2, distance = geod.inv(home_latitude, location.latitude, home_longitude, location.longitude)
distance = int(distance // 1000)
print(distance)

15087


In [6]:
# import file 'Bergen_County_Covid19_Cases.csv' from local dataset. It has # of caronavirus cases for every borough in Bergen county, NJ.

import types
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_48cbe6a955574f42bdf293db9085cbef = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='JwlV4cFFm0S7Cp3G3xvu-qo_QG12PjS4pKLjPZWZhfLr',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_48cbe6a955574f42bdf293db9085cbef.get_object(Bucket='courseracapstone-donotdelete-pr-bqfxymcsrfz1o5',Key='Bergen_County_Covis19_Cases.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

# If you are reading an Excel file into a pandas DataFrame, replace `read_csv` by `read_excel` in the next statement.
df_borough = pd.read_csv(body)
df_borough.head()


Unnamed: 0,Borough,Coronavirus_Positive_Cases
0,Allendale,54
1,Alpine,21
2,Bergenfield,726
3,Bogota,162
4,Carlstadt,89


In [7]:
# Getting latitude and longitude for every borough in Bergen County, NJ

latitude = [] 
longitude = []

for value in df_borough["Borough"]: 
    locator = Nominatim(user_agent="myGeocoder")
    location = locator.geocode(value + ", NJ, USA")
    latitude.append(location.latitude)
    longitude.append(location.longitude)
    print("{}: {},{}".format(value, location.latitude, location.longitude))
       
df_borough["Latitude"] = latitude   
df_borough["Longitude"] = longitude   
df_borough.head()


Allendale: 41.0414858,-74.129032
Alpine: 40.94855,-73.9281139
Bergenfield: 40.9275987,-73.9973608
Bogota: 40.8762106,-74.0298621
Carlstadt: 40.8403778,-74.0906974
Cliffside Park: 40.8214894,-73.9876388
Closter: 40.9731536,-73.9615262
Cresskill: 40.9414874,-73.9593041
Demarest: 40.9573205,-73.9634708
Dumont: 40.9406541,-73.9968051
East Rutherford: 40.833989,-74.0970865
Edgewater: 40.8270448,-73.975694
Elmwood Park: 40.9039878,-74.118476
Emerson: 40.976209,-74.0262505
Englewood: 40.8928771,-73.9726381
Englewood Cliffs: 40.8853773,-73.9523597
Fair Lawn: 40.9403762,-74.1318096
Fairview: 40.8126006,-73.999028
Fort Lee: 40.8509333,-73.9701382
Franklin Lakes: 41.0167639,-74.2057012
Garfield: 40.8814882,-74.113198
Glen Rock: 40.9628758,-74.1329208
Hackensack: 40.8871438,-74.0410865
Harrington Park: 40.9837089,-73.9798601
Hasbrouck Heights: 40.8579293,-74.0816693
Haworth: 40.9609315,-73.9901383
Hillsdale: 41.0025975,-74.0404175
Ho-Ho-Kus: 40.9976524,-74.110284
Leonia: 40.8617353,-73.9880495
Lit

Unnamed: 0,Borough,Coronavirus_Positive_Cases,Latitude,Longitude
0,Allendale,54,41.041486,-74.129032
1,Alpine,21,40.94855,-73.928114
2,Bergenfield,726,40.927599,-73.997361
3,Bogota,162,40.876211,-74.029862
4,Carlstadt,89,40.840378,-74.090697


In [7]:
print('Total {} boroughs in Bergen County, NJ'.format(df_borough.shape[0]))

Total 70 boroughs


In [8]:
# Hackensack is the center of Bergen County, NJ. Get latitude and longitude for the center of Bergen County.
locator = Nominatim(user_agent="myGeocoder")
location = locator.geocode("Hackensack, NJ, USA")
county_latitude = location.latitude
county_longitude = location.longitude

In [9]:
!pip install folium==0.5.0
import folium 

Collecting folium==0.5.0
[?25l  Downloading https://files.pythonhosted.org/packages/07/37/456fb3699ed23caa0011f8b90d9cad94445eddc656b601e6268090de35f5/folium-0.5.0.tar.gz (79kB)
[K     |████████████████████████████████| 81kB 7.4MB/s eta 0:00:011
[?25hCollecting branca (from folium==0.5.0)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... [?25ldone
[?25h  Stored in directory: /home/dsxuser/.cache/pip/wheels/f8/98/ff/954791afc47740d554f0d9e5885fa09dd60c2265d42578e665
Successfully built folium
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.5.0


In [10]:
# create map of Bergen County, NJ using latitude and longitude values
map_bergen_county = folium.Map(location=[county_latitude, county_longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, case in zip(df_borough['Latitude'], df_borough['Longitude'], df_borough['Borough'], df_borough['Coronavirus_Positive_Cases']):
    label = '{}: [{} cases]'.format(borough, case)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bergen_county)  
    
map_bergen_county

In [11]:
CLIENT_ID = 'DTI3F3LYBQK3MGGWQFEWIZGSQWKUC0ON4ZOOZXQPFWPOJPXC' # your Foursquare ID
CLIENT_SECRET = 'CDLNKMXNCTLPUKSXHCBWUE3CRLQ1RAYQFPCY0W53YPJI0DUI' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
ACCESS_TOKEN = 'X10MOF5IU5WTDA3QC0ORNA4GULI2N3EF4E1UYRU0DS0SNOKZ'

print('My Foursquare credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
print('ACCESS_TOKEN:' + ACCESS_TOKEN)

My Foursquare credentails:
CLIENT_ID: DTI3F3LYBQK3MGGWQFEWIZGSQWKUC0ON4ZOOZXQPFWPOJPXC
CLIENT_SECRET:CDLNKMXNCTLPUKSXHCBWUE3CRLQ1RAYQFPCY0W53YPJI0DUI
ACCESS_TOKEN:X10MOF5IU5WTDA3QC0ORNA4GULI2N3EF4E1UYRU0DS0SNOKZ


In [12]:
# Test FourSquare API with neighborhood data
neighborhood_latitude = 40.888356
neighborhood_longitude = -74.040095
neighborhood_name = 'Hackensack'


In [15]:
# Test FourSquare API with neighborhood data. county_latitude, county_longitude is the geospacial data for Hackensack, center of Bergen county.
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, county_latitude, county_longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=DTI3F3LYBQK3MGGWQFEWIZGSQWKUC0ON4ZOOZXQPFWPOJPXC&client_secret=CDLNKMXNCTLPUKSXHCBWUE3CRLQ1RAYQFPCY0W53YPJI0DUI&ll=40.8871438,-74.0410865&v=20180605&radius=500&limit=100'

In [16]:
results = requests.get(url).json()

In [17]:
# function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.id', 'venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,id,name,categories,lat,lng
0,4b968656f964a52022d234e3,Bangkok Garden,Thai Restaurant,40.886494,-74.041593
1,4e21a424e4cdf68591870f3e,Colombia Bakery,Bakery,40.889106,-74.040124
2,4c53850bb3b09c74a1b613b4,Bohemia Restaurant,Latin American Restaurant,40.887356,-74.041186
3,4bc0a0ca4cdfc9b6a61e9321,Birkenmeier Sport Shop,Sporting Goods Shop,40.885433,-74.042643
4,4a9616eaf964a520fc2520e3,White Manna Hamburgers,Burger Joint,40.889151,-74.038539


In [21]:
# function that extract nearby venues based on latitude and longitude

def getNearbyVenues(names, cases, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, case, lat, lng in zip(names, cases, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            case,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Coronavirus Positive Cases',
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue id',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [22]:
# extract venues for every borough in Bergen county, NJ
bergen_venues = getNearbyVenues(names=df_borough['Borough'],
                                cases=df_borough['Coronavirus_Positive_Cases'],
                                latitudes=df_borough['Latitude'],
                                longitudes=df_borough['Longitude']
                                )
print("Neighborhood venues extracted.")

Neighborhood venues extracted.


In [23]:
print(bergen_venues.shape)
bergen_venues.head()

(1395, 9)


Unnamed: 0,Neighborhood,Coronavirus Positive Cases,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue id,Venue Latitude,Venue Longitude,Venue Category
0,Bergenfield,726,40.927599,-73.997361,Chapala Grill,4baff2e6f964a520b02d3ce3,40.926046,-73.996719,Mexican Restaurant
1,Bergenfield,726,40.927599,-73.997361,Christine's Bake Shop,4b49f4d3f964a520b07626e3,40.928778,-73.99797,Bakery
2,Bergenfield,726,40.927599,-73.997361,Tommy Fox's,4b3617f3f964a520163025e3,40.926707,-73.996453,Pub
3,Bergenfield,726,40.927599,-73.997361,U.M.A. United Martial Arts Taekwondo,535acd65498e70b6d315b6fc,40.926948,-73.996311,Martial Arts Dojo
4,Bergenfield,726,40.927599,-73.997361,Empanada Mania,4ed92ffb6c2547bc612ccc96,40.92571,-73.996562,Empanada Restaurant


In [24]:
print('There are {} uniques categories.'.format(len(bergen_venues['Venue Category'].unique())))

There are 212 uniques categories.


In [26]:
# Filter to get supermarkets
column_filters = {'Venue Category': ['Supermarket', 'Grocery', 'Farmers Market']}
for k in column_filters.keys():
    column_filters[k] = '|'.join(column_filters[k])
column_filters

bergen_supermarket = bergen_venues[bergen_venues['Venue Category'].str.contains(column_filters['Venue Category'], case=False)].reset_index(drop=True)

bergen_supermarket.shape

(33, 9)

In [25]:
# Filter to get restaurants
column_filters = {'Venue Category': ['Restaurant', 'Pizza']}
for k in column_filters.keys():
    column_filters[k] = '|'.join(column_filters[k])
column_filters

bergen_restaurant = bergen_venues[bergen_venues['Venue Category'].str.contains(column_filters['Venue Category'], case=False)].reset_index(drop=True)

bergen_restaurant.shape

(480, 9)

In [27]:
# search detail for one venue based on venue id
# https://api.foursquare.com/v2/venues/venue_id?client_id=client_id&client_secret=client_secret&v=version
venue_id = '4b5762e8f964a520243528e3'
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}&oauth_token={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION, ACCESS_TOKEN)
url

'https://api.foursquare.com/v2/venues/4b5762e8f964a520243528e3?client_id=DTI3F3LYBQK3MGGWQFEWIZGSQWKUC0ON4ZOOZXQPFWPOJPXC&client_secret=CDLNKMXNCTLPUKSXHCBWUE3CRLQ1RAYQFPCY0W53YPJI0DUI&v=20180605&oauth_token=X10MOF5IU5WTDA3QC0ORNA4GULI2N3EF4E1UYRU0DS0SNOKZ'

In [28]:
results = requests.get(url).json()

In [31]:
# Function that use FourSquare API to get venue rating and open status, and use geod to calculate distance to my home

def getRatingDistanceOpenStatus(boroughs, cases, venues, venue_ids, categories, venue_latitudes, venue_longitudes):
    
    venues_list=[]
    for borough, case, venue, venue_id, category, venue_latitude, venue_longitude in zip(boroughs, cases, venues, venue_ids, categories, venue_latitudes, venue_longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}&oauth_token={}'.format(
            venue_id, 
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            ACCESS_TOKEN)
            
        # make the GET request
        results = requests.get(url).json()
        
        # get rating and open status
        try:
            rating = results['response']['venue']['rating']
        except:
            rating = 'N/A'
        
        try:
            openStatus = results['response']['venue']['hours']['status']
        except:
            openStatus = 'N/A'
        
        # Calculate distance to my home
        azimuth1, azimuth2, distance = geod.inv(home_latitude, venue_latitude, home_longitude, venue_longitude)
        distance = int(distance // 1000)
        #print('{} {} {} {}'.format(venue, rating, distance, openStatus))
        
        # filter our venues without rating or venues not open
        if rating != 'N/A' and openStatus != 'N/A':
            venues_list.append([(
            venue, 
            category,
            borough, 
            case,
            rating,
            distance,
            venue_latitude,
            venue_longitude,
            openStatus)])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
                    'Venue', 
                    'Venue Category',
                    'Borough', 
                    'Coronavirus Positive Cases',
                    'Rating',
                    'Distance',
                    'Venue Latitude',
                    'Venue Longitude',
                    'Open Status']
    
    return(nearby_venues)

In [30]:
# Get rating, open status, and distance for supermarkets
bergen_open_supermarket = getRatingDistanceOpenStatus(boroughs=bergen_supermarket['Neighborhood'],
                                cases=bergen_supermarket['Coronavirus Positive Cases'],
                                venues=bergen_supermarket['Venue'],
                                venue_ids=bergen_supermarket['Venue id'],
                                venue_latitudes=bergen_supermarket['Venue Latitude'],
                                venue_longitudes=bergen_supermarket['Venue Longitude'],                 
                                categories=bergen_supermarket['Venue Category']
                                )

bergen_open_supermarket.head()

Food Bazaar Supermarket 8.0 15081 Open until 10:30 PM
Nizam Turkish Supermarket 7.9 15082 N/A
Stop & Shop 5.9 15097 Closed until 7:30 AM tomorrow
Kings Food Markets 7.5 15094 N/A
Super Stop & Shop 6.8 15093 Closed until 7:30 AM tomorrow
Aldi's 7.2 15080 Closed until 9:00 AM tomorrow
Whole Foods Market 8.5 15082 Closed until 9:00 AM tomorrow
ShopRite of Englewood 7.2 15090 Open
Englewood Farmer's Market N/A 15090 N/A
Price Rite Of Garfield 6.9 15085 Closed until 8:00 AM tomorrow
Kilroy's Wonder Market 7.6 15093 Closed until 8:00 AM tomorrow
Giant Farmers Market 7.3 15087 N/A
Kings Food Markets 7.6 15099 Closed until 7:00 AM tomorrow
H-Mart 6.5 15086 N/A
ShopRite of Lodi 7.3 15085 Open until 11:59 PM
ShopRite 6.9 15078 Open until Midnight
Quick Mart N/A 15082 N/A
Krauszers 7.7 15097 N/A
Trader Joe's 9.2 15093 Closed until 9:00 AM tomorrow
Ramsey Farmer's Market 8.8 15103 Closed until 9:00 AM Sunday
H&Y Marketplace 6.4 15082 N/A
Village IGA 5.5 15084 N/A
Whole Foods Market 7.9 15095 Close

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance,Venue Latitude,Venue Longitude,Open Status
0,Food Bazaar Supermarket,Supermarket,Cliffside Park,415,8.0,15081,40.821489,-73.99155,Open until 10:30 PM
1,Stop & Shop,Supermarket,Closter,72,5.9,15097,40.969031,-73.962986,Closed until 7:30 AM tomorrow
2,Super Stop & Shop,Supermarket,Dumont,278,6.8,15093,40.937412,-73.996043,Closed until 7:30 AM tomorrow
3,Aldi's,Supermarket,East Rutherford,123,7.2,15080,40.831393,-74.092718,Closed until 9:00 AM tomorrow
4,Whole Foods Market,Grocery Store,Edgewater,119,8.5,15082,40.824706,-73.973828,Closed until 9:00 AM tomorrow


In [32]:
print('Total supermarkets: {}'.format(bergen_supermarket.shape[0]))
print('Total open supermarkets: {}'.format(bergen_open_supermarket.shape[0]))

Total supermarkets: 33
Total open supermarkets: 19


In [33]:
# sort by low caronavirus cases, high venue rating, short distance to my home
bergen_open_supermarket.sort_values(['Coronavirus Positive Cases', 'Rating', 'Distance'], ascending=[True, False, True])

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance,Venue Latitude,Venue Longitude,Open Status
1,Stop & Shop,Supermarket,Closter,72,5.9,15097,40.969031,-73.962986,Closed until 7:30 AM tomorrow
17,Super Foodtown,Supermarket,Washington Twp.,89,7.3,15097,40.986631,-74.064956,Closed until 7:00 AM tomorrow
7,Kilroy's Wonder Market,Grocery Store,Glen Rock,99,7.6,15093,40.962113,-74.127409,Closed until 8:00 AM tomorrow
8,Kings Food Markets,Grocery Store,Hillsdale,101,7.6,15099,41.001234,-74.041039,Closed until 7:00 AM tomorrow
12,Ramsey Farmer's Market,Farmers Market,Ramsey,109,8.8,15103,41.056289,-74.142212,Closed until 9:00 AM Sunday
4,Whole Foods Market,Grocery Store,Edgewater,119,8.5,15082,40.824706,-73.973828,Closed until 9:00 AM tomorrow
3,Aldi's,Supermarket,East Rutherford,123,7.2,15080,40.831393,-74.092718,Closed until 9:00 AM tomorrow
16,Stop & Shop,Supermarket,Tenafly,141,5.7,15093,40.926974,-73.965214,Closed until 7:30 AM tomorrow
18,Trader Joe's,Grocery Store,Westwood,177,8.3,15098,40.99295,-74.035835,Closed until 10:00 AM tomorrow
13,Whole Foods Market,Grocery Store,Ridgewood,262,7.9,15095,40.978418,-74.122593,Closed until 9:00 AM tomorrow


In [34]:
# clustering venues using K-Means

# set number of clusters
kclusters = 5

# set criteria for K-Means
bergen_supermarket_clustering = bergen_open_supermarket[['Coronavirus Positive Cases', 'Rating', 'Distance']]

# normalize 
normalized_df=(bergen_supermarket_clustering-bergen_supermarket_clustering.mean())/bergen_supermarket_clustering.std()

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(normalized_df)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 3, 3, 1, 1, 4, 4, 0, 0, 4, 1, 2, 0, 0, 3, 1, 3, 0, 0],
      dtype=int32)

In [35]:
# add clustering labels
bergen_open_supermarket.insert(2, 'Cluster Labels', kmeans.labels_)

bergen_open_supermarket.head()

Unnamed: 0,Venue,Venue Category,Cluster Labels,Borough,Coronavirus Positive Cases,Rating,Distance,Venue Latitude,Venue Longitude,Open Status
0,Food Bazaar Supermarket,Supermarket,1,Cliffside Park,415,8.0,15081,40.821489,-73.99155,Open until 10:30 PM
1,Stop & Shop,Supermarket,3,Closter,72,5.9,15097,40.969031,-73.962986,Closed until 7:30 AM tomorrow
2,Super Stop & Shop,Supermarket,3,Dumont,278,6.8,15093,40.937412,-73.996043,Closed until 7:30 AM tomorrow
3,Aldi's,Supermarket,1,East Rutherford,123,7.2,15080,40.831393,-74.092718,Closed until 9:00 AM tomorrow
4,Whole Foods Market,Grocery Store,1,Edgewater,119,8.5,15082,40.824706,-73.973828,Closed until 9:00 AM tomorrow


In [36]:
# create map for supermarket clustering
map_supermarket_clusters = folium.Map(location=[county_latitude, county_longitude], zoom_start=11)

# add my home
folium.CircleMarker(
        [home_latitude, home_longitude],
        radius=9,
        popup='Home',
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_supermarket_clusters)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, venue, cluster, cases, rating in zip(bergen_open_supermarket['Venue Latitude'], bergen_open_supermarket['Venue Longitude'], bergen_open_supermarket['Venue'], bergen_open_supermarket['Cluster Labels'], bergen_open_supermarket['Coronavirus Positive Cases'], bergen_open_supermarket['Rating']):
    label = folium.Popup(str(venue) + ' [cluster: ' + str(cluster) + ', Coronavirus cases: ' + str(cases) + ', Rating: ' + str(rating) + ']', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_supermarket_clusters)
       
map_supermarket_clusters

In [37]:
# supermarket - cluster 0 - low coronuavirus cases, high rating, close to my home
bergen_open_supermarket.loc[bergen_open_supermarket['Cluster Labels'] == 0, bergen_open_supermarket.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
7,Kilroy's Wonder Market,Grocery Store,Glen Rock,99,7.6,15093
8,Kings Food Markets,Grocery Store,Hillsdale,101,7.6,15099
12,Ramsey Farmer's Market,Farmers Market,Ramsey,109,8.8,15103
13,Whole Foods Market,Grocery Store,Ridgewood,262,7.9,15095
17,Super Foodtown,Supermarket,Washington Twp.,89,7.3,15097
18,Trader Joe's,Grocery Store,Westwood,177,8.3,15098


In [38]:
# supermarket - cluster 1 - high coronavirus cases, low rating, far from my home
bergen_open_supermarket.loc[bergen_open_supermarket['Cluster Labels'] == 1, bergen_open_supermarket.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
0,Food Bazaar Supermarket,Supermarket,Cliffside Park,415,8.0,15081
3,Aldi's,Supermarket,East Rutherford,123,7.2,15080
4,Whole Foods Market,Grocery Store,Edgewater,119,8.5,15082
10,ShopRite,Supermarket,Lyndhurst,331,6.9,15078
15,Acme,Supermarket,Saddle Brook,267,7.1,15087


In [39]:
# supermarket - cluster 2 - high coronavirus cases, high rating, and close to my home
bergen_open_supermarket.loc[bergen_open_supermarket['Cluster Labels'] == 2, bergen_open_supermarket.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
11,Trader Joe's,Grocery Store,Paramus,831,9.2,15093


In [40]:
# supermarket - cluster 3 - low coronavirus cases, low rating, close to my home
bergen_open_supermarket.loc[bergen_open_supermarket['Cluster Labels'] == 3, bergen_open_supermarket.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
1,Stop & Shop,Supermarket,Closter,72,5.9,15097
2,Super Stop & Shop,Supermarket,Dumont,278,6.8,15093
14,Super Stop & Shop,Supermarket,Ridgewood,262,5.8,15095
16,Stop & Shop,Supermarket,Tenafly,141,5.7,15093


In [41]:
# supermarket - cluster 4 - medium coronavirus cases and high rating, far from my home
bergen_open_supermarket.loc[bergen_open_supermarket['Cluster Labels'] == 4, bergen_open_supermarket.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
5,ShopRite of Englewood,Grocery Store,Englewood,690,7.2,15090
6,Price Rite Of Garfield,Grocery Store,Garfield,669,6.9,15085
9,ShopRite of Lodi,Grocery Store,Lodi,574,7.3,15085


In [43]:
# Filter to get restaurants
column_filters = {'Venue Category': ['Restaurant', 'Pizza']}
for k in column_filters.keys():
    column_filters[k] = '|'.join(column_filters[k])
column_filters

bergen_restaurant = bergen_venues[bergen_venues['Venue Category'].str.contains(column_filters['Venue Category'], case=False)].reset_index(drop=True)

bergen_restaurant.head(15)

Unnamed: 0,Neighborhood,Coronavirus Positive Cases,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue id,Venue Latitude,Venue Longitude,Venue Category
0,Bergenfield,726,40.927599,-73.997361,Chapala Grill,4baff2e6f964a520b02d3ce3,40.926046,-73.996719,Mexican Restaurant
1,Bergenfield,726,40.927599,-73.997361,Empanada Mania,4ed92ffb6c2547bc612ccc96,40.92571,-73.996562,Empanada Restaurant
2,Bergenfield,726,40.927599,-73.997361,Nick's Pizza,4b8feb35f964a520886933e3,40.9277,-73.99818,Pizza Place
3,Bergenfield,726,40.927599,-73.997361,Cusinera,4c26424a136d20a1a1fee461,40.92561,-73.996678,Asian Restaurant
4,Bergenfield,726,40.927599,-73.997361,Bamboo Grill,4b7c8698f964a520ce982fe3,40.925985,-73.996648,Asian Restaurant
5,Bergenfield,726,40.927599,-73.997361,Papa John's Pizza,4c166694a9c220a15a945a9d,40.926377,-73.997253,Pizza Place
6,Bergenfield,726,40.927599,-73.997361,Asian-American Food Mart,4c0e78ff512f76b00fdd7a11,40.928696,-73.997985,Asian Restaurant
7,Bergenfield,726,40.927599,-73.997361,Nihon Kai Japanese Cuisine,4bd3725cb221c9b672ced9d0,40.926393,-73.996404,Japanese Restaurant
8,Bergenfield,726,40.927599,-73.997361,Restaurante La Candelaria,5151e932e4b0f967e95b95fb,40.925486,-73.996489,Mexican Restaurant
9,Bogota,162,40.876211,-74.029862,Mona Lisa's Cafe,4bff0767c30a2d7f7c68101d,40.878092,-74.029431,Pizza Place


In [44]:
# Restaurant clustering
bergen_open_restaurant = getRatingDistanceOpenStatus(boroughs=bergen_restaurant['Neighborhood'],
                                cases=bergen_restaurant['Coronavirus Positive Cases'],
                                venues=bergen_restaurant['Venue'],
                                venue_ids=bergen_restaurant['Venue id'],
                                venue_latitudes=bergen_restaurant['Venue Latitude'],
                                venue_longitudes=bergen_restaurant['Venue Longitude'],                        
                                categories=bergen_restaurant['Venue Category']
                                )

bergen_open_restaurant.head()

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance,Venue Latitude,Venue Longitude,Open Status
0,Chapala Grill,Mexican Restaurant,Bergenfield,726,8.4,15092,40.926046,-73.996719,Closed until 11:30 AM tomorrow
1,Empanada Mania,Empanada Restaurant,Bergenfield,726,7.7,15092,40.92571,-73.996562,Closed until Noon tomorrow
2,Nick's Pizza,Pizza Place,Bergenfield,726,7.5,15092,40.9277,-73.99818,Closed until 11:00 AM tomorrow
3,Cusinera,Asian Restaurant,Bergenfield,726,7.4,15092,40.92561,-73.996678,Closed until 8:00 AM tomorrow
4,Bamboo Grill,Asian Restaurant,Bergenfield,726,6.7,15092,40.925985,-73.996648,Closed until 11:00 AM tomorrow


In [47]:
# Sort restaurants by low coronavirus cases, high rating, and short distance to my home
bergen_open_restaurant.sort_values(['Coronavirus Positive Cases', 'Rating', 'Distance'], ascending=[True, False, True]).head(50)

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance,Venue Latitude,Venue Longitude,Open Status
188,Chick-fil-A,Fast Food Restaurant,Teterboro,5,8.0,15084,40.859524,-74.061317,Closed until 11:00 AM tomorrow
190,The Halal Guys,Halal Restaurant,Teterboro,5,7.7,15084,40.860972,-74.061633,Open until 11:00 PM
189,Chipotle Mexican Grill,Mexican Restaurant,Teterboro,5,7.6,15084,40.860623,-74.061912,Closed until 10:45 AM tomorrow
191,Blaze Pizza,Pizza Place,Teterboro,5,7.3,15084,40.860946,-74.061576,Closed until 11:00 AM tomorrow
192,Sarku Japan,Sushi Restaurant,Teterboro,5,6.9,15084,40.860908,-74.061644,Closed until 11:00 AM tomorrow
101,Ho-Ho-Kus Inn & Tavern,American Restaurant,Ho-Ho-Kus,42,8.7,15097,40.997717,-74.109788,Closed until 11:30 AM tomorrow
102,St. Eve's,American Restaurant,Ho-Ho-Kus,42,8.4,15097,40.996394,-74.110246,Closed until Noon tomorrow
103,Cafe Amici,American Restaurant,Ho-Ho-Kus,42,8.3,15097,40.997266,-74.109164,Closed until 11:00 AM tomorrow
104,Sicilian Sun,Italian Restaurant,Ho-Ho-Kus,42,6.7,15097,40.996267,-74.110514,Closed until 4:30 PM tomorrow
23,Yasou Mykonos,Greek Restaurant,Demarest,45,8.4,15096,40.957303,-73.963558,Closed until 11:00 AM tomorrow


In [46]:
print('Total restaurants: {}'.format(bergen_restaurant.shape[0]))
print('Total open restaurants: {}'.format(bergen_open_restaurant.shape[0]))

Total restaurants: 480
Total open restaurants: 205


In [48]:
# set number of clusters
kclusters = 5

# set criteria for clustering
bergen_restaurant_clustering = bergen_open_restaurant[['Coronavirus Positive Cases', 'Rating', 'Distance']]

# normalize 
normalized_df=(bergen_restaurant_clustering-bergen_restaurant_clustering.mean())/bergen_restaurant_clustering.std()

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(normalized_df)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 3, 3, 3, 3, 4, 0, 4, 4], dtype=int32)

In [49]:
# add clustering labels
bergen_open_restaurant.insert(2, 'Cluster Labels', kmeans.labels_)

bergen_open_restaurant.head()


Unnamed: 0,Venue,Venue Category,Cluster Labels,Borough,Coronavirus Positive Cases,Rating,Distance,Venue Latitude,Venue Longitude,Open Status
0,Chapala Grill,Mexican Restaurant,3,Bergenfield,726,8.4,15092,40.926046,-73.996719,Closed until 11:30 AM tomorrow
1,Empanada Mania,Empanada Restaurant,3,Bergenfield,726,7.7,15092,40.92571,-73.996562,Closed until Noon tomorrow
2,Nick's Pizza,Pizza Place,3,Bergenfield,726,7.5,15092,40.9277,-73.99818,Closed until 11:00 AM tomorrow
3,Cusinera,Asian Restaurant,3,Bergenfield,726,7.4,15092,40.92561,-73.996678,Closed until 8:00 AM tomorrow
4,Bamboo Grill,Asian Restaurant,3,Bergenfield,726,6.7,15092,40.925985,-73.996648,Closed until 11:00 AM tomorrow


In [50]:
# create map
map_restaurant_clusters = folium.Map(location=[county_latitude, county_longitude], zoom_start=12)

# add my home
folium.CircleMarker(
        [home_latitude, home_longitude],
        radius=9,
        popup='Home',
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_restaurant_clusters)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, venue, cluster, cases, rating in zip(bergen_open_restaurant['Venue Latitude'], bergen_open_restaurant['Venue Longitude'], bergen_open_restaurant['Venue'], bergen_open_restaurant['Cluster Labels'], bergen_open_restaurant['Coronavirus Positive Cases'], bergen_open_restaurant['Rating']):
    label = folium.Popup(str(venue) + ' [cluster: ' + str(cluster) + ', Coronavirus cases: ' + str(cases) + ', Rating: ' + str(rating) + ']', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_restaurant_clusters)
       
map_restaurant_clusters

In [54]:
# restaurant - cluster 0 - low coronuavirus cases, low rating, far from my home
bergen_open_restaurant.loc[bergen_open_restaurant['Cluster Labels'] == 0, bergen_open_restaurant.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
7,The Balcony Restaurant and Lounge,Restaurant,Carlstadt,89,7.3,15081
16,Szechaun Mirchi,Asian Restaurant,Cliffside Park,415,7.0,15082
30,Papa John's Pizza,Pizza Place,East Rutherford,123,6.2,15080
36,Roberto's II Ristorante,Italian Restaurant,Edgewater,119,7.2,15082
38,Villa Amalfi Ristorante,Italian Restaurant,Edgewater,119,6.6,15082
39,That Fondue Place,Fondue Restaurant,Edgewater,119,6.0,15082
40,Havana's Cuban Cafe & Restaurant,Cuban Restaurant,Edgewater,119,6.4,15083
41,Burger King,Fast Food Restaurant,Edgewater,119,5.7,15083
69,Bistro Du Saigon,Vietnamese Restaurant,Fort Lee,386,7.3,15085
70,Pizza Nova,Pizza Place,Fort Lee,386,7.3,15085


In [55]:
# restaurant - cluster 1 - low coronuavirus cases, close to my home
bergen_open_restaurant.loc[bergen_open_restaurant['Cluster Labels'] == 1, bergen_open_restaurant.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
19,Carmine & Ray's Pizza,Pizza Place,Closter,72,6.3,15098
27,Il Mulino Restaurant,Italian Restaurant,Dumont,278,7.3,15094
28,Domino's Pizza,Pizza Place,Dumont,278,6.8,15093
44,Boston Market,American Restaurant,Emerson,164,6.3,15097
86,The Glen Rock Inn,American Restaurant,Glen Rock,99,6.5,15093
98,Della Cucina Ristorante & Pizzeria,Italian Restaurant,Hillsdale,101,7.3,15099
99,Cornerstone,Restaurant,Hillsdale,101,7.1,15099
100,Wendy’s,Fast Food Restaurant,Hillsdale,101,6.5,15099
104,Sicilian Sun,Italian Restaurant,Ho-Ho-Kus,42,6.7,15097
119,Fionas Restorante,Italian Restaurant,Midland Park,80,7.3,15095


In [56]:
# restaurant - cluster 2 - high coronuavirus cases, close to my home
bergen_open_restaurant.loc[bergen_open_restaurant['Cluster Labels'] == 2, bergen_open_restaurant.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
17,Zendiggi Kebab House,Middle Eastern Restaurant,Closter,72,8.1,15098
18,Ma Mi,Vietnamese Restaurant,Closter,72,7.8,15098
20,Farmhouse Cafe & Eatery,American Restaurant,Cresskill,96,8.3,15094
21,Hanami,Chinese Restaurant,Cresskill,96,7.8,15095
22,Umeya,Sushi Restaurant,Cresskill,96,7.7,15094
23,Yasou Mykonos,Greek Restaurant,Demarest,45,8.4,15096
24,Uncle Frank's Pizza,Pizza Place,Dumont,278,8.1,15094
25,Italian Connection Pizza,Pizza Place,Dumont,278,7.7,15094
26,Cuban Eddies,Cuban Restaurant,Dumont,278,8.0,15094
42,Marra's,Italian Restaurant,Emerson,164,7.7,15096


In [57]:
# restaurant - cluster 3 - medium coronuavirus cases, medium distance to my home
bergen_open_restaurant.loc[bergen_open_restaurant['Cluster Labels'] == 3, bergen_open_restaurant.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
0,Chapala Grill,Mexican Restaurant,Bergenfield,726,8.4,15092
1,Empanada Mania,Empanada Restaurant,Bergenfield,726,7.7,15092
2,Nick's Pizza,Pizza Place,Bergenfield,726,7.5,15092
3,Cusinera,Asian Restaurant,Bergenfield,726,7.4,15092
4,Bamboo Grill,Asian Restaurant,Bergenfield,726,6.7,15092
5,Papa John's Pizza,Pizza Place,Bergenfield,726,6.5,15092
45,Cassie's Restaurant,Pizza Place,Englewood,690,8.6,15089
46,Akai Lounge,Japanese Restaurant,Englewood,690,8.5,15089
47,Bennies,Middle Eastern Restaurant,Englewood,690,8.5,15089
48,Blue Moon Mexican Cafe,Mexican Restaurant,Englewood,690,7.8,15089


In [58]:
# restaurant - cluster 4 - low coronuavirus cases, high rating, far from my home
bergen_open_restaurant.loc[bergen_open_restaurant['Cluster Labels'] == 4, bergen_open_restaurant.columns[[0]+[1]+[3]+[4]+[5]+[6]]]

Unnamed: 0,Venue,Venue Category,Borough,Coronavirus Positive Cases,Rating,Distance
6,La Famiglia,Italian Restaurant,Bogota,162,8.0,15087
8,Rudy's Seafood Restaurant & Tavern,Italian Restaurant,Cliffside Park,415,8.4,15082
9,Doner Kebab House,Doner Restaurant,Cliffside Park,415,8.4,15081
10,Hakkı Baba,Turkish Restaurant,Cliffside Park,415,7.8,15082
11,Vincent's Pizzeria,Pizza Place,Cliffside Park,415,7.7,15082
12,Silli Point Indian Fusion,Indian Restaurant,Cliffside Park,415,7.6,15081
13,Blackbeard Ramen,Ramen Restaurant,Cliffside Park,415,7.8,15082
14,Petite Soochow | 小苏州,Chinese Restaurant,Cliffside Park,415,7.9,15082
15,House Malaysian,Malay Restaurant,Cliffside Park,415,7.9,15082
29,Caffe Capri,Italian Restaurant,East Rutherford,123,8.5,15080
