# 1. Introduction

Assume I own a consulting firm that provides location services. The clients can be anyone who believes location matters for their business: restaurants, banks, fitness centers, barbershops and other shops.

During the pandemic, many restaurants were shut down, and some had to close permanently. As the business shutdown orders are lifted, new restaurants are emerging to meet the demand for food in big cities. In this capstone project, I use an Italian restaurant as the example problem: one of my clients plans to open an Italian restaurant in Boston and wants us to help him choose the best of three candidate locations.

## Audience

Because we can serve any kind of business that needs a good location, our audience is companies, organizations and business people who want to find the location where their business will perform best.

## Problem to solve

The problem to solve is to choose the best of three candidate locations in Boston for a new Italian restaurant.

## Criteria of the best location

Different businesses define the best location by different criteria. In this project, the criteria for the best location for the Italian restaurant are as follows.

1. There are no more than two Italian Restaurants within 800 meters.
2. There is at least one big park within 800 meters. 
3. There is at least one big mall within 800 meters.


# 2. Data

1. Assume we already know the addresses of three candidate locations in Boston (for example, rental spaces suitable for a restaurant that are available at the same time).
2. Park and restaurant data will be obtained from Foursquare, which provides venues near the three candidate locations.
3. Source of Boston zipcode data: https://bostonopendata-boston.opendata.arcgis.com/datasets/53ea466a189b4f43b3dfb7b38fa7f3b6_1
4. Source of Massachusetts zipcode and latitude/longitude: https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?q=MA
5. Shopping center data will be obtained from this webpage https://www.bostoncentral.com/shopping.php.


# 3. Methodology

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import csv
import geocoder

# Explore Boston by zip code

## Prepare data : (1) Read Boston zip code information from a csv file

In [2]:
df_Boston_zipcode = pd.read_csv('Boston_Zip_Codes.csv')
#df_Boston_zipcode.info
df_Boston_zipcode.head()

Unnamed: 0,OBJECTID,ZIP5,ShapeSTArea,ShapeSTLength
0,1,2134,37219360.0,40794.1824
1,2,2125,64760520.0,62224.52144
2,3,2110,6637284.0,18358.2135
3,4,2118,31161580.0,32353.40762
4,5,2126,60785850.0,45488.39471


In [3]:
# Rename column ZIP5 to Zip so it matches the key column of the MA lat/lng file

df_Boston_zipcode.rename(columns={'ZIP5':'Zip'}, inplace=True)
df_Boston_zipcode.head()

Unnamed: 0,OBJECTID,Zip,ShapeSTArea,ShapeSTLength
0,1,2134,37219360.0,40794.1824
1,2,2125,64760520.0,62224.52144
2,3,2110,6637284.0,18358.2135
3,4,2118,31161580.0,32353.40762
4,5,2126,60785850.0,45488.39471


In [4]:
df_Boston_zipcode.shape

(43, 4)

## Prepare data : (2) Read zip code and latitude/longitude of Massachusetts from a csv file. It includes Boston lat/lng information.

In [5]:
df_MA_zipcode_latlng = pd.read_csv('MA_Zip_Codes_latlng.csv',sep=';')
#df_Boston_zipcode.info
print(df_MA_zipcode_latlng.head())

    Zip            City State   Latitude  Longitude  Timezone  \
0  1085       Westfield    MA  42.133642  -72.75029        -5   
1  1340         Colrain    MA  42.673371  -72.73104        -5   
2  1886        Westford    MA  42.592086  -71.43754        -5   
3  2358  North Pembroke    MA  41.805219  -70.62642        -5   
4  1754         Maynard    MA  42.430781  -71.45594        -5   

   Daylight savings time flag             geopoint  
0                           1  42.133642,-72.75029  
1                           1  42.673371,-72.73104  
2                           1  42.592086,-71.43754  
3                           1  41.805219,-70.62642  
4                           1  42.430781,-71.45594  


## Prepare data : (3) Join Boston zip codes with MA zip code & lat/lng information

In [6]:
df_Boston_zip_latlng = pd.merge(df_Boston_zipcode, df_MA_zipcode_latlng, on='Zip', how='left')

In [7]:
df_Boston_zip_latlng.head()


Unnamed: 0,OBJECTID,Zip,ShapeSTArea,ShapeSTLength,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,1,2134,37219360.0,40794.1824,Allston,MA,42.355147,-71.13164,-5,1,"42.355147,-71.13164"
1,2,2125,64760520.0,62224.52144,Boston,MA,42.316852,-71.05811,-5,1,"42.316852,-71.05811"
2,3,2110,6637284.0,18358.2135,Boston,MA,42.356532,-71.05365,-5,1,"42.356532,-71.05365"
3,4,2118,31161580.0,32353.40762,Boston,MA,42.338724,-71.07276,-5,1,"42.338724,-71.07276"
4,5,2126,60785850.0,45488.39471,Mattapan,MA,42.272098,-71.09426,-5,1,"42.272098,-71.09426"


In [8]:
df_Boston_zip_latlng.shape

(43, 11)
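
One detail in the tables above is worth noting: the zip codes show up as 2134 rather than 02134, because pandas parses the column as integers and the leading zero is lost. The merge still works, since both files lose the zero in the same way, but the values no longer match the zip codes in a postal address. A minimal sketch of restoring the five-digit form follows; the helper name `format_zip` is hypothetical and not part of the notebook.

```python
def format_zip(zip_code):
    """Return a zip code as a five-character, zero-padded string."""
    return str(int(zip_code)).zfill(5)


print(format_zip(2134))
print(format_zip("2110"))
```

If the CSV files themselves store the leading zeros, reading the column as text (for example with the `dtype` argument of `pd.read_csv`) would avoid the problem at the source.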

## Prepare data: (4) Create dataframe of Boston with zipcode and latlng

In [9]:
Boston_zip_latlng = df_Boston_zip_latlng[['Zip', 'Latitude', 'Longitude']]
Boston_zip_latlng.head()

Unnamed: 0,Zip,Latitude,Longitude
0,2134,42.355147,-71.13164
1,2125,42.316852,-71.05811
2,2110,42.356532,-71.05365
3,2118,42.338724,-71.07276
4,2126,42.272098,-71.09426


In [10]:
Boston_zip_latlng.shape

(43, 3)

In [11]:
Boston_zip_latlng.groupby('Zip').count()

Zip,Latitude,Longitude
2021,1,1
2026,1,1
2108,1,1
2109,1,1
2110,1,1
2111,1,1
2113,1,1
2114,1,1
2115,1,1
2116,1,1


## Find duplicate records at zip code 2467
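
A possible way to surface such duplicates is sketched below on toy data. The frame `toy` stands in for `Boston_zip_latlng`, and the coordinates given for 2467 are placeholders, not values from the real files.

```python
import pandas as pd

# Toy stand-in for Boston_zip_latlng; the 2467 coordinates are placeholders
toy = pd.DataFrame({
    "Zip": [2134, 2467, 2467, 2125],
    "Latitude": [42.355147, 42.3, 42.3, 42.316852],
    "Longitude": [-71.13164, -71.2, -71.2, -71.05811],
})

# keep=False flags every member of a duplicated group, not just the repeats
dupes = toy[toy.duplicated(keep=False)]
print(dupes)

# Zip codes that appear on more than one row, with their row counts
counts = toy["Zip"].value_counts()
repeated = counts[counts > 1]
print(repeated)
```

Listing the duplicated rows before dropping them makes it easier to confirm that the repeats are truly identical and that nothing else is removed by accident.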

## Prepare data : (5) Remove duplicate records at zip code 2467

In [12]:
# Reassign instead of using inplace=True: Boston_zip_latlng was sliced from
# df_Boston_zip_latlng, so modifying it in place triggers a SettingWithCopyWarning
Boston_zip_latlng = Boston_zip_latlng.drop_duplicates()


In [13]:
Boston_zip_latlng.shape

(37, 3)

## Create a map of Boston zipcodes

In [14]:
# Get latitude and longitude of Boston centroid in order to center the map

from geopy.geocoders import Nominatim

address = 'Boston, Massachusetts, US'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Boston are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Boston are 42.3602534, -71.0582912.


In [19]:
import folium # map rendering library

Boston_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers and popups to map
for zipcode, lat, lng in zip(Boston_zip_latlng['Zip'], Boston_zip_latlng['Latitude'], Boston_zip_latlng['Longitude']):
    label = '{}'.format(zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Boston_map)  
    
Boston_map

To determine the best location for an Italian restaurant, we first need to understand how different kinds of businesses are distributed across Boston. To learn which kinds of businesses operate in each zip code, we obtain venue information for each zip code.

# Request venues through Foursquare

In [16]:
CLIENT_ID = 'S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F0GIN52JDY5FQ0J' # your Foursquare ID
CLIENT_SECRET = 'OVMTVGTK5EW0WM5OIMT2CMJ4MGN5EQNUUKRKU1IRCCAQ0MCL' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

## Define a function to return up to 100 venues within 500 meters of the center of each zipcode

In [21]:
def getNearbyVenues(zipcodes, latitudes, longitudes, radius=500):
    LIMIT = 100
    venues_list=[]
    for zipcode, lat, lng in zip(zipcodes, latitudes, longitudes):  
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        print(url)

        results = requests.get(url).json()["response"]['groups'][0]['items']
        #print (results[0])

        venues_list.append([(
            zipcode, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zip', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
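
`getNearbyVenues` indexes straight into the JSON response, so an error response or a venue without a category would raise a `KeyError` or `IndexError`. The sketch below shows one defensive way to flatten a response. It assumes the same nested layout the function above relies on; the helper name `extract_venue_rows` and the sample payload are hypothetical, hand-written for illustration rather than real API output.

```python
def extract_venue_rows(payload, zipcode, lat, lng):
    """Flatten one explore-style response into a list of plain tuples.

    Assumes the layout that getNearbyVenues indexes into:
    payload["response"]["groups"][0]["items"][i]["venue"].
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Fall back to None instead of raising when a venue has no category
        category = categories[0].get("name") if categories else None
        rows.append((zipcode, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written payload in the assumed shape (not real API output)
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 42.0, "lng": -71.0},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 42.1, "lng": -71.1},
               "categories": []}},
]}]}}

rows = extract_venue_rows(fake_payload, 2134, 42.355147, -71.13164)
print(rows)
print(extract_venue_rows({}, 2134, 42.355147, -71.13164))
```

Keeping the parsing in a separate function also makes it testable without network access or API credentials.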

## Obtain venues returned by Foursquare service

In [22]:
Boston_venues = getNearbyVenues(zipcodes=Boston_zip_latlng['Zip'],
                                   latitudes=Boston_zip_latlng['Latitude'],
                                   longitudes=Boston_zip_latlng['Longitude']
                                )
  

https://api.foursquare.com/v2/venues/explore?&client_id=S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F0GIN52JDY5FQ0J&client_secret=OVMTVGTK5EW0WM5OIMT2CMJ4MGN5EQNUUKRKU1IRCCAQ0MCL&v=20180605&ll=42.355146999999995,-71.13164&radius=500&limit=100
https://api.foursquare.com/v2/venues/explore?&client_id=S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F0GIN52JDY5FQ0J&client_secret=OVMTVGTK5EW0WM5OIMT2CMJ4MGN5EQNUUKRKU1IRCCAQ0MCL&v=20180605&ll=42.316852000000004,-71.05811&radius=500&limit=100
https://api.foursquare.com/v2/venues/explore?&client_id=S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F0GIN52JDY5FQ0J&client_secret=OVMTVGTK5EW0WM5OIMT2CMJ4MGN5EQNUUKRKU1IRCCAQ0MCL&v=20180605&ll=42.356532,-71.05365&radius=500&limit=100
https://api.foursquare.com/v2/venues/explore?&client_id=S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F0GIN52JDY5FQ0J&client_secret=OVMTVGTK5EW0WM5OIMT2CMJ4MGN5EQNUUKRKU1IRCCAQ0MCL&v=20180605&ll=42.338724,-71.07276&radius=500&limit=100
https://api.foursquare.com/v2/venues/explore?&client_id=S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F

In [23]:
Boston_venues.head()

Unnamed: 0,Zip,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,2134,42.355147,-71.13164,Lulu's Allston,42.355068,-71.134107,Comfort Food Restaurant
1,2134,42.355147,-71.13164,Kaju Tofu House,42.354329,-71.132374,Korean Restaurant
2,2134,42.355147,-71.13164,Fish Market Sushi Bar,42.353039,-71.132975,Sushi Restaurant
3,2134,42.355147,-71.13164,BonChon Chicken,42.353105,-71.130921,Fried Chicken Joint
4,2134,42.355147,-71.13164,Mala Restaurant,42.35296,-71.131033,Chinese Restaurant


In [24]:
Boston_venues.shape

(1533, 7)

In [25]:
Boston_venues['Venue Category'].unique()

array(['Comfort Food Restaurant', 'Korean Restaurant', 'Sushi Restaurant',
       'Fried Chicken Joint', 'Chinese Restaurant', 'Falafel Restaurant',
       'Diner', 'Indian Restaurant', 'Japanese Restaurant',
       'Thai Restaurant', 'Italian Restaurant', 'Thrift / Vintage Store',
       'Rock Club', 'Smoke Shop', 'Bakery', 'Restaurant',
       'Vietnamese Restaurant', 'Liquor Store', 'Hot Dog Joint',
       'Mexican Restaurant', 'Mediterranean Restaurant',
       'Asian Restaurant', 'Frozen Yogurt Shop', 'Pizza Place',
       'Tea Room', 'Vegetarian / Vegan Restaurant', 'Board Shop',
       'Dumpling Restaurant', 'Taco Place', 'Gastropub', 'Gym',
       'Electronics Store', 'Ice Cream Shop', 'Sandwich Place', 'Café',
       'Hookah Bar', 'Seafood Restaurant', 'Bubble Tea Shop',
       'Sports Bar', 'Pet Store', 'Dive Bar', 'Pharmacy',
       'Hardware Store', 'Shoe Store', 'Latin American Restaurant',
       'Donut Shop', 'Nightclub', 'Dance Studio', 'Fast Food Restaurant',
       'S

In [26]:
# one hot encoding
boston_onehot = pd.get_dummies(Boston_venues[['Venue Category']], prefix="", prefix_sep="")

# add zipcode column back to dataframe
boston_onehot['Zip'] = Boston_venues['Zip'] 

# move zipcode column to the first column
fixed_columns = [boston_onehot.columns[-1]] + list(boston_onehot.columns[:-1])
boston_onehot = boston_onehot[fixed_columns]

boston_onehot.head()

Unnamed: 0,Zip,Accessories Store,African Restaurant,Airport,American Restaurant,Aquarium,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Trail,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,2134,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2134,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2134,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,2134,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2134,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [27]:
boston_onehot.shape

(1533, 230)

In [28]:
# Normalize data in order to analyze
boston_grouped = boston_onehot.groupby('Zip').mean().reset_index()
boston_grouped

Unnamed: 0,Zip,Accessories Store,African Restaurant,Airport,American Restaurant,Aquarium,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Trail,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,2021,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2026,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2108,0.0,0.0,0.0,0.034091,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364
3,2109,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01
4,2110,0.0,0.0,0.0,0.022989,0.034483,0.0,0.0,0.011494,0.0,...,0.0,0.0,0.011494,0.0,0.0,0.0,0.011494,0.0,0.0,0.0
5,2111,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01
6,2113,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
7,2114,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025
8,2115,0.0,0.0,0.0,0.030303,0.0,0.0,0.030303,0.121212,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,2116,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.0
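
Averaging one-hot columns per zip code gives, for each category, the share of that zip code's venues that fall into it. The sketch below uses a small invented venue list to show that this matches a normalized cross-tabulation. `pd.crosstab` is offered here as an alternative route, not as the method used in the notebook.

```python
import numpy as np
import pandas as pd

# Toy venue list; the rows are invented for illustration
venues = pd.DataFrame({
    "Zip": [2109, 2109, 2109, 2109, 2110, 2110],
    "Venue Category": ["Italian Restaurant", "Italian Restaurant", "Park",
                       "Bakery", "Park", "Hotel"],
})

# Route 1: one-hot encode, then average the 0/1 columns per zip code
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
by_mean = onehot.groupby(venues["Zip"]).mean()

# Route 2: count categories per zip code and divide each row by its total
by_crosstab = pd.crosstab(venues["Zip"], venues["Venue Category"],
                          normalize="index")

print(by_mean)
print(np.allclose(by_mean.values, by_crosstab.values))
```

Because each row of the result sums to 1, zip codes with very few returned venues (such as those with only four or five) produce large shares from single venues, which is worth keeping in mind when comparing them with busier zip codes.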


## Explore what businesses are the most common in a zipcode area

In [29]:
# Check top 5 venue categories and their scores in each zipcode

num_top_venues = 5

for zipcode in boston_grouped['Zip']:
    print("----"+str(zipcode)+"----")
    temp = boston_grouped[boston_grouped['Zip'] == zipcode].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----2021----
                venue  freq
0  Athletics & Sports  0.50
1     Paintball Field  0.25
2                Farm  0.25
3   Accessories Store  0.00
4   Other Repair Shop  0.00


----2026----
            venue  freq
0  Farmers Market   0.2
1           Diner   0.2
2             Gym   0.2
3           Track   0.2
4     High School   0.2


----2108----
                 venue  freq
0          Pizza Place  0.05
1          Coffee Shop  0.05
2        Historic Site  0.03
3           Steakhouse  0.03
4  American Restaurant  0.03


----2109----
                venue  freq
0  Italian Restaurant  0.22
1  Seafood Restaurant  0.10
2              Bakery  0.06
3                Park  0.05
4       Historic Site  0.04


----2110----
                venue  freq
0       Boat or Ferry  0.11
1  Seafood Restaurant  0.07
2                Park  0.06
3               Hotel  0.05
4       Historic Site  0.05


----2111----
                venue  freq
0  Chinese Restaurant  0.13
1              Bakery  0.08
2    A

## Define a function return_most_common_venues() to return the most common businesses in each zipcode

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

## Obtain top 10 most common venues

In [31]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zipcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only 'st', 'nd', 'rd' are listed; everything else gets 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
zipcode_10_most = pd.DataFrame(columns=columns)
zipcode_10_most['Zipcode'] = boston_grouped['Zip']

for ind in np.arange(boston_grouped.shape[0]):
    zipcode_10_most.iloc[ind, 1:] = return_most_common_venues(boston_grouped.iloc[ind, :], num_top_venues)

zipcode_10_most.head()

Unnamed: 0,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2021,Athletics & Sports,Paintball Field,Farm,Yoga Studio,Falafel Restaurant,Food Truck,Food Service,Food Court,Food & Drink Shop,Food
1,2026,Diner,Track,Farmers Market,Gym,High School,Yoga Studio,Falafel Restaurant,Food Service,Food Court,Food & Drink Shop
2,2108,Coffee Shop,Pizza Place,Italian Restaurant,Sandwich Place,Restaurant,Plaza,New American Restaurant,Steakhouse,American Restaurant,Historic Site
3,2109,Italian Restaurant,Seafood Restaurant,Bakery,Park,Historic Site,Pizza Place,Café,Tourist Information Center,Grocery Store,Hotel
4,2110,Boat or Ferry,Seafood Restaurant,Park,Historic Site,Hotel,Harbor / Marina,Coffee Shop,Asian Restaurant,Italian Restaurant,Aquarium
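
The column labels above come from indexing into `['st', 'nd', 'rd']` and falling back to `th`, which is fine for 1 through 10 but would produce "21th" if `num_top_venues` were ever raised past 20. A small helper that handles those cases might look like this; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite ending in 1, 2, 3
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Zipcode"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns)
```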


## Create clusters with k-means with k=5

In [32]:
# import k-means
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

boston_grouped_clustering = boston_grouped.drop(columns='Zip')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(boston_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 1, 1, 1, 1, 1, 1, 1, 1])
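
To make the labels above less of a black box, here is a bare-bones sketch of Lloyd's iteration, the assign-then-update loop that k-means is built on. It is not scikit-learn's implementation: `KMeans` uses a smarter initialization and other refinements, so its results will generally differ. The function name `kmeans_lloyd` and the choice of the first `k` points as starting centroids are assumptions made to keep the example short and deterministic.

```python
def kmeans_lloyd(points, k, iterations=100):
    """Cluster points (tuples of floats) with plain Lloyd's iteration.

    Initial centroids are simply the first k points, which keeps the sketch
    deterministic; this is an illustrative choice, not what scikit-learn does.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_labels = []
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans_lloyd(points, k=2)
print(labels)
print(centroids)
```

In the notebook, each "point" is one zip code's row of category frequencies, so zip codes with similar venue mixes end up with the same label.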

In [33]:
Boston_zip_latlng.columns

Index(['Zip', 'Latitude', 'Longitude'], dtype='object')

In [34]:
Boston_zip_latlng.shape

(37, 3)

In [35]:
# add clustering labels
zipcode_10_most.insert(0, 'Cluster Labels', kmeans.labels_)

In [36]:
zipcode_10_most.head()

Unnamed: 0,Cluster Labels,Zipcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4,2021,Athletics & Sports,Paintball Field,Farm,Yoga Studio,Falafel Restaurant,Food Truck,Food Service,Food Court,Food & Drink Shop,Food
1,2,2026,Diner,Track,Farmers Market,Gym,High School,Yoga Studio,Falafel Restaurant,Food Service,Food Court,Food & Drink Shop
2,1,2108,Coffee Shop,Pizza Place,Italian Restaurant,Sandwich Place,Restaurant,Plaza,New American Restaurant,Steakhouse,American Restaurant,Historic Site
3,1,2109,Italian Restaurant,Seafood Restaurant,Bakery,Park,Historic Site,Pizza Place,Café,Tourist Information Center,Grocery Store,Hotel
4,1,2110,Boat or Ferry,Seafood Restaurant,Park,Historic Site,Hotel,Harbor / Marina,Coffee Shop,Asian Restaurant,Italian Restaurant,Aquarium


In [37]:
zipcode_10_most.rename(columns={"Zipcode":"Zip"}, inplace=True)

In [38]:
zipcode_10_most.head()

Unnamed: 0,Cluster Labels,Zip,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4,2021,Athletics & Sports,Paintball Field,Farm,Yoga Studio,Falafel Restaurant,Food Truck,Food Service,Food Court,Food & Drink Shop,Food
1,2,2026,Diner,Track,Farmers Market,Gym,High School,Yoga Studio,Falafel Restaurant,Food Service,Food Court,Food & Drink Shop
2,1,2108,Coffee Shop,Pizza Place,Italian Restaurant,Sandwich Place,Restaurant,Plaza,New American Restaurant,Steakhouse,American Restaurant,Historic Site
3,1,2109,Italian Restaurant,Seafood Restaurant,Bakery,Park,Historic Site,Pizza Place,Café,Tourist Information Center,Grocery Store,Hotel
4,1,2110,Boat or Ferry,Seafood Restaurant,Park,Historic Site,Hotel,Harbor / Marina,Coffee Shop,Asian Restaurant,Italian Restaurant,Aquarium


In [39]:
Boston_zip_latlng.head()

Unnamed: 0,Zip,Latitude,Longitude
0,2134,42.355147,-71.13164
1,2125,42.316852,-71.05811
2,2110,42.356532,-71.05365
3,2118,42.338724,-71.07276
4,2126,42.272098,-71.09426


## To append location info to zipcode_10_most, merge it with Boston_zip_latlng

In [41]:
# Merge zipcode_10_most with Boston_zip_latlng
boston_merged = Boston_zip_latlng
boston_merged = boston_merged.join(zipcode_10_most.set_index('Zip'), on='Zip')

boston_merged.head(37)

Unnamed: 0,Zip,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2134,42.355147,-71.13164,1,Korean Restaurant,Pizza Place,Chinese Restaurant,Bakery,Sushi Restaurant,Thai Restaurant,Asian Restaurant,Pharmacy,Electronics Store,Mexican Restaurant
1,2125,42.316852,-71.05811,1,Bar,Pub,Pizza Place,Fried Chicken Joint,Park,Caribbean Restaurant,Food & Drink Shop,Health & Beauty Service,Liquor Store,Plaza
2,2110,42.356532,-71.05365,1,Boat or Ferry,Seafood Restaurant,Park,Historic Site,Hotel,Harbor / Marina,Coffee Shop,Asian Restaurant,Italian Restaurant,Aquarium
3,2118,42.338724,-71.07276,1,Pizza Place,Mexican Restaurant,Thai Restaurant,Park,Latin American Restaurant,Tapas Restaurant,Coffee Shop,Sandwich Place,Gift Shop,Mediterranean Restaurant
4,2126,42.272098,-71.09426,1,Pizza Place,Home Service,Health & Beauty Service,Park,Donut Shop,Caribbean Restaurant,Fast Food Restaurant,Food & Drink Shop,Mobile Phone Shop,Bank
5,2109,42.361477,-71.05417,1,Italian Restaurant,Seafood Restaurant,Bakery,Park,Historic Site,Pizza Place,Café,Tourist Information Center,Grocery Store,Hotel
6,2021,42.166776,-71.1343,4,Athletics & Sports,Paintball Field,Farm,Yoga Studio,Falafel Restaurant,Food Truck,Food Service,Food Court,Food & Drink Shop,Food
7,2113,42.365028,-71.05636,1,Italian Restaurant,Pizza Place,Park,Seafood Restaurant,Coffee Shop,Bakery,Sandwich Place,Café,Hotel,Grocery Store
8,2130,42.309998,-71.11171,1,Bakery,Coffee Shop,Yoga Studio,Thrift / Vintage Store,Art Gallery,Pizza Place,Accessories Store,Park,Donut Shop,Bar
9,2121,42.307448,-71.08127,1,Southern / Soul Food Restaurant,Supermarket,Donut Shop,Nightclub,Caribbean Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Shopping Mall,Men's Store


### From the table above, we can infer which types of venues each cluster label denotes:

1. Label 0 - print shop, trail
2. Label 1 - all sorts of food and drink places
3. Label 2 - farm, gym, school
4. Label 3 - park, sport field
5. Label 4 - sport field, farm

## Create a cluster map based on cluster labels and location

In [42]:
# create cluster map

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

boston_map1 = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(boston_merged['Latitude'], boston_merged['Longitude'], boston_merged['Zip'], boston_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(boston_map1)
       
boston_map1

## Interpret cluster map

The map shows that, apart from four zip codes, the downtown areas of Boston host all sorts of businesses: restaurants of many cuisines, drink shops, shopping malls, schools and so on. These areas are about equally commercial, so at the city scale no candidate location is preferable to another.

Assume we already have three candidate locations for a new Italian restaurant. We now examine each of them at a smaller scale, looking at the area immediately surrounding it.

# Add three candidate locations to map

## First, obtain latitude & longitude of each candidate location

In [43]:
# Addresses of three candidate locations
locations = ['48 Deckard St, Boston, MA 02121', '52 Mt Vernon St, Boston, MA 02108', '270 W Fifth St, Boston, MA 02127']

# Get latitude and longitude for each candidate location
candidate_lat = []
candidate_lng = []

for i in range(len(locations)):
    print(i)
    address = locations[i]
    #geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    
    if (location is None):
        print ("None: " + address+ "\n")
    else:
        latitude = location.latitude
        longitude = location.longitude
        print(latitude)
        print(longitude)
        print("\n")
        candidate_lat.append(latitude)
        candidate_lng.append(longitude)
print(candidate_lat)
print(candidate_lng)


0
42.315778819228
-71.08507985113481


1
42.35810105
-71.0669259504036


2
42.3353948028169
-71.04899235211268


[42.315778819228, 42.35810105, 42.3353948028169]
[-71.08507985113481, -71.0669259504036, -71.04899235211268]
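
The loop above appends coordinates only for addresses that geocode successfully. All three did here, but if one had failed, the later entries in `candidate_lat` and `candidate_lng` would shift and no longer line up with `locations`. A sketch that keeps each address paired with its result is shown below. The helper `geocode_all` and the offline stand-in geocoder are hypothetical; in the notebook, `geolocator.geocode` would be passed as the callable.

```python
from collections import namedtuple


def geocode_all(addresses, geocode):
    """Map each address to (lat, lng), or to None when the lookup fails.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing is found.
    """
    results = {}
    for address in addresses:
        location = geocode(address)
        if location is None:
            results[address] = None
        else:
            results[address] = (location.latitude, location.longitude)
    return results


# Stand-in geocoder so the sketch runs offline; the lookup table is hypothetical
Point = namedtuple("Point", ["latitude", "longitude"])
known = {"52 Mt Vernon St, Boston, MA 02108": Point(42.35810105, -71.0669259504036)}

coords = geocode_all(
    ["52 Mt Vernon St, Boston, MA 02108", "1 Nowhere Rd, Boston, MA"],
    known.get,
)
print(coords)
```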


## Add candidate locations on map

In [44]:
for i in range(len(candidate_lat)):
    folium.Marker(
        #[lat, lon],
        #radius=6,
        location=[candidate_lat[i], candidate_lng[i]],
        #popup=locations[i],
        popup = 'Candidate location ' + str(i+1),
        icon=folium.Icon(color='orange',icon='utensils',prefix='fa')
        ).add_to(boston_map1)
    

In [45]:
boston_map1

## Find nearby venues of each candidate location within 800 meters

In [46]:
radius = 800
LIMIT = 200
candidate_venues = []

for i in range(len(candidate_lat)):
    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        candidate_lat[i], 
        candidate_lng[i], 
        radius, 
        LIMIT)
    print(url)

    results = requests.get(url).json()["response"]['groups'][0]['items']

    # keep the candidate index i so venues can be traced back to their location
    candidate_venues.append([(
        i, 
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']) for v in results])

# flatten the per-candidate lists into a single dataframe
df_candidate_venues = pd.DataFrame([item for venue in candidate_venues for item in venue])

df_candidate_venues.columns = [
              'id', 
              'venue name', 
              'venue latitude', 
              'venue longitude', 
              'venue category']

df_candidate_venues.head()

https://api.foursquare.com/v2/venues/explore?&client_id=S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F0GIN52JDY5FQ0J&client_secret=OVMTVGTK5EW0WM5OIMT2CMJ4MGN5EQNUUKRKU1IRCCAQ0MCL&v=20180605&ll=42.315778819228,-71.08507985113481&radius=800&limit=200
https://api.foursquare.com/v2/venues/explore?&client_id=S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F0GIN52JDY5FQ0J&client_secret=OVMTVGTK5EW0WM5OIMT2CMJ4MGN5EQNUUKRKU1IRCCAQ0MCL&v=20180605&ll=42.35810105,-71.0669259504036&radius=800&limit=200
https://api.foursquare.com/v2/venues/explore?&client_id=S3PRMDQX0MOCKBHZ40GXLRO3NWJ3EIOY0F0GIN52JDY5FQ0J&client_secret=OVMTVGTK5EW0WM5OIMT2CMJ4MGN5EQNUUKRKU1IRCCAQ0MCL&v=20180605&ll=42.3353948028169,-71.04899235211268&radius=800&limit=200


Unnamed: 0,id,venue name,venue latitude,venue longitude,venue category
0,0,Popeyes Louisiana Kitchen,42.318547,-71.082893,Fried Chicken Joint
1,0,The Merengue Restaurant,42.319199,-71.077655,Caribbean Restaurant
2,0,Roxbury YMCA,42.317791,-71.082789,Gym / Fitness Center
3,0,Flames III,42.30902,-71.083061,Caribbean Restaurant
4,0,Walgreens,42.316881,-71.082464,Pharmacy


In [47]:
df_candidate_venues.shape

(183, 5)

In [48]:
df_candidate_venues.groupby('id').count()

id,venue name,venue latitude,venue longitude,venue category
0,23,23,23,23
1,100,100,100,100
2,60,60,60,60
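
The criteria are phrased as "within 800 meters", and the request passes `radius = 800` to Foursquare. If one wanted to double-check that a returned venue really lies within that distance of a candidate location, the great-circle (haversine) distance is one option. The sketch below is an independent check, not part of the notebook. The helper name `haversine_m` is hypothetical, and the coordinates are taken from this notebook's outputs (candidate location 1 and The Mall Of Roxbury).

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000  # mean Earth radius, a common approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Candidate location 1 and The Mall Of Roxbury, as listed in this notebook
candidate = (42.315778819228, -71.08507985113481)
mall = (42.32022, -71.082122)

distance = haversine_m(*candidate, *mall)
print(round(distance), distance <= 800)
```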


## Select venues near candidate location 1 and put them into a dataframe

In [49]:
# select venues with id 0
candidate1_venues = df_candidate_venues[df_candidate_venues['id']==0]
candidate1_venues.head(50)

Unnamed: 0,id,venue name,venue latitude,venue longitude,venue category
0,0,Popeyes Louisiana Kitchen,42.318547,-71.082893,Fried Chicken Joint
1,0,The Merengue Restaurant,42.319199,-71.077655,Caribbean Restaurant
2,0,Roxbury YMCA,42.317791,-71.082789,Gym / Fitness Center
3,0,Flames III,42.30902,-71.083061,Caribbean Restaurant
4,0,Walgreens,42.316881,-71.082464,Pharmacy
5,0,Marshalls,42.319429,-71.081941,Department Store
6,0,CVS pharmacy,42.309031,-71.081922,Pharmacy
7,0,Shell,42.314149,-71.07878,Gas Station
8,0,The Mall Of Roxbury,42.32022,-71.082122,Shopping Mall
9,0,Save-A-Lot,42.31949,-71.082307,Grocery Store


In [58]:
# Get the 'venue category' column as a pandas Series
candidate1_category = candidate1_venues['venue category']

candidate1_category

0      Fried Chicken Joint
1     Caribbean Restaurant
2     Gym / Fitness Center
3     Caribbean Restaurant
4                 Pharmacy
5         Department Store
6                 Pharmacy
7              Gas Station
8            Shopping Mall
9            Grocery Store
10       Electronics Store
11            Liquor Store
12     Fried Chicken Joint
13     American Restaurant
14       Convenience Store
15              Donut Shop
16        Basketball Court
17             Supermarket
18      Spanish Restaurant
19    Fast Food Restaurant
20          Cosmetics Shop
21            Skating Rink
22                     Gym
Name: venue category, dtype: object

## Select venues near candidate location 2 and put them into a dataframe

In [50]:
# select venues with id 1
candidate2_venues = df_candidate_venues[df_candidate_venues['id']==1]
candidate2_venues.head()

Unnamed: 0,id,venue name,venue latitude,venue longitude,venue category
23,1,Louisburg Square,42.358636,-71.068903,Plaza
24,1,Tatte Bakery & Cafe,42.357904,-71.070439,Bakery
25,1,Frog Pond,42.356134,-71.065672,Lake
26,1,Figs,42.357255,-71.070007,Pizza Place
27,1,Museum of African American History,42.360058,-71.065287,History Museum


In [60]:
# Get the 'venue category' column as a pandas Series

candidate2_category = candidate2_venues['venue category']

candidate2_category

23                  Plaza
24                 Bakery
25                   Lake
26            Pizza Place
27         History Museum
              ...        
118      Sculpture Garden
119            Steakhouse
120         Historic Site
121           Coffee Shop
122    Seafood Restaurant
Name: venue category, Length: 100, dtype: object

## Select venues near candidate location 3 and put them into a dataframe

In [51]:
# select venues with id 2
candidate3_venues = df_candidate_venues[df_candidate_venues['id']==2]
candidate3_venues.head()

Unnamed: 0,id,venue name,venue latitude,venue longitude,venue category
123,2,Capo,42.33607,-71.047079,Italian Restaurant
124,2,Loco Taqueria & Oyster Bar,42.336999,-71.047803,Mexican Restaurant
125,2,Lincoln Tavern & Restaurant,42.336406,-71.047522,American Restaurant
126,2,J.P. Licks,42.336932,-71.048436,Ice Cream Shop
127,2,Sweet Tooth,42.337901,-71.049432,Dessert Shop


In [61]:
# Get the 'venue category' column as a pandas Series
candidate3_category = candidate3_venues['venue category']
candidate3_category

123         Italian Restaurant
124         Mexican Restaurant
125        American Restaurant
126             Ice Cream Shop
127               Dessert Shop
128             Cosmetics Shop
129                Coffee Shop
130                       Park
131         Italian Restaurant
132               Liquor Store
133                        Bar
134           Sushi Restaurant
135              Historic Site
136    New American Restaurant
137                     Market
138                Coffee Shop
139               Gourmet Shop
140             Sandwich Place
141                        Gym
142             Sandwich Place
143                        Bar
144               Climbing Gym
145       Gym / Fitness Center
146                 Sports Bar
147                        Bar
148                        Gym
149                 Donut Shop
150              Grocery Store
151          Convenience Store
152                        Gym
153                Pizza Place
154                       Bank
155     

## Criteria of the best location for the Italian restaurant

1. There are no more than two Italian Restaurants within 800 meters.
2. There is at least one park within 800 meters. 
3. There is at least one mall within 800 meters.

## Define a function to check criteria for each location

In [62]:
def check_best_location(candidate_venues):
    i = 0  # number of Italian restaurants
    criterion1 = 'There is no Italian Restaurant within 800 meters'
    
    j = 0 # number of park
    criterion2 = 'There is no park within 800 meters'
    
    k = 0 # number of shopping mall
    criterion3 = 'There is no shopping mall within 800 meters'
    
    for venue in candidate_venues:
        
        #print(venue)
        if (venue.lower() == 'italian restaurant'):
            i = i+1
            criterion1 = str(i) + ' Italian Restaurant(s) found within 800 meters.'
            
        if(venue.lower() == 'park'):
            j = j+1
            criterion2 = str(j) + ' park(s) found within 800 meters.'
            
        if(venue.lower() == 'shopping mall'):
            k = k+1
            criterion3 = str(k) + ' shopping mall(s) found within 800 meters.'
            
    print(criterion1)
    print('\n')
    print(criterion2)
    print('\n')
    print(criterion3)
    print('\n')
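
The function above only prints its findings. As a minimal sketch (not part of the original notebook), the same counting can return a dictionary so the numbers can be reused later. The name `count_categories` and the sample list are hypothetical; the function accepts any iterable of category strings, including a pandas Series such as `candidate1_category`.

```python
from collections import Counter


def count_categories(candidate_categories):
    """Return counts of the three venue categories the criteria depend on."""
    # Normalize case and skip non-string entries (for example NaN values).
    counts = Counter(
        c.lower() for c in candidate_categories if isinstance(c, str)
    )
    # A Counter returns 0 for categories that never occur.
    return {
        "italian_restaurants": counts["italian restaurant"],
        "parks": counts["park"],
        "shopping_malls": counts["shopping mall"],
    }


# Hypothetical sample for illustration; not the notebook's data.
sample = ["Italian Restaurant", "Park", "Bakery", "italian restaurant"]
print(count_categories(sample))
```

Returning the counts instead of formatted strings keeps the check separate from the presentation, so the same numbers can feed both a printed report and a later comparison.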

# 4. Result

## (1) Results of candidate location 1

In [63]:
check_best_location(candidate1_category)

There is no Italian Restaurant within 800 meters


There is no park within 800 meters


1 shopping mall(s) found within 800 meters.




## (2) Results of candidate location 2

In [64]:
check_best_location(candidate2_category)

5 Italian Restaurant(s) found within 800 meters.


3 park(s) found within 800 meters.


There is no shopping mall within 800 meters




## (3) Results of candidate location 3

In [65]:
check_best_location(candidate3_category)

2 Italian Restaurant(s) found within 800 meters.


1 park(s) found within 800 meters.


There is no shopping mall within 800 meters




# 5. Discussion

Result 1: There is one big shopping mall within 800 meters, and no Italian restaurant or park nearby.

Result 2: There are five Italian restaurants within 800 meters, which indicates potentially fierce competition even though there are three parks nearby. There is no shopping mall within 800 meters.

Result 3: There are two Italian restaurants and one park within 800 meters, but no shopping mall.

Location 2 is not a good option because there are already five Italian restaurants near it. Choosing between location 1 and location 3 is less straightforward because the results are mixed: location 1 meets the restaurant and mall criteria but has no park, while location 3 meets the restaurant and park criteria but has no mall. Further analysis is needed, so I add shopping malls to the map.
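
The comparison can also be written out explicitly. The sketch below is an illustration rather than part of the original analysis: it applies the three criteria to the counts reported in the Result section. The names `reported_counts` and `evaluate` are hypothetical, and the check does not test whether a park or mall is "big".

```python
# Counts reported in the Result section:
# (Italian restaurants, parks, shopping malls) within 800 meters.
reported_counts = {
    "location 1": (0, 0, 1),
    "location 2": (5, 3, 0),
    "location 3": (2, 1, 0),
}


def evaluate(italian, parks, malls):
    """Return a pass/fail flag for each of the three criteria."""
    return {
        "criterion 1 (at most two Italian restaurants)": italian <= 2,
        "criterion 2 (at least one park)": parks >= 1,
        "criterion 3 (at least one mall)": malls >= 1,
    }


summary = {name: evaluate(*counts) for name, counts in reported_counts.items()}

for name, checks in summary.items():
    # True counts as 1, so the sum is the number of criteria met.
    print(name, "meets", sum(checks.values()), "of 3 criteria")
    for criterion, passed in checks.items():
        print("   ", criterion, "->", "yes" if passed else "no")
```

No location meets all three criteria, which is why the counts alone do not settle the choice between locations 1 and 3.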

## Add shopping malls on map

### First, open Boston shopping mall data

In [66]:
# shopping malls - points
boston_shopping_mall = pd.read_csv('Boston_shopping_malls.csv')
#df_Boston_zipcode.info
boston_shopping_mall.head()

Unnamed: 0,mall name,address,city,zipcode,lat,lng
0,Prudential Center Boston,800 Boylston St,Boston,2199,,
1,Copley Place,100 Huntington Ave,Boston,2116,,
2,Faneuil Hall Marketplace,4 S Market St,Boston,2109,,
3,South Bay Center,8 Allstate Rd,Boston,2118,,
4,Washington Park Mall,330 M.L.K.Jr Blvd,Boston,2119,,


### Fix incorrect addresses

In [67]:
#boston_shopping_mall.iloc(4) = {'address':'330 Martin Luther King Jr Blvd'}
boston_shopping_mall.at[4,"address"] = '330 Martin Luther King Jr Blvd'

In [68]:
boston_shopping_mall.at[9,"address"] = '24 Winter Place'
boston_shopping_mall.at[13,"address"] = '100 Cambridgeside Place'

In [69]:
boston_shopping_mall.head(16)

Unnamed: 0,mall name,address,city,zipcode,lat,lng
0,Prudential Center Boston,800 Boylston St,Boston,2199,,
1,Copley Place,100 Huntington Ave,Boston,2116,,
2,Faneuil Hall Marketplace,4 S Market St,Boston,2109,,
3,South Bay Center,8 Allstate Rd,Boston,2118,,
4,Washington Park Mall,330 Martin Luther King Jr Blvd,Boston,2119,,
5,Longwood Galleria,400 Brookline Ave,Boston,2215,,
6,Fresh Pond Mall,140 Clarendon St,Boston,2116,,
7,Memorial Convention Center,900 Boylston St,Boston,2199,,
8,Stores At 500 Washington,500 Washington St,Boston,2111,,
9,Downtown Crossing,24 Winter Place,Boston,2108,,


### Obtain lat/lng of each mall through geocoders

In [70]:
from geopy.geocoders import Nominatim

address = 'Boston, Massachusetts, US'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Boston are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Boston are 42.3602534, -71.0582912.


In [None]:
# 42.3602534, -71.0582912.
# 42.3471718, -71.0825062.

In [71]:
#from geopy.geocoders import Nominatim
lat = []
lng = []
#flag = False
geolocator = Nominatim(user_agent="ny_explorer")
location = None

for street,city,zipcode in zip(boston_shopping_mall['address'],boston_shopping_mall['city'],boston_shopping_mall['zipcode']):
    #flag = True
    
    address = street + "," + city + ", MA 0" + str(zipcode)
    #geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    
    # If the address cannot be geocoded, append NaN so that lat/lng
    # stay aligned with the dataframe rows
    if location is None:
        print("None: " + address + "\n")
        lat.append(float("nan"))
        lng.append(float("nan"))
    else:
        latitude = location.latitude
        longitude = location.longitude
        lat.append(latitude)
        lng.append(longitude)
print(lat)
print(lng)

[42.3471718, 42.3472309, 42.359706, 42.3268965, 42.318836982353865, 42.33872036842105, 42.34859005263158, 42.3479157, 42.354420600000005, 42.35585979166667, 42.355892957694266, 42.36173, 42.361730300000005, 42.3687832, 42.374218, 42.38901760869565]
[-71.08250628733232, -71.07758361909961, -71.0550683, -71.0617906, -71.08457817184478, -71.10737578947369, -71.07430336842106, -71.0843914, -71.0614958, -71.06174775, -71.06016759569124, -71.054485, -71.06660593534323, -71.0759859, -71.120629, -71.11836373913043]
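
The geocoding loop can be packaged as a reusable helper. The following is a sketch under stated assumptions, not the notebook's code: `format_address` and `geocode_rows` are hypothetical names, the one-second default pause reflects my understanding of the public Nominatim usage policy (at most one request per second), and `fake_geocode` is a stand-in so the example runs without a network call.

```python
import math
import time
from types import SimpleNamespace


def format_address(street, city, zipcode):
    """Build a query string; zfill restores the leading zero of a zip code stored as an integer."""
    return "{}, {}, MA {}".format(street, city, str(zipcode).zfill(5))


def geocode_rows(addresses, geocode, delay_seconds=1.0):
    """Geocode each address and return lat/lng lists aligned with the input order."""
    lats, lngs = [], []
    for i, address in enumerate(addresses):
        if i > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(address)
        if location is None:
            # Keep a placeholder so row positions still match the dataframe.
            lats.append(math.nan)
            lngs.append(math.nan)
        else:
            lats.append(location.latitude)
            lngs.append(location.longitude)
    return lats, lngs


# Stand-in geocoder for demonstration only (hypothetical, no network call).
def fake_geocode(address):
    known = {
        "800 Boylston St, Boston, MA 02199": SimpleNamespace(
            latitude=42.3471718, longitude=-71.0825062
        )
    }
    return known.get(address)


addresses = [
    format_address("800 Boylston St", "Boston", 2199),
    format_address("1 Nowhere Rd", "Boston", 2100),  # made-up address, not found
]
lats, lngs = geocode_rows(addresses, fake_geocode, delay_seconds=0)
print(lats, lngs)
```

With geopy, the `geocode` argument would be the `geocode` method of a `Nominatim` instance, as used in the cell above.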


In [73]:
boston_shopping_mall["latitude"] = lat
boston_shopping_mall["longitude"] = lng

In [74]:
boston_shopping_mall.head()

Unnamed: 0,mall name,address,city,zipcode,lat,lng,latitude,longitude
0,Prudential Center Boston,800 Boylston St,Boston,2199,,,42.347172,-71.082506
1,Copley Place,100 Huntington Ave,Boston,2116,,,42.347231,-71.077584
2,Faneuil Hall Marketplace,4 S Market St,Boston,2109,,,42.359706,-71.055068
3,South Bay Center,8 Allstate Rd,Boston,2118,,,42.326896,-71.061791
4,Washington Park Mall,330 Martin Luther King Jr Blvd,Boston,2119,,,42.318837,-71.084578


In [75]:
boston_shopping_mall.drop(columns = ["lat","lng"])

Unnamed: 0,mall name,address,city,zipcode,latitude,longitude
0,Prudential Center Boston,800 Boylston St,Boston,2199,42.347172,-71.082506
1,Copley Place,100 Huntington Ave,Boston,2116,42.347231,-71.077584
2,Faneuil Hall Marketplace,4 S Market St,Boston,2109,42.359706,-71.055068
3,South Bay Center,8 Allstate Rd,Boston,2118,42.326896,-71.061791
4,Washington Park Mall,330 Martin Luther King Jr Blvd,Boston,2119,42.318837,-71.084578
5,Longwood Galleria,400 Brookline Ave,Boston,2215,42.33872,-71.107376
6,Fresh Pond Mall,140 Clarendon St,Boston,2116,42.34859,-71.074303
7,Memorial Convention Center,900 Boylston St,Boston,2199,42.347916,-71.084391
8,Stores At 500 Washington,500 Washington St,Boston,2111,42.354421,-71.061496
9,Downtown Crossing,24 Winter Place,Boston,2108,42.35586,-71.061748


### Add shopping malls to Boston map as a feature group layer that can be turned on/off through Layer Control

In [76]:
from folium import FeatureGroup, LayerControl, Map, Marker
map1 = boston_map1

feature_group = FeatureGroup(name='shopping mall')

for lat, lon, mall in zip(boston_shopping_mall['latitude'], boston_shopping_mall['longitude'], boston_shopping_mall['mall name']):
    label = folium.Popup(mall, parse_html=True)
    #folium.CircleMarker
    Marker(location=[lat,lon],popup=label).add_to(feature_group)
    
feature_group.add_to(map1)
LayerControl().add_to(map1)
       
map1
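
Reading distances off the map can be cross-checked numerically. As a hedged sketch (not part of the original notebook), the haversine formula below estimates the great-circle distance between two points and lists the malls within 800 meters of a query point. The names `haversine_m` and `malls_within` are hypothetical. Because the candidate coordinates are not shown in this section, the example uses Copley Place as the query point; the mall coordinates come from the table above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def malls_within(lat, lon, malls, radius_m=800):
    """Return names of malls no farther than radius_m from the point (lat, lon)."""
    return [
        name
        for name, mall_lat, mall_lon in malls
        if haversine_m(lat, lon, mall_lat, mall_lon) <= radius_m
    ]


# (mall name, latitude, longitude) taken from the geocoded table above.
malls = [
    ("Prudential Center Boston", 42.347172, -71.082506),
    ("Copley Place", 42.347231, -71.077584),
    ("Washington Park Mall", 42.318837, -71.084578),
]

# Query point: Copley Place itself, used only as an illustrative stand-in.
print(malls_within(42.347231, -71.077584, malls))
```

In the notebook, the query point would be each candidate location's latitude and longitude, and `malls` could be built by zipping the `mall name`, `latitude`, and `longitude` columns of `boston_shopping_mall`.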

# 6. Conclusion

After adding the shopping malls to the map, I read it carefully and focused on candidate locations 1 and 3, because location 2 had been taken out of consideration. The shopping mall near candidate location 1 is Washington Park Mall, a big mall that attracts heavy customer traffic, and there is no Italian restaurant near it. Location 3 has a small park, Thomas Park, close to it, and there are already two Italian restaurants nearby. I also checked with my client about rent and restaurant size: location 1 has a good size and lower rent. I therefore conclude that candidate location 1 is the best location for the new Italian restaurant.