# The Battle of Neighborhoods

### The best location to open a new chinese restaurant in Toronto 

## Introduction


Location for the restaurant is extremely important for a restaurant's success. The client want to open a chinese restaurant, and they are looking for the best location to open it. To find this answer, we need to first consider whether that neiborhood has enough asian population. To make good profit, we also need to consider the household income in that area, which cannot be too low. Else, other factors might also affect the choice of location. For example, not enough chinese restaurant density, low crime rate, enough parking lot density, etc. 

## Data

To answer this quesiton, following datas will be used:

     - List of Toronto neighbourhood (Wikipidia :https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)
     - Toronto geospatial data (https://cocl.us/Geospatial_data)
     - Population and Ethnic of each neighborhood (Toronto Census: https://open.toronto.ca/dataset/neighbourhood-profiles/)
     - Household income of each neighborhood (Toronto Census: https://open.toronto.ca/)
     - Number of Chinese restaurant (Foursquare API)
     - Google
     

## Methodology

All the data need to be combined to one data, which should include postal code, neighbourhood, latitude, longitude, population, Income, chinese restaurant count. When dealing with data from all places, some factors need to be checked carefully. For example, the neighborhood name is different from different datasets. The census data are plot on Toronto map, which can give a initial impression of areas having higheast Chinese population, and higheast average income. 

To obtain the chinese restaurant count, Foursquare API was used to explore each neighbourhood and return 500 venus within 3000 meters of the longitude and latitude for each postal code. In hte combined dataframe, K-Mean clustering algorithms is used to analyse Asian origins, Chinese Population, restaurant count, and devide the data to 5 clusters. This result help narrowing down the target neighbourhood which is good for opening a new Chinese restaurant. 




## Result

In [456]:
import pandas as pd
import numpy as np
from itertools import *
from geopy.geocoders import Nominatim
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from sklearn.datasets.samples_generator import make_blobs
import folium 
import json
from pandas.io.json import json_normalize
import locale
from locale import atof
locale.setlocale(locale.LC_NUMERIC, '')

'English_United States.1252'

Below is a  post code table scrapeed from wikipedia. Drop the row if Borough has not assigned. Copy Borough name if neighbourhood is not assigned.

Table 1: List of post code for neighbourhoods in Toronto  

In [457]:
# scrape post code data from wikipedia
df_toronto = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header=0)[0]
df_toronto.replace("Not assigned", np.nan, inplace = True)
df_toronto.dropna(subset=["Borough"], axis=0, inplace=True)
df_toronto.reset_index(drop=True, inplace=True)
df_toronto["Neighbourhood"].replace(np.nan, df_toronto["Borough"], inplace = True)
df_toronto.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Queen's Park
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern


collect census data about chinese population and average income of each neighbourhood.

Table 2: Toronto neighborhood profiles sort by average income 


In [460]:
# collect census data about chinese population and average income of each neighbourhood
filename = 'https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv'
df_census = pd.read_csv(filename, thousands=',')
df_census = df_census.loc[(df_census.Characteristic == 'Neighbourhood Number')|
                          (df_census.Characteristic == ' Asian origins')|
                          (df_census.Characteristic == ' Chinese')|
                          (df_census.Characteristic == 'Total income: Average amount ($)')]

df_census = df_census.drop(['_id', 'Category','Topic', 'Data Source', 'City of Toronto'], axis=1)
df_census.rename({'Characteristic':'index'}, axis = 1, inplace = True)
df_census.set_index('index', inplace = True)
df_census = df_census.transpose()
df_census = df_census.reset_index() 
df_census.rename(columns = {'index':'Neighbourhood', ' Asian origins':'Asian origins', ' Chinese':'Chinese'}, inplace = True)
df_census['Neighbourhood Number'] = df_census["Neighbourhood Number"].astype("float")


df_census[['Asian origins', 'Chinese', 'Total income: Average amount ($)' ]]=df_census[['Asian origins', 'Chinese', 'Total income: Average amount ($)' ]].applymap(atof)



In [461]:
# table of top ten neighbourhoods has most total average income 
df_census.sort_values(by = "Total income: Average amount ($)", ascending = False, axis=0, inplace = True)
df_census.head(10)

index,Neighbourhood,Neighbourhood Number,Asian origins,Chinese,Total income: Average amount ($)
16,Bridle Path-Sunnybrook-York Mills,41.0,3005.0,1445.0,308010.0
104,Rosedale-Moore Park,98.0,3485.0,1300.0,207903.0
44,Forest Hill South,101.0,1880.0,560.0,204521.0
69,Lawrence Park South,103.0,2510.0,935.0,169203.0
21,Casa Loma,96.0,1480.0,450.0,165047.0
64,Kingsway South,15.0,1085.0,335.0,144642.0
70,Leaside-Bennington,56.0,2740.0,1125.0,125564.0
9,Bedford Park-Nortown,39.0,5210.0,1185.0,123077.0
137,Yonge-St.Clair,97.0,2330.0,645.0,114174.0
3,Annex,95.0,6485.0,2400.0,112766.0


Table 3 Toronto neighborhood profiles sort by Chinese population

In [373]:
# table of top ten neighbourhoods has most Chinese population
df_census.sort_values(by = "Chinese", ascending = False, axis=0, inplace = True)
df_census.head(10)

index,Neighbourhood,Neighbourhood Number,Asian origins,Chinese,Total income: Average amount ($)
76,Milliken,130.0,23750.0,19140.0,28085.0
112,Steeles,116.0,21160.0,17835.0,31786.0
129,Willowdale East,51.0,36920.0,17240.0,45326.0
0,Agincourt North,129.0,24305.0,16950.0,30414.0
66,L'Amoreaux,117.0,30785.0,16745.0,31826.0
1,Agincourt South-Malvern West,128.0,17955.0,11455.0,31825.0
122,Waterfront Communities-The Island,77.0,24810.0,9790.0,70600.0
114,Tam O'Shanter-Sullivan,118.0,17925.0,9070.0,34200.0
52,Hillcrest Village,48.0,12030.0,8265.0,40442.0
6,Bay Street Corridor,76.0,15040.0,7585.0,56526.0


Use maps to illustrate income and population distribution. 

In [374]:
# map of each neighbourhood's average income
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent = "tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)


toronto_geo = r'neighbourhoods_toronto.geojson'
toronto_map.choropleth(
    geo_data = toronto_geo,
    data = df_census,
    columns = ['Neighbourhood Number', 'Total income: Average amount ($)'],
    key_on = 'feature.properties.hoodnum',
    fill_color = 'YlOrRd',
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name = 'Average Income',
    reset = True
    
)
toronto_map



Figure 1: Map of Toronto showing average income distribution 

In [375]:
# map of each neighbourhood's chinese population
toronto_map_pop = folium.Map(location=[latitude, longitude], zoom_start=10)
toronto_map_pop.choropleth(
    geo_data = toronto_geo,
    data = df_census,
    columns = ['Neighbourhood Number', 'Chinese'],
    key_on = 'feature.properties.hoodnum',
    fill_color = 'PuBuGn',
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name = 'Chinese Population',
    reset = True
    
)
toronto_map_pop

Figure 2: Map of Toronto showing Chinese Population distribution 

To merge the census data with the post code data, data from wikipedia are looked up using neighbourhood name from census table and inserted to census table. There are several names cannot be matched. The blank among the top 30 Chinese population neighbourhood are filled manually from googled data.  

Table 4 Toronto Neighbourhood Population and Income with Postcode 

In [495]:
# Find corresponsing postcode of each neighbourhood from wikipedia table. 
# For top 30 Chinese population, if not found in wikipedia, mannually add data from google
df_census["Postcode"] = np.nan
df_census['Postcode'] = df_census["Postcode"].astype("str")
df_census['Neighbourhood'] = df_census['Neighbourhood'].str.replace('-', '|')
for index, row in df_census.iterrows():
        postcode = df_toronto[df_toronto['Neighbourhood'].str.contains(row['Neighbourhood'])].Postcode
        if len(postcode):
            df_census.at[index, 'Postcode'] = postcode.iloc[0]
        else:
            row['Neighbourhood'].replace("|", " ")
            lt = list(row['Neighbourhood'].split(" "))
            postcode = df_toronto[(df_toronto['Neighbourhood'].isin(lt))].Postcode
            if len(postcode):
                df_census.at[index, 'Postcode'] = postcode.iloc[0]
        


df_census.sort_values(by = "Chinese", ascending = False, axis=0, inplace = True)
df_census.at[122, 'Postcode'] = 'M5P'
df_census.at[6, 'Postcode'] = 'M5G'
df_census.at[30, 'Postcode'] = 'M2J'
df_census.at[98, 'Postcode'] = 'M1W'
df_census.at[111, 'Postcode'] = 'M2P'
df_census.at[11, 'Postcode'] = 'M1P'
df_census = df_census[df_census.Postcode != 'nan']
df_census.head(10)


index,Neighbourhood,Neighbourhood Number,Asian origins,Chinese,Total income: Average amount ($),Postcode
76,Milliken,130.0,23750.0,19140.0,28085.0,M1V
112,Steeles,116.0,21160.0,17835.0,31786.0,M1V
129,Willowdale East,51.0,36920.0,17240.0,45326.0,M2M
0,Agincourt North,129.0,24305.0,16950.0,30414.0,M1V
66,L'Amoreaux,117.0,30785.0,16745.0,31826.0,M1V
1,Agincourt South|Malvern West,128.0,17955.0,11455.0,31825.0,M1S
122,Waterfront Communities|The Island,77.0,24810.0,9790.0,70600.0,M5P
114,Tam O'Shanter|Sullivan,118.0,17925.0,9070.0,34200.0,M1T
52,Hillcrest Village,48.0,12030.0,8265.0,40442.0,M2H
6,Bay Street Corridor,76.0,15040.0,7585.0,56526.0,M5G


Table 5: Grouped census data by Post code with population sum up and average income.

In [496]:
# form a table group by postcode 
df_census_gp = df_census.groupby('Postcode').agg({'Asian origins':'sum','Chinese':'sum','Total income: Average amount ($)':'mean'})
df_census_gp.head(10)

Unnamed: 0_level_0,Asian origins,Chinese,Total income: Average amount ($)
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
M1B,57160.0,7120.0,34564.5
M1C,7175.0,1195.0,40972.0
M1E,21485.0,2965.0,39605.666667
M1G,32850.0,4385.0,30878.0
M1J,8120.0,665.0,32913.0
M1K,29645.0,4535.0,32863.0
M1L,7305.0,825.0,26793.0
M1M,9895.0,2530.0,49539.0
M1P,33855.0,7880.0,32474.0
M1S,17955.0,11455.0,31825.0


In [520]:
# combine neighboufhood postcode data with geospatial data
df_toronto["Neighbourhood"] = df_toronto["Neighbourhood"].astype("str")
df_toronto_clean = df_toronto.groupby(['Postcode','Borough'])['Neighbourhood'].apply(", ".join)
df_toronto_clean = df_toronto_clean.reset_index()
df_toronto_clean =df_toronto_clean.drop_duplicates()
df_position = pd.read_csv("https://cocl.us/Geospatial_data")
df_position = df_position.rename({'Postal Code': 'Postcode'}, axis = 1)
df_toronto_full = pd.merge(df_toronto_clean, df_position, on="Postcode", how="inner")
#df_toronto_full.shape

In [521]:
# combine census data and geospatial data
df_toronto_full = pd.merge(df_toronto_full, df_census_gp, on='Postcode', how = 'inner')
df_toronto_full.sort_values(by = "Chinese", ascending = False, axis = 0, inplace = True)
df_toronto_full.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($)
11,M1V,Scarborough,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,107920.0,75150.0,33871.4
17,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,62290.0,24790.0,41814.0
9,M1S,Scarborough,Agincourt,43.7942,-79.262029,17955.0,11455.0,31825.0
30,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,20830.0,10760.0,61487.2
14,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,28930.0,10755.0,36869.0
37,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307,28080.0,10175.0,77849.5
10,M1T,Scarborough,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302,17925.0,9070.0,34200.0
13,M2H,North York,Hillcrest Village,43.803762,-79.363452,12030.0,8265.0,40442.0
8,M1P,Scarborough,"Dorset Park, Scarborough Town Centre, Wexford ...",43.75741,-79.273304,33855.0,7880.0,32474.0
35,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,15040.0,7585.0,56526.0


In [504]:
# write a function to collect venue data from foursquare by location data

def getNearbyVenues(names, query, latitudes, longitudes, radius=3000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            query,
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [505]:
CLIENT_ID = '0NFSEYBHZISORKI30P1UAOQPV4RJGZLJV5RKSSHAYTYV0YNL' # your Foursquare ID
CLIENT_SECRET = 'KUKV1R5NLBM5ONZFNN304FDKNI2LSGCKAOCK4FVFBNZJUZ1R' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [506]:
LIMIT = 500
df_toronto_venues = getNearbyVenues(names = df_toronto_full['Neighbourhood'], 
                                    query = '4bf58dd8d48988d145941735',
                                  latitudes = df_toronto_full['Latitude'],
                                  longitudes = df_toronto_full['Longitude'])

Table 6: List of Chinese Restaurant with Neighbourhood 

In [507]:
df_toronto_venues.head(5)

Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,Lotus Pond Vegetarian Restaurant 蓮花素食,43.819421,-79.294682,Chinese Restaurant
1,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,Fishman Lobster Clubhouse Restaurant 魚樂軒,43.801909,-79.295409,Chinese Restaurant
2,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,Kenny's Noodles 聯記麵家,43.824484,-79.300542,Chinese Restaurant
3,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,Magic Noodle 大槐樹,43.8145,-79.294344,Chinese Restaurant
4,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,Alton Restaurant 益街坊,43.825582,-79.276038,Chinese Restaurant


Table 7: Toronto Neighbourhood sorted by Chinese restaurant count

In [508]:
# find the total count of chinese restaurant of each area
df_toronto_ChR = df_toronto_venues.groupby('Neighbourhood')['Venue'].count().reset_index()

df_toronto_ChR.sort_values(by = "Venue", ascending = False, axis = 0, inplace = True)
df_toronto_ChR.rename({'Venue':'Chinese Restaurant Count'}, axis = 1, inplace = True)
df_toronto_ChR.head(10)

Unnamed: 0,Neighbourhood,Chinese Restaurant Count
13,Church and Wellesley,100
50,"The Annex, North Midtown, Yorkville",100
41,"Little Portugal, Trinity",100
28,"Harbourfront, Regent Park",100
9,"Cabbagetown, St. James Town",100
11,Central Bay Street,100
12,"Chinatown, Grange Park, Kensington Market",100
38,L'Amoreaux West,93
1,"Agincourt North, L'Amoreaux East, Milliken, St...",89
8,"CN Tower, Bathurst Quay, Island airport, Harbo...",85


Table 8: Full table of Toronto Neighbourhood with Postcode, census, and restaurant count 

In [509]:
# merge the restaurant count with census data
df_tor = pd.merge(df_toronto_full, df_toronto_ChR, on="Neighbourhood", how="outer")
df_tor.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count
0,M1V,Scarborough,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,107920.0,75150.0,33871.4,89
1,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,62290.0,24790.0,41814.0,18
2,M1S,Scarborough,Agincourt,43.7942,-79.262029,17955.0,11455.0,31825.0,80
3,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,20830.0,10760.0,61487.2,25
4,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,28930.0,10755.0,36869.0,28
5,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307,28080.0,10175.0,77849.5,12
6,M1T,Scarborough,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302,17925.0,9070.0,34200.0,68
7,M2H,North York,Hillcrest Village,43.803762,-79.363452,12030.0,8265.0,40442.0,26
8,M1P,Scarborough,"Dorset Park, Scarborough Town Centre, Wexford ...",43.75741,-79.273304,33855.0,7880.0,32474.0,16
9,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,15040.0,7585.0,56526.0,100


Use K-Means clustering algrithm to analyze the neighbourhood, the clustering number is set to 5.

In [510]:
kclusters = 5
df_tor.sort_values(by = "Chinese", ascending = False, axis = 0, inplace = True)
df_tor_clustering = df_tor.drop(['Postcode','Borough','Neighbourhood','Latitude','Longitude'], axis = 1)
#run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_tor_clustering)

# check cluster labels generated for each row in the dataframe
#kmeans.labels_[0:10] 

Table 9: Toronto Neighbourhood with cluster labels

In [511]:
df_tor.sort_values(by = "Chinese", ascending = False, axis = 0, inplace = True)
df_tor.insert(0, 'Cluster Labels', kmeans.labels_)
df_tor.head(10)

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count
0,0,M1V,Scarborough,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,107920.0,75150.0,33871.4,89
1,0,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,62290.0,24790.0,41814.0,18
2,3,M1S,Scarborough,Agincourt,43.7942,-79.262029,17955.0,11455.0,31825.0,80
3,1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,20830.0,10760.0,61487.2,25
4,3,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,28930.0,10755.0,36869.0,28
5,1,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307,28080.0,10175.0,77849.5,12
6,3,M1T,Scarborough,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302,17925.0,9070.0,34200.0,68
7,3,M2H,North York,Hillcrest Village,43.803762,-79.363452,12030.0,8265.0,40442.0,26
8,3,M1P,Scarborough,"Dorset Park, Scarborough Town Centre, Wexford ...",43.75741,-79.273304,33855.0,7880.0,32474.0,16
9,1,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,15040.0,7585.0,56526.0,100


## Discussion 

Check 5 clusters independently. Cluster 0 has ralatively low average income, however not bad chinese population. Cluster 1 has middle range of income from 50k to 80k. Cluster 2 has highest average income and middle amount of chinese population. Cluster 3 has loweast average income. Cluster 4 has middle high range of income, and middle range of chinese population. Map the clusters with different color, which gives an idea of eather cluster's location. 

Table 10: 5 Clusters data serperately

In [512]:
df_tor.dtypes
df_tor_cl0 = df_tor[df_tor['Cluster Labels'] == 0]
df_tor_cl0

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count
0,0,M1V,Scarborough,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,107920.0,75150.0,33871.4,89
1,0,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,62290.0,24790.0,41814.0,18
10,0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,57160.0,7120.0,34564.5,2


In [514]:

df_tor_cl1 = df_tor[df_tor['Cluster Labels'] == 1]
df_tor_cl1.head(10)

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count
3,1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,20830.0,10760.0,61487.2,25
5,1,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307,28080.0,10175.0,77849.5,12
9,1,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,15040.0,7585.0,56526.0,100
12,1,M2K,North York,Bayview Village,43.786947,-79.385975,13845.0,6065.0,52035.0,21
15,1,M3B,North York,Don Mills North,43.745906,-79.352188,12025.0,4850.0,67757.0,9
18,1,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,10740.0,4305.0,53583.0,100
24,1,M5V,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo...",43.628947,-79.39442,8255.0,2720.0,70623.0,85
28,1,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,11785.0,2250.0,52787.0,4
32,1,M9B,Etobicoke,"Cloverdale, Islington, Martin Grove, Princess ...",43.650943,-79.554724,5765.0,1515.0,73028.5,5
42,1,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,3400.0,920.0,71204.0,7


In [480]:
df_tor_cl2 = df_tor[df_tor['Cluster Labels'] == 2]
df_tor_cl2

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count
33,2,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,3005.0,1445.0,308010.0,9
34,2,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,3485.0,1300.0,207903.0,23


In [515]:
df_tor_cl3 = df_tor[df_tor['Cluster Labels'] == 3]
df_tor_cl3.head(10)

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count
2,3,M1S,Scarborough,Agincourt,43.7942,-79.262029,17955.0,11455.0,31825.0,80
4,3,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,28930.0,10755.0,36869.0,28
6,3,M1T,Scarborough,"Clarks Corners, Sullivan, Tam O'Shanter",43.781638,-79.304302,17925.0,9070.0,34200.0,68
7,3,M2H,North York,Hillcrest Village,43.803762,-79.363452,12030.0,8265.0,40442.0,26
8,3,M1P,Scarborough,"Dorset Park, Scarborough Town Centre, Wexford ...",43.75741,-79.273304,33855.0,7880.0,32474.0,16
11,3,M5T,Downtown Toronto,"Chinatown, Grange Park, Kensington Market",43.653206,-79.400049,9140.0,6250.0,37422.0,100
13,3,M1W,Scarborough,L'Amoreaux West,43.799525,-79.318389,9515.0,5795.0,36346.0,93
14,3,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,7770.0,4915.0,48215.5,100
16,3,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029,29645.0,4535.0,32863.0,11
17,3,M1G,Scarborough,Woburn,43.770992,-79.216917,32850.0,4385.0,30878.0,8


In [482]:
df_tor_cl4 = df_tor[df_tor['Cluster Labels'] == 4]
df_tor_cl4

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count
20,4,M2P,North York,York Mills West,43.752758,-79.400049,8615.0,3890.0,100516.0,11
27,4,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,6485.0,2400.0,112766.0,100
38,4,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,5210.0,1185.0,123077.0,7
39,4,M4G,East York,Leaside,43.70906,-79.363452,2740.0,1125.0,125564.0,10
47,4,M4E,East Toronto,The Beaches,43.676357,-79.293031,2680.0,765.0,92580.0,9


In [484]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_tor['Latitude'], df_tor['Longitude'], df_tor['Neighbourhood'], df_tor['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Figure 3: Divide Toronto neighbourhood to 5 clusters based on the population and income

Table 11: Explanation of Clusters 

In [485]:
data = [['cluster 0', 'low income, high population', 'Purple'], ['cluster 1', 'middle income', 'Blue'], ['cluster 2', 'high income, middle low population', 'Turquoise'], ['cluster 3', 'low income', 'Orange'],['cluster 4', 'middle high income, middle population', 'Red'] ]
df = pd.DataFrame(data, columns = ['Cluster', 'Meaning', 'Color'])
df

Unnamed: 0,Cluster,Meaning,Color
0,cluster 0,"low income, high population",Purple
1,cluster 1,middle income,Blue
2,cluster 2,"high income, middle low population",Turquoise
3,cluster 3,low income,Orange
4,cluster 4,"middle high income, middle population",Red


To make a good profit, find the place has higher income is necessary. Thus cluster 2 and cluster 4 are reviewed further to narrow down the target location. Besides all the known data, a ratio of restaurant per person is calculated. Lower value means this area needs more restaurants. 

Table 12: Cluster 2 and Cluster 4 data sorted by Restaurant per person ratio increasing

In [487]:
df_tor_review = df_tor[(df_tor['Cluster Labels'] == 4)|(df_tor['Cluster Labels'] == 2)]
df_tor_review.insert(10, 'Restaurant/Person', df_tor_review['Chinese Restaurant Count']/df_tor_review['Chinese'])
df_tor_review.sort_values(by = 'Restaurant/Person', ascending = True, axis = 0, inplace = True)
df_tor_review

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until


Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count,Restaurant/Person
20,4,M2P,North York,York Mills West,43.752758,-79.400049,8615.0,3890.0,100516.0,11,0.002828
38,4,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,5210.0,1185.0,123077.0,7,0.005907
33,2,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,3005.0,1445.0,308010.0,9,0.006228
39,4,M4G,East York,Leaside,43.70906,-79.363452,2740.0,1125.0,125564.0,10,0.008889
47,4,M4E,East Toronto,The Beaches,43.676357,-79.293031,2680.0,765.0,92580.0,9,0.011765
34,2,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,3485.0,1300.0,207903.0,23,0.017692
27,4,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,6485.0,2400.0,112766.0,100,0.041667


From Table 12, it seems neighbourhood "York Mills West" with postcode "M2P" has the relatively low chinese restaurant density and meanwhile has relatively high average income. The next two neighbourhoods are "Bedford Park, Lawrence Manor East" and "Silver Hills, York Mills". To confirm this result, all the neighbourhood is checked together with Restaurant per person ratio added. 

Table 13: Full toronto neighbourhood data with Restaurant per person

In [517]:
df_tor.insert(10, 'Restaurant/Person', (df_tor['Chinese Restaurant Count']/df_tor['Chinese']))


In [518]:
df_tor.sort_values(by = "Restaurant/Person", ascending = True, axis = 0, inplace = True)
df_tor.head(5)

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count,Restaurant/Person
10,0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,57160.0,7120.0,34564.5,2,0.000281
1,0,M2M,North York,"Newtonbrook, Willowdale",43.789053,-79.408493,62290.0,24790.0,41814.0,18,0.000726
23,3,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,21485.0,2965.0,39605.666667,3,0.001012
5,1,M5P,Central Toronto,"Forest Hill North, Forest Hill West",43.696948,-79.411307,28080.0,10175.0,77849.5,12,0.001179
0,0,M1V,Scarborough,"Agincourt North, L'Amoreaux East, Milliken, St...",43.815252,-79.284577,107920.0,75150.0,33871.4,89,0.001184


Narrow down data with some threshold setting. Eg: Restaurant/Person < 0.007 and Average income > 100k. 

Table 14 Neighbourhood has low density of Chinese restaurant but high income 

In [494]:
df_tor_rate = df_tor[(df_tor['Restaurant/Person'] < 0.007) & (df_tor['Total income: Average amount ($)'] > 100000)]
df_tor_rate

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Latitude,Longitude,Asian origins,Chinese,Total income: Average amount ($),Chinese Restaurant Count,Restaurant/Person
20,4,M2P,North York,York Mills West,43.752758,-79.400049,8615.0,3890.0,100516.0,11,0.002828
38,4,M5M,North York,"Bedford Park, Lawrence Manor East",43.733283,-79.41975,5210.0,1185.0,123077.0,7,0.005907
33,2,M2L,North York,"Silver Hills, York Mills",43.75749,-79.374714,3005.0,1445.0,308010.0,9,0.006228


The result is same as what we investicated from Clusters. From Figure 4, we can see the location of these top three neighbourhoods. 

In [454]:
map_target = folium.Map(location=[latitude, longitude], zoom_start=10)


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_tor_rate['Latitude'], df_tor_rate['Longitude'], df_tor_rate['Neighbourhood'], df_tor_rate['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color='Black',
        fill=True,
        fill_color='red',
        fill_opacity=0.7).add_to(map_target)
       
map_target



Figure 4: Neighbourhoods good to open a new high-class chinese restaurant 


## Conclusion 

Open a new chinese restaurant could be very challenging. Picking a good location would greatly increase the posibilly of successing. In this report, many data including chinese population, average income, current chinese restaurant ammount are investigated. The neighbourhood "York Mills West" beat the other neighbourhoods and become a good candidate for a new high-class chinese restaurant. 