# Capstone Project - The Battle of the Neighborhoods (Week 2)
Applied Data Science Capstone by IBM/Coursera

# Introduction: Business Problem
In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a Chinese restaurant in Toronto, Canada.
Since there are many restaurants in Toronto, we will try to detect locations that are not already crowded with restaurants. We are also particularly interested in areas with no Chinese restaurants nearby. We would also prefer locations as close to the city center as possible, assuming that the first two conditions are met.
We will use data science techniques to identify a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly stated so that stakeholders can choose the best possible final location.

# Data 
Based on the definition of our problem, the factors that will influence our decision are:
- number of existing restaurants in the neighborhood (any type of restaurant)
- number of and distance to Chinese restaurants in the neighborhood, if any
- distance of the neighborhood from the city center
We use Toronto postal-code areas (codes beginning with M) to define our neighborhoods.
The following data sources are needed to extract or generate the required information:
- postal codes, boroughs and neighborhood names are read from the Wikipedia page "List of postal codes of Canada: M"
- the latitude and longitude of each postal-code area are read from the CSV file at http://cocl.us/Geospatial_data
- the venues in every neighborhood, with their category and location, are obtained using the Foursquare API
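
The third factor above, distance from the city center, can be computed directly from latitude and longitude. The following is a minimal sketch using the haversine formula; the function name `haversine_km` is hypothetical, the Earth radius is an approximation, and the reference point is assumed to be the mean of the postal-code coordinates that this notebook prints further down.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Coordinates copied from the tables in this notebook (illustration only).
centre = (43.70460773398059, -79.39715291165048)  # mean of all postal-code coordinates
malvern_rouge = (43.806686, -79.194353)           # postal code M1B

distance_km = haversine_km(*centre, *malvern_rouge)
print(f"M1B is about {distance_km:.1f} km from the reference point")
```

Applied to every row of the coordinates table, a function like this would give a distance column that could be used to rank candidate neighborhoods by proximity to the center.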









# Neighborhood Candidates

In [1]:
pip install folium

Note: you may need to restart the kernel to use updated packages.


In [1]:
import requests
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import folium


In [2]:
url  = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = requests.get(url)
if page.status_code == 200:
    print('Page download successful')

Page download successful


In [3]:
toronto = pd.read_html(url, header=0, na_values = ['Not assigned'])[0]
toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [4]:
toronto.dropna(subset=['Borough'], inplace=True)


postcodes = toronto.groupby(['Postal Code','Borough']).Neighborhood.agg([('Neighborhood', ', '.join)])
postcodes.reset_index(inplace=True)
postcodes.head(5)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [5]:
print('The shape of the dataset is:',postcodes.shape)


The shape of the dataset is: (103, 3)


In [6]:
postcodes.to_csv('Toronto_Postcodes.csv')

In [7]:
url_csv = 'http://cocl.us/Geospatial_data'
coordinates = pd.read_csv(url_csv)
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
neighborhoods = pd.read_csv('Toronto_Postcodes.csv',index_col=[0])
neighborhoods.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [10]:
coordinates.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
neighborhoods.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)


In [11]:
neighborhoods_coordinates = pd.merge(neighborhoods, coordinates, on='PostalCode')
neighborhoods_coordinates.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [12]:
neighborhoods_coordinates.to_csv('Toronto-2.csv')

In [13]:
toronto_df= pd.read_csv('Toronto-2.csv', index_col=0)
toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [14]:
toronto_df.groupby('Borough').count()['Neighborhood']

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
East York            5
Etobicoke           12
Mississauga          1
North York          24
Scarborough         17
West Toronto         6
York                 5
Name: Neighborhood, dtype: int64

In [15]:
boroughs = toronto['Borough'].unique().tolist()

In [16]:
latitude_toronto = toronto_df['Latitude'].mean()
longitude_toronto = toronto_df['Longitude'].mean()
print('The geographical coordinates of Toronto are {}, {}'.format(latitude_toronto, longitude_toronto))


The geographical coordinates of Toronto are 43.70460773398059, -79.39715291165048


In [19]:
pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [20]:
conda install -c conda-forge folium=0.5.0

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [21]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


In [22]:
pip install bs4

Note: you may need to restart the kernel to use updated packages.


In [23]:
pip install lxml

Note: you may need to restart the kernel to use updated packages.


In [17]:
borough_color = {}
for borough in boroughs:
    borough_color[borough]= '#%02X%02X%02X' % tuple(np.random.choice(range(256), size=3))
    
map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], 
                                           toronto_df['Longitude'],
                                           toronto_df['Borough'], 
                                           toronto_df['Neighborhood']):
    label_text = borough + ' - ' + neighborhood
    label = folium.Popup(label_text)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=borough_color[borough],
        fill_color=borough_color[borough],
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto

In [27]:
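import os

# Foursquare credentials are read from environment variables instead of being
# hard-coded in the notebook (the variable names below are placeholders)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20200701'  # Foursquare API version date
LIMIT = 100  # maximum number of venues returned per request
radius = 500  # search radius in metres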

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
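
The function above indexes straight into the JSON response, so a single failed request or a venue without a category would raise an exception partway through the loop. The sketch below shows one possible defensive way to flatten a response. The helper name `extract_venues` is hypothetical, and the payload shape is inferred only from the keys that `getNearbyVenues` accesses; the sample payload is hand-written and is not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one 'explore' response into rows, skipping incomplete entries."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Skip venues that lack a category or coordinates instead of failing.
        if not categories or "lat" not in location or "lng" not in location:
            continue
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Hand-written payload that mimics the fields used in getNearbyVenues.
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Noodle Bar",
                           "location": {"lat": 43.7, "lng": -79.4},
                           "categories": [{"name": "Chinese Restaurant"}]}},
                {"venue": {"name": "Uncategorised Place",
                           "location": {"lat": 43.7, "lng": -79.4},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample_payload, "Example Neighborhood", 43.70, -79.40)
print(rows)
```

A response with no `response` key at all simply yields an empty list, so the calling loop can carry on with the next neighborhood.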

In [28]:
toronto_venues = getNearbyVenues(names=toronto_df['Neighborhood'],
                                latitudes=toronto_df['Latitude'],
                                longitudes=toronto_df['Longitude'])

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

In [29]:
toronto_venues.shape

(2132, 7)

In [30]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Sail Sushi,43.765951,-79.191275,Restaurant


In [31]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",9,9,9,9,9,9
"Bathurst Manor, Wilson Heights, Downsview North",21,21,21,21,21,21
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",23,23,23,23,23,23
...,...,...,...,...,...,...
"Willowdale, Willowdale East",35,35,35,35,35,35
"Willowdale, Willowdale West",7,7,7,7,7,7
Woburn,4,4,4,4,4,4
Woodbine Heights,7,7,7,7,7,7


In [32]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 270 unique categories.


In [33]:
toronto_venues['Venue Category'].unique()[:100]

array(['Fast Food Restaurant', 'Bar', 'Bank', 'Electronics Store',
       'Restaurant', 'Mexican Restaurant', 'Rental Car Location',
       'Medical Center', 'Intersection', 'Breakfast Spot', 'Coffee Shop',
       'Korean Restaurant', 'Pharmacy', 'Hakka Restaurant',
       'Caribbean Restaurant', 'Thai Restaurant', 'Athletics & Sports',
       'Bakery', 'Gas Station', 'Fried Chicken Joint', 'Playground',
       'Department Store', 'Chinese Restaurant', 'Hobby Shop',
       'Bus Station', 'Ice Cream Shop', 'Bus Line', 'Metro Station',
       'Park', 'Motel', 'American Restaurant', 'Café',
       'General Entertainment', 'Skating Rink', 'College Stadium',
       'Indian Restaurant', 'Pet Store', 'Vietnamese Restaurant',
       'Light Rail Station', 'Gaming Cafe', 'Sandwich Place',
       'Middle Eastern Restaurant', 'Smoke Shop', 'Auto Garage', 'Lounge',
       'Latin American Restaurant', 'Clothing Store',
       'Italian Restaurant', 'Noodle House', 'Pizza Place',
       'Grocery Store

In [34]:
to_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
to_onehot['Neighborhoods'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [to_onehot.columns[-1]] + list(to_onehot.columns[:-1])
to_onehot = to_onehot[fixed_columns]

print(to_onehot.shape)
to_onehot.head()

(2132, 271)


Unnamed: 0,Neighborhoods,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [35]:
to_grouped = to_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(to_grouped.shape)
to_grouped

(95, 271)


Unnamed: 0,Neighborhoods,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,"Willowdale, Willowdale East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0
91,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
92,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
93,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0


In [36]:
len(to_grouped[to_grouped["Chinese Restaurant"] > 0])

15

In [46]:
Chinese = to_grouped[["Neighborhoods","Chinese Restaurant"]]
Chinese.head()

Unnamed: 0,Neighborhoods,Chinese Restaurant
0,Agincourt,0.0
1,"Alderwood, Long Branch",0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.047619
3,Bayview Village,0.25
4,"Bedford Park, Lawrence Manor East",0.0
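
The values in the Chinese Restaurant column are shares, not counts: taking the mean of a 0/1 indicator within a neighborhood gives the fraction of that neighborhood's venues that fall in the category. A toy illustration with made-up data follows; the neighborhood names "A" and "B" and the venue list are invented for the example.

```python
import pandas as pd

# Made-up venues: neighborhood A has 4 venues, one of them a Chinese restaurant.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Chinese Restaurant", "Bank", "Café", "Park", "Bank", "Park"],
})

# One indicator column per category, then the per-neighborhood mean.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
share = onehot.groupby("Neighborhood").mean()

print(share["Chinese Restaurant"])
```

This matches the Bayview Village row above: that neighborhood has 4 venues in the count table and a share of 0.25, which corresponds to a single Chinese restaurant.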


In [47]:
# Run k-means to cluster the neighborhoods in Toronto into 3 clusters.

toclusters = 3

to_clustering = Chinese.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=toclusters, random_state=0).fit(to_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 2, 1, 0, 0, 0, 0, 0, 0], dtype=int32)
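
The integer labels returned by k-means are arbitrary: label 0 is not necessarily the cluster with the lowest values. One way to make the labels interpretable is to rank the clusters by their centroid. The sketch below does this on made-up shares; the data, the `Level` column and the names "low", "moderate" and "high" are assumptions for illustration, and `n_init=10` is set explicitly only to keep behavior consistent across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Made-up shares of Chinese restaurants for six hypothetical neighborhoods.
shares = pd.DataFrame({
    "Neighborhood": ["N1", "N2", "N3", "N4", "N5", "N6"],
    "Chinese Restaurant": [0.0, 0.01, 0.05, 0.07, 0.20, 0.25],
})

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(shares[["Chinese Restaurant"]])

# Rank clusters by centroid: rank 0 is the lowest share, rank 2 the highest.
order = np.argsort(kmeans.cluster_centers_.ravel())
rank_of_label = {int(label): rank for rank, label in enumerate(order)}

level_names = ["low", "moderate", "high"]
shares["Level"] = [level_names[rank_of_label[int(label)]] for label in kmeans.labels_]
print(shares)
```

Checking the centroids in this way before writing up the results avoids attaching the wrong description to a cluster number.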

In [48]:
# create a new dataframe that holds the cluster label for each neighborhood
to_merged = Chinese.copy()

# add clustering labels
to_merged["Cluster Labels"] = kmeans.labels_
to_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
to_merged.head()

Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels
0,Agincourt,0.0,0
1,"Alderwood, Long Branch",0.0,0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.047619,2
3,Bayview Village,0.25,1
4,"Bedford Park, Lawrence Manor East",0.0,0


In [49]:
# join the per-venue rows from toronto_venues to add latitude/longitude and venue details for each neighborhood
to_merged = to_merged.join(toronto_venues.set_index("Neighborhood"), on="Neighborhood")

print(to_merged.shape)
to_merged.head()

(2132, 9)


Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agincourt,0.0,0,43.7942,-79.262029,Panagio's Breakfast & Lunch,43.79237,-79.260203,Breakfast Spot
0,Agincourt,0.0,0,43.7942,-79.262029,Twilight,43.791999,-79.258584,Lounge
0,Agincourt,0.0,0,43.7942,-79.262029,El Pulgarcito,43.792648,-79.259208,Latin American Restaurant
0,Agincourt,0.0,0,43.7942,-79.262029,Mark's,43.791179,-79.259714,Clothing Store
0,Agincourt,0.0,0,43.7942,-79.262029,Commander Arena,43.794867,-79.267989,Skating Rink


In [50]:
print(to_merged.shape)
to_merged.sort_values(["Cluster Labels"], inplace=True)
to_merged

(2132, 9)


Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agincourt,0.000000,0,43.794200,-79.262029,Panagio's Breakfast & Lunch,43.792370,-79.260203,Breakfast Spot
63,"Regent Park, Harbourfront",0.000000,0,43.654260,-79.360636,The Sweet Escape Patisserie,43.650632,-79.358709,Bakery
63,"Regent Park, Harbourfront",0.000000,0,43.654260,-79.360636,John Fluevog Shoes,43.649896,-79.359436,Shoe Store
63,"Regent Park, Harbourfront",0.000000,0,43.654260,-79.360636,Berkeley Church,43.655123,-79.365873,Event Space
63,"Regent Park, Harbourfront",0.000000,0,43.654260,-79.360636,Soulpepper Theatre,43.650780,-79.357615,Theater
...,...,...,...,...,...,...,...,...,...
16,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,2,43.781638,-79.304302,Popeyes Louisiana Kitchen,43.780476,-79.298460,Fried Chicken Joint
16,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,2,43.781638,-79.304302,KFC,43.780400,-79.300700,Fast Food Restaurant
16,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,2,43.781638,-79.304302,Petro-Canada,43.779337,-79.307682,Gas Station
16,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,2,43.781638,-79.304302,Eight Noodles,43.778234,-79.308299,Noodle House


In [51]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [52]:
# create map
map_clusters = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(toclusters)
ys = [i+x+(i*x)**2 for i in range(toclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(to_merged['Neighborhood Latitude'], to_merged['Neighborhood Longitude'], to_merged['Neighborhood'], to_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster))
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [53]:
to_merged.loc[to_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agincourt,0.000000,0,43.794200,-79.262029,Panagio's Breakfast & Lunch,43.792370,-79.260203,Breakfast Spot
63,"Regent Park, Harbourfront",0.000000,0,43.654260,-79.360636,The Sweet Escape Patisserie,43.650632,-79.358709,Bakery
63,"Regent Park, Harbourfront",0.000000,0,43.654260,-79.360636,John Fluevog Shoes,43.649896,-79.359436,Shoe Store
63,"Regent Park, Harbourfront",0.000000,0,43.654260,-79.360636,Berkeley Church,43.655123,-79.365873,Event Space
63,"Regent Park, Harbourfront",0.000000,0,43.654260,-79.360636,Soulpepper Theatre,43.650780,-79.357615,Theater
...,...,...,...,...,...,...,...,...,...
35,"Harbourfront East, Union Station, Toronto Islands",0.010000,0,43.640816,-79.381752,WestJet Stage,43.638022,-79.383536,Music Venue
35,"Harbourfront East, Union Station, Toronto Islands",0.010000,0,43.640816,-79.381752,360 Restaurant,43.642537,-79.387042,Wine Bar
35,"Harbourfront East, Union Station, Toronto Islands",0.010000,0,43.640816,-79.381752,Toronto Blue Jays Box Office,43.642416,-79.385862,Baseball Stadium
35,"Harbourfront East, Union Station, Toronto Islands",0.010000,0,43.640816,-79.381752,Glass Floor,43.642643,-79.386948,Scenic Lookout


In [54]:
to_merged.loc[to_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
3,Bayview Village,0.25,1,43.786947,-79.385975,TD Canada Trust,43.788074,-79.380367,Bank
3,Bayview Village,0.25,1,43.786947,-79.385975,Maxim's Cafe and Patisserie,43.787863,-79.380751,Café
3,Bayview Village,0.25,1,43.786947,-79.385975,Kaga Sushi,43.787758,-79.38109,Japanese Restaurant
87,Westmount,0.125,1,43.696319,-79.532242,Starbucks,43.696338,-79.533398,Coffee Shop
42,"Kennedy Park, Ionview, East Birchmount Park",0.2,1,43.727929,-79.262029,Giant Tiger,43.727447,-79.26624,Department Store
42,"Kennedy Park, Ionview, East Birchmount Park",0.2,1,43.727929,-79.262029,Tim Hortons,43.726895,-79.266157,Coffee Shop
42,"Kennedy Park, Ionview, East Birchmount Park",0.2,1,43.727929,-79.262029,Hakka No.1,43.727688,-79.266057,Chinese Restaurant
87,Westmount,0.125,1,43.696319,-79.532242,Pizza Hut,43.696431,-79.533233,Pizza Place
87,Westmount,0.125,1,43.696319,-79.532242,Subway,43.692927,-79.531471,Sandwich Place
42,"Kennedy Park, Ionview, East Birchmount Park",0.2,1,43.727929,-79.262029,Tandy Leather,43.726974,-79.266513,Hobby Shop


In [55]:
to_merged.loc[to_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Chinese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2,"Bathurst Manor, Wilson Heights, Downsview North",0.047619,2,43.754328,-79.442259,Orly Restaurant & Grill,43.754493,-79.443507,Middle Eastern Restaurant
2,"Bathurst Manor, Wilson Heights, Downsview North",0.047619,2,43.754328,-79.442259,Tim Hortons,43.754767,-79.443250,Coffee Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",0.047619,2,43.754328,-79.442259,Dairy Queen,43.755680,-79.440166,Ice Cream Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",0.047619,2,43.754328,-79.442259,Best for Bride in Toronto,43.755789,-79.437834,Bridal Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",0.047619,2,43.754328,-79.442259,Bagel Plus,43.755395,-79.440686,Restaurant
...,...,...,...,...,...,...,...,...,...
16,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,2,43.781638,-79.304302,Popeyes Louisiana Kitchen,43.780476,-79.298460,Fried Chicken Joint
16,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,2,43.781638,-79.304302,KFC,43.780400,-79.300700,Fast Food Restaurant
16,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,2,43.781638,-79.304302,Petro-Canada,43.779337,-79.307682,Gas Station
16,"Clarks Corners, Tam O'Shanter, Sullivan",0.076923,2,43.781638,-79.304302,Eight Noodles,43.778234,-79.308299,Noodle House


# Results
The results from k-means clustering show that we can categorize Toronto neighbourhoods into 3 clusters based on the share of venues in each neighbourhood that are Chinese restaurants (the Chinese Restaurant column in the tables above):
- Cluster 0: neighbourhoods with few or no Chinese restaurants (share of 0.00 to 0.01 in the rows shown)
- Cluster 1: neighbourhoods with the highest share of Chinese restaurants (0.125 to 0.25 in the rows shown)
- Cluster 2: neighbourhoods with a moderate share of Chinese restaurants (about 0.05 to 0.08 in the rows shown)
The results are visualized in the map above, with Cluster 0 in red, Cluster 1 in purple and Cluster 2 in mint green.


# Conclusion
In this project, we identified the business problem, specified the data required, extracted and prepared the data, applied k-means clustering to group the neighbourhoods into 3 clusters based on their share of Chinese restaurants, and lastly provided recommendations to the relevant stakeholders, i.e. entrepreneurs or households planning to open a restaurant. To answer the business question raised in the introduction: the neighbourhoods in Cluster 0 are the preferred locations for a new Chinese restaurant, because few or no Chinese restaurants were found there and competition is therefore limited. These findings should help entrepreneurs target high-potential locations while avoiding overcrowded areas when deciding where to open a Chinese restaurant.