# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)


## Introduction: Business Problem <a name="introduction"></a>


Toronto is the capital of the province of Ontario, in southeastern Canada. Like New York in the United States, it is the most populous city in its country, with approximately 6.2 million people. It is a multicultural city and the country’s financial and commercial centre.
Given its large and diverse population, Toronto is a good place to open a cafe or restaurant, since demand for food and beverages is steady.

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a cafe or a restaurant in Toronto, Canada.
The underlying business problem is to analyse Toronto's neighbourhoods and identify a few candidate locations for a restaurant. Location is one of the most important factors in the food and beverage (F&B) business, so we use Foursquare venue data to inform the analysis.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:

* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to cafe/restaurants in the neighborhood, if any
* distance of neighborhood from city center
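
The distance factor in the list above can be made concrete with a great-circle calculation. The following is a minimal sketch, not part of the original analysis: `haversine_km` is a hypothetical helper, the Earth radius is an approximation, and the two coordinate pairs are copied from values that appear in the cells further down this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Coordinates as they appear in this notebook's outputs
city_center = (43.6534817, -79.3839347)         # geocoder result for 'Toronto, Canada'
central_bay_street = (43.6579524, -79.3873826)  # Central Bay Street

distance_km = haversine_km(*city_center, *central_bay_street)
print('Distance to city center: {:.2f} km'.format(distance_km))
```

Applied to every row of the neighborhood table, a helper like this would give one distance column that can be compared across neighborhoods.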


We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.
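
As an illustration of what such a grid could look like, here is a hedged sketch. The cells that follow actually work from postal-code centroids, so `make_grid`, the 500 m spacing, and the `steps` parameter are all assumptions made for this example only. It uses a flat-Earth approximation, which should be adequate over a few kilometres.

```python
import math

def make_grid(center_lat, center_lon, spacing_m=500, steps=2):
    """Return (lat, lon) points on a square grid centred on the given point."""
    meters_per_deg_lat = 111_320.0  # approximate length of one degree of latitude
    meters_per_deg_lon = meters_per_deg_lat * math.cos(math.radians(center_lat))
    points = []
    for i in range(-steps, steps + 1):        # rows, south to north
        for j in range(-steps, steps + 1):    # columns, west to east
            lat = center_lat + i * spacing_m / meters_per_deg_lat
            lon = center_lon + j * spacing_m / meters_per_deg_lon
            points.append((lat, lon))
    return points

# Centre taken from the geocoder output shown later in this notebook
grid = make_grid(43.6534817, -79.3839347, spacing_m=500, steps=2)
print(len(grid), 'grid points')
```

With `steps=2` the grid has 5 x 5 points, and the middle point is the city center itself.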


In [1]:
# Install BeautifulSoup4 package
!pip install BeautifulSoup4

Collecting BeautifulSoup4
  Downloading https://files.pythonhosted.org/packages/d1/41/e6495bd7d3781cee623ce23ea6ac73282a373088fcd0ddc809a047b18eae/beautifulsoup4-4.9.3-py3-none-any.whl (115kB)
Collecting soupsieve>1.2; python_version >= "3.0" (from BeautifulSoup4)
  Downloading https://files.pythonhosted.org/packages/36/69/d82d04022f02733bf9a72bc3b96332d360c0c5307096d76f6bb7489f7e57/soupsieve-2.2.1-py3-none-any.whl
Installing collected packages: soupsieve, BeautifulSoup4
Successfully installed BeautifulSoup4-4.9.3 soupsieve-2.2.1


In [2]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

import requests
import urllib.request, json
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup
 
#postal_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
postal_url = 'https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&diff=995657573&oldid=979555370'


In [3]:
# Read the wikipedia URL and parse the data
page = urllib.request.urlopen(postal_url)

data = page.read()
soup = BeautifulSoup(data, "html.parser")

In [4]:
# Find the table that contains the postal code data
postal_table = soup.find('table', class_='wikitable sortable')

# Loop through postal_table to populate the respective columns
COL1=[] 
COL2=[]
COL3=[]

for row in postal_table.findAll('tr'):
    cells = row.findAll('td')
    if len(cells)==3:
        COL1.append(cells[0].find(text=True))
        COL2.append(cells[1].find(text=True))
        COL3.append(cells[2].find(text=True))
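
To show why the `len(cells)==3` check skips the header row and why the cell text still carries a trailing newline, here is a small self-contained sketch. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages or network access. The `ThreeColumnTable` class and the HTML fragment are made up for illustration and only mimic the shape of the Wikipedia table.

```python
from html.parser import HTMLParser

class ThreeColumnTable(HTMLParser):
    """Collect the text of every <tr> that has exactly three <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None    # cells of the row being read, or None outside a row
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells = []
        elif tag == 'td' and self._cells is not None:
            self._cells.append('')
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False
        elif tag == 'tr' and self._cells is not None:
            # Header rows use <th>, so they have zero <td> cells and are skipped
            if len(self._cells) == 3:
                self.rows.append(tuple(c.strip() for c in self._cells))
            self._cells = None

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1] += data

# Made-up fragment; each cell ends with a newline, as in the scraped page
sample_html = """
<table class="wikitable sortable">
<tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M1A\n</td><td>Not assigned\n</td><td>Not assigned\n</td></tr>
<tr><td>M3A\n</td><td>North York\n</td><td>Parkwoods\n</td></tr>
</table>
"""

parser = ThreeColumnTable()
parser.feed(sample_html)
print(parser.rows)
```

The `strip()` call here plays the role of the newline clean-up that the notebook performs on the DataFrame columns.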
         


In [5]:
# Assign each column to a dataframe with its designated column name
df = pd.DataFrame(COL1, columns=['PostalCode'])
df['Borough'] = COL2
df['Neighborhood'] = COL3

# COL1, COL2 and COL3 contain "\n" characters. The following code removes "\n" from every row.
df['PostalCode'] = df['PostalCode'].replace('\n','',regex=True)
df['Borough'] = df['Borough'].replace('\n','',regex=True)
df['Neighborhood'] = df['Neighborhood'].replace('\n','',regex=True)

# Drop rows where Borough = "Not assigned"
df = df.drop(df[df['Borough']=='Not assigned'].index)


In [6]:
!pip install geocoder
import geocoder # import geocoder

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [7]:
# The geocoder failed to return latitude and longitude values,
# so the coordinate data is loaded from a CSV file instead.
geo_df = pd.read_csv('./Geospatial_Coordinates.csv')


In [8]:
# Rename the 'Postal Code' column in geo_df to 'PostalCode' so that the merge can match on the shared column name
geo_df = geo_df.rename({'Postal Code':'PostalCode'}, axis=1)

# Merge coordinates into neighbourhood dataframe
df = df.merge(geo_df)
df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


In [9]:
# Install geopy to import Nominatim
!pip install geopy
from geopy.geocoders import Nominatim

# folium for map generation
import folium

Collecting geopy
  Downloading https://files.pythonhosted.org/packages/0c/67/915668d0e286caa21a1da82a85ffe3d20528ec7212777b43ccd027d94023/geopy-2.1.0-py3-none-any.whl (112kB)
Collecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-2.1.0


In [10]:
# Keep only the rows whose Borough contains "Downtown Toronto"
toronto_data = df[df['Borough'].str.contains('Downtown Toronto')].reset_index(drop=True)
toronto_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


In [11]:
# Get the coordinates of Toronto
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="trt_explorer")
location = geolocator.geocode(address)
trt_latitude = location.latitude
trt_longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(trt_latitude, trt_longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [12]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[trt_latitude, trt_longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [13]:

# Define Foursquare credentials and API version.
# Credentials are read from environment variables (the variable names below are
# assumptions) so that secrets are not stored or printed in the notebook.
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
ACCESS_TOKEN = os.environ['FOURSQUARE_ACCESS_TOKEN'] # your Foursquare Access Token
VERSION = '20180605'
LIMIT = 30
print('Foursquare credentials loaded.')

Foursquare credentials loaded.


In [14]:
# Get the name of the neighborhood at index 5 (Central Bay Street)
toronto_data.loc[5, 'Neighborhood']

'Central Bay Street'

In [15]:
# Get the neighborhood's latitude and longitude values.
neighborhood_latitude = toronto_data.loc[5, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[5, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_data.loc[5, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Central Bay Street are 43.6579524, -79.3873826.


In [16]:
# Now, get the top 100 venues that are within a radius of 500 meters of Central Bay Street
# First, create the GET request URL. 

LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius
categoryID= '4d4b7105d754a06374d81259'    # Food category retrieved from foursquare 

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    categoryID,
    radius, 
    LIMIT)
url # display URL


'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=43.6579524,-79.3873826&categoryId=4d4b7105d754a06374d81259&radius=500&limit=100'
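
An alternative to formatting the query string by hand is to let the standard library escape the parameters. This is only a sketch: `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the endpoint and parameter names are copied from the cell above rather than checked against the current Foursquare documentation.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng,
                      category_id, radius, limit):
    """Build the venues/explore URL with escaped query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; the other values match the cell above
example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                                43.6579524, -79.3873826,
                                '4d4b7105d754a06374d81259', 500, 100)

# Round-trip the query string to confirm the parameters survive encoding
query = parse_qs(urlparse(example_url).query)
print(query['ll'], query['radius'], query['limit'])
```

With `requests`, the same dictionary could instead be passed through the `params` argument of `requests.get`, which performs the encoding itself.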

In [17]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
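
A quick way to see the three paths through this function is to call it on small made-up rows. The sketch below repeats the function so that it runs on its own and uses plain dictionaries in place of DataFrame rows, which is an assumption made for illustration; it catches `KeyError` explicitly, which is the error a missing key raises.

```python
def get_category_type(row):
    """Return the first category name from a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Made-up rows covering the flat key, the nested key, and an empty category list
flat_row = {'categories': [{'name': 'Café'}]}
nested_row = {'venue.categories': [{'name': 'Bakery'}]}
empty_row = {'categories': []}

for row in (flat_row, nested_row, empty_row):
    print(get_category_type(row))
```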

In [18]:
# Create a function to repeat the same process to all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            categoryID,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
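
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which would raise if a request came back without groups or a venue had no category. A more defensive variant of the parsing step might look like the sketch below. `extract_venues` is a hypothetical helper, and the payload is a made-up dictionary that only mirrors the fields the notebook reads.

```python
def extract_venues(name, lat, lng, payload):
    """Turn one venues/explore JSON payload into a list of row tuples.

    Returns an empty list when the payload has no groups or items, and uses
    None as the category when a venue has no categories.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Made-up payload with one categorised and one uncategorised venue
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Impact Kitchen',
               'location': {'lat': 43.656369, 'lng': -79.35698},
               'categories': [{'name': 'Restaurant'}]}},
    {'venue': {'name': 'Example Stand',
               'location': {'lat': 43.655, 'lng': -79.36},
               'categories': []}},
]}]}}

rows = extract_venues('Regent Park, Harbourfront', 43.65426, -79.360636, sample_payload)
print(rows)
```

An error response without a `groups` key then yields an empty list for that neighborhood instead of stopping the whole loop.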


In [19]:
toronto_venues = getNearbyVenues(names= toronto_data['Neighborhood'],
                                   latitudes=  toronto_data['Latitude'],
                                   longitudes= toronto_data['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [20]:
# check the size of the resulting dataframe
print(toronto_venues.shape)
toronto_venues.head()

(1078, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
1,"Regent Park, Harbourfront",43.65426,-79.360636,Souvlaki Express,43.655584,-79.364438,Greek Restaurant
2,"Regent Park, Harbourfront",43.65426,-79.360636,Brick Street Bakery,43.650574,-79.359539,Bakery
3,"Regent Park, Harbourfront",43.65426,-79.360636,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
4,"Regent Park, Harbourfront",43.65426,-79.360636,Caffe Furbo,43.64997,-79.358849,Café


In [21]:
# check how many venues were returned for each neighborhood
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,45,45,45,45,45,45
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",2,2,2,2,2,2
Central Bay Street,77,77,77,77,77,77
Christie,7,7,7,7,7,7
Church and Wellesley,62,62,62,62,62,62
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",97,97,97,97,97,97
"Harbourfront East, Union Station, Toronto Islands",74,74,74,74,74,74
"Kensington Market, Chinatown, Grange Park",54,54,54,54,54,54


In [22]:
# find out how many unique categories can be curated from all the returned venues
print('There are {} unique food categories within Downtown Toronto.'.format(len(toronto_venues['Venue Category'].unique())))


There are 80 unique food categories within Downtown Toronto.


In [23]:
# Analyze Each Neighborhood
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()


Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,...,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
# Next, group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of each category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,...,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Berczy Park,0.0,0.022222,0.0,0.022222,0.0,0.022222,0.066667,0.022222,0.044444,...,0.066667,0.066667,0.0,0.0,0.022222,0.022222,0.0,0.044444,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,...,0.012987,0.051948,0.0,0.0,0.0,0.038961,0.0,0.012987,0.0,0.012987
3,Christie,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.016129,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,...,0.016129,0.096774,0.0,0.0,0.0,0.0,0.016129,0.0,0.032258,0.0
5,"Commerce Court, Victoria Hotel",0.0,0.04,0.0,0.05,0.0,0.01,0.04,0.0,0.0,...,0.02,0.03,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0
6,"First Canadian Place, Underground city",0.0,0.04,0.0,0.06,0.0,0.01,0.04,0.0,0.0,...,0.03,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.020619,0.0,0.0,...,0.010309,0.030928,0.0,0.0,0.0,0.041237,0.0,0.0,0.010309,0.010309
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,...,0.040541,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.018519,0.0,0.0,0.0,0.074074,0.018519,0.0,...,0.0,0.0,0.018519,0.0,0.0,0.037037,0.0,0.092593,0.074074,0.0
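
Each value in the grouped table is the share of a neighborhood's venues that fall in a category. For example, 0.066667 for Bakery in Berczy Park, which has 45 venues, corresponds to 3 bakeries out of 45. The same quantity can be computed without one-hot encoding, as in this pure-Python sketch; the venue list is made up and stands in for `toronto_venues`.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, category) pairs standing in for toronto_venues
venues = [
    ('Christie', 'Café'),
    ('Christie', 'Café'),
    ('Christie', 'American Restaurant'),
    ('Christie', 'Italian Restaurant'),
    ('Berczy Park', 'Bakery'),
    ('Berczy Park', 'Steakhouse'),
]

# Count categories per neighborhood
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Divide each count by the neighborhood's total to get a frequency
frequencies = {
    hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
    for hood, cats in counts.items()
}
print(frequencies['Christie'])
```

Because the frequencies are relative, a neighborhood with very few venues can show large values, as the 0.5 entries for the CN Tower row (2 venues) illustrate.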


In [25]:
# Let's print each neighborhood along with its top 20 most common venue categories
num_top_venues = 20

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                            venue  freq
0                      Steakhouse  0.07
1                          Bakery  0.07
2                Sushi Restaurant  0.07
3                  Sandwich Place  0.07
4                           Diner  0.04
5               French Restaurant  0.04
6   Vegetarian / Vegan Restaurant  0.04
7                          Bistro  0.04
8                      Restaurant  0.04
9              Italian Restaurant  0.04
10             Seafood Restaurant  0.04
11                      Irish Pub  0.02
12            Japanese Restaurant  0.02
13      Middle Eastern Restaurant  0.02
14              Indian Restaurant  0.02
15                  Deli / Bodega  0.02
16            American Restaurant  0.02
17                     Food Truck  0.02
18               Greek Restaurant  0.02
19                    Salad Place  0.02


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                       

In [26]:
# First, let's write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [27]:
# let's create the new dataframe and display the top 10 venues for each neighborhood.

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Sushi Restaurant,Sandwich Place,Bakery,Steakhouse,Bistro,Italian Restaurant,Restaurant,Seafood Restaurant,French Restaurant,Diner
1,"CN Tower, King and Spadina, Railway Lands, Har...",American Restaurant,Tapas Restaurant,Wings Joint,Fast Food Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Empanada Restaurant,Ethiopian Restaurant
2,Central Bay Street,Café,Sandwich Place,Italian Restaurant,Chinese Restaurant,Sushi Restaurant,Japanese Restaurant,Pizza Place,Thai Restaurant,Restaurant,Middle Eastern Restaurant
3,Christie,Café,American Restaurant,Italian Restaurant,Japanese Restaurant,Restaurant,Wings Joint,Falafel Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
4,Church and Wellesley,Japanese Restaurant,Sushi Restaurant,Sandwich Place,Burrito Place,Pizza Place,Fast Food Restaurant,Restaurant,Korean Restaurant,Mexican Restaurant,Mediterranean Restaurant
5,"Commerce Court, Victoria Hotel",Sandwich Place,Café,Italian Restaurant,Restaurant,Deli / Bodega,Asian Restaurant,American Restaurant,Japanese Restaurant,Pizza Place,Bakery
6,"First Canadian Place, Underground city",Sandwich Place,Café,Fast Food Restaurant,Asian Restaurant,Deli / Bodega,Restaurant,Japanese Restaurant,American Restaurant,Sushi Restaurant,Bakery
7,"Garden District, Ryerson",Café,Sandwich Place,Restaurant,Pizza Place,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Burger Joint,Fast Food Restaurant,Sushi Restaurant
8,"Harbourfront East, Union Station, Toronto Islands",Restaurant,Café,Pizza Place,Fast Food Restaurant,Chinese Restaurant,Deli / Bodega,Steakhouse,Fried Chicken Joint,Food Court,Salad Place
9,"Kensington Market, Chinatown, Grange Park",Café,Vegetarian / Vegan Restaurant,Bakery,Vietnamese Restaurant,Chinese Restaurant,Burger Joint,Mexican Restaurant,Dumpling Restaurant,Caribbean Restaurant,Thai Restaurant
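
The column labels above rely on a three-item suffix list with a fallback to "th", which gives correct labels up to 20 but would produce "21th" beyond that. A more general suffix rule is sketched below; `ordinal` is a hypothetical helper and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the same column names as in the cell above
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```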


In [28]:
# Cluster Neighborhoods
# Run k-means to cluster the neighborhood into 5 clusters.

# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 4, 3, 4, 4, 4, 4, 4, 0], dtype=int32)
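
Before interpreting the clusters it can help to check how many neighborhoods fall into each one. The sketch below counts the ten labels printed above, copied by hand; in the notebook the same count could be taken over the full `kmeans.labels_` array.

```python
from collections import Counter

# First ten labels as printed by the cell above
labels = [4, 2, 4, 3, 4, 4, 4, 4, 4, 0]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print('cluster {}: {} neighborhoods'.format(cluster, size))
```

Among these ten, seven share label 4 and three labels hold a single neighborhood each, which suggests that five clusters may be more than this small data set supports.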

In [29]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# Join the cluster labels and sorted venues onto toronto_data, which holds the latitude/longitude of each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Café,Breakfast Spot,Bakery,Restaurant,Italian Restaurant,Mexican Restaurant,Japanese Restaurant,Sandwich Place,Seafood Restaurant,French Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4,Sandwich Place,Sushi Restaurant,Burrito Place,Burger Joint,Persian Restaurant,Deli / Bodega,Mexican Restaurant,Café,Portuguese Restaurant,Korean Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,4,Café,Sandwich Place,Restaurant,Pizza Place,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Burger Joint,Fast Food Restaurant,Sushi Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,4,Café,Italian Restaurant,Restaurant,Sandwich Place,Sushi Restaurant,Bakery,Diner,Gastropub,Japanese Restaurant,Moroccan Restaurant
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,4,Sushi Restaurant,Sandwich Place,Bakery,Steakhouse,Bistro,Italian Restaurant,Restaurant,Seafood Restaurant,French Restaurant,Diner


In [30]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# let's visualize the resulting clusters
# Create map
map_clusters = folium.Map(location=[trt_latitude, trt_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
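
The marker loop indexes the colour list with `cluster-1`, so label 0 wraps around to the last colour, and it assumes every neighborhood has an integer label. If a neighborhood returned no venues, the join would leave its label as NaN and the indexing would fail. The sketch below shows one hedged alternative. It swaps matplotlib's `cm.rainbow` for the standard library's `colorsys` so that it runs on its own, and `cluster_palette` and `color_for` are hypothetical helpers.

```python
import colorsys
import math

def cluster_palette(k):
    """Return k distinct hex colours, evenly spaced around the hue wheel."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette

def color_for(label, palette, missing='#808080'):
    """Map a cluster label to a colour; NaN or None gets a grey fallback."""
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return missing
    return palette[int(label)]

palette = cluster_palette(5)
print(palette)
print(color_for(4.0, palette), color_for(float('nan'), palette))
```

Inside the loop, `color_for(cluster, palette)` could then be passed as both `color` and `fill_color`.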