<h1 align=center><font size = 8>Battle of the Neighborhoods</font></h1>
<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>
<a id='the_top'></a>

For easier navigation in this project file, run all cells first, then use the table of contents. If the links do not respond, note how the sections are grouped and scroll down to the appropriate section.

### Table of Contents 
1. [Home (The Top)](#the_top)
2. [SECTION A](#sectionA)
    1. [Install Missing Libraries](#install_libraries)
    2. [Import Libraries](#import_libraries)
    3. [Web Scraping](#scrap_webpage_and_create_dataframe)
3. [SECTION B](#sectionB)
    1. [Create Geocoordinates Function](#create_function_for_getting_geocoordinates)
    2. [Geocoordinates to Dataframe](#append_geocoordinates_to_dataframe)
4. [SECTION C](#sectionC)
    1. [Foursquare Credentials](#set_foursquare_credentials)
    2. [Nearby Venues Function](#create_function_to_get_nearby_venues)
    3. [Toronto Geocoordinates](#get_toronto_geocoordinates)
    4. [Filter Dataframe: Get Boroughs with Toronto in Their Names](#create_dataframe_for_toronto_containing_boroughs)
    5. [Get Nearby Venues](#get_nearby_venues)
    6. [Venues per Neighborhood](#count_venues_per_neighborhood)
    7. [Get Number of Unique Venue Categories](#count_unique_venue_cats)
    8. [Toronto Onehot Encoding](#toronto_onehot_encoding)
    9. [Toronto Onehot Mean](#toronto_onehot_mean)
    10. [Top 5 Venues Per Neighborhood](#top_5_venues_per_neighborhood)
    11. [Most Common Venues](#most_common_venues)
    12. [Top 10 Most Common Venues](#top_ten_most_common_venues)
    13. [Get Venue Clusters](#get_clusters)
    14. [Add Cluster Label to Dataframe](#add_cluster_labels_to_dataframe)
    15. [View Map with Clusters](#view_map_with_clusters)
    16. [Most Common Venues Per Borough](#most_common_venues_per_borough)
    17. [Conclusion](#conclusion) 


### SECTION A
<a id='sectionA'></a>

First we install Folium and Geocoder:
<a id='install_libraries'></a>

In [1]:
!pip3 install folium==0.5.0
!pip3 install geocoder



Then we import all the libraries we will use:
<a id='import_libraries'></a>

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# import k-means from clustering stage
from sklearn.cluster import KMeans 

import geocoder # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import folium # plotting library
import lxml.html as LH # HTML parser used to read the Wikipedia table
from bs4 import BeautifulSoup # for extracting urls from web page

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import math

def text(elt):
    return elt.text_content().replace(u'\xa0', u' ')

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


We get data from the Wikipedia Page __https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M__ and load the postcode data into a Pandas Dataframe:
<a id='scrap_webpage_and_create_dataframe'></a>

In [3]:
wikiURL = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

r = requests.get(wikiURL)
root = LH.fromstring(r.content)



for table in root.xpath('//table[@class="wikitable sortable"]'):
    tdata = [text(tr) for tr in table.xpath('//tr')]


mlist = []

for inp in tdata:
    s = inp[1:]
    s2 = s[:-1]
    l = s2.split('\n\n')
    mlist.append(l)
    

columns = mlist[0]

del mlist[0]

pcode = []
borough = []
nhood = []

# split each row into postal code, borough and neighborhood
for x in mlist:
    pcode.append(x[0])
    borough.append(x[1])
    if x[2]=='Not assigned':
        nhood.append(x[1]) 
    else:
        nhood.append(x[2]) 
    

# dictionary of lists (named nbh_dict so the built-in dict is not shadowed)
nbh_dict = {'PostalCode': pcode, 'Borough': borough, 'Neighborhood': nhood}

df_nbh = pd.DataFrame(nbh_dict)

dropVal = ['Not assigned','\nNL','NS','Canadian postal codes','B']

df_nbh = df_nbh[~df_nbh['Borough'].isin(dropVal)].copy()

# combine neighborhoods that share a postal code into one comma-separated string
df_nbh['Neighborhood'] = df_nbh.groupby('PostalCode')['Neighborhood'].transform(lambda x: ','.join(x))

# keep a single row per postal code
df_nbh = df_nbh[['PostalCode','Borough','Neighborhood']].drop_duplicates()

df_nbh.reset_index(drop=True, inplace=True)


df_nbh.head(50)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
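
The cleaning rules applied in the cell above can be sketched without pandas. This is a minimal illustration, not the notebook's implementation: `clean_rows` and the sample rows are hypothetical, and the `', '` separator is an assumption chosen to match the displayed output.

```python
def clean_rows(rows):
    """Apply the cleaning rules to (postal_code, borough, neighborhood) tuples."""
    merged = {}  # dicts keep insertion order in Python 3.7+
    for code, borough, hood in rows:
        if borough == 'Not assigned':
            continue  # rows without a borough are dropped
        if hood == 'Not assigned':
            hood = borough  # an unnamed neighborhood takes the borough name
        entry = merged.setdefault(code, {'Borough': borough, 'Neighborhoods': []})
        entry['Neighborhoods'].append(hood)
    # neighborhoods sharing a postal code end up in one comma-separated string
    return [(code, e['Borough'], ', '.join(e['Neighborhoods']))
            for code, e in merged.items()]

# Hypothetical sample rows, not scraped data
sample = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M5A', 'Downtown Toronto', 'Regent Park'),
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
    ('M7A', "Queen's Park", 'Not assigned'),
]
cleaned = clean_rows(sample)
print(cleaned)
```

Each postal code appears once in the result, which is the property the later geocoding step relies on.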


In [4]:
df_nbh.shape

(103, 3)

### SECTION B
<a id='sectionB'></a>

Create a function for getting coordinates. By default it makes up to four consecutive calls to the Google geocoder. If it fails to get the coordinates within four tries, it reads them from the CSV file **Geospatial_Coordinates.csv**. Passing any other `getType` skips the geocoder and reads the CSV file directly:
<a id='create_function_for_getting_geocoordinates'></a>

In [5]:
def getCoordinates(search,getType='default',output='long_lat'):
    searchString = '{}, Toronto, Ontario, Canada'.format(search)
    
    if search == 'Not assigned' or search is None:
        lat_lng_coords = [None,None]
    else:
        # initialize your variable to None
        lat_lng_coords = None
        # loop until you get the coordinates
        df_loc = pd.read_csv('Geospatial_Coordinates.csv')
        
        try:
            lat_lng_coordsx = [df_loc.loc[df_loc['Postal Code'] == search, 'Latitude'].iloc[0], df_loc.loc[df_loc['Postal Code'] == search, 'Longitude'].iloc[0]]
        except:
            lat_lng_coordsx = [None,None]
        
        if getType=='default':
            a = 0
            while(lat_lng_coords is None):
                if a<4:
                    try:
                        g = geocoder.google(searchString)
                        lat_lng_coords = g.latlng
                    except:
                        lat_lng_coords = lat_lng_coordsx
                else:
                    lat_lng_coords = lat_lng_coordsx
                a = a+1
        else:
            lat_lng_coords = lat_lng_coordsx
        #print('Iteration: '+str(a+1)+'\n')
        
    if  output=='long':
        out = lat_lng_coords[1]
    elif  output=='lat':
        out = lat_lng_coords[0]
    else:
        out = lat_lng_coords
        
        ##lat_lng_coords = lat_lng_coordsx
    return out 
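
The retry-then-fallback idea behind `getCoordinates` can be isolated in a few lines. The sketch below is an assumption-laden illustration: `lookup_with_fallback`, `always_failing_geocoder` and the `offline` table are hypothetical names, and unlike the function above it keeps retrying after an exception instead of falling back immediately.

```python
def lookup_with_fallback(query, online_lookup, offline_table, max_attempts=4):
    """Try online_lookup up to max_attempts times, then use offline_table."""
    for _ in range(max_attempts):
        try:
            coords = online_lookup(query)
        except Exception:
            coords = None  # treat a failed call like an empty answer
        if coords is not None:
            return coords
    # every attempt failed: fall back to the local table
    return offline_table.get(query, [None, None])

# Hypothetical stand-in for a geocoder that never answers
def always_failing_geocoder(query):
    return None

# Stand-in for Geospatial_Coordinates.csv, as [latitude, longitude]
offline = {'M5A': [43.6543, -79.3606]}

print(lookup_with_fallback('M5A', always_failing_geocoder, offline))
```

Passing the lookup function as an argument makes the fallback path easy to exercise without network access.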


We append the Geocoordinates to the dataframe:
<a id='append_geocoordinates_to_dataframe'></a>

In [6]:
df_nbh['Longitude'] = ""
df_nbh['Latitude'] = ""

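for i, row in df_nbh.iterrows():
    # 'No' skips the online geocoder and reads the coordinates from the CSV file
    lat_lng = getCoordinates(row['PostalCode'], 'No')
    # .at writes to the dataframe itself and avoids chained assignment
    df_nbh.at[i, 'Longitude'] = lat_lng[1]
    df_nbh.at[i, 'Latitude'] = lat_lng[0]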

df_nbh.head(50) 

Unnamed: 0,PostalCode,Borough,Neighborhood,Longitude,Latitude
0,M3A,North York,Parkwoods,-79.3297,43.7533
1,M4A,North York,Victoria Village,-79.3156,43.7259
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",-79.3606,43.6543
3,M6A,North York,"Lawrence Manor, Lawrence Heights",-79.4648,43.7185
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",-79.3895,43.6623
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",-79.5322,43.6679
6,M1B,Scarborough,"Malvern, Rouge",-79.1944,43.8067
7,M3B,North York,Don Mills,-79.3522,43.7459
8,M4B,East York,"Parkview Hill, Woodbine Gardens",-79.3099,43.7064
9,M5B,Downtown Toronto,"Garden District, Ryerson",-79.3789,43.6572


### SECTION C
<a id='sectionC'></a>

We set our Foursquare credentials:
<a id='set_foursquare_credentials'></a>

In [7]:
# @hidden_cell

CLIENT_ID = 'HU374HGK53215G8RJ65FHEUEB3738EHGHKDSLAE82WJKKS' # Please use your Foursquare ID as this one is just a placeholder and so it will not work
CLIENT_SECRET = 'HDKSLEB93HFK2JK09384HBDKFOENAMEJ834390228' # Please use your Foursquare Secret as this one is just a placeholder and so it will not work
ACCESS_TOKEN = '' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: HU374HGK53215G8RJ65FHEUEB3738EHGHKDSLAE82WJKKS
CLIENT_SECRET: HDKSLEB93HFK2JK09384HBDKFOENAMEJ834390228


Create a function to get nearby venues for Neighborhoods in Toronto:
<a id='create_function_to_get_nearby_venues'></a>

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
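
The two steps inside `getNearbyVenues`, building the request URL and flattening the response, can be tried offline. This is a sketch under stated assumptions: `build_explore_url` and `extract_venues` are hypothetical helpers, the payload is a hand-written sample shaped like the fields the function reads, and the second venue is invented to show a missing category.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=30):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

def extract_venues(name, lat, lng, payload):
    """Flatten a response into rows, tolerating missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or [{'name': None}]
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows

url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180604', 43.65426, -79.360636)
query = parse_qs(urlparse(url).query)

# Hand-written sample payload; no request is sent
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Roselle Desserts',
               'location': {'lat': 43.653447, 'lng': -79.362017},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Unnamed Venue',  # hypothetical venue without a category
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}
rows = extract_venues('Regent Park, Harbourfront', 43.65426, -79.360636, sample_payload)
print(rows)
```

Guarding against an empty `groups` or `categories` list keeps one unusual venue from aborting the whole loop over neighborhoods.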

Get the geographical coordinates of Toronto:
<a id='get_toronto_geocoordinates'></a>

In [9]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


We filter the dataframe, keeping only the rows whose borough name contains **Toronto**:
<a id='create_dataframe_for_toronto_containing_boroughs'></a>

In [10]:

toronto_data = df_nbh[df_nbh['Borough'].str.contains("Toronto")].reset_index(drop=True)
#toronto_data = df_nbh
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Longitude,Latitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",-79.3606,43.6543
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",-79.3895,43.6623
2,M5B,Downtown Toronto,"Garden District, Ryerson",-79.3789,43.6572
3,M5C,Downtown Toronto,St. James Town,-79.3754,43.6515
4,M4E,East Toronto,The Beaches,-79.293,43.6764


In [11]:
43.6563221, -79.3809161

(43.6563221, -79.3809161)

Get nearby venues:
<a id='get_nearby_venues'></a>

In [12]:
# query Foursquare for the venues near each Toronto neighborhood
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],latitudes=toronto_data['Latitude'],longitudes=toronto_data['Longitude'])

toronto_venues


Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.654260,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.654260,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.654260,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.654260,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.654260,-79.360636,Impact Kitchen,43.656369,-79.356980,Restaurant
...,...,...,...,...,...,...,...
855,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,Amin Car Repair Garage,43.663544,-79.320130,Auto Workshop
856,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,The Ashbridge Estate,43.664691,-79.321805,Garden
857,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,TTC Russell Division,43.664908,-79.322560,Light Rail Station
858,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,Jonathan Ashbridge Park,43.664702,-79.319898,Park


In [13]:
print(toronto_venues.shape)
toronto_venues.head()

(860, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


Count all venues in each neighborhood in Toronto:
<a id='count_venues_per_neighborhood'></a>

In [14]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,30,30,30,30,30,30
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",14,14,14,14,14,14
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",18,18,18,18,18,18
Central Bay Street,30,30,30,30,30,30
Christie,17,17,17,17,17,17
Church and Wellesley,30,30,30,30,30,30
"Commerce Court, Victoria Hotel",30,30,30,30,30,30
Davisville,30,30,30,30,30,30
Davisville North,7,7,7,7,7,7


Check how many unique venue categories there are:
<a id='count_unique_venue_cats'></a>

In [15]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 197 unique categories.


One-hot encode the Toronto venue categories:
<a id='toronto_onehot_encoding'></a>

In [16]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Theater,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
toronto_onehot.shape

(860, 197)

Group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of each venue category:
<a id='toronto_onehot_mean'></a>

In [18]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theater,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.055556,0.055556,0.055556,0.111111,0.166667,0.111111,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [19]:
toronto_grouped.shape

(39, 197)
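
Taking the mean of one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. For example, a value of 0.033333 in a neighborhood with 30 venues corresponds to a single venue. The sketch below shows this with the standard library only; `category_frequencies` and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """Map (neighborhood, category) pairs to {neighborhood: {category: share}}."""
    counts = defaultdict(Counter)
    for hood, category in venues:
        counts[hood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }

# Hypothetical venue list, not Foursquare data
sample_venues = [
    ('A', 'Coffee Shop'), ('A', 'Coffee Shop'), ('A', 'Park'), ('A', 'Bakery'),
    ('B', 'Park'),
]
freqs = category_frequencies(sample_venues)
print(freqs)
```

The shares within each neighborhood sum to 1, so neighborhoods with very different venue counts become comparable before clustering.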

Get top 5 venues per Neighborhood:
<a id='top_5_venues_per_neighborhood'></a>

In [20]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0            Beer Bar  0.07
1         Coffee Shop  0.07
2        Cocktail Bar  0.07
3      Farmers Market  0.07
4  Seafood Restaurant  0.07


----Brockton, Parkdale Village, Exhibition Place----
               venue  freq
0               Café  0.13
1     Breakfast Spot  0.09
2        Coffee Shop  0.09
3          Nightclub  0.09
4  Convenience Store  0.04


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                  venue  freq
0  Gym / Fitness Center  0.07
1                Garden  0.07
2           Pizza Place  0.07
3            Restaurant  0.07
4         Burrito Place  0.07


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                 venue  freq
0      Airport Service  0.17
1       Airport Lounge  0.11
2     Airport Terminal  0.11
3      Harbor / Marina  0.06
4  Rental Car Location  0.06


----Central Bay Street----


Get most common venues:
<a id='most_common_venues'></a>

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Get 10 most common venues:
<a id='top_ten_most_common_venues'></a>

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Seafood Restaurant,Cocktail Bar,Beer Bar,Farmers Market,Coffee Shop,Concert Hall,Park,Cheese Shop,Creperie,Fish Market
1,"Brockton, Parkdale Village, Exhibition Place",Café,Nightclub,Coffee Shop,Breakfast Spot,Performing Arts Venue,Bakery,Stadium,Bar,Italian Restaurant,Burrito Place
2,"Business reply mail Processing Centre, South C...",Park,Gym / Fitness Center,Burrito Place,Fast Food Restaurant,Farmers Market,Restaurant,Pizza Place,Brewery,Comic Shop,Garden
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Boutique,Boat or Ferry,Bar,Plane,Harbor / Marina,Rental Car Location,Coffee Shop
4,Central Bay Street,Coffee Shop,Italian Restaurant,Café,Modern European Restaurant,Miscellaneous Shop,Middle Eastern Restaurant,Office,Park,Comic Shop,Spa
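
The column labels above rely on a three-element suffix list plus an exception handler, which is fine for ten columns but would label the 21st column "21th". A small helper, shown here as a hypothetical alternative (`ordinal` is not part of the notebook), handles any positive integer:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same column names as the notebook builds for num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```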


Get clusters using KMeans:
<a id='get_clusters'></a>

In [23]:

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 1])
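
To make the clustering step less of a black box, here is a minimal sketch of Lloyd's algorithm, the iteration that k-means is built on. It is not what scikit-learn's `KMeans` does internally in detail: the seeding from the first `k` points and the fixed iteration count are simplifying assumptions, and the two-column frequency vectors are invented.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical (coffee shop share, park share) vectors for four neighborhoods
points = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Neighborhoods with similar category frequencies end up with the same label, which is what the `Cluster Labels` column records.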

Add Cluster labels to dataframe:
<a id='add_cluster_labels_to_dataframe'></a>

In [25]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge the sorted venues with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",-79.3606,43.6543,2,Coffee Shop,Bakery,Park,Theater,Breakfast Spot,Restaurant,Pub,Café,Performing Arts Venue,Yoga Studio
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",-79.3895,43.6623,2,Coffee Shop,Sushi Restaurant,Yoga Studio,Diner,Bank,Beer Bar,Japanese Restaurant,Sandwich Place,Distribution Center,Discount Store
2,M5B,Downtown Toronto,"Garden District, Ryerson",-79.3789,43.6572,2,Coffee Shop,Café,Art Gallery,Japanese Restaurant,Sandwich Place,Plaza,Shopping Mall,Electronics Store,Burger Joint,Hotel
3,M5C,Downtown Toronto,St. James Town,-79.3754,43.6515,2,Gastropub,Café,Coffee Shop,Farmers Market,Art Gallery,Jazz Club,Japanese Restaurant,Italian Restaurant,Diner,Hotel
4,M4E,East Toronto,The Beaches,-79.293,43.6764,0,Pub,Trail,Health Food Store,Coffee Shop,Comfort Food Restaurant,College Rec Center,Eastern European Restaurant,Donut Shop,Dog Run,Distribution Center


View Clusters of venues on the map:
<a id='view_map_with_clusters'></a>

In [26]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    # skip neighborhoods that received no cluster label
    if math.isnan(cluster):
        continue
    # the join can turn the labels into floats, so convert before indexing
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
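
The map needs one distinct color per cluster. The cell above takes them from Matplotlib's rainbow colormap; the sketch below is a standard-library alternative using evenly spaced hues. `cluster_colors` is a hypothetical helper, and the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_colors(k):
    """Return k hex colors with evenly spaced hues."""
    colors = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        colors.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

palette = cluster_colors(5)
print(palette)
```

Indexing such a palette directly with the cluster label, as in `palette[cluster]`, keeps label 0 on the first color.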

See the most common venues in each cluster, listed by borough:
<a id='most_common_venues_per_borough'></a>

In [27]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,East Toronto,0,Pub,Trail,Health Food Store,Coffee Shop,Comfort Food Restaurant,College Rec Center,Eastern European Restaurant,Donut Shop,Dog Run,Distribution Center


In [28]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,1,Park,Swim School,Bus Line,College Gym,College Arts Building,Eastern European Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store
20,Central Toronto,1,Park,Hotel,Sandwich Place,Gym / Fitness Center,Breakfast Spot,Food & Drink Shop,Department Store,Donut Shop,Dog Run,Distribution Center
21,Central Toronto,1,Park,Trail,Sushi Restaurant,Jewelry Store,Gas Station,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store
33,Downtown Toronto,1,Park,Trail,Playground,Creperie,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop


In [29]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,2,Coffee Shop,Bakery,Park,Theater,Breakfast Spot,Restaurant,Pub,Café,Performing Arts Venue,Yoga Studio
1,Downtown Toronto,2,Coffee Shop,Sushi Restaurant,Yoga Studio,Diner,Bank,Beer Bar,Japanese Restaurant,Sandwich Place,Distribution Center,Discount Store
2,Downtown Toronto,2,Coffee Shop,Café,Art Gallery,Japanese Restaurant,Sandwich Place,Plaza,Shopping Mall,Electronics Store,Burger Joint,Hotel
3,Downtown Toronto,2,Gastropub,Café,Coffee Shop,Farmers Market,Art Gallery,Jazz Club,Japanese Restaurant,Italian Restaurant,Diner,Hotel
5,Downtown Toronto,2,Seafood Restaurant,Cocktail Bar,Beer Bar,Farmers Market,Coffee Shop,Concert Hall,Park,Cheese Shop,Creperie,Fish Market
6,Downtown Toronto,2,Coffee Shop,Italian Restaurant,Café,Modern European Restaurant,Miscellaneous Shop,Middle Eastern Restaurant,Office,Park,Comic Shop,Spa
7,Downtown Toronto,2,Grocery Store,Café,Park,Athletics & Sports,Coffee Shop,Candy Store,Restaurant,Italian Restaurant,Bank,Baby Store
8,Downtown Toronto,2,Café,Coffee Shop,Hotel,Pizza Place,Sushi Restaurant,Concert Hall,Opera House,Lounge,Plaza,Restaurant
9,West Toronto,2,Bakery,Pharmacy,Music Venue,Middle Eastern Restaurant,Brewery,Bar,Bank,Liquor Store,Supermarket,Café
10,Downtown Toronto,2,Café,Plaza,Hotel,Park,IT Services,Sporting Goods Shop,Lake,Skating Rink,Basketball Stadium,Dance Studio


In [30]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,3,Garden,Home Service,Wine Bar,Dance Studio,Eastern European Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner


In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Central Toronto,4,Lawyer,Restaurant,Wine Bar,Deli / Bodega,Eastern European Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner


### CONCLUSION
<a id='conclusion'></a>

1. There is a high concentration of neighborhoods in the area between Etobicoke, North York and Scarborough.
2. Cluster 2 is by far the most abundant cluster among the neighborhoods between Etobicoke, North York and Scarborough.
3. Cluster 2 has cafés, coffee shops and restaurants as its defining trait, with cafés and coffee shops being far more abundant in the area.
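
The claim about cluster sizes can be checked by counting labels. The sketch below uses only the first ten labels printed by `kmeans.labels_[0:10]` earlier in the notebook, so it illustrates the method rather than the full result; applying the same count to all of `kmeans.labels_` is an assumed next step.

```python
from collections import Counter

# First ten labels as printed by kmeans.labels_[0:10] in the notebook
first_ten_labels = [2, 2, 2, 2, 2, 2, 2, 2, 2, 1]

sizes = Counter(first_ten_labels)
largest, count = sizes.most_common(1)[0]
print('Cluster {} holds {} of the first {} neighborhoods'.format(
    largest, count, len(first_ten_labels)))
```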