# Segmenting and Clustering Neighborhoods in Toronto

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#item1">Download and Format Dataset</a>

2.  <a href="#item2">Get the Latitude and Longitude for Each Neighborhood </a>

3.  <a href="#item3">Analyze, Cluster and Examine Each Neighborhood</a> 
</font>
</div>

Import libraries

In [1]:
!pip install beautifulsoup4
!pip install html-table-extractor
!pip install geopy
!pip install folium==0.5.0

import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import json

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.display import HTML

# transforming a JSON file into a pandas dataframe
from pandas import json_normalize

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # plotting library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Libraries imported.')

Libraries imported.


## 1. Downloading and Formatting the Dataset

### Scraping the data from a Wikipedia page that contains a table of postal codes and neighborhood names

In [2]:
from html_table_extractor.extractor import Extractor
from bs4 import BeautifulSoup
from IPython.display import display_html

In [3]:
# Scrape the data from Wikipedia and explore it
data_source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(data_source, 'html.parser')
extractor = Extractor(soup.table) 
a = extractor.parse()
neighborhoods_list=extractor.return_list()
display_html(str(soup.table),raw=True)

Postal Code,Borough,Neighbourhood
M1A,Not assigned,Not assigned
M2A,Not assigned,Not assigned
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Regent Park, Harbourfront"
M6A,North York,"Lawrence Manor, Lawrence Heights"
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
M8A,Not assigned,Not assigned
M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
M1B,Scarborough,"Malvern, Rouge"


In [4]:
df = pd.DataFrame (neighborhoods_list,columns=['Postal Code','Borough','Neighborhood'])
df = df.iloc[1:]
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
1,M1A\n,Not assigned\n,Not assigned\n
2,M2A\n,Not assigned\n,Not assigned\n
3,M3A\n,North York\n,Parkwoods\n
4,M4A\n,North York\n,Victoria Village\n
5,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"


### Data Cleaning

#### Remove rows that have the value of Borough equal to "Not assigned"

In [5]:
# Data cleaning

# Strip the trailing "\n" left over from the HTML table cells first,
# so the comparisons below see the bare values
df = df.apply(lambda col: col.str.strip())

# Drop rows whose Borough is "Not assigned"
df2 = df[df['Borough'] != 'Not assigned']

# Keep only the Toronto boroughs (Downtown, Central, West, East)
df3 = df2[df2['Borough'].str.contains('Toronto', regex=False)]

# Merge rows that share the same postal code
df_toronto_Neighborhoods = df3.groupby(['Postal Code','Borough'], sort=False).agg(', '.join)

df_toronto_Neighborhoods.reset_index(inplace=True)

# Keep only the first neighborhood name listed for each postal code
df_toronto_Neighborhoods['Neighborhood'] = (
    df_toronto_Neighborhoods['Neighborhood'].str.split(',').str[0]
)

df_toronto_Neighborhoods.rename(columns={'Postal Code':'PostalCode'},inplace=True)
df_toronto_Neighborhoods

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M5A,Downtown Toronto,Regent Park
1,M7A,Downtown Toronto,Queen's Park
2,M5B,Downtown Toronto,Garden District
3,M5C,Downtown Toronto,St. James Town
4,M4E,East Toronto,The Beaches
5,M5E,Downtown Toronto,Berczy Park
6,M5G,Downtown Toronto,Central Bay Street
7,M6G,Downtown Toronto,Christie
8,M5H,Downtown Toronto,Richmond
9,M6H,West Toronto,Dufferin


In [6]:
df_toronto_Neighborhoods.shape

(39, 3)

## 2. Get the Latitude and Longitude for Each Neighborhood

In [7]:
latlon_data = pd.read_csv('https://cocl.us/Geospatial_data')

#Rename the first column
latlon_data.rename(columns={'Postal Code':'PostalCode'},inplace=True)
latlon_data.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
latlon_data.shape

(103, 3)

### Joining the Wikipedia data with the CSV coordinates

In [9]:
df_neighborhoods = pd.merge(df_toronto_Neighborhoods,latlon_data,on='PostalCode')
df_neighborhoods

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
2,M5B,Downtown Toronto,Garden District,43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,Richmond,43.650571,-79.384568
9,M6H,West Toronto,Dufferin,43.669005,-79.442259
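The merge above is an inner join, so a postal code that is missing from the coordinates file would drop out without any warning. A minimal sketch of how that could be checked, using small made-up frames (the postal code `M0X` and its row are hypothetical) and the `indicator` option of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins for the scraped table and the coordinates CSV (values are illustrative)
left = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A', 'M0X'],   # 'M0X' is a made-up code with no coordinates
    'Neighborhood': ['Regent Park', "Queen's Park", 'Nowhere'],
})
right = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A'],
    'Latitude': [43.65426, 43.662301],
    'Longitude': [-79.360636, -79.389494],
})

# A left join with indicator=True adds a '_merge' column recording where each row came from
checked = pd.merge(left, right, on='PostalCode', how='left', indicator=True)

# Rows marked 'left_only' found no matching coordinates
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

In this notebook both frames report 39 matching rows, so the check would be expected to return an empty list.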


In [10]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df_neighborhoods['Borough'].unique()),
        df_neighborhoods.shape[0]
    )
)

The dataframe has 4 boroughs and 39 neighborhoods.


## 3. Explore and Cluster the Neighborhoods in Toronto

### Get the latitude and longitude for Downtown Toronto

In [11]:
address = 'Downtown Toronto, ON, Canada'
geolocator = Nominatim(user_agent="coursera-capstone-project")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print("Downtown Toronto","latitude",latitude_toronto, "& " "longitude" ,longitude_toronto)

Downtown Toronto latitude 43.6563221 & longitude -79.3809161
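`geolocator.geocode` returns `None` when it finds no match, in which case reading `location.latitude` would fail. The sketch below wraps the lookup with a fallback; the helper name `resolve_coordinates` and the stand-in geocoder are assumptions made for illustration, not part of geopy.

```python
from typing import Callable, Optional, Tuple


def resolve_coordinates(geocode: Callable[[str], Optional[object]],
                        address: str,
                        fallback: Tuple[float, float]) -> Tuple[float, float]:
    """Return (latitude, longitude) for address, or fallback if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return (location.latitude, location.longitude)


# Stand-in geocoder used only for illustration: it never finds anything
def never_found(address):
    return None


coords = resolve_coordinates(never_found, 'Downtown Toronto, ON, Canada',
                             fallback=(43.6563221, -79.3809161))
print(coords)
```

In the notebook the first argument would be `geolocator.geocode`, so the real lookup is used and the fallback only applies when it returns nothing.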


#### Create a map of Toronto with neighborhoods superimposed on top.

In [12]:
map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_neighborhoods['Latitude'], df_neighborhoods['Longitude'], df_neighborhoods['Borough'], df_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
map_toronto

### Define Foursquare Credentials and Version

In [13]:
CLIENT_ID = 'LYWUPJFTKCKTT34XWACVHKDWBL1BGOV2NAYNVR3RGUYKZBGJ' # your Foursquare ID
CLIENT_SECRET = '0J3W4BBQM3202OAJ52JTK55XMCZKR3T2DV4D5MFRXM1TSEFF' # your Foursquare Secret
ACCESS_TOKEN = 'YRT1SK2LJMHMBK3PBRQBLETBFSSI12C2VXHT2P53GGUYQ2QR' # your FourSquare Access Token
VERSION = '20180605'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: LYWUPJFTKCKTT34XWACVHKDWBL1BGOV2NAYNVR3RGUYKZBGJ
CLIENT_SECRET:0J3W4BBQM3202OAJ52JTK55XMCZKR3T2DV4D5MFRXM1TSEFF
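Credentials written into a notebook cell and echoed in its output travel with every copy of the file. One common alternative, sketched here under the assumption that the values are supplied as environment variables (the variable name `FOURSQUARE_CLIENT_ID` and both helper functions are hypothetical), is to read them at run time and mask them when printing:

```python
import os


def load_credential(name: str) -> str:
    """Read a credential from the environment, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError('Environment variable {} is not set'.format(name))
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demonstration only: a real run would set this outside the notebook
os.environ['FOURSQUARE_CLIENT_ID'] = 'EXAMPLE-ID-0000'
client_id = load_credential('FOURSQUARE_CLIENT_ID')
print('CLIENT_ID: ' + mask(client_id))
```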


#### Explore the first neighborhood in our dataframe.

In [14]:
df_neighborhoods.loc[0, 'Neighborhood']

'Regent Park'

In [15]:
neighborhood_latitude = df_neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_neighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Regent Park are 43.6542599, -79.3606359.


### Get the top 100 venues within a radius of 500 meters of Downtown Toronto

In [16]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

#create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude_toronto, 
    longitude_toronto, 
    radius, 
    LIMIT)
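The request URL can also be assembled from named parameters rather than a positional format string, which makes it harder to swap two arguments by accident. A minimal sketch using only the standard library; the function name `build_explore_url` is an assumption, and the credentials passed in are placeholders:

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore request URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes each value and joins the pairs with '&'
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# Placeholder credentials, not real ones
example_url = build_explore_url('ID', 'SECRET', '20180605',
                                43.6563221, -79.3809161, 500, 100)

# Parse the query string back to confirm what would be sent
query = parse_qs(urlsplit(example_url).query)
print(query['ll'], query['radius'], query['limit'])
```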

In [17]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '600c7957148f584621698f05'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 95,
  'suggestedBounds': {'ne': {'lat': 43.6608221045, 'lng': -79.37470788695488},
   'sw': {'lat': 43.651822095499995, 'lng': -79.3871243130451}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '57eda381498ebe0e6ef40972',
       'name': 'UNIQLO ユニクロ',
       'location': {'address': '220 Yonge St',
        'crossStreet': 'at Dundas St W',
        'lat': 43.65591027779457,
        'lng': -79.38064099181345,
        'labeledLatLngs'

In [18]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,UNIQLO ユニクロ,Clothing Store,43.65591,-79.380641
1,Blaze Pizza,Pizza Place,43.656518,-79.380015
2,Burrito Boyz,Burrito Place,43.656265,-79.378343
3,Silver Snail Comics,Comic Shop,43.657031,-79.381403
4,Yonge-Dundas Square,Plaza,43.656054,-79.380495


In [20]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

95 venues were returned by Foursquare.


### Exploring Neighborhoods in Toronto

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
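Inside `getNearbyVenues`, the expression `v['venue']['categories'][0]['name']` raises an `IndexError` for any venue whose category list is empty, which is the case `get_category_type` guards against earlier. A small sketch of a tolerant lookup follows; the helper name `first_category_name` and the sample items are made up for illustration:

```python
def first_category_name(item, default=None):
    """Return the first category name of an explore item, or default if it has none."""
    categories = item.get('venue', {}).get('categories', [])
    if not categories:
        return default
    return categories[0].get('name', default)


# Hand-written items in the same shape as the response shown earlier (values are illustrative)
items = [
    {'venue': {'name': 'Blaze Pizza', 'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Unlabelled spot', 'categories': []}},
]

names = [first_category_name(item, default='Unknown') for item in items]
print(names)
```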

In [22]:
toronto_venues = getNearbyVenues(names=df_neighborhoods['Neighborhood'],
                                   latitudes=df_neighborhoods['Latitude'],
                                   longitudes=df_neighborhoods['Longitude']
                                  )

Regent Park
Queen's Park
Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond
Dufferin
Harbourfront East
Little Portugal
The Danforth West
Toronto Dominion Centre
Brockton
India Bazaar
Commerce Court
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
High Park
North Toronto West
The Annex
Parkdale
Davisville
University of Toronto
Runnymede
Moore Park
Kensington Market
Summerhill West
CN Tower
Rosedale
Stn A PO Boxes
St. James Town
First Canadian Place
Church and Wellesley
Business reply mail Processing Centre


In [23]:
print(toronto_venues.shape)
toronto_venues.head()

(3194, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Regent Park,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Regent Park,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Regent Park,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
3,Regent Park,43.65426,-79.360636,The Distillery Historic District,43.650244,-79.359323,Historic Site
4,Regent Park,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center


In [24]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,100,100,100,100,100,100
Brockton,100,100,100,100,100,100
Business reply mail Processing Centre,49,49,49,49,49,49
CN Tower,15,15,15,15,15,15
Central Bay Street,100,100,100,100,100,100
Christie,100,100,100,100,100,100
Church and Wellesley,100,100,100,100,100,100
Commerce Court,100,100,100,100,100,100
Davisville,100,100,100,100,100,100
Davisville North,100,100,100,100,100,100


In [25]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 270 unique categories.


### Analyze Each Neighborhood

In [26]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,Zoo,Accessories Store,Adult Boutique,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
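The header above starts with `Zoo` rather than `Neighborhood`. One possible explanation, stated here as an assumption rather than something the output confirms, is that a venue category is itself named `Neighborhood`: the assignment would then overwrite that dummy column in place instead of appending a new one, and the "move last column to the front" step would move `Zoo`. The toy data below is invented to show one way of guarding against such a collision:

```python
import pandas as pd

# Invented data in which one venue category collides with the label column name
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Neighborhood', 'Zoo'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# Rename a colliding dummy column before adding the label column
label_col = 'Neighborhood'
if label_col in onehot.columns:
    onehot = onehot.rename(columns={label_col: label_col + ' (category)'})

# insert() places the label column first and refuses duplicate names
onehot.insert(0, label_col, venues[label_col])
print(list(onehot.columns))
```

With the rename in place, the category is kept as its own feature and the label column reliably ends up first.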


In [27]:
toronto_onehot.shape

(3194, 270)

In [28]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Zoo,Accessories Store,Adult Boutique,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Brockton,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business reply mail Processing Centre,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,CN Tower,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02
5,Christie,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02
7,Commerce Court,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02


In [29]:
toronto_grouped.shape

(38, 270)

#### Print each neighborhood along with the top 5 most common venues

In [30]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                 venue  freq
0          Coffee Shop  0.12
1                 Café  0.06
2  Japanese Restaurant  0.04
3                 Park  0.04
4                Hotel  0.04


----Brockton----
                    venue  freq
0                    Café  0.07
1              Restaurant  0.06
2             Coffee Shop  0.06
3                     Bar  0.05
4  Furniture / Home Store  0.04


----Business reply mail Processing Centre----
              venue  freq
0              Park  0.10
1           Brewery  0.06
2       Coffee Shop  0.06
3       Pizza Place  0.06
4  Sushi Restaurant  0.04


----CN Tower----
             venue  freq
0             Café  0.13
1      Coffee Shop  0.13
2  Harbor / Marina  0.13
3           Garden  0.07
4     Dance Studio  0.07


----Central Bay Street----
              venue  freq
0       Coffee Shop  0.13
1              Café  0.05
2  Ramen Restaurant  0.03
3  Sushi Restaurant  0.03
4              Park  0.03


----Christie----
               ven

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
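To see what `return_most_common_venues` does on a single row, the sketch below applies the same idea to a hand-made row with invented frequencies. It uses `Series.nlargest` in place of a full sort, which is a substitution made for this example, and the function name `top_categories` is hypothetical.

```python
import pandas as pd


def top_categories(row, num_top_venues):
    """Return the names of the num_top_venues highest-frequency categories in a row."""
    # Skip the leading 'Neighborhood' label and make sure the rest is numeric
    frequencies = row.iloc[1:].astype(float)
    return frequencies.nlargest(num_top_venues).index.tolist()


# Invented frequencies for a single neighborhood
row = pd.Series({'Neighborhood': 'Example', 'Cafe': 0.10, 'Coffee Shop': 0.25,
                 'Park': 0.05, 'Zoo': 0.0})

top = top_categories(row, 2)
print(top)
```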

### Display the top 10 venues for each neighborhood.

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the third use the plain 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Café,Hotel,Japanese Restaurant,Park,Beer Bar,Restaurant,Gastropub,Creperie,Grocery Store
1,Brockton,Café,Coffee Shop,Restaurant,Bar,Furniture / Home Store,Bakery,Tibetan Restaurant,Gift Shop,Vegetarian / Vegan Restaurant,Thrift / Vintage Store
2,Business reply mail Processing Centre,Park,Coffee Shop,Pizza Place,Brewery,Pet Store,Italian Restaurant,Bakery,Fast Food Restaurant,Sushi Restaurant,Pub
3,CN Tower,Coffee Shop,Café,Harbor / Marina,Dance Studio,Garden,Dog Run,Park,Track,Sculpture Garden,Scenic Lookout
4,Central Bay Street,Coffee Shop,Café,Park,Sushi Restaurant,Ramen Restaurant,Pizza Place,Grocery Store,Gastropub,Juice Bar,Mexican Restaurant


## Clustering Neighborhoods 

In [33]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 2, 2, 2, 3, 2, 0, 2, 2])
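The value `kclusters = 5` is chosen by hand. One common way to sanity-check such a choice, shown here as a sketch on synthetic points rather than on the Toronto data, is to compare silhouette scores across several values of k. The blob centers and spread below are invented for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs (invented data, 20 points each)
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
points = np.vstack([c + 0.5 * rng.randn(20, 2) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

# The k with the highest score separates the points most cleanly
best_k = max(scores, key=scores.get)
print(best_k)
```

On the notebook's data the same loop could be run over `toronto_grouped_clustering`; the scores there may be much closer together than on this toy set.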

In [34]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_neighborhoods

# join the cluster labels and top venues onto the Toronto neighborhood data (which holds latitude/longitude)
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636,2,Coffee Shop,Restaurant,Theater,Café,Pub,Park,Bakery,Breakfast Spot,Italian Restaurant,Diner
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,2,Coffee Shop,Park,Sushi Restaurant,Café,Pizza Place,Middle Eastern Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Gastropub
2,M5B,Downtown Toronto,Garden District,43.657162,-79.378937,0,Coffee Shop,Gastropub,Café,Japanese Restaurant,Pizza Place,Diner,Theater,Hotel,Plaza,Middle Eastern Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Gastropub,Restaurant,Theater,Italian Restaurant,Japanese Restaurant,Park,Seafood Restaurant,American Restaurant
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,3,Coffee Shop,Pub,Pizza Place,Japanese Restaurant,Breakfast Spot,Beach,Park,Grocery Store,Caribbean Restaurant,Bar


### Visualize the resulting clusters

In [35]:
# create a map centered on Downtown Toronto
map_clusters = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters)
       
map_clusters
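In the cell above, `rainbow[cluster-1]` relies on Python's negative indexing to give cluster 0 the last color, and it would fail on a missing label. The sketch below indexes the palette directly by label and falls back to gray for missing values; the helper names and the gray fallback are assumptions made for this example.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(n_clusters):
    """Return one hex color per cluster label, indexed directly by the label."""
    return [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, n_clusters))]


def color_for(label, palette, missing='#808080'):
    """Pick the color for a label, using a neutral gray when the label is missing."""
    if label is None or (isinstance(label, float) and np.isnan(label)):
        return missing
    return palette[int(label)]


palette = cluster_palette(5)
print(color_for(0, palette), color_for(float('nan'), palette))
```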


### Examine clusters

#### Cluster 1

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,0,Coffee Shop,Gastropub,Café,Japanese Restaurant,Pizza Place,Diner,Theater,Hotel,Plaza,Middle Eastern Restaurant
3,Downtown Toronto,0,Coffee Shop,Café,Gastropub,Restaurant,Theater,Italian Restaurant,Japanese Restaurant,Park,Seafood Restaurant,American Restaurant
5,Downtown Toronto,0,Coffee Shop,Café,Hotel,Japanese Restaurant,Park,Beer Bar,Restaurant,Gastropub,Creperie,Grocery Store
8,Downtown Toronto,0,Coffee Shop,Café,Hotel,Theater,Gym,Gastropub,Pizza Place,Sushi Restaurant,Concert Hall,Japanese Restaurant
10,Downtown Toronto,0,Coffee Shop,Café,Hotel,Park,Japanese Restaurant,Gym,Theater,Brewery,Restaurant,Plaza
13,Downtown Toronto,0,Coffee Shop,Café,Hotel,Japanese Restaurant,Theater,Concert Hall,Gym,Restaurant,Monument / Landmark,Park
16,Downtown Toronto,0,Coffee Shop,Café,Hotel,Restaurant,Theater,Japanese Restaurant,Concert Hall,Seafood Restaurant,Gastropub,Italian Restaurant
34,Downtown Toronto,0,Coffee Shop,Café,Hotel,Japanese Restaurant,Gastropub,Restaurant,Seafood Restaurant,Beer Bar,Cocktail Bar,Creperie
35,Downtown Toronto,0,Coffee Shop,Café,Gastropub,Restaurant,Theater,Italian Restaurant,Japanese Restaurant,Park,Seafood Restaurant,American Restaurant
36,Downtown Toronto,0,Coffee Shop,Café,Hotel,Restaurant,Japanese Restaurant,Theater,Gym,Thai Restaurant,Plaza,Vegetarian / Vegan Restaurant


#### Cluster 2

In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,1,Coffee Shop,Bookstore,Park,College Quad,Gym / Fitness Center,College Gym,Café,Trail,Elementary School,Donut Shop


#### Cluster 3

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,2,Coffee Shop,Restaurant,Theater,Café,Pub,Park,Bakery,Breakfast Spot,Italian Restaurant,Diner
1,Downtown Toronto,2,Coffee Shop,Park,Sushi Restaurant,Café,Pizza Place,Middle Eastern Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Gastropub
6,Downtown Toronto,2,Coffee Shop,Café,Park,Sushi Restaurant,Ramen Restaurant,Pizza Place,Grocery Store,Gastropub,Juice Bar,Mexican Restaurant
9,West Toronto,2,Café,Coffee Shop,Park,Bar,Italian Restaurant,Portuguese Restaurant,Sushi Restaurant,Bakery,Convenience Store,Brewery
19,Central Toronto,2,Sushi Restaurant,Bank,Gym,Coffee Shop,Café,Italian Restaurant,Pharmacy,Gym Pool,Bakery,Skating Rink
20,Central Toronto,2,Coffee Shop,Italian Restaurant,Restaurant,Dessert Shop,Café,Fast Food Restaurant,Gym,Pizza Place,Sushi Restaurant,Supermarket
21,Central Toronto,2,Café,Bank,Park,Gym / Fitness Center,Coffee Shop,Trail,Sushi Restaurant,Burger Joint,Skating Rink,Japanese Restaurant
26,Central Toronto,2,Italian Restaurant,Coffee Shop,Sushi Restaurant,Pizza Place,Café,Gym,Dessert Shop,Middle Eastern Restaurant,Restaurant,Indian Restaurant
28,West Toronto,2,Coffee Shop,Café,Pizza Place,Bakery,Italian Restaurant,Pub,Restaurant,Sushi Restaurant,Falafel Restaurant,Park
29,Central Toronto,2,Grocery Store,Coffee Shop,Italian Restaurant,Thai Restaurant,Park,Gym,Pizza Place,Bagel Shop,Bank,Sandwich Place


#### Cluster 4

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,East Toronto,3,Coffee Shop,Pub,Pizza Place,Japanese Restaurant,Breakfast Spot,Beach,Park,Grocery Store,Caribbean Restaurant,Bar
7,Downtown Toronto,3,Korean Restaurant,Café,Coffee Shop,Grocery Store,Cocktail Bar,Mexican Restaurant,Pub,Indian Restaurant,Ice Cream Shop,Pizza Place
11,West Toronto,3,Café,Bar,Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Bakery,Pizza Place,Cocktail Bar,Italian Restaurant,Asian Restaurant
12,East Toronto,3,Greek Restaurant,Coffee Shop,Café,Pub,Italian Restaurant,Fast Food Restaurant,Bank,Yoga Studio,Bakery,Furniture / Home Store
14,West Toronto,3,Café,Coffee Shop,Restaurant,Bar,Furniture / Home Store,Bakery,Tibetan Restaurant,Gift Shop,Vegetarian / Vegan Restaurant,Thrift / Vintage Store
15,East Toronto,3,Indian Restaurant,Coffee Shop,Restaurant,Park,Fast Food Restaurant,Beach,Café,Burrito Place,Brewery,Bakery
17,East Toronto,3,Coffee Shop,American Restaurant,Brewery,Bar,Bakery,Diner,Vietnamese Restaurant,French Restaurant,Sushi Restaurant,Café
22,West Toronto,3,Café,Bar,Coffee Shop,Convenience Store,Park,Thai Restaurant,Sushi Restaurant,Italian Restaurant,Deli / Bodega,Pizza Place
23,Central Toronto,3,Italian Restaurant,Coffee Shop,Diner,Skating Rink,Café,Restaurant,Thai Restaurant,Park,Sporting Goods Shop,Mexican Restaurant
24,Central Toronto,3,Café,Italian Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Restaurant,Gym,Grocery Store,Sandwich Place,Pub,Museum


#### Cluster 5

In [40]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Downtown Toronto,4,Park,Coffee Shop,Grocery Store,Metro Station,Japanese Restaurant,Playground,Convenience Store,Sandwich Place,Candy Store,Filipino Restaurant
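The five cells above repeat the same filter with a different label each time. A compact alternative, sketched on a small invented frame that reuses the notebook's column names, is to group by `Cluster Labels` and summarize each group in a loop:

```python
import pandas as pd

# Invented stand-in for toronto_merged with the same column names
merged = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [0, 2, 0, 1],
    '1st Most Common Venue': ['Coffee Shop', 'Cafe', 'Coffee Shop', 'Park'],
})

summary = {}
for label, members in merged.groupby('Cluster Labels'):
    # mode() returns the most frequent value(s); take the first one
    top_venue = members['1st Most Common Venue'].mode().iloc[0]
    summary[int(label)] = (len(members), top_venue)
    # headings in this notebook number clusters from 1, labels start at 0
    print('Cluster {}: {} neighborhoods, most frequent top venue: {}'.format(
        label + 1, len(members), top_venue))
```

A summary like this makes very small clusters, such as the single-row ones shown for Cluster 2 and Cluster 5, easy to spot.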
