# Segmenting and Clustering Neighborhoods in Toronto

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Scrape the Wikipedia page and transform the data into a pandas dataframe</a>
<br />
<br />
2. <a href="#item2">Get the latitude and the longitude coordinates of each neighborhood</a>
<br />
<br />
3. <a href="#item3">Explore and cluster the neighborhoods in Toronto</a>
</font>
</div>

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [299]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

import json # library to handle JSON files
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
from sklearn.cluster import KMeans # k-means, used in the clustering stage

# Matplotlib and associated plotting modules
import matplotlib as mpl
import matplotlib.pylab as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

%matplotlib inline

Solving environment: done

# All requested packages already installed.



 <a id='item1'></a>

### Scrape the Wikipedia page and transform the data into a pandas dataframe

Use pandas to read the table into a pandas dataframe

In [300]:
path = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [301]:
df = pd.read_html(path,attrs = {'class': 'wikitable'})

In [302]:
neighborhoods = pd.DataFrame(df[0])
neighborhoods.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Drop rows with a borough that is Not assigned

In [303]:
neighborhoods = neighborhoods[neighborhoods.Borough != 'Not assigned']
neighborhoods.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


Reset the index

In [304]:
neighborhoods = neighborhoods.reset_index(drop = True)
neighborhoods.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


Find rows that have a borough but a Not assigned neighbourhood, and set the neighbourhood to the name of the borough

In [305]:
n = 0
for i in neighborhoods['Neighbourhood']:
    if i == "Not assigned":
        neighborhoods.at[n, 'Neighbourhood'] = neighborhoods.at[n, 'Borough']
        print(neighborhoods.loc[n])
    n = n +1
        


Postcode                  M7A
Borough          Queen's Park
Neighbourhood    Queen's Park
Name: 6, dtype: object
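
The loop above uses the counter n as an index label, so it only works because the index was reset in the previous step. As a minimal sketch, the same fill can be written with a boolean mask that does not depend on the index. The frame below is a small made-up stand-in for the scraped table, not the real data.

```python
import pandas as pd

# Toy frame standing in for the scraped table (rows are illustrative only).
toy = pd.DataFrame({
    "Postcode": ["M5A", "M7A", "M9A"],
    "Borough": ["Downtown Toronto", "Queen's Park", "Etobicoke"],
    "Neighbourhood": ["Harbourfront", "Not assigned", "Islington Avenue"],
})

# Boolean mask of the rows whose neighbourhood is missing.
missing = toy["Neighbourhood"] == "Not assigned"

# Copy the borough name into those rows; this works for any index labels.
toy.loc[missing, "Neighbourhood"] = toy.loc[missing, "Borough"]

print(toy)
```

Because the mask selects rows by value, this version gives the same result whether or not reset_index has been called first.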


Combine rows that have the same Postcode and Borough

In [306]:
neighborhoods=neighborhoods.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()
neighborhoods.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
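
To see what the groupby step does, here is a minimal sketch on three rows taken from the earlier output. It groups on Postcode and Borough and joins the neighbourhood names of each group with a comma. It uses agg rather than apply, which should behave the same for this purpose.

```python
import pandas as pd

# Three rows from the table above: M5A appears twice, M6A once.
toy = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Lawrence Heights"],
})

# One row per (Postcode, Borough); names are joined in their original order.
combined = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)

print(combined)
```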


Check the number of rows of the dataframe

In [307]:
neighborhoods.shape

(103, 3)

There are 103 rows in the dataframe

<a id='item2'></a>

### Get the latitude and the longitude coordinates of each neighborhood

Read the csv file that has the geographical coordinates of each postal code and transform the data into a pandas dataframe

In [308]:
postcode = pd.read_csv('https://cocl.us/Geospatial_data')
postcode.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Change the name of the column Postal Code to Postcode

In [309]:
postcode.rename(columns={'Postal Code':'Postcode'}, inplace=True) 
postcode.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the two dataframes

In [310]:
neighborhoods = neighborhoods.merge(postcode, on='Postcode')
neighborhoods.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
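
The default merge is an inner join, so a postal code missing from the coordinates file would silently disappear. The sketch below, on toy data, shows one way to check for that with the validate and indicator options of merge. The M9Z row is hypothetical and exists only to produce an unmatched key.

```python
import pandas as pd

left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],  # M9Z is a made-up code
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Keep every row of the left frame, fail loudly on duplicate keys,
# and record where each row came from in the _merge column.
checked = left.merge(coords, on="Postcode", how="left",
                     validate="one_to_one", indicator=True)

# Rows with no coordinates are flagged as left_only.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched)
```

If unmatched is empty, the plain inner merge used above loses nothing.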


<a id='item3'></a>

### Explore and cluster the neighborhoods in Toronto

#### Create a map of Toronto with neighborhoods superimposed on top

Use the geopy library to get the latitude and longitude values of Toronto

In [311]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="TO_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Create a map of Toronto using the latitude and longitude values

In [312]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Let's explore the first neighborhood in our dataframe

Define Foursquare Credentials and Version

In [313]:
CLIENT_ID = 'PEZ3EJJNRHSCBBQFXIHPAJSCAHSES0RQQG513UK5UFPR5KCM' # your Foursquare ID
CLIENT_SECRET = 'HAH5GZURGXMPW2UBWV1GWDKKVJ1CYCTHHFSI3SHCRKE3FZ2J' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: PEZ3EJJNRHSCBBQFXIHPAJSCAHSES0RQQG513UK5UFPR5KCM
CLIENT_SECRET:HAH5GZURGXMPW2UBWV1GWDKKVJ1CYCTHHFSI3SHCRKE3FZ2J


Get the neighborhood's name

In [314]:
neighborhoods.loc[0,'Neighbourhood']

'Rouge, Malvern'

Get the neighborhood's latitude and longitude values.

In [315]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Rouge, Malvern are 43.806686299999996, -79.19435340000001.


Now, let's get the top 100 venues that are in Rouge, Malvern within a radius of 1500 meters

First, let's create the GET request URL and store it in a variable named url.

In [316]:
LIMIT = 100 
radius = 1500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=PEZ3EJJNRHSCBBQFXIHPAJSCAHSES0RQQG513UK5UFPR5KCM&client_secret=HAH5GZURGXMPW2UBWV1GWDKKVJ1CYCTHHFSI3SHCRKE3FZ2J&v=20180605&ll=43.806686299999996,-79.19435340000001&radius=1500&limit=100'

Send the GET request and examine the results

In [317]:
import requests
results = requests.get(url).json()
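
As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. The sketch below also reads the credentials from environment variables instead of typing them into the notebook. The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumptions made for this example, not something the Foursquare API prescribes.

```python
import os
from urllib.parse import urlencode

# Assumption: the credentials were exported under these (hypothetical) names.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

params = {
    "client_id": client_id,
    "client_secret": client_secret,
    "v": "20180605",
    "ll": "43.806686299999996,-79.19435340000001",
    "radius": 1500,
    "limit": 100,
}

base = "https://api.foursquare.com/v2/venues/explore"

# urlencode escapes special characters (the comma in ll becomes %2C).
url = base + "?" + urlencode(params)

# Print only the endpoint so the secret does not end up in the cell output.
print(url.split("client_id")[0])
```

Passing the same dictionary as requests.get(base, params=params) should build an equivalent URL without the manual concatenation.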

Borrow the get_category_type function from the Foursquare lab

In [318]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [319]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Images Salon & Spa,Spa,43.802283,-79.198565
1,Canadiana exhibit,Zoo Exhibit,43.817962,-79.193374
2,Caribbean Wave,Caribbean Restaurant,43.798558,-79.195777
3,Staples Morningside,Paper / Office Supplies Store,43.800285,-79.196607
4,Harvey's,Fast Food Restaurant,43.800106,-79.198258
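
To make the flattening step concrete, here is a minimal sketch on a hand-written payload. Its structure is assumed from the column names used above, and the second venue is made up to show what happens with an empty category list. It calls pd.json_normalize, the name under which recent pandas versions expose the function.

```python
import pandas as pd

# Hand-written items shaped like results['response']['groups'][0]['items'].
items = [
    {"venue": {"name": "Images Salon & Spa",
               "categories": [{"name": "Spa"}],
               "location": {"lat": 43.802283, "lng": -79.198565}}},
    {"venue": {"name": "Venue without category",  # hypothetical entry
               "categories": [],
               "location": {"lat": 43.8, "lng": -79.19}}},
]

# Nested dictionaries become dotted column names such as venue.location.lat.
flat = pd.json_normalize(items)

# Keep the first category name, or None when the list is empty.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Keep only the last part of each dotted name.
flat.columns = [col.split(".")[-1] for col in flat.columns]

print(flat[["name", "categories", "lat", "lng"]])
```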


And how many venues were returned by Foursquare?

In [320]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

31 venues were returned by Foursquare.


#### Explore Neighborhoods in Toronto

Let's create a function to repeat the same process for all the neighborhoods in Toronto

In [321]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now run getNearbyVenues on each neighborhood and store the result in a new dataframe called toronto_venues

In [322]:
toronto_venues = getNearbyVenues(names=neighborhoods['Neighbourhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

Let's check the size of the resulting dataframe

In [323]:
print(toronto_venues.shape)
toronto_venues.head()

(6801, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Images Salon & Spa,43.802283,-79.198565,Spa
1,"Rouge, Malvern",43.806686,-79.194353,Canadiana exhibit,43.817962,-79.193374,Zoo Exhibit
2,"Rouge, Malvern",43.806686,-79.194353,Caribbean Wave,43.798558,-79.195777,Caribbean Restaurant
3,"Rouge, Malvern",43.806686,-79.194353,Staples Morningside,43.800285,-79.196607,Paper / Office Supplies Store
4,"Rouge, Malvern",43.806686,-79.194353,Harvey's,43.800106,-79.198258,Fast Food Restaurant
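
getNearbyVenues reads categories[0] directly, which would raise an IndexError for a venue that has no category. The sketch below shows one defensive way to build the same tuples. The helper name venue_rows and the toy items are hypothetical and are not part of the Foursquare lab.

```python
def venue_rows(name, lat, lng, items):
    """Build one tuple per venue, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of failing on an empty list.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Toy items shaped like the API response used above (second one is made up).
items = [
    {"venue": {"name": "Harvey's",
               "categories": [{"name": "Fast Food Restaurant"}],
               "location": {"lat": 43.800106, "lng": -79.198258}}},
    {"venue": {"name": "Uncategorized place",
               "categories": [],
               "location": {"lat": 43.80, "lng": -79.19}}},
]

rows = venue_rows("Rouge, Malvern", 43.806686, -79.194353, items)
for row in rows:
    print(row)
```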


Let's check how many venues were returned for each neighborhood

In [324]:
toronto_venues.groupby('Neighborhood').count().head()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Agincourt,62,62,62,62,62,62
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",66,66,66,66,66,66
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",29,29,29,29,29,29
"Alderwood, Long Branch",44,44,44,44,44,44


Let's find out how many unique categories can be curated from all the returned venues

In [325]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 345 unique categories.


#### Analyze Each Neighborhood

One-hot encoding

In [326]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
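
The header above starts with Zoo Exhibit rather than Neighborhood. One possible explanation, which is an assumption and has not been checked against the Foursquare data, is that one venue category is itself named Neighborhood. In that case the assignment overwrites an existing dummy column instead of appending a new last column, and the positional reordering moves a different column to the front. That would also fit the one-hot dataframe having 345 columns, the same as the number of unique categories. The sketch below reproduces the situation on toy data and moves the column by name instead.

```python
import pandas as pd

# Toy venues; one category is deliberately named "Neighborhood".
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Café", "Neighborhood", "Zoo"],
})

onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="")

# This overwrites the existing "Neighborhood" dummy column in place,
# so it does not become the last column.
onehot["Neighborhood"] = toy_venues["Neighborhood"]

# Reorder by name so the result does not depend on column position.
ordered = ["Neighborhood"] + [c for c in onehot.columns if c != "Neighborhood"]
onehot = onehot[ordered]

print(onehot.columns.tolist())
```

Renaming the neighborhood column to something that cannot clash with a category would avoid losing the overwritten dummy column in the first place.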


And let's examine the new dataframe size.

In [327]:
toronto_onehot.shape

(6801, 345)

Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [328]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Zoo Exhibit,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,...,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Zoo
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0
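
Taking the mean of one-hot columns gives the share of a neighborhood's venues that fall into each category. A minimal sketch with made-up venues shows the arithmetic: neighborhood A has four venues, two of which are cafés, so its Café frequency is 2 / 4 = 0.5.

```python
import pandas as pd

# One row per venue; exactly one category column is 1 in each row.
onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Café":  [1, 1, 0, 0, 0, 0],
    "Park":  [0, 0, 1, 0, 1, 1],
    "Hotel": [0, 0, 0, 1, 0, 0],
})

# The mean of 0/1 indicators is the fraction of venues in that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()

# Frequencies within a neighborhood add up to 1.
row_sums = grouped.drop(columns="Neighborhood").sum(axis=1)

print(grouped)
print(row_sums)
```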


Let's confirm the new size

In [329]:
toronto_grouped.shape

(103, 345)

Let's print each neighborhood along with the top 5 most common venues

In [330]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
         venue  freq
0  Coffee Shop  0.07
1        Hotel  0.06
2         Café  0.05
3      Theater  0.04
4  Pizza Place  0.03


----Agincourt----
                venue  freq
0  Chinese Restaurant  0.16
1         Coffee Shop  0.05
2       Shopping Mall  0.05
3    Asian Restaurant  0.03
4         Supermarket  0.03


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                venue  freq
0  Chinese Restaurant  0.17
1              Bakery  0.08
2         Coffee Shop  0.06
3         Pizza Place  0.05
4   Korean Restaurant  0.05


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                  venue  freq
0           Coffee Shop  0.17
1  Fast Food Restaurant  0.17
2         Grocery Store  0.10
3           Pizza Place  0.10
4                  Café  0.03


----Alderwood, Long Branch----
              venue  freq
0              Park  0.07
1          Pharmacy  0.05
2      

Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order

In [331]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood

In [332]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Hotel,Café,Theater,Pizza Place,Gastropub,Japanese Restaurant,Art Gallery,Restaurant,Steakhouse
1,Agincourt,Chinese Restaurant,Shopping Mall,Coffee Shop,Caribbean Restaurant,Bakery,Asian Restaurant,Supermarket,Gym / Fitness Center,Cantonese Restaurant,Breakfast Spot
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Chinese Restaurant,Bakery,Coffee Shop,Pizza Place,Pharmacy,Korean Restaurant,Dessert Shop,Tea Room,Discount Store,Dim Sum Restaurant
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Fast Food Restaurant,Coffee Shop,Grocery Store,Pizza Place,Hardware Store,Fried Chicken Joint,Bus Line,Steakhouse,Beer Store,Café
4,"Alderwood, Long Branch",Park,Burger Joint,Coffee Shop,Pizza Place,Café,Pharmacy,Toy / Game Store,Bar,Grocery Store,Restaurant
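
The ranking step can also be sketched with nlargest, which picks the n highest frequencies of a row directly. The helper name top_venues and the frequencies below are made up for illustration. This is an alternative to return_most_common_venues, not a copy of it.

```python
import pandas as pd

# Toy frequency table: one row per neighborhood, one column per category.
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Café":  [0.50, 0.00],
    "Park":  [0.25, 0.70],
    "Hotel": [0.10, 0.30],
    "Zoo":   [0.15, 0.00],
})


def top_venues(row, n):
    """Return the names of the n most frequent categories in a row."""
    freqs = row.drop("Neighborhood").astype(float)
    return freqs.nlargest(n).index.tolist()


top = {row["Neighborhood"]: top_venues(row, 2) for _, row in grouped.iterrows()}
print(top)
```

Note that when a neighborhood has fewer distinct categories than n, the remaining slots are filled with categories whose frequency is zero, so the later columns of such a row carry no information.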


#### Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [333]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 2, 1, 2, 4, 4, 2, 0, 4], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [334]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,"Adelaide, King, Richmond",Coffee Shop,Hotel,Café,Theater,Pizza Place,Gastropub,Japanese Restaurant,Art Gallery,Restaurant,Steakhouse
1,2,Agincourt,Chinese Restaurant,Shopping Mall,Coffee Shop,Caribbean Restaurant,Bakery,Asian Restaurant,Supermarket,Gym / Fitness Center,Cantonese Restaurant,Breakfast Spot
2,2,"Agincourt North, L'Amoreaux East, Milliken, St...",Chinese Restaurant,Bakery,Coffee Shop,Pizza Place,Pharmacy,Korean Restaurant,Dessert Shop,Tea Room,Discount Store,Dim Sum Restaurant
3,1,"Albion Gardens, Beaumond Heights, Humbergate, ...",Fast Food Restaurant,Coffee Shop,Grocery Store,Pizza Place,Hardware Store,Fried Chicken Joint,Bus Line,Steakhouse,Beer Store,Café
4,2,"Alderwood, Long Branch",Park,Burger Joint,Coffee Shop,Pizza Place,Café,Pharmacy,Toy / Game Store,Bar,Grocery Store,Restaurant
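
To illustrate what the labels mean, here is a minimal sketch of k-means on four made-up frequency rows. The values and the choice of two clusters are assumptions for the example. Rows with similar category profiles should end up with the same label, while the label numbers themselves are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows are neighborhoods, columns are category frequencies (toy values).
freqs = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

# n_init is set explicitly because its default differs across versions.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freqs)
labels = model.labels_

print(labels)
```

Since the integers only identify groups, label 0 in this notebook is the cluster referred to as Cluster 1 in the sections below.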


In [335]:
toronto_merged = neighborhoods

In [336]:
# join neighborhoods with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')
toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,2,Zoo Exhibit,Fast Food Restaurant,Pizza Place,Bakery,Coffee Shop,Paper / Office Supplies Store,Chinese Restaurant,Caribbean Restaurant,Movie Theater,Cosmetics Shop
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,4,Grocery Store,Gym,Gym / Fitness Center,Breakfast Spot,Park,Burger Joint,Pizza Place,Italian Restaurant,Falafel Restaurant,Eastern European Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,1,Pizza Place,Breakfast Spot,Coffee Shop,Fast Food Restaurant,Park,Asian Restaurant,Sandwich Place,Liquor Store,Bank,Beer Store
3,M1G,Scarborough,Woburn,43.770992,-79.216917,1,Coffee Shop,Indian Restaurant,Pizza Place,Fast Food Restaurant,Sandwich Place,Pharmacy,Burger Joint,Music Store,Bar,Bank
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,2,Coffee Shop,Clothing Store,Sandwich Place,Bakery,Indian Restaurant,Hotel,Sporting Goods Shop,Pharmacy,Restaurant,Wings Joint
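
The join above matches the Neighbourhood column against the Neighborhood index of the sorted venues. If a neighborhood returned no venues, it would have no match and its Cluster Labels value would be NaN, which makes a later cast to int fail. The sketch below shows that case on toy data. The third row is hypothetical.

```python
import pandas as pd

base = pd.DataFrame({
    "Neighbourhood": ["Woburn", "Cedarbrae", "No venues here"],  # last is made up
})
labels = pd.DataFrame({
    "Neighborhood": ["Woburn", "Cedarbrae"],
    "Cluster Labels": [1, 2],
})

# Left join on differently named keys, as in the cell above.
merged = base.join(labels.set_index("Neighborhood"), on="Neighbourhood")

# Unmatched rows get NaN, which turns the label column into floats.
unlabeled = merged[merged["Cluster Labels"].isna()]

# Drop them before casting the labels back to integers.
merged = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})

print(unlabeled)
print(merged)
```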


In [338]:
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(int)

Finally, let's visualize the resulting clusters

In [339]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

In [340]:
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examine Clusters

Cluster 1

In [341]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,East York,0,Greek Restaurant,Café,Bakery,Pizza Place,Coffee Shop,Ice Cream Shop,Park,Ethiopian Restaurant,Pub,Brewery
41,East Toronto,0,Café,Greek Restaurant,Park,Pizza Place,Coffee Shop,Bakery,Pub,Vietnamese Restaurant,Ice Cream Shop,Italian Restaurant
42,East Toronto,0,Coffee Shop,Park,Indian Restaurant,Brewery,Café,Pub,Beach,Restaurant,Bakery,BBQ Joint
43,East Toronto,0,Coffee Shop,Bakery,Café,Vietnamese Restaurant,Bar,Park,Restaurant,Brewery,American Restaurant,Pizza Place
45,Central Toronto,0,Coffee Shop,Italian Restaurant,Café,Park,Bakery,Pizza Place,Sushi Restaurant,Indian Restaurant,Sporting Goods Shop,Mexican Restaurant
46,Central Toronto,0,Coffee Shop,Italian Restaurant,Fast Food Restaurant,Café,Sporting Goods Shop,Sushi Restaurant,Japanese Restaurant,Bakery,Pizza Place,Breakfast Spot
47,Central Toronto,0,Coffee Shop,Café,Italian Restaurant,Bakery,Indian Restaurant,Park,Japanese Restaurant,Pizza Place,Sushi Restaurant,Mexican Restaurant
48,Central Toronto,0,Park,Italian Restaurant,Sushi Restaurant,Coffee Shop,Café,Restaurant,Grocery Store,Gym,Bank,Spa
49,Central Toronto,0,Italian Restaurant,Coffee Shop,Café,Park,Sushi Restaurant,Liquor Store,Pizza Place,American Restaurant,Vegetarian / Vegan Restaurant,Gym
50,Downtown Toronto,0,Coffee Shop,Café,Italian Restaurant,Park,Restaurant,Spa,Sushi Restaurant,Indian Restaurant,French Restaurant,Salad Place


The result shows that the most common venues in cluster 1 are coffee shops, cafés and parks.

Cluster 2

In [342]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,1,Pizza Place,Breakfast Spot,Coffee Shop,Fast Food Restaurant,Park,Asian Restaurant,Sandwich Place,Liquor Store,Bank,Beer Store
3,Scarborough,1,Coffee Shop,Indian Restaurant,Pizza Place,Fast Food Restaurant,Sandwich Place,Pharmacy,Burger Joint,Music Store,Bar,Bank
5,Scarborough,1,Sandwich Place,Fast Food Restaurant,Pharmacy,Breakfast Spot,Grocery Store,Coffee Shop,Chinese Restaurant,Pizza Place,Optical Shop,Restaurant
6,Scarborough,1,Fast Food Restaurant,Coffee Shop,Chinese Restaurant,Sandwich Place,Grocery Store,Pharmacy,Pizza Place,Bank,Beer Store,Sports Bar
7,Scarborough,1,Coffee Shop,Pizza Place,Grocery Store,Sandwich Place,Convenience Store,Intersection,Fast Food Restaurant,Dog Run,Bakery,Bank
15,Scarborough,1,Fast Food Restaurant,Coffee Shop,Chinese Restaurant,Sandwich Place,Pizza Place,Pool,Tennis Court,Bakery,Bank,Beer Store
17,North York,1,Coffee Shop,Chinese Restaurant,Park,Pharmacy,Bakery,Bank,Sandwich Place,Sushi Restaurant,Supermarket,Pizza Place
18,North York,1,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Park,Fast Food Restaurant,Sandwich Place,Chinese Restaurant,Japanese Restaurant,Department Store,Bakery
20,North York,1,Coffee Shop,Bank,Park,Supermarket,Sandwich Place,Pharmacy,Furniture / Home Store,Burger Joint,Japanese Restaurant,Gym
23,North York,1,Coffee Shop,Bank,Sandwich Place,Pharmacy,Park,Fried Chicken Joint,Japanese Restaurant,Fast Food Restaurant,Burger Joint,Gym


The result shows that the most common venues in cluster 2 are coffee shops, pizza places and grocery stores.

Cluster 3

In [343]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,2,Zoo Exhibit,Fast Food Restaurant,Pizza Place,Bakery,Coffee Shop,Paper / Office Supplies Store,Chinese Restaurant,Caribbean Restaurant,Movie Theater,Cosmetics Shop
4,Scarborough,2,Coffee Shop,Clothing Store,Sandwich Place,Bakery,Indian Restaurant,Hotel,Sporting Goods Shop,Pharmacy,Restaurant,Wings Joint
10,Scarborough,2,Fast Food Restaurant,Coffee Shop,Indian Restaurant,Grocery Store,Pizza Place,Electronics Store,Vietnamese Restaurant,Intersection,Burger Joint,Chinese Restaurant
11,Scarborough,2,Middle Eastern Restaurant,Coffee Shop,Restaurant,Pizza Place,Mediterranean Restaurant,Asian Restaurant,Sandwich Place,Breakfast Spot,Grocery Store,Supermarket
12,Scarborough,2,Chinese Restaurant,Shopping Mall,Coffee Shop,Caribbean Restaurant,Bakery,Asian Restaurant,Supermarket,Gym / Fitness Center,Cantonese Restaurant,Breakfast Spot
13,Scarborough,2,Fast Food Restaurant,Pharmacy,Park,Falafel Restaurant,Sandwich Place,Chinese Restaurant,Korean Restaurant,Coffee Shop,Bank,Shopping Mall
14,Scarborough,2,Chinese Restaurant,Bakery,Coffee Shop,Pizza Place,Pharmacy,Korean Restaurant,Dessert Shop,Tea Room,Discount Store,Dim Sum Restaurant
21,North York,2,Korean Restaurant,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Dessert Shop,Pharmacy,Bank,Grocery Store,Juice Bar
22,North York,2,Korean Restaurant,Bubble Tea Shop,Coffee Shop,Japanese Restaurant,Pizza Place,Ramen Restaurant,Grocery Store,Café,Sandwich Place,Gym
26,North York,2,Coffee Shop,Japanese Restaurant,Pizza Place,Restaurant,Burger Joint,Park,Bank,Italian Restaurant,Ice Cream Shop,Asian Restaurant


The result shows that the most common venues in cluster 3 are coffee shops, restaurants and bakeries.

Cluster 4

In [344]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Scarborough,3,Donut Shop,Farm,National Park,Field,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market


The result shows that cluster 4 contains a single neighborhood, whose most common venues are Donut Shop, Farm and National Park.

Cluster 5

In [345]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,4,Grocery Store,Gym,Gym / Fitness Center,Breakfast Spot,Park,Burger Joint,Pizza Place,Italian Restaurant,Falafel Restaurant,Eastern European Restaurant
8,Scarborough,4,Park,Fast Food Restaurant,Harbor / Marina,Pizza Place,Beach,Grocery Store,Coffee Shop,Sandwich Place,Pharmacy,Train Station
9,Scarborough,4,Park,Thai Restaurant,Chinese Restaurant,General Entertainment,Café,Diner,Bank,Restaurant,Asian Restaurant,Fast Food Restaurant
19,North York,4,Bank,Park,Trail,Japanese Restaurant,Skate Park,Café,Chinese Restaurant,Shopping Mall,Grocery Store,Skating Rink
28,North York,4,Park,Coffee Shop,Pizza Place,Ski Chalet,Sushi Restaurant,Supermarket,Gas Station,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant
31,North York,4,Park,Grocery Store,Tea Room,Bank,Plaza,Moving Target,Pizza Place,Vietnamese Restaurant,Falafel Restaurant,Eastern European Restaurant
32,North York,4,Vietnamese Restaurant,Pharmacy,Supermarket,Coffee Shop,Grocery Store,Park,New American Restaurant,Sandwich Place,Salon / Barbershop,Bank
88,Etobicoke,4,Park,Bakery,Café,Indian Restaurant,Grocery Store,Convenience Store,Sandwich Place,Restaurant,Bar,General Entertainment
94,Etobicoke,4,Park,Hotel,Bank,Gym,Intersection,Grocery Store,Gym / Fitness Center,Theater,Electronics Store,Farmers Market
96,North York,4,Park,Asian Restaurant,Vietnamese Restaurant,Mexican Restaurant,Sports Bar,Latin American Restaurant,Electronics Store,Bank,Shopping Mall,Italian Restaurant


The result shows that the most common venues in cluster 5 are parks and grocery stores.
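
The per-cluster summaries above are read off the printed rows. As a minimal sketch of tallying them instead, the code below counts how often each category appears as the 1st Most Common Venue within each cluster. The labels and venues are made up and are not the notebook's results.

```python
import pandas as pd

# Toy stand-in for two columns of toronto_merged (values are illustrative).
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Park",
                              "Pizza Place", "Coffee Shop"],
})

# Number of neighborhoods per (cluster, top venue) pair.
counts = toy.groupby("Cluster Labels")["1st Most Common Venue"].value_counts()

print(counts)
```

Running the same count on toronto_merged would summarize every neighborhood in a cluster rather than only the rows shown in the truncated tables.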