# Segmenting and Clustering Toronto Neighborhoods

## Scrape Wikipedia

In [91]:
import pandas as pd

import numpy as np

wiki_link = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

Pandas provides read_html, which parses the HTML tables on a page into a list of DataFrames.

In [11]:
# Read the Wikipedia tables into a list of DataFrames
dfs = pd.read_html(wiki_link)
# the first DF contains borough data
df = dfs[0]
df.columns

Index(['Postalcode', 'Borough', 'Neighborhood'], dtype='object')

## Scrub the Data

Now that the data is obtained, we must clean it up a bit. 

In [12]:
# The DF will consist of three columns: PostalCode, Borough and Neighborhood
df = df.rename(columns={'Postalcode':'PostalCode'})
df.columns

Index(['PostalCode', 'Borough', 'Neighborhood'], dtype='object')

We are tasked with removing any rows that have a borough of 'Not assigned'. This will also remove any rows that had 'Not assigned' as the neighborhood.

In [13]:
# Ignore cells that have a borough 'Not assigned'
# This also captures empty Neighborhood fields
df = df.drop(labels=df.loc[df.Borough == 'Not assigned'].index)
# Reset the index
df.reset_index(drop=True, inplace=True)
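
The same filter can also be written as a boolean mask, which some readers find easier to scan. The sketch below is only an illustration: it runs on a small made-up DataFrame (the rows are not taken from the Wikipedia table) and assumes nothing beyond pandas being installed.

```python
import pandas as pd

# Illustrative rows only; the real table comes from Wikipedia.
toy = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows whose borough is assigned, then renumber the index.
toy_clean = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)
print(toy_clean)
```

Both forms should give the same rows; the mask version simply avoids the intermediate index lookup.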

Replace the characters ' / ' with ', ' to match the formatting of the provided example.

In [14]:
# Use commas instead of slashes for postal codes that cover multiple
# neighborhoods
df.Neighborhood = df.Neighborhood.apply(lambda x: x.replace(' / ', ', '))

# Use example from prompt to show it is completed
df.loc[df.PostalCode == 'M5A']

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
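
As a possible alternative to apply with a lambda, pandas offers a vectorized string accessor. The snippet below is a minimal sketch on made-up values; passing regex=False is a choice made here so that ' / ' is matched literally.

```python
import pandas as pd

# Illustrative values in the slash-separated style of the Wikipedia table.
names = pd.Series(['Regent Park / Harbourfront', 'Parkwoods'])

# regex=False treats ' / ' as a literal substring rather than a pattern.
names_fixed = names.str.replace(' / ', ', ', regex=False)
print(names_fixed.tolist())
```

Unlike the lambda version, the string accessor passes missing values through as NaN instead of raising an error.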


In [15]:
# Print the number of rows in our DataFrame
df.shape

(103, 3)

# Obtaining Coordinate Data

In [16]:
import geocoder

We will use the ArcGIS provider, as it seems to be the most reliable.

In [17]:
def get_lat_lon(p_code):
    print('Trying to get coordinates for {}'.format(p_code))
    lat_lon = None
    while not lat_lon:
        geo_str = '{}, Toronto'.format(p_code)
        g = geocoder.arcgis(geo_str)
        lat_lon = g.latlng
    print('Successfully got coordinates for {}'.format(p_code))
    return lat_lon
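
Note that the loop above retries forever if the provider never returns a result. One way to bound it is sketched below. The function name, the lookup parameter, and the retry settings are all assumptions made for illustration; lookup stands in for something like lambda q: geocoder.arcgis(q).latlng so the sketch can run without network access.

```python
import time

def get_lat_lon_bounded(p_code, lookup, max_attempts=5, pause=1.0):
    """Return [lat, lon] for p_code, or None after max_attempts failures.

    `lookup` is a hypothetical callable that takes a query string and
    returns [lat, lon] or None.
    """
    query = '{}, Toronto'.format(p_code)
    for attempt in range(max_attempts):
        lat_lon = lookup(query)
        if lat_lon:
            return lat_lon
        time.sleep(pause)  # brief pause before the next attempt
    return None

# Usage with a stand-in lookup that fails twice before succeeding.
calls = {'n': 0}

def flaky_lookup(query):
    calls['n'] += 1
    return [43.75, -79.33] if calls['n'] >= 3 else None

result = get_lat_lon_bounded('M3A', flaky_lookup, pause=0.0)
print(result, calls['n'])
```

Returning None on failure means the caller has to decide how to handle postal codes that could not be geocoded.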

Use the applymap function to pass each postal code to the get_lat_lon function. Please be patient; the lookups can be slow, and the whole run may take well over a minute.

In [18]:
lat_lon_df = df[['PostalCode']].applymap(get_lat_lon)

Trying to get coordinates for M3A
Successfully got coordinates for M3A
Trying to get coordinates for M4A
Successfully got coordinates for M4A
Trying to get coordinates for M5A
Successfully got coordinates for M5A
Trying to get coordinates for M6A
Successfully got coordinates for M6A
Trying to get coordinates for M7A
Successfully got coordinates for M7A
Trying to get coordinates for M9A
Successfully got coordinates for M9A
Trying to get coordinates for M1B
Successfully got coordinates for M1B
Trying to get coordinates for M3B
Successfully got coordinates for M3B
Trying to get coordinates for M4B
Successfully got coordinates for M4B
Trying to get coordinates for M5B
Successfully got coordinates for M5B
Trying to get coordinates for M6B
Successfully got coordinates for M6B
Trying to get coordinates for M9B
Successfully got coordinates for M9B
Trying to get coordinates for M1C
Successfully got coordinates for M1C
Trying to get coordinates for M3C
Successfully got coordinates for M3C
Trying to get coordinates for M4C
Successfully got coordinates for M4C
Trying to get coordinates for M5C
Successfully got coordinates for M5C
Trying to get coordinates for M6C
Successfully got coordinates for M6C
Trying to get coordinates for M9C
Successfully got coordinates for M9C
Trying to get coordinates for M1E
Successfully got coordinates for M1E
Trying to get coordinates for M4E
Successfully got coordinates for M4E
Trying to get coordinates for M5E
Successfully got coordinates for M5E
Trying to get coordinates for M6E
Successfully got coordinates for M6E
Trying to get coordinates for M1G
Successfully got coordinates for M1G
Trying to get coordinates for M4G
Successfully got coordinates for M4G
Trying to get coordinates for M5G
Successfully got coordinates for M5G
Trying to get coordinates for M6G
Successfully got coordinates for M6G
Trying to get coordinates for M1H
Successfully got coordinates for M1H
...

Make sure this DataFrame has the same number of rows as the original.

In [22]:
lat_lon_df.shape

(103, 1)

We will now insert the latitude and longitude values into the original DataFrame.

In [23]:
df['Latitude'] = lat_lon_df.PostalCode.map(lambda x: x[0])
df['Longitude'] = lat_lon_df.PostalCode.map(lambda x: x[1])
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.752935,-79.335641
1,M4A,North York,Victoria Village,43.728102,-79.31189
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.650964,-79.353041
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.723265,-79.451211
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66179,-79.38939
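
The two map calls could also be collapsed into one step by expanding the [lat, lon] pairs into a two-column DataFrame. This is a sketch on two illustrative pairs, not a replacement for the cell above.

```python
import pandas as pd

# Illustrative [lat, lon] pairs shaped like geocoder's latlng output.
pairs = pd.Series([[43.752935, -79.335641], [43.728102, -79.31189]])

# Expand each pair into its own row of a two-column DataFrame,
# keeping the original index so the result aligns with the source rows.
coords = pd.DataFrame(pairs.tolist(),
                      columns=['Latitude', 'Longitude'],
                      index=pairs.index)
print(coords)
```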


To verify this worked correctly, let's check whether our data matches the provided example. There are slight differences, which are presumably due to the different geocoding providers (Google vs ArcGIS).

In [24]:
df.loc[df.PostalCode.isin(['M5G', 'M2H', 'M4B', 'M1J'])]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.707193,-79.311529
24,M5G,Downtown Toronto,Central Bay Street,43.656072,-79.385653
27,M2H,North York,Hillcrest Village,43.802556,-79.356566
32,M1J,Scarborough,Scarborough Village,43.744203,-79.228725


# Exploring and Clustering Toronto Neighborhoods

In [68]:
import folium

import requests

Use geocoder with the ArcGIS provider to get the coordinates of Toronto, ON.

In [38]:
address = 'Toronto, ON'

g = geocoder.arcgis(address)
lat = g.latlng[0]
lon = g.latlng[1]

In [40]:
print('The geographical coordinates of Toronto, ON are {}, {}.'.format(lat, lon))

The geographical coordinates of Toronto, ON are 43.648690000000045, -79.38543999999996.


Create a map of Toronto with neighborhoods superimposed on top.

In [51]:
map_toronto = folium.Map(location=g.latlng, zoom_start=9)

for lat, lon, borough, neighborhood in zip(df['Latitude'], 
                                            df['Longitude'], 
                                            df['Borough'], 
                                            df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(location=[lat, lon],
                    radius=5,
                    popup=label,
                    color='blue',
                    fill=True,
                    fill_color='#3186cc',
                    fill_opacity=0.7,
                    parse_html=False).add_to(map_toronto)

map_toronto

We will use only the boroughs whose names contain the word 'Toronto'.

In [56]:
toronto_boroughs = df.loc[df.Borough.str.contains('Toronto')].reset_index(drop=True)
toronto_boroughs

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.650964,-79.353041
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66179,-79.38939
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657491,-79.377529
3,M5C,Downtown Toronto,St. James Town,43.651734,-79.375554
4,M4E,East Toronto,The Beaches,43.678148,-79.295349
5,M5E,Downtown Toronto,Berczy Park,43.645196,-79.373855
6,M5G,Downtown Toronto,Central Bay Street,43.656072,-79.385653
7,M6G,Downtown Toronto,Christie,43.668602,-79.420387
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650542,-79.384116
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.66491,-79.438664


Now visualize the boroughs whose names contain the word 'Toronto'.

In [60]:
map_boroughs = folium.Map(location=g.latlng, zoom_start=11)

for lat, lon, borough, neighborhood in zip(toronto_boroughs['Latitude'], 
                                            toronto_boroughs['Longitude'], 
                                            toronto_boroughs['Borough'], 
                                            toronto_boroughs['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(location=[lat, lon],
                    radius=5,
                    popup=label,
                    color='blue',
                    fill=True,
                    fill_color='#3186cc',
                    fill_opacity=0.7,
                    parse_html=False).add_to(map_boroughs)

map_boroughs

We need to specify Foursquare credentials to get the data through their API.

In [66]:
CLIENT_ID = 'your-client-id' # your Foursquare ID
CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: your-client-id
CLIENT_SECRET: your-client-secret


Define a function to obtain the venues near each neighborhood.

In [64]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
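
The function above indexes straight into the response, so an error response or a venue with an empty category list would raise an exception. The sketch below shows one possible defensive parser. The function name and the hand-written payload are assumptions for illustration; the only thing taken from the code above is the path response → groups[0] → items → venue.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore response into a list of venue tuples.

    Assumes the layout used above:
    payload['response']['groups'][0]['items'][i]['venue'].
    Returns an empty list when that path is missing.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Venues without a category get None instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-written stand-in payload; not a real API response.
fake = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 3.0, 'lng': 4.0},
               'categories': []}},
]}]}}
rows = parse_explore_items(fake, 'Toy Neighborhood', 0.0, 0.0)
empty = parse_explore_items({'meta': {'code': 400}}, 'Toy Neighborhood', 0.0, 0.0)
print(rows, empty)
```

Keeping the parsing separate from the HTTP call also makes it possible to test it without credentials.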

Use the previously defined function to obtain a list of Toronto venues.

In [69]:
toronto_venues = getNearbyVenues(names=toronto_boroughs['Neighborhood'],
                                   latitudes=toronto_boroughs['Latitude'],
                                   longitudes=toronto_boroughs['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
High Park, The Junction South
North Toronto West
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town,

Check the size and view the first few results.

In [70]:
print(toronto_venues.shape)
toronto_venues.head()

(1599, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.650964,-79.353041,Souk Tabule,43.653756,-79.35439,Mediterranean Restaurant
1,"Regent Park, Harbourfront",43.650964,-79.353041,Young Centre for the Performing Arts,43.650825,-79.357593,Performing Arts Venue
2,"Regent Park, Harbourfront",43.650964,-79.353041,SOMA chocolatemaker,43.650622,-79.358127,Chocolate Shop
3,"Regent Park, Harbourfront",43.650964,-79.353041,BATLgrounds,43.647088,-79.351306,Athletics & Sports
4,"Regent Park, Harbourfront",43.650964,-79.353041,Cluny Bistro & Boulangerie,43.650565,-79.357843,French Restaurant


Check how many venues were returned for each neighborhood.

In [71]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,67,67,67,67,67,67
"Brockton, Parkdale Village, Exhibition Place",44,44,44,44,44,44
Business reply mail Processing Centre,100,100,100,100,100,100
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",66,66,66,66,66,66
Central Bay Street,51,51,51,51,51,51
Christie,12,12,12,12,12,12
Church and Wellesley,84,84,84,84,84,84
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,26,26,26,26,26,26
Davisville North,5,5,5,5,5,5


Find out how many unique categories appear among the returned venues.

In [72]:
cat_count = len(toronto_venues['Venue Category'].unique())
print('There are {} unique venue categories.'.format(cat_count))

There are 219 unique venue categories.


# Analyzing Each Neighborhood

We would like to see which categories occur in each neighborhood. The cell below constructs a new DataFrame with one indicator (one-hot) column per category, along with a Neighborhood column.

In [82]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = list(toronto_onehot)
cols.insert(0, cols.pop(cols.index('Neighborhood')))
toronto_onehot = toronto_onehot.loc[:, cols]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Toy / Game Store,Trail,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [75]:
toronto_onehot.shape

(1599, 219)

We can see the frequency at which each category occurs in each neighborhood by grouping on Neighborhood and taking the mean of each indicator column.

In [85]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Toy / Game Store,Trail,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,...,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925
1,"Brockton, Parkdale Village, Exhibition Place",0.022727,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business reply mail Processing Centre,0.0,0.02,0.0,0.01,0.0,0.0,0.03,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905
7,"Commerce Court, Victoria Hotel",0.0,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [87]:
toronto_grouped.shape

(39, 219)
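
To make the one-hot and groupby steps concrete, here is a small worked sketch on made-up venues. The neighborhood names 'A' and 'B' and the venue rows are purely illustrative.

```python
import pandas as pd

# Illustrative venues: neighborhood A has 2 cafés and 1 park, B has 1 park.
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Park'],
})

# One 0/1 indicator column per category.
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# The mean of 0/1 indicators is the share of venues in each category.
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```

For neighborhood A, two of the three venues are cafés, so its Café frequency is 2/3; each row of frequencies sums to 1.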

We can list each neighborhood along with its 10 most common venue categories.

In [89]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [92]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Lounge,Bakery,Breakfast Spot,Restaurant,Cheese Shop,Hotel
1,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Café,Thrift / Vintage Store,Diner,Pizza Place,Gift Shop,Sandwich Place,Boutique,Italian Restaurant,Brewery
2,Business reply mail Processing Centre,Coffee Shop,Hotel,Japanese Restaurant,Café,Restaurant,Asian Restaurant,Italian Restaurant,Theater,Steakhouse,Bookstore
3,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Café,Restaurant,French Restaurant,Park,Bar,Speakeasy,Lounge,Italian Restaurant,Japanese Restaurant
4,Central Bay Street,Coffee Shop,Café,Middle Eastern Restaurant,Plaza,Clothing Store,Restaurant,Bubble Tea Shop,Sandwich Place,Hotel,Mexican Restaurant


Wow, they sure do like coffee in Toronto!

# Cluster Neighborhoods

In [108]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

In [134]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0,
       0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0], dtype=int32)
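
A quick way to see how lopsided this result is would be to count the members of each cluster. The sketch below copies the label array from the output above and assumes only NumPy.

```python
import numpy as np

# Labels copied from the k-means output above.
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2,
                   0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0])

# sizes[i] is the number of neighborhoods assigned to cluster i.
sizes = np.bincount(labels)
print(sizes)
```

Of the 39 neighborhoods, 35 fall into cluster 0 and each remaining cluster holds exactly one.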

In [135]:
# add clustering labels
# (re-running this cell raises ValueError because the column already exists)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_boroughs

# join the cluster labels and top venues onto toronto_boroughs, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.650964,-79.353041,0,Pub,Café,Athletics & Sports,Coffee Shop,Performing Arts Venue,Theater,Seafood Restaurant,Mexican Restaurant,Food Truck,French Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66179,-79.38939,0,Coffee Shop,Sushi Restaurant,Café,Yoga Studio,Discount Store,Pharmacy,Park,Middle Eastern Restaurant,Juice Bar,Italian Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657491,-79.377529,0,Coffee Shop,Clothing Store,Sandwich Place,Middle Eastern Restaurant,Café,Cosmetics Shop,Restaurant,Hotel,Bar,Italian Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651734,-79.375554,0,Coffee Shop,Café,American Restaurant,Seafood Restaurant,Cosmetics Shop,Gastropub,Cocktail Bar,Theater,Italian Restaurant,Hotel
4,M4E,East Toronto,The Beaches,43.678148,-79.295349,0,Health Food Store,Trail,Pub,Yoga Studio,Donut Shop,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In [136]:
# create map
map_clusters = folium.Map(location=g.latlng, zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Examine Clusters

### Cluster 1

In [137]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Pub,Café,Athletics & Sports,Coffee Shop,Performing Arts Venue,Theater,Seafood Restaurant,Mexican Restaurant,Food Truck,French Restaurant
1,Downtown Toronto,0,Coffee Shop,Sushi Restaurant,Café,Yoga Studio,Discount Store,Pharmacy,Park,Middle Eastern Restaurant,Juice Bar,Italian Restaurant
2,Downtown Toronto,0,Coffee Shop,Clothing Store,Sandwich Place,Middle Eastern Restaurant,Café,Cosmetics Shop,Restaurant,Hotel,Bar,Italian Restaurant
3,Downtown Toronto,0,Coffee Shop,Café,American Restaurant,Seafood Restaurant,Cosmetics Shop,Gastropub,Cocktail Bar,Theater,Italian Restaurant,Hotel
4,East Toronto,0,Health Food Store,Trail,Pub,Yoga Studio,Donut Shop,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
5,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Beer Bar,Seafood Restaurant,Lounge,Bakery,Breakfast Spot,Restaurant,Cheese Shop,Hotel
6,Downtown Toronto,0,Coffee Shop,Café,Middle Eastern Restaurant,Plaza,Clothing Store,Restaurant,Bubble Tea Shop,Sandwich Place,Hotel,Mexican Restaurant
7,Downtown Toronto,0,Grocery Store,Café,Park,Baby Store,Athletics & Sports,Coffee Shop,Candy Store,Playground,Fish & Chips Shop,Fish Market
8,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Clothing Store,Salad Place,Gym,Sushi Restaurant,Deli / Bodega,Thai Restaurant,Hotel
9,West Toronto,0,Park,Smoke Shop,Pharmacy,Brazilian Restaurant,Café,Liquor Store,Bank,Bakery,Furniture / Home Store,Pool


### Cluster 2

In [138]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Central Toronto,1,Gym / Fitness Center,Park,Yoga Studio,Eastern European Restaurant,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm


### Cluster 3

In [139]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,2,Bus Line,Swim School,Yoga Studio,Food Court,Food,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


### Cluster 4

In [140]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,East Toronto,3,Business Service,Government Building,Night Market,Yoga Studio,Electronics Store,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


### Cluster 5

In [141]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,4,IT Services,Yoga Studio,Eastern European Restaurant,Food,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm


We can see that 'Cluster 1' is really the only true cluster, as every other cluster contains a single neighborhood. One thought on why this may be: in Cluster 1, cafes and coffee shops are among the most common venues, while the top venues of the other clusters include neither.