# Clustering Neighborhoods in Toronto

Using the Toronto neighborhood classification from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, we will explore the neighborhoods with the Foursquare API and then group them into clusters with the k-Means clustering algorithm. Finally, we will map the clusters using Folium.

First, we import all the necessary libraries.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import json 
import geocoder
from pandas import json_normalize
from sklearn.cluster import KMeans
import folium 

print('Libraries imported.')

Libraries imported.


Next, we use BeautifulSoup4 to parse the HTML of the page linked above and find the table containing the pertinent information, i.e. the postal codes, boroughs and neighborhoods.

In [2]:
url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(url.content,'html.parser')
interest = soup.find('table', class_='wikitable sortable')
rows = interest.find('tbody').find_all('tr')
rows_array=[] #to convert into ndarray
for x in rows:
    x_lst = [] #this would be a single row
    for y in x.find_all('td'):
        x_lst.append(y.get_text())
    x_lst = list(map(lambda s: s.strip(), x_lst)) #stripping the rows of whitespace characters that were carried over from the HTML script
    rows_array.append(x_lst)

We then map `rows_array` to a dataframe, `df`.

In [3]:
df = pd.DataFrame(rows_array)
df.rename(columns={0:'Postal Code', 1:'Borough', 2:'Neighborhood'}, inplace=True)
df.drop([0], inplace=True)
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 1 | M1A | Not assigned |  |
| 2 | M2A | Not assigned |  |
| 3 | M3A | North York | Parkwoods |
| 4 | M4A | North York | Victoria Village |
| 5 | M5A | Downtown Toronto | Regent Park, Harbourfront |
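
Row 0 is dropped because the first `<tr>` of the table is presumably the header row, whose cells are `<th>` rather than `<td>`, so the scraping loop turns it into an empty list. When pandas builds a frame from rows of unequal length, it pads the short rows with missing values. Here is a toy reproduction with a few rows copied from the output above:

```python
import pandas as pd

# Toy stand-in for rows_array: the header <tr> has no <td> cells,
# so the scraping loop turns it into an empty list
rows_array = [[], ['M1A', 'Not assigned', ''], ['M3A', 'North York', 'Parkwoods']]

raw = pd.DataFrame(rows_array)      # the empty first row is padded with missing values
print(raw.iloc[0].isna().all())     # True

# Same clean-up as the notebook: drop the empty row and name the columns
df = raw.drop([0]).rename(columns={0: 'Postal Code', 1: 'Borough', 2: 'Neighborhood'})
print(df)
```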


In [4]:
df.reset_index(inplace=True)
df.drop(columns=['index'], inplace=True)
df.head(10)

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned |  |
| 1 | M2A | Not assigned |  |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 7 | M8A | Not assigned |  |
| 8 | M9A | Etobicoke | Islington Avenue |
| 9 | M1B | Scarborough | Malvern, Rouge |


Getting the dimensions of the dataframe

In [5]:
df.shape

(180, 3)

Next, we remove the rows whose borough is "Not assigned".

In [6]:
df = df[df['Borough'] != 'Not assigned'].copy()  # .copy() so the column assignments below modify an independent frame
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


The new dimensions are now:

In [7]:
df.shape

(103, 3)

Next, we use `geocoder` to look up the latitude and longitude of every postal code through Bing Maps. The service is unreliable and sometimes returns no result for a query that does have one. To handle this, we run a `while` loop that retries each lookup until `geocoder` returns coordinates we can store.

First, we create columns for the latitude and longitude.

In [8]:
df.loc[:, 'Latitude'] = np.nan
df.loc[:, 'Longitude'] = np.nan
df.head()

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 2 | M3A | North York | Parkwoods | NaN | NaN |
| 3 | M4A | North York | Victoria Village | NaN | NaN |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront | NaN | NaN |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights | NaN | NaN |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | NaN | NaN |


In [9]:
df.reset_index(inplace=True)
df.drop(columns='index', inplace=True)

In [10]:
df.head()

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | NaN | NaN |
| 1 | M4A | North York | Victoria Village | NaN | NaN |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | NaN | NaN |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | NaN | NaN |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | NaN | NaN |


In [11]:
for i, postal_code in df['Postal Code'].items():
    coords = None
    # Bing sometimes returns nothing for a valid query, so keep asking until it answers
    while coords is None:
        g = geocoder.bing('{}, Toronto, Ontario'.format(postal_code), key='YOUR_BING_MAPS_KEY')  # replace with your own Bing Maps key
        coords = g.latlng
    # i is the row label, so each result is written to the row it belongs to
    df.at[i, 'Latitude'] = coords[0]
    df.at[i, 'Longitude'] = coords[1]

df

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.751881 | -79.330360 |
| 1 | M4A | North York | Victoria Village | 43.730419 | -79.312820 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.655140 | -79.362648 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.723209 | -79.451408 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.664490 | -79.393021 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653690 | -79.511124 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.666592 | -79.381302 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre | 43.648689 | -79.385437 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.632881 | -79.489548 |
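
The `while` loop above never gives up. It will spin forever if Bing never answers, for example because the key is invalid. Below is a sketch of a bounded variant. `geocode_with_retry`, `max_tries` and `delay` are hypothetical names introduced here. The geocoding call is passed in as a function, so the retry logic can be exercised without a network connection or an API key.

```python
import time

def geocode_with_retry(lookup, query, max_tries=5, delay=1.0):
    """Call lookup(query) until it returns coordinates or max_tries is reached.

    lookup is any function returning a [lat, lng] list or None; with the
    geocoder package it could be lambda q: geocoder.bing(q, key=...).latlng
    (an assumption, not exercised here).
    """
    for attempt in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_tries - 1:
            time.sleep(delay)  # wait briefly before asking again
    return None  # the caller decides what to do with a postal code that never resolved

# Usage with a fake lookup that fails twice before answering
answers = iter([None, None, [43.75, -79.33]])
print(geocode_with_retry(lambda q: next(answers), 'M3A, Toronto, Ontario', delay=0))
```

Returning `None` after the last attempt leaves those rows as `NaN`. You can then inspect them instead of having the notebook hang.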


Now we plot the neighborhoods on a map of Toronto using Folium.

In [12]:
h = geocoder.bing('Toronto, Ontario', key='YOUR_BING_MAPS_KEY')  # replace with your own Bing Maps key
location = h.latlng
map_toronto = folium.Map(location=[location[0],location[1]], zoom_start=10)

for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood,borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

We'll now use the Foursquare API to explore the neighborhoods and then perform k-Means clustering on them. First, we need our API credentials.

In [13]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'          # replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # replace with your own Foursquare client secret
VERSION = '20180605'  # API version date sent with every request

print('''Your credentials: 
CLIENT_ID: %s
CLIENT_SECRET: %s''' % (CLIENT_ID, CLIENT_SECRET))

Your credentials: 
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


Let's define a function that, for each neighborhood, requests the venues within `radius` meters of its coordinates from the Foursquare `explore` endpoint.

In [34]:
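radius = 500  # search radius around each neighborhood, in meters
limit = 100   # maximum number of venues returned per neighborhood

def getNearbyVenues(names, latitudes, longitudes, radius=radius, limit=limit):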
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['id']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category',
                  'id']
    
    return(nearby_venues)
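
The list comprehension inside `getNearbyVenues` reads `response.groups[0].items[*].venue` from each reply. The sketch below runs the same extraction on a hand-written payload. The payload is invented and shaped only after the keys the function accesses, and `parse_explore_items` is a hypothetical helper. It also falls back to `None` when a venue's `categories` list is empty, a case where `v['venue']['categories'][0]` in the original would raise `IndexError`.

```python
def parse_explore_items(name, lat, lng, payload):
    """Flatten one explore response into tuples like those built in getNearbyVenues."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for v in items:
        venue = v['venue']
        categories = venue.get('categories') or []
        # use None instead of crashing when no category is attached
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category,
                     venue['id']))
    return rows

# Invented payload with the nesting the function expects
payload = {'response': {'groups': [{'items': [
    {'venue': {'id': 'abc', 'name': 'Brookbanks Park',
               'location': {'lat': 43.752, 'lng': -79.332},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'id': 'def', 'name': 'Unlabelled Spot',
               'location': {'lat': 43.750, 'lng': -79.331},
               'categories': []}},
]}]}}

for row in parse_explore_items('Parkwoods', 43.751881, -79.33036, payload):
    print(row)
```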




We then run the function on every neighborhood and store the results in a dataframe called `toronto_venues`.

In [36]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                 latitudes = df['Latitude'],
                                 longitudes = df['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

In [37]:
toronto_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | id |
|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.751881 | -79.33036 | Brookbanks Park | 43.751976 | -79.33214 | Park | 4e8d9dcdd5fbbbb6b3003c7b |
| 1 | Parkwoods | 43.751881 | -79.33036 | PetSmart | 43.748639 | -79.333488 | Pet Store | 4e13a489b0fb5dfdd0756c90 |
| 2 | Parkwoods | 43.751881 | -79.33036 | Careful & Reliable Painting | 43.752622 | -79.331957 | Construction & Landscaping | 58d5b8d0102f4722b70e487b |
| 3 | Parkwoods | 43.751881 | -79.33036 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop | 4cb11e2075ebb60cd1c4caad |
| 4 | Victoria Village | 43.730419 | -79.31282 | Memories of Africa | 43.726602 | -79.312427 | Grocery Store | 4bd7591b35aad13a78ae8ef3 |


Let's count the venues returned for each neighborhood.

In [39]:
toronto_count = toronto_venues.groupby('Neighborhood').size().to_frame('count')
toronto_count

| Neighborhood | count |
|---|---|
| Agincourt | 8 |
| Alderwood, Long Branch | 7 |
| Bayview Village | 4 |
| Bedford Park, Lawrence Manor East | 20 |
| Berczy Park | 65 |
| ... | ... |
| Willowdale, Newtonbrook | 19 |
| Woburn | 4 |
| Woodbine Heights | 16 |
| York Mills West | 4 |


In [40]:
print('There are {} unique categories'.format(len(toronto_venues['Venue Category'].unique())))
print(toronto_venues.shape)

There are 263 unique categories
(2405, 8)


Let's plot the venues on a map, using Folium's `MarkerCluster` plugin to group nearby markers so the map stays readable.

In [41]:
from folium import plugins

map_toronto = folium.Map(location =[location[0],location[1]], zoom_start = 11)

# instantiate a mark cluster object for the venues in the dataframe
venues = plugins.MarkerCluster().add_to(map_toronto)

# loop through the dataframe and add each data point to the mark cluster
for lat, lng, label in zip(toronto_venues['Venue Latitude'], toronto_venues['Venue Longitude'], toronto_venues['Venue']):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(venues)

# display map
map_toronto

To cluster the neighborhoods with k-Means, we first need numeric features. We one-hot encode the venue categories: each venue becomes a row with a 1 in the column of its category and 0 everywhere else.

In [42]:
# one hot encoding
toronto_dummy = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_dummy['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
first = toronto_dummy['Neighborhood']
toronto_dummy.drop(['Neighborhood'], axis=1, inplace=True)
toronto_dummy.insert(0, 'Neighborhood', first)
toronto_dummy.head()

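|   | Neighborhood | ATM | Accessories Store | Afghan Restaurant | Airport | American Restaurant | Antique Shop | Aquarium | Art Gallery | Art Museum | ... | Vegetarian / Vegan Restaurant | Veterinarian | Video Game Store | Video Store | Vietnamese Restaurant | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Parkwoods | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Victoria Village | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |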


In [43]:
toronto_dummy.shape

(2405, 263)
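
A toy example with invented venues makes the one-hot step concrete. It also reproduces a column collision in miniature: assigning a `Neighborhood` column to a frame that already has a `Neighborhood` category column replaces that column instead of adding a new one. Renaming the category column first is one way to keep it. The new name `'Neighborhood (category)'` is an arbitrary choice made for this sketch.

```python
import pandas as pd

# Invented venues; one of them has the category "Neighborhood"
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Park', 'Neighborhood', 'Café'],
})

# One 0/1 column per category (astype(int) because pandas 2.x returns booleans)
dummies = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="").astype(int)
print(list(dummies.columns))  # ['Café', 'Neighborhood', 'Park']

# Assigning the names to an existing column replaces the category column
clash = dummies.copy()
clash['Neighborhood'] = venues['Neighborhood']
print(clash.shape)  # (3, 3): three categories went in, only two survive

# One fix: rename the category column before adding the names
fixed = dummies.rename(columns={'Neighborhood': 'Neighborhood (category)'})
fixed.insert(0, 'Neighborhood', venues['Neighborhood'])
print(fixed.shape)  # (3, 4)
```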

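Since `get_dummies` creates one column per category, adding the `Neighborhood` column should have produced 263 + 1 = 264 columns. Getting 263 means that one of the venue categories is itself called "Neighborhood". Assigning the neighborhood names overwrote its dummy column, so that category drops out of the analysis.

Next, we group the rows by neighborhood and take the mean of each category column. Each value is then the fraction of that neighborhood's venues that belong to the category.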

In [46]:
toronto_dummy = toronto_dummy.groupby('Neighborhood').mean().reset_index()
toronto_dummy

|   | Neighborhood | ATM | Accessories Store | Afghan Restaurant | Airport | American Restaurant | Antique Shop | Aquarium | Art Gallery | Art Museum | ... | Vegetarian / Vegan Restaurant | Veterinarian | Video Game Store | Video Store | Vietnamese Restaurant | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 1 | Alderwood, Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 2 | Bayview Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 3 | Bedford Park, Lawrence Manor East | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 4 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.015385 | 0.0 | ... | 0.015385 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.015385 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 91 | Willowdale, Newtonbrook | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 92 | Woburn | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 93 | Woodbine Heights | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
| 94 | York Mills West | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 |
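
Averaging the 0/1 columns per neighborhood turns counts into proportions. In the worked toy case below (invented data), neighborhood `X` has three venues, two of them coffee shops, so it gets 2/3 in the `Coffee Shop` column. Each row sums to 1.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['X', 'X', 'X', 'Y'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Park'],
})

# One-hot encode, then put the neighborhood names in front
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="").astype(int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```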


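Next, we build a table of the 10 most common venue categories in each neighborhood. Some neighborhoods have fewer than 10 categories. For those, the remaining slots are filled with categories whose share is 0, so the order of those slots carries no meaning.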

In [47]:
def return_most_common_venues(row, num):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num]

In [49]:
num = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num):
    try:
        columns.append('{}{}'.format(ind+1, indicators[ind]))  # 1st, 2nd, 3rd
    except IndexError:
        columns.append('{}th'.format(ind+1))  # 4th onwards

# create a new dataframe
neighborhoods_sorted = pd.DataFrame(columns=columns)
neighborhoods_sorted['Neighborhood'] = toronto_dummy['Neighborhood']

for ind in np.arange(toronto_dummy.shape[0]):
    neighborhoods_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_dummy.iloc[ind, :], num)

neighborhoods_sorted.head()

|   | Neighborhood | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Badminton Court | Park | Discount Store | Sushi Restaurant | Skating Rink | Pool | Shopping Mall | Supermarket | Yoga Studio | Ethiopian Restaurant |
| 1 | Alderwood, Long Branch | Pizza Place | Pub | Sandwich Place | Dance Studio | Athletics & Sports | Gym | Convenience Store | Farmers Market | Farm | Falafel Restaurant |
| 2 | Bayview Village | Construction & Landscaping | Trail | Park | Dog Run | Falafel Restaurant | Eastern European Restaurant | Electronics Store | Ethiopian Restaurant | Event Space | Farmers Market |
| 3 | Bedford Park, Lawrence Manor East | Coffee Shop | Italian Restaurant | Sandwich Place | Liquor Store | Sushi Restaurant | Restaurant | Pub | Thai Restaurant | Sports Club | Café |
| 4 | Berczy Park | Coffee Shop | Cocktail Bar | Restaurant | Breakfast Spot | Hotel | Seafood Restaurant | Bakery | Beer Bar | Café | Cheese Shop |
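
Zero-share categories can fill out the top 10. Bayview Village, with 4 venues, still gets 10 entries above. A variant that lists only categories with a positive share and leaves the other slots blank is sketched below. `top_venues_nonzero` is a hypothetical name, and the row is invented. Like `return_most_common_venues`, it assumes the first entry of the row is the neighborhood name.

```python
import pandas as pd

def top_venues_nonzero(row, num):
    """Return up to num category names with a positive share, padded with ''."""
    shares = row.iloc[1:].astype(float)  # skip the Neighborhood entry
    shares = shares[shares > 0].sort_values(ascending=False)
    names = list(shares.index[:num])
    return names + [''] * (num - len(names))

row = pd.Series({'Neighborhood': 'Toy', 'Park': 0.5, 'Trail': 0.25,
                 'Dog Run': 0.25, 'Bakery': 0.0})
print(top_venues_nonzero(row, 5))  # tied categories may appear in either order
```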


We can now perform k-Means clustering. First, we need to choose the number of clusters k. For this, we use the **silhouette score** from `sklearn.metrics`. The score ranges from -1 to 1. Higher values mean that neighborhoods are, on average, closer to the other members of their own cluster than to the members of the nearest other cluster, so we look for the k with the highest score.
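
For a single point i, the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is its mean distance to the other members of its own cluster, and b(i) is the smallest mean distance to the members of any other cluster. `silhouette_score` averages s(i) over all points. The check below computes this by hand on six invented 2-D points and compares the result with scikit-learn. It assumes no cluster has a single member, because scikit-learn defines s(i) = 0 for such points.

```python
import numpy as np
from sklearn.metrics import silhouette_score

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

def manual_silhouette(X, labels):
    # pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                        # exclude the point itself
        a = d[i, same].mean()                  # cohesion: mean distance within own cluster
        b = min(d[i, labels == c].mean()       # separation: nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

print(manual_silhouette(X, labels), silhouette_score(X, labels, metric='euclidean'))
```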

In [59]:
from sklearn.metrics import silhouette_score

sil=[]
kMax = 50

toronto_clusters = toronto_dummy.drop('Neighborhood', axis=1)
for k in range(2, kMax+1):
    kmeans = KMeans(n_clusters=k, random_state=0).fit(toronto_clusters)
    labels = kmeans.labels_
    sil.append(silhouette_score(toronto_clusters, labels, metric='euclidean'))

In [60]:
sil

[0.3313215406582571,
 0.19396886455050932,
 0.2013765317883567,
 0.20926371109802855,
 0.21763396690795603,
 0.2313269217527499,
 0.24727957672117437,
 0.19932165984646302,
 0.2282325349473163,
 0.21036200484047193,
 0.21536431351078647,
 0.21744457365717548,
 0.18178598975534446,
 0.16739349489397237,
 0.13016531293474032,
 0.1382865095534425,
 0.13875333306151336,
 0.14185373428508777,
 0.09164398052834603,
 0.16306583303295352,
 0.19678282547389658,
 0.13433744976053033,
 0.19973928736681681,
 0.2052617859136973,
 0.20414921498755614,
 0.2063848178425959,
 0.1443013361425697,
 0.1592634460261416,
 0.16138563986099339,
 0.09552535160810266,
 -0.06969208980702447,
 0.06558548846275432,
 0.062310064596488995,
 0.06104832638643271,
 -0.06994038787300137,
 0.08955666956385794,
 0.03135015337512711,
 0.0810816757933901,
 0.05402852694872412,
 0.055729585802561175,
 0.037062291500263356,
 0.07485474948968589,
 0.07485474948968589,
 0.013453912641653527,
 0.014194913519285632,
 0.0171046064

Now let's find the largest score and its position in `sil`. Because the loop started at k = 2, the corresponding k is the index plus 2.

In [57]:
import operator
index, value = max(enumerate(sil), key=operator.itemgetter(1))
print(index+2, value)

2 0.3313215406582571


The highest score belongs to k = 2, which is too few clusters to be useful here. So let's set that score to 0 and take the next-largest one.

In [67]:
# zero out the k = 2 score; index still holds its position from the max() call above
sil[index] = 0
sil

[0,
 0.19396886455050932,
 0.2013765317883567,
 0.20926371109802855,
 0.21763396690795603,
 0.2313269217527499,
 0.24727957672117437,
 0.19932165984646302,
 0.2282325349473163,
 0.21036200484047193,
 0.21536431351078647,
 0.21744457365717548,
 0.18178598975534446,
 0.16739349489397237,
 0.13016531293474032,
 0.1382865095534425,
 0.13875333306151336,
 0.14185373428508777,
 0.09164398052834603,
 0.16306583303295352,
 0.19678282547389658,
 0.13433744976053033,
 0.19973928736681681,
 0.2052617859136973,
 0.20414921498755614,
 0.2063848178425959,
 0.1443013361425697,
 0.1592634460261416,
 0.16138563986099339,
 0.09552535160810266,
 -0.06969208980702447,
 0.06558548846275432,
 0.062310064596488995,
 0.06104832638643271,
 -0.06994038787300137,
 0.08955666956385794,
 0.03135015337512711,
 0.0810816757933901,
 0.05402852694872412,
 0.055729585802561175,
 0.037062291500263356,
 0.07485474948968589,
 0.07485474948968589,
 0.013453912641653527,
 0.014194913519285632,
 0.017104606448748855,
 0.0213

Now let's find the new maximum.

In [68]:
index, value = max(enumerate(sil), key=operator.itemgetter(1))
print(index+2, value)

8 0.24727957672117437
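
Overwriting an entry of `sil` works, but it changes the list computed earlier, so re-running the cells gives different answers. Below is a non-mutating sketch with NumPy. `best_k`, `k_start` and `min_k` are names introduced here. The scores in the usage line are the first seven values printed above.

```python
import numpy as np

def best_k(scores, k_start=2, min_k=3):
    """Return (k, score) with the highest silhouette among k >= min_k.

    scores[j] is assumed to belong to k = k_start + j, as in the loop above.
    """
    scores = np.asarray(scores, dtype=float)
    ks = np.arange(k_start, k_start + len(scores))
    mask = ks >= min_k  # ignore k values judged too small
    j = np.argmax(scores[mask])
    return int(ks[mask][j]), float(scores[mask][j])

first_scores = [0.3313215406582571, 0.19396886455050932, 0.2013765317883567,
                0.20926371109802855, 0.21763396690795603, 0.2313269217527499,
                0.24727957672117437]
print(best_k(first_scores))  # (8, 0.24727957672117437)
```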


So let's use k=8 for our clustering.

In [99]:
kclusters = 8

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_clusters)

Let's add the cluster labels to our sorted venue dataframe `neighborhoods_sorted` and merge it with our original dataframe `df`

In [100]:
# remove labels left over from an earlier run of this cell, if there are any
neighborhoods_sorted.drop(columns='Clusters', errors='ignore', inplace=True)
neighborhoods_sorted.insert(0, 'Clusters', kmeans.labels_)
toronto_merged = df
toronto_merged = toronto_merged.join(neighborhoods_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged.head()

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude | Clusters | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.751881 | -79.33036 | 6.0 | Construction & Landscaping | Pet Store | Park | Food & Drink Shop | Cosmetics Shop | Donut Shop | Flea Market | Fish Market | Fish & Chips Shop | Field |
| 1 | M4A | North York | Victoria Village | 43.730419 | -79.31282 | 0.0 | Nail Salon | Grocery Store | Intersection | Yoga Studio | Dog Run | Eastern European Restaurant | Electronics Store | Ethiopian Restaurant | Event Space | Falafel Restaurant |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65514 | -79.362648 | 0.0 | Coffee Shop | Breakfast Spot | Yoga Studio | Theater | Pub | Distribution Center | Restaurant | Electronics Store | Event Space | Food Truck |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.723209 | -79.451408 | 0.0 | Clothing Store | Pharmacy | Toy / Game Store | Restaurant | Men's Store | Bookstore | Food Court | Furniture / Home Store | Cosmetics Shop | American Restaurant |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.66449 | -79.393021 | 0.0 | Coffee Shop | Café | Park | Pub | College Theater | Museum | Sandwich Place | Salon / Barbershop | Salad Place | Restaurant |
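
`join` performs a left join on `Neighborhood` by default. A neighborhood without a label therefore gets `NaN` in `Clusters`, which forces the whole column to a float dtype. That is why the labels print as `6.0` and `0.0`. Here is a toy reproduction with invented postal codes and names:

```python
import pandas as pd

df_toy = pd.DataFrame({'Postal Code': ['M1', 'M2', 'M3'],
                       'Neighborhood': ['A', 'B', 'C']})
labels_toy = pd.DataFrame({'Neighborhood': ['A', 'C'], 'Clusters': [6, 0]})

joined = df_toy.join(labels_toy.set_index('Neighborhood'), on='Neighborhood')
print(joined['Clusters'].dtype)  # float64: 'B' has no label, so it gets NaN

# Drop unlabelled rows, then restore integer labels
merged = joined.dropna(subset=['Clusters']).astype({'Clusters': int})
print(merged)
```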


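Neighborhoods for which Foursquare returned no venues have no cluster label (their `Clusters` value is `NaN`), so we drop them before mapping.

In [101]:
toronto_merged.dropna(axis=0, subset=['Clusters'], inplace=True)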

Finally, let's visualize the neighborhood clusters on a map.

In [102]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [103]:
# create map
map_clusters = folium.Map(location=[location[0],location[1]], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label (labels run from 0 to kclusters-1)
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Clusters']):
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
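
The color step only needs `kclusters` evenly spaced colors from matplotlib's `rainbow` colormap, looked up by cluster label. Below is a small sketch of that mapping as a standalone function, assuming matplotlib is installed. `cluster_palette` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_palette(kclusters):
    """Return one hex color per cluster label 0..kclusters-1."""
    rgba = cm.rainbow(np.linspace(0, 1, kclusters))  # kclusters RGBA rows
    return [colors.rgb2hex(c) for c in rgba]         # alpha is dropped by default

palette = cluster_palette(8)
print(palette[0], palette[-1])  # label 0 gets the first color, label 7 the last
```

Indexing the palette directly with the label keeps the mapping one-to-one, so cluster 0 and cluster 7 land at opposite ends of the colormap.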