# Neighborhood Clustering and Segmentation in Toronto, Canada

This project is being developed as a weekly submission assignment for the Coursera Data Science Capstone course offered by IBM.

We will collect data on the neighborhoods of Toronto, Canada, and then segment and cluster similar neighborhoods using geospatial venue data provided by the <b>Foursquare API</b>.

The data used in this notebook comes from <a href='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'>this Wikipedia page</a>. We will scrape it into a pandas DataFrame and process it so that it can be used in our k-means clustering model. The coordinates of the neighborhoods can be downloaded from <a href='http://cocl.us/Geospatial_data'>this link</a> provided by Coursera, or obtained with the geocoder Python library.

### Importing Libraries

In [14]:
import pandas as pd
import numpy as np
import bs4 as bs
import requests

### Fetching the dataset from Wikipedia

In [15]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

dfs = pd.read_html(url, header=0)
toronto_df = dfs[0]

toronto_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Dropping the rows where <b>Borough</b> is Not assigned

In [16]:
toronto_df = toronto_df[toronto_df['Borough'] != 'Not assigned']

#### Functions for processing of data

In [18]:
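def concat_neighborhoods(series):
    # Join every value in the group with ', '. This leaves a trailing ', '
    # that is stripped after aggregation.
    string = ''
    for item in series:
        string = string + item + ', '
    return string

def ret_unique(series):
    # An aggregated borough looks like 'Scarborough, Scarborough, ';
    # keep only the first entry.
    items = []
    for item in series:
        items.append(item.split(',')[0])
    return items

def set_neighborhood(borough, neighborhood):
    # Use the borough name wherever the neighborhood is 'Not assigned'.
    items = []
    for boro, item in zip(borough, neighborhood):
        if item == 'Not assigned':
            items.append(boro)
        else:
            # Always append, so the result keeps one entry per row.
            items.append(item)

    return items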

In [32]:
toronto_aggregated = toronto_df.groupby(toronto_df['Postcode']).aggregate(concat_neighborhoods)
toronto_aggregated = toronto_aggregated.reset_index()
toronto_aggregated['Neighbourhood'] = toronto_aggregated['Neighbourhood'].astype(str).str[:-2]
toronto_aggregated['Borough'] = ret_unique(toronto_aggregated['Borough'])
toronto_aggregated.columns = ['PostCode', 'Borough', 'Neighborhood']
toronto_aggregated['Neighborhood'] = set_neighborhood(toronto_aggregated['Borough'], toronto_aggregated['Neighborhood'])
toronto_aggregated.head()

Unnamed: 0,PostCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
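
The aggregation above can also be expressed with pandas built-ins. The following is a minimal sketch on a toy frame (the rows are illustrative, not the scraped data); it is an alternative formulation, not the code that produced the output above.

```python
import pandas as pd

# Toy rows shaped like the scraped table (illustrative values only).
raw = pd.DataFrame({
    'Postcode': ['M1B', 'M1B', 'M1G', 'M7A'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough', "Queen's Park"],
    'Neighbourhood': ['Rouge', 'Malvern', 'Woburn', 'Not assigned'],
})

# One row per postcode: keep the first borough, join the neighbourhood names.
agg = (raw.groupby('Postcode', as_index=False)
          .agg({'Borough': 'first', 'Neighbourhood': ', '.join}))
agg.columns = ['PostCode', 'Borough', 'Neighborhood']

# Fall back to the borough name where no neighbourhood was assigned.
agg['Neighborhood'] = agg['Neighborhood'].where(
    agg['Neighborhood'] != 'Not assigned', agg['Borough'])

print(agg)
```

Joining with `', '.join` avoids the trailing separator, so no string slicing is needed afterwards.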


In [33]:
print("The shape of Cleansed Toronto Dataframe is : ", toronto_aggregated.shape)

The shape of Cleansed Toronto Dataframe is :  (103, 3)


### Adding coordinates of each individual Neighborhood using geocoder package

In [21]:
import geocoder

In [22]:
"""
latitude = []
longitude = []
for code in toronto_aggregated['PostCode']:
    lat_long_coords = None
    
    while lat_long_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(code))
        lat_long_coords = g.latlng
        
    latitude.append(lat_long_coords[0])
    longitude.append(lat_long_coords[1])
    
toronto_aggregated['Latitude'] = latitude
toronto_aggregated['Longitude'] = longitude
toronto_aggregated
"""

### Adding coordinates of Neighborhoods from .csv file

In [23]:
geodata = pd.read_csv('Geospatial_Coordinates.csv')
geodata.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [24]:
toronto_merged = toronto_aggregated.join(geodata.set_index('Postal Code'), on='PostCode')
toronto_merged.head()

Unnamed: 0,PostCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
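
A join like the one above silently leaves `NaN` coordinates for any postal code missing from the CSV file. The sketch below, on toy data with a deliberately hypothetical code `M0X`, shows one way to check for that using `DataFrame.merge`.

```python
import pandas as pd

# Toy frames; 'M0X' is a made-up postal code with no coordinates.
neighborhoods = pd.DataFrame({
    'PostCode': ['M1B', 'M1C', 'M0X'],
    'Neighborhood': ['Rouge, Malvern', 'Highland Creek', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Left join keeps every neighborhood; validate guards against duplicate keys.
merged = neighborhoods.merge(coords, how='left',
                             left_on='PostCode', right_on='Postal Code',
                             validate='one_to_one')

# Postal codes that did not receive coordinates.
missing = merged.loc[merged['Latitude'].isna(), 'PostCode'].tolist()
print(missing)
```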


Get Latitude and Longitude of Toronto

In [25]:
from geopy.geocoders import Nominatim

# Recent geopy releases require an explicit user_agent; the name below is an
# arbitrary placeholder. The variable is called geolocator so that it does not
# shadow the geocoder module imported earlier.
geolocator = Nominatim(user_agent='toronto_explorer')
g = geolocator.geocode('Toronto, Ontario')

t_latitude = g.latitude
t_longitude = g.longitude

print("Latitude and Longitude of Toronto, Ontario are: {}, {}".format(t_latitude, t_longitude))

Latitude and Longitude of Toronto, Ontario are: 43.653963, -79.387207


### Visualize the Neighborhoods

The following code visualizes the neighborhoods in Toronto. Click on a marker to see the neighborhood name in a popup.

In [26]:
import folium

In [30]:
toronto_map = folium.Map(location=[t_latitude, t_longitude], zoom_start=10)

for lat, lng, neighborhood in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.5).add_to(toronto_map)

toronto_map

#### Define Foursquare API Credentials

In [34]:
CLIENT_ID = 'RWINEW5YS0D3ONXTR4G1RPH2PAEQRTNFWMSFA001KFW1LGSB'
CLIENT_SECRET = 'QQYGBVSIM1YSVXAO3ENP4NN2VJKIDICEKJSLWICYW0RDGPQ3'
VERSION = '20180605'

print('Credentials: \nClient ID: {}\nClient Secret: {}\nVersion: {}'.format(CLIENT_ID, CLIENT_SECRET, VERSION))

Credentials: 
Client ID: RWINEW5YS0D3ONXTR4G1RPH2PAEQRTNFWMSFA001KFW1LGSB
Client Secret: QQYGBVSIM1YSVXAO3ENP4NN2VJKIDICEKJSLWICYW0RDGPQ3
Version: 20180605


### Let's start exploring the first neighborhood in toronto_merged DataFrame

In [35]:
neighborhood_name = toronto_merged.loc[0, 'Neighborhood']
neigh_lat = toronto_merged.loc[0, 'Latitude']
neigh_lon = toronto_merged.loc[0, 'Longitude']

print('Latitude and Longitude of {} are {}, {}'.format(neighborhood_name, neigh_lat, neigh_lon))

Latitude and Longitude of Rouge, Malvern are 43.806686299999996, -79.19435340000001


In [36]:
radius = 500
LIMIT = 100

uri = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        neigh_lat, neigh_lon,
        radius,
        LIMIT)

print(uri)

https://api.foursquare.com/v2/venues/explore?&client_id=RWINEW5YS0D3ONXTR4G1RPH2PAEQRTNFWMSFA001KFW1LGSB&client_secret=QQYGBVSIM1YSVXAO3ENP4NN2VJKIDICEKJSLWICYW0RDGPQ3&v=20180605&ll=43.806686299999996,-79.19435340000001&radius=500&limit=100
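
The request URL above embeds the credentials directly in the notebook. As a hedged alternative, the sketch below builds the same kind of query string with `urllib.parse.urlencode` and reads the credentials from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are assumptions made for this example, not part of the Foursquare API.

```python
import os
from urllib.parse import urlencode

def build_explore_url(lat, lon, radius=500, limit=100):
    """Build a venues/explore URL; credentials come from the environment."""
    params = {
        # Hypothetical environment variable names, with placeholder fallbacks.
        'client_id': os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID'),
        'client_secret': os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET'),
        'v': '20180605',
        'll': '{},{}'.format(lat, lon),
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes the values, e.g. the comma in 'll'.
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

url = build_explore_url(43.806686, -79.194353)
```

Keeping the secret out of the notebook also keeps it out of printed output such as the URL shown earlier.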


In [37]:
results = requests.get(uri).json()
results

{'meta': {'code': 200, 'requestId': '5d3034e5787dba0038c6b803'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4bb6b9446edc76b0d771311c-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/fastfood_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d16e941735',
         'name': 'Fast Food Restaurant',
         'pluralName': 'Fast Food Restaurants',
         'primary': True,
         'shortName': 'Fast Food'}],
       'id': '4bb6b9446edc76b0d771311c',
       'location': {'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'crossStreet': 'Morningside & Sheppard',
        'distance': 387,
        'formattedAddress': ['Toronto ON', 'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.80744841934756,
          'ln

Function to get category type

In [38]:
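def get_category_type(row):
    # Rows of the flattened explore response use the key 'venue.categories';
    # fall back to it when a plain 'categories' key is absent.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the name of the first category, if there is one.
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']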

In [47]:
venues = results['response']['groups'][0]['items']

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas 1.0 and later)
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wendy's,Fast Food Restaurant,43.807448,-79.199056
1,Interprovincial Group,Print Shop,43.80563,-79.200378


We have successfully gathered the nearby venues of one neighborhood. Now let's apply the same logic to fetch the data for all neighborhoods in Toronto.

In [53]:
def get_nearby_venues(names, latitude, longitude, radius=500, LIMIT=100):

    venues_list = []
    for name, lat, long in zip(names, latitude, longitude):
        print('Processing Neighborhood: ', name)
        url = "https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat, long,
            radius,
            LIMIT)

        results = requests.get(url).json()['response']['groups'][0]['items']

        # One tuple per venue returned for this neighborhood
        venues_list.append([(
            name,
            lat,
            long,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

        print('Done: ', name)

    # Build the DataFrame once, after all neighborhoods have been processed
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

    return nearby_venues
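
The function above indexes `categories[0]` directly, which would raise an `IndexError` for a venue with an empty category list. The following sketch shows a more defensive way to turn response items into rows. The helper `venue_rows` and the sample items are hypothetical; the items only mimic the shape of the explore response shown earlier.

```python
def venue_rows(neighborhood, items):
    """Convert explore-style items into flat tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        rows.append((
            neighborhood,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            # None when the venue has no category at all.
            categories[0]['name'] if categories else None,
        ))
    return rows

# Hand-written items that imitate the response structure (not real API output).
sample_items = [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]

rows = venue_rows('Rouge, Malvern', sample_items)
for row in rows:
    print(row)
```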

In [181]:
toronto_venues = get_nearby_venues(names = toronto_merged['Neighborhood'], 
                                   latitude = toronto_merged['Latitude'], 
                                   longitude = toronto_merged['Longitude'],
                                  radius=700)

Processing Neighborhood:  Rouge, Malvern
Done:  Rouge, Malvern
Processing Neighborhood:  Highland Creek, Rouge Hill, Port Union
Done:  Highland Creek, Rouge Hill, Port Union
Processing Neighborhood:  Guildwood, Morningside, West Hill
Done:  Guildwood, Morningside, West Hill
Processing Neighborhood:  Woburn
Done:  Woburn
Processing Neighborhood:  Cedarbrae
Done:  Cedarbrae
Processing Neighborhood:  Scarborough Village
Done:  Scarborough Village
Processing Neighborhood:  East Birchmount Park, Ionview, Kennedy Park
Done:  East Birchmount Park, Ionview, Kennedy Park
Processing Neighborhood:  Clairlea, Golden Mile, Oakridge
Done:  Clairlea, Golden Mile, Oakridge
Processing Neighborhood:  Cliffcrest, Cliffside, Scarborough Village West
Done:  Cliffcrest, Cliffside, Scarborough Village West
Processing Neighborhood:  Birch Cliff, Cliffside West
Done:  Birch Cliff, Cliffside West
Processing Neighborhood:  Dorset Park, Scarborough Town Centre, Wexford Heights
Done:  Dorset Park, Scarborough Town

Done:  Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor
Processing Neighborhood:  Islington Avenue
Done:  Islington Avenue
Processing Neighborhood:  Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Done:  Cloverdale, Islington, Martin Grove, Princess Gardens, West Deane Park
Processing Neighborhood:  Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Done:  Bloordale Gardens, Eringate, Markland Wood, Old Burnhamthorpe
Processing Neighborhood:  Humber Summit
Done:  Humber Summit
Processing Neighborhood:  Emery, Humberlea
Done:  Emery, Humberlea
Processing Neighborhood:  Weston
Done:  Weston
Processing Neighborhood:  Westmount
Done:  Westmount
Processing Neighborhood:  Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips
Done:  Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips
Processing Neighborhood:  Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive

Let's take a look at the DataFrame

In [182]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Images Salon & Spa,43.802283,-79.198565,Spa
1,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
2,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.802008,-79.19808,Fast Food Restaurant
3,"Rouge, Malvern",43.806686,-79.194353,Tim Hortons,43.802,-79.198169,Coffee Shop
4,"Rouge, Malvern",43.806686,-79.194353,Lee Valley,43.803161,-79.199681,Hobby Shop


Let's see the number of venues per Neighborhood

In [183]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Agincourt,8,8,8,8,8,8
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",15,15,15,15,15,15
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",10,10,10,10,10,10
"Alderwood, Long Branch",13,13,13,13,13,13
"Bathurst Manor, Downsview North, Wilson Heights",21,21,21,21,21,21
Bayview Village,8,8,8,8,8,8
"Bedford Park, Lawrence Manor East",30,30,30,30,30,30
Berczy Park,100,100,100,100,100,100
"Birch Cliff, Cliffside West",9,9,9,9,9,9


### Analyse Each Neighborhood

We will one-hot encode the venue categories and then take the mean of each indicator column per neighborhood. The result is the relative frequency of each venue category within a neighborhood, which lets us rank the most common venue types.

In [184]:
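toronto_one_hot = pd.get_dummies(toronto_venues['Venue Category'])
# The venue categories include one that is itself named 'Neighborhood';
# drop it before adding the neighborhood name column.
toronto_one_hot.drop('Neighborhood', axis=1, inplace=True)
toronto_one_hot['Neighborhood'] = toronto_venues['Neighborhood']
# Move the 'Neighborhood' column to the front
category_columns = [toronto_one_hot.columns[-1]] + list(toronto_one_hot.columns[:-1])
toronto_one_hot = toronto_one_hot[category_columns]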

toronto_venues_grouped = toronto_one_hot.groupby('Neighborhood').mean().reset_index()
toronto_venues_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
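
To see why the mean of one-hot columns gives category frequencies, here is a small sketch on made-up data (neighborhoods `A` and `B` are placeholders).

```python
import pandas as pd

# Toy venue list: four venues in A, two in B.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bank', 'Park', 'Park'],
})

# One indicator column per category, as floats so the mean is numeric.
one_hot = pd.get_dummies(venues['Venue Category'], dtype=float)
one_hot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of 0/1 indicators = share of venues in that category.
freq = one_hot.groupby('Neighborhood').mean()
print(freq)
```

In this toy frame half of the venues in `A` are coffee shops, so its `Coffee Shop` entry is 0.5, and every row sums to 1.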


### Top 10 venues in each neighborhood

In [185]:
def get_most_common(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [186]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_venues_grouped['Neighborhood']

for ind in np.arange(toronto_venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = get_most_common(toronto_venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted
        

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Theater,Restaurant,Cosmetics Shop,Sushi Restaurant,Asian Restaurant,Hotel
1,Agincourt,Badminton Court,Clothing Store,Lounge,Pool Hall,Shanghai Restaurant,Breakfast Spot,Motorcycle Shop,Sandwich Place,Yoga Studio,Dog Run
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Pizza Place,BBQ Joint,Fast Food Restaurant,Chinese Restaurant,Pharmacy,Gym,Malay Restaurant,Park,Shop & Service,Noodle House
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Hardware Store,Pizza Place,Fast Food Restaurant,Beer Store,Fried Chicken Joint,Sandwich Place,Coffee Shop,Pharmacy,Comfort Food Restaurant
4,"Alderwood, Long Branch",Pizza Place,Convenience Store,Pharmacy,Pool,Athletics & Sports,Gas Station,Skating Rink,Sandwich Place,Pub,Gym
5,"Bathurst Manor, Downsview North, Wilson Heights",Coffee Shop,Park,Community Center,Sandwich Place,Sushi Restaurant,Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Supermarket,Diner
6,Bayview Village,Bank,Skate Park,Café,Grocery Store,Skating Rink,Japanese Restaurant,Chinese Restaurant,Donut Shop,Diner,Discount Store
7,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Pizza Place,Liquor Store,Thai Restaurant,Bagel Shop,Bakery,Sushi Restaurant,Juice Bar,Fast Food Restaurant
8,Berczy Park,Coffee Shop,Café,Restaurant,Hotel,Beer Bar,Park,Bakery,Cocktail Bar,Seafood Restaurant,Japanese Restaurant
9,"Birch Cliff, Cliffside West",College Stadium,Café,Diner,Discount Store,Park,Bank,General Entertainment,Skating Rink,Thai Restaurant,Dog Run
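
The column labels above rely on a three-element suffix list plus a fallback, which works for the first ten ranks. If more ranks were ever needed, a small helper could generate the ordinals. The function `ordinal` below is a hypothetical helper written for this sketch, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    # 10th to 20th (and 110th to 120th, ...) always take 'th'.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```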


### Clustering of Neighborhoods

We will group the neighborhoods of Toronto into 7 clusters using k-means clustering.

In [187]:
from sklearn.cluster import KMeans

In [188]:
kclusters = 7

toronto_clustering = toronto_venues_grouped.drop('Neighborhood', axis=1)

kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(toronto_clustering)

kmeans.labels_[0:10]

array([0, 3, 6, 6, 0, 0, 0, 0, 0, 0])
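
The number of clusters is fixed at 7 in this notebook. One common heuristic for judging a choice of k is to compare the k-means inertia (within-cluster sum of squares) across several values and look for an "elbow". The sketch below does this on synthetic 2-D points, not on the Toronto data, and is only an illustration of the heuristic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 30 points each.
rng = np.random.RandomState(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([c + 0.3 * rng.randn(30, 2) for c in centers])

# Fit k-means for several k and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this the inertia drops sharply up to the true number of blobs and flattens afterwards; real venue-frequency data may show a much less distinct elbow.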

Adding Cluster labels to neighborhoods_venues_sorted

In [189]:
neighborhoods_venues_sorted['Cluster Labels'] = kmeans.labels_

toronto_final = toronto_merged

#merge final data with most common venues
toronto_final = toronto_final.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# Neighborhoods with no venue data have no cluster; give them the extra label 7
toronto_final['Cluster Labels'] = toronto_final['Cluster Labels'].fillna(7).astype(int)
toronto_final.head()

Unnamed: 0,PostCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,Fast Food Restaurant,Coffee Shop,Hobby Shop,Spa,Construction & Landscaping,Business Service,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,6
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Breakfast Spot,Bar,Burger Joint,Yoga Studio,Discount Store,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore,2
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Fast Food Restaurant,Electronics Store,Beer Store,Moving Target,Intersection,Bus Line,Fried Chicken Joint,Thrift / Vintage Store,Rental Car Location,Sports Bar,0
3,M1G,Scarborough,Woburn,43.770992,-79.216917,Coffee Shop,Park,Convenience Store,Business Service,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,1
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,Coffee Shop,Indian Restaurant,Bakery,Thai Restaurant,Flower Shop,Fried Chicken Joint,Caribbean Restaurant,Athletics & Sports,Asian Restaurant,Rental Car Location,3


### Visualisation of Clusters

In [190]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [191]:
clustered_map = folium.Map(location=[t_latitude, t_longitude], zoom_start=10)

# One color per k-means cluster, plus one for the extra label 7
# given to neighborhoods without venue data.
colors_array = cm.rainbow(np.linspace(0, 1, kclusters + 1))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lng, neighborhood, cluster in zip(toronto_final['Latitude'], toronto_final['Longitude'], toronto_final['Neighborhood'], toronto_final['Cluster Labels']):
    label = folium.Popup(str(neighborhood) + " Cluster: " + str(cluster), parse_html=True)
    # Index by the label itself so that labels 0 and 7 get different colors
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.5).add_to(clustered_map)
    
clustered_map

We have now successfully grouped the neighborhoods of Toronto, Ontario into 7 clusters of similar neighborhoods.