Segmenting and Clustering Neighbourhoods in Toronto

import necessary libraries

In [99]:
!conda install -c conda-forge beautifulsoup4 --yes

!conda install -c conda-forge geopy --yes

!conda install -c conda-forge folium=0.5.0 --yes
print("installed")

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    beautifulsoup4-4.8.2       |           py36_0         157 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following packages will be UPDATED:

    beautifulsoup4:  4.7.1-py36_1      --> 4.8.2-py36_0          conda-forge
    ca-certificates: 2019.11.27-0      --> 2019.11.28-hecc5488_0 conda-forge
    certifi:         2019.11.28-py36_0 --> 2019.11.28-py36_0     conda-f

In [130]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
from pandas.io.json import json_normalize
import json

import folium

from geopy.geocoders import Nominatim


import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

Prepare web scraping code by utilizing BeautifulSoup

In [114]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source = requests.get(url).content
content = BeautifulSoup(requests.get(url).content, 'lxml')

Create required dataframe

In [115]:
table = content.find('table')
td = table.find_all('td')
postcode = []
borough = []
neighbourhood = []

Create a list with scraped data

In [116]:
for i in range(0, len(td), 3):
    postcode.append(td[i].text.strip())
    borough.append(td[i+1].text.strip())
    neighbourhood.append(td[i+2].text.strip())

Create an actual DataFrame with the lists previously scraped and give the columns names

In [117]:
df_codes = pd.DataFrame(data=[postcode, borough, neighbourhood]).transpose()
df_codes.columns = ['Postal Code', 'Borough', 'Neighborhood']

Cleaning - Ignore cells with a borough that is Not assigned; If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough

In [118]:
df_codes['Borough'].replace('Not assigned', np.nan, inplace=True)
df_codes.dropna(subset=['Borough'], inplace=True)
df_codes['Neighborhood'].replace('Not assigned', "Queen's Park", inplace=True)

Cleaning - More than one neighborhood can exist in one postal code area. Combine rows into one row with the neighborhoods separated with a comma

In [119]:
df_codes = df_codes.groupby(['Postal Code', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()
df_codes.columns = ['Postal Code', 'Borough', 'Neighborhood']
df_codes.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


Use the .shape method to print the number of rows of your dataframe.

In [120]:
df_codes.shape

(103, 3)

Part 2 -
Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this Package is you have to be persistent sometimes in order to get the geographical coordinates of a given postal code. So you can make a call to get the latitude and longitude coordinates of a given postal code and the result would be None, and then make the call again and you would get the coordinates. So, in order to make sure that you get the coordinates for all of our neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this:

Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this Package is you have to be persistent sometimes in order to get the geographical coordinates of a given postal code. So you can make a call to get the latitude and longitude coordinates of a given postal code and the result would be None, and then make the call again and you would get the coordinates. So, in order to make sure that you get the coordinates for all of our neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this:

Given that this package can be very unreliable, in case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

Use the Geocoder package or the csv file to create the following dataframe

In [121]:
!wget -O Geospatial_Coordinates.csv http://cocl.us/Geospatial_data
print('Coordinates downloaded!')
CO = pd.read_csv('Geospatial_Coordinates.csv')

--2020-02-12 15:59:38--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 169.48.113.194, 158.85.108.86, 158.85.108.83
Connecting to cocl.us (cocl.us)|169.48.113.194|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2020-02-12 15:59:38--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|169.48.113.194|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-02-12 15:59:40--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.26.197, 107.152.27.197
Connecting to ibm.box.com (ibm.box.com)|107.152.26.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-02-12 15:59:41--  https://

In [122]:
print(CO.shape)
CO.head()

(103, 3)


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the 2 dataframes

In [123]:
toronto_df_temp = df_codes.set_index('Postal Code')
coors_temp = CO.set_index('Postal Code')
toronto_df_coors = pd.concat([toronto_df_temp, coors_temp], axis=1, join='inner')

toronto_df_coors.index.name = 'Postal Code'
toronto_df_coors.reset_index(inplace=True)

print(toronto_df_coors.shape)
toronto_df_coors.head()

(103, 5)


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


Part 3 - Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

to add enough Markdown cells to explain what you decided to do and to report any observations you make.
to generate maps to visualize your neighborhoods and how they cluster together.
Once you are happy with your analysis, submit a link to the new Notebook on your Github repository.

Using Folium to create a Map of Toronto with Boroughs markers on top

In [131]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="tl-toronto-neigh")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Toronto are 43.653963, -79.387207.


In [132]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, long, post, borough, neigh in zip(toronto_df_coors['Latitude'], toronto_df_coors['Longitude'], toronto_df_coors['Postal Code'], toronto_df_coors['Borough'], toronto_df_coors['Neighborhood']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    
map_toronto

Reduce the number of Boroughs to explore To reduce the numbers of calls to FourSquare API, we will only explore boroughs that have Toronto in their names.

In [133]:
toronto_boroughs = ['East Toronto', 'Central Toronto', 'Downtown Toronto', 'West Toronto']
toronto_central_df = toronto_df_coors[toronto_df_coors['Borough'].isin(toronto_boroughs)].reset_index(drop=True)
print(toronto_central_df.shape)
toronto_central_df.head()

(39, 5)


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [135]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, long, post, borough, neigh in zip(toronto_central_df['Latitude'], toronto_central_df['Longitude'], toronto_central_df['Postal Code'], toronto_central_df['Borough'], toronto_central_df['Neighborhood']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    
map_toronto

Using FourSquare API to explore the Boroughs

In [136]:
CLIENT_ID = '25NULFPYBI2ZOPL1SVEVVI3OTDJMPH0ASX3GS2AEDETVQL0A' # your Foursquare ID
CLIENT_SECRET = '1DKEEM12K1KJ1LIGC4OZSIF4H5VS33AD3HH3PHB2BSCCCF3W' # your Foursquare Secret
VERSION = '20190330' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentails:
CLIENT_ID: 25NULFPYBI2ZOPL1SVEVVI3OTDJMPH0ASX3GS2AEDETVQL0A
CLIENT_SECRET:1DKEEM12K1KJ1LIGC4OZSIF4H5VS33AD3HH3PHB2BSCCCF3W


In [138]:
radius = 500
LIMIT = 100

venues = []

for lat, long, post, borough, neighborhood in zip(toronto_central_df['Latitude'], toronto_central_df['Longitude'], toronto_central_df['Postal Code'], toronto_central_df['Borough'], toronto_central_df['Neighborhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            borough,
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [139]:
venues_df = pd.DataFrame(venues)
venues_df.columns = ['Postal Code', 'Borough', 'Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head()

(1705, 9)


Unnamed: 0,Postal Code,Borough,Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,East Toronto,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,M4E,East Toronto,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


In [141]:
venues_df.groupby(['Postal Code', 'Borough', 'Neighborhood'])['VenueName'].count()

Postal Code  Borough           Neighborhood                                                                                              
M4E          East Toronto      The Beaches                                                                                                     4
M4K          East Toronto      The Danforth West, Riverdale                                                                                   42
M4L          East Toronto      The Beaches West, India Bazaar                                                                                 18
M4M          East Toronto      Studio District                                                                                                41
M4N          Central Toronto   Lawrence Park                                                                                                   4
M4P          Central Toronto   Davisville North                                                                                          

In [142]:
len(venues_df['VenueCategory'].unique())

229

Analyse venues in areas

In [145]:
# one hot encoding
toronto_central_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood column back to dataframe
toronto_central_onehot['Postal Code'] = venues_df['Postal Code'] 
toronto_central_onehot['Borough'] = venues_df['Borough'] 
toronto_central_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move postal, borough and neighborhood column to the first column
fixed_columns = list(toronto_central_onehot.columns[-3:]) + list(toronto_central_onehot.columns[:-3])
toronto_central_onehot = toronto_central_onehot[fixed_columns]

print(toronto_central_onehot.shape)
toronto_central_onehot.head()

(1705, 232)


Unnamed: 0,Postal Code,Borough,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
1,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4K,East Toronto,"The Danforth West, Riverdale",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [150]:
toronto_central_venues_freq = toronto_central_onehot.groupby(['Postal Code', 'Borough', 'Neighborhoods']).mean().reset_index()
print(toronto_central_venues_freq.shape)
toronto_central_venues_freq.head()

(39, 232)


Unnamed: 0,Postal Code,Borough,Neighborhoods,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,East Toronto,"The Danforth West, Riverdale",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381
2,M4L,East Toronto,"The Beaches West, India Bazaar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,East Toronto,Studio District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439
4,M4N,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


10 most frequently occuring venus

In [158]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Borough', 'Neighborhoods']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postal Code'] = toronto_central_venues_freq['Postal Code']
neighborhoods_venues_sorted['Borough'] = toronto_central_venues_freq['Borough']
neighborhoods_venues_sorted['Neighborhoods'] = toronto_central_venues_freq['Neighborhoods']

for ind in np.arange(toronto_central_venues_freq.shape[0]):
    row_categories = toronto_central_venues_freq.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:12]

neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
neighborhoods_venues_sorted

Unnamed: 0,PostalCode,Borough,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,Postal Code
27,,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Terminal,Airport Lounge,Boat or Ferry,Sculpture Garden,Rental Car Location,Coffee Shop,Harbor / Marina,Airport Gate,Airport Food Court,Airport,Bar
26,,Downtown Toronto,"Chinatown, Grange Park, Kensington Market",Bar,Café,Coffee Shop,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Dumpling Restaurant,Mexican Restaurant,Cocktail Bar,Dessert Shop,Burger Joint,Donut Shop
32,,West Toronto,"Little Portugal, Trinity",Bar,Restaurant,Asian Restaurant,Vietnamese Restaurant,Coffee Shop,Men's Store,Pizza Place,Café,Yoga Studio,Portuguese Restaurant,Boutique,Brewery
33,,West Toronto,"Brockton, Exhibition Place, Parkdale Village",Breakfast Spot,Café,Nightclub,Coffee Shop,Yoga Studio,Gym,Pet Store,Performing Arts Venue,Office,Italian Restaurant,Intersection,Convenience Store
3,,East Toronto,Studio District,Café,Coffee Shop,Gastropub,Bakery,Brewery,Italian Restaurant,American Restaurant,Yoga Studio,Comfort Food Restaurant,Sandwich Place,Cheese Shop,Pet Store
36,,West Toronto,"Runnymede, Swansea",Café,Coffee Shop,Pizza Place,Sushi Restaurant,Italian Restaurant,Restaurant,Juice Bar,Pub,Sandwich Place,Bookstore,Fish Market,Post Office
25,,Downtown Toronto,"Harbord, University of Toronto",Café,Japanese Restaurant,Bookstore,Sandwich Place,Restaurant,Bakery,Bar,Chinese Restaurant,Dessert Shop,Coffee Shop,College Arts Building,Pub
6,,Central Toronto,North Toronto West,Clothing Store,Coffee Shop,Yoga Studio,Gym / Fitness Center,Salon / Barbershop,Restaurant,Rental Car Location,Park,Mexican Restaurant,Metro Station,Furniture / Home Store,Sporting Goods Shop
19,,Downtown Toronto,"Harbourfront East, Toronto Islands, Union Station",Coffee Shop,Aquarium,Hotel,Italian Restaurant,Café,Scenic Lookout,Brewery,Sporting Goods Shop,Restaurant,Fried Chicken Joint,Plaza,Pizza Place
18,,Downtown Toronto,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Restaurant,Burger Joint,Cosmetics Shop,Breakfast Spot,Bakery,Asian Restaurant,Seafood Restaurant


cluster into 4 clusters using kmeans

In [162]:
kclusters = 4

toronto_central_venues_freq_clustering = toronto_central_venues_freq.drop(['Postal Code', 'Borough', 'Neighborhoods'], 1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_central_venues_freq_clustering)

toronto_central_clustered_df = toronto_central_df
toronto_central_clustered_df['Cluster'] = kmeans.labels_

toronto_central_clustered_df = toronto_central_clustered_df.join(neighborhoods_venues_sorted.drop(['Borough', 'Neighborhoods'], 1).set_index('Postal Code'), on='Postal Code')
toronto_central_clustered_df.sort_values(['Cluster'] + freqColumns, inplace=True)
toronto_central_clustered_df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,,,,,,,,,,,,
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,,,,,,,,,,,,
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,,,,,,,,,,,,
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,,,,,,,,,,,,
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,,,,,,,,,,,,
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,,,,,,,,,,,,
7,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,,,,,,,,,,,,
9,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,0,,,,,,,,,,,,
11,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675,0,,,,,,,,,,,,
12,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0,,,,,,,,,,,,


Create map

In [163]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster in zip(toronto_central_clustered_df['Latitude'], toronto_central_clustered_df['Longitude'], toronto_central_clustered_df['PostalCode'], toronto_central_clustered_df['Borough'], toronto_central_clustered_df['Neighborhood'], toronto_central_clustered_df['Cluster']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters