### This notebook works through the steps needed to gather Toronto neighborhood data, look up geographic coordinates, and visualize clustered areas within the Toronto region, as part of the Applied Data Science Capstone project assignment on Coursera

#### Isaac Injeti - Jan 9th, 2019

Step 1: Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe

Step 2: To create the dataframe:

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
- Only process the cells that have an assigned borough. Ignore cells with a borough that is "Not assigned".
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated by a comma, as shown in row 11 in the above table.
- If a cell has a borough but a "Not assigned" neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of both the Borough and the Neighborhood columns will be Queen's Park.
- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
- In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.

Step 3: Submit a link to your Notebook on your Github repository. (10 marks)

In [1]:
#Import necessary libraries and packages
import requests
import pandas as pd

In [2]:
#Set url to wikipedia site where the html table is located
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [3]:
#Use Pandas' built-in 'read_html' function to load the table data from html into a dataframe
df = pd.read_html(url)

#keep only the first table parsed from the page (read_html returns a list of dataframes)
df = df[0].dropna(axis=0, thresh=0)

#set the names of the columns based on the first row in the dataframe and delete the original index
df.columns = df.iloc[0]
df = df.reindex(df.index.drop(0))
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


In [4]:
#Now drop the rows where a Borough is not assigned
df = df.drop(df[df.Borough == 'Not assigned'].index)

In [5]:
#Filter dataframe on any rows with Neighbourhood as 'Not assigned'
df[df.Neighbourhood=='Not assigned']

Unnamed: 0,Postcode,Borough,Neighbourhood
9,M7A,Queen's Park,Not assigned


In [6]:
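#set the Neighbourhood of row 9 to its Borough instead of "Not assigned"
#use .loc so the write goes to the dataframe itself rather than to a temporary copy
df.loc[9, 'Neighbourhood'] = df.loc[9, 'Borough']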

#check row to validate Neighbourhood was updated as expected.
df.Neighbourhood[9]

"Queen's Park"

In [7]:
#Filter dataframe again on any rows with Neighbourhood having 'Not assigned' to confirm all are clear
df[df.Neighbourhood=='Not assigned']

Unnamed: 0,Postcode,Borough,Neighbourhood
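
The filtering and fill-in steps in cells 4 to 7 can also be written with boolean masks, which avoids hard-coding a row label such as 9. The following is only a minimal sketch on a made-up four-row frame; the sample rows and the name `toy` are illustrative assumptions, not the scraped table.

```python
import pandas as pd

# Toy stand-in for the scraped table (illustrative rows only).
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M5A', 'M7A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Harbourfront', 'Not assigned'],
})

# Keep only rows whose Borough is assigned; copy so later writes hit a real frame.
toy = toy[toy['Borough'] != 'Not assigned'].copy()

# Where the Neighbourhood is missing, fall back to the Borough name.
mask = toy['Neighbourhood'] == 'Not assigned'
toy.loc[mask, 'Neighbourhood'] = toy.loc[mask, 'Borough']

print(toy)
```

Because the mask selects every affected row, the same two lines keep working if the source page later lists more than one borough without a neighbourhood.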


In [8]:
#Next before we loop through duplicate Boroughs and consolidate the Neighbourhoods, we will first sort and reset index
df.sort_values('Postcode',inplace=True)
df.reset_index(drop=True, inplace=True)

In [9]:
#Now we will loop through each row in the df and append the Neighbourhood of the previous duplicate row
#Write through df.at so the change lands in the dataframe itself (the row yielded by iterrows may be a copy)
for index, row in df.iterrows():
    if index > 0 and df.at[index-1, 'Postcode'] == row['Postcode']:
        df.at[index, 'Neighbourhood'] = row['Neighbourhood'] + ', ' + df.at[index-1, 'Neighbourhood']

In [10]:
#The loop above will return appended string of each duplicate in the last row for each unique Postcode
#To get the df to our desired final state we need to clean up the duplicates keeping only the last row for each code
df.drop_duplicates(subset='Postcode',keep='last',inplace=True)
df.reset_index(drop=True, inplace=True)
df.shape

(103, 3)
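
The loop-then-deduplicate approach above can also be expressed as a single group-and-join step. This is a hedged alternative sketch, not the method used in this notebook; the five sample rows and the names `toy` and `combined` are assumptions for illustration.

```python
import pandas as pd

# Toy rows with repeated postcodes (illustrative only).
toy = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M6A', 'M6A', 'M7A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York', 'North York', "Queen's Park"],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights', 'Lawrence Manor', "Queen's Park"],
})

# One row per (Postcode, Borough); neighbourhood names joined with ', '.
combined = (
    toy.groupby(['Postcode', 'Borough'], sort=True)['Neighbourhood']
       .agg(', '.join)
       .reset_index()
)

print(combined)
print(combined.shape)
```

Within each postcode the names keep their original row order, so the joined string may list neighbourhoods in a different order than the loop produces.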

### Part 1 - Dataframe

In [11]:
#Here is our final prepared dataframe ready for the next step.
df.head(15)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"West Hill, Morningside, Guildwood"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Clairlea, Oakridge, Golden Mile"
8,M1M,Scarborough,"Cliffside, Scarborough Village West, Cliffcrest"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


### Part 2 - Create dataframe with geo coordinates for each Postcode

In [12]:
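#Note: this cell ran over the geocoding query limit, so the provided csv is used instead (see the next cell).

#Prior installation of geocoder may be required
import geocoder

#cap the retries so the loop cannot spin forever when the service refuses requests
MAX_TRIES = 5

for pc in df.Postcode:
    #initialize your variable to None
    lat_lng_coords = None
    tries = 0
    #loop until you get the coordinates or run out of tries
    while lat_lng_coords is None and tries < MAX_TRIES:
        g = geocoder.google('{}, Toronto, Ontario'.format(pc))
        lat_lng_coords = g.latlng
        tries += 1
    if lat_lng_coords is None:
        #give up on the remaining postcodes once the service keeps failing
        break
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]

g
#<[OVER_QUERY_LIMIT] Google - Geocode [empty]>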
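
A retry loop around a rate-limited geocoder is easier to reason about when the retry policy lives in a small helper. The sketch below assumes a generic `lookup` callable (a hypothetical stand-in, not part of the `geocoder` package) and caps the number of attempts with exponential backoff; `geocode_with_retry`, `fake_lookup`, and the delay values are all illustrative names and numbers.

```python
import time

def geocode_with_retry(lookup, query, max_tries=3, base_delay=0.0):
    """Call lookup(query) until it returns coordinates or max_tries is reached."""
    for attempt in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        # Wait a little longer after each failure (exponential backoff).
        time.sleep(base_delay * (2 ** attempt))
    return None  # the caller decides what to do when every attempt failed


# Hypothetical stand-in for a geocoding call: fails twice, then succeeds.
calls = {'n': 0}

def fake_lookup(query):
    calls['n'] += 1
    return [43.806686, -79.194353] if calls['n'] >= 3 else None

result = geocode_with_retry(fake_lookup, 'M1B, Toronto, Ontario')
print(result, calls['n'])
```

Returning `None` after the last attempt lets the calling code fall back to another source, such as a csv of coordinates, instead of hanging.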

In [13]:
#Downloaded csv and created dataframe using pandas
geocsv = pd.read_csv('C:\\Users\\IsaacInjeti\\Desktop\\New Desktop Files\\Projects\\Data Science\\DS-Capstone\\Geospatial_Coordinates.csv')

In [14]:
geocsv.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [15]:
#Check number of rows and columns for validation
geocsv.shape

(103, 3)

In [16]:
#rename column to match for merge
geocsv.rename(columns = {'Postal Code':'Postcode'}, inplace = True)
geocsv.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [17]:
#Create a merged dataframe named areas_df based on Postcode
areas_df = df.merge(geocsv, on='Postcode', how='left')
areas_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"West Hill, Morningside, Guildwood",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
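
A left merge silently leaves `NaN` coordinates for any postcode missing from the csv. One way to check for that, sketched here on made-up frames, is to ask `merge` for an indicator column and to validate that the keys are unique on both sides. The frames `left` and `right` and the postcode `M9Z` are assumptions for illustration.

```python
import pandas as pd

# Toy frames: 'M9Z' is a made-up postcode with no coordinates on the right side.
left = pd.DataFrame({'Postcode': ['M1B', 'M1C', 'M9Z'],
                     'Borough': ['Scarborough', 'Scarborough', 'Example']})
right = pd.DataFrame({'Postcode': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

# validate raises if a postcode repeats on either side;
# indicator adds a '_merge' column saying where each row came from.
merged = left.merge(right, on='Postcode', how='left',
                    validate='one_to_one', indicator=True)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postcode'].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that every postcode received coordinates.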


In [18]:
areas_df.Borough.unique()

array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       "Queen's Park", 'Mississauga', 'Etobicoke'], dtype=object)

### Import necessary packages for map exploration of boroughs

In [19]:
import json
import folium
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
from pandas.io.json import json_normalize

In [20]:
#assign string of Toronto to query coordinates from geocode
address = 'Toronto, CA'

#user_agent is an arbitrary application name; recent geopy versions require one for Nominatim
geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude


In [21]:
mapToronto = folium.Map(location=[latitude, longitude], zoom_start=11)
mapToronto

In [22]:
#First pass of identifying coordinates for unique boroughs and plotting on a map
boroughs = areas_df.drop_duplicates(subset='Borough',keep='first')
boroughs.reset_index(drop=True,inplace=True)

boroughs.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M2H,North York,Hillcrest Village,43.803762,-79.363452
2,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [23]:
mapToronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(boroughs['Latitude'], boroughs['Longitude'], boroughs['Borough']):
    label = folium.Popup(borough, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        popup=label,
        radius=5,
        color='blue',
        fill=True,
        fill_opacity=0.5).add_to(mapToronto)
    
mapToronto

### Part 3 - Foursquare venue exploration and data gathering

In [24]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' #replace with your own Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' #replace with your own Foursquare secret
VERSION = '20190110'

In [25]:
bor_lat = boroughs.loc[0, 'Latitude']
bor_lng = boroughs.loc[0, 'Longitude']

bor_name = boroughs.loc[0, 'Borough']

print('Latitude and longitude values of {} are {}, {}.'.format(bor_name, 
                                                               bor_lat, 
                                                               bor_lng))

Latitude and longitude values of Scarborough are 43.806686299999996, -79.19435340000001.


In [26]:
LIMIT = 100
radius = 500

fs_url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, bor_lat, bor_lng, VERSION, radius, LIMIT)
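
Long query strings built with `format` are easy to get wrong when a parameter moves position. As a hedged alternative, the same URL can be assembled from a dictionary with the standard library. The credential strings below are placeholders, and the dictionary name `params` is an assumption.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder values; real credentials should come from the environment, not the notebook.
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'll': '{},{}'.format(43.806686, -79.194353),
    'v': '20190110',
    'radius': 500,
    'limit': 100,
}

# urlencode escapes each value and joins the pairs with '&'.
fs_url = 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)
print(fs_url)

# Round trip: parsing the query string recovers the original values.
print(parse_qs(urlsplit(fs_url).query)['ll'])
```

With `requests`, the same dictionary can instead be passed as `requests.get(base_url, params=params)`, which performs the encoding itself.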


In [27]:
results = requests.get(fs_url).json()
results.keys()

dict_keys(['meta', 'response'])

In [28]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [29]:
venues = results['response']['venues']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng,categories.1
0,Kaycan - SCARBOROUGH,"[{'id': '4bf58dd8d48988d130941735', 'name': 'B...",43.806768,-79.197875,Building
1,Frito Lay,"[{'id': '4eb1bea83b7b6f98df247e06', 'name': 'F...",43.803844,-79.194841,Factory
2,Alvin Curling Public School,"[{'id': '4f4533804b9074f6e4fb0105', 'name': 'E...",43.808683,-79.190103,Elementary School
3,Shell,"[{'id': '4bf58dd8d48988d113951735', 'name': 'G...",43.803227,-79.192414,Gas Station
4,Cascades (Metro Waste),"[{'id': '4bf58dd8d48988d130941735', 'name': 'B...",43.807494,-79.195073,Building


In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
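
`getNearbyVenues` indexes straight into the response, so a reply with no groups or a venue with no categories raises an error. The sketch below shows one defensive way to walk the same nested structure. The `sample` payload is made up to mirror the keys the function above reads, and `extract_venues` is a hypothetical helper, not part of any Foursquare client.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Fall back to None when a venue carries no category.
        category = categories[0]['name'] if categories else None
        rows.append((venue.get('name'), location.get('lat'), location.get('lng'), category))
    return rows


# Made-up payload shaped like the keys used in getNearbyVenues.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({}))
```

An empty or malformed reply then yields an empty list rather than stopping the whole loop over boroughs.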

In [31]:
TorBor_vens = getNearbyVenues(names=boroughs['Borough'],
                                   latitudes=boroughs['Latitude'],
                                   longitudes=boroughs['Longitude'])

Scarborough
North York
East York
East Toronto
Central Toronto
Downtown Toronto
York
West Toronto
Queen's Park
Mississauga
Etobicoke


In [32]:
TorBor_vens.head()

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Scarborough,43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,North York,43.803762,-79.363452,Eagle's Nest Golf Club,43.805455,-79.364186,Golf Course
2,North York,43.803762,-79.363452,AY Jackson Pool,43.804515,-79.366138,Pool
3,North York,43.803762,-79.363452,Villa Madina,43.801685,-79.363938,Mediterranean Restaurant
4,North York,43.803762,-79.363452,Duncan Creek Park,43.805539,-79.360695,Dog Run


In [33]:
# one hot encoding
OneHot_bors = pd.get_dummies(TorBor_vens[['Venue Category']], prefix="", prefix_sep="")

# add Borough column back to dataframe
OneHot_bors['Borough'] = TorBor_vens['Borough']

# move Borough column to the first column
fixed_columns = [OneHot_bors.columns[-1]] + list(OneHot_bors.columns[:-1])
OneHot_bors = OneHot_bors[fixed_columns]

OneHot_bors.head()

Unnamed: 0,Borough,American Restaurant,Arts & Crafts Store,Athletics & Sports,Bakery,Bank,Bar,Brewery,Bubble Tea Shop,Burger Joint,...,Smoothie Shop,Spa,Supermarket,Sushi Restaurant,Swim School,Theater,Trail,Vegetarian / Vegan Restaurant,Wings Joint,Yoga Studio
0,Scarborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,North York,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
torbors_grouped = OneHot_bors.groupby('Borough').mean().reset_index()
torbors_grouped

Unnamed: 0,Borough,American Restaurant,Arts & Crafts Store,Athletics & Sports,Bakery,Bank,Bar,Brewery,Bubble Tea Shop,Burger Joint,...,Smoothie Shop,Spa,Supermarket,Sushi Restaurant,Swim School,Theater,Trail,Vegetarian / Vegan Restaurant,Wings Joint,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
1,Downtown Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
2,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,East York,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Etobicoke,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Mississauga,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,North York,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Queen's Park,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.022727,0.022727,...,0.022727,0.022727,0.0,0.045455,0.0,0.022727,0.0,0.022727,0.022727,0.022727
8,Scarborough,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,West Toronto,0.0,0.0,0.0,0.105263,0.052632,0.052632,0.052632,0.0,0.0,...,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0
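
The values in the grouped table are per-borough frequencies: the mean of a 0/1 indicator column is the share of that borough's venues falling in the category. A small worked sketch on made-up data follows; the boroughs `A` and `B` and their venues are assumptions.

```python
import pandas as pd

# Toy venue list: borough A has four venues, borough B has two.
venues = pd.DataFrame({
    'Borough': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Bakery', 'Gym', 'Park', 'Gym'],
})

# One indicator column per category, then the borough label back in front.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Borough', venues['Borough'])

# Mean of the indicators = fraction of the borough's venues in each category.
freq = onehot.groupby('Borough').mean()
print(freq)
```

Each row of `freq` sums to 1, which is why a borough with only four venues shows values such as 0.25.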


In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [36]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
bors_venues_sorted = pd.DataFrame(columns=columns)
bors_venues_sorted['Borough'] = torbors_grouped['Borough']

for ind in np.arange(torbors_grouped.shape[0]):
    bors_venues_sorted.iloc[ind, 1:] = return_most_common_venues(torbors_grouped.iloc[ind, :], num_top_venues)

bors_venues_sorted

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Park,Swim School,Bus Line,Dim Sum Restaurant,Yoga Studio,Field,Discount Store,Dog Run,Fast Food Restaurant,Fried Chicken Joint
1,Downtown Toronto,Park,Playground,Trail,Flower Shop,Diner,Discount Store,Dog Run,Fast Food Restaurant,Field,Fried Chicken Joint
2,East Toronto,Coffee Shop,Neighborhood,Gym / Fitness Center,Pub,Yoga Studio,Discount Store,Dog Run,Fast Food Restaurant,Field,Flower Shop
3,East York,Fast Food Restaurant,Pizza Place,Intersection,Rock Climbing Spot,Gastropub,Pet Store,Pharmacy,Café,Gym / Fitness Center,Athletics & Sports
4,Etobicoke,Café,American Restaurant,Sandwich Place,Fried Chicken Joint,Flower Shop,Liquor Store,Mexican Restaurant,Fast Food Restaurant,Pharmacy,Restaurant
5,Mississauga,Hotel,Coffee Shop,Burrito Place,Gym / Fitness Center,Mediterranean Restaurant,Middle Eastern Restaurant,Fried Chicken Joint,Sandwich Place,American Restaurant,Burger Joint
6,North York,Pool,Golf Course,Mediterranean Restaurant,Dog Run,Creperie,Gym / Fitness Center,Gym,General Entertainment,Gastropub,Fried Chicken Joint
7,Queen's Park,Coffee Shop,Gym,Sushi Restaurant,Diner,Japanese Restaurant,Yoga Studio,Creperie,Nightclub,Chinese Restaurant,College Auditorium
8,Scarborough,Fast Food Restaurant,Yoga Studio,Hotel,Hobby Shop,Gym / Fitness Center,Gym,Golf Course,General Entertainment,Gastropub,Fried Chicken Joint
9,West Toronto,Bakery,Supermarket,Pharmacy,Discount Store,Park,Pool,Music Venue,Bank,Bar,Brewery
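
The column names above rely on a three-item suffix list plus an exception for everything else, which is fine for ten columns but would produce "21th" for larger counts. A small helper, sketched here under the assumption that general English ordinals are wanted, covers those cases; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Build the same header row as the cell above.
columns = ['Borough'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[4], '|', columns[10])
```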


In [37]:
# set number of clusters
kclusters = 5

torbors_clustering = torbors_grouped.drop(columns='Borough')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(torbors_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 3, 0, 0, 0, 0, 4, 0, 1, 0])

In [38]:
# work on a copy so the assignment below does not trigger SettingWithCopyWarning
boroughs_combined = boroughs.copy()

# add clustering labels
# note: kmeans.labels_ is ordered like torbors_grouped (sorted by Borough name),
# so this positional assignment is only correct when boroughs is in that same order
boroughs_combined['Cluster Labels'] = kmeans.labels_

# join the most-common-venue columns onto boroughs_combined by Borough
boroughs_combined = boroughs_combined.join(bors_venues_sorted.set_index('Borough'), on='Borough')

boroughs_combined.head() # check the last columns!


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,2,Fast Food Restaurant,Yoga Studio,Hotel,Hobby Shop,Gym / Fitness Center,Gym,Golf Course,General Entertainment,Gastropub,Fried Chicken Joint
1,M2H,North York,Hillcrest Village,43.803762,-79.363452,3,Pool,Golf Course,Mediterranean Restaurant,Dog Run,Creperie,Gym / Fitness Center,Gym,General Entertainment,Gastropub,Fried Chicken Joint
2,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937,0,Fast Food Restaurant,Pizza Place,Intersection,Rock Climbing Spot,Gastropub,Pet Store,Pharmacy,Café,Gym / Fitness Center,Athletics & Sports
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Coffee Shop,Neighborhood,Gym / Fitness Center,Pub,Yoga Studio,Discount Store,Dog Run,Fast Food Restaurant,Field,Flower Shop
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Park,Swim School,Bus Line,Dim Sum Restaurant,Yoga Studio,Field,Discount Store,Dog Run,Fast Food Restaurant,Fried Chicken Joint
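
Because `kmeans.labels_` follows the row order of the frame passed to `fit` (here `torbors_grouped`, which `groupby` sorts by borough name), attaching labels by borough name is safer than attaching them by position. The sketch below uses three boroughs and made-up labels to show the idea; `by_postcode`, `grouped`, and `labels` are illustrative stand-ins.

```python
import pandas as pd

# Rows in first-seen (postcode) order, like `boroughs`.
by_postcode = pd.DataFrame({'Borough': ['Scarborough', 'North York', 'East York']})

# Rows sorted by name, like the output of groupby('Borough'): the order the model saw.
grouped = pd.DataFrame({'Borough': ['East York', 'North York', 'Scarborough']})
labels = [0, 1, 2]  # pretend cluster labels, one per row of `grouped`

# Pair each label with its borough name, then look labels up by name.
label_by_borough = dict(zip(grouped['Borough'], labels))
by_postcode['Cluster Labels'] = by_postcode['Borough'].map(label_by_borough)

print(by_postcode)
```

Here positional assignment would have given Scarborough label 0, while the name-based lookup gives it label 2, the one computed for its row.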


In [39]:
# create map
toronto_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(boroughs_combined['Latitude'], boroughs_combined['Longitude'], boroughs_combined['Borough'], boroughs_combined['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(toronto_clusters)
       
toronto_clusters

In [40]:
#label Cluster 0 by reviewing top venues
boroughs_combined.loc[boroughs_combined['Cluster Labels'] == 0, boroughs_combined.columns[[1] + list(range(5, boroughs_combined.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,East York,0,Fast Food Restaurant,Pizza Place,Intersection,Rock Climbing Spot,Gastropub,Pet Store,Pharmacy,Café,Gym / Fitness Center,Athletics & Sports
3,East Toronto,0,Coffee Shop,Neighborhood,Gym / Fitness Center,Pub,Yoga Studio,Discount Store,Dog Run,Fast Food Restaurant,Field,Flower Shop
4,Central Toronto,0,Park,Swim School,Bus Line,Dim Sum Restaurant,Yoga Studio,Field,Discount Store,Dog Run,Fast Food Restaurant,Fried Chicken Joint
5,Downtown Toronto,0,Park,Playground,Trail,Flower Shop,Diner,Discount Store,Dog Run,Fast Food Restaurant,Field,Fried Chicken Joint
7,West Toronto,0,Bakery,Supermarket,Pharmacy,Discount Store,Park,Pool,Music Venue,Bank,Bar,Brewery
9,Mississauga,0,Hotel,Coffee Shop,Burrito Place,Gym / Fitness Center,Mediterranean Restaurant,Middle Eastern Restaurant,Fried Chicken Joint,Sandwich Place,American Restaurant,Burger Joint


In [41]:
boroughs_combined['Cluster Labels'] = boroughs_combined['Cluster Labels'].replace({0: 'City Lifestyle'})

In [42]:
#label Cluster 1 by reviewing top venues
boroughs_combined.loc[boroughs_combined['Cluster Labels'] == 1, boroughs_combined.columns[[1] + list(range(5, boroughs_combined.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Queen's Park,1,Coffee Shop,Gym,Sushi Restaurant,Diner,Japanese Restaurant,Yoga Studio,Creperie,Nightclub,Chinese Restaurant,College Auditorium


In [43]:
boroughs_combined['Cluster Labels'] = boroughs_combined['Cluster Labels'].replace({1: 'Collegiate'})

In [44]:
#label Cluster 2 by reviewing top venues
boroughs_combined.loc[boroughs_combined['Cluster Labels'] == 2, boroughs_combined.columns[[1] + list(range(5, boroughs_combined.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,2,Fast Food Restaurant,Yoga Studio,Hotel,Hobby Shop,Gym / Fitness Center,Gym,Golf Course,General Entertainment,Gastropub,Fried Chicken Joint


In [45]:
boroughs_combined['Cluster Labels'] = boroughs_combined['Cluster Labels'].replace({2: 'Suburban'})

In [46]:
#label Cluster 3 by reviewing top venues
boroughs_combined.loc[boroughs_combined['Cluster Labels'] == 3, boroughs_combined.columns[[1] + list(range(5, boroughs_combined.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,3,Pool,Golf Course,Mediterranean Restaurant,Dog Run,Creperie,Gym / Fitness Center,Gym,General Entertainment,Gastropub,Fried Chicken Joint
10,Etobicoke,3,Café,American Restaurant,Sandwich Place,Fried Chicken Joint,Flower Shop,Liquor Store,Mexican Restaurant,Fast Food Restaurant,Pharmacy,Restaurant


In [47]:
boroughs_combined['Cluster Labels'] = boroughs_combined['Cluster Labels'].replace({3: 'Suburban'})

In [48]:
#label Cluster 4 by reviewing top venues
boroughs_combined.loc[boroughs_combined['Cluster Labels'] == 4, boroughs_combined.columns[[1] + list(range(5, boroughs_combined.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,York,4,Hockey Arena,Trail,Park,Field,Flower Shop,Diner,Discount Store,Dog Run,Fast Food Restaurant,Yoga Studio


In [49]:
boroughs_combined['Cluster Labels'] = boroughs_combined['Cluster Labels'].replace({4: 'Suburban'})

In [50]:
#show updated labeled data
boroughs_combined

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,Suburban,Fast Food Restaurant,Yoga Studio,Hotel,Hobby Shop,Gym / Fitness Center,Gym,Golf Course,General Entertainment,Gastropub,Fried Chicken Joint
1,M2H,North York,Hillcrest Village,43.803762,-79.363452,Suburban,Pool,Golf Course,Mediterranean Restaurant,Dog Run,Creperie,Gym / Fitness Center,Gym,General Entertainment,Gastropub,Fried Chicken Joint
2,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937,City Lifestyle,Fast Food Restaurant,Pizza Place,Intersection,Rock Climbing Spot,Gastropub,Pet Store,Pharmacy,Café,Gym / Fitness Center,Athletics & Sports
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,City Lifestyle,Coffee Shop,Neighborhood,Gym / Fitness Center,Pub,Yoga Studio,Discount Store,Dog Run,Fast Food Restaurant,Field,Flower Shop
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,City Lifestyle,Park,Swim School,Bus Line,Dim Sum Restaurant,Yoga Studio,Field,Discount Store,Dog Run,Fast Food Restaurant,Fried Chicken Joint
5,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,City Lifestyle,Park,Playground,Trail,Flower Shop,Diner,Discount Store,Dog Run,Fast Food Restaurant,Field,Fried Chicken Joint
6,M6C,York,Humewood-Cedarvale,43.693781,-79.428191,Suburban,Hockey Arena,Trail,Park,Field,Flower Shop,Diner,Discount Store,Dog Run,Fast Food Restaurant,Yoga Studio
7,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,City Lifestyle,Bakery,Supermarket,Pharmacy,Discount Store,Park,Pool,Music Venue,Bank,Bar,Brewery
8,M7A,Queen's Park,Queen's Park,43.662301,-79.389494,Collegiate,Coffee Shop,Gym,Sushi Restaurant,Diner,Japanese Restaurant,Yoga Studio,Creperie,Nightclub,Chinese Restaurant,College Auditorium
9,M7R,Mississauga,Canada Post Gateway Processing Centre,43.636966,-79.615819,City Lifestyle,Hotel,Coffee Shop,Burrito Place,Gym / Fitness Center,Mediterranean Restaurant,Middle Eastern Restaurant,Fried Chicken Joint,Sandwich Place,American Restaurant,Burger Joint


In [51]:
#create int index for color grade
c_index_by_label = {'City Lifestyle': 0, 'Suburban': 1, 'Collegiate': 2}
boroughs_combined['c_index'] = boroughs_combined['Cluster Labels'].map(c_index_by_label)

In [52]:
#Visualize labeled clusters on map, with the cluster label shown in each marker's popup

toronto_boroughs = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, index in zip(boroughs_combined['Latitude'], boroughs_combined['Longitude'], boroughs_combined['Borough'], boroughs_combined['Cluster Labels'], boroughs_combined['c_index']):
    label = folium.Popup(str(poi) + "'s demographic is: " + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[index-1],
        fill=True,
        fill_color=rainbow[index-1],
        fill_opacity=0.7).add_to(toronto_boroughs)
    
toronto_boroughs
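
The integer `c_index` exists only to pick a colour out of `rainbow`. A lookup keyed directly by the label name does the same job and makes the colour of each group explicit. This is a minimal sketch; the hex codes, the `palette` dictionary, and the `color_for` helper are arbitrary assumptions rather than values used in this notebook.

```python
# Assumed colours, one per cluster label (any valid hex codes would do).
palette = {
    'City Lifestyle': '#1f77b4',
    'Suburban': '#2ca02c',
    'Collegiate': '#d62728',
}

def color_for(label, default='#7f7f7f'):
    """Return the colour for a cluster label, or a grey default for unknown labels."""
    return palette.get(label, default)


for name in ['City Lifestyle', 'Suburban', 'Collegiate', 'Unlabelled']:
    print(name, color_for(name))
```

The returned string can be passed as the `color` and `fill_color` arguments of `folium.CircleMarker` in place of `rainbow[index-1]`.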