# Week 3 Assignment: Segmentation and Clustering of Neighborhoods in Toronto

In this week's assignment, the neighborhoods of Toronto need to be identified, segmented and clustered.

First, I will import the required libraries for this task.

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# Library needed to read websites
from urllib.request import urlopen

print('Libraries imported.')

Libraries imported.


## Task 1: Identification of Neighborhoods and Scraping Data

The tables on the Wikipedia page are read directly with pandas' `read_html`.

In [49]:
raw_data = pd.read_html('http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
type(raw_data)

list

`pd.read_html` returns a list of data frames, one per table found on the page. The first item contains the desired data.

In [62]:
df_all=raw_data[0]
type(df_all)

pandas.core.frame.DataFrame

In [63]:
df_all.head(11) # Just to check the format and general data within the data frame.

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,"Malvern, Rouge"


Now, the rows whose Borough entry is "Not assigned" are removed.

In [181]:
df=df_all[df_all.Borough!='Not assigned'] # Creates a data frame without the unassigned boroughs
df.reset_index(drop=True,inplace=True) # Resets the index of the filtered data frame to 0 through n-1
df.head(11)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [84]:
df.shape

(103, 3)
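
The filter-and-reindex step above can be tried on a small hand-made frame. This is only a minimal sketch: the three rows are copied from the scraped table, and `toy` and `assigned` are illustrative names that do not appear in the notebook.

```python
import pandas as pd

# Toy frame with rows copied from the scraped table above
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': [None, 'Parkwoods', 'Victoria Village'],
})

# Keep assigned boroughs only; .copy() makes the result an independent frame
assigned = toy[toy['Borough'] != 'Not assigned'].copy()

# Renumber the surviving rows 0..n-1
assigned = assigned.reset_index(drop=True)
print(assigned)
```

Taking an explicit copy before modifying the filtered frame is one way to avoid pandas' chained-assignment warnings.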

## Task 2: Identify the geographic locations of the neighborhoods using Bing Maps' Geocoding API

First, let's import the required library. The Google API did not work for me, so I created a Bing developer account and used geocoder with my Bing API key. (I have removed the key for sharing the code.)

In [2]:
import geocoder
bing_API = '' #add your bing-API here

After creating an empty dataframe from a dictionary, the coordinates of each postal code are requested from Bing Maps and written into the dataframe row by row.

In [279]:
df_coords = pd.DataFrame({'postalcode': [], 'latitude' : [], 'longitude' : []})
for pc in df['Postal Code']:
    post = str(pc)
    g = geocoder.bing(post+', Toronto, Ontario',key=bing_API)
    coords = {'postalcode': [post], 'latitude' : [g.lat], 'longitude' : [g.lng]}
    c = pd.DataFrame(coords)
    df_coords = pd.concat([df_coords, c]) # DataFrame.append was removed in pandas 2.0

Every row was added with index 0, so the index is reset to run from 0 to n-1.

In [286]:
df_coords.reset_index(drop=True,inplace=True)
df_coords.head()

Unnamed: 0,postalcode,latitude,longitude
0,M3A,43.751881,-79.33036
1,M4A,43.730419,-79.31282
2,M5A,43.65514,-79.362648
3,M6A,43.723209,-79.451408
4,M7A,43.66449,-79.393021
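
Growing a dataframe one row at a time copies it on every iteration and leaves every row with index 0. A common alternative is to collect plain dictionaries in a list and build the frame once. The sketch below assumes a stand-in lookup, `fake_geocode`, which is hypothetical and only replaces the Bing call so that the example runs without an API key; its two coordinates are copied from the output above.

```python
import pandas as pd

# Hypothetical stand-in for geocoder.bing(...); values copied from the output above
KNOWN = {'M3A': (43.751881, -79.33036), 'M4A': (43.730419, -79.31282)}

def fake_geocode(postal_code):
    """Return (lat, lng) for a postal code, or (None, None) if unknown."""
    return KNOWN.get(postal_code, (None, None))

rows = []
for pc in ['M3A', 'M4A']:
    lat, lng = fake_geocode(pc)
    rows.append({'postalcode': pc, 'latitude': lat, 'longitude': lng})

# Building the frame once yields a clean 0..n-1 index, so no reset is needed
df_coords_demo = pd.DataFrame(rows)
print(df_coords_demo)
```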


Now, the initial dataframe and the coordinate dataframe are combined.

In [305]:
combined_df = pd.concat([df,df_coords['latitude'],df_coords['longitude']],axis=1)
combined_df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,latitude,longitude
0,M3A,North York,Parkwoods,43.751881,-79.33036
1,M4A,North York,Victoria Village,43.730419,-79.31282
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65514,-79.362648
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.723209,-79.451408
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66449,-79.393021
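
`pd.concat(..., axis=1)` pairs rows by index, so it relies on both frames being in the same order. Joining on the postal code makes the pairing explicit. The following is a sketch on toy data, with `left` and `right` as illustrative names; the second frame is deliberately shuffled to show that the key, not the position, decides the match.

```python
import pandas as pd

left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
})

# Deliberately in a different row order than `left`
right = pd.DataFrame({
    'postalcode': ['M4A', 'M3A'],
    'latitude': [43.730419, 43.751881],
    'longitude': [-79.31282, -79.33036],
})

# A left join keeps every row (and the row order) of `left`
merged = left.merge(right, left_on='Postal Code', right_on='postalcode', how='left')
merged = merged.drop(columns='postalcode')
print(merged)
```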


## Task 3: Segmenting and Clustering Neighborhoods of Downtown Toronto

The dataframe containing all geographical data on the boroughs is reduced to Downtown Toronto.

In [307]:
df_dttor = combined_df[combined_df.Borough=='Downtown Toronto'] # Creates a data frame with data only from Downtown
df_dttor.reset_index(drop=True,inplace=True) # Resets the index of the filtered data frame to 0 through n-1
df_dttor.head(5)

Unnamed: 0,Postal Code,Borough,Neighborhood,latitude,longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65514,-79.362648
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66449,-79.393021
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.65736,-79.378181
3,M5C,Downtown Toronto,St. James Town,43.651428,-79.375572
4,M5E,Downtown Toronto,Berczy Park,43.645309,-79.37368


These neighborhoods are now visualised on a map. To do this, the coordinates of Toronto itself are required to center the map.

In [311]:
g = geocoder.bing('Toronto, Ontario',key=bing_API)
print('The geographical coordinates of Toronto are {}, {}.'.format(g.lat,g.lng))

The geographical coordinates of Toronto are 43.64868927001953, -79.38543701171875.


In [317]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[g.lat, g.lng], zoom_start=13)
# add markers to map
for lat, lng, label in zip(df_dttor['latitude'], df_dttor['longitude'], df_dttor['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
map_toronto

## Scraping data for each neighborhood using Foursquare

This is the routine from the New York Neighborhood lab, where it was used to cluster the neighborhoods of Manhattan.

In [1]:
#Input your Foursquare credentials!
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT=100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [319]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
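
The list comprehension in `getNearbyVenues` assumes that the response always contains a `groups` entry and that every venue has at least one category. A more defensive extraction could look like the sketch below. Note the assumptions: `extract_venues` is a hypothetical helper, and `sample_payload` is hand-written to mirror the nesting used above (`response`, `groups`, `items`, `venue`); it is not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style payload into rows, skipping venues without a category."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue  # nothing to one-hot encode later, so skip this venue
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-written payload mirroring the nesting used above (not real API output)
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}

rows = extract_venues(sample_payload, 'Regent Park, Harbourfront', 43.65514, -79.362648)
print(rows)
```

With this shape, an empty or malformed payload yields an empty list instead of raising a `KeyError` or `IndexError` in the middle of the loop over neighborhoods.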

In [321]:
toronto_venues = getNearbyVenues(names=df_dttor['Neighborhood'],
                                   latitudes=df_dttor['latitude'],
                                   longitudes=df_dttor['longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [328]:
print(toronto_venues.shape)
toronto_venues.head()

(1204, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65514,-79.362648,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65514,-79.362648,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65514,-79.362648,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,"Regent Park, Harbourfront",43.65514,-79.362648,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,"Regent Park, Harbourfront",43.65514,-79.362648,The Yoga Lounge,43.655515,-79.364955,Yoga Studio


In [334]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 185 unique categories.


Next, we are going to analyse the neighborhoods by counting the occurrences of each type of venue in a neighborhood and calculating their relative frequency.

In [335]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

In [336]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Basketball Stadium,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,College Theater,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,Gay Bar,General Entertainment,General Travel,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Historic Site,Hobby Shop,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Store,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Organic Grocery,Other Great Outdoors,Park,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Poutine Place,Pub,Ramen Restaurant,Record Shop,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,Berczy Park,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.014925,0.029851,0.0,0.0,0.014925,0.029851,0.0,0.0,0.014925,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.014925,0.029851,0.0,0.0,0.029851,0.0,0.0,0.014925,0.044776,0.074627,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.014925,0.014925,0.0,0.014925,0.0,0.0,0.014925,0.014925,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.014925,0.014925,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.014925,0.014925,0.014925,0.014925,0.0,0.014925,0.0,0.0,0.0,0.014925,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.014925,0.014925,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.044776,0.0,0.0,0.0,0.0,0.0,0.0,0.044776,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.028571,0.014286,0.042857,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.071429,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.014286,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.042857,0.014286,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.014286,0.014286,0.057143,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.014286,0.0,0.014286,0.0,0.014286,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.042857,0.0,0.0,0.014286,0.0,0.028571,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.028571,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.070175,0.0,0.122807,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.035088,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.017544,0.0,0.017544,0.017544,0.0,0.035088,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.017544,0.017544,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.013158,0.0,0.013158,0.013158,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.118421,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.026316,0.0,0.0,0.0,0.013158,0.0,0.013158,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.013158,0.0,0.026316,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.065789,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.026316,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.026316,0.013158,0.0,0.052632,0.0,0.013158,0.013158,0.013158,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
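
Taking the mean of one-hot columns per neighborhood turns venue counts into shares: each value is the fraction of that neighborhood's venues that fall into the category. A minimal sketch on made-up venue rows (the neighborhoods `A` and `B` are invented for illustration):

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Café', 'Park', 'Park', 'Bakery'],
})

# One indicator column per category
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)

# Put the neighborhood name in front of the indicator columns
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of each 0/1 indicator per neighborhood is the category's share there
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

In this toy example every row sums to 1. Using `insert` rather than plain column assignment is a deliberate choice here: `insert` raises a `ValueError` if a venue category happens to be called `Neighborhood`, whereas assignment would silently overwrite that category's column.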


Following the lab, I will now create a dataframe with the top 10 venues of each neighborhood.

In [337]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [383]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no explicit suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Seafood Restaurant,Cocktail Bar,Restaurant,Breakfast Spot,Hotel,Beer Bar,Cheese Shop,Café,Bakery
1,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Italian Restaurant,Restaurant,Café,Bar,Park,Gym / Fitness Center,Sandwich Place,Speakeasy,Bakery
2,Central Bay Street,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Plaza,Sandwich Place,Cosmetics Shop,Hotel,Restaurant,Bubble Tea Shop,Modern European Restaurant
3,Christie,Café,Grocery Store,Playground,Candy Store,Italian Restaurant,Baby Store,Coffee Shop,Women's Store,Dog Run,Falafel Restaurant
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Restaurant,Sushi Restaurant,Gastropub,Gay Bar,Hotel,Café,Men's Store,Pub
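
The column labels above rely on a three-element suffix list with a fallback to `th`, which is sufficient for ten columns but would mislabel ranks such as 21. The sketch below shows one way to generalize the labels and to pick the top categories for a single row. `ordinal` is a hypothetical helper, and the frequencies in `row` are made up.

```python
import pandas as pd

def ordinal(n):
    """Return 1 -> '1st', 2 -> '2nd', 11 -> '11th', 21 -> '21st', ..."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# Made-up category frequencies for a single neighborhood
row = pd.Series({'Coffee Shop': 0.5, 'Café': 0.25, 'Park': 0.15, 'Bakery': 0.10})

# Sort descending and keep the three most frequent category names
top3 = row.sort_values(ascending=False).index[:3].tolist()
labels = [f'{ordinal(i + 1)} Most Common Venue' for i in range(3)]
print(dict(zip(labels, top3)))
```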


## Clustering of the neighborhoods

In [384]:
# set number of clusters
kclusters = 7

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood') # positional axis argument was removed in pandas 2.0

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
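
The number of clusters is fixed at 7 above without a stated reason. One common heuristic for comparing candidate values of k is the silhouette score. The sketch below runs on synthetic, well-separated points rather than on the venue data, so it only illustrates the mechanics; with just 19 neighborhoods the scores on the real data may be noisy.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight blobs of 20 points each (not the venue data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score is the heuristic's suggestion
best_k = max(scores, key=scores.get)
print(scores, best_k)
```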

In [391]:
toronto_merged = df_dttor

# join the cluster labels and top venues onto the Downtown Toronto dataframe, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged.head(2)

Unnamed: 0,Postal Code,Borough,Neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65514,-79.362648,5,Coffee Shop,Breakfast Spot,Yoga Studio,Bakery,Italian Restaurant,Food Truck,Event Space,Electronics Store,Distribution Center,Pub
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66449,-79.393021,6,Coffee Shop,Park,Café,Persian Restaurant,Pizza Place,Chinese Restaurant,Restaurant,Clothing Store,Pub,Museum


In [392]:
# create map
map_clusters = folium.Map(location=[g.lat, g.lng], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['latitude'], toronto_merged['longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Identifying cluster characteristics

In [393]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,"University of Toronto, Harbord",0,Café,Bookstore,Bakery,Restaurant,Coffee Shop,Japanese Restaurant,Gym,Bar,Sushi Restaurant,Italian Restaurant
12,"Kensington Market, Chinatown, Grange Park",0,Café,Vietnamese Restaurant,Gaming Cafe,Vegetarian / Vegan Restaurant,Coffee Shop,Mexican Restaurant,Bakery,Comfort Food Restaurant,Caribbean Restaurant,Restaurant
13,"CN Tower, King and Spadina, Railway Lands, Har...",0,Coffee Shop,Italian Restaurant,Restaurant,Café,Bar,Park,Gym / Fitness Center,Sandwich Place,Speakeasy,Bakery
16,"St. James Town, Cabbagetown",0,Coffee Shop,Park,Bakery,Pub,Chinese Restaurant,Restaurant,Café,Italian Restaurant,Pizza Place,Farm


In [394]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Christie,1,Café,Grocery Store,Playground,Candy Store,Italian Restaurant,Baby Store,Coffee Shop,Women's Store,Dog Run,Falafel Restaurant


In [395]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,St. James Town,2,Café,Seafood Restaurant,Coffee Shop,Cocktail Bar,Gastropub,American Restaurant,Restaurant,Bakery,Gym,Clothing Store
4,Berczy Park,2,Coffee Shop,Seafood Restaurant,Cocktail Bar,Restaurant,Breakfast Spot,Hotel,Beer Bar,Cheese Shop,Café,Bakery
7,"Richmond, Adelaide, King",2,Café,Coffee Shop,Restaurant,Gym,Hotel,Salad Place,American Restaurant,Breakfast Spot,Japanese Restaurant,Steakhouse
9,"Toronto Dominion Centre, Design Exchange",2,Coffee Shop,Café,Hotel,Restaurant,Salad Place,Seafood Restaurant,American Restaurant,Japanese Restaurant,Italian Restaurant,Gastropub
10,"Commerce Court, Victoria Hotel",2,Coffee Shop,Restaurant,Café,Hotel,Italian Restaurant,American Restaurant,Gym,Japanese Restaurant,Seafood Restaurant,Deli / Bodega
15,Stn A PO Boxes,2,Coffee Shop,Hotel,Restaurant,Café,Japanese Restaurant,Asian Restaurant,Sushi Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant
17,"First Canadian Place, Underground city",2,Coffee Shop,Café,Hotel,Gym,American Restaurant,Japanese Restaurant,Restaurant,Deli / Bodega,Seafood Restaurant,Asian Restaurant
18,Church and Wellesley,2,Coffee Shop,Japanese Restaurant,Restaurant,Sushi Restaurant,Gastropub,Gay Bar,Hotel,Café,Men's Store,Pub


In [396]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Rosedale,3,Park,Building,Playground,Tennis Court,Women's Store,Discount Store,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


In [397]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Garden District, Ryerson",4,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Cosmetics Shop,Italian Restaurant,Café,Japanese Restaurant,Restaurant,Bookstore,Ramen Restaurant
5,Central Bay Street,4,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Plaza,Sandwich Place,Cosmetics Shop,Hotel,Restaurant,Bubble Tea Shop,Modern European Restaurant
8,"Harbourfront East, Union Station, Toronto Islands",4,Coffee Shop,Japanese Restaurant,Hotel,Plaza,Deli / Bodega,Park,Boat or Ferry,Aquarium,Salad Place,Roof Deck


In [398]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",5,Coffee Shop,Breakfast Spot,Yoga Studio,Bakery,Italian Restaurant,Food Truck,Event Space,Electronics Store,Distribution Center,Pub


In [399]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Queen's Park, Ontario Provincial Government",6,Coffee Shop,Park,Café,Persian Restaurant,Pizza Place,Chinese Restaurant,Restaurant,Clothing Store,Pub,Museum


As a concluding remark, Downtown Toronto is dominated by coffee shops and cafés: in 18 of the 19 neighborhoods the most common venue category is Coffee Shop or Café, and only Rosedale (Park) differs.
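
The dominance of coffee shops and cafés can be checked by counting the "1st Most Common Venue" entries across all neighborhoods. In the sketch below the values are transcribed by hand from the seven cluster tables above; `top_by_cluster` is an illustrative name. In the notebook itself, `toronto_merged['1st Most Common Venue'].value_counts()` should give the same tally.

```python
import pandas as pd

# '1st Most Common Venue' per neighborhood, transcribed from the cluster tables above
top_by_cluster = {
    0: ['Café', 'Café', 'Coffee Shop', 'Coffee Shop'],
    1: ['Café'],
    2: ['Café', 'Coffee Shop', 'Café'] + ['Coffee Shop'] * 5,
    3: ['Park'],
    4: ['Coffee Shop'] * 3,
    5: ['Coffee Shop'],
    6: ['Coffee Shop'],
}

# Flatten the per-cluster lists into one Series and count each category
top_venue = pd.Series([v for venues in top_by_cluster.values() for v in venues])
counts = top_venue.value_counts()
print(counts)
```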