## Segmenting and Clustering Neighborhoods in Toronto
Week 3 assignment for Coursera Applied Data Science Capstone

### Part 1
Scrape data from Wikipedia to create a dataframe of neighborhoods

In [1]:
import pandas as pd
import requests
import numpy as np

In [2]:
#!conda install beautifulsoup4
from bs4 import BeautifulSoup

In [3]:
URL = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(URL)

Use BeautifulSoup to parse the HTML:

In [4]:
soup = BeautifulSoup(page.content, 'html.parser')
#print(soup.prettify())

In [5]:
tb = soup.find("table", attrs={"class": "wikitable"})
#tb

In [6]:
t_headers = []

for th in tb.find_all("th"):
    # remove any newlines and extra spaces from left and right
    t_headers.append(th.text.replace('\n', ' ').strip())

# Get all the rows of the table (once, after the headers are collected)
tb_data = []
for tr in tb.tbody.find_all("tr"): # find all tr's from table's tbody
    t_row = {}
    # Each table row is stored in the form of
    # t_row = {'Postal code': '', 'Borough': '', 'Neighborhood': ''}
    # find all td's in tr and zip them with t_headers
    for td, th in zip(tr.find_all("td"), t_headers):
        # strip line breaks and reformat " / " to ", "
        t_row[th] = td.text.replace('\n', '').strip().replace(' / ', ', ')
    tb_data.append(t_row)

print(t_headers)
tb_data


['Postal code', 'Borough', 'Neighborhood']


[{},
 {'Postal code': 'M1A', 'Borough': 'Not assigned', 'Neighborhood': ''},
 {'Postal code': 'M2A', 'Borough': 'Not assigned', 'Neighborhood': ''},
 {'Postal code': 'M3A', 'Borough': 'North York', 'Neighborhood': 'Parkwoods'},
 {'Postal code': 'M4A',
  'Borough': 'North York',
  'Neighborhood': 'Victoria Village'},
 {'Postal code': 'M5A',
  'Borough': 'Downtown Toronto',
  'Neighborhood': 'Regent Park, Harbourfront'},
 {'Postal code': 'M6A',
  'Borough': 'North York',
  'Neighborhood': 'Lawrence Manor, Lawrence Heights'},
 {'Postal code': 'M7A',
  'Borough': 'Downtown Toronto',
  'Neighborhood': "Queen's Park, Ontario Provincial Government"},
 {'Postal code': 'M8A', 'Borough': 'Not assigned', 'Neighborhood': ''},
 {'Postal code': 'M9A',
  'Borough': 'Etobicoke',
  'Neighborhood': 'Islington Avenue'},
 {'Postal code': 'M1B',
  'Borough': 'Scarborough',
  'Neighborhood': 'Malvern, Rouge'},
 {'Postal code': 'M2B', 'Borough': 'Not assigned', 'Neighborhood': ''},
 {'Postal code': 'M3B', 'B
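
The first entry in the output above is an empty dict. A minimal sketch of the row-building step, with plain lists standing in for the `td` cells (the sample cell texts are illustrative assumptions, not scraped values), suggests why: the header row holds only `th` cells, so there is nothing to zip with the headers.

```python
# Hypothetical cell texts standing in for tr.find_all("td"); the first row
# mimics the header <tr>, which contains <th> cells and therefore no <td>.
t_headers = ['Postal code', 'Borough', 'Neighborhood']
rows_of_cells = [
    [],                                   # header row: no <td> cells
    ['M1A\n', 'Not assigned\n', '\n'],
    ['M5A\n', 'Downtown Toronto\n', 'Regent Park / Harbourfront\n'],
]

def build_row(cells, headers):
    """Pair each cell with its header, cleaning the text as the notebook does."""
    return {
        header: cell.replace('\n', '').strip().replace(' / ', ', ')
        for cell, header in zip(cells, headers)
    }

tb_data = [build_row(cells, t_headers) for cells in rows_of_cells]
print(tb_data[0])   # zip() over an empty cell list yields nothing
print(tb_data[2])
```

When the list is loaded into a dataframe, the empty dict becomes a row of missing values, which is why a `dropna` call is enough to remove it.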

Convert the table into a pandas dataframe, drop the empty header row and all "Not assigned" boroughs, and display the formatted table.

In [7]:
tb_df = pd.DataFrame(tb_data, columns=t_headers)
tb_df.dropna(inplace=True)
tb_df = tb_df[tb_df.Borough!='Not assigned']
tb_df.reset_index(inplace=True, drop=True)

from IPython.display import display, HTML
display(HTML(tb_df.to_html(index=True)))

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [8]:
tb_df.shape

(103, 3)
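
Beyond checking the shape, a few invariants can be asserted on the cleaned table. The sketch below wraps the cleaning steps in a helper; the function name `clean_postal_table` and the four sample rows are assumptions for illustration only.

```python
import pandas as pd

# Small illustrative sample of scraped rows, including the empty header row.
sample_rows = [
    {},
    {'Postal code': 'M1A', 'Borough': 'Not assigned', 'Neighborhood': ''},
    {'Postal code': 'M3A', 'Borough': 'North York', 'Neighborhood': 'Parkwoods'},
    {'Postal code': 'M5A', 'Borough': 'Downtown Toronto',
     'Neighborhood': 'Regent Park, Harbourfront'},
]

def clean_postal_table(rows):
    """Drop empty rows and unassigned boroughs, then verify basic invariants."""
    df = pd.DataFrame(rows, columns=['Postal code', 'Borough', 'Neighborhood'])
    df = df.dropna()
    df = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)
    # Each postal code should appear once, and none should lack a neighborhood.
    assert df['Postal code'].is_unique
    assert (df['Neighborhood'].str.len() > 0).all()
    return df

sample_df = clean_postal_table(sample_rows)
print(sample_df.shape)
```

If the Wikipedia table layout changes, checks like these would likely fail early rather than letting malformed rows reach the later steps.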

### Part 2
Add latitude and longitude information to the neighborhood table using the provided CSV file.

In [9]:
postal_code_lookup = pd.read_csv('http://cocl.us/Geospatial_data')
postal_code_lookup.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [10]:
geo_data = tb_df.merge(postal_code_lookup, left_on='Postal code', right_on='Postal Code')
geo_data.drop(columns=['Postal Code'], inplace=True)  # drop the duplicate key column from the lookup table
print(geo_data.shape)
geo_data.head()

(103, 5)


Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
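
The default inner merge silently drops any postal code that has no match in the lookup table. One way to make that visible is sketched below; the two small frames are stand-ins for `tb_df` and `postal_code_lookup`, and the postal code `M9Z` is a made-up value used only to trigger the unmatched case.

```python
import pandas as pd

# Hypothetical stand-ins for tb_df and postal_code_lookup.
left = pd.DataFrame({'Postal code': ['M3A', 'M4A', 'M9Z'],
                     'Borough': ['North York', 'North York', 'Etobicoke']})
lookup = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

checked = left.merge(
    lookup,
    how='left',                 # keep every neighborhood row
    left_on='Postal code',
    right_on='Postal Code',
    validate='one_to_one',      # raise if either side repeats a postal code
    indicator=True,             # adds a _merge column: 'both' or 'left_only'
)
missing = checked.loc[checked['_merge'] == 'left_only', 'Postal code'].tolist()
print(missing)
```

In this notebook the merged shape is (103, 5), the same row count as before the merge, so no rows were lost; the check matters mainly if either source changes.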


### Part 3
Explore the data and cluster neighborhoods using venue data from Foursquare.

Look at the distribution of postal codes by borough. Downtown Toronto has 19 postal codes, each treated as one neighborhood row in the analysis.

In [11]:
print('The dataframe has {} boroughs and {} postal codes.'.format(
        len(geo_data['Borough'].unique()),
        geo_data.shape[0]
    )
)

geo_data.groupby('Borough').count()

The dataframe has 10 boroughs and 103 postal codes.


Borough,Postal code,Neighborhood,Latitude,Longitude
Central Toronto,9,9,9,9
Downtown Toronto,19,19,19,19
East Toronto,5,5,5,5
East York,5,5,5,5
Etobicoke,12,12,12,12
Mississauga,1,1,1,1
North York,24,24,24,24
Scarborough,17,17,17,17
West Toronto,6,6,6,6
York,5,5,5,5
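
Because every column in the grouped table carries the same count, a single sorted series may be easier to read. A minimal sketch using `value_counts` on an illustrative sample (the five rows below are assumptions, not the real `geo_data`):

```python
import pandas as pd

# Hypothetical subset of geo_data with only the column that matters here.
sample = pd.DataFrame({'Borough': ['North York', 'Downtown Toronto',
                                   'North York', 'Etobicoke', 'North York']})

# One count per borough, sorted from most to fewest postal codes.
borough_counts = sample['Borough'].value_counts()
print(borough_counts)
```

Applied to the notebook's data, this would be `geo_data['Borough'].value_counts()`.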


In [12]:
#!pip install geopy
from geopy.geocoders import Nominatim
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Create a map of all Toronto neighborhoods.

In [13]:
import folium
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(geo_data['Latitude'], geo_data['Longitude'], geo_data['Borough'], geo_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

Now focus only on downtown Toronto neighborhoods for the cluster analysis.

In [14]:
dt = geo_data[geo_data.Borough=='Downtown Toronto']

map_downtown = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(dt['Latitude'], dt['Longitude'], dt['Borough'], dt['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_downtown)
    
map_downtown

The remaining sections request information on food venues from the Foursquare API for the Downtown Toronto neighborhoods and then cluster the neighborhoods by the types of food venues in each (e.g. coffee shops, Italian restaurants, fast food).

Foursquare credentials (stripped before posting to GitHub):

In [15]:
CLIENT_ID = 'CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605'
LIMIT = 30

Define a function to get information on nearby venues in the "Food" category for all neighborhoods in Downtown Toronto. Food venues have categoryId=4d4b7105d754a06374d81259.

In [16]:
def getNearbyFood(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['venues']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [17]:
dt_food = getNearbyFood(names=dt['Neighborhood'],
                                  latitudes=dt['Latitude'],
                                  longitudes=dt['Longitude']
                                 )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley
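
The request and parsing logic can also be split into two small pure functions, which makes them testable without calling the API. This is a sketch under assumptions: the helper names `build_search_params` and `venue_rows` and the sample response fragment are hypothetical, and the second venue is invented to show the empty-category case.

```python
FOOD_CATEGORY_ID = '4d4b7105d754a06374d81259'  # "Food" category used in this notebook

def build_search_params(lat, lng, radius=500, limit=30,
                        client_id='CLIENT_ID', client_secret='CLIENT_SECRET',
                        version='20180605'):
    """Query parameters for the venues/search call, as a dict for requests."""
    return {
        'categoryId': FOOD_CATEGORY_ID,
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }

def venue_rows(name, lat, lng, venues):
    """Flatten venue dicts into tuples, tolerating an empty categories list."""
    rows = []
    for v in venues:
        categories = v.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, v['name'],
                     v['location']['lat'], v['location']['lng'], category))
    return rows

# Hypothetical response fragment in the shape the notebook indexes into.
fake_venues = [
    {'name': 'Tim Hortons', 'location': {'lat': 43.656631, 'lng': -79.35624},
     'categories': [{'name': 'Coffee Shop'}]},
    {'name': 'Uncategorized Stand', 'location': {'lat': 43.65, 'lng': -79.36},
     'categories': []},
]
rows = venue_rows('Regent Park, Harbourfront', 43.65426, -79.360636, fake_venues)
print(rows[1][-1])   # None rather than an IndexError
```

The parameter dict could then be passed as `requests.get(url, params=...)` so that `requests` handles the URL encoding, and a venue without categories would no longer raise on `v['categories'][0]`.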


In [18]:
print(dt_food.shape)
dt_food.head()

(512, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Terroni Sud Forno Produzione e Spaccio,43.653903,-79.360018,Gourmet Shop
1,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
2,"Regent Park, Harbourfront",43.65426,-79.360636,Rocco's No Frills,43.651419,-79.365947,Grocery Store
3,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
4,"Regent Park, Harbourfront",43.65426,-79.360636,Tim Hortons,43.656631,-79.35624,Coffee Shop


Summarize the number of food venues in each neighborhood. Note that the Foursquare API call was limited to 30 records per neighborhood, so a count of 30 means the results were probably truncated.

In [19]:
dt_food.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,30,30,30,30,30,30
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",4,4,4,4,4,4
Central Bay Street,30,30,30,30,30,30
Christie,27,27,27,27,27,27
Church and Wellesley,30,30,30,30,30,30
"Commerce Court, Victoria Hotel",30,30,30,30,30,30
"First Canadian Place, Underground city",30,30,30,30,30,30
"Garden District, Ryerson",30,30,30,30,30,30
"Harbourfront East, Union Station, Toronto Islands",29,29,29,29,29,29
"Kensington Market, Chinatown, Grange Park",30,30,30,30,30,30


In [20]:
print('There are {} unique categories in food venues for Downtown Toronto.'.format(len(dt_food['Venue Category'].unique())))

There are 73 unique categories in food venues for Downtown Toronto.
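
Since several neighborhoods in the per-neighborhood summary return exactly 30 venues, it can help to flag which ones hit the request limit. A minimal sketch, using three counts copied from that summary table as sample input:

```python
import pandas as pd

LIMIT = 30  # same cap as the API call in this notebook

# Three venue counts taken from the per-neighborhood summary table.
venue_counts = pd.Series({'Berczy Park': 30, 'Christie': 27,
                          'Central Bay Street': 30})

# Neighborhoods at the cap may have more venues than were returned.
at_cap = venue_counts[venue_counts >= LIMIT].index.tolist()
print(at_cap)
```

On the full data, the counts could come from `dt_food.groupby('Neighborhood').size()`. For neighborhoods at the cap, the category frequencies describe only the returned sample, not every food venue within the radius.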


In [21]:
# one hot encoding
dt_onehot = pd.get_dummies(dt_food[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Neighborhood'] = dt_food['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dt_onehot.columns[-1]] + list(dt_onehot.columns[:-1])
dt_onehot = dt_onehot[fixed_columns]

print(dt_onehot.shape)
dt_onehot.head()

(512, 74)


Unnamed: 0,Neighborhood,American Restaurant,Bagel Shop,Bakery,Bar,Beer Bar,Bistro,Breakfast Spot,Bubble Tea Shop,Burger Joint,...,Sports Bar,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
dt_grouped = dt_onehot.groupby('Neighborhood').mean().reset_index()
print(dt_grouped.shape)
dt_grouped

(19, 74)


Unnamed: 0,Neighborhood,American Restaurant,Bagel Shop,Bakery,Bar,Beer Bar,Bistro,Breakfast Spot,Bubble Tea Shop,Burger Joint,...,Sports Bar,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Berczy Park,0.033333,0.0,0.0,0.0,0.0,0.033333,0.033333,0.033333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.066667,0.033333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0
3,Christie,0.037037,0.0,0.148148,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0
4,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.133333,...,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0
5,"Commerce Court, Victoria Hotel",0.033333,0.033333,0.0,0.033333,0.0,0.0,0.066667,0.0,0.0,...,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0
6,"First Canadian Place, Underground city",0.0,0.033333,0.0,0.033333,0.0,0.0,0.066667,0.0,0.0,...,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.0,...,0.0,0.0,0.033333,0.0,0.0,0.0,0.066667,0.0,0.033333,0.0
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.068966,0.0,...,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.033333,0.0,...,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0
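
Averaging one-hot columns within a neighborhood gives the share of that neighborhood's venues in each category. The sketch below checks this on a tiny made-up venue list (neighborhoods `A` and `B` are hypothetical) and compares it with a row-normalized `pd.crosstab`, which should produce the same frequencies.

```python
import pandas as pd

# Hypothetical venue list for two neighborhoods.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Bakery', 'Bar',
                       'Bakery', 'Bakery'],
})

# Route 1: one-hot encode, then average within each neighborhood.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
grouped = onehot.groupby('Neighborhood').mean()

# Route 2: a row-normalized contingency table gives the same frequencies.
freq = pd.crosstab(venues['Neighborhood'], venues['Venue Category'],
                   normalize='index')

print(grouped.loc['A', 'Coffee Shop'])   # 0.5
```

Each row sums to 1, so a neighborhood with only 4 venues (values of 0.25) is weighted the same as one with 30 venues when the rows are clustered.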


Examine the data by finding the top 5 most common venue types in each neighborhood.

In [23]:
num_top_venues = 5

for hood in dt_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = dt_grouped[dt_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                 venue  freq
0          Coffee Shop  0.23
1           Restaurant  0.13
2   Italian Restaurant  0.10
3  Japanese Restaurant  0.07
4        Grocery Store  0.07


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
                 venue  freq
0  American Restaurant  0.25
1                  Bar  0.25
2          Coffee Shop  0.25
3     Tapas Restaurant  0.25
4   Mexican Restaurant  0.00


----Central Bay Street----
                 venue  freq
0          Coffee Shop  0.40
1       Sandwich Place  0.07
2  Fried Chicken Joint  0.07
3      Bubble Tea Shop  0.07
4     Ramen Restaurant  0.03


----Christie----
                venue  freq
0                Café  0.26
1              Bakery  0.15
2          Restaurant  0.07
3         Coffee Shop  0.07
4  Italian Restaurant  0.07


----Church and Wellesley----
                  venue  freq
0           Coffee Shop  0.13
1          Burger Joint  0.13
2  F

Now find the top 10 most common food venue types for each neighborhood to use as inputs to the clustering algorithm.

In [24]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [25]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past the 3rd fall outside `indicators` and take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dt_grouped['Neighborhood']

for ind in np.arange(dt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Restaurant,Italian Restaurant,Grocery Store,Japanese Restaurant,Greek Restaurant,Salad Place,Bistro,Breakfast Spot,Bubble Tea Shop
1,"CN Tower, King and Spadina, Railway Lands, Har...",American Restaurant,Coffee Shop,Bar,Tapas Restaurant,Fast Food Restaurant,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Ethiopian Restaurant
2,Central Bay Street,Coffee Shop,Bubble Tea Shop,Fried Chicken Joint,Sandwich Place,Diner,Ramen Restaurant,Italian Restaurant,Shopping Mall,Burger Joint,Fast Food Restaurant
3,Christie,Café,Bakery,Coffee Shop,Italian Restaurant,Restaurant,Sandwich Place,Korean Restaurant,Juice Bar,Japanese Restaurant,Candy Store
4,Church and Wellesley,Coffee Shop,Burger Joint,Fast Food Restaurant,Sandwich Place,Gastropub,Sushi Restaurant,Café,Mexican Restaurant,Italian Restaurant,Poke Place
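
The column naming and top-N selection can be expressed with two small helpers. This is a sketch, not the notebook's method: `ordinal` and `top_categories` are hypothetical names, and the frequency row is invented for illustration.

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix (1st, 2nd, 3rd, 4th, 11th...)."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_categories(freq_row, n):
    """Names of the n most frequent categories in one neighborhood's row."""
    return freq_row.sort_values(ascending=False).index[:n].tolist()

# Hypothetical frequency row, not taken from the Foursquare results.
row = pd.Series({'Coffee Shop': 0.4, 'Bakery': 0.1, 'Bar': 0.3, 'Diner': 0.2})
columns = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(3)]
print(dict(zip(columns, top_categories(row, 3))))
```

Two caveats apply to the table above. Categories with equal frequency are ordered arbitrarily unless a stable sort is requested (for example `sort_values(ascending=False, kind='stable')`), and a neighborhood with very few venues fills the rest of its top 10 with zero-frequency categories.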


In [26]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

dt_grouped_clustering = dt_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 4, 0, 0, 1, 1, 4, 0, 0])
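
The choice of 5 clusters is not justified in the notebook. One common way to compare candidate values of k is the silhouette score; the sketch below runs it on synthetic, well-separated points (an assumption made so the example is self-contained), not on the Foursquare frequencies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three tight, well-separated groups of 2-D points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.05 * rng.standard_normal((10, 2)) for c in centers])

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k)
```

To try this on the notebook's data, `X` would be `dt_grouped_clustering`. With only 19 rows, the silhouette score is defined for k between 2 and 18, and the estimate is likely to be noisy.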

In [27]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

dt_merged = dt

# join the cluster labels and top venues onto the Downtown Toronto rows by neighborhood
dt_merged = dt_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

print(dt_merged['Cluster Labels'].value_counts())

dt_merged.head() # check the last columns!

1    7
0    7
4    3
3    1
2    1
Name: Cluster Labels, dtype: int64


Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Bakery,Ethiopian Restaurant,Pizza Place,Café,Fish & Chips Shop,Gourmet Shop,Grocery Store,Donut Shop,Italian Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4,Coffee Shop,Burger Joint,Fast Food Restaurant,Fried Chicken Joint,Poke Place,Sandwich Place,Indian Restaurant,Italian Restaurant,Diner,Mexican Restaurant
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,4,Coffee Shop,Fried Chicken Joint,Sandwich Place,Thai Restaurant,Japanese Restaurant,Gastropub,Vietnamese Restaurant,Grocery Store,Falafel Restaurant,Indian Restaurant
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Café,Diner,Japanese Restaurant,Restaurant,Seafood Restaurant,Middle Eastern Restaurant,Convenience Store,Pizza Place,Italian Restaurant
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Restaurant,Italian Restaurant,Grocery Store,Japanese Restaurant,Greek Restaurant,Salad Place,Bistro,Breakfast Spot,Bubble Tea Shop


In [28]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dt_merged['Latitude'], dt_merged['Longitude'], dt_merged['Neighborhood'], dt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The results in the map above and the tables below are intuitive. The neighborhoods in the dense central business district (cluster 1, purple) share similar food venues, including coffee shops and food courts that downtown workers visit during the work day. Neighborhoods at a similar distance from the central business district and with similar densities also have similar food venues (clusters 4 and 0, shown in orange and red). Finally, the two neighborhoods that are geographically least like the urban core (Rosedale and the CN Tower/Island airport area) are each in their own cluster.
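
To relate the marker colors to cluster labels, the lookup used by the map cell can be printed as a small legend. This sketch reproduces the `rainbow[cluster-1]` indexing from that cell; the dictionary name `cluster_colors` is an assumption.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# The map indexes rainbow[cluster - 1], so label 0 wraps around to the last color.
cluster_colors = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}
for cluster, hex_color in cluster_colors.items():
    print(cluster, hex_color)
```

The rainbow colormap runs from purple at one end to red at the other, so cluster 1 takes the first (purple) color and cluster 0 takes the last (red) one. Each label still receives a distinct color.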

Look at tables of which neighborhoods are included in each cluster:

In [29]:
dt_merged.loc[dt_merged['Cluster Labels'] == 0, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Regent Park, Harbourfront",0,Coffee Shop,Bakery,Ethiopian Restaurant,Pizza Place,Café,Fish & Chips Shop,Gourmet Shop,Grocery Store,Donut Shop,Italian Restaurant
25,Christie,0,Café,Bakery,Coffee Shop,Italian Restaurant,Restaurant,Sandwich Place,Korean Restaurant,Juice Bar,Japanese Restaurant,Candy Store
36,"Harbourfront East, Union Station, Toronto Islands",0,Coffee Shop,Restaurant,Bubble Tea Shop,Chinese Restaurant,Japanese Restaurant,Wings Joint,Italian Restaurant,Mexican Restaurant,Mediterranean Restaurant,Juice Bar
80,"University of Toronto, Harbord",0,Vegetarian / Vegan Restaurant,Coffee Shop,Café,Restaurant,Bubble Tea Shop,Hot Dog Joint,Italian Restaurant,Korean Restaurant,Frozen Yogurt Shop,College Quad
84,"Kensington Market, Chinatown, Grange Park",0,Chinese Restaurant,Coffee Shop,Bakery,Dim Sum Restaurant,Noodle House,Fast Food Restaurant,Sushi Restaurant,Vietnamese Restaurant,Sri Lankan Restaurant,Grocery Store
96,"St. James Town, Cabbagetown",0,Restaurant,Coffee Shop,Pizza Place,Gastropub,Thai Restaurant,Italian Restaurant,Café,Ethiopian Restaurant,Grocery Store,Hot Dog Joint
99,Church and Wellesley,0,Coffee Shop,Burger Joint,Fast Food Restaurant,Sandwich Place,Gastropub,Sushi Restaurant,Café,Mexican Restaurant,Italian Restaurant,Poke Place


In [30]:
dt_merged.loc[dt_merged['Cluster Labels'] == 1, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,St. James Town,1,Coffee Shop,Café,Diner,Japanese Restaurant,Restaurant,Seafood Restaurant,Middle Eastern Restaurant,Convenience Store,Pizza Place,Italian Restaurant
20,Berczy Park,1,Coffee Shop,Restaurant,Italian Restaurant,Grocery Store,Japanese Restaurant,Greek Restaurant,Salad Place,Bistro,Breakfast Spot,Bubble Tea Shop
30,"Richmond, Adelaide, King",1,Coffee Shop,Food Court,Ramen Restaurant,Café,Breakfast Spot,Japanese Restaurant,Indian Restaurant,Restaurant,Burger Joint,Fast Food Restaurant
42,"Toronto Dominion Centre, Design Exchange",1,Coffee Shop,Food Court,Café,Japanese Restaurant,Restaurant,Grocery Store,Deli / Bodega,Convenience Store,Pizza Place,Juice Bar
48,"Commerce Court, Victoria Hotel",1,Coffee Shop,Restaurant,Café,Food Court,Japanese Restaurant,Breakfast Spot,American Restaurant,Fast Food Restaurant,Italian Restaurant,Juice Bar
92,Stn A PO Boxes,1,Coffee Shop,Restaurant,Bubble Tea Shop,Greek Restaurant,Pub,Bistro,Café,Deli / Bodega,Diner,Fast Food Restaurant
97,"First Canadian Place, Underground city",1,Coffee Shop,Food Court,Japanese Restaurant,Breakfast Spot,Café,Bar,Sushi Restaurant,Deli / Bodega,Convenience Store,Pizza Place


In [31]:
dt_merged.loc[dt_merged['Cluster Labels'] == 2, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Rosedale,2,Coffee Shop,Italian Restaurant,Food Truck,Wings Joint,Falafel Restaurant,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Ethiopian Restaurant


In [32]:
dt_merged.loc[dt_merged['Cluster Labels'] == 3, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
87,"CN Tower, King and Spadina, Railway Lands, Har...",3,American Restaurant,Coffee Shop,Bar,Tapas Restaurant,Fast Food Restaurant,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Ethiopian Restaurant


In [33]:
dt_merged.loc[dt_merged['Cluster Labels'] == 4, dt_merged.columns[[2] + list(range(5, dt_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,"Queen's Park, Ontario Provincial Government",4,Coffee Shop,Burger Joint,Fast Food Restaurant,Fried Chicken Joint,Poke Place,Sandwich Place,Indian Restaurant,Italian Restaurant,Diner,Mexican Restaurant
9,"Garden District, Ryerson",4,Coffee Shop,Fried Chicken Joint,Sandwich Place,Thai Restaurant,Japanese Restaurant,Gastropub,Vietnamese Restaurant,Grocery Store,Falafel Restaurant,Indian Restaurant
24,Central Bay Street,4,Coffee Shop,Bubble Tea Shop,Fried Chicken Joint,Sandwich Place,Diner,Ramen Restaurant,Italian Restaurant,Shopping Mall,Burger Joint,Fast Food Restaurant
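
The five per-cluster cells above could also be condensed into one summary table. A minimal sketch using a four-row excerpt of `dt_merged` (values copied from the tables above; the frame name `sample` is an assumption):

```python
import pandas as pd

# Small excerpt of dt_merged with only the columns used here.
sample = pd.DataFrame({
    'Neighborhood': ['Rosedale', 'Berczy Park', 'St. James Town', 'Christie'],
    'Cluster Labels': [2, 1, 1, 0],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Coffee Shop', 'Café'],
})

# One row per cluster: its size and the neighborhoods it contains.
summary = sample.groupby('Cluster Labels')['Neighborhood'].agg(
    size='count', neighborhoods=', '.join)
print(summary)
```

Run on the full `dt_merged`, this would give one line per cluster and make the two single-neighborhood clusters easy to spot.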
