# Segmenting and Clustering Neighborhoods in Toronto 

## Part I

### For this part of the assignment, we scrape Toronto neighbourhood data from a Wikipedia page and present it in a dataframe

In [52]:
#Load necessary libraries
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
import pandas as pd
import numpy as np

In [53]:
#Scrape data from wikipedia page using beautiful soup package
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
canada_data = requests.get(url).text
soup = BeautifulSoup(canada_data, 'lxml')

In [54]:
#Append data in a dataframe
table=soup.find('table')

# Collect one dict per table cell; DataFrame.append was removed in pandas 2.0
rows = []

for row in table.findAll('td'):
    #ignore cells that are not assigned
    if row.span.text == "Not assigned":
        continue
    #split the rest of the cell into postal code, borough and neighborhood
    postal_code = row.text.strip()[:3]
    borough = row.span.text.split('(')[0]
    neighborhood = row.span.text.split('(')[1].strip(')').replace('/',',')
    rows.append({"Postal Code":postal_code, "Borough":borough, "Neighborhood":neighborhood})

toronto_data = pd.DataFrame(rows, columns=['Postal Code','Borough','Neighborhood'])


#Replace unclear text with proper text
toronto_data['Borough']=toronto_data['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

toronto_data

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto Business,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,..."
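The string handling in the loop above can be tried on its own, without fetching the page. The sketch below is only an illustration: `parse_cell` is a hypothetical helper name, and the sample cell text is an assumption about how one cell of the Wikipedia table looks, chosen to match the M5A row shown above.

```python
def parse_cell(cell_text, span_text):
    """Split one table cell into (postal code, borough, neighborhood)."""
    # The postal code is the first three characters of the cell text
    postal_code = cell_text.strip()[:3]
    # The borough comes before the first '(', the neighborhoods sit inside the parentheses
    borough, _, rest = span_text.partition('(')
    neighborhood = rest.strip(')').replace('/', ',')
    return postal_code, borough.strip(), neighborhood

# Hypothetical cell contents, shaped like the cells handled by the loop above
sample = parse_cell('M5A\nDowntown Toronto(Regent Park / Harbourfront)',
                    'Downtown Toronto(Regent Park / Harbourfront)')
print(sample)
```

Replacing `/` with `,` leaves the spaces around the separator in place, which is why the neighborhood names in the table contain ` , `.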


In [55]:
toronto_data.shape

(103, 3)

## Part II

### Retrieve the latitude and longitude coordinates of each neighborhood

In [56]:
#!pip install geocoder
import geocoder # import geocoder

toronto_data['Latitude']=""
toronto_data['Longitude']=""

# loop until you get the coordinates
for index,row in toronto_data.iterrows():
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(row['Postal Code']))
        lat_lng_coords = g.latlng

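    # write through the DataFrame; assigning to the iterrows() copy is not guaranteed to persist
    toronto_data.at[index, 'Latitude'] = lat_lng_coords[0]
    toronto_data.at[index, 'Longitude'] = lat_lng_coords[1]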

toronto_data

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.72327,-79.45042
4,M7A,Queen's Park,Ontario Provincial Government,43.66253,-79.39188
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway , Montgomery Road , Old Mill North",43.65319,-79.51113
99,M4Y,Downtown Toronto,Church and Wellesley,43.66659,-79.38133
100,M7Y,East Toronto Business,Enclave of M4L,43.64869,-79.38544
101,M8Y,Etobicoke,"Old Mill South , King's Mill Park , Sunnylea ,...",43.63278,-79.48945
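The `while` loop in the cell above retries until the geocoder answers, so it never ends if a postal code cannot be resolved. A minimal sketch of a bounded retry is shown below. The helper name `geocode_with_retry`, its parameters, and the stand-in `flaky_lookup` function are assumptions made for illustration, so the example runs without network access.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or the attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # brief pause before asking again
    return None  # the caller decides how to handle an unresolved query

# Stand-in for a geocoder call: fails twice, then returns the M3A coordinates from the table above
calls = {'count': 0}

def flaky_lookup(query):
    calls['count'] += 1
    return None if calls['count'] < 3 else [43.75245, -79.32991]

result = geocode_with_retry(flaky_lookup, 'M3A, Toronto, Ontario')
print(result)
```

In the notebook, the `lookup` argument could be something like `lambda q: geocoder.arcgis(q).latlng`, which is the same call the cell above makes.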


## Part III

### We will now explore and cluster the neighborhoods in Toronto, then analyze the clusters and draw conclusions about them


Assumptions for this assignment:

* Focus will be on boroughs that contain the word Toronto
* K-means clustering will be used to cluster the neighbourhoods

#### 1. Obtain the coordinates of Toronto using the geopy library, and visualize the location of each neighborhood on a map using Folium

In [57]:
# Use geopy library to get the longitude and latitude values of Toronto
from geopy.geocoders import Nominatim

address = 'Toronto Ontario, TO'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Ontario are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Ontario are 43.65238435, -79.38356765.


In [58]:
import folium

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### 2. Keep only the boroughs containing the word 'Toronto', which are the area of focus in this assignment

In [59]:
#Filter data to borough containing the word 'Toronto'
toronto_data_short = toronto_data.loc[toronto_data['Borough'].str.contains("Toronto", case=False)].reset_index(drop=True)
toronto_data_short

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65512,-79.36264
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
2,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587
3,M4E,East Toronto,The Beaches,43.67709,-79.29547
4,M5E,Downtown Toronto,Berczy Park,43.64536,-79.37306
5,M5G,Downtown Toronto,Central Bay Street,43.65609,-79.38493
6,M6G,Downtown Toronto,Christie,43.66869,-79.42071
7,M5H,Downtown Toronto,"Richmond , Adelaide , King",43.6497,-79.38258
8,M6H,West Toronto,"Dufferin , Dovercourt Village",43.66505,-79.43891
9,M4J,East York/East Toronto,The Danforth East,43.68811,-79.33418


In [60]:
#standardizing borough text
toronto_data_short['Borough']=toronto_data_short['Borough'].replace({'Downtown Toronto Stn A':'Downtown Toronto','East Toronto Business':'East Toronto'})
toronto_data_short

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65512,-79.36264
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
2,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587
3,M4E,East Toronto,The Beaches,43.67709,-79.29547
4,M5E,Downtown Toronto,Berczy Park,43.64536,-79.37306
5,M5G,Downtown Toronto,Central Bay Street,43.65609,-79.38493
6,M6G,Downtown Toronto,Christie,43.66869,-79.42071
7,M5H,Downtown Toronto,"Richmond , Adelaide , King",43.6497,-79.38258
8,M6H,West Toronto,"Dufferin , Dovercourt Village",43.66505,-79.43891
9,M4J,East York/East Toronto,The Danforth East,43.68811,-79.33418


#### 3. Visualize the location of each neighborhood in East/West/Central/Downtown Toronto

In [61]:
# create map of area of focus in Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(toronto_data_short['Latitude'], toronto_data_short['Longitude'], toronto_data_short['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### 4. Explore the neighborhoods using information from Foursquare

In [None]:
#Define Foursquare credentials
CLIENT_ID = 'XXX' # your Foursquare ID
CLIENT_SECRET = 'XXX' # your Foursquare Secret
VERSION = 'XXX' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

In [63]:
#Reuse function in this course to extract the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

##### We would like to get up to 100 venues within 500 m of each neighborhood

In [135]:
#Reuse function from this course to loop through the list of neighbourhoods
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
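The request URL in `getNearbyVenues` is assembled with `str.format`, which leaves values unescaped. As a hedged alternative, the sketch below builds the same query string with `urllib.parse.urlencode`. The helper name `build_explore_url` is hypothetical, the endpoint and parameter names are taken from the function above, and the `'XXX'` values are the same placeholders used in the credentials cell.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the venues/explore URL with all query parameters percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude as one parameter
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials and the M5A coordinates from the table above
example_url = build_explore_url('XXX', 'XXX', 'XXX', 43.65512, -79.36264)
print(example_url)
```

Passing a `params` dictionary to `requests.get` would achieve the same encoding. The standalone version is used here only so the result can be inspected without making a request.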

In [136]:
#Apply function to obtain the top 100 venues within 500m from each neighborhood
toronto_venues = getNearbyVenues(names=toronto_data_short['Neighborhood'],
                                   latitudes=toronto_data_short['Latitude'],
                                   longitudes=toronto_data_short['Longitude']
                                  )

Regent Park , Harbourfront
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond , Adelaide , King
Dufferin , Dovercourt Village
The Danforth  East
Harbourfront East , Union Station , Toronto Islands
Little Portugal , Trinity
The Danforth West , Riverdale
Toronto Dominion Centre , Design Exchange
Brockton , Parkdale Village , Exhibition Place
India Bazaar , The Beaches West
Commerce Court , Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West
High Park , The Junction South
North Toronto West
The Annex , North Midtown , Yorkville
Parkdale , Roncesvalles
Davisville
University of Toronto , Harbord
Runnymede , Swansea
Moore Park , Summerhill East
Kensington Market , Chinatown , Grange Park
Summerhill West , Rathnelly , South Hill , Forest Hill SE , Deer Park
CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport
Rosedale
Enclave of M5E
St. Jame

In [140]:
print(toronto_venues.shape)
toronto_venues

(1698, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park , Harbourfront",43.65512,-79.36264,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park , Harbourfront",43.65512,-79.36264,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park , Harbourfront",43.65512,-79.36264,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,"Regent Park , Harbourfront",43.65512,-79.36264,The Yoga Lounge,43.655515,-79.364955,Yoga Studio
4,"Regent Park , Harbourfront",43.65512,-79.36264,Body Blitz Spa East,43.654735,-79.359874,Spa
...,...,...,...,...,...,...,...
1693,Enclave of M4L,43.64869,-79.38544,Red Eye Espresso,43.651150,-79.390146,Café
1694,Enclave of M4L,43.64869,-79.38544,Condom Shack,43.650542,-79.388138,Hobby Shop
1695,Enclave of M4L,43.64869,-79.38544,Kanga,43.649955,-79.389352,Pie Shop
1696,Enclave of M4L,43.64869,-79.38544,Druxy's,43.648015,-79.379907,Deli / Bodega


In [141]:
#Count the venues returned for each neighborhood
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,62,62,62,62,62,62
"Brockton , Parkdale Village , Exhibition Place",85,85,85,85,85,85
"CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport",81,81,81,81,81,81
Central Bay Street,60,60,60,60,60,60
Christie,11,11,11,11,11,11
Church and Wellesley,80,80,80,80,80,80
"Commerce Court , Victoria Hotel",100,100,100,100,100,100
Davisville,25,25,25,25,25,25
Davisville North,8,8,8,8,8,8
"Dufferin , Dovercourt Village",17,17,17,17,17,17


In [142]:
#Check the number of unique categories
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 231 unique categories.


##### We will group the rows by neighborhood and take the mean of each category's one-hot column, which gives the frequency of occurrence of each category

In [143]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
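In the output above, the first column is Yoga Studio rather than Neighborhood. This suggests that the venue data contains a category literally named `Neighborhood`: the assignment then overwrites that existing column in place, and the reordering step moves whichever column happens to be last to the front. The sketch below reproduces the situation on toy data and shows one possible way around it. The toy rows and the replacement column name `Neighborhood (category)` are assumptions for illustration only.

```python
import pandas as pd

# Toy venue data in which one venue category is literally called 'Neighborhood'
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Bakery', 'Neighborhood', 'Park'],
})

onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix="", prefix_sep="")

# Rename the clashing category column first (the new name is an arbitrary choice) ...
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})

# ... then place the real neighborhood names explicitly in the first position
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

print(list(onehot.columns))
```

Using `insert(0, ...)` states the intended position directly and does not depend on where the new column ends up.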


In [144]:
#Group the rows by neighborhood
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,Berczy Park,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.0,...,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0
1,"Brockton , Parkdale Village , Exhibition Place",0.023529,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.011765,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.0
2,"CN Tower , King and Spadina , Railway Lands , ...",0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.012346,0.0,0.012346,0.0,0.0,0.0,0.0
3,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.016667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.016667,0.016667,0.0
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Church and Wellesley,0.0125,0.0,0.0125,0.0125,0.0125,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Commerce Court , Victoria Hotel",0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0
7,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Dufferin , Dovercourt Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


##### We will identify the top 10 venue categories in each neighborhood

In [145]:
#Reuse function from this course to obtain a defined number of top venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
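To see what the function returns, it can be called on a single hand-made row. The row below is hypothetical: it mimics one row of `toronto_grouped`, with the neighborhood name first and category frequencies after it.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the leading 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row with distinct frequencies, so the ordering is unambiguous
toy_row = pd.Series({
    'Neighborhood': 'Toyville',
    'Bakery': 0.10,
    'Coffee Shop': 0.50,
    'Park': 0.25,
    'Spa': 0.15,
})

top_three = list(return_most_common_venues(toy_row, 3))
print(top_three)
```

The function returns category names, not frequencies. When several categories share the same frequency (for example many zeros), their relative order depends on the sort, so the tail of a top-10 list should be read with some caution.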

In [146]:
#Apply the function to obtain top 10 venues of each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Seafood Restaurant,Farmers Market,Breakfast Spot,Restaurant,Pharmacy,Beer Bar,Cheese Shop
1,"Brockton , Parkdale Village , Exhibition Place",Bar,Café,Restaurant,Coffee Shop,Gift Shop,Sandwich Place,Bakery,Yoga Studio,Italian Restaurant,Lounge
2,"CN Tower , King and Spadina , Railway Lands , ...",Italian Restaurant,Coffee Shop,Café,French Restaurant,Gym / Fitness Center,Bar,Park,Bakery,Restaurant,Intersection
3,Central Bay Street,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Plaza,Bubble Tea Shop,Middle Eastern Restaurant,Pizza Place,Restaurant,Sandwich Place
4,Christie,Café,Grocery Store,Playground,Baby Store,Italian Restaurant,Athletics & Sports,Coffee Shop,Candy Store,Moroccan Restaurant,Nightclub
5,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Fast Food Restaurant,Dance Studio,Pub,Hotel,Café
6,"Commerce Court , Victoria Hotel",Coffee Shop,Restaurant,Hotel,Café,Italian Restaurant,Japanese Restaurant,Gym,American Restaurant,Seafood Restaurant,Gastropub
7,Davisville,Dessert Shop,Italian Restaurant,Coffee Shop,Café,Sandwich Place,Pizza Place,Gas Station,Restaurant,Farmers Market,Thai Restaurant
8,Davisville North,Breakfast Spot,Gym / Fitness Center,Park,Playground,Gym,Department Store,Food & Drink Shop,Hotel,Middle Eastern Restaurant,Music Venue
9,"Dufferin , Dovercourt Village",Park,Pharmacy,Bakery,Café,Bus Line,Smoke Shop,Brazilian Restaurant,Liquor Store,Bar,Bank
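The column names above rely on indexing into `indicators` and falling back to `th` when the index runs out. That works for ten columns but would produce labels such as `21th` for larger values of `num_top_venues`. A small sketch of a general suffix helper is given below; `ordinal` is a hypothetical name, not part of the course code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[1], '|', columns[3], '|', columns[10])
```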


#### 5. Cluster the neighborhoods using K-means clustering

In [147]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 4, 1, 1,
       2, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1])
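Before looking at individual clusters, it helps to tally how many neighborhoods received each label. The sketch below does this with `collections.Counter`, using the label values copied from the output above. In the notebook the same tally could be computed directly from `kmeans.labels_`.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above
labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 4, 1, 1,
          2, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print('Cluster label {}: {} neighborhoods'.format(cluster, size))
```

The tally shows that one label covers most neighborhoods while the remaining labels hold only one or two each, which is worth keeping in mind when interpreting the clusters.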

In [149]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data_short

# join toronto_data_short with neighborhoods_venues_sorted to add cluster labels and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65512,-79.36264,1,Coffee Shop,Breakfast Spot,Yoga Studio,Theater,Food Truck,Spa,Event Space,Restaurant,Sushi Restaurant,Electronics Store
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804,1,Coffee Shop,Clothing Store,Café,Middle Eastern Restaurant,Cosmetics Shop,Japanese Restaurant,Hotel,Italian Restaurant,Bar,Sandwich Place
2,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587,1,Coffee Shop,Cocktail Bar,Clothing Store,Italian Restaurant,Hotel,Cosmetics Shop,Restaurant,Café,Gastropub,Moroccan Restaurant
3,M4E,East Toronto,The Beaches,43.67709,-79.29547,1,Health Food Store,Trail,Pub,Music Venue,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant
4,M5E,Downtown Toronto,Berczy Park,43.64536,-79.37306,1,Coffee Shop,Cocktail Bar,Bakery,Seafood Restaurant,Farmers Market,Breakfast Spot,Restaurant,Pharmacy,Beer Bar,Cheese Shop
5,M5G,Downtown Toronto,Central Bay Street,43.65609,-79.38493,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Plaza,Bubble Tea Shop,Middle Eastern Restaurant,Pizza Place,Restaurant,Sandwich Place
6,M6G,Downtown Toronto,Christie,43.66869,-79.42071,1,Café,Grocery Store,Playground,Baby Store,Italian Restaurant,Athletics & Sports,Coffee Shop,Candy Store,Moroccan Restaurant,Nightclub
7,M5H,Downtown Toronto,"Richmond , Adelaide , King",43.6497,-79.38258,1,Hotel,Café,Coffee Shop,Restaurant,Gym,Asian Restaurant,Steakhouse,Japanese Restaurant,American Restaurant,Salad Place
8,M6H,West Toronto,"Dufferin , Dovercourt Village",43.66505,-79.43891,1,Park,Pharmacy,Bakery,Café,Bus Line,Smoke Shop,Brazilian Restaurant,Liquor Store,Bar,Bank
9,M4J,East York/East Toronto,The Danforth East,43.68811,-79.33418,0,Park,Home Service,Intersection,Yoga Studio,New American Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant


#### 6. Visualize the clusters on the map

In [150]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
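The marker color is looked up with `rainbow[cluster-1]`, so label 0 maps to index -1, which Python resolves to the last entry of the list. The short sketch below makes that mapping visible. The hex values are stand-in colors chosen for illustration; the notebook derives its own list from matplotlib's `rainbow` colormap.

```python
kclusters = 5

# Stand-in colors (assumed values, one per cluster)
rainbow = ['#8000ff', '#00b5eb', '#80ffb4', '#ffb360', '#ff0000']

# Same lookup as in the map cell: label 0 wraps around to the last color
color_for = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}
for cluster, color in color_for.items():
    print(cluster, color)
```

Each label still gets its own color, but the order is shifted by one, which matters when matching marker colors to cluster labels by eye.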

#### 7. Examine the clusters

##### Cluster 1

In [153]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,East York/East Toronto,0,Park,Home Service,Intersection,Yoga Studio,New American Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant
21,Central Toronto,0,Locksmith,Park,Yoga Studio,Music Venue,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant


##### Cluster 2

In [154]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Coffee Shop,Breakfast Spot,Yoga Studio,Theater,Food Truck,Spa,Event Space,Restaurant,Sushi Restaurant,Electronics Store
1,Downtown Toronto,1,Coffee Shop,Clothing Store,Café,Middle Eastern Restaurant,Cosmetics Shop,Japanese Restaurant,Hotel,Italian Restaurant,Bar,Sandwich Place
2,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Clothing Store,Italian Restaurant,Hotel,Cosmetics Shop,Restaurant,Café,Gastropub,Moroccan Restaurant
3,East Toronto,1,Health Food Store,Trail,Pub,Music Venue,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant
4,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Bakery,Seafood Restaurant,Farmers Market,Breakfast Spot,Restaurant,Pharmacy,Beer Bar,Cheese Shop
5,Downtown Toronto,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Plaza,Bubble Tea Shop,Middle Eastern Restaurant,Pizza Place,Restaurant,Sandwich Place
6,Downtown Toronto,1,Café,Grocery Store,Playground,Baby Store,Italian Restaurant,Athletics & Sports,Coffee Shop,Candy Store,Moroccan Restaurant,Nightclub
7,Downtown Toronto,1,Hotel,Café,Coffee Shop,Restaurant,Gym,Asian Restaurant,Steakhouse,Japanese Restaurant,American Restaurant,Salad Place
8,West Toronto,1,Park,Pharmacy,Bakery,Café,Bus Line,Smoke Shop,Brazilian Restaurant,Liquor Store,Bar,Bank
10,Downtown Toronto,1,Coffee Shop,Hotel,Japanese Restaurant,Aquarium,Park,Boat or Ferry,Plaza,Vegetarian / Vegan Restaurant,Train Station,Sporting Goods Shop


##### Cluster 3

In [155]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Central Toronto,2,Gym Pool,Playground,Park,Yoga Studio,New American Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant
33,Downtown Toronto,2,Gym / Fitness Center,Tennis Court,Playground,Park,Yoga Studio,Molecular Gastronomy Restaurant,Moving Target,Movie Theater,Moroccan Restaurant,Monument / Landmark


##### Cluster 4

In [156]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,3,Home Service,Yoga Studio,New American Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark


##### Cluster 5

In [158]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,4,Bus Line,Business Service,Swim School,Yoga Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant


##### Analysis of the clusters

Based on the results above, the clusters appear to be segmented by the types of venue that are most common in each neighborhood. Some observations can be made about each cluster's characteristics.

In **Cluster 1**, the most common venues are parks, essential services such as a locksmith and a home service, and recreational venues such as a yoga studio and a music venue, with some restaurants nearby. This cluster seems to exhibit the traits of a residential area. On the map, the two neighborhoods in this cluster lie slightly further from the city center, which is consistent with a residential area.

The majority of the neighborhoods fall under **Cluster 2**. In this cluster, the most common venues are coffee shops and a wide variety of restaurants. There are also hotels, a train station, and amenities such as bars, gyms, and clothing stores. These are typical of a city center. On the map, the neighborhoods in this cluster are indeed located around the city center of Toronto.

In **Cluster 3**, the neighborhoods are characterized by nearby sports and activity amenities such as gyms, tennis courts, yoga studios, playgrounds, and parks.

In **Cluster 4**, the neighborhood has a wide mix of amenities, ranging from essential services such as a home service to restaurants, shops, and sports amenities such as a yoga studio.

In **Cluster 5**, the main characteristic of the neighborhood is that a bus line and a business service are its most common venues, a combination that does not lead any of the other clusters shown above. This cluster also has a mix of amenities covering sports, shops, and restaurants.