# Week 3 Assignment: Segmenting and Clustering in the City of Toronto, Canada
## Huanglei Pan

## Part 1 :  Scraping and parsing the data

In [14]:
# Import the packages
import pandas as pd

### 1-1 Scrape the Dataframe from Wikipedia Page and create pandas dataframe

In [15]:
# Read the table from Wikipedia
table = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")[0]

# Convert the table to pandas dataframe
df= pd.DataFrame(table)
df.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Not assigned


### 1-2 Deal with missing data and aggregate data

In [16]:
#Drop the rows with Borughs that are not assigned
df=df[df.Borough != 'Not assigned']
df.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
9,M9A,Queen's Park,Not assigned
10,M1B,Scarborough,Rouge
11,M1B,Scarborough,Malvern
13,M3B,North York,Don Mills North


In [17]:
# Replace the not assigned neighbourhoods by their borough names
df['Neighbourhood']=df['Neighbourhood'].replace('Not assigned', df['Borough'])
df.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
9,M9A,Queen's Park,Queen's Park
10,M1B,Scarborough,Rouge
11,M1B,Scarborough,Malvern
13,M3B,North York,Don Mills North


In [18]:
#Aggregate the neighbourhoods with some borough
df_toronto=df.groupby(['Postcode','Borough'], sort = False).agg(lambda x: ','.join(x))
df_toronto.reset_index(level=['Postcode','Borough'], inplace=True)
df_toronto.head(12)
        

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Queen's Park,Queen's Park
6,M1B,Scarborough,"Rouge,Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens,Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson,Garden District"


### 1-3 Check the shape of dataframe

In [19]:
# See the shape of the final dataframe
print("There are " + str(df_toronto.shape[0]) + " rows in the dataframe")

There are 103 rows in the dataframe


## Part 2 :  Get the Coordinates for the Postcodes

In [20]:
# Install and import geocoder package
#!conda install -c conda-forge geocoder --yes
import geocoder

In [21]:
# Get the coordinates of the postcodes
latitude=[]
longitude=[]
for postcode in df_toronto.Postcode:
    lat_lng_coords = None

    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postcode)) #geocoder.google is unavailble, so I use geocoder.arcgis
        lat_lng_coords = g.latlng

    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])

In [22]:
# Add the coordinates to the dataframe
df_toronto['Latitude'] = latitude
df_toronto['Longitude'] = longitude
df_toronto.head(12)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75242,-79.329242
1,M4A,North York,Victoria Village,43.7306,-79.313265
2,M5A,Downtown Toronto,Harbourfront,43.650295,-79.359166
3,M6A,North York,"Lawrence Heights,Lawrence Manor",43.72327,-79.451286
4,M7A,Downtown Toronto,Queen's Park,43.66115,-79.391715
5,M9A,Queen's Park,Queen's Park,43.662299,-79.528195
6,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517
7,M3B,North York,Don Mills North,43.749055,-79.362227
8,M4B,East York,"Woodbine Gardens,Parkview Hill",43.707535,-79.311773
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657363,-79.37818


## Part 3 :  Segmenting and Clustering the Neibourhoods

In [23]:
# Import pakages
from geopy.geocoders import Nominatim
import requests 
from pandas.io.json import json_normalize
import numpy as np

In [24]:
# Install and import folium libary
!conda install -c conda-forge folium=0.5.0 --yes 
import folium 

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    altair-4.0.1               |             py_0         575 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         673 KB

The following NEW packages will be INSTALLED:

    altair:  4.0.1-py_0 conda-forge
    branca:  0.3.1-py_0 conda-forge
    folium:  0.5.0-py_0 conda-forge
    vincent: 0.4.4-py_1 conda-forge


Downloading and Extracting Packages
vincent-0.4.4        | 28 KB     | #####

### 3-1 Mapping Toronto Neighbourhoods

In [27]:
#Use geopy library to get the latitude and longitude values of Toronto
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.653963, -79.387207.


In [28]:
# Create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to map
for lat, lng, label in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

I would use the data with boroughs containing the word "Toronto"

In [29]:
# Filter boroughs containing the word "Toronto"
data_toronto=df_toronto[df_toronto['Borough'].str.contains('Toronto')]
data_toronto.reset_index(drop = True)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.650295,-79.359166
1,M7A,Downtown Toronto,Queen's Park,43.66115,-79.391715
2,M5B,Downtown Toronto,"Ryerson,Garden District",43.657363,-79.37818
3,M5C,Downtown Toronto,St. James Town,43.65121,-79.375481
4,M4E,East Toronto,The Beaches,43.676531,-79.295425
5,M5E,Downtown Toronto,Berczy Park,43.64516,-79.373675
6,M5G,Downtown Toronto,Central Bay Street,43.656091,-79.38493
7,M6G,Downtown Toronto,Christie,43.668781,-79.42071
8,M5H,Downtown Toronto,"Adelaide,King,Richmond",43.6497,-79.382582
9,M6H,West Toronto,"Dovercourt Village,Dufferin",43.665087,-79.438705


In [30]:
# Create map of Boroughs containing word "Toronto" using latitude and longitude values

# Add markers to old map
for lat, lng, label in zip(data_toronto['Latitude'], data_toronto['Longitude'], data_toronto['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ff8080',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### 3-2 Explore Neighbourhoods in Toronto by Foursquare

In [31]:
# Define Foursquare Credentials and Version

CLIENT_ID = '3BQYKQ5LBBRYVOCSMWM40YHSWB21RBI05S0JQ2LJ4CFMCFPU' # your Foursquare ID
CLIENT_SECRET = 'AHI5WFW2P1TWQ5Q5ASROJJCZJ2501OQPUWQ5RLLFOLSU1KT4' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 50

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 3BQYKQ5LBBRYVOCSMWM40YHSWB21RBI05S0JQ2LJ4CFMCFPU
CLIENT_SECRET:AHI5WFW2P1TWQ5Q5ASROJJCZJ2501OQPUWQ5RLLFOLSU1KT4


In [32]:
# Create a function to repeat the same process to all the neighborhoods in Toronto

def getNearbyVenues(names, latitudes, longitudes, radius=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [33]:
# Fit the function on data_toronto
toronto_venues = getNearbyVenues(names=data_toronto['Neighbourhood'],
                                   latitudes=data_toronto['Latitude'],
                                   longitudes=data_toronto['Longitude']
                                  )

Harbourfront
Queen's Park
Ryerson,Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide,King,Richmond
Dovercourt Village,Dufferin
Harbourfront East,Toronto Islands,Union Station
Little Portugal,Trinity
The Danforth West,Riverdale
Design Exchange,Toronto Dominion Centre
Brockton,Exhibition Place,Parkdale Village
The Beaches West,India Bazaar
Commerce Court,Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North,Forest Hill West
High Park,The Junction South
North Toronto West
The Annex,North Midtown,Yorkville
Parkdale,Roncesvalles
Davisville
Harbord,University of Toronto
Runnymede,Swansea
Moore Park,Summerhill East
Chinatown,Grange Park,Kensington Market
Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West
CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown,St. James Town
First Canadian Place,Underground city

In [34]:
# Check the size of results
print(toronto_venues.shape)
toronto_venues.head()

(839, 7)


Unnamed: 0,Neighborhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.650295,-79.359166,The Distillery Historic District,43.650244,-79.359323,Historic Site
1,Harbourfront,43.650295,-79.359166,Arvo,43.649963,-79.361442,Coffee Shop
2,Harbourfront,43.650295,-79.359166,SOMA chocolatemaker,43.650622,-79.358127,Chocolate Shop
3,Harbourfront,43.650295,-79.359166,Distillery Sunday Market,43.650075,-79.361832,Farmers Market
4,Harbourfront,43.650295,-79.359166,Cacao 70,43.650067,-79.360723,Dessert Shop


In [35]:
# Check how many venues were returned for each neighbourhood
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",50,50,50,50,50,50
Berczy Park,7,7,7,7,7,7
"Brockton,Exhibition Place,Parkdale Village",40,40,40,40,40,40
Business Reply Mail Processing Centre 969 Eastern,50,50,50,50,50,50
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",16,16,16,16,16,16
"Cabbagetown,St. James Town",18,18,18,18,18,18
Central Bay Street,28,28,28,28,28,28
"Chinatown,Grange Park,Kensington Market",31,31,31,31,31,31
Christie,3,3,3,3,3,3
Church and Wellesley,39,39,39,39,39,39


In [36]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 185 uniques categories.


### 3-3 Analyze Each Neighborhood

In [37]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Adult Boutique,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,...,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Trail,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [38]:
toronto_onehot.shape

(839, 185)

In [39]:
# Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,...,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Trail,Tram Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop
0,"Adelaide,King,Richmond",0.0,0.0,0.04,0.0,0.0,0.0,0.06,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.05,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.025,...,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown,St. James Town",0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.129032,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641


In [40]:
toronto_grouped.shape

(36, 185)

In [41]:
# Print each neighborhood along with the top 5 most common venues

num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide,King,Richmond----
              venue  freq
0       Coffee Shop  0.10
1              Café  0.06
2        Steakhouse  0.06
3  Asian Restaurant  0.06
4               Gym  0.04


----Berczy Park----
                venue  freq
0          Steakhouse  0.14
1  Italian Restaurant  0.14
2        Concert Hall  0.14
3      Breakfast Spot  0.14
4        Liquor Store  0.14


----Brockton,Exhibition Place,Parkdale Village----
                    venue  freq
0             Coffee Shop  0.10
1          Sandwich Place  0.08
2             Yoga Studio  0.05
3          Breakfast Spot  0.05
4  Furniture / Home Store  0.05


----Business Reply Mail Processing Centre 969 Eastern----
                 venue  freq
0           Steakhouse  0.08
1     Sushi Restaurant  0.06
2                 Café  0.06
3  Japanese Restaurant  0.04
4                  Bar  0.04


----CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara----
         venue  freq
0         Ca

In [42]:
# Create function to sort the venues in descending order

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [43]:
# Create the new dataframe and display the top 10 venues for each neighborhood.

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Asian Restaurant,Café,Steakhouse,Gym,Bar,Seafood Restaurant,Gastropub,American Restaurant,Salon / Barbershop
1,Berczy Park,Department Store,Concert Hall,Steakhouse,Liquor Store,Italian Restaurant,Breakfast Spot,Beer Bar,Wine Shop,Ethiopian Restaurant,Fish Market
2,"Brockton,Exhibition Place,Parkdale Village",Coffee Shop,Sandwich Place,Yoga Studio,Breakfast Spot,Nightclub,Burrito Place,Italian Restaurant,Furniture / Home Store,Gym,Café
3,Business Reply Mail Processing Centre 969 Eastern,Steakhouse,Sushi Restaurant,Café,Lounge,Bar,Hotel,Gastropub,Concert Hall,Pizza Place,Coffee Shop
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Café,Coffee Shop,Hotel,Restaurant,Ramen Restaurant,Pub,Donut Shop,Sports Club,Diner,Market


In [44]:
neighborhoods_venues_sorted.shape

(36, 11)

### 3-4 Clustering Neighborhoods

In [45]:
# Import Kmeans libary
from sklearn.cluster import KMeans

Using KMeans to cluster the neigborhoods

In [72]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0,
       3, 0, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Since I ignored the difference of English and American spelling of "Neighbo(u)rhood" in the very beginning, now I uniform the spelling to "Neighborhood" for data frames

In [47]:
# Revise the column name  'Neighbourhood" to "Neighborhood" in data_toronto
new_columns =list(data_toronto.columns)
new_columns[2]='Neighborhood'
data_toronto.columns=new_columns

In [48]:
# Add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = data_toronto


# Aerge toronto_grouped with data_toronto to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() 

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,Harbourfront,43.650295,-79.359166,0.0,Bakery,Café,Art Gallery,Coffee Shop,Historic Site,Seafood Restaurant,Restaurant,Pub,Chocolate Shop,Clothing Store
4,M7A,Downtown Toronto,Queen's Park,43.66115,-79.391715,0.0,Coffee Shop,College Auditorium,Fried Chicken Joint,Sandwich Place,Salad Place,Bar,Metro Station,Café,Donut Shop,Fish & Chips Shop
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657363,-79.37818,0.0,Coffee Shop,Café,Sandwich Place,Movie Theater,Middle Eastern Restaurant,Hookah Bar,Burger Joint,Burrito Place,Ice Cream Shop,Electronics Store
15,M5C,Downtown Toronto,St. James Town,43.65121,-79.375481,0.0,Restaurant,Coffee Shop,Gastropub,BBQ Joint,Hotel,Indian Restaurant,Church,Middle Eastern Restaurant,Café,Food Truck
19,M4E,East Toronto,The Beaches,43.676531,-79.295425,0.0,Trail,Wine Shop,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


In [49]:
toronto_merged.shape

(39, 16)

In [50]:
# Drop rows with NaN
toronto_merged.dropna(subset=['Cluster Labels'],inplace=True)

In [51]:
toronto_merged.reset_index(drop=True)
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype('int32')

### 3-5 Visualize the resulting clusters

In [52]:
# Import Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 3-6 Observations for the clustering result

In [59]:
# Seperate each cluster results to see their common venues

createVar = locals()
for i in range(5):    
    createVar['Class_'+str(i)] =  toronto_merged[toronto_merged['Cluster Labels']==i]

In [60]:
Class_0

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,Harbourfront,43.650295,-79.359166,0,Bakery,Café,Art Gallery,Coffee Shop,Historic Site,Seafood Restaurant,Restaurant,Pub,Chocolate Shop,Clothing Store
4,M7A,Downtown Toronto,Queen's Park,43.66115,-79.391715,0,Coffee Shop,College Auditorium,Fried Chicken Joint,Sandwich Place,Salad Place,Bar,Metro Station,Café,Donut Shop,Fish & Chips Shop
9,M5B,Downtown Toronto,"Ryerson,Garden District",43.657363,-79.37818,0,Coffee Shop,Café,Sandwich Place,Movie Theater,Middle Eastern Restaurant,Hookah Bar,Burger Joint,Burrito Place,Ice Cream Shop,Electronics Store
15,M5C,Downtown Toronto,St. James Town,43.65121,-79.375481,0,Restaurant,Coffee Shop,Gastropub,BBQ Joint,Hotel,Indian Restaurant,Church,Middle Eastern Restaurant,Café,Food Truck
19,M4E,East Toronto,The Beaches,43.676531,-79.295425,0,Trail,Wine Shop,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
20,M5E,Downtown Toronto,Berczy Park,43.64516,-79.373675,0,Department Store,Concert Hall,Steakhouse,Liquor Store,Italian Restaurant,Breakfast Spot,Beer Bar,Wine Shop,Ethiopian Restaurant,Fish Market
24,M5G,Downtown Toronto,Central Bay Street,43.656091,-79.38493,0,Coffee Shop,Ice Cream Shop,Bakery,Japanese Restaurant,Chinese Restaurant,Poke Place,Bookstore,Spa,Bubble Tea Shop,American Restaurant
25,M6G,Downtown Toronto,Christie,43.668781,-79.42071,0,Café,Grocery Store,Wine Shop,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space
30,M5H,Downtown Toronto,"Adelaide,King,Richmond",43.6497,-79.382582,0,Coffee Shop,Asian Restaurant,Café,Steakhouse,Gym,Bar,Seafood Restaurant,Gastropub,American Restaurant,Salon / Barbershop
31,M6H,West Toronto,"Dovercourt Village,Dufferin",43.665087,-79.438705,0,Bus Line,Convenience Store,Skating Rink,Park,Brazilian Restaurant,Music Venue,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In [61]:
Class_1

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,M5N,Central Toronto,Roselawn,43.711941,-79.41912,1,Home Service,Eastern European Restaurant,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


In [62]:
Class_2

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,M5P,Central Toronto,"Forest Hill North,Forest Hill West",43.694785,-79.414405,2,Business Service,Wine Shop,Electronics Store,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


In [63]:
Class_3

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
83,M4T,Central Toronto,"Moore Park,Summerhill East",43.690685,-79.382946,3,Tennis Court,Wine Shop,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
91,M4W,Downtown Toronto,Rosedale,43.682205,-79.377945,3,Playground,Tennis Court,Food,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant


In [64]:
Class_4

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
67,M4P,Central Toronto,Davisville North,43.712755,-79.388514,4,Breakfast Spot,Wine Shop,Electronics Store,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


### Report

From the clustering result we can find:

1. Class 0 contains 31/36 (86.1%) of the total neigborhoods, these neigborhood have common venues that are related to food and drinks, and almost contain
     Café, coffee shop and restaurants in top 5 common venues.
     
2. There are only one neigborhood in Class 1, 2 and 4: Class 1 with home sevice in the fist top common venue; Class 2 with business sevice in the first top common venue;
    Class 3 with venues related to food/drinks in the first two places, but in the 3rd-5th places are electronics strore, flower shop and fish market.
    
3. There are two neigborhhoods in Class 3, they have venues related to sports in the first common venue.