# Exploring and clustering the neighborhoods in Toronto

## Introduction

The purpose of this lab is to explore and cluster the neighborhoods in Toronto.


## Table of Contents

* 1. Scrape the Wikipedia page [https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M]
* 2. Pre-process the data
* 3. Get the latitude and the longitude coordinates of each neighborhood
* 4. Explore and cluster the neighborhoods in Toronto

### 1. Scraping the Wikipedia page

Importing necessary Libraries

In [1]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

import pandas as pd
import numpy as np

In [2]:
# Page to scrape
url = r"https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

# Run Chrome in headless mode; the raw string is the local path to chromedriver
option = Options()
option.headless = True
driver = webdriver.Chrome(r"C:\Users\ErikaS\Documents\Projetos Érika\Coursera__Capstone/chromedriver", options=option)

# Get the url
driver.get(url)

# Select the correct table by XPath and extract it in HTML format
driver.find_element_by_xpath("/html/body/div[3]/div[3]/div[4]/div/table[1]").click()
element = driver.find_element_by_xpath("/html/body/div[3]/div[3]/div[4]/div/table[1]")
html_content = element.get_attribute('outerHTML')

# Close the browser
driver.quit()


In [3]:
# Parsing the HTML and locating the table
soup = BeautifulSoup(html_content, 'html.parser')
table = soup.find(name = 'table')

# Using pandas to structure the table in a DataFrame
df_full = pd.read_html(str(table))[0]

# First 5 elements
df_full.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### 2. Pre-processing the data

Requirements for creating the dataframe:

* The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
* Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
* More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
* If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
* Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
* In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
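
The rules above can be sketched in pandas as follows. This is a minimal illustration on a small hand-made table, not the notebook's data: the frame `raw`, its rows, and the use of `None` for a missing neighborhood are assumptions chosen to exercise each rule, and the rules are applied in an order that is convenient for pandas.

```python
import pandas as pd

# Hypothetical input: two rows share M5A, and M9A has no neighborhood
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M9A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Etobicoke"],
    "Neighborhood": [None, "Harbourfront", "Regent Park", None],
})

# Keep only rows with an assigned borough
clean = raw[raw["Borough"] != "Not assigned"].copy()

# A missing neighborhood takes the borough name
clean["Neighborhood"] = clean["Neighborhood"].fillna(clean["Borough"])

# One row per postal code, neighborhoods joined by a comma
clean = (
    clean.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
    .agg(", ".join)
)

print(clean)
print(clean.shape)
```

On the scraped table used in this notebook the combining step turns out to be unnecessary, since each postal code already appears once.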

In [4]:
# Boolean mask: rows whose borough is assigned (i.e. not "Not assigned")
isnot_NotAssigned = df_full['Borough'] != "Not assigned"

# Filtering the DataFrame
df = df_full[isnot_NotAssigned]

# First 5 rows
df.head()


Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


The dataframe was filtered to keep only the rows with an assigned borough.

In [5]:
# Checking whether any postal code is associated with more than one row
# First, compare the number of rows with the number of unique postal codes.
# If they are the same, there are no duplicated postal codes, so nothing needs to be combined.
print("Row number: ", df.shape[0])
print("Number of unique Postal Code: ", pd.unique(df['Postal Code']).shape[0])
if df.shape[0] == pd.unique(df['Postal Code']).shape[0]:
    print('There is no Postal Code Area with more than one neighborhood associated')
else:
    print('There is at least one postal Code Area with more than one neighborhood associated. Please combine the neighborhoods in one line.')

Row number:  103
Number of unique Postal Code:  103
There is no Postal Code Area with more than one neighborhood associated


In this case there is no need to modify df, because all the **Postal Code** values are unique. The data is already in the correct format.

In [6]:
# Identifying whether any neighborhood is not assigned

if df['Neighborhood'].isnull().sum() == 0:
    print("There is no NaN in Neighborhood")
else:
    print("There is at least one Neighborhood not assigned. Please assign the borough name to the neighborhood.")


There is no NaN in Neighborhood


We can see that there is no NaN in Neighborhood, so there is no need to fill any neighborhood with its borough name.

In [7]:
# Printing the number of rows
df.shape

(103, 3)

### 3. Getting the coordinates of each neighborhood

In [8]:
# Importing the coordinates dataframe
coord_PC = pd.read_csv('Geospatial_Coordinates.csv')
#coord_PC.head()
coord_PC.dtypes

Postal Code     object
Latitude       float64
Longitude      float64
dtype: object

In [9]:
# Joining the dataframes on the postal code
df = pd.merge(df,
              coord_PC,
              on = 'Postal Code',
              how = 'left')

# First 5 rows
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [10]:
df.shape

(103, 5)
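
A left join keeps rows that find no match and fills their coordinates with NaN, so it can be worth checking the result rather than relying on the row count alone. The sketch below uses small made-up frames (`areas` and `coords` are assumptions, not the notebook's data) together with the `validate` and `indicator` options of `pd.merge`.

```python
import pandas as pd

# Hypothetical inputs: M5A has no entry in the coordinates table
areas = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M5A"],
                      "Borough": ["North York", "North York", "Downtown Toronto"]})
coords = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                       "Latitude": [43.753259, 43.725882],
                       "Longitude": [-79.329656, -79.315572]})

# validate raises if a postal code is duplicated on either side;
# indicator adds a _merge column telling where each row came from
merged = pd.merge(areas, coords, on="Postal Code", how="left",
                  validate="one_to_one", indicator=True)

missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print("Postal codes without coordinates:", missing)
```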

### 4. Exploring and clustering the neighborhoods in Toronto

Just make sure:

* to add enough Markdown cells to explain what you decided to do and to report any observations you make.
* to generate maps to visualize your neighborhoods and how they cluster together.

Counting how many postal code areas belong to each borough

In [11]:
df['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East York            5
East Toronto         5
York                 5
Mississauga          1
Name: Borough, dtype: int64

Filtering the boroughs that contain the word Toronto

In [12]:
in_To = df['Borough'].str.contains(pat = "Toronto")
df_Toronto = df[in_To]
df_Toronto['Borough'].value_counts()

Downtown Toronto    19
Central Toronto      9
West Toronto         6
East Toronto         5
Name: Borough, dtype: int64

Importing Libraries

In [13]:
# for plotting
import matplotlib.cm as cm
import matplotlib.colors as colors

# for clustering
from sklearn.cluster import KMeans

# for creating maps
import folium

# Convert address into coordinates
from geopy.geocoders import Nominatim

# Library to handle JSON files
import json

# Library to handle requests
import requests
from pandas.io.json import json_normalize

**Using geopy library to get the latitude and longitude values of Toronto**

In [14]:

geolocator = Nominatim(user_agent = 'my_explorer', timeout = 3)
location = geolocator.geocode('toronto')
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [15]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_Toronto['Latitude'], 
                                           df_Toronto['Longitude'], df_Toronto['Borough'], df_Toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

Folium maps are not rendered on GitHub, so a static image of the map is shown below.

![Selected Neighborhoods](Neighborhoods_selected.png)

Defining Foursquare Credentials and Version

In [16]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version



Defining a function to explore the venues in all the selected Toronto neighborhoods

In [17]:
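def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues near each neighborhood; the global LIMIT must be set before calling."""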
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
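
The function above indexes straight into the JSON response, so a neighborhood with no results or a venue without a category would raise an exception. A more defensive version of the extraction step might look like the sketch below. The helper name `extract_venues` and the sample payload are hypothetical; only the nesting (`response`, `groups`, `items`, `venue`) is taken from the code above.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        # Fall back to an empty category when the list is missing or empty
        categories = venue.get("categories") or [{}]
        rows.append((
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name", "Unknown"),
        ))
    return rows


# Hypothetical payload: one complete venue and one without a category
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tandem Coffee",
               "location": {"lat": 43.653559, "lng": -79.361809},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 43.65, "lng": -79.36},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

Whether "Unknown" is the right fallback depends on the analysis; dropping such venues would be an equally reasonable choice.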

Running the above function on each neighborhood, with limit = 50 and radius = 500

In [18]:
LIMIT = 50
radius = 500

toronto_venues = getNearbyVenues(names=df_Toronto['Neighborhood'],
                                 latitudes=df_Toronto['Latitude'],
                                 longitudes=df_Toronto['Longitude'],
                                 radius=radius)

In [19]:
print("Size of the resulting dataframe: ", toronto_venues.shape)
toronto_venues.head()

Size of the resulting dataframe:  (1190, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


Let's check how many venues were returned for each neighborhood

In [84]:
toronto_venues.groupby('Neighborhood').count().head()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,50,50,50,50,50,50
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",17,17,17,17,17,17
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15
Central Bay Street,50,50,50,50,50,50


Finding out how many unique categories can be curated from all the returned venues

In [21]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 224 unique categories.


Analyzing each Neighborhood

In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Examining the new dataframe size

In [23]:
toronto_onehot.shape

(1190, 224)
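
The one-hot frame has 224 columns, the same as the number of unique categories, even though a `Neighborhood` column was added to it. One possible explanation, stated here as an assumption rather than something the notebook confirms, is that one venue category is itself called `Neighborhood`, so the assignment overwrote that dummy column instead of appending a new one. The sketch below reproduces the effect on made-up data and shows one way to avoid it, by keeping the key under a name no category uses (`Area` is an illustrative name).

```python
import pandas as pd

# Hypothetical venues: one of the categories is literally "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Cafe", "Neighborhood", "Park"],
})

dummies = pd.get_dummies(venues["Venue Category"])

# Naive approach: the assignment replaces the 'Neighborhood' dummy column
naive = dummies.copy()
naive["Neighborhood"] = venues["Neighborhood"]
print(naive.shape)  # the column count does not grow

# Safer approach: add the key under a distinct name
safe = pd.concat([venues["Neighborhood"].rename("Area"), dummies], axis=1)
print(safe.shape)   # key column plus every category
```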

Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category

In [85]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Transportation Service,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.066667,0.133333,0.2,0.066667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0


Confirming the new size

In [25]:
toronto_grouped.shape

(39, 224)
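
Because each row of the one-hot frame marks exactly one category, the per-neighborhood mean of a column is the share of that neighborhood's venues falling in that category. A small made-up example illustrates this (the frame `onehot` and its values are assumptions for illustration only):

```python
import pandas as pd

# Neighborhood A has 3 cafes and 1 park, neighborhood B has 1 park
onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Cafe": [1, 1, 1, 0, 0],
    "Park": [0, 0, 0, 1, 1],
})

freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1 across the category columns, which is why the values can be read as frequencies.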

Creating a new dataframe and displaying the top 10 venues for each neighborhood

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

#Function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

toronto_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Cheese Shop,Seafood Restaurant,Beer Bar,Bakery,Café,Basketball Stadium,Park
1,"Brockton, Parkdale Village, Exhibition Place",Café,Coffee Shop,Nightclub,Breakfast Spot,Stadium,Bakery,Restaurant,Intersection,Italian Restaurant,Climbing Gym
2,"Business reply mail Processing Centre, South C...",Light Rail Station,Smoke Shop,Auto Workshop,Brewery,Spa,Burrito Place,Farmers Market,Fast Food Restaurant,Restaurant,Butcher
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Airport Terminal,Sculpture Garden,Bar,Coffee Shop
4,Central Bay Street,Coffee Shop,Sandwich Place,Italian Restaurant,Burger Joint,Bubble Tea Shop,Café,Ice Cream Shop,Indian Restaurant,Bar,Spa


### Cluster Neighborhoods

Running k-means to cluster the neighborhoods into 5 clusters.

In [27]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
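
The first ten labels are all 0, which hints that one cluster holds most of the neighborhoods. Counting the labels makes the balance explicit. The array below is a made-up stand-in that uses the cluster sizes reported in the "Examining Clusters" section of this notebook (33, 1, 3, 1, 1); with the fitted model, `kmeans.labels_` would be used instead.

```python
import numpy as np

# Stand-in for kmeans.labels_ (the ordering is illustrative)
labels = np.array([0] * 33 + [1] + [2] * 3 + [3] + [4])

# bincount returns the number of occurrences of each label 0..k-1
sizes = np.bincount(labels)
for cluster, size in enumerate(sizes):
    print(f"Cluster {cluster}: {size} neighborhoods ({size / labels.size:.0%})")
```

A split this uneven may suggest trying other values of `kclusters`, although that choice is outside the scope of this sketch.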

Creating a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [28]:
# add clustering labels
toronto_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_Toronto

# join the top-venue table onto df_Toronto, which already holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(toronto_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Restaurant,Café,Theater,Mexican Restaurant,Shoe Store
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Wings Joint,Creperie,Park,Mexican Restaurant,Italian Restaurant,Hobby Shop,Gym,Fried Chicken Joint,Distribution Center
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Café,Middle Eastern Restaurant,Ramen Restaurant,Cosmetics Shop,Restaurant,Bookstore,Coffee Shop,Theater,Clothing Store,Tea Room
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Café,Coffee Shop,Gastropub,Cosmetics Shop,Creperie,American Restaurant,Cocktail Bar,Restaurant,Beer Bar,Hotel
19,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Trail,Pub,Health Food Store,Wings Joint,Cupcake Shop,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


Visualizing the resulting clusters

In [29]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], 
                                  toronto_merged['Longitude'], 
                                  toronto_merged['Neighborhood'], 
                                  toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color="#070000",
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Folium maps are not rendered on GitHub, so a static image of the map is shown below.

![Clustered Neighborhoods](Neighborhoods_clustered.png)

### Examining Clusters

#### Cluster 0

In [30]:
clus0 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
clus0.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Restaurant,Café,Theater,Mexican Restaurant,Shoe Store
4,Downtown Toronto,0,Coffee Shop,Wings Joint,Creperie,Park,Mexican Restaurant,Italian Restaurant,Hobby Shop,Gym,Fried Chicken Joint,Distribution Center
9,Downtown Toronto,0,Café,Middle Eastern Restaurant,Ramen Restaurant,Cosmetics Shop,Restaurant,Bookstore,Coffee Shop,Theater,Clothing Store,Tea Room
15,Downtown Toronto,0,Café,Coffee Shop,Gastropub,Cosmetics Shop,Creperie,American Restaurant,Cocktail Bar,Restaurant,Beer Bar,Hotel
20,Downtown Toronto,0,Coffee Shop,Cocktail Bar,Restaurant,Cheese Shop,Seafood Restaurant,Beer Bar,Bakery,Café,Basketball Stadium,Park


In [31]:
# Number of neighborhoods in this cluster
print("Number of neighborhoods in Cluster 0: ",clus0.shape[0])

Number of neighborhoods in Cluster 0:  33


In [76]:
a = []
for i in range(2,12):
    a = a + list(clus0[clus0.columns[i]])

counter = np.unique(a, return_counts = True)
pd.DataFrame({'Venues': counter[0], 'Freq' : counter[1]} ).sort_values(by = ['Freq'], ascending = False).head(10)


Unnamed: 0,Venues,Freq
38,Coffee Shop,26
29,Café,24
100,Restaurant,20
76,Italian Restaurant,11
15,Bakery,11
90,Park,8
17,Bar,7
71,Hotel,7
103,Seafood Restaurant,6
66,Gym,6


Cluster 0 consists of 33 neighborhoods, where coffee shops, cafés and restaurants predominate.
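
The loop above concatenates the ten venue columns and counts the values with NumPy. A more compact formulation in pandas is sketched below on a made-up frame (`clus` stands in for `clus0` and has only three venue columns):

```python
import pandas as pd

# Hypothetical stand-in for clus0
clus = pd.DataFrame({
    "Borough": ["Downtown Toronto", "Downtown Toronto"],
    "Cluster Labels": [0, 0],
    "1st Most Common Venue": ["Coffee Shop", "Café"],
    "2nd Most Common Venue": ["Park", "Coffee Shop"],
    "3rd Most Common Venue": ["Bakery", "Restaurant"],
})

# Skip the first two columns, flatten the rest, and count each venue type
freq = (
    clus.iloc[:, 2:]
    .stack()
    .value_counts()
    .rename_axis("Venues")
    .reset_index(name="Freq")
)
print(freq.head(10))
```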

#### Cluster 1

In [71]:
clus1 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
clus1.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Central Toronto,1,Pool,Garden,Wings Joint,Cupcake Shop,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store


In [72]:
# Number of neighborhoods in this cluster
print("Number of neighborhoods in Cluster 1: ",clus1.shape[0])

Number of neighborhoods in Cluster 1:  1


Cluster 1 consists of a single neighborhood whose most common venues are a pool and a garden, that is, places to relax and spend leisure time.

#### Cluster 2

In [73]:
clus2 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
clus2.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,East Toronto,2,Trail,Pub,Health Food Store,Wings Joint,Cupcake Shop,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
83,Central Toronto,2,Park,Playground,Summer Camp,Restaurant,College Rec Center,Cupcake Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
91,Downtown Toronto,2,Park,Playground,Trail,Cupcake Shop,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store


In [74]:
# Number of neighborhoods in this cluster
print("Number of neighborhoods in Cluster 2: ",clus2.shape[0])

Number of neighborhoods in Cluster 2:  3


In [75]:
a = []
for i in range(2,12):
    a = a + list(clus2[clus2.columns[i]])

counter = np.unique(a, return_counts = True)
pd.DataFrame({'Venues': counter[0], 'Freq' : counter[1]} ).sort_values(by = ['Freq'], ascending = False).head(10)


Unnamed: 0,Venues,Freq
1,Cupcake Shop,3
3,Distribution Center,3
4,Dog Run,3
5,Doner Restaurant,3
2,Discount Store,2
6,Donut Shop,2
7,Dumpling Restaurant,2
9,Park,2
10,Playground,2
14,Trail,2


Cluster 2 consists of 3 neighborhoods. Their leading venues are parks, playgrounds and trails; the table also lists distribution centers and a few restaurants.

#### Cluster 3

In [80]:
clus3 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
clus3.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Central Toronto,3,Jewelry Store,Trail,Mexican Restaurant,Sushi Restaurant,Wings Joint,Deli / Bodega,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


In [81]:
# Number of neighborhoods in this cluster
print("Number of neighborhoods in Cluster 3: ",clus3.shape[0])

Number of neighborhoods in Cluster 3:  1


Cluster 3 consists of a single neighborhood, led by a jewelry store, a trail, and Mexican and sushi restaurants.

#### Cluster 4

In [82]:
clus4 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
clus4.head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,Central Toronto,4,Park,Swim School,Bus Line,Wings Joint,Dance Studio,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


In [83]:
# Number of neighborhoods in this cluster
print("Number of neighborhoods in Cluster 4: ",clus4.shape[0])

Number of neighborhoods in Cluster 4:  1


Cluster 4 consists of a single neighborhood whose most common venues include a park, a swim school and a bus line, along with some restaurants.

## Thank you