# Week 3 Assignment

This notebook is for the Week 3 assignment of the Data Science Capstone Project. It includes the following:

1. A Markdown cell explaining the contents of the notebook
2. Code that loads data from the Wikipedia page, cleans it and converts it into a pandas data frame
3. Code that prints the shape of the data frame


#### Install Required Libraries

In [58]:
#!pip3 install requests
print("Requests installed...")
#!pip3 install BeautifulSoup4
print("BeautifulSoup installed...")
#!pip3 install geopy
print("Geopy installed...")


Requests installed...
BeautifulSoup installed...
Geopy installed...


#### Import Required Libraries

In [59]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows',None)
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import requests
from bs4 import BeautifulSoup
import numpy as np
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0

#### The code below uses the requests library to fetch the Wikipedia page and assigns the result to the variable `response`. A status code of 200 indicates that the page was received successfully.

In [60]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(url)
response.status_code

200

#### Create a helper method to parse the table and generate a data frame

In [61]:
def parse_table(table):
    n_cols = 0
    n_rows = 0
    col_names = []
    #Determine number of rows and columns in table
    for row in table.find_all('tr'):
        # Determine the number of rows in table
        td_tags = row.find_all('td')
        if len(td_tags) > 0:
            n_rows += 1
            if n_cols == 0:
                # set the number of columns for table
                n_cols = len(td_tags)
        # Determine column names
        th_tags = row.find_all('th')
        if len(th_tags) > 0 and len(col_names) == 0:
            for th in th_tags:
                col_names.append(th.get_text().strip())
    # Create data frame from the table.
    columns = col_names if len(col_names) > 0 else range(0,n_cols)
    df = pd.DataFrame(columns = columns, index = range(0,n_rows))
    row_marker = 0
    for row in table.find_all('tr'):
        col_marker = 0
        columns = row.find_all('td')
        for column in columns:
            df.iat[row_marker,col_marker] = column.get_text().strip().replace("/",",")
            col_marker +=1
        if len(columns) > 0:
            row_marker +=1
    return df
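
The same row-by-row idea can also be sketched more compactly by collecting the cells into lists and building the data frame once at the end. This is only an illustrative variant, not the notebook's helper: the function name `table_to_frame` and the inline `SAMPLE_HTML` table are assumptions made up for the example, and the explicit `'html.parser'` backend is a choice made here so the sketch does not depend on which parsers are installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical miniature of the Wikipedia table, used only for illustration.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Postal code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td></td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Regent Park / Harbourfront</td></tr>
</table>
"""

def table_to_frame(table):
    """Collect header and data cells row by row, then build the frame once."""
    header, rows = [], []
    for tr in table.find_all("tr"):
        th_cells = tr.find_all("th")
        td_cells = tr.find_all("td")
        # The first row that contains <th> cells supplies the column names.
        if th_cells and not header:
            header = [th.get_text(strip=True) for th in th_cells]
        # Every row that contains <td> cells becomes one data row.
        if td_cells:
            rows.append([td.get_text(strip=True).replace("/", ",") for td in td_cells])
    return pd.DataFrame(rows, columns=header or None)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
sample_df = table_to_frame(soup.find("table", {"class": "wikitable"}))
# Same filter as in the notebook: drop postal codes without a borough.
sample_df = sample_df[sample_df["Borough"] != "Not assigned"]
print(sample_df)
```

Building the frame from a list of rows avoids writing cell by cell with `df.iat`, which tends to be slower on larger tables.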

#### The code below extracts the table from the web page received in the response above and calls the helper method to generate a data frame.
#### In addition, it filters out the postal codes for which the Borough is 'Not assigned'.

In [62]:
soup = BeautifulSoup(response.text,'html')
tables = soup.find('table', {'class':'wikitable'})
table_df = parse_table(tables)
table_df = table_df[table_df.Borough != 'Not assigned']
table_df.head(103)


Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park , Harbourfront"
5,M6A,North York,"Lawrence Manor , Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government"
8,M9A,Etobicoke,Islington Avenue
9,M1B,Scarborough,"Malvern , Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill , Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


#### Print the shape of the data frame after cleanup. This answers the first question in the assignment.

In [63]:
table_df.shape

(103, 3)

#### Use a CSV file to load the coordinates and create a merged data frame. This answers the second question in the assignment.

In [64]:
geo_df = pd.read_csv('Geospatial_Coordinates.csv')
geo_df = geo_df.rename(columns = {"Postal Code": "Postal code"})
geo_df.head()
final_table_df = pd.merge(table_df,geo_df, on = "Postal code")
final_table_df.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park , Ontario Provincial Government",43.662301,-79.389494
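
`pd.merge` performs an inner join by default, so a postal code with no coordinates in the CSV would be dropped silently. A minimal sketch of how that could be checked, using toy stand-ins for `table_df` and `geo_df` (the frames `codes` and `coords` and the postal code `M9Z` are made up for illustration):

```python
import pandas as pd

# Toy stand-ins for table_df and geo_df; the third postal code is hypothetical.
codes = pd.DataFrame({
    "Postal code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every postal code, indicator=True adds a "_merge" column
# recording whether a match was found, and validate raises if a key repeats.
merged = pd.merge(codes, coords, on="Postal code", how="left",
                  indicator=True, validate="one_to_one")
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal code"].tolist()
print(unmatched)
```

An empty `unmatched` list would confirm that the inner join used in the notebook loses no rows.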


In [65]:
print ('The dataframe has {} boroughs and {} neighborhoods'.format(len(final_table_df['Borough'].unique()), final_table_df.shape[0]))

The dataframe has 10 boroughs and 103 neighborhoods


#### Get the coordinates of Toronto area.


In [103]:
address = 'Toronto, ON'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of Toronto are {},{}'.format(latitude,longitude))

The geographical coordinate of Toronto are 43.6534817,-79.3839347


#### Create a map of Toronto with the neighborhoods superimposed as circle markers

In [104]:
map_toronto = folium.Map(location=[latitude,longitude],zoom_start=10)
for lat,lng,borough, neighborhood in zip(final_table_df['Latitude'], final_table_df['Longitude'],final_table_df['Borough'], final_table_df['Neighborhood']):
    label = '{},{}'.format(neighborhood,borough)
    label = folium.Popup(label,parse_html=True)
    # parse_html is an option of folium.Popup (set above), so it is not passed to CircleMarker
    folium.CircleMarker([lat,lng],radius=5,popup=label,color='blue',fill=True,fill_color='#3286bb',fill_opacity=0.7).add_to(map_toronto)

map_toronto

#### Since some of the postal codes have multiple neighborhoods, I decided to explore each neighborhood in each postal code. To do that, I had to create a new data frame with one row per neighborhood.

The code below generates a new data frame, restricted to Downtown Toronto, in which postal codes may repeat but each row holds a single neighborhood.

In [219]:
downtown_data = final_table_df[final_table_df['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_data.head()
downtown_data_temp = pd.DataFrame(downtown_data.Neighborhood.str.split(',').tolist(),index=downtown_data['Postal code']).stack()
downtown_data_temp = downtown_data_temp.reset_index([0,'Postal code'])
downtown_data_temp.columns=['Postal code', 'Neighborhood']
final_table_df_temp = pd.merge(downtown_data_temp,downtown_data, on = "Postal code")
downtown_data_temp
del final_table_df_temp['Neighborhood_y']
final_table_df_temp = final_table_df_temp.rename(columns={'Neighborhood_x':'Neighborhood'})
downtown_data = final_table_df_temp
downtown_data

Unnamed: 0,Postal code,Neighborhood,Borough,Latitude,Longitude
0,M5A,Regent Park,Downtown Toronto,43.65426,-79.360636
1,M5A,Harbourfront,Downtown Toronto,43.65426,-79.360636
2,M7A,Queen's Park,Downtown Toronto,43.662301,-79.389494
3,M7A,Ontario Provincial Government,Downtown Toronto,43.662301,-79.389494
4,M5B,Garden District,Downtown Toronto,43.657162,-79.378937
5,M5B,Ryerson,Downtown Toronto,43.657162,-79.378937
6,M5C,St. James Town,Downtown Toronto,43.651494,-79.375418
7,M5E,Berczy Park,Downtown Toronto,43.644771,-79.373306
8,M5G,Central Bay Street,Downtown Toronto,43.657952,-79.387383
9,M6G,Christie,Downtown Toronto,43.669542,-79.422564
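
As an alternative sketch of the same one-row-per-neighborhood reshaping, `Series.str.split` combined with `DataFrame.explode` (available in pandas 0.25 and later) keeps the other columns attached, so no second merge is needed. The frame `toy` below is a made-up two-row stand-in for `downtown_data`; stripping the whitespace left around each name after the split is an addition of this sketch.

```python
import pandas as pd

# Two-row stand-in for downtown_data, for illustration only.
toy = pd.DataFrame({
    "Postal code": ["M5A", "M5C"],
    "Neighborhood": ["Regent Park , Harbourfront", "St. James Town"],
    "Latitude": [43.65426, 43.651494],
})

exploded = (
    toy.assign(Neighborhood=toy["Neighborhood"].str.split(","))  # string -> list of names
       .explode("Neighborhood")                                   # one row per list element
       .reset_index(drop=True)
)
# Remove the spaces (or newlines) left around each name by the split.
exploded["Neighborhood"] = exploded["Neighborhood"].str.strip()
print(exploded)
```

Stripping matters for later steps such as `groupby('Neighborhood')`, where `' Harbourfront'` and `'Harbourfront'` would otherwise count as different keys.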


#### Create a map of Downtown Toronto

In [220]:
address = 'Downtown Toronto, ON'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of Toronto are {},{}'.format(latitude,longitude))
map_downtown_toronto = folium.Map([latitude,longitude],zoom_start=11)
for lat,lng,borough, neighborhood in zip(downtown_data['Latitude'], downtown_data['Longitude'],downtown_data['Borough'], downtown_data['Neighborhood']):
    label = '{},{}'.format(neighborhood,borough)
    label = folium.Popup(label,parse_html=True)
    # parse_html is an option of folium.Popup (set above), so it is not passed to CircleMarker
    folium.CircleMarker([lat,lng],radius=5,popup=label,color='blue',fill=True,fill_color='#3286bb',fill_opacity=0.7).add_to(map_downtown_toronto)

map_downtown_toronto


The geographical coordinate of Toronto are 43.6563221,-79.3809161


#### Declaring Foursquare credentials. The credentials are hidden for security

In [221]:
CLIENT_ID = ''
CLIENT_SECRET = ''
VERSION = '20180605'


#### The code below defines a helper method that fetches up to 100 venues per neighborhood through the Foursquare API

In [222]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API per request
radius = 500 # search radius in meters (same as the function default below)
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # keep only the relevant information for each nearby venue
        venues_list.append([(name,lat,lng,v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in results])
    # build the data frame once, after all neighborhoods have been queried
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood','Neighborhood Latitude','Neighborhood Longitude','Venue','Venue Latitude','Venue Longitude','Venue Category']
    return nearby_venues
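
The indexing chain used on the JSON response assumes that every level is present and that every venue has at least one category. A more defensive way to flatten one response could look like the sketch below. The function name `extract_venue_rows` and the `fake_payload` dictionary are hypothetical; the payload only mimics the keys that the helper above reads and is not real API output.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style JSON payload into a list of tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # Fall back to an empty category when the list is missing or empty.
        categories = venue.get("categories") or [{}]
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0].get("name")))
    return rows

# Made-up payload with the same nesting that the helper above indexes into.
fake_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Cafe",
                   "location": {"lat": 43.6535, "lng": -79.3620},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Uncategorized Place",
                   "location": {"lat": 43.6540, "lng": -79.3610},
                   "categories": []}},
    ]}]}
}

rows = extract_venue_rows(fake_payload, "Regent Park", 43.65426, -79.360636)
empty = extract_venue_rows({}, "Regent Park", 43.65426, -79.360636)
print(rows)
print(empty)
```

With this shape, a neighborhood that returns no groups yields an empty list instead of raising an `IndexError`, and a venue without a category gets `None`.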

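#### Call the helper method above to get the venues for the Downtown Toronto neighborhoods. The variables keep a `north_york_` prefix, but the data is for Downtown Toronto.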

In [223]:
north_york_venues = getNearbyVenues(names=downtown_data['Neighborhood'],latitudes=downtown_data['Latitude'],longitudes=downtown_data['Longitude'])
print(north_york_venues.shape)
north_york_venues.head()

Regent Park 
 Harbourfront
Queen's Park 
 Ontario Provincial Government
Garden District
 Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond 
 Adelaide 
 King
Harbourfront East 
 Union Station 
 Toronto Islands
Toronto Dominion Centre 
 Design Exchange
Commerce Court 
 Victoria Hotel
University of Toronto 
 Harbord
Kensington Market 
 Chinatown 
 Grange Park
CN Tower 
 King and Spadina 
 Railway Lands 
 Harbourfront West 
 Bathurst
 Quay 
 South Niagara 
 Island airport
Rosedale
Stn A PO Boxes
St. James Town 
 Cabbagetown
First Canadian Place 
 Underground city
Church and Wellesley
(2389, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Regent Park,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Regent Park,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Regent Park,43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,Regent Park,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Regent Park,43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot


In [224]:
north_york_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,97,97,97,97,97,97
Bathurst\n Quay,14,14,14,14,14,14
Cabbagetown,43,43,43,43,43,43
Chinatown,62,62,62,62,62,62
Design Exchange,100,100,100,100,100,100
Grange Park,62,62,62,62,62,62
Harbord,35,35,35,35,35,35
Harbourfront,49,49,49,49,49,49
Harbourfront West,14,14,14,14,14,14
Island airport,14,14,14,14,14,14


In [225]:
print('There are {} unique Categories'.format(len(north_york_venues['Venue Category'].unique())))

There are 202 unique Categories


In [226]:
north_york_onehot = pd.get_dummies(north_york_venues[['Venue Category']], prefix="",prefix_sep="")
north_york_onehot['Neighborhood'] = north_york_venues['Neighborhood']
fixed_columns = [north_york_onehot.columns[-1]] + list(north_york_onehot.columns[:-1])
north_york_onehot = north_york_onehot[fixed_columns]
north_york_grouped = north_york_onehot.groupby('Neighborhood').mean().reset_index()
north_york_grouped.shape


(39, 202)
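
To make the one-hot step concrete: averaging the indicator columns per neighborhood gives the share of that neighborhood's venues falling into each category. A small sketch on made-up data (the neighborhoods `A` and `B` and their venues are purely illustrative):

```python
import pandas as pd

# Made-up venues: four in neighborhood A, two in neighborhood B.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Bakery", "Park", "Park"],
})

# One indicator column per category; dtype=float keeps the mean well defined.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of 0/1 indicators is the relative frequency of each category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Here half of the venues in `A` are coffee shops, so its `Coffee Shop` value is 0.5, while every venue in `B` is a park. These frequency vectors are what the clustering step later compares.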

#### The code below is the helper method to get the most common venues

In [227]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
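
A quick way to see what such a helper returns is to apply the same sort-and-slice idea to a tiny frequency table. The frame `freq` and the function `top_categories` below are illustrative stand-ins rather than the notebook's objects; the values are chosen without ties, since tied categories may come out in either order unless a stable sort is requested.

```python
import pandas as pd

# Made-up category frequencies, indexed by neighborhood (no ties on purpose).
freq = pd.DataFrame(
    {"Bakery": [0.2, 0.1], "Coffee Shop": [0.5, 0.0], "Park": [0.3, 0.9]},
    index=pd.Index(["A", "B"], name="Neighborhood"),
)

def top_categories(row, n):
    """Return the n category names with the highest frequency in this row."""
    return row.sort_values(ascending=False).index[:n].tolist()

top2 = {name: top_categories(row, 2) for name, row in freq.iterrows()}
print(top2)
```

Because the neighborhood is the index here, no leading column has to be skipped, which is what `row.iloc[1:]` does in the helper above.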

#### Get the 10 most common venues for each neighborhood

In [228]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 'rd' fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = north_york_grouped['Neighborhood']
for ind in np.arange(north_york_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(north_york_grouped.iloc[ind, :],num_top_venues)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,Gym,Restaurant,Hotel,Deli / Bodega,Thai Restaurant,Salad Place,Cosmetics Shop,Concert Hall
1,Bathurst\n Quay,Airport Terminal,Airport Lounge,Airport Service,Coffee Shop,Boat or Ferry,Boutique,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court
2,Cabbagetown,Coffee Shop,Restaurant,Pub,Pizza Place,Italian Restaurant,Bakery,Café,Chinese Restaurant,Breakfast Spot,Playground
3,Chinatown,Café,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Bar,Chinese Restaurant,Grocery Store,Dessert Shop,Burger Joint
4,Design Exchange,Coffee Shop,Hotel,Café,Restaurant,Seafood Restaurant,Italian Restaurant,American Restaurant,Japanese Restaurant,Gastropub,Salad Place
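
The column labels above rely on a three-element suffix list plus an exception for everything else, which is enough for ten columns. If more columns were ever needed, a small helper could generate the suffix directly. The function `ordinal` is a hypothetical addition, not part of the notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this yields the same labels as the loop in the notebook.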


#### Use k-means to group the neighborhoods into 5 clusters

In [229]:
kclusters = 5
north_york_grouped_clustering = north_york_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(north_york_grouped_clustering)
# check cluster labels generated for the first 10 rows
kmeans.labels_[0:10]

array([1, 2, 1, 3, 1, 3, 3, 1, 2, 2])
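
To illustrate what the label array means, here is a minimal sketch of k-means on four made-up frequency vectors that form two obvious groups. The matrix `X` is invented for the example, and `n_init` is set explicitly because its default has changed between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four made-up "neighborhoods" described by two category frequencies each.
X = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
toy_labels = toy_kmeans.labels_
print(toy_labels)
```

The labels are arbitrary integers starting at 0: rows with the same label belong to the same cluster, but the numbers carry no ranking. This is why a label of 0 can correspond to what the text calls the first cluster.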

In [230]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
north_york_merged = downtown_data
# merge
north_york_merged = north_york_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
north_york_merged = north_york_merged.dropna()
north_york_merged['Cluster Labels'] = north_york_merged['Cluster Labels'].apply(np.int64)
north_york_merged

Unnamed: 0,Postal code,Neighborhood,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Regent Park,Downtown Toronto,43.65426,-79.360636,1,Coffee Shop,Bakery,Park,Pub,Mexican Restaurant,Theater,Café,Restaurant,Breakfast Spot,Yoga Studio
1,M5A,Harbourfront,Downtown Toronto,43.65426,-79.360636,1,Coffee Shop,Bakery,Park,Pub,Mexican Restaurant,Theater,Café,Restaurant,Breakfast Spot,Yoga Studio
2,M7A,Queen's Park,Downtown Toronto,43.662301,-79.389494,0,Coffee Shop,Diner,Gym,Music Venue,Mexican Restaurant,Juice Bar,Italian Restaurant,Hobby Shop,Fried Chicken Joint,Distribution Center
3,M7A,Ontario Provincial Government,Downtown Toronto,43.662301,-79.389494,0,Coffee Shop,Diner,Gym,Music Venue,Mexican Restaurant,Juice Bar,Italian Restaurant,Hobby Shop,Fried Chicken Joint,Distribution Center
4,M5B,Garden District,Downtown Toronto,43.657162,-79.378937,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Tea Room,Lingerie Store
5,M5B,Ryerson,Downtown Toronto,43.657162,-79.378937,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Tea Room,Lingerie Store
6,M5C,St. James Town,Downtown Toronto,43.651494,-79.375418,1,Coffee Shop,Café,Cocktail Bar,Beer Bar,Restaurant,Hotel,American Restaurant,Gym,Park,Breakfast Spot
7,M5E,Berczy Park,Downtown Toronto,43.644771,-79.373306,1,Coffee Shop,Café,Cheese Shop,Farmers Market,Bakery,Restaurant,Italian Restaurant,Seafood Restaurant,Beer Bar,Cocktail Bar
8,M5G,Central Bay Street,Downtown Toronto,43.657952,-79.387383,1,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Salad Place,Ice Cream Shop,Sushi Restaurant,Japanese Restaurant,Spa,Burger Joint
9,M6G,Christie,Downtown Toronto,43.669542,-79.422564,3,Grocery Store,Café,Park,Coffee Shop,Candy Store,Italian Restaurant,Diner,Restaurant,Baby Store,Athletics & Sports


#### Create a cluster map of Downtown Toronto with one marker per neighborhood

In [231]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
for lat, lon, poi, cluster in zip(north_york_merged['Latitude'], north_york_merged['Longitude'],north_york_merged['Neighborhood'], north_york_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # cluster - 1 is kept from the original: label 0 maps to index -1, i.e. the last color
    color = rainbow[int(cluster) - 1]
    folium.CircleMarker([lat, lon],radius=5,popup=label,color=color,fill=True,fill_color=color,fill_opacity=0.7).add_to(map_clusters)
map_clusters
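
The color scheme boils down to sampling one evenly spaced color per cluster from matplotlib's `rainbow` colormap and converting it to a hex string. A standalone sketch of just that step, where the names `palette` and `color_for` are made up for illustration and the labels are used directly as list indices:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One RGBA color per cluster, evenly spaced along the rainbow colormap.
rgba = cm.rainbow(np.linspace(0, 1, kclusters))
palette = [colors.rgb2hex(c) for c in rgba]

def color_for(cluster_label):
    """Map a k-means label (0 .. kclusters-1) to its hex color."""
    return palette[int(cluster_label)]

print(palette)
print(color_for(0), color_for(kclusters - 1))
```

Indexing with the label itself, rather than the label minus one, only changes which color each cluster receives; every cluster still gets its own color.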

#### From the cluster data we can see that in cluster 1 (label 0), which consists of Queen's Park and Ontario Provincial Government, the most common venue is Coffee Shop, followed by Diner and Gym.

In [233]:
#Cluster 1
north_york_merged.loc[north_york_merged['Cluster Labels'] == 0, north_york_merged.columns[[1]+list(range(5,north_york_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Queen's Park,0,Coffee Shop,Diner,Gym,Music Venue,Mexican Restaurant,Juice Bar,Italian Restaurant,Hobby Shop,Fried Chicken Joint,Distribution Center
3,Ontario Provincial Government,0,Coffee Shop,Diner,Gym,Music Venue,Mexican Restaurant,Juice Bar,Italian Restaurant,Hobby Shop,Fried Chicken Joint,Distribution Center


#### For cluster 2 (label 1) the most common venue is again Coffee Shop, but it is interesting that this cluster includes neighborhoods with other venues such as Bakery, Clothing Store, Aquarium and so on.

In [234]:
#Cluster 2
north_york_merged.loc[north_york_merged['Cluster Labels'] == 1, north_york_merged.columns[[1]+list(range(5,north_york_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Regent Park,1,Coffee Shop,Bakery,Park,Pub,Mexican Restaurant,Theater,Café,Restaurant,Breakfast Spot,Yoga Studio
1,Harbourfront,1,Coffee Shop,Bakery,Park,Pub,Mexican Restaurant,Theater,Café,Restaurant,Breakfast Spot,Yoga Studio
4,Garden District,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Tea Room,Lingerie Store
5,Ryerson,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Japanese Restaurant,Italian Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Tea Room,Lingerie Store
6,St. James Town,1,Coffee Shop,Café,Cocktail Bar,Beer Bar,Restaurant,Hotel,American Restaurant,Gym,Park,Breakfast Spot
7,Berczy Park,1,Coffee Shop,Café,Cheese Shop,Farmers Market,Bakery,Restaurant,Italian Restaurant,Seafood Restaurant,Beer Bar,Cocktail Bar
8,Central Bay Street,1,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Salad Place,Ice Cream Shop,Sushi Restaurant,Japanese Restaurant,Spa,Burger Joint
10,Richmond,1,Coffee Shop,Café,Gym,Restaurant,Hotel,Deli / Bodega,Thai Restaurant,Salad Place,Cosmetics Shop,Concert Hall
11,Adelaide,1,Coffee Shop,Café,Gym,Restaurant,Hotel,Deli / Bodega,Thai Restaurant,Salad Place,Cosmetics Shop,Concert Hall
12,King,1,Coffee Shop,Café,Gym,Restaurant,Hotel,Deli / Bodega,Thai Restaurant,Salad Place,Cosmetics Shop,Concert Hall


#### Cluster 3 (label 2) contains the airport, so most of the top venues are related to airport activity, such as terminals, lounges and coffee shops.

In [235]:
#Cluster 3
north_york_merged.loc[north_york_merged['Cluster Labels'] == 2, north_york_merged.columns[[1]+list(range(5,north_york_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,CN Tower,2,Airport Terminal,Airport Lounge,Airport Service,Coffee Shop,Boat or Ferry,Boutique,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court
26,King and Spadina,2,Airport Terminal,Airport Lounge,Airport Service,Coffee Shop,Boat or Ferry,Boutique,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court
27,Railway Lands,2,Airport Terminal,Airport Lounge,Airport Service,Coffee Shop,Boat or Ferry,Boutique,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court
28,Harbourfront West,2,Airport Terminal,Airport Lounge,Airport Service,Coffee Shop,Boat or Ferry,Boutique,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court
29,Bathurst\n Quay,2,Airport Terminal,Airport Lounge,Airport Service,Coffee Shop,Boat or Ferry,Boutique,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court
30,South Niagara,2,Airport Terminal,Airport Lounge,Airport Service,Coffee Shop,Boat or Ferry,Boutique,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court
31,Island airport,2,Airport Terminal,Airport Lounge,Airport Service,Coffee Shop,Boat or Ferry,Boutique,Harbor / Marina,Sculpture Garden,Airport,Airport Food Court


#### Cluster 4 (label 3) has grocery stores, cafés, parks and restaurants, which could mean that it contains residential areas.

In [236]:
#Cluster 4
north_york_merged.loc[north_york_merged['Cluster Labels'] == 3, north_york_merged.columns[[1]+list(range(5,north_york_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Christie,3,Grocery Store,Café,Park,Coffee Shop,Candy Store,Italian Restaurant,Diner,Restaurant,Baby Store,Athletics & Sports
20,University of Toronto,3,Café,Bookstore,Bar,Italian Restaurant,Japanese Restaurant,Bakery,Restaurant,Yoga Studio,Beer Bar,Beer Store
21,Harbord,3,Café,Bookstore,Bar,Italian Restaurant,Japanese Restaurant,Bakery,Restaurant,Yoga Studio,Beer Bar,Beer Store
22,Kensington Market,3,Café,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Bar,Chinese Restaurant,Grocery Store,Dessert Shop,Burger Joint
23,Chinatown,3,Café,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Bar,Chinese Restaurant,Grocery Store,Dessert Shop,Burger Joint
24,Grange Park,3,Café,Coffee Shop,Mexican Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Bar,Chinese Restaurant,Grocery Store,Dessert Shop,Burger Joint


#### The top venues in cluster 5 (label 4) are parks, trails, playgrounds, etc., which again suggests that this cluster contains a residential area.

In [237]:
#Cluster 5
north_york_merged.loc[north_york_merged['Cluster Labels'] == 4, north_york_merged.columns[[1]+list(range(5,north_york_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Rosedale,4,Park,Trail,Playground,Women's Store,Deli / Bodega,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
