Problem I:
Use the notebook to build the code to scrape the following Wikipedia page,
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M,
in order to obtain the data in the table of postal codes and transform it into a pandas dataframe.

In [2]:
import os
import urllib.request                  # open URLs
import requests                        # HTTP requests to the Foursquare API

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup          # parse HTML and XML documents
from geopy.geocoders import Nominatim  # address -> coordinates
import folium                          # map rendering library
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

# transform JSON into a pandas dataframe; in pandas >= 1.0 this replaces
# the deprecated pandas.io.json.json_normalize
from pandas import json_normalize


In [3]:
# specify which URL/web page we are going to be scraping
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [4]:
# open the url using urllib.request and put the HTML into the page variable
page = urllib.request.urlopen(url)

In [5]:
# parse the HTML into a BeautifulSoup tree (the "lxml" parser must be installed);
# uncomment the print below to inspect the page's underlying HTML
soup = BeautifulSoup(page, "lxml")
#print(soup.prettify())

In [6]:
# at the time of scraping, the postal code table is the first <table> on the page
table = soup.find('table')

In [7]:
# create an empty dataframe with the three columns of the Wikipedia table
column_names = ['postalcode', 'borough', 'neighbourhood']
df = pd.DataFrame(columns=column_names)

In [8]:
# copy every table row that has exactly three <td> cells into the dataframe
# (the header row uses <th> cells, so it yields no data and is skipped)
for tr_cell in table.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        df.loc[len(df)] = row_data
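The row filter above (keep `<tr>` rows with exactly three `<td>` cells) can be tried offline on a small snippet. The sketch below swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it has no dependencies; the HTML sample and the names `TableRowParser` and `parse_postal_rows` are made up for illustration and only mimic the shape of the Wikipedia table.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the stripped text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # text outside a <td> (e.g. inside <th>) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_postal_rows(html):
    """Return only rows with exactly 3 cells, mirroring len(row_data) == 3."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if len(row) == 3]


# Hypothetical snippet shaped like the postal code table
SAMPLE = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td></td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Regent Park, <a href="#">Harbourfront</a></td></tr>
</table>
"""

rows = parse_postal_rows(SAMPLE)
print(rows)
```

As with `td_cell.text`, text inside nested tags such as `<a>` is joined into the cell, and the all-`<th>` header row produces an empty list that the length check discards.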

In [9]:
df.head() 

,postalcode,borough,neighbourhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [10]:
# drop the rows whose borough is 'Not assigned'
df=df[df['borough']!='Not assigned'] 

In [11]:
df.head()

,postalcode,borough,neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [12]:
df.shape

(103, 3)

Problem II: 
Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, we need the latitude and longitude coordinates of each neighborhood in order to utilize the Foursquare location data. The coordinates can be obtained with the Geocoder package, or from this CSV file of the geographical coordinates of each postal code: http://cocl.us/Geospatial_data. Use the Geocoder package or the CSV file to create the following dataframe.

In [14]:
# load the geographical coordinates from the CSV file
geo_df=pd.read_csv('http://cocl.us/Geospatial_data')

In [15]:

geo_df.head()

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
# harmonise the column names so the two dataframes can be merged on postalcode
geo_df.rename(columns={'Postal Code':'postalcode', 'Latitude':'latitude', 'Longitude':'longitude'}, inplace=True)
geo_merged = pd.merge(geo_df, df, on='postalcode')
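`pd.merge` defaults to an inner join, so any postal code missing from either frame silently disappears. A quick way to see what would be dropped is an outer merge with `indicator=True`, plus `validate="one_to_one"` to fail loudly on duplicated codes. The two toy frames below are hypothetical stand-ins for `df` and `geo_df`; the coordinates and the code `M9Z` are made up.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped table and the coordinates CSV
scraped = pd.DataFrame({
    "postalcode": ["M3A", "M4A", "M9Z"],        # M9Z is made up and has no coordinates
    "borough": ["North York", "North York", "Nowhere"],
    "neighbourhood": ["Parkwoods", "Victoria Village", "Example"],
})
coords = pd.DataFrame({
    "postalcode": ["M3A", "M4A", "M1B"],        # M1B has no scraped row
    "latitude": [43.75, 43.73, 43.81],          # illustrative values only
    "longitude": [-79.33, -79.31, -79.19],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only';
# validate='one_to_one' raises MergeError if a postal code repeats on either side
check = pd.merge(scraped, coords, on="postalcode", how="outer",
                 indicator=True, validate="one_to_one")
unmatched = check.loc[check["_merge"] != "both", ["postalcode", "_merge"]]
print(unmatched)

# the default inner merge, as used in the notebook, keeps only codes found in both frames
inner = pd.merge(coords, scraped, on="postalcode")
print(inner["postalcode"].tolist())
```

If `unmatched` is empty on the real data, the inner merge lost nothing.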

In [17]:
# put the columns in the order given in the question
geo_merged = geo_merged[['postalcode','borough', 'neighbourhood', 'latitude', 'longitude']] 

In [18]:
geo_merged.head()

,postalcode,borough,neighbourhood,latitude,longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


Problem III: Explore and cluster the neighborhoods in Toronto with Foursquare, working only with the boroughs whose names contain the word Toronto.


In [20]:
# select the neighbourhoods whose borough name contains "Toronto"
toronto_neighbourhoods=geo_merged[geo_merged['borough'].str.contains("Toronto")] 
toronto_neighbourhoods.head()

,postalcode,borough,neighbourhood,latitude,longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [21]:
# count the neighbourhoods (postal codes) left after the filter
toronto_neighbourhoods.shape

(39, 5)

In [22]:
address = 'Toronto, ON, Canada'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [23]:
# 39 neighbourhoods are still a lot; create a map to see how they are distributed
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add a circle marker with a popup label for each neighbourhood
for lat, lng, borough, neighbourhood in zip(toronto_neighbourhoods['latitude'], toronto_neighbourhoods['longitude'], toronto_neighbourhoods['borough'], toronto_neighbourhoods['neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

In [26]:
CLIENT_ID = 'YOUR_CLIENT_ID'          # Foursquare ID (replace with your own)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # Foursquare secret (replace with your own)
VERSION = '20180604'                  # Foursquare API version date

In [27]:
# Assume we know little about Canada, and nothing about Toronto in particular:
# explore the whole area, cluster the neighbourhoods and see what is there

LIMIT = 200     # maximum number of venues requested from the Foursquare API; the counts
                # further down never exceed 100 per neighbourhood, which suggests the
                # explore endpoint caps the results at 100
radius = 1000   # search radius in metres

# create the request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    latitude,
    longitude,
    radius,
    LIMIT)
url  # display the URL

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180604&ll=43.6534817,-79.3839347&radius=1000&limit=200'
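Building the query string with `str.format` works here because every value is plain ASCII, but `urllib.parse.urlencode` escapes each value for you (`requests.get(url, params=...)` does the same encoding). The helper name `build_explore_url` is hypothetical; the parameter names are the ones already used in the URL above, and the placeholder credentials are not real.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a Foursquare v2 'venues/explore' URL, letting urlencode escape each value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # the comma is percent-encoded as %2C
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; the coordinates are the Nominatim result printed earlier
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180604",
                        43.6534817, -79.3839347, 1000, 100)
print(url)

# decoding the query string gives back the original values
print(parse_qs(urlsplit(url).query)["ll"])
```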

In [28]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
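`getNearbyVenues` reads `v['venue']['categories'][0]['name']`, which raises `IndexError` for a venue with an empty `categories` list. The sketch below isolates the flattening step so it can be tested without the API; `flatten_explore_items` is a hypothetical helper, and the sample payload only contains the keys the function above reads (the second venue is made up).

```python
def flatten_explore_items(name, lat, lng, items):
    """Turn explore 'items' into (neighbourhood, lat, lng, venue, venue_lat, venue_lng, category).

    Venues with no category get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hypothetical payload shaped like response['groups'][0]['items']
sample_items = [
    {"venue": {"name": "Glen Manor Ravine",
               "location": {"lat": 43.676821, "lng": -79.293942},
               "categories": [{"name": "Trail"}]}},
    {"venue": {"name": "Uncategorised Spot",  # made-up venue without a category
               "location": {"lat": 43.6770, "lng": -79.2930},
               "categories": []}},
]

for row in flatten_explore_items("The Beaches", 43.676357, -79.293031, sample_items):
    print(row)
```

Inside `getNearbyVenues`, `venues_list.append(flatten_explore_items(name, lat, lng, results))` would replace the list comprehension.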

In [29]:
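# note: this replaces toronto_neighbourhoods (one row per neighbourhood)
# with the venue table returned by getNearbyVenues (one row per venue)
toronto_neighbourhoods = getNearbyVenues(names=toronto_neighbourhoods['neighbourhood'],
                                         latitudes=toronto_neighbourhoods['latitude'],
                                         longitudes=toronto_neighbourhoods['longitude'])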

The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West,  Lawrence Park
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North & West, Forest Hill Road Park
The Annex, North Midtown, Yorkville
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Stn A PO Boxes
First Canadian Place, Underground city
Christie
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High

In [30]:
print(toronto_neighbourhoods.shape)
toronto_neighbourhoods.head()

(3189, 7)


,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,Tori's Bakeshop,43.672114,-79.290331,Vegetarian / Vegan Restaurant
2,The Beaches,43.676357,-79.293031,Beaches Bake Shop,43.680363,-79.289692,Bakery
3,The Beaches,43.676357,-79.293031,The Beech Tree,43.680493,-79.288846,Gastropub
4,The Beaches,43.676357,-79.293031,The Fox Theatre,43.672801,-79.287272,Indie Movie Theater
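To check how far a returned venue lies from its neighbourhood centre (and compare that with `radius = 1000`), a haversine great-circle distance is enough at city scale. This is a standard formula swapped in for illustration, not something the notebook uses; the Earth radius of 6,371 km is the usual mean-radius approximation, and the coordinates are copied from the table above.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# The Beaches centre and Glen Manor Ravine, the first venue returned for it
d = haversine_m(43.676357, -79.293031, 43.676821, -79.293942)
print(f"{d:.0f} m")  # well inside the 1000 m search radius
```

Applied row by row to the venue dataframe, the same function shows whether any venue falls outside the requested radius.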


In [31]:
toronto_neighbourhoods.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,100,100,100,100,100,100
"Brockton, Parkdale Village, Exhibition Place",100,100,100,100,100,100
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",47,47,47,47,47,47
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16
Central Bay Street,100,100,100,100,100,100
Christie,100,100,100,100,100,100
Church and Wellesley,100,100,100,100,100,100
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,100,100,100,100,100,100
Davisville North,100,100,100,100,100,100


In [32]:
#Let's find out how many unique categories can be curated from all the returned venues

print('There are {} unique categories.'.format(len(toronto_neighbourhoods['Venue Category'].unique())))

There are 274 unique categories.


In [33]:
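# analyse each neighbourhood: one-hot encode the venue categories
toronto_onehot = pd.get_dummies(toronto_neighbourhoods[['Venue Category']], prefix="", prefix_sep="")

# add the neighbourhood column back to the dataframe
# (Foursquare also has a venue category named "Neighborhood", so this assignment
#  overwrites that dummy column instead of appending a new one)
toronto_onehot['Neighborhood'] = toronto_neighbourhoods['Neighborhood']

# move the last column to the front; because of the overwrite above, the last
# column is 'Zoo' rather than 'Neighborhood', as the output below shows
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]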

toronto_onehot.head()

,Zoo,Accessories Store,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Tree,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
toronto_onehot.shape

(3189, 274)
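The shape (3189, 274) matches the 274 unique categories rather than 274 + 1, which is consistent with the `'Neighborhood'` assignment overwriting an existing dummy column. The toy example below reproduces the collision and shows one possible fix (renaming the clashing dummy before inserting the key column). The data and the new column name `"Neighborhood (venue)"` are hypothetical.

```python
import pandas as pd

# Hypothetical venue table; one category is literally named "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Café", "Neighborhood", "Zoo"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
print(list(onehot.columns))     # ['Café', 'Neighborhood', 'Zoo']

# What the notebook does: the assignment replaces the dummy column in place,
# so 'Zoo' stays last and is the column moved to the front
clobbered = onehot.copy()
clobbered["Neighborhood"] = venues["Neighborhood"]
print(list(clobbered.columns))  # ['Café', 'Neighborhood', 'Zoo']

# One way around it: rename the clashing dummy, then insert the key column first
safe = onehot.rename(columns={"Neighborhood": "Neighborhood (venue)"})
safe.insert(0, "Neighborhood", venues["Neighborhood"])
print(list(safe.columns))       # ['Neighborhood', 'Café', 'Neighborhood (venue)', 'Zoo']
```

`DataFrame.insert` also refuses to create a duplicate column name by default, so it fails loudly where plain assignment silently overwrites.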

In [35]:
# Next, group the rows by neighbourhood and take the mean of each one-hot column,
# i.e. the frequency of occurrence of each category in that neighbourhood
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()


In [36]:
# check the size: there should be one row per neighbourhood
toronto_grouped.shape

(39, 274)
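The mean of a 0/1 column is simply count divided by the number of rows, so each value in `toronto_grouped` is the share of a neighbourhood's venues that fall in that category. The plain-Python sketch below computes the same numbers from hypothetical (neighbourhood, category) pairs; unlike the one-hot mean, categories a neighbourhood lacks are simply absent instead of 0.0.

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Map neighbourhood -> {category: share of that neighbourhood's venues}."""
    counts = defaultdict(Counter)
    for hood, category in pairs:
        counts[hood][category] += 1
    return {hood: {cat: n / sum(c.values()) for cat, n in c.items()}
            for hood, c in counts.items()}


# Hypothetical (neighbourhood, venue category) pairs
pairs = [("A", "Coffee Shop"), ("A", "Coffee Shop"), ("A", "Park"), ("A", "Bar"),
         ("B", "Park")]
print(category_frequencies(pairs))
# {'A': {'Coffee Shop': 0.5, 'Park': 0.25, 'Bar': 0.25}, 'B': {'Park': 1.0}}
```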

In [37]:
#Let's print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                 venue  freq
0          Coffee Shop  0.12
1                 Café  0.07
2                Hotel  0.04
3  Japanese Restaurant  0.04
4           Restaurant  0.04


----Brockton, Parkdale Village, Exhibition Place----
                    venue  freq
0                    Café  0.07
1              Restaurant  0.06
2             Coffee Shop  0.06
3                     Bar  0.05
4  Furniture / Home Store  0.04


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                venue  freq
0                Park  0.09
1         Pizza Place  0.06
2         Coffee Shop  0.06
3             Brewery  0.06
4  Italian Restaurant  0.04


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0              Café  0.12
1       Coffee Shop  0.12
2   Harbor / Marina  0.12
3  Sushi Restaurant  0.06
4    Scenic Lookout  0.06


----Central Bay Stree

In [38]:
#Let's put that into a pandas dataframe
#First, let's write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
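Several categories often share the same frequency (Hotel, Japanese Restaurant and Restaurant all show 0.04 for Berczy Park), and the default algorithm behind `sort_values()` is not guaranteed to be stable, so the order among ties is not something to rely on. A sketch with an explicit alphabetical tie-break follows; `most_common_venues` is a hypothetical helper, and the frequencies are the rounded values printed above, not the exact ones.

```python
def most_common_venues(freqs, n):
    """Return the n category names with the highest frequency, ties broken alphabetically."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Rounded Berczy Park frequencies from the output above (top five only)
berczy = {"Coffee Shop": 0.12, "Café": 0.07, "Hotel": 0.04,
          "Japanese Restaurant": 0.04, "Restaurant": 0.04}
print(most_common_venues(berczy, 4))
```

With pandas, `row_categories.sort_values(ascending=False, kind="stable")` at least keeps ties in their original column order.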

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues: 1st, 2nd, 3rd, then 4th, 5th, ...
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Café,Hotel,Japanese Restaurant,Restaurant,Park,Beer Bar,Bakery,Seafood Restaurant,Cocktail Bar
1,"Brockton, Parkdale Village, Exhibition Place",Café,Restaurant,Coffee Shop,Bar,Bakery,Furniture / Home Store,Tibetan Restaurant,Gift Shop,Performing Arts Venue,Park
2,"Business reply mail Processing Centre, South C...",Park,Brewery,Coffee Shop,Pizza Place,Sushi Restaurant,Italian Restaurant,Fast Food Restaurant,Skate Park,Liquor Store,Fish & Chips Shop
3,"CN Tower, King and Spadina, Railway Lands, Har...",Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
4,Central Bay Street,Coffee Shop,Café,Park,Art Gallery,Sushi Restaurant,Japanese Restaurant,Yoga Studio,Furniture / Home Store,Hotel,Italian Restaurant
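The `indicators` list plus `try/except` labels 1 to 10 correctly, but if `num_top_venues` were raised above 20 it would produce "21th". A small ordinal helper (hypothetical name `ordinal`) handles the general English rule, including the 11th to 13th exceptions:

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ..., '11th', '12th', '13th', '21st', ..."""
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue"
                              for i in range(1, num_top_venues + 1)]
print(columns[1], "|", columns[-1])
```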


In [40]:
# run k-means to cluster the neighbourhoods into 4 clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the default of scikit-learn before 1.4)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 0, 3, 1, 0, 1, 1, 0, 0])
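Each neighbourhood is clustered by its vector of category frequencies. The idea behind `KMeans` is Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The NumPy sketch below shows only that loop; scikit-learn additionally uses k-means++ initialisation and `n_init` restarts, whereas here the initial centroids are fixed by hand. The two-category data is hypothetical.

```python
import numpy as np


def lloyd_kmeans(X, init_centroids, n_iter=100):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, recompute means, repeat."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid, shape (n_points, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # new centroid = mean of its points (an empty cluster keeps its old centroid)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical [coffee-shop share, park share] vectors for five neighbourhoods
X = np.array([[0.12, 0.01], [0.10, 0.02], [0.11, 0.00],
              [0.01, 0.09], [0.00, 0.12]])
labels, centers = lloyd_kmeans(X, init_centroids=X[[0, 3]])
print(labels)
print(centers)
```

The three coffee-heavy rows end up in one cluster and the two park-heavy rows in the other, which is the same kind of grouping the four clusters below express over 273 categories.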

In [41]:
#Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_neighbourhoods

# join the cluster label and top 10 venues onto every venue row, matching on the neighbourhood name
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
1,The Beaches,43.676357,-79.293031,Tori's Bakeshop,43.672114,-79.290331,Vegetarian / Vegan Restaurant,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
2,The Beaches,43.676357,-79.293031,Beaches Bake Shop,43.680363,-79.289692,Bakery,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
3,The Beaches,43.676357,-79.293031,The Beech Tree,43.680493,-79.288846,Gastropub,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
4,The Beaches,43.676357,-79.293031,The Fox Theatre,43.672801,-79.287272,Indie Movie Theater,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery


In [42]:
# Cluster 1 (label 0)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,43.676357,-79.293942,Trail,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
1,43.676357,-79.290331,Vegetarian / Vegan Restaurant,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
2,43.676357,-79.289692,Bakery,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
3,43.676357,-79.288846,Gastropub,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
4,43.676357,-79.287272,Indie Movie Theater,0,Coffee Shop,Pub,Pizza Place,Beach,Breakfast Spot,Japanese Restaurant,Nail Salon,Bar,Café,Bakery
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3184,43.662744,-79.309945,Coffee Shop,0,Park,Brewery,Coffee Shop,Pizza Place,Sushi Restaurant,Italian Restaurant,Fast Food Restaurant,Skate Park,Liquor Store,Fish & Chips Shop
3185,43.662744,-79.330372,Grocery Store,0,Park,Brewery,Coffee Shop,Pizza Place,Sushi Restaurant,Italian Restaurant,Fast Food Restaurant,Skate Park,Liquor Store,Fish & Chips Shop
3186,43.662744,-79.310174,Breakfast Spot,0,Park,Brewery,Coffee Shop,Pizza Place,Sushi Restaurant,Italian Restaurant,Fast Food Restaurant,Skate Park,Liquor Store,Fish & Chips Shop
3187,43.662744,-79.309898,Bistro,0,Park,Brewery,Coffee Shop,Pizza Place,Sushi Restaurant,Italian Restaurant,Fast Food Restaurant,Skate Park,Liquor Store,Fish & Chips Shop


In [43]:
# Cluster 2 (label 1)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
761,43.679563,-79.375458,Grocery Store,1,Coffee Shop,Grocery Store,Park,Convenience Store,Candy Store,Bistro,Bank,Japanese Restaurant,BBQ Joint,Athletics & Sports
762,43.679563,-79.389367,BBQ Joint,1,Coffee Shop,Grocery Store,Park,Convenience Store,Candy Store,Bistro,Bank,Japanese Restaurant,BBQ Joint,Athletics & Sports
763,43.679563,-79.388559,Athletics & Sports,1,Coffee Shop,Grocery Store,Park,Convenience Store,Candy Store,Bistro,Bank,Japanese Restaurant,BBQ Joint,Athletics & Sports
764,43.679563,-79.377856,Pie Shop,1,Coffee Shop,Grocery Store,Park,Convenience Store,Candy Store,Bistro,Bank,Japanese Restaurant,BBQ Joint,Athletics & Sports
765,43.679563,-79.374920,Filipino Restaurant,1,Coffee Shop,Grocery Store,Park,Convenience Store,Candy Store,Bistro,Bank,Japanese Restaurant,BBQ Joint,Athletics & Sports
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2402,43.648429,-79.374547,Thai Restaurant,1,Coffee Shop,Café,Hotel,Theater,Concert Hall,Restaurant,Japanese Restaurant,Seafood Restaurant,Park,Italian Restaurant
2403,43.648429,-79.375602,Italian Restaurant,1,Coffee Shop,Café,Hotel,Theater,Concert Hall,Restaurant,Japanese Restaurant,Seafood Restaurant,Park,Italian Restaurant
2404,43.648429,-79.380677,Shopping Mall,1,Coffee Shop,Café,Hotel,Theater,Concert Hall,Restaurant,Japanese Restaurant,Seafood Restaurant,Park,Italian Restaurant
2405,43.648429,-79.373630,Cosmetics Shop,1,Coffee Shop,Café,Hotel,Theater,Concert Hall,Restaurant,Japanese Restaurant,Seafood Restaurant,Park,Italian Restaurant


In [44]:
# Cluster 3 (label 2)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
363,43.72802,-79.394382,Park,2,Park,Café,Bookstore,College Quad,College Gym,Gym / Fitness Center,Coffee Shop,Trail,Yoga Studio,Donut Shop
364,43.72802,-79.381986,Gym / Fitness Center,2,Park,Café,Bookstore,College Quad,College Gym,Gym / Fitness Center,Coffee Shop,Trail,Yoga Studio,Donut Shop
365,43.72802,-79.379563,Coffee Shop,2,Park,Café,Bookstore,College Quad,College Gym,Gym / Fitness Center,Coffee Shop,Trail,Yoga Studio,Donut Shop
366,43.72802,-79.378976,Bookstore,2,Park,Café,Bookstore,College Quad,College Gym,Gym / Fitness Center,Coffee Shop,Trail,Yoga Studio,Donut Shop
367,43.72802,-79.378413,Trail,2,Park,Café,Bookstore,College Quad,College Gym,Gym / Fitness Center,Coffee Shop,Trail,Yoga Studio,Donut Shop
368,43.72802,-79.378222,College Quad,2,Park,Café,Bookstore,College Quad,College Gym,Gym / Fitness Center,Coffee Shop,Trail,Yoga Studio,Donut Shop
369,43.72802,-79.377835,Café,2,Park,Café,Bookstore,College Quad,College Gym,Gym / Fitness Center,Coffee Shop,Trail,Yoga Studio,Donut Shop
370,43.72802,-79.376819,College Gym,2,Park,Café,Bookstore,College Quad,College Gym,Gym / Fitness Center,Coffee Shop,Trail,Yoga Studio,Donut Shop


In [45]:
# Cluster 4 (label 3)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Neighborhood Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2191,43.628947,-79.396033,Airport,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2192,43.628947,-79.396484,Harbor / Marina,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2193,43.628947,-79.402185,Harbor / Marina,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2194,43.628947,-79.393933,Garden,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2195,43.628947,-79.395756,Airport Lounge,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2196,43.628947,-79.395601,Sculpture Garden,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2197,43.628947,-79.392203,Coffee Shop,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2198,43.628947,-79.398474,Park,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2199,43.628947,-79.403961,Dog Run,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
2200,43.628947,-79.393198,Café,3,Harbor / Marina,Coffee Shop,Café,Park,Dance Studio,Garden,Airport,Airport Lounge,Scenic Lookout,Sushi Restaurant
