# Capstone Project - 1st Assignment

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>
    
1. <a href="#item1">Web Scraping</a>

2. <a href="#item2">Geospatial data</a>

3. <a href="#item3">Explore and cluster the neighborhoods in Toronto</a>

</font>
</div>

<a id='item1'></a>

## Section 1: Web scraping

First, install the required packages:

In [56]:
import sys
!{sys.executable} -m pip install beautifulsoup4
!{sys.executable} -m pip install lxml
!{sys.executable} -m pip install requests
print('All packages successfully installed')

All packages successfully installed


Import BeautifulSoup, requests, pandas and NumPy:

In [57]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

Scrape the Wikipedia table in the following steps:
1. request the URL,
2. make the soup,
3. find the right table by specifying its class,
4. loop through all rows in the table to extract the relevant data.

In [58]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
req = requests.get(url)
soup = BeautifulSoup(req.content, 'lxml')
table = []

match = soup.find('table', class_='wikitable').tbody
match = match.find_all('tr')[1:]
for tr in match:
    cells = tr.find_all('td')
    if cells[1].text != 'Not assigned\n':
        data = []
        for cell in cells:
            cell = cell.text.split('\n')[0]
            data.append(cell)
        if data[2] == 'Not assigned':
            data[2] = data[1]
        table.append(data)

column = ['Postal Code', 'Borough', 'Neighborhood']
table = pd.DataFrame(table, columns = column)
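
The two cleaning rules used in the loop above can also be isolated in a small pure function, which makes them easy to check without requesting the Wikipedia page. This is only a minimal sketch: `clean_rows` and the sample rows are hypothetical and not part of the original notebook, and it uses `strip()` where the loop above uses `split('\n')[0]`.

```python
def clean_rows(rows):
    """Apply the two cleaning rules to (code, borough, neighborhood) rows.

    1. Drop rows whose borough is 'Not assigned'.
    2. If the neighborhood is 'Not assigned', reuse the borough name.
    """
    cleaned = []
    for code, borough, neighborhood in rows:
        # remove the trailing newline that the raw cell text carries
        code, borough, neighborhood = (s.strip() for s in (code, borough, neighborhood))
        if borough == 'Not assigned':
            continue
        if neighborhood == 'Not assigned':
            neighborhood = borough
        cleaned.append([code, borough, neighborhood])
    return cleaned


# Hypothetical sample rows mimicking the raw cell text.
sample_rows = [
    ('M1A', 'Not assigned', 'Not assigned\n'),
    ('M3A', 'North York', 'Parkwoods\n'),
    ('M9Z', 'Etobicoke', 'Not assigned\n'),  # made-up code, for illustration only
]
cleaned_sample = clean_rows(sample_rows)
```

The result has the same list-of-lists shape that is passed to `pd.DataFrame` above.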

Display the first rows of the data frame:

In [59]:
table.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Check the shape of the data frame:

In [60]:
table.shape

(103, 3)

<a id='item2'></a>

## Section 2: Geospatial data

Since the geocoder package does not work, I will use the CSV file of postal code coordinates instead:

In [61]:
geo = pd.read_csv('https://cocl.us/Geospatial_data')

In [62]:
geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Check the shape of the data frame:

In [63]:
geo.shape

(103, 3)

Merge _geo_ with _table_ on the postal code:

In [64]:
geo_table = pd.merge(table, geo, on='Postal Code')

In [65]:
geo_table.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
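
`pd.merge` performs an inner join by default, so a postal code missing from the CSV would be dropped silently. The sketch below shows one way to make such gaps visible with the `how`, `indicator` and `validate` parameters. The two small frames are hypothetical stand-ins for _table_ and _geo_ (values copied from the outputs above), with one coordinate row left out on purpose.

```python
import pandas as pd

# Hypothetical miniature versions of `table` and `geo`.
table_small = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
geo_small = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],  # 'M5A' deliberately missing
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every scraped row, indicator=True adds a `_merge` column,
# validate='one_to_one' raises if a postal code is duplicated on either side.
checked = pd.merge(table_small, geo_small, on='Postal Code',
                   how='left', indicator=True, validate='one_to_one')

# Postal codes that found no coordinates in the CSV.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
```

In this notebook both frames have 103 rows, so an empty `unmatched` list would confirm that no neighborhood was lost in the merge.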


Check the entry for postal code M5G (Downtown Toronto):

In [66]:
geo_table[geo_table['Postal Code']=='M5G']

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383


OK, it looks exactly the same as in the assignment instructions.

<a id='item3'></a>

## Section 3: Explore and cluster the neighborhoods in Toronto

Let's start with a visualization of the neighborhoods:

In [67]:
import folium

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[43.70, -79.35], zoom_start=11)

# add markers to map
for lat, lng, label in zip(geo_table['Latitude'], geo_table['Longitude'], geo_table['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Define Foursquare Credentials and Version:

In [68]:
CLIENT_ID = 'JKXB1NZHLEIDGDDJCWLAFW34C2MO2U4INCTU5DP2XYWDGPMQ' # my Foursquare ID
CLIENT_SECRET = 'S4JNYT15YBGXGKLBGOBDQEXNS5DAD1U41OQ3K3N23JZTH02Q' # my Foursquare Secret
VERSION = '20200512' # Foursquare API version

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: JKXB1NZHLEIDGDDJCWLAFW34C2MO2U4INCTU5DP2XYWDGPMQ
CLIENT_SECRET:S4JNYT15YBGXGKLBGOBDQEXNS5DAD1U41OQ3K3N23JZTH02Q
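
A notebook that is shared publicly usually should not contain the secret itself. One possible alternative, sketched below, is to read the credentials from environment variables and to print only a masked version. The function names and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions of this sketch, not something defined by Foursquare.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the credentials from environment variables (hypothetical names)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret


def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * (len(secret) - visible)
```

With this in place, `print('CLIENT_SECRET: ' + mask(CLIENT_SECRET))` confirms that a value was loaded without exposing it in the cell output.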


Define a function to get nearby venues for all Toronto neighborhoods:

In [69]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT = 100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
                    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
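
The function above indexes `['groups'][0]` and `['categories'][0]` directly, so a response without groups or a venue without categories would raise an error. The sketch below shows a more defensive way to flatten one response. It assumes the payload has the same keys as used in `getNearbyVenues`; `parse_explore_items`, the fallback label and the payload are hypothetical and only serve as an illustration.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore response into rows of
    (neighborhood, lat, lng, venue, venue lat, venue lng, category)."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # 'Uncategorized' is a placeholder of this sketch, not a Foursquare value
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# Hand-written payload for illustration; not real API output.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}
parsed = parse_explore_items(fake_payload, 'Parkwoods', 43.753259, -79.329656)
```

Each returned tuple has the same seven fields, in the same order, as the columns assigned to `nearby_venues`.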

Run the function for Toronto neighborhoods:

In [70]:
toronto_venues = getNearbyVenues(names=geo_table['Neighborhood'],
                                   latitudes=geo_table['Latitude'],
                                   longitudes=geo_table['Longitude']
                                  )

toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [90]:
toronto_venues.groupby('Neighborhood').sum()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue Latitude,Venue Longitude
Agincourt,218.971001,-396.310147,218.963063,-396.305699
"Alderwood, Long Branch",392.421723,-715.891357,392.416351,-715.900632
"Bathurst Manor, Wilson Heights, Downsview North",831.332238,-1509.402927,831.354610,-1509.369162
Bayview Village,175.147789,-317.543900,175.151610,-317.523441
"Bedford Park, Lawrence Manor East",1049.598780,-1906.073993,1049.611251,-1906.062689
...,...,...,...,...
"Wexford, Maryvale",306.250500,-555.070944,306.228166,-555.064102
Willowdale,1707.110375,-3097.133818,1707.067458,-3097.300014
Woburn,131.312976,-237.650752,131.311676,-237.658736
Woodbine Heights,393.258095,-713.865498,393.264152,-713.843604


**It seems that multiple postal codes are assigned to the same neighborhood name** (the grouped frame has fewer rows than the *geo_table* data frame). I'll leave this unchanged, since the venues found for the different postal areas are combined anyway when the data is grouped by neighborhood with 'groupby'.
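
A quick way to confirm this observation is to look for neighborhood names that occur under more than one postal code. The sketch below uses a hypothetical three-row stand-in for *geo_table*; the postal codes 'M9X' and 'M9Y' are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for geo_table.
sample = pd.DataFrame({
    'Postal Code': ['M3A', 'M9X', 'M9Y'],
    'Neighborhood': ['Parkwoods', 'Downsview', 'Downsview'],
})

# Rows whose neighborhood name appears under more than one postal code.
repeated = sample[sample['Neighborhood'].duplicated(keep=False)]

# Number of distinct postal codes per neighborhood name.
codes_per_name = sample.groupby('Neighborhood')['Postal Code'].nunique()
```

Running the same two lines on *geo_table* would list the affected neighborhoods directly.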

Let's find out how many unique categories can be curated from all the returned venues:

In [72]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 264 unique categories.


Strangely, there is a venue category called 'Neighborhood', and four venues are labeled with it. This looks like a mistake, so I'll drop these venues from further analysis.

In [73]:
toronto_venues[toronto_venues['Venue Category']=='Neighborhood']

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
351,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
588,"Richmond, Adelaide, King",43.650571,-79.384568,Downtown Toronto,43.653232,-79.385296,Neighborhood
766,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,Harbourfront,43.639526,-79.380688,Neighborhood
1240,Studio District,43.659526,-79.340923,Leslieville,43.66207,-79.337856,Neighborhood


In [74]:
toronto_venues = toronto_venues[toronto_venues['Venue Category'] != 'Neighborhood']

In [75]:
toronto_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
...,...,...,...,...,...,...,...
2113,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,RONA,43.629393,-79.518320,Hardware Store
2114,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,7-Eleven,43.629107,-79.517431,Convenience Store
2115,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Jim & Maria's No Frills,43.631152,-79.518617,Grocery Store
2116,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Koala Tan Tanning Salon & Sunless Spa,43.631370,-79.519006,Tanning Salon


Define dummy variables:

In [76]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [77]:
toronto_onehot.shape

(2114, 264)

Let's group the rows by neighborhood and take the mean of each category's frequency of occurrence:

In [105]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
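
To see why the mean of the dummy columns is a frequency, here is a tiny worked example. The three venues and the neighborhood names 'A' and 'B' are hypothetical; `dtype=int` is passed to `pd.get_dummies` so that the indicator columns are numeric regardless of the pandas version.

```python
import pandas as pd

# Hypothetical venues: two in 'A' (one Park, one Cafe), one in 'B' (a Park).
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Park', 'Cafe', 'Park'],
})

# One 0/1 indicator column per category.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category.
freq = onehot.groupby('Neighborhood').mean()
```

Here neighborhood 'A' gets 0.5 for both 'Park' and 'Cafe', while 'B' gets 1.0 for 'Park', so each row sums to 1.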

First, let's write a function that sorts a neighborhood's venue categories by frequency in descending order and returns the most common ones.

In [106]:
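def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and keep the category frequencies
    row_categories = row.iloc[1:]
    # sort the categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]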

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [115]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Lounge,Clothing Store,Breakfast Spot,Skating Rink,Electronics Store,Eastern European Restaurant,Drugstore,Department Store,Donut Shop
1,"Alderwood, Long Branch",Pizza Place,Gym,Coffee Shop,Pub,Skating Rink,Athletics & Sports,Pharmacy,Sandwich Place,Diner,Department Store
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Restaurant,Ice Cream Shop,Sushi Restaurant,Deli / Bodega,Middle Eastern Restaurant,Fried Chicken Joint,Pizza Place,Pharmacy
3,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Juice Bar,Sandwich Place,Coffee Shop,Thai Restaurant,Liquor Store,Restaurant,Indian Restaurant,Pub,Butcher
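
The try/except approach for the column names works for the top 10, but it would produce labels such as '21th' if more venues were requested. A small helper can build the suffix for any rank. This is a sketch: `ordinal` is a hypothetical helper, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Same column names as above, built without try/except.
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
```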


Run *k*-means to group the neighborhoods into 5 clusters. I'll keep *k = 5*, as in the NY example.

In [81]:
# set number of clusters
from sklearn.cluster import KMeans
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 1, 1, 1, 1, 1, 1, 2, 1], dtype=int32)
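
Whether 5 is a good number of clusters for Toronto is not checked in this notebook. One common way to compare candidate values of *k* is the silhouette score. The sketch below shows the idea on synthetic data only: the three blobs from `make_blobs` are a stand-in for `toronto_grouped_clustering`, so the result says nothing about the Toronto data itself.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated blobs.
X, _ = make_blobs(n_samples=150, centers=[(0, 0), (10, 10), (-10, 10)],
                  cluster_std=0.5, random_state=0)

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score separates the points best by this measure.
best_k = max(scores, key=scores.get)
```

Replacing `X` with `toronto_grouped_clustering` would apply the same comparison to the neighborhood frequencies.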

In [122]:
# add clustering labels

neighborhoods_venues_sorted['Cluster labels'] = kmeans.labels_

# move labels to the first column
fixed_columns = [neighborhoods_venues_sorted.columns[-1]] + list(neighborhoods_venues_sorted.columns[:-1])
neighborhoods_venues_sorted = neighborhoods_venues_sorted[fixed_columns]
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Agincourt,Latin American Restaurant,Lounge,Clothing Store,Breakfast Spot,Skating Rink,Electronics Store,Eastern European Restaurant,Drugstore,Department Store,Donut Shop
1,2,"Alderwood, Long Branch",Pizza Place,Gym,Coffee Shop,Pub,Skating Rink,Athletics & Sports,Pharmacy,Sandwich Place,Diner,Department Store
2,1,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Restaurant,Ice Cream Shop,Sushi Restaurant,Deli / Bodega,Middle Eastern Restaurant,Fried Chicken Joint,Pizza Place,Pharmacy
3,1,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
4,1,"Bedford Park, Lawrence Manor East",Italian Restaurant,Juice Bar,Sandwich Place,Coffee Shop,Thai Restaurant,Liquor Store,Restaurant,Indian Restaurant,Pub,Butcher


In [153]:
# merge neighborhoods_venues_sorted with geo_table to add latitude/longitude for each neighborhood
toronto = pd.merge(geo_table, neighborhoods_venues_sorted, on = 'Neighborhood', how='inner')

In [156]:
toronto

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Park,Food & Drink Shop,Dance Studio,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
1,M4A,North York,Victoria Village,43.725882,-79.315572,2,Coffee Shop,Portuguese Restaurant,Pizza Place,French Restaurant,Hockey Arena,Yoga Studio,Diner,Deli / Bodega,Department Store,Dessert Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,1,Coffee Shop,Bakery,Park,Breakfast Spot,Restaurant,Pub,Theater,Café,Health Food Store,Historic Site
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1,Furniture / Home Store,Clothing Store,Accessories Store,Coffee Shop,Miscellaneous Shop,Boutique,Shoe Store,Event Space,Vietnamese Restaurant,Women's Store
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,Sushi Restaurant,Distribution Center,Bank,Bar,Beer Bar,Japanese Restaurant,Juice Bar,Sandwich Place,Yoga Studio
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,0,Smoke Shop,River,Park,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
94,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,1,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Restaurant,Yoga Studio,Café,Pizza Place,Pub,Hotel,Men's Store
95,M7Y,East Toronto,Business reply mail Processing Centre,43.662744,-79.321558,2,Yoga Studio,Garden Center,Restaurant,Light Rail Station,Auto Workshop,Fast Food Restaurant,Farmers Market,Spa,Pizza Place,Garden
96,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,0,Baseball Field,Park,Deli / Bodega,Yoga Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant


Finally, let's visualize the resulting clusters:

In [161]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[43.70, -79.35], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto['Latitude'], toronto['Longitude'], toronto['Neighborhood'], toronto['Cluster labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now we can examine each cluster and identify the venue categories that distinguish it.

In [172]:
toronto.loc[toronto['Cluster labels'] == 0, toronto.columns[[1] + list(range(5, toronto.shape[1]))]]

Unnamed: 0,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Park,Food & Drink Shop,Dance Studio,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
19,York,0,Women's Store,Park,Pool,Convenience Store,Yoga Studio,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
33,East York,0,Park,Convenience Store,Coffee Shop,Yoga Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
58,Central Toronto,0,Bus Line,Park,Swim School,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Yoga Studio
61,York,0,Park,Dance Studio,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
63,North York,0,Convenience Store,Park,Bank,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Yoga Studio
81,Scarborough,0,Playground,Park,Sculpture Garden,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
87,Downtown Toronto,0,Park,Playground,Trail,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Yoga Studio
93,Etobicoke,0,Smoke Shop,River,Park,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
96,Etobicoke,0,Baseball Field,Park,Deli / Bodega,Yoga Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant


In [173]:
toronto.loc[toronto['Cluster labels'] == 1, toronto.columns[[1] + list(range(5, toronto.shape[1]))]]

Unnamed: 0,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,1,Coffee Shop,Bakery,Park,Breakfast Spot,Restaurant,Pub,Theater,Café,Health Food Store,Historic Site
3,North York,1,Furniture / Home Store,Clothing Store,Accessories Store,Coffee Shop,Miscellaneous Shop,Boutique,Shoe Store,Event Space,Vietnamese Restaurant,Women's Store
4,Downtown Toronto,1,Coffee Shop,Sushi Restaurant,Distribution Center,Bank,Bar,Beer Bar,Japanese Restaurant,Juice Bar,Sandwich Place,Yoga Studio
6,North York,1,Coffee Shop,Gym,Beer Store,Japanese Restaurant,Asian Restaurant,Restaurant,Caribbean Restaurant,Bike Shop,Sandwich Place,Café
7,North York,1,Coffee Shop,Gym,Beer Store,Japanese Restaurant,Asian Restaurant,Restaurant,Caribbean Restaurant,Bike Shop,Sandwich Place,Café
...,...,...,...,...,...,...,...,...,...,...,...,...
88,Downtown Toronto,1,Coffee Shop,Café,Seafood Restaurant,Beer Bar,Restaurant,Italian Restaurant,Cocktail Bar,Japanese Restaurant,Bakery,Park
90,Etobicoke,1,Rental Car Location,Drugstore,Bar,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
91,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Italian Restaurant,Bakery,Market,Pizza Place,Pub,Liquor Store,Indian Restaurant
92,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Hotel,Gym,Japanese Restaurant,Salad Place,Steakhouse,Asian Restaurant,Deli / Bodega


In [174]:
toronto.loc[toronto['Cluster labels'] == 2, toronto.columns[[1] + list(range(5, toronto.shape[1]))]]

Unnamed: 0,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,2,Coffee Shop,Portuguese Restaurant,Pizza Place,French Restaurant,Hockey Arena,Yoga Studio,Diner,Deli / Bodega,Department Store,Dessert Shop
5,Scarborough,2,Fast Food Restaurant,Yoga Studio,Deli / Bodega,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
8,East York,2,Pizza Place,Gym / Fitness Center,Bank,Intersection,Fast Food Restaurant,Pet Store,Pharmacy,Gastropub,Athletics & Sports,Diner
12,East York,2,Skating Rink,Video Store,Curling Ice,Pharmacy,Spa,Dance Studio,Park,Beer Store,Drugstore,Eastern European Restaurant
14,York,2,Field,Hockey Arena,Playground,Trail,Yoga Studio,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
25,North York,2,Pool,Golf Course,Fast Food Restaurant,Dog Run,Mediterranean Restaurant,Yoga Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
27,East York,2,Indian Restaurant,Yoga Studio,Pharmacy,Sandwich Place,Liquor Store,Discount Store,Burger Joint,Restaurant,Fast Food Restaurant,Bank
46,East Toronto,2,Fast Food Restaurant,Italian Restaurant,Pub,Sushi Restaurant,Steakhouse,Ice Cream Shop,Burrito Place,Pizza Place,Restaurant,Pet Store
50,Scarborough,2,Motel,American Restaurant,Yoga Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Deli / Bodega
53,York,2,Sandwich Place,Skating Rink,Restaurant,Discount Store,Yoga Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center


In [175]:
toronto.loc[toronto['Cluster labels'] == 3, toronto.columns[[1] + list(range(5, toronto.shape[1]))]]

Unnamed: 0,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Scarborough,3,Playground,Yoga Studio,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
79,Central Toronto,3,Gym,Playground,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Curling Ice,Distribution Center


In [176]:
toronto.loc[toronto['Cluster labels'] == 4, toronto.columns[[1] + list(range(5, toronto.shape[1]))]]

Unnamed: 0,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Scarborough,4,Construction & Landscaping,Bar,Yoga Studio,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
49,North York,4,Construction & Landscaping,Yoga Studio,Deli / Bodega,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run
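
Instead of reading the five tables by eye, the clusters can also be summarized programmatically. The sketch below counts the cluster sizes and picks the most frequent top-ranked category per cluster. The small frame is a hypothetical slice of `toronto` that keeps only the two columns needed here.

```python
import pandas as pd

# Hypothetical slice of the `toronto` frame.
sample = pd.DataFrame({
    'Cluster labels': [0, 0, 0, 1, 1],
    '1st Most Common Venue': ['Park', 'Park', 'Playground', 'Coffee Shop', 'Coffee Shop'],
})

# Most frequent top-ranked category in each cluster.
summary = (sample.groupby('Cluster labels')['1st Most Common Venue']
                 .agg(lambda s: s.value_counts().idxmax()))

# Number of rows in each cluster.
sizes = sample['Cluster labels'].value_counts().sort_index()
```

Applied to the full `toronto` frame, this would give one line per cluster to compare against the tables above.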
