# Segmenting and Clustering Neighborhoods in Toronto

## 1) Web scraping with BeautifulSoup

First, I'll import the essential libraries for web scraping.

In [155]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

Then we create the dataframe and the requests object for the page to scrape: the Wikipedia list of postal codes for Toronto, Canada.

In [156]:
# Creating an empty Dataframe
df = pd.DataFrame()

#the page we want to load
page = requests.get("http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

#if 200 the download was a success
print("the status code is ", page.status_code)

the status code is  200


A status code of 200 means the request succeeded.
With the response in hand, I can create the BeautifulSoup object to parse the page.

In [157]:
# using the BS4 library
soup = BeautifulSoup(page.content, 'html.parser')

With the BS4 object created, I'll collect the postal codes from the table.
The first loop finds the table whose class is "wikitable sortable".
The second loop goes through every cell in the table (the text between the tags <td> and </td>) and takes its text.
The if and elif branches assign each cell to one of the three columns (Postal code, Borough and Neighborhood), based on the cell's position in the row.

In [158]:
PostalCode = []
Borough = []
Neighborhood = []
i = 0
for article in soup.find_all('table', class_="wikitable sortable"):
    for index in article.find_all('td'):
        tdline = index.text
        if (i%3==0):
            PostalCode.append(tdline)
        elif (i%3==1):
            Borough.append(tdline)
        elif (i%3==2):
            Neighborhood.append(tdline.replace('\n', ''))
        i += 1

# Append columns to the Empty DataFrame
df['PostalCode'] = PostalCode
df['Borough'] = Borough
df['Neighborhood'] = Neighborhood
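
The modulo counter assumes that every row of the table contributes exactly three cells. As a minimal sketch of a row-wise alternative, the snippet below walks the `<tr>` elements and keeps only rows with three `<td>` cells. The inline HTML is a made-up stand-in for the Wikipedia page, not its actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the postal code table (not the real page content)
html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for table in soup.find_all("table", class_="wikitable sortable"):
    for tr in table.find_all("tr"):
        # get_text(strip=True) also removes the trailing newline of the last cell
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 3:  # skips the header row, which only has <th> cells
            rows.append(cells)

print(rows)
```

Each entry of `rows` is one complete table row, so a malformed row is skipped instead of shifting every later value into the wrong column.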


Next I need to clean the data: remove the rows whose "Borough" is "Not assigned"; where "Neighborhood" is "Not assigned", copy the "Borough" value into it; and finally combine the rows that share a postal code into a single row.

In [159]:
# Get names of indexes for which column Borough has value 'Not assigned'
indexNames = df[df['Borough'] == 'Not assigned'].index

# Delete these row indexes from dataFrame
df.drop(indexNames, inplace=True)

#If the column Neighborhood has the value 'Not assigned', copy the value of 'Borough' into it
not_assigned = df['Neighborhood'] == 'Not assigned'
df.loc[not_assigned, 'Neighborhood'] = df.loc[not_assigned, 'Borough']

#group by the rows with the same PostalCode
index_cols = df.columns.tolist()
index_cols.remove("Neighborhood")
df = df.groupby(index_cols)["Neighborhood"].apply(lambda tags: ', '.join(tags))
df = df.reset_index()

df.shape

(103, 3)

The result can be seen below.

In [160]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
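
To make the three cleaning rules concrete, here is a small sketch that applies them to a made-up four-row dataframe. The rows in `toy` are illustrative values chosen for this example, not the scraped data.

```python
import pandas as pd

# Illustrative rows only (assumed for this sketch)
toy = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# 1) drop rows without a borough
toy = toy[toy["Borough"] != "Not assigned"].copy()

# 2) fall back to the borough name when the neighborhood is missing
mask = toy["Neighborhood"] == "Not assigned"
toy.loc[mask, "Neighborhood"] = toy.loc[mask, "Borough"]

# 3) one row per postal code, neighborhoods joined with a comma
toy = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

print(toy)
```

The two M5A rows collapse into a single "Harbourfront, Regent Park" entry, and M7A takes its borough name as its neighborhood.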


## 2) Coordinates for each postal code

In this step, I'll read the CSV with the coordinates for each postal code, merge the dataframe df with geolocation_df, and drop the duplicate "Postal Code" column.

In [161]:
#load the file with the coordinates
geolocation_df = pd.read_csv('https://cocl.us/Geospatial_data')

#merge the two dataframes on the postal code rather than on row position
df = df.merge(geolocation_df, left_on='PostalCode', right_on='Postal Code')

#drop the duplicate column 'Postal Code'
df = df.drop('Postal Code', axis = 1)

The dataframe can be seen below.

In [162]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
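
A positional merge is only correct when both dataframes list the postal codes in the same order; joining on the postal code itself removes that dependency. The sketch below uses two made-up two-row frames that are deliberately in different orders. The `validate` argument is an optional safeguard added here as an assumption about what is useful, not something the original notebook used.

```python
import pandas as pd

# Toy frames, deliberately listed in different orders
codes = pd.DataFrame({
    "PostalCode": ["M1C", "M1B"],
    "Borough": ["Scarborough", "Scarborough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

merged = codes.merge(
    coords,
    left_on="PostalCode",
    right_on="Postal Code",
    how="left",               # keep every postal code from the left frame
    validate="one_to_one",    # raise if a postal code is duplicated on either side
).drop(columns="Postal Code")

# Any postal code without coordinates would show up as NaN here
print(merged)
print(merged["Latitude"].isna().sum(), "postal codes without coordinates")
```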


## 3) Clustering Toronto Neighborhoods with K-means

First, I'll import all the essential libraries for the clustering and for creating the folium maps.

In [163]:
import folium
from geopy.geocoders import Nominatim
import json
import requests # library to handle requests
from pandas import json_normalize # pandas.io.json.json_normalize is deprecated since pandas 1.0
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

For each neighborhood, I want to know which kinds of venues it has and how many of each kind. To do this, I'll use the Foursquare API, which requires the credentials below.

In [209]:
# @hidden_cell

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (never publish the real value)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (never publish the real value)
VERSION = '20180605' # Foursquare API version

The two functions below extract the category of a venue, and connect to the API to collect the venues within a given radius of each latitude/longitude pair.

In [165]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']


def getNearbyVenues(names, latitudes, longitudes,
                    radius=500,  # search radius in meters
                    LIMIT=100):  # limit of number of venues returned by Foursquare API
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return (nearby_venues)


Before using the functions, I split the dataframe into three parts because connections to the Foursquare API kept timing out. The split is shown below.

In [166]:
temp = df[df['Borough'].str.contains("Toronto")]
temp2 = df[df['Borough'].str.contains("York")]
temp3 = df[~df['Borough'].str.contains("York")]
temp3 = temp3[~temp3['Borough'].str.contains("Toronto")]
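
Splitting the dataframe is one way to cope with timeouts; retrying the failed call is another. The helper below is a hypothetical sketch (`call_with_retries` and its parameters are names invented here, not part of requests or of the Foursquare API), and `flaky_fetch` only simulates a request that times out twice.

```python
import time


def call_with_retries(fetch, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call fetch() until it succeeds, at most `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # wait a little longer after each failure


# Stand-in for the API call: fails twice, then returns a payload
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return {"items": []}


result = call_with_retries(flaky_fetch, attempts=5, delay=0)
print(result, "after", calls["count"], "calls")
```

In the notebook, `fetch` could be something like `lambda: requests.get(url, timeout=10).json()` with `retry_on=(requests.exceptions.RequestException,)`, so that one slow response does not abort the whole collection.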

Applying the function to the three dataframes gives the following.

Note: to make a later merge easier, I passed the PostalCode column as the neighborhood name.

In [168]:
toronto_venues = getNearbyVenues(names=temp['PostalCode'],
                                   latitudes=temp['Latitude'],
                                   longitudes=temp['Longitude']
                                  )

append2 = getNearbyVenues(names=temp2['PostalCode'],
                                   latitudes=temp2['Latitude'],
                                   longitudes=temp2['Longitude']
                                  )

append3 = getNearbyVenues(names=temp3['PostalCode'],
                                   latitudes=temp3['Latitude'],
                                   longitudes=temp3['Longitude']
                                  )
print("Load Complete")

Load Complete


In [171]:
print('Load Complete with rows and columns: A)', toronto_venues.shape, ' B) ', append2.shape, ' C) ', append3.shape)

Load Complete with rows and columns: A) (1716, 7)  B)  (347, 7)  C)  (176, 7)


Next, join the three dataframes into one and check the number of rows and columns of the final dataframe.

In [172]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
toronto_venues = pd.concat([toronto_venues, append2, append3], ignore_index=True)
toronto_venues.shape

(2239, 7)

A sample of the dataframe:

In [173]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M4E,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,M4E,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
3,M4E,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
4,M4E,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood


The number of unique categories is shown below.

In [57]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 270 unique categories.


Analyzing the categories, we see three that mean basically the same thing under different names: "Coffee Shop", "Café" and "Cafeteria". To correct this, I replaced the latter two with "Coffee Shop", as shown below.

In [174]:
#Cleaning the data by grouping values that have the same meaning
toronto_venues["Venue Category"] = toronto_venues["Venue Category"].replace("Café", "Coffee Shop")
toronto_venues["Venue Category"] = toronto_venues["Venue Category"].replace("Cafeteria", "Coffee Shop")

Below I reshape the dataframe to summarize the venue categories for each neighborhood.

In [175]:
toronto_data = df.reset_index(drop=True)

toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

#group by results by neighborhood and calculate the mean
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
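
The values in this table are frequencies: one-hot encoding turns each venue into a row of 0s with a single 1, so the mean per postal code is the share of that postal code's venues in each category. A minimal sketch on five made-up venues (the data in `venues` is assumed for illustration):

```python
import pandas as pd

# Five made-up venues in two postal codes
venues = pd.DataFrame({
    "Neighborhood": ["M4E", "M4E", "M4E", "M4E", "M1B"],
    "Venue Category": ["Pub", "Trail", "Pub", "Park", "Park"],
})

# one column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# the mean of the 0/1 columns is the category's share of venues
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

M4E has four venues, two of them pubs, so its "Pub" value is 0.5, and every row sums to 1.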


With the dataframe ready, I used k-means to cluster the neighborhoods. After testing the algorithm with different values of k, I settled on 7 clusters.

In [176]:
# set number of clusters
kclusters = 7

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 6, 1, 5, 3, 3, 3, 3, 3, 3], dtype=int32)
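
The notebook does not record how the different k values were compared. One common approach, shown here purely as an assumption about how such a test could be run, is to fit k-means for a range of k and compare the inertia and the silhouette score. The data is synthetic (three well-separated blobs), not the Toronto venue frequencies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia: within-cluster sum of squares; silhouette: higher means better separation
    scores[k] = (model.inertia_, silhouette_score(X, model.labels_))

for k, (inertia, silhouette) in scores.items():
    print(k, round(inertia, 1), round(silhouette, 3))

# pick the k with the highest silhouette score
best_k = max(scores, key=lambda k: scores[k][1])
print("best k by silhouette:", best_k)
```

On real venue data the scores are rarely this clear-cut, so the printed table is best read alongside the map rather than used as an automatic rule.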

The function return_most_common_venues returns the most common venue categories for a neighborhood.

In [193]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
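
Because the function always returns `num_top_venues` names, a postal code with only a few venues has its list padded with categories whose frequency is 0. The variant below is a hypothetical sketch (`top_nonzero_venues` is a name introduced here) that keeps only categories that actually occur; the row is made-up data.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues category names with a frequency above 0."""
    freqs = row.iloc[1:].astype(float)  # skip the leading 'Neighborhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="stable")
    return freqs.index[:num_top_venues].tolist()


# Made-up row: only two categories are present in this postal code
row = pd.Series({
    "Neighborhood": "M1G",
    "Coffee Shop": 0.75,
    "Korean Restaurant": 0.25,
    "Dog Run": 0.0,
    "Drugstore": 0.0,
})

top = top_nonzero_venues(row, 3)
print(top)
```

The result can be shorter than `num_top_venues`, so it would need padding (for example with None) before being written into fixed "Nth Most Common Venue" columns.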

To analyze the result, I get the top 10 most common venues for each neighborhood and add the cluster labels generated by the k-means algorithm.

In [194]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,M1B,Fast Food Restaurant,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
1,6,M1C,History Museum,Bar,Women's Store,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Dim Sum Restaurant
2,1,M1E,Intersection,Moving Target,Electronics Store,Rental Car Location,Pizza Place,Breakfast Spot,Mexican Restaurant,Medical Center,Diner,Discount Store
3,5,M1G,Coffee Shop,Korean Restaurant,Women's Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Electronics Store
4,3,M1H,Gas Station,Thai Restaurant,Fried Chicken Joint,Bank,Athletics & Sports,Caribbean Restaurant,Bakery,Hakka Restaurant,Eastern European Restaurant,Dumpling Restaurant


Then merge the result with the original dataset and drop any row left without a cluster label (postal codes for which no venues were returned).

In [196]:
toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='PostalCode')

toronto_merged = toronto_merged.dropna()
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(int)

toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,0,Fast Food Restaurant,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,6,History Museum,Bar,Women's Store,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Dim Sum Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,1,Intersection,Moving Target,Electronics Store,Rental Car Location,Pizza Place,Breakfast Spot,Mexican Restaurant,Medical Center,Diner,Discount Store
3,M1G,Scarborough,Woburn,43.770992,-79.216917,5,Coffee Shop,Korean Restaurant,Women's Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Electronics Store
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,3,Gas Station,Thai Restaurant,Fried Chicken Joint,Bank,Athletics & Sports,Caribbean Restaurant,Bakery,Hakka Restaurant,Eastern European Restaurant,Dumpling Restaurant


Finally, plot the clusters found by the k-means algorithm on a map. The plot uses the folium library and draws a circle marker for each postal code, with each color representing a different cluster.

In [197]:
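# center the map on the mean of the postal code coordinates
# (assumption: latitude and longitude are not defined anywhere in this notebook)
latitude = toronto_merged['Latitude'].mean()
longitude = toronto_merged['Longitude'].mean()

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)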

rainbow = ['#cc0000', '#ffff00', '#00cc00', '#000099', '#990066', '#ff66ff', '#666666']

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'],
                                  toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.9).add_to(map_clusters)

map_clusters


Each cluster can be seen below with its respective neighborhoods.

In [198]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,0,Fast Food Restaurant,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore


In [199]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,1,Intersection,Moving Target,Electronics Store,Rental Car Location,Pizza Place,Breakfast Spot,Mexican Restaurant,Medical Center,Diner,Discount Store
13,Scarborough,1,Pizza Place,Bank,Fried Chicken Joint,Fast Food Restaurant,Pharmacy,Noodle House,Gas Station,Convenience Store,Intersection,Chinese Restaurant
15,Scarborough,1,Fast Food Restaurant,Grocery Store,Chinese Restaurant,Pizza Place,Supermarket,Discount Store,Sandwich Place,Breakfast Spot,Electronics Store,Pharmacy
24,North York,1,Grocery Store,Pharmacy,Pizza Place,Coffee Shop,Discount Store,Home Service,Donut Shop,Dim Sum Restaurant,Diner,Dog Run
34,North York,1,Hockey Arena,Coffee Shop,Pizza Place,Intersection,Financial or Legal Service,Portuguese Restaurant,Women's Store,Dim Sum Restaurant,Diner,Discount Store
35,East York,1,Fast Food Restaurant,Pizza Place,Intersection,Gym / Fitness Center,Pet Store,Pharmacy,Gastropub,Coffee Shop,Bank,Bus Line
81,York,1,Brewery,Bus Line,Pizza Place,Convenience Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
89,Etobicoke,1,Pizza Place,Gym,Pharmacy,Sandwich Place,Pub,Athletics & Sports,Pool,Skating Rink,Coffee Shop,Comic Shop
96,North York,1,Pizza Place,Empanada Restaurant,Women's Store,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Dumpling Restaurant
99,Etobicoke,1,Pizza Place,Chinese Restaurant,Middle Eastern Restaurant,Coffee Shop,Sandwich Place,Discount Store,Intersection,Women's Store,Diner,Dog Run


In [200]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,Etobicoke,2,Baseball Field,Women's Store,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Fast Food Restaurant
97,North York,2,Paper / Office Supplies Store,Baseball Field,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Fast Food Restaurant


In [201]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Scarborough,3,Gas Station,Thai Restaurant,Fried Chicken Joint,Bank,Athletics & Sports,Caribbean Restaurant,Bakery,Hakka Restaurant,Eastern European Restaurant,Dumpling Restaurant
5,Scarborough,3,Spa,Playground,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
6,Scarborough,3,Department Store,Hobby Shop,Bus Station,Discount Store,Coffee Shop,Diner,Dog Run,Doner Restaurant,Donut Shop,Drugstore
7,Scarborough,3,Bakery,Bus Line,Park,Bus Station,Metro Station,Intersection,Ice Cream Shop,Soccer Field,Gastropub,Dessert Shop
8,Scarborough,3,Motel,American Restaurant,Women's Store,Dessert Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
9,Scarborough,3,College Stadium,Skating Rink,Coffee Shop,General Entertainment,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
10,Scarborough,3,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Women's Store,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run
11,Scarborough,3,Smoke Shop,Vietnamese Restaurant,Auto Garage,Bakery,Breakfast Spot,Sandwich Place,Shopping Mall,Accessories Store,Middle Eastern Restaurant,Empanada Restaurant
12,Scarborough,3,Latin American Restaurant,Lounge,Breakfast Spot,Clothing Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
17,North York,3,Pool,Golf Course,Dog Run,Mediterranean Restaurant,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant,Donut Shop


In [206]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Scarborough,4,Park,Coffee Shop,Playground,Donut Shop,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
23,North York,4,Park,Bank,Convenience Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
25,North York,4,Park,Food & Drink Shop,Bus Stop,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Falafel Restaurant,Donut Shop
30,North York,4,Park,Airport,Snack Place,Other Repair Shop,College Stadium,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant
31,North York,4,Grocery Store,Hotel,Shopping Mall,Bank,Park,German Restaurant,General Travel,Empanada Restaurant,Golf Course,Electronics Store
40,East York,4,Park,Coffee Shop,Convenience Store,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
44,Central Toronto,4,Park,Bus Line,Swim School,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Dessert Shop
50,Downtown Toronto,4,Park,Trail,Playground,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
64,Central Toronto,4,Park,Trail,Jewelry Store,Sushi Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
72,North York,4,Park,Pub,Japanese Restaurant,Sushi Restaurant,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant


In [205]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Scarborough,5,Coffee Shop,Korean Restaurant,Women's Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Electronics Store
20,North York,5,Coffee Shop,Women's Store,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


In [204]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,6,History Museum,Bar,Women's Store,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Dim Sum Restaurant
