# Applied Data Science Capstone
# Segmenting and Clustering Neighborhoods in Toronto
#### Author: Miguel Burg Demay
#### June 2021

This challenge explores, segments, and clusters the neighborhoods of Toronto based on the postal code and borough information available on a Wikipedia page.

It consists of three parts:

1. **Scrape the Wikipedia page:** wrangle the data, clean it, and read it into a pandas dataframe
2. **Get latitudes and longitudes:** get the latitude and longitude coordinates of each neighborhood
3. **Explore and cluster:** explore the venues around each neighborhood and cluster the neighborhoods of Toronto

## Part 1: Scrape the Wikipedia page

In this first step, the Wikipedia page is downloaded with requests and parsed with BeautifulSoup. The resulting HTML is then scraped for the relevant information.

In [1]:
import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
import folium
import json
from pandas import json_normalize  # top-level location since pandas 1.0; the pandas.io.json path is deprecated
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
#using BeautifulSoup to parse the page downloaded from the url
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data = requests.get(url).text
soup = bs(data,"html5lib")

In [3]:
toronto_data=[]
table=soup.find('table') #selecting data table
for row in table.find_all('td'): #selecting each data cell
    columns = {}
    try:
        columns['Postal Code'] = row.p.text[:3] #selecting postal code from 'p' tag - just 3 characters
        columns['Borough'] = row.span.text.split('(')[0] #selecting tag span, considering borough as the whole text before '('
        columns['Neighborhood'] = row.span.text.split('(')[1].strip(')').replace(' /',',').replace(')',' ').strip(' ')
                                #selecting tag span, considering neighborhood as the text after '(', replacing '/' by ','
                                #ignoring ')' and trailing ' '
        toronto_data.append(columns) 
    except (AttributeError, IndexError): pass #skipping cells without the expected 'p'/'span' tags or '(' separator
    
df_toronto=pd.DataFrame(toronto_data) #creating dataframe from a list of dictionaries
df_toronto['Borough']=df_toronto['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
                                            # renaming non standard data in wiki table
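
The string handling inside the loop above can be isolated into a small helper, which makes it easier to check on its own. This is only a sketch: `split_cell` is a hypothetical name, and the input format `Borough(Neighborhood / Neighborhood)` is inferred from the parsing code rather than from the page itself.

```python
def split_cell(text):
    """Split 'Borough(Neighborhood / Neighborhood)' into (borough, neighborhoods).

    Returns None when the text has no '(' and therefore no neighborhood part.
    """
    borough, paren, rest = text.partition('(')
    if not paren:
        return None
    # Drop the closing parenthesis and turn ' / ' separators into ', '.
    neighborhoods = rest.rstrip(')').replace(' /', ',').strip()
    return borough.strip(), neighborhoods


# Illustrative inputs, not copied from the live page.
print(split_cell('Downtown Toronto(Regent Park / Harbourfront)'))
print(split_cell('Not assigned'))
```

Returning `None` for cells without a neighborhood lets the caller skip them explicitly instead of relying on an exception.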

In [4]:
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


In [5]:
df_toronto.shape

(103, 3)

## Part 2: Get latitudes and longitudes


In the second part of this project, geographic coordinates are associated with each postal code. Two maps are shown: one with every borough in the table, and another restricted to boroughs with the word 'Toronto' in the name.

In [6]:
file = 'Geospatial_coordinates.csv' 
geo_data=pd.read_csv(file) #using a csv file due to package unreliability

In [7]:
toronto_geo_data = pd.merge(df_toronto,geo_data, on='Postal Code') # adding geo coordinates
toronto_geo_data.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
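
The default inner merge silently drops any postal code that has no match in the coordinates file. A minimal sketch of how that could be checked is given below. The data is illustrative only (`M9Z` is a made-up code), and the `how`, `validate` and `indicator` arguments are additions that the original call does not use.

```python
import pandas as pd

# Illustrative data only; 'M9Z' is a made-up code with no coordinates.
codes = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M9Z']})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every postal code, validate guards against duplicated keys,
# and indicator adds a '_merge' column telling where each row came from.
checked = pd.merge(codes, coords, on='Postal Code', how='left',
                   validate='one_to_one', indicator=True)

# Postal codes that found no coordinates.
missing = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)
```

An empty `missing` list would confirm that the inner merge used above loses no rows.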


### Plotting geolocation maps

#### Toronto Boroughs

In [8]:
latitude = toronto_geo_data[toronto_geo_data['Postal Code']=='M5A']['Latitude'].values[0]
longitude = toronto_geo_data[toronto_geo_data['Postal Code']=='M5A']['Longitude'].values[0]
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(toronto_geo_data['Latitude'], toronto_geo_data['Longitude'], toronto_geo_data['Borough'], toronto_geo_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
print ('Toronto Boroughs')
map_toronto

Toronto Boroughs


Selecting Boroughs with the word 'Toronto' in the name

In [9]:
toronto=toronto_geo_data[toronto_geo_data['Borough'].str.contains('Toronto')].reset_index(drop=True)

In [10]:
toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M4E,East Toronto,The Beaches,43.676357,-79.293031
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


### Boroughs with the word 'Toronto' in the name

In [11]:
latitude = toronto_geo_data[toronto_geo_data['Postal Code']=='M5A']['Latitude'].values[0]
longitude = toronto_geo_data[toronto_geo_data['Postal Code']=='M5A']['Longitude'].values[0]
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(toronto['Latitude'], toronto['Longitude'], toronto['Borough'], toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='darkorange',
        fill=True,
        fill_color='#ff9900',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
print("Boroughs with the word 'Toronto' in the name")
map_toronto

Boroughs with the word 'Toronto' in the name


## Part 3: Explore and cluster

In this section, venues in Toronto are explored with the Foursquare API.

In [13]:
file = 'fs.csv'
df_fs = pd.read_csv(file)
df_fs
CLIENT_ID = df_fs['CLIENT_ID'][0]
CLIENT_SECRET =  df_fs['CLIENT_SECRET'][0]
VERSION = '20180605' 
LIMIT = 100 
print('Foursquare credentials loaded') #avoid printing the secrets themselves

Foursquare credentials loaded
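
API credentials are best kept out of notebook output. One possible approach, sketched below, reads them from environment variables and masks them before any logging. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical and not part of the original notebook.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Missing environment variable: {missing}') from None


def mask(secret, visible=4):
    """Keep only the first few characters of a secret for logging."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demo with a fake mapping so the sketch runs without real credentials.
client_id, client_secret = load_credentials({
    'FOURSQUARE_CLIENT_ID': 'demo-id-123',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret-456',
})
print(mask(client_id), mask(client_secret))
```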


The following function retrieves the venues within a given radius of each neighborhood.

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):      
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius, 
        LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
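
The function above indexes straight into the response, so an empty `groups` list or a venue without categories would raise an error. A more defensive flattening step could look like the sketch below. The payload shape is assumed from the indexing used in `getNearbyVenues`, and `extract_venues` and the `'Unknown'` placeholder are hypothetical.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore response into rows, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue carries no category.
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Hand-written payload that mimics the assumed response shape.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe One', 'location': {'lat': 43.65, 'lng': -79.36},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Mystery Spot', 'location': {'lat': 43.66, 'lng': -79.37},
               'categories': []}},
]}]}}

rows = extract_venues(fake_payload, 'Demo Neighborhood', 43.654, -79.360)
print(rows)
```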

The function is then called to build a dataframe with the venues found around each neighborhood.

In [15]:
toronto_venues=getNearbyVenues(toronto['Neighborhood'],toronto['Latitude'],toronto['Longitude'])


In [16]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


The dataframe has 1494 venues, spread over 218 unique categories.

In [17]:
toronto_venues.shape

(1494, 7)

In [18]:
len(toronto_venues['Venue Category'].unique())

218

Each category is one-hot encoded as its own column.

In [19]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighborhood Name'] = toronto_venues['Neighborhood'] 
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood Name,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [20]:
toronto_onehot.shape

(1494, 219)

Grouping the dataframe by neighborhood name and averaging each column yields 39 rows, one per neighborhood, each holding the relative frequency of every venue category.

In [21]:
toronto_grouped = toronto_onehot.groupby('Neighborhood Name').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood Name,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0
2,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.066667,0.066667,0.133333,0.2,0.066667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.016129
4,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [22]:
toronto_grouped.shape

(39, 219)
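
To see why the mean of one-hot columns gives relative frequencies, consider the toy example below. The neighborhoods `A` and `B` and their venues are made up for illustration.

```python
import pandas as pd

# Toy data: neighborhood A has two cafes and one park, B has one park.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Park'],
})

# One column per category, with 1 where the venue belongs to it.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood Name', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category.
grouped = onehot.groupby('Neighborhood Name').mean()
print(grouped)
```

Each row of `grouped` sums to 1, so neighborhoods with very different venue counts become comparable.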

For each neighborhood, the 10 most common venue categories can then be listed, as shown below.

In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [24]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood Name']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Sandwich Place,Bakery,Cocktail Bar,Coffee Shop,Vegetarian / Vegan Restaurant,Seafood Restaurant,Beer Bar,Farmers Market,Comfort Food Restaurant,Pharmacy
1,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Café,Breakfast Spot,Sandwich Place,Italian Restaurant,Stadium,Restaurant,Bar,Bakery,Nightclub
2,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Harbor / Marina,Bar,Rental Car Location,Sculpture Garden,Boutique,Boat or Ferry,Plane,Airport Terminal
3,Central Bay Street,Coffee Shop,Sandwich Place,Sushi Restaurant,Italian Restaurant,Japanese Restaurant,Café,Burger Joint,Salad Place,Bank,Pizza Place
4,Christie,Grocery Store,Café,Park,Baby Store,Coffee Shop,Restaurant,Nightclub,Italian Restaurant,Bank,Music Venue
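
Two details of the ranking step can be made explicit. The suffix logic above is correct for ranks 1 to 10 but would label rank 21 as "21th". Also, a neighborhood with fewer than ten distinct categories appears to have its remaining slots filled with zero-frequency categories. The sketch below shows one way to handle both, using two hypothetical helpers, `ordinal` and `top_categories`.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


def top_categories(freqs, n):
    """Return up to n category names by descending frequency, skipping zeros."""
    present = [(name, f) for name, f in freqs.items() if f > 0]
    # Sort by frequency (descending), then by name to break ties predictably.
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in present[:n]]


print([ordinal(i) for i in (1, 2, 3, 4, 11, 21)])
print(top_categories({'Trail': 0.5, 'Park': 0.5, 'Cafe': 0.0}, 10))
```

With zero-frequency categories skipped, a sparse neighborhood would show fewer than ten entries instead of unrelated categories.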


Now that the dataframe is prepared for clustering, the k-means algorithm can be run. 
The number of clusters was chosen empirically, by inspecting the outcomes obtained with different values. 

In [25]:
kclusters = 8
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood Name')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
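
The notebook does not say how the empirical choice of `kclusters` was made. One common way to support such a choice is to compare silhouette scores across candidate values. The sketch below does this on synthetic, well-separated blobs; it is an illustration of the technique, not the procedure used here, and the data and the range of `k` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight blobs far apart from each other.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest silhouette score is the best-separated solution.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On the real data, `X` would be `toronto_grouped_clustering`. With only 39 rows and several single-member clusters, the scores should be read with caution.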


In [26]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)



In [27]:
toronto_merged = toronto
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged.head() 

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4,Coffee Shop,Park,Bakery,Pub,Café,Spa,Sandwich Place,Breakfast Spot,Performing Arts Venue,Mexican Restaurant
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,4,Coffee Shop,Clothing Store,Sandwich Place,Café,Cosmetics Shop,Japanese Restaurant,Hotel,Pizza Place,Bank,Thai Restaurant
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Italian Restaurant,Café,Cocktail Bar,Restaurant,Clothing Store,Japanese Restaurant,Beer Bar,Gastropub,Cosmetics Shop
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,6,Health Food Store,Grocery Store,Pub,Neighborhood,Movie Theater,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Sandwich Place,Bakery,Cocktail Bar,Coffee Shop,Vegetarian / Vegan Restaurant,Seafood Restaurant,Beer Bar,Farmers Market,Comfort Food Restaurant,Pharmacy


The following map shows the location of each neighborhood, colored by cluster label. 

In [28]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Cluster 0 and Cluster 4 are the largest clusters, with 14 and 17 neighborhoods respectively. 

In [29]:
for i in range(kclusters): print('Cluster {}: {} elements'.format(i,toronto_merged.loc[toronto_merged['Cluster Labels']==i].shape[0]))

Cluster 0: 14 elements
Cluster 1: 2 elements
Cluster 2: 1 elements
Cluster 3: 1 elements
Cluster 4: 17 elements
Cluster 5: 1 elements
Cluster 6: 2 elements
Cluster 7: 1 elements


In [30]:
toronto_merged.loc[toronto_merged['Cluster Labels']==0,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,0,Coffee Shop,Italian Restaurant,Café,Cocktail Bar,Restaurant,Clothing Store,Japanese Restaurant,Beer Bar,Gastropub,Cosmetics Shop
4,Downtown Toronto,0,Sandwich Place,Bakery,Cocktail Bar,Coffee Shop,Vegetarian / Vegan Restaurant,Seafood Restaurant,Beer Bar,Farmers Market,Comfort Food Restaurant,Pharmacy
8,West Toronto,0,Pharmacy,Music Venue,Brazilian Restaurant,Middle Eastern Restaurant,Liquor Store,Café,Supermarket,Bar,Bank,Bakery
11,West Toronto,0,Bar,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Asian Restaurant,New American Restaurant,Yoga Studio,Diner,Salon / Barbershop
12,East Toronto,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Yoga Studio,Bubble Tea Shop,Furniture / Home Store,Frozen Yogurt Shop,Spa,Brewery


In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels']==1,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Central Toronto,1,Park,Trail,Jewelry Store,Sushi Restaurant,Adult Boutique,Monument / Landmark,Martial Arts School,Mediterranean Restaurant,Men's Store,Metro Station
33,Downtown Toronto,1,Park,Playground,Trail,Adult Boutique,Moroccan Restaurant,Martial Arts School,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant


In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels']==2,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Central Toronto,2,Tennis Court,Park,Movie Theater,Martial Arts School,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop


In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels']==3,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Central Toronto,3,Garden,Pool,Museum,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant


In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels']==4,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]].head()

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,4,Coffee Shop,Park,Bakery,Pub,Café,Spa,Sandwich Place,Breakfast Spot,Performing Arts Venue,Mexican Restaurant
1,Downtown Toronto,4,Coffee Shop,Clothing Store,Sandwich Place,Café,Cosmetics Shop,Japanese Restaurant,Hotel,Pizza Place,Bank,Thai Restaurant
5,Downtown Toronto,4,Coffee Shop,Sandwich Place,Sushi Restaurant,Italian Restaurant,Japanese Restaurant,Café,Burger Joint,Salad Place,Bank,Pizza Place
7,Downtown Toronto,4,Coffee Shop,Café,Sandwich Place,Gym,Clothing Store,Restaurant,Sushi Restaurant,Cosmetics Shop,Burrito Place,Steakhouse
10,Downtown Toronto,4,Coffee Shop,Café,Hotel,Scenic Lookout,Pizza Place,Aquarium,Deli / Bodega,Sporting Goods Shop,Brewery,Sports Bar


In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels']==5,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,East York/East Toronto,5,Film Studio,Convenience Store,Park,Adult Boutique,Movie Theater,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant


In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels']==6,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,East Toronto,6,Health Food Store,Grocery Store,Pub,Neighborhood,Movie Theater,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant
6,Downtown Toronto,6,Grocery Store,Café,Park,Baby Store,Coffee Shop,Restaurant,Nightclub,Italian Restaurant,Bank,Music Venue


In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels']==7,toronto_merged.columns[[1]+list(range(5,toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Central Toronto,7,Park,Bus Line,Dim Sum Restaurant,Swim School,Adult Boutique,Movie Theater,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant


In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels']==4]['1st Most Common Venue'].value_counts()


Coffee Shop             12
Pizza Place              1
Sandwich Place           1
Fast Food Restaurant     1
Sushi Restaurant         1
Café                     1
Name: 1st Most Common Venue, dtype: int64

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels']==0]['1st Most Common Venue'].value_counts()

Coffee Shop           3
Sandwich Place        2
Mexican Restaurant    1
Bar                   1
Airport Service       1
Skate Park            1
Café                  1
Pizza Place           1
Pharmacy              1
Greek Restaurant      1
Restaurant            1
Name: 1st Most Common Venue, dtype: int64
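
The per-cluster inspections above can also be condensed into a single summary table. The sketch below uses a small made-up dataframe with the same two column names as `toronto_merged`. The output names `n_neighborhoods` and `top_venue` are hypothetical.

```python
import pandas as pd

# Made-up stand-in for toronto_merged with only the two relevant columns.
demo = pd.DataFrame({
    'Cluster Labels': [0, 0, 0, 1, 1, 2],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Bar',
                              'Park', 'Park', 'Garden'],
})

# For each cluster: how many neighborhoods it holds and which venue
# category most often ranks first among them.
summary = demo.groupby('Cluster Labels')['1st Most Common Venue'].agg(
    n_neighborhoods='count',
    top_venue=lambda s: s.value_counts().idxmax(),
)
print(summary)
```

Applied to `toronto_merged`, this would give one row per cluster instead of one cell per cluster.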

From these tables, the clusters can be characterized as follows: 
* Cluster 0: diverse, with coffee shops and sandwich places as the most frequent top venues.
* Cluster 1: two neighborhoods where parks are the most common venue.
* Cluster 2: a single neighborhood whose most common venue is tennis courts.
* Cluster 3: a single neighborhood whose most common venue is gardens.
* Cluster 4: coffee shops are the predominant venue, ranking first in 12 of its 17 neighborhoods.
* Cluster 5: a single neighborhood whose most common venue is film studios.
* Cluster 6: a mixed cluster of two neighborhoods, led by health food and grocery stores.
* Cluster 7: like Cluster 1, led by parks, but with bus lines as the second most common venue.