
# IBM Data Science Certification - Project Capstone

This notebook was developed as the capstone project of Luiz Claudio Boechat to complete the IBM Data Science Certification track, hosted on the Coursera platform.

## Summary

1. <a href="#item1">Data Acquisition</a>
2. <a href="#item2">Data Visualization</a>
3. <a href="#item3">Neighborhood Exploration</a>
4. <a href="#item4">Clustering Neighborhoods</a>


## 1. Data Acquisition

This is the first part of the project. It consists of acquiring the <a href="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M">list of neighborhoods of Toronto</a> and loading it into a data frame.

As a first step, all necessary libraries and classes are imported into the current environment.

In [1]:
#importing libraries
from bs4 import BeautifulSoup
from IPython.display import HTML
import pandas as pd
import requests

print("Imported dependencies!")

Imported dependencies!


The HTML content at the data URL is requested and parsed using the `BeautifulSoup` class.

In [2]:
#HTTP request to the URL
data_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data_page = requests.get(data_url)

#HTML parsing with BeautifulSoup
data_soup = BeautifulSoup(data_page.text, 'html.parser')

#extracting html table with data
data_html_table = data_soup.find_all('table', class_='wikitable')
data_html_table_str = str(data_html_table[0])
HTML(data_html_table_str)

Postal Code,Borough,Neighborhood
M1A,Not assigned,
M2A,Not assigned,
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Regent Park, Harbourfront"
M6A,North York,"Lawrence Manor, Lawrence Heights"
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
M8A,Not assigned,
M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
M1B,Scarborough,"Malvern, Rouge"


The HTML table is converted into a data frame.

In [3]:
#converting html table into data frame
df = pd.read_html(data_html_table_str)[0]

print('Read data frame with shape: ', df.shape)
print('Data Frame Head:')
df.head()

Read data frame with shape:  (180, 3)
Data Frame Head:


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


The data frame is processed to remove unassigned postal codes (which is the same as removing empty neighborhoods).

In [4]:
#processing data frame
#renaming column PostalCode
df.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
#dropping rows with empty neighborhood
df.dropna(subset=['Neighborhood'], inplace=True)

print('Data Frame Head:')
df.head()

Data Frame Head:


Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
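The equivalence between dropping empty neighborhoods and dropping unassigned boroughs can be checked on a small example. The sketch below uses a toy frame with invented rows that mirror the scraped table (`None` stands in for an empty cell). It illustrates the idea and does not reproduce the full dataset.

```python
import pandas as pd

# Toy rows mirroring the scraped table; None stands in for an empty cell.
toy = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighborhood': [None, 'Parkwoods', 'Regent Park, Harbourfront'],
})

# Drop rows whose Neighborhood is missing (returns a new frame).
cleaned = toy.dropna(subset=['Neighborhood'])

# In this toy data, filtering on the Borough value removes the same rows.
by_borough = toy[toy['Borough'] != 'Not assigned']

print(cleaned.equals(by_borough))
```

Calling `dropna` without `inplace=True` returns a new frame, which leaves the raw table available for comparison.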


In [5]:
print('Data Frame description:')
df.describe(include='all')

Data Frame description:


Unnamed: 0,PostalCode,Borough,Neighborhood
count,103,103,103
unique,103,10,99
top,M6S,North York,Downsview
freq,1,24,4


In [6]:
print('Is there any unassigned Neighborhood?', any(df['Neighborhood'].isna()))
print('Is there any unassigned Borough?', any(df['Borough'] == 'Not assigned'))
print('Is there any duplicated Postal Code?', any(df['PostalCode'].duplicated()))

Is there any unassigned Neighborhood? False
Is there any unassigned Borough? False
Is there any duplicated Postal Code? False


In [7]:
print('Shape of data frame: ', df.shape)

Shape of data frame:  (103, 3)


_Note:_ It is not necessary to combine neighborhood names by postal code or to assign borough names to neighborhoods, because:
* the data was extracted directly from the Wikipedia page, where every empty neighborhood has a 'Not assigned' borough;
* no empty neighborhoods remained in the data frame after dropping rows;
* no duplicate postal codes remained.
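For reference, had the table listed one neighborhood per row with repeated postal codes, the combination step could be sketched as follows. The toy rows are invented for illustration. The grouping keys are an assumption about how such a table would be laid out.

```python
import pandas as pd

# Hypothetical layout: one row per neighborhood, so postal codes repeat.
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# Join the neighborhood names that share a postal code and borough.
combined = (
    toy.groupby(['PostalCode', 'Borough'], sort=False)['Neighborhood']
       .agg(', '.join)
       .reset_index()
)

print(combined)
```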

The coordinates of each postal code are acquired from the <a href='https://cocl.us/Geospatial_data'>Geospatial CSV file</a>.

In [8]:
#reading CSV file
df_coords = pd.read_csv('https://cocl.us/Geospatial_data')
#renaming columns
df_coords.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)

print("Shape of coordinates' data frame: ", df_coords.shape)
print('Coordinates data frame head:')
df_coords.head()

Shape of coordinates' data frame:  (103, 3)
Coordinates data frame head:


Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Joining the two data frames.

In [9]:
#testing whether all postal codes from one data frame are in the other
print('Are all postal codes from df in df_coords?', all(df['PostalCode'].isin(df_coords['PostalCode'])))
print('Are all postal codes from df_coords in df?', all(df_coords['PostalCode'].isin(df['PostalCode'])))

Are all postal codes from df in df_coords? True
Are all postal codes from df_coords in df? True


In [10]:
#joining data frames on Postal Code value
df = df.merge(df_coords, on='PostalCode')

print('Read data frame with shape: ', df.shape)
print('Data Frame Head:')
df.head()

Read data frame with shape:  (103, 5)
Data Frame Head:


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
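The membership checks and the join can also be combined into one call. The sketch below uses toy frames with two postal codes taken from the output above. It relies on the `validate` and `indicator` parameters of `DataFrame.merge`: the first raises an error if a key repeats on either side, and the second adds a `_merge` column recording where each row was found. This is one possible way to guard the join, not the approach used in this notebook.

```python
import pandas as pd

left = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
})
right = pd.DataFrame({
    'PostalCode': ['M4A', 'M3A'],
    'Latitude': [43.725882, 43.753259],
    'Longitude': [-79.315572, -79.329656],
})

# validate raises MergeError on duplicate keys; indicator adds a '_merge' column.
merged = left.merge(right, on='PostalCode', how='left',
                    validate='one_to_one', indicator=True)

# Rows found only in the left frame would have no coordinates.
unmatched = merged[merged['_merge'] != 'both']
print(len(unmatched))
```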


## 2. Data Visualization

With the data frame containing all neighborhoods and coordinates, it is possible to plot a map using the `folium` library.

In [11]:
#installing libraries
!pip install folium
!pip install geopy

#importing libraries
import folium
from folium import plugins
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

print("Imported dependencies!")

Collecting folium
  Downloading https://files.pythonhosted.org/packages/a4/f0/44e69d50519880287cc41e7c8a6acc58daa9a9acf5f6afc52bcc70f69a6d/folium-0.11.0-py2.py3-none-any.whl (93kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Imported dependencies!


The map is centered on Toronto's coordinates.

In [12]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Toronto, Ontario are 43.6534817, -79.3839347.
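The geocoder returns `None` when it finds no match, in which case reading `location.latitude` fails. A minimal sketch of a guard is shown below. `resolve_center` is a hypothetical helper, the fallback values are copied from the output above, and the two lambda geocoders are stand-ins so the example runs without network access.

```python
from types import SimpleNamespace

# Fallback coordinates copied from the cell output above.
TORONTO_FALLBACK = (43.6534817, -79.3839347)

def resolve_center(geocode, address, fallback=TORONTO_FALLBACK):
    """Return (latitude, longitude) for address, or fallback if geocode returns None."""
    location = geocode(address)
    if location is None:
        return fallback
    return (location.latitude, location.longitude)

# Stand-in geocoders (assumptions for the demo, not geopy objects).
fake_hit = lambda address: SimpleNamespace(latitude=1.5, longitude=-2.5)
fake_miss = lambda address: None

center_hit = resolve_center(fake_hit, 'Toronto, Ontario')
center_miss = resolve_center(fake_miss, 'Toronto, Ontario')
print(center_hit, center_miss)
```

In the notebook, the call would presumably be `resolve_center(geolocator.geocode, address)`.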


In [13]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(map_toronto)
    
map_toronto

## 3. Neighborhood Exploration

In [14]:
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'

Using the Foursquare API, each neighborhood is explored to find its venues. For each neighborhood, the API is queried for the venues' names, locations and categories.

In [22]:
#function used for getting neighborhood venues
def getNeighborhoodVenues(names, latitudes, longitudes, radius=500, limit=15):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, limit)
        # make the GET request and keep the list of returned items
        results = requests.get(url).json()['response']['groups'][0]['items']
        # keep only the relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single data frame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

    return nearby_venues
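The indexing in `getNeighborhoodVenues` assumes that every response contains a first group and that every venue has at least one category. A more defensive flattening step might look like the sketch below. `extract_venues` and the sample payload are hypothetical, and the payload is shaped only after the keys the function above reads.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into row tuples, tolerating missing parts."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue carries no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Toy payload (an assumption): two venues, the second without a category.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Roselle Desserts',
               'location': {'lat': 43.653447, 'lng': -79.362017},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Regent Park, Harbourfront', 43.65426, -79.360636)
empty_rows = extract_venues({'response': {}}, 'Nowhere', 0.0, 0.0)
print(rows, empty_rows)
```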

For the exploration, only neighborhoods whose borough name contains 'Toronto' are considered.

In [23]:
#filtering Toronto neighborhoods
toronto_df = df[df['Borough'].str.contains('Toronto')]
#querying venues
toronto_venues = getNeighborhoodVenues(names=toronto_df['Neighborhood'], latitudes=toronto_df['Latitude'], longitudes=toronto_df['Longitude'])
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [25]:
#venues per neighborhood
toronto_venues.groupby('Neighborhood').count()['Venue']

Neighborhood
Berczy Park                                                                                                   15
Brockton, Parkdale Village, Exhibition Place                                                                  15
Business reply mail Processing Centre, South Central Letter Processing Plant Toronto                          15
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport    15
Central Bay Street                                                                                            15
Christie                                                                                                      15
Church and Wellesley                                                                                          15
Commerce Court, Victoria Hotel                                                                                15
Davisville                                                                         

Evaluating categories of venues for each neighborhood:

In [26]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,...,Swim School,Tailor Shop,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Grouping the rows by neighborhood and evaluating the frequency of each type of venue.

In [27]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,...,Swim School,Tailor Shop,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.066667,0.066667,0.2,0.133333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
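The two steps above (one-hot encoding followed by a grouped mean) can be traced on a small example. The sketch below uses invented neighborhoods `A` and `B` and two categories. The mean of each 0/1 column within a neighborhood is the share of that neighborhood's venues in that category, so each row of the result sums to 1.

```python
import pandas as pd

# Toy venues (invented): neighborhood A has three venues, B has one.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Park', 'Cafe', 'Park'],
})

# One 0/1 column per category.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of each indicator column per neighborhood = category frequency.
freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)
```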


Top 10 most common venue categories of each neighborhood:

In [44]:
#importing dependencies
import numpy as np

#auxiliary function for sorting most common categories in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#number of top venues
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted 

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Seafood Restaurant,Park,Coffee Shop,Restaurant,Breakfast Spot,Farmers Market,Concert Hall,Bistro,Liquor Store,Cocktail Bar
1,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Café,Breakfast Spot,Furniture / Home Store,Bakery,Performing Arts Venue,Pet Store,Restaurant,Italian Restaurant,Bar
2,"Business reply mail Processing Centre, South C...",Garden,Butcher,Skate Park,Brewery,Smoke Shop,Burrito Place,Farmers Market,Auto Workshop,Pizza Place,Fast Food Restaurant
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Terminal,Boutique,Rental Car Location,Sculpture Garden,Boat or Ferry,Plane,Harbor / Marina,Airport Lounge,Airport Gate
4,Central Bay Street,Coffee Shop,Bar,Gastropub,Italian Restaurant,Modern European Restaurant,Middle Eastern Restaurant,Japanese Restaurant,Sushi Restaurant,Spa,Ramen Restaurant
5,Christie,Grocery Store,Café,Park,Diner,Nightclub,Restaurant,Baby Store,Athletics & Sports,Italian Restaurant,Candy Store
6,Church and Wellesley,Creperie,Salon / Barbershop,Restaurant,Beer Bar,Ramen Restaurant,Breakfast Spot,Dance Studio,Bubble Tea Shop,Burger Joint,Juice Bar
7,"Commerce Court, Victoria Hotel",Café,Coffee Shop,Gym,Museum,Gastropub,Restaurant,Tea Room,Pub,Bookstore,Gym / Fitness Center
8,Davisville,Dessert Shop,Italian Restaurant,Café,Indian Restaurant,Gym,Sandwich Place,Sushi Restaurant,Seafood Restaurant,Thai Restaurant,Park
9,Davisville North,Park,Sandwich Place,Hotel,Department Store,Breakfast Spot,Food & Drink Shop,Dance Studio,Cuban Restaurant,Dessert Shop,Deli / Bodega
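The column labels above are built with a three-item suffix list, which is sufficient for ten columns. If `num_top_venues` were raised beyond 20, labels such as 21 would need their own suffix. A small helper along the following lines is one way to cover that case; `ordinal` is a hypothetical name, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the same column labels used for the table above.
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```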


## 4. Clustering Neighborhoods

Running the k-means algorithm to group neighborhoods into 5 clusters.

In [46]:
#importing dependencies
from sklearn.cluster import KMeans

#number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

#run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

#check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 0, 0, 2, 2, 0, 2, 2, 0], dtype=int32)
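To make the clustering step less of a black box, the sketch below runs plain Lloyd iterations (assign each point to its nearest center, then move each center to the mean of its points) on six toy points. It is an illustration only and not scikit-learn's implementation, which by default uses k-means++ initialization. `lloyd_kmeans` is a hypothetical helper that, for determinism, takes the first `k` rows as initial centers.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; the first k rows serve as initial centers."""
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Euclidean distance from every point to every center: shape (n, k).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated toy groups (invented data).
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = lloyd_kmeans(toy, k=2)
print(labels)
```

In the notebook, each row is a vector of category frequencies rather than a 2-D point, but the grouping principle is the same.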

Creating a new data frame including the labels generated by KMeans.

In [47]:
#add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

#join toronto_df with neighborhoods_venues_sorted to add cluster labels and top venues to each neighborhood
toronto_merged = toronto_df
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() 

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Coffee Shop,Breakfast Spot,Park,Historic Site,Distribution Center,Restaurant,Chocolate Shop,Bakery,Spa,Gym / Fitness Center
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Sushi Restaurant,Yoga Studio,Arts & Crafts Store,Hobby Shop,Distribution Center,Italian Restaurant,Diner,Creperie,Mexican Restaurant,Coffee Shop
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2,Café,Steakhouse,Plaza,Comic Shop,Music Venue,Burrito Place,Coffee Shop,Mexican Restaurant,Tea Room,Thai Restaurant
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,2,Coffee Shop,Gastropub,Cosmetics Shop,Middle Eastern Restaurant,Restaurant,BBQ Joint,Creperie,Gym,Italian Restaurant,Japanese Restaurant
19,M4E,East Toronto,The Beaches,43.676357,-79.293031,2,Pub,Trail,Health Food Store,Wine Bar,Diner,Dessert Shop,Department Store,Deli / Bodega,Dance Studio,Cuban Restaurant


Cluster Visualization:

In [50]:
#importing dependencies
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Clusters

In [52]:
#cluster 1 (label 0)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Downtown Toronto,0,Sushi Restaurant,Yoga Studio,Arts & Crafts Store,Hobby Shop,Distribution Center,Italian Restaurant,Diner,Creperie,Mexican Restaurant,Coffee Shop
20,Downtown Toronto,0,Seafood Restaurant,Park,Coffee Shop,Restaurant,Breakfast Spot,Farmers Market,Concert Hall,Bistro,Liquor Store,Cocktail Bar
31,West Toronto,0,Bakery,Pharmacy,Music Venue,Grocery Store,Brewery,Bar,Bank,Portuguese Restaurant,Supermarket,Café
36,Downtown Toronto,0,Park,Plaza,Hotel,Dessert Shop,Supermarket,Performing Arts Venue,Lake,Salad Place,Skating Rink,Sporting Goods Shop
37,West Toronto,0,Wine Bar,Art Gallery,Cuban Restaurant,Pizza Place,Korean Restaurant,Brewery,Ice Cream Shop,Cocktail Bar,Bar,Asian Restaurant
41,East Toronto,0,Greek Restaurant,Ice Cream Shop,Italian Restaurant,Yoga Studio,Restaurant,Dessert Shop,Cosmetics Shop,Pizza Place,Pub,Brewery
47,East Toronto,0,Movie Theater,Sushi Restaurant,Ice Cream Shop,Pub,Brewery,Burrito Place,Gym,Italian Restaurant,Pet Store,Steakhouse
67,Central Toronto,0,Park,Sandwich Place,Hotel,Department Store,Breakfast Spot,Food & Drink Shop,Dance Studio,Cuban Restaurant,Dessert Shop,Deli / Bodega
69,West Toronto,0,Thai Restaurant,Furniture / Home Store,Flea Market,Italian Restaurant,Music Venue,Bar,Speakeasy,Café,Grocery Store,Arts & Crafts Store
80,Downtown Toronto,0,Bakery,Restaurant,Yoga Studio,College Gym,Café,Dessert Shop,Japanese Restaurant,Bookstore,Italian Restaurant,Bar


In [53]:
#cluster 2 (label 1)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Central Toronto,1,Garden,Home Service,Wine Bar,Cuban Restaurant,Distribution Center,Diner,Dessert Shop,Department Store,Deli / Bodega,Dance Studio


In [54]:
#cluster 3 (label 2)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,2,Coffee Shop,Breakfast Spot,Park,Historic Site,Distribution Center,Restaurant,Chocolate Shop,Bakery,Spa,Gym / Fitness Center
9,Downtown Toronto,2,Café,Steakhouse,Plaza,Comic Shop,Music Venue,Burrito Place,Coffee Shop,Mexican Restaurant,Tea Room,Thai Restaurant
15,Downtown Toronto,2,Coffee Shop,Gastropub,Cosmetics Shop,Middle Eastern Restaurant,Restaurant,BBQ Joint,Creperie,Gym,Italian Restaurant,Japanese Restaurant
19,East Toronto,2,Pub,Trail,Health Food Store,Wine Bar,Diner,Dessert Shop,Department Store,Deli / Bodega,Dance Studio,Cuban Restaurant
24,Downtown Toronto,2,Coffee Shop,Bar,Gastropub,Italian Restaurant,Modern European Restaurant,Middle Eastern Restaurant,Japanese Restaurant,Sushi Restaurant,Spa,Ramen Restaurant
25,Downtown Toronto,2,Grocery Store,Café,Park,Diner,Nightclub,Restaurant,Baby Store,Athletics & Sports,Italian Restaurant,Candy Store
30,Downtown Toronto,2,Coffee Shop,Steakhouse,Concert Hall,Seafood Restaurant,Plaza,Gym / Fitness Center,Speakeasy,Asian Restaurant,Restaurant,Hotel
42,Downtown Toronto,2,Café,Coffee Shop,Hotel,Restaurant,Beer Bar,Pub,Bakery,Gym,Gym / Fitness Center,Tea Room
43,West Toronto,2,Coffee Shop,Café,Breakfast Spot,Furniture / Home Store,Bakery,Performing Arts Venue,Pet Store,Restaurant,Italian Restaurant,Bar
48,Downtown Toronto,2,Café,Coffee Shop,Gym,Museum,Gastropub,Restaurant,Tea Room,Pub,Bookstore,Gym / Fitness Center


In [55]:
#cluster 4 (label 3)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Central Toronto,3,Park,Trail,Jewelry Store,Sushi Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Comic Shop,Concert Hall
83,Central Toronto,3,Park,Trail,Restaurant,Creperie,Diner,Dessert Shop,Department Store,Deli / Bodega,Dance Studio,Cuban Restaurant
91,Downtown Toronto,3,Park,Trail,Playground,Cosmetics Shop,Diner,Dessert Shop,Department Store,Deli / Bodega,Dance Studio,Cuban Restaurant


In [56]:
#cluster 5 (label 4)
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,Central Toronto,4,Park,Bus Line,Swim School,Cosmetics Shop,Diner,Dessert Shop,Department Store,Deli / Bodega,Dance Studio,Cuban Restaurant
