# Applied Data Science Capstone Project

This notebook is mainly used to complete the assignments for the IBM Applied Data Science Capstone Project on Coursera.

I am excited to share my notebook with you, and hope to learn a lot from this experience!

## Part 1: Creating Notebook and importing libraries

The first assignment is to import some libraries and say hello to the notebook's readers.

In [1]:
import pandas as pd
import numpy as np
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


## Part 2-A: Creating a dataframe of neighborhoods in Toronto

The assignment of Week 3 is to create a dataframe consisting of neighborhoods in Toronto from the following Wikipedia Page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

To do so, we will use the BeautifulSoup Library:

In [2]:
import requests
from bs4 import BeautifulSoup

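First, we check that the page can be retrieved. A response status code of 200 means that the HTTP request succeeded. It does not by itself tell us whether scraping is permitted, which depends on the site's terms of use and its robots.txt file.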

In [3]:
wikiurl="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
table_class="wikitable sortable jquery-tablesorter"
response=requests.get(wikiurl)
print(response.status_code)

200
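
A status code only describes the outcome of the request. Whether automated access is welcome is usually signalled by the site's robots.txt file, together with its terms of use. As a minimal sketch, Python's standard `urllib.robotparser` module can evaluate such rules. The robots.txt content and the `example.org` URLs below are hypothetical and are parsed from a string so that the example runs offline:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (not Wikipedia's actual file).
sample_robots = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(sample_robots)  # parse rules from a list of lines instead of fetching them

# can_fetch(user_agent, url) reports whether the rules allow the given URL.
allowed_page = parser.can_fetch("*", "https://example.org/wiki/Some_page")
blocked_page = parser.can_fetch("*", "https://example.org/private/data")

print(allowed_page, blocked_page)
```

For a real site, `set_url()` followed by `read()` would load the live robots.txt instead of the sample string.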


Since the request succeeded, we can proceed. The next step is to extract the table using the HTML attributes of the table element on the page.

In [4]:
soup = BeautifulSoup(response.text, 'html.parser')
table=soup.find('table',{'class':"wikitable"})

Now, we modify the table with pandas until we have the desired dataframe, sorted by postal code.

In [5]:
from io import StringIO  # recent pandas versions expect a file-like object in read_html
# read_html returns a list of dataframes; the first one is the postal code table
toronto_neighborhoods=pd.read_html(StringIO(str(table)))[0]
toronto_neighborhoods=toronto_neighborhoods.rename(columns={'Neighbourhood':'Neighborhood'})
# drop postal codes that have no borough assigned
toronto_neighborhoods=toronto_neighborhoods[~toronto_neighborhoods.Borough.str.contains('Not assigned')]
# sort by postal code and rebuild a clean 0..n-1 index
toronto_neighborhoods=toronto_neighborhoods.sort_values('Postal Code').reset_index(drop=True)
toronto_neighborhoods.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
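
The cleaning steps above (rename, filter, sort, reset the index) can also be written as a single method chain. The following is only a sketch on a small hand-made dataframe, not the scraped table; the names `raw` and `clean` are illustrative:

```python
import pandas as pd

# Tiny stand-in for the scraped table.
raw = pd.DataFrame({
    "Postal Code": ["M3A", "M1A", "M1B"],
    "Borough": ["North York", "Not assigned", "Scarborough"],
    "Neighbourhood": ["Parkwoods", "Not assigned", "Malvern, Rouge"],
})

clean = (
    raw.rename(columns={"Neighbourhood": "Neighborhood"})
       .loc[lambda df: df["Borough"] != "Not assigned"]  # keep assigned boroughs only
       .sort_values("Postal Code")
       .reset_index(drop=True)                           # fresh index, old one discarded
)

print(clean)
```

Chaining avoids `inplace=True` calls on a filtered slice, which can trigger pandas' SettingWithCopyWarning.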


## Part 2-B: Appending coordinates to the neighborhoods dataframe

The second part of Week 3's assignment is to append the coordinates of each neighborhood to the dataframe created in Part 2-A.

To do so, we import the coordinates from the Geospatial Coordinates CSV file.

In [6]:
coordinates=pd.read_csv('Geospatial_Coordinates.csv')
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Since the coordinates dataframe is also sorted by postal code, its rows line up with those of the neighborhoods dataframe, so we can append the 'Latitude' and 'Longitude' columns directly.

In [7]:
toronto_neighborhoods['Latitude']=coordinates['Latitude']
toronto_neighborhoods['Longitude']=coordinates['Longitude']
toronto_neighborhoods

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
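
Assigning columns directly works only while both dataframes have the same row order and the same index. A more defensive alternative, shown here as a sketch on toy data rather than the notebook's actual dataframes, is to join on the 'Postal Code' key:

```python
import pandas as pd

# Toy dataframes whose rows are deliberately in a different order.
neighborhoods = pd.DataFrame({
    "Postal Code": ["M1C", "M1B"],
    "Borough": ["Scarborough", "Scarborough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Join on the key; validate raises an error if a postal code is duplicated.
merged = neighborhoods.merge(
    coords, on="Postal Code", how="left", validate="one_to_one"
)

print(merged)
```

With `how="left"`, a postal code missing from the coordinates file would show up as NaN instead of silently shifting the remaining rows.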


## Part 2-C: Clustering neighborhoods in Toronto

The third part of Week 3's assignment is to cluster the neighborhoods in Toronto. To do so, we will replicate the analysis that we performed with Manhattan's neighborhoods.

### Initial visualization of the geographical data

First, we will import the necessary libraries to perform the analysis.

In [8]:
import json
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

Second, we will get the geographical coordinates of Toronto.

In [9]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Now, we will create a map of all neighborhoods in Toronto.

In [10]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(toronto_neighborhoods['Latitude'], toronto_neighborhoods['Longitude'], toronto_neighborhoods['Borough'], toronto_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

### Obtaining venue data from Foursquare API

Next, we will obtain a list of popular venues within a 500-meter radius of each neighborhood.

To do so, we need to declare our Foursquare API credentials:

In [11]:
CLIENT_ID = 'YOUR_CLIENT_ID' # placeholder: replace with your own Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # placeholder: replace with your own client secret and never publish it
VERSION = '20180605'
LIMIT = 100

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


Then, we define a function to get up to 100 venues within a 500-meter radius of each neighborhood:

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
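
The function above indexes straight into the JSON response, so a neighborhood with an empty result or a venue without a category would raise an error. As a hedged sketch, the parsing step could be isolated in a helper that tolerates those cases. The helper name `extract_venues` and the sample payload are hypothetical. The payload only mimics the keys that `getNearbyVenues` reads:

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category.
        category = categories[0]["name"] if categories else None
        location = venue["location"]
        rows.append((venue["name"], location["lat"], location["lng"], category))
    return rows


# Hypothetical payload shaped like the fields accessed above.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 43.80, "lng": -79.19},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Shop B", "location": {"lat": 43.81, "lng": -79.20},
               "categories": []}},
]}]}}

venues = extract_venues(sample)
empty = extract_venues({"response": {}})
print(venues, empty)
```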

Using the previous function, we obtain the venues for each neighborhood in Toronto and store them in a new dataframe.

In [13]:
toronto_venues = getNearbyVenues(names=toronto_neighborhoods['Neighborhood'],
                                   latitudes=toronto_neighborhoods['Latitude'],
                                   longitudes=toronto_neighborhoods['Longitude']
                                  )
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Malvern, Rouge",43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,SEBS Engineering Inc. (Sustainable Energy and ...,43.782371,-79.15682,Construction & Landscaping
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank


### Data preprocessing

With the previous dataframe, we can one-hot encode the venue categories to get a dummy variable for each category, and then use groupby().mean() to compute the frequency of each category per neighborhood:

In [14]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90,"Willowdale, Willowdale East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000,0.029412,0.0,0.0,0.0,0.0
91,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.0,0.0
92,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.000,0.000000,0.0,0.0,0.0,0.0
93,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.125,0.000000,0.0,0.0,0.0,0.0
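
To see why the mean of the dummy variables gives a frequency, consider a toy example with made-up venues (not data from the notebook). Each row of the one-hot table contains a single 1, so averaging per neighborhood yields the share of venues in each category:

```python
import pandas as pd

# Made-up venues: neighborhood "A" has four venues, "B" has two.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Bakery", "Bank", "Bank", "Bank"],
})

# One column per category, holding 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 columns per neighborhood = relative frequency.
freq = onehot.groupby("Neighborhood").mean()

print(freq)
```

Here two of the four venues in "A" are parks, so its 'Park' frequency is 0.5, and each row of `freq` sums to 1.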


The next step is to find the 10 most frequent venue categories in each neighborhood, so that we can characterize the clusters once the clustering algorithm has run.

We will do this by defining a function that returns said venues per neighborhood, and then we will create a dataframe to visualize the information.

The function:

In [15]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
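
One caveat: when a neighborhood has fewer than `num_top_venues` distinct categories, the function above fills the remaining positions with categories whose frequency is zero. A possible variant, sketched here with the hypothetical name `top_venues` and a toy row, keeps only categories that actually occur:

```python
import pandas as pd


def top_venues(row, num_top_venues):
    """Return up to num_top_venues categories with non-zero frequency."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return freqs.index[:num_top_venues].tolist()


# Toy row in the same layout as a row of the grouped dataframe.
row = pd.Series(
    {"Neighborhood": "A", "Park": 0.5, "Bank": 0.25, "Bakery": 0.25, "Gym": 0.0}
)

result = top_venues(row, 10)
print(result)
```

This variant returns lists of different lengths, so it would need padding before being written into a fixed-width dataframe like the one built next.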

The dataframe:

In [16]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 'rd' take the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Latin American Restaurant,Skating Rink,Clothing Store,Breakfast Spot,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant
1,"Alderwood, Long Branch",Pizza Place,Gym,Pharmacy,Pub,Sandwich Place,Dance Studio,Coffee Shop,Drugstore,Donut Shop,Doner Restaurant
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Shopping Mall,Deli / Bodega,Supermarket,Ice Cream Shop,Sushi Restaurant,Restaurant,Chinese Restaurant,Fried Chicken Joint
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Women's Store,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
4,"Bedford Park, Lawrence Manor East",Sandwich Place,Coffee Shop,Italian Restaurant,Sushi Restaurant,Japanese Restaurant,Butcher,Café,Indian Restaurant,Spa,Restaurant
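
The try/except construction used for the column names works for the first ten positions. For larger values of `num_top_venues` it would produce labels such as '21th'. A more general helper could look like the following sketch, where the function name `ordinal` is an illustrative choice:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(columns)
```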


### Clustering

Now we are ready to use the k-means clustering algorithm from the scikit-learn machine learning library. We will train the model on the grouped dataframe of per-neighborhood category frequencies. Since we are analyzing more neighborhoods than in the Manhattan analysis, we will group them into 7 clusters.

In [17]:
kclusters = 7 # number of clusters
toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1) # axis must be passed by keyword in pandas 2.x
# running the k-means clustering algorithm
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)
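
The number of clusters is chosen by judgment here. One common way to sanity-check such a choice is to compare the k-means inertia (within-cluster sum of squares) across several values of k and look for the point where it stops dropping sharply. The sketch below uses synthetic 2-D points rather than the Toronto data, so the numbers it prints say nothing about the right k for this analysis:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + rng.normal(scale=0.2, size=(30, 2)) for c in centers])

# Fit k-means for several k and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

On these blobs the inertia falls steeply up to k=3 and flattens afterwards. Real venue-frequency data is rarely that clear-cut.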

To gather all the information from the analysis in one place, we will create a new dataframe containing the borough, neighborhood, postal code, latitude, longitude, cluster label, and top 10 venue categories of each neighborhood:

In [18]:
# adding clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_neighborhoods

# joining neighborhoods_venues_sorted onto toronto_neighborhoods to add the cluster label and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# dropping neighborhoods without venue data (NaN after the join) and converting cluster labels to integers
toronto_merged.dropna(inplace=True)
toronto_merged['Cluster Labels']=toronto_merged['Cluster Labels'].astype('int32')

# checking the dataframe
toronto_merged.head() 

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,1,Fast Food Restaurant,Print Shop,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Department Store
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,1,Construction & Landscaping,Bar,Women's Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,1,Restaurant,Intersection,Mexican Restaurant,Bank,Medical Center,Breakfast Spot,Electronics Store,Rental Car Location,Eastern European Restaurant,Drugstore
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0,Coffee Shop,Korean BBQ Restaurant,Women's Store,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Eastern European Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,1,Caribbean Restaurant,Gas Station,Bakery,Athletics & Sports,Bank,Thai Restaurant,Fried Chicken Joint,Hakka Restaurant,Drugstore,Donut Shop


To visualize the resulting clusters, we create a folium map:

In [19]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examining clusters

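For our last step, we will analyze each cluster to find similarities between neighborhoods within the cluster, and differences between clusters. Note that the cluster labels are zero-based, so Cluster 1 below corresponds to label 0, Cluster 2 to label 1, and so on.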
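
Before reading the clusters one by one, it can help to check how many neighborhoods each label contains, since k-means may produce one large cluster alongside several very small ones. A minimal sketch with made-up labels (not the labels from this analysis):

```python
import pandas as pd

# Made-up cluster labels for seven neighborhoods.
labels = pd.Series([1, 1, 0, 1, 2, 1, 0], name="Cluster Labels")

# Count the neighborhoods per label, ordered by label.
sizes = labels.value_counts().sort_index()

print(sizes)
```

Applied to the notebook's data, the same call on `toronto_merged['Cluster Labels']` would give the size of each cluster.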

##### Cluster 1 - Most common venues: Pizza places, discount stores, distribution centers, coffee shops, and diners.

In [20]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Scarborough,0,Coffee Shop,Korean BBQ Restaurant,Women's Store,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Eastern European Restaurant
24,North York,0,Grocery Store,Pharmacy,Pizza Place,Coffee Shop,Butcher,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
81,York,0,Convenience Store,Grocery Store,Pizza Place,Brewery,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
89,Etobicoke,0,Pizza Place,Gym,Pharmacy,Pub,Sandwich Place,Dance Studio,Coffee Shop,Drugstore,Donut Shop,Doner Restaurant
96,North York,0,Furniture / Home Store,Pizza Place,Intersection,Home Service,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
99,Etobicoke,0,Pizza Place,Intersection,Coffee Shop,Discount Store,Sandwich Place,Chinese Restaurant,Women's Store,Distribution Center,Dim Sum Restaurant,Diner


##### Cluster 2 - Most common venues: Distribution centers, restaurants, discount stores, drugstores, and banks.

In [21]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,1,Fast Food Restaurant,Print Shop,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Department Store
1,Scarborough,1,Construction & Landscaping,Bar,Women's Store,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
2,Scarborough,1,Restaurant,Intersection,Mexican Restaurant,Bank,Medical Center,Breakfast Spot,Electronics Store,Rental Car Location,Eastern European Restaurant,Drugstore
4,Scarborough,1,Caribbean Restaurant,Gas Station,Bakery,Athletics & Sports,Bank,Thai Restaurant,Fried Chicken Joint,Hakka Restaurant,Drugstore,Donut Shop
6,Scarborough,1,Hobby Shop,Coffee Shop,Discount Store,Chinese Restaurant,Department Store,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Dessert Shop
...,...,...,...,...,...,...,...,...,...,...,...,...
92,Etobicoke,1,Grocery Store,Convenience Store,Discount Store,Tanning Salon,Burrito Place,Burger Joint,Flower Shop,Supplement Shop,Sandwich Place,Hardware Store
95,Etobicoke,1,Café,Park,Shopping Plaza,Pet Store,Pharmacy,Pizza Place,Coffee Shop,Beer Store,Liquor Store,Distribution Center
97,North York,1,Furniture / Home Store,Food Service,Baseball Field,Doner Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,Dessert Shop
101,Etobicoke,1,Grocery Store,Fast Food Restaurant,Pharmacy,Pizza Place,Beer Store,Fried Chicken Joint,Discount Store,Sandwich Place,Distribution Center,Dessert Shop


##### Cluster 3 - Most common venues: Parks, trails, pools, restaurants, donut shops, and diners.

In [22]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Scarborough,2,Park,Intersection,Playground,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop
25,North York,2,Park,Food & Drink Shop,Pool,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant,Deli / Bodega,Dog Run
44,Central Toronto,2,Park,Bus Line,Swim School,Business Service,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Dessert Shop,Dog Run
50,Downtown Toronto,2,Park,Playground,Trail,Escape Room,Electronics Store,Eastern European Restaurant,Ethiopian Restaurant,Drugstore,Donut Shop,Deli / Bodega
64,Central Toronto,2,Trail,Park,Sushi Restaurant,Jewelry Store,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
73,York,2,Trail,Park,Field,Hockey Arena,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
74,York,2,Park,Pool,Women's Store,Gourmet Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant
79,North York,2,Park,Construction & Landscaping,Bakery,Basketball Court,Ethiopian Restaurant,Escape Room,Event Space,Electronics Store,Eastern European Restaurant,Dim Sum Restaurant
100,Etobicoke,2,Park,Bus Line,Pizza Place,Sandwich Place,Distribution Center,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run


##### Cluster 4 - Most common venues: Ethiopian and Eastern European restaurants, electronics stores, department stores, parks, and event spaces.

In [23]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,North York,3,Park,Department Store,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop,Doner Restaurant
23,North York,3,Park,Convenience Store,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Department Store,Donut Shop
40,East York,3,Park,Convenience Store,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Drugstore,Department Store,Donut Shop


##### Cluster 5 - What makes this neighborhood unique: River access, college gym, restaurants and dog run spaces.

In [24]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
90,Etobicoke,4,River,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Donut Shop,College Gym


##### Cluster 6 - What makes this neighborhood unique: Business services, playgrounds, and stores.

In [25]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough,5,Business Service,Playground,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Women's Store


##### Cluster 7 - What makes this neighborhood unique: Jewelry stores, restaurants, and various stores.

In [26]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
98,York,6,Jewelry Store,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Women's Store,Fast Food Restaurant
