# Assignment 1
## Dataframe with Toronto Postal codes

### 1.1 Scraping Wikipedia (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) with Pandas

* *attrs* restricts parsing to the table(s) with the HTML class `wikitable sortable` (the class can be found by viewing the page source)
* *na_values* converts the 'Not assigned' string to NaN
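A minimal sketch of what `na_values` does. `pd.read_csv` accepts the same `na_values` parameter, so a few rows shaped like the scraped table are parsed from an in-memory string here; this avoids network access and the HTML parser that `read_html` needs.

```python
import io
import pandas as pd

# a few rows in the shape of the scraped Wikipedia table
csv_text = """Postal Code,Borough,Neighborhood
M1A,Not assigned,Not assigned
M3A,North York,Parkwoods
M5A,Downtown Toronto,"Regent Park, Harbourfront"
"""

# na_values turns every 'Not assigned' cell into NaN while parsing
df = pd.read_csv(io.StringIO(csv_text), na_values=['Not assigned'])
print(df)
print(df['Borough'].isna().tolist())  # [True, False, False]
```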

In [1]:
import pandas as pd

url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

dfs = pd.read_html(url, attrs={'class': 'wikitable sortable'}, na_values=['Not assigned'])
toronto = dfs[0]
toronto

,Postal Code,Borough,Neighborhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,,
176,M6Z,,
177,M7Z,,
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


### 1.2 Cleaning dataframe
* dropping rows that contain NaN values (the postal codes marked 'Not assigned')
* sorting by postal code

Merging neighborhoods that share a postal code is no longer required, because the Wikipedia list no longer contains duplicate postal code entries.
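For older versions of the list, where one postal code was split over several rows, the merge would be a `groupby` on postal code and borough that joins the neighborhood names. A minimal sketch on hypothetical toy rows (the split M5A entry is invented for illustration):

```python
import pandas as pd

# hypothetical rows: the same postal code listed twice with different neighborhoods
df = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# one row per postal code; the neighborhood names are joined with ', '
merged = (df.groupby(['Postal Code', 'Borough'])['Neighborhood']
            .agg(', '.join)
            .reset_index())
print(merged)
```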

In [2]:
toronto_clean = toronto.dropna(axis=0)

toronto_clean.sort_values(by=['Postal Code'], inplace=True)

toronto_clean.reset_index(drop=True, inplace=True)
toronto_clean.head()

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
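The `SettingWithCopyWarning` above comes from calling in-place methods on the result of `dropna`. A minimal sketch of a chained alternative that only assigns the final result, on toy rows rather than the scraped table (`raw` and `clean` are illustrative names):

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    'Postal Code': ['M3A', 'M1A', 'M1B'],
    'Borough': ['North York', np.nan, 'Scarborough'],
    'Neighborhood': ['Parkwoods', np.nan, 'Malvern, Rouge'],
})

# each step returns a new DataFrame; nothing is modified in place
clean = (raw.dropna(axis=0)
            .sort_values(by='Postal Code')
            .reset_index(drop=True))
print(clean)
```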


However, one neighborhood can span several postal codes (Downsview, for example, appears under several of them), which could pose a problem for the clustering later on. The duplicated neighborhoods are listed below; the aim is to keep only the first instance of each neighborhood.

In [3]:
toronto_clean.loc[toronto_clean['Neighborhood'].duplicated()]

,Postal Code,Borough,Neighborhood
24,M2R,North York,Willowdale
27,M3C,North York,Don Mills
31,M3L,North York,Downsview
32,M3M,North York,Downsview
33,M3N,North York,Downsview


In [4]:
# acts on the selected column only, so toronto_clean keeps all its rows (see the shape in 1.3)
toronto_clean['Neighborhood'].drop_duplicates(inplace=True)

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


### 1.3 Shape of the dataframe
Requirement no. 6

In [5]:
toronto_clean.shape

(103, 3)

**The clean dataframe has 103 rows**
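The 103 rows show that the `drop_duplicates(inplace=True)` call in In [4] removed no rows: it was applied to the selected `'Neighborhood'` column, not to `toronto_clean`. Removing the rows would need the dataframe-level call with `subset`. A minimal sketch on toy rows taken from the duplicate list in 1.2:

```python
import pandas as pd

df = pd.DataFrame({
    'Postal Code': ['M3K', 'M3L', 'M3M', 'M3N', 'M4A'],
    'Borough': ['North York'] * 5,
    'Neighborhood': ['Downsview'] * 4 + ['Victoria Village'],
})

# column-level call: returns a shorter Series, df itself is unchanged
unique_names = df['Neighborhood'].drop_duplicates()

# dataframe-level call: drops whole rows whose neighborhood was already seen
deduped = df.drop_duplicates(subset='Neighborhood', keep='first').reset_index(drop=True)

print(len(unique_names), len(df), len(deduped))  # 2 5 2
print(deduped)
```

Since the venues are later grouped by postal code rather than by neighborhood, keeping the duplicates does not affect the clustering in this notebook.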

# Assignment 2
## Retrieving geo coordinates for postal codes
### Testing whether geocoder or geopy can retrieve the coordinates

##### Using the geocoder example from Coursera: does not work at all (no coordinates are returned, so without the added `break` the loop would never end)

In [6]:
#!conda install -c conda-forge geocoder --yes
import geocoder

# initialize your variable to None
lat_lng_coords = None

# set address / postal code
postal_code = 'M5G'
address = '{}, Toronto, Ontario'.format(postal_code)
print('Address:', address)

# loop until you get the coordinates
i = 0
while(lat_lng_coords is None):
    i += 1
    g = geocoder.google(address)
    lat_lng_coords = g.latlng
    print(i)
    if i > 10:
        break

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

Address: M5G, Toronto, Ontario
1
2
3
4
5
6
7
8
9
10
11


TypeError: 'NoneType' object is not subscriptable
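The Coursera loop retries until coordinates arrive, so it never ends when the geocoder keeps returning `None`, and the capped version then fails on `lat_lng_coords[0]`. A minimal sketch of a bounded retry that returns `None` instead; `geocode_with_retries` is a hypothetical helper, and `fake_geocode` is a hypothetical stand-in for a call such as `geocoder.google(address).latlng`, returning dummy coordinates so the snippet runs offline:

```python
import time

def geocode_with_retries(geocode, address, max_tries=5, pause=1.0):
    """Call geocode(address) up to max_tries times; return the first non-None result, else None."""
    for attempt in range(max_tries):
        coords = geocode(address)
        if coords is not None:
            return coords
        if attempt < max_tries - 1:
            time.sleep(pause)  # wait before asking the service again
    return None  # the caller handles the failure instead of indexing into None

# hypothetical stand-in for the geocoder: fails twice, then returns dummy coordinates
calls = {'n': 0}
def fake_geocode(address):
    calls['n'] += 1
    return None if calls['n'] < 3 else (1.0, 2.0)

coords = geocode_with_retries(fake_geocode, 'M5G, Toronto, Ontario', pause=0.0)
print(coords, calls['n'])  # (1.0, 2.0) 3
print(geocode_with_retries(lambda a: None, 'M5G, Toronto, Ontario', pause=0.0))  # None
```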

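##### Using the geopy package: not reliable (postal code M5G returns nothing; M3A returns coordinates, but they are identical to those of Toronto's center, so the postal code is apparently ignored)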

In [7]:
from geopy.geocoders import Nominatim

# set postal code as in Coursera example --> does not work
postal_code = 'M5G'

address = '{}, Toronto, Ontario'.format(postal_code)
print('Address:', address)

# geopy geolocator
geolocator = Nominatim(user_agent="coursera_capstone")
location = geolocator.geocode(address)

print('Geopy result:', (location.latitude, location.longitude))

Address: M5G, Toronto, Ontario


AttributeError: 'NoneType' object has no attribute 'latitude'

In [8]:
from geopy.geocoders import Nominatim

# set another postal code: M3A, the first assigned postal code in the Wikipedia list
postal_code = 'M3A'

address = '{}, Toronto, Ontario'.format(postal_code)
print('Address:', address)

# geopy geolocator
geolocator = Nominatim(user_agent="coursera_capstone")
location = geolocator.geocode(address)

print('Geopy result:', (location.latitude, location.longitude))

Address: M3A, Toronto, Ontario
Geopy result: (43.6534817, -79.3839347)
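The M3A coordinates above are identical to the result for 'Toronto, Ontario' in In [11], so Nominatim apparently returned the city center instead of the postal code area. A minimal sketch of a check that rejects both a missing result and such a fallback; `Loc` is a hypothetical stand-in for geopy's `Location` (which exposes `.latitude` and `.longitude`), `usable_coords` is a hypothetical helper, and the tolerance is an arbitrary choice. The valid coordinates in the last call are the Coursera values for M3A (Parkwoods).

```python
from collections import namedtuple

# hypothetical stand-in for geopy's Location object
Loc = namedtuple('Loc', ['latitude', 'longitude'])

TORONTO_CENTER = (43.6534817, -79.3839347)  # result for 'Toronto, Ontario' in In [11]

def usable_coords(location, city_center=TORONTO_CENTER, tol=1e-4):
    """Return (lat, lng), or None if geocoding failed or fell back to the city center."""
    if location is None:  # e.g. the M5G query
        return None
    coords = (location.latitude, location.longitude)
    if all(abs(a - b) < tol for a, b in zip(coords, city_center)):
        return None  # e.g. the M3A query: same point as the whole city
    return coords

print(usable_coords(None))                           # None
print(usable_coords(Loc(43.6534817, -79.3839347)))   # None
print(usable_coords(Loc(43.753259, -79.329656)))     # (43.753259, -79.329656)
```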


### Using the CSV file provided by Coursera instead
#### Reading the CSV file

In [9]:
import pandas as pd

url='https://cocl.us/Geospatial_data'

toronto_geo = pd.read_csv(url)
print(toronto_geo.shape)
toronto_geo.head()

(103, 3)


,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


#### Merging geospatial dataframe with borough / neighborhood dataframe on postal code column

In [10]:
toronto_locs = pd.merge(toronto_clean,
                        toronto_geo,
                        how='inner',
                        on='Postal Code'
                       )
toronto_locs.head()

,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
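`how='inner'` silently drops postal codes that occur in only one of the two tables, so a quick check can be worthwhile. A minimal sketch using the `indicator` and `validate` parameters of `pd.merge` on toy rows; the `X0X` code is a hypothetical unmatched entry:

```python
import pandas as pd

names = pd.DataFrame({'Postal Code': ['M1B', 'M1C', 'X0X'],
                      'Borough': ['Scarborough', 'Scarborough', 'Hypothetical']})
geo = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                    'Latitude': [43.806686, 43.784535],
                    'Longitude': [-79.194353, -79.160497]})

# indicator adds a '_merge' column; validate raises MergeError if a postal code is duplicated
check = pd.merge(names, geo, how='outer', on='Postal Code',
                 indicator=True, validate='one_to_one')
print(check['_merge'].value_counts())

unmatched = check.loc[check['_merge'] != 'both', 'Postal Code'].tolist()
print(unmatched)  # ['X0X']
```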


# Assignment 3
## Exploring the neighborhoods

### Creating a map with folium

In [11]:
# fetching the geocoordinates of Toronto's center with the geopy geolocator
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="coursera_capstone")
location = geolocator.geocode('Toronto, Ontario')
print('Geopy result:', (location.latitude, location.longitude))

Geopy result: (43.6534817, -79.3839347)


In [12]:
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium

# creating the map
map_toronto = folium.Map(location=[location.latitude, location.longitude], zoom_start=10)

# add markers of postal codes to map
for lat, lng, borough, nbhood in zip(toronto_locs['Latitude'], toronto_locs['Longitude'], toronto_locs['Borough'], toronto_locs['Neighborhood']):
    label = '{} ({})'.format(nbhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto

### Clustering

#### Clustering by location into 5 clusters to check if implementation of k-means works

In [13]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# dropping all but numerical lat and long columns
toronto_locs_cluster = toronto_locs.drop(['Postal Code', 'Borough', 'Neighborhood'], axis=1)

# set number of clusters
kclusters = 5

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_locs_cluster)

# generate new df with cluster labels
toronto_locs_clustered = toronto_locs.copy()  # a real copy, so inserting the labels leaves toronto_locs unchanged
toronto_locs_clustered.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_locs_clustered.head()

,Cluster Labels,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,2,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,2,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,2,M1G,Scarborough,Woburn,43.770992,-79.216917
4,2,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
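`KMeans` alternates two steps: assign every point to its nearest center, then move each center to the mean of its points. A minimal NumPy sketch of these iterations on a few lat/lng pairs from the tables in this notebook; `kmeans_sketch` is a hypothetical helper, and its deterministic farthest-point initialisation is a simplification, not scikit-learn's default k-means++ initialisation:

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100):
    """Plain k-means on an (n, d) array; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    # deterministic farthest-point initialisation (a simplification of k-means++)
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2), axis=1)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assignment step: index of the nearest center for every point
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # update step: move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# three Scarborough (M1B, M1C, M1E) and two Etobicoke (M9V, M9R) postal codes
X = [[43.806686, -79.194353], [43.784535, -79.160497], [43.763573, -79.188711],
     [43.739416, -79.588437], [43.688905, -79.554724]]
labels, centers = kmeans_sketch(X, k=2)
print(labels)  # [0 0 0 1 1]
```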


#### Map clusters

In [14]:
import numpy as np

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# set color scheme for the clusters: one color per cluster, evenly spaced along the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_locs_clustered['Latitude'], toronto_locs_clustered['Longitude'], toronto_locs_clustered['Neighborhood'], toronto_locs_clustered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_toronto)
       
map_toronto

**Clustering by geographic location seems to work fine.**

### Using Foursquare to explore the neighborhoods

In [15]:
import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

In [21]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 20

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET: 


**Removed my credentials for the uploaded notebook** 

In [18]:
def getNearbyVenues(postcodes, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for postcode, lat, lng in zip(postcodes, latitudes, longitudes):
        print(postcode)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            postcode, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'Postal Code Latitude', 
                  'Postal Code Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
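`getNearbyVenues` relies on the nested layout `response → groups[0] → items → venue`. A minimal sketch of that extraction on a hand-written dict with the same key paths; `parse_venues` is a hypothetical helper, the first venue is copied from the `toronto_venues` output, and the second venue without a category is hypothetical. It adds a guard for such venues, where `v['venue']['categories'][0]` would raise an IndexError.

```python
def parse_venues(payload, postcode, lat, lng):
    """Flatten one explore response into (postcode, lat, lng, name, v_lat, v_lng, category) tuples."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None  # venue without a category
        rows.append((postcode, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# hand-written payload with the key layout used by getNearbyVenues
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Wendy’s', 'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Hypothetical venue', 'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]}]}}

rows = parse_venues(payload, 'M1B', 43.806686, -79.194353)
for row in rows:
    print(row)
```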

In [19]:
# create dataframe with venues fetched from foursquare
toronto_venues = getNearbyVenues(postcodes=toronto_locs['Postal Code'],
                                 latitudes=toronto_locs['Latitude'],
                                 longitudes=toronto_locs['Longitude']
                                )

M1B
M1C
M1E
M1G
M1H
M1J
M1K
M1L
M1M
M1N
M1P
M1R
M1S
M1T
M1V
M1W
M1X
M2H
M2J
M2K
M2L
M2M
M2N
M2P
M2R
M3A
M3B
M3C
M3H
M3J
M3K
M3L
M3M
M3N
M4A
M4B
M4C
M4E
M4G
M4H
M4J
M4K
M4L
M4M
M4N
M4P
M4R
M4S
M4T
M4V
M4W
M4X
M4Y
M5A
M5B
M5C
M5E
M5G
M5H
M5J
M5K
M5L
M5M
M5N
M5P
M5R
M5S
M5T
M5V
M5W
M5X
M6A
M6B
M6C
M6E
M6G
M6H
M6J
M6K
M6L
M6M
M6N
M6P
M6R
M6S
M7A
M7R
M7Y
M8V
M8W
M8X
M8Y
M8Z
M9A
M9B
M9C
M9L
M9M
M9N
M9P
M9R
M9V
M9W


In [20]:
print(toronto_venues.shape)
toronto_venues.head()

(1068, 7)


,Postal Code,Postal Code Latitude,Postal Code Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M1B,43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,M1B,43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,M1C,43.784535,-79.160497,Great Shine Window Cleaning,43.783145,-79.157431,Home Service
3,M1C,43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
4,M1E,43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank


In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add postal codes column back to dataframe
toronto_onehot['Postal Code'] = toronto_venues['Postal Code'] 

# move postal codes column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
print(toronto_onehot.shape)
toronto_onehot.head()

(1068, 218)


,Postal Code,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M1B,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M1C,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M1C,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M1E,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Create df for kmeans clustering

In [23]:
# group the one-hot encoded venues by postal code; the mean is each category's share of the venues
toronto_grouped = toronto_onehot.groupby('Postal Code').mean().reset_index()
toronto_grouped.head()

,Postal Code,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
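The two cells above turn venue lists into per-postal-code category frequencies: `get_dummies` gives one 0/1 column per category, and `groupby(...).mean()` averages them, so each value is the share of that postal code's venues in the category. A minimal sketch on the first four venues of the `toronto_venues` output:

```python
import pandas as pd

venues = pd.DataFrame({
    'Postal Code': ['M1B', 'M1B', 'M1C', 'M1C'],
    'Venue Category': ['Fast Food Restaurant', 'Print Shop', 'Home Service', 'Bar'],
})

# one 0/1 column per category (dtype=int keeps integers instead of booleans)
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Postal Code', venues['Postal Code'])

# mean per postal code = share of that postal code's venues in each category
grouped = onehot.groupby('Postal Code').mean().reset_index()
print(grouped)
```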


In [24]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th, 5th, ... have no entry in indicators
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postal Code'] = toronto_grouped['Postal Code']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(98, 6)


,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M1B,Fast Food Restaurant,Print Shop,Yoga Studio,Department Store,Electronics Store
1,M1C,Bar,Home Service,Yoga Studio,Dessert Shop,Event Space
2,M1E,Electronics Store,Breakfast Spot,Medical Center,Intersection,Rental Car Location
3,M1G,Coffee Shop,Korean Restaurant,Insurance Office,Farmers Market,Event Space
4,M1H,Bank,Lounge,Hakka Restaurant,Fried Chicken Joint,Thai Restaurant
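Postal codes with fewer than five venues still get five "most common" venues: after the real categories, the remaining slots are filled with categories of frequency 0, in whatever order the sort leaves those ties. M1B, for example, has only two venues in `toronto_venues`. A minimal sketch with the same `return_most_common_venues` logic on a toy frequency row:

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # same logic as the notebook: skip the 'Postal Code' entry, sort frequencies descending
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# toy frequency row for M1B: two venues, every other category has frequency 0
row = pd.Series({'Postal Code': 'M1B', 'Bar': 0.0, 'Fast Food Restaurant': 0.5,
                 'Print Shop': 0.5, 'Yoga Studio': 0.0, 'Zoo': 0.0, 'Bank': 0.0})

top = return_most_common_venues(row, 5)
print(list(top))
print([row[c] for c in top])  # [0.5, 0.5, 0.0, 0.0, 0.0]
```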


#### Clustering neighborhoods by their venue category frequencies

In [25]:
# set number of clusters
kclusters = 5

toronto_grouped_clust = toronto_grouped.drop(columns='Postal Code')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clust)

print(kmeans.labels_)
print(len(kmeans.labels_))

[3 3 3 3 3 4 3 3 3 3 3 3 3 3 4 3 3 3 3 3 1 3 0 3 3 3 3 0 0 3 3 3 3 3 3 3 3
 1 3 3 3 0 3 3 3 0 3 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 3 3
 3 3 0 3 1 3 3 3 3 3 3 3 3 0 2 3 3 3 2 1 3 3 3 3]
98
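The labels are strongly imbalanced. A quick count with `np.bincount`, on the labels printed above copied verbatim:

```python
import numpy as np

# cluster labels as printed by In [25]
printed = """
3 3 3 3 3 4 3 3 3 3 3 3 3 3 4 3 3 3 3 3 1 3 0 3 3 3 3 0 0 3 3 3 3 3 3 3 3
1 3 3 3 0 3 3 3 0 3 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 3 3
3 3 0 3 1 3 3 3 3 3 3 3 3 0 2 3 3 3 2 1 3 3 3 3
"""
labels = np.array([int(t) for t in printed.split()])

# number of postal code areas per cluster label 0..4
sizes = np.bincount(labels)
print(len(labels), sizes.tolist())  # 98 [9, 4, 2, 81, 2]
```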


In [26]:
# recreating toronto_locs once more (without the 'Cluster Labels' column from the location clustering) to have a clean merge
toronto_locs = pd.merge(toronto_clean,
                        toronto_geo,
                        how='inner',
                        on='Postal Code'
                       )
toronto_locs.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [27]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merging clustered venues with toronto location data to add latitude/longitude and neighborhood names
toronto_merged = pd.merge(neighborhoods_venues_sorted,
                          toronto_locs,
                          on='Postal Code',
                          how='left')

toronto_merged.head() # check the last columns!

,Cluster Labels,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Borough,Neighborhood,Latitude,Longitude
0,3,M1B,Fast Food Restaurant,Print Shop,Yoga Studio,Department Store,Electronics Store,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,3,M1C,Bar,Home Service,Yoga Studio,Dessert Shop,Event Space,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,3,M1E,Electronics Store,Breakfast Spot,Medical Center,Intersection,Rental Car Location,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,3,M1G,Coffee Shop,Korean Restaurant,Insurance Office,Farmers Market,Event Space,Scarborough,Woburn,43.770992,-79.216917
4,3,M1H,Bank,Lounge,Hakka Restaurant,Fried Chicken Joint,Thai Restaurant,Scarborough,Cedarbrae,43.773136,-79.239476


#### Creating the clustered venues map

In [28]:
# creating the map
geolocator = Nominatim(user_agent="coursera_capstone")
location = geolocator.geocode('Toronto, Ontario')
map_toronto_venclust = folium.Map(location=[location.latitude, location.longitude], zoom_start=10)

# set color scheme for the clusters: one color per cluster, evenly spaced along the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_toronto_venclust)
       
map_toronto_venclust

### Examining the clusters
#### Cluster 1

In [29]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[1:]]

,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Borough,Neighborhood,Latitude,Longitude
22,M3A,Bus Stop,Park,Construction & Landscaping,Food & Drink Shop,Yoga Studio,North York,Parkwoods,43.753259,-79.329656
27,M3K,Airport,Park,Snack Place,Yoga Studio,Department Store,North York,Downsview,43.737473,-79.464763
28,M3L,Grocery Store,Park,Bank,Shopping Mall,Yoga Studio,North York,Downsview,43.739015,-79.506944
41,M4N,Park,Swim School,Bus Line,Yoga Studio,Dessert Shop,Central Toronto,Lawrence Park,43.72802,-79.38879
45,M4T,Park,Summer Camp,Yoga Studio,Department Store,Electronics Store,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
47,M4W,Park,Playground,Trail,Yoga Studio,Deli / Bodega,Downtown Toronto,Rosedale,43.679563,-79.377529
71,M6E,Park,Women's Store,Bar,Dessert Shop,Event Space,York,Caledonia-Fairbanks,43.689026,-79.453512
76,M6L,Park,Construction & Landscaping,Basketball Court,Trail,Bakery,North York,"North Park, Maple Leaf Park, Upwood Park",43.713756,-79.490074
87,M8X,Park,River,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944


**Parks seem to define the first cluster: Park is the first or second most common venue in every postal code area of this cluster.**

#### Cluster 2

In [30]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[1:]]

,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Borough,Neighborhood,Latitude,Longitude
20,M2P,Park,Convenience Store,Bank,Yoga Studio,Dessert Shop,North York,York Mills West,43.752758,-79.400049
37,M4J,Park,Convenience Store,Yoga Studio,Dessert Shop,Event Space,East York,East Toronto,43.685347,-79.338106
78,M6N,Convenience Store,Yoga Studio,Department Store,Event Space,Electronics Store,York,"Runnymede, The Junction North",43.673185,-79.487262
93,M9N,Park,Convenience Store,Yoga Studio,Dessert Shop,Event Space,York,Weston,43.706876,-79.518188


**Convenience stores, mostly together with parks, seem to define the second cluster.**

#### Cluster 3

In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[1:]]

,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Borough,Neighborhood,Latitude,Longitude
88,M8Y,Baseball Field,Deli / Bodega,Yoga Studio,Dessert Shop,Event Space,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
92,M9M,Baseball Field,Yoga Studio,Dessert Shop,Event Space,Electronics Store,North York,"Humberlea, Emery",43.724766,-79.532242


**Baseball fields seem to fall into the third cluster**

#### Cluster 4

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[1:]]

,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Borough,Neighborhood,Latitude,Longitude
0,M1B,Fast Food Restaurant,Print Shop,Yoga Studio,Department Store,Electronics Store,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Bar,Home Service,Yoga Studio,Dessert Shop,Event Space,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Electronics Store,Breakfast Spot,Medical Center,Intersection,Rental Car Location,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Coffee Shop,Korean Restaurant,Insurance Office,Farmers Market,Event Space,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Bank,Lounge,Hakka Restaurant,Fried Chicken Joint,Thai Restaurant,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...,...,...,...,...,...
91,M9L,Gym,Pizza Place,College Gym,College Stadium,Event Space,North York,Humber Summit,43.756303,-79.565963
94,M9P,Pizza Place,Chinese Restaurant,Discount Store,Intersection,Sandwich Place,Etobicoke,Westmount,43.696319,-79.532242
95,M9R,Pizza Place,Bus Line,Mobile Phone Shop,Sandwich Place,Coffee Shop,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
96,M9V,Grocery Store,Beer Store,Fried Chicken Joint,Japanese Restaurant,Fast Food Restaurant,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437


**Most of the remaining postal code areas (81 of 98) fall into the fourth cluster; their most common venues are mixed (restaurants, shops, banks, etc.).**

#### Cluster 5

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[1:]]

,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Borough,Neighborhood,Latitude,Longitude
5,M1J,Playground,Yoga Studio,Deli / Bodega,Electronics Store,Eastern European Restaurant,Scarborough,Scarborough Village,43.744734,-79.239476
14,M1V,Playground,Park,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Scarborough,"Milliken, Agincourt North, Steeles East, L'Amo...",43.815252,-79.284577


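**Playgrounds seem to define the fifth cluster. The following entries (Yoga Studio, Deli / Bodega, Eastern European Restaurant, ...) are most likely ties at frequency 0: they also fill the lower ranks for M1B, which has only two venues.**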