## 1. Data Wrangling

In [3]:
import pandas as pd

In [4]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

used pandas to scrape the tables on the page into a list of dataframes

In [5]:
df = pd.read_html(url)

took the first element of the list as our data frame

In [6]:
df_tor = df[0]

In [7]:
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


dropped rows where the Borough is not assigned

In [8]:
df_tor.drop(df_tor.loc[df_tor['Borough']=='Not assigned'].index, inplace=True)
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
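
The same clean-up can also be written as a boolean mask, which avoids dropping rows in place. The sketch below is only an illustration on a toy frame whose values are copied from the output above; the names `toy` and `assigned` are made up for the example.

```python
import pandas as pd

# Toy stand-in for the scraped table (values copied from the output above).
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows with an assigned borough; .copy() returns an independent frame.
assigned = toy[toy['Borough'] != 'Not assigned'].copy()
print(assigned)
```

The mask keeps the original frame untouched, so the cell can be re-run without side effects.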


looked for rows where the Neighbourhood is Not assigned but there weren't any once the Borough-less rows were removed

In [9]:
df_tor.loc[df_tor['Neighbourhood']=='Not assigned']

Unnamed: 0,Postal Code,Borough,Neighbourhood


reset the index now that everything is cleaned up

In [10]:
df_tor.reset_index(drop=True,inplace=True)
df_tor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


determined the shape of the data frame

In [11]:
df_tor.shape

(103, 3)

imported the latitude and longitude data to a new dataframe

In [12]:
url2 = 'https://cocl.us/Geospatial_data'
latlng = pd.read_csv(url2)
latlng.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


set the index to the postal code in both data frames to make joining easier

In [13]:
df_tor.set_index('Postal Code',inplace=True)
latlng.set_index('Postal Code',inplace=True)

In [14]:
df2 = df_tor.join(latlng)
df2.head()

Postal Code,Borough,Neighbourhood,Latitude,Longitude
M3A,North York,Parkwoods,43.753259,-79.329656
M4A,North York,Victoria Village,43.725882,-79.315572
M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
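
An alternative to setting the index, joining and resetting is a single `merge` on the shared column. This is a sketch on toy frames (`left` and `right` are hypothetical names, coordinates copied from the outputs above); `validate='one_to_one'` is an optional check, added here on the assumption that each postal code appears once in both tables.

```python
import pandas as pd

left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Borough': ['North York', 'North York'],
})
right = pd.DataFrame({
    'Postal Code': ['M4A', 'M3A', 'M1B'],
    'Latitude': [43.725882, 43.753259, 43.806686],
    'Longitude': [-79.315572, -79.329656, -79.194353],
})

# Left merge keeps every row of `left` and looks up its coordinates by postal code.
merged = left.merge(right, on='Postal Code', how='left', validate='one_to_one')
print(merged)
```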


reset the index of the combined dataframe

In [15]:
df2.reset_index(inplace=True)
df2.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


## 2. Geocoding

imported the other libraries needed for mapping and clustering

In [17]:
import numpy as np
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

print("libraries imported")

libraries imported


In [18]:
from geopy.geocoders import Nominatim

determined the coordinates of Toronto to map our neighbourhoods

In [19]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent='blue_jay')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('the geographical coordinates of Toronto are {}, {}'.format(latitude,longitude))

the geographical coordinates of Toronto are 43.6534817, -79.3839347


filtered the data to show only neighbourhoods of Downtown Toronto; dropped the neighbourhood of "Stn A PO Boxes" as it seems more like a postal system location than a real neighbourhood

In [20]:
df3 = df2[(df2['Borough']=='Downtown Toronto') & (df2['Neighbourhood']!='Stn A PO Boxes')].copy()
df3.reset_index(drop=True,inplace=True)
df3


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576


mapped the neighbourhoods

In [21]:
map_toronto = folium.Map(location=[latitude,longitude], zoom_start=14)

for lat, lng, borough, neighbourhood in zip(df3['Latitude'], df3['Longitude'],df3['Borough'],df3['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

## 3. Foursquare API Calls

In [26]:
import requests
from pandas import json_normalize

CLIENT_ID = 'UJAJYCLQK5HKKGEE11AYPJV1VBJOIWDZNRA25ZWFMJSBPWNN' 
CLIENT_SECRET = 'XJM5C4Q31XY3LURSOVHVD3M33UZFKCVGFIBMOOYEXN30PNCN' 
VERSION = '20200818' 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: UJAJYCLQK5HKKGEE11AYPJV1VBJOIWDZNRA25ZWFMJSBPWNN
CLIENT_SECRET:XJM5C4Q31XY3LURSOVHVD3M33UZFKCVGFIBMOOYEXN30PNCN
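
Hard-coding and printing credentials makes them easy to leak when a notebook is shared. One common alternative is to read them from environment variables. The sketch below is an assumption, not what the notebook does: the helper `load_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are made up for illustration.

```python
import os

def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping such as os.environ."""
    try:
        # Hypothetical variable names; use whatever names you export in your shell.
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('missing environment variable: {}'.format(missing)) from None

# Demonstration with a stand-in mapping instead of the real environment.
demo_id, demo_secret = load_credentials({
    'FOURSQUARE_CLIENT_ID': 'my-id',
    'FOURSQUARE_CLIENT_SECRET': 'my-secret',
})
print('credentials loaded:', bool(demo_id) and bool(demo_secret))
```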


created two functions to clean up the category type from the JSON results and to perform the call to the API itself and return the nearby venues for each neighbourhood

In [27]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [28]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    # note: `limit` is not a parameter; it is read from the global variable set before the call
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)
        
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                            'Neighbourhood Latitude',
                            'Neighbourhood Longitude',
                            'Venue',
                            'Venue Latitude',
                            'Venue Longitude',
                            'Venue Category']
    
    return(nearby_venues)
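
The list comprehension above assumes every venue has at least one category. A minimal sketch of the same flattening step that tolerates an empty `categories` list is shown below. The helper `extract_venue_rows` and the `sample` payload are hypothetical; the payload only mimics the nesting that the function above indexes into and is not real API output.

```python
def extract_venue_rows(payload):
    """Flatten an explore-style payload into (name, lat, lng, category) tuples."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None instead of raising IndexError on an empty list.
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample with the same nesting as the response used above.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 43.66, 'lng': -79.37},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample)
print(rows)
```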

called the API

In [29]:
limit = 500

downtown_venues = getNearbyVenues(names=df3['Neighbourhood'],
                                  latitudes = df3['Latitude'],
                                  longitudes = df3['Longitude']
                                 )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


did a quick groupby and count to get a feel for the results

In [30]:
downtown_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,57,57,57,57,57,57
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15
Central Bay Street,65,65,65,65,65,65
Christie,17,17,17,17,17,17
Church and Wellesley,75,75,75,75,75,75
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Harbourfront East, Union Station, Toronto Islands",100,100,100,100,100,100
"Kensington Market, Chinatown, Grange Park",67,67,67,67,67,67


## 4. Neighbourhood Venue Grouping and Ranking

In [31]:
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")

downtown_onehot['Neighbourhood'] = downtown_venues['Neighbourhood']

one-hot encoded the API results and grouped them by neighbourhood

In [32]:
fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])
downtown_onehot = downtown_onehot[fixed_columns]
                                                  

In [33]:
downtown_grouped = downtown_onehot.groupby('Neighbourhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.066667,0.066667,0.066667,0.133333,0.133333,0.133333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.015385
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667
5,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0
6,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.044776,0.0,0.044776,0.014925,0.0,0.0,0.0
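
To see what the numbers in this table mean, here is a small sketch of the same encode-then-average step on a toy frame (neighbourhoods `A` and `B` are made up). Each value is the share of a neighbourhood's venues that fall in that category.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Park'],
})

# One indicator column per category (1.0 where the venue has that category).
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# The mean of the indicator columns is the frequency of each category.
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Neighbourhood `A` has two coffee shops out of three venues, so its `Coffee Shop` value is 2/3.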


created and ran a function to return a neighbourhood's most common venues

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [35]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
downtown_venues_sorted = pd.DataFrame(columns=columns)
downtown_venues_sorted['Neighbourhood'] = downtown_grouped['Neighbourhood']

for ind in np.arange(downtown_grouped.shape[0]):
    downtown_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)
    
downtown_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Café,Cheese Shop,Cocktail Bar,Farmers Market,Seafood Restaurant,Beer Bar,Restaurant,Bakery,Breakfast Spot
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Service,Airport Terminal,Coffee Shop,Harbor / Marina,Boat or Ferry,Rental Car Location,Sculpture Garden,Boutique,Airport
2,Central Bay Street,Coffee Shop,Sandwich Place,Italian Restaurant,Café,Japanese Restaurant,Department Store,Salad Place,Burger Joint,Bubble Tea Shop,Portuguese Restaurant
3,Christie,Grocery Store,Café,Park,Baby Store,Coffee Shop,Diner,Nightclub,Restaurant,Athletics & Sports,Candy Store
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Café,Hotel,Mediterranean Restaurant,Yoga Studio,Men's Store
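
The column labels and the ranking can also be produced without relying on an exception for the suffix. The sketch below is one possible variant, not the notebook's method: `ordinal` is a hypothetical helper, and `Series.nlargest` is swapped in for the sort-and-slice used above.

```python
import pandas as pd

def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Toy frequencies for a single neighbourhood.
freq = pd.Series({'Coffee Shop': 0.5, 'Park': 0.2, 'Café': 0.3})

# Names of the two most frequent categories, most frequent first.
top2 = freq.nlargest(2).index.tolist()
labels = ['{} Most Common Venue'.format(ordinal(i)) for i in (1, 2, 3, 4)]
print(top2)
print(labels)
```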


## 5. K-Means Clustering and Analysis

used the elbow method to determine the best value of K

In [36]:
import matplotlib.pyplot as plt

dgc = downtown_grouped.drop(columns='Neighbourhood')

cost=[]
for i in range(1,10):
    KM = KMeans(n_clusters = i, random_state=0, max_iter=500)
    KM.fit(dgc)
    cost.append(KM.inertia_)
    
plt.plot(range(1,10), cost, color='g', linewidth=3)
plt.xlabel('Value of K')
plt.ylabel('Squared Error Cost')
plt.show()

<Figure size 640x480 with 1 Axes>
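
The cost plotted here is scikit-learn's `inertia_`, the sum of squared distances from each sample to its closest cluster centre. The sketch below computes that quantity by hand with NumPy on four made-up points, to show why the curve drops as K grows; the function `inertia` and the chosen centres are assumptions for illustration, not the result of running k-means.

```python
import numpy as np

def inertia(points, centres):
    """Sum of squared Euclidean distances from each point to its nearest centre."""
    # Pairwise squared distances, shape (n_points, n_centres).
    d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Two tight pairs of points, far apart from each other.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

one_centre = points.mean(axis=0, keepdims=True)       # a single centre at (5, 1)
two_centres = np.array([[0.0, 1.0], [10.0, 1.0]])     # one centre per pair

print(inertia(points, one_centre))    # 104.0
print(inertia(points, two_centres))   # 4.0
```

Going from one centre to two removes most of the cost, and further centres can only remove the little that is left. That flattening is the "elbow" the plot is read for.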

merged the cluster labels with most common venues of each neighbourhood

In [37]:
kclusters = 4

kmeans = KMeans(n_clusters = kclusters, random_state=0, max_iter=500).fit(dgc)

downtown_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_merged = df3

downtown_merged = downtown_merged.join(downtown_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

downtown_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Pub,Bakery,Park,Café,Restaurant,Breakfast Spot,Theater,Event Space,Performing Arts Venue
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,College Cafeteria,Bank,Smoothie Shop,Beer Bar,Sandwich Place,Restaurant,Portuguese Restaurant,Café,Persian Restaurant
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Cosmetics Shop,Japanese Restaurant,Ramen Restaurant,Diner,Lingerie Store,Middle Eastern Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Cosmetics Shop,American Restaurant,Restaurant,Cocktail Bar,Seafood Restaurant,Beer Bar,Diner,Italian Restaurant
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Café,Cheese Shop,Cocktail Bar,Farmers Market,Seafood Restaurant,Beer Bar,Restaurant,Bakery,Breakfast Spot
5,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Sandwich Place,Italian Restaurant,Café,Japanese Restaurant,Department Store,Salad Place,Burger Joint,Bubble Tea Shop,Portuguese Restaurant
6,M6G,Downtown Toronto,Christie,43.669542,-79.422564,2,Grocery Store,Café,Park,Baby Store,Coffee Shop,Diner,Nightclub,Restaurant,Athletics & Sports,Candy Store
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Coffee Shop,Café,Clothing Store,Hotel,Restaurant,Gym,Bar,Steakhouse,Thai Restaurant,Lounge
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,0,Coffee Shop,Aquarium,Hotel,Café,Italian Restaurant,Fried Chicken Joint,Restaurant,Scenic Lookout,Brewery,Park
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,0,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Seafood Restaurant,Italian Restaurant,Japanese Restaurant,Salad Place,Concert Hall


mapped the neighbourhoods again as clusters

In [38]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[latitude,longitude], zoom_start=14)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0,1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lng, neigh, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighbourhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(neigh) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

#### Cluster 0

In [39]:
downtown_merged.loc[downtown_merged['Cluster Labels']==0, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",0,Coffee Shop,Pub,Bakery,Park,Café,Restaurant,Breakfast Spot,Theater,Event Space,Performing Arts Venue
1,"Queen's Park, Ontario Provincial Government",0,Coffee Shop,College Cafeteria,Bank,Smoothie Shop,Beer Bar,Sandwich Place,Restaurant,Portuguese Restaurant,Café,Persian Restaurant
2,"Garden District, Ryerson",0,Clothing Store,Coffee Shop,Café,Bubble Tea Shop,Cosmetics Shop,Japanese Restaurant,Ramen Restaurant,Diner,Lingerie Store,Middle Eastern Restaurant
3,St. James Town,0,Coffee Shop,Café,Cosmetics Shop,American Restaurant,Restaurant,Cocktail Bar,Seafood Restaurant,Beer Bar,Diner,Italian Restaurant
4,Berczy Park,0,Coffee Shop,Café,Cheese Shop,Cocktail Bar,Farmers Market,Seafood Restaurant,Beer Bar,Restaurant,Bakery,Breakfast Spot
5,Central Bay Street,0,Coffee Shop,Sandwich Place,Italian Restaurant,Café,Japanese Restaurant,Department Store,Salad Place,Burger Joint,Bubble Tea Shop,Portuguese Restaurant
7,"Richmond, Adelaide, King",0,Coffee Shop,Café,Clothing Store,Hotel,Restaurant,Gym,Bar,Steakhouse,Thai Restaurant,Lounge
8,"Harbourfront East, Union Station, Toronto Islands",0,Coffee Shop,Aquarium,Hotel,Café,Italian Restaurant,Fried Chicken Joint,Restaurant,Scenic Lookout,Brewery,Park
9,"Toronto Dominion Centre, Design Exchange",0,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Seafood Restaurant,Italian Restaurant,Japanese Restaurant,Salad Place,Concert Hall
10,"Commerce Court, Victoria Hotel",0,Coffee Shop,Restaurant,Café,Hotel,Gym,American Restaurant,Seafood Restaurant,Italian Restaurant,Bakery,Cocktail Bar


#### Cluster 1

In [40]:
downtown_merged.loc[downtown_merged['Cluster Labels']==1, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Rosedale,1,Park,Playground,Trail,Dance Studio,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store,Diner


#### Cluster 2

In [41]:
downtown_merged.loc[downtown_merged['Cluster Labels']==2, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Christie,2,Grocery Store,Café,Park,Baby Store,Coffee Shop,Diner,Nightclub,Restaurant,Athletics & Sports,Candy Store


#### Cluster 3

In [42]:
downtown_merged.loc[downtown_merged['Cluster Labels']==3, downtown_merged.columns[[2] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"CN Tower, King and Spadina, Railway Lands, Har...",3,Airport Lounge,Airport Service,Airport Terminal,Coffee Shop,Harbor / Marina,Boat or Ferry,Rental Car Location,Sculpture Garden,Boutique,Airport
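
Instead of one cell per cluster, the membership of every cluster can be listed in one step with a `groupby`. This is a sketch on a toy frame (`merged` and `members` are hypothetical names) using four neighbourhoods and the labels shown for them above.

```python
import pandas as pd

merged = pd.DataFrame({
    'Neighbourhood': ['Berczy Park', 'Christie', 'Rosedale', 'Central Bay Street'],
    'Cluster Labels': [0, 2, 1, 0],
})

# Map each cluster label to the list of neighbourhoods assigned to it.
members = merged.groupby('Cluster Labels')['Neighbourhood'].apply(list).to_dict()
print(members)
```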


## 6. Conclusion

Postal codes were not the best geographical definition of a neighbourhood, since they are based on population rather than geography. As a result, three of the four clusters contained only one neighbourhood each; these were the three least dense neighbourhoods in the Downtown Toronto area. A better approach would probably have been to include the nearest neighbourhoods from the boroughs bordering downtown.

The Foursquare venue categories could also be simplified and combined to get a more accurate analysis. For example, 'Beer Bar', 'Cocktail Bar' and 'Pub', to name a few, could all be combined into 'Bar' to give a better idea of how they compare with the predominant 'Coffee Shop' and 'Café'.
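
Combining categories could be sketched as a simple lookup applied before the one-hot encoding. The mapping below covers only the three bar types named above and is an assumption; a real mapping would need to be built by hand from the full category list.

```python
import pandas as pd

# Hypothetical mapping from fine-grained categories to a coarser one.
category_map = {'Beer Bar': 'Bar', 'Cocktail Bar': 'Bar', 'Pub': 'Bar'}

categories = pd.Series(['Beer Bar', 'Pub', 'Coffee Shop', 'Café', 'Park'])

# Categories missing from the mapping are left unchanged.
simplified = categories.replace(category_map)
print(simplified.value_counts())
```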