# Segmenting and Clustering Neighborhoods in the city of Toronto, Canada

This **Jupyter Notebook** is my submission of the **Week 3 final assignment** for the **Applied Data Science Capstone** course as part of the **IBM Data Science Professional Certificate** program on **Coursera**. It is broken down into three parts for easier reference and grading.

<p><img src="https://www.orbitz.com/features/world-on-screen/img/sliders/1_0.jpg?ver=2">
<p><a href="https://www.orbitz.com/features/world-on-screen/map/movies-in-toronto">Image Source: Movie Sightseeing and Tours in Toronto</a>

## 1) Scraping Toronto postal codes from Wikipedia and entering this information into a pandas dataframe

I began by copying and pasting the table from the following Wikipedia page into an Excel spreadsheet:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

I then installed the necessary modules and libraries to read this into a dataframe.

In [1]:
# install and import some stuff we might need immediately
!pip install xlrd
import pandas as pd
import numpy as np

# read the Excel file
df = pd.read_excel('toronto_neighborhoods.xlsx')
df.head()



Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


<br />
The data in the dataframe needs to be cleaned up a bit. Specifically, I am A) dropping rows that have unassigned boroughs, B) checking for neighborhoods that are not assigned and, if any are found, using the borough name as the neighborhood, C) double-checking the unique postal codes, and D) returning the number of rows for postal codes in the dataframe.

In [2]:
# drop rows that have unassigned boroughs
not_assigned_boroughs = df[(df['Borough']) == 'Not assigned'].index
df.drop(not_assigned_boroughs, inplace=True)

In [3]:
# find 'not assigned' neighborhoods
# if found, assign the borough to be the neighborhood
not_assigned_neighborhoods = df[(df['Neighborhood']) == 'Not assigned']
not_assigned_neighborhoods.count()

# none found

Postal code     0
Borough         0
Neighborhood    0
dtype: int64
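
No rows needed it here, but had the check found any, the replacement described in the comment could be sketched roughly as follows. The toy dataframe and its values are made up for illustration; only the column names follow the notebook.

```python
import pandas as pd

# toy data (hypothetical values) with one unassigned neighborhood
toy = pd.DataFrame({
    'Postal code': ['M3A', 'M7A'],
    'Borough': ['North York', "Queen's Park"],
    'Neighborhood': ['Parkwoods', 'Not assigned'],
})

# where the neighborhood is 'Not assigned', fall back to the borough name
mask = toy['Neighborhood'] == 'Not assigned'
toy.loc[mask, 'Neighborhood'] = toy.loc[mask, 'Borough']
```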

In [4]:
# list the unique postal codes (neighborhoods that share a code are already combined on one row)
unique_postal_codes = df['Postal code'].unique()
unique_postal_codes

array(['M3A', 'M4A', 'M5A', 'M6A', 'M7A', 'M9A', 'M1B', 'M3B', 'M4B',
       'M5B', 'M6B', 'M9B', 'M1C', 'M3C', 'M4C', 'M5C', 'M6C', 'M9C',
       'M1E', 'M4E', 'M5E', 'M6E', 'M1G', 'M4G', 'M5G', 'M6G', 'M1H',
       'M2H', 'M3H', 'M4H', 'M5H', 'M6H', 'M1J', 'M2J', 'M3J', 'M4J',
       'M5J', 'M6J', 'M1K', 'M2K', 'M3K', 'M4K', 'M5K', 'M6K', 'M1L',
       'M2L', 'M3L', 'M4L', 'M5L', 'M6L', 'M9L', 'M1M', 'M2M', 'M3M',
       'M4M', 'M5M', 'M6M', 'M9M', 'M1N', 'M2N', 'M3N', 'M4N', 'M5N',
       'M6N', 'M9N', 'M1P', 'M2P', 'M4P', 'M5P', 'M6P', 'M9P', 'M1R',
       'M2R', 'M4R', 'M5R', 'M6R', 'M7R', 'M9R', 'M1S', 'M4S', 'M5S',
       'M6S', 'M1T', 'M4T', 'M5T', 'M1V', 'M4V', 'M5V', 'M8V', 'M9V',
       'M1W', 'M4W', 'M5W', 'M8W', 'M9W', 'M1X', 'M4X', 'M5X', 'M8X',
       'M4Y', 'M7Y', 'M8Y', 'M8Z'], dtype=object)

In [5]:
# reset indexes
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government


In [6]:
# get shape of dataframe
df.shape

(103, 3)

The **shape** of our dataframe shows 103 rows, which matches the 103 entries in the unique postal code array above, so there are **103 unique postal codes for Toronto**.
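
A row count by itself does not prove uniqueness, so a direct check can be worthwhile. The following is a minimal sketch on a toy frame (the three values are illustrative), using the same column name as the notebook.

```python
import pandas as pd

toy = pd.DataFrame({'Postal code': ['M3A', 'M4A', 'M5A']})

# True only if no postal code appears more than once
codes_are_unique = toy['Postal code'].is_unique

# number of distinct codes; equals the row count when codes are unique
n_codes = toy['Postal code'].nunique()
```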

## 2) Add latitude and longitude of each 'neighborhood' to the existing dataframe so this information can be used in conjunction with the Foursquare API

I first attempted to use the **geocoder** package before falling back on a CSV file with latitude and longitude information.  This required installing **geocoder** in the environment and importing the module.

In [7]:
# install and import geocoder
!pip install geocoder
import geocoder
print('Libraries imported.')

Libraries imported.


<br />
I attempted to run a <b>while</b> loop test for a postal code (in this case 'M3A') as suggested by the assignment instructions.

In [8]:
# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('{}, Toronto, Ontario'.format('M3A'))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

KeyboardInterrupt: 


The request never resolved and kept timing out, so I interrupted the cell.  I collapsed the output of the previous cell for brevity.
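
One way to avoid an endless loop is to cap the number of attempts. The sketch below is not part of the assignment code: `geocode_with_retries` and its parameters are hypothetical names, and `lookup` stands in for any callable that returns `[lat, lng]` or `None` (for example, a wrapper around `geocoder.google(...).latlng`).

```python
import time

def geocode_with_retries(lookup, query, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is any callable returning [lat, lng] or None (hypothetical
    interface, e.g. lambda q: geocoder.google(q).latlng).
    """
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
        time.sleep(delay_seconds)  # brief pause before trying again
    return None  # give up instead of looping forever


# usage with a stand-in lookup that fails twice, then succeeds
answers = iter([None, None, [43.75, -79.33]])
result = geocode_with_retries(lambda q: next(answers), 'M3A, Toronto, Ontario',
                              delay_seconds=0)
```

Returning `None` after the last attempt lets the caller decide whether to fall back on another data source, as this notebook does with the CSV file.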

Since I am very excited to see latitude and longitude values for the city of Toronto, I moved on to getting the latitude and longitude values from a CSV file containing that information.

In [9]:
# read CSV file into new dataframe
location_df = pd.read_csv('http://cocl.us/Geospatial_data')
location_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476



I renamed the "Postal Code" column in the location dataframe to "Postal code" so that it matches the original dataframe, which allows the two to be merged on that column. I then re-confirmed the shape of the resulting dataframe.

In [10]:
# rename "Postal Code" column to match original dataframe's "Postal code"
location_df.rename(columns={"Postal Code":"Postal code"}, inplace=True)
location_df.head()

Unnamed: 0,Postal code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [11]:
# merge original dataframe and location dataframe
toronto_df = df.merge(location_df, on='Postal code')
toronto_df.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494


In [12]:
# re-confirm size of the dataframe with the added columns
toronto_df.shape

(103, 5)


The latitude and longitude have been merged with the original dataframe, and the new dataframe is now ready to be used with the Foursquare API to analyze neighborhoods in Toronto.
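
The default merge is an inner join, which silently drops postal codes that have no match. Here the row count stayed at 103, so nothing was lost, but a more explicit check could look like the sketch below. The toy frames are made up, and `M9Z` is a hypothetical code with no coordinates.

```python
import pandas as pd

left = pd.DataFrame({'Postal code': ['M3A', 'M4A', 'M9Z'],
                     'Borough': ['North York', 'North York', 'Nowhere']})
right = pd.DataFrame({'Postal code': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

# a left join keeps every row of `left`; indicator=True adds a `_merge` column
# and validate raises if a postal code is duplicated on either side
checked = left.merge(right, on='Postal code', how='left',
                     indicator=True, validate='one_to_one')

# postal codes that found no coordinates
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal code'].tolist()
```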


## 3) Explore and cluster the neighborhoods in Toronto

I decided to look at all the neighborhoods, starting with a visualization and then moving on to cluster-related analyses.  Although I suspect that there are a lot of smaller neighborhoods/postal codes with few venues, I am hopeful that the final cluster analysis will address that.

## Download and explore the data set

First, though, the environment needs to be set up and the necessary libraries imported.

In [13]:
# install geopy via conda if necessary
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import requests
from pandas import json_normalize   # pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

# install folium via conda if necessary
#!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


<br />
As part of the high-level examination of the dataset, I began with a visualization.  To enable this, I retrieved the latitude and longitude of Toronto so it can be used in generating the <b>folium</b> map of Toronto.

In [14]:
tor_address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
tor_location = geolocator.geocode(tor_address)
tor_latitude = tor_location.latitude
tor_longitude = tor_location.longitude
print('The latitude and longitude for Toronto is {}, {}.'.format(tor_latitude, tor_longitude))

The latitude and longitude for Toronto is 43.6534817, -79.3839347.


In [15]:
tor_map = folium.Map(location=[tor_latitude, tor_longitude], zoom_start=11)

for lat, lng, borough, neighborhood, postal_code in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighborhood'], toronto_df['Postal code']):
    label = 'Postal Code {}. Borough: {}. Neighborhoods: {}'.format(postal_code, borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=True).add_to(tor_map)  
    
tor_map

<br />
Now that I have the map, I decided to get the number and names of the boroughs in case I want to examine them further later.  The map makes clear where Toronto's downtown area is, along with a number of outlying, possibly smaller neighborhoods, based on the arrangement of the postal codes.  Further analysis will reveal more about this.

In [16]:
print('There are {} unique boroughs.'.format(len(toronto_df['Borough'].unique())))
toronto_df['Borough'].unique()

There are 10 unique boroughs.


array(['North York', 'Downtown Toronto', 'Etobicoke', 'Scarborough',
       'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

<br />
With that, the Foursquare API information is assigned to variables that will be used to make calls for all the postal code/neighborhood/borough requests.

In [17]:
# @hidden_cell
# this cell is hidden as it contains API keys
# Setup Foursquare API connection information
CLIENT_ID = '<your Foursquare client ID>'  # replace with your own credential
print('Client ID added to CLIENT_ID.')
CLIENT_SECRET = '<your Foursquare client secret>'  # replace with your own credential
print('Client Secret added to CLIENT_SECRET.')
VERSION = '20180605'
print('Foursquare API version added to VERSION.')

Client ID added to CLIENT_ID.
Client Secret added to CLIENT_SECRET.
Foursquare API version added to VERSION.
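
An alternative to keeping credentials in a notebook cell is to read them from environment variables. This is only a sketch: the function name and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical and would have to be set outside the notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from environment variables (names are hypothetical)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment.')
    return client_id, client_secret

# usage with a stand-in mapping instead of the real environment
demo_id, demo_secret = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'})
```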


<br />
I explored the first neighborhood in the dataframe to get a baseline view of the information returned by a Foursquare API call; I suspected that there was not a lot going on there.

In [18]:
# exploring the first neighborhood in the dataframe
neighborhood_name = toronto_df.loc[0, 'Neighborhood']
neighborhood_postal = toronto_df.loc[0, 'Postal code']
neighborhood_latitude = toronto_df.loc[0, 'Latitude']
neighborhood_longitude = toronto_df.loc[0, 'Longitude']

print('The latitude and longitude for {} in postal code {} are {}, {}.'.format(neighborhood_name, neighborhood_postal, neighborhood_latitude, neighborhood_longitude))

The latitude and longitude for Parkwoods in postal code M3A are 43.7532586, -79.3296565.


In [19]:
# top 100 venues within 500 meter radius of the location
LIMIT = 100 
radius = 500 
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eaeffc31d67cb001b1e72b9'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c

<br />
I extracted the information necessary to create a dataframe and confirmed that the total number of venues shown in the dataframe matches the total number of venues the API returned.

In [20]:
# borrowed get_category_type so we can pull the items key
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()

  


Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114


In [21]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

2 venues were returned by Foursquare.
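
To show what the flattening step does in isolation, here is a small sketch using `pd.json_normalize` on hand-written items shaped like the `items` list in the response. The second item is made up to show a venue with no category.

```python
import pandas as pd

# trimmed items; the second one is invented for illustration
items = [
    {'venue': {'name': 'Brookbanks Park',
               'categories': [{'name': 'Park'}],
               'location': {'lat': 43.751976, 'lng': -79.33214}}},
    {'venue': {'name': 'Unlabelled Spot',
               'categories': [],
               'location': {'lat': 43.75, 'lng': -79.33}}},
]

# nested dict keys become dotted column names such as 'venue.location.lat'
flat = pd.json_normalize(items)

# lists are left as-is, so the first category name is pulled out by hand
flat['category'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)
```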


## Exploring neighborhoods in Toronto

I set up the general function to pull all the venues, set up a dataframe, and determine the number of unique categories of venues in each particular neighborhood.

In [22]:
# general function to pull all venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
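
The list comprehension in `getNearbyVenues` assumes every venue has at least one category, so an empty `categories` list would raise an `IndexError`. A more defensive version of just that step might look like the sketch below; `venue_rows` is a hypothetical helper name and the sample items are made up.

```python
def venue_rows(name, lat, lng, items):
    """Build one tuple per venue, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# usage with made-up items: the second venue has no category
sample_items = [
    {'venue': {'name': 'A', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'B', 'location': {'lat': 3.0, 'lng': 4.0},
               'categories': []}},
]
rows = venue_rows('Demo', 0.0, 0.0, sample_items)
```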

In [23]:
# get venues for each neighborhood
tor_venues = getNearbyVenues(names=toronto_df['Neighborhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )
print('Venues loaded.')

Parkwoods
Victoria Village
Regent Park / Harbourfront
Lawrence Manor / Lawrence Heights
Queen's Park / Ontario Provincial Government
Islington Avenue
Malvern / Rouge
Don Mills
Parkview Hill / Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park / Princess Gardens / Martin Grove / Islington / Cloverdale
Rouge Hill / Port Union / Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate / Bloordale Gardens / Old Burnhamthorpe / Markland Wood
Guildwood / Morningside / West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor / Wilson Heights / Downsview North
Thorncliffe Park
Richmond / Adelaide / King
Dufferin / Dovercourt Village
Scarborough Village
Fairview / Henry Farm / Oriole
Northwood Park / York University
East Toronto
Harbourfront East / Union Station / Toronto Islands
Little Portugal / Trinity
Kennedy Park / Ionview / East Birchmount Park
Bayview Village
Do

In [24]:
#see a sample of the dataframe that was generated
tor_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [25]:
#check shape of dataframe
print(tor_venues.shape)

(2151, 7)


<br />
The shape shows that Foursquare returned 2,151 venues in total across all neighborhoods, with each query capped at 100 venues within a 500 meter radius.

<br />
I display the number of venues by neighborhood.  Some neighborhoods appear busier than others.

In [26]:
# count the venues in each neighborhood
tor_venue_count = tor_venues.groupby('Neighborhood').count()

In [27]:
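# note: a slice excludes its end index, so [0:32] shows rows 0-31; since the
# later cells start at 33 and 66, rows 32 and 65 are never displayed
tor_venue_count['Venue'][0:32]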

Neighborhood
Agincourt                                                                                                             5
Alderwood / Long Branch                                                                                              10
Bathurst Manor / Wilson Heights / Downsview North                                                                    19
Bayview Village                                                                                                       4
Bedford Park / Lawrence Manor East                                                                                   24
Berczy Park                                                                                                          57
Birch Cliff / Cliffside West                                                                                          4
Brockton / Parkdale Village / Exhibition Place                                                                       23
Business reply mail Process

In [28]:
tor_venue_count['Venue'][33:65]

Neighborhood
Golden Mile / Clairlea / Oakridge                                                                                                                 10
Guildwood / Morningside / West Hill                                                                                                                7
Harbourfront East / Union Station / Toronto Islands                                                                                              100
High Park / The Junction South                                                                                                                    23
Hillcrest Village                                                                                                                                  5
Humber Summit                                                                                                                                      1
Humberlea / Emery                                                                            

In [29]:
tor_venue_count['Venue'][66:93]

Neighborhood
Roselawn                                                                                                                  1
Rouge Hill / Port Union / Highland Creek                                                                                  3
Runnymede / Swansea                                                                                                      39
Runnymede / The Junction North                                                                                            3
Scarborough Village                                                                                                       1
South Steeles / Silverstone / Humbergate / Jamestown / Mount Olive / Beaumond Heights / Thistletown / Albion Gardens      9
St. James Town                                                                                                           77
St. James Town / Cabbagetown                                                                                           

In [30]:
# number of unique categories
print('There are {} unique categories of venues.'.format(len(tor_venues['Venue Category'].unique())))

There are 267 unique categories of venues.


<br />
As an aside, from previous similar exercises, there are 323 unique categories of venues in the single New York City borough of Manhattan.
<br /><br />
In terms of total number of venues by neighborhood, it is clear that some may be more vibrant and bustling than others.

## Analyze each neighborhood

I performed one-hot encoding to create dummies for each venue category, determined the mean frequency of each category by neighborhood, and kept the ten most common venue categories per neighborhood (ten being an arbitrary cutoff) prior to setting up the cluster analysis.

In [31]:
# onehot encoding
tor_encoded = pd.get_dummies(tor_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column
tor_encoded['Neighborhood'] = tor_venues['Neighborhood'] 
n_col = tor_encoded.pop('Neighborhood')
tor_encoded.insert(0, 'Neighborhood', n_col)
tor_encoded.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [32]:
# mean venue per category by neighborhood
toronto_grouped = tor_encoded.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
1,Alderwood / Long Branch,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
2,Bathurst Manor / Wilson Heights / Downsview North,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.052632,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
4,Bedford Park / Lawrence Manor East,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,Wexford / Maryvale,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
89,Willowdale,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.025641,0.0,0.0,0.0,0.0,0.0,0.0
90,Woburn,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
91,Woodbine Heights,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.100000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0


In [33]:
# confirm new size
toronto_grouped.shape

(93, 267)
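
To make the mean-frequency step concrete, here is a toy example with two made-up neighborhoods. After one-hot encoding, the mean of each column within a neighborhood is the share of that neighborhood's venues falling in that category, so each row sums to 1.

```python
import pandas as pd

# made-up venues: neighborhood A has three, neighborhood B has one
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Park', 'Cafe', 'Cafe', 'Park'],
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# share of each category among a neighborhood's venues
toy_grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

In this toy data, neighborhood A is two-thirds cafes and one-third parks, while B is entirely parks.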

In [34]:
# function to sort venues into different categories
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [35]:
# new df and top ten venues per neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Skating Rink,Latin American Restaurant,Breakfast Spot,Clothing Store,Yoga Studio,Drugstore,Distribution Center,Dog Run,Doner Restaurant
1,Alderwood / Long Branch,Pizza Place,Sandwich Place,Skating Rink,Pharmacy,Gym,Dance Studio,Coffee Shop,Athletics & Sports,Pub,Dim Sum Restaurant
2,Bathurst Manor / Wilson Heights / Downsview North,Coffee Shop,Bank,Fried Chicken Joint,Bridal Shop,Sandwich Place,Diner,Restaurant,Middle Eastern Restaurant,Supermarket,Sushi Restaurant
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Yoga Studio,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
4,Bedford Park / Lawrence Manor East,Sandwich Place,Italian Restaurant,Restaurant,Coffee Shop,Sushi Restaurant,Greek Restaurant,Thai Restaurant,Liquor Store,Comfort Food Restaurant,Juice Bar
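
The column labels above are built with a three-item suffix list and a fallback to "th", which is fine for ten columns but would produce labels such as "21th" for larger values of `num_top_venues`. A more general helper could be sketched as follows; `ordinal` is a hypothetical name, not something from the course material.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'   # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)]
```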


<br />
This sets up creating clusters from the most common categories by neighborhood.

## Cluster neighborhoods
Initially I attempted to perform the analysis with k=5 clusters; however, this led to all but a dozen neighborhoods falling into one cluster.  I therefore re-ran the cluster analysis with k=10 clusters and began to see more spread and diversity in the clusters, from which some useful analysis could be gained.
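
A quick way to compare cluster spread for different values of k is to count how many rows land in each cluster. The sketch below uses a random, made-up feature matrix in place of the neighborhood frequencies, so the sizes it prints say nothing about Toronto; it only illustrates the check.

```python
import numpy as np
from sklearn.cluster import KMeans

# made-up feature matrix standing in for the per-neighborhood frequencies
rng = np.random.default_rng(0)
features = rng.random((30, 4))

sizes_by_k = {}
for k in (3, 5, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features).labels_
    # bincount gives the number of rows assigned to each cluster label
    sizes_by_k[k] = np.bincount(labels, minlength=k)
    print(k, sizes_by_k[k])
```

A value of k where one count dwarfs all the others is the situation described above for k=5.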

In [36]:
knums = 10
toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)
kclusts = KMeans(n_clusters=knums, random_state=0).fit(toronto_grouped_clustering)
kclusts.labels_[0:10].astype(int)

array([2, 1, 1, 1, 1, 1, 1, 1, 2, 2])

<br />
It was necessary to merge the dataframe used to generate the clusters with the dataframe containing the original postal code, borough, and latitude/longitude data.  The latitude/longitude is especially necessary since it is needed to generate a map showing each neighborhood.

In [37]:
# create dataframe with all necessary columns and information
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kclusts.labels_)
tor_merged = toronto_df
tor_merged = tor_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
tor_merged.head(10)

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,9.0,Food & Drink Shop,Park,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
1,M4A,North York,Victoria Village,43.725882,-79.315572,1.0,Pizza Place,Coffee Shop,Intersection,French Restaurant,Portuguese Restaurant,Hockey Arena,Eastern European Restaurant,Electronics Store,Drugstore,Dessert Shop
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,1.0,Coffee Shop,Bakery,Pub,Park,Breakfast Spot,Café,Theater,Yoga Studio,Cosmetics Shop,Shoe Store
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,2.0,Clothing Store,Furniture / Home Store,Coffee Shop,Event Space,Shoe Store,Sporting Goods Shop,Miscellaneous Shop,Arts & Crafts Store,Accessories Store,Boutique
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1.0,Coffee Shop,Sushi Restaurant,Diner,Burger Joint,Burrito Place,Juice Bar,Café,Japanese Restaurant,Italian Restaurant,Beer Bar
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,,,,,,,,,,,
6,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353,6.0,Fast Food Restaurant,Print Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio,Dessert Shop
7,M3B,North York,Don Mills,43.745906,-79.352188,1.0,Gym,Beer Store,Restaurant,Japanese Restaurant,Coffee Shop,Asian Restaurant,Dim Sum Restaurant,Sandwich Place,Bike Shop,Sporting Goods Shop
8,M4B,East York,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,2.0,Pizza Place,Gastropub,Pharmacy,Gym / Fitness Center,Breakfast Spot,Fast Food Restaurant,Intersection,Bank,Athletics & Sports,Pet Store
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1.0,Clothing Store,Coffee Shop,Café,Italian Restaurant,Cosmetics Shop,Restaurant,Japanese Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Diner


<br />
Displaying <b>tor_merged.head(10)</b> shows that the Foursquare API did not return a single venue for some postal codes, as seen in row 5 for M9A/Etobicoke/Islington Avenue.  While these neighborhoods will not show up in the cluster analysis, I feel this information might be worth holding on to for the final analysis, so I created a dataframe of the neighborhoods that had no venues and therefore could not be given a cluster label. In a sense, these neighborhoods might be considered their own cluster.

Once these were moved to their own dataframe, I removed them from the dataframe to be mapped.  In order to iterate using the cluster labels, they needed to be of type int rather than float, and the column could not be converted while it contained NaN values.
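
The sketch below shows the conversion issue on a made-up label series. It also shows pandas' nullable `Int64` dtype as an alternative that keeps the rows with missing labels; that alternative is not what this notebook does.

```python
import numpy as np
import pandas as pd

labels = pd.Series([9.0, 1.0, np.nan, 2.0], name='Cluster Labels')

# option used in this notebook: drop the missing rows, then cast to int
dropped = labels.dropna().astype(int)

# alternative: pandas' nullable integer dtype keeps the missing rows as <NA>
nullable = labels.astype('Int64')
```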

In [38]:
# write neighborhoods with no venues to a separate dataframe 
tor_no_venues = tor_merged[tor_merged["Cluster Labels"].isnull()]
tor_no_venues.head(10)

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,,,,,,,,,,,
11,M9B,Etobicoke,West Deane Park / Princess Gardens / Martin Gr...,43.650943,-79.554724,,,,,,,,,,,
45,M2L,North York,York Mills / Silver Hills,43.75749,-79.374714,,,,,,,,,,,
52,M2M,North York,Willowdale / Newtonbrook,43.789053,-79.408493,,,,,,,,,,,
95,M1X,Scarborough,Upper Rouge,43.836125,-79.205636,,,,,,,,,,,


In [39]:
#remove rows from dataframe with no venues
tor_merged = tor_merged.dropna(axis=0, subset=['Cluster Labels'])
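# re-index tor_merged
# note: reset_index returns a new dataframe and the result is not assigned here,
# so tor_merged keeps its original index labels (visible in the output below)
tor_merged.reset_index(drop=True)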
# convert Cluster Labels to int to allow it to be used for iteration in the mapping function
tor_merged[['Cluster Labels']] = tor_merged[['Cluster Labels']].astype(int)
tor_merged.head(10)

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,9,Food & Drink Shop,Park,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,Pizza Place,Coffee Shop,Intersection,French Restaurant,Portuguese Restaurant,Hockey Arena,Eastern European Restaurant,Electronics Store,Drugstore,Dessert Shop
2,M5A,Downtown Toronto,Regent Park / Harbourfront,43.65426,-79.360636,1,Coffee Shop,Bakery,Pub,Park,Breakfast Spot,Café,Theater,Yoga Studio,Cosmetics Shop,Shoe Store
3,M6A,North York,Lawrence Manor / Lawrence Heights,43.718518,-79.464763,2,Clothing Store,Furniture / Home Store,Coffee Shop,Event Space,Shoe Store,Sporting Goods Shop,Miscellaneous Shop,Arts & Crafts Store,Accessories Store,Boutique
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,43.662301,-79.389494,1,Coffee Shop,Sushi Restaurant,Diner,Burger Joint,Burrito Place,Juice Bar,Café,Japanese Restaurant,Italian Restaurant,Beer Bar
6,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353,6,Fast Food Restaurant,Print Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio,Dessert Shop
7,M3B,North York,Don Mills,43.745906,-79.352188,1,Gym,Beer Store,Restaurant,Japanese Restaurant,Coffee Shop,Asian Restaurant,Dim Sum Restaurant,Sandwich Place,Bike Shop,Sporting Goods Shop
8,M4B,East York,Parkview Hill / Woodbine Gardens,43.706397,-79.309937,2,Pizza Place,Gastropub,Pharmacy,Gym / Fitness Center,Breakfast Spot,Fast Food Restaurant,Intersection,Bank,Athletics & Sports,Pet Store
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,Clothing Store,Coffee Shop,Café,Italian Restaurant,Cosmetics Shop,Restaurant,Japanese Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Diner
10,M6B,North York,Glencairn,43.709577,-79.445073,2,Pizza Place,Park,Japanese Restaurant,Metro Station,Pub,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
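
A minimal sketch of why the NaN rows had to be dropped before the conversion, using a made-up toy dataframe rather than the real `tor_merged` (the rows and labels below are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Toy stand-in for tor_merged; the neighborhoods and labels are invented.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Cluster Labels": [1.0, np.nan, 2.0],
})

# Casting while a NaN is present fails, because NaN has no integer equivalent.
try:
    toy["Cluster Labels"].astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Dropping the NaN rows first makes the cast possible.
cleaned = toy.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)

print(cast_failed)
print(cleaned["Cluster Labels"].tolist())
```

The column is float in the first place because pandas stores missing values as NaN, which forces a float dtype on an otherwise integer column.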


<br />
I then mapped the clusters, with each cluster having a different color associated with it as well as the label indicating cluster number, postal code, borough, and neighborhoods.

In [40]:
# map the clusters
map_clusters = folium.Map(location=[tor_latitude, tor_longitude], zoom_start=11)
colors_array = cm.rainbow(np.linspace(0, 1, knums))
rainboom = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, postcode, torborough, torhood, cluster in zip(tor_merged['Latitude'], tor_merged['Longitude'], tor_merged['Postal code'], tor_merged['Borough'], tor_merged['Neighborhood'], tor_merged['Cluster Labels']):
    label = 'Cluster {}. Postal Code {}. Borough: {}. Neighborhoods: {}'.format(cluster, postcode, torborough, torhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainboom[cluster-1],
        fill=True,
        fill_color=rainboom[cluster-1],
        fill_opacity=0.7,
        parse_html=True).add_to(map_clusters)
       
map_clusters
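
The `cluster-1` lookup in the cell above deserves a quick check, since label 0 becomes index -1. The sketch below assumes ten clusters (labels 0 to 9, as in this notebook) and uses the standard library's `colorsys` as a stand-in palette instead of matplotlib's `cm.rainbow`, so the actual colors differ from the map; only the indexing logic is the point.

```python
import colorsys

knums = 10  # assumed number of clusters, matching labels 0..9 in this notebook

# Stand-in palette: evenly spaced hues converted to hex strings.
palette = []
for i in range(knums):
    r, g, b = colorsys.hsv_to_rgb(i / knums, 1.0, 1.0)
    palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))

# Same lookup as the map cell: label 0 becomes index -1, i.e. the last color.
color_for = {cluster: palette[cluster - 1] for cluster in range(knums)}

print(color_for[0] == palette[-1])
print(len(set(color_for.values())) == knums)
```

Because Python's negative indexing wraps around, the lookup is a cyclic shift of the palette and every label still gets its own color. Indexing with `cluster` directly would be the more obvious choice and gives the same guarantee.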


## Examine clusters
Having visualized the clusters for reference, I counted the neighborhood groups in each cluster.  I then split the dataframe by cluster label, displayed the results, and provided cluster-by-cluster commentary before offering some closing thoughts.

In [41]:
tor_merged_count = tor_merged.groupby('Cluster Labels').count()
tor_merged_count['Neighborhood']

Cluster Labels
0     1
1    51
2    30
3     2
4     1
5     1
6     1
7     2
8     1
9     8
Name: Neighborhood, dtype: int64
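
The same tally can also be sketched with `value_counts`, which makes it easy to pull out the single-member clusters. The dataframe below is a made-up stand-in, not the real `tor_merged`:

```python
import pandas as pd

# Invented labels standing in for tor_merged['Cluster Labels'].
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D", "E", "F"],
    "Cluster Labels": [1, 1, 2, 1, 0, 2],
})

# Count members per cluster and keep the result ordered by label.
sizes = toy["Cluster Labels"].value_counts().sort_index()

# Clusters with a single member, which limit how much can be said about them.
singletons = sizes[sizes == 1].index.tolist()

print(sizes.to_dict())
print(singletons)
```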

<br />
It is apparent that even though more clusters were created, there are still a number of clusters with only one or two members, two clusters with many members (Clusters 1 and 2), and one cluster with a small-to-midsize number of members (Cluster 9).  While I will display the neighborhoods associated with all clusters, most of my commentary will revolve around these three larger clusters.


### Cluster 0: Grab a Slice as You Leave
This cluster featured one member with a bus line as the second most common venue.  Pizza and sandwiches rounded out the top three most common venues.  This seems like a perfect neighborhood for commuters or those needing to leave town on a full stomach.

In [42]:
tor_merged.loc[tor_merged['Cluster Labels'] == 0, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
77,Kingsview Village / St. Phillips / Martin Grov...,M9R,Etobicoke,Pizza Place,Bus Line,Sandwich Place,Mobile Phone Shop,Dog Run,Diner,Discount Store,Distribution Center,Doner Restaurant,College Stadium
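
The selection above is repeated for every cluster with only the label changed. One possible way to avoid the repetition is a small helper; the name `cluster_view` and its `first_venue_col` parameter are hypothetical, and the toy dataframe only mimics the column layout of `tor_merged`:

```python
import pandas as pd

def cluster_view(df, label, first_venue_col=6):
    """Rows of one cluster, showing Neighborhood, Postal code, Borough and venue columns."""
    # Positions 2, 0, 1 reorder the first three columns; columns from
    # first_venue_col onward are assumed to be the ranked venue columns.
    cols = df.columns[[2, 0, 1] + list(range(first_venue_col, df.shape[1]))]
    return df.loc[df["Cluster Labels"] == label, cols]

# Toy frame with the same column order as tor_merged, but invented values.
toy = pd.DataFrame({
    "Postal code": ["M1A", "M1B"],
    "Borough": ["X", "Y"],
    "Neighborhood": ["N1", "N2"],
    "Latitude": [43.7, 43.8],
    "Longitude": [-79.3, -79.2],
    "Cluster Labels": [0, 1],
    "1st Most Common Venue": ["Park", "Cafe"],
})

view = cluster_view(toy, 1)
print(view)
```

With such a helper, each cluster cell would reduce to a call like `cluster_view(tor_merged, 0)`.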



### Cluster 1:  Seriously Caffeinated
The outstanding feature of this cluster is how often coffee shops and cafes appear as the most common venue.  Where they are not the top venue, they are still very popular in all of its member neighborhoods.  The next most common type of venue is one that may involve the serving or sale of alcohol.  Restaurants of various ethnic flavors also feature prominently.  It makes sense that many of these neighborhoods are in the Downtown Toronto borough, but not all are.  Many are in other boroughs, which suggests that these neighborhoods may be the destination equivalents for their respective boroughs.

In [43]:
tor_merged.loc[tor_merged['Cluster Labels'] == 1, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Victoria Village,M4A,North York,Pizza Place,Coffee Shop,Intersection,French Restaurant,Portuguese Restaurant,Hockey Arena,Eastern European Restaurant,Electronics Store,Drugstore,Dessert Shop
2,Regent Park / Harbourfront,M5A,Downtown Toronto,Coffee Shop,Bakery,Pub,Park,Breakfast Spot,Café,Theater,Yoga Studio,Cosmetics Shop,Shoe Store
4,Queen's Park / Ontario Provincial Government,M7A,Downtown Toronto,Coffee Shop,Sushi Restaurant,Diner,Burger Joint,Burrito Place,Juice Bar,Café,Japanese Restaurant,Italian Restaurant,Beer Bar
7,Don Mills,M3B,North York,Gym,Beer Store,Restaurant,Japanese Restaurant,Coffee Shop,Asian Restaurant,Dim Sum Restaurant,Sandwich Place,Bike Shop,Sporting Goods Shop
9,"Garden District, Ryerson",M5B,Downtown Toronto,Clothing Store,Coffee Shop,Café,Italian Restaurant,Cosmetics Shop,Restaurant,Japanese Restaurant,Bubble Tea Shop,Middle Eastern Restaurant,Diner
13,Don Mills,M3C,North York,Gym,Beer Store,Restaurant,Japanese Restaurant,Coffee Shop,Asian Restaurant,Dim Sum Restaurant,Sandwich Place,Bike Shop,Sporting Goods Shop
15,St. James Town,M5C,Downtown Toronto,Café,Coffee Shop,Gastropub,Cocktail Bar,American Restaurant,Seafood Restaurant,Gym,Lingerie Store,Cosmetics Shop,Hotel
17,Eringate / Bloordale Gardens / Old Burnhamthor...,M9C,Etobicoke,Pizza Place,Beer Store,Convenience Store,Coffee Shop,Cosmetics Shop,Café,Shopping Plaza,Liquor Store,Pet Store,General Travel
20,Berczy Park,M5E,Downtown Toronto,Coffee Shop,Cocktail Bar,Seafood Restaurant,Cheese Shop,Bakery,Café,Beer Bar,Restaurant,Shopping Mall,Bistro
22,Woburn,M1G,Scarborough,Coffee Shop,Korean Restaurant,Convenience Store,Yoga Studio,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore
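
The impression that coffee shops and cafes dominate the top spot could be checked by counting the values in the `1st Most Common Venue` column. This is only a sketch on invented rows, so the numbers say nothing about the real cluster:

```python
import pandas as pd

# Invented rows; only the column names follow tor_merged.
toy = pd.DataFrame({
    "Cluster Labels": [1, 1, 1, 1, 2, 2],
    "1st Most Common Venue": [
        "Coffee Shop", "Coffee Shop", "Gym", "Café", "Pizza Place", "Park",
    ],
})

# How often each venue type ranks first, per cluster.
top_venue_counts = toy.groupby("Cluster Labels")["1st Most Common Venue"].value_counts()

# Share of Cluster 1 members whose top venue is a coffee shop or cafe.
in_cluster_1 = toy["Cluster Labels"] == 1
coffee_share = (
    toy.loc[in_cluster_1, "1st Most Common Venue"].isin(["Coffee Shop", "Café"]).mean()
)

print(top_venue_counts)
print(coffee_share)
```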



### Cluster 2: The Night Life Lives Somewhere During the Day
In contrast to Cluster 1, grocery stores appear more often as the top venue here.  While various types of restaurants are also present, pick-up/takeout-oriented establishments and basic services suggest that these are neighborhoods where people live rather than travel to for entertainment.  This is a large cluster, with a nice mix of food, fun, and possible family activity locations. Judging by the map, these do appear to be the more suburban sections of Toronto, as opposed to Downtown Toronto.

In [44]:
tor_merged.loc[tor_merged['Cluster Labels'] == 2, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Lawrence Manor / Lawrence Heights,M6A,North York,Clothing Store,Furniture / Home Store,Coffee Shop,Event Space,Shoe Store,Sporting Goods Shop,Miscellaneous Shop,Arts & Crafts Store,Accessories Store,Boutique
8,Parkview Hill / Woodbine Gardens,M4B,East York,Pizza Place,Gastropub,Pharmacy,Gym / Fitness Center,Breakfast Spot,Fast Food Restaurant,Intersection,Bank,Athletics & Sports,Pet Store
10,Glencairn,M6B,North York,Pizza Place,Park,Japanese Restaurant,Metro Station,Pub,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
12,Rouge Hill / Port Union / Highland Creek,M1C,Scarborough,Construction & Landscaping,History Museum,Bar,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
14,Woodbine Heights,M4C,East York,Park,Cosmetics Shop,Beer Store,Diner,Dance Studio,Athletics & Sports,Curling Ice,Skating Rink,Video Store,Pharmacy
16,Humewood-Cedarvale,M6C,York,Field,Dog Run,Trail,Hockey Arena,Donut Shop,Diner,Discount Store,Distribution Center,Doner Restaurant,Yoga Studio
18,Guildwood / Morningside / West Hill,M1E,Scarborough,Mexican Restaurant,Rental Car Location,Breakfast Spot,Medical Center,Electronics Store,Intersection,Bank,Doner Restaurant,Distribution Center,Dog Run
19,The Beaches,M4E,East Toronto,Health Food Store,Pub,Trail,Yoga Studio,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
26,Cedarbrae,M1H,Scarborough,Caribbean Restaurant,Fried Chicken Joint,Bank,Thai Restaurant,Athletics & Sports,Gas Station,Bakery,Hakka Restaurant,Eastern European Restaurant,Drugstore
27,Hillcrest Village,M2H,North York,Athletics & Sports,Mediterranean Restaurant,Golf Course,Pool,Dog Run,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center



### Cluster 3: Body and Mind
This cluster featured parks and yoga studios as the first and second most common venues for each of its members. The order of the remaining common venues was also identical across its members.

In [45]:
tor_merged.loc[tor_merged['Cluster Labels'] == 3, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,Weston,M9N,York,Park,Yoga Studio,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
83,Moore Park / Summerhill East,M4T,Central Toronto,Park,Yoga Studio,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
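
Whether all members of a cluster share the same venue ranking can be verified rather than eyeballed. A minimal sketch, assuming a toy frame with only three venue columns instead of the ten in `tor_merged`:

```python
import pandas as pd

venue_cols = ["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue"]

# Invented rows: two members of cluster 3 with the same ranking, one other row.
toy = pd.DataFrame({
    "Neighborhood": ["N1", "N2", "N3"],
    "Cluster Labels": [3, 3, 7],
    "1st Most Common Venue": ["Park", "Park", "Baseball Field"],
    "2nd Most Common Venue": ["Yoga Studio", "Yoga Studio", "Yoga Studio"],
    "3rd Most Common Venue": ["Drugstore", "Drugstore", "Drugstore"],
})

# If every member has the same ranking, only one distinct row remains.
members = toy.loc[toy["Cluster Labels"] == 3, venue_cols]
identical = len(members.drop_duplicates()) == 1

print(identical)
```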



### Cluster 4:  Lonely Playground
This cluster featured one member in the borough of Scarborough.  Its top venue was a playground.

In [46]:
tor_merged.loc[tor_merged['Cluster Labels'] == 4, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough Village,M1J,Scarborough,Playground,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore



### Cluster 5: Unhealthy Eating
This cluster had a single member whose top four common venues were all eating establishments. The top two were pizza and donuts, respectively.  It is also notable that no outdoor venues such as parks or playgrounds were common, except for dog runs as the seventh most common venue.  Of course, dog runs are good exercise for dogs, but not necessarily for people.

In [47]:
tor_merged.loc[tor_merged['Cluster Labels'] == 5, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,Humber Summit,M9L,North York,Pizza Place,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio,Dessert Shop



### Cluster 6: A Quick Bite Before Copies
This cluster contained a member where print shops were the second most common venue.  Restaurants and dining establishments filled out the top four.  A good place for a meal after placing a big print order or for a fast lunch if you work at a local printer.

In [48]:
tor_merged.loc[tor_merged['Cluster Labels'] == 6, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Malvern / Rouge,M1B,Scarborough,Fast Food Restaurant,Print Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Yoga Studio,Dessert Shop



### Cluster 7: Play Ball!
This cluster featured neighborhoods in which every member had baseball fields as one of the top two most common venues.  All members were very similar in venues and order of venue popularity.

In [49]:
tor_merged.loc[tor_merged['Cluster Labels'] == 7, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,Humberlea / Emery,M9M,North York,Paper / Office Supplies Store,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
101,Old Mill South / King's Mill Park / Sunnylea /...,M8Y,Etobicoke,Baseball Field,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant,Filipino Restaurant



### Cluster 8: Just Like Paradise
This cluster has a sole member whose top venues are gardens, yoga studios, and farmers markets, suggesting the outdoorsy, health-oriented, body-and-mind character seen in "similar" clusters.  An event space and three different types of ethnic dining experiences suggest a diverse community atmosphere, and this Central Toronto neighborhood may serve as a gathering point for Toronto and its surrounding communities.

In [50]:
tor_merged.loc[tor_merged['Cluster Labels'] == 8, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Roselawn,M5N,Central Toronto,Garden,Yoga Studio,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Donut Shop



### Cluster 9: Fresh Air!
In this cluster, every member had at least one outdoor recreation venue (a park, playground, or trail) among its top three most common venues.  Two of the members had more than one outdoor recreation venue among their top three.

In [51]:
tor_merged.loc[tor_merged['Cluster Labels'] == 9, tor_merged.columns[[2,0,1] + list(range(6, tor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Postal code,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Parkwoods,M3A,North York,Food & Drink Shop,Park,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
21,Caledonia-Fairbanks,M6E,York,Park,Women's Store,Spa,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Electronics Store
35,East Toronto,M4J,East York,Park,Coffee Shop,Convenience Store,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
61,Lawrence Park,M4N,Central Toronto,Park,Swim School,Construction & Landscaping,Bus Line,Yoga Studio,Donut Shop,Discount Store,Distribution Center,Dog Run,Doner Restaurant
66,York Mills West,M2P,North York,Park,Convenience Store,Bank,Bar,Yoga Studio,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
68,Forest Hill North & West,M5P,Central Toronto,Jewelry Store,Park,Sushi Restaurant,Bus Line,Trail,Donut Shop,Discount Store,Distribution Center,Dog Run,Doner Restaurant
85,Milliken / Agincourt North / Steeles East / L'...,M1V,Scarborough,Playground,Park,Coffee Shop,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
91,Rosedale,M4W,Downtown Toronto,Park,Playground,Trail,Yoga Studio,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
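
The outdoor-venue observation can be expressed as a count per row. The sketch below uses three rows copied in shortened form from the table above (top three venues only); treating parks, playgrounds, and trails as the outdoor categories is my own grouping, not a Foursquare one:

```python
import pandas as pd

outdoor = ["Park", "Playground", "Trail"]  # assumed set of outdoor categories
top3 = ["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue"]

toy = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Rosedale", "Milliken"],
    "1st Most Common Venue": ["Food & Drink Shop", "Park", "Playground"],
    "2nd Most Common Venue": ["Park", "Playground", "Park"],
    "3rd Most Common Venue": ["Drugstore", "Trail", "Coffee Shop"],
})

# Number of outdoor venues among each neighborhood's top three.
outdoor_hits = toy[top3].isin(outdoor).sum(axis=1)

all_have_one = bool((outdoor_hits >= 1).all())
with_several = toy.loc[outdoor_hits > 1, "Neighborhood"].tolist()

print(outdoor_hits.tolist())
print(all_have_one, with_several)
```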



### Cluster NaN: Clusterless
As mentioned previously, there were five neighborhoods that did not have a single venue returned by the Foursquare API.  That does not mean that nothing is going on there, just not anything we were able to measure with our methodology.  All are in the outlying communities and boroughs.

In [52]:
tor_no_venues

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,,,,,,,,,,,
11,M9B,Etobicoke,West Deane Park / Princess Gardens / Martin Gr...,43.650943,-79.554724,,,,,,,,,,,
45,M2L,North York,York Mills / Silver Hills,43.75749,-79.374714,,,,,,,,,,,
52,M2M,North York,Willowdale / Newtonbrook,43.789053,-79.408493,,,,,,,,,,,
95,M1X,Scarborough,Upper Rouge,43.836125,-79.205636,,,,,,,,,,,


<br />
As with most analyses, this one creates a framework for further research and for additional questions to be asked and answered.  In terms of the clustering and analysis of Toronto neighborhoods, a number of new approaches could be taken to learn more about the city and its surrounding areas.  The methodology here was to analyze all of Toronto's areas together as one.  Additional analyses could be conducted on individual boroughs in much the same way as this analysis of all boroughs was done.  Would focusing only on Downtown Toronto neighborhoods, or Central Toronto neighborhoods, or North York neighborhoods yield different insights specific to those boroughs?  Different insights could also be gained by looking at different numbers of clusters, using a different clustering method, or delving deeper into the relative strength of each venue's commonality in a particular area.  The possibilities are endless, as they should be.
<br />

<p>&nbsp;</p>
<p><img src="https://canadianculturething.com/wp-content/uploads/2011/06/scottpilgrim6-624x345.jpg">
<p>Image source: 
<a href="https://canadianculturething.com/scott-pilgrim-vs-keeping-toronto-anonymous/">Canadian Culture Thing: Scott Pilgrim vs. Keeping Toronto Anonymous!</a>