# Segmenting and Clustering Neighborhoods in Toronto

## Section 1 - Get PostCodes data into a structured format

In [1]:
import pandas as pd
import numpy as np

I cut and pasted the data from the web table into a text file.  That file is tab-separated in order to preserve the comma-separated values in the neighbourhood column.

In [2]:
# Read the file in from my local drive
df = pd.read_csv('~/dev/GitHub/Coursera_Capstone/toronto-post-codes.tsv', delimiter='\t')
df.shape

(180, 3)
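To illustrate why a tab delimiter keeps the comma-separated neighbourhood lists intact, here is a minimal sketch. The inline text stands in for my local file (its exact layout is an assumption), and `sample_tsv` and `sample_df` are names made up for this example.

```python
import io
import pandas as pd

# Two sample rows in the same tab-separated layout as the local file (assumed layout).
sample_tsv = (
    "Postal Code\tBorough\tNeighbourhood\n"
    "M5A\tDowntown Toronto\tRegent Park, Harbourfront\n"
    "M1B\tScarborough\tMalvern, Rouge\n"
)

# sep='\t' splits on tabs only, so the commas stay inside the third column.
sample_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample_df.shape)
print(sample_df.loc[0, "Neighbourhood"])
```

With a comma delimiter the same rows would split into a different number of fields per line, which is why the tab-separated copy is the safer choice here.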

In [3]:
# Rename the column currently named "Postal Code" to "PostalCode".
df.rename( columns={'Postal Code':'PostalCode'}, inplace=True )
# Rename the column currently named "Neighbourhood" to "Neighborhood".
df.rename( columns={'Neighbourhood':'Neighborhood'}, inplace=True )

df.columns

Index(['PostalCode', 'Borough', 'Neighborhood'], dtype='object')

We can ignore cells with a borough that is Not assigned.

In [4]:
# So I'll replace all "Not assigned" values with NaN
df.replace("Not assigned", np.nan, inplace = True)

# then drop any rows with NaN in "Borough" column
df.dropna(subset=["Borough"], axis=0, inplace=True)

# and reset the index since I dropped some rows
df.reset_index(drop=True, inplace=True)

df.shape

(103, 3)

We're told that more than one neighborhood can exist in one postal code area.  That's fine, because the dataframe we're trying to mimic also lists several neighborhoods in a single row.

We're also told that if a cell has a borough but a "Not assigned" neighborhood, then the neighborhood should be the same as the borough.  Since "Not assigned" was replaced with NaN above, any such row would now have NaN in the Neighborhood column, and I can't see any.

In [5]:
for i in df['Neighborhood']:
    # A missing value is a float NaN, not the string 'NaN', so test it with pd.isna().
    if pd.isna(i):
        print(i)
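If a missing neighbourhood ever did turn up, the rule from the instructions could be applied directly. This is only a sketch on a toy dataframe: the `M9Z` row and the name `toy` are hypothetical, not taken from the real data.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical rows) with one missing neighbourhood.
toy = pd.DataFrame({
    "PostalCode": ["M5A", "M9Z"],
    "Borough": ["Downtown Toronto", "Example Borough"],
    "Neighborhood": ["Regent Park, Harbourfront", np.nan],
})

# NaN never compares equal to the string 'NaN', so count missing values with isna().
n_missing = int(toy["Neighborhood"].isna().sum())

# Apply the assignment rule: a missing neighbourhood takes the borough's name.
toy["Neighborhood"] = toy["Neighborhood"].fillna(toy["Borough"])
print(n_missing, toy.loc[1, "Neighborhood"])
```

`fillna` with a Series aligns on the index, so each missing cell is filled from the borough in the same row.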

So this is what my prepared dataframe looks like:

In [6]:
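# Might as well sort it by the PostalCode column.
# Note: sort_values returns a sorted copy; the result isn't assigned here, so df keeps its original order.
df.sort_values(by=['PostalCode'])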

df.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


My dataframe isn't sorted in the same order as the one in the picture we're trying to mimic, but the shape should match anyway.

In [7]:
df.shape

(103, 3)
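For reference, this is roughly how the rows could be put into postal code order. It's a sketch on a three-row toy frame (`toy` and `sorted_toy` are illustrative names), showing that `sort_values` only takes effect when its result is kept.

```python
import pandas as pd

toy = pd.DataFrame({"PostalCode": ["M3A", "M1B", "M4A"]})

# sort_values returns a new dataframe; the original keeps its order unless reassigned.
sorted_toy = toy.sort_values(by="PostalCode").reset_index(drop=True)

print(toy["PostalCode"].tolist())
print(sorted_toy["PostalCode"].tolist())
```

Sorting changes only the row order, so the shape check above gives the same answer either way.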

## Section 2 - Add Latitude and Longitude values

### First try with Geocoder

In [12]:
# Create a dataframe of postcodes
postal_code = df[['PostalCode']]
postal_code.head(2)

Unnamed: 0,PostalCode
0,M3A
1,M4A


In [13]:
# Install it.
!pip install geocoder



In [14]:
import geocoder

In [15]:
# Try a basic retrieve from geocoder.
g = geocoder.google('Mountain View, CA')
g.latlng

It seems that even a basic retrieval from geocoder isn't working: the call returns nothing.  One likely cause is that the Google provider expects an API key, and none is passed here.  So I'll go for plan B and use the provided data.
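A lookup that quietly returns nothing is easier to handle when it's wrapped in a bounded retry with an explicit fallback. The sketch below makes no network request: `lookup_with_retries` and `flaky_lookup` are hypothetical names, and the stub stands in for a real call such as `geocoder.google(...).latlng`.

```python
import time

def lookup_with_retries(lookup, query, attempts=3, delay=0.0):
    """Call lookup(query) until it returns something other than None, up to `attempts` times."""
    for _ in range(attempts):
        result = lookup(query)
        if result is not None:
            return result
        time.sleep(delay)
    return None  # the caller falls back to another data source

# Stand-in for a flaky geocoding call (hypothetical): fails twice, then succeeds.
calls = {"n": 0}

def flaky_lookup(query):
    calls["n"] += 1
    return [43.753259, -79.329656] if calls["n"] >= 3 else None

coords_m3a = lookup_with_retries(flaky_lookup, "M3A, Toronto, Ontario")
always_none = lookup_with_retries(lambda q: None, "M3A, Toronto, Ontario")
print(coords_m3a, always_none)
```

When every attempt comes back empty, as it did for me, the `None` result is the signal to switch to the provided file.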

### Plan B - Use the provided file of LatLong values

In [16]:
# Read the file in from my local drive
coords = pd.read_csv('~/dev/GitHub/Coursera_Capstone/Geospatial_Coordinates.csv')
coords.head(2)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497


In [17]:
# Rename the column currently named "Postal Code" to "PostalCode" so I can use that column to do the merge.
coords.rename( columns={'Postal Code':'PostalCode'}, inplace=True )
coords.columns

Index(['PostalCode', 'Latitude', 'Longitude'], dtype='object')

In [18]:
# Merge the two dataframes (df and coords) on the PostalCode column values.
neighborhoods = df.merge(coords, how='inner', on='PostalCode')
neighborhoods.head(1)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656


In [19]:
# Spot-check one postal code to confirm the merge is OK (the lat/long values should match in both dataframes).
print( neighborhoods[neighborhoods["PostalCode"] == 'M3A'] )
print( coords[coords["PostalCode"] == 'M3A'] )

  PostalCode     Borough Neighborhood   Latitude  Longitude
0        M3A  North York    Parkwoods  43.753259 -79.329656
   PostalCode   Latitude  Longitude
25        M3A  43.753259 -79.329656
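Checking one postal code by eye works, but the merge itself can also report rows that found no coordinates. Here is a sketch on toy data (the `M9Z` row is hypothetical), using the `indicator` and `validate` options of `DataFrame.merge`.

```python
import pandas as pd

left = pd.DataFrame({"PostalCode": ["M3A", "M4A", "M9Z"],
                     "Borough": ["North York", "North York", "Example Borough"]})
right = pd.DataFrame({"PostalCode": ["M3A", "M4A"],
                      "Latitude": [43.753259, 43.725882],
                      "Longitude": [-79.329656, -79.315572]})

# how='left' keeps every row of `left`; indicator=True adds a '_merge' column
# showing whether each row found a match; validate raises if a key is duplicated.
checked = left.merge(right, how="left", on="PostalCode",
                     indicator=True, validate="one_to_one")

unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An inner merge, as used above, would silently drop a row like `M9Z`, so comparing row counts before and after is a cheap alternative check.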


OK, so here is the dataframe required for the second submission...

In [20]:
neighborhoods.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


## Section 3a - EXPLORE the neighborhoods in Toronto

### First let's see a map of all Toronto neighbourhoods

In [21]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [22]:
# Use geopy library to get the latitude and longitude values of Toronto.
address = 'Toronto, CA'
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [23]:
import folium # map rendering library

In [24]:
# Create a map of Toronto with neighborhoods superimposed on top.
# using latitude and longitude values
map_to = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip( \
                        neighborhoods['Latitude'], neighborhoods['Longitude'], \
                        neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_to)  
    
map_to



The instructions say "You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.", so I'll narrow the data down to a single borough, Downtown Toronto.

### Create a dataset and map of only the "Downtown Toronto" borough

In [25]:
# Reduce the dataset to only the Downtown Toronto borough
toronto_data = neighborhoods[neighborhoods['Borough'] == 'Downtown Toronto'].reset_index(drop=True)

print('There are {} Neighbourhoods in Downtown Toronto.'.format(toronto_data.shape[0]))
toronto_data.head(2)

There are 19 Neighbourhoods in Downtown Toronto.


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [26]:
# Create a map of Toronto with neighborhoods superimposed on top.
# using latitude and longitude values
map_to = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip( \
                        toronto_data['Latitude'], toronto_data['Longitude'], \
                        toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_to)  
    
map_to

### Next, use Foursquare to explore the FIRST neighborhood in Downtown Toronto, which is...

In [1]:
# Define Foursquare Credentials and Version

CLIENT_ID = 'N0ICIYH1JETTJIRN43TOFSXSRFLBOA40BD4W0ROCHJOMJFW5' 	# your Foursquare ID
CLIENT_SECRET = 'REMOVED' 	# your Foursquare Secret
VERSION = '20180605' 						# Foursquare API version
LIMIT = 100 							# A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
# print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: N0ICIYH1JETTJIRN43TOFSXSRFLBOA40BD4W0ROCHJOMJFW5


In [28]:
toronto_data.loc[0, 'Neighborhood']

'Regent Park, Harbourfront'

In [29]:
# Get the neighborhood's latitude and longitude values.
neighborhood_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = toronto_data.loc[0, 'Neighborhood'] # neighborhood name
print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Regent Park, Harbourfront are 43.6542599, -79.3606359.


In [30]:
# Now, create the GET request URL
# for the top 100 venues that are in Regent Park, Harbourfront 
# within a radius of 500 meters.

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=N0ICIYH1JETTJIRN43TOFSXSRFLBOA40BD4W0ROCHJOMJFW5&client_secret=REMOVED&v=20180605&ll=43.6542599,-79.3606359&radius=500&limit=100'
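Displaying the URL also displays the secret, so here is a sketch of one way to build the same request without typing credentials into the notebook. The environment variable names and the placeholder defaults are assumptions of mine, and only the standard library is used.

```python
import os
from urllib.parse import urlencode

# Read credentials from environment variables (hypothetical names) instead of the notebook.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

params = {
    "client_id": client_id,
    "client_secret": client_secret,
    "v": "20180605",
    "ll": "43.6542599,-79.3606359",
    "radius": 500,
    "limit": 100,
}

base_url = "https://api.foursquare.com/v2/venues/explore"
# safe=',' keeps the comma in the ll value readable instead of percent-encoding it.
explore_url = base_url + "?" + urlencode(params, safe=",")

# Mask the secret before displaying the URL.
print(explore_url.replace(client_secret, "***"))
```

Building the query string from a dictionary also avoids the stray `?&` at the start of the hand-formatted URL.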

In [31]:
import requests

In [32]:
# Send the GET request to Foursquare and examine the results.
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '600009dc32d7e642f97738e8'},
 'response': {'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 46,
  'suggestedBounds': {'ne': {'lat': 43.6587599045, 'lng': -79.3544279001486},
   'sw': {'lat': 43.6497598955, 'lng': -79.36684389985142}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.653446723052674,
          'lng': -79.3620167174383}],
        'distance': 143,
       

In [33]:
# Define the get_category_type function (from the Foursquare lab).
def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    # The column is 'categories' or 'venue.categories', depending on how the JSON was flattened.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [34]:
# Clean the json and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Morning Glory Cafe,Breakfast Spot,43.653947,-79.361149
3,Cooper Koo Family YMCA,Distribution Center,43.653249,-79.358008
4,Body Blitz Spa East,Spa,43.654735,-79.359874
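To make the flattening step easier to follow, here is a sketch on a single item shaped like the response above. The item is trimmed, and the layout of the `categories` field is assumed from how the function above reads it.

```python
import pandas as pd

# One item shaped like the Foursquare response above (trimmed; category fields assumed).
items = [{
    "venue": {
        "name": "Roselle Desserts",
        "categories": [{"name": "Bakery"}],
        "location": {"lat": 43.653446723052674, "lng": -79.3620167174383},
    }
}]

flat = pd.json_normalize(items)  # nested keys become dotted column names
print(sorted(flat.columns))

# Take the first category name, or None when the list is empty.
flat["category"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None)
print(flat.loc[0, "category"])
```

Nested dictionaries are expanded into dotted columns, while the `categories` list stays a list in one cell, which is why a separate step is needed to pull out the name.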


In [35]:
print('Foursquare returned {} venues in {}.'.format(nearby_venues.shape[0], toronto_data.loc[0, 'Neighborhood']))

Foursquare returned 46 venues in Regent Park, Harbourfront.


### Next, use Foursquare to explore ALL neighborhoods in Downtown Toronto...

In [36]:
# Create a function to repeat the above process for all neighborhoods in Downtown Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
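The function above assumes every venue has at least one category. As a sketch of a more defensive version of just the extraction step, the helper below (a hypothetical `extract_venue_rows`, not part of the lab code) returns `None` for a missing category instead of failing; the second toy venue is made up.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn one neighbourhood's list of Foursquare items into flat tuples."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None rather than raising IndexError when a venue has no category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Toy items (the second venue is hypothetical and has no category).
toy_items = [
    {"venue": {"name": "Tandem Coffee", "categories": [{"name": "Coffee Shop"}],
               "location": {"lat": 43.653559, "lng": -79.361809}}},
    {"venue": {"name": "Unnamed Example Venue", "categories": [],
               "location": {"lat": 43.65, "lng": -79.36}}},
]
rows = extract_venue_rows("Regent Park, Harbourfront", 43.65426, -79.360636, toy_items)
print(rows)
```

Keeping the parsing separate from the HTTP request also makes it possible to test the parsing without calling the API.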

In [37]:
# Now run the above function on each Downtown Toronto neighborhood to create a new dataframe called toronto_venues.
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [39]:
print("Foursquare returned {} venues in Downtown Toronto. \
Here's a sample...".format(toronto_venues.shape[0]))
print()
toronto_venues.head()

Foursquare returned 1227 venues in Downtown Toronto. Here's a sample...



Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park, Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [40]:
print('There are {} unique venue categories in Downtown Toronto.'.format( \
                                    len(toronto_venues['Venue Category'].unique())))

There are 206 unique venue categories in Downtown Toronto.


### Analyze Each Neighborhood for Venue Types

In [42]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print('One-hot encoding gives this...')
print()
toronto_onehot.head()

One-hot encoding gives this...



Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [43]:
print('Grouping this by neighborhood, using the mean of the frequency of occurrence of each category...')

toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Grouping this by neighborhood, using the mean of the frequency of occurrence of each category...


Unnamed: 0,Neighborhood,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0625,0.0625,0.0625,0.125,0.0625,0.125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.016949,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.016949
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,...,0.0,0.012821,0.012821,0.012821,0.0,0.0,0.0,0.0,0.012821,0.0
5,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,...,0.01,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01
6,"First Canadian Place, Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.01,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.045455,0.015152
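The numbers in this table are easier to read with a small worked case. This sketch uses a toy venue list (all rows hypothetical) to show that the mean of a one-hot column is the share of a neighbourhood's venues in that category, so each row sums to 1.

```python
import pandas as pd

# Toy venue list (hypothetical rows) in the same shape as toronto_venues.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Café", "Park", "Park", "Bakery"],
})

onehot = pd.get_dummies(toy_venues["Venue Category"]).astype(float)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of a 0/1 column is the share of the neighbourhood's venues in that category.
freq = onehot.groupby("Neighborhood").mean()
print(freq.loc["A", "Coffee Shop"])
print(freq.sum(axis=1).tolist())
```

Neighbourhood A has two coffee shops out of four venues, so its Coffee Shop frequency is 0.5.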


In [44]:
print('Here are the top 5 most common venue types for each of the {} Downtown Toronto neighborhoods...'.format(toronto_data.shape[0]))
print()

num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

Here are the top 5 most common venue types for each of the 19 Downtown Toronto neighborhoods...

----Berczy Park----
            venue  freq
0     Coffee Shop  0.10
1    Cocktail Bar  0.05
2        Beer Bar  0.03
3  Farmers Market  0.03
4      Restaurant  0.03


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
              venue  freq
0    Airport Lounge  0.12
1  Airport Terminal  0.12
2       Coffee Shop  0.06
3   Harbor / Marina  0.06
4               Bar  0.06


----Central Bay Street----
                venue  freq
0         Coffee Shop  0.20
1                Café  0.07
2  Italian Restaurant  0.05
3      Sandwich Place  0.05
4        Burger Joint  0.03


----Christie----
                venue  freq
0       Grocery Store  0.25
1                Café  0.19
2                Park  0.12
3  Athletics & Sports  0.06
4          Restaurant  0.06


----Church and Wellesley----
                  venue  freq
0           Coffe

Put this info into a pandas dataframe

In [45]:
# First, write a function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [46]:
# Now create the new dataframe and display the top 10 venue types for each neighborhood.

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix past the third position
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = \
    return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

print('Here are the top 10 venue types for each neighborhood...')
print()
neighborhoods_venues_sorted.head()

Here are the top 10 venue types for each neighborhood...



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Farmers Market,Restaurant,Bakery,Cheese Shop,Seafood Restaurant,Clothing Store,Park
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Airport Terminal,Coffee Shop,Harbor / Marina,Bar,Boat or Ferry,Sculpture Garden,Boutique,Airport,Plane
2,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Burger Joint,Salad Place,Bubble Tea Shop,Department Store,Diner,Discount Store
3,Christie,Grocery Store,Café,Park,Athletics & Sports,Restaurant,Baby Store,Italian Restaurant,Nightclub,Candy Store,Coffee Shop
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Fast Food Restaurant,Gay Bar,Yoga Studio,Men's Store,Pub,Smoke Shop
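The same ranking can be written more compactly with `Series.nlargest`. This is a sketch on a toy frequency table with made-up values, and `top_categories` is a name I've invented for the example rather than the function used above.

```python
import pandas as pd

# Toy frequency table (hypothetical values), indexed by neighbourhood.
freq = pd.DataFrame(
    {"Coffee Shop": [0.5, 0.1], "Café": [0.3, 0.2], "Park": [0.2, 0.7]},
    index=["A", "B"],
)

def top_categories(row, n):
    """Return the names of the n highest-frequency categories in one row."""
    return row.nlargest(n).index.tolist()

top2 = {hood: top_categories(freq.loc[hood], 2) for hood in freq.index}
print(top2)
```

Note that when several categories are tied (often at 0.0 in the real table), the order among them is not meaningful, so the lower-ranked columns should be read with care.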


## Section 3b - CLUSTER the neighborhoods in Downtown Toronto using k-means...

In [47]:
from sklearn.cluster import KMeans

In [48]:
# Run k-means to cluster the neighborhood into 5 clusters.

# set number of clusters
kclusters = 5

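# drop the label column so only the numeric frequency columns are clustered
# (the positional axis argument, drop('Neighborhood', 1), is not accepted by recent pandas versions)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')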

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 2, 1, 1, 1, 1, 1, 4], dtype=int32)
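The label numbers themselves carry no meaning; what matters is which rows share a label. Here is a sketch on a tiny made-up feature matrix where the grouping is obvious. Setting `n_init` explicitly is my choice, since its default differs between scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy feature matrix (hypothetical): rows 0-1 are near each other, rows 2-3 likewise.
X = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.0],
              [0.0, 0.1, 0.9],
              [0.0, 0.2, 0.8]])

# n_init is set explicitly because its default has changed across scikit-learn versions.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_
print(labels)
```

Rows 0 and 1 end up with one label and rows 2 and 3 with the other; which group is called 0 and which is called 1 is arbitrary.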

In [49]:
# Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [50]:
toronto_merged = toronto_data

# join the cluster labels and top venues onto toronto_data, which holds the latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

print('Same as before, here are the top 10 venue types for each neighborhood, with the Cluster label added...')
print()
toronto_merged.head() # check the last columns!

Same as before, here are the top 10 venue types for each neighborhood, with the Cluster label added...



Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Park,Café,Bakery,Pub,Breakfast Spot,Theater,Performing Arts Venue,Brewery,Shoe Store
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,Sushi Restaurant,Yoga Studio,Italian Restaurant,Café,College Auditorium,Creperie,Diner,Discount Store,Distribution Center
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Bubble Tea Shop,Hotel,Middle Eastern Restaurant,Japanese Restaurant,Diner,Fast Food Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Coffee Shop,Café,Cocktail Bar,American Restaurant,Gastropub,Moroccan Restaurant,Clothing Store,Cosmetics Shop,Creperie,Department Store
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,1,Coffee Shop,Cocktail Bar,Beer Bar,Farmers Market,Restaurant,Bakery,Cheese Shop,Seafood Restaurant,Clothing Store,Park


### Visualize the resulting clusters ...

In [52]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
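The colour setup above can be shortened. This sketch builds one hex colour per cluster and indexes it by the label directly; it's an alternative I'm suggesting, not the lab's code, and `palette` and `color_for` are illustrative names. (With `rainbow[cluster-1]`, label 0 simply takes the last colour, so the mapping is still one colour per cluster.)

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced rainbow colour per cluster, converted to hex strings for folium.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Index the palette by the cluster label itself (labels run from 0 to kclusters - 1).
color_for = {cluster: palette[cluster] for cluster in range(kclusters)}
print(color_for)
```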

### And examine each cluster to determine the venue categories that distinguish it.

In [54]:
# Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Downtown Toronto,0,Airport Lounge,Airport Terminal,Coffee Shop,Harbor / Marina,Bar,Boat or Ferry,Sculpture Garden,Boutique,Airport,Plane


In [55]:
# Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,1,Coffee Shop,Park,Café,Bakery,Pub,Breakfast Spot,Theater,Performing Arts Venue,Brewery,Shoe Store
1,Downtown Toronto,1,Coffee Shop,Sushi Restaurant,Yoga Studio,Italian Restaurant,Café,College Auditorium,Creperie,Diner,Discount Store,Distribution Center
2,Downtown Toronto,1,Coffee Shop,Clothing Store,Café,Cosmetics Shop,Bubble Tea Shop,Hotel,Middle Eastern Restaurant,Japanese Restaurant,Diner,Fast Food Restaurant
3,Downtown Toronto,1,Coffee Shop,Café,Cocktail Bar,American Restaurant,Gastropub,Moroccan Restaurant,Clothing Store,Cosmetics Shop,Creperie,Department Store
4,Downtown Toronto,1,Coffee Shop,Cocktail Bar,Beer Bar,Farmers Market,Restaurant,Bakery,Cheese Shop,Seafood Restaurant,Clothing Store,Park
5,Downtown Toronto,1,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Burger Joint,Salad Place,Bubble Tea Shop,Department Store,Diner,Discount Store
7,Downtown Toronto,1,Coffee Shop,Café,Restaurant,Gym,Hotel,Deli / Bodega,Clothing Store,Thai Restaurant,Bakery,Pizza Place
8,Downtown Toronto,1,Coffee Shop,Aquarium,Café,Hotel,Scenic Lookout,Brewery,Italian Restaurant,Restaurant,Fried Chicken Joint,Music Venue
9,Downtown Toronto,1,Coffee Shop,Hotel,Café,Seafood Restaurant,Italian Restaurant,American Restaurant,Restaurant,Salad Place,Japanese Restaurant,Asian Restaurant
10,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Hotel,American Restaurant,Gym,Italian Restaurant,Japanese Restaurant,Seafood Restaurant,Deli / Bodega


In [56]:
# Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Downtown Toronto,2,Grocery Store,Café,Park,Athletics & Sports,Restaurant,Baby Store,Italian Restaurant,Nightclub,Candy Store,Coffee Shop


In [57]:
# Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Downtown Toronto,3,Park,Playground,Trail,Movie Theater,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop


In [58]:
# Cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Downtown Toronto,4,Café,Bakery,Bookstore,Bar,Japanese Restaurant,Sushi Restaurant,Sandwich Place,French Restaurant,Restaurant,Beer Store
12,Downtown Toronto,4,Café,Vegetarian / Vegan Restaurant,Coffee Shop,Vietnamese Restaurant,Dessert Shop,Mexican Restaurant,Gaming Cafe,Caribbean Restaurant,Bar,Bakery
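Rather than reading five tables by eye, the cluster sizes and each cluster's most frequent top venue can be tabulated. This sketch uses a toy stand-in for `toronto_merged` with made-up rows, so the numbers it prints are not the real results.

```python
import pandas as pd

# Toy stand-in for toronto_merged (hypothetical labels and categories).
toy_merged = pd.DataFrame({
    "Cluster Labels": [1, 1, 1, 0, 4, 4],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Coffee Shop",
                              "Airport Lounge", "Café", "Café"],
})

# Number of neighbourhoods in each cluster.
sizes = toy_merged["Cluster Labels"].value_counts().sort_index()

# Most frequent top-ranked venue category within each cluster.
top_per_cluster = toy_merged.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    lambda s: s.mode().iloc[0])
print(sizes.to_dict())
print(top_per_cluster.to_dict())
```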


### Summary

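So,

cluster 1 is a single neighbourhood in the output above (the CN Tower / Island airport area) and is mainly defined by venues relating to the airport.

cluster 2 is the largest cluster, likely the most central, and a great place to go for coffee shops, cafés and restaurants; Coffee Shop is the most common venue in every row shown.

cluster 3 is a single neighbourhood (Christie), led by grocery stores, cafés and parks.

cluster 4 is a single neighbourhood (Rosedale), where the top venues are outdoor ones: parks, playgrounds and trails.

cluster 5, though smaller than cluster 2, is also a great place to find eating venues, with cafés as the most common type.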