### Comparing Neighbourhoods in Toronto, Canada and Manhattan, New York, using K-Means Clustering

##### Dr Mehmood Syed


### Introduction

John Smith is a 37-year-old IT professional living in the neighbourhood of St. James Town in Toronto, Canada. John is particularly attached to his current neighbourhood because of the eclectic mix of restaurants, bars, cafes and other venues in his immediate vicinity. For personal reasons, however, John is required to relocate to Manhattan, New York. He is therefore seeking a shortlist of neighbourhoods in Manhattan that are similar to St. James Town in their local mix of venues.


### Data

To recommend neighbourhoods in Manhattan that are similar to St. James Town, all neighbourhoods in both Toronto and Manhattan will be clustered using k-means clustering. The clustering will be based on the mix of venue categories within a 500m radius of the geographic centre of each neighbourhood, and the 10 most common venue categories in each neighbourhood will be tabulated to describe the clusters. The Manhattan neighbourhoods that fall in the same cluster as St. James Town will then form the shortlist of neighbourhoods that John should consider relocating to.

The Foursquare database has been queried previously, and the venues within 500m of each neighbourhood in Manhattan have been retrieved, together with the latitude and longitude of each venue and each neighbourhood (see Coursera Capstone Project Week 3). Similarly, the venues within 500m of each neighbourhood of Toronto have been retrieved in the Capstone Project Week 3 Student Assignment. The Jupyter Notebook explaining that analysis can be found here: https://github.com/msyed187/Coursera_Capstone/blob/master/Neighbourhood%20Clustering%20MSyed.ipynb


### Method

To conduct the analysis, the dataframes developed in previous assignments will be merged into a single dataframe and clustered using k-means clustering, based on the relative frequency of each venue category within 500m of each neighbourhood. The cluster containing St. James Town, Toronto will then be identified, and the Manhattan neighbourhoods in the same cluster will form the shortlist of recommended neighbourhoods for John Smith to consider.
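
The method can be sketched end to end on a tiny, made-up dataset before running it on the real one. This is a minimal sketch under stated assumptions: the neighbourhoods `T1`, `T2`, `M1`, `M2` and their venues are hypothetical, and the helper `shortlist_similar` is illustrative rather than part of the original notebook. It follows the same four steps: one-hot encode the venue categories, average them per neighbourhood, cluster with k-means, then keep the other city's neighbourhoods that share the target's cluster.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue data: one row per venue, like the Foursquare results
toy = pd.DataFrame({
    'City': ['Toronto'] * 6 + ['Manhattan'] * 6,
    'Neighbourhood': ['T1'] * 3 + ['T2'] * 3 + ['M1'] * 3 + ['M2'] * 3,
    'Venue Category': ['Café', 'Bar', 'Café', 'Park', 'Trail', 'Park',
                       'Café', 'Café', 'Bar', 'Trail', 'Park', 'Park'],
})


def shortlist_similar(venues, target, target_city, other_city, k, seed=0):
    """Neighbourhoods of other_city in the same k-means cluster as target (hypothetical helper)."""
    # 1. One-hot encode the venue categories (one 0/1 column per category)
    onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
    onehot['City'] = venues['City']
    onehot['Neighbourhood'] = venues['Neighbourhood']
    # 2. Mean of each column = share of the neighbourhood's venues in that category
    profiles = onehot.groupby(['City', 'Neighbourhood']).mean()
    # 3. Cluster the neighbourhood profiles
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(profiles)
    clusters = pd.Series(labels, index=profiles.index)
    # 4. Keep the other city's neighbourhoods that share the target's label
    target_label = clusters[(target_city, target)]
    same = clusters[clusters == target_label]
    return sorted(name for city, name in same.index if city == other_city)


print(shortlist_similar(toy, 'T1', 'Toronto', 'Manhattan', k=2))  # ['M1']
```

Grouping by both city and neighbourhood, rather than by neighbourhood name alone, keeps two cities' neighbourhoods apart even if they happen to share a name.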

In [2]:
```python
# Install the packages that may be missing from the notebook environment
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes

# Import the necessary libraries
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

print('Libraries Imported')
```

```
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries Imported
```


In [3]:
```python
# Retrieve the Manhattan dataset stored by the Week 3 notebook of the Coursera Capstone Project
%store -r manhattan_dataset
manhattan_dataset.head()
```

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
| 4 | Marble Hill | 40.876551 | -73.91066 | Dunkin' | 40.877136 | -73.906666 | Donut Shop |


In [4]:
```python
# Rename columns to match the UK English spelling used in the Toronto dataset,
# add a 'City' column set to 'Manhattan' and move it to the front
manhattan_df = manhattan_dataset.rename(columns={'Neighborhood': 'Neighbourhood',
                                                 'Neighborhood Latitude': 'Neighbourhood Latitude',
                                                 'Neighborhood Longitude': 'Neighbourhood Longitude'})
manhattan_df['City'] = 'Manhattan'
col = manhattan_df.columns.tolist()
col = col[-1:] + col[:-1]
manhattan_df = manhattan_df[col]
manhattan_df.head()
```

|   | City | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
| 4 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Dunkin' | 40.877136 | -73.906666 | Donut Shop |


In [5]:
```python
# Retrieve the Toronto dataset stored by the Week 3 assignment
%store -r toronto_dataset
toronto_dataset.head()
```

|   | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | The Beaches | 43.676357 | -79.293031 | Glen Manor Ravine | 43.676821 | -79.293942 | Trail |
| 1 | The Beaches | 43.676357 | -79.293031 | The Big Carrot Natural Food Market | 43.678879 | -79.297734 | Health Food Store |
| 2 | The Beaches | 43.676357 | -79.293031 | Grover Pub and Grub | 43.679181 | -79.297215 | Pub |
| 3 | The Beaches | 43.676357 | -79.293031 | Glen Stewart Park | 43.675278 | -79.294647 | Park |
| 4 | The Beaches | 43.676357 | -79.293031 | Upper Beaches | 43.680563 | -79.292869 | Neighborhood |


In [6]:
```python
# Add a 'City' column set to 'Toronto' and move it to the front.
# .copy() stops the new column from also being added to toronto_dataset itself
toronto_df = toronto_dataset.copy()
toronto_df['City'] = 'Toronto'
cols = toronto_df.columns.tolist()
cols = cols[-1:] + cols[:-1]
toronto_df = toronto_df[cols]
toronto_df.head()
```

|   | City | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Toronto | The Beaches | 43.676357 | -79.293031 | Glen Manor Ravine | 43.676821 | -79.293942 | Trail |
| 1 | Toronto | The Beaches | 43.676357 | -79.293031 | The Big Carrot Natural Food Market | 43.678879 | -79.297734 | Health Food Store |
| 2 | Toronto | The Beaches | 43.676357 | -79.293031 | Grover Pub and Grub | 43.679181 | -79.297215 | Pub |
| 3 | Toronto | The Beaches | 43.676357 | -79.293031 | Glen Stewart Park | 43.675278 | -79.294647 | Park |
| 4 | Toronto | The Beaches | 43.676357 | -79.293031 | Upper Beaches | 43.680563 | -79.292869 | Neighborhood |


In [7]:
```python
# Combine the Manhattan and Toronto datasets into a single dataframe with venue information,
# then create a second dataframe that keeps only the city and neighbourhood columns
# (duplicate neighbourhood rows are removed in the next cell)
combined_df = pd.concat([manhattan_df, toronto_df])
combined_data = combined_df.drop(columns=['Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'])
```
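
Because both datasets start at index 0, `pd.concat` keeps each dataset's original index labels, so the same labels can appear on both a Manhattan row and a Toronto row of `combined_df`. The sketch below uses two small hypothetical frames to show this behaviour and the `ignore_index=True` alternative, which renumbers the rows; the notebook keeps the original labels, and its later outputs reflect that.

```python
import pandas as pd

# Two small hypothetical frames that, like manhattan_df and toronto_df, both start at index 0
a = pd.DataFrame({'Neighbourhood': ['Marble Hill', 'Chinatown']})
b = pd.DataFrame({'Neighbourhood': ['The Beaches', 'Berczy Park']})

kept = pd.concat([a, b])                           # original labels kept: 0, 1, 0, 1
renumbered = pd.concat([a, b], ignore_index=True)  # fresh labels: 0, 1, 2, 3

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
print(kept.index.is_unique)    # False
```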

In [8]:
```python
# Remove duplicate rows so that each neighbourhood appears only once
# in the dataframe without venue information
combined_data.drop_duplicates(keep='first', inplace=True)
combined_data.head()
```

|   | City | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 26 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 126 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 214 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 272 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


In [9]:
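```python
# Reset the index of the dataframe without venue information.
# reset_index() returns a new dataframe; because the result is not assigned here,
# combined_data keeps its original index (as the output below shows).
# combined_data = combined_data.reset_index(drop=True) would renumber the rows.
combined_data.reset_index()
combined_data.head()
```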

|   | City | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 26 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 126 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 214 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 272 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |
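
The index in the output above is unchanged because `DataFrame.reset_index` returns a new dataframe unless its result is assigned (or `inplace=True` is passed). A minimal sketch on a hypothetical two-row frame shows the difference; `drop=True` discards the old labels instead of keeping them as an extra `index` column.

```python
import pandas as pd

# Hypothetical frame whose index has gaps, like combined_data after drop_duplicates
df = pd.DataFrame({'Neighbourhood': ['Marble Hill', 'Chinatown']}, index=[0, 26])

df.reset_index()                 # returns a new DataFrame; df is not modified
print(list(df.index))            # [0, 26]

df = df.reset_index(drop=True)   # assign the result; drop=True discards the old labels
print(list(df.index))            # [0, 1]
print(list(df.columns))          # ['Neighbourhood']
```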


In [10]:
```python
# Show the combined dataframe with venue information
combined_df.head()
```

|   | City | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
| 4 | Manhattan | Marble Hill | 40.876551 | -73.91066 | Dunkin' | 40.877136 | -73.906666 | Donut Shop |


In [11]:
```python
# Count the venues returned for each neighbourhood in each city
combined_df.groupby('Neighbourhood').count()
```

| Neighbourhood | City | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| Battery Park City | 71 | 71 | 71 | 71 | 71 | 71 | 71 |
| Berczy Park | 58 | 58 | 58 | 58 | 58 | 58 | 58 |
| Brockton, Parkdale Village, Exhibition Place | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| Business reply mail Processing Centre, South Central Letter Processing Plant Toronto | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Upper East Side | 89 | 89 | 89 | 89 | 89 | 89 | 89 |
| Upper West Side | 81 | 81 | 81 | 81 | 81 | 81 | 81 |
| Washington Heights | 88 | 88 | 88 | 88 | 88 | 88 | 88 |
| West Village | 100 | 100 | 100 | 100 | 100 | 100 | 100 |


In [12]:
```python
# Use one-hot encoding to record the category of each venue (dtype=int gives 0/1
# rather than True/False in recent pandas), then move the neighbourhood name to the front
combined_onehot = pd.get_dummies(combined_df[['Venue Category']], prefix="", prefix_sep="", dtype=int)
combined_onehot['Neighbourhood'] = combined_df['Neighbourhood']
fixed_columns = [combined_onehot.columns[-1]] + list(combined_onehot.columns[:-1])
combined_onehot = combined_onehot[fixed_columns]
combined_onehot.head()
```

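|   | Neighbourhood | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | Airport | Airport Food Court | Airport Lounge | Airport Service | Airport Terminal | ... | Video Store | Vietnamese Restaurant | Volleyball Court | Waterfront | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |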


In [13]:
```python
# Compute the mean frequency of each venue category in each neighbourhood
combined_grouped = combined_onehot.groupby('Neighbourhood').mean().reset_index()
combined_grouped
```

|   | Neighbourhood | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | Airport | Airport Food Court | Airport Lounge | Airport Service | Airport Terminal | ... | Video Store | Vietnamese Restaurant | Volleyball Court | Waterfront | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.014085 | 0.028169 | 0.00 | 0.014085 | 0.000000 |
| 1 | Berczy Park | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.000000 |
| 2 | Brockton, Parkdale Village, Exhibition Place | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.000000 |
| 3 | Business reply mail Processing Centre, South C... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.058824 |
| 4 | CN Tower, King and Spadina, Railway Lands, Har... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.071429 | 0.071429 | 0.142857 | 0.142857 | 0.142857 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.00 | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 74 | Upper East Side | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.033708 | 0.00 | 0.011236 | 0.033708 |
| 75 | Upper West Side | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.037037 | 0.012346 | 0.00 | 0.000000 | 0.012346 |
| 76 | Washington Heights | 0.011364 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.011364 | 0.022727 | 0.00 | 0.011364 | 0.000000 |
| 77 | West Village | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.040000 | 0.010000 | 0.00 | 0.000000 | 0.000000 |
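
Each value in this table is the share of a neighbourhood's venues that fall in a given category, because the mean of a 0/1 column is the fraction of ones. For example, Battery Park City has 71 venues, so a category with a single venue there has frequency 1/71 ≈ 0.014085. Using shares rather than raw counts makes neighbourhoods with many venues comparable with those that have few. A minimal sketch on hypothetical data (`N1`, `N2` and their venues are made up) shows that each neighbourhood's frequencies sum to 1:

```python
import pandas as pd

# Hypothetical venues for two neighbourhoods
venues = pd.DataFrame({
    'Neighbourhood': ['N1', 'N1', 'N1', 'N1', 'N2', 'N2'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bar', 'Park', 'Park'],
})

# One-hot encode the categories, as in the notebook
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
onehot['Neighbourhood'] = venues['Neighbourhood']

# The mean of each 0/1 column is the share of the neighbourhood's venues in that category
freq = onehot.groupby('Neighbourhood').mean()
print(freq)
print(freq.sum(axis=1).tolist())  # [1.0, 1.0]
```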


In [14]:
```python
# Return the names of the num_top_venues most frequent venue categories in a row of
# combined_grouped (the first entry, the neighbourhood name, is skipped)
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
```

In [15]:
```python
# Place the top 10 venue categories of each neighbourhood in a new dataframe
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# Column names: '1st Most Common Venue', '2nd ...', '3rd ...', then 'th' from the 4th onwards
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = combined_grouped['Neighbourhood']

# Fill in the top venue categories one neighbourhood at a time
for ind in np.arange(combined_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(combined_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()
```

|   | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | Park | Coffee Shop | Hotel | Gym | Memorial Site | Boat or Ferry | Food Court | Beer Garden | Mexican Restaurant | Plaza |
| 1 | Berczy Park | Coffee Shop | Cocktail Bar | Pub | Café | Cheese Shop | Restaurant | Seafood Restaurant | Beer Bar | Bakery | Fountain |
| 2 | Brockton, Parkdale Village, Exhibition Place | Café | Breakfast Spot | Nightclub | Coffee Shop | Italian Restaurant | Bar | Bakery | Furniture / Home Store | Grocery Store | Gym |
| 3 | Business reply mail Processing Centre, South C... | Light Rail Station | Yoga Studio | Restaurant | Farmers Market | Fast Food Restaurant | Pizza Place | Spa | Burrito Place | Butcher | Smoke Shop |
| 4 | CN Tower, King and Spadina, Railway Lands, Har... | Airport Terminal | Airport Lounge | Airport Service | Sculpture Garden | Airport | Airport Food Court | Harbor / Marina | Plane | Boutique | Bar |
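
Two details of `return_most_common_venues` are worth keeping in mind when reading these tables. The default `sort_values` algorithm (quicksort) is not a stable sort, so the order of categories with equal frequency is not guaranteed; and if a neighbourhood has fewer than ten distinct categories, the remaining columns are filled with categories that have zero frequency there. The sketch below is a hedged alternative, not the notebook's method: the hypothetical helper `top_categories` drops zero-frequency categories and breaks ties alphabetically, and the hypothetical `ordinal` generalises the 'st'/'nd'/'rd' suffix logic beyond ten columns.

```python
import pandas as pd


def top_categories(freq_row, n):
    """Up to n categories with the highest non-zero frequency.

    Ties are broken alphabetically so the result is reproducible,
    and categories with zero frequency are left out.
    """
    items = [(cat, f) for cat, f in freq_row.items() if f > 0]
    items.sort(key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in items[:n]]


def ordinal(k):
    """Ordinal label for a positive integer: 1 -> '1st', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= k % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(k % 10, 'th')
    return f'{k}{suffix}'


# Hypothetical category frequencies for one neighbourhood
row = pd.Series({'Park': 0.25, 'Café': 0.25, 'Bar': 0.5, 'Gym': 0.0})
print(top_categories(row, 10))  # ['Bar', 'Café', 'Park']
print([ordinal(k) for k in (1, 2, 3, 4, 11, 12, 13, 21, 22, 101)])
```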


In [16]:
```python
# Use k-means to cluster the neighbourhoods into 8 clusters, based on the venue-category frequencies.
# A lower value of k does not differentiate the Manhattan neighbourhoods sufficiently.
# n_init=10 was the scikit-learn default before version 1.4; newer versions default to
# n_init='auto' (a single run with k-means++), which can produce different clusters.
kclusters = 8
combined_grouped_clustering = combined_grouped.drop(columns='Neighbourhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(combined_grouped_clustering)
kmeans.labels_[0:10]
```

array([4, 4, 4, 0, 0, 4, 4, 0, 0, 0], dtype=int32)
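
The choice of k = 8 rests on the observation that a lower value does not separate the Manhattan neighbourhoods well enough. One common way to make such a choice more explicit is to compare, across candidate values of k, the within-cluster sum of squares (`inertia_`, as used in the "elbow" method) and the silhouette score. This is a sketch under assumptions: the random matrix `X` is a hypothetical stand-in for `combined_grouped_clustering` (78 rows, as in the table above, but only 20 made-up categories), and `score_k_values` is an illustrative helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def score_k_values(X, k_values, seed=0):
    """Return {k: (inertia, silhouette)} for each candidate number of clusters."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(X)
        results[k] = (km.inertia_, silhouette_score(X, km.labels_))
    return results


# Hypothetical stand-in: 78 neighbourhoods, each a vector of category shares summing to 1
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(20), size=78)

for k, (inertia, sil) in score_k_values(X, range(2, 11)).items():
    print(f'k={k:2d}  inertia={inertia:.4f}  silhouette={sil:.3f}')
```

On the real data, passing `combined_grouped_clustering` in place of `X` would show how much inertia drops, and how the silhouette changes, as k increases towards 8.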

In [17]:
```python
# Add the cluster labels, then join onto the dataframe without venue information (combined_data)
# to attach the latitude and longitude of each neighbourhood
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
combined_merge = combined_data.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')
combined_merge.head()
```

|   | City | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 | 4 | Sandwich Place | Gym | Coffee Shop | Yoga Studio | Miscellaneous Shop | Steakhouse | Shopping Mall | Supplement Shop | Seafood Restaurant | Donut Shop |
| 26 | Manhattan | Chinatown | 40.715618 | -73.994279 | 0 | Chinese Restaurant | Bakery | Bubble Tea Shop | Cocktail Bar | Vietnamese Restaurant | American Restaurant | Spa | Coffee Shop | Ice Cream Shop | Optical Shop |
| 126 | Manhattan | Washington Heights | 40.851903 | -73.9369 | 0 | Café | Bakery | Mobile Phone Shop | Grocery Store | Chinese Restaurant | Deli / Bodega | Coffee Shop | Supermarket | Supplement Shop | Sandwich Place |
| 214 | Manhattan | Inwood | 40.867684 | -73.92121 | 0 | Mexican Restaurant | Lounge | Pizza Place | Restaurant | Café | Park | American Restaurant | Chinese Restaurant | Deli / Bodega | Frozen Yogurt Shop |
| 272 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 | 0 | Pizza Place | Café | Coffee Shop | Mexican Restaurant | Deli / Bodega | Yoga Studio | Bakery | School | Caribbean Restaurant | Sandwich Place |


In [18]:
```python
# Find the cluster in which St. James Town, Toronto is located
df_lookup = combined_merge.set_index('Neighbourhood')
print('St. James Town is in Cluster Category ', df_lookup.loc['St. James Town', 'Cluster Labels'])
```

St. James Town is in Cluster Category  4
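
The lookup above indexes by neighbourhood name only, so if the same name appeared in both cities, `.loc` would return several rows rather than a single label. A hedged sketch of the shortlisting step that filters on city as well is shown below; `same_cluster_shortlist` and the `demo` rows are hypothetical, and the function assumes a dataframe with the `City`, `Neighbourhood` and `Cluster Labels` columns of `combined_merge`.

```python
import pandas as pd


def same_cluster_shortlist(merged, target, target_city, other_city):
    """Neighbourhoods of other_city that share target's cluster label."""
    target_rows = merged[(merged['City'] == target_city) & (merged['Neighbourhood'] == target)]
    if target_rows.empty:
        raise KeyError(f'{target} not found in {target_city}')
    label = target_rows['Cluster Labels'].iloc[0]
    mask = (merged['City'] == other_city) & (merged['Cluster Labels'] == label)
    return merged.loc[mask, 'Neighbourhood'].tolist()


# Hypothetical rows with the same columns as combined_merge
demo = pd.DataFrame({
    'City': ['Manhattan', 'Manhattan', 'Manhattan', 'Toronto', 'Toronto'],
    'Neighbourhood': ['M1', 'M2', 'M3', 'T1', 'T2'],
    'Cluster Labels': [4, 0, 4, 4, 0],
})
print(same_cluster_shortlist(demo, 'T1', 'Toronto', 'Manhattan'))  # ['M1', 'M3']
```

Applied to `combined_merge` with `'St. James Town'`, `'Toronto'` and `'Manhattan'`, this would return the Manhattan neighbourhoods of the same cluster as a plain list.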


### Cluster 4 

Other neighbourhoods in Cluster 4 are listed in the table below.

In [19]:
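```python
# Show the city, neighbourhood and top 10 venue categories of every neighbourhood in Cluster 4
combined_merge.loc[combined_merge['Cluster Labels'] == 4,
                   combined_merge.columns[[0, 1] + list(range(5, combined_merge.shape[1]))]]
```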

|   | City | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | Sandwich Place | Gym | Coffee Shop | Yoga Studio | Miscellaneous Shop | Steakhouse | Shopping Mall | Supplement Shop | Seafood Restaurant | Donut Shop |
| 1974 | Manhattan | Morningside Heights | Park | Bookstore | Coffee Shop | American Restaurant | Café | Deli / Bodega | Sandwich Place | Burger Joint | Mediterranean Restaurant | Supermarket |
| 2100 | Manhattan | Battery Park City | Park | Coffee Shop | Hotel | Gym | Memorial Site | Boat or Ferry | Food Court | Beer Garden | Mexican Restaurant | Plaza |
| 2171 | Manhattan | Financial District | Coffee Shop | American Restaurant | Pizza Place | Hotel | Café | Steakhouse | Falafel Restaurant | Cocktail Bar | Sandwich Place | Juice Bar |
| 2271 | Manhattan | Carnegie Hill | Coffee Shop | Pizza Place | Café | Yoga Studio | Wine Shop | Japanese Restaurant | Bar | Bookstore | Gym | French Restaurant |
| 114 | Toronto | Davisville North | Hotel | Breakfast Spot | Food & Drink Shop | Park | Sandwich Place | Gym | Department Store | Egyptian Restaurant | Donut Shop | Drugstore |
| 121 | Toronto | North Toronto West, Lawrence Park | Clothing Store | Coffee Shop | Yoga Studio | Grocery Store | Spa | Fast Food Restaurant | Mexican Restaurant | Metro Station | Café | Chinese Restaurant |
| 179 | Toronto | Summerhill West, Rathnelly, South Hill, Forest... | Bank | Coffee Shop | Pub | American Restaurant | Bagel Shop | Light Rail Station | Liquor Store | Fried Chicken Joint | Restaurant | Sushi Restaurant |
| 242 | Toronto | Church and Wellesley | Coffee Shop | Japanese Restaurant | Sushi Restaurant | Gay Bar | Restaurant | Yoga Studio | Bubble Tea Shop | Burger Joint | Café | Mediterranean Restaurant |
| 319 | Toronto | Regent Park, Harbourfront | Coffee Shop | Pub | Bakery | Park | Theater | Breakfast Spot | Café | Yoga Studio | French Restaurant | Restaurant |



### Results

Five possible neighbourhoods in Manhattan have been identified as being in the same cluster as St. James Town. These are:
- Marble Hill
- Morningside Heights
- Battery Park City
- Financial District
- Carnegie Hill

The clusters are shown on the two maps below.

In [25]:
```python
# Geocode Toronto, Canada with Nominatim to obtain the centre of the map
address = 'Toronto, Canada'
geolocator = Nominatim(user_agent="city_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
```

In [26]:
```python
print('Map of Clusters in Toronto')

# Create the map, centred on Toronto
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Set one colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add a marker for each neighbourhood; cluster labels run from 0 to kclusters-1,
# so each label indexes its own colour directly
for lat, lon, poi, cluster in zip(combined_merge['Neighbourhood Latitude'], combined_merge['Neighbourhood Longitude'], combined_merge['Neighbourhood'], combined_merge['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```

Map of Clusters in Toronto


In [27]:
```python
# Geocode Manhattan, New York with Nominatim to obtain the centre of the map
address = 'Manhattan, NY'
geolocator = Nominatim(user_agent="city_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
```

In [28]:
```python
print('Map of Clusters in Manhattan')

# Create the map, centred on Manhattan
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Set one colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add a marker for each neighbourhood; cluster labels run from 0 to kclusters-1,
# so each label indexes its own colour directly
for lat, lon, poi, cluster in zip(combined_merge['Neighbourhood Latitude'], combined_merge['Neighbourhood Longitude'], combined_merge['Neighbourhood'], combined_merge['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```

Map of Clusters in Manhattan
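
The colour for each marker is looked up in a list of `kclusters` colours. Indexing that list with `cluster-1` does not fail for cluster 0, because Python's negative indexing returns the last colour, but it shifts every colour by one position relative to the cluster labels; indexing with the label itself keeps colour i for cluster i. The two map cells also repeat the same marker code. A hedged sketch that prepares the marker data for one city is below; `cluster_palette`, `marker_specs` and the `demo` rows are hypothetical, and the sketch stops short of calling folium.

```python
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(k):
    """One hex colour per cluster label 0..k-1, sampled evenly from the rainbow colormap."""
    return [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]


def marker_specs(merged, city, k):
    """(lat, lon, popup text, colour) for each neighbourhood of one city.

    merged is assumed to have the columns of combined_merge; colour i is used for label i.
    """
    palette = cluster_palette(k)
    rows = merged[merged['City'] == city]
    return [
        (lat, lon, f'{name} Cluster {label}', palette[int(label)])
        for lat, lon, name, label in zip(rows['Neighbourhood Latitude'],
                                         rows['Neighbourhood Longitude'],
                                         rows['Neighbourhood'],
                                         rows['Cluster Labels'])
    ]


# Hypothetical rows with the same columns as combined_merge
demo = pd.DataFrame({
    'City': ['Toronto', 'Toronto', 'Manhattan'],
    'Neighbourhood': ['T1', 'T2', 'M1'],
    'Neighbourhood Latitude': [43.65, 43.70, 40.75],
    'Neighbourhood Longitude': [-79.38, -79.40, -73.99],
    'Cluster Labels': [0, 7, 0],
})
for spec in marker_specs(demo, 'Toronto', k=8):
    print(spec)
```

Each tuple could then be passed to the existing `folium.CircleMarker` call (location, popup text, `color` and `fill_color`), so that each map only receives the markers of its own city.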



### Discussion

The clustering above segments the neighbourhoods of both Toronto and Manhattan according to the mix of venues within a 500m radius of each neighbourhood. It is notable, however, that several of the clusters contain multiple neighbourhoods, and that a lower value of k would result in too few clusters to make a meaningful recommendation.


### Conclusion

K-means clustering is an appropriate method of segmenting geographical neighbourhoods according to some distinct feature of each neighbourhood. In this example we have been able to make a shortlist of five neighbourhoods in Manhattan that are similar to John Smith's current neighbourhood of St. James Town, Toronto. John can therefore confidently limit his search for a new home to the five areas listed in the Results section.