# Business Problem

Cities that lie close to each other often show similarities. Even then, factors such as geography and community affect how each city functions. Manhattan in the USA and Toronto in Canada are both very large urban areas, but they are situated many miles apart.

With this project, we try to establish the relationships, similarities, and dissimilarities between the venues located in these two cities. Given the amount of data available, the task cannot be done manually, so we use a machine learning algorithm to get the best result possible.

## Approach

We will run clustering for each of the two cities on the venues available in their neighborhoods. The top 10 venues will be selected for each neighborhood.
We will synthesize features based on the data available from Foursquare.
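
As a rough illustration of the feature idea, the sketch below turns a handful of made-up (neighborhood, venue category) pairs into per-neighborhood category frequencies, the kind of vector a clustering algorithm can compare. The sample data and the helper name `category_frequencies` are assumptions for illustration and are not part of the project code.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, venue category) pairs standing in for Foursquare results
sample_venues = [
    ("Harbourfront", "Coffee Shop"),
    ("Harbourfront", "Bakery"),
    ("Harbourfront", "Coffee Shop"),
    ("Rosedale", "Park"),
    ("Rosedale", "Trail"),
]


def category_frequencies(venues):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }


frequencies = category_frequencies(sample_venues)
print(frequencies["Rosedale"])
```

The notebook below builds the same kind of frequency table with pandas (one-hot encoding followed by a group mean).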

## Code Repo : Clustering Toronto Neighborhoods

Import libraries

In [1]:
import pandas as pd
import numpy as np

Fetch the data from the URL into a DataFrame

In [2]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

dataset=pd.read_html(url, header=0)[0]

dataset.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Display shape of the dataset

In [3]:
dataset.shape

(288, 3)

## Data Preprocessing

In [4]:
# Add empty coordinate columns; they are filled from the geospatial data below
dataset["Latitude"] = np.nan
dataset["Longitude"] = np.nan

In [5]:
geodata = pd.read_csv('https://cocl.us/Geospatial_data')
geodata.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [6]:
# Look up each postcode's coordinates in geodata.
# Postcodes that are missing from geodata keep NaN and are dropped in the next cell.
coords = geodata.set_index('Postal Code')
dataset['Latitude'] = dataset['Postcode'].map(coords['Latitude'])
dataset['Longitude'] = dataset['Postcode'].map(coords['Longitude'])


In [7]:
dataset = dataset.dropna()
dataset = dataset.reset_index(drop=True)
dataset.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M5A,Downtown Toronto,Regent Park,43.65426,-79.360636
4,M6A,North York,Lawrence Heights,43.718518,-79.464763
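
The coordinate lookup followed by `dropna` behaves much like an inner join on the postal code. A minimal sketch of that equivalent is shown below, using small made-up frames; `postcodes`, `coords`, and the code `M9Z` are illustrative names and values, not taken from the notebook's data.

```python
import pandas as pd

# Toy stand-ins for the Wikipedia table and the geospatial CSV
postcodes = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Not assigned"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# An inner merge keeps only the postcodes that have coordinates
merged = (
    postcodes.merge(coords, left_on="Postcode", right_on="Postal Code", how="inner")
    .drop(columns="Postal Code")
)
print(merged)
```

One difference to keep in mind: `dropna()` without arguments also removes rows with missing values in any other column, whereas the merge only filters on the postal code.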


In [8]:
# Chain the steps so that each one returns a new DataFrame instead of
# modifying a filtered slice in place (which raises SettingWithCopyWarning)
toronto_data = (
    dataset[dataset['Borough'].str.contains("Toronto")]
    .drop(columns=['Postcode'])
    .reset_index(drop=True)
)
toronto_data.head()


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,Downtown Toronto,Regent Park,43.65426,-79.360636
2,Downtown Toronto,Ryerson,43.657162,-79.378937
3,Downtown Toronto,Garden District,43.657162,-79.378937
4,Downtown Toronto,St. James Town,43.651494,-79.375418


In [9]:
from geopy.geocoders import Nominatim
address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
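
geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` fails with an `AttributeError`. The sketch below shows one way to make that failure explicit. The helper `coordinates_for` and the stand-in geocoder are hypothetical and exist only so the example runs without network access.

```python
def coordinates_for(geocode, address):
    """Return (latitude, longitude), or raise if the geocoder finds nothing.

    `geocode` is any callable shaped like `geolocator.geocode`.
    """
    location = geocode(address)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude


# Stand-in geocoder (hypothetical) that knows a single address
class FakeLocation:
    latitude = 43.653963
    longitude = -79.387207


def fake_geocode(address):
    return FakeLocation() if address == "Toronto" else None


print(coordinates_for(fake_geocode, "Toronto"))
```

In the notebook, the same helper could be called as `coordinates_for(geolocator.geocode, address)`.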


## Display the Neighborhoods Before Clustering

In [10]:
import folium
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Explore Toronto Neighbourhoods

In [11]:
CLIENT_ID = 'L5GQQYIOZ2FY3KII3YW1QRKLFOLCR3SVJQ5P5B1HI4UMDISI' # your Foursquare ID
CLIENT_SECRET = 'CNVTYOSDSD1M4H5FHJ2YNOIPUWDHAAKA3UWZ2BQ2BWXMIAL3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: L5GQQYIOZ2FY3KII3YW1QRKLFOLCR3SVJQ5P5B1HI4UMDISI
CLIENT_SECRET:CNVTYOSDSD1M4H5FHJ2YNOIPUWDHAAKA3UWZ2BQ2BWXMIAL3
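
Hard-coding and printing API credentials exposes them to anyone who reads the notebook. A common alternative is to read them from environment variables. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are hypothetical and not defined by Foursquare or by this project.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping of environment variables.

    The variable names are hypothetical; use whatever your setup defines.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


# Demonstration with an explicit mapping instead of the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("Credentials loaded:", bool(client_id and client_secret))
```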


## Get Nearby Top 10 Venues

In [12]:
import requests
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 10
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
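
The function above indexes straight into the JSON response, so a failed request or a venue without categories raises a `KeyError` or `IndexError`. The sketch below shows a more defensive way to read the same payload layout. The helper name `extract_venues` and the sample payload are hand-written assumptions, not real API output.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response.

    Missing groups give an empty list; missing categories give None.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written payload that mimics the structure used above
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Tandem Coffee",
                           "location": {"lat": 43.653559, "lng": -79.361809},
                           "categories": [{"name": "Coffee Shop"}]}},
                {"venue": {"name": "Unlabelled Place",
                           "location": {"lat": 43.65, "lng": -79.36},
                           "categories": []}},
            ]
        }]
    }
}
print(extract_venues(sample))
```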

In [13]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

Harbourfront
Regent Park
Ryerson
Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide
King
Richmond
Dovercourt Village
Dufferin
Harbourfront East
Toronto Islands
Union Station
Little Portugal
Trinity
The Danforth West
Riverdale
Design Exchange
Toronto Dominion Centre
Brockton
Exhibition Place
Parkdale Village
The Beaches West
India Bazaar
Commerce Court
Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North
Forest Hill West
High Park
The Junction South
North Toronto West
The Annex
North Midtown
Yorkville
Parkdale
Roncesvalles
Davisville
Harbord
University of Toronto
Runnymede
Swansea
Moore Park
Summerhill East
Chinatown
Grange Park
Kensington Market
Deer Park
Forest Hill SE
Rathnelly
South Hill
Summerhill West
CN Tower
Bathurst Quay
Island airport
Harbourfront West
King and Spadina
Railway Lands
South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown
St. James Town
First Canadian Place
Underground city


In [14]:
print(toronto_venues.shape)
toronto_venues.head()

(687, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Toronto Cooper Koo Family Cherry St YMCA Centre,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


In [15]:
toronto_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,10,10,10,10,10,10
Bathurst Quay,10,10,10,10,10,10
Berczy Park,10,10,10,10,10,10
Brockton,10,10,10,10,10,10
Business Reply Mail Processing Centre 969 Eastern,10,10,10,10,10,10
CN Tower,10,10,10,10,10,10
Cabbagetown,10,10,10,10,10,10
Central Bay Street,10,10,10,10,10,10
Chinatown,10,10,10,10,10,10
Christie,10,10,10,10,10,10


## Feature Engineering

In [16]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 116 unique categories.


In [17]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Workshop,...,Swim School,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
toronto_onehot.shape

(687, 117)

In [19]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Workshop,...,Swim School,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.1,0.0,0.0,0.0
1,Bathurst Quay,0.1,0.1,0.1,0.2,0.1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
2,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.1,0.0,0.0,0.00,0.1,0.0,0.0,0.0
3,Brockton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
4,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
5,CN Tower,0.1,0.1,0.1,0.2,0.1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
6,Cabbagetown,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
7,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0
8,Chinatown,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.1,0.0,0.0
9,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0


In [20]:
toronto_grouped.shape

(73, 117)
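
To see why the group mean yields venue frequencies, consider the toy example below. Each venue becomes a row of 0/1 indicators, and averaging those rows per neighbourhood gives the share of venues in each category. The three-row frame is made up for illustration.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "B"],
    "Venue Category": ["Café", "Park", "Café"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Averaging the indicators gives each category's share per neighbourhood
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```

Neighbourhood A has one café and one park, so both columns come out as 0.5. Because each neighbourhood here has at most 10 venues, the frequencies in the real table are mostly multiples of 0.1.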

In [21]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
                venue  freq
0               Plaza   0.1
1        Concert Hall   0.1
2  Seafood Restaurant   0.1
3          Steakhouse   0.1
4         Opera House   0.1


----Bathurst Quay----
                venue  freq
0      Airport Lounge   0.2
1             Airport   0.1
2            Boutique   0.1
3  Airport Food Court   0.1
4               Plane   0.1


----Berczy Park----
             venue  freq
0     Concert Hall   0.1
1  Thai Restaurant   0.1
2             Park   0.1
3   Farmers Market   0.1
4           Museum   0.1


----Brockton----
            venue  freq
0     Coffee Shop   0.2
1             Gym   0.1
2             Bar   0.1
3            Café   0.1
4  Breakfast Spot   0.1


----Business Reply Mail Processing Centre 969 Eastern----
            venue  freq
0         Brewery   0.1
1  Farmers Market   0.1
2     Pizza Place   0.1
3      Skate Park   0.1
4   Garden Center   0.1


----CN Tower----
                venue  freq
0      Airport Lounge   0.2
...

                         venue  freq
0                    Gift Shop   0.2
1             Cuban Restaurant   0.1
2           Italian Restaurant   0.1
3  Eastern European Restaurant   0.1
4                   Restaurant   0.1


----Rosedale----
                venue  freq
0                Park  0.50
1          Playground  0.25
2               Trail  0.25
3             Airport  0.00
4  Mexican Restaurant  0.00


----Roselawn----
                venue  freq
0              Garden   1.0
1             Airport   0.0
2  Mexican Restaurant   0.0
3            Pharmacy   0.0
4           Pet Store   0.0


----Runnymede----
               venue  freq
0   Sushi Restaurant   0.2
1           Tea Room   0.1
2        Coffee Shop   0.1
3               Café   0.1
4  Fish & Chips Shop   0.1


----Ryerson----
              venue  freq
0       Pizza Place   0.1
1           Theater   0.1
2              Café   0.1
3    Clothing Store   0.1
4  Ramen Restaurant   0.1


----South Hill----
              venue  freq
...

In [22]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
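
For a purely numeric row, `Series.nlargest` is a compact alternative to sorting and slicing. The sketch below assumes the neighbourhood label has already been removed, so that only category frequencies remain. The toy values are made up.

```python
import pandas as pd

# Toy frequency row: category -> share of venues in one neighbourhood
row = pd.Series({"Café": 0.5, "Park": 0.3, "Gym": 0.2, "Bakery": 0.0})

# nlargest returns the n highest values in descending order
top_two = row.nlargest(2).index.tolist()
print(top_two)
```

Many categories share the same frequency in this data (for example, several at 0.1), so the order among tied categories depends on how ties are broken and should not be read as a ranking.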

## Generate Training Set for Clustering

In [23]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Opera House,Asian Restaurant,Speakeasy,Steakhouse,Hotel,Plaza,Seafood Restaurant,Concert Hall,Greek Restaurant,Vegetarian / Vegan Restaurant
1,Bathurst Quay,Airport Lounge,Airport,Boutique,Harbor / Marina,Plane,Coffee Shop,Airport Food Court,Airport Gate,Airport Terminal,American Restaurant
2,Berczy Park,Beer Bar,Concert Hall,Vegetarian / Vegan Restaurant,Park,Museum,Thai Restaurant,French Restaurant,Farmers Market,Liquor Store,Steakhouse
3,Brockton,Coffee Shop,Breakfast Spot,Furniture / Home Store,Caribbean Restaurant,Café,Bar,Italian Restaurant,Pet Store,Gym,French Restaurant
4,Business Reply Mail Processing Centre 969 Eastern,Pizza Place,Auto Workshop,Restaurant,Burrito Place,Farmers Market,Skate Park,Brewery,Fast Food Restaurant,Comic Shop,Garden Center
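
The column labels above rely on a three-item suffix list plus a fallback, which works for 1 to 10 but would label position 21 as "21th". If more columns were ever needed, a small helper could generate the suffixes. The function `ordinal` below is a hypothetical addition and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighbourhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns[:4])
```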


## Clustering Toronto Neighborhoods

In [24]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 3, 2, 1, 2, 3, 1, 1, 1, 1])
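
To make the clustering step concrete, the sketch below runs k-means on four made-up neighbourhood profiles with two categories each. The data and the choice of two clusters are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four toy neighbourhoods described by shares of [Coffee Shop, Park]
profiles = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

# n_init is set explicitly because its default differs between scikit-learn versions
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = model.labels_
print(labels)
```

The first two rows end up in one cluster and the last two in the other. The label numbers themselves are arbitrary, so cluster 0 in one run or city has no relation to cluster 0 in another.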

In [25]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,Harbourfront,43.65426,-79.360636,1,Breakfast Spot,Spa,Historic Site,Coffee Shop,Pub,Restaurant,Bakery,Park,Gym / Fitness Center,French Restaurant
1,Downtown Toronto,Regent Park,43.65426,-79.360636,1,Breakfast Spot,Spa,Historic Site,Coffee Shop,Pub,Restaurant,Bakery,Park,Gym / Fitness Center,French Restaurant
2,Downtown Toronto,Ryerson,43.657162,-79.378937,2,Comic Shop,Thai Restaurant,Burrito Place,Ramen Restaurant,Plaza,Café,Tea Room,Pizza Place,Theater,Clothing Store
3,Downtown Toronto,Garden District,43.657162,-79.378937,2,Comic Shop,Thai Restaurant,Burrito Place,Ramen Restaurant,Plaza,Café,Tea Room,Pizza Place,Theater,Clothing Store
4,Downtown Toronto,St. James Town,43.651494,-79.375418,1,Café,Restaurant,Japanese Restaurant,Italian Restaurant,Coffee Shop,Food Truck,Creperie,Indian Restaurant,Bakery,Jewelry Store
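
One caveat with this join: if a neighbourhood returned no venues, it has no row in the clustered table, so its cluster label becomes `NaN` and the whole column turns into floats. That would break code that uses the label as a list index. A minimal sketch of the cleanup, on made-up frames:

```python
import pandas as pd

neighbourhoods = pd.DataFrame({"Neighbourhood": ["A", "B", "C"]})
labelled = pd.DataFrame({"Neighbourhood": ["A", "B"], "Cluster Labels": [0, 1]})

# The left join leaves NaN for "C", which also turns the integer labels into floats
merged = neighbourhoods.join(labelled.set_index("Neighbourhood"), on="Neighbourhood")

# Drop unlabelled rows and restore integer labels
cleaned = (
    merged.dropna(subset=["Cluster Labels"])
    .astype({"Cluster Labels": int})
)
print(cleaned)
```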


## Display the Clustered Neighborhood

In [26]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Separate Each Cluster into Its Own DataFrame

In [27]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,Moore Park,Restaurant,Gym,Trail,Summer Camp,Yoga Studio,Farmers Market,Eastern European Restaurant,Dog Run,Diner,Dance Studio
50,Summerhill East,Restaurant,Gym,Trail,Summer Camp,Yoga Studio,Farmers Market,Eastern European Restaurant,Dog Run,Diner,Dance Studio


In [28]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harbourfront,Breakfast Spot,Spa,Historic Site,Coffee Shop,Pub,Restaurant,Bakery,Park,Gym / Fitness Center,French Restaurant
1,Regent Park,Breakfast Spot,Spa,Historic Site,Coffee Shop,Pub,Restaurant,Bakery,Park,Gym / Fitness Center,French Restaurant
4,St. James Town,Café,Restaurant,Japanese Restaurant,Italian Restaurant,Coffee Shop,Food Truck,Creperie,Indian Restaurant,Bakery,Jewelry Store
7,Central Bay Street,Coffee Shop,Sushi Restaurant,Italian Restaurant,Modern European Restaurant,Ramen Restaurant,Spa,Pizza Place,Gastropub,Cosmetics Shop,Creperie
8,Christie,Café,Grocery Store,Italian Restaurant,Diner,Candy Store,Restaurant,Coffee Shop,Arts & Crafts Store,Asian Restaurant,Cuban Restaurant
12,Dovercourt Village,Bakery,Pharmacy,Brewery,Bar,Supermarket,Café,Middle Eastern Restaurant,Music Venue,Gym / Fitness Center,Dessert Shop
13,Dufferin,Bakery,Pharmacy,Brewery,Bar,Supermarket,Café,Middle Eastern Restaurant,Music Venue,Gym / Fitness Center,Dessert Shop
21,Design Exchange,Coffee Shop,Pub,Gastropub,Gym,Gym / Fitness Center,Café,Restaurant,Hotel,Beer Bar,Dog Run
22,Toronto Dominion Centre,Coffee Shop,Pub,Gastropub,Gym,Gym / Fitness Center,Café,Restaurant,Hotel,Beer Bar,Dog Run
23,Brockton,Coffee Shop,Breakfast Spot,Furniture / Home Store,Caribbean Restaurant,Café,Bar,Italian Restaurant,Pet Store,Gym,French Restaurant


In [29]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Ryerson,Comic Shop,Thai Restaurant,Burrito Place,Ramen Restaurant,Plaza,Café,Tea Room,Pizza Place,Theater,Clothing Store
3,Garden District,Comic Shop,Thai Restaurant,Burrito Place,Ramen Restaurant,Plaza,Café,Tea Room,Pizza Place,Theater,Clothing Store
5,The Beaches,Trail,Neighborhood,Health Food Store,Pub,Yoga Studio,Fast Food Restaurant,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant
6,Berczy Park,Beer Bar,Concert Hall,Vegetarian / Vegan Restaurant,Park,Museum,Thai Restaurant,French Restaurant,Farmers Market,Liquor Store,Steakhouse
9,Adelaide,Opera House,Asian Restaurant,Speakeasy,Steakhouse,Hotel,Plaza,Seafood Restaurant,Concert Hall,Greek Restaurant,Vegetarian / Vegan Restaurant
10,King,Opera House,Asian Restaurant,Speakeasy,Steakhouse,Hotel,Plaza,Seafood Restaurant,Concert Hall,Greek Restaurant,Vegetarian / Vegan Restaurant
11,Richmond,Opera House,Asian Restaurant,Speakeasy,Steakhouse,Hotel,Plaza,Seafood Restaurant,Concert Hall,Greek Restaurant,Vegetarian / Vegan Restaurant
14,Harbourfront East,Park,Performing Arts Venue,Dessert Shop,Neighborhood,Sporting Goods Shop,Salad Place,Plaza,Lake,Hotel,Supermarket
15,Toronto Islands,Park,Performing Arts Venue,Dessert Shop,Neighborhood,Sporting Goods Shop,Salad Place,Plaza,Lake,Hotel,Supermarket
16,Union Station,Park,Performing Arts Venue,Dessert Shop,Neighborhood,Sporting Goods Shop,Salad Place,Plaza,Lake,Hotel,Supermarket


In [30]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
59,CN Tower,Airport Lounge,Airport,Boutique,Harbor / Marina,Plane,Coffee Shop,Airport Food Court,Airport Gate,Airport Terminal,American Restaurant
60,Bathurst Quay,Airport Lounge,Airport,Boutique,Harbor / Marina,Plane,Coffee Shop,Airport Food Court,Airport Gate,Airport Terminal,American Restaurant
61,Island airport,Airport Lounge,Airport,Boutique,Harbor / Marina,Plane,Coffee Shop,Airport Food Court,Airport Gate,Airport Terminal,American Restaurant
62,Harbourfront West,Airport Lounge,Airport,Boutique,Harbor / Marina,Plane,Coffee Shop,Airport Food Court,Airport Gate,Airport Terminal,American Restaurant
63,King and Spadina,Airport Lounge,Airport,Boutique,Harbor / Marina,Plane,Coffee Shop,Airport Food Court,Airport Gate,Airport Terminal,American Restaurant
64,Railway Lands,Airport Lounge,Airport,Boutique,Harbor / Marina,Plane,Coffee Shop,Airport Food Court,Airport Gate,Airport Terminal,American Restaurant
65,South Niagara,Airport Lounge,Airport,Boutique,Harbor / Marina,Plane,Coffee Shop,Airport Food Court,Airport Gate,Airport Terminal,American Restaurant


In [31]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Roselawn,Garden,Yoga Studio,Fish Market,Concert Hall,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Dessert Shop,Diner


## Code Repo : Clustering Manhattan Neighborhoods

#### Importing required datasets

In [32]:
import json
import pandas as pd
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [33]:
neighborhoods_data = newyork_data['features']

In [34]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [35]:
# Collect one dict per neighborhood, then build the DataFrame in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
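
As an alternative to the explicit loop, `pd.json_normalize` can flatten the nested feature records directly. The sketch below uses two hand-written features that mimic the structure read in the loop above. The name `neighborhoods_alt` is illustrative, and the real file may contain additional fields.

```python
import pandas as pd

# Hand-written features mimicking the structure of newyork_data['features']
features = [
    {"properties": {"borough": "Manhattan", "name": "Marble Hill"},
     "geometry": {"coordinates": [-73.91066, 40.876551]}},
    {"properties": {"borough": "Manhattan", "name": "Chinatown"},
     "geometry": {"coordinates": [-73.994279, 40.715618]}},
]

# Nested keys become dotted column names such as 'properties.name'
flat = pd.json_normalize(features)

neighborhoods_alt = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    # coordinates are stored as [longitude, latitude]
    "Latitude": flat["geometry.coordinates"].str[1],
    "Longitude": flat["geometry.coordinates"].str[0],
})
print(neighborhoods_alt)
```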

In [37]:
from geopy.geocoders import Nominatim
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


## Display New York Neighborhoods (Extra)

In [38]:
import folium

# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

## Extract Manhattan Neighborhoods from the New York Data

In [39]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [40]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


## Display Manhattan Neighborhood Before Clustering

In [41]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

## Exploring Manhattan Neighborhood

In [42]:
CLIENT_ID = 'L5GQQYIOZ2FY3KII3YW1QRKLFOLCR3SVJQ5P5B1HI4UMDISI' # your Foursquare ID
CLIENT_SECRET = 'CNVTYOSDSD1M4H5FHJ2YNOIPUWDHAAKA3UWZ2BQ2BWXMIAL3' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: L5GQQYIOZ2FY3KII3YW1QRKLFOLCR3SVJQ5P5B1HI4UMDISI
CLIENT_SECRET:CNVTYOSDSD1M4H5FHJ2YNOIPUWDHAAKA3UWZ2BQ2BWXMIAL3


In [43]:
neighborhood_latitude = manhattan_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = manhattan_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Marble Hill are 40.87655077879964, -73.91065965862981.


## Retrieve Top 10 Venue Details from each Neighborhood

In [44]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        LIMIT = 10
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [45]:
import requests
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [46]:
print(manhattan_venues.shape)
manhattan_venues.head()

(400, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
4,Marble Hill,40.876551,-73.91066,Blink Fitness Riverdale,40.877147,-73.905837,Gym


In [47]:
manhattan_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,10,10,10,10,10,10
Carnegie Hill,10,10,10,10,10,10
Central Harlem,10,10,10,10,10,10
Chelsea,10,10,10,10,10,10
Chinatown,10,10,10,10,10,10
Civic Center,10,10,10,10,10,10
Clinton,10,10,10,10,10,10
East Harlem,10,10,10,10,10,10
East Village,10,10,10,10,10,10
Financial District,10,10,10,10,10,10


In [48]:
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 153 unique categories.


## Feature Engineering

In [52]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

print (manhattan_onehot.shape)
manhattan_onehot.head()

(400, 154)


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,...,Tourist Information Center,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [53]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,...,Tourist Information Center,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0
2,Central Harlem,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,...,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0
4,Chinatown,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Civic Center,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Clinton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Financial District,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
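
Averaging the one-hot rows per neighborhood gives the share of that neighborhood's venues that fall in each category. With ten venues per neighborhood, every value in the table above is a multiple of 0.1. The toy sketch below shows the same two steps on made-up data; the neighborhood and category names are illustrative only.

```python
import pandas as pd

# Toy data (illustrative only): two neighborhoods, four venues each.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Venue Category": ["Park", "Park", "Gym", "Diner",
                       "Gym", "Gym", "Gym", "Park"],
})

# One indicator column per category, as in the notebook cell above.
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                            prefix="", prefix_sep="").astype(float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The per-neighborhood mean of 0/1 indicators is the category frequency.
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Here neighborhood A has two parks out of four venues, so its `Park` frequency is 0.5.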


In [54]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
            venue  freq
0            Park   0.2
1      Food Court   0.2
2           Plaza   0.1
3  Cooking School   0.1
4      Smoke Shop   0.1


----Carnegie Hill----
                  venue  freq
0  Gym / Fitness Center   0.2
1          Gourmet Shop   0.1
2           Pizza Place   0.1
3      Community Center   0.1
4             Bookstore   0.1


----Central Harlem----
               venue  freq
0       Cycle Studio   0.1
1                Bar   0.1
2          Jazz Club   0.1
3        Music Venue   0.1
4  French Restaurant   0.1


----Chelsea----
                           venue  freq
0                   Cupcake Shop   0.1
1  Vegetarian / Vegan Restaurant   0.1
2                      Speakeasy   0.1
3                      Nightclub   0.1
4                       Beer Bar   0.1


----Chinatown----
                venue  freq
0               Hotel   0.1
1   Korean Restaurant   0.1
2  Chinese Restaurant   0.1
3         Pizza Place   0.1
4              Museum   0.1

In [55]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
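
A quick way to see what this helper returns is to call it on a single hand-built row. The sketch below repeats the function so that it runs on its own and uses made-up frequencies with no ties. In the real data many categories share the same frequency (for example 0.1), and tied categories may come back in any order, so rankings among ties should not be over-interpreted.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the cell above: skip the Neighborhood label,
    # then sort the remaining category frequencies in descending order.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up frequencies with no ties, so the ranking is unambiguous.
toy_row = pd.Series({
    "Neighborhood": "Example Hood",
    "Park": 0.5,
    "Gym": 0.3,
    "Diner": 0.2,
})
top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```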

## Generate Training Data for Clustering

In [56]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions 4 and up take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Food Court,Cooking School,BBQ Joint,Shopping Mall,Smoke Shop,Gym,Plaza,Gym / Fitness Center,Hotel
1,Carnegie Hill,Gym / Fitness Center,Community Center,Bookstore,Gourmet Shop,Gym,Dance Studio,Pizza Place,Wine Bar,Wine Shop,Cycle Studio
2,Central Harlem,Bar,Ethiopian Restaurant,African Restaurant,American Restaurant,Jazz Club,Music Venue,Cycle Studio,Library,French Restaurant,Beer Bar
3,Chelsea,Indian Restaurant,Asian Restaurant,Nightclub,Café,Speakeasy,Beer Bar,Theater,Cupcake Shop,Vegetarian / Vegan Restaurant,Ice Cream Shop
4,Chinatown,Cocktail Bar,Sandwich Place,Spa,Pizza Place,Hotel,Chinese Restaurant,Korean Restaurant,Bakery,Museum,Greek Restaurant


## Clustering Manhattan Neighborhoods

In [57]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 0, 0, 4, 1, 1, 0, 1, 1])
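
The notebook fixes `kclusters = 5` without showing how that value was chosen. One common sanity check, which is not part of the original analysis, is to compare silhouette scores across several candidate values of k. The sketch below does this on synthetic data standing in for `manhattan_grouped_clustering`; `silhouette_by_k` is a hypothetical helper name and the data is generated purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(features, k_values, random_state=0):
    """Return {k: silhouette score} for each candidate cluster count."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores

# Synthetic stand-in for the real feature matrix (illustrative only):
# three well-separated groups of 20 points in 4 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [-5, 5, -5, 5]], dtype=float)
toy_features = np.vstack(
    [c + rng.normal(scale=0.3, size=(20, 4)) for c in centers]
)

scores = silhouette_by_k(toy_features, range(2, 7))
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

A higher silhouette score indicates tighter, better-separated clusters. On the real venue frequencies the scores may be low for every k, which would itself be worth reporting.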

In [58]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# merge manhattan_data with the sorted venue table to add cluster labels and top venues for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Coffee Shop,Yoga Studio,Gym,Department Store,Diner,Seafood Restaurant,Donut Shop,Tennis Stadium,Pizza Place,Gym / Fitness Center
1,Manhattan,Chinatown,40.715618,-73.994279,4,Cocktail Bar,Sandwich Place,Spa,Pizza Place,Hotel,Chinese Restaurant,Korean Restaurant,Bakery,Museum,Greek Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,0,Café,Italian Restaurant,Restaurant,Market,Park,Bakery,Ramen Restaurant,Deli / Bodega,Burger Joint,Dessert Shop
3,Manhattan,Inwood,40.867684,-73.92121,0,Pet Store,Wine Shop,Wine Bar,Bistro,Deli / Bodega,Café,Park,Mexican Restaurant,Farmers Market,Bakery
4,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Yoga Studio,Cocktail Bar,Wine Bar,Italian Restaurant,Café,Caribbean Restaurant,Mexican Restaurant,Bakery,Ethiopian Restaurant,Food Court


## Display Manhattan Neighborhoods after Clustering

In [59]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Segregate Each Cluster into Different DataFrames for Analysis

In [60]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Café,Italian Restaurant,Restaurant,Market,Park,Bakery,Ramen Restaurant,Deli / Bodega,Burger Joint,Dessert Shop
3,Inwood,Pet Store,Wine Shop,Wine Bar,Bistro,Deli / Bodega,Café,Park,Mexican Restaurant,Farmers Market,Bakery
5,Manhattanville,Italian Restaurant,Japanese Curry Restaurant,Café,Juice Bar,Ramen Restaurant,Park,BBQ Joint,Bike Trail,Supermarket,Climbing Gym
6,Central Harlem,Bar,Ethiopian Restaurant,African Restaurant,American Restaurant,Jazz Club,Music Venue,Cycle Studio,Library,French Restaurant,Beer Bar
7,East Harlem,Mexican Restaurant,Thai Restaurant,Donut Shop,French Restaurant,Park,Steakhouse,Beer Bar,Sandwich Place,Filipino Restaurant,Farmers Market
9,Yorkville,Wine Shop,Bagel Shop,Sandwich Place,Park,Coffee Shop,Asian Restaurant,Diner,Gym,Liquor Store,Filipino Restaurant
10,Lenox Hill,Pizza Place,College Academic Building,Salad Place,Taco Place,Liquor Store,Thai Restaurant,Restaurant,Smoke Shop,Health Food Store,Women's Store
11,Roosevelt Island,Greek Restaurant,Waterfront,Park,Residential Building (Apartment / Condo),Coffee Shop,Outdoors & Recreation,Farmers Market,School,Liquor Store,Sandwich Place
15,Midtown,Salad Place,Park,Smoke Shop,Sporting Goods Shop,French Restaurant,Chinese Restaurant,Gym,Spa,Hotel,Plaza
17,Chelsea,Indian Restaurant,Asian Restaurant,Nightclub,Café,Speakeasy,Beer Bar,Theater,Cupcake Shop,Vegetarian / Vegan Restaurant,Ice Cream Shop


In [61]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Coffee Shop,Yoga Studio,Gym,Department Store,Diner,Seafood Restaurant,Donut Shop,Tennis Stadium,Pizza Place,Gym / Fitness Center
14,Clinton,Theater,Gym / Fitness Center,Building,Café,Peruvian Restaurant,Comedy Club,Mediterranean Restaurant,Donut Shop,Food Court,Filipino Restaurant
19,East Village,Japanese Restaurant,Korean Restaurant,Beer Store,Scandinavian Restaurant,Dog Run,Bagel Shop,Coffee Shop,American Restaurant,Pet Café,Moroccan Restaurant
27,Gramercy,Coffee Shop,Yoga Studio,Gourmet Shop,Comedy Club,Spa,Food Truck,Playground,Beer Bar,Pizza Place,Gym / Fitness Center
29,Financial District,Gym / Fitness Center,Gym,Pizza Place,New American Restaurant,American Restaurant,Jewelry Store,Monument / Landmark,Coffee Shop,Doctor's Office,Dog Run
30,Carnegie Hill,Gym / Fitness Center,Community Center,Bookstore,Gourmet Shop,Gym,Dance Studio,Pizza Place,Wine Bar,Wine Shop,Cycle Studio
32,Civic Center,Spa,Dance Studio,French Restaurant,Cuban Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Gym / Fitness Center,Gym,General Entertainment,Falafel Restaurant
39,Hudson Yards,Gym / Fitness Center,Music School,American Restaurant,Art Gallery,Furniture / Home Store,Park,Asian Restaurant,Residential Building (Apartment / Condo),Theater,Supermarket


In [62]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Lincoln Square,Performing Arts Venue,Indie Movie Theater,Theater,Concert Hall,Library,Opera House,Dog Run,Filipino Restaurant,Farmers Market,Falafel Restaurant


In [63]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Midtown South,Korean Restaurant,Lingerie Store,Cosmetics Shop,Dessert Shop,Boutique,Italian Restaurant,Hotel,Food Truck,Cycle Studio,Dance Studio


In [64]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Cocktail Bar,Sandwich Place,Spa,Pizza Place,Hotel,Chinese Restaurant,Korean Restaurant,Bakery,Museum,Greek Restaurant
4,Hamilton Heights,Yoga Studio,Cocktail Bar,Wine Bar,Italian Restaurant,Café,Caribbean Restaurant,Mexican Restaurant,Bakery,Ethiopian Restaurant,Food Court
8,Upper East Side,Boutique,Vegetarian / Vegan Restaurant,Burrito Place,Chocolate Shop,Bakery,Italian Restaurant,Spa,Hotel,Hotel Bar,Cosmetics Shop
12,Upper West Side,Movie Theater,Greek Restaurant,Southern / Soul Food Restaurant,Italian Restaurant,American Restaurant,Juice Bar,Chinese Restaurant,Bookstore,Bagel Shop,Gift Shop
16,Murray Hill,Japanese Restaurant,Hawaiian Restaurant,Shanghai Restaurant,Tea Room,Cocktail Bar,Coffee Shop,Ramen Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Hotel
24,West Village,Italian Restaurant,Accessories Store,Mediterranean Restaurant,Coffee Shop,Chinese Restaurant,New American Restaurant,Boutique,Gourmet Shop,Speakeasy,Gym / Fitness Center
31,Noho,Cocktail Bar,Ice Cream Shop,Italian Restaurant,Hotel,Boutique,Deli / Bodega,Rock Club,Southern / Soul Food Restaurant,Gourmet Shop,Sandwich Place
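
Reading the cluster tables by eye is error-prone, so a small aggregation can help when comparing clusters. The sketch below finds the category that most often ranks first within each cluster. It uses only the first-ranked categories copied from the Cluster 1 and Cluster 4 tables as printed above, which may not list every neighborhood, and the approach is an assumption rather than the author's method.

```python
import pandas as pd

# First-ranked categories copied from the Cluster 1 and Cluster 4 tables above.
top_venues = pd.DataFrame({
    "Cluster Labels": [1] * 8 + [4] * 7,
    "1st Most Common Venue": [
        "Coffee Shop", "Theater", "Japanese Restaurant", "Coffee Shop",
        "Gym / Fitness Center", "Gym / Fitness Center", "Spa",
        "Gym / Fitness Center",
        "Cocktail Bar", "Yoga Studio", "Boutique", "Movie Theater",
        "Japanese Restaurant", "Italian Restaurant", "Cocktail Bar",
    ],
})

# For each cluster, pick the category that appears most often in first place.
dominant = (top_venues
            .groupby("Cluster Labels")["1st Most Common Venue"]
            .agg(lambda s: s.value_counts().idxmax()))
print(dominant)
```

In these rows, gym and fitness venues lead Cluster 1 and cocktail bars lead Cluster 4.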


## Conclusion Made By Comparing Clusters of Two Cities

    - Toronto
        1. Coffee shops are more numerous, which suggests that people there tend to drink more coffee.
        2. People prefer entertainment venues such as comic book stores and parks.
        3. Ice cream shops are more common in some areas.
    - Manhattan
        1. One cluster is dominated by pubs.
        2. People prefer entertainment venues such as pubs and parks.
        3. Gyms and fitness centres are more numerous.