# Business Problem

Suppose we are shifting from one city to another and want to shift to a neighborhood similar to which we are residing or as tourists we want to compare two cities and visit the city which has popular places according to our tastes. So in this project we will be comparing the places and selecting the best among them. This will be particularly helpful for tourism industry and people who are shfting due to various reasons.


# Data Section

In this analysis we will be using geospatial data of New york and Toronto city along with their neighborhoods and boroughs. In particular Mahattan from New york and Downtown toronto from Toronto will be narrowed for comparison of neighborhoods. Example of the sample data is provided below.

             Borough    Neighborhood       Latitude    Longitude

      35    Manhattan    Turtle Bay        40.752042    -73.967708

      36    Manhattan    Tudor City        40.746917    -73.971219

      37    Manhattan    Stuyvesant Town   40.731000    -73.974052

      38    Manhattan    Flatiron          40.739673    -73.990947

      39    Manhattan    Hudson Yards      40.756658    -74.000111

Next from the Foursquare API we will be getting the top locations for a particular neighborhood. We will be using clustering by Kmeans to cluster similar venues and folium to visualize the data on maps. 

In [1]:
# Necessary libraries to be used for the project

import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-2.0.0-pyh9f0ad1d_0



Downloading and Extracting Packages
geopy-2.0.0          | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ################################

In [2]:
pip install lxml

Collecting lxml
[?25l  Downloading https://files.pythonhosted.org/packages/79/37/d420b7fdc9a550bd29b8cfeacff3b38502d9600b09d7dfae9a69e623b891/lxml-4.5.2-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
[K     |████████████████████████████████| 5.5MB 8.3MB/s eta 0:00:01     |███████████████████████▌        | 4.1MB 8.3MB/s eta 0:00:01
[?25hInstalling collected packages: lxml
Successfully installed lxml-4.5.2
Note: you may need to restart the kernel to use updated packages.


# Methodology

Toronto city data extraction from wikipidea page

In [3]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [4]:
# Data cleaning 
#Drop a row if it contains a certain value
df1= df[df['Borough'] != 'Not assigned']
df1.head().reset_index(drop=True)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [5]:
#Since borough column does not have numerical value it will not change due to use of sum
Tdf = df1.groupby('Postal Code').agg(Borough=('Borough','sum'),Neighborhood=('Neighborhood', ', '.join)).reset_index()

# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough
Tdf['Neighborhood'].replace("Not assigned", Tdf["Borough"],inplace=True)
Tdf.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [6]:
ll=pd.read_csv('Geospatial_Coordinates.csv')
ll.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [7]:
lldf=pd.merge(Tdf,ll,on=['Postal Code'], how='inner')


In [8]:
#Filtering downtown toronto 
Downtown_t = lldf[lldf['Borough']=='Downtown Toronto'].reset_index(drop=True)
Downtown_t.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


New york city data 

In [9]:
ndf=pd.read_csv('New_York_data')


In [10]:
# Creating new Dataframe manhattan_data
manhattan_data = ndf[ndf['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0.1,Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,6,Manhattan,Marble Hill,40.876551,-73.91066
1,100,Manhattan,Chinatown,40.715618,-73.994279
2,101,Manhattan,Washington Heights,40.851903,-73.9369
3,102,Manhattan,Inwood,40.867684,-73.92121
4,103,Manhattan,Hamilton Heights,40.823604,-73.949688


# Foursquare API

In [11]:
CLIENT_ID = 'BLX0GALF5NO0MUY0X0BKRFRLOWH1TJQM45WLTV3OL4MZ25TR' # your Foursquare ID
CLIENT_SECRET = 'ZURXSS2P3O5SRCVBRXGE1HRZXEN5XH54Q4NN5KZRGFD345BE' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
limit = 20
print('Your credentails:')
print('CLIENT_ID: '+ CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: BLX0GALF5NO0MUY0X0BKRFRLOWH1TJQM45WLTV3OL4MZ25TR
CLIENT_SECRET:ZURXSS2P3O5SRCVBRXGE1HRZXEN5XH54Q4NN5KZRGFD345BE


In [12]:
# get the geographical coordinates of Downtown Toronto
latitude_downtown_toronto = 43.655115
longitude_downtown_toronto = -79.380219

#The geograpical coordinate of Manhattan are
latitude = 40.7900869 
longitude =-73.9598295


# Visualization

Downtown toronto

In [13]:
#create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=12)

# add markers to map
for lat, lng, label in zip(Downtown_t['Latitude'], Downtown_t['Longitude'], Downtown_t['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown_toronto)  
    
map_downtown_toronto


In [49]:
from folium import plugins
# create map of Downtown Toronto using latitude and longitude values
map_downtown_toronto = folium.Map(location=[latitude_downtown_toronto,longitude_downtown_toronto], zoom_start=12)
# instantiate a mark cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(map_downtown_toronto)
# add markers to map
for lat, lng, label in zip(Downtown_t['Latitude'], Downtown_t['Longitude'], Downtown_t['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(incidents)  
    
map_downtown_toronto


Manhattan

In [50]:
#let's visualizat Manhattan the neighborhoods in it.
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan


In [51]:
#create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

grouping = plugins.MarkerCluster().add_to(map_manhattan)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(grouping)  
    
map_manhattan

Exploring Neighborhoods in Downtown Toronto


In [52]:
#Let's create a function to repeat the process to all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


In [53]:
downtown_toronto_venues = getNearbyVenues(names=Downtown_t['Neighborhood'],
                                   latitudes=Downtown_t['Latitude'],
                                   longitudes=Downtown_t['Longitude'],
                                  )

Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Stn A PO Boxes
First Canadian Place, Underground city
Christie
Queen's Park, Ontario Provincial Government


In [54]:
# Let's check the size of the resulting dataframe
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head()

(358, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rosedale,43.679563,-79.377529,Rosedale Park,43.682328,-79.378934,Playground
1,Rosedale,43.679563,-79.377529,Whitney Park,43.682036,-79.373788,Park
2,Rosedale,43.679563,-79.377529,Alex Murray Parkette,43.6783,-79.382773,Park
3,Rosedale,43.679563,-79.377529,Milkman's Lane,43.676352,-79.373842,Trail
4,"St. James Town, Cabbagetown",43.667967,-79.367675,Cranberries,43.667843,-79.369407,Diner


In [55]:
# Let's check how many venues were returned for each neighborhood
downtown_toronto_venues.groupby('Neighborhood').count()


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,20,20,20,20,20,20
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",17,17,17,17,17,17
Central Bay Street,20,20,20,20,20,20
Christie,17,17,17,17,17,17
Church and Wellesley,20,20,20,20,20,20
"Commerce Court, Victoria Hotel",20,20,20,20,20,20
"First Canadian Place, Underground city",20,20,20,20,20,20
"Garden District, Ryerson",20,20,20,20,20,20
"Harbourfront East, Union Station, Toronto Islands",20,20,20,20,20,20
"Kensington Market, Chinatown, Grange Park",20,20,20,20,20,20


In [56]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))


There are 124 uniques categories.


Analyzing Each Neighborhood

In [57]:
# one hot encoding
downtown_toronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_toronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_toronto_onehot.columns[-1]] + list(downtown_toronto_onehot.columns[:-1])
downtown_toronto_onehot = downtown_toronto_onehot[fixed_columns]

downtown_toronto_onehot.head()

Unnamed: 0,Yoga Studio,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bakery,Bar,Basketball Stadium,Beer Bar,Belgian Restaurant,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Rec Center,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Creperie,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Distribution Center,Electronics Store,Farmers Market,Fish Market,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Gastropub,General Entertainment,General Travel,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,Hobby Shop,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Lake,Liquor Store,Martial Arts Dojo,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Monument / Landmark,Museum,Music Venue,Neighborhood,Nightclub,Organic Grocery,Park,Performing Arts Venue,Pet Store,Pizza Place,Plane,Playground,Plaza,Poke Place,Pub,Ramen Restaurant,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tailor Shop,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Rosedale,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"St. James Town, Cabbagetown",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [58]:
# Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
downtown_toronto_grouped = downtown_toronto_onehot.groupby('Neighborhood').mean().reset_index()

In [59]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [60]:
 #Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_toronto_grouped['Neighborhood']

for ind in np.arange(downtown_toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Farmers Market,Seafood Restaurant,Bakery,Jazz Club,Museum,Cocktail Bar,Coffee Shop,Breakfast Spot,Restaurant,Liquor Store
1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Coffee Shop,Sculpture Garden,Harbor / Marina
2,Central Bay Street,Coffee Shop,Gastropub,Spa,Park,Modern European Restaurant,Pizza Place,Miscellaneous Shop,Middle Eastern Restaurant,Bubble Tea Shop,Poke Place
3,Christie,Grocery Store,Café,Park,Athletics & Sports,Candy Store,Nightclub,Coffee Shop,Diner,Baby Store,Restaurant
4,Church and Wellesley,General Entertainment,Creperie,Park,Coffee Shop,Mexican Restaurant,Burger Joint,Bubble Tea Shop,Breakfast Spot,Pub,Ramen Restaurant
5,"Commerce Court, Victoria Hotel",Café,Coffee Shop,Bakery,Gastropub,Gym / Fitness Center,Ice Cream Shop,Japanese Restaurant,Museum,Pub,Restaurant
6,"First Canadian Place, Underground city",Café,Coffee Shop,Restaurant,Art Gallery,Seafood Restaurant,Bakery,Pub,Steakhouse,Gym,American Restaurant
7,"Garden District, Ryerson",Café,Clothing Store,Sporting Goods Shop,Music Venue,Burrito Place,Burger Joint,Plaza,Ramen Restaurant,Coffee Shop,College Rec Center
8,"Harbourfront East, Union Station, Toronto Islands",Park,Café,Plaza,Hotel,Supermarket,Salad Place,Skating Rink,Japanese Restaurant,Deli / Bodega,Ice Cream Shop
9,"Kensington Market, Chinatown, Grange Park",Café,Vietnamese Restaurant,Wine Bar,Bakery,Fish Market,Dessert Shop,Mexican Restaurant,Organic Grocery,Coffee Shop,Cocktail Bar


Clustering Neighborhoods

In [67]:
# set number of clusters
kclusters = 3

downtown_toronto_grouped_clustering = downtown_toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1], dtype=int32)

In [70]:
# neighborhood.
downtown_toronto_merged = Downtown_t

# add clustering labels
downtown_toronto_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
downtown_toronto_merged = downtown_toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,0,Park,Playground,Trail,Dance Studio,Cheese Shop,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym
1,M4X,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675,0,Café,Restaurant,General Entertainment,Indian Restaurant,Park,Caribbean Restaurant,Pet Store,Gastropub,Butcher,Pub
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0,General Entertainment,Creperie,Park,Coffee Shop,Mexican Restaurant,Burger Joint,Bubble Tea Shop,Breakfast Spot,Pub,Ramen Restaurant
3,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Bakery,Breakfast Spot,Park,Performing Arts Venue,Restaurant,Historic Site,Chocolate Shop,Farmers Market,Gym / Fitness Center
4,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Café,Clothing Store,Sporting Goods Shop,Music Venue,Burrito Place,Burger Joint,Plaza,Ramen Restaurant,Coffee Shop,College Rec Center


Examine Clusters


Cluster 1

In [72]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 0, 
    downtown_toronto_merged.columns[[2] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Rosedale,0,Park,Playground,Trail,Dance Studio,Cheese Shop,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym
1,"St. James Town, Cabbagetown",0,Café,Restaurant,General Entertainment,Indian Restaurant,Park,Caribbean Restaurant,Pet Store,Gastropub,Butcher,Pub
2,Church and Wellesley,0,General Entertainment,Creperie,Park,Coffee Shop,Mexican Restaurant,Burger Joint,Bubble Tea Shop,Breakfast Spot,Pub,Ramen Restaurant
4,"Garden District, Ryerson",0,Café,Clothing Store,Sporting Goods Shop,Music Venue,Burrito Place,Burger Joint,Plaza,Ramen Restaurant,Coffee Shop,College Rec Center
10,"Toronto Dominion Centre, Design Exchange",0,Coffee Shop,Café,Gym,Steakhouse,Pizza Place,Pub,Restaurant,Beer Bar,Deli / Bodega,Japanese Restaurant
11,"Commerce Court, Victoria Hotel",0,Café,Coffee Shop,Bakery,Gastropub,Gym / Fitness Center,Ice Cream Shop,Japanese Restaurant,Museum,Pub,Restaurant
12,"University of Toronto, Harbord",0,Bakery,Bookstore,Restaurant,Japanese Restaurant,Italian Restaurant,Café,College Gym,Comfort Food Restaurant,Sandwich Place,Beer Bar
16,"First Canadian Place, Underground city",0,Café,Coffee Shop,Restaurant,Art Gallery,Seafood Restaurant,Bakery,Pub,Steakhouse,Gym,American Restaurant
18,"Queen's Park, Ontario Provincial Government",0,Coffee Shop,Diner,Arts & Crafts Store,Burrito Place,Mexican Restaurant,Beer Bar,Creperie,Park,Smoothie Shop,Italian Restaurant


Cluster 2

In [73]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 1, 
    downtown_toronto_merged.columns[[2] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,"Regent Park, Harbourfront",1,Coffee Shop,Bakery,Breakfast Spot,Park,Performing Arts Venue,Restaurant,Historic Site,Chocolate Shop,Farmers Market,Gym / Fitness Center
5,St. James Town,1,Gastropub,Japanese Restaurant,Coffee Shop,Gym,BBQ Joint,Food Truck,Hotel,Italian Restaurant,Diner,Creperie
6,Berczy Park,1,Farmers Market,Seafood Restaurant,Bakery,Jazz Club,Museum,Cocktail Bar,Coffee Shop,Breakfast Spot,Restaurant,Liquor Store
7,Central Bay Street,1,Coffee Shop,Gastropub,Spa,Park,Modern European Restaurant,Pizza Place,Miscellaneous Shop,Middle Eastern Restaurant,Bubble Tea Shop,Poke Place
8,"Richmond, Adelaide, King",1,Hotel,Seafood Restaurant,Steakhouse,Café,Pizza Place,Coffee Shop,Plaza,Concert Hall,Restaurant,Smoke Shop
9,"Harbourfront East, Union Station, Toronto Islands",1,Park,Café,Plaza,Hotel,Supermarket,Salad Place,Skating Rink,Japanese Restaurant,Deli / Bodega,Ice Cream Shop
14,"CN Tower, King and Spadina, Railway Lands, Har...",1,Airport Service,Airport Lounge,Airport Terminal,Boat or Ferry,Airport,Airport Food Court,Airport Gate,Coffee Shop,Sculpture Garden,Harbor / Marina
15,Stn A PO Boxes,1,Farmers Market,Café,Cocktail Bar,Jazz Club,Museum,Restaurant,Beer Bar,Concert Hall,Seafood Restaurant,Hotel
17,Christie,1,Grocery Store,Café,Park,Athletics & Sports,Candy Store,Nightclub,Coffee Shop,Diner,Baby Store,Restaurant


Cluster 2

In [74]:
downtown_toronto_merged.loc[downtown_toronto_merged['Cluster Labels'] == 2, 
    downtown_toronto_merged.columns[[2] + list(range(5, downtown_toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,"Kensington Market, Chinatown, Grange Park",2,Café,Vietnamese Restaurant,Wine Bar,Bakery,Fish Market,Dessert Shop,Mexican Restaurant,Organic Grocery,Coffee Shop,Cocktail Bar


# Exploring Neighborhoods in Manhattan

In [75]:
#Let's create a function to repeat the process to all the neighborhoods in Toronto
def getNearbyVenues(names, latitudes,longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names,latitudes,longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


In [76]:
# Now write the code to run the above function on each neighborhood and create a new dataframe called manhattan_venues
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [77]:
# Let's check how many venues were returned for each neighborhood
manhattan_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Battery Park City,20,20,20,20,20,20
Carnegie Hill,20,20,20,20,20,20
Central Harlem,20,20,20,20,20,20
Chelsea,20,20,20,20,20,20
Chinatown,20,20,20,20,20,20
Civic Center,20,20,20,20,20,20
Clinton,20,20,20,20,20,20
East Harlem,20,20,20,20,20,20
East Village,20,20,20,20,20,20
Financial District,20,20,20,20,20,20


In [78]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 204 uniques categories.


Analyzing the Neighborhoods

In [79]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Baseball Field,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Café,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,Comedy Club,Community Center,Concert Hall,Convenience Store,Cooking School,Cosmetics Shop,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Duty-free Shop,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Filipino Restaurant,Fish Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hawaiian Restaurant,Health Food Store,Heliport,Historic Site,Hobby Shop,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Venue,Nail Salon,New American Restaurant,Noodle House,Opera House,Optical Shop,Outdoor Sculpture,Outdoors & Recreation,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Residential Building (Apartment / Condo),Restaurant,Rock Club,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Tourist Information Center,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [80]:
# Set Index
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()

In [81]:
# Let's put that into a pandas dataframe
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [82]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Food Court,Memorial Site,Sandwich Place,Cooking School,Burrito Place,Smoke Shop,Shopping Mall,Gym,Plaza
1,Carnegie Hill,Gym / Fitness Center,Coffee Shop,Italian Restaurant,Gym,Bagel Shop,Shoe Store,French Restaurant,Dance Studio,Bookstore,Gourmet Shop
2,Central Harlem,French Restaurant,American Restaurant,Bar,Beer Bar,Café,Food Truck,Music Venue,Boutique,Gym / Fitness Center,Library
3,Chelsea,Seafood Restaurant,Hotel,Theater,French Restaurant,Scenic Lookout,Fish Market,Speakeasy,Market,Coffee Shop,New American Restaurant
4,Chinatown,Spa,Sandwich Place,Chinese Restaurant,Greek Restaurant,Pizza Place,Dessert Shop,Noodle House,New American Restaurant,Sake Bar,Cocktail Bar
5,Civic Center,Spa,Yoga Studio,Taco Place,Cuban Restaurant,Park,Falafel Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Burrito Place,French Restaurant
6,Clinton,Theater,Gym / Fitness Center,Indie Theater,Peruvian Restaurant,Comedy Club,Café,Building,Pie Shop,French Restaurant,Sporting Goods Shop
7,East Harlem,Mexican Restaurant,Thai Restaurant,Cuban Restaurant,Beer Bar,Park,Doctor's Office,New American Restaurant,Sandwich Place,Pharmacy,French Restaurant
8,East Village,Vietnamese Restaurant,Dessert Shop,Ice Cream Shop,Coffee Shop,Scandinavian Restaurant,Korean Restaurant,Beer Store,Dog Run,Moroccan Restaurant,Bar
9,Financial District,Coffee Shop,Gym / Fitness Center,Restaurant,New American Restaurant,Salad Place,Monument / Landmark,Doctor's Office,Gym,Jewelry Store,Event Space



CLUSTERING NEIGHBORHOODS

Now we applied Machine Learning Technique “Clustering” to segment the neighborhoods in similar objects cluster. This will help to analyze from Tourist perspective and we can easily extract the Tourist places which are present on one of the clusters

In [83]:
# Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 2, 2, 4, 4, 3, 4, 1, 0], dtype=int32)

In [84]:
# Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood
manhattan_merged = manhattan_data

# add clustering labels
manhattan_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0.1,Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,6,Manhattan,Marble Hill,40.876551,-73.91066,1,Gym,Coffee Shop,Sandwich Place,Yoga Studio,Tennis Stadium,Miscellaneous Shop,Donut Shop,Discount Store,Diner,Department Store
1,100,Manhattan,Chinatown,40.715618,-73.994279,0,Spa,Sandwich Place,Chinese Restaurant,Greek Restaurant,Pizza Place,Dessert Shop,Noodle House,New American Restaurant,Sake Bar,Cocktail Bar
2,101,Manhattan,Washington Heights,40.851903,-73.9369,2,Café,Wine Shop,Park,Ramen Restaurant,Deli / Bodega,Breakfast Spot,Frozen Yogurt Shop,Cocktail Bar,Market,Coffee Shop
3,102,Manhattan,Inwood,40.867684,-73.92121,2,Bakery,Park,Yoga Studio,Deli / Bodega,Diner,Restaurant,Farmers Market,Café,Mexican Restaurant,Pharmacy
4,103,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Yoga Studio,Caribbean Restaurant,Cocktail Bar,Mexican Restaurant,Smoke Shop,Bakery,Mediterranean Restaurant,Wine Bar,Indian Restaurant,Bar




EXAMINE CLUSTERS

Cluster 1- manhattan

In [86]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0,
        manhattan_merged.columns[[2] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,0,Spa,Sandwich Place,Chinese Restaurant,Greek Restaurant,Pizza Place,Dessert Shop,Noodle House,New American Restaurant,Sake Bar,Cocktail Bar
9,Yorkville,0,Deli / Bodega,Italian Restaurant,Coffee Shop,Wine Shop,Bagel Shop,Beer Store,Liquor Store,Video Store,Hobby Shop,Asian Restaurant
11,Roosevelt Island,0,Food & Drink Shop,Gym,Monument / Landmark,Farmers Market,School,Sandwich Place,Dog Run,Liquor Store,Coffee Shop,Bus Line
13,Lincoln Square,0,Theater,Concert Hall,Performing Arts Venue,Indie Movie Theater,Cycle Studio,Fountain,Circus,Opera House,College Arts Building,Gym / Fitness Center
16,Murray Hill,0,Coffee Shop,Burger Joint,Japanese Restaurant,Hotel,Theater,Café,Museum,Speakeasy,Sandwich Place,Sushi Restaurant
18,Greenwich Village,0,Italian Restaurant,Cosmetics Shop,Café,Sushi Restaurant,Yoga Studio,Beer Bar,Coffee Shop,Sandwich Place,Clothing Store,Caribbean Restaurant
19,East Village,0,Vietnamese Restaurant,Dessert Shop,Ice Cream Shop,Coffee Shop,Scandinavian Restaurant,Korean Restaurant,Beer Store,Dog Run,Moroccan Restaurant,Bar
22,Little Italy,0,Ice Cream Shop,Wine Bar,Sandwich Place,Thai Restaurant,Café,French Restaurant,Chinese Restaurant,Salon / Barbershop,Salad Place,Clothing Store
27,Gramercy,0,Pizza Place,Coffee Shop,Yoga Studio,Bike Rental / Bike Share,Irish Pub,Gourmet Shop,Liquor Store,Mexican Restaurant,Filipino Restaurant,Playground
34,Sutton Place,0,Yoga Studio,Beer Garden,Grocery Store,Gym,Steakhouse,Bakery,Beer Store,Deli / Bodega,French Restaurant,Greek Restaurant


cluster 2 manhattan

In [87]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1,
        manhattan_merged.columns[[2] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,1,Gym,Coffee Shop,Sandwich Place,Yoga Studio,Tennis Stadium,Miscellaneous Shop,Donut Shop,Discount Store,Diner,Department Store
8,Upper East Side,1,Hotel,Gym / Fitness Center,Hotel Bar,Park,Cosmetics Shop,Optical Shop,Coffee Shop,Sandwich Place,Shoe Store,French Restaurant
10,Lenox Hill,1,Gym,Thai Restaurant,Middle Eastern Restaurant,Taco Place,Cycle Studio,Pizza Place,Dessert Shop,Restaurant,College Academic Building,Salad Place
15,Midtown,1,Hotel,Cuban Restaurant,Steakhouse,Sporting Goods Shop,Spa,Smoke Shop,Salad Place,Clothing Store,Park,Szechuan Restaurant
25,Manhattan Valley,1,Bar,Yoga Studio,Pizza Place,Park,Coffee Shop,Chinese Restaurant,Mexican Restaurant,Fried Chicken Joint,Grocery Store,Korean Restaurant
26,Morningside Heights,1,Bookstore,Park,American Restaurant,Coffee Shop,Ice Cream Shop,Pub,Outdoor Sculpture,Salad Place,Sandwich Place,Seafood Restaurant
28,Battery Park City,1,Park,Food Court,Memorial Site,Sandwich Place,Cooking School,Burrito Place,Smoke Shop,Shopping Mall,Gym,Plaza
30,Carnegie Hill,1,Gym / Fitness Center,Coffee Shop,Italian Restaurant,Gym,Bagel Shop,Shoe Store,French Restaurant,Dance Studio,Bookstore,Gourmet Shop
32,Civic Center,1,Spa,Yoga Studio,Taco Place,Cuban Restaurant,Park,Falafel Restaurant,Monument / Landmark,Molecular Gastronomy Restaurant,Burrito Place,French Restaurant
33,Midtown South,1,Korean Restaurant,Hotel,Snack Place,Gift Shop,Coffee Shop,Grocery Store,Clothing Store,Lingerie Store,Scenic Lookout,Building


Cluster 3 - Manhattan

In [89]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2,
        manhattan_merged.columns[[2] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,2,Café,Wine Shop,Park,Ramen Restaurant,Deli / Bodega,Breakfast Spot,Frozen Yogurt Shop,Cocktail Bar,Market,Coffee Shop
3,Inwood,2,Bakery,Park,Yoga Studio,Deli / Bodega,Diner,Restaurant,Farmers Market,Café,Mexican Restaurant,Pharmacy
12,Upper West Side,2,Italian Restaurant,American Restaurant,Seafood Restaurant,Nail Salon,Greek Restaurant,Juice Bar,Movie Theater,Pub,Bar,Bagel Shop
20,Lower East Side,2,Art Gallery,Cocktail Bar,Coffee Shop,Yoga Studio,Juice Bar,French Restaurant,Chinese Restaurant,Bubble Tea Shop,Clothing Store,Mediterranean Restaurant
21,Tribeca,2,American Restaurant,Park,Yoga Studio,Sushi Restaurant,Café,Coffee Shop,Cycle Studio,Dog Run,Hotel,Italian Restaurant
36,Tudor City,2,Park,Yoga Studio,Taco Place,Hawaiian Restaurant,Japanese Restaurant,Gym,Deli / Bodega,Pizza Place,Salad Place,Seafood Restaurant
37,Stuyvesant Town,2,Park,Bar,Bistro,Coffee Shop,Cocktail Bar,Gym / Fitness Center,Harbor / Marina,Gas Station,Baseball Field,Skating Rink
38,Flatiron,2,Cycle Studio,Wine Shop,Furniture / Home Store,Japanese Restaurant,Yoga Studio,Thai Restaurant,Miscellaneous Shop,Café,Sports Club,Salon / Barbershop
39,Hudson Yards,2,American Restaurant,Gym / Fitness Center,Hotel,Public Art,Pet Store,Music School,Furniture / Home Store,Supermarket,Cocktail Bar,Gym


#  RESULTS

After clustering the data of the respective neighborhoods, both cities (Boroughs) have venues which can be explored and attract the Tourists. The neighborhoods are much similar in features like Theaters, opera houses, food places, clubs, museums, parks etc. As far as concern to dissimilarity, it differs in terms of some unique places like historical places and monuments.
  


# Observations

When we compare the tourist places, we observe that the historical place is only situated in Downtown Toronto and the Monument or landmark venue is in Manhattan neighborhoods. Similarly, Airport facility, Harbor, Sculpture garden and Boat or ferry services are also available i**n Downtown Toronto while venues like Nightlife, Climbing gym and Museums are present in Manhattan. 

Another obsevation found by clustering is that it will help in setting up of warehouses (For vegetables,fruits or regular items for bakerys, restaurants, cafes etc) at suitable locations.

# Conclusion


The downtown Toronto and Manhattan neighborhoods have more like similar venues. As we know that every place is unique in its own way, so that’s argument is present in both neighborhoods. The dissimilarity exists in terms of some different venues and facilities but not on a larger extent.