# Introduction
Problem and discussion:
	
In September of 2017, Amazon announced via a request for proposal (RFP) that it was seeking a second North American headquarters in which it could potentially hire up to 50,000 employees. Over 200 cities and municipalities responded to Amazon’s RFP with a multitude of options and tax incentives.

In January of 2018 the list of 200+ was reduced to 20 finalists. Among those finalists were New York and Toronto. In November 2018 Amazon made its decision by splitting it’s second headquarters in two, one part in New York, the other in Northern Virginia. Although Toronto lost, they did make it to the finalist list and are still a very desirable location and competitor for other large technology corporations seeking to expand their operations not only to the East Coast, but also to one of Canada’s technology hubs. 

Assuming the role of a real estate developer, or that of a restaurateur, it may be interesting to know what kinds of venues are located in the areas that both New York and Toronto chose as potential sites for Amazon’s HQ2. Any location that could potentially host up to 50,000 people would definitely provide opportunities for opening or expanding small businesses like eateries, restaurants, bars, and more. In the case of New York, the neighborhood of Long Island City was chosen. As for Toronto, Harbourfront East was one of their primary choices. 

There are many factors that must have gone into Amazon’s decision as to what site to choose for HQ2. This project will only look at these neighborhoods from the perspective of what types of venues are located within these neighborhoods and rank how common certain venues are. We’ll then compare those of Long Island City and Harbourfront to see any similarities or differences. The resulting data may or may not help to inform future business planning or investment in Harbourfront based on what worked for Long Island City.


# Data
The data used for this project will be acquired from Wikipedia, publicly available geospatial data, as well as from FourSquare via their API.  The sources are listed below:

Toronto Data: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

New York Data: https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json

Foursquare: https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}

These data will be ingested, transformed, and assessed within IBM Watson Studio using a Python v3.5 notebook. First the New York data will be assessed to find the specific location parameters for Long Island City. Then the Foursquare API will be leveraged to bring in information pertaining to venues within Long Island City. At the end of the assessment we’ll see the top ranked venue types within Long Island City utilizing k-mean clustering. Once complete, the same process will be applied to the Toronto data. In this case though we will be focusing our assessment on Harbourfront East. The end result will be a list of most common venue types within both locations for us to compare and contrast. 

	  


## Importing libraries

In [66]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.17.0                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported.


## Load and transform Long Island City data

In [2]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
newyork_data

{'bbox': [-74.2492599487305,
  40.5033187866211,
  -73.7061614990234,
  40.9105606079102],
 'crs': {'properties': {'name': 'urn:ogc:def:crs:EPSG::4326'}, 'type': 'name'},
 'features': [{'geometry': {'coordinates': [-73.84720052054902,
     40.89470517661],
    'type': 'Point'},
   'geometry_name': 'geom',
   'id': 'nyu_2451_34572.1',
   'properties': {'annoangle': 0.0,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661],
    'borough': 'Bronx',
    'name': 'Wakefield',
    'stacked': 1},
   'type': 'Feature'},
  {'geometry': {'coordinates': [-73.82993910812398, 40.87429419303012],
    'type': 'Point'},
   'geometry_name': 'geom',
   'id': 'nyu_2451_34572.2',
   'properties': {'annoangle': 0.0,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.874294193

In [5]:
nyneighborhoods_data = newyork_data['features']

In [6]:
nyneighborhoods_data[0]

{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
nyneighborhoods = pd.DataFrame(columns=column_names)

In [8]:
for data in nyneighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    nyneighborhoods = nyneighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [9]:
nyneighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [10]:
lic_data = nyneighborhoods[nyneighborhoods['Neighborhood'] == 'Long Island City'].reset_index(drop=True)
lic_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Queens,Long Island City,40.750217,-73.939202


In [11]:
address = 'Long Island City, NY'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Long Island City are {}, {}.'.format(latitude, longitude))



The geograpical coordinate of Long Island City are 40.7415369, -73.9571249.


In [13]:
# create map of Long Island City using latitude and longitude values
map_lic = folium.Map(location=[latitude, longitude], zoom_start=15)

# add markers to map
for lat, lng, label in zip(lic_data['Latitude'], lic_data['Longitude'], lic_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_lic)  
    
map_lic

In [14]:
CLIENT_ID = '01LI5PMDR4JUIH03BSWHKV5PDWP2NA0AA502KV40ZEFRRI5F' # your Foursquare ID
CLIENT_SECRET = 'H5LD2541BHOWDKAL32S1O022JB124ATFEEWLHV1LVGV1NHMB' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 01LI5PMDR4JUIH03BSWHKV5PDWP2NA0AA502KV40ZEFRRI5F
CLIENT_SECRET:H5LD2541BHOWDKAL32S1O022JB124ATFEEWLHV1LVGV1NHMB


In [15]:
lic_data.loc[0, 'Neighborhood']

'Long Island City'

In [16]:
licneighborhood_latitude = lic_data.loc[0, 'Latitude'] # neighborhood latitude value
licneighborhood_longitude = lic_data.loc[0, 'Longitude'] # neighborhood longitude value

licneighborhood_name = lic_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               licneighborhood_latitude, 
                                                               licneighborhood_longitude))

Latitude and longitude values of Fox Hills are 40.75021734610528, -73.93920223915505.


In [17]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    licneighborhood_latitude, 
    licneighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=01LI5PMDR4JUIH03BSWHKV5PDWP2NA0AA502KV40ZEFRRI5F&client_secret=H5LD2541BHOWDKAL32S1O022JB124ATFEEWLHV1LVGV1NHMB&v=20180605&ll=40.75021734610528,-73.93920223915505&radius=500&limit=100'

In [18]:
licresults = requests.get(url).json()
licresults

{'meta': {'code': 200, 'requestId': '5c1655561ed2195d6a6638f5'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-5523e3a6498eef3943a338dc-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hotel_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1fa931735',
         'name': 'Hotel',
         'pluralName': 'Hotels',
         'primary': True,
         'shortName': 'Hotel'}],
       'id': '5523e3a6498eef3943a338dc',
       'location': {'address': '29-21 41st Avenue',
        'cc': 'US',
        'city': 'Long Island City',
        'country': 'United States',
        'distance': 195,
        'formattedAddress': ['29-21 41st Avenue',
         'Long Island City, NY 11101',
         'United States'],
        'labeledLatLngs': [{'label': 'display',
          'lat'

In [19]:
def get_category_type(row):
    try:
        liccategories_list = row['categories']
    except:
        liccategories_list = row['venue.categories']
        
    if len(liccategories_list) == 0:
        return None
    else:
        return liccategories_list[0]['name']

In [20]:
licvenues = licresults['response']['groups'][0]['items']
    
licnearby_venues = json_normalize(licvenues) # flatten JSON

# filter columns
licfiltered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
licnearby_venues =licnearby_venues.loc[:, licfiltered_columns]

# filter the category for each row
licnearby_venues['venue.categories'] = licnearby_venues.apply(get_category_type, axis=1)

# clean columns
licnearby_venues.columns = [col.split(".")[-1] for col in licnearby_venues.columns]

licnearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Hilton Garden Inn New York Long Island City/Ma...,Hotel,40.750216,-73.936886
1,Clever Blend Lic,Coffee Shop,40.750228,-73.939608
2,The Beast Next Door Cafe & Bar,Bar,40.748888,-73.940876
3,Commissary Market,Café,40.750511,-73.939734
4,Baker House Market,Convenience Store,40.752137,-73.939235


In [123]:
print('{} venues were returned by Foursquare.'.format(licnearby_venues.shape[0]))

64 venues were returned by Foursquare.


In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    licvenues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        licresults = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        licvenues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in licresults])

    licnearby_venues = pd.DataFrame([item for licvenue_list in licvenues_list for item in licvenue_list])
    licnearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(licnearby_venues)

In [23]:
lic_venues = getNearbyVenues(names=lic_data['Neighborhood'],
                                   latitudes=lic_data['Latitude'],
                                   longitudes=lic_data['Longitude'])

Long Island City


In [24]:
print(lic_venues.shape)
lic_venues.head()

(64, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Long Island City,40.750217,-73.939202,Hilton Garden Inn New York Long Island City/Ma...,40.750216,-73.936886,Hotel
1,Long Island City,40.750217,-73.939202,Clever Blend Lic,40.750228,-73.939608,Coffee Shop
2,Long Island City,40.750217,-73.939202,The Beast Next Door Cafe & Bar,40.748888,-73.940876,Bar
3,Long Island City,40.750217,-73.939202,Commissary Market,40.750511,-73.939734,Café
4,Long Island City,40.750217,-73.939202,Baker House Market,40.752137,-73.939235,Convenience Store


In [25]:
# one hot encoding
lic_onehot = pd.get_dummies(lic_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
lic_onehot['Neighborhood'] = lic_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [lic_onehot.columns[-1]] + list(lic_onehot.columns[:-1])
lic_onehot = lic_onehot[fixed_columns]

lic_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Art Museum,Arts & Entertainment,Bank,Bar,Bookstore,Boutique,Brewery,Burger Joint,Bus Station,Café,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Deli / Bodega,Donut Shop,Fast Food Restaurant,General Entertainment,Gym,Gym / Fitness Center,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Motel,Office,Pizza Place,Post Office,Residential Building (Apartment / Condo),Restaurant,Sandwich Place,Steakhouse,Supermarket,Thai Restaurant
0,Long Island City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Long Island City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Long Island City,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Long Island City,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Long Island City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [26]:
lic_onehot.shape

(64, 41)

In [27]:
lic_grouped = lic_onehot.groupby('Neighborhood').mean().reset_index()
lic_grouped

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Art Museum,Arts & Entertainment,Bank,Bar,Bookstore,Boutique,Brewery,Burger Joint,Bus Station,Café,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Deli / Bodega,Donut Shop,Fast Food Restaurant,General Entertainment,Gym,Gym / Fitness Center,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Motel,Office,Pizza Place,Post Office,Residential Building (Apartment / Condo),Restaurant,Sandwich Place,Steakhouse,Supermarket,Thai Restaurant
0,Long Island City,0.015625,0.015625,0.015625,0.015625,0.015625,0.046875,0.015625,0.015625,0.015625,0.015625,0.015625,0.078125,0.015625,0.015625,0.09375,0.015625,0.015625,0.03125,0.015625,0.015625,0.015625,0.015625,0.015625,0.09375,0.015625,0.015625,0.015625,0.015625,0.0625,0.015625,0.015625,0.015625,0.078125,0.015625,0.015625,0.015625,0.015625,0.015625,0.015625,0.015625


In [28]:
lic_grouped.shape

(1, 41)

In [29]:
num_top_venues = 5

for hood in lic_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = lic_grouped[lic_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Long Island City----
                venue  freq
0               Hotel  0.09
1         Coffee Shop  0.09
2         Pizza Place  0.08
3                Café  0.08
4  Mexican Restaurant  0.06




In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [125]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
licneighborhoods_venues_sorted = pd.DataFrame(columns=columns)
licneighborhoods_venues_sorted['Neighborhood'] = lic_grouped['Neighborhood']

for ind in np.arange(lic_grouped.shape[0]):
    licneighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(lic_grouped.iloc[ind, :], num_top_venues)

licneighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Long Island City,Coffee Shop,Hotel,Pizza Place,Café,Mexican Restaurant,Bar,Donut Shop,Thai Restaurant,Bus Station,Convenience Store


In [85]:
# set number of clusters
kclusters = 1

lic_grouped_clustering = lic_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lic_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0], dtype=int32)

In [33]:
lic_merged = lic_data

# add clustering labels
lic_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
lic_merged = lic_merged.join(licneighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

lic_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Queens,Long Island City,40.750217,-73.939202,0,Coffee Shop,Hotel,Pizza Place,Café,Mexican Restaurant,Bar,Donut Shop,Thai Restaurant,Bus Station,Convenience Store


In [34]:
# create map
licmap_clusters = folium.Map(location=[latitude, longitude], zoom_start=15)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(lic_merged['Latitude'], lic_merged['Longitude'], lic_merged['Neighborhood'], lic_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(licmap_clusters)
       
licmap_clusters

In [35]:
lic_merged.loc[lic_merged['Cluster Labels'] == 0, lic_merged.columns[[1] + list(range(5, lic_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Long Island City,Coffee Shop,Hotel,Pizza Place,Café,Mexican Restaurant,Bar,Donut Shop,Thai Restaurant,Bus Station,Convenience Store


## Load and transform Downtown Toronto data

In [36]:
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M", header=0, keep_default_na=False)
pct = df[0]
pct.columns = ['PostalCode','Borough', 'Neighborhood']
pct1 = pct.query('Borough ! = "Not assigned"').reset_index(drop=True)
df1 = pd.read_csv("https://cocl.us/Geospatial_data", header=0)
df1.columns = ['PostalCode','Latitude', 'Longitude']
df2 = pd.merge(df1, pct1, on='PostalCode')
toronto_data = df2[['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']]
column_names = ['PostalCode', 'Borough', 'Neighborhood', 'Latitude', 'Longitude']
neighborhoods = pd.DataFrame(columns=column_names)
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Rouge,43.806686,-79.194353
1,M1B,Scarborough,Malvern,43.806686,-79.194353
2,M1C,Scarborough,Highland Creek,43.784535,-79.160497
3,M1C,Scarborough,Rouge Hill,43.784535,-79.160497
4,M1C,Scarborough,Port Union,43.784535,-79.160497


In [37]:
he_data = toronto_data[toronto_data['Neighborhood'] == 'Harbourfront East'].reset_index(drop=True)
he_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5J,Downtown Toronto,Harbourfront East,43.640816,-79.381752


In [38]:
address = 'Harbourfront East, ON'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Harbourfront East are {}, {}.'.format(latitude, longitude))



The geograpical coordinate of Harbourfront East are 43.6400801, -79.3801495.


In [39]:
# create map of Harbourfront East using latitude and longitude values
map_he = folium.Map(location=[latitude, longitude], zoom_start=14)

# add markers to map
for lat, lng, label in zip(he_data['Latitude'], he_data['Longitude'], he_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_he)  
    
map_he

In [40]:
he_data.loc[0, 'Neighborhood']

'Harbourfront East'

In [41]:
heneighborhood_latitude = he_data.loc[0, 'Latitude'] # neighborhood latitude value
heneighborhood_longitude = he_data.loc[0, 'Longitude'] # neighborhood longitude value

heneighborhood_name = he_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               heneighborhood_latitude, 
                                                               heneighborhood_longitude))

Latitude and longitude values of Fox Hills are 43.6408157, -79.38175229999999.


In [42]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    heneighborhood_latitude, 
    heneighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=01LI5PMDR4JUIH03BSWHKV5PDWP2NA0AA502KV40ZEFRRI5F&client_secret=H5LD2541BHOWDKAL32S1O022JB124ATFEEWLHV1LVGV1NHMB&v=20180605&ll=43.6408157,-79.38175229999999&radius=500&limit=100'

In [43]:
heresults = requests.get(url).json()
heresults

{'meta': {'code': 200, 'requestId': '5c16558edb04f54e6c05f785'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4bfaa3494a67c928d08528cf-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/neighborhood_',
          'suffix': '.png'},
         'id': '4f2a25ac4b909258e854f55f',
         'name': 'Neighborhood',
         'pluralName': 'Neighborhoods',
         'primary': True,
         'shortName': 'Neighborhood'}],
       'id': '4bfaa3494a67c928d08528cf',
       'location': {'cc': 'CA',
        'city': 'Toronto',
        'country': 'Canada',
        'distance': 167,
        'formattedAddress': ['Toronto ON', 'Canada'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.639525632239106,
          'lng': -79.38068838052389}],
        'lat': 43.639

In [44]:
def get_category_type(row):
    try:
        hecategories_list = row['categories']
    except:
        hecategories_list = row['venue.categories']
        
    if len(hecategories_list) == 0:
        return None
    else:
        return hecategories_list[0]['name']

In [45]:
hevenues = heresults['response']['groups'][0]['items']
    
henearby_venues = json_normalize(hevenues) # flatten JSON

# filter columns
hefiltered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
henearby_venues =henearby_venues.loc[:, hefiltered_columns]

# filter the category for each row
henearby_venues['venue.categories'] = henearby_venues.apply(get_category_type, axis=1)

# clean columns
henearby_venues.columns = [col.split(".")[-1] for col in henearby_venues.columns]

henearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Harbourfront,Neighborhood,43.639526,-79.380688
1,Roundhouse Park,Park,43.641745,-79.384279
2,Lake Ontario,Lake,43.639398,-79.379589
3,iQ Food Co,Salad Place,43.642851,-79.382081
4,Longo's Maple Leaf Square,Supermarket,43.642517,-79.381393


In [46]:
print('{} venues were returned by Foursquare.'.format(henearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [47]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    hevenues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        heresults = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        hevenues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in heresults])

    henearby_venues = pd.DataFrame([item for hevenue_list in hevenues_list for item in hevenue_list])
    henearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(henearby_venues)

In [48]:
he_venues = getNearbyVenues(names=he_data['Neighborhood'],
                                   latitudes=he_data['Latitude'],
                                   longitudes=he_data['Longitude'])

Harbourfront East


In [49]:
print(he_venues.shape)
he_venues.head()

(100, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront East,43.640816,-79.381752,Harbourfront,43.639526,-79.380688,Neighborhood
1,Harbourfront East,43.640816,-79.381752,Roundhouse Park,43.641745,-79.384279,Park
2,Harbourfront East,43.640816,-79.381752,Lake Ontario,43.639398,-79.379589,Lake
3,Harbourfront East,43.640816,-79.381752,iQ Food Co,43.642851,-79.382081,Salad Place
4,Harbourfront East,43.640816,-79.381752,Longo's Maple Leaf Square,43.642517,-79.381393,Supermarket


In [50]:
# one hot encoding
he_onehot = pd.get_dummies(he_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
he_onehot['Neighborhood'] = he_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [he_onehot.columns[-1]] + list(he_onehot.columns[:-1])
he_onehot = he_onehot[fixed_columns]

he_onehot.head()

Unnamed: 0,Wine Bar,Aquarium,Art Gallery,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bistro,Brewery,Bubble Tea Shop,Café,Chinese Restaurant,Coffee Shop,Dance Studio,Deli / Bodega,Event Space,Fast Food Restaurant,Fried Chicken Joint,History Museum,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Lake,Lounge,Monument / Landmark,Music Venue,Neighborhood,New American Restaurant,Office,Park,Performing Arts Venue,Pizza Place,Plaza,Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Harbourfront East,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Harbourfront East,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,Harbourfront East,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Harbourfront East,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Harbourfront East,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [51]:
he_onehot.shape

(100, 56)

In [52]:
he_grouped = he_onehot.groupby('Neighborhood').mean().reset_index()
he_grouped

Unnamed: 0,Neighborhood,Wine Bar,Aquarium,Art Gallery,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bistro,Brewery,Bubble Tea Shop,Café,Chinese Restaurant,Coffee Shop,Dance Studio,Deli / Bodega,Event Space,Fast Food Restaurant,Fried Chicken Joint,History Museum,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Lake,Lounge,Monument / Landmark,Music Venue,New American Restaurant,Office,Park,Performing Arts Venue,Pizza Place,Plaza,Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar
0,Harbourfront East,0.01,0.05,0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.03,0.01,0.04,0.01,0.14,0.01,0.01,0.01,0.01,0.02,0.02,0.05,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.01,0.01,0.02,0.01,0.01,0.02,0.01,0.04,0.01,0.03,0.01,0.01,0.03,0.01,0.01,0.01,0.01,0.02,0.02,0.01,0.01,0.01,0.01,0.01,0.02,0.01,0.01


In [53]:
he_grouped.shape

(1, 56)

In [54]:
num_top_venues = 5

for hood in he_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = he_grouped[he_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Harbourfront East----
         venue  freq
0  Coffee Shop  0.14
1        Hotel  0.05
2     Aquarium  0.05
3         Café  0.04
4  Pizza Place  0.04




In [55]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [56]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
heneighborhoods_venues_sorted = pd.DataFrame(columns=columns)
heneighborhoods_venues_sorted['Neighborhood'] = he_grouped['Neighborhood']

for ind in np.arange(lic_grouped.shape[0]):
    heneighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(he_grouped.iloc[ind, :], num_top_venues)

heneighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harbourfront East,Coffee Shop,Aquarium,Hotel,Café,Pizza Place,Brewery,Italian Restaurant,Restaurant,Scenic Lookout,Music Venue


In [57]:
# set number of clusters
kclusters = 1

he_grouped_clustering = he_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lic_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0], dtype=int32)

In [58]:
he_merged = he_data

# add clustering labels
he_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
he_merged = he_merged.join(heneighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

he_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5J,Downtown Toronto,Harbourfront East,43.640816,-79.381752,0,Coffee Shop,Aquarium,Hotel,Café,Pizza Place,Brewery,Italian Restaurant,Restaurant,Scenic Lookout,Music Venue


In [59]:
# create map
hemap_clusters = folium.Map(location=[latitude, longitude], zoom_start=15)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(he_merged['Latitude'], he_merged['Longitude'], he_merged['Neighborhood'], he_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(hemap_clusters)
       
hemap_clusters

In [60]:
he_merged.loc[he_merged['Cluster Labels'] == 0, he_merged.columns[[1] + list(range(5, he_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Aquarium,Hotel,Café,Pizza Place,Brewery,Italian Restaurant,Restaurant,Scenic Lookout,Music Venue


In [61]:
lic_merged.loc[lic_merged['Cluster Labels'] == 0, lic_merged.columns[[1] + list(range(5, lic_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Long Island City,Coffee Shop,Hotel,Pizza Place,Café,Mexican Restaurant,Bar,Donut Shop,Thai Restaurant,Bus Station,Convenience Store


## Comparing data frames side by side

In [140]:
frames = [lic_venues, he_venues]
lic_he_combined = pd.concat(frames)

In [141]:
lic_he_combined.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Long Island City,40.750217,-73.939202,Hilton Garden Inn New York Long Island City/Ma...,40.750216,-73.936886,Hotel
1,Long Island City,40.750217,-73.939202,Clever Blend Lic,40.750228,-73.939608,Coffee Shop
2,Long Island City,40.750217,-73.939202,The Beast Next Door Cafe & Bar,40.748888,-73.940876,Bar
3,Long Island City,40.750217,-73.939202,Commissary Market,40.750511,-73.939734,Café
4,Long Island City,40.750217,-73.939202,Baker House Market,40.752137,-73.939235,Convenience Store


In [142]:
# one hot encoding
lic_he_combined_onehot = pd.get_dummies(lic_he_combined[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
lic_he_combined_onehot['Neighborhood'] = lic_he_combined['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [lic_he_combined_onehot.columns[-1]] + list(lic_he_combined_onehot.columns[:-1])
lic_he_combined_onehot = lic_he_combined_onehot[fixed_columns]

lic_he_combined_onehot.head()

Unnamed: 0,Wine Bar,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Entertainment,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bistro,Bookstore,Boutique,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Café,Chinese Restaurant,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Donut Shop,Event Space,Fast Food Restaurant,Fried Chicken Joint,General Entertainment,Gym,Gym / Fitness Center,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Lake,Lounge,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Motel,Music Venue,Neighborhood,New American Restaurant,Office,Park,Performing Arts Venue,Pizza Place,Plaza,Post Office,Residential Building (Apartment / Condo),Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Long Island City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Long Island City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Long Island City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Long Island City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Long Island City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [143]:
lic_he_combined_onehot.shape

(164, 78)

In [144]:
lic_he_combined_grouped = lic_he_combined_onehot.groupby('Neighborhood').mean().reset_index()
lic_he_combined_grouped

Unnamed: 0,Neighborhood,Wine Bar,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Entertainment,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bistro,Bookstore,Boutique,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Café,Chinese Restaurant,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Donut Shop,Event Space,Fast Food Restaurant,Fried Chicken Joint,General Entertainment,Gym,Gym / Fitness Center,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Lake,Lounge,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Motel,Music Venue,New American Restaurant,Office,Park,Performing Arts Venue,Pizza Place,Plaza,Post Office,Residential Building (Apartment / Condo),Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar
0,Harbourfront East,0.01,0.0,0.05,0.01,0.0,0.0,0.02,0.01,0.02,0.01,0.01,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.14,0.0,0.01,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.05,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.02,0.01,0.04,0.01,0.0,0.0,0.03,0.01,0.01,0.03,0.01,0.01,0.01,0.01,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.01,0.02,0.01,0.01
1,Long Island City,0.0,0.015625,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.046875,0.0,0.0,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.015625,0.078125,0.0,0.015625,0.015625,0.09375,0.015625,0.0,0.015625,0.03125,0.0,0.015625,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.09375,0.0,0.015625,0.015625,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0625,0.015625,0.0,0.015625,0.0,0.0,0.015625,0.0,0.0,0.078125,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0


In [145]:
lic_he_combined_grouped.shape

(2, 78)

In [146]:
num_top_venues = 5

for hood in lic_he_combined_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = lic_he_combined_grouped[lic_he_combined_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Harbourfront East----
         venue  freq
0  Coffee Shop  0.14
1     Aquarium  0.05
2        Hotel  0.05
3  Pizza Place  0.04
4         Café  0.04


----Long Island City----
                venue  freq
0               Hotel  0.09
1         Coffee Shop  0.09
2                Café  0.08
3         Pizza Place  0.08
4  Mexican Restaurant  0.06




In [147]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [161]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
lic_he_combined_grouped_sorted = pd.DataFrame(columns=columns)
lic_he_combined_grouped_sorted['Neighborhood'] = lic_he_combined_grouped['Neighborhood']

for ind in np.arange(lic_he_combined_grouped.shape[0]):
    lic_he_combined_grouped_sorted.iloc[ind, 1:] = return_most_common_venues(lic_he_combined_grouped.iloc[ind, 2:], num_top_venues)

lic_he_combined_grouped_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harbourfront East,Coffee Shop,Hotel,Aquarium,Pizza Place,Café,Restaurant,Brewery,Scenic Lookout,Italian Restaurant,Sports Bar
1,Long Island City,Coffee Shop,Hotel,Pizza Place,Café,Mexican Restaurant,Bar,Donut Shop,Cocktail Bar,Mediterranean Restaurant,Convenience Store


## Cluster the two Neighborhoods with k-means

In [192]:
# set number of clusters
kclusters = 2

lic_he_combined_grouped_clustering = lic_he_combined_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lic_he_combined_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1], dtype=int32)

In [194]:
lic_he_merged = lic_he_combined_grouped

# add clustering labels
lic_he_merged['Cluster Labels'] = kmeans.labels_
3 
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
lic_he_merged = lic_he_merged.join(lic_he_combined_grouped_sorted.set_index('Neighborhood'), on='Neighborhood')

lic_he_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Wine Bar,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Entertainment,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bistro,Bookstore,Boutique,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Café,Chinese Restaurant,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Donut Shop,Event Space,Fast Food Restaurant,Fried Chicken Joint,General Entertainment,Gym,Gym / Fitness Center,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Lake,Lounge,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Motel,Music Venue,New American Restaurant,Office,Park,Performing Arts Venue,Pizza Place,Plaza,Post Office,Residential Building (Apartment / Condo),Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Harbourfront East,0.01,0.0,0.05,0.01,0.0,0.0,0.02,0.01,0.02,0.01,0.01,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.14,0.0,0.01,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.05,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.02,0.01,0.04,0.01,0.0,0.0,0.03,0.01,0.01,0.03,0.01,0.01,0.01,0.01,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.01,0.02,0.01,0.01,0,Coffee Shop,Hotel,Aquarium,Pizza Place,Café,Restaurant,Brewery,Scenic Lookout,Italian Restaurant,Sports Bar
1,Long Island City,0.0,0.015625,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.046875,0.0,0.0,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.015625,0.078125,0.0,0.015625,0.015625,0.09375,0.015625,0.0,0.015625,0.03125,0.0,0.015625,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.09375,0.0,0.015625,0.015625,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0625,0.015625,0.0,0.015625,0.0,0.0,0.015625,0.0,0.0,0.078125,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,1,Coffee Shop,Hotel,Pizza Place,Café,Mexican Restaurant,Bar,Donut Shop,Cocktail Bar,Mediterranean Restaurant,Convenience Store


In [195]:
lic_he_merged.loc[lic_he_merged['Cluster Labels'] == 0, lic_he_merged.columns[[1] + list(range(5, lic_he_merged.shape[1]))]]

Unnamed: 0,Wine Bar,Art Museum,Arts & Entertainment,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bistro,Bookstore,Boutique,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Café,Chinese Restaurant,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Donut Shop,Event Space,Fast Food Restaurant,Fried Chicken Joint,General Entertainment,Gym,Gym / Fitness Center,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Lake,Lounge,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Motel,Music Venue,New American Restaurant,Office,Park,Performing Arts Venue,Pizza Place,Plaza,Post Office,Residential Building (Apartment / Condo),Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0.01,0.0,0.0,0.02,0.01,0.02,0.01,0.01,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.14,0.0,0.01,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.05,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.02,0.01,0.04,0.01,0.0,0.0,0.03,0.01,0.01,0.03,0.01,0.01,0.01,0.01,0.02,0.02,0.01,0.01,0.01,0.01,0.0,0.01,0.02,0.01,0.01,0,Coffee Shop,Hotel,Aquarium,Pizza Place,Café,Restaurant,Brewery,Scenic Lookout,Italian Restaurant,Sports Bar


In [196]:
lic_he_merged.loc[lic_he_merged['Cluster Labels'] == 1, lic_he_merged.columns[[1] + list(range(5, lic_he_merged.shape[1]))]]

Unnamed: 0,Wine Bar,Art Museum,Arts & Entertainment,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Bistro,Bookstore,Boutique,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Café,Chinese Restaurant,Climbing Gym,Cocktail Bar,Coffee Shop,Convenience Store,Dance Studio,Deli / Bodega,Donut Shop,Event Space,Fast Food Restaurant,Fried Chicken Joint,General Entertainment,Gym,Gym / Fitness Center,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Lake,Lounge,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Motel,Music Venue,New American Restaurant,Office,Park,Performing Arts Venue,Pizza Place,Plaza,Post Office,Residential Building (Apartment / Condo),Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skating Rink,Smoothie Shop,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,0.0,0.015625,0.015625,0.0,0.015625,0.046875,0.0,0.0,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.015625,0.078125,0.0,0.015625,0.015625,0.09375,0.015625,0.0,0.015625,0.03125,0.0,0.015625,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.09375,0.0,0.015625,0.015625,0.0,0.015625,0.0,0.0,0.0,0.015625,0.0625,0.015625,0.0,0.015625,0.0,0.0,0.015625,0.0,0.0,0.078125,0.0,0.015625,0.015625,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,1,Coffee Shop,Hotel,Pizza Place,Café,Mexican Restaurant,Bar,Donut Shop,Cocktail Bar,Mediterranean Restaurant,Convenience Store


# Results
The results are listed as following:

1. The most common venues located within Long Island City are Coffee Shops, Hotels, Pizza places, Cafes, Mexican Restaurants, Bars, Donut Shops, Cocktail Bars, Mediterranean Restaurants, and Convenience stores. 

2. The most common venues located within Harbourfront East are Coffee Shops, Hotels, Aquariums, Pizza places, Cafes, Restaurants (general), Breweries, Scenic lookouts, Italian Restaurants, and Sports Bars.

3. Long Island City and Harbourfront East appear to be similar in that they both have a similar concentration of Coffee shops, hotels, Pizza places, and Cafes.



# Discussion

It would appear that Long Island City and Harbourfront East are not too dissimilar compared to the venue data we analyzed. The largest difference noted was the prevalence of aquariums and scenic lookouts located within Harbourfront East. These locations by their nature may be very difficult to repurpose or otherwise utilize for retail or hotel space. Also, there were a larger number of unclassified restaurants listed within Harbourfront East. It may be work further investigation to determine what types of restaurants these are and what their varieties may be.  

# Conclusion
In conclusion it does appear that Harbourfront East and Long Island City are very similar, but with some unique differences in regard to the concentration of aquariums and scenic lookouts in Harbourfront East. It would be worth further exploration to determine if these venues provide inflexible real estate constraints when considering the placement of a large corporate headquarters within the same vicinity. Also, it would be worth further investigation to classify the general restaurants listed within Harbourfront East to better determine the diversity of cuisine in the area. If a real estate developer were looking to emulate similar business found in Long Island City, they may consider investigating whether opening a convenience store or a donut/pastry shop would be worthwhile.  