### Applied Data Science Capstone - Week 4
### The Battle of Neighborhoods

# INTRODUCTION
## Since the start of the pandemic, Asians have been the target of hate crimes. NBC News reported that anti-Asian hate crimes increased by nearly 150% in 2020, mostly in New York and Los Angeles.
### Let us study the neighborhoods of Manhattan to understand where Asian restaurants are located. Based on the results, we will recommend increased police visibility and greater awareness of community and personal protection.

Let's first install and import all the dependencies that we will need.

# Import Libraries

In [13]:
## Import Necessary Libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
#!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
!pip install folium==0.5.0
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


# DATA 
### Download and Explore Dataset

In [14]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [15]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
print('Data loaded')

Data loaded


In [16]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [17]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [18]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
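Note that the `coordinates` array stores longitude first and latitude second, following the GeoJSON convention. A minimal sketch of pulling the four fields we need out of a single feature, using a trimmed copy of the Wakefield record above as test data; the helper name `feature_to_row` is our own and not part of the notebook:

```python
def feature_to_row(feature):
    """Extract (borough, neighborhood, latitude, longitude) from one GeoJSON Point feature.

    GeoJSON orders Point coordinates as [longitude, latitude], so the
    indices are swapped relative to the usual "lat, lon" reading order.
    """
    props = feature['properties']
    lon, lat = feature['geometry']['coordinates'][:2]
    return props['borough'], props['name'], lat, lon


# trimmed copy of the first feature shown above
wakefield = {
    'type': 'Feature',
    'geometry': {'type': 'Point',
                 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}

print(feature_to_row(wakefield))
```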

Transform the data into a pandas dataframe.

In [19]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [20]:
# then let's loop through the data and collect one row per neighborhood
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)


Let's examine the resulting dataframe.

In [21]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [22]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


**Use the geopy library to get the latitude and longitude of New York City.**

In [23]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


**Create a map of New York with neighborhoods superimposed on top.**

In [24]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

**Let's segment and cluster only the neighborhoods in Manhattan**

In [25]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [26]:
print('There are {} unique neighborhoods.'.format(len(manhattan_data['Neighborhood'].unique())))

There are 40 unique neighborhoods.


**Let's get the geographical coordinates of Manhattan.**

In [27]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


**Let's visualize Manhattan and the neighborhoods in it.**

In [28]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

**Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.**

### Define Foursquare Credentials and Version

In [38]:
# Define Foursquare Credentials and Version
import os

# Read the credentials from environment variables instead of hard-coding (and publishing) them.
# The variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own convention.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET)) # never echo the secret itself


### Let's get the nearby venues using Foursquare data.

## Explore the Manhattan Venues

In [39]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
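The request URL above is assembled with string formatting. A sketch of the same `explore` request built with the standard library's `urllib.parse.urlencode`, which also escapes each value; the helper name `build_explore_url` and the placeholder credentials are hypothetical, while the parameter names are exactly those used in the URL above:

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL with the same query parameters as getNearbyVenues."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude" as a single value
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes every value (the comma in ll becomes %2C)
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                        40.876551, -73.91066)
print(url)
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=...)` lets the library perform this encoding for you.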

In [40]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
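`getNearbyVenues` reads `v['venue']['categories'][0]['name']`, which raises an `IndexError` for any venue returned without a category. A defensive sketch of the same extraction step, run on a hand-made response shaped like the fields the function accesses. The first two venues mirror the Marble Hill output of this notebook; the third, uncategorized venue and the `'Unknown'` fallback are hypothetical additions to exercise the edge case:

```python
def parse_explore_items(name, lat, lng, response_json):
    """Turn one explore response into (neighborhood, lat, lng, venue, vlat, vlng, category) tuples."""
    groups = response_json.get('response', {}).get('groups', [])
    items = groups[0]['items'] if groups else []  # no groups: return no rows instead of crashing
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Bikram Yoga',
               'location': {'lat': 40.876844, 'lng': -73.906204},
               'categories': [{'name': 'Yoga Studio'}]}},
    {'venue': {'name': "Arturo's",
               'location': {'lat': 40.874412, 'lng': -73.910271},
               'categories': [{'name': 'Pizza Place'}]}},
    # hypothetical venue with no category, to exercise the fallback
    {'venue': {'name': 'Unlabeled Venue',
               'location': {'lat': 40.8766, 'lng': -73.9100},
               'categories': []}},
]}]}}

for row in parse_explore_items('Marble Hill', 40.876551, -73.91066, sample):
    print(row)
```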


**Let's check the size of the resulting dataframe**

In [41]:
print(manhattan_venues.shape)
manhattan_venues.head()

(3225, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
1,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Astral Fitness & Wellness Center,40.876705,-73.906372,Gym
4,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop


**Let's check how many venues were returned for each neighborhood**

In [42]:
manhattan_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,82,82,82,82,82,82
Carnegie Hill,90,90,90,90,90,90
Central Harlem,48,48,48,48,48,48
Chelsea,100,100,100,100,100,100
Chinatown,100,100,100,100,100,100
Civic Center,100,100,100,100,100,100
Clinton,100,100,100,100,100,100
East Harlem,39,39,39,39,39,39
East Village,100,100,100,100,100,100
Financial District,100,100,100,100,100,100
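Several neighborhoods return exactly 100 venues, which equals the `LIMIT` passed to the API, so their venue lists are most likely truncated rather than complete. A small sketch that flags such neighborhoods from a `Neighborhood` column; the helper `limit_hits` is hypothetical, and the test column is rebuilt from the counts in the table above:

```python
from collections import Counter

def limit_hits(neighborhood_column, limit=100):
    """Return, sorted, the neighborhoods whose venue count reached the API limit."""
    counts = Counter(neighborhood_column)
    return sorted(name for name, n in counts.items() if n >= limit)

# rebuild a Neighborhood column from the counts shown in the table above
counts_above = {'Battery Park City': 82, 'Carnegie Hill': 90, 'Central Harlem': 48,
                'Chelsea': 100, 'Chinatown': 100, 'Civic Center': 100, 'Clinton': 100,
                'East Harlem': 39, 'East Village': 100, 'Financial District': 100}
column = [name for name, n in counts_above.items() for _ in range(n)]

print(limit_hits(column))
```

In the notebook itself, the equivalent call would be `limit_hits(manhattan_venues['Neighborhood'], LIMIT)`.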


**Let's find out how many unique categories can be curated from all the returned venues**

In [43]:
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 329 unique categories.


**Total Count per Venue Category**

In [44]:
manhattan_venues.groupby('Venue Category').count()

Venue Category,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Accessories Store,4,4,4,4,4,4
Adult Boutique,3,3,3,3,3,3
Afghan Restaurant,1,1,1,1,1,1
African Restaurant,2,2,2,2,2,2
American Restaurant,74,74,74,74,74,74
Antique Shop,1,1,1,1,1,1
Argentinian Restaurant,5,5,5,5,5,5
Art Gallery,27,27,27,27,27,27
Art Museum,3,3,3,3,3,3
Arts & Crafts Store,4,4,4,4,4,4


**Let's find all the restaurants in Manhattan**

In [45]:
manhattan_resto = manhattan_venues[manhattan_venues['Venue Category'].str.contains("Restaurant")].reset_index(drop=True)
manhattan_resto

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant
1,Marble Hill,40.876551,-73.91066,Grill 26 at TCR,40.878802,-73.915672,American Restaurant
2,Chinatown,40.715618,-73.994279,Spicy Village,40.71701,-73.99353,Chinese Restaurant
3,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,Greek Restaurant
4,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,Chinese Restaurant
5,Chinatown,40.715618,-73.994279,Xi'an Famous Foods,40.715232,-73.997263,Chinese Restaurant
6,Chinatown,40.715618,-73.994279,Forgtmenot,40.714459,-73.991546,New American Restaurant
7,Chinatown,40.715618,-73.994279,Ling Kee Malaysian Beef Jerky,40.714713,-73.991538,Asian Restaurant
8,Chinatown,40.715618,-73.994279,Dimes,40.71483,-73.991719,American Restaurant
9,Chinatown,40.715618,-73.994279,Cervo's,40.714763,-73.991455,Spanish Restaurant


In [46]:
manhattan_resto_unique = manhattan_resto['Venue Category'].unique()
print(sorted(manhattan_resto_unique))

['Afghan Restaurant', 'African Restaurant', 'American Restaurant', 'Argentinian Restaurant', 'Asian Restaurant', 'Australian Restaurant', 'Austrian Restaurant', 'Brazilian Restaurant', 'Cajun / Creole Restaurant', 'Cantonese Restaurant', 'Caribbean Restaurant', 'Chinese Restaurant', 'Cuban Restaurant', 'Czech Restaurant', 'Dim Sum Restaurant', 'Dumpling Restaurant', 'Eastern European Restaurant', 'Empanada Restaurant', 'English Restaurant', 'Ethiopian Restaurant', 'Falafel Restaurant', 'Fast Food Restaurant', 'Filipino Restaurant', 'French Restaurant', 'German Restaurant', 'Greek Restaurant', 'Hawaiian Restaurant', 'Himalayan Restaurant', 'Hotpot Restaurant', 'Indian Restaurant', 'Israeli Restaurant', 'Italian Restaurant', 'Japanese Curry Restaurant', 'Japanese Restaurant', 'Jewish Restaurant', 'Kebab Restaurant', 'Korean Restaurant', 'Kosher Restaurant', 'Latin American Restaurant', 'Lebanese Restaurant', 'Malay Restaurant', 'Mediterranean Restaurant', 'Mexican Restaurant', 'Middle Ea

**Let's get all the Asian Restaurants**

In [47]:
manhattan_resto_asian = manhattan_resto[manhattan_resto['Venue Category'].str.contains('Asian|Cantonese|Chinese|Dim|Dumpling|Filipino|Japanese|Korean|Malay|Shanghai|Sushi|Szechuan|Taiwanese|Thai|Udon|Vietnamese')].reset_index(drop=True)
manhattan_resto_asian

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Chinatown,40.715618,-73.994279,Spicy Village,40.71701,-73.99353,Chinese Restaurant
1,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,Chinese Restaurant
2,Chinatown,40.715618,-73.994279,Xi'an Famous Foods,40.715232,-73.997263,Chinese Restaurant
3,Chinatown,40.715618,-73.994279,Ling Kee Malaysian Beef Jerky,40.714713,-73.991538,Asian Restaurant
4,Chinatown,40.715618,-73.994279,Wayla,40.718291,-73.992584,Thai Restaurant
5,Chinatown,40.715618,-73.994279,Joe's Shanghai 鹿嗚春,40.715661,-73.996693,Shanghai Restaurant
6,Chinatown,40.715618,-73.994279,Hwa Yuan,40.713618,-73.995978,Chinese Restaurant
7,Chinatown,40.715618,-73.994279,Simple,40.718145,-73.991988,Asian Restaurant
8,Chinatown,40.715618,-73.994279,Yi Ji Shi Mo Noodle Corp,40.718254,-73.99593,Chinese Restaurant
9,Chinatown,40.715618,-73.994279,Phở Grand,40.717824,-73.992801,Vietnamese Restaurant
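`Series.str.contains` treats its pattern as a regular expression by default, so the alternation above keeps any category that contains at least one alternative as a substring (for example, `'Japanese'` also keeps `'Japanese Curry Restaurant'`), and the match is case-sensitive. The same filter in plain Python with `re.search`, applied to a handful of categories from the restaurant list above:

```python
import re

# same alternation pattern as the cell above
ASIAN_PATTERN = re.compile('Asian|Cantonese|Chinese|Dim|Dumpling|Filipino|Japanese|Korean|'
                           'Malay|Shanghai|Sushi|Szechuan|Taiwanese|Thai|Udon|Vietnamese')

categories = ['Afghan Restaurant', 'Asian Restaurant', 'Cantonese Restaurant',
              'Filipino Restaurant', 'Himalayan Restaurant', 'Hotpot Restaurant',
              'Indian Restaurant', 'Italian Restaurant', 'Japanese Curry Restaurant']

# re.search matches anywhere in the string, like Series.str.contains
asian = [c for c in categories if ASIAN_PATTERN.search(c)]
print(asian)
# ['Asian Restaurant', 'Cantonese Restaurant', 'Filipino Restaurant', 'Japanese Curry Restaurant']
```

`'Himalayan Restaurant'` escapes the `'Malay'` alternative only because of the capital M; with `case=False` it would be matched. Categories such as `'Hotpot Restaurant'` or `'Indian Restaurant'` fall outside the pattern simply because they are not listed in it.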


**Let's count the number of Asian restaurants and display the neighborhoods with the most Asian restaurants**

In [57]:
top_count = manhattan_resto_asian.groupby('Neighborhood').count()
top_count.sort_values(by='Venue Category', ascending=False)

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Chinatown,23,23,23,23,23,23
Midtown South,21,21,21,21,21,21
Murray Hill,13,13,13,13,13,13
Yorkville,12,12,12,12,12,12
Turtle Bay,12,12,12,12,12,12
East Village,12,12,12,12,12,12
Greenwich Village,10,10,10,10,10,10
Tudor City,10,10,10,10,10,10
Noho,10,10,10,10,10,10
Little Italy,10,10,10,10,10,10


In [61]:
print('There are {} unique neighborhoods.'.format(len(manhattan_resto_asian['Neighborhood'].unique())))

There are 37 unique neighborhoods.


## Analyze Each Neighborhood

**Use One Hot Encoding**

In [62]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_resto_asian[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_resto_asian['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Asian Restaurant,Cantonese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Korean Restaurant,Malay Restaurant,Shanghai Restaurant,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Thai Restaurant,Udon Restaurant,Vietnamese Restaurant
0,Chinatown,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Chinatown,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Chinatown,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Chinatown,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Chinatown,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
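`pd.get_dummies` creates one column per distinct category and sets a 1 in the column that matches each row's category. A plain-Python sketch of the same transformation on the first five Chinatown rows (their categories are taken from the Asian-restaurant table above), with the columns in alphabetical order as in the output above; `one_hot` is a hypothetical helper:

```python
def one_hot(categories):
    """Return (sorted column names, list of 0/1 rows), one row per input category."""
    columns = sorted(set(categories))
    rows = [[1 if c == col else 0 for col in columns] for c in categories]
    return columns, rows

chinatown = ['Chinese Restaurant', 'Chinese Restaurant', 'Chinese Restaurant',
             'Asian Restaurant', 'Thai Restaurant']
columns, rows = one_hot(chinatown)
print(columns)  # ['Asian Restaurant', 'Chinese Restaurant', 'Thai Restaurant']
for r in rows:
    print(r)
```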


In [63]:
manhattan_onehot.shape

(249, 18)

## Analyze each neighborhood by grouping the rows by neighborhood and taking the mean of the one-hot columns, i.e., the frequency of each Asian restaurant category

In [64]:
manhattan_asian_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_asian_grouped

Unnamed: 0,Neighborhood,Asian Restaurant,Cantonese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Korean Restaurant,Malay Restaurant,Shanghai Restaurant,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Thai Restaurant,Udon Restaurant,Vietnamese Restaurant
0,Battery Park City,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.2
2,Central Harlem,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.285714,0.0,0.0,0.285714,0.0,0.0
4,Chinatown,0.086957,0.043478,0.434783,0.043478,0.086957,0.0,0.0,0.0,0.0,0.086957,0.043478,0.0,0.0,0.043478,0.043478,0.0,0.086957
5,Civic Center,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0
6,Clinton,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0
7,East Harlem,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.0
8,East Village,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.166667,0.25,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.25
9,Financial District,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [65]:
manhattan_asian_grouped.shape

(37, 18)
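Because every one-hot row contains a single 1, the mean of a column within a neighborhood is simply that category's share of the neighborhood's Asian restaurants. A sketch of the computation for one neighborhood; the seven-restaurant mix below is illustrative, chosen so that its shares reproduce the Chelsea row above (1/7 ≈ 0.142857 and 2/7 ≈ 0.285714), and `category_shares` is a hypothetical helper:

```python
from collections import Counter

def category_shares(categories):
    """Mean of the one-hot columns for one neighborhood = count / total per category."""
    total = len(categories)
    return {cat: n / total for cat, n in Counter(categories).items()}

# illustrative mix that reproduces the Chelsea frequencies
chelsea_like = (['Asian Restaurant', 'Chinese Restaurant', 'Japanese Restaurant']
                + ['Sushi Restaurant'] * 2 + ['Thai Restaurant'] * 2)
shares = category_shares(chelsea_like)
for cat, share in sorted(shares.items()):
    print('{:<20} {:.6f}'.format(cat, share))
```

One consequence of this design: a neighborhood whose Asian restaurants are all Chinese (Central Harlem, 1.0) gets the same vector whether it has one such restaurant or twenty, so the clustering features discard the counts in the `top_count` table.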

## Top Venues per Neighborhood

In [66]:
num_top_venues = 5

for hood in manhattan_asian_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_asian_grouped[manhattan_asian_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
                 venue  freq
0   Chinese Restaurant   0.5
1  Japanese Restaurant   0.5
2     Asian Restaurant   0.0
3  Shanghai Restaurant   0.0
4      Udon Restaurant   0.0


----Carnegie Hill----
                   venue  freq
0       Sushi Restaurant   0.4
1  Vietnamese Restaurant   0.2
2     Chinese Restaurant   0.2
3    Japanese Restaurant   0.2
4    Shanghai Restaurant   0.0


----Central Harlem----
                venue  freq
0  Chinese Restaurant   1.0
1    Asian Restaurant   0.0
2    Malay Restaurant   0.0
3     Udon Restaurant   0.0
4     Thai Restaurant   0.0


----Chelsea----
                 venue  freq
0      Thai Restaurant  0.29
1     Sushi Restaurant  0.29
2     Asian Restaurant  0.14
3   Chinese Restaurant  0.14
4  Japanese Restaurant  0.14


----Chinatown----
                   venue  freq
0     Chinese Restaurant  0.43
1       Asian Restaurant  0.09
2    Dumpling Restaurant  0.09
3       Malay Restaurant  0.09
4  Vietnamese Restaurant  0.09

In [67]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [68]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no special suffix after "3rd"
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_asian_grouped['Neighborhood']

for ind in np.arange(manhattan_asian_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_asian_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Japanese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Udon Restaurant
1,Carnegie Hill,Sushi Restaurant,Vietnamese Restaurant,Japanese Restaurant,Chinese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant
2,Central Harlem,Chinese Restaurant,Vietnamese Restaurant,Japanese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Udon Restaurant
3,Chelsea,Thai Restaurant,Sushi Restaurant,Asian Restaurant,Chinese Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant
4,Chinatown,Chinese Restaurant,Vietnamese Restaurant,Malay Restaurant,Dumpling Restaurant,Asian Restaurant,Shanghai Restaurant,Taiwanese Restaurant,Dim Sum Restaurant,Thai Restaurant,Cantonese Restaurant
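Once a neighborhood's actual categories are exhausted, every remaining frequency is 0.0, so the later "most common venue" columns are ties whose order is set by the sorting algorithm rather than by the data. Battery Park City, for example, has only two non-zero categories, so its 3rd through 10th entries carry no information. A sketch of a variant that breaks ties alphabetically and leaves zero-frequency slots empty; `top_categories` is a hypothetical helper, not the notebook's `return_most_common_venues`:

```python
def top_categories(freqs, n):
    """Return n slots: category names by descending frequency, then by name.

    Categories with frequency 0 are dropped and the remaining slots are filled with ''.
    """
    ranked = sorted((item for item in freqs.items() if item[1] > 0),
                    key=lambda item: (-item[1], item[0]))
    names = [name for name, _ in ranked[:n]]
    return names + [''] * (n - len(names))

battery_park_city = {'Chinese Restaurant': 0.5, 'Japanese Restaurant': 0.5,
                     'Thai Restaurant': 0.0, 'Vietnamese Restaurant': 0.0}
print(top_categories(battery_park_city, 4))
# ['Chinese Restaurant', 'Japanese Restaurant', '', '']
```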


## Use silhouette_score to find the best number of clusters for k-means clustering

In [69]:
from sklearn.metrics import silhouette_score

In [70]:
manhattan_grouped_clustering = manhattan_asian_grouped.drop(columns='Neighborhood')

# note: without a fixed random_state, these scores can vary slightly from run to run
for n_cluster in range(2, 10):
    kmeans = KMeans(n_clusters=n_cluster).fit(manhattan_grouped_clustering)
    label = kmeans.labels_
    sil_coeff = silhouette_score(manhattan_grouped_clustering, label, metric='euclidean')
    print("For n_clusters={}, The Silhouette Coefficient is {}".format(n_cluster, sil_coeff))

For n_clusters=2, The Silhouette Coefficient is 0.2543527395874119
For n_clusters=3, The Silhouette Coefficient is 0.24676980421284506
For n_clusters=4, The Silhouette Coefficient is 0.2794820096433795
For n_clusters=5, The Silhouette Coefficient is 0.25230624227598264
For n_clusters=6, The Silhouette Coefficient is 0.22996454728915713
For n_clusters=7, The Silhouette Coefficient is 0.22107346980048131
For n_clusters=8, The Silhouette Coefficient is 0.22940624159309886
For n_clusters=9, The Silhouette Coefficient is 0.22404139094553852
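For each point, the silhouette coefficient compares the mean distance `a` to the other members of its own cluster with the mean distance `b` to the members of the nearest other cluster, `s = (b - a) / max(a, b)`; `silhouette_score` reports the average over all points. Values near 1 indicate tight, well-separated clusters and values near 0 indicate overlapping ones, so the peak of about 0.28 at k = 4 above suggests fairly weak cluster structure. A pure-Python sketch of the definition, assuming Euclidean distance and assigning s = 0 to a point that is alone in its cluster (the convention scikit-learn uses):

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (O(n^2) reference version)."""
    clusters = sorted(set(labels))
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError('need between 2 and n_samples - 1 clusters')
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # mean distance from p to the points of each cluster, excluding p itself
        mean_dist = {}
        for c in clusters:
            others = [q for j, (q, lab) in enumerate(zip(points, labels)) if lab == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        if own not in mean_dist:  # p is alone in its cluster
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette(pts, [0, 0, 1, 1]), 3))  # two well-separated pairs
print(round(silhouette(pts, [0, 1, 0, 1]), 3))  # clusters that cut across the pairs
```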


## Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 4 clusters, the value of k with the highest silhouette coefficient above (about 0.279).

In [71]:
kclusters = 4

manhattan_grouped_clustering = manhattan_asian_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 1, 3, 1, 2, 2, 0, 2, 3], dtype=int32)
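`KMeans` runs Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. scikit-learn seeds the centroids with randomized k-means++ by default (hence `random_state=0` above); the sketch below substitutes a deterministic farthest-point initialization so the result is reproducible without a seed. It illustrates the algorithm only and is not scikit-learn's implementation:

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-point initialization. Returns (labels, centroids)."""
    # initialization: start from the first point, then repeatedly add the point
    # farthest from its nearest already-chosen centroid
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels, centroids = kmeans(pts, 2)
print(labels, centroids)
```

Applied to the frequency table, `kmeans(manhattan_grouped_clustering.values.tolist(), 4)` would run as well, though its labels need not match scikit-learn's because the initialization differs.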

In [75]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join with manhattan_data to add the borough, latitude and longitude of each neighborhood
manhattan_merged = neighborhoods_venues_sorted.join(manhattan_data.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
0,0,Battery Park City,Japanese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Udon Restaurant,Manhattan,40.711932,-74.016869
1,0,Carnegie Hill,Sushi Restaurant,Vietnamese Restaurant,Japanese Restaurant,Chinese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Manhattan,40.782683,-73.953256
2,1,Central Harlem,Chinese Restaurant,Vietnamese Restaurant,Japanese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Udon Restaurant,Manhattan,40.815976,-73.943211
3,0,Chelsea,Thai Restaurant,Sushi Restaurant,Asian Restaurant,Chinese Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Manhattan,40.744035,-74.003116
4,0,Chinatown,Chinese Restaurant,Vietnamese Restaurant,Malay Restaurant,Dumpling Restaurant,Asian Restaurant,Shanghai Restaurant,Taiwanese Restaurant,Dim Sum Restaurant,Thai Restaurant,Cantonese Restaurant,Manhattan,40.715618,-73.994279


## Let's visualize the clusters

In [76]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster], # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
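Each marker's color comes from indexing a list of `kclusters` colors with the cluster label. Since labels run from 0 to `kclusters - 1`, indexing with the label itself gives every cluster its own color; indexing with `cluster - 1` would still run (label 0 wraps to the last color through negative indexing) but shifts every cluster's color by one position. A stdlib-only sketch of building such a palette with `colorsys`, spacing hues evenly around the color wheel; the helper `cluster_palette` is hypothetical, and its hues differ from matplotlib's `rainbow` colormap:

```python
import colorsys

def cluster_palette(k):
    """Return k distinct hex colors with hues evenly spaced around the color wheel."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(4)
labels = [0, 0, 1, 2, 3]  # example cluster labels
marker_colors = [palette[label] for label in labels]  # label indexes the palette directly
print(palette)
print(marker_colors)
```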

## Examine Clusters

Let's examine each cluster and determine the discriminating venue categories that distinguish each cluster.

In [83]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(2, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
0,Battery Park City,Japanese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Udon Restaurant,Manhattan,40.711932,-74.016869
1,Carnegie Hill,Sushi Restaurant,Vietnamese Restaurant,Japanese Restaurant,Chinese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Manhattan,40.782683,-73.953256
3,Chelsea,Thai Restaurant,Sushi Restaurant,Asian Restaurant,Chinese Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Manhattan,40.744035,-74.003116
4,Chinatown,Chinese Restaurant,Vietnamese Restaurant,Malay Restaurant,Dumpling Restaurant,Asian Restaurant,Shanghai Restaurant,Taiwanese Restaurant,Dim Sum Restaurant,Thai Restaurant,Cantonese Restaurant,Manhattan,40.715618,-73.994279
6,Clinton,Korean Restaurant,Thai Restaurant,Chinese Restaurant,Dim Sum Restaurant,Japanese Restaurant,Cantonese Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Vietnamese Restaurant,Manhattan,40.759101,-73.996119
9,Financial District,Japanese Restaurant,Japanese Curry Restaurant,Vietnamese Restaurant,Cantonese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Korean Restaurant,Udon Restaurant,Manhattan,40.707107,-74.010665
10,Flatiron,Japanese Restaurant,Korean Restaurant,Thai Restaurant,Sushi Restaurant,Japanese Curry Restaurant,Cantonese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Manhattan,40.739673,-73.990947
11,Gramercy,Vietnamese Restaurant,Thai Restaurant,Sushi Restaurant,Dim Sum Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Cantonese Restaurant,Chinese Restaurant,Dumpling Restaurant,Korean Restaurant,Manhattan,40.73721,-73.981376
12,Greenwich Village,Sushi Restaurant,Vietnamese Restaurant,Chinese Restaurant,Udon Restaurant,Japanese Curry Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Korean Restaurant,Manhattan,40.726933,-73.999914
13,Hamilton Heights,Chinese Restaurant,Sushi Restaurant,Japanese Restaurant,Vietnamese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Manhattan,40.823604,-73.949688


In [84]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(2, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
2,Central Harlem,Chinese Restaurant,Vietnamese Restaurant,Japanese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Udon Restaurant,Manhattan,40.815976,-73.943211
15,Inwood,Chinese Restaurant,Vietnamese Restaurant,Japanese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Udon Restaurant,Manhattan,40.867684,-73.92121
17,Lincoln Square,Chinese Restaurant,Vietnamese Restaurant,Japanese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Udon Restaurant,Manhattan,40.773529,-73.985338


In [85]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(2, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
5,Civic Center,Korean Restaurant,Sushi Restaurant,Asian Restaurant,Taiwanese Restaurant,Szechuan Restaurant,Thai Restaurant,Shanghai Restaurant,Malay Restaurant,Udon Restaurant,Japanese Restaurant,Manhattan,40.715229,-74.005415
8,East Village,Vietnamese Restaurant,Korean Restaurant,Japanese Restaurant,Taiwanese Restaurant,Filipino Restaurant,Chinese Restaurant,Thai Restaurant,Szechuan Restaurant,Sushi Restaurant,Shanghai Restaurant,Manhattan,40.727847,-73.982226
20,Manhattan Valley,Thai Restaurant,Vietnamese Restaurant,Szechuan Restaurant,Korean Restaurant,Taiwanese Restaurant,Sushi Restaurant,Shanghai Restaurant,Malay Restaurant,Udon Restaurant,Japanese Restaurant,Manhattan,40.797307,-73.964286
23,Midtown South,Korean Restaurant,Japanese Restaurant,Cantonese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Vietnamese Restaurant,Udon Restaurant,Manhattan,40.74851,-73.988713
29,Tribeca,Korean Restaurant,Sushi Restaurant,Japanese Restaurant,Cantonese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Vietnamese Restaurant,Manhattan,40.721522,-74.010683


In [86]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(2, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
7,East Harlem,Thai Restaurant,Vietnamese Restaurant,Japanese Restaurant,Cantonese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Manhattan,40.792249,-73.944182
14,Hudson Yards,Thai Restaurant,Vietnamese Restaurant,Japanese Restaurant,Cantonese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Filipino Restaurant,Japanese Curry Restaurant,Korean Restaurant,Manhattan,40.756658,-74.000111
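One quick way to characterize a cluster is to count which category appears most often in the 1st Most Common Venue column among its neighborhoods. A sketch using the cluster 1, 2 and 3 rows shown above (cluster 0 is left out because its displayed table appears to be truncated); `dominant_first_venue` is a hypothetical helper:

```python
from collections import Counter

# (cluster label, neighborhood, 1st Most Common Venue) from the tables above
rows = [
    (1, 'Central Harlem', 'Chinese Restaurant'),
    (1, 'Inwood', 'Chinese Restaurant'),
    (1, 'Lincoln Square', 'Chinese Restaurant'),
    (2, 'Civic Center', 'Korean Restaurant'),
    (2, 'East Village', 'Vietnamese Restaurant'),
    (2, 'Manhattan Valley', 'Thai Restaurant'),
    (2, 'Midtown South', 'Korean Restaurant'),
    (2, 'Tribeca', 'Korean Restaurant'),
    (3, 'East Harlem', 'Thai Restaurant'),
    (3, 'Hudson Yards', 'Thai Restaurant'),
]

def dominant_first_venue(rows):
    """Map each cluster label to (most frequent 1st venue, its count, cluster size)."""
    by_cluster = {}
    for label, _, first_venue in rows:
        by_cluster.setdefault(label, Counter())[first_venue] += 1
    return {label: counter.most_common(1)[0] + (sum(counter.values()),)
            for label, counter in sorted(by_cluster.items())}

for label, (venue, n, size) in dominant_first_venue(rows).items():
    print('Cluster {}: {} ({} of {} neighborhoods)'.format(label, venue, n, size))
```

In the notebook, a similar summary could be read off directly with `manhattan_merged.groupby('Cluster Labels')['1st Most Common Venue'].agg(lambda s: s.value_counts().idxmax())`.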
