# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

New York City is the most densely populated major city in the United States, with an estimated population of 8.3 million in 2019. It is composed of five boroughs: Brooklyn, Queens, Manhattan, the Bronx, and Staten Island. In this project we will look for an optimal location for stakeholders who are interested in opening a Filipino restaurant in one of the boroughs of New York City. The project takes into account restaurant density and diversity within each area, as well as proximity to the city center.

This project will use several data science techniques to find the optimal location based on these criteria. It will also list several alternative locations, each with its own advantages and disadvantages compared to the best candidate.

## Data <a name="data"></a>

Selection Criteria for our Business Problem:
* Restaurant Density within an area
* Restaurant Diversity within an area
* Distance from city center
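
One way to make the first two criteria concrete (an assumption for illustration, not a definition used elsewhere in this project) is to express density as restaurants per square kilometer inside the circular search area around a neighborhood, and diversity as the Shannon entropy of the restaurant categories found there. The 500 m default radius mirrors the Foursquare search radius used later; both helper names are hypothetical.

```python
import math
from collections import Counter

def restaurant_density(n_restaurants, radius_m=500):
    """Restaurants per square kilometer inside a circular search area (hypothetical helper)."""
    area_km2 = math.pi * (radius_m / 1000) ** 2
    return n_restaurants / area_km2

def restaurant_diversity(categories):
    """Shannon entropy (natural log) of the category mix; 0 means a single category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Example: 10 restaurants within 500 m, split evenly between two categories
print(round(restaurant_density(10), 2))
print(round(restaurant_diversity(['Thai Restaurant', 'Filipino Restaurant'] * 5), 4))
```

Higher entropy means a more even mix of restaurant types; distance from the city center is already a single number per neighborhood, so it needs no extra formula.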

Data sources that will be needed for the project:
* Neighborhood data from Wikipedia or other web pages, scraped with **BeautifulSoup**
* Latitude and longitude of neighborhoods from the **Google Maps Geocoding API**
* Number and type of restaurants in an area from the **Foursquare API**


### Libraries

Let's first install and import all the libraries needed for this project.

In [1]:
# install the packages first so that the imports below succeed
!conda install -c conda-forge folium=0.5.0 --yes
!conda install -c conda-forge geopy --yes

import json
import urllib.request

import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
import geopy.distance
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup


Next, we geocode our city center, New York City, NY, to obtain its latitude and longitude.

In [2]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


Next, we download and explore the New York City data set, which contains each neighborhood's borough, name, latitude, and longitude. We also add each neighborhood's distance from the city center, in miles.

In [3]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

neighborhoods_data = newyork_data['features']
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude', 'Distance']
city_center = (latitude, longitude)  # New York City center geocoded in the previous cell

rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]

    # geodesic distance from the city center, in miles
    miles = geopy.distance.geodesic(city_center, (neighborhood_lat, neighborhood_lon)).miles

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon,
                 'Distance': miles})

# build the DataFrame once (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude | Distance |
|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | 15.067261 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 | 14.475958 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 | 15.2596 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 | 13.661937 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 | 13.230378 |
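
The Distance column comes from geopy's geodesic distance on the WGS-84 ellipsoid. If geopy is unavailable, the haversine formula on a sphere is a close, dependency-free approximation. The sketch below (the function name and the 3,958.8-mile mean Earth radius are our assumptions) reproduces the Wakefield distance in the table above to within about 0.01 mile.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, spherical approximation

def haversine_miles(coord_1, coord_2):
    """Great-circle distance in miles between two (latitude, longitude) pairs given in degrees."""
    lat1, lon1 = map(math.radians, coord_1)
    lat2, lon2 = map(math.radians, coord_2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

city_center = (40.7127281, -74.0060152)
wakefield = (40.894705, -73.847201)
print(round(haversine_miles(city_center, wakefield), 2))  # geopy's geodesic value above is 15.067261
```

The spherical model ignores the Earth's flattening, so for distances of a few miles the two methods agree closely, but geopy's geodesic result remains the more accurate one.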


Let's focus the project on neighborhoods within a 10-mile radius of the city center.

In [4]:
neighborhoods1=neighborhoods[neighborhoods.Distance <= 10].reset_index(drop=True)
neighborhoods1.head()

|   | Borough | Neighborhood | Latitude | Longitude | Distance |
|---|---|---|---|---|---|
| 0 | Bronx | High Bridge | 40.836623 | -73.926102 | 9.521564 |
| 1 | Bronx | Melrose | 40.819754 | -73.909422 | 8.956612 |
| 2 | Bronx | Mott Haven | 40.806239 | -73.9161 | 7.993237 |
| 3 | Bronx | Port Morris | 40.801664 | -73.913221 | 7.833702 |
| 4 | Bronx | Longwood | 40.815099 | -73.895788 | 9.129218 |


For now, let's visualize the remaining neighborhoods within a 10-mile radius of the city center.

In [5]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods1['Latitude'], neighborhoods1['Longitude'], neighborhoods1['Borough'], neighborhoods1['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Let's start exploring each neighborhood with the Foursquare API to get information on the venues in each neighborhood.

In [6]:
# define Foursquare credentials and API version
# replace the placeholders with your own keys, and avoid printing or publishing the secret
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version


In [7]:
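import requests

radius = 500  # search radius around each neighborhood center, in meters
LIMIT = 100   # maximum number of venues requested per neighborhood

venues = []

for lat, lng, neighborhood in zip(neighborhoods1['Latitude'], neighborhoods1['Longitude'], neighborhoods1['Neighborhood']):

    # query parameters for the Foursquare v2 "explore" endpoint
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': LIMIT,
    }

    # make the GET request and stop early on an HTTP error instead of failing on a missing key
    response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
    response.raise_for_status()
    results = response.json()['response']['groups'][0]['items']

    # return only relevant information for each nearby venue
    for item in results:
        venue = item['venue']
        # a few venues have no category; label them instead of raising an IndexError
        categories = venue.get('categories') or [{'name': 'Unknown'}]
        venues.append((
            neighborhood,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name']))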
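
Because each request asks for at most `LIMIT` venues within `radius` meters, the venue lists are samples rather than complete inventories: a neighborhood whose count reaches the requested limit (or whatever cap the API itself enforces) may have been cut off. A quick check on a list shaped like `venues` (neighborhood name first in each tuple) is sketched below; the helper name and the toy data are hypothetical.

```python
from collections import Counter

def possibly_truncated(venues, limit):
    """Return the neighborhoods whose venue count reached the per-request limit.

    `venues` is a list of tuples whose first element is the neighborhood name,
    matching the layout of the `venues` list built in the request loop.
    """
    counts = Counter(row[0] for row in venues)
    return sorted(name for name, n in counts.items() if n >= limit)

# Toy data with a limit of 3: 'Neighborhood A' hits the cap, 'Neighborhood B' does not
toy_venues = [('Neighborhood A', 0.0, 0.0, 'Venue 1'),
              ('Neighborhood A', 0.0, 0.0, 'Venue 2'),
              ('Neighborhood A', 0.0, 0.0, 'Venue 3'),
              ('Neighborhood B', 0.0, 0.0, 'Venue 4')]
print(possibly_truncated(toy_venues, limit=3))
```

In the notebook this would be called as `possibly_truncated(venues, LIMIT)`; any neighborhood it returns is likely to have more venues than the data shows.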

In [8]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'Venue Name', 'Venue Latitude', 'Venue Longitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(7521, 7)


|   | Neighborhood | Latitude | Longitude | Venue Name | Venue Latitude | Venue Longitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | High Bridge | 40.836623 | -73.926102 | Fine Fare | 40.835736 | -73.927793 | Market |
| 1 | High Bridge | 40.836623 | -73.926102 | Rite Aid | 40.835608 | -73.928108 | Pharmacy |
| 2 | High Bridge | 40.836623 | -73.926102 | Mullaly Park | 40.83292 | -73.924331 | Park |
| 3 | High Bridge | 40.836623 | -73.926102 | Retro Fitness | 40.836646 | -73.922683 | Gym |
| 4 | High Bridge | 40.836623 | -73.926102 | CVS pharmacy | 40.835307 | -73.920722 | Pharmacy |


At this point we have the venues that Foursquare returned within 500 meters of each neighborhood center (up to `LIMIT` = 100 per neighborhood) for every neighborhood within the 10-mile radius. Now let us filter the venue categories down to restaurants.

In [9]:
rest=venues_df[venues_df.VenueCategory.str.contains('Restaurant')].reset_index(drop=True)

print(rest.shape)
rest.head()

(1903, 7)


|   | Neighborhood | Latitude | Longitude | Venue Name | Venue Latitude | Venue Longitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | High Bridge | 40.836623 | -73.926102 | Happy Garden | 40.836523 | -73.927094 | Asian Restaurant |
| 1 | High Bridge | 40.836623 | -73.926102 | wah yong | 40.838573 | -73.924693 | Chinese Restaurant |
| 2 | High Bridge | 40.836623 | -73.926102 | Justine Restaurant | 40.835502 | -73.921439 | Latin American Restaurant |
| 3 | High Bridge | 40.836623 | -73.926102 | Dong King | 40.833692 | -73.927466 | Chinese Restaurant |
| 4 | High Bridge | 40.836623 | -73.926102 | El Tina | 40.838686 | -73.92967 | Seafood Restaurant |
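
`str.contains('Restaurant')` is a case-sensitive substring match, so it keeps every category whose name includes that word (including the bare `Restaurant` category) but drops food venues that Foursquare names differently, such as `Pizza Place` or `Diner`. The plain-Python sketch below applies the same rule to a few illustrative category names; whether to widen the filter depends on how the stakeholders define competition.

```python
def is_restaurant(category):
    """Same rule as venues_df.VenueCategory.str.contains('Restaurant'): a case-sensitive substring match."""
    return 'Restaurant' in category

categories = ['Filipino Restaurant', 'Restaurant', 'Pizza Place', 'Diner', 'Market', 'Pharmacy']
kept = [c for c in categories if is_restaurant(c)]
dropped = [c for c in categories if not is_restaurant(c)]
print('kept:', kept)
print('dropped:', dropped)
```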


Now let's filter the restaurants down to Filipino restaurants.

In [10]:
fil_rest=rest[rest.VenueCategory == 'Filipino Restaurant'].reset_index(drop=True)
print(fil_rest.shape)
fil_rest

(12, 7)


|   | Neighborhood | Latitude | Longitude | Venue Name | Venue Latitude | Venue Longitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Carroll Gardens | 40.68054 | -73.994654 | Fob Restaurant | 40.68266 | -73.993166 | Filipino Restaurant |
| 1 | East Village | 40.727847 | -73.982226 | Mama Fina's | 40.728222 | -73.982022 | Filipino Restaurant |
| 2 | East Village | 40.727847 | -73.982226 | Jeepney Filipino Gastropub | 40.730307 | -73.983617 | Filipino Restaurant |
| 3 | Lower East Side | 40.717807 | -73.98089 | Pig and Khao | 40.719275 | -73.984891 | Filipino Restaurant |
| 4 | Gramercy | 40.73721 | -73.981376 | Grill 21 | 40.735743 | -73.97986 | Filipino Restaurant |
| 5 | Woodside | 40.746349 | -73.901842 | House of Inasal | 40.745986 | -73.898449 | Filipino Restaurant |
| 6 | Woodside | 40.746349 | -73.901842 | Krystal's Cafe | 40.746233 | -73.896142 | Filipino Restaurant |
| 7 | Woodside | 40.746349 | -73.901842 | Rosario's Ihawan | 40.74607 | -73.896083 | Filipino Restaurant |
| 8 | Woodside | 40.746349 | -73.901842 | Baby's Grill & Restaurant | 40.746187 | -73.896591 | Filipino Restaurant |
| 9 | Rosebank | 40.615305 | -74.069805 | Phil-Am Kusina | 40.612426 | -74.071455 | Filipino Restaurant |


We now know that only 12 of the 1903 restaurants in our data set are Filipino restaurants. Let us look at this figure more closely.

In [11]:
print('Total number of restaurants:', len(rest))
print('Total number of Filipino restaurants:', len(fil_rest))
print('Percentage of Filipino restaurants: {:.2f}%'.format(len(fil_rest) / len(rest) * 100))

Total number of restaurants: 1903
Total number of Filipino restaurants: 12
Percentage of Filipino restaurants: 0.63%


Let's visualize all the restaurants in the project area, drawing the Filipino restaurants in red so they stand out.

In [12]:
map_res = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, label in zip(rest['Venue Latitude'],rest['Venue Longitude'], rest['Venue Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_res)  
    
for lat, lng, label in zip(fil_rest['Venue Latitude'],fil_rest['Venue Longitude'], fil_rest['Venue Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ff0000',
        fill_opacity=0.7,
        parse_html=False).add_to(map_res)
    
map_res

## Methodology <a name="methodology"></a>

This project focuses on finding the best area to open a Filipino restaurant in New York City: an area that has few Filipino restaurants and lies within a 10-mile radius of the city center.

In the first step, we used Foursquare to collect the venues within the 10-mile radius of the New York City center and extracted each venue's name, location, and category. We then identified the restaurants within that radius and focused our analysis on Filipino restaurants.

The second step is to calculate and explore "restaurant density" across the neighborhoods of New York City. We determine the frequency of each type of restaurant in each neighborhood and rank the types from most to least common.

In the third and final step, we use k-means clustering to group the neighborhoods identified above by their restaurant mix and to identify the most promising areas within those clusters. We also map all of these neighborhoods, colored by cluster, so the clusters are easier to compare.
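
For readers new to k-means: the algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of the points assigned to it, until the assignments stop changing. The pure-Python sketch below illustrates that loop (Lloyd's algorithm) on toy two-dimensional points. It is not scikit-learn's implementation, which this project uses and which adds k-means++ initialization and several restarts; the function name and data are illustrative.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm on equal-length numeric tuples; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # update step: move each centroid to the mean of its members (keep it if it has none)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Toy "category frequency" vectors forming two obvious groups
points = [(0.9, 0.1), (1.0, 0.0), (0.8, 0.2), (0.1, 0.9), (0.0, 1.0), (0.2, 0.8)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

In this project each point is one neighborhood's row of restaurant-category frequencies, so neighborhoods with a similar restaurant mix end up in the same cluster.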

## Analysis <a name="analysis"></a>

Let's start the analysis by creating a new data frame that holds the frequency of each type of restaurant in each neighborhood.

In [13]:
newyork_onehot=pd.get_dummies(rest[['VenueCategory']], prefix="", prefix_sep="")
newyork_onehot['Neighborhood']= rest['Neighborhood']
fixed_columns=[newyork_onehot.columns[-1]] + list(newyork_onehot.columns[:-1])
newyork_onehot=newyork_onehot[fixed_columns]
newyork_grouped = newyork_onehot.groupby('Neighborhood').mean().reset_index()
newyork_grouped.head()

|   | Neighborhood | Afghan Restaurant | African Restaurant | American Restaurant | Arepa Restaurant | Argentinian Restaurant | Asian Restaurant | Australian Restaurant | Austrian Restaurant | Brazilian Restaurant | ... | Swiss Restaurant | Szechuan Restaurant | Taiwanese Restaurant | Tapas Restaurant | Thai Restaurant | Tibetan Restaurant | Turkish Restaurant | Udon Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Arlington | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Arrochar | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Astoria | 0.0 | 0.0 | 0.026316 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.026316 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.026316 | 0.0 | 0.0 | 0.0 | 0.026316 | 0.0 |
| 3 | Astoria Heights | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Bath Beach | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.047619 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.047619 | 0.0 | 0.0 | 0.0 |
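
Each row of `newyork_grouped` is the share of a neighborhood's restaurants that fall in each category: one-hot encoding followed by a groupby mean gives the same numbers as counting categories per neighborhood and dividing by that neighborhood's total. The plain-Python sketch below performs that computation on toy rows (the helper and neighborhood names are illustrative); unlike the DataFrame, it simply omits categories with a share of 0.

```python
from collections import Counter, defaultdict

def category_frequencies(rows):
    """Map neighborhood -> {category: share of that neighborhood's restaurants}.

    `rows` is an iterable of (neighborhood, category) pairs, one per restaurant,
    which is what get_dummies + groupby('Neighborhood').mean() summarizes.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }

rows = [('Neighborhood A', 'Italian Restaurant'),
        ('Neighborhood A', 'Italian Restaurant'),
        ('Neighborhood A', 'Polish Restaurant'),
        ('Neighborhood B', 'American Restaurant')]
print(category_frequencies(rows))
```

In pandas, `pd.crosstab(rest['Neighborhood'], rest['VenueCategory'], normalize='index')` expresses the same row-normalized table in one call.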


Let us identify the top 5 most common restaurant types in each neighborhood.

In [14]:
num_top_venues = 5

for hood in newyork_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = newyork_grouped[newyork_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Arlington----
                     venue  freq
0      American Restaurant   1.0
1        Afghan Restaurant   0.0
2  New American Restaurant   0.0
3               Restaurant   0.0
4         Ramen Restaurant   0.0


----Arrochar----
                       venue  freq
0         Italian Restaurant   0.4
1          Polish Restaurant   0.2
2   Mediterranean Restaurant   0.2
3  Middle Eastern Restaurant   0.2
4          Afghan Restaurant   0.0


----Astoria----
                       venue  freq
0  Middle Eastern Restaurant  0.16
1   Mediterranean Restaurant  0.11
2           Greek Restaurant  0.11
3         Seafood Restaurant  0.08
4          Indian Restaurant  0.08


----Astoria Heights----
                     venue  freq
0       Italian Restaurant   0.5
1       Chinese Restaurant   0.5
2        Afghan Restaurant   0.0
3  North Indian Restaurant   0.0
4               Restaurant   0.0


----Bath Beach----
                  venue  freq
0    Chinese Restaurant  0.14
...

                  venue  freq
0    Italian Restaurant  0.43
1      Sushi Restaurant  0.14
2  Fast Food Restaurant  0.14
3    Chinese Restaurant  0.14
4            Restaurant  0.14


----Downtown----
                       venue  freq
0         Chinese Restaurant  0.12
1          French Restaurant  0.08
2  Middle Eastern Restaurant  0.08
3       Pakistani Restaurant  0.04
4        Hawaiian Restaurant  0.04


----Dumbo----
                      venue  freq
0       American Restaurant  0.29
1        Italian Restaurant  0.29
2        Seafood Restaurant  0.14
3  Mediterranean Restaurant  0.14
4        Mexican Restaurant  0.14


----East Elmhurst----
                     venue  freq
0      American Restaurant   1.0
1        Afghan Restaurant   0.0
2  New American Restaurant   0.0
3               Restaurant   0.0
4         Ramen Restaurant   0.0


----East Flatbush----
                  venue  freq
0  Caribbean Restaurant   0.5
1    Chinese Restaurant   0.5
2     Afghan Restaurant   0.0
...

                     venue  freq
0          Thai Restaurant  0.43
1               Restaurant  0.29
2      Japanese Restaurant  0.14
3       Mexican Restaurant  0.14
4  New American Restaurant  0.00


----Kew Gardens----
                     venue  freq
0       Chinese Restaurant  0.33
1        Indian Restaurant  0.22
2  New American Restaurant  0.11
3      American Restaurant  0.11
4       Italian Restaurant  0.11


----Kew Gardens Hills----
                       venue  freq
0           Sushi Restaurant  0.50
1                 Restaurant  0.25
2  Middle Eastern Restaurant  0.25
3    New American Restaurant  0.00
4           Ramen Restaurant  0.00


----Lefrak City----
                     venue  freq
0               Restaurant   0.5
1       Mexican Restaurant   0.5
2        Afghan Restaurant   0.0
3  New American Restaurant   0.0
4         Ramen Restaurant   0.0


----Lenox Hill----
                venue  freq
0  Italian Restaurant  0.19
1    Sushi Restaurant  0.15
...

                     venue  freq
0       Mexican Restaurant  0.20
1          Thai Restaurant  0.12
2      American Restaurant  0.08
3  New American Restaurant  0.08
4               Restaurant  0.08


----Prospect Lefferts Gardens----
                  venue  freq
0  Caribbean Restaurant  0.25
1      Sushi Restaurant  0.17
2     Indian Restaurant  0.17
3    Italian Restaurant  0.08
4      Ramen Restaurant  0.08


----Prospect Park South----
                             venue  freq
0             Caribbean Restaurant  0.33
1             Fast Food Restaurant  0.25
2        Latin American Restaurant  0.17
3               Mexican Restaurant  0.17
4  Southern / Soul Food Restaurant  0.08


----Queensboro Hill----
                           venue  freq
0             Chinese Restaurant   0.5
1  Vegetarian / Vegan Restaurant   0.1
2               Asian Restaurant   0.1
3            Shanghai Restaurant   0.1
4            Dumpling Restaurant   0.1


----Queensbridge----
                venue  freq
...

                       venue  freq
0           Sushi Restaurant   0.2
1         Chinese Restaurant   0.2
2        American Restaurant   0.2
3         Italian Restaurant   0.2
4  Middle Eastern Restaurant   0.2


----Wingate----
                  venue  freq
0  Fast Food Restaurant   1.0
1     Afghan Restaurant   0.0
2     Korean Restaurant   0.0
3            Restaurant   0.0
4      Ramen Restaurant   0.0


----Woodhaven----
                       venue  freq
0  Latin American Restaurant   0.2
1                 Restaurant   0.2
2           Arepa Restaurant   0.2
3            Thai Restaurant   0.2
4         Chinese Restaurant   0.2


----Woodside----
                       venue  freq
0            Thai Restaurant  0.19
1  Latin American Restaurant  0.15
2        Filipino Restaurant  0.15
3        American Restaurant  0.11
4         Chinese Restaurant  0.07


----Yorkville----
                   venue  freq
0     Italian Restaurant  0.21
1       Sushi Restaurant  0.14
...

In [15]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Here we list every neighborhood within the 10-mile radius that has at least one restaurant, along with its top 5 most common restaurant types.

In [16]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Restaurant'.format(ind+1, indicators[ind]))
    except IndexError:  # no suffix listed in indicators, so fall back to 'th'
        columns.append('{}th Most Common Restaurant'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = newyork_grouped['Neighborhood']

for ind in np.arange(newyork_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(newyork_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|
| 0 | Arlington | American Restaurant | Vietnamese Restaurant | Greek Restaurant | Empanada Restaurant | English Restaurant |
| 1 | Arrochar | Italian Restaurant | Polish Restaurant | Mediterranean Restaurant | Middle Eastern Restaurant | Vietnamese Restaurant |
| 2 | Astoria | Middle Eastern Restaurant | Greek Restaurant | Mediterranean Restaurant | Indian Restaurant | Seafood Restaurant |
| 3 | Astoria Heights | Italian Restaurant | Chinese Restaurant | Vietnamese Restaurant | German Restaurant | Empanada Restaurant |
| 4 | Bath Beach | Chinese Restaurant | Italian Restaurant | Fast Food Restaurant | Cantonese Restaurant | Spanish Restaurant |
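
Because `return_most_common_venues` simply sorts each row, neighborhoods with fewer than five restaurant types are padded with categories whose frequency is 0. Arlington, for example, has American Restaurant as its only type (frequency 1.0 in the earlier printout), so its 2nd to 5th entries are arbitrary zero-frequency columns. A variant that skips zero frequencies and breaks ties alphabetically might look like the sketch below; the function name is hypothetical, and the inputs reuse frequencies printed earlier.

```python
def top_nonzero_categories(frequencies, n=5):
    """Return up to n categories with frequency > 0, highest first, ties broken by name.

    `frequencies` maps category name -> frequency, i.e. one row of newyork_grouped
    without the 'Neighborhood' column (row.iloc[1:].to_dict() in pandas).
    """
    nonzero = [(cat, f) for cat, f in frequencies.items() if f > 0]
    nonzero.sort(key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in nonzero[:n]]

arlington = {'American Restaurant': 1.0, 'Vietnamese Restaurant': 0.0, 'Greek Restaurant': 0.0}
arrochar = {'Italian Restaurant': 0.4, 'Polish Restaurant': 0.2,
            'Mediterranean Restaurant': 0.2, 'Middle Eastern Restaurant': 0.2,
            'Afghan Restaurant': 0.0}
print(top_nonzero_categories(arlington))
print(top_nonzero_categories(arrochar))
```

To fill the five fixed columns of `neighborhoods_venues_sorted`, the shorter lists could be padded with an empty string so that blank cells, rather than zero-frequency categories, mark missing ranks.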


## Clustering

In [17]:
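k = 5
newyork_grouped_clustering = newyork_grouped.drop(columns=['Neighborhood'])
# n_init=10 makes the older scikit-learn default explicit (newer releases changed it)
kmeans = KMeans(n_clusters=k, random_state=0, n_init=10).fit(newyork_grouped_clustering)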
kmeans.labels_[0:10]

array([2, 0, 0, 4, 0, 2, 0, 3, 0, 4], dtype=int32)

In [18]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [19]:
newyork_merged = neighborhoods1.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

newyork_merged.head()

|   | Borough | Neighborhood | Latitude | Longitude | Distance | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | High Bridge | 40.836623 | -73.926102 | 9.521564 | 4.0 | Chinese Restaurant | Seafood Restaurant | Spanish Restaurant | Latin American Restaurant | Asian Restaurant |
| 1 | Bronx | Melrose | 40.819754 | -73.909422 | 8.956612 | 2.0 | Mexican Restaurant | Vietnamese Restaurant | Greek Restaurant | Empanada Restaurant | English Restaurant |
| 2 | Bronx | Mott Haven | 40.806239 | -73.9161 | 7.993237 | 2.0 | Spanish Restaurant | Peruvian Restaurant | Latin American Restaurant | Vietnamese Restaurant | Empanada Restaurant |
| 3 | Bronx | Port Morris | 40.801664 | -73.913221 | 7.833702 | 2.0 | Spanish Restaurant | Latin American Restaurant | Peruvian Restaurant | Chinese Restaurant | Restaurant |
| 4 | Bronx | Longwood | 40.815099 | -73.895788 | 9.129218 | 2.0 | Latin American Restaurant | Fast Food Restaurant | Vietnamese Restaurant | Greek Restaurant | Empanada Restaurant |
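
Both the groupby that built `newyork_grouped` and this join key on the neighborhood name alone. If the same name occurs in more than one borough, those neighborhoods' restaurants would be pooled into a single row and every copy would receive the same cluster label. A quick check for such names is sketched below in plain Python on (borough, neighborhood) pairs; the helper name and toy pairs are illustrative, not taken from the data set.

```python
from collections import defaultdict

def names_in_multiple_boroughs(pairs):
    """Return {neighborhood: sorted boroughs} for names that appear in more than one borough."""
    boroughs = defaultdict(set)
    for borough, neighborhood in pairs:
        boroughs[neighborhood].add(borough)
    return {name: sorted(b) for name, b in boroughs.items() if len(b) > 1}

toy_pairs = [('Manhattan', 'Neighborhood X'), ('Staten Island', 'Neighborhood X'),
             ('Queens', 'Neighborhood Y')]
print(names_in_multiple_boroughs(toy_pairs))
```

On the real data the same check reads `neighborhoods1.groupby('Neighborhood')['Borough'].nunique() > 1`; if it flags any names, grouping on both Borough and Neighborhood would keep them apart.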


### Examine Clusters

Let us examine each cluster and see whether Filipino restaurants appear among the top 5 most common restaurant types in its neighborhoods.

In [20]:
cluster_0=newyork_merged.loc[newyork_merged['Cluster Labels'] == 0, newyork_merged.columns[[1,2,3] + list(range(5, newyork_merged.shape[1]))]]
cluster_0.head()

|   | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|---|---|---|
| 5 | Hunts Point | 40.80973 | -73.883315 | 0.0 | Restaurant | Spanish Restaurant | Vietnamese Restaurant | Egyptian Restaurant | Empanada Restaurant |
| 8 | Bay Ridge | 40.625801 | -74.030621 | 0.0 | Italian Restaurant | Chinese Restaurant | American Restaurant | Greek Restaurant | Thai Restaurant |
| 9 | Bensonhurst | 40.611009 | -73.99518 | 0.0 | Italian Restaurant | Sushi Restaurant | Russian Restaurant | Hotpot Restaurant | Shabu-Shabu Restaurant |
| 11 | Greenpoint | 40.730201 | -73.954241 | 0.0 | Sushi Restaurant | Mexican Restaurant | French Restaurant | Restaurant | New American Restaurant |
| 13 | Brighton Beach | 40.576825 | -73.965094 | 0.0 | Restaurant | Russian Restaurant | Eastern European Restaurant | Sushi Restaurant | Mediterranean Restaurant |


In [21]:
cluster_1=newyork_merged.loc[newyork_merged['Cluster Labels'] == 1, newyork_merged.columns[[1,2,3] + list(range(5, newyork_merged.shape[1]))]]
cluster_1.head()

|   | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|---|---|---|
| 18 | East Flatbush | 40.641718 | -73.936103 | 1.0 | Chinese Restaurant | Caribbean Restaurant | Vietnamese Restaurant | Egyptian Restaurant | English Restaurant |
| 35 | Starrett City | 40.647589 | -73.87937 | 1.0 | American Restaurant | Caribbean Restaurant | Vietnamese Restaurant | Greek Restaurant | English Restaurant |
| 36 | Canarsie | 40.635564 | -73.902093 | 1.0 | Thai Restaurant | Asian Restaurant | Caribbean Restaurant | Vietnamese Restaurant | Greek Restaurant |
| 37 | Flatlands | 40.630446 | -73.929113 | 1.0 | Fast Food Restaurant | Caribbean Restaurant | Chinese Restaurant | Vietnamese Restaurant | Egyptian Restaurant |
| 40 | Coney Island | 40.574293 | -73.988683 | 1.0 | Caribbean Restaurant | Vegetarian / Vegan Restaurant | Egyptian Restaurant | English Restaurant | Ethiopian Restaurant |


In [22]:
cluster_2=newyork_merged.loc[newyork_merged['Cluster Labels'] == 2, newyork_merged.columns[[1,2,3] + list(range(5, newyork_merged.shape[1]))]]
cluster_2.head()

|   | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Melrose | 40.819754 | -73.909422 | 2.0 | Mexican Restaurant | Vietnamese Restaurant | Greek Restaurant | Empanada Restaurant | English Restaurant |
| 2 | Mott Haven | 40.806239 | -73.9161 | 2.0 | Spanish Restaurant | Peruvian Restaurant | Latin American Restaurant | Vietnamese Restaurant | Empanada Restaurant |
| 3 | Port Morris | 40.801664 | -73.913221 | 2.0 | Spanish Restaurant | Latin American Restaurant | Peruvian Restaurant | Chinese Restaurant | Restaurant |
| 4 | Longwood | 40.815099 | -73.895788 | 2.0 | Latin American Restaurant | Fast Food Restaurant | Vietnamese Restaurant | Greek Restaurant | Empanada Restaurant |
| 6 | Morrisania | 40.823592 | -73.901506 | 2.0 | Fast Food Restaurant | Seafood Restaurant | Vietnamese Restaurant | Eastern European Restaurant | Empanada Restaurant |


In [23]:
cluster_3=newyork_merged.loc[newyork_merged['Cluster Labels'] == 3, newyork_merged.columns[[1,2,3] + list(range(5, newyork_merged.shape[1]))]]
cluster_3.head()

|   | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|---|---|---|
| 15 | Manhattan Terrace | 40.614433 | -73.957438 | 3.0 | Japanese Restaurant | Greek Restaurant | Empanada Restaurant | English Restaurant | Ethiopian Restaurant |
| 25 | Bedford Stuyvesant | 40.687232 | -73.941785 | 3.0 | Japanese Restaurant | New American Restaurant | Jewish Restaurant | Empanada Restaurant | English Restaurant |
| 113 | Lindenwood | 40.663918 | -73.849638 | 3.0 | Japanese Restaurant | Greek Restaurant | Empanada Restaurant | English Restaurant | Ethiopian Restaurant |
| 127 | Castleton Corners | 40.613336 | -74.119181 | 3.0 | Japanese Restaurant | Greek Restaurant | Empanada Restaurant | English Restaurant | Ethiopian Restaurant |


In [24]:
cluster_4=newyork_merged.loc[newyork_merged['Cluster Labels'] == 4, newyork_merged.columns[[1,2,3] + list(range(5, newyork_merged.shape[1]))]]
cluster_4.head()

|   | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|---|---|---|
| 0 | High Bridge | 40.836623 | -73.926102 | 4.0 | Chinese Restaurant | Seafood Restaurant | Spanish Restaurant | Latin American Restaurant | Asian Restaurant |
| 7 | Concourse | 40.834284 | -73.915589 | 4.0 | Spanish Restaurant | Italian Restaurant | Chinese Restaurant | Caribbean Restaurant | Vietnamese Restaurant |
| 12 | Gravesend | 40.59526 | -73.973471 | 4.0 | Italian Restaurant | Chinese Restaurant | Vietnamese Restaurant | German Restaurant | Empanada Restaurant |
| 22 | Brownsville | 40.66395 | -73.910235 | 4.0 | Restaurant | Chinese Restaurant | Spanish Restaurant | Vietnamese Restaurant | Egyptian Restaurant |
| 45 | Marine Park | 40.609748 | -73.931344 | 4.0 | Chinese Restaurant | Vietnamese Restaurant | Eastern European Restaurant | Empanada Restaurant | English Restaurant |


Let us plot all five clusters on a single map, using a different color for each cluster.

In [25]:
map_clusall = folium.Map(location=[latitude, longitude], zoom_start=11)

# one (outline color, fill color) pair per cluster, in cluster order 0-4
cluster_colors = [('blue', '#3186cc'), ('red', '#ff0000'), ('yellow', '#ffff00'),
                  ('green', '#00FF00'), ('violet', '#7f00ff')]
clusters = [cluster_0, cluster_1, cluster_2, cluster_3, cluster_4]

for cluster_df, (color, fill_color) in zip(clusters, cluster_colors):
    for lat, lng, label in zip(cluster_df['Latitude'], cluster_df['Longitude'], cluster_df['Neighborhood']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=fill_color,
            fill_opacity=0.7,
            parse_html=False).add_to(map_clusall)

map_clusall

Next, we want to know which neighborhoods actually have a Filipino restaurant and where Filipino restaurants rank among the top 5 restaurant types in each of them.

In [26]:
# neighborhoods that contain at least one Filipino restaurant, taken from fil_rest
filter_list = fil_rest['Neighborhood'].unique()
fil_clus=newyork_merged[newyork_merged.Neighborhood.isin(filter_list)]
fil_clus

|   | Borough | Neighborhood | Latitude | Longitude | Distance | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 28 | Brooklyn | Carroll Gardens | 40.68054 | -73.994654 | 2.299783 | 0.0 | Italian Restaurant | French Restaurant | Thai Restaurant | Sushi Restaurant | Latin American Restaurant |
| 78 | Manhattan | East Village | 40.727847 | -73.982226 | 1.627299 | 0.0 | Mexican Restaurant | Ramen Restaurant | Japanese Restaurant | Vietnamese Restaurant | Italian Restaurant |
| 79 | Manhattan | Lower East Side | 40.717807 | -73.98089 | 1.364899 | 0.0 | Chinese Restaurant | Ramen Restaurant | Latin American Restaurant | Japanese Restaurant | Filipino Restaurant |
| 86 | Manhattan | Gramercy | 40.73721 | -73.981376 | 2.12762 | 0.0 | Mexican Restaurant | Italian Restaurant | American Restaurant | Thai Restaurant | Sushi Restaurant |
| 90 | Queens | Woodside | 40.746349 | -73.901842 | 5.940073 | 2.0 | Thai Restaurant | Latin American Restaurant | Filipino Restaurant | American Restaurant | Chinese Restaurant |
| 120 | Staten Island | Rosebank | 40.615305 | -74.069805 | 7.511623 | 0.0 | Italian Restaurant | Mexican Restaurant | Eastern European Restaurant | Filipino Restaurant | Cajun / Creole Restaurant |
| 164 | Queens | Sunnyside Gardens | 40.745652 | -73.918193 | 5.1394 | 0.0 | Turkish Restaurant | American Restaurant | Korean Restaurant | French Restaurant | Peruvian Restaurant |
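
To rank neighborhoods by how many Filipino restaurants Foursquare returned, count the rows of `fil_rest` per neighborhood (`fil_rest['Neighborhood'].value_counts()` in pandas). The plain-Python sketch below uses only the ten `fil_rest` rows printed earlier; `fil_rest` has 12 rows, so the two rows missing from that printout are not counted here.

```python
from collections import Counter

# Neighborhood column of the ten fil_rest rows shown earlier (2 of the 12 rows are not in the printout)
neighborhoods_with_filipino = (['Carroll Gardens'] + ['East Village'] * 2 + ['Lower East Side'] +
                               ['Gramercy'] + ['Woodside'] * 4 + ['Rosebank'])

counts = Counter(neighborhoods_with_filipino)
for name, count in counts.most_common():
    print(name, count)
```

Even from this partial view, Woodside stands out with four Filipino restaurants inside a single 500 m search circle.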


Finally, let us visualize the neighborhoods that contain at least one Filipino restaurant.

In [27]:
map_fil = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, label in zip(fil_clus['Latitude'],fil_clus['Longitude'], fil_clus['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_fil)
    
map_fil

## Results and Discussion <a name="results"></a>

Our analysis shows that although there are a great number of restaurants within the 10-mile radius of the New York City center, only a handful of them are Filipino restaurants. The highest concentration of Filipino restaurants is in the Woodside neighborhood of Queens. The next highest concentrations are in Sunnyside Gardens, Queens, and the East Village, Manhattan.

Only 12 of the 1903 restaurants, or 0.63% of the total, are Filipino restaurants, so a new Filipino restaurant would face little competition from restaurants of the same category. Clustering grouped the neighborhoods into 5 clusters, and Filipino restaurants rarely appear among the top 5 most common restaurant types: they do so only in Woodside (3rd), Rosebank (4th), and the Lower East Side (5th). Filtering the data to the neighborhoods that contain a Filipino restaurant showed that all of them fall in clusters 0 and 2. This does not imply that clusters 0 and 2 are optimal locations for a new restaurant. The purpose of this analysis was only to describe the restaurant landscape within 10 miles of the New York City center, and it found that Filipino restaurants are scarce there, making up only 0.63% of all restaurants.

The listed neighborhoods should therefore be considered only as a starting point for a much more detailed analysis aimed at finding a location for a Filipino restaurant that not only has no nearby competitors of the same category but also meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas within a 10-mile radius of the New York City center that have few Filipino restaurants. Within that radius there is an abundance of restaurants, but only a handful of them are Filipino restaurants. To find the optimal location, stakeholders should look further into the neighborhoods listed above, based on the specific characteristics of each neighborhood and of the candidate locations within it. They should also take into account additional factors such as the attractiveness of each location, foot traffic, proximity to public transport, real estate prices, socioeconomic factors, and the culture of the neighborhood.