# Applied Data Science Capstone

## The Battle of Neighborhoods 

## 1. Introduction (Business Problem & Target Audience)

### 1) Business Problem

Queens is the easternmost of the five boroughs of New York City and has a large Korean population. According to the 2010 United States Census, the Korean population of Queens was 64,107, making it the largest municipality in the United States with a density of at least 500 Korean Americans per square mile. Given the high concentration of Koreans in this area and the growing popularity of Korean food, Queens, NY is an ideal location to open a Korean restaurant.

However, many Korean restaurants already operate in this area, and the market is highly competitive. Because Queens is part of a highly developed city, the cost of doing business is also among the highest. Any new business venture or expansion therefore needs to be analysed carefully.

Accordingly, the aim of this study is to help Koreans who are planning to open a new Korean restaurant in Queens, NY choose the right location by providing relevant data.

### 2) Target Audience

The target audience is Koreans who are planning to open a restaurant in Queens, so I will focus only on that borough in my analysis. The objective is to identify and recommend to management the neighborhood of Queens that is best suited for opening a restaurant. Management should also be able to understand the rationale behind the recommendations.

## 2. Data

### 1) Data 1: New York City neighborhood dataset

Link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

New York City has a total of 5 boroughs and 306 neighborhoods. To segment and explore the neighborhoods, I need a dataset that contains the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood. This dataset is available on the web at the link above.
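Each element of the file's `features` list is a GeoJSON feature, and the loader code later in this notebook reads `properties.borough`, `properties.name` and `geometry.coordinates` from it. Below is a minimal sketch of that extraction on a hand-written sample feature (illustrative only: it reuses Astoria's coordinates from the Queens table and omits the other fields of the real records). Note that GeoJSON lists a point as longitude first, then latitude.

```python
def feature_to_row(feature):
    """Turn one GeoJSON neighborhood feature into (borough, name, lat, lon)."""
    props = feature['properties']
    # GeoJSON point order is [longitude, latitude]
    lon, lat = feature['geometry']['coordinates'][:2]
    return props['borough'], props['name'], lat, lon


# Hand-written sample feature (illustrative, not copied from the dataset)
sample_feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point', 'coordinates': [-73.915654, 40.768509]},
    'properties': {'borough': 'Queens', 'name': 'Astoria'},
}

row = feature_to_row(sample_feature)
print(row)  # ('Queens', 'Astoria', 40.768509, -73.915654)
```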

### 2) Data 2: Foursquare API (Korean Restaurant category ID: 4bf58dd8d48988d113941735)

The geographical coordinates of the New York City neighborhoods will be used as input to the Foursquare API, which will provide venue information for each neighborhood. We will use the Foursquare API to explore the neighborhoods of New York City.
In addition, the Korean Restaurant category ID 4bf58dd8d48988d113941735 is passed to the Foursquare API to retrieve Korean restaurant venues.
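A minimal sketch of how a category-restricted search request can be assembled, assuming the v2 `venues/search` endpoint and the parameters that `getNearbyVenues` uses in this notebook (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`, `categoryId`). `build_search_url` is a hypothetical helper, the credentials are placeholders, the default `limit=50` assumes the endpoint's per-request maximum of 50, and the function only builds the URL; it does not call the API.

```python
from urllib.parse import urlencode

# Placeholder credentials (assumption): substitute your own Foursquare keys
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180604'
KOREAN_RESTAURANT_ID = '4bf58dd8d48988d113941735'


def build_search_url(lat, lng, radius=1000, limit=50, category_id=KOREAN_RESTAURANT_ID):
    """Build a v2 venues/search URL restricted to one venue category (hypothetical helper)."""
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude" of the search center
        'radius': radius,                # search radius in meters
        'limit': limit,
        'categoryId': category_id,
    }
    # urlencode percent-escapes values, e.g. the comma in ll becomes %2C
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)


url = build_search_url(40.768509, -73.915654)  # Astoria's center
print(url)
```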

## 3. Methodology

In this project, I will use the same basic methodology as in the Week 3 lab.

First, I will convert addresses into their equivalent latitude and longitude values. Then I will use the Foursquare API to explore the neighborhoods in Queens, NY. After that, I will obtain the most common venue categories in each neighborhood and use this information to group the neighborhoods into clusters with the k-means clustering algorithm. Finally, I will use the Folium library to visualize the neighborhoods in Queens, NY.
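To make the clustering step concrete, here is a minimal pure-Python version of Lloyd's k-means, which alternates an assignment step and a centroid-update step. It is an illustration, not scikit-learn's implementation (scikit-learn uses k-means++ initialization by default; this sketch simply takes the first k points as initial centroids). The vectors are three of the category-frequency columns computed in section 7) (Korean Restaurant, Fried Chicken Joint, Cajun / Creole Restaurant) for five neighborhoods; other columns are left out for brevity.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's k-means; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# (Korean, Fried Chicken Joint, Cajun / Creole) frequencies from section 7)
neighborhoods = ['Astoria', 'Douglaston', 'Elmhurst', 'Corona', 'Forest Hills']
points = [
    (1.0, 0.0, 0.0),
    (0.875, 0.0, 0.125),
    (0.888889, 0.111111, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
]

labels, centroids = kmeans(points, k=3)
print(dict(zip(neighborhoods, labels)))
# {'Astoria': 0, 'Douglaston': 1, 'Elmhurst': 2, 'Corona': 0, 'Forest Hills': 0}
```

As in the notebook's own result, Douglaston and Elmhurst end up separated from the all-Korean neighborhoods; the cluster numbers themselves are arbitrary.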

### 1) Import Libraries

In this section, I import the libraries required to process the data.

The first library is Pandas. Pandas is an open source, BSD-licensed library, providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

In [1]:
```python
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values
import json  # handle JSON files
import requests  # call the Foursquare API
from pandas import json_normalize  # top-level import since pandas 1.0 (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library

print('Libraries imported.')
```

Libraries imported.


### 2) Download and Explore Dataset

In [2]:
```python
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
```

Data downloaded!


In [3]:
```python
neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood and build the dataframe once at the end
# (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# instantiate the dataframe
neighborhoods = pd.DataFrame(rows, columns=column_names)
```

### 3) Transform the data into a pandas dataframe

In [4]:
```python
neighborhoods.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
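To check the dataframe against the totals stated in the Data section (5 boroughs, 306 neighborhoods), counting rows per borough is enough. A minimal sketch on a small hand-built frame whose rows are copied from the tables in this notebook (an illustrative subset, not the full dataset); run on the full `neighborhoods` frame, the same calls would give the real counts.

```python
import pandas as pd

# Illustrative subset: rows copied from the Bronx and Queens tables in this notebook
sample = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Bronx', 'Queens', 'Queens'],
    'Neighborhood': ['Wakefield', 'Co-op City', 'Eastchester', 'Astoria', 'Woodside'],
})

n_boroughs = sample['Borough'].nunique()      # number of distinct boroughs
n_neighborhoods = len(sample)                 # one row per neighborhood
per_borough = sample['Borough'].value_counts()  # neighborhoods per borough

print(n_boroughs, n_neighborhoods)
print(per_borough)
```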


### 4) Use geopy library to get the latitude and longitude values of New York City.

In [5]:
```python
address = 'New York City, NY'

# Nominatim requires a user_agent that identifies the application
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of New York City are 40.7127281, -74.0060152.


To define an instance of the geocoder, Nominatim needs a `user_agent`; the cell above names it `ny_explorer`. Next, I keep only the neighborhoods in the Queens borough.

In [6]:
```python
Queens_data = neighborhoods[neighborhoods['Borough'] == 'Queens'].reset_index(drop=True)
Queens_data.head()
```

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Queens | Astoria | 40.768509 | -73.915654 |
| 1 | Queens | Woodside | 40.746349 | -73.901842 |
| 2 | Queens | Jackson Heights | 40.751981 | -73.882821 |
| 3 | Queens | Elmhurst | 40.744049 | -73.881656 |
| 4 | Queens | Howard Beach | 40.654225 | -73.838138 |


### 5) Create a map of Queens, NY with neighborhoods superimposed on top.

In [7]:
```python
# create a map centered on New York City and mark the Queens neighborhoods
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Queens_data['Latitude'], Queens_data['Longitude'], Queens_data['Borough'], Queens_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)

map_newyork
```


In [8]:
```python
def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds=''):
    """Search Foursquare (v2 venues/search) around each neighborhood center and
    return one row per venue that has at least one category.

    Uses the CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT globals defined below.
    """
    url = 'https://api.foursquare.com/v2/venues/search'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the API request parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        if categoryIds != '':
            params['categoryId'] = categoryIds

        # make the GET request
        response = requests.get(url, params=params).json()
        try:
            results = response['response']['venues']
        except KeyError:
            # e.g. quota exceeded or bad credentials: report the error and skip this neighborhood
            print(name, response.get('meta'))
            continue

        # keep only the relevant information for each venue that has a category
        for v in results:
            if not v.get('categories'):
                continue
            venues_list.append((
                name,
                lat,
                lng,
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                v['categories'][0]['name'],
            ))

    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues
```
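Because the API is rate-limited, it can help to check the parsing logic offline before spending requests. A minimal sketch, assuming the reply shape that `getNearbyVenues` reads (`response.venues[]` with `name`, `location.lat`, `location.lng` and `categories[0].name`). `parse_venues` is a hypothetical helper and the reply is hand-written: Kal's coordinates come from the Astoria results table, and the second venue is an uncategorized placeholder that should be skipped.

```python
def parse_venues(response, name, lat, lng):
    """Flatten a venues/search JSON reply into rows; venues without a category are skipped."""
    rows = []
    for v in response['response']['venues']:
        if not v.get('categories'):
            continue
        rows.append((name, lat, lng, v['name'],
                     v['location']['lat'], v['location']['lng'],
                     v['categories'][0]['name']))
    return rows


# Hand-written reply (illustrative) containing only the fields parse_venues reads
mock_response = {'response': {'venues': [
    {'name': 'Kal', 'location': {'lat': 40.76575, 'lng': -73.91871},
     'categories': [{'name': 'Korean Restaurant'}]},
    {'name': 'Placeholder venue', 'location': {'lat': 40.7, 'lng': -73.9},
     'categories': []},
]}}

rows = parse_venues(mock_response, 'Astoria', 40.768509, -73.915654)
print(rows)
```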

### 6) Foursquare venues

In [9]:
```python
LIMIT = 500  # note: the v2 venues/search endpoint returns at most 50 venues per request
radius = 5000
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
VERSION = '20180604'
```


In [10]:
```python
# https://developer.foursquare.com/docs/resources/categories
# Korean Restaurant category ID = 4bf58dd8d48988d113941735
newyork_venues_Korean = getNearbyVenues(names=Queens_data['Neighborhood'],
                                        latitudes=Queens_data['Latitude'],
                                        longitudes=Queens_data['Longitude'],
                                        radius=1000,
                                        categoryIds='4bf58dd8d48988d113941735')
newyork_venues_Korean.head()
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Astoria | 40.768509 | -73.915654 | Kal | 40.76575 | -73.91871 | Korean Restaurant |
| 1 | Astoria | 40.768509 | -73.915654 | Drunken Chicken | 40.762962 | -73.92768 | Korean Restaurant |
| 2 | Astoria | 40.768509 | -73.915654 | Mokja | 40.760224 | -73.921423 | Korean Restaurant |
| 3 | Astoria | 40.768509 | -73.915654 | Bonchon | 40.762931 | -73.927423 | Korean Restaurant |
| 4 | Woodside | 40.746349 | -73.901842 | Unidentified Flying Chickens (UFC) | 40.746487 | -73.894057 | Fried Chicken Joint |


In [11]:
```python
newyork_venues_Korean.shape
```

(269, 7)

In [12]:
```python
def addToMap(df, color, existingMap):
    # add one colored circle marker per venue, labeled "venue (category) - neighborhood"
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Neighborhood'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)
```

In [13]:
```python
map_newyork_Korean = folium.Map(location=[latitude, longitude], zoom_start=10)
addToMap(newyork_venues_Korean, 'red', map_newyork_Korean)

map_newyork_Korean
```

In [14]:
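```python
def addColumn(startDf, columnTitle, dataDf):
    # count the venues per neighborhood in dataDf and store the count in startDf[columnTitle]
    grouped = dataDf.groupby('Neighborhood').count()

    for n in startDf['Neighborhood']:
        try:
            startDf.loc[startDf['Neighborhood'] == n, columnTitle] = grouped.loc[n, 'Venue']
        except KeyError:
            # the neighborhood has no venues in dataDf
            startDf.loc[startDf['Neighborhood'] == n, columnTitle] = 0
```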
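`addColumn` stores a per-neighborhood venue count and falls back to 0 for neighborhoods with no matching venue. A shorter vectorized alternative, offered as a sketch rather than the notebook's method: count rows per neighborhood with `groupby(...).size()` and `reindex` onto the full neighborhood list with `fill_value=0`. The venue names below are placeholders; Howard Beach is included because it returned no venues in this analysis.

```python
import pandas as pd

# Placeholder venue rows (illustrative): only the Neighborhood column matters here
venues = pd.DataFrame({
    'Neighborhood': ['Astoria', 'Astoria', 'Corona', 'Douglaston'],
    'Venue': ['venue_1', 'venue_2', 'venue_3', 'venue_4'],
})
all_neighborhoods = ['Astoria', 'Corona', 'Douglaston', 'Howard Beach']

# rows per neighborhood; neighborhoods missing from `venues` get 0
counts = (venues.groupby('Neighborhood').size()
          .reindex(all_neighborhoods, fill_value=0))
print(counts)
```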

In [15]:
```python
Queens_grouped = newyork_venues_Korean.groupby('Neighborhood').count()
Queens_grouped
#print('There are {} uniques categories.'.format(len(newyork_venues_Korean['Venue Category'].unique())))
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Astoria | 4 | 4 | 4 | 4 | 4 | 4 |
| Auburndale | 36 | 36 | 36 | 36 | 36 | 36 |
| Bayside | 36 | 36 | 36 | 36 | 36 | 36 |
| Blissville | 3 | 3 | 3 | 3 | 3 | 3 |
| College Point | 2 | 2 | 2 | 2 | 2 | 2 |
| Corona | 1 | 1 | 1 | 1 | 1 | 1 |
| Douglaston | 8 | 8 | 8 | 8 | 8 | 8 |
| Elmhurst | 9 | 9 | 9 | 9 | 9 | 9 |
| Flushing | 49 | 49 | 49 | 49 | 49 | 49 |
| Forest Hills | 1 | 1 | 1 | 1 | 1 | 1 |


### 7) Analyze Each Neighborhood in Queens

In [16]:
```python
# one hot encoding (dtype=int keeps 0/1 integers; pandas 2.x defaults to booleans)
Queens_onehot = pd.get_dummies(newyork_venues_Korean[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
Queens_onehot['Neighborhood'] = newyork_venues_Korean['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [Queens_onehot.columns[-1]] + list(Queens_onehot.columns[:-1])
Queens_onehot = Queens_onehot[fixed_columns]

Queens_onehot.head()
```

| | Neighborhood | American Restaurant | Asian Restaurant | Bakery | Cajun / Creole Restaurant | Chinese Restaurant | Fried Chicken Joint | Korean Restaurant | New American Restaurant | Sushi Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | Astoria | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | Woodside | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |


In [17]:
```python
Queens_grouped = Queens_onehot.groupby('Neighborhood').mean().reset_index()
Queens_grouped
```

| | Neighborhood | American Restaurant | Asian Restaurant | Bakery | Cajun / Creole Restaurant | Chinese Restaurant | Fried Chicken Joint | Korean Restaurant | New American Restaurant | Sushi Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Astoria | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 1 | Auburndale | 0.027778 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.944444 | 0.0 | 0.027778 |
| 2 | Bayside | 0.0 | 0.027778 | 0.027778 | 0.0 | 0.0 | 0.0 | 0.944444 | 0.0 | 0.0 |
| 3 | Blissville | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 4 | College Point | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 5 | Corona | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 6 | Douglaston | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 | 0.0 | 0.875 | 0.0 | 0.0 |
| 7 | Elmhurst | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.111111 | 0.888889 | 0.0 | 0.0 |
| 8 | Flushing | 0.0 | 0.0 | 0.0 | 0.0 | 0.040816 | 0.0 | 0.959184 | 0.0 | 0.0 |
| 9 | Forest Hills | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
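Averaging the one-hot columns per neighborhood turns counts into proportions. For example, Douglaston has 8 venues in the count table and a Korean Restaurant frequency of 0.875, so 7 of its 8 venues are Korean restaurants and the remaining 1 (0.125) is Cajun / Creole. A small sketch reproducing that row with the same `get_dummies` then `groupby(...).mean()` steps; the 7/1 split is derived from the table, not from raw API data.

```python
import pandas as pd

# Douglaston's 8 venues, reconstructed from the frequencies above: 7 Korean, 1 Cajun / Creole
venues = pd.DataFrame({
    'Neighborhood': ['Douglaston'] * 8,
    'Venue Category': ['Korean Restaurant'] * 7 + ['Cajun / Creole Restaurant'],
})

onehot = pd.get_dummies(venues['Venue Category'], dtype=int)  # one 0/1 column per category
onehot['Neighborhood'] = venues['Neighborhood']
freq = onehot.groupby('Neighborhood').mean()  # mean of 0/1 values = share of venues
print(freq)
```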


In [18]:
```python
Queens_onehot.shape
```

(269, 10)

In [19]:
```python
def return_most_common_venues(row, num_top_venues):
    # skip the Neighborhood column, then sort the category frequencies in descending order
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

In [20]:
```python
num_top_venues = 7

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Queens_grouped['Neighborhood']

for ind in np.arange(Queens_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Queens_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 0 | Astoria | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 1 | Auburndale | Korean Restaurant | Sushi Restaurant | American Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant |
| 2 | Bayside | Korean Restaurant | Bakery | Asian Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant |
| 3 | Blissville | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 4 | College Point | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
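Most neighborhoods have only one to three categories with a nonzero frequency, so the remaining "most common venue" positions are filled with zero-frequency categories in an arbitrary order (all of Astoria's 2nd to 7th entries have frequency 0). A minimal sketch of an alternative ranking that keeps only categories that actually occur; `top_categories` is a hypothetical helper, and the Series holds a subset of Bayside's columns from the frequency table.

```python
import pandas as pd


def top_categories(freq_row, n):
    """Return up to n categories, most frequent first, skipping zero-frequency ones."""
    nonzero = freq_row[freq_row > 0]
    return list(nonzero.sort_values(ascending=False).index[:n])


# Bayside's category frequencies from the table in section 7) (subset of columns)
bayside = pd.Series({
    'American Restaurant': 0.0,
    'Asian Restaurant': 0.027778,
    'Bakery': 0.027778,
    'Korean Restaurant': 0.944444,
    'Sushi Restaurant': 0.0,
})

result = top_categories(bayside, 7)
print(result)
```

Asian Restaurant and Bakery are tied, so their relative order is still not meaningful; only the set of nonzero categories is.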


#### Cluster the neighborhoods using k-means

In [21]:
```python
# set number of clusters
kclusters = 5

Queens_grouped_clustering = Queens_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the scikit-learn default before version 1.4)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(Queens_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
```

array([0, 0, 0, 0, 0, 0, 2, 3, 0, 0], dtype=int32)

In [22]:
```python
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Queens_merged = Queens_data
Queens_merged = Queens_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Queens_merged.head()
```

| | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Queens | Astoria | 40.768509 | -73.915654 | 0.0 | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 1 | Queens | Woodside | 40.746349 | -73.901842 | 3.0 | Korean Restaurant | Fried Chicken Joint | Sushi Restaurant | New American Restaurant | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 2 | Queens | Jackson Heights | 40.751981 | -73.882821 | 0.0 | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 3 | Queens | Elmhurst | 40.744049 | -73.881656 | 3.0 | Korean Restaurant | Fried Chicken Joint | Sushi Restaurant | New American Restaurant | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 4 | Queens | Howard Beach | 40.654225 | -73.838138 | | | | | | | | |


In [23]:
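```python
# drop neighborhoods without venues; .copy() avoids SettingWithCopyWarning in the next cell
Queens_merged = Queens_merged.dropna(subset=['Cluster Labels']).copy()
```

In [24]:
```python
Queens_merged['Cluster Labels'] = Queens_merged['Cluster Labels'].astype(int)
```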
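In the merged table, Howard Beach has no venues, so its `Cluster Labels` value is NaN and the whole column becomes floating point (0.0, 3.0, ...). That is why rows are dropped before casting back to integers. A small sketch of the same join, dropna and astype sequence on a hand-built frame (three neighborhoods, with labels taken from that table); calling `.copy()` after filtering avoids pandas' SettingWithCopyWarning on the following assignment.

```python
import pandas as pd

neighborhoods = pd.DataFrame({'Neighborhood': ['Astoria', 'Woodside', 'Howard Beach']})
labels = pd.DataFrame({'Neighborhood': ['Astoria', 'Woodside'],
                       'Cluster Labels': [0, 3]})

merged = neighborhoods.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)  # float64, because Howard Beach has no label (NaN)

merged = merged.dropna(subset=['Cluster Labels']).copy()
merged['Cluster Labels'] = merged['Cluster Labels'].astype(int)
print(merged)
```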

In [25]:
```python
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Queens_merged['Latitude'], Queens_merged['Longitude'], Queens_merged['Neighborhood'], Queens_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # cluster labels run from 0 to kclusters - 1
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```


## 4. Result

### K-means clusters: identifying areas with fewer Korean restaurants

Based on the analysis above, the Cluster 1 and Cluster 4 areas are the best places to open a new Korean restaurant.

#### - Cluster 0

In [26]:
```python
Queens_merged.loc[Queens_merged['Cluster Labels'] == 0, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 0 | Astoria | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 2 | Jackson Heights | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 5 | Corona | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 6 | Forest Hills | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 9 | Flushing | Korean Restaurant | Chinese Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Cajun / Creole Restaurant | Bakery |
| 11 | Sunnyside | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 14 | Ridgewood | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 20 | College Point | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 22 | Bayside | Korean Restaurant | Bakery | Asian Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant |
| 23 | Auburndale | Korean Restaurant | Sushi Restaurant | American Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant |


#### - Cluster 1

In [27]:
```python
Queens_merged.loc[Queens_merged['Cluster Labels'] == 1, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 67 | Forest Hills Gardens | New American Restaurant | Korean Restaurant | Sushi Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |


#### - Cluster 2

In [28]:
```python
Queens_merged.loc[Queens_merged['Cluster Labels'] == 2, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 24 | Little Neck | Korean Restaurant | Cajun / Creole Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Bakery |
| 25 | Douglaston | Korean Restaurant | Cajun / Creole Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Bakery |


#### - Cluster 3

In [29]:
```python
Queens_merged.loc[Queens_merged['Cluster Labels'] == 3, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 1 | Woodside | Korean Restaurant | Fried Chicken Joint | Sushi Restaurant | New American Restaurant | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |
| 3 | Elmhurst | Korean Restaurant | Fried Chicken Joint | Sushi Restaurant | New American Restaurant | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |


#### - Cluster 4

In [30]:
```python
Queens_merged.loc[Queens_merged['Cluster Labels'] == 4, Queens_merged.columns[[1] + list(range(5, Queens_merged.shape[1]))]]
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue |
|---|---|---|---|---|---|---|---|---|
| 32 | Oakland Gardens | Korean Restaurant | Sushi Restaurant | New American Restaurant | Fried Chicken Joint | Chinese Restaurant | Cajun / Creole Restaurant | Bakery |


## 5. Discussion

In this section, I discuss the observations I have noted and the recommendations I can make based on the results.

The analysis can be run for a given neighborhood or restaurant type (here, Korean restaurants) to identify the locations with the lowest risk and competition.

This analysis is based on limited data, so its conclusions may or may not hold; with a larger amount of data, the results could be improved.
- There is high competition in Murray Hill, Flushing, Auburndale, and Bayside, so opening a business in these areas is very risky.
- There is low competition in Corona, Forest Hills, Hunters Point, Pomonok, Ridgewood, and Utopia, so opening a business in these areas is less risky.
- A more detailed analysis could add other factors, such as transportation and the demographics of residents.
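The competition ranking behind these bullets can be sketched from the venue counts per neighborhood in the count table in section 6). This minimal example ranks the ten neighborhoods visible in that table (so Murray Hill, Hunters Point, Pomonok, Ridgewood and Utopia are not included) and treats the raw number of venues returned by the Korean Restaurant category search as the measure of competition. That is an assumption: it ignores restaurant size and ratings, and the 1 km search circles of neighboring neighborhoods overlap.

```python
# Venues returned per neighborhood by the Korean Restaurant category search (count table, section 6)
counts = {
    'Astoria': 4, 'Auburndale': 36, 'Bayside': 36, 'Blissville': 3,
    'College Point': 2, 'Corona': 1, 'Douglaston': 8, 'Elmhurst': 9,
    'Flushing': 49, 'Forest Hills': 1,
}

# sort ascending by count; ties keep their original (alphabetical) order
ranked = sorted(counts.items(), key=lambda item: item[1])
least_competition = ranked[:3]
most_competition = ranked[-3:]

print('Least competition:', least_competition)
print('Most competition:', most_competition)
```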

Finally, Foursquare proved to be a good source of data but was frustrating at times: despite having a developer account, I regularly exceeded my hourly limit, which locked me out for the day.

## 6. Conclusion

Although all of the goals of this project were met, there is definitely room for further improvement and development, as noted in the Discussion. With some more work, this analysis could be developed into a full-fledged application to support opening a business in an unfamiliar location.

The same approach can be applied to other neighborhoods or restaurant types, such as sushi restaurants, to identify the locations with the lowest risk and competition.
