# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a **Sushi restaurant** in **The City of New York**, US.

Since there are lots of restaurants in New York, we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no sushi restaurants in the vicinity**. We would also prefer locations **as close to the city center as possible**, assuming that the first two conditions are met.

We will use our data science powers to generate a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to sushi restaurants in the neighborhood, if any
* distance of neighborhood from city center
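The third factor, distance from the city center, is not computed anywhere later in the notebook. A minimal sketch of how it could be derived, assuming the great-circle (haversine) distance on a spherical Earth of mean radius 6371 km is accurate enough at city scale; the function name `distance_km` is hypothetical, not part of the original notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation (assumption)

def distance_km(lat1, lon1, lat2, lon2):
    """Hypothetical helper: haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Chinatown's centroid vs. the geocoded New York City coordinates used later in the notebook
print(round(distance_km(40.715618, -73.994279, 40.7127281, -74.0060152), 3))
```

Applied row by row (for example with `DataFrame.apply`), this would give each neighborhood a distance-to-center column that can be ranked alongside the venue counts.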

We use the neighborhood center coordinates provided by the New York neighborhood dataset described below to define our neighborhoods.

The following data sources will be needed to extract/generate the required information:
* New York City has a total of 5 boroughs and 306 neighborhoods. To segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood. This dataset is freely available on the web at **https://geo.nyu.edu/catalog/nyu_2451_34572**
* The New York City geographical coordinates will be used as input for the Foursquare API, which provides venue information for each neighborhood. In addition, the Sushi category ID 4bf58dd8d48988d1d2941735 is used to retrieve sushi venues from the Foursquare API.
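Later in the notebook the Foursquare request URL is built by string formatting. A sketch of the same v2 `venues/search` query assembled with the standard library's `urlencode`, which escapes every value; the parameter names (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`, `categoryId`) are copied from the notebook's URL, while the helper `build_search_url` and the placeholder credentials are hypothetical. Nothing is sent over the network.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, version, lat, lng,
                     radius=1000, limit=500, category_id=None):
    """Hypothetical helper: return the encoded venues/search URL without requesting it."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),   # "lat,lng", as in the notebook's URL
        'radius': radius,
        'limit': limit,
    }
    if category_id:
        params['categoryId'] = category_id  # e.g. the Sushi category ID
    return SEARCH_ENDPOINT + '?' + urlencode(params)

url = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20210520',
                       40.715618, -73.994279,
                       category_id='4bf58dd8d48988d1d2941735')
print(url)
```

The same `params` dictionary could also be passed as `requests.get(SEARCH_ENDPOINT, params=params)`, which performs the encoding itself.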

## Import Libraries

In this section we import the libraries that will be required to process the data.

The first library is Pandas. Pandas is an open source, BSD-licensed library, providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

In [1]:
import json

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests
from pandas import json_normalize  # pandas.io.json.json_normalize was removed in pandas 2.0

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

Libraries imported.


## Download and Explore Dataset

As described in the Data section, we need a dataset that contains New York's 5 boroughs, the 306 neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood.

The dataset is freely available on the web at https://geo.nyu.edu/catalog/nyu_2451_34572



In [2]:
with open('C:\\newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

### Transform the data into a pandas dataframe

In [3]:
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one record per neighborhood, then build the dataframe once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# instantiate the dataframe
neighborhoods = pd.DataFrame(rows, columns=column_names)
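An alternative sketch of the same transformation using `pandas.json_normalize`, which flattens nested keys into dotted column names such as `properties.borough`. The two-feature `sample` below is made-up data that only mimics the GeoJSON layout the loop reads (`properties.borough`, `properties.name`, `geometry.coordinates`); it is not an excerpt of the real file.

```python
import pandas as pd

# Made-up two-feature sample mimicking the structure of newyork_data['features']
sample = {'features': [
    {'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
     'geometry': {'type': 'Point', 'coordinates': [-73.847201, 40.894705]}},
    {'properties': {'borough': 'Manhattan', 'name': 'Chinatown'},
     'geometry': {'type': 'Point', 'coordinates': [-73.994279, 40.715618]}},
]}

flat = pd.json_normalize(sample['features'])  # nested dicts -> dotted column names
df = pd.DataFrame({
    'Borough': flat['properties.borough'],
    'Neighborhood': flat['properties.name'],
    # GeoJSON order is [lon, lat]: index 1 is latitude, index 0 is longitude
    'Latitude': flat['geometry.coordinates'].apply(lambda c: c[1]),
    'Longitude': flat['geometry.coordinates'].apply(lambda c: c[0]),
})
print(df)
```

Building the frame from whole columns avoids growing a dataframe one row at a time.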

In [4]:
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [5]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


### Create a map of Manhattan with its neighborhoods superimposed on top.

In [6]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


In [7]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a marker for each Manhattan neighborhood
for lat, lng, borough, neighborhood in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Borough'], manhattan_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)

map_newyork

## Foursquare venues

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds=''):
    """Search Foursquare (v2 venues/search) around each neighborhood center and
    return one row per venue that has at least one category."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL (uses the globals CLIENT_ID, CLIENT_SECRET, VERSION, LIMIT)
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # optionally restrict the search to the given category ID(s)
        if categoryIds != '':
            url += '&categoryId={}'.format(categoryIds)

        # make the GET request
        response = requests.get(url).json()
        try:
            results = response['response']['venues']
        except KeyError:
            # show what the API returned (e.g. a quota or authentication error) and stop,
            # instead of silently returning an undefined dataframe
            print('Request failed for', name)
            print(response.get('meta', response))
            raise

        # keep only relevant information for each venue that has a category
        for v in results:
            if v.get('categories'):
                venues_list.append((
                    name,
                    lat,
                    lng,
                    v['name'],
                    v['location']['lat'],
                    v['location']['lng'],
                    v['categories'][0]['name']))

    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues
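The parsing step inside `getNearbyVenues` can be exercised offline. A sketch that applies the same filtering rule (skip venues without a category, keep the first category's name) to a hand-made dictionary containing only the keys the function reads from `response['response']['venues']`; the sample data and the helper name `parse_venues` are hypothetical, not real API output.

```python
import pandas as pd

COLUMNS = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

def parse_venues(name, lat, lng, response):
    """Hypothetical helper: turn one search response into the notebook's row format."""
    rows = []
    for v in response['response']['venues']:
        if v.get('categories'):  # venues with no category are skipped, as in the notebook
            rows.append((name, lat, lng, v['name'],
                         v['location']['lat'], v['location']['lng'],
                         v['categories'][0]['name']))
    return pd.DataFrame(rows, columns=COLUMNS)

# Hand-made response: one categorised venue and one without a category
fake_response = {'response': {'venues': [
    {'name': 'Sushi Hatsune', 'location': {'lat': 40.715994, 'lng': -73.992859},
     'categories': [{'name': 'Sushi Restaurant'}]},
    {'name': 'Uncategorised Place', 'location': {'lat': 40.7160, 'lng': -73.9930},
     'categories': []},
]}}

print(parse_venues('Chinatown', 40.715618, -73.994279, fake_response))
```

Separating parsing from the HTTP call makes it possible to test the row format without spending API quota.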

In [10]:
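import os

LIMIT = 500     # number of venues requested per neighborhood (sent as the limit parameter)
radius = 5000   # search radius in metres (not used below: the Manhattan query passes radius=1000)
# Read the Foursquare credentials from environment variables instead of hard-coding them;
# the variable names are this notebook's own choice and must be set before starting Jupyter
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20210520'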

In [11]:
#https://developer.foursquare.com/docs/resources/categories
#Sushi = 4bf58dd8d48988d1d2941735
neighborhoods = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
newyork_venues_sushi = getNearbyVenues(names=neighborhoods['Neighborhood'], latitudes=neighborhoods['Latitude'], longitudes=neighborhoods['Longitude'], radius=1000, categoryIds='4bf58dd8d48988d1d2941735')
newyork_venues_sushi.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Planet Tokyo | 40.886233 | -73.909479 | Sushi Restaurant |
| 1 | Chinatown | 40.715618 | -73.994279 | Sushi Hatsune | 40.715994 | -73.992859 | Sushi Restaurant |
| 2 | Chinatown | 40.715618 | -73.994279 | Nakaji | 40.715912 | -73.996597 | Sushi Restaurant |
| 3 | Chinatown | 40.715618 | -73.994279 | Mikaku Sushi | 40.721419 | -73.996731 | Sushi Restaurant |
| 4 | Chinatown | 40.715618 | -73.994279 | Shinsen | 40.715608 | -73.996611 | Japanese Restaurant |


In [12]:
newyork_venues_sushi.shape

(1096, 7)

In [13]:
def addToMap(df, color, existingMap):
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Neighborhood'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

In [14]:
map_newyork_sushi = folium.Map(location=[latitude, longitude], zoom_start=10)
addToMap(newyork_venues_sushi, 'red', map_newyork_sushi)

map_newyork_sushi

In [15]:
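def addColumn(startDf, columnTitle, dataDf):
    """Add columnTitle to startDf (in place) with the number of venues per neighborhood in dataDf."""
    grouped = dataDf.groupby('Neighborhood').count()

    for n in startDf['Neighborhood']:
        try:
            startDf.loc[startDf['Neighborhood'] == n, columnTitle] = grouped.loc[n, 'Venue']
        except KeyError:
            # the neighborhood has no venues in dataDf
            startDf.loc[startDf['Neighborhood'] == n, columnTitle] = 0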
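`addColumn` looks up each neighborhood in a Python loop. A vectorised sketch of the same count, assuming both frames have the `Neighborhood` column used above: `value_counts` counts rows per neighborhood, `map` aligns those counts to the target frame, and neighborhoods with no venues get 0. Unlike `addColumn` it counts rows rather than non-null `Venue` values (identical when `Venue` is never missing) and returns a copy instead of modifying its input. The name `add_count_column` and the sample frames are hypothetical.

```python
import pandas as pd

def add_count_column(start_df, column_title, data_df):
    """Hypothetical helper: copy of start_df with a per-neighborhood row count from data_df."""
    counts = data_df['Neighborhood'].value_counts()
    out = start_df.copy()
    out[column_title] = out['Neighborhood'].map(counts).fillna(0).astype(int)
    return out

neighborhoods = pd.DataFrame({'Neighborhood': ['Chinatown', 'Inwood', 'Marble Hill']})
venues = pd.DataFrame({'Neighborhood': ['Chinatown', 'Chinatown', 'Marble Hill'],
                       'Venue': ['Sushi Hatsune', 'Nakaji', 'Planet Tokyo']})
result = add_count_column(neighborhoods, 'Sushi Count', venues)
print(result)
```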

In [16]:
manhattan_grouped = newyork_venues_sushi.groupby('Neighborhood').count()
manhattan_grouped
#print('There are {} uniques categories.'.format(len(newyork_venues_sushi['Venue Category'].unique())))

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Battery Park City | 22 | 22 | 22 | 22 | 22 | 22 |
| Carnegie Hill | 24 | 24 | 24 | 24 | 24 | 24 |
| Central Harlem | 3 | 3 | 3 | 3 | 3 | 3 |
| Chelsea | 41 | 41 | 41 | 41 | 41 | 41 |
| Chinatown | 25 | 25 | 25 | 25 | 25 | 25 |
| Civic Center | 32 | 32 | 32 | 32 | 32 | 32 |
| Clinton | 36 | 36 | 36 | 36 | 36 | 36 |
| East Harlem | 3 | 3 | 3 | 3 | 3 | 3 |
| East Village | 50 | 50 | 50 | 50 | 50 | 50 |
| Financial District | 22 | 22 | 22 | 22 | 22 | 22 |


## Methodology <a name="methodology"></a>

In this project, I follow the basic methodology taught in the Week 3 lab. The neighborhood coordinates come from the NYU dataset, and the city's address is converted into latitude and longitude values with geopy's Nominatim geocoder. Then the Foursquare API is used to explore the neighborhoods of Manhattan, New York, retrieving the sushi-related restaurant categories around each neighborhood. These category features are used to group the neighborhoods into clusters with the k-means clustering algorithm. Finally, the Folium library is used to visualize the neighborhoods of Manhattan and the resulting clusters.
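For readers unfamiliar with k-means, a minimal NumPy sketch of Lloyd's algorithm, the assign/update iteration behind k-means: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, and stop when the centroids no longer move. The toy data and fixed initial centroids are assumptions for illustration; scikit-learn's `KMeans`, used in the Analysis section, optimises the same objective but chooses initial centroids with k-means++ by default and, depending on `n_init`, runs several restarts.

```python
import numpy as np

def lloyd_kmeans(X, init_centroids, n_iter=100):
    """Plain Lloyd's k-means: returns (labels, centroids)."""
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each centroid to the mean of its points (keep it in place if no points)
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy rows shaped like the venue-share vectors: [share of sushi, share of Japanese]
X = np.array([[1.0, 0.0], [0.95, 0.05], [0.80, 0.20], [0.75, 0.25]])
labels, centroids = lloyd_kmeans(X, init_centroids=[[1.0, 0.0], [0.7, 0.3]])
print(labels, centroids)
```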

## Analysis <a name="analysis"></a>

In [17]:
# one hot encoding
manhattan_onehot = pd.get_dummies(newyork_venues_sushi[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = newyork_venues_sushi['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

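|   | Neighborhood | Asian Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant | Japanese Restaurant | Noodle House | Ramen Restaurant | Restaurant | Sake Bar | Sandwich Place | Seafood Restaurant | Smoothie Shop | Steakhouse | Sushi Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | Chinatown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | Chinatown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | Chinatown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | Chinatown | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |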
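What the one-hot encoding and the following `groupby('Neighborhood').mean()` compute can be seen on a tiny made-up venue list: each neighborhood's row becomes the fraction of its venues that fall in each category. The five venues below are invented for illustration; `dtype=int` is passed so the dummies are 0/1 integers rather than booleans.

```python
import pandas as pd

# Made-up venues: four in Chinatown (3 sushi, 1 Japanese), one in Marble Hill
venues = pd.DataFrame({
    'Neighborhood': ['Chinatown', 'Chinatown', 'Chinatown', 'Chinatown', 'Marble Hill'],
    'Venue Category': ['Sushi Restaurant', 'Sushi Restaurant', 'Sushi Restaurant',
                       'Japanese Restaurant', 'Sushi Restaurant'],
})

# one column per category, as in the notebook (empty prefix keeps the plain category names)
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# mean of 0/1 indicators per neighborhood = share of venues in each category
shares = onehot.groupby('Neighborhood').mean().reset_index()
print(shares)
```

Because the features are shares, a neighborhood with 3 venues and one with 50 can end up with identical rows, which matters when reading the clusters.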


In [18]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

|   | Neighborhood | Asian Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant | Japanese Restaurant | Noodle House | Ramen Restaurant | Restaurant | Sake Bar | Sandwich Place | Seafood Restaurant | Smoothie Shop | Steakhouse | Sushi Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.090909 | 0.045455 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.863636 | 0.0 |
| 1 | Carnegie Hill | 0.041667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.041667 | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.75 | 0.041667 |
| 2 | Central Harlem | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | Chelsea | 0.04878 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02439 | 0.0 | 0.0 | 0.0 | 0.097561 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.804878 | 0.02439 |
| 4 | Chinatown | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.12 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.88 | 0.0 |
| 5 | Civic Center | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0625 | 0.03125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.90625 | 0.0 |
| 6 | Clinton | 0.027778 | 0.0 | 0.0 | 0.027778 | 0.027778 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.083333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.027778 | 0.0 | 0.0 | 0.805556 | 0.0 |
| 7 | East Harlem | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 8 | East Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.88 | 0.02 |
| 9 | Financial District | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.181818 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.818182 | 0.0 |


In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [20]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | Sushi Restaurant | Japanese Restaurant | Noodle House | Hawaiian Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market |
| 1 | Carnegie Hill | Sushi Restaurant | Japanese Restaurant | Indian Chinese Restaurant | Vegetarian / Vegan Restaurant | Asian Restaurant | Grocery Store | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar |
| 2 | Central Harlem | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 3 | Chelsea | Sushi Restaurant | Japanese Restaurant | Asian Restaurant | Vegetarian / Vegan Restaurant | Fish Market | Seafood Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar |
| 4 | Chinatown | Sushi Restaurant | Japanese Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
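One caveat about these most-common-venue tables: most neighborhoods have only two or three categories with a non-zero share, so from roughly the 3rd column onward the ranking is among tied zeros and the order is effectively arbitrary. For example, Battery Park City lists Hawaiian Restaurant 4th although its share in the table above is 0.0. A sketch of a variant of `return_most_common_venues` that lists only categories that actually occur and leaves the remaining slots blank; the name `top_nonzero_venues` and the sample row are hypothetical.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Hypothetical variant: up to num_top_venues categories with a non-zero share, padded with ''."""
    shares = row.iloc[1:].astype(float)  # skip the Neighborhood column
    # mergesort is stable, so equal shares keep their column order
    shares = shares[shares > 0].sort_values(ascending=False, kind='mergesort')
    names = list(shares.index[:num_top_venues])
    return names + [''] * (num_top_venues - len(names))

# Sample row with a subset of Carnegie Hill's shares from the table above
row = pd.Series({'Neighborhood': 'Carnegie Hill', 'Asian Restaurant': 0.041667,
                 'Bakery': 0.0, 'Japanese Restaurant': 0.125, 'Sushi Restaurant': 0.75})
print(top_nonzero_venues(row, 4))
```

The returned list has exactly `num_top_venues` entries, so it can be assigned to `neighborhoods_venues_sorted.iloc[ind, 1:]` the same way as the original function's result.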


### Cluster Neighborhoods

In [21]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init set explicitly because its default changed in newer scikit-learn)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 2, 3, 2, 1, 1, 2, 3, 1, 4])

In [22]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

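manhattan_merged = manhattan_data
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# neighborhoods with no venues get no cluster label; drop them so the labels can be used as integers
manhattan_merged = manhattan_merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})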

manhattan_merged.head()

|   | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 | 3 | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 | 1 | Sushi Restaurant | Japanese Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 | 3 | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 | 3 | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 | 3 | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |


In [23]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))  # one evenly spaced colour per cluster
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # cluster labels run from 0 to kclusters - 1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [30]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Manhattanville | Sushi Restaurant | Japanese Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |
| 12 | Upper West Side | Sushi Restaurant | Japanese Restaurant | Grocery Store | Asian Restaurant | Seafood Restaurant | Sandwich Place | Sake Bar | Restaurant | Ramen Restaurant | Noodle House |
| 20 | Lower East Side | Sushi Restaurant | Japanese Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |
| 29 | Financial District | Sushi Restaurant | Japanese Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |


In [31]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | Sushi Restaurant | Vegetarian / Vegan Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |
| 2 | Washington Heights | Sushi Restaurant | Vegetarian / Vegan Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |
| 3 | Inwood | Sushi Restaurant | Vegetarian / Vegan Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |
| 4 | Hamilton Heights | Sushi Restaurant | Vegetarian / Vegan Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |
| 6 | Central Harlem | Sushi Restaurant | Vegetarian / Vegan Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |
| 7 | East Harlem | Sushi Restaurant | Vegetarian / Vegan Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store | Hawaiian Restaurant | Indian Chinese Restaurant |
| 18 | Greenwich Village | Sushi Restaurant | Japanese Restaurant | Sake Bar | Indian Chinese Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 21 | Tribeca | Sushi Restaurant | Noodle House | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |


In [24]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Upper East Side | Sushi Restaurant | Japanese Restaurant | Asian Restaurant | Vegetarian / Vegan Restaurant | Fish Market | Seafood Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar |
| 9 | Yorkville | Sushi Restaurant | Japanese Restaurant | Fish Market | Indian Chinese Restaurant | Vegetarian / Vegan Restaurant | Asian Restaurant | Sandwich Place | Sake Bar | Restaurant | Ramen Restaurant |
| 10 | Lenox Hill | Sushi Restaurant | Asian Restaurant | Japanese Restaurant | Vegetarian / Vegan Restaurant | Fish Market | Seafood Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar |
| 13 | Lincoln Square | Sushi Restaurant | Japanese Restaurant | Smoothie Shop | Chinese Restaurant | Grocery Store | Hawaiian Restaurant | Bakery | Bubble Tea Shop | Cocktail Bar | Deli / Bodega |
| 14 | Clinton | Sushi Restaurant | Japanese Restaurant | Chinese Restaurant | Cocktail Bar | Asian Restaurant | Seafood Restaurant | Sandwich Place | Sake Bar | Restaurant | Ramen Restaurant |
| 15 | Midtown | Sushi Restaurant | Asian Restaurant | Seafood Restaurant | Japanese Restaurant | Vegetarian / Vegan Restaurant | Bakery | Restaurant | Ramen Restaurant | Sandwich Place | Fish Market |
| 16 | Murray Hill | Sushi Restaurant | Japanese Restaurant | Asian Restaurant | Vegetarian / Vegan Restaurant | Bakery | Restaurant | Ramen Restaurant | Grocery Store | Bubble Tea Shop | Chinese Restaurant |
| 17 | Chelsea | Sushi Restaurant | Japanese Restaurant | Asian Restaurant | Vegetarian / Vegan Restaurant | Fish Market | Seafood Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar |
| 25 | Manhattan Valley | Sushi Restaurant | Japanese Restaurant | Hawaiian Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market |
| 26 | Morningside Heights | Sushi Restaurant | Japanese Restaurant | Hawaiian Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market |


In [25]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 2 | Washington Heights | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 3 | Inwood | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 4 | Hamilton Heights | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 5 | Manhattanville | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 6 | Central Harlem | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 7 | East Harlem | Sushi Restaurant | Vegetarian / Vegan Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 18 | Greenwich Village | Sushi Restaurant | Japanese Restaurant | Sake Bar | Hawaiian Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market |
| 21 | Tribeca | Sushi Restaurant | Noodle House | Vegetarian / Vegan Restaurant | Hawaiian Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market |


In [26]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Lower East Side | Sushi Restaurant | Japanese Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
| 29 | Financial District | Sushi Restaurant | Japanese Restaurant | Indian Chinese Restaurant | Bakery | Bubble Tea Shop | Chinese Restaurant | Cocktail Bar | Deli / Bodega | Fish Market | Grocery Store |
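Comparing clusters by reading five wide tables is tedious. A sketch of a compact per-cluster summary, assuming a frame shaped like `manhattan_grouped` (category shares per neighborhood) with the k-means labels attached as a `Cluster Labels` column. The shares below are copied from the shares table above, but the cluster assignments in this toy frame are purely illustrative, not the notebook's results.

```python
import pandas as pd

# Shares copied from the table above; the cluster labels here are illustrative only
grouped = pd.DataFrame({
    'Neighborhood': ['Battery Park City', 'Central Harlem', 'East Harlem', 'Chelsea'],
    'Japanese Restaurant': [0.090909, 0.0, 0.0, 0.097561],
    'Sushi Restaurant': [0.863636, 1.0, 1.0, 0.804878],
    'Cluster Labels': [0, 1, 1, 0],
})

# one row per cluster: size and average category shares (pandas named aggregation)
summary = grouped.groupby('Cluster Labels').agg(
    neighborhoods=('Neighborhood', 'count'),
    mean_sushi_share=('Sushi Restaurant', 'mean'),
    mean_japanese_share=('Japanese Restaurant', 'mean'),
)
print(summary.round(3))
```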


## Results and Discussion <a name="results"></a>

We used k-means clustering to group the Manhattan neighborhoods by the mix of sushi-related venues around them, with the aim of finding areas with fewer sushi bars. Based on the dataframe analysis above, the Upper West Side and Morningside Heights areas are the best places to open a new sushi bar.
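The clustering works on category shares, so a neighborhood with only 3 venues that are all sushi restaurants (Central Harlem in the counts table above) looks more sushi-heavy than one with 50 venues (East Village). Since the stated goal is areas with fewer sushi bars, a direct ranking by the number of sushi venues found within the search radius is a useful cross-check. A sketch assuming a frame shaped like `newyork_venues_sushi`; the rows below are made up, and `all_neighborhoods` stands in for the full Manhattan neighborhood list so that neighborhoods with no sushi venue still appear with 0.

```python
import pandas as pd

# Made-up venues in the shape of newyork_venues_sushi (only the two columns used)
venues = pd.DataFrame({
    'Neighborhood': ['Chelsea', 'Chelsea', 'Chelsea', 'Central Harlem', 'Inwood'],
    'Venue Category': ['Sushi Restaurant', 'Sushi Restaurant', 'Japanese Restaurant',
                       'Sushi Restaurant', 'Japanese Restaurant'],
})
all_neighborhoods = ['Central Harlem', 'Chelsea', 'Inwood', 'Marble Hill']

sushi_counts = (venues[venues['Venue Category'] == 'Sushi Restaurant']
                .groupby('Neighborhood').size()
                .reindex(all_neighborhoods, fill_value=0)  # neighborhoods with none -> 0
                .sort_values(kind='mergesort'))            # stable: ties keep list order
print(sushi_counts)
```

The neighborhoods at the top of this ranking have the least direct sushi competition within the search radius.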


In this section, I discuss the observations I have noted and the recommendations that follow from the results.
This analysis is based on limited data, so its conclusions should be treated as indicative rather than definitive; with more data, the results could be improved.

1. There is high competition in Midtown and SoHo, so opening a business in these areas is very risky.
2. Central Harlem also has potential, particularly the parts close to Morningside Heights.
3. A more detailed analysis could add other factors, such as transportation and the demographics of residents.

Finally, Foursquare proved to be a good source of data, but a frustrating one at times: despite having a developer account, I regularly exceeded my hourly limit, which locked me out for the day.

## Conclusion <a name="conclusion"></a>

Although all of the goals of this project were met, there is definitely room for further improvement and development, as noted above. With some more work, the analysis could be developed into a fully fledged application that supports opening a business in an unfamiliar location.
The same analysis can be repeated for other neighborhoods or restaurant types besides sushi restaurants, to identify the location with the lowest risk and competition.