# Locations for Opening a New Japanese Restaurant in Manhattan

## Introduction

This notebook describes the distribution of Japanese restaurants in Manhattan, New York City, and uses the findings to recommend good locations for opening a new Japanese restaurant in the area. It is aimed at readers who are considering opening their own casual Japanese restaurant there.

## Data

The data referenced in this notebook are based on the New York City neighborhood information provided by the New York (City)
Department of City Planning (https://geo.nyu.edu/catalog/nyu_2451_34572) and the New York City Population by Neighborhood
Tabulation Areas dataset published by NYC Open Data
(https://data.cityofnewyork.us/City-Government/Manhattan-populations-by-neighborhood/8m6s-esnp).
Combining these data gives an overall picture of how the existing Japanese restaurants are distributed relative to the
potential customers in Manhattan. The two key criteria for selecting a location for a new restaurant are a high
population and low competition.

## Analysis

As preparation, we first import the necessary data analysis tools:

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if folium is already installed
import folium # map rendering library


First, we download the New York City dataset from the New York City Department of City Planning:

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

Next, we read the New York City data from the downloaded file and transform it into a data frame:

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
# transform new york data into a dataframe
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
neighborhoods_data = newyork_data['features']

# collect one row per neighborhood, then build the data frame in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
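
The extraction step above depends on the GeoJSON convention that a point's coordinates are listed as longitude first, then latitude. The following is a minimal sketch of that step on a single feature; the `sample` document and the helper `feature_to_row` are hypothetical, with values copied from the first row of the table, and the real file contains additional fields.

```python
import json

# Hypothetical one-feature sample shaped like the entries in newyork_data['features'].
sample = json.loads("""
{
  "features": [
    {
      "properties": {"borough": "Bronx", "name": "Wakefield"},
      "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]}
    }
  ]
}
""")


def feature_to_row(feature):
    """Flatten one GeoJSON point feature into a plain dict (illustrative helper)."""
    coordinates = feature["geometry"]["coordinates"]
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": coordinates[1],   # second element is the latitude
        "Longitude": coordinates[0],  # first element is the longitude
    }


rows = [feature_to_row(f) for f in sample["features"]]
print(rows[0])
```

Swapping the two indices would silently place every neighborhood at an invalid position, so it is worth checking that latitudes fall near 40.7 and longitudes near -74 for New York City.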


Then, all the neighborhoods in Manhattan are extracted from the data frame:

In [5]:
# extract all neighborhoods in Manhattan
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Next, we define the search parameters for finding the existing Japanese restaurants within Manhattan:

In [6]:
# define foursquare API parameters
CLIENT_ID = 'CY2VUJW5PSUOTPDV5BY5OAHY1KQ1N5PWRRUAKYGFJYCY5UET' # your Foursquare ID
CLIENT_SECRET = 'CNOKO3MA2JZRHDK01UKPOIWDHEE0BN3OO5YGQEXBRUGLK2OS' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [7]:
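# define search parameters for finding Japanese restaurants within the area
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # search radius in meters; the getNearbyVenues call below does not pass it, so the function default of 500 applies
CATALOGID = '4bf58dd8d48988d111941735' # Foursquare category ID for Japanese restaurants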

Next, we retrieve the information on all the Japanese restaurants from Foursquare:

In [8]:
# define a function that retrieves the restaurant info from Foursquare for each neighborhood in turn
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            CATALOGID)
            
        # make the GET request
        results = requests.get(url).json()["response"]['venues']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
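
The request URL in the function above is assembled with string formatting. As an alternative, here is a minimal sketch that builds the same query string with the standard library's `urlencode`, which takes care of escaping. The helper name `build_search_url` and the placeholder credentials are assumptions for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode


def build_search_url(lat, lng, radius, limit, category_id,
                     client_id, client_secret, version):
    """Return a venues/search URL with encoded query parameters (illustrative helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",      # latitude,longitude of the search centre
        "radius": radius,          # search radius in meters
        "limit": limit,            # maximum number of venues to return
        "categoryId": category_id,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Placeholder credentials; the coordinates are the Chinatown centre from the table above.
url = build_search_url(40.715618, -73.994279, 500, 100,
                       "4bf58dd8d48988d111941735",
                       "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605")
print(url)
```

Keeping the parameters in a dictionary also makes it easier to see at a glance which radius and limit are actually sent with each request.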

In [9]:
# get all japanese restaurants in Manhattan from foursquare.com
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

For example, the first 5 Japanese restaurants reported by Foursquare and their information are as follows:

In [10]:
print(manhattan_venues.shape)
manhattan_venues.head()

(973, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Empire Chinese And Japanese Cuisine,40.878588,-73.917446,Japanese Restaurant
1,Chinatown,40.715618,-73.994279,Shinka Ramen & Sake Bar,40.717147,-73.995206,Japanese Restaurant
2,Chinatown,40.715618,-73.994279,Gohan,40.715143,-73.99168,Japanese Restaurant
3,Chinatown,40.715618,-73.994279,Shinsen,40.715608,-73.996611,Japanese Restaurant
4,Chinatown,40.715618,-73.994279,Cocoron,40.72023,-73.992968,Japanese Restaurant
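
Because each search is centred on a neighborhood's coordinates, it can be useful to check how far a returned venue lies from that centre. The sketch below uses the haversine formula for great-circle distance; the function name `haversine_m` and the Earth radius constant are our own choices, and the coordinates are taken from the first row of the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Marble Hill centre versus the first venue listed for it.
distance = haversine_m(40.876551, -73.91066, 40.878588, -73.917446)
print(round(distance), "meters")
```

Applying the same function to every row would give a quick view of how the returned venues spread around each neighborhood centre, and whether neighboring searches are likely to overlap.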


Next, we display all venue categories returned by the Japanese restaurant search and the number of venues in each category:

In [11]:
manhattan_venues.groupby('Venue Category').count()

Venue Category,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
American Restaurant,1,1,1,1,1,1
Art Gallery,1,1,1,1,1,1
Asian Restaurant,21,21,21,21,21,21
BBQ Joint,2,2,2,2,2,2
Bakery,1,1,1,1,1,1
Beer Bar,1,1,1,1,1,1
Buffet,1,1,1,1,1,1
Building,1,1,1,1,1,1
Burger Joint,1,1,1,1,1,1
Café,2,2,2,2,2,2


Since some venues in the table do not actually offer Japanese cuisine, we remove them from our list of Japanese restaurants:

In [12]:
# keep only venues whose category indicates Japanese-style cuisine
jplist = ['Japanese Curry Restaurant', 'Japanese Restaurant', 'Noodle House',
          'Poke Place', 'Ramen Restaurant', 'Sake Bar', 'Shabu-Shabu Restaurant',
          'Soba Restaurant', 'Sushi Restaurant', 'Tonkatsu Restaurant', 'Udon Restaurant']
jp_restaurant = manhattan_venues[manhattan_venues['Venue Category'].isin(jplist)]
jp_restaurant.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Empire Chinese And Japanese Cuisine,40.878588,-73.917446,Japanese Restaurant
1,Chinatown,40.715618,-73.994279,Shinka Ramen & Sake Bar,40.717147,-73.995206,Japanese Restaurant
2,Chinatown,40.715618,-73.994279,Gohan,40.715143,-73.99168,Japanese Restaurant
3,Chinatown,40.715618,-73.994279,Shinsen,40.715608,-73.996611,Japanese Restaurant
4,Chinatown,40.715618,-73.994279,Cocoron,40.72023,-73.992968,Japanese Restaurant
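
The category filter above keeps only rows whose category appears in an allow-list. A minimal sketch of the same technique on a small, made-up data frame shows how the inverse mask can be used to review what was discarded; the venue names and the shortened `keep` list are hypothetical.

```python
import pandas as pd

# Hypothetical venues; only the 'Venue Category' column matters for the filter.
venues = pd.DataFrame({
    "Venue": ["A", "B", "C", "D"],
    "Venue Category": ["Japanese Restaurant", "Art Gallery",
                       "Sushi Restaurant", "Burger Joint"],
})

keep = ["Japanese Restaurant", "Sushi Restaurant"]  # shortened allow-list
mask = venues["Venue Category"].isin(keep)

kept = venues[mask]       # rows whose category is in the allow-list
dropped = venues[~mask]   # everything else, for manual review

print(sorted(dropped["Venue Category"].unique()))
```

Reviewing the dropped categories is a cheap way to confirm that no relevant category was left out of the allow-list by mistake.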


Next, we count the number of Japanese restaurants in each neighborhood in Manhattan:

In [13]:
df_count = jp_restaurant.groupby('Neighborhood').count()
df_count = df_count[['Venue']].reset_index()
df_count.rename(columns={'Venue':'Count'}, inplace=True)
df_count

Unnamed: 0,Neighborhood,Count
0,Battery Park City,6
1,Carnegie Hill,14
2,Central Harlem,2
3,Chelsea,34
4,Chinatown,36
5,Civic Center,35
6,Clinton,29
7,East Harlem,2
8,East Village,45
9,Financial District,39
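
The counting step above uses `groupby(...).count()` and then keeps a single column. An equivalent and slightly more direct sketch uses `groupby(...).size()`, which counts rows per group regardless of column; the three-row data frame here is made up for illustration.

```python
import pandas as pd

# Hypothetical filtered venue list with two neighborhoods.
jp = pd.DataFrame({
    "Neighborhood": ["Chelsea", "Chelsea", "Chinatown"],
    "Venue": ["X", "Y", "Z"],
})

# size() counts rows per group; reset_index(name=...) names the resulting column.
counts = jp.groupby("Neighborhood").size().reset_index(name="Count")
print(counts)
```

Note that either approach only lists neighborhoods with at least one matching venue, so a neighborhood without any Japanese restaurant does not appear in the result.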


Next, we retrieve the Manhattan population data from NYC Open Data:
https://data.cityofnewyork.us/City-Government/Manhattan-populations-by-neighborhood/8m6s-esnp

In [20]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Borough,Year,FIPS County Code,NTA Code,NTA Name,Population
0,Manhattan,2010,61,MN01,Marble Hill,46746
1,Manhattan,2010,61,MN02,Inwood,46746
2,Manhattan,2010,61,MN03,Central Harlem,75282
3,Manhattan,2010,61,MN04,Hamilton Heights,48520
4,Manhattan,2010,61,MN05,Manhattanville,22950


To visualize the data together with the restaurant information, we combine the restaurant count table with the population data:

In [21]:
df_summary = df_count.join(population.set_index('NTA Name'), on='Neighborhood')
df_summary.head()

Unnamed: 0,Neighborhood,Count,Borough,Year,FIPS County Code,NTA Code,Population
0,Battery Park City,6,Manhattan,2010,61,MN29,39699
1,Carnegie Hill,14,Manhattan,2010,61,MN51,84491
2,Central Harlem,2,Manhattan,2010,61,MN03,75282
3,Chelsea,34,Manhattan,2010,61,MN11,70150
4,Chinatown,36,Manhattan,2010,61,MN31,47844
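
The join above matches neighborhood names from one dataset against NTA names from another, so any name that differs between the two sources would end up with a missing population. The following sketch, under the assumption that such mismatches may exist, uses `merge` with `indicator=True` to list them; the `Nowhere` row is a hypothetical unmatched name.

```python
import pandas as pd

# Two counts copied from the table above plus one hypothetical unmatched name.
counts = pd.DataFrame({
    "Neighborhood": ["Chelsea", "Chinatown", "Nowhere"],
    "Count": [34, 36, 5],
})
population = pd.DataFrame({
    "NTA Name": ["Chelsea", "Chinatown"],
    "Population": [70150, 47844],
})

# A left merge keeps every counted neighborhood; '_merge' records where each row matched.
merged = counts.merge(population, how="left",
                      left_on="Neighborhood", right_on="NTA Name",
                      indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)
```

Any names reported this way would need to be reconciled by hand before the population figures are used in later steps.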


For comparison, we define the "market index" as the population of a neighborhood divided by the number of Japanese restaurants in the same neighborhood:

In [24]:
# Compute "market index" (population / count) in each neighborhood
df_summary['Market'] = df_summary['Population'] / df_summary['Count']
df_summary.head()

Unnamed: 0,Neighborhood,Count,Borough,Year,FIPS County Code,NTA Code,Population,Market
0,Battery Park City,6,Manhattan,2010,61,MN29,39699,6616.5
1,Carnegie Hill,14,Manhattan,2010,61,MN51,84491,6035.071429
2,Central Harlem,2,Manhattan,2010,61,MN03,75282,37641.0
3,Chelsea,34,Manhattan,2010,61,MN11,70150,2063.235294
4,Chinatown,36,Manhattan,2010,61,MN31,47844,1329.0
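
As a worked check of the market index, the sketch below recomputes three of the values in the table from their population and restaurant count. The function `market_index` is a hypothetical helper, and treating a neighborhood with zero restaurants as having an infinite index is our own assumption; the notebook itself never encounters that case because such neighborhoods are absent from the count table.

```python
import math


def market_index(population, restaurant_count):
    """Residents per existing Japanese restaurant; infinite when there are none (assumption)."""
    if restaurant_count == 0:
        return math.inf
    return population / restaurant_count


# Population and count values copied from the summary table above.
examples = {
    "Battery Park City": market_index(39699, 6),
    "Central Harlem": market_index(75282, 2),
    "Chinatown": market_index(47844, 36),
}

for name, value in examples.items():
    print(name, value)
```

A higher value means more residents per existing Japanese restaurant, which the analysis reads as weaker competition.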


Finally, we sort the neighborhoods by market index and display the top 5 in the list:

In [25]:
df_summary = df_summary.sort_values(by=['Market'], ascending=False)
df_summary.head()

Unnamed: 0,Neighborhood,Count,Borough,Year,FIPS County Code,NTA Code,Population,Market
27,Roosevelt Island,1,Manhattan,2010,61,MN34,80771,80771.0
24,Morningside Heights,1,Manhattan,2010,61,MN07,55929,55929.0
36,Washington Heights,3,Manhattan,2010,61,MN37,151574,50524.666667
21,Marble Hill,1,Manhattan,2010,61,MN01,46746,46746.0
2,Central Harlem,2,Manhattan,2010,61,MN03,75282,37641.0
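
The ranking step above sorts the whole table and then takes the first rows. When only the top entries are needed, `DataFrame.nlargest` expresses the same idea in one call. This is a small sketch on four rows copied from the tables in this notebook, not a replacement for the original cell.

```python
import pandas as pd

# Count and population values copied from the tables above.
df = pd.DataFrame({
    "Neighborhood": ["Roosevelt Island", "Chelsea", "Marble Hill", "Chinatown"],
    "Count": [1, 34, 1, 36],
    "Population": [80771, 70150, 46746, 47844],
})
df["Market"] = df["Population"] / df["Count"]

# The two neighborhoods with the highest market index.
top2 = df.nlargest(2, "Market")
by_sort = df.sort_values(by=["Market"], ascending=False).head(2)

print(top2[["Neighborhood", "Market"]])
```

Both `top2` and `by_sort` select the same rows here; `nlargest` simply avoids sorting rows that are never displayed.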


## Result

The market index indicates the market size (i.e. the potential number of customers) available to each existing
Japanese restaurant in a given neighborhood. The following table shows the 5 neighborhoods
with the largest market index:

In [26]:
df_summary.head()

Unnamed: 0,Neighborhood,Count,Borough,Year,FIPS County Code,NTA Code,Population,Market
27,Roosevelt Island,1,Manhattan,2010,61,MN34,80771,80771.0
24,Morningside Heights,1,Manhattan,2010,61,MN07,55929,55929.0
36,Washington Heights,3,Manhattan,2010,61,MN37,151574,50524.666667
21,Marble Hill,1,Manhattan,2010,61,MN01,46746,46746.0
2,Central Harlem,2,Manhattan,2010,61,MN03,75282,37641.0


## Discussion

According to the result, Roosevelt Island has a large population (80,771) and only 1 Japanese restaurant, so it would be the best place to open a new Japanese restaurant because competition there is low. Other good locations with similarly low competition are Morningside Heights, Washington Heights, Marble Hill and Central Harlem.

## Conclusion

Based on an analysis of the distribution of Japanese restaurants in Manhattan, New York City, and their market indices, several locations with low competition and high population have been identified as good candidates for opening a new Japanese restaurant in the Manhattan area.