# Capstone

## Battle of Neighbourhoods

### Chinese restaurants in NY

#### Introduction

 <font size="4"> 
    
 Over recent decades China has experienced rapid economic growth. This has resulted in the expansion of Chinese cuisine worldwide. 
 Two factors can stimulate the development of Chinese restaurants and food chains. 
 First, the number of tourists and immigrants from China 
 appears to be increasing in Western Europe and the US. Second, Chinese cuisine is becoming more and more popular
 among people of non-Chinese origin. Therefore, opening a Chinese restaurant in one of the 
 major cities seems to be a plausible business project. New York City
 is chosen because it hosts the largest Chinese-American population of any city proper, with over half a million (Wikipedia). 
 To figure out which location would be most profitable for such a restaurant, we will analyze the available data on existing restaurants of the 
 same type in different New York neighborhoods and boroughs.
    </font>
    

#### Target audience and importance

<font size="4"> 
    
The analysis will be of interest to two target groups. The first is NY locals and visitors who like Chinese cuisine. The second is potential investors who are thinking of starting a restaurant business in NY. 
</font>

#### Data

<font size="4"> 
    
- Data on New York City neighborhoods and boroughs are required, 
including boundaries, latitude, longitude, restaurants, and restaurant ratings and tips.
- New York City data containing the neighborhoods and boroughs, latitudes, and longitudes will be 
obtained from the data source: https://cocl.us/new_york_dataset
- New York City data containing neighborhood boundaries will be obtained from the data source: 
https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
- All data related to the locations and quality of Chinese restaurants will be obtained via the 
Foursquare API, accessed through the requests library in Python.
</font>

#### Methodology

<font size="4">

  - We will download the data from https://cocl.us/new_york_dataset, clean it, and transform it into a dataframe.
  - We will use Foursquare to find Chinese restaurants and get their ratings, tips, and likes. This information will be added to the dataframe.
  - The data will be presented using different visualisation tools available in Python.
</font>
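
The second step above amounts to attaching per-venue statistics to the restaurant table and then summarising them. The following is a minimal sketch of that idea, assuming ratings arrive as a mapping from venue ID to score. The rating values are hypothetical placeholders, not Foursquare results, and `average_rating_by_borough` is an illustrative helper name that does not appear in the notebook. Plain Python is used so the sketch runs without API credentials.

```python
from collections import defaultdict
from statistics import mean

# Restaurant rows as (borough, neighborhood, venue_id, name);
# the IDs and names are taken from the tables shown in this notebook.
restaurants = [
    ("Brooklyn", "Homecrest", "4c54639f72cf0f47cd247bd4", "Winley"),
    ("Bronx", "Allerton", "4c7dbf7f10916dcba78e2a96", "Chef King"),
    ("Brooklyn", "Erasmus", "4e10c87122713f7d7bcb9d1d", "Kam Man Chinese Food"),
]

# HYPOTHETICAL ratings keyed by venue ID (placeholder values, not real data).
ratings = {
    "4c54639f72cf0f47cd247bd4": 7.0,
    "4c7dbf7f10916dcba78e2a96": 6.0,
    "4e10c87122713f7d7bcb9d1d": 8.0,
}


def average_rating_by_borough(restaurants, ratings):
    """Return the mean rating per borough, skipping venues without a rating."""
    grouped = defaultdict(list)
    for borough, _neighborhood, venue_id, _name in restaurants:
        rating = ratings.get(venue_id)
        if rating is not None:
            grouped[borough].append(rating)
    return {borough: mean(values) for borough, values in grouped.items()}


borough_ratings = average_rating_by_borough(restaurants, ratings)
print(borough_ratings)
```

The same summary could be grouped by neighborhood instead of borough by changing the grouping key.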

#### Problem Statement

<font size="4">
    
- Where are the most popular Chinese restaurants located in New York City?
- Which location is optimal for opening a new Chinese restaurant in New York City?
</font>

#### Example of what features can be extracted from the data

Neighborhoods in NY and their coordinates

In [18]:
ny_data = get_new_york_data()
ny_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


List of Chinese restaurants in each neighborhood in NY

In [17]:
chinese_rest_ny.tail()

| | Borough | Neighborhood | ID | Name |
|---|---|---|---|---|
| 186 | Brooklyn | Homecrest | 4c54639f72cf0f47cd247bd4 | Winley |
| 187 | Staten Island | Prince's Bay | 4bb7d282b35776b0b83dc801 | Island Taste |
| 188 | Bronx | Allerton | 4c7dbf7f10916dcba78e2a96 | Chef King |
| 189 | Bronx | Kingsbridge Heights | 4d965fd061a3a1cde4989a42 | Lucky Star Chinese Restaurant |
| 190 | Brooklyn | Erasmus | 4e10c87122713f7d7bcb9d1d | Kam Man Chinese Food |


### Technical details

In [1]:
import numpy as np  # library for scientific computing in Python
import pandas as pd  # primary data structure library
!conda install -c conda-forge lxml --yes  # comment this line out if lxml is already installed

import json  # library to handle JSON files

!conda install -c conda-forge geopy --yes  # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # library to handle HTTP requests
from pandas.io.json import json_normalize  # transform JSON into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes  # uncomment this line if folium is not installed
import folium  # map rendering library

import seaborn as sns  # statistical data visualisation
print('Libraries imported.')


In [10]:
def get_new_york_data():
    """Download the New York neighborhood dataset and return it as a dataframe."""
    url = 'https://cocl.us/new_york_dataset'
    resp = requests.get(url).json()
    # all data is present under the 'features' key
    features = resp['features']
    # define the dataframe columns
    column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
    # collect one row per neighborhood; coordinates are stored as [longitude, latitude]
    rows = []
    for data in features:
        neighborhood_latlon = data['geometry']['coordinates']
        rows.append({'Borough': data['properties']['borough'],
                     'Neighborhood': data['properties']['name'],
                     'Latitude': neighborhood_latlon[1],
                     'Longitude': neighborhood_latlon[0]})
    # build the dataframe once; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows, columns=column_names)


def get_venues(lat, lng):
    """Return the venues Foursquare recommends around a point as a dataframe."""
    # set variables
    radius = 400  # search radius in metres
    LIMIT = 100   # maximum number of venues to request
    # url to fetch data from the Foursquare API;
    # CLIENT_ID, CLIENT_SECRET and VERSION must be defined before calling this function
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
    # get all the data
    results = requests.get(url).json()
    venue_data = results['response']['groups'][0]['items']
    venue_details = []
    for row in venue_data:
        try:
            venue_id = row['venue']['id']
            venue_name = row['venue']['name']
            venue_category = row['venue']['categories'][0]['name']
            venue_details.append([venue_id, venue_name, venue_category])
        except (KeyError, IndexError):
            # skip venues with a missing field or an empty category list
            pass
    column_names = ['ID', 'Name', 'Category']
    df = pd.DataFrame(venue_details, columns=column_names)
    return df
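
The parsing step inside `get_venues` can be tried without network access or API credentials. The sketch below applies the same field lookups to a hand-written payload. The payload shape is an assumption that mirrors only the keys `get_venues` reads, the second venue is a made-up entry used to show the skip path, and `extract_venues` is an illustrative helper name that is not part of the notebook.

```python
def extract_venues(results):
    """Return (id, name, first category) tuples from an explore-style response."""
    groups = results.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        # Skip venues that lack an ID, a name, or any category.
        if "id" not in venue or "name" not in venue or not categories:
            continue
        rows.append((venue["id"], venue["name"], categories[0]["name"]))
    return rows


# Hand-written sample payload (assumed shape, not a captured API response).
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"id": "4c7dbf7f10916dcba78e2a96",
                               "name": "Chef King",
                               "categories": [{"name": "Chinese Restaurant"}]}},
                    # Hypothetical venue with no categories: it is skipped.
                    {"venue": {"id": "hypothetical-id",
                               "name": "Uncategorised Venue",
                               "categories": []}},
                ]
            }
        ]
    }
}

venue_rows = extract_venues(sample_response)
chinese_rows = [row for row in venue_rows if row[2] == "Chinese Restaurant"]
print(chinese_rows)
```

Returning an empty list for a response without groups lets a caller continue with the next neighborhood instead of stopping on an exception.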

In [5]:
ny_data = get_new_york_data()
ny_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
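
The rows above come from flattening a list of GeoJSON-style features, where each feature stores its point as `[longitude, latitude]`. A minimal offline sketch of that flattening follows. The two sample features are hand-written from the first two rows of the table and contain only the keys `get_new_york_data` reads, so the real file may carry more fields; `flatten_features` is an illustrative helper name.

```python
def flatten_features(features):
    """Turn GeoJSON-style point features into flat row dictionaries."""
    rows = []
    for feature in features:
        # Coordinates are ordered [longitude, latitude].
        longitude, latitude = feature["geometry"]["coordinates"][:2]
        rows.append({
            "Borough": feature["properties"]["borough"],
            "Neighborhood": feature["properties"]["name"],
            "Latitude": latitude,
            "Longitude": longitude,
        })
    return rows


# Hand-written sample mirroring the first two rows of the table.
sample_features = [
    {"properties": {"borough": "Bronx", "name": "Wakefield"},
     "geometry": {"coordinates": [-73.847201, 40.894705]}},
    {"properties": {"borough": "Bronx", "name": "Co-op City"},
     "geometry": {"coordinates": [-73.829939, 40.874294]}},
]

neighborhood_rows = flatten_features(sample_features)
for row in neighborhood_rows:
    print(row)
```

A list of dictionaries like this can be passed directly to `pd.DataFrame` to obtain the dataframe shown above.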


In [13]:
# prepare the list of neighborhoods that contain Chinese restaurants
column_names = ['Borough', 'Neighborhood', 'ID', 'Name']
rows = []
for Borough, Neighborhood, Latitude, Longitude in ny_data.values.tolist():
    venues = get_venues(Latitude, Longitude)
    chinese_restaurants = venues[venues['Category'] == 'Chinese Restaurant']
    # print('Chinese restaurants in ' + Neighborhood + ', ' + Borough + ': ' + str(len(chinese_restaurants)))
    for venue_id, name, category in chinese_restaurants.values.tolist():
        rows.append({'Borough': Borough,
                     'Neighborhood': Neighborhood,
                     'ID': venue_id,
                     'Name': name})
# build the dataframe once; DataFrame.append was removed in pandas 2.0
chinese_rest_ny = pd.DataFrame(rows, columns=column_names)
 

In [14]:
chinese_rest_ny.tail()

| | Borough | Neighborhood | ID | Name |
|---|---|---|---|---|
| 186 | Brooklyn | Homecrest | 4c54639f72cf0f47cd247bd4 | Winley |
| 187 | Staten Island | Prince's Bay | 4bb7d282b35776b0b83dc801 | Island Taste |
| 188 | Bronx | Allerton | 4c7dbf7f10916dcba78e2a96 | Chef King |
| 189 | Bronx | Kingsbridge Heights | 4d965fd061a3a1cde4989a42 | Lucky Star Chinese Restaurant |
| 190 | Brooklyn | Erasmus | 4e10c87122713f7d7bcb9d1d | Kam Man Chinese Food |
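
A table like this can be summarised by counting restaurants per borough, which is one way to start comparing locations. The sketch below does so for the five rows shown above only, so the counts illustrate the technique and say nothing about the full dataset. The variable names are illustrative and plain Python is used to keep the example self-contained.

```python
from collections import Counter

# The five rows displayed above, as (borough, neighborhood, name).
tail_rows = [
    ("Brooklyn", "Homecrest", "Winley"),
    ("Staten Island", "Prince's Bay", "Island Taste"),
    ("Bronx", "Allerton", "Chef King"),
    ("Bronx", "Kingsbridge Heights", "Lucky Star Chinese Restaurant"),
    ("Brooklyn", "Erasmus", "Kam Man Chinese Food"),
]

# Count how many listed restaurants fall in each borough.
restaurants_per_borough = Counter(borough for borough, _, _ in tail_rows)

for borough, count in restaurants_per_borough.most_common():
    print(borough, count)
```

With the full dataframe, a comparable count could be obtained with `chinese_rest_ny.groupby('Borough')['ID'].count()`.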
