# Capstone Project - The Battle of the Neighborhoods - Italian Restaurant Manhattan

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

I would like to open an Italian restaurant in Manhattan. Which neighborhood would make the most sense? The ideal location is one that is not already crowded with restaurants, especially Italian restaurants.

## Data: <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Italian restaurants in the neighborhood, if any

The analysis uses the following fields for each neighborhood and venue:
* Neighborhood name
* Neighborhood latitude
* Neighborhood longitude
* Venue latitude
* Venue longitude
* Venue category


The following data sources will be needed to extract/generate the required information:

* Number of restaurants and their type and location in every neighborhood will be obtained using the **Foursquare API**
* Coordinates of neighborhoods will be obtained using **Geocoder**
* Population count of neighborhoods is obtained from the **WorldAtlas web page**


In [7]:
pip install geocoder

Note: you may need to restart the kernel to use updated packages.


In [6]:
pip install wget

Note: you may need to restart the kernel to use updated packages.


In [334]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.

## Retrieving the dataset

In [12]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [49]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [50]:
neighborhoods_data = newyork_data['features']

In [15]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [16]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
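
The cell that fills `neighborhoods` does not appear here, yet the next output shows populated rows. The following is a minimal sketch of one way to do it, assuming every feature has the structure printed for `neighborhoods_data[0]`. The helper name `features_to_frame` and the two-item `sample_features` list are illustrative and not taken from the original notebook.

```python
import pandas as pd

def features_to_frame(features):
    """Build one row per GeoJSON point feature (hypothetical helper)."""
    rows = []
    for feature in features:
        # GeoJSON stores coordinates as [longitude, latitude]
        lon, lat = feature['geometry']['coordinates']
        rows.append({
            'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon,
        })
    return pd.DataFrame(rows, columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])

# Two features copied from the output above, trimmed to the keys that are used
sample_features = [
    {'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'geometry': {'type': 'Point', 'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

sample_neighborhoods = features_to_frame(sample_features)
print(sample_neighborhoods)
```

In the notebook this would presumably be called as `neighborhoods = features_to_frame(neighborhoods_data)`.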

In [265]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [103]:
# Keep only the Manhattan neighborhoods

df_Manhattan = neighborhoods.loc[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

In [158]:
df_Manhattan.shape

(80, 4)
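
The shape reports 80 rows, while the neighborhood names printed by `getNearbyVenues` repeat after 40 entries and the index values shown for `best_location` differ by exactly 306 (the `totalFeatures` count). This suggests that `neighborhoods` may hold every row twice, in which case the venue counts further down would be doubled as well. A minimal sketch of how to check for and remove such duplicates, using a toy frame (`toy_manhattan` is an illustrative stand-in, not the notebook's data):

```python
import pandas as pd

# Toy stand-in for df_Manhattan in which every row appears twice
toy_manhattan = pd.DataFrame({
    'Borough': ['Manhattan'] * 4,
    'Neighborhood': ['Marble Hill', 'Chinatown', 'Marble Hill', 'Chinatown'],
    'Latitude': [40.876551, 40.715618, 40.876551, 40.715618],
    'Longitude': [-73.91066, -73.994279, -73.91066, -73.994279],
})

n_rows = len(toy_manhattan)
n_unique = toy_manhattan['Neighborhood'].nunique()
print(n_rows, n_unique)  # a mismatch means some neighborhoods are repeated

# Keep the first occurrence of each neighborhood and renumber the rows
toy_deduplicated = (
    toy_manhattan
    .drop_duplicates(subset=['Borough', 'Neighborhood'])
    .reset_index(drop=True)
)
print(toy_deduplicated.shape)
```

Applying the same `drop_duplicates` call to `df_Manhattan` before requesting venues would also halve the number of API calls, if the duplication is real.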

In [339]:
df_Manhattan.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [51]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


#### Define Foursquare Credentials and Version

In [42]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep it out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (keep it out of shared notebooks)
VERSION = '20180605' # Foursquare API version

In [53]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_Manhattan['Latitude'], df_Manhattan['Longitude'], df_Manhattan['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [147]:
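category = '4d4b7105d754a06374d81259' # Foursquare category ID (not used in the request below)
LIMIT = 100 # assumed value, not defined in the cells shown: maximum number of venues returned per request

def getNearbyVenues(names, latitudes, longitudes, radius=500):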
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
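
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so it would raise an `IndexError` if a venue came back without a category. A minimal sketch of a more defensive row extraction follows. The helper name `extract_venue_rows` is hypothetical, the first sample item mirrors a row from this notebook's venue table, and the second item is invented to exercise the fallback.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'explore' items into tuples (hypothetical helper)."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None instead of raising IndexError on an empty list
        category_name = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category_name,
        ))
    return rows

# First item mirrors a row from the notebook output; the second is invented
sample_items = [
    {'venue': {'name': "Arturo's",
               'location': {'lat': 40.874412, 'lng': -73.910271},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Uncategorized Venue',
               'location': {'lat': 40.8765, 'lng': -73.9107},
               'categories': []}},
]

sample_rows = extract_venue_rows('Marble Hill', 40.876551, -73.91066, sample_items)
print(sample_rows)
```

Inside `getNearbyVenues`, the `venues_list.append([...])` call could then become `venues_list.append(extract_venue_rows(name, lat, lng, results))`.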

In [161]:
manhattan_venues = getNearbyVenues(names=df_Manhattan['Neighborhood'],
                                   latitudes=df_Manhattan['Latitude'],
                                   longitudes=df_Manhattan['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyve

In [190]:
print(manhattan_venues.shape)
manhattan_venues.head()

(6374, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop
4,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop


In [210]:
manhattan_venues.groupby('Venue Category').count()

Venue Category,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Accessories Store,4,4,4,4,4,4
Adult Boutique,2,2,2,2,2,2
Afghan Restaurant,2,2,2,2,2,2
African Restaurant,6,6,6,6,6,6
American Restaurant,144,144,144,144,144,144
Antique Shop,2,2,2,2,2,2
Arepa Restaurant,6,6,6,6,6,6
Argentinian Restaurant,10,10,10,10,10,10
Art Gallery,40,40,40,40,40,40
Art Museum,4,4,4,4,4,4


In [331]:
italian_restaurants = manhattan_venues.loc[manhattan_venues['Venue Category'] == 'Italian Restaurant']
italian_restaurants.shape

(246, 7)

In [332]:
italian_restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
61,Chinatown,40.715618,-73.994279,Bacaro,40.714468,-73.991589,Italian Restaurant
128,Washington Heights,40.851903,-73.9369,Saggio Restaurant,40.851423,-73.939761,Italian Restaurant
270,Hamilton Heights,40.823604,-73.949688,Fumo,40.821412,-73.950499,Italian Restaurant
340,Manhattanville,40.816934,-73.957385,Pisticci Ristorante,40.814015,-73.960266,Italian Restaurant
348,Manhattanville,40.816934,-73.957385,Bettolona,40.814084,-73.959574,Italian Restaurant


In [258]:
# Count Italian restaurants per neighborhood, fewest first
df2 = (
    italian_restaurants.groupby('Neighborhood')['Venue']
    .count()
    .rename('Italian restaurant')
    .sort_values(ascending=True)
    .reset_index()
)
df2.head(10)

Unnamed: 0,Neighborhood,Italian restaurant
0,Battery Park City,2
1,Chinatown,2
2,Washington Heights,2
3,Midtown South,2
4,Hamilton Heights,2
5,Manhattan Valley,2
6,Lower East Side,2
7,Civic Center,4
8,Manhattanville,4
9,Carnegie Hill,6
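
Because `df2` is built from rows that are already Italian restaurants, a neighborhood with no Italian restaurant at all never appears in it, although such a neighborhood would be the least crowded option. A minimal sketch that counts over all venues instead, so zero counts are kept, follows. The toy `venues` frame and the column names `Restaurants` and `Italian restaurants` are illustrative, and matching the word "Restaurant" in the category name is only a rough proxy for "any type of restaurant" (it misses categories such as "Pizza Place" or "Diner").

```python
import pandas as pd

# Toy stand-in for manhattan_venues with only the two columns that matter here
venues = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Marble Hill', 'Marble Hill',
                     'Chinatown', 'Chinatown',
                     'Manhattanville', 'Manhattanville'],
    'Venue Category': ['Pizza Place', 'Yoga Studio', 'Diner',
                       'Italian Restaurant', 'American Restaurant',
                       'Italian Restaurant', 'Italian Restaurant'],
})

summary = (
    venues
    .assign(
        is_restaurant=venues['Venue Category'].str.contains('Restaurant'),
        is_italian=venues['Venue Category'].eq('Italian Restaurant'),
    )
    .groupby('Neighborhood')[['is_restaurant', 'is_italian']]
    .sum()  # summing booleans counts the True values per neighborhood
    .rename(columns={'is_restaurant': 'Restaurants',
                     'is_italian': 'Italian restaurants'})
    .sort_values('Italian restaurants')
    .reset_index()
)
print(summary)
```

This also yields the total restaurant count per neighborhood, which is the first decision factor listed in the Data section.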


#### Conclusion:

Seven neighborhoods in Manhattan have only 2 Italian restaurants each:
Battery Park City, Chinatown, Washington Heights, Midtown South, Hamilton Heights, Manhattan Valley, and Lower East Side.


#### Find out the most populated neighborhoods in Manhattan:

In [264]:
# Scrape neighborhood population from the WorldAtlas web page

table = pd.read_html("https://www.worldatlas.com/articles/manhattan-neighborhoods-by-population.html")
manhattan_population = table[0]
manhattan_population.head(10)

Unnamed: 0,Rank,Neighborhood,Population
0,1,Midtown,391371
1,2,Lower Manhattan,382654
2,3,Harlem,335109
3,4,Upper East Side,229688
4,5,Upper West Side,209084
5,6,Washington Heights,158318
6,7,East Harlem,115921
7,8,Chinatown,100000
8,9,Lower East Village,72957
9,10,Alphabet City,63347


#### Conclusion:

The ten most populated neighborhoods in Manhattan, in descending order of population, are:
* Midtown: 391371
* Lower Manhattan: 382654
* Harlem: 335109
* Upper East Side: 229688
* Upper West Side: 209084
* Washington Heights: 158318
* East Harlem: 115921
* Chinatown: 100000
* Lower East Village: 72957
* Alphabet City: 63347
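
One way to combine the two lists is an inner join on the neighborhood name, which keeps only neighborhoods that have few Italian restaurants and also appear among the most populated. The following is a minimal sketch, with the values typed in from the two outputs above rather than read from `df2` and `manhattan_population`. The frame names `low_italian`, `population` and `candidates` are illustrative.

```python
import pandas as pd

# Neighborhoods with the lowest Italian restaurant count (from the df2 output)
low_italian = pd.DataFrame({
    'Neighborhood': ['Battery Park City', 'Chinatown', 'Washington Heights',
                     'Midtown South', 'Hamilton Heights', 'Manhattan Valley',
                     'Lower East Side'],
    'Italian restaurant': [2] * 7,
})

# Ten most populated neighborhoods (from the WorldAtlas table)
population = pd.DataFrame({
    'Neighborhood': ['Midtown', 'Lower Manhattan', 'Harlem', 'Upper East Side',
                     'Upper West Side', 'Washington Heights', 'East Harlem',
                     'Chinatown', 'Lower East Village', 'Alphabet City'],
    'Population': [391371, 382654, 335109, 229688, 209084,
                   158318, 115921, 100000, 72957, 63347],
})

# Inner join keeps only names that are spelled identically in both sources
candidates = (
    low_italian
    .merge(population, on='Neighborhood', how='inner')
    .sort_values('Population', ascending=False)
    .reset_index(drop=True)
)
print(candidates)
```

An exact-name join only pairs neighborhoods that are named identically in both sources, so "Midtown South" is not matched with "Midtown" here. The selection that follows uses a substring match instead, which is why both Midtown entries appear in it.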

#### Draw a map for the best place for the new Italian restaurant

In [323]:
best_location = neighborhoods[(neighborhoods['Neighborhood'].str.contains('Midtown')) | (neighborhoods['Neighborhood'].str.contains('Washington Heights'))]

best_location.head(10)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
101,Manhattan,Washington Heights,40.851903,-73.9369
114,Manhattan,Midtown,40.754691,-73.981669
250,Manhattan,Midtown South,40.74851,-73.988713
407,Manhattan,Washington Heights,40.851903,-73.9369
420,Manhattan,Midtown,40.754691,-73.981669
556,Manhattan,Midtown South,40.74851,-73.988713


In [328]:
best_location = best_location.drop_duplicates()
best_location.head()


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
101,Manhattan,Washington Heights,40.851903,-73.9369
114,Manhattan,Midtown,40.754691,-73.981669
250,Manhattan,Midtown South,40.74851,-73.988713
