# THE BATTLE OF THE NEIGHBORHOODS
## My final Capstone Project

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

#### Background

New York is a truly diverse city on the East Coast of the USA. Its cultural appeal attracts many people, both locals and foreigners. New York has a very rich social life, where any moment is a good time to dine out, visit a historical site or spend hours at a bar.
Globalization now plays a very important role in the spread of cultures. More and more we see new fashion trends, social habits, and cuisines from other countries. National borders seem unable to hold back this explosion of new trends.
New restaurants catering to diverse crowds pop up all the time. People are constantly coming up with new dishes, bringing them from their hometowns or drawing inspiration from their international trips. We recently moved to New York in search of new opportunities. For the first few weeks we immersed ourselves in the city and its culture. However, being from another country, we started to miss our cuisine, so we decided to search for the best Spanish restaurants in the city. Unfortunately, there weren’t many. This gave us an idea: why not open our very own Spanish restaurant in NYC?

#### New Opportunity

Given NYC’s cultural openness and the very few Spanish restaurants available in the city, we determined that opening one would be a good market opportunity.


## Data <a name="data"></a>

1. Accurate study of the different Boroughs and Neighbourhoods of NYC.
2. A study of all the Spanish restaurants in the area.

This information will be taken from Foursquare and GeoSpace and imported into a Jupyter Notebook, where the analysis will be done.

## Methodology <a name="methodology"></a>

Collect NYC data. Using Foursquare, find all venues for each neighborhood and filter out everything except Spanish restaurants.

#### Answer the following questions

* How many Spanish restaurants are there per neighbourhood?
* Which area would be the best to place a Spanish restaurant?
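
The filtering and counting steps described above can be sketched on a small example. This is only an illustration: the neighbourhood names `A`, `B` and `C` and the rows are made up, and only the column names match the venue table built later in the analysis.

```python
import pandas as pd

# Hypothetical venue rows; only the column names mirror the real venue table.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'C', 'C', 'C'],
    'Venue Category': ['Spanish Restaurant', 'Bar', 'Pizza Place',
                       'Spanish Restaurant', 'Spanish Restaurant', 'Café'],
})

# Keep only Spanish restaurants, then count them per neighbourhood.
spanish = venues[venues['Venue Category'] == 'Spanish Restaurant']
counts = spanish.groupby('Neighborhood').size()

# Re-add neighbourhoods without any Spanish restaurant so they appear as 0.
all_neighborhoods = sorted(venues['Neighborhood'].unique())
counts = counts.reindex(all_neighborhoods, fill_value=0)

print(counts)
```

Reindexing against the full neighbourhood list is a design choice for this sketch: it keeps neighbourhoods with zero Spanish restaurants visible, which matters when looking for underserved areas.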

## Analysis <a name="analysis"></a>

#### Import libraries

In [25]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
CLIENT_ID = 'HBGJDQU5BARJSWX0AR4APR3UFUCPKZJ2O032PB2SXEVNVQH1' # your Foursquare ID
CLIENT_SECRET = '33YVXRFQXECDMKRPA2QPU5BZC4GBNY2KHLMVDJQLXAOC01PO' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: HBGJDQU5BARJSWX0AR4APR3UFUCPKZJ2O032PB2SXEVNVQH1
CLIENT_SECRET:33YVXRFQXECDMKRPA2QPU5BZC4GBNY2KHLMVDJQLXAOC01PO
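
Hard-coding and printing API credentials in a shared notebook exposes them to every reader. A minimal sketch of an alternative is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not part of the original notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment.')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Example with a stand-in mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'DEMO_ID_123',
            'FOURSQUARE_CLIENT_SECRET': 'DEMO_SECRET_456'}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(CLIENT_ID))
print('CLIENT_SECRET: ' + mask(CLIENT_SECRET))
```

In a real session the call would be `load_foursquare_credentials()` with no argument, after exporting the two variables in the shell that starts Jupyter.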


### Define the city's Boroughs and Neighbourhoods

In [8]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [10]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [12]:
neighborhoods_data = newyork_data['features']

In [14]:
# Transform into a pandas dataframe:

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
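
The loop above depends on the GeoJSON convention that point coordinates are stored as `[longitude, latitude]`, which is why index 1 is read as the latitude. The sketch below shows this on a single feature. The feature is a hand-written stand-in that reuses the Wakefield values from the table, and `feature_to_row` is a hypothetical helper, not a function from the notebook.

```python
import json

# One hand-written feature shaped like the entries in newyork_data['features'].
sample = json.loads('''
{
  "type": "Feature",
  "properties": {"name": "Wakefield", "borough": "Bronx"},
  "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]}
}
''')

def feature_to_row(feature):
    """Flatten one GeoJSON point feature into a dict for a dataframe row."""
    # GeoJSON order is longitude first, then latitude.
    lon, lat = feature['geometry']['coordinates'][:2]
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': lat,
        'Longitude': lon,
    }

row = feature_to_row(sample)
print(row)
```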


In [16]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


### Create a map

In [21]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [23]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Given the geography of New York, as well as its rising popularity as far as new trends go, we chose Brooklyn as our location.

In [33]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 |


In [34]:
# Get coordinates of the Borough

address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


In [35]:
# Visualize the neighbourhoods

# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

### Explore all venues within the coordinates of Brooklyn

In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat,  
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
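
`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups, or a venue without a category, would raise an exception partway through the loop. The sketch below shows one more defensive way to build the query and parse the result. The helper names `build_explore_params` and `parse_explore_items` and the stand-in payload are assumptions for illustration; only the nesting of the payload follows the fields the function above reads.

```python
def build_explore_params(client_id, client_secret, version, lat, lng,
                         radius=500, limit=30):
    """Query parameters for the venues/explore endpoint used above."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }

def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples from a parsed explore response."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Stand-in payload with the same nesting that getNearbyVenues indexes into.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'La Vara',
               'location': {'lat': 40.687851, 'lng': -73.995582},
               'categories': [{'name': 'Spanish Restaurant'}]}},
    {'venue': {'name': 'Uncategorised venue',
               'location': {'lat': 40.0, 'lng': -74.0},
               'categories': []}},
]}]}}

print(parse_explore_items(fake_payload))
print(parse_explore_items({'response': {}}))  # no groups: empty list, no exception
```

With the real API, the parameters could be passed as `requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()`, which also takes care of URL encoding.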

In [38]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )

Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


In [39]:
print(brooklyn_venues.shape)
brooklyn_venues.head()

(1601, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bay Ridge | 40.625801 | -74.030621 | Pilo Arts Day Spa and Salon | 40.624748 | -74.030591 | Spa |
| 1 | Bay Ridge | 40.625801 | -74.030621 | Bagel Boy | 40.627896 | -74.029335 | Bagel Shop |
| 2 | Bay Ridge | 40.625801 | -74.030621 | Cocoa Grinder | 40.623967 | -74.030863 | Juice Bar |
| 3 | Bay Ridge | 40.625801 | -74.030621 | Pegasus Cafe | 40.623168 | -74.031186 | Breakfast Spot |
| 4 | Bay Ridge | 40.625801 | -74.030621 | Leo's Casa Calamari | 40.6242 | -74.030931 | Pizza Place |


In [40]:
brooklyn_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bath Beach | 30 | 30 | 30 | 30 | 30 | 30 |
| Bay Ridge | 30 | 30 | 30 | 30 | 30 | 30 |
| Bedford Stuyvesant | 26 | 26 | 26 | 26 | 26 | 26 |
| Bensonhurst | 30 | 30 | 30 | 30 | 30 | 30 |
| Bergen Beach | 6 | 6 | 6 | 6 | 6 | 6 |
| Boerum Hill | 30 | 30 | 30 | 30 | 30 | 30 |
| Borough Park | 20 | 20 | 20 | 20 | 20 | 20 |
| Brighton Beach | 30 | 30 | 30 | 30 | 30 | 30 |
| Broadway Junction | 14 | 14 | 14 | 14 | 14 | 14 |
| Brooklyn Heights | 30 | 30 | 30 | 30 | 30 | 30 |

In [41]:
print('There are {} unique categories.'.format(len(brooklyn_venues['Venue Category'].unique())))

There are 240 unique categories.


In [42]:
print (brooklyn_venues['Venue Category'].unique())

['Spa' 'Bagel Shop' 'Juice Bar' 'Breakfast Spot' 'Pizza Place'
 'Taco Place' 'Bookstore' 'Coffee Shop' 'Grocery Store'
 'Caucasian Restaurant' 'Middle Eastern Restaurant' 'Sports Bar'
 'Greek Restaurant' 'Hookah Bar' 'Chinese Restaurant' 'Bar' 'Lounge'
 'Optical Shop' 'Italian Restaurant' 'Ice Cream Shop' 'Tea Room' 'Café'
 'Sushi Restaurant' 'Cosmetics Shop' 'Bakery' 'Butcher'
 'Shabu-Shabu Restaurant' 'Noodle House' 'Liquor Store'
 'Hotpot Restaurant' 'Donut Shop' 'Pet Store' 'Mobile Phone Shop' 'Park'
 'Smoke Shop' 'Latin American Restaurant' 'Mexican Restaurant' 'Bank'
 'Fried Chicken Joint' 'Creperie' 'Record Shop' 'Pharmacy' 'Gym'
 'Video Game Store' 'Supplement Shop' "Women's Store" 'Yoga Studio'
 'Polish Restaurant' 'Szechuan Restaurant' 'Gymnastics Gym'
 'French Restaurant' 'Furniture / Home Store' 'Cocktail Bar' 'Gastropub'
 'Beer Store' 'Restaurant' 'Deli / Bodega' 'Sandwich Place' 'Boutique'
 'Baseball Field' 'Martial Arts Dojo' 'Wine Shop' 'Gourmet Shop'
 'Arts & Crafts St

In [43]:
# keep only the rows whose category is 'Spanish Restaurant'
spanish_venues = brooklyn_venues[brooklyn_venues['Venue Category'] == 'Spanish Restaurant']
print(spanish_venues.shape)
spanish_venues

(7, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 372 | Brownsville | 40.66395 | -73.910235 | Tropical Coffee Shop | 40.665831 | -73.909984 | Spanish Restaurant |
| 508 | Cobble Hill | 40.68792 | -73.998561 | La Vara | 40.687851 | -73.995582 | Spanish Restaurant |
| 697 | Cypress Hills | 40.682391 | -73.876616 | Faro Del Pacifico Pupuseria | 40.683029 | -73.875259 | Spanish Restaurant |
| 707 | Cypress Hills | 40.682391 | -73.876616 | rico pollo 2 | 40.683627 | -73.871979 | Spanish Restaurant |
| 980 | Prospect Lefferts Gardens | 40.65842 | -73.954899 | El Castillo de Jagua | 40.656685 | -73.960068 | Spanish Restaurant |
| 1559 | Highland Park | 40.681999 | -73.890346 | Lilliam Restaurant | 40.683051 | -73.889334 | Spanish Restaurant |
| 1569 | Highland Park | 40.681999 | -73.890346 | Restaurante Maria Internacional | 40.677613 | -73.891376 | Spanish Restaurant |


### Create a map with the Spanish restaurants

In [44]:
# create map of spanish venues using latitude and longitude values
map_spanish_venues = folium.Map(location=[latitude, longitude], zoom_start=12)


# add markers to map
for lat, lng, label in zip(spanish_venues['Venue Latitude'], spanish_venues['Venue Longitude'], spanish_venues['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_spanish_venues)
    
map_spanish_venues

In [45]:
spanish_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Brownsville | 1 | 1 | 1 | 1 | 1 | 1 |
| Cobble Hill | 1 | 1 | 1 | 1 | 1 | 1 |
| Cypress Hills | 2 | 2 | 2 | 2 | 2 | 2 |
| Highland Park | 2 | 2 | 2 | 2 | 2 | 2 |
| Prospect Lefferts Gardens | 1 | 1 | 1 | 1 | 1 | 1 |


## Results and Discussion <a name="results"></a>

Our analysis shows that there are at least 7 Spanish restaurants in Brooklyn, one of the biggest boroughs of NYC. However, there are very few restaurants per neighborhood: only Cypress Hills and Highland Park have two. No neighborhood is densely served, so it may be a good idea to place our restaurant in a neighborhood that already has a customer base for this type of cuisine. Because these two neighborhoods are practically side by side, we determined that placing it in either of them, or in a nearby neighborhood such as East New York, City Line or Broadway Junction, might be a good market opportunity.
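
The claim that Cypress Hills and Highland Park sit side by side can be checked with a great-circle distance between their centre coordinates. The sketch below uses the haversine formula with the coordinates listed in the Spanish restaurant table. `haversine_km` is a helper written for this example, and the mean Earth radius of 6371 km is the usual approximation.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius (approximation)
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))

# Neighbourhood centre coordinates as listed in the Spanish restaurant table.
cypress_hills = (40.682391, -73.876616)
highland_park = (40.681999, -73.890346)

distance = haversine_km(*cypress_hills, *highland_park)
print('Cypress Hills to Highland Park: {:.2f} km'.format(distance))
```

The two centres come out a little over one kilometre apart, which is consistent with treating them as a single catchment area.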

## Conclusion <a name="conclusion"></a>

Opening a Spanish restaurant in a culturally rich city such as NYC has a good chance of being successful. Moreover, after studying the different areas and the density of Spanish restaurants, we determined that the best location is somewhere around Cypress Hills and Highland Park in the borough of Brooklyn.

To see all the maps, refer to the report or follow this link to the notebook on IBM:
url:https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/4028fac5-fb77-4be4-88b0-769f410a393b/view?access_token=2dd7e9a35dd33df554bc43de3c5b0d1c253ccb6888fec3917314b8ad99c5728f