# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

### Done by Ali Farhat

### Bronx Middle Eastern Restaurants


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>


In this project, I am interested in the **Bronx**, one of the boroughs of New York City. The Bronx
has a rich history and has passed through many stages. Its history during
the 20th century may be divided into four periods. The first was a boom period during 1900–29,
when the population grew by a factor of six, from 200,000 in 1900 to 1.3 million
in 1930. The Great Depression and the post-World War II years then saw growth
slow, leading into an eventual decline. The mid-to-late century brought hard times:
from 1950 to 1985 the Bronx went from a predominantly moderate-income area to a
predominantly lower-income area with high rates of violent crime and poverty.
Finally, the Bronx has experienced an economic and developmental resurgence that started
in the late 1980s and continues today.   

As a **Middle Easterner**, I have always had a passion to open a **Middle Eastern
restaurant.** As we are all aware, location is the primary success factor in the restaurant
business. The Bronx is well suited to a Middle Eastern restaurant because of its
diversity. My problem is to find the **proper location**: one that gives me a
competitive edge yet sits in a zone that attracts diners. Competition is going to be high,
so in my study I will look for neighborhoods with many restaurants but few Middle
Eastern options.  

**Target Audience:**   
The target audience of this report is anyone interested in opening a Middle Eastern
restaurant in the Bronx. Since my dream is to open such a restaurant,
the main target audience is myself and my family. 

## Data <a name="data"></a>

The data comes from the New York Geo Data Center at the following link:
https://geo.nyu.edu/catalog/nyu_2451_34572. The data contains neighborhood
information for the whole of New York City and is formatted as JSON objects.
  



Each JSON object includes several elements that are relevant to our study. From each object we   
are interested in the following attributes: 

<UL>
    <li> properties.name : the neighborhood name   
    <li> properties.borough : the borough the neighborhood belongs to; we are looking for the Bronx   
    <li> geometry.coordinates : two values, longitude first and then latitude      
</UL>
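
As a minimal sketch of how these three attributes can be read from one entry, the snippet below uses a hand-written, trimmed feature. The dictionary `sample_feature` and the helper `extract_neighborhood` are illustrative names that are not part of the notebook, and the coordinate values are the rounded Wakefield values shown later in this report.

```python
# Hand-written, trimmed feature in the same shape as the dataset entries
# (illustrative only; the real entries carry more properties).
sample_feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]},
    "properties": {"name": "Wakefield", "borough": "Bronx"},
}


def extract_neighborhood(feature):
    """Return the fields used in this study as a flat dictionary."""
    # Coordinates are stored as [longitude, latitude], in that order.
    longitude, latitude = feature["geometry"]["coordinates"][:2]
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": latitude,
        "Longitude": longitude,
    }


row = extract_neighborhood(sample_feature)
print(row)
```

Keeping the extraction in one small function makes the longitude/latitude order explicit, which is the easiest detail to get wrong.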
  


  
  


# Downloading and preparing the data and installing necessary libraries  

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')


## 1. Download and Explore Dataset


New York City has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough, as well as the latitude and longitude coordinates of each neighborhood. 

Luckily, this dataset exists for free on the web. Feel free to try to find this dataset on your own, but here is the link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Let's also read the JSON data.

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
neighborhoods_data = newyork_data['features']



## Transform the data into a *pandas* dataframe 

### As mentioned above, we are interested in a few attributes of each JSON object. So we create a pandas dataframe and populate it with the JSON data objects.



In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

## Populate the data frame with data downloaded 

In [6]:
# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0, and appending row by row is slow)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [7]:
# Review the neighborhood Data 
neighborhoods.head()



| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [8]:
# Extract Only Bronx Data 

bronx_data = neighborhoods[neighborhoods['Borough'] == 'Bronx'].reset_index(drop=True)
bronx_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


### Neighborhoods of Interest 

Let's create latitude and longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which is approximately a 500 m radius around the Bronx city center.

Let's first find the latitude and longitude of the Bronx city center, using the address 'Bronx, NY' and the Nominatim geocoder from geopy.

In [9]:
# Geocode the Bronx to get a center point for the map
address = 'Bronx, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the Bronx are {}, {}.'.format(latitude, longitude))


The geographical coordinates of the Bronx are 40.85048545, -73.8404035580209.


In [10]:
# create map of The Bronx  using latitude and longitude values
map_bronx = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(bronx_data['Latitude'], bronx_data['Longitude'], bronx_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bronx)  
    
map_bronx

## Use Foursquare to get Restaurants 

The next step is to take all the neighborhoods in the Bronx and, using their latitude and longitude, get
**up to 100 venues within a radius of 1,000 meters** of each one. The venue data
comes from Foursquare.
The requests to Foursquare are formatted in the following manner. 
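
A request of this shape can be sketched with the standard library as follows. The helper `build_explore_url` and the placeholder credentials are illustrative and not part of the notebook; the endpoint and the parameter names (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`, `categoryId`) are the ones that appear in the URLs used in this report.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=1000, limit=100, category_id=None):
    """Assemble an explore URL; category_id is optional."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude
        "radius": radius,                # meters
        "limit": limit,                  # maximum number of venues returned
    }
    if category_id is not None:
        params["categoryId"] = category_id
    # urlencode percent-encodes the comma in "ll" as %2C
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
    40.894705, -73.847201,
    category_id="4bf58dd8d48988d115941735",
)
print(example_url)
```

Building the query from a dictionary keeps each value tied to its parameter name, so an extra or missing positional argument cannot silently drop a filter.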

#### Define Foursquare Credentials and Version



In [11]:
# @hidden_cell

CLIENT_ID = 'IX3XTPJPMM4TGAOEUORCXYKC0K13Y3KCR5YN3TZ0KCZN15ZG' # your Foursquare ID
CLIENT_SECRET = 'DZYZJBO2M0NNJNTNU2UQG4B5PIDYXGJ1KR1PNRJALYRGCO3M' # your Foursquare Secret
VERSION = '20180605'
LIMIT = 30

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: IX3XTPJPMM4TGAOEUORCXYKC0K13Y3KCR5YN3TZ0KCZN15ZG
CLIENT_SECRET:DZYZJBO2M0NNJNTNU2UQG4B5PIDYXGJ1KR1PNRJALYRGCO3M


In [12]:
neighborhood_latitude = bronx_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = bronx_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = bronx_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Wakefield are 40.89470517661, -73.84720052054902.



### Now, let's get the top 100 venues within a radius of 1000 meters of Wakefield.

### Create a URL that gets the data from Foursquare


In [13]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 1000 # define radius in meters
CATID = '4bf58dd8d48988d115941735' # Middle Eastern restaurant category, used later in getNearbyVenues


# create URL (this first exploratory request is not filtered by category)
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL
 


'https://api.foursquare.com/v2/venues/explore?&client_id=IX3XTPJPMM4TGAOEUORCXYKC0K13Y3KCR5YN3TZ0KCZN15ZG&client_secret=DZYZJBO2M0NNJNTNU2UQG4B5PIDYXGJ1KR1PNRJALYRGCO3M&v=20180605&ll=40.89470517661,-73.84720052054902&radius=1000&limit=100'

Execute a GET request to Foursquare and get the result as JSON. 



In [14]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5db85029a30619002c46a5c1'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Wakefield',
  'headerFullLocation': 'Wakefield, Bronx',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 44,
  'suggestedBounds': {'ne': {'lat': 40.903705185610015,
    'lng': -73.83531662200086},
   'sw': {'lat': 40.88570516760999, 'lng': -73.85908441909719}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c537892fd2ea593cb077a28',
       'name': 'Lollipops Gelato',
       'location': {'address': '4120 Baychester Ave',
        'crossStreet': 'Edenwald & Bussing Ave',
        'lat': 40.894123150205274,
        'ln

All the venue information is in the *items* key of the JSON response. 
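
To make the nesting concrete, here is a minimal sketch that walks the *items* list of a trimmed, hand-written response. `sample_results` and `list_venues` are illustrative names that are not in the notebook; the two venues and their rounded coordinates are taken from the output shown in this report.

```python
# Trimmed, hand-written sample that mirrors the response structure.
sample_results = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Lollipops Gelato",
                               "location": {"lat": 40.894123, "lng": -73.845892},
                               "categories": [{"name": "Dessert Shop"}]}},
                    {"venue": {"name": "Rite Aid",
                               "location": {"lat": 40.896649, "lng": -73.844846},
                               "categories": [{"name": "Pharmacy"}]}},
                ]
            }
        ]
    }
}


def list_venues(results):
    """Return (name, category, lat, lng) tuples from an explore response."""
    items = results["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use the first listed category, or None when the list is empty.
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], category,
                     venue["location"]["lat"], venue["location"]["lng"]))
    return rows


venue_rows = list_venues(sample_results)
for venue_row in venue_rows:
    print(venue_row)
```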

Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [15]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # rows produced by json_normalize use the flattened column name
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#### The next step is to clean the JSON data and structure it into a pandas dataframe 


In [16]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Lollipops Gelato | Dessert Shop | 40.894123 | -73.845892 |
| 1 | Ripe Kitchen & Bar | Caribbean Restaurant | 40.898152 | -73.838875 |
| 2 | Ali's Roti Shop | Caribbean Restaurant | 40.894036 | -73.856935 |
| 3 | Jackie's West Indian Bakery | Caribbean Restaurant | 40.889283 | -73.84331 |
| 4 | Rite Aid | Pharmacy | 40.896649 | -73.844846 |


### Explore Neighborhoods in Bronx

#### Let's create a function that repeats the same process for all neighborhoods in the Bronx and returns their Middle Eastern restaurants. We use the category ID 4bf58dd8d48988d115941735 to narrow our search to Middle Eastern restaurants only. 



In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    CATID = '4bf58dd8d48988d115941735'
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            CATID)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *bronx_venues* (Middle Eastern restaurants only).

In [18]:
bronx_venues = getNearbyVenues(names=bronx_data['Neighborhood'],
                                   latitudes=bronx_data['Latitude'],
                                   longitudes=bronx_data['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Claremont Village
Concourse Village
Mount Eden
Mount Hope
Bronxdale
Allerton
Kingsbridge Heights


## Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting areas of the Bronx that have a limited number of Middle Eastern restaurants. We will limit our analysis to the area ~1 km around the Bronx city center.

In the first step we collected the required **data: location and type (category) of every restaurant within 1 km of the Bronx center**. We will **identify restaurants** (according to Foursquare categorization).


In the second and final step we will focus on the most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than two restaurants within a radius of 250 meters**, and we want locations **without Middle Eastern restaurants within a radius of 1000 meters**. We will present a map of all such locations and also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses that should be a starting point for the final 'street level' exploration and the search for an optimal venue location by stakeholders.
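
The radius requirements above come down to a distance check between two latitude/longitude points. The sketch below shows one common way to do that, the haversine great-circle formula, using only the standard library. The function names `haversine_m` and `has_venue_within` are illustrative and do not appear in the notebook; the two points are the Wakefield and Co-op City centroids from the dataset.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def has_venue_within(candidate, venues, radius_m=1000):
    """True if any (lat, lon) in venues lies within radius_m of candidate."""
    return any(
        haversine_m(candidate[0], candidate[1], v[0], v[1]) <= radius_m
        for v in venues
    )


wakefield = (40.894705, -73.847201)
co_op_city = (40.874294, -73.829939)
distance = haversine_m(*wakefield, *co_op_city)
print(round(distance), "meters between the two centroids")
```

A candidate location would pass the second requirement when `has_venue_within(candidate, middle_eastern_venues, 1000)` is false, where `middle_eastern_venues` is a hypothetical list of venue coordinates.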

## Analysis <a name="analysis"></a>

Next, let's group rows by neighborhood and count the number of venues in each one. We can use the grouped data to see where the Middle Eastern restaurants are. We can take two approaches. In the first, we select the areas with high concentrations of Middle Eastern restaurants, which ensures that there is a market for this type of food. In the second, we select areas with no Middle Eastern restaurants, where the competition is lower but the risk may be higher. 
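
The two approaches can be sketched on a small list of neighborhood labels, one label per returned venue. The input below is illustrative only and is not the notebook's result set; `venue_neighborhoods` and `all_neighborhoods` are hypothetical names.

```python
from collections import Counter

# One neighborhood label per venue row (illustrative input only).
venue_neighborhoods = ["Bedford Park", "Bronxdale", "Bronxdale",
                       "Bronxdale", "Concourse"]

# Every neighborhood that was queried, including any with no hits
# (again illustrative, not the notebook's results).
all_neighborhoods = ["Bedford Park", "Bronxdale", "Concourse", "Wakefield"]

counts = Counter(venue_neighborhoods)

# Approach 1: rank neighborhoods by how many venues were returned.
ranked = counts.most_common()

# Approach 2: list neighborhoods where no venue was returned.
without_hits = [name for name in all_neighborhoods if counts[name] == 0]

print(ranked[0])
print(without_hits)
```

The same counts come out of `groupby('Neighborhood').count()` on the dataframe; the `Counter` version only makes the two selection rules easier to read side by side.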


In [19]:
print(bronx_venues.shape)
bronx_venues.head()
bronx_venues.groupby('Neighborhood').count()

(15, 7)


| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bedford Park | 1 | 1 | 1 | 1 | 1 | 1 |
| Bronxdale | 3 | 3 | 3 | 3 | 3 | 3 |
| Concourse | 1 | 1 | 1 | 1 | 1 | 1 |
| Concourse Village | 1 | 1 | 1 | 1 | 1 | 1 |
| Hunts Point | 1 | 1 | 1 | 1 | 1 | 1 |
| Kingsbridge Heights | 1 | 1 | 1 | 1 | 1 | 1 |
| Morris Heights | 1 | 1 | 1 | 1 | 1 | 1 |
| Morrisania | 1 | 1 | 1 | 1 | 1 | 1 |
| Mount Eden | 1 | 1 | 1 | 1 | 1 | 1 |
| Norwood | 1 | 1 | 1 | 1 | 1 | 1 |


In [20]:
# bronx_onehot is never defined in this notebook, so count the category directly from bronx_venues
(bronx_venues['Venue Category'] == 'Middle Eastern Restaurant').sum()

## Results and Discussion <a name="results"></a>



Looking at the result above, we notice that the maximum number of Middle Eastern restaurants is found in two areas, with 3 restaurants each. According to the methodology I chose, these two areas will be my top picks. Before making the final decision, we will examine the demographics of each area. The area with the higher population and the higher concentration of Middle Eastern residents will be my choice.   

**Bronxdale**   
Bronxdale's population is 79,188. Its demographics break down as follows:    

<li> Citizen US Born: 44,972
<li> Citizen Not US Born: 21,611
<li> Not Citizens: 12,605       
        
        
<br>

**Van Nest**   
Van Nest's population is 20,069. Its demographics break down as follows: 


<li> Citizen US Born: 11,370
<li> Citizen Not US Born: 5,378
<li> Not Citizens: 3,321      


<br>

From the above data, I would choose Bronxdale, since it has the higher population and a large number of immigrants. This should improve the restaurant's chances of success.
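
The comparison can be checked with a few lines of arithmetic on the figures listed above. In this sketch, "foreign-born" is my own shorthand for the sum of the "Citizen Not US Born" and "Not Citizens" groups; that grouping is an assumption made for illustration and is not a category from the source data.

```python
# Figures copied from the two lists above.
demographics = {
    "Bronxdale": {"citizen_us_born": 44972,
                  "citizen_not_us_born": 21611,
                  "not_citizens": 12605},
    "Van Nest": {"citizen_us_born": 11370,
                 "citizen_not_us_born": 5378,
                 "not_citizens": 3321},
}


def foreign_born_summary(groups):
    """Return (total population, foreign-born count, foreign-born share)."""
    total = sum(groups.values())
    # Assumption: foreign-born = naturalized citizens + non-citizens.
    foreign_born = groups["citizen_not_us_born"] + groups["not_citizens"]
    return total, foreign_born, foreign_born / total


summary = {name: foreign_born_summary(groups)
           for name, groups in demographics.items()}

for name, (total, foreign_born, share) in summary.items():
    print("{}: {:,} residents, {:,} foreign-born ({:.1%})".format(
        name, total, foreign_born, share))
```

Under this assumption the two neighborhoods have a similar foreign-born share, about 43% each, so the difference between them lies mainly in absolute numbers: Bronxdale has roughly four times as many residents.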


## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify a location in the Bronx where I can open a Middle Eastern restaurant. Since I don't have a lot of funds, I wanted to minimize the risk of failure and increase the chances of success. Bronxdale has a high number of Middle Eastern restaurants, which suggests that Middle Eastern food is popular there. The other factor is demographics. Bronxdale has a much higher population than Van Nest, which can translate into higher revenue if we brand ourselves on pricing and quality food. My plan is to make all meals affordable, so that we can turn the existing competition into a business opportunity. 

