# Final project 
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is aimed at stakeholders interested in opening an **Italian restaurant** in **New York City**.


The idea is to find a place in Manhattan that is **not too close to other Italian restaurants**. At the same time, the owner of the future restaurant wants a place **close to hotels or other tourist attractions**.

Using our data science skills, we will identify a good location for the new restaurant.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Italian restaurants in the neighborhood, if any
* tourist attractions and hotels nearby

The following data source will be needed to extract the required information:
* the number, type, and location of restaurants in every neighborhood will be obtained using the **Foursquare API**

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image
from IPython.display import HTML

# function for transforming JSON into a pandas DataFrame
from pandas import json_normalize # pandas >= 1.0; replaces the deprecated pandas.io.json import

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

In [3]:
CLIENT_ID = 'ST1COQ5NWY2HVWDGTZMSVF41SHGCI3WZVNZPNT10WDR3IAEK' # your Foursquare ID
CLIENT_SECRET = '5EYJAOVB5FMB5XI4QXYMXQVWZJZQPQ5O34QFARELZDHXKM23' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 1000
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ST1COQ5NWY2HVWDGTZMSVF41SHGCI3WZVNZPNT10WDR3IAEK
CLIENT_SECRET:5EYJAOVB5FMB5XI4QXYMXQVWZJZQPQ5O34QFARELZDHXKM23
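
Hard-coding and printing API credentials makes them visible to anyone who reads the notebook. One possible alternative, sketched below, reads them from environment variables and masks them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for illustration; they are not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Keep only the first few characters so printed output does not leak the value."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Possible usage in the notebook (requires the variables to be exported first):
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
# print('CLIENT_ID: ' + mask(CLIENT_ID))
```

The rest of the notebook can keep using `CLIENT_ID` and `CLIENT_SECRET` unchanged.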


We first find the coordinates of Manhattan:

In [4]:
address = 'Manhattan, New York, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.7896239 -73.9598939


We now search for all Italian restaurants within a radius of 5 km.

## Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting areas of Manhattan, NY that have a low density of Italian restaurants and a high density of hotels. We will limit our analysis to an area of roughly 5 km around the geocoded Manhattan coordinates.

In the first step we will collect the required **data: the location and type (category) of every Italian restaurant and hotel within 5 km of those coordinates**.

In the second step we will build a folium map so that stakeholders can explore the area and search for an optimal venue location.
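
The notion of "low Italian restaurant density and high hotel density" can also be made numeric. The sketch below is one possible way to do it, not part of the original analysis: it counts how many venues of each kind lie within a given distance of a candidate point, using the haversine great-circle formula. The helper names and the 1500 m radius are assumptions chosen for illustration; the sample coordinates are copied from the venue tables in the Analysis section.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_within(point, venues, radius_m):
    """Count the venues lying within radius_m metres of point."""
    return sum(
        1 for lat, lng in venues
        if haversine_m(point[0], point[1], lat, lng) <= radius_m
    )


# Sample coordinates copied from the venue tables in the Analysis section
italian = [(40.791096, -73.973991), (40.784656, -73.973522), (40.764513, -73.976827)]
hotels = [(40.784737, -73.955713), (40.793298, -73.972092),
          (40.788905, -73.975054), (40.79669, -73.970555)]

candidate = (40.7896239, -73.9598939)  # the geocoded Manhattan point
RADIUS_M = 1500                        # hypothetical neighbourhood radius

n_italian = count_within(candidate, italian, RADIUS_M)
n_hotels = count_within(candidate, hotels, RADIUS_M)
print(n_italian, n_hotels)
```

A candidate with few Italian restaurants and many hotels nearby would be the more attractive one under the criteria stated in the introduction. How to weigh the two counts against each other is a business choice left to the stakeholders.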

## Analysis <a name="analysis"></a>

In [5]:
search_query = 'Italian'
radius = 5000

In [6]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=ST1COQ5NWY2HVWDGTZMSVF41SHGCI3WZVNZPNT10WDR3IAEK&client_secret=5EYJAOVB5FMB5XI4QXYMXQVWZJZQPQ5O34QFARELZDHXKM23&ll=40.7896239,-73.9598939&v=20180604&query=Italian&radius=5000&limit=1000'
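
Building the query string with `str.format` works for a single-word query, but a query containing spaces or special characters would need escaping. A minimal sketch of an alternative uses `urllib.parse.urlencode` from the standard library; the function name `build_search_url` is an assumption for illustration. Passing the same dictionary as `params=` to `requests.get` would encode it in an equivalent way.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the venues/search URL with properly escaped query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',  # latitude,longitude of the search centre
        'v': version,
        'query': query,
        'radius': radius,      # search radius in metres
        'limit': limit,
    }
    return f'{BASE_URL}?{urlencode(params)}'


# Possible usage with the variables defined in the cells above:
# url = build_search_url(CLIENT_ID, CLIENT_SECRET, latitude, longitude,
#                        VERSION, search_query, radius, LIMIT)
```

The same helper can be reused for the hotel search further down by changing only the `query` argument.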

In [7]:
results = requests.get(url).json()

In [8]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()


|  | id | name | categories | referralId | hasPerk | location.address | location.crossStreet | location.lat | location.lng | location.labeledLatLngs | ... | location.country | location.formattedAddress | delivery.id | delivery.url | delivery.provider.name | delivery.provider.icon.prefix | delivery.provider.icon.sizes | delivery.provider.icon.name | venuePage.id | location.neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4a7778a1f964a5209be41fe3 | Carmine's Italian Restaurant | [{'id': '4bf58dd8d48988d110941735', 'name': 'I... | v-1590855178 | False | 2450 Broadway | btwn W 90th & W 91st | 40.791096 | -73.973991 | [{'label': 'display', 'lat': 40.7910963, 'lng'... | ... | United States | [2450 Broadway (btwn W 90th & W 91st), New Yor... | 294727.0 | https://www.seamless.com/menu/carmines-upper-w... | seamless | https://fastly.4sqi.net/img/general/cap/ | [40, 50] | /delivery_provider_seamless_20180129.png |  |  |
| 1 | 4ba00318f964a520285237e3 | The Italian Academy (Casa Italiana) | [{'id': '4bf58dd8d48988d1a8941735', 'name': 'G... | v-1590855178 | False | 1161 Amsterdam Ave | West 118th Street | 40.807645 | -73.960396 | [{'label': 'display', 'lat': 40.80764460477974... | ... | United States | [1161 Amsterdam Ave (West 118th Street), New Y... |  |  |  |  |  |  |  |  |
| 2 | 4b786cecf964a52052cd2ee3 | Bellini Italian Restaurant & Brick Oven Pizza | [{'id': '4bf58dd8d48988d110941735', 'name': 'I... | v-1590855178 | False | 483 Columbus Ave | btwn 83rd & 84th St | 40.784656 | -73.973522 | [{'label': 'display', 'lat': 40.78465555633674... | ... | United States | [483 Columbus Ave (btwn 83rd & 84th St), New Y... | 383815.0 | https://www.seamless.com/menu/bellini-italian-... | seamless | https://fastly.4sqi.net/img/general/cap/ | [40, 50] | /delivery_provider_seamless_20180129.png | 101733469.0 |  |
| 3 | 4a90dbbef964a520a11920e3 | Italian Village Pizzeria | [{'id': '4bf58dd8d48988d1ca941735', 'name': 'P... | v-1590855178 | False | 1526 1st Ave | 79th and 1st | 40.772669 | -73.952319 | [{'label': 'display', 'lat': 40.772669, 'lng':... | ... | United States | [1526 1st Ave (79th and 1st), New York, NY 100... | 1659847.0 | https://www.seamless.com/menu/italian-village-... | seamless | https://fastly.4sqi.net/img/general/cap/ | [40, 50] | /delivery_provider_seamless_20180129.png |  | Yorkville |
| 4 | 51e7310c498e639ed27062b1 | Quality Italian | [{'id': '4bf58dd8d48988d110941735', 'name': 'I... | v-1590855178 | False | 57 W 57th St | at 6th Ave | 40.764513 | -73.976827 | [{'label': 'display', 'lat': 40.76451329448865... | ... | United States | [57 W 57th St (at 6th Ave), New York, NY 10019... |  |  |  |  |  |  |  |  |
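
The dotted column names in the output above (`location.lat`, `delivery.provider.name`, and so on) come from flattening the nested JSON returned by the API. The sketch below is a simplified pure-Python illustration of that idea, not the pandas implementation: nested dictionaries are expanded into dot-separated keys, while lists such as `categories` are left untouched. The function name `flatten` is an assumption, and the sample record reuses values from the first venue shown above.

```python
def flatten(record, parent_key="", sep="."):
    """Expand nested dictionaries into a single level with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dictionaries, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            # scalars and lists are kept as they are
            flat[full_key] = value
    return flat


# A trimmed-down venue record with the same nesting as the API response
venue = {
    "id": "4a7778a1f964a5209be41fe3",
    "name": "Carmine's Italian Restaurant",
    "categories": [{"id": "4bf58dd8d48988d110941735", "name": "Italian Restaurant"}],
    "location": {"address": "2450 Broadway", "lat": 40.791096, "lng": -73.973991},
}

flat_venue = flatten(venue)
print(sorted(flat_venue))
```

Because `categories` stays a list of dictionaries after flattening, a separate step is still needed to pull out the category name.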


In [11]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # explicit copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the nested column name when 'categories' is absent
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

|  | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Carmine's Italian Restaurant | Italian Restaurant | 2450 Broadway | btwn W 90th & W 91st | 40.791096 | -73.973991 | [{'label': 'display', 'lat': 40.7910963, 'lng'... | 1199 | 10024 | US | New York | NY | United States | [2450 Broadway (btwn W 90th & W 91st), New Yor... |  | 4a7778a1f964a5209be41fe3 |
| 1 | The Italian Academy (Casa Italiana) | General College & University | 1161 Amsterdam Ave | West 118th Street | 40.807645 | -73.960396 | [{'label': 'display', 'lat': 40.80764460477974... | 2006 | 10027 | US | New York | NY | United States | [1161 Amsterdam Ave (West 118th Street), New Y... |  | 4ba00318f964a520285237e3 |
| 2 | Bellini Italian Restaurant & Brick Oven Pizza | Italian Restaurant | 483 Columbus Ave | btwn 83rd & 84th St | 40.784656 | -73.973522 | [{'label': 'display', 'lat': 40.78465555633674... | 1274 | 10024 | US | New York | NY | United States | [483 Columbus Ave (btwn 83rd & 84th St), New Y... |  | 4b786cecf964a52052cd2ee3 |
| 3 | Italian Village Pizzeria | Pizza Place | 1526 1st Ave | 79th and 1st | 40.772669 | -73.952319 | [{'label': 'display', 'lat': 40.772669, 'lng':... | 1992 | 10075 | US | New York | NY | United States | [1526 1st Ave (79th and 1st), New York, NY 100... | Yorkville | 4a90dbbef964a520a11920e3 |
| 4 | Quality Italian | Italian Restaurant | 57 W 57th St | at 6th Ave | 40.764513 | -73.976827 | [{'label': 'display', 'lat': 40.76451329448865... | 3138 | 10019 | US | New York | NY | United States | [57 W 57th St (at 6th Ave), New York, NY 10019... |  | 51e7310c498e639ed27062b1 |


In [12]:
dataframe_filtered.shape

(50, 16)
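
Two things are worth noting about this result. The search returned 50 rows even though `LIMIT` was set to 1000, which suggests the API caps the number of results per request. The query also appears to match on venue names, so the result contains venues that are not restaurants, such as The Italian Academy. The sketch below tallies the categories and keeps only Italian restaurants, using plain Python on the five rows shown above so that it runs without the API. On the DataFrame itself, a boolean mask on the `categories` column would do the same job.

```python
from collections import Counter

# (name, category) pairs copied from the five rows shown above
sample_rows = [
    ("Carmine's Italian Restaurant", "Italian Restaurant"),
    ("The Italian Academy (Casa Italiana)", "General College & University"),
    ("Bellini Italian Restaurant & Brick Oven Pizza", "Italian Restaurant"),
    ("Italian Village Pizzeria", "Pizza Place"),
    ("Quality Italian", "Italian Restaurant"),
]

# how many venues fall into each category
category_counts = Counter(category for _, category in sample_rows)

# keep only the venues whose primary category is an Italian restaurant
italian_only = [name for name, category in sample_rows
                if category == "Italian Restaurant"]

print(category_counts)
print(italian_only)
```

Whether pizza places should count as competitors is a judgment call for the stakeholders; the filter above takes the strict reading.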

In [13]:
search_query = 'hotel'
radius = 5000

In [14]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=ST1COQ5NWY2HVWDGTZMSVF41SHGCI3WZVNZPNT10WDR3IAEK&client_secret=5EYJAOVB5FMB5XI4QXYMXQVWZJZQPQ5O34QFARELZDHXKM23&ll=40.7896239,-73.9598939&v=20180604&query=hotel&radius=5000&limit=1000'

In [15]:
results = requests.get(url).json()
#results

In [16]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()


|  | id | name | categories | referralId | hasPerk | location.address | location.crossStreet | location.lat | location.lng | location.labeledLatLngs | ... | location.country | location.formattedAddress | venuePage.id | location.neighborhood | delivery.id | delivery.url | delivery.provider.name | delivery.provider.icon.prefix | delivery.provider.icon.sizes | delivery.provider.icon.name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4ad78cbff964a520140c21e3 | Hotel Wales | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | v-1590855241 | False | 1295 Madison Ave | 92nd St | 40.784737 | -73.955713 | [{'label': 'display', 'lat': 40.7847375, 'lng'... | ... | United States | [1295 Madison Ave (92nd St), New York, NY 1012... |  |  |  |  |  |  |  |  |
| 1 | 4b9c6ac8f964a520276736e3 | Days Inn Hotel New York City-Broadway | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | v-1590855241 | False | 215 West 94th Street | at Broadway | 40.793298 | -73.972092 | [{'label': 'display', 'lat': 40.7932977, 'lng'... | ... | United States | [215 West 94th Street (at Broadway), New York,... |  |  |  |  |  |  |  |  |
| 2 | 4bf2fc262d629521cbe55f58 | Broadway Hotel | [{'id': '4bf58dd8d48988d1ee931735', 'name': 'H... | v-1590855241 | False | 230 W 101st St | at Broadway | 40.797932 | -73.969834 | [{'label': 'display', 'lat': 40.79793166906806... | ... | United States | [230 W 101st St (at Broadway), New York, NY 10... |  |  |  |  |  |  |  |  |
| 3 | 4b1c3322f964a520210424e3 | Belnord Hotel | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | v-1590855241 | False | 209 W 87th St | Broadway | 40.788905 | -73.975054 | [{'label': 'display', 'lat': 40.7889054, 'lng'... | ... | United States | [209 W 87th St (Broadway), New York, NY 10024,... |  |  |  |  |  |  |  |  |
| 4 | 4bc3a05adce4eee125af719d | Hotel 99 Llc | [{'id': '4bf58dd8d48988d1fa931735', 'name': 'H... | v-1590855241 | False | 244 W 99th St |  | 40.79669 | -73.970555 | [{'label': 'display', 'lat': 40.79669018312864... | ... | United States | [244 W 99th St, New York, NY 10025, United Sta... |  |  |  |  |  |  |  |  |


In [18]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered2 = dataframe.loc[:, filtered_columns].copy() # explicit copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError: # fall back to the nested column name when 'categories' is absent
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered2['categories'] = dataframe_filtered2.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered2.columns = [column.split('.')[-1] for column in dataframe_filtered2.columns]

dataframe_filtered2.head()

|  | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hotel Wales | Hotel | 1295 Madison Ave | 92nd St | 40.784737 | -73.955713 | [{'label': 'display', 'lat': 40.7847375, 'lng'... | 648 | 10128 | US | New York | NY | United States | [1295 Madison Ave (92nd St), New York, NY 1012... |  | 4ad78cbff964a520140c21e3 |
| 1 | Days Inn Hotel New York City-Broadway | Hotel | 215 West 94th Street | at Broadway | 40.793298 | -73.972092 | [{'label': 'display', 'lat': 40.7932977, 'lng'... | 1106 | 10025 | US | New York | NY | United States | [215 West 94th Street (at Broadway), New York,... |  | 4b9c6ac8f964a520276736e3 |
| 2 | Broadway Hotel | Hostel | 230 W 101st St | at Broadway | 40.797932 | -73.969834 | [{'label': 'display', 'lat': 40.79793166906806... | 1247 | 10025 | US | New York | NY | United States | [230 W 101st St (at Broadway), New York, NY 10... |  | 4bf2fc262d629521cbe55f58 |
| 3 | Belnord Hotel | Hotel | 209 W 87th St | Broadway | 40.788905 | -73.975054 | [{'label': 'display', 'lat': 40.7889054, 'lng'... | 1280 | 10024 | US | New York | NY | United States | [209 W 87th St (Broadway), New York, NY 10024,... |  | 4b1c3322f964a520210424e3 |
| 4 | Hotel 99 Llc | Hotel | 244 W 99th St |  | 40.79669 | -73.970555 | [{'label': 'display', 'lat': 40.79669018312864... | 1194 | 10025 | US | New York | NY | United States | [244 W 99th St, New York, NY 10025, United Sta... |  | 4bc3a05adce4eee125af719d |


## Results and Discussion <a name="results"></a>

The final result is the map shown below. The idea is that **business stakeholders can make a decision** by exploring the interactive plot.
They can explore the different areas and combine the map with information of their own, for example rental prices in each neighborhood or other business data.

In [19]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on the geocoded Manhattan coordinates

# add a red circle marker to represent the Manhattan center point
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Manhattan',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Italian restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)
    
# add the hotels as green circle markers
for lat, lng, label in zip(dataframe_filtered2.lat, dataframe_filtered2.lng, dataframe_filtered2.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        popup=label,
        fill = True,
        fill_color='green',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map