# Capstone project: Final assignment week 4 

## Introduction/Business problem

A US-based company regularly sends employees on business trips to Rome, Naples, and Florence (Italy).

The company has already chosen the accommodations for these recurring stays and would like to know which are the best-rated pizza places where its employees could have dinner.

**The objective of the study is to find a good trade-off between pizzeria rating and proximity to the already chosen accommodations.**
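To make the trade-off concrete, each pizzeria can be given a single score that mixes a normalised rating with a normalised walking distance. This is a hypothetical sketch, not the method used later in the notebook: the names `tradeoff_score` and `rank_venues`, the assumption that ratings are on a 0 to 10 scale, and the default weight of 0.5 are my own choices.

```python
def tradeoff_score(rating, distance_m, max_distance_m=500.0, weight=0.5):
    """Hypothetical score in [0, 1] balancing rating against closeness.

    rating: venue rating, assumed to be on a 0-10 scale
    distance_m: walking distance from the accommodation, in metres
    weight: assumed importance of rating; (1 - weight) goes to closeness
    Venues farther than max_distance_m are excluded by returning None.
    """
    if distance_m > max_distance_m:
        return None  # violates the maximum walking distance
    rating_part = rating / 10.0                          # normalise rating to [0, 1]
    closeness_part = 1.0 - distance_m / max_distance_m   # 1 at the door, 0 at the limit
    return weight * rating_part + (1.0 - weight) * closeness_part


def rank_venues(venues, **kwargs):
    """Sort (name, rating, distance_m) tuples by descending score, dropping out-of-range venues."""
    scored = []
    for name, rating, distance_m in venues:
        score = tradeoff_score(rating, distance_m, **kwargs)
        if score is not None:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With the default weight, `rank_venues([("A", 9.0, 450), ("B", 8.0, 100), ("C", 9.5, 700)])` ranks B above A and drops C, which is beyond 500 m; raising `weight` (for example to 0.9) shifts the balance towards rating and puts A first.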

The maximum distance employees should have to walk from their pre-selected accommodation is 500 m.
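The 500 m limit can be checked directly from coordinates. The sketch below uses the haversine (great-circle) formula with an assumed mean Earth radius of 6,371 km; `haversine_m` and `within_walking_limit` are hypothetical helpers. Straight-line distance is a lower bound on the real walking distance through the streets, so a venue that passes this check may still be a slightly longer walk.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_walking_limit(lat1, lon1, lat2, lon2, max_distance_m=500):
    """True if the straight-line distance between the two points does not exceed the limit."""
    return haversine_m(lat1, lon1, lat2, lon2) <= max_distance_m
```

One degree of longitude at the equator comes out at roughly 111.2 km, which is a quick sanity check on the formula.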

## Data Section

The data provided by the company are the addresses of the three accommodations:

- Rome: 33 Via di Sant'Agata de Goti, Roma, RM
- Naples: 133, Via Stella, Napoli, NA
- Florence: 3, Via Del Leone, Firenze, FI

I use the Foursquare API to retrieve the pizza venues around each address and then proceed with the analysis.

### Visualization of the locations with Folium

In [1]:
```python
import pandas as pd
import numpy as np
import requests  # library to handle HTTP requests
from bs4 import BeautifulSoup
import json  # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

# JSON responses are flattened with pd.json_normalize; the old
# `from pandas.io.json import json_normalize` import is deprecated and removed in pandas 2.0

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium  # map rendering library

print('Libraries imported.')
```

Libraries imported.
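The cells below use `CLIENT_ID`, `CLIENT_SECRET`, `ACCESS_TOKEN`, `VERSION` and `LIMIT`, which are not defined in the cells shown here (the execution count jumps from `In [1]` to `In [3]`). A hypothetical way to define them without hard-coding secrets in the notebook is to read them from environment variables; the variable names and the default values for `VERSION` and `LIMIT` are assumptions, not the values used in the original run.

```python
import os

# Hypothetical environment-variable names; set them before starting the notebook.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
ACCESS_TOKEN = os.environ.get("FOURSQUARE_ACCESS_TOKEN", "")
VERSION = os.environ.get("FOURSQUARE_VERSION", "20180605")  # v2 API version date, YYYYMMDD (assumed default)
LIMIT = int(os.environ.get("FOURSQUARE_LIMIT", "30"))       # assumed cap on results per query

if not (CLIENT_ID and CLIENT_SECRET):
    print("Warning: Foursquare credentials are not set.")
```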


### Some useful functions

In [3]:
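```python
# function that extracts the category of the venue
def get_category_type(row):
    # search results store categories under 'categories', explore results under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # return the name of the first (primary) category, or None if the venue has none
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']
```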

In [4]:
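```python
def get_congitute_latitude_accomodation(address):
    """Geocode an address with Nominatim and return (latitude, longitude)."""
    geolocator = Nominatim(user_agent="foursquare_agent")
    location = geolocator.geocode(address)
    if location is None:  # Nominatim returns None when the address cannot be resolved
        raise ValueError(f"Could not geocode address: {address!r}")
    return location.latitude, location.longitude
```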
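`get_congitute_latitude_accomodation` geocodes the address on every call (the Naples address is geocoded twice below), and Nominatim's public service asks clients to send at most one request per second. A hypothetical caching wrapper, `make_cached_geocoder`, resolves each address only once; it assumes a `geocode(address)` callable that behaves like geopy's `Nominatim.geocode`, returning an object with `.latitude` and `.longitude`, or `None` when the address is not found.

```python
from functools import lru_cache


def make_cached_geocoder(geocode):
    """Wrap a geocode(address) callable so each address is resolved at most once."""

    @lru_cache(maxsize=None)
    def lookup(address):
        location = geocode(address)
        if location is None:
            # raising (rather than returning None) keeps failed lookups out of the cache
            raise ValueError(f"Address not found: {address!r}")
        return location.latitude, location.longitude

    return lookup
```

In the notebook this could be used as `lookup = make_cached_geocoder(Nominatim(user_agent="foursquare_agent").geocode)` followed by `lookup('133, Via Stella, Napoli, NA')`. For spacing out many uncached calls, geopy also provides `geopy.extra.rate_limiter.RateLimiter`.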

In [5]:
```python
def filter_data(dataframe):
    # keep only the venue name, its categories, every location-related column, and the venue id
    # (raises KeyError if the search returned no venues, because the empty DataFrame has no columns)
    filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
    dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy to avoid SettingWithCopyWarning

    # replace each row's list of categories with the name of its primary category
    dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

    # clean column names by keeping only the last term (e.g. 'location.lat' -> 'lat')
    dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

    return dataframe_filtered
```

In [6]:
```python
def query_into_dataframe(CLIENT_ID, CLIENT_SECRET, address, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT):
    # geocode the accommodation, then search for matching venues within `radius` metres of it
    latitude, longitude = get_congitute_latitude_accomodation(address)

    url = 'https://api.foursquare.com/v2/venues/search'
    params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'll': '{},{}'.format(latitude, longitude),
              'oauth_token': ACCESS_TOKEN, 'v': VERSION, 'query': search_query, 'radius': radius, 'limit': LIMIT}
    response = requests.get(url, params=params)  # requests URL-encodes the parameters
    response.raise_for_status()  # fail loudly on HTTP errors (e.g. invalid credentials)
    results = response.json()

    # assign relevant part of JSON to venues
    venues = results['response']['venues']

    # transform venues into a dataframe
    dataframe = pd.json_normalize(venues)
    return dataframe, latitude, longitude
```
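`query_into_dataframe` calls the `venues/search` endpoint, which, as far as I know, returns names, categories and locations but not ratings, so the rating half of the objective is not yet covered. In the Foursquare v2 API, ratings were exposed by the venue-details endpoint (`https://api.foursquare.com/v2/venues/VENUE_ID`), a premium call with a lower daily quota; the v2 API has since been superseded, so treat the sketch below as an illustration under those assumptions. `build_details_url` and `extract_rating` are hypothetical helpers: the first builds the details URL with the same credentials, the second reads `response.venue.rating` and returns `None` for unrated venues.

```python
from urllib.parse import urlencode

DETAILS_ENDPOINT = "https://api.foursquare.com/v2/venues/{venue_id}"  # assumed v2 details endpoint


def build_details_url(venue_id, client_id, client_secret, access_token, version):
    """Build the venue-details URL for one venue id, URL-encoding the credentials."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "oauth_token": access_token,
        "v": version,
    }
    return DETAILS_ENDPOINT.format(venue_id=venue_id) + "?" + urlencode(params)


def extract_rating(details_json):
    """Return the venue rating from a details response, or None if unrated or missing."""
    venue = details_json.get("response", {}).get("venue", {})
    return venue.get("rating")
```

In the notebook, something like `extract_rating(requests.get(build_details_url(venue_id, CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, VERSION)).json())` could be applied to each value of the `id` column that `filter_data` keeps.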

In [11]:
```python
def plot_map(latitude, longitude, dataframe_filtered):
    # generate a map centred on the accommodation
    venues_map = folium.Map(location=[latitude, longitude], zoom_start=15)

    # add a red circle marker to represent the rented place
    folium.CircleMarker(
        [latitude, longitude],
        radius=10,
        color='red',
        popup='Rented place',
        fill=True,
        fill_color='red',
        fill_opacity=0.6,
    ).add_to(venues_map)

    # add the pizza venues as blue circle markers, labelled with their category
    for lat, lng, label in zip(dataframe_filtered['lat'], dataframe_filtered['lng'], dataframe_filtered['categories']):
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            color='blue',
            popup=label,
            fill=True,
            fill_color='blue',
            fill_opacity=0.6,
        ).add_to(venues_map)

    # return the map so the notebook displays it
    return venues_map
```

#### Visualise Rome address and nearby pizza locations

In [12]:
```python
address = "33 Via di Sant'Agata de Goti, Roma, RM"
search_query = 'Pizza'
radius = 500  # 500 meters
df, latitude, longitude = query_into_dataframe(CLIENT_ID, CLIENT_SECRET, address, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
df = filter_data(df)
plot_map(latitude, longitude, df)
```

#### Visualise Naples address and nearby pizza locations

In [13]:
```python
address = '133, Via Stella, Napoli, NA'
search_query = 'Pizza'
radius = 500  # 500 meters
df, latitude, longitude = query_into_dataframe(CLIENT_ID, CLIENT_SECRET, address, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
df = filter_data(df)
plot_map(latitude, longitude, df)
```

KeyError: "None of [Index(['name', 'categories', 'id'], dtype='object')] are in the [columns]"

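**The search returns no pizza places within 500 meters: the resulting DataFrame is empty, so `filter_data` raises the `KeyError` above. We therefore repeat the search with a 1000-meter radius, relaxing the 500 m walking limit for Naples.**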
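Instead of rerunning the cell by hand, the search can widen its radius step by step until it finds something. A minimal sketch, assuming a `search(radius)` callable that returns a sized collection of venues (for example a thin wrapper around `query_into_dataframe` that returns only the DataFrame); `search_with_fallback` and the radius steps are hypothetical. Any radius above 500 m relaxes the walking limit set in the business problem, so the radius actually used is returned alongside the results.

```python
def search_with_fallback(search, radii=(500, 1000, 1500)):
    """Call search(radius) for each radius in turn.

    Returns (radius, results) for the first non-empty result,
    or (None, []) if every radius comes back empty.
    """
    for radius in radii:
        results = search(radius)
        if len(results) > 0:
            return radius, results
    return None, []
```

For Naples this could be called as `search_with_fallback(lambda r: query_into_dataframe(CLIENT_ID, CLIENT_SECRET, address, ACCESS_TOKEN, VERSION, search_query, r, LIMIT)[0])`, passing the returned DataFrame to `filter_data` only when a radius was found.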

In [16]:
```python
address = '133, Via Stella, Napoli, NA'
search_query = 'Pizza'
radius = 1000  # 1000 meters
df, latitude, longitude = query_into_dataframe(CLIENT_ID, CLIENT_SECRET, address, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
df = filter_data(df)
plot_map(latitude, longitude, df)
```

#### Visualise Florence address and nearby pizza locations

In [15]:
```python
address = '3, Via Del Leone, Firenze, FI'
search_query = 'Pizza'
radius = 500  # 500 meters
df, latitude, longitude = query_into_dataframe(CLIENT_ID, CLIENT_SECRET, address, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
df = filter_data(df)
plot_map(latitude, longitude, df)
```