# Capstone Project - The Battle of Neighborhoods 

### Introduction

Paris is known for its iconic heritage attractions, haute French cuisine, and centuries of artistic expression. The City of Light is a global hub of culture and inspiration, and visitors can easily become overwhelmed by the sheer number of sites to see, neighborhoods to visit, and restaurants to indulge in. Deciding how and where to spend time and money in Paris can be exhausting, but getting the most out of a Paris trip is a problem that can be approached scientifically, specifically by leveraging computer and data science tools and techniques.

For this project we will act as a hypothetical bespoke travel planning company. This company is tasked with preparing a detailed itinerary and recommendations on areas to stay in Paris based on the client's preferences. To do this, we will use the Python programming language, SQL, cloud infrastructure and technologies, modeling and machine learning algorithms, Jupyter notebooks, and more.

Let's call this company "Philip Wendt Travel Inc." and the client "Maya Girlfend."

While Maya is generally as easygoing as they come, she does have some specific requirements and taste preferences for her upcoming travels. Like many visitors to Paris, she has her bucket-list attractions to visit, such as the Eiffel Tower, the Louvre, Versailles, and a scenic ride on the River Seine. Of particular interest to Maya, however, is soaking in the idyllic Parisian lifestyle. She would start her day strolling among cafes and brasseries in the light Paris morning fog before settling down at one for a café au lait and a croissant. Prompted by the kick of caffeine, she would then roam through historic plazas and parks with bronze and stone statues until stumbling across an interesting shop or art gallery to explore. Considering the French aren't afraid to have a glass of wine or two with lunch, there will need to be a few wine bars nearby as she steps back out into the gentle afternoon sun. French cuisine is, of course, an attraction in its own right, so arrondissements with many French bistros and restaurants will be key, as will cocktail bars for nightlife that reminds her of her college days.

For Maya Girlfend we'll focus on finding the best arrondissements of Paris, allowing her to take in as much of the Parisian atmosphere as she can. We'll look for neighborhoods that feature the following:
- Cafes and brasseries
- Plazas and gardens
- Art Museums
- French Restaurants and wine bars

We'll also try to find an area with bakeries and ice cream shops, since Maya has a bit of a sweet tooth.

### Data

Geo-Coordinate Data: Republic of France Open Platform Public Data

To derive our solution, we will leverage JSON data found at www.data.gouv.fr. The JSON file has details about all the boroughs in France. For this project we will limit it to the arrondissements of Paris.

Venue and Point of Interest Data: Foursquare API

We will need data about different venues across all of Paris, and we will need to connect each venue to its respective arrondissement. To get this information, we will use Foursquare geolocation data. As a location data provider, Foursquare offers information about all manner of venues within a designated area, including venue names, locations, descriptions, photos, and more. The venue data will therefore be sourced from the Foursquare developer platform through its API.

In [None]:
# Import libraries
# Install geopy first (notebook shell command) so the import below succeeds
!conda install -c conda-forge geopy --yes
import json # library to handle JSON files
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import requests # library to handle requests
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
# Import k-means for the clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library
print('Libraries imported.')

In [686]:
# Download the dataset and read it into a pandas dataframe.
# The Arrondissements dataset was downloaded from Paris|DATA:  https://opendata.paris.fr/explore/dataset/arrondissements/table/?dataChart
# Then placed on the GitHub repo for the project.
paris = pd.read_csv('https://raw.githubusercontent.com/flutieflakes/Coursera_Capstone/main/capstone_paris.csv')
paris

Unnamed: 0,CAR,NAME,NSQAR,CAR.1,CARINSEE,LAR,NSQCO,SURFACE,PERIMETRE,Geometry_X,Geometry_Y
0,3,Temple,750000003,3,3,3eme Ardt,750001537,1170882828,4519264,48.862872,2.360001
1,19,Buttes-Chaumont,750000019,19,19,19eme Ardt,750001537,6792651129,11253182,48.887076,2.384821
2,14,Observatoire,750000014,14,14,14eme Ardt,750001537,5614877309,10317483,48.829245,2.326542
3,10,Entrepot,750000010,10,10,10eme Ardt,750001537,2891739442,6739375,48.87613,2.360728
4,12,Reuilly,750000012,12,12,12eme Ardt,750001537,16314782637,24089666,48.834974,2.421325
5,16,Passy,750000016,16,16,16eme Ardt,750001537,16372542129,17416110,48.860392,2.261971
6,11,Popincourt,750000011,11,11,11eme Ardt,750001537,3665441552,8282012,48.859059,2.380058
7,2,Bourse,750000002,2,2,2eme Ardt,750001537,991153745,4554104,48.868279,2.342803
8,4,Hotel-de-Ville,750000004,4,4,4eme Ardt,750001537,1600585632,5420908,48.854341,2.35763
9,17,Batignolles-Monceau,750000017,17,17,17eme Ardt,750001537,5668834504,10775580,48.887327,2.306777


In [642]:
# Rename the necessary columns ('Geometry_X', 'Geometry_Y', etc.)
# Neighborhood: name of the central district for the Arrondissement
# Arrondissement: the Arrondissement or district number used to identify it
# French_Name: the French label for each Arrondissement
paris.rename(columns={'NAME': 'Neighborhood', 'CAR': 'Arrondissement', 'Geometry_X': 'Latitude', 'Geometry_Y': 'Longitude', 'LAR': 'French_Name'}, inplace=True)
# Clean up the dataset to remove unnecessary columns.
# Some of the columns are for mapping software - not required here.
paris.drop(['NSQAR', 'CAR.1', 'CARINSEE', 'NSQCO', 'SURFACE', 'PERIMETRE'], axis=1, inplace=True)
paris

Unnamed: 0,Arrondissement,Neighborhood,French_Name,Latitude,Longitude
0,3,Temple,3eme Ardt,48.862872,2.360001
1,19,Buttes-Chaumont,19eme Ardt,48.887076,2.384821
2,14,Observatoire,14eme Ardt,48.829245,2.326542
3,10,Entrepot,10eme Ardt,48.87613,2.360728
4,12,Reuilly,12eme Ardt,48.834974,2.421325
5,16,Passy,16eme Ardt,48.860392,2.261971
6,11,Popincourt,11eme Ardt,48.859059,2.380058
7,2,Bourse,2eme Ardt,48.868279,2.342803
8,4,Hotel-de-Ville,4eme Ardt,48.854341,2.35763
9,17,Batignolles-Monceau,17eme Ardt,48.887327,2.306777


In [643]:
paris_df = paris  # second name for the same dataframe (an alias, not a copy)
paris_df.head()

Unnamed: 0,Arrondissement,Neighborhood,French_Name,Latitude,Longitude
0,3,Temple,3eme Ardt,48.862872,2.360001
1,19,Buttes-Chaumont,19eme Ardt,48.887076,2.384821
2,14,Observatoire,14eme Ardt,48.829245,2.326542
3,10,Entrepot,10eme Ardt,48.87613,2.360728
4,12,Reuilly,12eme Ardt,48.834974,2.421325


Use the geopy library to get the latitude and longitude values of Paris

In [644]:
# Retrieve the Latitude and Longitude for Paris
from geopy.geocoders import Nominatim 
address = 'Paris'
# Define the user_agent as Paris_explorer
geolocator = Nominatim(user_agent="Paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris France are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris France are 48.8566969, 2.3514616.


Create a map of Paris with Arrondissements (Neighborhoods) superimposed

In [645]:
# create map of Paris using the above latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=12)
# add markers to map
for lat, lng, label in zip(paris['Latitude'], paris['Longitude'], paris['French_Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#e8dc54',
        fill_opacity=0.5).add_to(map_paris)
    
map_paris

Use the Foursquare API to explore the Arrondissements (Neighborhoods) of Paris 

In [646]:
CLIENT_ID = 'HIDDEN' 
CLIENT_SECRET = 'HIDDEN' 
VERSION = '20180605' 
radius=500
LIMIT=100
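
Hardcoding API credentials in a notebook makes them easy to leak when the notebook is shared. Below is a minimal sketch of one alternative, assuming the keys are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are hypothetical and not part of the original notebook.

```python
import os

def load_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
demo_id, demo_secret = load_credentials(demo_env)
```

In the notebook itself, `CLIENT_ID, CLIENT_SECRET = load_credentials()` could stand in for the two hardcoded assignments.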

In [647]:
# Explore the first Neighborhood in our dataframe.
# Get the Neighborhood's French name.
paris_df.loc[0, 'French_Name']

'3eme Ardt'

In [648]:
# Get the Neighborhood's latitude and longitude values.
neighborhood_latitude = paris.loc[0, 'Latitude'] # Neighborhood latitude value
neighborhood_longitude = paris.loc[0, 'Longitude'] # Neighborhood longitude value
neighborhood_name = paris.loc[0, 'French_Name'] # Neighborhood name
print('Latitude and longitude values of the neighborhood {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of the neighborhood 3eme Ardt are 48.86287238, 2.3600009859999997.
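
The request URL for the `venues/explore` endpoint can also be assembled from a dictionary of parameters rather than a long format string, which avoids misplaced `&` separators. The sketch below uses `urllib.parse.urlencode` from the standard library; the helper name `build_explore_url` is hypothetical, and the defaults mirror the `VERSION`, `radius`, and `LIMIT` values set earlier.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180605", radius=500, limit=100):
    """Build the explore URL from keyword parameters (illustrative helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma inside the ll parameter unescaped
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")

# Placeholder credentials and coordinates, for illustration only
example_url = build_explore_url("ID", "SECRET", 48.5, 2.25)
print(example_url)
```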


In [649]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '602879ec41cc7e0fe945ad38'},
 'response': {'headerLocation': 'Enfants-Rouges',
  'headerFullLocation': 'Enfants-Rouges, Paris',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 92,
  'suggestedBounds': {'ne': {'lat': 48.8673723845, 'lng': 2.3668285468065267},
   'sw': {'lat': 48.8583723755, 'lng': 2.353173425193473}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d974096a2c654814aa6d353',
       'name': 'Mmmozza',
       'location': {'address': '57 rue de Bretagne',
        'lat': 48.86391016055883,
        'lng': 2.360590696334839,
        'labeledLatLngs': [{'label': 'display',
          'lat': 48.86391016055883,
          'lng': 2.360590696334839}],
        'distance': 123,
        'postalCode': '75003',
        '
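
Each venue in the response carries a `distance` field, 123 for Mmmozza above. A rough cross-check, assuming this field is the great-circle distance in meters from the query point (an assumption, since the output does not state it), is to compute the haversine distance between the arrondissement coordinates and the venue coordinates. The function name `haversine_m` and the mean Earth radius of 6,371,000 m are choices made for this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Query point for the 3eme Ardt and the Mmmozza coordinates from the response
d = haversine_m(48.86287238, 2.3600009859999997,
                48.86391016055883, 2.360590696334839)
print(round(d))  # close to the 123 reported in the response
```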

In [650]:
# define the function that extracts the category of the venue
def get_category_type(row):
    # the column is 'categories' or 'venue.categories' depending on the response shape
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
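
To see what the helper returns, it can be called on a few hand-made rows. The sketch below repeats the function so that it runs on its own; the three sample rows are invented to mirror the two column layouts and the empty-category case, and are not taken from the API response.

```python
def get_category_type(row):
    # the key is 'categories' or 'venue.categories' depending on the layout
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Invented sample rows covering the three cases
flat_row = {'venue.categories': [{'name': 'Sandwich Place'}]}
plain_row = {'categories': [{'name': 'Wine Bar'}, {'name': 'Bar'}]}
empty_row = {'venue.categories': []}

labels = [get_category_type(r) for r in (flat_row, plain_row, empty_row)]
print(labels)  # ['Sandwich Place', 'Wine Bar', None]
```

Only the first category is kept, so a venue listed under several categories is represented by its primary one.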

In [651]:
# Structure the JSON response into a pandas dataframe
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head(30)


Unnamed: 0,name,categories,lat,lng
0,Mmmozza,Sandwich Place,48.86391,2.360591
1,Chez Alain Miam Miam,Sandwich Place,48.862369,2.36195
2,Marché des Enfants Rouges,Farmers Market,48.862806,2.361996
3,Chez Alain Miam Miam,Sandwich Place,48.862781,2.362064
4,Les Enfants Rouges,Wine Bar,48.863013,2.36126
5,Square du Temple,Park,48.864475,2.360816
6,Bontemps,Dessert Shop,48.863956,2.360725
7,Okomusu,Okonomiyaki Restaurant,48.861453,2.360879
8,Fromagerie Jouannault,Cheese Shop,48.862947,2.36253
9,Le Burger Fermier des Enfants Rouges,Burger Joint,48.862831,2.362073


In [652]:
# Check how many venues there are in 3eme Ardt within a radius of 500 meters
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

92 venues were returned by Foursquare.


Create a nearby venues function for all the neighborhoods in Paris

In [653]:
def getNearbyVenues(name, latitudes, longitudes, radius=500):
    venues_list = []
    # use a distinct loop variable so the `name` argument is not shadowed
    for neighborhood, lat, lng in zip(name, latitudes, longitudes):
        print(neighborhood)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # venues without a category get None instead of raising IndexError
        venues_list.append([(
            neighborhood,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None)
            for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['French_Name',
                  'Latitude',
                  'Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
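
Separating the row-building step from the HTTP call makes it possible to check the parsing logic without network access. The sketch below is one way to do that; the helper name `extract_venue_rows` is hypothetical, and the second sample item (a venue with no categories) is invented to exercise the empty case. The first item reuses the Mmmozza values shown in the earlier outputs.

```python
def extract_venue_rows(neighborhood, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

sample_items = [
    {'venue': {'name': 'Mmmozza',
               'location': {'lat': 48.86391016055883, 'lng': 2.360590696334839},
               'categories': [{'name': 'Sandwich Place'}]}},
    # invented venue without categories, to show the fallback
    {'venue': {'name': 'Uncategorized Venue',
               'location': {'lat': 48.8630, 'lng': 2.3610},
               'categories': []}},
]

rows = extract_venue_rows('3eme Ardt', 48.862872, 2.360001, sample_items)
for row in rows:
    print(row)
```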

Create a new dataframe for the venues of Paris called paris_venues

In [654]:
paris_venues = getNearbyVenues(name=paris['French_Name'],
                                   latitudes=paris['Latitude'],
                                   longitudes=paris['Longitude']
                                  )
paris_venues.head(30)

3eme Ardt
19eme Ardt
14eme Ardt
10eme Ardt
12eme Ardt
16eme Ardt
11eme Ardt
2eme Ardt
4eme Ardt
17eme Ardt
18eme Ardt
1er Ardt
5eme Ardt
7eme Ardt
20eme Ardt
8eme Ardt
9eme Ardt
13eme Ardt
15eme Ardt
6eme Ardt


Unnamed: 0,French_Name,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,3eme Ardt,48.862872,2.360001,Mmmozza,48.86391,2.360591,Sandwich Place
1,3eme Ardt,48.862872,2.360001,Chez Alain Miam Miam,48.862369,2.36195,Sandwich Place
2,3eme Ardt,48.862872,2.360001,Marché des Enfants Rouges,48.862806,2.361996,Farmers Market
3,3eme Ardt,48.862872,2.360001,Chez Alain Miam Miam,48.862781,2.362064,Sandwich Place
4,3eme Ardt,48.862872,2.360001,Les Enfants Rouges,48.863013,2.36126,Wine Bar
5,3eme Ardt,48.862872,2.360001,Square du Temple,48.864475,2.360816,Park
6,3eme Ardt,48.862872,2.360001,Bontemps,48.863956,2.360725,Dessert Shop
7,3eme Ardt,48.862872,2.360001,Okomusu,48.861453,2.360879,Okonomiyaki Restaurant
8,3eme Ardt,48.862872,2.360001,Fromagerie Jouannault,48.862947,2.36253,Cheese Shop
9,3eme Ardt,48.862872,2.360001,Le Burger Fermier des Enfants Rouges,48.862831,2.362073,Burger Joint


In [655]:
paris_venues.shape

(1274, 7)

Check how many venues were returned for each neighborhood.
Please be aware of the 100-venue limit imposed by the free Foursquare account.

In [656]:
paris_venues.groupby('French_Name').count()

French_Name,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
10eme Ardt,100,100,100,100,100,100
11eme Ardt,68,68,68,68,68,68
12eme Ardt,5,5,5,5,5,5
13eme Ardt,63,63,63,63,63,63
14eme Ardt,26,26,26,26,26,26
15eme Ardt,66,66,66,66,66,66
16eme Ardt,11,11,11,11,11,11
17eme Ardt,59,59,59,59,59,59
18eme Ardt,44,44,44,44,44,44
19eme Ardt,38,38,38,38,38,38
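
The counts above vary widely, from 5 venues to the cap of 100. One plausible reason, stated here as an assumption, is that a 500 m radius around the centre point covers only a small part of the larger arrondissements. The sketch below flags arrondissements whose sample may be too small to characterize them, and those that hit the limit. The counts are copied from the rows displayed above, and the threshold of 30 is an arbitrary choice made for illustration.

```python
# Venue counts copied from the ten rows displayed above
venue_counts = {
    '10eme Ardt': 100, '11eme Ardt': 68, '12eme Ardt': 5, '13eme Ardt': 63,
    '14eme Ardt': 26, '15eme Ardt': 66, '16eme Ardt': 11, '17eme Ardt': 59,
    '18eme Ardt': 44, '19eme Ardt': 38,
}

MIN_VENUES = 30   # arbitrary threshold for this sketch
API_LIMIT = 100   # per-request limit used in the notebook

# Arrondissements with few venues, smallest sample first
sparse = sorted((n for n, c in venue_counts.items() if c < MIN_VENUES),
                key=venue_counts.get)
# Arrondissements whose results were truncated at the limit
capped = [n for n, c in venue_counts.items() if c >= API_LIMIT]

print('Sparse:', sparse)
print('Capped:', capped)
```

Frequencies computed for the sparse arrondissements rest on very few venues, so they should be read with caution in the steps that follow.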


In [657]:
# Calculate how many unique categories there are.
print('There are {} unique venue categories.'.format(len(paris_venues['Venue Category'].unique())))

There are 207 unique venue categories.


In [658]:
# Analyze each of the Neighborhoods from the results
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['French_Name'] 
# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]
paris_onehot

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,3eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,3eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,3eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1269,6eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1270,6eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1271,6eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1272,6eme Ardt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [659]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,10eme Ardt,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0
1,11eme Ardt,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,...,0.0,0.0,0.014706,0.0,0.029412,0.029412,0.0,0.014706,0.0,0.0
2,12eme Ardt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2
3,13eme Ardt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.206349,...,0.0,0.0,0.0,0.0,0.206349,0.0,0.0,0.0,0.0,0.0
4,14eme Ardt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,15eme Ardt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,...,0.0,0.0,0.0,0.0,0.015152,0.0,0.015152,0.0,0.0,0.0
6,16eme Ardt,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,17eme Ardt,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,...,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,18eme Ardt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0
9,19eme Ardt,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0
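
Taking the mean of one-hot columns within a neighborhood gives the share of that neighborhood's venues that fall in each category. The sketch below checks this equivalence in plain Python, using only the ten venue categories displayed earlier for the 3eme Ardt, so the resulting shares are illustrative and differ from the values computed on the full data.

```python
from collections import Counter

# Categories of the first ten venues shown for the 3eme Ardt
categories_3eme = [
    'Sandwich Place', 'Sandwich Place', 'Farmers Market', 'Sandwich Place',
    'Wine Bar', 'Park', 'Dessert Shop', 'Okonomiyaki Restaurant',
    'Cheese Shop', 'Burger Joint',
]

# One-hot encode: one row per venue, one column per category
all_categories = sorted(set(categories_3eme))
one_hot = [[1 if cat == col else 0 for col in all_categories]
           for cat in categories_3eme]

# Column means of the one-hot matrix
means = {col: sum(row[i] for row in one_hot) / len(one_hot)
         for i, col in enumerate(all_categories)}

# Direct category shares for comparison
shares = {cat: n / len(categories_3eme)
          for cat, n in Counter(categories_3eme).items()}

print(means == shares)
print(means['Sandwich Place'])
```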
